trick in the bag: case inside group_concat to return nulls

Posted in database on September 22nd, 2011 by admin

The group_concat function in MySQL aggregates results into comma separated values. It ignores null values. In other words, if your result set has 10 rows and you’re using group_concat against one of the columns in a query, and one of your results for the column is null, you will get 9 instead of 10 results.

I had a java programmer using group_concat on some columns in a multi-row query to make the query 1 row, so that comma separated values of each column are returned in a single-row result set.

Java guys do this a lot to construct “maps” of data. The problem is (as stated above), if I use group_concat, and one of the values is null, group_concat will return n-1 values. If I’m making a map, and one of the columns in my map has too few columns, my map is no longer usable. Values don’t line up. Let me explain what I am talking about.

Take this example:


mysql> select * from test_concat;
+----+-------+----------+
| id | color | shape |
+----+-------+----------+
| 1 | blue | square |
| 2 | green | triangle |
| 3 | NULL | rhombus |
| 4 | red | circle |
+----+-------+----------+
4 rows in set (0.00 sec)

I’ve got three columns. The color column has one null value, corresponding to the ID 3 and the shape “rhombus”. Now, let’s group_concat these columns.


mysql> select group_concat(id) ids, group_concat(color) colors, group_concat(shape) shapes from test_concat;
+---------+----------------+--------------------------------+
| ids | colors | shapes |
+---------+----------------+--------------------------------+
| 1,2,3,4 | blue,green,red | square,triangle,rhombus,circle |
+---------+----------------+--------------------------------+
1 row in set (0.00 sec)

As you can see, group_concat has eliminated the null in the color column. If I tried to map this data set in a program, I would come up with (1, blue, square), (2, green, triangle) — so far so good right… (3, red, rhombus) – now wait a sec. Rhombus had a null color in the real data set. Now the circle will end up with no color. This is wrong.

The “colors” result set does not have enough values in the map. group_concat has removed the null, so these values don’t “line up” correctly.

Well – embedding a little case statement inside your group_concat will solve this problem.

In the previous statement, what if I replaced:

group_concat(color) colors

with:


group_concat((case when color is null then 'null' else color end)) as colors

Here is the modified statement and result set.


mysql> select group_concat(id) ids, group_concat((case when color is null then 'null' else color end)) colors, group_concat(shape) shapes from test_concat;
+---------+---------------------+--------------------------------+
| ids | colors | shapes |
+---------+---------------------+--------------------------------+
| 1,2,3,4 | blue,green,null,red | square,triangle,rhombus,circle |
+---------+---------------------+--------------------------------+
1 row in set (0.00 sec)

Voila. Now we have enough values in each column and the map stays intact.

Just another trick in the bag.

Tags: , , , ,

Innodb raw tablespace rocks

Posted in Uncategorized on September 21st, 2011 by admin

Through some testing the last couple of days I have found using raw devices with innodb provides an approximate 15% performance gain off the top. MySQL 5.5 looks promising as well. Enabling the performance schema has some overhead. I went back to a net zero gain when using raw devices with the performance schema enabled on mysql 5.5. That’s all I have time for right now.

July passing away

Posted in sentimental on July 29th, 2011 by admin

The last 7 weeks has been full. I am looking forward to posting some technology snippets within the month of August that share some of the work that I have been doing. The exciting part is that, while I didn’t complete all of my scaling objectives for my databases yet, I have written a lot of data generation code in perl, some of which I may be able to share.

A major accomplishment on the personal optimization front is that I weathered the family beach trip with 20+ relatives extremely well and now I can’t imagine not doing it next year.

On a somber note, my family lost a member this month. An uncle passed away — technically a step-uncle, but I couldn’t help but tell work simply, “uncle”, when I told them I had to take a day, and that’s what I consider him to be. I can think back to several memorable times, perhaps as many as my full blood uncles, but in a fraction of the time spent with him. And one thing’s for sure, there are not many people in this world who will stand up at your rehearsal dinner and share one of the most endearing and engaging stories of the night – endearing about you, but also endearing himself to the whole crowd, the way only my uncle could do. I don’t know what compelled him to count me as special and do that at such an important time in my life, but it made me count him as special, too. He told a story about a time when I was riding in his car with my brother when I was maybe 9 years old. I’m not sure where we had been but it was a fun time. It was starting to rain and there was thunder and he said “Listen y’all that’s God bowling….”. Reportedly, I said something to the effect that thunder was as a result of expanding air from the heat of the lightning. What a geek I always have been. But he managed to make it sound not just goofy as it should have been portrayed, but also talked about what *other* qualities that embodied at age (whatever I was). Under the age of 10 or whatever. He complimented me on my intelligence at that young age and things of that nature. I just remember laughing a lot and remembering that it was a special moment in my wedding weekend. I can’t tell the story right; now the storyteller is gone. I’m still sad, and I had no idea it would make me feel this way when I heard that he had cancer almost 2 years ago. To be quite honest, I never went to see him. I should have. I don’t know much but I know the world isn’t quite as good as it was when he was in it. I will tell you, Yancey must have been a great father to his three boys.

This month we’re working on improving software deployment processes at work. In personal life, we’re having a little work done to the house, I’m trying to get better at going to the gym. All the typical things you would expect for someone the ripe age of thirty something.

I look forward to some technical posts within the next couple of weeks. It’s in my goals.

The Challenges of June

Posted in database, security on June 8th, 2011 by admin

In June I have two challenges. One is to get ramped up into my new DBA role and the second is to retain a CISSP certification. On both fronts there are challenges.

On the database front, I have some interesting tasks.

One is producing obfuscated mysqldumps on a regular basis, for which I am either going to use a tool called the Databene Benerator. A developer colleague has already worked with the tool to some degree and has passed on his configs for mass insertion of obfuscated data. I might also write my own Perl solution, since I am not as savvy or fond of working with XML configs for java objects in the Databene package. However, the java is probably a good exercise.

Another is figuring out the right storage engine to use for high volume inserts into two tables that are, basically, just glorified log files. Whether it’s the archive engine or MyISAM, it will be interesting to find out what produces the right performance and usability.

Next is just good old fashioned performance testing and optimization. The performance testing is just capturing a day’s worth of transactions, then going through the software piece by piece in order to test each individual function. The optimization is going to start with slow queries but will branch out to optimizing queries in our most important sections of the application.

My accomplishments over the past week were writing some basic scripts to populate a database with log4j performance logs from the apps, then creating some basic cgi to render some initial reports. I hope to enhance these with graphs. These reports give us visibility into the performance of the application, method by method.

I have a long way to go before I’m able to quickly and automatically test our database, able to show performance trends over time, and to figure out when our app is going to break. however, I should be able to obfuscate some data this week and get some slow queries fixed. One thing at a time.

On the (ISC)^2 certification side, I just have to finish up 50 or so hours of CPE credits before the end of the month. I did 6 hours today, and if I keep up the work, then I’ll be done soon. We shall see if I make it.

All in all, this has been a very fun-filled two weeks in a new job. I really like focusing on data again, along with some good people.

Gearing up for database administration

Posted in career, database on May 21st, 2011 by admin

For the past few years I have been in the managed services business for a leading data center operator. The experience has been wonderful. I’ve gotten a lot of experience with selling and delivering solutions to customers in many industries and of many different sizes. I’ve seen the sales side, I’ve seen the provisioning side, I have seen the managed services side, and I have done so with some successes I’ll never forget and some learning opportunities that will benefit me forever.

I recommend anyone technical take a look at the sales side at least for a little while. You’ll learn about selling the solution the customer needs based on finding the “latent pain”, through discovery and good communication. After all, As service providers, we are all in sales. We create loyalty and experiences that lead to renewals and maybe even an up-sell here and there.

I’m moving back into database administration, one of my favorite areas. I’m currently looking at different ways to automatically classify queries, perform regression testing, and track results. One of the first things I’ll be doing in my new job is to find out how far a database can go before it needs to be load balanced and replicated, federated, or simply given better resources. Perhaps I should be focused on monitoring production first, but test suites are going to be very important in my future environment. I will post more as I discover more. Ah MySQL, how simple are thee. I can use UNIX tools to manipulate ye and ye will not break.

Tags: , , ,