Thursday, January 29, 2009

MySQL bench marks, comparing MyISAM to InnoDB and evaluating the performance hit of foreign keys

During an interview earlier this week, the question of how using foreign keys would impact performance on MySQL. I decided that it would be worthwhile to actually check this out in practice. I started out with a basic structure as follows:

CREATE TABLE IF NOT EXISTS `a` (
`id` int(11) NOT NULL,
`data` varchar(255) NOT NULL,
`data2` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `b` (
`id` int(11) NOT NULL,
`a_id` int(11) NOT NULL,
`data3` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Then added a second database where we had a foreign key:

CREATE TABLE IF NOT EXISTS `b` (
`id` int(11) NOT NULL,
`a_id` int(11) NOT NULL,
`data3` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `a_id` (`a_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `b`
ADD CONSTRAINT `b_ibfk_1` FOREIGN KEY (`a_id`) REFERENCES `a` (`id`) ON DELETE CASCADE;

And just to make things complete, I added a third database identical to the first, but using MyISAM as the engine.

The first thing is to do a data population. I wrote a perl script to generate 10,000 inserts of random data into table A, each with 0-5 random inserts into table B.
Next a selection of queries. I did 500 variations of SELECT * FROM a INNER JOIN b ON (a.id=b.a_id) WHERE a.id=?

For deletes, I deleted 500 rows, using DELETE FROM a WHERE id=?; DELETE FROM b WHERE a_id=? for the databases without the foreign key and the simple DELETE FROM a for the foreign key database.

The results for inserts:
databaseinsert time (s)select time (s)delete time (s)
InnoDB, no FK4.50.253
InnoDB, FK4.00.01
MyISAM1.00.750

I had always assumed that the advantage of MyISAM over InnoDB was the opposite of what it turned out to be with this benchmark. Perhaps even more surprisingly, the foreign key version of the InnoDB database was uniformly faster than the non FK version. I would have expected to see some overhead at least in the insertion stage, but even on multiple trials, I found a consistent advantage for the foreign keys on the insertion.

Saturday, January 17, 2009

Digging back in

Having recently been downsized from oversee.net, I'm using some of the time to revisit some of my old projects. At the top of the stack is lalocalfood.com, where the first phase is an improved version of my map of Los Angeles farmer's markets. Coming back to some of the old code, it's interesting to see how much time is necessary to recover my memories of some of this code I haven't touched in years. That said, once I get over that, I'm finding that my code has surprisingly good design where very little effort is required to handle new cases. I should be finished with the scraper code to collect the farmer's market data from the two main sites with one more day's work.