One of the most frequent errors we got when working with Magento were transactions locking and deadlocking in the database. This usually happened when saving products, and also when customers where making purchases (which prevented them from actually making the purchase, not really something you want!). This locking problems ocurred a lot when we were doing reindexing of the database. ‘SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction’ was an error message we got used to read almost every day.
After some research, we found that the problem in reindexing came primarily from a foreign key in the index_process_event table, which referenced the index_process table. When you did a reindex of any of the indexes, the related row for that index would get locked right away, and stood locked until the transaction was complete (which could, in some cases, take up to 30 minutes, in the case of the url_rewrite index).
Then, when someone was making a purchase, the index_process_event table would get updated, detailing that the product that was being bought, had to be reindex. As that table had a Foreign key to the index_process table, it had to get a read lock (to prevent any changes on the referenced table during the transaction, rendering the result of the transaction inconsistent), which was unable to get until the reindexing was complete.
So, what did we do?
Well, first of all, we got rid of that Foreign Key. Yes, I know, consistency is very important, and an automated check by the database is safe and great. But in this case, it was preventing the business from working, so it was better to remove it. We did some intensive thinking on the matter, and decided that the likelihood of doing an insert that broke the integrity was close to none, and even in the case that it happened, as all reads from that table where done by that very key, the broken record would never be retrieved. This resulted on no more locking because of that index_process table.
Another thing we did, was to change Magento to use Background Reindexing. Out of the box, Magento has 2 reindexing methods: On save, and Manual. On the On save method, every time you update a product or a category, it just goes and updates the related indexes. This provides immediacy, but as it is doing all the changes inside the transaction, and before actually returning a response to the customer, it is very slow to be used. The other method, Manual, just runs a full reindexing of those tables whenever you do it (whether when you clicked on the button on the backend, or by running a command in the console, probably using cron). This is better for when you are not indexing, but when you launch a reindex it pretty much blocks every other activity in the website.
So we implemented Background Reindexing. It works pretty much like On Save reindexing, but instead of doing it inside of the originating transaction, we just stored the task to be done, and then a cron job goes through each of the tasks and reindexes one at a time. It also reindexes each element in it’s own transaction, so the locks do not last for long (usually less than one second). This allows us to parallelize reindexing as well, so the website is usually updated as soon as the changes are made.
This is the code I created to make Background reindexing a reality. Keep in mind that this might cause some problems as the indexing could be done not in order. bg_indexing.tar
I also added a locking mechanism to the cache (I am using Redis now, and were using Memcached, both of them support a way for doing locks) so I could have more than one indexer running simultaneously.
The Indexer::indexEventsCron is called from a cron from inside Magento.
Actually, we started using beanstalkd, which is a task queue, and indexing is being done from there, but the basic principle is the same.