MySQL bulk update with Talend Open Studio
Par Pierrick, jeudi 29 novembre 2007 à 16:44 / categorie: Talend / tags: / #108 / rss
3 years ago, I introduced in PhpWebGallery a very fast way to update several lines of the same table, at once. See PhpWebGallery Subversion revision 625 for details. I don't remember how this idea came to me, but I've implemented it as a component in Talend Open Studio. The purpose is to improve speed on mass updates.
The standard way to update several lines of a table, with different values for each line of course, is to perform a query for each line to update. In a web application it is a really bad thing not to know in advance the number of queries for each page. In any other situation, it's not good because it's very slow.
The idea is the following:
- create a temporary table, copied from the table to update
- load data into the temporary table (use LOAD DATA INFILE if you can, or at least MySQL specific extended inserts)
- perform a single update query joining the temporary table with the table to update
- drop the temporary table
For example:
mysql> CREATE TABLE users ( mysql> id INT NOT NULL, mysql> firstname VARCHAR(10), mysql> lastname VARCHAR(10), mysql> age INT, mysql> weight INT, mysql> PRIMARY KEY (id) mysql> ); mysql> INSERT INTO users mysql> (id, firstname, lastname, age, weight) mysql> VALUES mysql> (1, 'pierrick', 'le gall', 26, 70), mysql> (2, 'stephane', 'mallet', 32, 75), mysql> (3, 'erwann', 'le gall', 2, 10) mysql> ; mysql> SELECT * FROM users; +----+-----------+----------+------+--------+ | id | firstname | lastname | age | weight | +----+-----------+----------+------+--------+ | 1 | pierrick | le gall | 26 | 70 | | 2 | stephane | mallet | 32 | 75 | | 3 | erwann | le gall | 2 | 10 | +----+-----------+----------+------+--------+ mysql> CREATE TABLE users_update_data ( mysql> id INT, mysql> age INT, mysql> weight INT mysql> ); mysql> INSERT INTO users_update_data mysql> (id, age, weight) mysql> VALUES mysql> (1, 27, 65), mysql> (3, 3, 12) mysql> ; mysql> UPDATE mysql> users AS t1, mysql> users_update_data AS t2 mysql> SET mysql> t1.age = t2.age, mysql> t1.weight = t2.weight mysql> WHERE t1.id = t2.id mysql> ; mysql> SELECT * FROM users; +----+-----------+----------+------+--------+ | id | firstname | lastname | age | weight | +----+-----------+----------+------+--------+ | 1 | pierrick | le gall | 27 | 65 | | 2 | stephane | mallet | 32 | 75 | | 3 | erwann | le gall | 3 | 12 | +----+-----------+----------+------+--------+
Of course, for such a small data set, the improvement is not interesting... Just imagine you have 1M users and 100K lines to update.
So I've implemented this algorithm in Talend Open Studio with tMysql(Output)UpdateBulkExec. As it's a bulk operation, we create a file from the input data flow, we load it into the temporary table and perform the single update query. To illustrate the improvement with a benchmark, I've filled a table with 1M (one million) lines and updated 100K lines. The bulk updates is 6.5 times faster than standard update, the job execution time goes from 13.0 seconds to 2.0 seconds.
As described in Feature 509, this new component will come in Perl project with Talend Open Studio 2.3.0M2.
Commentaires
Aucun commentaire pour le moment.
Ajouter un commentaire
Les commentaires pour ce billet sont fermés.