The idea is the following:

  1. create a temporary table, copied from the table to update
  2. load data into the temporary table (use LOAD DATA INFILE if you can, or at least MySQL specific extended inserts)
  3. perform a single update query joining the temporary table with the table to update
  4. drop the temporary table

For example:

mysql> CREATE TABLE users (
mysql>   id INT NOT NULL,
mysql>   firstname VARCHAR(10),
mysql>   lastname VARCHAR(10),
mysql>   age INT,
mysql>   weight INT,
mysql>   PRIMARY KEY (id)
mysql> );
 
mysql> INSERT INTO users
mysql>   (id, firstname, lastname, age, weight)
mysql>   VALUES
mysql>   (1, 'pierrick', 'le gall', 26, 70),
mysql>   (2, 'stephane', 'mallet', 32, 75),
mysql>   (3, 'erwann', 'le gall', 2, 10)
mysql> ;
 
mysql> SELECT * FROM users;
+----+-----------+----------+------+--------+
| id | firstname | lastname | age  | weight |
+----+-----------+----------+------+--------+
|  1 | pierrick  | le gall  |   26 |     70 | 
|  2 | stephane  | mallet   |   32 |     75 | 
|  3 | erwann    | le gall  |    2 |     10 | 
+----+-----------+----------+------+--------+
 
mysql> CREATE TABLE users_update_data (
mysql>   id INT,
mysql>   age INT,
mysql>   weight INT
mysql> );
 
mysql> INSERT INTO users_update_data
mysql>   (id, age, weight)
mysql>   VALUES
mysql>   (1, 27, 65),
mysql>   (3, 3, 12)
mysql> ;
 
mysql> UPDATE
mysql>     users AS t1,
mysql>     users_update_data AS t2
mysql>   SET
mysql>     t1.age = t2.age,
mysql>     t1.weight = t2.weight
mysql>   WHERE t1.id = t2.id
mysql> ;
 
mysql> SELECT * FROM users;
+----+-----------+----------+------+--------+
| id | firstname | lastname | age  | weight |
+----+-----------+----------+------+--------+
|  1 | pierrick  | le gall  |   27 |     65 | 
|  2 | stephane  | mallet   |   32 |     75 | 
|  3 | erwann    | le gall  |    3 |     12 | 
+----+-----------+----------+------+--------+ 

Of course, for such a small data set, the improvement is not interesting... Just imagine you have 1M users and 100K lines to update.

So I've implemented this algorithm in Talend Open Studio with tMysql(Output)UpdateBulkExec. As it's a bulk operation, we create a file from the input data flow, we load it into the temporary table and perform the single update query. To illustrate the improvement with a benchmark, I've filled a table with 1M (one million) lines and updated 100K lines. The bulk updates is 6.5 times faster than standard update, the job execution time goes from 13.0 seconds to 2.0 seconds.

As described in Feature 509, this new component will come in Perl project with Talend Open Studio 2.3.0M2.