Talend 2.2.0M1 and Perl code performance
By Pierrick, Tuesday 21 August 2007 at 18:06 / category: Talend

Richard and I spent two weeks on a major improvement that Richard proposed. Here are the execution times before and after:
.----------------------------------------------------.
| job        | TOS 2.1.1 | TOS 2.2.0M1 | improvement |
+------------+-----------+-------------+-------------+
| Scenario 2 | 20.8 s    | 16.9 s      | 18.8 %      |
| Scenario 3 | 81.2 s    | 30.4 s      | 62.6 %      |
'------------+-----------+-------------+-------------'
- Scenario 3 details in Talendforge wiki
- Scenario 2 details in Talendforge wiki
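
For readers who want to check the improvement column, here is a minimal sketch of the arithmetic, assuming the figure is the relative reduction in execution time, (old - new) / old. The helper name improvement_pct is made up for this illustration and is not part of Talend.

```perl
use strict;
use warnings;

# Hypothetical helper: relative reduction in execution time, in percent.
sub improvement_pct {
    my ($old_seconds, $new_seconds) = @_;
    return 100 * ($old_seconds - $new_seconds) / $old_seconds;
}

# Timings taken from the table above.
my $scenario2 = improvement_pct(20.8, 16.9);
my $scenario3 = improvement_pct(81.2, 30.4);

printf "Scenario 2: %.1f %%\n", $scenario2;
printf "Scenario 3: %.1f %%\n", $scenario3;
```

Under that assumption the two values come out at roughly 18.75 and 62.56, which agrees with the rounded figures in the table.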
Each component works on a Perl array. This array is the translation of a database row or a file line into what I call a Talend row: @row. In Talend 2.1.x and earlier, @row was copied from the input connection to the current component, which transformed it and then copied it to the output connection. That makes 2 copies for each component.

In this job example, somewhere in the code, we had:

    while (my @tMysqlInput_1 = $sth->fetchrow_array()) {
        # ...
        my @row3 = @tMysqlInput_1;
        my @tFilterRow_1 = @row3;
        # ...
        my @row4 = @tFilterRow_1;
        my @tLogRow_1 = @row4;
        # ...
    }

The new way of doing things is:

    while (my $tMysqlInput_1 = $sth->fetchrow_arrayref()) {
        # ...
        my $row3 = $tMysqlInput_1;
        my $tFilterRow_1 = $row3;
        # ...
        my $row4 = $tFilterRow_1;
        my $tLogRow_1 = $row4;
        # ...
    }

Here $tMysqlInput_1 is an array reference: no data is copied, only a memory address.
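
The difference between the two assignments can be seen in a small standalone sketch. The row contents and variable names below are invented for the illustration; no database is involved.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one fetched row.
my @row = (1, 'alice', 'alice@example.com');

# Old style: list assignment copies every element into a new array.
my @copy = @row;
$copy[1] = 'bob';                  # changes the copy only

# New style: only the reference (a memory address) is copied.
my $row_ref = \@row;
my $alias   = $row_ref;
$alias->[1] = 'carol';             # changes the one shared array

print "copy:     $copy[1]\n";
print "original: $row[1]\n";
print "same array: ", ($alias == $row_ref ? "yes" : "no"), "\n";
```

One consequence of sharing a reference is that a component which modifies the row modifies it for every other holder of that reference, whereas a copy is independent.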
In a job that has very few components (not many copies) and a small schema (@row will be very small), the copying is not a real problem. In a complex job with a huge schema, however, the improvement becomes significant.
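
To get a feeling for how the cost grows with the schema, here is a rough sketch using the core Benchmark module. It is not the code Talend generates: make_handlers and the two widths (5 and 500 columns) are assumptions chosen for the illustration, and the timings will vary from one machine to another.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: build the two per-row handlers for a row
# of $columns fields.
sub make_handlers {
    my ($columns) = @_;
    my @source = (1 .. $columns);
    return {
        # 2.1.x style: one copy in, one copy out.
        copy => sub {
            my @in  = @source;
            my @out = @in;
            return scalar @out;
        },
        # 2.2.0M1 style: only the reference is passed along.
        reference => sub {
            my $in  = \@source;
            my $out = $in;
            return scalar @$out;
        },
    };
}

for my $columns (5, 500) {
    print "schema width: $columns columns\n";
    # Run each handler for at least 1 CPU second and compare rates.
    cmpthese(-1, make_handlers($columns));
}
```

The copy handler does work proportional to the number of columns, while the reference handler does the same small amount of work whatever the width, so the gap between the two should widen as the schema grows.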