Wednesday, September 24 2008
By Pierrick Le Gall on Wednesday, September 24 2008, 00:07 - Piwigo

PhpWebGallery becomes Piwigo. The name of the project is changing. This is not a fork, just a rename.
What are the advantages of "Piwigo" over "PhpWebGallery", or "why did we decide to change?":
- shorter: easier to remember
- unique: a search engine query for "Piwigo" will bring you to the Piwigo project pages
- no PHP in the name: I now find it odd to make the technology obvious in the application name
- keeps the PWG letters
Continue reading...
Wednesday, July 16 2008
By Pierrick Le Gall on Wednesday, July 16 2008, 00:31 - Perl
I need to load a huge amount of data in memory in a Perl hash. The value corresponding to each key is an array of scalar data. Most of the time the key is built from a single field of the array, but it can be made of several fields. The number of fields in the array may vary a lot, but most of the time it will be around 5 scalar values.
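The structure described above can be sketched as follows. This is a Python dictionary rather than a Perl hash, and the sample rows, the `build_index` helper and the field positions are hypothetical, chosen only for illustration.

```python
# Hypothetical rows: each row is a list of about 5 scalar values.
rows = [
    ["1001", "FR", "Pierrick", "LE GALL", "26"],
    ["1002", "FR", "Erwann", "LE GALL", "2"],
    ["1003", "US", "Larry", "WALL", "53"],
]


def build_index(rows, key_fields):
    """Map a key built from the chosen field positions to the full row."""
    index = {}
    for row in rows:
        parts = [row[i] for i in key_fields]
        # A single field is used as-is; several fields become a tuple key.
        key = parts[0] if len(parts) == 1 else tuple(parts)
        index[key] = row
    return index


by_id = build_index(rows, [0])
by_country_and_id = build_index(rows, [1, 0])
```

With a single key field the lookup is `by_id["1001"]`; with several fields the key is a tuple, as in `by_country_and_id[("US", "1003")]`.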
Continue reading...
Wednesday, June 18 2008
By Pierrick Le Gall on Wednesday, June 18 2008, 00:21 - Talend
Another new feature in Talend Open Studio 2.4.0: the ability to install ecosystem components directly from your installed Talend Open Studio. No need to browse the ecosystem web application to find the component that fits your needs.

Continue reading...
Friday, June 13 2008
By Pierrick Le Gall on Friday, June 13 2008, 00:49 - Talend

New feature in Talend Open Studio 2.4: the ability to run iterations in parallel. After a tLoop or a tFileList you can set the "number of parallel executions" on the iterate link. If you're running on a quad-core computer, it might be interesting to ask for 4 executions in parallel. This means that 4 iterations will run at the same time as long as some iterations remain. If your tFileList finds 1,000 files, you won't have 1,000 parallel executions, but 4, theoretically dividing the total execution time by 4.
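The idea of capping the number of simultaneous iterations can be sketched with a worker pool. This is only an illustration in Python, not what Talend Open Studio generates; `process_file`, `run_iterations` and the file names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_file(name):
    """Stand-in for the work done in one iteration (hypothetical)."""
    return len(name)


def run_iterations(items, parallel_executions=4):
    # At most `parallel_executions` iterations run at once; the remaining
    # ones wait until a worker becomes free.
    with ThreadPoolExecutor(max_workers=parallel_executions) as pool:
        return list(pool.map(process_file, items))


files = [f"file_{i}.csv" for i in range(1000)]
results = run_iterations(files, parallel_executions=4)
```

The 1,000 items are still all processed, but never more than 4 at a time. For CPU-bound work in Python, a process pool would be the closer analogue.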
Continue reading...
Thursday, June 5 2008
By Pierrick Le Gall on Thursday, June 5 2008, 00:47 - Talend

I am proud to introduce you to the TalendForge planet. A planet is an online aggregation of several blog feeds related to a common topic. Currently registered planet members are Stéphane Mallet (Java developer at Talend), Sebastiao Correia (Java developer at Talend), Olivier Carbone (Training and Support manager at Talend) and me (Community Manager, Perl developer at Talend).
The purpose of this planet is to keep you informed about Talend Open Studio and related technologies/products, from the inside. This is not the Talend corporate blog, but a place where people who make Talend Open Studio give information to Talend Open Studio users.
As far as I'm concerned, only my posts with the "Talend" tag will be visible in this planet. TalendForge.org planet was placed into orbit thanks to Planet Planet. This Python script refreshes the planet every hour.
The planet Earth picture comes from Wikimedia Commons.
Thursday, May 22 2008
By Pierrick Le Gall on Thursday, May 22 2008, 00:14

With Talend Open Studio 2.4 and a Perl project, the tPipeRow component appears in the palette. tPipeRow sends each input row to an external command, fetches the returned line and sends it to the next component. tPipeRow does not launch the external script once per input row, but only once, when the data flow is initialized. This makes the whole thing very efficient. For each STDIN line, the external script must produce one STDOUT line, without buffering.
From a technical point of view, it's very interesting to see 2 scripts running in parallel and communicating through file descriptors. The tPipeRow code is very short (but was not that quick to write) because it simply uses IPC::Open2.
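The same pattern, one long-lived external process fed line by line, can be sketched in Python with `subprocess`. This is not the tPipeRow code (which uses Perl's IPC::Open2); the child command, which upper-cases each line, and the `pipe_rows` helper are hypothetical.

```python
import subprocess
import sys

# Hypothetical external command: a child Python process that upper-cases each
# line it reads and flushes after every line, so nothing stays in its buffer.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


def pipe_rows(rows):
    """Send each row to one long-lived child process and collect its replies."""
    child = subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )
    replies = []
    try:
        for row in rows:
            # One line in, one line out: write, flush, then read the answer.
            child.stdin.write(row + "\n")
            child.stdin.flush()
            replies.append(child.stdout.readline().rstrip("\n"))
    finally:
        child.stdin.close()
        child.wait()
        child.stdout.close()
    return replies


output_rows = pipe_rows(["pierrick", "erwann", "larry"])
```

The child is started once, before the first row. If it buffered its output instead of flushing after each line, the parent's `readline()` would block, which is why the "one STDOUT line per STDIN line, without buffering" rule matters.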
Continue reading...
Saturday, April 12 2008
By Pierrick Le Gall on Saturday, April 12 2008, 00:15 - Talend

After many months of work, the Talend development team is proud to announce the public availability of our automated test results. You can browse them on talendforge.org. As stated on the "about" view, this should allow our development team to detect regressions, ensure backward compatibility, and follow up on bug fixing.
Continue reading...
Monday, March 17 2008
By Pierrick Le Gall on Monday, March 17 2008, 00:50 - Talend

In Talend Open Studio, the multithreading option makes it possible to execute 2 subjobs in parallel. It was implemented in Java code generation last summer (see feature 1335) for TOS 2.1 and I implemented it for Perl code generation last Monday (see feature 3302). In Perl, the current multithreading option is not implemented with threads but with processes: I fork the parent process into child processes.
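Running subjobs as forked child processes can be sketched as follows. This is a Python illustration for Unix-like systems only, not the generated Perl code; `run_in_children` and the two subjob functions are hypothetical.

```python
import os


def subjob_a():
    """Hypothetical subjob."""
    sum(range(1000))


def subjob_b():
    """Hypothetical subjob."""
    sorted(range(1000), reverse=True)


def run_in_children(subjobs):
    """Fork one child per subjob and return each child's exit code."""
    pids = []
    for subjob in subjobs:
        pid = os.fork()
        if pid == 0:
            # Child process: run the subjob, then exit immediately so the
            # child never continues into the parent's code path.
            exit_code = 0
            try:
                subjob()
            except BaseException:
                exit_code = 1
            finally:
                os._exit(exit_code)
        pids.append(pid)
    # Parent process: wait for every child to finish.
    exit_codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        exit_codes.append(os.WEXITSTATUS(status))
    return exit_codes


exit_codes = run_in_children([subjob_a, subjob_b])
```

Each child gets its own copy of the parent's memory, so the subjobs cannot modify each other's variables. The parent only learns each child's exit code.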
Continue reading...
Wednesday, February 20 2008
By Pierrick Le Gall on Wednesday, February 20 2008, 23:13 - Perl

Here is a sample of a positional file:
Pierrick   LE GALL    026169
Erwann     LE GALL    002080
Larry      WALL       053174
We have the firstname on 11 characters, lastname on 11 characters, age on 3 characters and size on 3 characters. We want to extract these fields into an array. I propose to use the unpack function.
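The cutting described above can be sketched with plain string slicing. This is a Python illustration, not the Perl code from the full post; in Perl, an `unpack` template such as `"A11 A11 A3 A3"` would express the same widths. The `split_positional` helper is hypothetical.

```python
# Field widths taken from the text: firstname 11, lastname 11, age 3, size 3.
WIDTHS = (11, 11, 3, 3)


def split_positional(line, widths=WIDTHS):
    """Cut a fixed-width line into stripped fields, one per width."""
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


# Build one sample line with explicit padding so the widths are unambiguous.
sample = "Pierrick".ljust(11) + "LE GALL".ljust(11) + "026" + "169"
record = split_positional(sample)
```

`record` holds the four fields in order: firstname, lastname, age and size. The numeric fields are still strings with their leading zeros.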
Continue reading...
Friday, December 21 2007
By Pierrick Le Gall on Friday, December 21 2007, 00:08 - Talend
Talend Open Studio 2.3.0M2 is out. Let me list what's new in Perl code generation, compared to the current main release 2.2.3. As you will see, Perl code generation is still in progress :-) 13 new components, 8 new features in existing components. In this blog post, I only list news about Perl code generation. There are of course more new features, which are fully listed on the official ChangeLog page under releases 2.3.0M1 and 2.3.0M2.
Continue reading...
Thursday, November 29 2007
By Pierrick Le Gall on Thursday, November 29 2007, 16:44 - Talend
3 years ago, I introduced in PhpWebGallery a very fast way to update several lines of the same table, at once. See PhpWebGallery Subversion revision 625 for details. I don't remember how this idea came to me, but I've implemented it as a component in Talend Open Studio. The purpose is to improve speed on mass updates.
The standard way to update several lines of a table, with different values for each line of course, is to perform a query for each line to update. In a web application it is a really bad thing not to know in advance the number of queries for each page. In any other situation, it's not good because it's very slow.
Continue reading...
Wednesday, November 28 2007
By Pierrick Le Gall on Wednesday, November 28 2007, 11:33 - Talend
In feature 2378, I've implemented the MySQL-specific extended insert mode. Extended insert means that instead of inserting lines one by one, you insert many lines in the same insert query. Don't confuse this with a transaction mechanism; it is not one. The advantage is speed.
To illustrate the performance improvement we'll have in Talend Open Studio 2.3.0M2 using extended inserts, I've created a benchmark: we read lines from a delimited file and we insert them into a table. 3 simple fields per line (numeric id, firstname, lastname). 1 million lines to insert.
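A multi-row insert can be sketched as follows. The sketch uses Python's built-in SQLite module instead of MySQL so that it runs without a server; the `person` table, the `extended_insert` helper and the sample rows are hypothetical, and only the 3 fields (id, firstname, lastname) come from the benchmark description.

```python
import sqlite3


def extended_insert(conn, rows):
    """Insert all rows with a single multi-row INSERT statement."""
    # One "(?, ?, ?)" group per row, all sent in the same query.
    placeholders = ", ".join(["(?, ?, ?)"] * len(rows))
    params = [value for row in rows for value in row]
    conn.execute(
        "INSERT INTO person (id, firstname, lastname) VALUES " + placeholders,
        params,
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, firstname TEXT, lastname TEXT)")
extended_insert(
    conn,
    [(1, "Pierrick", "LE GALL"), (2, "Erwann", "LE GALL"), (3, "Larry", "WALL")],
)
row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

With a million lines, the rows would be sent in batches of a fixed size rather than in one giant statement, since databases limit the size of a query or the number of bound parameters.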
Continue reading...
Friday, November 23 2007
By Pierrick Le Gall on Friday, November 23 2007, 17:51 - Talend
I've updated the first Talend Open Studio "use case", which I wrote nearly one year ago with release 1.1.0RC1. This time I use new features from Talend Open Studio 2.2.x: tUnite and tNormalize avoid the temporary file, and the "include sub directories" option in tFileList makes the job smarter.

Monday, October 22 2007
By Pierrick Le Gall on Monday, October 22 2007, 22:09 - GNU/linux

We're using Debian Etch (with GNU/Linux) as a server at the Talend office. We need to reach a remote Microsoft SQL Server database. The first step is to perform a select query from the command line.
We need to install FreeTDS: FreeTDS is a set of libraries for Unix and Linux that allows your programs to natively talk to Microsoft SQL Server and Sybase databases. We have to define an "interface" for the Microsoft SQL Server in the FreeTDS "interfaces" file. In the end, we use sqsh, a command line client for Sybase and Microsoft SQL Server.
Continue reading...
Friday, September 7 2007
By Pierrick Le Gall on Friday, September 7 2007, 16:12 - GNU/linux

A long time ago, I tried to connect to an SSH server with my private key in batch mode (from a cron task). I couldn't find a way to do it. Now I have: it is as simple as having no passphrase on your private key. This is less secure (but still much more secure than an FTP connection), yet it makes SSH possible in a cron task.
Continue reading...
Tuesday, August 21 2007
By Pierrick Le Gall on Tuesday, August 21 2007, 18:06 - Talend

Richard and I both worked for 2 weeks on a major improvement proposed by Richard.
.----------------------------------------------------.
| job        | TOS 2.1.1 | TOS 2.2.0M1 | improvement |
+------------+-----------+-------------+-------------+
| Scenario 2 |    20.8 s |      16.9 s |      18.8 % |
| Scenario 3 |    81.2 s |      30.4 s |      62.6 % |
'------------+-----------+-------------+-------------'
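The "improvement" column is consistent with the relative reduction in execution time, (old - new) / old. A short check of that arithmetic, with a hypothetical helper name:

```python
def improvement_percent(old_seconds, new_seconds):
    """Relative reduction in execution time, as a percentage."""
    return (old_seconds - new_seconds) / old_seconds * 100


scenario_2 = improvement_percent(20.8, 16.9)  # about 18.75, shown as 18.8 %
scenario_3 = improvement_percent(81.2, 30.4)  # about 62.56, shown as 62.6 %
```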
Continue reading...
Friday, July 6 2007
By Pierrick Le Gall on Friday, July 6 2007, 12:08 - Développement

As another reminder for myself, here is a list of join examples with MySQL (to compare with the Oracle behaviour in the previous blog post).
Continue reading...
By Pierrick Le Gall on Friday, July 6 2007, 11:56 - Développement

As a reminder for myself, here is a list of join examples using Oracle.
Continue reading...
Thursday, May 10 2007
By Pierrick Le Gall on Thursday, May 10 2007, 10:10 - Opensource
PEM is an open-source web application that lets project users share their own project extensions. PEM stands for Project Extension Manager.

Continue reading...
Tuesday, April 17 2007
By Pierrick Le Gall on Tuesday, April 17 2007, 11:48 - Subversion

When your Subversion repository gets bigger and bigger, you need to find a way to back up only what's new, and not the whole repository. Thanks to Subversion revisions, we can easily identify what's new since the last backup. I've used this principle to write a Perl script that makes incremental backups.
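The principle can be sketched as: remember the last revision already backed up, then dump only the revisions after it. The sketch below is in Python and is not the Perl script from the post. It only builds the `svnadmin dump` command, using its `--incremental` and `-r LOWER:UPPER` options, without running it; the helper name, repository path and revision numbers are hypothetical.

```python
def incremental_dump_command(repo_path, last_backed_up, youngest):
    """Return the svnadmin command for revisions not yet backed up, or None."""
    if youngest <= last_backed_up:
        return None  # nothing new since the last backup
    first_new = last_backed_up + 1
    return [
        "svnadmin", "dump", repo_path,
        "--incremental",
        "-r", f"{first_new}:{youngest}",
    ]


# Hypothetical values: revision 120 was the last one backed up, and the
# repository is now at revision 134.
command = incremental_dump_command("/var/svn/myrepo", 120, 134)
```

The youngest revision can be obtained with `svnlook youngest REPO_PATH`. The output of the dump command would be redirected to a file, and the last backed-up revision stored for the next run.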
Continue reading...