Pierrick Le Gall website

Wednesday, September 24 2008

PhpWebGallery becomes Piwigo

PhpWebGallery becomes Piwigo. The name of the project is changing. This is not a fork, just a rename.

What are the advantages of "Piwigo" over "PhpWebGallery"? In other words, why did we decide to change?

  • shorter: easier to remember
  • unique: a search engine query for "Piwigo" will lead you to the Piwigo project pages
  • no PHP in the name: I now find it odd to make the technology obvious in the application name
  • keeps the PWG initials

Continue reading...

Wednesday, July 16 2008

When memory matters

I need to load a huge amount of data into memory, in a Perl hash. The value associated with each key is an array of scalar values. Most of the time the key is built from a single field of the array, but it can be made of several fields. The number of fields in the array may vary a lot, but it is usually around 5 scalar values.
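A minimal sketch of the two obvious ways to store such records, assuming each record is an array reference and the key fields are given by their indexes. The records and the `make_key` helper are hypothetical, and this is not necessarily the approach of the full post. The composite key is joined with `$;`, the separator Perl itself uses for `$hash{$a, $b}`. The second variant stores each record as a single tab-joined string: it usually costs less memory than an array of 5 scalars (Devel::Size from CPAN can measure the difference), but it must be split on every read.

```perl
use strict;
use warnings;

# Hypothetical records: array refs of ~5 scalar fields each.
my @records = (
    [ 'a1', 'Pierrick', 'LE GALL', 26, 169 ],
    [ 'a2', 'Larry',    'WALL',    53, 174 ],
);

# Key built from one or several field indexes, joined with $; (the separator
# Perl itself uses to emulate $hash{$a, $b}).
sub make_key {
    my ($record, @key_indexes) = @_;
    return join $;, @{$record}[@key_indexes];
}

my (%as_array, %as_string);
for my $record (@records) {
    my $key = make_key($record, 1, 2);      # first name + last name
    # Variant 1: one array ref per entry (convenient, one scalar per field).
    $as_array{$key} = [@$record];
    # Variant 2: one tab-joined string per entry (a single scalar, split on
    # read; only safe if no field contains a tab).
    $as_string{$key} = join "\t", @$record;
}

my $key    = make_key($records[0], 1, 2);
my @fields = split /\t/, $as_string{$key}, -1;
print "$fields[1] $fields[2]: $fields[3]\n";   # prints "Pierrick LE GALL: 26"
```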

Continue reading...

Wednesday, June 18 2008

Install ecosystem components directly from Talend Open Studio

Another new feature in Talend Open Studio 2.4.0: the ability to install ecosystem components directly from your local Talend Open Studio installation. There is no need to browse the ecosystem web application to find the component that fits your needs.

Continue reading...

Friday, June 13 2008

Parallel executions on iterate links

A new feature in Talend Open Studio 2.4 is the ability to run iterations in parallel. After a tLoop or a tFileList, you can set the "number of parallel executions" on the iterate link. If you're running on a quad-core computer, it might be interesting to ask for 4 parallel executions. 4 parallel executions means that 4 iterations will run at the same time as long as iterations remain. If your tFileList finds 1,000 files, you won't get 1,000 parallel executions but 4, theoretically dividing the total execution time by 4.
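The Talend component generates this logic for you. As an illustration of the same idea, here is a minimal Perl sketch of a bounded pool, assuming each iteration can run in its own forked process and reports a small integer result through its exit code. `run_parallel` is a hypothetical helper, not Talend's generated code: it never runs more than `$max` children, and starts a new iteration each time one finishes.

```perl
use strict;
use warnings;
use POSIX ();

# Run $code->($item) for each item in a forked child, never more than $max
# children at once. Each child reports a small integer (0-255) via its exit code.
sub run_parallel {
    my ($max, $code, @items) = @_;
    my (%item_of_pid, %result);
    my $running = 0;

    for my $item (@items) {
        if ($running >= $max) {
            # Pool is full: wait for any child to finish before starting another.
            my $done = waitpid(-1, 0);
            $result{ delete $item_of_pid{$done} } = $? >> 8;
            $running--;
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: one iteration. _exit skips END blocks and does not flush
            # a second time the output the parent had buffered before fork.
            POSIX::_exit($code->($item));
        }
        $item_of_pid{$pid} = $item;
        $running++;
    }

    # No more iterations to start: wait for the remaining children.
    while ((my $done = waitpid(-1, 0)) > 0) {
        $result{ delete $item_of_pid{$done} } = $? >> 8;
    }
    return %result;
}

# Usage: 10 hypothetical "files", 4 at a time; the result is the name length.
my @files  = map { "file$_.txt" } 1 .. 10;
my %result = run_parallel(4, sub { length $_[0] }, @files);
print "$_ => $result{$_}\n" for sort keys %result;
```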

Continue reading...

Thursday, June 5 2008

TalendForge.org planet is on its orbit

planet earth

I am proud to introduce you to the TalendForge planet. A planet is an online aggregation of several blog feeds related to a common topic. Currently registered planet members are Stéphane Mallet (Java developer at Talend), Sebastiao Correia (Java developer at Talend), Olivier Carbone (Training and Support manager at Talend) and me (Community Manager, Perl developer at Talend).

The purpose of this planet is to keep you informed about Talend Open Studio and related technologies/products, from the inside. This is not the Talend corporate blog, but a place where people who make Talend Open Studio give information to Talend Open Studio users.

As far as my blog is concerned, only posts tagged "Talend" will appear on this planet. The TalendForge.org planet was put into orbit thanks to Planet Planet, a Python script that refreshes the planet every hour.

The planet earth picture comes from Wikimedia Commons.

Thursday, May 22 2008

External command piped to Talend data flow

With Talend Open Studio 2.4 and a Perl project, the tPipeRow component appears in the palette. tPipeRow sends each input row to an external command, fetches the returned line and sends it to the next component. tPipeRow does not launch the external script once per input row, but only once, when the data flow is initialized, which makes it very fast. For each line read on STDIN, the external script must write exactly one line on STDOUT, without buffering.

From a technical point of view, it's very interesting to see 2 scripts running in parallel and communicating through file descriptors. The tPipeRow code is very short (though it was not that quick to write) because it simply relies on IPC::Open2.
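A minimal sketch of the IPC::Open2 pattern, assuming the external command is a Perl one-liner that upper-cases each line (a stand-in for your real script). The command is started once; then each row is written to its STDIN and exactly one line is read back from its STDOUT. Setting `$| = 1` in the external command is what prevents it from holding lines in a buffer, which would block the exchange. This illustrates the technique; it is not the actual tPipeRow code.

```perl
use strict;
use warnings;
use IO::Handle;
use IPC::Open2 qw(open2);

# Stand-in external command (an assumption): a Perl one-liner upper-casing
# each line. $| = 1 turns off its output buffering, so every input line
# immediately produces one output line.
my @cmd = ($^X, '-ne', '$| = 1; print uc($_)');

# Start the command only once, before processing any row.
my $pid = open2(my $from_cmd, my $to_cmd, @cmd);
$to_cmd->autoflush(1);

my @out;
for my $row ('Pierrick;LE GALL', 'Larry;WALL') {
    print {$to_cmd} "$row\n";              # one row to the command's STDIN...
    my $line = <$from_cmd>;                # ...one line back from its STDOUT
    defined $line or die "external command ended early\n";
    chomp $line;
    push @out, $line;                      # would go to the next component
}

close $to_cmd;                             # EOF: the command can terminate
waitpid($pid, 0);
print "$_\n" for @out;                     # PIERRICK;LE GALL, then LARRY;WALL
```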

Continue reading...

Saturday, April 12 2008

Automated test results on talendforge.org

After many months of work, the Talend development team is proud to announce the public availability of our automated test results. You can browse them on talendforge.org. As stated in the "about" view, they should allow our development team to: detect regressions, ensure backward compatibility, and follow up on bug fixes.

Continue reading...

Monday, March 17 2008

multithreading for Perl jobs

In Talend Open Studio, the multithreading option makes it possible to execute 2 subjobs in parallel. It was implemented for Java code generation last summer for TOS 2.1 (see feature 1335), and I implemented it for Perl code generation last Monday (see feature 3302). In Perl, the multithreading option is not implemented with threads but with processes: the parent process forks child processes.
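A minimal sketch of the fork-based approach, assuming two hypothetical subjobs represented as code references that return an exit code. The parent forks one child per subjob, so both run at the same time, then waits for each child and collects its exit status. The real generated code may well differ in the details.

```perl
use strict;
use warnings;
use POSIX ();

# Two hypothetical subjobs; real ones would read, transform and write data.
# Each returns its exit code: 0 = success, 1 = simulated failure.
my %subjobs = (
    subjob_1 => sub { return 0 },
    subjob_2 => sub { return 1 },
);

# Fork one child process per subjob: both subjobs now run concurrently.
my %name_of_pid;
for my $name (sort keys %subjobs) {
    my $pid = fork();
    die "cannot fork: $!" unless defined $pid;
    if ($pid == 0) {
        POSIX::_exit($subjobs{$name}->());    # child: run the subjob and exit
    }
    $name_of_pid{$pid} = $name;               # parent: remember who runs what
}

# The parent waits for every child and collects its exit code.
my %exit_code;
for (1 .. scalar keys %name_of_pid) {
    my $pid = waitpid(-1, 0);
    $exit_code{ $name_of_pid{$pid} } = $? >> 8;
}
print "$_ finished with exit code $exit_code{$_}\n" for sort keys %exit_code;
```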

Continue reading...

Wednesday, February 20 2008

Extract fields from a positional file with Perl

Here is a sample of a positional file:

Pierrick   LE GALL    026169
Erwann     LE GALL    002080
Larry      WALL       053174

The first name takes 11 characters, the last name 11 characters, the age 3 characters and the size 3 characters. We want to extract these fields into an array. I propose using the unpack function.
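With the widths above, the unpack template is `A11 A11 A3 A3`. `A` reads a space-padded text field and strips its trailing spaces, so the padding disappears while the space inside "LE GALL" is kept. A minimal sketch, assuming the lines are already in memory; age and size stay strings such as "026", which Perl converts to numbers when used in arithmetic.

```perl
use strict;
use warnings;

# Widths from the sample: first name 11, last name 11, age 3, size 3.
# "A" = space-padded text field; unpack strips the trailing spaces.
my $template = 'A11 A11 A3 A3';

my @lines = (
    'Pierrick   LE GALL    026169',
    'Erwann     LE GALL    002080',
    'Larry      WALL       053174',
);

my @rows;
for my $line (@lines) {
    my ($firstname, $lastname, $age, $size) = unpack $template, $line;
    push @rows, [ $firstname, $lastname, $age, $size ];
}

print join('|', @$_), "\n" for @rows;   # first line: Pierrick|LE GALL|026|169
```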

Continue reading...

Friday, December 21 2007

Talend Open Studio 2.3.0M2 is out

Talend Open Studio 2.3.0M2 is out. Here is what's new for Perl code generation compared to the current main release, 2.2.3. As you will see, Perl code generation is still in progress :-) 13 new components and 8 new features in existing components. This post only covers Perl code generation; there are of course more new features, fully listed on the official ChangeLog page for releases 2.3.0M1 and 2.3.0M2.

Continue reading...

Thursday, November 29 2007

MySQL bulk update with Talend Open Studio

Three years ago, I introduced in PhpWebGallery a very fast way to update several rows of the same table at once (see PhpWebGallery Subversion revision 625 for details). I don't remember how the idea came to me, but I have now implemented it as a component in Talend Open Studio. The goal is to speed up mass updates.

The standard way to update several rows of a table, each with different values of course, is to run one query per row. In a web application, it is really bad not to know in advance how many queries a page will run. In any other situation, it is bad simply because it is very slow.
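For illustration, here is one well-known way to replace one query per row by a fixed number of queries, sketched in Perl. It is an assumption, not necessarily what revision 625 or the Talend component does: all new values go into a temporary table with a single multi-row INSERT, then one multi-table UPDATE joins the target table to it. The table and column names are hypothetical, and `sql_quote` is a simplified stand-in for DBI's `$dbh->quote`.

```perl
use strict;
use warnings;

# Simplified quoting for the sketch; real code should use DBI's $dbh->quote.
sub sql_quote {
    my ($value) = @_;
    return 'NULL' unless defined $value;
    (my $escaped = $value) =~ s/(['\\])/\\$1/g;
    return "'$escaped'";
}

# Queries updating many rows at once: new values go into a temporary table
# (one multi-row INSERT), then one UPDATE joins the target table to it.
# $table, $key and $columns are hypothetical names supplied by the caller.
sub bulk_update_queries {
    my ($table, $key, $columns, $rows) = @_;
    my $tmp    = "${table}_tmp";
    my @fields = ($key, @$columns);

    my @tuples = map {
        my $row = $_;
        '(' . join(', ', map { sql_quote($row->{$_}) } @fields) . ')';
    } @$rows;

    my $set = join ', ', map { "t.$_ = tmp.$_" } @$columns;

    return (
        "CREATE TEMPORARY TABLE $tmp (PRIMARY KEY ($key)) SELECT "
            . join(', ', @fields) . " FROM $table WHERE 1 = 0",
        "INSERT INTO $tmp (" . join(', ', @fields) . ') VALUES ' . join(', ', @tuples),
        "UPDATE $table AS t INNER JOIN $tmp AS tmp ON t.$key = tmp.$key SET $set",
        "DROP TEMPORARY TABLE $tmp",
    );
}

# Usage: 4 queries, whatever the number of rows (as long as the INSERT
# stays below the server's max_allowed_packet).
my @queries = bulk_update_queries(
    'images', 'id', [ 'name', 'hit' ],
    [ { id => 1, name => "Pierrick's photo", hit => 3 },
      { id => 2, name => 'Larry',            hit => 7 } ],
);
print "$_;\n" for @queries;
```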

Continue reading...

Wednesday, November 28 2007

MySQL extended insert mode in Talend Open Studio

In feature 2378, I implemented the MySQL-specific extended insert mode. Extended insert means that instead of inserting rows one by one, you insert many rows in a single INSERT query. Don't confuse it with a transaction mechanism; it is not one. The advantage is speed.

To illustrate the performance improvement that extended inserts bring in Talend Open Studio 2.3.0M2, I created a benchmark: we read lines from a delimited file and insert them into a table, with 3 simple fields per line (numeric id, first name, last name) and 1 million lines to insert.
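A minimal sketch of what extended inserts look like, assuming the rows are already in memory as array references; the table, columns and `extended_inserts` helper are hypothetical. Rows are grouped into batches so that each INSERT stays below MySQL's `max_allowed_packet`. The batch size of 2 in the usage lines only keeps the output short; a real job would use hundreds or thousands of rows per query.

```perl
use strict;
use warnings;

# Integers are emitted as-is, everything else single-quoted and escaped
# (simplified; real code should use DBI's $dbh->quote).
sub quote_value {
    my ($v) = @_;
    return $v if $v =~ /^-?\d+\z/;
    (my $s = $v) =~ s/(['\\])/\\$1/g;
    return "'$s'";
}

# Group rows into multi-row INSERT statements of at most $batch_size rows.
# The right batch size depends on the server's max_allowed_packet setting.
sub extended_inserts {
    my ($table, $columns, $rows, $batch_size) = @_;
    my $head = "INSERT INTO $table (" . join(', ', @$columns) . ') VALUES ';
    my @queries;
    for (my $i = 0; $i < @$rows; $i += $batch_size) {
        my $last = $i + $batch_size - 1;
        $last = $#$rows if $last > $#$rows;
        my @tuples = map {
            my $row = $_;
            '(' . join(', ', map { quote_value($_) } @$row) . ')';
        } @{$rows}[ $i .. $last ];
        push @queries, $head . join(', ', @tuples);
    }
    return @queries;
}

# Usage: 5 hypothetical rows, 2 rows per query: 3 INSERT queries instead of 5.
my @rows = map { [ $_, "first$_", "last$_" ] } 1 .. 5;
my @queries = extended_inserts('person', [qw(id firstname lastname)], \@rows, 2);
print "$_;\n" for @queries;
```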

Continue reading...

Friday, November 23 2007

New whitelist generator with TOS 2.3.0M1

I've updated the first Talend Open Studio "use case" I wrote nearly one year ago with release 1.1.0RC1. This time I use new features from Talend Open Studio 2.2.x: tUnite and tNormalize remove the need for a temporary file, and the "include sub directories" option of tFileList makes the job smarter.

whitelist generator with TOS, version 2

Monday, October 22 2007

Debian Linux as a Microsoft SQL Server client

We're using Debian Etch (GNU/Linux) as a server at the Talend office, and we need to reach a remote Microsoft SQL Server database. The first step is to run a select query from the command line.

We need to install FreeTDS: "FreeTDS is a set of libraries for Unix and Linux that allows your programs to natively talk to Microsoft SQL Server and Sybase databases." We have to define an "interface" for the Microsoft SQL Server in the FreeTDS "interfaces" file. Finally, we use sqsh, a command-line client for Sybase and Microsoft SQL Server.

Continue reading...

Friday, September 7 2007

SSH, key authentication and batch mode

A long time ago, I tried to connect to an SSH server with my private key in batch mode (from a cron task), and I couldn't find a way to do it. Now I have: it is as simple as having no passphrase on your private key. It is less secure (but still much more secure than an FTP connection), but it makes SSH possible in a cron task.

Continue reading...

Tuesday, August 21 2007

Talend 2.2.0M1 and Perl code performance

Richard and I have both spent 2 weeks on a major improvement proposed by Richard.

.----------------------------------------------------.
| job        | TOS 2.1.1 | TOS 2.2.0M1 | improvement |
+------------+-----------+-------------+-------------+
| Scenario 2 |    20.8 s |      16.9 s |      18.8 % |
| Scenario 3 |    81.2 s |      30.4 s |      62.6 % |
'------------+-----------+-------------+-------------'

Continue reading...

Friday, July 6 2007

MySQL joins

As another reminder for myself, here is a list of join examples with MySQL (to compare with Oracle's behaviour in the previous blog post).

Continue reading...

Oracle joins

As a reminder for myself, here is a list of join examples using Oracle.

Continue reading...

Thursday, May 10 2007

PEM

PEM is an open source web application that lets project users share their own project extensions. PEM stands for Project Extension Manager.

PEM in action

Continue reading...

Tuesday, April 17 2007

Subversion incremental backup

Subversion log

When your Subversion repository gets bigger and bigger, you need a way to back up only what's new rather than the whole repository. Thanks to Subversion revisions, we can easily identify what's new since the last backup. I used this principle to write a Perl script that makes incremental backups.
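A minimal sketch of the principle, assuming the caller stores the last backed-up revision somewhere (a hypothetical state file, for instance) and that `svnlook` and `svnadmin` are in the PATH. Only revisions newer than the last backup are dumped, with `svnadmin dump -r FROM:TO --incremental`; the dump file naming is an assumption, and this is not the script from the full post. Restoring means creating an empty repository and loading every dump file in order with `svnadmin load`.

```perl
use strict;
use warnings;

# Revisions to dump: everything after the last backed-up one.
# $last_backed_up is undef when no backup exists yet (start at revision 0).
sub incremental_range {
    my ($last_backed_up, $youngest) = @_;
    my $from = defined $last_backed_up ? $last_backed_up + 1 : 0;
    return if $from > $youngest;                 # nothing new since last backup
    return ($from, $youngest);
}

# svnadmin command dumping only the given revision range.
sub dump_command {
    my ($repos, $from, $to) = @_;
    return ('svnadmin', 'dump', $repos, '-r', "$from:$to", '--incremental');
}

# Current head revision of the repository, via `svnlook youngest`.
sub youngest_revision {
    my ($repos) = @_;
    open(my $fh, '-|', 'svnlook', 'youngest', $repos) or die "svnlook: $!";
    chomp(my $rev = <$fh>);
    close $fh or die "svnlook youngest failed\n";
    return $rev;
}

# Dump new revisions into $dir; returns the new last backed-up revision.
sub backup {
    my ($repos, $last_backed_up, $dir) = @_;
    my ($from, $to) = incremental_range($last_backed_up, youngest_revision($repos))
        or return $last_backed_up;               # nothing to do
    my $file = "$dir/backup_$from-$to.dump";     # hypothetical naming scheme
    open(my $out, '>', $file) or die "$file: $!";
    open(my $dump, '-|', dump_command($repos, $from, $to)) or die "svnadmin: $!";
    binmode $out;
    binmode $dump;
    local $/ = \65536;                           # copy in 64 KiB binary chunks
    while (defined(my $chunk = <$dump>)) {
        print {$out} $chunk;
    }
    close $dump or die "svnadmin dump failed\n";
    close $out or die "$file: $!";
    return $to;
}
```

Calling `backup('/path/to/repos', $last, '/path/to/backups')` from a cron task, then saving the returned revision, gives one dump file per run containing only the new revisions.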

Continue reading...
