<?xml version="1.0" encoding="UTF-8" ?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  xmlns:admin="http://webns.net/mvcb/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns="http://purl.org/rss/1.0/">

<channel rdf:about="http://le-gall.net/pierrick/blog/index.php/">
  <title>Blog de Pierrick Le Gall</title>
  <description><![CDATA[]]></description>
  <link>http://le-gall.net/pierrick/blog/index.php/</link>
  <dc:language>fr</dc:language>
  <dc:creator></dc:creator>
  <dc:rights></dc:rights>
  <dc:date>2008-02-21T00:59:39+01:00</dc:date>
  <admin:generatorAgent rdf:resource="http://www.dotclear.net/" />
  
  <sy:updatePeriod>daily</sy:updatePeriod>
  <sy:updateFrequency>1</sy:updateFrequency>
  <sy:updateBase>2008-02-21T00:59:39+01:00</sy:updateBase>
  
  <items>
  <rdf:Seq>
    <rdf:li rdf:resource="http://le-gall.net/pierrick/blog/index.php/2008/02/21/111-ce-blog-a-demenage" />
  <rdf:li rdf:resource="http://le-gall.net/pierrick/blog/index.php/2008/02/21/110-this-blog-has-moved" />
  <rdf:li rdf:resource="http://le-gall.net/pierrick/blog/index.php/2007/12/21/109-talend-open-studio-230m2-is-out" />
  <rdf:li rdf:resource="http://le-gall.net/pierrick/blog/index.php/2007/11/29/108-mysql-bulk-update-with-talend-open-studio" />
  <rdf:li rdf:resource="http://le-gall.net/pierrick/blog/index.php/2007/11/28/107-mysql-extended-insert-mode-in-talend-open-studio" />
  <rdf:li rdf:resource="http://le-gall.net/pierrick/blog/index.php/2007/11/23/106-new-whitelist-generator-with-tos-230m1" />
  <rdf:li rdf:resource="http://le-gall.net/pierrick/blog/index.php/2007/11/12/105-talend-open-studio-aux-journees-perl-2007" />
  <rdf:li rdf:resource="http://le-gall.net/pierrick/blog/index.php/2007/10/22/104-debian-linux-as-a-microsoft-sql-server-client" />
  <rdf:li rdf:resource="http://le-gall.net/pierrick/blog/index.php/2007/09/07/103-ssh-key-authentication-and-batch-mode" />
  <rdf:li rdf:resource="http://le-gall.net/pierrick/blog/index.php/2007/08/21/102-talend-220m1-and-perl-code-performances" />
  </rdf:Seq>
  </items>
</channel>

<item rdf:about="http://le-gall.net/pierrick/blog/index.php/2008/02/21/111-ce-blog-a-demenage">
  <title>This blog has moved</title>
  <link>http://le-gall.net/pierrick/blog/index.php/2008/02/21/111-ce-blog-a-demenage</link>
  <dc:date>2008-02-21T00:59:39+01:00</dc:date>
  <dc:language>en</dc:language>
  <dc:creator>Pierrick</dc:creator>
  <dc:subject>General</dc:subject>
  <description>This blog has moved: new blog (French only)


I am dividing my website into 2 sections: English and French. I no longer want to read French and English on the same page. I think it is confusing for all readers.


The antispam system...</description>
  <content:encoded><![CDATA[ <p>This blog has moved: <a href="http://le-gall.net/pierrick/fr/blog" hreflang="fr">new blog</a> (French only)</p>


<p>I am dividing my website into 2 sections: English and French. I no longer want to read French and English on the same page. I think it is confusing for all readers.</p>


<p>The antispam system of the new blog is more advanced, so comments are open there. I hope you will react to my posts.</p>]]></content:encoded>
</item>
<item rdf:about="http://le-gall.net/pierrick/blog/index.php/2008/02/21/110-this-blog-has-moved">
  <title>This blog has moved</title>
  <link>http://le-gall.net/pierrick/blog/index.php/2008/02/21/110-this-blog-has-moved</link>
  <dc:date>2008-02-21T00:21:50+01:00</dc:date>
  <dc:language>en</dc:language>
  <dc:creator>Pierrick</dc:creator>
  <dc:subject>General</dc:subject>
  <description>This blog has moved to a new location: new blog (English only)


I'm dividing my website into 2 sections: English and French. I no longer want to read French and English on the same page. I think it's confusing for both groups of readers.


The new blog antispam system is more advanced, so comments are...</description>
  <content:encoded><![CDATA[ <p>This blog has moved to a new location: <a href="http://le-gall.net/pierrick/en/blog" hreflang="en">new blog</a> (English only)</p>


<p>I'm dividing my website into 2 sections: English and French. I no longer want to read French and English on the same page. I think it's confusing for both groups of readers.</p>


<p>The new blog antispam system is more advanced, so comments are open. I hope you'll react to my posts.</p>]]></content:encoded>
</item>
<item rdf:about="http://le-gall.net/pierrick/blog/index.php/2007/12/21/109-talend-open-studio-230m2-is-out">
  <title>Talend Open Studio 2.3.0M2 is out</title>
  <link>http://le-gall.net/pierrick/blog/index.php/2007/12/21/109-talend-open-studio-230m2-is-out</link>
  <dc:date>2007-12-21T00:08:07+01:00</dc:date>
  <dc:language>en</dc:language>
  <dc:creator>Pierrick</dc:creator>
  <dc:subject>Talend</dc:subject>
  <description> Talend Open Studio 2.3.0M2 is out. Let me list what's new concerning Perl generation, compared to the current main release 2.2.3. As you will see, Perl code generation is still in progress :-) 13 new components, 8 new features in existing components. In this blog post, I only list news about Perl code generation; there are of course more new features, which are fully listed on the official ChangeLog page under releases 2.3.0M1 and 2.3.0M2.</description>
  <content:encoded><![CDATA[<p><img src="/pierrick/blog/images/post_109/tos.png" alt="TOS logo" style="float:left; margin: 0 1em 1em 0;" /> Talend Open Studio 2.3.0M2 is out. Let me list what's new concerning Perl generation, compared to the current <em>main</em> release 2.2.3. As you will see, Perl code generation is still in progress :-) 13 new components, 8 new features in existing components. In this blog post, I only list news about Perl code generation; there are of course more new features, which are fully listed on the <a href="http://talendforge.org/bugs/changelog_page.php" hreflang="en">official ChangeLog page</a> under releases 2.3.0M1 and 2.3.0M2.</p> <ul>
<li><a href="http://talendforge.org/bugs/view.php?id=2299" hreflang="en">Feature 2299</a> new component tPivotOutputDelimited fills a schema dynamically created based on input data content.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=1675" hreflang="en">Feature 1675</a> tFileInputPositionnal component can optionally trim (left and right) all columns by itself, so there is no need for a dedicated tPerlRow for this task.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=491" hreflang="en">Feature 491</a> new component tPostgresqlOutputBulkExec (tPOBE) performs fast inserts in a PostgreSQL database. Contrary to tMysqlOutputBulkExec, tPOBE does not write a temporary file before loading it into the database. <a href="http://talendforge.org/bugs/view.php?id=2582" hreflang="en">Feature 2582</a> adds components tPostgresqlOutputBulk (tPOB) and tPostgresqlBulkExec (tPBE) with the same behaviour as their MySQL equivalents: tPOB writes a delimited file prepared to be loaded into a PostgreSQL table, tPBE loads a file into a table.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=2243" hreflang="en">Feature 2243</a> new component tSchemaComplianceCheck can be used anywhere in your data flow. If the row does not match the constraints specified in the schema, the row goes to the reject output.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=2274" hreflang="en">Feature 2274</a> tFileInputCSV merged into tFileInputDelimited and <a href="http://talendforge.org/bugs/view.php?id=1804" hreflang="en">Feature 1804</a> tFileOutputCSV merged into tFileOutputDelimited. We have a single tFile*Delimited with extra options related to the more complex CSV format.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=2242" hreflang="en">Feature 2242</a> new component tReplaceList lets the user define a list of search &amp; replace words in an external data source. For example, you can ask the communication team to maintain the search and replace list of bad words in a database.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=2378" hreflang="en">Feature 2378</a> MySQL specific extended insert mode added to tMysqlOutput, as described in a recent blog post <a href="http://le-gall.net/pierrick/blog/index.php/2007/11/28/107" hreflang="en">MySQL extended insert mode in Talend Open Studio</a>. Performance is greatly improved.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=2300" hreflang="en">Feature 2300</a> table actions (create, clear, truncate, etc.) are available in PostgreSQL bulk components, same thing for MySQL with <a href="http://talendforge.org/bugs/view.php?id=2499" hreflang="en">Feature 2499</a>.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=2515" hreflang="en">Feature 2515</a> new component tSocketInput and <a href="http://talendforge.org/bugs/view.php?id=2438" hreflang="en">Feature 2438</a> new component tWaitForSocket.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=2552" hreflang="en">Feature 2552</a> component tSVNLogInput integrated <a href="http://talendforge.org/ext/extension_view.php?eid=4" hreflang="en">from ecosystem</a>.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=509" hreflang="en">Feature 509</a> new component tMysqlUpdateBulkExec described in a recent blog post <a href="http://le-gall.net/pierrick/blog/index.php/2007/11/29/108" hreflang="en">MySQL bulk update with Talend Open Studio</a>; performance is also improved. <a href="http://talendforge.org/bugs/view.php?id=2567" hreflang="en">Feature 2567</a> is the equivalent for PostgreSQL.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=2063" hreflang="en">Feature 2063</a> in tMysqlOutput, the user can select update keys independently from the schema. Same for <em>"updatable"</em> and <em>"insertable"</em> columns. This feature was a bit complicated to code and I am waiting for user feedback before adding it to other database output components.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=2539" hreflang="en">Feature 2539</a> new component tJoin. tJoin is a simplified tMap: it only does a join (inner or outer) between 2 data sets. In my opinion, the advantages of tJoin compared to tMap are: simpler code (fewer bugs) and a graphically more obvious job (a tMap can do so many things that you have to open it to remember what's done inside). The main drawback of tJoin compared to tMap is that it has a standard GUI, with no drag &amp; drop or beautiful arrows.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=2032" hreflang="en">Feature 2032</a> new component tSystemInput can produce a data flow based on a system command output. For example, you can perform a Unix <em>find</em>, filter the result and write the file listing to a database (to perform a file listing, you had better use tFileList; this is just an example of a well-known Unix command producing line-by-line output).</li>
<li><a href="http://talendforge.org/bugs/view.php?id=964" hreflang="en">Feature 964</a> a parent job doesn't follow the OnError link if the child job has already caught the error in its own OnError link. This feature might be very important for users with deep job call hierarchies.</li>
<li><a href="http://talendforge.org/bugs/view.php?id=2110" hreflang="en">Feature 2110</a> tFileList can match files and/or directories.</li>
</ul>]]></content:encoded>
</item>
<item rdf:about="http://le-gall.net/pierrick/blog/index.php/2007/11/29/108-mysql-bulk-update-with-talend-open-studio">
  <title>MySQL bulk update with Talend Open Studio</title>
  <link>http://le-gall.net/pierrick/blog/index.php/2007/11/29/108-mysql-bulk-update-with-talend-open-studio</link>
  <dc:date>2007-11-29T16:44:32+01:00</dc:date>
  <dc:language>en</dc:language>
  <dc:creator>Pierrick</dc:creator>
  <dc:subject>Talend</dc:subject>
  <description>  3 years ago, I introduced in PhpWebGallery a very fast way to update several lines of the same table, at once. See PhpWebGallery Subversion revision 625 for details. I don't remember how this idea came to me, but I've implemented it as a component in Talend Open Studio. The purpose is to improve speed on mass updates.


The standard way to update several lines of a table, with different values for each line of course, is to perform a query for each line to update. In a web application it is a really bad thing not to know in advance the number of queries for each page. In any other situation, it's not good because it's very slow.</description>
  <content:encoded><![CDATA[<p><img src="/pierrick/blog/images/post_108/logo-talend-fast.png" alt="" style="float:right; margin: 0 0 1em 1em;" /> <img src="/pierrick/blog/images/post_108/mysql.gif" alt="" style="float:right; margin: 0 0 1em 1em;" /> 3 years ago, I introduced in PhpWebGallery a very fast way to update several lines of the same table, at once. See <a href="http://svn.gna.org/viewcvs/phpwebgallery?rev=625&amp;view=rev" hreflang="en">PhpWebGallery Subversion revision 625</a> for details. I don't remember how this idea came to me, but I've implemented it as a component in Talend Open Studio. The purpose is to improve speed on mass updates.</p>


<p>The standard way to update several lines of a table, with different values for each line of course, is to perform a query for each line to update. In a web application it is a really bad thing not to know in advance the number of queries for each page. In any other situation, it's not good because it's very slow.</p> <p>The idea is the following:</p>
<ol>
<li>create a temporary table, copied from the table to update</li>
<li>load data into the temporary table (use LOAD DATA INFILE if you can, or at least MySQL specific extended inserts)</li>
<li>perform a single update query joining the temporary table with the table to update</li>
<li>drop the temporary table</li>
</ol>

<p>For example:</p>

<pre>[mysql]
mysql&gt; create table users (
mysql&gt;   id int not null,
mysql&gt;   firstname varchar(10),
mysql&gt;   lastname varchar(10),
mysql&gt;   age int,
mysql&gt;   weight int,
mysql&gt;   primary key (id)
mysql&gt; );

mysql&gt; insert into users
mysql&gt;   (id, firstname, lastname, age, weight)
mysql&gt;   VALUES
mysql&gt;   (1, 'pierrick', 'le gall', 26, 70),
mysql&gt;   (2, 'stephane', 'mallet', 32, 75),
mysql&gt;   (3, 'erwann', 'le gall', 2, 10)
mysql&gt; ;

mysql&gt; select * from users;
+----+-----------+----------+------+--------+
| id | firstname | lastname | age  | weight |
+----+-----------+----------+------+--------+
|  1 | pierrick  | le gall  |   26 |     70 | 
|  2 | stephane  | mallet   |   32 |     75 | 
|  3 | erwann    | le gall  |    2 |     10 | 
+----+-----------+----------+------+--------+

mysql&gt; create table users_update_data (
mysql&gt;   id int,
mysql&gt;   age int,
mysql&gt;   weight int
mysql&gt; );

mysql&gt; insert into users_update_data
mysql&gt;   (id, age, weight)
mysql&gt;   VALUES
mysql&gt;   (1, 27, 65),
mysql&gt;   (3, 3, 12)
mysql&gt; ;

mysql&gt; UPDATE
mysql&gt;     users AS t1,
mysql&gt;     users_update_data AS t2
mysql&gt;   SET
mysql&gt;     t1.age = t2.age,
mysql&gt;     t1.weight = t2.weight
mysql&gt;   WHERE t1.id = t2.id
mysql&gt; ;

mysql&gt; SELECT * FROM users;
+----+-----------+----------+------+--------+
| id | firstname | lastname | age  | weight |
+----+-----------+----------+------+--------+
|  1 | pierrick  | le gall  |   27 |     65 | 
|  2 | stephane  | mallet   |   32 |     75 | 
|  3 | erwann    | le gall  |    3 |     12 | 
+----+-----------+----------+------+--------+
</pre>


<p>Of course, for such a small data set, the improvement is negligible. But imagine you have 1M users and 100K lines to update.</p>


<p>So I've implemented this algorithm in Talend Open Studio with tMysql(Output)UpdateBulkExec. As it's a bulk operation, we create a file from the input data flow, load it into the temporary table and perform the single update query. To illustrate the improvement with a benchmark, I filled a table with 1M (one million) lines and updated 100K lines. The bulk update is 6.5 times faster than the standard update: the job execution time goes from 13.0 seconds to 2.0 seconds.</p>


<p>As described in <a href="http://talendforge.org/bugs/view.php?id=509" hreflang="en">Feature 509</a>, this new component will be available in Perl projects with Talend Open Studio 2.3.0M2.</p>]]></content:encoded>
</item>
<item rdf:about="http://le-gall.net/pierrick/blog/index.php/2007/11/28/107-mysql-extended-insert-mode-in-talend-open-studio">
  <title>MySQL extended insert mode in Talend Open Studio</title>
  <link>http://le-gall.net/pierrick/blog/index.php/2007/11/28/107-mysql-extended-insert-mode-in-talend-open-studio</link>
  <dc:date>2007-11-28T11:33:32+01:00</dc:date>
  <dc:language>en</dc:language>
  <dc:creator>Pierrick</dc:creator>
  <dc:subject>Talend</dc:subject>
  <description> In feature 2378, I've implemented MySQL specific extended insert mode. Extended insert means that instead of inserting lines one by one, you insert many lines in the same insert query. Don't confuse this with a transaction mechanism; it is not one. The advantage is speed.


To illustrate the performance improvement we'll have in Talend Open Studio 2.3.0M2 using extended inserts, I've created a benchmark: we read lines from a delimited file and we insert them into a table. 3 simple fields per line (numeric id, firstname, lastname). 1 million lines to insert.</description>
  <content:encoded><![CDATA[<p><img src="/pierrick/blog/images/post_107/post_107-02.png" alt="" style="float:left; margin: 0 1em 1em 0;" /> In <a href="http://talendforge.org/bugs/view.php?id=2378" hreflang="en">feature 2378</a>, I've implemented MySQL specific extended insert mode. Extended insert means that instead of inserting lines one by one, you insert many lines in the same insert query. Don't confuse this with a transaction mechanism; it is not one. The advantage is speed.</p>
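

<p>As a minimal sketch of what an extended insert looks like, the following Perl snippet builds a single multi-row INSERT statement for a batch of rows. The helper name build_extended_insert and the table name users are hypothetical, chosen for illustration only; this is not the code generated by Talend Open Studio.</p>

<pre>[perl]
use strict;
use warnings;

# Hypothetical helper: build one multi-row INSERT for a batch of rows.
# $columns is an array ref of column names, $rows an array ref of array refs.
sub build_extended_insert {
    my ($table, $columns, $rows) = @_;

    # One "(?, ?, ?)" group per row, with one placeholder per column.
    my $tuple = '(' . join(', ', ('?') x @$columns) . ')';
    my $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        join(', ', @$columns),
        join(', ', ($tuple) x @$rows),
    );

    # Flatten the rows so the values line up with the placeholders.
    my @bind = map { @$_ } @$rows;
    return ($sql, @bind);
}

my @rows = (
    [1, 'pierrick', 'le gall'],
    [2, 'stephane', 'mallet'],
    [3, 'erwann',   'le gall'],
);

my ($sql, @bind) = build_extended_insert('users', [qw(id firstname lastname)], \@rows);
print "$sql\n";
# prints: INSERT INTO users (id, firstname, lastname) VALUES (?, ?, ?), (?, ?, ?), (?, ?, ?)
</pre>

<p>With DBI, the statement and its values could then be sent as a single query with $dbh-&gt;do($sql, undef, @bind).</p>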


<p>To illustrate the performance improvement we'll have in Talend Open Studio 2.3.0M2 using extended inserts, I've created a benchmark: we read lines from a delimited file and we insert them into a table. 3 simple fields per line (numeric id, firstname, lastname). 1 million lines to insert.</p> <p><img src="/pierrick/blog/images/post_107/post_107-01.png" alt="" /> <img src="/pierrick/blog/images/post_107/post_107-03.png" alt="" /></p>


<pre>rows per insert  execution time (s)  improvement  N times faster
              1               102.7          N/A             1.0
             10                28.6       72.1 %             3.6
            100                15.6       84.8 %             6.6
            500                14.1       86.3 %             7.3
           1000                14.3       86.1 %             7.2
           5000                13.9       86.5 %             7.4
          10000                14.7       85.7 %             7.0</pre>



<p>As you can see, the performance improvement is huge. When inserting lines in blocks of 1000, the whole job is more than 7 times faster. We also see that too high a value decreases performance, because the higher the value, the more memory is used.</p>


<p>Warning: don't use a very high value for this new parameter because MySQL limits the size of the query in max_allowed_packet, which is 16M by default on my MySQL 5.0.45.</p>
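

<p>One simple way to stay under such a limit is to cap the number of rows sent per statement. The sketch below splits a list of rows into batches, assuming the rows are small enough that a batch of the chosen size fits in max_allowed_packet. The helper batch_rows and the sample data are hypothetical, used for illustration only.</p>

<pre>[perl]
use strict;
use warnings;

# Hypothetical helper: split rows into batches of at most $rows_per_insert rows.
sub batch_rows {
    my ($rows, $rows_per_insert) = @_;
    my @pending = @$rows;    # shallow copy, so the caller's list is left untouched
    my @batches;
    while (@pending) {
        # splice removes and returns up to $rows_per_insert rows from the front
        push @batches, [ splice(@pending, 0, $rows_per_insert) ];
    }
    return @batches;
}

# Made-up sample data: 2500 rows of (id, firstname, lastname).
my @rows = map { [ $_, "first$_", "last$_" ] } 1 .. 2500;
my @batches = batch_rows(\@rows, 1000);

printf "%d batches, last one has %d rows\n", scalar(@batches), scalar(@{ $batches[-1] });
# prints: 3 batches, last one has 500 rows
</pre>

<p>Each batch would then be sent as one extended insert query.</p>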


<p>As a comparison, the bulk insert is still a lot faster. The same operation is done in 6.5 seconds.</p>]]></content:encoded>
</item>
<item rdf:about="http://le-gall.net/pierrick/blog/index.php/2007/11/23/106-new-whitelist-generator-with-tos-230m1">
  <title>New whitelist generator with TOS 2.3.0M1</title>
  <link>http://le-gall.net/pierrick/blog/index.php/2007/11/23/106-new-whitelist-generator-with-tos-230m1</link>
  <dc:date>2007-11-23T17:51:10+01:00</dc:date>
  <dc:language>en</dc:language>
  <dc:creator>Pierrick</dc:creator>
  <dc:subject>Talend</dc:subject>
  <description>I've updated the first Talend Open Studio "use case" I wrote nearly one year ago with release 1.1.0RC1. This time I use new features from Talend Open Studio 2.2.x: tUnite and tNormalize avoid the temporary file, and the "include sub directories" option in tFileList makes the job smarter.





Full...</description>
  <content:encoded><![CDATA[ <p>I've updated the first Talend Open Studio "use case" I wrote nearly one year ago with release 1.1.0RC1. This time I use new features from Talend Open Studio 2.2.x: tUnite and tNormalize avoid the temporary file, and the "include sub directories" option in tFileList makes the job smarter.</p>


<p><img src="/pierrick/blog/images/post_106/use_case_01_v2-01.png" alt="whitelist generator with TOS, version 2" style="display:block; margin:0 auto;" /></p>

<ul>
<li><a href="http://talendforge.org/wiki/doku.php?id=use_case:1" hreflang="en">Full description of the use case, in talendforge.org wiki</a></li>
<li><a href="http://le-gall.net/pierrick/blog/index.php/2007/01/03/92-whitelist-generator-with-talend-open-studio" hreflang="en">First use case, one year ago</a></li>
</ul>]]></content:encoded>
</item>
<item rdf:about="http://le-gall.net/pierrick/blog/index.php/2007/11/12/105-talend-open-studio-aux-journees-perl-2007">
  <title>Talend Open Studio at the Journées Perl 2007</title>
  <link>http://le-gall.net/pierrick/blog/index.php/2007/11/12/105-talend-open-studio-aux-journees-perl-2007</link>
  <dc:date>2007-11-12T09:42:42+01:00</dc:date>
  <dc:language>en</dc:language>
  <dc:creator>Pierrick</dc:creator>
  <dc:subject>Talend</dc:subject>
  <description>The Journées Perl 2007 will take place in Lyon, on the campus where I studied engineering. If the SNCF cooperates, I will be there to listen to the other speakers and also to co-present a talk with Richard, who is also a Perl developer at Talend.


In front of an...</description>
  <content:encoded><![CDATA[ <p><img src="/pierrick/blog/images/post_105/fpw2007-logo.jpg" alt="Journées Perl 2007 logo" style="float:right; margin: 0 0 1em 1em;" /></p>


<p>The Journées Perl 2007 will take place in Lyon, on the campus where I studied engineering. If the SNCF cooperates, I will be there to listen to the other speakers and also to co-present a talk with Richard, who is also a Perl developer at Talend.</p>


<p>In front of an audience of more or less experienced Perl developers, we will try to demonstrate that in some cases, using a code generator is more advantageous than coding the script directly. Our goal is not to say that TOS should replace all custom Perl development, but to show the value of designing in 15 minutes a script that would take several days to code by hand.</p>


<p>For this demonstration, our job will include XML reading, aggregation, writing to a database, and a few other small surprises.</p>


<p>I should add that, to draw the crowds, we will be giving away an 8GB iPod Nano in a prize draw. That makes at least one good reason to come :-)</p>

<ul>
<li><a href="http://conferences.mongueurs.net/fpw2007/" hreflang="fr">Journées Perl 2007 website</a></li>
</ul>]]></content:encoded>
</item>
<item rdf:about="http://le-gall.net/pierrick/blog/index.php/2007/10/22/104-debian-linux-as-a-microsoft-sql-server-client">
  <title>Debian Linux as a Microsoft SQL Server client</title>
  <link>http://le-gall.net/pierrick/blog/index.php/2007/10/22/104-debian-linux-as-a-microsoft-sql-server-client</link>
  <dc:date>2007-10-22T22:09:40+02:00</dc:date>
  <dc:language>en</dc:language>
  <dc:creator>Pierrick</dc:creator>
  <dc:subject>GNU/linux</dc:subject>
  <description> 


We're using Debian Etch (with GNU/Linux) as a server at the Talend office. We need to reach a remote Microsoft SQL Server database. The first step is to perform a select query from the command line.


We need to install FreeTDS: FreeTDS is a set of libraries for Unix and Linux that allows your programs to natively talk to Microsoft SQL Server and Sybase databases. We have to define an "interface" for the Microsoft SQL Server in the FreeTDS "interfaces" file. Finally, we use sqsh, a command line client for Sybase and Microsoft SQL Server.</description>
  <content:encoded><![CDATA[<p><img src="/pierrick/blog/images/post_104/debian.png" alt="Debian logo" style="float:left; margin: 0 1em 1em 0;" /> <img src="/pierrick/blog/images/post_104/mssql.jpg" alt="Microsoft SQL Server" style="float:right; margin: 0 0 1em 1em;" /></p>


<p>We're using Debian Etch (with GNU/Linux) as a server at the Talend office. We need to reach a remote Microsoft SQL Server database. The first step is to perform a select query from the command line.</p>


<p>We need to install FreeTDS: <q>FreeTDS is a set of libraries for Unix and Linux that allows your programs to natively talk to Microsoft SQL Server and Sybase databases.</q> We have to define an "interface" for the Microsoft SQL Server in the FreeTDS "interfaces" file. Finally, we use sqsh, a command line client for Sybase and Microsoft SQL Server.</p> <pre>[bash]
$ sudo apt-get install freetds-dev sqsh
$ vi /etc/freetds/freetds.conf
</pre>

<pre>
 # talend alias is bound to a MS SQL Server 2000
 [talend]
        host = talend-dbms
        port = 1433
        tds version = 8.0
</pre>

<pre>[bash]
$ sqsh -Uroot -P******* -Stalend
[...]
1&gt; select count(*) from sales;
2&gt; \go

 -----------
        1000

(1 row affected)
1&gt; quit
</pre>

<ul>
<li><a href="http://www.freetds.org" hreflang="en">FreeTDS</a></li>
<li><a href="http://www.sqsh.org" hreflang="en">SQSH</a></li>
</ul>]]></content:encoded>
</item>
<item rdf:about="http://le-gall.net/pierrick/blog/index.php/2007/09/07/103-ssh-key-authentication-and-batch-mode">
  <title>SSH, key authentication and batch mode</title>
  <link>http://le-gall.net/pierrick/blog/index.php/2007/09/07/103-ssh-key-authentication-and-batch-mode</link>
  <dc:date>2007-09-07T16:12:53+02:00</dc:date>
  <dc:language>en</dc:language>
  <dc:creator>Pierrick</dc:creator>
  <dc:subject>GNU/linux</dc:subject>
  <description>


A long time ago, I tried to connect to an SSH server with my private key in batch mode (from a cron task). I couldn't find a way to do it. Now I have. It is as simple as having no passphrase on your private key. This is less secure (but still much more secure than an FTP connection), but it makes SSH possible in a cron task.</description>
  <content:encoded><![CDATA[<p><img src="/pierrick/blog/images/Openssh.png" alt="OpenSSH logo" style="float:right; margin: 0 0 1em 1em;" /></p>


<p>A long time ago, I tried to connect to an SSH server with my private key in batch mode (from a cron task). I couldn't find a way to do it. Now I have. It is as simple as <strong>having no passphrase on your private key</strong>. This is less secure (but still much more secure than an FTP connection), but it makes SSH possible in a cron task.</p> <pre>[bash]
$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/pierrick/.ssh/id_dsa): /home/pierrick/.ssh/id_dsa2
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/pierrick/.ssh/id_dsa2.
Your public key has been saved in /home/pierrick/.ssh/id_dsa2.pub.
</pre>


<p>Add the content of /home/pierrick/.ssh/id_dsa2.pub into remote ~/.ssh/authorized_keys</p>

<pre>[bash]
$ export EDITOR=vi; crontab -e
* * * * * ssh -i /home/pierrick/.ssh/id_dsa2 remote_user@remote_server 'echo $(date) &gt;&gt; /tmp/pierrick.log'
</pre>


<p>You can then see that every minute, the date is appended to the remote /tmp/pierrick.log.</p>]]></content:encoded>
</item>
<item rdf:about="http://le-gall.net/pierrick/blog/index.php/2007/08/21/102-talend-220m1-and-perl-code-performances">
  <title>Talend 2.2.0M1 and Perl code performances</title>
  <link>http://le-gall.net/pierrick/blog/index.php/2007/08/21/102-talend-220m1-and-perl-code-performances</link>
  <dc:date>2007-08-21T18:06:31+02:00</dc:date>
  <dc:language>en</dc:language>
  <dc:creator>Pierrick</dc:creator>
  <dc:subject>Talend</dc:subject>
  <description>


Richard and I both worked for 2 weeks on a major improvement proposed by Richard.


.----------------------------------------------------.
| job        | TOS 2.1.1 | TOS 2.2.0M1 | improvement |
+------------+-----------+-------------+-------------+
| Scenario 2 |    20.8 s |      16.9 s |      18.8 % |
| Scenario 3 |    81.2 s |      30.4 s |      62.6 % |
'------------+-----------+-------------+-------------'



Scenario 3 details in Talendforge wiki
Scenario 2 details in Talendforge wiki
</description>
  <content:encoded><![CDATA[<p><img src="/pierrick/blog/images/logo-talend-fast.png" alt="Talend logo" style="float:right; margin: 0 0 1em 1em;" /></p>


<p>Richard and I both worked for 2 weeks on a major improvement proposed by Richard.</p>

<pre>
.----------------------------------------------------.
| job        | TOS 2.1.1 | TOS 2.2.0M1 | improvement |
+------------+-----------+-------------+-------------+
| Scenario 2 |    20.8 s |      16.9 s |      18.8 % |
| Scenario 3 |    81.2 s |      30.4 s |      62.6 % |
'------------+-----------+-------------+-------------'
</pre>

<ul>
<li><a href="http://www.talendforge.org/wiki/doku.php?id=performances:scenario_3" hreflang="en">Scenario 3 details</a> in Talendforge wiki</li>
<li><a href="http://www.talendforge.org/wiki/doku.php?id=performances:scenario_2" hreflang="en">Scenario 2 details</a> in Talendforge wiki</li>
</ul> <p>Each component works on a Perl array. This Perl array is the translation of a database row or a file line into what I call a Talend row: @row. In Talend 2.1.x and earlier, @row was copied from the input connection to the current component, which transformed @row and then copied it to the output connection. That made 2 copies for each component.</p>


<p><img src="/pierrick/blog/images/post_102/post_102-01.png" alt="simple job example" style="display:block; margin:0 auto;" /></p>


<p>In this job example, somewhere in the code, we had:</p>

<pre>[perl]
while (my @tMysqlInput_1 = $sth-&gt;fetchrow_array()) {
    # ...
    my @row3 = @tMysqlInput_1;
    my @tFilterRow_1 = @row3;
    # ...
    my @row4 = @tFilterRow_1;
    my @tLogRow_1 = @row4;
    # ...
}
</pre>


<p>The new way of doing things is:</p>

<pre>[perl]
while (my $tMysqlInput_1 = $sth-&gt;fetchrow_arrayref()) {
    # ...
    my $row3 = $tMysqlInput_1;
    my $tFilterRow_1 = $row3;
    # ...
    my $row4 = $tFilterRow_1;
    my $tLogRow_1 = $row4;
    # ...
}
</pre>


<p>Where $tMysqlInput_1 is an array reference. No data copy, only memory address copy.</p>
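

<p>To make the difference concrete, here is a small standalone sketch comparing an array copy with a reference copy in Perl. It is not generated code: the array @source and its values are made up for illustration, and only the names $row3 and $tFilterRow_1 are borrowed from the example above.</p>

<pre>[perl]
use strict;
use warnings;

my @source = (1, 'pierrick', 'le gall');

# Old style: assigning an array to an array copies every field.
my @copy = @source;
$copy[1] = 'changed';
print "after array copy: $source[1]\n";
# prints: after array copy: pierrick   (@copy is independent of @source)

# New style: only the reference (a memory address) is copied.
my $row3         = \@source;
my $tFilterRow_1 = $row3;
$tFilterRow_1-&gt;[1] = 'changed';
print "after ref copy: $source[1]\n";
# prints: after ref copy: changed   (both names point at the same array)
</pre>

<p>A consequence worth keeping in mind: with references, a change made through one name is visible through every other name pointing at the same array.</p>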


<p>In a job that has very few components (not many copies) and a small schema (@row will be very small), the copies were not a real problem. Of course, in a complex job with a huge schema, the improvement becomes significant.</p>

<ul>
<li><a href="http://talendforge.org" hreflang="en">Talendforge</a></li>
<li><a href="http://talend.com" hreflang="en">Talend</a></li>
<li><a href="http://www.talendforge.org/bugs/view.php?id=1588">Feature 1588 in Talendforge bugtracker</a></li>
</ul>]]></content:encoded>
</item>

</rdf:RDF>
