Goals

This script manipulates an svnadmin dump file to modify history. In particular, it can track renames and copies and create a coherent filtered dump file.

Specification

The main goal of this script is to help split a Subversion repository that contains many projects. To keep the history of every project across renames and inter-project copies, some specific steps must be taken.

For example, svndumpfilter cannot take into account the fact that you have changed a project's name (you will lose the name change and every revision before that point). To prevent this, the rename of the project should be removed, and every earlier revision should take into account that this rename has been removed.

You should also consider that files which have been copied from one project to another need to have their revisions kept, so that the history of the target file is preserved.

Steps that should be considered:

  • remove renames of trunk/ and tags/
  • follow history to keep the commit for each revision of a file that belongs to a directory

This command is driven by command-line options. If the input file cannot be randomly accessed, some operations will require read/write access to a filesystem with enough space to store a full copy of the input dump!

Command line options:

  • -o|--output file: output to file (otherwise output to stdout)
  • -i|--input file: input from file (otherwise input from stdin)
  • -t|--temp dir: temporary directory where to store data (otherwise tmp dir)
  • --include path@revision: include all paths that are bound to path@revision
  • --exclude path@revision: exclude all paths that are bound to path@revision
  • --reparent path@revision new-path: reparent everything bound to path@revision to new-path, removing renames
  • --drop-empty-rev: remove empty revisions
  • --extract-project path@revision: do all the operations needed to extract a project from a repository of projects

The extract-project operation is a composition of the following actions:

  • --include path@revision
  • --drop-empty-rev
  • --reparent path@revision ""
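
The composition above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual option handling: the helper names `parse_path_rev` and `expand_extract_project` are hypothetical, the revision is assumed to be a decimal integer, and the `--drop-empty-rev` spelling is taken from the option list.

```python
from typing import List, NamedTuple, Tuple


class PathRev(NamedTuple):
    path: str
    revision: int


def parse_path_rev(spec: str) -> PathRev:
    """Split 'path@revision' on the last '@' (assumes an integer revision)."""
    path, sep, rev = spec.rpartition("@")
    if not sep or not rev.isdigit():
        raise ValueError(f"expected path@revision, got {spec!r}")
    return PathRev(path, int(rev))


def expand_extract_project(spec: str) -> List[Tuple[str, ...]]:
    """Expand --extract-project into the three documented operations."""
    parse_path_rev(spec)  # validate the argument before expanding it
    return [
        ("--include", spec),
        ("--drop-empty-rev",),
        ("--reparent", spec, ""),  # empty new-path: move the project to the root
    ]


if __name__ == "__main__":
    for option in expand_extract_project("projA/trunk@12"):
        print(option)
```

The empty new-path in the last step is what moves the extracted project to the root of the new repository.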

Details of operations

Design

The root of everything is the History. A History is a graph of Nodes. Each node has a path, a begin revision and an optional end revision. The History describes the relationships between nodes: parent/children (a/b is the parent of a/b/c) and copy_from/copy_to (SVN copy). It also provides some basic operations to iterate through the graph and mark special nodes (mark_bound/mark_children). The most common way to operate on this graph is as follows:

  • clone a history
  • mark some nodes
  • filter the history based on the marked nodes
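
As a rough illustration of the clone / mark / filter cycle, here is a minimal Python sketch. It is not the library's data structure: the `Node` and `History` classes, the `(path, begin)` key and the `filter_marked` name are assumptions, and `mark_children` is simplified to a path-prefix test that ignores revision ranges.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Key = Tuple[str, int]  # (path, begin revision) identifies a node in this sketch


@dataclass
class Node:
    path: str
    begin: int
    end: Optional[int] = None        # None means the node is still alive
    copy_from: Optional[Key] = None  # source node of an SVN copy, if any


@dataclass
class History:
    nodes: Dict[Key, Node] = field(default_factory=dict)
    marked: Set[Key] = field(default_factory=set)

    def add(self, node: Node) -> None:
        self.nodes[(node.path, node.begin)] = node

    def clone(self) -> "History":
        # Deep copy, so marking the clone never touches the original.
        return copy.deepcopy(self)

    def mark_children(self, key: Key) -> None:
        """Mark a node and every node whose path lies below it."""
        prefix = self.nodes[key].path + "/"
        self.marked.add(key)
        for other, node in self.nodes.items():
            if node.path.startswith(prefix):
                self.marked.add(other)

    def filter_marked(self) -> "History":
        """Return a new history that keeps only the marked nodes."""
        kept = {k: copy.deepcopy(n) for k, n in self.nodes.items() if k in self.marked}
        return History(kept)


if __name__ == "__main__":
    original = History()
    for n in (Node("a/b", 1), Node("a/b/c", 2), Node("x", 1)):
        original.add(n)
    work = original.clone()          # 1. clone
    work.mark_children(("a/b", 1))   # 2. mark
    filtered = work.filter_marked()  # 3. filter
    print(sorted(filtered.nodes))
```

Working on a clone keeps the input history intact, which matters when several filters each derive their own history from the same upstream one.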

A basic operation is a filter. A filter processes a stream of svn dump records: it changes a record, skips it, or adds new ones. A filter should provide the modified history that represents the stream of records coming out of it. Computing the history that represents the output stream should be enough to determine every action to take on the stream.

Filters are made to be connected. At level n, a filter uses the history and record stream of level n-1. Level 0 is always "Load", which reads a svndump file. The last level should be "Save", which creates a new svndump file.
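
The chaining can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the real implementation: records are reduced to `(revision, action, path)` tuples, a history is reduced to a set of `(path, revision)` pairs, and the class `ExcludePrefix` is a hypothetical stand-in for a real filter. Each level exposes a history and a record stream derived from the level below it.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Record = Tuple[int, str, str]  # (revision, action, path): simplified dump record
Hist = Set[Tuple[str, int]]    # simplified history: set of live (path, revision)


class Load:
    """Level 0: stands in for reading a dump file."""

    def __init__(self, records: Iterable[Record]) -> None:
        self._records = list(records)

    def history(self) -> Hist:
        return {(path, rev) for rev, _action, path in self._records}

    def records(self) -> Iterator[Record]:
        return iter(self._records)


class ExcludePrefix:
    """Level n: derives its history and stream from level n-1."""

    def __init__(self, upstream, prefix: str) -> None:
        self.upstream = upstream
        self.prefix = prefix

    def _keep(self, path: str) -> bool:
        return not (path == self.prefix or path.startswith(self.prefix + "/"))

    def history(self) -> Hist:
        return {(p, r) for p, r in self.upstream.history() if self._keep(p)}

    def records(self) -> Iterator[Record]:
        live = self.history()  # the output history decides what to skip
        for rec in self.upstream.records():
            rev, _action, path = rec
            if (path, rev) in live:
                yield rec


class Save:
    """Last level: stands in for writing a dump file; history is unchanged."""

    def __init__(self, upstream) -> None:
        self.upstream = upstream

    def history(self) -> Hist:
        return self.upstream.history()

    def run(self) -> List[Record]:
        return list(self.upstream.records())


if __name__ == "__main__":
    dump = [(1, "A", "trunk/B"), (2, "A", "C"), (3, "M", "trunk/B/f")]
    print(Save(ExcludePrefix(Load(dump), "trunk/B")).run())
```

In this model the stream logic of a filter only looks at its own output history, which mirrors the idea that computing the history is enough to decide what to do with each record.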

Here is a short list of available filters:

  • Load/Save: load/save a dump file
  • Include: include a list of path@revision, exclude everything else
  • Exclude: exclude a list of path@revision, include everything else
  • DropEmptyRev: remove "Revision" records that contain no records
  • Reparent: transform path@revision into new_path

Load/Save

These two filters are quite straightforward:

  • Load creates the history of the input file
  • Save doesn't change the history
  • Load reads records from a file
  • Save writes records to a file

Include/Exclude

These commands work just like svndumpfilter, but take file copy_from/copy_to and parent/children relations into account.

Revision: X1 X2 X3
Before: A trunk/B A C D trunk/B A M B/trunk M B/trunk
svndump --exclude B/trunk@X4: A trunk/B D trunk/B A M B/trunk M B/trunk
svndump --exclude trunk/B@X1: A trunk/B D trunk/B A M B/trunk M B/trunk

History is computed by:

  • marking the nodes concerned by Exclude/Include (mark_bound(no_parent)/mark_bound)
  • filtering the marked nodes from the history
  • skipping a record if it doesn't exist in the history (both path and revision are considered to check liveness)
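
The liveness check in the last step can be sketched as follows. This is a hedged illustration: the `Node` tuple and the `is_live` function are hypothetical names, and the end revision is assumed to be inclusive, which the text above does not specify.

```python
from typing import Iterable, NamedTuple, Optional


class Node(NamedTuple):
    path: str
    begin: int
    end: Optional[int] = None  # None: the node is still alive


def is_live(history: Iterable[Node], path: str, revision: int) -> bool:
    """A record is kept only if some node with this path covers this revision."""
    for node in history:
        if node.path != path:
            continue
        # Assumption: 'end' is the last revision in which the node exists.
        if node.begin <= revision and (node.end is None or revision <= node.end):
            return True
    return False


if __name__ == "__main__":
    history = [Node("trunk/B", 1, 2), Node("B/trunk", 3)]
    print(is_live(history, "trunk/B", 2))  # path exists at this revision
    print(is_live(history, "trunk/B", 3))  # same path, but no longer alive
```

Checking the path alone would not be enough: the same path can be deleted and recreated, so the revision has to fall inside the node's lifetime.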

Reparent

Reparent is the most complex operation of this library. It proceeds in several steps to compute the history (but once the new history is computed, the stream processing is simple).

  • compute new_path for each concerned node:
    • breadth-first iteration
    • a copy relation keeps new_path
    • a parent relation adds the relative path of the child to new_path
    • new_path is computed only if it has not yet been set on a node
    • the breadth-first iteration could be improved with a rename detector
  • rebuild the history by setting new_path on each node, adding parents if required
  • merge nodes that can be merged (i.e. rename removal)
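
The first step (propagating new_path through the graph) can be sketched with a breadth-first traversal. This is a minimal sketch under assumptions, not the library's code: nodes are identified by plain strings such as "trunk/B@1", the two edge maps `copies` and `children` are hypothetical representations of the copy_to and parent/children relations, and only the copy_to direction is followed. The rebuild and merge steps are not covered.

```python
from collections import deque
from typing import Dict, List, Tuple


def compute_new_paths(
    start: str,
    new_path: str,
    copies: Dict[str, List[str]],
    children: Dict[str, List[Tuple[str, str]]],
) -> Dict[str, str]:
    """Breadth-first propagation of new_path from the reparented node.

    copies:   node -> nodes copied from it (copy_to relation)
    children: node -> list of (relative name, child node)
    """
    assigned = {start: new_path}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        base = assigned[node]
        # Copy relation: the copy target keeps the same new_path.
        for target in copies.get(node, []):
            if target not in assigned:  # never overwrite an existing new_path
                assigned[target] = base
                queue.append(target)
        # Parent relation: append the child's relative path to new_path.
        for rel, child in children.get(node, []):
            if child not in assigned:
                assigned[child] = f"{base}/{rel}" if base else rel
                queue.append(child)
    return assigned


if __name__ == "__main__":
    copies = {"trunk/B@1": ["B/trunk@3"]}
    children = {
        "trunk/B@1": [("f.c", "trunk/B/f.c@1")],
        "B/trunk@3": [("f.c", "B/trunk/f.c@3")],
    }
    for node, path in compute_new_paths("trunk/B@1", "", copies, children).items():
        print(node, "->", repr(path))
```

In this example both trunk/B and its renamed copy B/trunk map to the same new_path, so their files end up under identical paths, which is what later allows the two nodes to be merged and the rename to disappear.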

Comparison with svndumpfilter

svndumpfilter was designed more as a proof of concept for manipulating Subversion dump files. It was also designed to remove (obliterate) certain files from a repository. It does this job quite well, provided that the file has never been moved or recreated.

But if you want to keep versions and be able to extract projects from a Subversion repository, you will need something more capable than this simple tool.

This is the role, and the reason for being, of svndump-utils. It tries to address all the problems that svndumpfilter has.

svndump-utils tries to be smarter than svndumpfilter by taking paths, revisions and graph algorithms on the history into account.

Status

In progress

You can get a copy of the current work using darcs:

darcs get http://le-gall.net/sylvain+violaine/scm/raw/darcs/svndump-utils/