Note: this is raw, and likes to dump things in the cwd, logs in particular.
To run it, first get yourself a copy of the gentoo-x86 CVS repository and place
it in cvs-repo in this directory. This can be a partial or full copy of CVS,
but it needs to conform to the following layout:
$(pwd)/cvs-repo/CVSROOT
$(pwd)/cvs-repo/gentoo-x86/*
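
For example, a quick sanity check of that layout before running anything (just a
sketch using the paths above):

    # verify the CVS copy is laid out the way script.sh expects
    test -d "$(pwd)/cvs-repo/CVSROOT"    || echo "missing CVSROOT"
    test -d "$(pwd)/cvs-repo/gentoo-x86" || echo "missing gentoo-x86"
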
From there, get yourself a userinfo.xml and place it in $(pwd). To get a userinfo.xml,
log into a gentoo.org host and run perl_ldap -U; it will create userinfo.xml in the cwd.
From there, ./script.sh is your main point of entry; it processes the repository
in parallel, using $(pwd)/output for temp space. It's suggested that output be
tmpfs (much like cvs-repo).
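
A rough sketch of that setup (the mountpoint comes from above; the tmpfs size is
a guess, not a requirement):

    # scratch space on tmpfs, then kick off the parallel run
    mkdir -p output
    mount -t tmpfs -o size=8G tmpfs "$(pwd)/output"   # needs root
    ./script.sh
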
As each category/directory/component is finished, a git repo is generated and
some basic blob rewrites are done ($Header related). Two core directories will
exist in each: cvs2svn-tmp, which holds the fast-import data, and git, a
recomposed bare git repository of that slice of gentoo-x86 history.
Once a category/component is finished, it's moved into $(pwd)/final and another
component is started; script.sh currently runs at grep -c MHz /proc/cpuinfo parallelism.
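
As an illustration, a finished slice can be poked at directly; the category name
here is just an example, assuming the per-component layout described above
survives the move into final/:

    # peek at one converted slice sitting in final/
    git --git-dir="$(pwd)/final/app-editors/git" log --oneline | head
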
Upon finishing the cvs->git conversion, the content needs to be reintegrated;
create-git.sh exists for this. It looks in $(pwd)/final and creates the new
repo in $(pwd)/git/work; this is a bare repo.
Roughly, it does this by generating an empty repo, setting up alternates into each
slice of history, setting up refs/heads/source/* space for each slice of history,
then forcing a date-ordered fast-export, manipulating the resultant stream
(stripping resets, rewriting the commit field to point at refs/heads/master, and
rewriting commit messages to convert some basic structured information into git
footers), and spitting that back out.
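
A minimal sketch of that approach for a single slice (this is not create-git.sh
itself, and the slice name is illustrative):

    # empty bare repo, borrowing objects from the slice via alternates
    git init --bare git/work
    echo "$(pwd)/final/app-editors/git/objects" >> git/work/objects/info/alternates
    # expose the slice's history under refs/heads/source/*
    git --git-dir=git/work fetch "$(pwd)/final/app-editors/git" \
        'refs/heads/*:refs/heads/source/app-editors/*'
    # date-ordered export of everything, ready for stream rewriting
    git --git-dir=git/work fast-export --date-order --all > export-stream-raw
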
It creates two dumps of intermediate data as it goes: export-stream-raw and
export-stream-rewritten; the first is the raw git fast-export output, the second
is the rewritten stream. Each is ~490MB (they're small because we're
exporting/importing within the same repo, so blobs don't have to be sent through
the stream; they can be referenced directly in the command stream).
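
Purely to illustrate the kind of stream surgery involved (the real rewriting in
create-git.sh does more than this; the sketch also assumes every reset is
followed by a single 'from' line):

    # drop reset commands and point every commit at refs/heads/master
    sed -e '/^reset /,+1d' \
        -e 's|^commit refs/heads/source/.*|commit refs/heads/master|' \
        export-stream-raw > export-stream-rewritten
    git --git-dir=git/work fast-import < export-stream-rewritten
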
Now that that is done, we have a recomposed history in refs/heads/master.
From there, we do pruning/gc'ing, and force a git repack -Adf.
The repo is ready to go at that point.
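
In command form, the cleanup is roughly this (repack -Adf is named above; the
reflog/prune steps are one reasonable way to do the pruning):

    # drop reflogs, prune unreachable objects, then repack aggressively
    git --git-dir=git/work reflog expire --expire=now --all
    git --git-dir=git/work prune --expire=now
    git --git-dir=git/work repack -Adf
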
For a full production run, ./script.sh --full # is the fastest form; it basically
allows the final linearization, deduplication, etc. to run in parallel as work
becomes available. Not a huge gain, but it shaves off a minute or so.
Finally, certain conversions have occurred commit-message-wise; git footers now
exist, standardizing the package-manager/signingkey/repoman footers.
In particular, the signing 'key' of 'ultrabug' was removed (a user from far in the past).
Additionally, 'wiliamh@gentoo.org' was converted to 0x30C46538 (the actual key;
done to standardize it).
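
Since the footers are ordinary git trailers, they can be inspected with standard
tooling, e.g. (the commit selection here is just an example):

    # show the trailer block of the tip commit on master
    git --git-dir=git/work log -1 --format=%B refs/heads/master | git interpret-trailers --parse
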