Let's suppose you first import the cvs repo into git, but then you realise you missed some authors.
Before being able to do a git cvsimport, you need a checkout of the module or cvs subdir that you want to turn into its own git repo.
For ease of use I defined CVSCMD as
cvs -z9 -d :pserver:my_cvs_id@cvs.server.com:/root_dir
You will need to replace the placeholders according to your situation; more exactly, you need to define 'my_cvs_id', 'cvs.server.com' and 'root_dir'. If your access method to the server is not pserver, you should change that accordingly. This information should be available from your project admin or the project pages.
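One way to read "defined CVSCMD as" is a plain shell variable. The sketch below assumes a POSIX-like shell and uses the placeholder values from the text; show_checkout is a hypothetical helper that only prints the command it would run, so nothing contacts a server.

```shell
# Placeholder values from the text above; substitute your own.
CVSCMD="cvs -z9 -d :pserver:my_cvs_id@cvs.server.com:/root_dir"

# Dry run: print the checkout command instead of executing it.
# $CVSCMD is left unquoted on purpose so the shell splits it into words.
show_checkout () {
    echo $CVSCMD checkout -d "$1" "$2"
}

show_checkout localdirname MODULE/path/to/subdir
```

Dropping the echo turns the dry run into the real checkout.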
Check out the desired module or even subdir of a module
$CVSCMD checkout -d localdirname MODULE/path/to/subdir
git cvsimport -A ../authors -m -z 600 -C ../new-git-repo -R
How to find out the commits which do need rewriting
The way to limit yourself to only the commits that got no author and committer mapping at git-cvsimport time is to use a filter like this:
git log -E --author='^[^@]*$' --pretty=format:%h
This tells git log to print only the abbreviated hashes (%h) of the commits that have NO '@' sign in the 'Author:' field, which happens when the authors file had no mapping from the cvs user id to a git author and email at git cvsimport time.
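To see what the pattern selects without touching a repository, the same extended regex can be fed to grep. This is only a sketch: the sample lines are made up, and it assumes an unmapped cvs id shows up as both name and email, so the author line contains no '@'.

```shell
# Hypothetical author lines in "Name <email>" form.
sample_authors () {
    printf '%s\n' \
        'John van Code <john@code.temple.tld>' \
        'jimmy <jimmy>' \
        'cvsid <cvsid>'
}

# Keep only lines with no "@" anywhere: the pattern given to --author above.
unmapped_authors () {
    grep -E '^[^@]*$'
}

sample_authors | unmapped_authors
```

Only the two lines without an email address should come out.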
We will use this command's output later to tell git filter-branch which commits need rewriting. *
But before that...
How do we find if our authors file is complete?
For this task we'll use a slightly modified form of the previous command and some shell script magic.
git log -E --author='^[^@]*$' --pretty=format:%an | sort -u > all-leftout-cvs-authors
And now all-leftout-cvs-authors holds a sorted list of all cvs ids which were not handled in the original git-cvsimport. In my case there are only 19 such ids:
$ wc -l all-leftout-cvs-authors
19 all-leftout-cvs-authors
Nice, that will be easy to fix. Now edit your all-leftout-cvs-authors file to add the relevant information in a format similar to this (no spaces around the '=', because the filter script further down looks entries up with grep "^id="):
john=John van Code <john@code.temple.tld>
jimmy=Jimmy O'Document <jimmy@documenting.com>
In case you can't make a complete cvs-user-to-name-and-email map, you might prefer to use stubs of the following form, so that such commits are easy to identify later (or you could leave them unaltered altogether ;-):
cvsid=cvsid <cvsid@cvs.server.com>
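Once the file is edited, a quick check can confirm that every left-out id really has an entry. The sketch below is one possible way to do it; missing_ids and the file names under the temporary directory are hypothetical, and ids containing regex metacharacters would need escaping.

```shell
# Print every id from file $1 (one cvs id per line) that has no
# "id=" entry in the authors file $2.
missing_ids () {
    while IFS= read -r id; do
        [ -n "$id" ] || continue
        grep -q "^$id[[:space:]]*=" "$2" || printf '%s\n' "$id"
    done < "$1"
}

# Throwaway demo data.
tmp=$(mktemp -d)
printf '%s\n' jimmy john zoe > "$tmp/ids"
printf '%s\n' \
    'john=John van Code <john@code.temple.tld>' \
    "jimmy=Jimmy O'Document <jimmy@documenting.com>" > "$tmp/new-authors"

missing_ids "$tmp/ids" "$tmp/new-authors"
rm -rf "$tmp"
```

An empty output means the map covers every id in the list.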
How to actually do the filtering to fix history (using git-filter-branch and a script)
After this is done, we'll need just one more piece, the command that does the altering itself, which reads as follows (note that my final authors file is called new-authors and that I placed this in a script in order to be able to easily run it without trying to escape all spaces and such madness):
[ "$authors_file" ] || export authors_file=$HOME/new-authors
#git filter-branch -f --remap-cvs --env-filter '
git filter-branch -f --env-filter '
    # Look up the full name for a cvs id in the authors file
    get_name () {
        grep "^$1=" "$authors_file" | sed "s/^.*=\(.*\)\ .*$/\1/"
    }
    # Look up the email address for a cvs id in the authors file
    get_email () {
        grep "^$1=" "$authors_file" | sed "s/^.*\ <\(.*\)>$/\1/"
    }
    # Rewrite only commits whose committer has an exact "id=" entry
    if grep -q "^$GIT_COMMITTER_NAME=" "$authors_file" ; then
        GIT_AUTHOR_NAME=$(get_name "$GIT_COMMITTER_NAME") &&
        GIT_AUTHOR_EMAIL=$(get_email "$GIT_COMMITTER_NAME") &&
        GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" &&
        GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL" &&
        export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL &&
        export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
    fi
' -- --all
You might wonder what's up with the commented-out git filter-branch line with the --remap-cvs option. The script will NOT work with that option as long as you have the stock git-filter-branch script (/usr/lib/git-core/git-filter-branch); it needs a patched one. What the option provides is a file with the mappings from the old to the new commit ids. If you want that function too, you'll want to apply this patch to git-filter-branch:
diff --git a/git-filter-branch b/git-filter-branch
old mode 100644
new mode 100755
index ae602e3..d1f7ef6
--- a/git-filter-branch
+++ b/git-filter-branch
@@ -149,6 +149,11 @@ do
prune_empty=t
continue
;;
+ --remap-cvs)
+ shift
+ remap_cvs=t
+ continue
+ ;;
-*)
;;
*)
@@ -368,6 +373,33 @@ while read commit parents; do
die "could not write rewritten commit"
done <../revs
+# Rewrite the cvs-revisions file, if requested and the file exists
+
+ORIG_CVS_REVS_FILE="${GIT_DIR}/cvs-revisions"
+if [ -f "$ORIG_CVS_REVS_FILE" ]; then
+ if [ "$remap_cvs" ]; then
+ printf "CVS remapping requested\n"
+
+ CVS_REVS_FILE="$tempdir/cvs-revisions"
+ cp "$ORIG_CVS_REVS_FILE" "$CVS_REVS_FILE"
+ printf "\nFound $ORIG_CVS_REVS_FILE; will copy and alter it as $CVS_REVS_FILE\n"
+ cvs_remap__commit_count=0
+ newcommits="$(ls ../map/ | wc -l)"
+ for commit in ../map/* ; do
+ cvs_remap__commit_count=$(($cvs_remap__commit_count+1))
+ printf "\rRemap CVS commit $commit ($cvs_remap__commit_count/$newcommits)"
+
+ oldsha1="$(basename $commit)"
+ read newsha1 < $commit
+ sed -i "s@$oldsha1\$@$newsha1@" "$CVS_REVS_FILE"
+ done
+ else
+ warn "\nNo CVS remapping requested, but cvs-revisions file found. All CVS mappings will be lost.\n"
+ fi
+elif [ "$remap_cvs" ]; then
+ warn "\nWARNING: CVS remap was ignored, since no original cvs-revisions file was found\n"
+fi
+
# If we are filtering for paths, as in the case of a subdirectory
# filter, it is possible that a specified head is not in the set of
# rewritten commits, because it was pruned by the revision walker.
@@ -491,6 +523,11 @@ if [ "$filter_tag_name" ]; then
done
fi
+if [ "$remap_cvs" -a -f "$CVS_REVS_FILE" ]; then
+ mv "$ORIG_CVS_REVS_FILE" "$ORIG_CVS_REVS_FILE.original"
+ cp "$CVS_REVS_FILE" "$ORIG_CVS_REVS_FILE"
+fi
+
cd ../..
rm -rf "$tempdir"
Then, after running this script, let's call it filter, you should have a brand new git repo with the appropriate authors and their emails set.
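Before running the filter on real history, the two lookup helpers can be tried on a throwaway authors file. This is a sketch rather than the exact functions from the script: the sed patterns are simplified, and the temporary file stands in for new-authors.

```shell
# Throwaway authors file in the id=Name <email> format.
authors_file=$(mktemp)
printf '%s\n' \
    'john=John van Code <john@code.temple.tld>' \
    "jimmy=Jimmy O'Document <jimmy@documenting.com>" > "$authors_file"

# Text between the first "=" and the last " <" is the name.
get_name () {
    grep "^$1=" "$authors_file" | sed 's/^[^=]*=\(.*\) <.*$/\1/'
}

# Text inside the trailing <...> is the email.
get_email () {
    grep "^$1=" "$authors_file" | sed 's/^.* <\(.*\)>$/\1/'
}

get_name john
get_email john
rm -f "$authors_file"
```

An id with no entry yields empty output, which is why the filter only rewrites a commit after checking that the id is present.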
P.S.: I started writing this post some time ago but stopped just before the last part, the one with the filter script. I realise I might be missing something in the explanation, but if you have problems, please comment so I can help you fix them.
P.P.S.: * I realised that in the filter script at some point I wanted to do something like:
for R in $(git log -E --author='^[^@]*$' --pretty=format:%H | head -n 2) ; do
[the same git filter branch command above but ending in ...]
' $R
done
But I think I remember that $R didn't work on the whole history, but only on that revision, or some other weirdness of that sort. I know I ended up not filtering explicitly those revisions, but the entire history. I hope this helps.
4 comments:
1/ How to rewrite the commit log itself ? In my example, an old CVS log template asked people to prefix with "Not a Bug" as first line. You can imagine how little readable the git short log is now ...
2/ You might want to use cvs2git instead, it converts to git (one shot, not done to follow daily an active cvs repo) way better than git-cvsimport
Your access method should NEVER be “pserver”, you know.
//mirabilos (maintainer of cvs)
@Simon P.: 1/ Judging by the man page of git-filter-branch, you should use --msg-filter <command>.
The man page says for this option:
This is the filter for rewriting the commit messages. The argument
is evaluated in the shell with the original commit message on
standard input; its standard output is used as the new commit
message.
So from what I understand, the command could simply be something like "grep -v '^Not a Bug$'", which should remove all lines containing that string and ONLY that string. YMMV with whitespace, but I think it is doable.
2/ Not if you don't have access to the CVS server or a CVS archive.
@Anonymous: Not my choice. But why is that? Security reasons? 'Cause if that's the case, it's not relevant.
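Regarding the --msg-filter suggestion in 1/: the line filter can be tried on a sample message first. The sketch below swaps grep -v for sed, since as far as I know grep -v exits with a non-zero status when it prints nothing (a message consisting only of "Not a Bug"), and a failing filter would abort the rewrite. strip_template_line is a hypothetical name.

```shell
# Drop every line that is exactly "Not a Bug"; all other lines pass through.
strip_template_line () {
    sed '/^Not a Bug$/d'
}

printf '%s\n' 'Not a Bug' 'Fix typo in README' | strip_template_line
```

The sed expression itself is what would go in the --msg-filter argument.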
Please note that the filter script is not run from the current directory, so your $authors_file must be provided with an absolute path.