Showing posts with label patch. Show all posts

Thursday, 16 February 2012

HOWTO: Git - reauthor/fix author and committer email and author name after a git cvsimport

At some point you might find that your git repository imported from CVS does not contain the correct names and email addresses for all the commits which were once in CVS but are now part of your project history in your git repo. In other words, you did a cvsimport which missed a few authors.

Let's suppose you first import the cvs repo into git, but then you realise you missed some authors.

Before being able to do a git cvsimport, you need a checkout of the module or cvs subdir that you want to turn into its own git repo.

For ease of use I defined CVSCMD as
cvs -z9 -d :pserver:my_cvs_id@cvs.server.com:/root_dir
You will need to replace the placeholders according to your situation; more exactly, you need to define 'my_cvs_id', 'cvs.server.com' and 'root_dir'. If your access method to the server is not pserver, you should change that accordingly. This information should be available from your project admin or project pages.
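One way to keep that definition handy is a small shell snippet. This is only a sketch: the variable names CVS_USER, CVS_HOST and CVS_ROOT are placeholders invented for the example, holding the same three values described above.

```sh
#!/bin/sh
# Hypothetical placeholder values; replace them with your own.
CVS_USER=my_cvs_id
CVS_HOST=cvs.server.com
CVS_ROOT=/root_dir

# Left unquoted at the call site on purpose, so the shell splits it into
# the cvs command and its options, as in: $CVSCMD checkout ...
CVSCMD="cvs -z9 -d :pserver:${CVS_USER}@${CVS_HOST}:${CVS_ROOT}"

echo "$CVSCMD"
```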


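Check out the desired module, or even a subdir of a module:

$CVSCMD checkout -d localdirname MODULE/path/to/subdir

Then run the import from inside the checkout (-A names the authors file, -R records the mapping from CVS revisions to git commits in the cvs-revisions file):

cd localdirname
git cvsimport -A ../authors -m -z 600 -C ../new-git-repo -R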

How to find the commits which need rewriting

The way to limit yourself to only the commits that got no author and committer mapping at git-cvsimport time is to use a filter like this:
git log -E --author='^[^@]*$' --pretty=format:%h
This tells git log to print only the abbreviated hashes (%h) of the commits that have NO '@' sign in the 'Author:' field, which happens when the authors file provided no mapping from the cvs user id to a git author name and email at git cvsimport time.

We will use this command's output later to tell git filter-branch which commits need rewriting. *
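To see what the pattern itself selects, it can be tried with plain grep outside of git. The sketch below assumes author lines of the usual "Name <email>" shape; the sample lines are made up for illustration.

```sh
#!/bin/sh
# Print only the author lines that contain no "@" at all,
# i.e. the ones that never got a real email address.
unmapped_only () {
    grep -E '^[^@]*$'
}

# Made-up sample author lines
printf '%s\n' \
    'john <john>' \
    'Jimmy Doc <jimmy@documenting.com>' \
    'eve <eve>' | unmapped_only
```

Only the 'john' and 'eve' lines are printed, since the 'Jimmy Doc' line contains an '@'.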

But before that...

How do we find if our authors file is complete?

For this task we'll use a slightly modified form of the previous command and some shell script magic.
git log -E --author='^[^@]*$' --pretty=format:%an | sort -u > all-leftout-cvs-authors
And now all-leftout-cvs-authors holds a sorted list of all the cvs ids which were not handled in the original git-cvsimport. In my case there are only 19 such ids:
$ wc -l all-leftout-cvs-authors
19 all-leftout-cvs-authors

Nice, that will be easy to fix. Now edit your all-leftout-cvs-authors file to add the relevant information in a format similar to this (no spaces around the '=', because the filter script further down looks entries up as 'cvsid='):
john=John van Code <john@code.temple.tld>
jimmy=Jimmy O'Document <jimmy@documenting.com>
In case you can't make a complete cvs-user-to-name-and-email map, you might want to use stubs of the following form, so that you can easily identify such commits later (or you could leave them unaltered ;-):
cvsid=cvsid <cvsid@cvs.server.com>
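If there are many ids, the stub lines can be generated instead of typed. A minimal sketch, assuming one cvs id per line on standard input; the helper name make_stubs is invented for this example.

```sh
#!/bin/sh
# Turn each CVS id read on stdin into a stub line: cvsid=cvsid <cvsid@HOST>
# $1 is the mail domain to use in the stub address.
make_stubs () {
    sed "s/.*/&=& <&@$1>/"
}

printf '%s\n' john jimmy | make_stubs cvs.server.com
```

Run against the real list it would look like 'make_stubs cvs.server.com < all-leftout-cvs-authors', with the output then edited by hand wherever the real name and email are known.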

How to actually do the filtering to fix history (using git-filter-branch and a script)

After this is done, we need just one more piece: the command that does the rewriting itself. Note that my final authors file is called new-authors, and that I placed the command in a script so I could run it easily without having to escape all the spaces and similar madness:

# Use authors_file from the environment if set, else the default;
# export it either way so the filter (a child process) can see it
export authors_file="${authors_file:-$HOME/new-authors}"

#git filter-branch -f --remap-cvs --env-filter '
git filter-branch -f --env-filter '

# Print the full name mapped to the cvs id given as $1
get_name () {
    grep "^$1=" "$authors_file" | sed "s/^.*=\(.*\) .*$/\1/"
}

# Print the email address mapped to the cvs id given as $1
get_email () {
    grep "^$1=" "$authors_file" | sed "s/^.* <\(.*\)>$/\1/"
}

# Rewrite only the commits whose committer name is a cvs id listed in
# the authors file; the "=" keeps "jo" from matching the entry for "john"
if grep -q "^$GIT_COMMITTER_NAME=" "$authors_file" ; then
    GIT_AUTHOR_NAME=$(get_name "$GIT_COMMITTER_NAME") &&
    GIT_AUTHOR_EMAIL=$(get_email "$GIT_COMMITTER_NAME") &&
    GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" &&
    GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL" &&
    export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL &&
    export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
fi
' -- --all
You might wonder what's up with the commented-out git filter-branch line with the --remap-cvs option. The stock git-filter-branch script does not know that option, so the script will NOT work with it unless you patch your git-filter-branch script (/usr/lib/git-core/git-filter-branch). With the patch applied, the option updates the cvs-revisions file (the mapping from CVS revisions to commit ids) so that it points at the new commit ids. If you want that function, too, you'll want to apply this patch to git-filter-branch:

diff --git a/git-filter-branch b/git-filter-branch
old mode 100644
new mode 100755
index ae602e3..d1f7ef6
--- a/git-filter-branch
+++ b/git-filter-branch
@@ -149,6 +149,11 @@ do
prune_empty=t
continue
;;
+ --remap-cvs)
+ shift
+ remap_cvs=t
+ continue
+ ;;
-*)
;;
*)
@@ -368,6 +373,33 @@ while read commit parents; do
die "could not write rewritten commit"
done <../revs

+# Rewrite the cvs-revisions file, if requested and the file exists
+
+ORIG_CVS_REVS_FILE="${GIT_DIR}/cvs-revisions"
+if [ -f "$ORIG_CVS_REVS_FILE" ]; then
+ if [ "$remap_cvs" ]; then
+ printf "CVS remapping requested\n"
+
+ CVS_REVS_FILE="$tempdir/cvs-revisions"
+ cp "$ORIG_CVS_REVS_FILE" "$CVS_REVS_FILE"
+ printf "\nFound $ORIG_CVS_REVS_FILE; will copy and alter it as $CVS_REVS_FILE\n"
+ cvs_remap__commit_count=0
+ newcommits="$(ls ../map/ | wc -l)"
+ for commit in ../map/* ; do
+ cvs_remap__commit_count=$(($cvs_remap__commit_count+1))
+ printf "\rRemap CVS commit $commit ($cvs_remap__commit_count/$newcommits)"
+
+ oldsha1="$(basename $commit)"
+ read newsha1 < $commit
+ sed -i "s@$oldsha1\$@$newsha1@" "$CVS_REVS_FILE"
+ done
+ else
+ warn "\nNo CVS remapping requested, but cvs-revisions file found. All CVS mappings will be lost.\n"
+ fi
+elif [ "$remap_cvs" ]; then
+ warn "\nWARNING: CVS remap was ignored, since no original cvs-revisions file was found\n"
+fi
+
# If we are filtering for paths, as in the case of a subdirectory
# filter, it is possible that a specified head is not in the set of
# rewritten commits, because it was pruned by the revision walker.
@@ -491,6 +523,11 @@ if [ "$filter_tag_name" ]; then
done
fi

+if [ "$remap_cvs" -a -f "$CVS_REVS_FILE" ]; then
+ mv "$ORIG_CVS_REVS_FILE" "$ORIG_CVS_REVS_FILE.original"
+ cp "$CVS_REVS_FILE" "$ORIG_CVS_REVS_FILE"
+fi
+
cd ../..
rm -rf "$tempdir"
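What the remapping loop in the patch does can be tried on toy data, without git. This is a sketch under assumptions: the paths and commit ids are made up, the cvs-revisions lines follow the "path revision commit-id" layout, and the map directory mimics the one filter-branch keeps (one file per old commit id, containing the new id).

```sh
#!/bin/sh
# Toy working area; everything in it is made up for the example.
work=$(mktemp -d)
mkdir "$work/map"

# cvs-revisions lines: <file path> <CVS revision> <commit id>
printf '%s\n' 'src/main.c 1.1 aaaa1111' 'src/main.c 1.2 bbbb2222' \
    > "$work/cvs-revisions"

# One file per old commit id, holding the rewritten id
echo 'cccc3333' > "$work/map/aaaa1111"
echo 'dddd4444' > "$work/map/bbbb2222"

for commit in "$work"/map/*; do
    oldsha1=$(basename "$commit")
    read -r newsha1 < "$commit"
    # Replace the old id only where it ends a line (the commit-id column)
    sed "s@$oldsha1\$@$newsha1@" "$work/cvs-revisions" > "$work/cvs-revisions.new"
    mv "$work/cvs-revisions.new" "$work/cvs-revisions"
done

cat "$work/cvs-revisions"
```

The patch edits the copy in place with sed -i; the sketch writes to a temporary file instead, only to avoid depending on that sed extension.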


Then, after running this script (let's call it filter), you should have a brand new git repo with the appropriate authors and their emails set.
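Since the rewrite touches the whole history, it may be worth trying the two lookups on their own first. The sketch below is a stand-alone variant of the helpers, run against a throwaway authors file with made-up entries; it assumes the 'cvsid=Full Name <email>' format without spaces around the '='.

```sh
#!/bin/sh
# Stand-alone variants of the lookup helpers, for testing outside
# git filter-branch. Each takes a cvs id as $1.
get_name () {
    grep "^$1=" "$authors_file" | sed 's/^[^=]*=\(.*\) <.*>$/\1/'
}

get_email () {
    grep "^$1=" "$authors_file" | sed 's/^.* <\(.*\)>$/\1/'
}

# Throwaway sample data (hypothetical entries)
authors_file=$(mktemp)
printf '%s\n' \
    'john=John van Code <john@code.temple.tld>' \
    "jimmy=Jimmy O'Document <jimmy@documenting.com>" > "$authors_file"

get_name john      # prints: John van Code
get_email jimmy    # prints: jimmy@documenting.com
```

An id that is not in the file, or only a prefix of one (such as 'jo'), prints nothing, which is a quick way to spot missing entries before rewriting anything.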


P.S.: I started writing this post some time ago but stopped just before the last part, the one with the filter script. I realise I might be missing something in the explanation, but if you have problems, please comment so I can help you fix them.

P.P.S.: * I realised in the filter script at some point I wanted to do something like:
for R in $(git log -E --author='^[^@]*$' --pretty=format:%H | head -n 2) ; do
[the same git filter branch command above but ending in ...]
' $R
done
But I seem to remember that passing $R didn't work on the whole history, but only on that revision, or some other weirdness of that sort. I know I ended up not filtering those revisions explicitly, but the entire history. I hope this helps.

Thursday, 19 August 2010

OpenLDAP and Active Directory - authentication issues

At some point I had to debug an issue with some code I worked on in the past. It was using OpenLDAP to connect to an Active Directory server to get some information. At some point I got a report that the authentication failed with an ugly error.

Username is stored. Authenticating as domain\user.
Enter password:
ldap_search_ext: Operations error (1)
additional info: 00000000: LdapErr: DSID-0C090627, comment: In order to perform this operation a successful bind must be completed on the connection., data 0, vece

I looked for the meaning of that error and after some staring at different possible explanations, I ended up on a page dealing with some python-ldap code. My code was in C, but it was clear it was the underlying library issuing the error message, so I looked at what it said:
"Normally, this error indicates that you're attempting to bind anonymously, which Active Directory (sensibly) doesn't allow by default. We were supplying credentials to bind, though, and changing the base DN on the search to a sub-OU was all that was necessary to get the search to work. It turns out that python-ldap was binding anonymously, so the error was only sort of a red herring."
This was really strange because the authentication was actually done, as was obvious from the messages and the traffic (analyzed with wireshark). Later in that post there were some hints that parts of the data might be stored on another server, and the suggested fix was to instruct the library not to chase referrals.
ldap.set_option(ldap.OPT_REFERRALS, 0)
I tried to see what was going on with our server using an ldapsearch command, and at the end of the output there were some referrals specified.

[..]
# search reference
# refldap://ForestDnsZones.domeniu.ro/DC=ForestDnsZones,DC=domeniu,DC=ro

# search reference
# refldap://DomainDnsZones.domeniu.ro/DC=DomainDnsZones,DC=domeniu,DC=ro

# search reference
# refldap://domeniu.ro/CN=Configuration,DC=domeniu,DC=ro

# search result

# numResponses: 5
# numEntries: 1
# numReferences: 3
Bingo! So, after looking at the man page, I added this bit of code:

+ /* do not chase referrals */
+ if (ldap_set_option(ld, LDAP_OPT_REFERRALS, LDAP_OPT_OFF) != LDAP_SUCCESS) {
+     ldap_perror(ld, "ldap_set_option");
+     return NULL;
+ }
+

And then it worked. I hope this helps others that might be in the same situation as I was.

Saturday, 11 July 2009

RFC 2822: mom+dad@some.fqdn.is.ok; likewise+mom.is.a.*@is.ok.too

Dear form creators,

Please stop trying to be smart asses and say an address such as mom+dad@some.fqdn.ok is not OK. As a matter of fact IT IS!


If you want to be shocked, find out that even *@some.other.fqdn.ok is ALSO OK!

And if you really want to be correct and validate addresses against some regexp, there are only some really LOOOOOONG ones which should make it clear that your itsy-bitsy regexp which pretends to match valid email addresses, IS WRONG!


Correct regexps/codes that validate email addresses look something like this or like this. So please stop the nonsense.

Reasonable email addresses can and do contain . and + along with many other characters in the local part (i.e. the part before the @).

PLEASE GET THIS THROUGH YOUR THICK SKULLS: THE ONLY RELIABLE WAY TO VALIDATE THE VALIDITY OF AN EMAIL ADDRESS IS TO TRY TO SEND MAIL TO IT.
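To make the point concrete, here is a sketch comparing two checks on the addresses from this post. The "naive" pattern is a made-up example of the itsy-bitsy kind, not taken from any particular form; the "loose" check only looks for something, an '@', and something more, and leaves the real verdict to an actual delivery attempt.

```sh
#!/bin/sh
# A made-up example of the "itsy-bitsy" kind of pattern: it forgets "+", "*"
# and many other characters that are legal in the local part.
naive_ok () {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
}

# A deliberately loose check: something, an "@", something.
loose_ok () {
    case $1 in
        ?*@?*) return 0 ;;
        *)     return 1 ;;
    esac
}

for addr in 'mom+dad@some.fqdn.ok' '*@some.other.fqdn.ok'; do
    naive_ok "$addr" && n=accepted || n=rejected
    loose_ok "$addr" && l=accepted || l=rejected
    echo "$addr: naive=$n loose=$l"
done
```

Both addresses come out as rejected by the naive pattern and accepted by the loose check.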

Sunday, 2 March 2008

pbuilder - /etc/pbuilderrc no longer a conffile (333294)

I have just finished a completely functional patch for pbuilder's bug #333294. I started this during DebConf7, to be more precise, just after Junichi's talk about pbuilder. This patch makes pbuilder smarter about the information it places in the MIRRORSITE variable of /etc/pbuilderrc.

And now that file is no longer a conffile[1], and it asks some questions via debconf. And since debconf is translatable, there is a Romanian translation, too.

I have published my changes in my pbuilder git repo on alioth.

Anyone interested in the changes can pull from that repo:

git://git.debian.org/git/users/eddyp-guest/pbuilder.git
or
http://git.debian.org/git/users/eddyp-guest/pbuilder.git


[1] this was the main reason the patch wasn't good enough before, since modifying a conffile is RC (release-critical).

Wednesday, 14 November 2007

Lesson relearned: when Linux networking weirdness occurs...

My relearned lesson for the day: when Linux networking weirdness occurs in a NAT environment, remember to try MTU clamping.

Thanks to the comments by Justin and Sesse, I was fast-tracked to the core of the problems I have been experiencing since Thursday: MTU issues. What's worse (from my pov) is that I had encountered this issue before with the provider I had in Timișoara but, since that ISP was using PPPoE and my current ISP in Bucharest doesn't, I never really made the connection. I even had a commented-out iptables rule for MTU clamping in my firewall script.

The rule I am talking about looks like this:

iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -o $EXT_IF -j TCPMSS --clamp-mss-to-pmtu

or like the one I have been using (seems more logical to me):

iptables -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu


Note that this is not a fix but a workaround; the real problem is over-zealous admins or weird setups[1] which think that blocking the ICMP "fragmentation needed" messages (or the entire ICMP traffic) is a way to secure networks, which breaks path MTU discovery.
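For reference, --clamp-mss-to-pmtu lowers the MSS announced in SYN packets to the path MTU minus 40 bytes for IPv4 (20 bytes of IP header plus 20 bytes of TCP header, no options). A tiny sketch of that arithmetic; the helper name mss_for_mtu is invented for the example.

```sh
#!/bin/sh
# MSS that fits a given IPv4 path MTU: subtract 20 bytes of IP header
# and 20 bytes of TCP header (no options).
mss_for_mtu () {
    echo $(( $1 - 40 ))
}

mss_for_mtu 1500   # plain Ethernet: prints 1460
mss_for_mtu 1492   # typical PPPoE link: prints 1452
```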


Once again, thanks to everybody who read and/or commented about my issue.

[1] Sesse told me that in his case there was a transparent proxy involved when he experienced MTU weirdness.

Thursday, 28 June 2007

My small contribution to pbuilder...

... is a patch that allows a decent default mirror selection via some heuristic and a debconf question.
This is bug number 333294.

The patch is part of the git repo at http://users.alioth.debian.org/~eddyp-guest/git/pbuilder/.git. Since it uses debconf it has po-debconf support and a Romanian translation. The language should be in line with the one used in other templates, but I don't mind if anyone from Project Smith suggests some changes.