Thursday 6 November 2008

Coordinating localization via git

Note: this is quite long and might not be interesting to people not involved in any of: Debian i18n, Debian l10n, git, shell scripting.


For some time now I have been the de facto coordinator of the Romanian localization team. During this time I have repeatedly faced problems related to the motivation of team members, setting goals for a release, coordinating changes inside the team, integrating new translators, losing team members, my own occasional lack of time, and others.

I thought a lot about how to improve on these points and was never satisfied with the answers I came up with. In my opinion, the team's worst problem was the inability to set clear goals, especially during lenny's development cycle.

During sarge's and etch's development cycles the team was in its infancy, and the goal of a fully translated installation process was enough to keep people motivated. For sarge we missed that goal by a small margin, and the translations were of poor quality; for etch we worked more on improving them.


But during lenny it was bad, really bad. My free time had been shrinking since etch's release, and we were unable to set a clear goal: the obvious one was gone, since for etch we had already managed to fully translate the installation process, along with a few other translations. There was no way for us to reach 100% translation during lenny's development cycle, so setting that as a goal was unrealistic. Percentages by themselves don't mean anything to people, and as long as there's no substance behind those numbers, there's no motivation to reach some arbitrary percentage.

I tried to set as a goal translating the packages installed by default in a new installation, but that hit the eternal question "How do we easily find out which packages those are?". It either remained unanswered or got an unsatisfying answer. There was also a goal to have correct Romanian diacritics in lenny, including an aspell-ro that uses the correct diacritics.

I even got to a point where I myself lost motivation, so I set a personal goal of overtaking the translation level of the language ranked just above Romanian in the po-debconf l10n statistics (the ranking between languages). This was a nice way to keep myself motivated, but I had reservations about making it public, out of fear of being misinterpreted: by a strange coincidence, that language was Hungarian. Romania and Hungary have had some friction in their history, and there are still some tensions with the Hungarian minority in Romania, in the areas where they form the local majority.

I managed to reach my personal goal, but this wasn't addressing the big picture.



So, sometime around the start of this year I started thinking about ways to coordinate the Romanian localization team in order to have:
  • a clear goal at any given time
  • a way to always be able to change that goal as we go
  • a way to sync with any calls for translation, or with the current translations in sid
  • stats immediately available
  • automated checks for spelling, correct diacritics usage and other checks that might be useful (e.g. translation completeness)
  • an easy way to assign somebody else as a language coordinator (I would appreciate some help or I might even consider stepping down)
  • easy integration of new translators (by providing an immediate answer to the question "What can I do to help with the translations?")
In short, a tool that would allow the team to work more efficiently while making it possible to set clear goals that keep people motivated.
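
One of the automated checks above, correct diacritics usage, is easy to sketch. Correct Romanian uses ș and ț with a comma below (U+0219, U+021B), while older fonts and keyboard layouts often produced the cedilla forms ş and ţ (U+015F, U+0163). The function below is a minimal sketch, not code from the repository; the name `check_ro_diacritics` is hypothetical and it assumes UTF-8 encoded PO files:

```shell
#!/bin/sh
# Hypothetical check (not from the repository): report lines in the given
# PO files that still use the cedilla letters ş Ş ţ Ţ instead of the
# comma-below letters ș Ș ț Ț.
# Assumption: the files are UTF-8 encoded. grep -F matches the byte
# sequences literally, so the check does not depend on the current locale.
check_ro_diacritics() {
    grep -n -F -e 'ş' -e 'Ş' -e 'ţ' -e 'Ţ' -- "$@"
    case $? in
        0)  # at least one match: print a hint and fail
            echo "error: cedilla diacritics found, use ș/ț instead" >&2
            return 1 ;;
        1)  # no match: the files are clean
            return 0 ;;
        *)  # grep itself failed, e.g. a file could not be read
            return 2 ;;
    esac
}
```

Run from the top of a clone, `check_ro_diacritics */ro.po` would list each offending line and return non-zero, which also makes it usable from a commit hook.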



After some pondering, I decided that the best way to achieve this was a repository holding the translations together with helper tools that do the funky sync, checks and stats. I started hacking on it around July-August and published the result, but without much publicity, since it is still incomplete.

Some of the technical details are still in a haze, but I have a general idea and some basic functionality is in place.


Today I decided to announce this semi-officially through my blog; maybe I'll get some input, ideas, or even contributions (I really should write a TODO).




I give you the Debian L10n Romanian coordination repository.

This is a git repository that holds the team's translations currently in the distribution, together with some tools that facilitate translation coordination.


It can be cloned with:

git clone git://git.debian.org/git/users/eddyp-guest/debian-ro-repo.git

or, if you're behind a restrictive firewall:

git clone http://git.debian.org/git/users/eddyp-guest/debian-ro-repo.git



Currently the workflow for updating a translation is as follows:
  1. source _bin/polibs (. _bin/polibs)
  2. cd foo
  3. po_refresh
  4. complete the translation
  5. po_rearrange "ro.po"
  6. git add "ro.po" && git commit -m "updated translation for foo"
  7. send the translation ("git format-patch origin" and send the patches by mail, or, alternatively, just "git push")
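
Steps 1 to 6 can be bundled into one shell function. This is only a sketch under assumptions: the name `update_translation` is hypothetical, it expects to be run from the top of a clone, it relies on `_bin/polibs` defining `po_refresh` and `po_rearrange` as used in the steps above, and it opens `ro.po` in `$EDITOR` (falling back to `vi`) for step 4. Step 7 is deliberately left out, since mailing patches versus pushing depends on the translator's access:

```shell
#!/bin/sh
# Hypothetical wrapper for steps 1-6 of the workflow (not from the repository).
# Assumptions: run from the top of a debian-ro-repo clone, and
# _bin/polibs defines po_refresh and po_rearrange.
# The body is a subshell ( ... ), so "cd" and the sourced helpers do not
# leak into the caller's shell.
update_translation() (
    pkg=${1-}
    if [ -z "$pkg" ]; then
        echo "usage: update_translation <source-package>" >&2
        exit 2
    fi
    . ./_bin/polibs || exit 1          # 1. load the helper functions
    cd "$pkg" || exit 1                # 2. enter the package directory
    po_refresh || exit 1               # 3. fetch the template, merge into ro.po
    ${EDITOR:-vi} ro.po || exit 1      # 4. complete the translation
    po_rearrange "ro.po" || exit 1     # 5. normalise the PO layout
    git add "ro.po" &&                 # 6. record the change locally
        git commit -m "updated translation for $pkg"
)
```

For example, `update_translation foo` would run the whole sequence for the package foo; the package directory has to exist already (for a new translation, create it first).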


Features:
  • provides a po_refresh function that can import material directly from http://i18n.debian.net/material, but also allows manual imports (template.pot from a call for translation)
    • for a new translation: source _bin/polibs (. _bin/polibs), make a directory with the name of the source package, cd into it, and run po_refresh
  • po_rearrange - beautify and unify the layout of PO files (facilitating compact and sane diff-ing for PO files)
  • po_merge uses a compendium, if one is present
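
The compendium behaviour of po_merge maps naturally onto gettext's `msgmerge`, whose `--compendium` option adds an extra library of translations used for messages that the PO file does not already translate. The following is a hypothetical sketch, not the repository's `po_merge`; the name `po_merge_sketch` and the optional third argument for the compendium path are assumptions:

```shell
#!/bin/sh
# Hypothetical sketch of a compendium-aware merge (not the repository's
# po_merge). Usage: po_merge_sketch <ro.po> <template.pot> [compendium.po]
# Uses gettext's msgmerge; the compendium is only passed if the file exists.
po_merge_sketch() {
    po=$1
    pot=$2
    compendium=${3-}
    if [ -n "$compendium" ] && [ -f "$compendium" ]; then
        # the compendium supplies translations for messages missing in $po
        msgmerge --compendium="$compendium" -o "$po.new" "$po" "$pot"
    else
        msgmerge -o "$po.new" "$po" "$pot"
    fi || { rm -f "$po.new"; return 1; }
    # only replace the original once the merge succeeded
    mv "$po.new" "$po"
}
```

Writing to a temporary file and renaming it only on success keeps `ro.po` intact if `msgmerge` fails.
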
Planned features:
  • sync translations/templates from package VCS-es (Vcs-* headers and debcheckout should be the means to that end)
  • po_rearrange should be called from a pre-commit hook, which should either reject the commit if the PO file was not rearranged, or automatically rearrange it before the commit
  • generate stats
  • add commands for "what's outdated", "what needs review", "submit translation", and maybe "reserve translation for offline use"
  • merges of conflicting PO changes should be done via po_merge (.gitattributes is key here)
  • support other file types (?) - does this make any sense?
  • periodic and automated sync with sid for all translations
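
For the stats item, `msgfmt --statistics -o /dev/null ro.po` from gettext already prints a human-readable summary; for per-package numbers that are easy to sort or compare, a small awk pass is enough. This is a simplified sketch, not full PO parsing: the name `po_stats` is hypothetical, the header entry and obsolete `#~` entries are skipped, an entry counts as fuzzy when its `#,` flags line mentions fuzzy, and as translated when any `msgstr` or `msgstr[n]` has text:

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): print
# "<file> <translated> <fuzzy> <untranslated>" for each PO file given.
# Simplifications (assumptions, not full PO parsing):
#   - the header entry (empty msgid) is not counted
#   - obsolete "#~" entries are ignored
#   - an entry is fuzzy if a "#," flags line mentions "fuzzy"
#   - an entry is translated if any msgstr / msgstr[n] has non-empty text
po_stats() {
    awk '
    function reset() { fuzzy = 0; seen_id = 0; seen_str = 0; has_str = 0; id = ""; sect = "" }
    function text(line) {            # contents between the outer quotes
        sub(/^[^"]*"/, "", line); sub(/"[ \t]*$/, "", line); return line
    }
    function flush() {
        if (seen_id && id != "") {   # skip the header and comment-only blocks
            if (fuzzy) nf++
            else if (has_str) nt++
            else nu++
        }
        reset()
    }
    function report() { flush(); print file, nt + 0, nf + 0, nu + 0; nt = nf = nu = 0 }
    FNR == 1 { if (NR > 1) report(); file = FILENAME; reset() }
    /^[ \t]*$/ { flush(); next }
    /^#/ {
        if (seen_str) flush()
        if ($0 ~ /^#,.*fuzzy/) fuzzy = 1
        next
    }
    /^msgctxt[ \t]/ { if (seen_str) flush(); sect = "ctxt"; next }
    /^msgid[ \t]/ { if (seen_str) flush(); seen_id = 1; sect = "id"; id = text($0); next }
    /^msgid_plural[ \t]/ { sect = "plural"; next }
    /^msgstr(\[[0-9]+\])?[ \t]/ {
        seen_str = 1; sect = "str"
        if (text($0) != "") has_str = 1
        next
    }
    /^[ \t]*"/ {                     # continuation of the previous keyword
        if (sect == "id") id = id text($0)
        else if (sect == "str" && text($0) != "") has_str = 1
    }
    END { if (NR > 0) report() }
    ' "$@"
}
```

For example, `po_stats */ro.po | sort -k4,4nr` would list the packages with the most untranslated messages first, a rough starting point for "what's outdated".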

Problems:
  • security - running tests automatically from files within the repo doesn't seem too wise, but it looks like the only way to get automated testing on every translator's machine; maybe keeping the code in a submodule would address this issue?
  • entry-level translators still have a hard time - the UI sucks right now; there should be a wrapper command that uses the library functions and provides useful help
  • is git too difficult? - maybe the git backend should be hidden?
  • still in development/alpha stage - I still haven't figured out some of the issues
  • central repo or truly distributed - should there be a central git repo to which the coordinator(s) push? It seems that a central repo with a small team of pushers for new translators (who can't commit directly) might actually encourage interaction between experienced and new translators and help bring the rookies up to speed

I was hoping that the release notes for lenny would benefit from this infrastructure, but unfortunately I have lately been in a really inactive period with respect to Debian.



Questions, suggestions, ideas are welcome.
