Saturday 1 March 2008

TDebs - arch is not important, is it?

Updates:
  1. The arch issue raised by Frans was not the subject of my post. Also, not all points are invalid/reiterated, just the ones listed here. Thanks Frans.
  2. Added a note that there is an image proving that .mo endianness is irrelevant; useful for readers who might otherwise miss it.



While watching Neil Williams' talk (hi-res/lo-res) from FOSDEM about cross building and tdebs, I was STUNNED by how many points (not all, thanks Frans) were reiterated during that talk even though they had already been answered.

Well, I guess you have to repeat yourself a lot to get your ideas through to the other side. So, let's do that.

All of the following (except the first) have already been answered one way or another on the TDebs wiki page.

Q: Are .mo files arch independent or not? Neil asks again on his blog.
A: It doesn't matter, does it?


0 eddy@bounty ~ $ file /usr/share/locale/ro/LC_MESSAGES/wormux.mo
/usr/share/locale/ro/LC_MESSAGES/wormux.mo: GNU message catalog (big endian), revision 0, 228 messages
0 eddy@bounty ~ $ uname -a
Linux bounty 2.6.24.2-bounty #1 SMP Wed Feb 13 02:34:09 EET 2008 x86_64 GNU/Linux
(there is a picture here which proves my point, if you can't see it, click here).
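
The `file` output above gets the byte order from the catalog's magic number. As a minimal sketch (not taken from gettext itself), assuming the documented GNU .mo layout in which the first 32-bit word is the magic number 0x950412de and six more 32-bit header words follow, the same check can be done in Python. The helper names `mo_byte_order` and `mo_header` are hypothetical and exist only for illustration.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number of the GNU .mo file format


def mo_byte_order(data: bytes) -> str:
    """Return 'little' or 'big' depending on how the magic number is stored."""
    if len(data) < 4:
        raise ValueError("too short to be a .mo file")
    if struct.unpack("<I", data[:4])[0] == MO_MAGIC:
        return "little"
    if struct.unpack(">I", data[:4])[0] == MO_MAGIC:
        return "big"
    raise ValueError("not a GNU message catalog")


def mo_header(data: bytes) -> dict:
    """Decode the seven 32-bit header words using the file's own byte order."""
    prefix = "<" if mo_byte_order(data) == "little" else ">"
    names = ("magic", "revision", "nstrings", "orig_table",
             "trans_table", "hash_size", "hash_offset")
    return dict(zip(names, struct.unpack(prefix + "7I", data[:28])))
```

Fed the bytes of a catalog like the wormux.mo above, `mo_byte_order` would be expected to answer "big" no matter what the byte order of the machine running it is, which is the point: the reader can always tell how the file was written.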



Q: The packages file will explode
A: No, that's addressed.

Q: How do you select languages?
A: Simply use what we have and expand on that.

Q: Do tdebs support anything other than gettext?
A: Yes.

7 comments:

Anonymous said...

.mo files are arch dependent. This is not an easy problem to solve. I was one of the guys talking about this at FOSDEM:
* on slow hardware, using a big-endian file on a little-endian computer could be a performance issue
* the binary file format aligns the beginning of each string at a specific offset in the file.

You must keep in mind that most of the time the .mo will be mmapped and its structure used directly. If you build on an arch that doesn't have the same endianness/memory alignment, the result is not guaranteed to work (and surely not at full speed).

eddyp said...

> * on slow hardware, using a big-endian
> file on a little-endian computer could
> be a performance issue

That doesn't justify arch any. Rather, it asks for some kind of byte swap in the tdeb postinst.

... Or for the arch all tdeb to be built on the slowest arch (arm*).

> * the binary file format aligns the
> beginning of each string at a specific
> offset in the file.

That could be a problem, but is that really what happens? I mean, really, does the gettext code work that way?

> most of the time the .mo will be
> mmapped and its structure used directly

Sorry, but that sounds really as if gettext decides from time to time to change the way it works...

Anonymous said...

Eddy, you missed the point that these TDebs are being created within Emdebian because Debian does not have support yet and Emdebian needs TDebs now. Therefore the Emdebian TDeb support is customised to cross-building support and embedded usage. This means that the .mo file will be commonly built on the wrong architecture, it will have the wrong endianness and it will be installed on a low resource embedded device that simply cannot waste time converting every locale file either at runtime or during the postinst. The .mo file needs to be the correct structure in the archive. All it needs is for TDebs to be linked into the normal autobuilder network.
Architecture of TDebs is important if Debian ever wants to be The Universal OS.
The talk did cover that em_installtdeb is for the Emdebian version of TDebs that has already been implemented and that these are expected to be compatible with Debian TDebs but will not contain translated manpages etc. so they will be built for Emdebian from Debian sources.

Anonymous said...

(Your blog does not appear to be translated and I cannot understand the errors so I have to post anonymously but it's Neil, honest.)

Anonymous said...

I have spent some hours reading code of gettext to write my own gettext library (in OCaml).

It can work without problems on all arches even if built on a different arch. You can also fall back to the kind of method I was using (read and compile the .mo file into memory, rather than using the embedded hash table of the .mo file).

The only problem is that gettext is really tuned to work better when everything has been built specifically for a single arch.

Using the other way is just like compiling .po -> .mo every time you load a .mo file. This is a huge waste of time for a small device.
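
To make the "read and compile the .mo file into memory" fallback concrete, here is a rough Python sketch. It is not the OCaml library mentioned above and not gettext's own code; it assumes the documented revision-0 .mo layout (magic 0x950412de, then revision, string count, and the offsets of the original and translation tables, each table entry being a length/offset pair). `load_mo` is a hypothetical name. The embedded hash table is ignored and an ordinary dictionary is built instead, so the cost of decoding every table entry is paid on each load.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number of the GNU .mo file format


def load_mo(data: bytes) -> dict:
    """Read a .mo image of either byte order into a msgid -> msgstr dict."""
    # Pick whichever byte order makes the first word equal the magic number.
    for order in ("<", ">"):
        if struct.unpack(order + "I", data[:4])[0] == MO_MAGIC:
            break
    else:
        raise ValueError("not a GNU message catalog")

    _revision, count, orig_tab, trans_tab = struct.unpack(order + "4I", data[4:20])

    catalog = {}
    for i in range(count):
        # Each table entry is (length, offset) of a NUL-terminated string.
        olen, ooff = struct.unpack_from(order + "2I", data, orig_tab + 8 * i)
        tlen, toff = struct.unpack_from(order + "2I", data, trans_tab + 8 * i)
        catalog[data[ooff:ooff + olen]] = data[toff:toff + tlen]
    return catalog
```

The result is the same whichever arch wrote the file; what is lost is the ability to use the mmapped tables directly.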

eddyp said...

> It can work without problems on all
> arches even if built on a different
> arch.

OK, that's less grave, so we're one step farther down the road. :-)

> The only problem is that gettext is
> really tuned to work better when
> everything has been built specifically
> for a single arch.

OK, then building from the start for the slowest arch is a good way to improve performance.

> Using the other way is just like
> compiling .po -> .mo every time you
> load a .mo file. This is a huge waste
> of time for a small device.

Yet again, building for the slowest arch or doing the rearrangements in postinst are both viable solutions.

And doing that in postinst is not an issue, since you only do that once for each package, and in most cases that is expected to be a more time consuming task, so rearranging according to current arch is a minuscule price to pay to optimize for the most common case - loading the mo file.
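
As a rough illustration of what such a one-time rearrangement could look like, here is a Python sketch of a byte swap over a .mo image. It is an assumption-laden sketch, not an existing tdeb or gettext tool: `swap_mo` is a hypothetical name, and the code only handles the documented revision-0 layout (seven 32-bit header words, the original and translation tables of length/offset pairs, and a hash table of 32-bit entries). The strings themselves are plain bytes and are left untouched.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number of the GNU .mo file format


def swap_mo(data: bytes) -> bytes:
    """Return a copy of a revision-0 .mo image with the opposite byte order."""
    src = next((o for o in ("<", ">")
                if struct.unpack(o + "I", data[:4])[0] == MO_MAGIC), None)
    if src is None:
        raise ValueError("not a GNU message catalog")
    dst = ">" if src == "<" else "<"

    (_magic, revision, count, orig_tab, trans_tab,
     hash_size, hash_off) = struct.unpack(src + "7I", data[:28])
    if revision != 0:
        raise ValueError("only revision 0 catalogs are handled in this sketch")

    out = bytearray(data)

    def swap_words(offset: int, nwords: int) -> None:
        # Re-encode nwords 32-bit integers in place with the other byte order.
        if nwords:
            words = struct.unpack_from(f"{src}{nwords}I", data, offset)
            struct.pack_into(f"{dst}{nwords}I", out, offset, *words)

    swap_words(0, 7)                   # header
    swap_words(orig_tab, 2 * count)    # (length, offset) pairs of the msgids
    swap_words(trans_tab, 2 * count)   # (length, offset) pairs of the msgstrs
    swap_words(hash_off, hash_size)    # hash table entries
    return bytes(out)
```

A postinst along these lines would compare the file's byte order with the host's and call something like `swap_mo` only when they differ, so the work happens once per installed catalog rather than on every load.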

Anonymous said...

Building for the slowest arch is not important if you reorganize everything at the end.

The real good idea (TM) is to ship .po and build .mo in the postinst.

Shipping an already built .mo and rearranging it on the fly is as much work as recompiling the .po...

And indeed, this way you get an "Arch: all" package, as long as the upstream .po is "Arch: all".

This is not what is actually done in tdeb, as far as I have understood it.