Rambling around foo: nslu2

Showing posts with label nslu2. Show all posts

Wednesday, 20 May 2015

Linksys NSLU2 adventures into the NetBSD land passed through JTAG highlands - part 2 - RedBoot reverse engineering and APEX hacking

(continuation of Linksys NSLU2 adventures into the NetBSD land passed through JTAG highlands - part 1; meanwhile, my article was mentioned briefly in BSDNow Episode 89 - Exclusive Disjunction around minute 36:25)

Choosing to call RedBoot from a hacked Apex

As I was saying in my previous post, in order to be able to automate the booting of the NetBSD image via TFTP, I opted for using a 2nd stage bootloader (planning to flash it in the NSLU2 instead of a Linux kernel), and since Debian was already using Apex, I chose Apex, too.

The first problem I found was that the networking support in Apex was relying on an old version of the Intel NPE library which I couldn't find on Intel's site. The new version was incompatible/not building with the old build wrapper in Apex, so I was faced with 3 options:

Fight with the availabel Intel code and try to force it to compile in Apex
Incorporate the NPE driver from NetBSD into a rump kernel to be included in Apex instead of the original Intel code, since the NetBSD driver only needed an easily compilable binary blob
Hack together an Apex version that simulates the typing necessary RedBoot commands to load via TFTP the netbsd image and execute it.

After taking a look at the NPE driver buildsystem, I concluded there were very few options less attractive that option 1, among which was hammering nails through my forehead as a improvement measure against the severe brain damage which I would probably be likely to be inflicted with after dealing with the NPE "build system".

Option 2 looked like the best option I could have, given the situation, but my NetBSD foo was too close to 0 to even dream to endeavor on such a task. In my opinion, this still remains the technically superior solution to the problem since is very portable and a flexible way to ensure networking works in spite of the proprietary NPE code.

But, in practice, the best option I could implement at the time was option 3. I initially planned to pre-fill from Apex my desired commands into the RedBoot buffer that stored the keyboard strokes typed by the user:

load -r -b 0x200000 -h 192.168.0.2 netbsd-nfs.bin
g

Since this was the first time ever for me I was going to do less than trivial reverse engineering in order to find the addresses and signatures of interesting functions in the RedBoot code, it wasn't bad at all that I had a version of the RedBoot source code.

When stuck with reverse engineering, apply JTAG

The bad thing was that the code Linksys published as the source of the RedBoot running inside the NSLU2 was, in fact, a different code which had some significant changes around the code pieces I was mostly interested in. That in spite of the GPL terms.

But I thought that I could manage. After all, how hard could it be to identify the 2-3 functions I was interested in and 1 buffer? Even if I only had the disassembled code from the slug, it shouldn't be that hard.

I struggled with this for about 2-3 weeks on the few occasions I had during that time, but the excitement of leaning something new kept me going. Until I got stuck somewhere between the misalignment between the published RedBoot code and the disassembled code, the state of the system at the time of dumping the contents from RAM (for purposes of disassemby), the assembly code generated by GCC for some specific C code I didn't have at all, and the particularities of ARM assembly.

What was most likely to unblock me was to actually see the code in action, so I decided attaching a JTAG dongle to the slug and do a session of in-circuit-debugging was in order.

Luckily, the pinout of the JTAG interface was already identified in the NSLU2 Linux project, so I only had to solder some wires to the specified places and a 2x20 header to be able to connect through JTAG to the board.

JTAG connections on Kinder (the NSLU2 targeting NetBSD)

After this was done I tried immediately to see if when using a JTAG debugger I could break the execution of the code on the system. The answer was sadly, no.

The chip was identified, but breaking the execution was not happening. I tried this in OpenOCD and in another proprietary debugger application I had access to, and the result was the same, breaking was not happening.

$ openocd -f interface/ftdi/olimex-arm-usb-ocd.cfg -f board/linksys_nslu2.cfg

Open On-Chip Debugger 0.8.0 (2015-04-14-09:12)

Licensed under GNU GPL v2

For bug reports, read

http://openocd.sourceforge.net/doc/doxygen/bugs.html

Info : only one transport option; autoselect 'jtag'

adapter speed: 300 kHz

Info : ixp42x.cpu: hardware has 2 breakpoints and 2 watchpoints

0

Info : clock speed 300 kHz

Info : JTAG tap: ixp42x.cpu tap/device found: 0x29277013 (mfg: 0x009,

part: 0x9277, ver: 0x2)

[..]

$ telnet localhost 4444

Trying ::1...

Trying 127.0.0.1...

Connected to localhost.

Escape character is '^]'.

Open On-Chip Debugger

> halt

target was in unknown state when halt was requested

in procedure 'halt'

> poll

background polling: on

TAP: ixp42x.cpu (enabled)

target state: unknown

Looking into the documentation I found a bit of information on the XScale processors[X] which suggested that XScale processors might necessarily need the (otherwise optional) SRST signal on the JTAG interface to be able to single step the chip.

This confused me a lot since I was sure other people had already used JTAG on the NSLU2.

The options I saw at the time were:

my NSLU2 did have a fully working JTAG interface (either due to the missing SRST signal on the interface or maybe due to a JTAG lock on later generation NSLU-s, as was my second slug)
nobody ever single stepped the slug using OpenOCD or other JTAG debugger, they only reflashed, and I was on totally new ground

I even contacted Rod Whitby, the project leader of the NSLU2 project to try to confirm single stepping was done before. Rod told me he never did that and he only reflashed the device.

This confused me even further because, from what I encountered on other platforms, in order to flash some device, the code responsible for programming the flash is loaded in the RAM of the target microcontroller and that code is executed on the target after a RAM buffer with the to be flashed data is preloaded via JTAG, then the operation is repeated for all flash blocks to be reprogrammed.

I was aware it was possible to program a flash chip situated on the board, outside the chip, by only playing with the chip's pads, strictly via JTAG, but I was still hoping single stepping the execution of the code in RedBoot was possible.

Guided by that hope and the possibility the newer versions of the device to be locked, I decided to add a JTAG interface to my older NSLU2, too. But this time I decided I would also add the TRST and SRST signals to the JTAG interface, just in case single stepping would work.

This mod involved even more extensive changes than the ones done on the other NSLU, but I was so frustrated by the fact I was stuck that I didn't mind poking a few holes through the case and the prospect of a connector always sticking out from the other NSLU2, which was doing some small, yet useful work in my home LAN.

It turns out NOBODY single stepped the NSLU2

After biting the bullet and soldering JTAG interface with also the TRST and the SRST signals connected as the pinout page from the NSLU2 Linux wiki suggested, I was disappointed to observe that I was not able to single step the older NSLU2 either, in spite of the presence of the extra signals.

I even tinkered with the reset configurations of OpenOCD, but had not success. After obtaining the same result on the proprietary debugger, digging through a presentation made by Rod back in the hay day of the project and the conversations on the NSLU2 Linux Yahoo mailing list, I finally concluded:

Actually nobody single stepped the NSLU2, no matter the version of the NSLU2 or connections available on the JTAG interface!

So I was back to square 1, I had to either struggle with disassembly, reevaluate my initial options, find another option or even drop entirely the idea. At that point I was already committed to the project, so dropping entirely the idea didn't seem like the reasonable thing to do.

Since I was feeling I was really close to finish on the route I had chosen a while ago, I was not any significantly more knowledgeable in the NetBSD code, and looking at the NPE code made me feel like washing my hands, the only option which seemed reasonable was to go on.

Digging a lot more through the internet, I was finally able to find another version of the RedBoot source which was modified for Intel ixp42x systems. A few checks here and there revealed this newly found code was actually almost identical to the code I had disassembled from the slug I was aiming to run NetBSD on. This was a huge step forward.

Long story short, a couple of days later I had a hacked Apex that could go through the RedBoot data structures, search for available commands in RedBoot and successfully call any of the built-in RedBoot commands!

Testing with loading this modified Apex by hand in RAM via TFTP then jumping into it to see if things woked as expected revealed a few small issues which I corrected right away.

Flashing a modified RedBoot?! But why? Wasn't Apex supposed to avoid exactly that risky operation?

Since the tests when executing from RAM were successful, my custom second stage Apex bootloader for NetBSD net booting was ready to be flashed into the NSLU2.

I added two more targets in the Makefile in the code on the dedicated netbsd branch of my Apex repository to generate the images ready for flashing into the NSLU2 flash (RedBoot needs to find a Sercomm header in flash, otherwise it will crash) and the exact commands to be executed in RedBoot are also print out after generation. This way, if the command is copy-pasted, there is no risk the NSLU2 is bricked by mistake.

After some flashing and reflashing of the apex_nslu2.flash image into the NSLU2 flash, some manual testing, tweaking and modifying the default built in APEX commands, checking that the sequence of commands 'move', 'go 0x01d00000' would jump into Apex, which, in turn, would call RedBoot to transfer the netbsd-nfs.bin image from a TFTP to RAM and then execute it successfully, it was high time to check NetBSD would boot automatically after the NSLU was powered on.

It didn't. Contrary to my previous tests, no call made from Apex to the RedBoot code would return back to Apex, not even the execution of a basic command such as the 'version' command.

It turns out the default commands hardcoded into RedBoot were 'boot; exec 0x01d00000', but I had tested 'boot; go 0x01d0000', which is not the same thing.

While 'go' does a plain jump at the specified address, the 'exec' command also does some preparations so it allows a jump into the Linux kernel and those preparations break some environment the RedBoot commands expect. I don't know which those are and didn't had the mood or motivation to find out.

So the easiest solution was to change the RedBoot's built-in command and turn that 'exec' into a 'go'. But that meant this time I was actually risking to brick the NSLU, unless I
was able to reflash via JTAG the NSLU2.

(to be continued - next, changing RedBoot and bisecting through the NetBSD history)

[X] Linksys NSLU2 has an XScale IXP420 processor which is compatible at ASM level with the ARMv5TEJ instruction set

Friday, 8 May 2015

Linksys NSLU2 adventures into the NetBSD land passed through JTAG highlands - part 1

About 2 months ago I set a goal to run some kind of BSD on the spare Linksys NSLU2 I had. This was driven mostly by curiosity, after listening to a few BSDNow episodes and becoming a regular listener, but it was a really interesting experience (it was also somewhat frustrating, mostly due to lacking documentation or proprietary code).

Looking for documentation on how to install any BSD flavour on the Linksys NSLU2, I have found what appears to be some too-incomplete-to-be-useful-for-a-BSD-newbie information about installing FreeBSD, no information about OpenBSD and some very detailed information about NetBSD on the Linksys NSLU2.

I was very impressed by the NetBSD build.sh script which can be used to cross-compile the entire NetBSD system - to do that, it also builds the appropriate toolchain - NetBSD kernel and the base system, even when ran on a Linux host. Having some experience with cross compilation for GNU/Linux embedded systems I can honestly say this is immensely impressive, well done NetBSD!

Gone were a few failed attempts to properly follow the instruction and lots of hours of (re)building, but then I had the kernel and the sets (the NetBSD system is split into several parts which are grouped by functionality, these are the sets), so I was in the position to have to set things up to be able to net boot - kernel loading via TFTP and rootfs on NFS.

But it wouldn't be challenging if the instructions were followed to the letter, so the first thing I wanted to change was that I didn't want to run dhcpd just to pass the DHCP boot configuration to the NSLU2, that seemed like a waste of resources since I already had dnsmasq running.

After some effort and struggling with missing documentation, I managed to use dnsmasq to pass DHCP boot parameters to the slug, but also use it as TFTP server - after some time I documented this for future reference on my blog and expect to refer to it in the future.

Setting up NFS wasn't a problem, but, when trying to boot, I found that I managed to misread at least 3 or 4 times some of the NSLU2 related information on the NetBSD wiki. To be able to debug what was happening I concluded the slug should have a serial console attached to it, which helped a lot.

Still the result was that I wasn't able to boot the trunk version of the NetBSD code on my NSLU2.

Long story short, with the help of some people from the #netbsd IRC channel on Freenode and from the port-arm NetBSD mailing list I found out that I might have a better chance with specific older versions. In practice what really worked was the code from the netbsd_6_1 branch.

Discussions on the port-arm mailing list, some digging into the (recently found) PR (problem reports), and a successful execution of the trunk kernel (at the time, version 7.99.4) together with 6.1.5 userspace lead me to the conclusion the NetBSD userspace for armbe was broken in the trunk branch.

And since I concluded this would be a good occasion to learn a few details about NetBSD, I set out to git bisect through the trunk history to identify when this happened. But that meant being able to easily load kernels and run them from TFTP, which was not how the RedBoot bootloader flashed into the slug behaves by default.

Be default, the RedBoot bootloader flashed into the NSLU2 waits for 2 seconds for a manual interaction (it waits for a ^C) on the serial console or on the telnet RedBoot prompt, then, if no such event happens, it copies the Linux image it has in flash starting with adress 0x50060000 into RAM at address 0x01d00000 (after stripping the Sercomm header) and then executes the copied code from RAM.

Of course, this is not a very handy way to try to boot things from TFTP, so my first idea to overcome this limitation was to use a second stage bootloader which would do the loading via TFTP of the NetBSD kernel, then execute it from RAM. Flashing this second stage bootloader instead of the Linux kernel at 0x50060000 would make sure that no manual intervention except power on would be necessary when a new kernel+userspace pair is ready to be tested.

Another advantage was that I would not risk bricking the NSLU2 since I would not be changing RedBoot, the original bootloader.

I knew Apex was used as the second stage bootloader in Debian, so I started configuring my own version of the APEX bootloader to make it work for the netbsd-nfs.bin image to be loaded via TFTP.

My first disappointment was that Apex was did not support receiving the boot parameters via DHCP, but only via RARP (it was clear it was less tested with BOOTP or DHCP) and TFTP was documented in the code as being problematic. That meant that I would have to hard code the boot configuration or configure RARP, but that wasn't too bad.

Later I found out that I wasted time on that avenue because the network driver in Apex was some Intel code (NPE Access Library) which can't be freely distributed, but could have been downloaded from Intel's site back in 2008-2009. The bad news was that current versions did not work at all with the old patch work that was done in Apex to allow for the driver made for Linux to compile in a world of its own so it could be incorporated in Apex.

I was stuck and the only options I were:

Fight with the available Intel code and make it compile in Apex
Incorporate the NPE driver from NetBSD into a rump kernel which will be included in Apex, since I knew the NetBSD driver only needed a very easily obtainable binary blob, instead of the entire driver as was in Apex before
Hack together an Apex version that simulates the typing of the necessary commands to load the netbsd-nfs.bin image inside RedBoot, or in other words, call from Apex the RedBoot functions necessary to load from TFTP and execute NetBSD.

Option 1 did not look that appealing after looking into the horrible Intel build system and its endless dependencies into a specific Linux kernel version.

Option 2 was more appealing, but since I didn't knew NetBSD and only tried once to build and run a NetBSD rump kernel, it seemed like a doable project only for an experienced NetBSD developer or at least an experienced NetBSD user, which I was not.

So I was left with option 3, which meant I had to do some reverse engineering of the code, because, although RedBoot is GPL, Linksys did not publish the source from which the running RedBoot was built from.

(continues here)

Thursday, 30 April 2015

Linksys NSLU2 JTAG help requested

Some time ago I have embarked on a jurney to install NetBSD on one of my two NSLU2-s. I have ran into all sorts of hurdles and problems which I finally managed to overcome, except one:

The NSLU I am using has a standard 20 pin ARM JTAG connector attached to it (as per this page http://www.nslu2-linux.org/wiki/Info/PinoutOfJTAGPort, only TDI, TDO, TMS, TCK, Vref and GND signals), but, although the chip is identified, I am unable to halt the CPU:

$ openocd -f interface/ftdi/olimex-arm-usb-ocd.cfg -f board/linksys_nslu2.cfg

Open On-Chip Debugger 0.8.0 (2015-04-14-09:12)

Licensed under GNU GPL v2

For bug reports, read

http://openocd.sourceforge.net/doc/doxygen/bugs.html

Info : only one transport option; autoselect 'jtag'

adapter speed: 300 kHz

Info : ixp42x.cpu: hardware has 2 breakpoints and 2 watchpoints

Info : clock speed 300 kHz

Info : JTAG tap: ixp42x.cpu tap/device found: 0x29277013 (mfg: 0x009,

part: 0x9277, ver: 0x2)

[..]

$ telnet localhost 4444

Trying ::1...

Trying 127.0.0.1...

Connected to localhost.

Escape character is '^]'.

Open On-Chip Debugger

> halt

target was in unknown state when halt was requested

in procedure 'halt'

> poll

background polling: on

TAP: ixp42x.cpu (enabled)

target state: unknown

My main goal is to make sure I can flash the device via JTAG, in case I break it, but it would be ideal if I could use the JTAG to single step through the code.

I have found that other people have managed to flash the device via JTAG without the other signals, and some have even changed the bootloader (and had JTAG confirmed as backup solution), so I am stuck.

So if anyone can give some insights into ixp42x / Xscale / NSLU2 specific JTAG issues or hints regarding this issue on OpenOCD or other such tool, I would be really grateful.

Note: I have made a hacked second stage Apex bootloader to laod the NetBSD image via TFTP, but the default RedBoot sequence 'boot; exec 0x01d00000' should be 'boot; go 0x01d00000' for NetBSD to work, so I am considering changing the RedBoot partition to alter that command. The gory details can be summed as my Apex is calling RedBoot functions to be network enabled (because Intel's NPE current code is not working on Apex) and I have tested this to work with go, but not with exec.

Monday, 9 February 2015

uClibc based toolchain using Gentoo for NSLU2

I was asked in the previous post why I didn't used the Debian armel port for my NSLU2.

My intention was to create a uclibc based system (and a uclibc based toolchain) for my NSLU2. This was not obvious in the post because building the uclibc based toolchain resulted in this error:

crossdev armv5-softfloat-linux-uclibceabi

[..]/var/tmp/portage/cross-armv5-softfloat-linux-uclibceabi/gcc-4.9.2/work/gcc-4.9.2/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc:67:21: fatal error: wordexp.h: No such file or directory
#include
^
compilation terminated.
Makefile:416: recipe for target 'sanitizer_platform_limits_posix.lo' failed
make[4]: *** [sanitizer_platform_limits_posix.lo] Error 1
make[4]: *** Waiting for unfinished jobs....

So, yesterday, to not forget what I did, I tried building a glibc based toolchain and posted that.

Today, after looking at the offending error, it seems gcc 4.9.2 assumes wordexp.h is always available on non-Android platforms and Gentoo does not make that file availble when installing uclibc.

~~I think the problem was introduced in gcc in 2013 with this commit, but I haven't cheked in detail.~~ What I know for sure is that gcc 4.8.3 works at this moment with uclibc and the buildroot guys are still using gcc 4.8.4 in their default uClibc based toolchain. So here it is the command that generated the uClibc based toolchain:

crossdev --g 4.8.3 armv5-softfloat-linux-uclibceabi

I hope this helps other people. Yes, I know I should report the issue to GCC/Gentoo after further investigation.

Tuesday, 22 January 2013

(Rsnapshot) backup and security - I see problems

In my previous post I was asking for suggestions for backup solutions that would be open/free software, do backups over the network to a local HDD, be cross platform to allow Windows and Linux clients and not be too CPU/memory hungry (on the server).

Several people suggested rsnapshot, BackupPC, areca-backup, and rsync. Thank you for all your suggestions, you have been a tremendous help. I have decided to give rsnapshot a try since it was suggested to me that it would actually do what is supposed to do for Windows clients, too (which was initially my perceived show stopper for rsnapshot).

Still, when getting to the implementation, I was a little disappointed by the very permissive access that needs to be provided on the client machines, since the backup is initiated from the backup server. Even the so called more secure suggested solutions seem way too permissive for my taste, since losing the control over the backup system means basically giving total access to the data from all client machines, which is quite a big problem in my opinion.

The data-transfer mechanism employed by rsnapshot is simply

S ==(connects and reads all data)==> C
S stores data in the final storage area

Am I the only one seeing a problem with this idea? If the server can connect to all your client machines and read all areas as it pleases, even if you restrict it to some directories, the data is already compromised when the backup server is compromised (think .ssh private keys, files with wireless network passwords and so on; I won't say card information - you don't keep credit/debit card information on your computer, or at least not in plain text, do you?).

What I would consider a better alternative would be a server-initiated dialogue which goes a little like this (S is server, C is client, '=' represent connections via ssh):

S ---(requests backup initiation procedure)---> C
S waits for a defined period of time that C connects back to send (already encrypted) data; if it doesn't arrive, it aborts
S <===(sends encrypted data to be backed up)=== C
S <-(signals the completion of data transfer)-- C
S stores the data in the final storage area

This way, the server can allow access from the clients only in designated areas (even a chroot is possible) from designated clients, access can even be provided only after a port knocking procedure and only during the backup time frame (since the server initiates the negotiation, it can expect only then the knocks, but only then), so the server is quite well secured. The connection to the server can even be done through an unprivileged account, it can even be one account per client machine which can be limited to a scponly shell, if you care for that level of security.

On the other hand, the client information is secure since it can be encrypted directly on the client machine and sent only after encryption, the client machine can decide and control what it sends, while the backup server can only store what the client provides. Also, if the server is compromised, the clients' data and system aren't compromised at all, since the data is on the backup machine, but is encrypted with a key only known on the client (and a backup copy of it can be stored somewhere safe).

I am aware this approach can be problematic for permission (user/group preservation), but it doesn't happen if there is a local <-> remote user mapping or simply the numeric IDs are kept.

I am also aware this means smarter clients and might mean the Windows machine might not be able to implement this completely, but a little more security than "here is all my data" can still be achieved, can't it?

What do other people think? Am I insane or paranoid?

I think I can implement this type of protocol in some scripts (at least one for server and one for clients) and use the backup_script feature of rsnapshot to keep this clean and nice within rsnapshot.

What might prove problematic with this approach is that rsync spedup is lost (might be?) because the copy is done to a temporary directory which, I assume, is empty, so tough luck. Another problem seems to be that every time the backup is done, the client has to encrypt each of the files to backup, which seems to be a real performance penalty, especially if the data to be backed up is quite large.

Is there an encryption layer that does this automatically at file level in the same/similar manner that LUKS does for entire block devices? Having the right file names, but with scrambled/encrypted contents seems to be the ideal solution from this PoV.

Thanks for reading and possible suggestions you might point me to.

P.S.: I just thought of this, if there was an encryption layer implemented with fuse which is mounted in some directory on the client machine, the default rsnapshot mechanism could actually work, and this would mitigate the data accessibility issue and the performance issue since that file system could be contained within a chroot and the encryption/scrambling would be done transparently on the client, so no data is plainly accessible. Does anybody know such a FUSE implementation that does on-the-fly file encryption?

P.P.S.: EncFS does exactly what I want with its --reverse option which is exactly designed for this purpose:

Normally EncFS provides a plaintext view of data on demand. Normally it stores enciphered data and displays plaintext data. With --reverse it takes as source plaintext data and produces enciphered data on-demand. This can be useful for creating remote encrypted backups, where you do not wish to keep the local files unencrypted.

Great!

Friday, 18 January 2008

NSLU2 on flash - script +not+ broken :-D

Update: Why don't I learn my lesson and stop working too late? Making a fool of myself in public isn't something to be proud of.

Ignore this post if you're seeing it for the first time.

Martin, David, this script is broken. When trying to set kernel parameters, there should be a new line, it shouldn't be explicitly removed. I know it looks weird, but this seems to be proven by this output:


kinder:/etc/init.d# ./params4flash start
kinder:/etc/init.d# cat params4flash
#! /bin/sh

exit 0

PATH=/sbin:/bin

# Max time to wait for writeout
MAX_AGE=120
CENT_AGE=$((100 * $MAX_AGE))
# Max percent of mem to use for dirty pages
DRATIO=10
# Once we write, do so until this many percent of mem is still in use
DBRATIO=1

case "$1" in
start)
   echo -n 0         >> /proc/sys/vm/swappiness
   echo -n $MAX_AGE  >> /proc/sys/vm/laptop_mode
   echo -n $CENT_AGE >> /proc/sys/vm/dirty_writeback_centisecs
   echo -n $CENT_AGE >> /proc/sys/vm/dirty_expire_centisecs
   echo -n $DRATIO   >> /proc/sys/vm/dirty_ratio
   echo -n $DBRATIO  >> /proc/sys/vm/dirty_background_ratio
   ;;
restart|reload|force-reload)
   echo "Error: argument '$1' not supported" >&2
   exit 3
   ;;
stop)
   # No-op
   ;;
*)
   echo "Usage: $0 start|stop" >&2
   exit 3
   ;;
esac
exit 0
kinder:/etc/init.d# cat /proc/sys/vm/swappiness
60

Wednesday, 9 January 2008

my laptop is back

Yesterday I got my laptop back. It was in service since the 6th of December.

They replaced the battery, and although it looked like the built-in charge indicator was defect, actually the button on this battery is buried deeper than it was on the previous one and I have to press really hard on it to trigger it.

According to the tests made yesterday night, this battery holds for about 2.5-3 hours when playing music. Also it has 96% of its designed capacity. I guess is OK enough since I don't want to be spared from my laptop anymore.

Next on the agenda:

restore data sanity (I did clean up most of the data from the HDD before handing down the laptop for the case when I was forced to hand it over; lucky for me, I managed to keep the HDD)
install armel Debian on my new[1] NSLU2, Kinder (more news on that later ;-) )
fix my local network chaos triggered by the problems that hit Ritter (my older NSLU2)
finish the wiki theme
work a little more on svn-buildpackage and kill more of its bugs
walk through my (game) packages' bugs and try to fix them, answer, etc.
try to make bluetooth transfers to work from and to my laptop

BTW, I still hold to my opinion that Dell and Emag suck, although I might buy desktops from Dell, laptops are a no-no until they fix their broken batteries and their broken "1-year warranty for batteries" policy.

[1] I bought a new NSLU2 after I left my laptop in service and it has been waiting since then to run the Debian Arm EABI (armel) port

Tuesday, 27 November 2007

Updates: NSLU2, Andrew S. Tanenbaum in .ro

Last weekend was as hectic as my life has been lately: I have been trying to restore sanity into my NSLU2, I went to a lecture from Andrew S. Tanenbaum and I made a 2.5 hours drive to my parents in about 4 hours because of the fog.

First, my slug:

refuses to recognise the USB NIC I have been using until the latest incidents (it either says 'not accepting address, error -71' or 'device descriptor read/64, error -71')
sometimes reboots when I insert the USB NIC
either doesn't boot at all or boots really slowly when the USB NIC is inserted
(obviously) doesn't show the NIC in lsusb listing when is not recognised

Since the USB NIC works on my laptop, I suspect a hardware problem with the slug. Bummer!

Dear no-so-lazyweb, is there a way to install Debian on an ASUS WL-500G Premium router without loosing wireless ability? Or, is there a way to make use of my USB NIC with the ASUS router?

Second, Andrew S. Tanenbaum visited Romania and lectured Friday at the University „Politehnica” Bucharest.

He presented Minix3's architecture and the advantages it has over monolithic OSes. I attended the lecture (although I am not a student anymore) and found it quite nice and well prepared, but I had the feeling that sometimes he was trying to avoid or to bash topics that were not putting Minix into a good light or challenged its title of being the first open[0] OS based on a micro-kernel architecture[1]. In spite of that, I found him to be a really good speaker and I liked the overall presentation, although, I also expected some on the spot demos or at least some recordings.

The things that I remember:

2.4 millions subtle code alterations in drivers with only 80000 driver crashes (of course, no kernel crashes)
simulation of network driver repeated crashes at different time intervals and how it affects performance - a 30% degradation at crashes that occur once every second and an insignificant degradation at crashes occurring at each 10 seconds
every driver has a set of rights assigned to it; it was difficult for them to define this - this sounds a lot like SELinux issues
messages have a fixed length
there is no dynamic memory allocation within the kernel
the kernel is 5000 lines of code (all drivers are in user space)
really secure system
there were performance comparisons with Minix2 and the hit was about 20%; still, is said that L4 has only an approximate 2-5% performance hit because of the micro-kernel architecture
apparently the FreeBSD kernel has only 3 bugs /1000 lines of code
Minix uses a BSD license

I also got a Minix live CD (which is more like the Gentoo Linux install CD - just console in the live system) and made an installation of Minix in a qemu machine[2]. Unfortunately, I don't think I'll have the time to dwell into the source.

I was thinking, would it worth the effort to try to make a GNU/Hurd/Minix system (i.e. replace Mach with Minix's micro-kernel)? BTW, is Debian GNU/Hurd now based on L4 or does it still uses Mach?

Note: Some of my work colleagues suggested that the presentation was the same as one he made at linux.conf.au last year, but I can't confirm/infirm that since I didn't saw the recording.

I won't write about the "fog drive", but I'll just say it wasn't pleasant at all, and I felt I was in driving in The Twilight Zone for the whole Friday evening.

[0] he gave credit to QNX
[1] For instance, I tried to ask him twice if he felt that GNU Hurd was violating the micro-kernel paradigm or if he can compare it to Minix' architecture. I had the impression that both times he avoided to answer and started the usual Hurd bashing, "they have been developing it for 20+ years, but got nothing working", meanwhile "Minix is here". After the lecture/presentation somebody told me that AST shortly said that they "were similar, but different". I didn't catch that line.
[2] thanks to qemu-launcher it is trivial to create and manage multiple qemu virtual machines

Tuesday, 13 November 2007

good news, bad news, such is life

After a really crazy weekend in which it looked like I wasn't able to set up NAT[1], I went to work with the gear and tested the exact same thing I did at home and it worked without a glitch.

So, now I suspect that there might be something wrong with my ISP or the DSL modem, while ritter (my NSLU2) is fine. As a bonus I just installed Debian armel on it (installation report to arrive soon).

When I got home, after the laptop came back from hibernate[2] I saw these messages on all my terminals:

Message from syslogd@bounty at Mon Nov 12 23:57:04 2007 ...
bounty kernel: Uhhuh. NMI received for unknown reason b0.

Message from syslogd@bounty at Mon Nov 12 23:57:04 2007 ...
bounty kernel: You have some hardware problem, likely on the PCI bus.

Message from syslogd@bounty at Mon Nov 12 23:57:04 2007 ...
bounty kernel: Dazed and confused, but trying to continue

Is this a good time to panic?
I guess I'll have to dig into this, too.

In other news, svn-buildpackage 0.6.23 has been uploaded to unstable and it fixes yet another 7 bugs, which brings svn-bp's bug count down to 19 open valid bugs (2 more if you count wontfix bugs, too). This is the lowest bug count of svn-buildpackage since at least the end of April, according to the bug count graph.

Also, oolite 1.65-6 was also uploaded to unstable and fixes the breakage due to the gnustep build tools changes. Unfortunately, on arm is dep-waiting for libgnustep-base-dev which is broken on this arch.

Thanks to my sponsors, you know who you are ;-) .

[1] I have been trying to set up a simple NAT machine for more than 6 hours and, although everything seemed to be OK, checked and double checked it didn't work
[2] which works now only if I don't use fglrx which is broken from this PoV (bug 449095)

Saturday, 10 November 2007

nslu2 broken?

Thursday evening, when I came from work I found my nslu2 not working. It is/was a router and a local mirror. I tried to understand what's going on, but I couldn't. Everything looked fine, except that it did not resolve any addresses and the ping to my provider was not giving any results.

I tried to restart dnsmasq, although it seemed it was running fine, and then, in a desperate gesture (after trying to understand what was going on) I thought of restarting networking. I got locked on the outside and was forced to restart the system.

Everything looked to come back to normal. Since it was late, I set myself to investigate the next morning what happened. But yesterday morning, the network was again down and the slug was inaccessible.

Since the only way to restore sanity in such a situation (because I don't have a serial console on the slug) was to reset and hope for the best, I did that.

The slug didn't start anymore. It seemed to cycle through boot -> reset -> reboot. I tried to see what was going on and connected the hard disk to my laptop. The filesystem was clean, although /var/log/boot.0 was a directory (and that directory had the same content as /var/lib/dpkg/info). I manually removed that, but that didn't make the slug boot.

It seems my slug is kind of dead.

Now I would like to know if is there any way to do remotely what flash-kernel does from within debian.

Wednesday, 22 August 2007

my nslu2 runs an eabi kernel

Thanks to Riku Voipio's advice, my slug now runs an EABI kernel with OABI compat support. As a bonus I have the latest 2.6.18-5 abi kernel :-) .

Things to remember when trying to do the same with your slug:

running make menuconfig (somehow) with the debian/arch/arm/config-ixp4xx file as config will, most definitely disable the compilation of the internal driver
before flashing a new linux image, check that the drivers for the network interfaces you expect to be up after reboot are present in the deb
arm iptables will fail with an eabi kernel, even if it has oabi compat support, you will need to install iptables in the armel chroot; so if you use iptables at all, you need to pass that invocation to the iptables binary from the armel chroot

I just chroot-ed into an armel chroot and rrdtool made some nice and meaningful graphs.

Instead of:

I got:

Some complaints:

migrating to armel will probably be a pain, if care is not taken to migrate settings, scripts and everything in /etc
the load spikes up just to make these graphs :-(

Tuesday, 7 August 2007

tilting at windmills - the NSLU2 issues

It turns out I was playing Don Quijote when I was trying to fix the reset issues I had with my slug.

Of course, I was suspecting the USB rack for the issues, but the issues never showed up on the laptop... until now.

I decied to temporary use a USB stick for the FS and started copying files over to the stick. During the copying the rack disconnected and everything became more clear. I was tilting at the windmills all the time.

BTW, be smarter than myself and just change the UUID to mach the old one instead of regenerating the initrd and reflashing it:

tune2fs -U the_uuid_of_the_previous_root_partition /dev/sda1

Oh, next on the agenda will be to use my ipod mini as a hdd for the slug in order to prevent ruining the stick... And another thing, total silence is really nice, even if the rack was not that noisy.

Rambling around foo