Friday, 8 May 2015

Linksys NSLU2 adventures into the NetBSD land passed through JTAG highlands - part 1

About 2 months ago I set a goal to run some kind of BSD on the spare Linksys NSLU2 I had. This was driven mostly by curiosity, after listening to a few BSDNow episodes and becoming a regular listener. It turned out to be a really interesting experience, although somewhat frustrating at times, mostly due to lacking documentation or proprietary code.

Looking for documentation on how to install any BSD flavour on the Linksys NSLU2, I found what appears to be some too-incomplete-to-be-useful-for-a-BSD-newbie information about installing FreeBSD, no information about OpenBSD and some very detailed information about NetBSD on the Linksys NSLU2.

I was very impressed by the NetBSD build.sh script, which can be used to cross-compile the entire NetBSD system, even when run on a Linux host: it first builds the appropriate toolchain, then the NetBSD kernel and the base system. Having some experience with cross compilation for GNU/Linux embedded systems I can honestly say this is immensely impressive, well done NetBSD!

It took a few failed attempts to properly follow the instructions and lots of hours of (re)building, but in the end I had the kernel and the sets (the NetBSD system is split into several parts which are grouped by functionality, these are the sets). The next step was to set things up for net booting - kernel loading via TFTP and rootfs on NFS.

But it wouldn't be challenging if the instructions were followed to the letter, so the first thing I wanted to change was that I didn't want to run dhcpd just to pass the DHCP boot configuration to the NSLU2. That seemed like a waste of resources since I already had dnsmasq running.

After some effort and struggling with missing documentation, I managed to use dnsmasq both to pass the DHCP boot parameters to the slug and as the TFTP server. Some time later I documented this on my blog, since I expect to need it again.

Setting up NFS wasn't a problem, but, when trying to boot, I found that I managed to misread at least 3 or 4 times some of the NSLU2 related information on the NetBSD wiki. To be able to debug what was happening I concluded the slug should have a serial console attached to it, which helped a lot.

Still the result was that I wasn't able to boot the trunk version of the NetBSD code on my NSLU2.

Long story short, with the help of some people from the #netbsd IRC channel on Freenode and from the port-arm NetBSD mailing list I found out that I might have a better chance with specific older versions. In practice what really worked was the code from the netbsd_6_1 branch.

Discussions on the port-arm mailing list, some digging into the (recently found) PRs (problem reports), and a successful execution of the trunk kernel (at the time, version 7.99.4) together with the 6.1.5 userspace led me to the conclusion that the NetBSD userspace for armbe was broken in the trunk branch.

And since I concluded this would be a good occasion to learn a few details about NetBSD, I set out to git bisect through the trunk history to identify when this happened. But that meant being able to easily load kernels and run them from TFTP, which was not how the RedBoot bootloader flashed into the slug behaves by default.
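For readers unfamiliar with bisecting, the idea behind git bisect can be sketched in a few lines of Python. This is only an illustration of the search strategy, not of git itself: the commit names and the is_bad check are made up, and the sketch assumes the history is ordered from oldest to newest, starts with a known good commit, ends with a known bad one, and stays broken once it breaks.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit in an oldest-to-newest history.

    Assumes commits[0] is good, commits[-1] is bad, and that every
    commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # breakage happened at mid or earlier
        else:
            lo = mid  # breakage happened after mid
    return commits[hi]


# Hypothetical history: 200 commits, userspace breaks at "c137".
history = [f"c{i:03d}" for i in range(200)]
print(first_bad(history, lambda commit: commit >= "c137"))
```

Each call to is_bad stands for one build-and-boot cycle on the slug, which is why booting test kernels without manual intervention matters so much here.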

By default, the RedBoot bootloader flashed into the NSLU2 waits for 2 seconds for a manual interaction (it waits for a ^C) on the serial console or on the telnet RedBoot prompt. If no such event happens, it copies the Linux image it has in flash starting at address 0x50060000 into RAM at address 0x01d00000 (after stripping the Sercomm header) and then executes the copied code from RAM.
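The default boot flow can be modelled roughly as follows. This is a toy Python model of the behaviour described above, not RedBoot code: the two addresses and the 2 second window come from this post, while the header length is a placeholder assumption and flash and RAM are plain dictionaries.

```python
FLASH_IMAGE_ADDR = 0x50060000  # flash address of the Linux image (from the post)
RAM_LOAD_ADDR = 0x01D00000     # RAM address the image is copied to (from the post)
BOOT_WAIT_SECONDS = 2          # window in which ^C is accepted (from the post)
HEADER_LEN = 16                # ASSUMPTION: placeholder size of the Sercomm header


def default_boot(flash: dict, interrupted: bool, header_len: int = HEADER_LEN):
    """Model the default boot: prompt on ^C, else copy and execute.

    Returns ("prompt", None) when the user interrupted within the wait
    window, otherwise ("exec", ram) where ram maps the load address to
    the image with its header stripped.
    """
    if interrupted:
        return "prompt", None
    image = flash[FLASH_IMAGE_ADDR]
    ram = {RAM_LOAD_ADDR: image[header_len:]}  # strip header, copy to RAM
    return "exec", ram


# Hypothetical flash content: a dummy header followed by a dummy payload.
flash = {FLASH_IMAGE_ADDR: b"H" * HEADER_LEN + b"KERNEL"}
print(default_boot(flash, interrupted=False))
```

The model makes the limitation visible: whatever sits at 0x50060000 gets executed, and nothing in this path ever touches the network.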

Of course, this is not a very handy way to try to boot things from TFTP, so my first idea to overcome this limitation was to use a second stage bootloader which would do the loading via TFTP of the NetBSD kernel, then execute it from RAM. Flashing this second stage bootloader instead of the Linux kernel at 0x50060000 would make sure that no manual intervention except power on would be necessary when a new kernel+userspace pair is ready to be tested.

Another advantage was that I would not risk bricking the NSLU2 since I would not be changing RedBoot, the original bootloader.

I knew Apex was used as the second stage bootloader in Debian, so I started configuring my own version of Apex so that it would load the netbsd-nfs.bin image via TFTP.

My first disappointment was that Apex did not support receiving the boot parameters via DHCP, but only via RARP (it was clear it was less tested with BOOTP or DHCP), and TFTP was documented in the code as being problematic. That meant that I would have to hard code the boot configuration or configure RARP, but that wasn't too bad.

Later I found out that I had wasted time on that avenue because the network driver in Apex was some Intel code (NPE Access Library) which can't be freely distributed, but could have been downloaded from Intel's site back in 2008-2009. The bad news was that current versions did not work at all with the old patchwork in Apex, which had been done so that the driver made for Linux would compile in a world of its own and could be incorporated in Apex.

I was stuck and the only options I saw were:
  1. Fight with the available Intel code and make it compile in Apex
  2. Incorporate the NPE driver from NetBSD into a rump kernel which would be included in Apex, since I knew the NetBSD driver only needed a very easily obtainable binary blob, instead of the entire driver as was in Apex before
  3. Hack together an Apex version that simulates the typing of the necessary commands to load the netbsd-nfs.bin image inside RedBoot, or in other words, call from Apex the RedBoot functions necessary to load from TFTP and execute NetBSD.

Option 1 did not look that appealing after looking into the horrible Intel build system and its endless dependencies on a specific Linux kernel version.

Option 2 was more appealing, but since I didn't know NetBSD and had only once tried to build and run a NetBSD rump kernel, it seemed like a doable project only for an experienced NetBSD developer or at least an experienced NetBSD user, which I was not.

So I was left with option 3, which meant I had to do some reverse engineering of the code, because, although RedBoot is GPL, Linksys did not publish the source from which the running RedBoot was built.
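To make it clearer what option 3 has to reproduce, here is a small Python sketch that builds the two commands one would otherwise type at the RedBoot prompt. It uses the generic RedBoot load and go syntax, and I am assuming the Linksys build accepts the same flags. The server address and the load address in the example are placeholders, not values taken from this post.

```python
def redboot_tftp_boot(server_ip: str, filename: str, load_addr: int) -> list:
    """Build the RedBoot commands to fetch an image via TFTP and run it.

    Uses the generic RedBoot syntax:
      load -r -b <base address> -h <host> <file>   (raw image over TFTP)
      go <entry>                                   (jump to the address)
    """
    addr = f"{load_addr:#010x}"  # e.g. 0x00200000
    return [
        f"load -r -b {addr} -h {server_ip} {filename}",
        f"go {addr}",
    ]


# Placeholder values: hypothetical TFTP server and load address.
for command in redboot_tftp_boot("192.168.0.2", "netbsd-nfs.bin", 0x00200000):
    print(command)
```

Option 3 amounts to having the code flashed at 0x50060000 trigger the equivalent of these two commands, without anybody sitting at the serial console.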


(continues here)
