Friday, 3 August 2007

more news from the nslu2 front

The main reason for the lack of activity during this week was that I had some problems with my router, a NSLU2 running the debian arm port. This was the machine I wanted the softfloat rrdtool I was talking about in these 2 posts.

It turns out that for some weird reason the USB controller of the slug resets when there is too much traffic going on. I got lots and lots of messages like:


$ grep reset -A 1 /var/log/syslog | grep -E '(reset|repeated)'
Aug 3 07:02:27 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
...
Aug 3 11:50:29 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
Aug 3 11:51:03 ritter last message repeated 4 times
Aug 3 11:53:15 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
Aug 3 11:53:46 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
Aug 3 11:55:48 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
Aug 3 11:56:07 ritter last message repeated 2 times
Aug 3 11:56:52 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
Aug 3 11:56:57 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2



That wouldn't have been a problem as long as the damn thing worked, but in some cases after the reset the root filesystem was lost and all the services running on the machine died (except the networking itself since all of that is part of the kernel and is loaded in memory).

It took me about three evenings to get to this conclusion (imagine being locked out of a machine which advertised running ssh, http, and dns services, being able to telnet and getting a respons, but the full protocol failed).

It all became more clear when, while being connected via ssh I got his result:

eddy@ritter ~ $ ls
-bash: ls: command not found

"WHAT?" was the first reaction, then I figured that life without ls is unbearable and came up with this bit:

alias ls='for I in * ; do echo $I ; done'

Quite cool to understand a little bit about the system. Still, cat was unavailable so having /proc and /sys available did not help that much. At some point I was thinking about a busybox shell, tried that, it didn't work since it wanted to remove some important bits like the initramfs tools.


I googled again and found a good starting point for a discussion of the same problem someone else was having. I ended up trying all the proposed solutions: rmmod-ing ehci-hcd, blacklisting the module, regenerating a initrd image with the blacklisted module... None of these worked (individually) and I ended up having an even more unstable system when ehci was not present (I was loosing the root FS after aprox. 1 min after logging in remotely immediately after start). Restoring the image with ehci was a pain (not to mention the time spent trying to figure a way to revert the change safely, since bricking the machine was not out of the question - imagine loosing the FS while flashing the ram drive image).

I finally managed to revert the ehci enabled image and tried the workaround proposed in this gentoo BR. I tried 128 and now I am at 64 and I am still getting those resets; Still something good came out of this, I followed Michael Prokop's Use root=UUID on NSLU2 article and I think I have now a stable root file system just thanks to him.

Now I am stuck:


grep -E '(max_sect|reset)' /var/log/syslog
Aug 3 07:02:27 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
Aug 3 08:27:38 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
Aug 3 08:38:43 ritter manual message: max_sectors was set to 128
Aug 3 08:55:25 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
Aug 3 09:31:47 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
Aug 3 09:54:11 ritter set_max_sectors: max_sectors was set to 128
Aug 3 10:09:32 ritter set_max_sectors: max_sectors was set to 128
Aug 3 10:15:27 ritter set_max_sectors: sda's max_sectors was set to 128
Aug 3 10:24:24 ritter set_max_sectors: sda's max_sectors was set to 128
Aug 3 10:38:12 ritter set_max_sectors: sda's max_sectors was set to 128
Aug 3 10:47:04 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
Aug 3 10:48:27 ritter set_max_sectors: sda's max_sectors was set to 64
Aug 3 11:04:36 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
Aug 3 11:06:16 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2
....
Aug 3 15:05:58 ritter kernel: usb 3-1: reset high speed USB device using ehci_hcd and address 2

Should I try setting to 32?

In case you're wondering, I am doing the setting with this script:

#!/bin/sh

MAX_SECTORS=64
DISK=$(ls -l `grep 'uuid.* / ' /etc/fstab | awk '{print $1;}'` | sed 's#.*/\(.*\)$#\1#' | tr -d '[0-9]')

[ -z "$DISK" ] && logger -t 'set_max_sectors' "error: could not detect root disk" && exit

echo "$MAX_SECTORS" > /sys/block/$DISK/device/max_sectors
logger -t 'set_max_sectors' "${DISK}'s max_sectors was set to ${MAX_SECTORS}"

exit 0


Dear (not at all) lazy web, what can I do to fix/workaround this issue? Any help would be gratly appreciated (personal mail or comments).

Note: I am reluctant to try the fix proposed in the last comment that claims to fix the issue since compiling an arm kernel natively is a pain and I would like to have an official kernel from Debian Etch, as I do now.

4 comments:

Anonymous said...

Missing cat? No problem, just like you did with ls...

alias cat='while read I; do echo "$I"; done <'

eddyp said...

cool, thanks!
I guess I was to tired to think about that solution myself :-)

Anonymous said...

I have this prob too
"usb 1-1: reset high speed USB device using ehci_hcd and address 2"

and it seems there is no solution for it. It just gets on my nerves!!!

good luck

Oğuz said...

Hello,

I am having exactly the same problem.
Did you find any solution for it?