Wednesday, 17 June 2009

[help] kernel: same config as debian, but mine doesn't boot

Update 2: I finally figured out what was wrong. The pristine kernel was missing the patch ata_piix-ich8-fix-native-mode-slave-port.patch, which I got from the linux-patch-debian-2.6.18_2.6.18.dfsg.1-24etch2_all.deb package.

Fortunately the package was still available from oldstable, but I wonder when snapshot.d.[no] will return.

Now I can get back to finding the commit that broke sleep, although I suspect I am looking for 9666f40:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9666f400





Update: I managed to install grub2, and the initramfs fails to find the root because it can't find the "st" volume group. Since I compiled LVM support into the kernel, it was clear that any such issue had to come from the initramfs.

All this seems to be due to this:

(initramfs) dmsetup ls
/proc/misc: No entry for device-mapper found
Is device-mapper driver missing from kernel?
Failure to communicate with kernel device-mapper driver.
/proc/misc: No entry for device-mapper found
Is device-mapper driver missing from kernel?
Incompatible libdevmapper 1.0.2.27 (2008-06-25)(compat) and kernel driver
Command failed

So now I just have to figure out how to patch the old kernels to work or how to install an older libdevmapper.
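
The first complaint in that output is that /proc/misc has no device-mapper entry. As a rough illustration of what that check amounts to, here is a minimal Python sketch. It assumes the usual /proc/misc layout of one "minor-number name" pair per line; the function names misc_devices and has_device_mapper are made up for this example, and it is meant for a normally booted system, since an initramfs will not have Python.

```python
import os


def misc_devices(text):
    """Parse /proc/misc-style text into a {name: minor_number} dict.

    Assumes each relevant line looks like ' 63 device-mapper'.
    Lines that do not match that shape are ignored.
    """
    devices = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            devices[parts[1]] = int(parts[0])
    return devices


def has_device_mapper(text):
    """Return True if a 'device-mapper' misc device is registered."""
    return "device-mapper" in misc_devices(text)


if __name__ == "__main__" and os.path.exists("/proc/misc"):
    with open("/proc/misc") as handle:
        found = has_device_mapper(handle.read())
    print("device-mapper registered" if found else "device-mapper missing")
```

If the entry is missing, the running kernel has not registered the device-mapper driver, which is what the dmsetup message hints at.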




My laptop doesn't resume properly from sleep (hibernate/resume works, though). It did work at some point in the past with 2.6.18 (at least the one in Debian Etch worked, kind of).

In an attempt to git bisect my way to the commit responsible for the regression, I tried to compile the vanilla 2.6.18 Linux kernel with almost exactly the same configuration as the Debian Etch kernel, but I was surprised to see that my make-kpkg compiled kernel didn't boot.

The differences are:

--- config-2.6.18-6-amd64    2009-06-17 00:57:56.000000000 +0300
+++ /boot/config-2.6.18-heidi 2009-06-17 10:36:49.000000000 +0300
@@ -1,7 +1,7 @@
#
# Automatically generated make config: don't edit
-# Linux kernel version: 2.6.18
-# Thu Dec 25 21:04:29 2008
+# Linux kernel version: 2.6.18-heidi
+# Wed Jun 17 10:36:49 2009
#
CONFIG_X86_64=y
CONFIG_64BIT=y
@@ -1045,7 +1045,7 @@
CONFIG_IDE_GENERIC=m
CONFIG_BLK_DEV_CMD640=y
# CONFIG_BLK_DEV_CMD640_ENHANCED is not set
-CONFIG_BLK_DEV_IDEPNP=m
+CONFIG_BLK_DEV_IDEPNP=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
# CONFIG_BLK_DEV_OFFBOARD is not set
@@ -1069,7 +1069,6 @@
CONFIG_BLK_DEV_HPT34X=m
# CONFIG_HPT34X_AUTODMA is not set
CONFIG_BLK_DEV_HPT366=m
-CONFIG_BLK_DEV_JMICRON=m
CONFIG_BLK_DEV_SC1200=m
CONFIG_BLK_DEV_PIIX=m
CONFIG_BLK_DEV_IT821X=m
@@ -1144,7 +1143,6 @@
CONFIG_AIC79XX_DEBUG_ENABLE=y
CONFIG_AIC79XX_DEBUG_MASK=0
CONFIG_AIC79XX_REG_PRETTY_PRINT=y
-CONFIG_SCSI_ARCMSR=m
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=m
CONFIG_MEGARAID_MAILBOX=m
@@ -1360,6 +1358,7 @@
CONFIG_ADAPTEC_STARFIRE_NAPI=y
CONFIG_B44=m
CONFIG_FORCEDETH=m
+CONFIG_DGRS=m
CONFIG_EEPRO100=m
CONFIG_E100=m
CONFIG_FEALNX=m
@@ -1418,6 +1417,7 @@
#
CONFIG_TR=y
CONFIG_IBMOL=m
+CONFIG_3C359=m
CONFIG_TMS380TR=m
CONFIG_TMSPCI=m
CONFIG_ABYSS=m
@@ -2088,7 +2088,6 @@
CONFIG_SENSORS_ATXP1=m
CONFIG_SENSORS_DS1621=m
CONFIG_SENSORS_F71805F=m
-# CONFIG_SENSORS_F75375S is not set
CONFIG_SENSORS_FSCHER=m
CONFIG_SENSORS_FSCPOS=m
CONFIG_SENSORS_GL518SM=m
@@ -2116,7 +2115,6 @@
CONFIG_SENSORS_W83781D=m
CONFIG_SENSORS_W83791D=m
CONFIG_SENSORS_W83792D=m
-CONFIG_SENSORS_W83793=m
CONFIG_SENSORS_W83L785TS=m
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
@@ -2350,6 +2348,7 @@
CONFIG_VIDEO_BTCX=m
CONFIG_VIDEO_IR=m
CONFIG_VIDEO_TVEEPROM=m
+CONFIG_USB_DABUSB=m

#
# Graphics support
@@ -2737,6 +2736,19 @@
CONFIG_USB_SERIAL_GARMIN=m
CONFIG_USB_SERIAL_IPW=m
CONFIG_USB_SERIAL_KEYSPAN_PDA=m
+CONFIG_USB_SERIAL_KEYSPAN=m
+# CONFIG_USB_SERIAL_KEYSPAN_MPR is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA28 is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA28X is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA28XA is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA28XB is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA19 is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA18X is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA19W is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA19QW is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA19QI is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA49W is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA49WLC is not set
CONFIG_USB_SERIAL_KLSI=m
CONFIG_USB_SERIAL_KOBIL_SCT=m
CONFIG_USB_SERIAL_MCT_U232=m
@@ -2756,6 +2768,8 @@
#
# USB Miscellaneous drivers
#
+CONFIG_USB_EMI62=m
+CONFIG_USB_EMI26=m
CONFIG_USB_AUERSWALD=m
CONFIG_USB_RIO500=m
CONFIG_USB_LEGOTOWER=m
@@ -3002,7 +3016,6 @@
CONFIG_ADFS_FS=m
# CONFIG_ADFS_FS_RW is not set
CONFIG_AFFS_FS=m
-# CONFIG_ASFS_FS is not set
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
CONFIG_BEFS_FS=m
@@ -3201,6 +3214,7 @@
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_SECURITY_CAPABILITIES=y
# CONFIG_SECURITY_ROOTPLUG is not set
+CONFIG_SECURITY_SECLVL=m
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=0
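
A diff like the one above can also be boiled down to a per-option summary, which makes it easier to see which options were added, dropped or changed. The following is a minimal Python sketch, not the tool used to produce the diff above. The function names parse_config and diff_configs are made up for illustration, and it assumes the usual kernel config layout of "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines.

```python
import re
import sys

# Assumed line shapes in a kernel config file.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; explicitly unset options map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(old, new):
    """Return {option: (old_value, new_value)} for options that differ.

    None means the option does not appear in that file at all.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as first, open(sys.argv[2]) as second:
        result = diff_configs(parse_config(first.read()),
                              parse_config(second.read()))
    for name, (before, after) in result.items():
        print(f"{name}: {before} -> {after}")
```

Saved under some name of your choosing and given the two config files as arguments, it prints one line per differing option, for example an option going from m to y, or from m to None when it is absent from the second file.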

The initramfs simply stopped at an early point with some errors which look really weird, considering that the Debian kernel shouldn't be that different from mine (transcribed from a photo of the screen):


Loading, please wait...
unknown keysym 'endash'
/etc/boottime.kmap.gz:23: syntax error
syntax error in map file
key bindings not changed
usb 1-2: device descriptor read/all, error -84
ata_piix 0000:00:1f.2: invalid MAP value 2
resume: libcrypt version: 1.4.1
resume: Could not stat the resume device file '/dev/sda5'
Please type in the full path name to try again
or press ENTER to boot the system:

I suspect the key map error, the usb error and the resume error to be unrelated to the boot problem.

For some reason I suspect the ata_piix error to be related.


After pressing Enter, more messages appeared (transcribed from the photo):

mount: mounting /dev/root on /root failed: No such device
mount: mounting /dev on /root/dev failed: No such device or directory
mount: mounting /sys on /root/sys failed: No such device or directory
mount: mounting /proc on /root/proc failed: No such device or directory
Target filesystem doesn't have /sbin/init.
No init found. Try passing init= bootarg.


(BusyBox prompt follows here).


I looked over the net for some hints, but I wasn't able to find a solution.


Since I am forced to use Lilo (/boot on LVM) and I didn't manage to make grub-pc work on this system, I am kind of stuck and don't know what to do to make the damn kernel boot.

I am running Debian Lenny, but I am willing to backport a few packages, if necessary.

Help would be really appreciated.

7 comments:

Kapil Hari Paranjape said...

IIRC there was some change to udev and 2.6.18 needs an older
udev in order to boot. If you have enough spare disk space it may be
better to try and re-create a minimal etch system to boot from to
check the regression in suspend/resume.

Rik said...

Are you passing the 'quiet' parameter to the kernel? If so, try removing it to see more output.

Anonymous said...

IIRC there was some change to udev and 2.6.18 needs an older
udev in order to boot. If you have enough spare disk space it may be
better to try and re-create a minimal etch system to boot from to
check the regression in suspend/resume.


You missed the part where I said that the Debian 2.6.18 kernel from Etch is working and mine isn't.


EddyP.

Anonymous said...

Are you passing the 'quiet' parameter to the kernel? If so, try removing it to see more output.

That was a real challenge. I was using lilo as a boot loader since I had /boot on LVM, and lilo is not as flexible as grub when it comes to changing boot parameters easily.

I managed to install grub-pc (aka grub2), but still I can't boot my kernel.

I'll post an update later tonight.


EddyP

kbloom said...

This looks like an initramfs issue rather than a kernel issue.

What are the contents of /dev at the point that you get the busybox prompt?

Does a copy of udev live on the initramfs? Is there a version mismatch there (i.e. an etch udev on the working initramfs and a lenny udev which is 2.6.18-incompatible on the broken initramfs)? Does the new kernel work with the old initramfs?

Martijn said...

Maybe you should supply the filesystems by UUID instead of device name?

erjc said...

Your BIOS may be lying to you. Investigate apic=[quiet|verbose|debug], noapic, nomce and friends instead of quiet. dmesg or kernel.log snippets could be helpful.