Saturday 20 June 2009

MSI PR200WX-058EU sleep - 91a6c462b02d8dc02dbe95e5a407d78078a38d01 is first bad commit

Nailed it!

After a rocky start, I managed to find the commit that broke sleep for my laptop.


91a6c462b02d8dc02dbe95e5a407d78078a38d01 is first bad commit
commit 91a6c462b02d8dc02dbe95e5a407d78078a38d01
Author: H. Peter Anvin

Date: Wed Jul 11 12:18:57 2007 -0700

Use the new x86 setup code for x86-64; unify with i386

This unifies arch/*/boot (except arch/*/boot/compressed) between
i386 and x86-64, and uses the new x86 setup code for x86-64 as well.

Signed-off-by: H. Peter Anvin

Signed-off-by: Linus Torvalds



Simply reverting this commit wouldn't fix the problem entirely, since the screen was always blank after the successful resumes. On the other hand, the test script was written to do things after resume that leave traces on the hard disk, so after the next reboot I could tell whether the last sleep/resume cycle had succeeded or not.


Great. Oh, and git bisect rules!
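For anyone who hasn't used git bisect before, here is a minimal sketch of the idea on a throwaway repository. Everything in it is made up for illustration (the bisect_demo function, the state and counter files, the commit messages) and none of it touches a kernel tree. It also relies on git bisect run, which suits tests that can run unattended in one go; a test that needs a reboot between steps, like the sleep test here, has to mark each step good or bad by hand or from a wrapper script.

```shell
#!/bin/sh
# Toy git bisect session on a temporary repository (illustration only).
# Commits 1-5 are "good" (state contains "works"), commits 6-8 are "bad".

bisect_demo () {
    (
        repo=$(mktemp -d) || exit 1
        cd "$repo" || exit 1
        git init -q .
        git config user.name "Bisect Demo"
        git config user.email "demo@example.invalid"

        i=1
        while [ "$i" -le 8 ]; do
            if [ "$i" -ge 6 ]; then
                echo broken > state
            else
                echo works > state
            fi
            echo "$i" > counter
            git add state counter
            git commit -q -m "commit $i"
            i=$((i + 1))
        done

        # HEAD is known bad, the first commit (HEAD~7) is known good.
        git bisect start HEAD HEAD~7 > /dev/null
        # The test command exits 0 for good and 1 for bad revisions.
        git bisect run sh -c 'grep -q works state' > /dev/null

        # refs/bisect/bad now points at the first bad commit.
        git log -1 --format=%s refs/bisect/bad

        git bisect reset > /dev/null 2>&1
        cd / && rm -rf "$repo"
    )
}

bisect_demo
```

With eight commits the bisect needs only three test runs to settle on the commit that introduced the "broken" state.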




Now, if you want to hunt down a kernel regression yourself and are curious how I automated the whole thing, here are the small scripts I used:

  • linux-build - a wrapper script around make-kpkg that builds .deb packages of the Linux kernels I compile; I had been using it long before this bisect, but I modified it so that the kernels are clearly versioned and also indicate the commit they correspond to
  • sleepit - a script that automates the actions needed to test a kernel; it is really trivial and highly specialized for sleep/resume debugging; it assumes it is run in the directory where you'd later want to grab the dmesg outputs from
  • sleeptest - a wrapper script that is smart enough to detect whether the current kernel is a kernel to be tested or a stable (regular) one
    • if the kernel is a stable one:
      • it looks for the markers left by the last test kernel and, depending on them, marks that kernel bad or good in the bisect; this results in a new checkout, which is then processed or, if the bad commit has been identified, the script stops
      • in the case of a new bisect step, the new checkout is cleaned up, patched and built, then the script installs the new linux-image .deb[1] and runs update-grub[2], leaving the reboot command to my discretion in case something went awry; a failure to compile the kernel automatically would drop me into an interactive console, where I had to do manually the steps necessary to get ready to boot into the next kernel
    • if the kernel is a test kernel, it runs the sleepit script
The main script is sleeptest, which is run as root to allow the sleep commands, the installation of the kernel and update-grub; the build itself is done via su as my regular user.
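The stable-versus-test decision can be sketched from the kernel release string alone, assuming the naming scheme produced by linux-build (a "-g" followed by the first 8 characters of the commit id, then "-" and the host name). The function names commit_from_release and is_test_kernel and both example release strings are hypothetical; this is not necessarily how sleeptest makes the decision.

```shell
#!/bin/sh
# Sketch: decide whether a kernel release string belongs to a bisect test kernel.

# Print the abbreviated commit id embedded in a release string, or nothing.
commit_from_release () {
    echo "$1" | sed -n 's#.*-g\([0-9a-f]\{8\}\)-.*#\1#p'
}

# Succeed if the release string carries a "-g<commit>-" tag.
is_test_kernel () {
    [ -n "$(commit_from_release "$1")" ]
}

# Hypothetical release strings: one test kernel, one distribution kernel.
for release in "2.6.22-g91a6c462-heidi" "2.6.26-2-amd64"; do
    if is_test_kernel "$release"; then
        echo "$release: test kernel for commit $(commit_from_release "$release")"
    else
        echo "$release: stable kernel"
    fi
done
```

On a running system the argument would be "$(uname -r)".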

As an additional speed-up, I configured libpam-usb to authenticate root and myself through a USB storage device, which is quite cool. I am still pondering whether I should keep this enabled or migrate to something like libpam-rsa[*].

Of course, the scripts contain hard-coded stuff (my user name, for one), but they can easily be modified to remove those limitations (they generally use variables).


linux-build


#!/bin/sh
# License: GPLv2+/MIT
# Author: Eddy Petrișor
#
# This script must be run from the kernel tree directory with:
# linux-build [--no-headers] [--rebuild]

FATTEMPT=../attempt

TARGETS="kernel-image kernel-headers modules_config modules"
[ "$1" = "--no-headers" ] && shift && TARGETS="$(echo $TARGETS | sed 's#kernel-headers ##')"

if [ -f "$FATTEMPT" ]
then
ATT=`cat "${FATTEMPT}"`
if [ $# -eq 0 ]
then
ATT=`expr $ATT + 1`
make-kpkg clean
else
if [ $# -eq 1 ] && [ "$1" = '--rebuild' ]
then
# nothing to do, we are already set
echo 'Preparing for rebuild'
else
echo 'Illegal parameters'
exit 2
fi
fi
else
ATT=1
fi

# no problem if it is rewritten on rebuild
echo "$ATT" >$FATTEMPT

# must define MODULE_LOC for mol module compilation
DIR=`pwd`
cd ..
MODULE_LOC=$(pwd)/modules
# this didn't work
# export ALL_PATCH_DIR=$(pwd)/linux-patches
cd ${DIR}

echo "Modules should be here: ${MODULE_LOC}"
echo "Stop by ctrl+c, if the independent modules aren't there"

# press ctrl+c, if needed -- disabled for now
#read

export MODULE_LOC
export CONCURRENCY_LEVEL=$(grep -c '^processor' /proc/cpuinfo)

[ -d .git ] && PREFIX="g$(git log --pretty=oneline --max-count=1 | cut -c 1-8)-" || PREFIX=""
APPEND=$PREFIX$(hostname)

#time make-kpkg --rootcmd fakeroot --revision ${ATT} --stem linux --append-to-version -`hostname` --config menuconfig --initrd --uc --us kernel-image kernel-headers modules_config modules
#time make-kpkg --rootcmd fakeroot --revision ${ATT} --stem linux --append-to-version -`hostname` --added-patches 'ata_piix-ich8-fix-map-for-combined-mode.patch,ata_piix-ich8-fix-native-mode-slave-port.patch' --config silentoldconfig --initrd --uc --us kernel-image kernel-headers modules_config modules
time make-kpkg --rootcmd fakeroot --revision ${ATT} --stem linux --append-to-version -$APPEND --config silentoldconfig --initrd --uc --us $TARGETS


sleepit


#!/bin/sh

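# marker files: a leftover FAILEDRESUME means the last resume failed
FAILEDRESUME=/failed-resume
RESUMED=/resumed

modprobe i915
invoke-rc.d acpid stop
# assume failure until the resume proves otherwise
echo "$(uname -r)" > $FAILEDRESUME
# save dmesg before and after suspending to RAM
dmesg >dmesg_before_$(uname -r); echo mem > /sys/power/state; dmesg >dmesg_after_$(uname -r); sync
# we only get here if the resume worked
echo 'resumed, oh my god' > resumed
echo "$(uname -r)" >> $RESUMED
rm -f $FAILEDRESUME
sync
sleep 10
reboot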



sleeptest


#!/bin/sh

RESULTSDIR=/root/var/debug/sleep/regression
UNAMER="$(uname -r)"
FAILEDSLEEPFILE=/failed-resume
RESUMED=/resumed
SOURCEDIR=/home/eddy/usr/src/linux/linux-2.6

check_same_commit ()
{
local COMMIT
COMMIT=$(git log --pretty=oneline --max-count=1 | cut -c 1-8)
[ "$COMMIT" = "$1" ] && return 0 || return 1
}

get_rev_from_unamer ()
{
echo "$1" | sed 's#.*-g\([0-9a-f]*\)-heidi#\1#'
}

mark_bad ()
{
cd $SOURCEDIR
su -c 'git reset --hard HEAD' eddy
su -c 'git bisect bad' eddy
cd -
}

mark_good ()
{
cd $SOURCEDIR
su -c 'git reset --hard HEAD' eddy
su -c 'git bisect good' eddy
cd -
}

compile_next ()
{
cd $SOURCEDIR
if [ -f $FAILEDSLEEPFILE ] ; then
LKVER=$(cat $FAILEDSLEEPFILE)
else
LKVER=$(tail -n 1 $RESUMED)
fi
PREVCOMMIT=$(get_rev_from_unamer "$LKVER")

if check_same_commit "$PREVCOMMIT" ; then
echo "It looks like you got your result!"
exit 1337 # of course $? isn't 1337, but anyways
fi

su -c 'make clean && rm -fr debian && git reset --hard HEAD && patch -p1 < [...]
# [...] the rest of compile_next and the start of the main body are missing here
[...] >>> BAD <<< $LKVER ($(get_rev_from_unamer $LKVER))"
mark_bad
else
LKVER=$(tail -n 1 $RESUMED)
echo "Marking >>> good <<< $LKVER ($(get_rev_from_unamer $LKVER))"
mark_good
fi
compile_next && \
cd $SOURCEDIR/.. && \
echo 'Installing the linux-image and running update-grub && reboot' && \
dpkg -i $(ls linux-image-*_$(cat attempt)_*.deb) && \
update-grub
fi
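
The marker-file protocol shared by sleepit and sleeptest can be summed up in a few lines. This is only a sketch: the function name verdict_from_markers, the "unknown" case and the example release strings are additions for illustration, and the demonstration uses temporary files instead of /failed-resume and /resumed.

```shell
#!/bin/sh
# Sketch of the marker-file protocol: the test run writes the kernel release
# to a "failed" marker before suspending and removes it only after a
# successful resume, appending the release to a "resumed" log instead.

# Usage: verdict_from_markers FAILED_FILE RESUMED_FILE
# Prints "bad <release>", "good <release>" or "unknown".
verdict_from_markers () {
    failed=$1
    resumed=$2
    if [ -f "$failed" ]; then
        echo "bad $(cat "$failed")"
    elif [ -s "$resumed" ]; then
        echo "good $(tail -n 1 "$resumed")"
    else
        echo "unknown"
    fi
}

# Demonstration on temporary files with hypothetical release strings.
demo_dir=$(mktemp -d)
echo "2.6.22-g91a6c462-heidi" > "$demo_dir/failed-resume"
verdict_from_markers "$demo_dir/failed-resume" "$demo_dir/resumed"
rm -f "$demo_dir/failed-resume"
echo "2.6.21-g0123abcd-heidi" >> "$demo_dir/resumed"
verdict_from_markers "$demo_dir/failed-resume" "$demo_dir/resumed"
rm -rf "$demo_dir"
```

Writing the failure marker before suspending is what makes a hung resume detectable: if the machine never comes back, the marker is still there at the next boot.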


You have my permission to use, modify and redistribute these scripts or modified versions based on these under the terms of the MIT license.


[*] because the libpam-rsa package seems to be unmaintained (especially upstream), while libpam-usb seems to be inactive (maybe it is considered finished by upstream?)

[1] I didn't automate the removal of the previous test kernel, but that could have been done easily

[2] I haven't made a custom grub section for the test kernels so that they would boot by default at the next reboot, since I considered that too cumbersome for the moment (although I had /vmlinuz symlinks) and it was simpler to select the kernel manually

Thursday 18 June 2009

Howto: transitioning to grub2 from lilo (LVM)

After failing to boot my self-compiled kernel, I came to the conclusion that, in order to debug the initramfs issues I might have, I needed an easy way to pass different break=* parameters on the kernel command line and identify which part of the boot within the initramfs goes awry.

Since I was stuck with lilo, and booting with different parameters is a pain when using it, I decided it was time to install and start using grub-pc, aka grub2.


Documentation on migrating from lilo (or even grub 1) is lacking, even more so for cases where the /boot directory is on LVM, like mine. IMHO, one of the most important things during a migration is never to lose the ability to boot your system properly, which means I must not write the grub code into the MBR unless I was 100% sure I would be able to boot with grub at the next reboot.


I tried quite a few approaches (installing grub in a partition, trying to create a real /boot partition), but they all failed one way or another, so I will not describe any of those failing methods.


What I found to work was to:
  • install the grub-pc package,
  • create its configuration and set it up (creates the module files in /boot/grub and /boot/grub/grub.cfg),
  • then install the boot code on an external USB stick, with nothing else written to the stick (so no data is lost from the stick);
  • after successfully booting from the USB stick, install the boot image on the internal harddisk's MBR
  • reboot and be happy with grub2
Note that, although I run lenny, I decided to use sid's grub package since the little information on grub2 that existed pointed to grub-mkconfig which doesn't exist in lenny's grub-common package. Due to some bug I found earlier at work when trying to migrate my workstation, I knew the squeeze version wasn't good either because it failed to properly boot systems with / on LVM. The earliest version that works is 1.96+20090603-2 and installs without problems directly from sid on a lenny system.

Now let's go over the steps in more detail*.

Creating the configuration is a matter of creating the device.map file and grub.cfg:
update-grub


Create a core.img file with support for lvm:
grub-mkimage --output=/boot/grub/core.img ext2 chain pc gpt biosdisk lvm


Install on the MBR of the USB stick (making sure lvm will be visible):
grub-install --modules="pc ext2 biosdisk gpt chain lvm" /dev/sdb


Try to boot at least one kernel from the stick. If it fails, you probably didn't add the proper modules (see the possible module names in /usr/lib/grub/i386-pc and add the necessary ones to the modules list). If you're dropped into the GRUB rescue prompt, use ls and set to check whether the root and prefix variables are set correctly and whether the lvm root is visible.
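
Before writing anything to a disk, a quick check that every module in the list really exists can save a reboot. The sketch below assumes the modules are stored as <name>.mod files in the directory mentioned above; the function name missing_grub_modules is hypothetical.

```shell
#!/bin/sh
# Sketch: check that every module named on the command line exists as
# <name>.mod in the given GRUB module directory.

# Usage: missing_grub_modules DIR MODULE...
# Prints the names of modules that were not found; succeeds if none is missing.
missing_grub_modules () {
    dir=$1
    shift
    missing=0
    for mod in "$@"; do
        if [ ! -f "$dir/$mod.mod" ]; then
            echo "$mod"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example call (prints nothing if all the modules are present):
# missing_grub_modules /usr/lib/grub/i386-pc pc ext2 biosdisk gpt chain lvm
```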

If you manage to boot, you're ready to replace lilo or your old boot loader.


Install on the internal disk MBR:
grub-install --modules="pc ext2 biosdisk gpt chain lvm" /dev/sda


Reboot and be happy.



Please note that you might have added some optional parameters in your old lilo.conf which aren't present in the new /boot/grub/grub.cfg file and you might want to add those. If you want to add some option to all the images, you'll probably want to append those parameters to the value of this variable from /etc/default/grub:

GRUB_CMDLINE_LINUX="quiet"

For instance, since I want to find when sleep stopped working for my machine, I changed mine into:

GRUB_CMDLINE_LINUX="acpi_sleep=beep ec_intr=0"


After that, run update-grub once more to propagate the changes into /boot/grub/grub.cfg.
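
If you prefer to script the change rather than edit the file by hand, something along these lines could work. It is a sketch under a few assumptions: the function name add_cmdline_params is made up, the variable is expected on a single line with a double-quoted value, and the result is printed to standard output instead of being written back, so the real /etc/default/grub is never touched.

```shell
#!/bin/sh
# Sketch: print a copy of a /etc/default/grub style file with extra
# parameters appended to the GRUB_CMDLINE_LINUX value.

# Usage: add_cmdline_params FILE PARAM...
# The parameters must not contain the characters | & \ or ".
add_cmdline_params () {
    conf=$1
    shift
    sed "s|^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"|\1 $*\"|" "$conf"
}

# Demonstration on a temporary file rather than the real /etc/default/grub.
demo_conf=$(mktemp)
echo 'GRUB_CMDLINE_LINUX="quiet"' > "$demo_conf"
add_cmdline_params "$demo_conf" acpi_sleep=beep ec_intr=0
rm -f "$demo_conf"
```

Review the printed result before copying it over the original file, then run update-grub as usual.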





* skipping package installation; just use aptitude's text user interface and install grub-pc and grub-common from sid; depending on the system you might need to install grub-efi, grub-ieee1275 or even grub-linuxbios instead of grub-pc

Wednesday 17 June 2009

[help] kernel: same config as debian, but mine doesn't boot

Update2: I finally managed to figure out what was wrong. The pristine kernel was missing the patch ata_piix-ich8-fix-native-mode-slave-port.patch, which I got from the linux-patch-debian-2.6.18_2.6.18.dfsg.1-24etch2_all.deb package.

Fortunately the package was still available from oldstable, but I wonder when snapshot.d.[no] will return.

Now I can get back to finding the commit that broke sleep, although I suspect I am searching for 9666f40:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9666f400





Update: I managed to install grub2 and the initramfs fails to find the root because it can't find the "st" volume group. Since I compiled in the kernel the support for LVM, it was clear that any issue that might have appeared was due to the initramfs.

All this seems to be due to this:

(initramfs) dmsetup ls
/proc/misc: No entry for device-mapper found
Is device-mapper driver missing from kernel?
Failure to communicate with kernel device-mapper driver.
/proc/misc: No entry for device-mapper found
Is device-mapper driver missing from kernel?
Incompatible libdevmapper 1.0.2.27 (2008-06-25)(compat) and kernel driver
Command failed

So now I just have to figure out how to patch the old kernels to work or how to install an older libdevmapper.




My laptop doesn't resume properly from sleep (hibernate/resume works), even though it did work at some point in the past with 2.6.18 (at least the one in Debian Etch worked, kind of).

In an attempt to git bisect and find the commit responsible for the regression, I tried to compile the vanilla 2.6.18 Linux kernel with the same configuration as the Debian Etch kernel (apart from minor differences), but I was surprised to see that my make-kpkg compiled kernel didn't boot.

The differences are:

--- config-2.6.18-6-amd64    2009-06-17 00:57:56.000000000 +0300
+++ /boot/config-2.6.18-heidi 2009-06-17 10:36:49.000000000 +0300
@@ -1,7 +1,7 @@
#
# Automatically generated make config: don't edit
-# Linux kernel version: 2.6.18
-# Thu Dec 25 21:04:29 2008
+# Linux kernel version: 2.6.18-heidi
+# Wed Jun 17 10:36:49 2009
#
CONFIG_X86_64=y
CONFIG_64BIT=y
@@ -1045,7 +1045,7 @@
CONFIG_IDE_GENERIC=m
CONFIG_BLK_DEV_CMD640=y
# CONFIG_BLK_DEV_CMD640_ENHANCED is not set
-CONFIG_BLK_DEV_IDEPNP=m
+CONFIG_BLK_DEV_IDEPNP=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
# CONFIG_BLK_DEV_OFFBOARD is not set
@@ -1069,7 +1069,6 @@
CONFIG_BLK_DEV_HPT34X=m
# CONFIG_HPT34X_AUTODMA is not set
CONFIG_BLK_DEV_HPT366=m
-CONFIG_BLK_DEV_JMICRON=m
CONFIG_BLK_DEV_SC1200=m
CONFIG_BLK_DEV_PIIX=m
CONFIG_BLK_DEV_IT821X=m
@@ -1144,7 +1143,6 @@
CONFIG_AIC79XX_DEBUG_ENABLE=y
CONFIG_AIC79XX_DEBUG_MASK=0
CONFIG_AIC79XX_REG_PRETTY_PRINT=y
-CONFIG_SCSI_ARCMSR=m
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=m
CONFIG_MEGARAID_MAILBOX=m
@@ -1360,6 +1358,7 @@
CONFIG_ADAPTEC_STARFIRE_NAPI=y
CONFIG_B44=m
CONFIG_FORCEDETH=m
+CONFIG_DGRS=m
CONFIG_EEPRO100=m
CONFIG_E100=m
CONFIG_FEALNX=m
@@ -1418,6 +1417,7 @@
#
CONFIG_TR=y
CONFIG_IBMOL=m
+CONFIG_3C359=m
CONFIG_TMS380TR=m
CONFIG_TMSPCI=m
CONFIG_ABYSS=m
@@ -2088,7 +2088,6 @@
CONFIG_SENSORS_ATXP1=m
CONFIG_SENSORS_DS1621=m
CONFIG_SENSORS_F71805F=m
-# CONFIG_SENSORS_F75375S is not set
CONFIG_SENSORS_FSCHER=m
CONFIG_SENSORS_FSCPOS=m
CONFIG_SENSORS_GL518SM=m
@@ -2116,7 +2115,6 @@
CONFIG_SENSORS_W83781D=m
CONFIG_SENSORS_W83791D=m
CONFIG_SENSORS_W83792D=m
-CONFIG_SENSORS_W83793=m
CONFIG_SENSORS_W83L785TS=m
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
@@ -2350,6 +2348,7 @@
CONFIG_VIDEO_BTCX=m
CONFIG_VIDEO_IR=m
CONFIG_VIDEO_TVEEPROM=m
+CONFIG_USB_DABUSB=m

#
# Graphics support
@@ -2737,6 +2736,19 @@
CONFIG_USB_SERIAL_GARMIN=m
CONFIG_USB_SERIAL_IPW=m
CONFIG_USB_SERIAL_KEYSPAN_PDA=m
+CONFIG_USB_SERIAL_KEYSPAN=m
+# CONFIG_USB_SERIAL_KEYSPAN_MPR is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA28 is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA28X is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA28XA is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA28XB is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA19 is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA18X is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA19W is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA19QW is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA19QI is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA49W is not set
+# CONFIG_USB_SERIAL_KEYSPAN_USA49WLC is not set
CONFIG_USB_SERIAL_KLSI=m
CONFIG_USB_SERIAL_KOBIL_SCT=m
CONFIG_USB_SERIAL_MCT_U232=m
@@ -2756,6 +2768,8 @@
#
# USB Miscellaneous drivers
#
+CONFIG_USB_EMI62=m
+CONFIG_USB_EMI26=m
CONFIG_USB_AUERSWALD=m
CONFIG_USB_RIO500=m
CONFIG_USB_LEGOTOWER=m
@@ -3002,7 +3016,6 @@
CONFIG_ADFS_FS=m
# CONFIG_ADFS_FS_RW is not set
CONFIG_AFFS_FS=m
-# CONFIG_ASFS_FS is not set
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
CONFIG_BEFS_FS=m
@@ -3201,6 +3214,7 @@
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_SECURITY_CAPABILITIES=y
# CONFIG_SECURITY_ROOTPLUG is not set
+CONFIG_SECURITY_SECLVL=m
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=0
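
A comparison that ignores the header comments and the order of the options is sometimes easier to read than a plain unified diff. The sketch below is one way to get it; the function name config_diff is made up, and it only looks at lines that set or unset a CONFIG_ option.

```shell
#!/bin/sh
# Sketch: compare two kernel .config files option by option, ignoring the
# header comments (version, timestamp) and the order of the options.

# Usage: config_diff OLD_CONFIG NEW_CONFIG
# Lines starting with "<" exist only in OLD_CONFIG, ">" only in NEW_CONFIG.
config_diff () {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    grep -E '^(# )?CONFIG_' "$1" | sort > "$old_sorted"
    grep -E '^(# )?CONFIG_' "$2" | sort > "$new_sorted"
    diff "$old_sorted" "$new_sorted" | grep '^[<>]'
    rm -f "$old_sorted" "$new_sorted"
}

# Example call with the two files compared above:
# config_diff config-2.6.18-6-amd64 /boot/config-2.6.18-heidi
```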

The initramfs simply stopped at an early point with some errors which look really weird, considering that Debian's kernel shouldn't be that different from mine (transcribed from a photo):


Loading, please wait...
unknown keysym 'endash'
/etc/boottime.kmap.gz:23: syntax error
syntax error in map file
key bindings not changed
usb 1-2: device descriptor read/all, error -84
ata_piix 0000:00:1f.2: invalid MAP value 2
resume: libcrypt version: 1.4.1
resume: Could not stat the resume device file '/dev/sda5'
Please type in the full path name to try again
or press ENTER to boot the system:

I suspect the key map error, the usb error and the resume error to be unrelated to the boot problem.

For some reason I suspect the ata_piix error to be related.


After pressing Enter, more messages appeared (again transcribed from a photo):

mount: mounting /dev/root on /root failed: No such device
mount: mounting /dev on /root/dev failed: No such device or directory
mount: mounting /sys on /root/sys failed: No such device or directory
mount: mounting /proc on /root/proc failed: No such device or directory
Target filesystem doesn't have /sbin/init.
No init found. Try passing init= bootarg.


(BusyBox prompt follows here).


I looked over the net for some hints, but I wasn't able to find a solution.


Since I am forced to use Lilo (/boot on LVM) and I didn't manage to make grub-pc work on this system, I am kind of stuck and don't know what to do to make the damn kernel boot.

I am running Debian Lenny, but I am willing to backport a few packages, if necessary.

Help would be really appreciated.

Thursday 4 June 2009

Solution: E: Cannot get debconf version. Is debconf installed?

If you ever get this error when running apt-get or aptitude:

E: Cannot get debconf version. Is debconf installed?

then go to /var/lib/dpkg/ and make sure the files status, available and diversions are not empty. If they are, copy the corresponding *-old file over the empty file and be happy.


This is how they looked in a cowbuilder chroot of mine which refused to build packages (the important zeroed files are available, diversions and status):

root@twix:/# cd /var/lib/dpkg/
root@twix:/var/lib/dpkg# ls -l
total 280
drwxr-xr-x 2 root root 4096 Apr 13 2008 alternatives
-rw-r--r-- 2 root root 0 Jun 3 13:46 available
-rw-r--r-- 2 root root 99608 Jun 28 2008 available-old
-rw-r--r-- 2 root root 0 Jun 3 13:46 diversions
-rw-r--r-- 2 root root 2501 Feb 28 2008 diversions-old
drwxr-xr-x 2 root root 32768 Apr 13 2008 info
-rw-r----- 2 root root 0 Dec 10 20:27 lock
drwxr-xr-x 5 root root 4096 May 26 2005 methods
drwxr-xr-x 2 root root 4096 May 26 2005 parts
-rw-r--r-- 2 root root 47 Feb 28 2008 statoverride
-rw-r--r-- 2 root root 0 Feb 28 2008 statoverride-old
-rw-r--r-- 2 root root 0 Jun 3 13:46 status
-rw-r--r-- 2 root root 115966 Jun 28 2008 status-old
drwxr-xr-x 2 root root 4096 Jun 29 2008 updates


Hmm, that looks fixable ...

root@twix:/var/lib/dpkg# cp available-old available
root@twix:/var/lib/dpkg# cp diversions-old diversions
root@twix:/var/lib/dpkg# cp status-old status
root@twix:/var/lib/dpkg# ls -l
total 508
drwxr-xr-x 2 root root 4096 Apr 13 2008 alternatives
-rw-r--r-- 1 root root 99608 Jun 4 16:10 available
-rw-r--r-- 2 root root 99608 Jun 28 2008 available-old
-rw-r--r-- 1 root root 2501 Jun 4 16:10 diversions
-rw-r--r-- 2 root root 2501 Feb 28 2008 diversions-old
drwxr-xr-x 2 root root 32768 Apr 13 2008 info
-rw-r----- 2 root root 0 Dec 10 20:27 lock
drwxr-xr-x 5 root root 4096 May 26 2005 methods
drwxr-xr-x 2 root root 4096 May 26 2005 parts
-rw-r--r-- 2 root root 47 Feb 28 2008 statoverride
-rw-r--r-- 2 root root 0 Feb 28 2008 statoverride-old
-rw-r--r-- 1 root root 115966 Jun 4 16:10 status
-rw-r--r-- 2 root root 115966 Jun 28 2008 status-old
drwxr-xr-x 2 root root 4096 Jun 29 2008 updates


Now it works :-)
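
The same fix can be wrapped in a small function. This is a sketch: the name restore_dpkg_db is made up, a missing file is treated like an empty one (an assumption on top of what happened here), and the demonstration runs on a scratch directory, not on the real /var/lib/dpkg.

```shell
#!/bin/sh
# Sketch: restore empty dpkg database files from their *-old backups.

# Usage: restore_dpkg_db [DIR]   (DIR defaults to /var/lib/dpkg)
restore_dpkg_db () {
    dir=${1:-/var/lib/dpkg}
    for name in status available diversions; do
        # Only act when the file is missing or empty and a non-empty backup exists.
        if [ ! -s "$dir/$name" ] && [ -s "$dir/$name-old" ]; then
            cp "$dir/$name-old" "$dir/$name" && echo "restored $name"
        fi
    done
}

# Demonstration on a scratch directory instead of the real database.
demo_db=$(mktemp -d)
: > "$demo_db/status"
echo "Package: demo" > "$demo_db/status-old"
restore_dpkg_db "$demo_db"
rm -rf "$demo_db"
```

Keep in mind that the *-old files may be older than the last package operation, so the restored state can lag behind what is actually installed.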