
While setting up a raw disk image for use in a QEMU-based virtual machine, I became frustrated because QEMU would load GRUB, but GRUB would not then show a menu of OSes to boot into. I've concluded that GRUB must be unable to locate the grub.cfg file, which leads me to believe it encoded something incorrectly in the post-MBR gap. Do any tools exist to inspect the contents of this gap?


Here's how I'm installing GRUB into the VM image:

# Disk Image
fallocate -l $((4*1024*1024*1024)) "$file"
DEV=$(sudo losetup --show --nooverlap --find "$file")

# Partition Table
sudo parted "$DEV" mklabel msdos
sudo parted "$DEV" mkpart primary fat16 1MiB 101MiB 
sudo parted "$DEV" mkpart primary ext4 102MiB 100%
sudo parted "$DEV" set 1 boot on
sudo mkfs.vfat "${DEV}p1"
sudo mkfs.ext4 -E lazy_journal_init=1,lazy_itable_init=1,discard "${DEV}p2"

# Mounting, installing base packages, configuration, etc..
# ...

# Bootloader
sudo mkdir "$mountpoint/boot/grub"
sudo install "$grub_default_file" "$mountpoint/etc/default/grub"
sudo arch-chroot "$mountpoint" grub-install --boot-directory="/boot/grub" --target=i386-pc "$DEV"
sudo arch-chroot "$mountpoint" grub-mkconfig -o "/boot/grub/grub.cfg"

The $grub_default_file has only minor modifications, which turn on serial output so that I can look around from QEMU's serial console.

GRUB_CMDLINE_LINUX="quiet console=tty0 console=ttyS0,38400n8"
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --speed=38400 --unit=0 --word=8 --parity=no --stop=1"

Here's what I've verified so far:

  1. The MBR contains the string "GRUB", suggesting that grub has installed itself onto the disk image
  2. The partition table has a sufficiently large post-MBR gap
  3. The first partition has the boot flag set
  4. The first partition is a vfat filesystem
  5. The first partition contains /grub/grub.cfg and related files
  6. The grub.cfg file contains the correct UUIDs for the first and second partition

The only link I can't really verify is grub locating the partition containing the configuration file. Maybe I set the wrong boot flag. Maybe I chose the wrong filesystem type. Maybe grub encoded the location of that partition incorrectly in the MBR/post-MBR gap. It's quite hard to debug.
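As a rough sketch of one way to look at the gap by itself: the helper below skips sector 0, so a match cannot come from the MBR's own boot code. It assumes 512-byte sectors and a first partition starting at sector 2048 (what parted reports for this image); `dump_gap` is a name invented for this example.

```shell
# Write the raw bytes of the post-MBR gap of a disk image to stdout.
# Assumptions: 512-byte sectors; the first partition starts at sector 2048
# unless a different start sector is given as the second argument.
dump_gap() {
    image=$1
    first_partition_sector=${2:-2048}
    # Skip sector 0 (the MBR) and stop just before the first partition.
    dd if="$image" bs=512 skip=1 count=$((first_partition_sector - 1)) 2>/dev/null
}

# Example usage (needs strings from binutils, or hexdump):
#   dump_gap "$file" | strings | grep -i grub
#   dump_gap "$file" | hexdump -C | less
```

As far as I understand, most of core.img is stored compressed on i386-pc, so the embedded prefix may not show up as plain text even when it is correct.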

$ sudo dd if=zonemanager bs=$((2048*512)) count=1 | strings | grep -i grub
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00742499 s, 141 MB/s
GRUB 
$ sudo parted /dev/loop1
GNU Parted 3.2
Using /dev/loop1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) unit s                                                           
(parted) p                                                                
Model: Loopback device (loopback)
Disk /dev/loop1: 8388608s
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start    End       Size      Type     File system  Flags
 1      2048s    206847s   204800s   primary  fat16        boot, lba
 2      208896s  8388607s  8179712s  primary  ext4
$ sudo file -s /dev/loop1
/dev/loop1: DOS/MBR boot sector
$ sudo file -s /dev/loop1p1
/dev/loop1p1: DOS/MBR boot sector, code offset 0x3c+2, OEM-ID "mkfs.fat", sectors/cluster 4, reserved sectors 4, root entries )
$ sudo lsblk -f
NAME               FSTYPE      LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
loop1                                                                                      
├─loop1p1          vfat              58D5-B48F                                45.1M    55% /mnt/boot
└─loop1p2          ext4              0014f737-33b7-4dba-be4a-2b186e2e46a0      2.1G    39% /mnt

$ grep 58D5-B48F /mnt/boot/grub/grub.cfg 
      search --no-floppy --fs-uuid --set=root  58D5-B48F
      search --no-floppy --fs-uuid --set=root 58D5-B48F
          search --no-floppy --fs-uuid --set=root  58D5-B48F
          search --no-floppy --fs-uuid --set=root 58D5-B48F
          search --no-floppy --fs-uuid --set=root  58D5-B48F
          search --no-floppy --fs-uuid --set=root 58D5-B48F
$ grep 0014f737-33b7-4dba-be4a-2b186e2e46a0 /mnt/boot/grub/grub.cfg 
menuentry 'Arch Linux' --class arch --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-0014f737-33b7-4dba-be4a-2b186e2e46a0' {
    linux   /vmlinuz-linux root=UUID=0014f737-33b7-4dba-be4a-2b186e2e46a0 rw quiet console=tty0 console=ttyS0,38400n8 quiet
submenu 'Advanced options for Arch Linux' $menuentry_id_option 'gnulinux-advanced-0014f737-33b7-4dba-be4a-2b186e2e46a0' {
    menuentry 'Arch Linux, with Linux linux' --class arch --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-linux-advanced-0014f737-33b7-4{
        linux   /vmlinuz-linux root=UUID=0014f737-33b7-4dba-be4a-2b186e2e46a0 rw quiet console=tty0 console=ttyS0,38400n8 quiet
    menuentry 'Arch Linux, with Linux linux (fallback initramfs)' --class arch --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-linux-fal{
        linux   /vmlinuz-linux root=UUID=0014f737-33b7-4dba-be4a-2b186e2e46a0 rw quiet console=tty0 console=ttyS0,38400n8 quiet

Mistake #1: grub-install --boot-directory should have been /boot and not /boot/grub.
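To spell out why that matters: grub-install places its files in a `grub` subdirectory of whatever `--boot-directory` names, while grub-mkconfig writes wherever `-o` points. A small illustration, with `expected_grub_cfg` being a hypothetical helper rather than anything GRUB ships:

```shell
# Print the path where GRUB will expect grub.cfg for a given --boot-directory.
expected_grub_cfg() {
    boot_dir=${1%/}   # drop one trailing slash, if any
    printf '%s/grub/grub.cfg\n' "$boot_dir"
}

expected_grub_cfg /boot/grub   # prints /boot/grub/grub/grub.cfg (not where grub-mkconfig wrote)
expected_grub_cfg /boot        # prints /boot/grub/grub.cfg (matches the -o path used above)

# Corrected install step from the script above:
#   sudo arch-chroot "$mountpoint" grub-install --boot-directory="/boot" --target=i386-pc "$DEV"
```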


Current Investigation #1: I compared the grub.cfg to that of a known working QEMU VM. Other than the UUIDs of the devices, there are only two differences:

  1. set root=(hd0,1)
  2. Extra arguments to search

WORKING

set root='hd0,msdos1'
if [ x$feature_platform_search_hint = xy ]; then
    search --no-floppy --fs-uuid --set=root --hint-ieee1275='ieee1275//disk@0,msdos1' --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  0959-F5DD

NON-WORKING

if [ x$feature_platform_search_hint = xy ]; then
    search --no-floppy --fs-uuid --set=root  ECB4-BE7A
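
A comparison like this is easier when the UUIDs are masked first, so that diff only reports structural differences. A minimal sketch, assuming a sed that supports `-E`; `mask_uuids` is a made-up name and the file names in the usage comment are placeholders:

```shell
# Replace filesystem identifiers on stdin with fixed placeholders.
# The second pattern is a heuristic for FAT volume IDs (XXXX-XXXX, uppercase hex)
# and could also match other text of that shape.
mask_uuids() {
    sed -E \
        -e 's/[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}/<UUID>/g' \
        -e 's/[0-9A-F]{4}-[0-9A-F]{4}/<FATID>/g'
}

# Example usage:
#   mask_uuids < working-grub.cfg > working.masked
#   mask_uuids < broken-grub.cfg  > broken.masked
#   diff working.masked broken.masked
```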

Current Investigation #2: From the grub shell, I can locate the kernel and initramfs and boot into early userspace.


                             GNU GRUB  version 2.02

   Minimal BASH-like line editing is supported. For the first word, TAB   
   lists possible command completions. Anywhere else TAB lists possible   
   device or file completions.                                            


grub> set pager=1
grub> echo $feature_platform_search_hint
y
grub> ls
(hd0) (hd0,msdos2) (hd0,msdos1) (fd0) 
grub> ls (hd0,msdos1)/
vmlinuz-linux initramfs-linux.img initramfs-linux-fallback.img grub/ 
grub> set root=(hd0,msdos1)
grub> linux /vmlinuz-linux root=UUID=0014f737-33b7-4dba-be4a-2b186e2e46a0  rw quiet console=tty0 console=ttyS0,38400n8 quiet
grub> initrd /initramfs-linux.img
grub> boot
Starting version 242.32-2-arch

Further, using the configfile command actually loads the menu!

grub> configfile /grub/grub.cfg

So this leads me to believe that grub can't find the config file for some reason.


Mistake #2: Boot the right damn image file.

To eliminate a variable between the working and non-working VMs, I had converted the non-working VM from a raw disk image to a qcow2 image. I had been booting that qcow2 image for the last few hours because I never reverted the systemd unit. Everything after "Mistake #1" was a red herring. I'm going to leave it up, though, as it's a good learning aid.

Huckle
  • add `--verbose` to `grub-install`, does it say anything? – frostschutz Jul 07 '19 at 07:33
  • you can also pass `-kernel`, `-initrd` and `-append` parameters to qemu directly and then install grub from within the VM itself w/o chroot – frostschutz Jul 07 '19 at 07:34
  • @frostschutz Adding the `--verbose` helped me locate one mistake (passing `/boot/grub` instead of `/boot` to `grub-install`), but I still end up in the grub shell. See updated investigations. My hunch is that the config file is fine, but grub can't find it for some reason. – Huckle Jul 07 '19 at 19:25
  • Oh, and the reason I'm not just booting a live CD to install grub is because I'm scripting the VM creation / OS installation. – Huckle Jul 07 '19 at 19:37
  • @frostschutz PEBKAC – Huckle Jul 07 '19 at 19:47
  • I totally missed that too. It happens... :-) – frostschutz Jul 07 '19 at 19:59

0 Answers