Re: [PATCH v3] docs: Add debugging chapter to development documentation

grub-devel

From:	Oskari Pirhonen
Subject:	Re: [PATCH v3] docs: Add debugging chapter to development documentation
Date:	Thu, 15 Jun 2023 01:21:57 -0500
Oops, apologies for the late reply. Reading through it again, I found a
few more small nits:

On Tue, Jun 06, 2023 at 00:48:39 -0500, Glenn Washburn wrote:
> Debugging GRUB can be tricky and require arcane knowledge. This will
> help those unfamiliar with the process to get started debugging GRUB
> with less effort.
> 
> Signed-off-by: Glenn Washburn <development@efficientek.com>
> ---
> Changes from v1:
>  * Add gdbinfo section
> ---
> Interdiff against v2:
>   diff --git a/docs/grub-dev.texi b/docs/grub-dev.texi
>   index 188ca9c7ca6e..72470b42c61a 100644
>   --- a/docs/grub-dev.texi
>   +++ b/docs/grub-dev.texi
>   @@ -638,7 +638,7 @@ various targets using @command{gdb} and the @samp{gdb_grub} GDB script.
>    @section i386-pc
>    
>    The i386-pc target is a good place to start when first debugging GRUB2
>   -because in some respects its easier than EFI platforms. The reason being
>   +because in some respects it's easier than EFI platforms. The reason being
>    that the initial load address is always known in advance. To start
>    debugging GRUB2 first QEMU must be started in GDB stub mode. The following
>    command is a simple illustration:
>   @@ -688,11 +688,11 @@ it does add the module symbols with the appropriate offset.
>    @section x86_64-efi
>    
>    Using GDB to debug GRUB2 for the x86_64-efi target has some similarities with
>   -the i386-pc target. Please read be familiar with the @ref{i386-pc} section
>   -when reading this one. Extra care must be used to run QEMU such that it boots
>   -a UEFI firmware. This usually involves either using the @samp{-bios} option
>   -with a UEFI firmware blob (eg. @file{OVMF.fd}) or loading the firmware via
>   -pflash. This document will not go further into how to do this as there are
>   +the i386-pc target. Please read and familiarize yourself with the @ref{i386-pc}
>   +section when reading this one. Extra care must be used to run QEMU such that it
>   +boots a UEFI firmware. This usually involves either using the @samp{-bios}
>   +option with a UEFI firmware blob (eg. @file{OVMF.fd}) or loading the firmware
>   +via pflash. This document will not go further into how to do this as there are
>    ample resource on the web.
>    
>    Like all EFI implementations, on x86_64-efi the (U)EFI firmware that loads
>   @@ -700,7 +700,7 @@ the GRUB2 EFI application determines at runtime where the application will
>    be loaded. This means that we do not know where to tell GDB to load the
>    symbols for the GRUB2 core until the (U)EFI firmware determines it. There are
>    two good ways of figuring this out when running in QEMU: use a @ref{OVMF debug log,
>   -debug build of OVMF} and check the debug log or have GRUB2 say where it is
>   +debug build of OVMF} and check the debug log, or have GRUB2 say where it is
>    loaded. Neither of these are ideal because they both generally give the
>    information after GRUB2 is already running, which makes debugging early boot
>    infeasible. Technically, the first method does give the load address before
>   @@ -734,11 +734,11 @@ application must be run via QEMU at least once prior in order to get the
>    load address. Two methods for obtaining the load address are described in
>    two subsections below. Generally speaking, the load address does not change
>    between QEMU runs. There are exceptions to this, namely that different
>   -GRUB2 EFI applications can be run at different addresses. Also, its been
>   +GRUB2 EFI applications can be run at different addresses. Also, it has been
>    observed that after running the EFI application for the first time, the
>    second run will some times have a different load address, but subsequent
>    runs of the same EFI application will have the same load address as the
>   -second run. And its a near certainty that if the GRUB EFI binary has changed,
>   +second run. And it's a near certainty that if the GRUB EFI binary has changed,
>    eg. been recompiled, the load address will also be different.
>    
>    This ability to predict what the load address will be allows one to assume
>   @@ -752,7 +752,7 @@ gdb -x gdb_grub -ex 'dynamic_load_symbols @var{address of .text section}'
>    @end example
>    
>    If you load the symbols in this manner and, after continuing execution, do
>   -not see output showing the loading of modules symbol, then its very likely
>   +not see output showing the loading of modules symbol, then it is very likely
>    that the load address was incorrect.
>    
>    Another thing to be aware of is how the loading of the GRUB image by the
>   @@ -760,8 +760,8 @@ firmware affects previously set software breakpoints. On x86 platforms,
>    software breakpoints are implemented by GDB by writing a special processor
>    instruction at the location of the desired breakpoint. This special instruction
>    when executed will stop the program execution and hand control to the
>   -debugger, GDB. GDB will first saves the instruction bytes that will be
>   -overwritten at the breakpoint, and will put them back when the breakpoint
>   +debugger, GDB. GDB will first save the instruction bytes that are
>   +overwritten at the breakpoint and will put them back when the breakpoint
>    is hit. If GRUB is being run for the first time in QEMU, the firmware will
>    be loading the GRUB image into memory where every byte is already set to 0.
>    This means that if a breakpoint is set before GRUB is loaded, GDB will save
> 
>  docs/grub-dev.texi | 224 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 224 insertions(+)
> 
> diff --git a/docs/grub-dev.texi b/docs/grub-dev.texi
> index 31eb99ea2994..72470b42c61a 100644
> --- a/docs/grub-dev.texi
> +++ b/docs/grub-dev.texi
> @@ -79,6 +79,7 @@ This edition documents version @value{VERSION}.
>  * Contributing Changes::
>  * Setting up and running test suite::
>  * Updating External Code::
> +* Debugging::
>  * Porting::
>  * Error Handling::
>  * Stack and heap size::
> @@ -595,6 +596,229 @@ cp minilzo-2.10/*.[hc] grub-core/lib/minilzo
>  rm -r minilzo-2.10*
>  @end example
>  
> +@node Debugging
> +@chapter Debugging
> +
> +GRUB2 can be difficult to debug because it runs on the bare-metal and thus
> +does not have the debugging facilities normally provided by an operating
> +system. This chapter aims to provide useful information on some ways to
> +debug GRUB2 for some architectures. It by no means intends to be exhaustive.
> +The focus will be one x86_64 and i386 architectures. Luckily for some issues
> +virtual machines have made the ability to debug GRUB2 much easier, and this
> +chapter will focus debugging via the QEMU virtual machine. We will not be
> +going over debugging of the userland tools (eg. grub-install), there are
> +many tutorials on debugging programs in userland.
> +
> +You will need GDB and the QEMU binaries for your system, on Debian these
> +can be installed with the @samp{gdb} and @samp{qemu-system-x86} packages.
> +Also it is assumed that you have already successfully compiled GRUB2 from
> +source for the target specified in the section below and have some
> +familiarity with GDB. When GRUB2 is built it will create many different
> +binaries. The ones of concern will be in the @file{grub-core}
> +directory of the GRUB2 build dir. To aide in debugging we will want the
> +debugging symbols generated during the build because these symbols are not
> +kept in the binaries which get installed to the boot location. The build
> +process outputs two sets of binaries, one without symbols which gets executed
> +at boot, and another set of ELF images with debugging symbols. The built
> +images with debugging symbols will have a @file{.image} suffix, and the ones
> +without a @file{.img} suffix. Similarly, loadable modules with debugging
> +symbols will have a @file{.module} suffix, and ones without a @file{.mod}
> +suffix. In the case of the kernel the binary with symbols is named
> +@file{kernel.exec}.
> +
> +In the following sections, information will be provided on debugging on
> +various targets using @command{gdb} and the @samp{gdb_grub} GDB script.
> +
> +@menu
> +* i386-pc::
> +* x86_64-efi::
> +@end menu
> +
> +@node i386-pc
> +@section i386-pc
> +
> +The i386-pc target is a good place to start when first debugging GRUB2
> +because in some respects it's easier than EFI platforms. The reason being
> +that the initial load address is always known in advance. To start
> +debugging GRUB2 first QEMU must be started in GDB stub mode. The following
> +command is a simple illustration:
> +
> +@example
> +qemu-system-i386 -drive file=disk.img,format=raw \
> +    -device virtio-scsi-pci,id=scsi0 -S -s
> +@end example
> +
> +This will start a QEMU instance booting from @file{disk.img}. It will pause
> +at start waiting for a GDB instance to attach to it. You should change
> +@file{disk.img} to something more appropriate. A block device can be used,
> +but you may need to run QEMU as a privileged user.
> +
> +To connect to this QEMU instance with GDB, the @code{target remote} GDB
> +command must be used. We also need to load a binary image, preferably with
> +symbols. This can be done using the GDB command @code{file kernel.exec}, if
> +GDB is started from the @file{grub-core} directory in the GRUB2 build
> +directory. GRUB2 developers have made this more simple by including a GDB
> +script which does much of the setup. This file at @file{grub-core/gdb_grub}
> +of the build directory and is also installed via @command{make install}.

This sentence is definitely missing an "is" or similar, but I'd write
something like:

    This file is at grub-core/gdb_grub in the build directory

> +If not building GRUB, the distribution may have a package which installs
> +this GDB script along with debug symbol binaries, such as Debian's
> +@samp{grub-pc-dbg} package. The GDB scripts is intended to by used

If it's just a single script, this should be:

    The GDB script is intended to be used

> +like so, assuming:

Did you forget to state what the assumption is?

> +
> +@example
> +cd $(dirname /path/to/script/gdb_grub)
> +gdb -x gdb_grub
> +@end example
> +
> +Once GDB has been started with the @file{gdb_grub} script it will
> +automatically connect to the QEMU instance. You can then do things you
> +normally would in GDB like set a break point on @var{grub_main}.
> +
> +Setting breakpoints in modules is trickier since they haven't been loaded
> +yet and are loaded at addresses determined at runtime. The module could be
> +loaded to different addresses in different QEMU instances. The debug symbols
> +in the modules @file{.module} binary, thus are always wrong, and GDB needs
> +to be told where to load the symbols to. But this must happen at runtime
> +after GRUB2 has determined where the module will get loaded. Luckily the
> +@file{gdb_grub} script takes care of this with the @command{runtime_load_module}
> +command, which configures GDB to watch for GRUB2 module loading and when
> +it does add the module symbols with the appropriate offset.
> +
> +@node x86_64-efi
> +@section x86_64-efi
> +
> +Using GDB to debug GRUB2 for the x86_64-efi target has some similarities with
> +the i386-pc target. Please read and familiarize yourself with the @ref{i386-pc}
> +section when reading this one. Extra care must be used to run QEMU such that it
> +boots a UEFI firmware. This usually involves either using the @samp{-bios}
> +option with a UEFI firmware blob (eg. @file{OVMF.fd}) or loading the firmware
> +via pflash. This document will not go further into how to do this as there are
> +ample resource on the web.
> +
> +Like all EFI implementations, on x86_64-efi the (U)EFI firmware that loads
> +the GRUB2 EFI application determines at runtime where the application will
> +be loaded. This means that we do not know where to tell GDB to load the
> +symbols for the GRUB2 core until the (U)EFI firmware determines it. There are
> +two good ways of figuring this out when running in QEMU: use a @ref{OVMF debug log,
> +debug build of OVMF} and check the debug log, or have GRUB2 say where it is
> +loaded. Neither of these are ideal because they both generally give the
> +information after GRUB2 is already running, which makes debugging early boot
> +infeasible. Technically, the first method does give the load address before
> +GRUB2 is run, but without debugging the EFI firmware with symbols, the author
> +currently does not know how to cause the OVMF firmware to pause at that point
> +to use the load address before GRUB2 is run.
> +
> +Even after getting the application load address, the loading of core symbols
> +is complicated by the fact that the debugging symbols for the kernel are in
> +an ELF binary named @file{kernel.exec} while what is in memory are sections
> +for the PE32+ EFI binary. When @command{grub-mkimage} creates the PE32+
> +binary it condenses several segments from the ELF kernel binary into one
> +.data section in the PE32+ binary. This must be taken into account to
> +properly load the other non-text sections. Otherwise, GDB will work as
> +expected when breaking on functions, but, for instance, global variables
> +will point to the wrong address in memory and thus give incorrect values
> +(which can be difficult to debug).
> +
> +The calculating of the correct offsets for sections when loading symbol
> +files are taken care of when loading the kernel symbols via the user-defined

This sentence feels a bit clumsy. I'd write something like:

    Calculating the correct offsets for sections is taken care of
    automatically when loading the kernel symbols via the
    user-defined...

I was originally going to suggest "section offsets" here too, but I'm
not confident that it couldn't potentially mean something else in this
context.

> +GDB command @command{dynamic_load_kernel_exec_symbols}, which takes one
> +argument, the address where the text section is loaded, as determined by

I would personally drop the second comma in "argument, ... as determined
by".

> +one of the methods above. Alternatively, the command @command{dynamic_load_symbols}
> +with the text section address as an agrument can be called to load the
> +kernel symbols and setup loading the module symbols as they are loaded at

"setup" should probably be "set up".

> +runtime.
> +
> +In the author's experience, when debugging with QEMU and OVMF, to have
> +debugging symbols loaded at the start of GRUB2 execution the GRUB2 EFI
> +application must be run via QEMU at least once prior in order to get the
> +load address. Two methods for obtaining the load address are described in
> +two subsections below. Generally speaking, the load address does not change
> +between QEMU runs. There are exceptions to this, namely that different
> +GRUB2 EFI applications can be run at different addresses. Also, it has been
> +observed that after running the EFI application for the first time, the
> +second run will some times have a different load address, but subsequent

"some times" should probably be "sometimes".

> +runs of the same EFI application will have the same load address as the
> +second run. And it's a near certainty that if the GRUB EFI binary has changed,
> +eg. been recompiled, the load address will also be different.
> +
> +This ability to predict what the load address will be allows one to assume
> +the load address on subsequent runs and thus load the symbols before GRUB2
> +starts. The following command illustrates this, assuming that QEMU is
> +running and waiting for a debugger connection and the current working
> +directory is where @file{gdb_grub} resides:
> +
> +@example
> +gdb -x gdb_grub -ex 'dynamic_load_symbols @var{address of .text section}'
> +@end example
> +
> +If you load the symbols in this manner and, after continuing execution, do
> +not see output showing the loading of modules symbol, then it is very likely

Would this make more sense as "showing the module symbols loading"?

> +that the load address was incorrect.
> +
> +Another thing to be aware of is how the loading of the GRUB image by the
> +firmware affects previously set software breakpoints. On x86 platforms,
> +software breakpoints are implemented by GDB by writing a special processor
> +instruction at the location of the desired breakpoint. This special instruction
> +when executed will stop the program execution and hand control to the
> +debugger, GDB. GDB will first save the instruction bytes that are
> +overwritten at the breakpoint and will put them back when the breakpoint
> +is hit. If GRUB is being run for the first time in QEMU, the firmware will
> +be loading the GRUB image into memory where every byte is already set to 0.
> +This means that if a breakpoint is set before GRUB is loaded, GDB will save
> +the 0-byte(s) where the the special instruction will go. Then when the firmware
> +loads the GRUB image and because it is unaware of the debugger, it will
> +write the GRUB image to memory, overwriting anything that was there previously,
> +notably in this case the instruction that implements the software breakpoint.

I would probably split "notably in this case ..." off into its own
sentence.

> +This will be confusing for the person using GDB because GDB will show the
> +breakpoint as set, but the brekapoint will never be hit. Furthermore, GDB
> +then becomes confused, such that even deleting an recreating the breakpoint
> +will not create usable breakpoints. The @file{gdb_grub} script takes care of
> +this by saving the breakpoints just before they are overwritten, and then
> +restores them at the start of GRUB execution. So breakpoints for GRUB can be
> +set before GRUB is loaded, but be mindful of this effect if you are confused
> +as to why breakpoints are not getting hit.
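
Not a nit, just an aside for anyone following the thread: the clobbering
described in this paragraph can be pictured with a toy model. This is only
a sketch under loud assumptions. A bytearray stands in for guest memory,
the image bytes are made up, and the helper names are hypothetical; none
of this is GDB's or gdb_grub's actual code.

```python
# Toy model only: a bytearray stands in for guest RAM. The helper names are
# hypothetical and are not part of GDB or the gdb_grub script.
INT3 = 0xCC  # the one-byte x86 software breakpoint instruction


def set_sw_breakpoint(mem, addr):
    """Save the byte at addr, then plant INT3 there, as a debugger would."""
    saved = mem[addr]
    mem[addr] = INT3
    return saved


def firmware_load(mem, base, image):
    """Copy the image into memory, unaware of any planted breakpoint."""
    mem[base:base + len(image)] = image


mem = bytearray(16)                       # all zeros before GRUB is loaded
image = bytes([0x55, 0x48, 0x89, 0xE5])   # made-up bytes standing in for GRUB
bp_addr = 2

saved = set_sw_breakpoint(mem, bp_addr)   # the debugger saves a 0 byte
firmware_load(mem, 0, image)              # the firmware overwrites the INT3

breakpoint_survived = mem[bp_addr] == INT3
print(breakpoint_survived)
```

In this model the debugger still believes the breakpoint is set, but the
byte in memory is now part of the image, so it can never trigger.
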
> +
> +Also note, that hardware breakpoints do not suffer this problem. They are
> +implemented by having the breakpoint address in special debug registers on
> +the CPU. So they can always be set freely without regard to whether GRUB has
> +been loaded or not. The reason that hardware breakpoints aren't always used
> +is because there are a limited number of them, usually around 4 on various
> +CPUs, and specifically exactly 4 for x86 CPUs. The @file{gdb_grub} script
> +goes out of its way to not use hardware breakpoints internally and when
> +needed use them as short a time as possible, thus allowing the user to have a

I'd write this as:

    The gdb_grub script goes out of its way to avoid using hardware
    breakpoints internally, and when needed, uses them as briefly as
    possible, thus allowing the user...

> +maximal number at their disposal.
> +
> +@node OVMF debug log
> +@subsection OVMF debug log
> +
> +In order to get the GRUB2 load address from OVMF, first, a debug build
> +of OVMF must be obtained (@uref{https://github.com/retrage/edk2-nightly/raw/master/bin/DEBUGX64_OVMF.fd,
> +here is one} which is not officially recommended). OVMF will output debug
> +messages to a special serial device, which we must add to QEMU. The following
> +QEMU command will run the debug OVMF and write the debug messages to a
> +file named @file{debug.log}. It is assumed that @file{disk.img} is a disk
> +image or block device that is setup to boot GRUB2 EFI.

This "setup" should probably be "set up" as well.

- Oskari

> +
> +@example
> +qemu-system-x86_64 -bios /path/to/debug/OVMF.fd \
> +    -drive file=disk.img,format=raw \
> +    -device virtio-scsi-pci,id=scsi0 \
> +    -debugcon file:debug.log -global isa-debugcon.iobase=0x402
> +@end example
> +
> +If GRUB2 was started by the (U)EFI firmware, then in the @file{debug.log}
> +file one of the last lines should be a log message like:
> +@samp{Loading driver at 0x00006AEE000 EntryPoint=0x00006AEE756}. This
> +means that the GRUB2 EFI application was loaded at @samp{0x00006AEE000} and
> +its .text section is at @samp{0x00006AEE756}.
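
One more aside, not a requested change: pulling the two addresses out of
debug.log by hand gets tedious, so here is a minimal sketch of doing it
with a script. It assumes the log line has exactly the format quoted
above and that the last such line belongs to GRUB2, which may not hold
for every OVMF build. The function name and the first sample line are
made up for illustration.

```python
import re

# Matches the line format quoted in the patch text; other OVMF builds may
# print something different.
LOAD_RE = re.compile(
    r"Loading driver at (0x[0-9A-Fa-f]+) EntryPoint=(0x[0-9A-Fa-f]+)"
)


def last_load_addresses(log_text):
    """Return (image_base, entry_point) from the last matching line, or None."""
    matches = LOAD_RE.findall(log_text)
    if not matches:
        return None
    base, entry = matches[-1]
    return int(base, 16), int(entry, 16)


# The first line is a hypothetical earlier driver; the second is the
# example from the patch text.
sample = (
    "Loading driver at 0x00007E5A000 EntryPoint=0x00007E5B123\n"
    "Loading driver at 0x00006AEE000 EntryPoint=0x00006AEE756\n"
)

base, entry = last_load_addresses(sample)
print(hex(base), hex(entry))
```

For a real run one would pass in the contents of debug.log instead of the
sample string.
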
> +
> +@node Using the gdbinfo command
> +@subsection Using the gdbinfo command
> +
> +On EFI platforms the command @command{gdbinfo} will output a string that
> +is to be run in a GDB session running with the @file{gdb_grub} GDB script.
> +
> +
>  @node Porting
>  @chapter Porting
>  
> -- 
> 2.34.1
>