p0: Running Rust code on RISC-V in QEMU
Overview
The goal for this project is to run some code on a RISC-V virtual machine and start learning how to debug it. Key terms are bolded, and you may want to search online for more information about them.
Setting up the toolchain
The Rust standard toolchain already supports RISC-V, so we can use rustc to cross-compile. In a command prompt, install the nightly toolchain of Rust and add RISC-V as a target.
# install Rust - https://rustup.rs
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# install nightly toolchain and make a new rust project in ./p0
$ rustup run nightly cargo new --name kernel p0
$ cd p0
# from now on, all paths and commands will be relative to the root of p0
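# set the nightly toolchain as the default for this project
$ rustup override set nightly
# add the bare-metal RISC-V target used throughout this project
$ rustup target add riscv32imac-unknown-none-elf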
$ tree
.
├── Cargo.toml
└── src
    └── main.rs
$ cat Cargo.toml
[package]
name = "kernel"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
$ cat src/main.rs
fn main() {
println!("Hello, world!");
}
Cargo is the Rust package manager and helpfully provides wrappers around most of the Rust toolchain, so cargo will be your main point of contact with Rust. A Rust package (described by a Cargo.toml manifest file) is a collection of files that provide one or more crates. A crate is either a library or an executable program, referred to as either a library crate or a binary crate, respectively. A target is a platform that you want your code to run on. Crates are usually inferred automatically based on the layout of the project; for example, if there is a file called src/main.rs, Cargo assumes you wanted to add a binary crate using main.rs and all of its dependencies.
This can be confusing, and it gets worse when we introduce modules, so it's worth spending some time playing around with cargo to understand the difference between packages, targets, and crates.
Compiling a program for RISC-V
Let's try compiling the default binary target.
$ cargo build
Compiling kernel v0.1.0 (.../p0)
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Well, that works! I guess we're done.
Let's take a look at the binary executable that Rust produced at target/debug/kernel:
$ file target/debug/kernel
target/debug/kernel: Mach-O 64-bit executable arm64
Oops. Your output might be different, but unless you are running on a 32-bit RISC-V machine, it is unlikely to be RISC-V. You may have noticed the package manifest never mentions RISC-V. Rust assumes that every crate can be compiled for every target. That's true for most crates, but not for an operating system kernel, so we need to tell the Rust toolchain that it should build a binary for a different target than the one the compiler is running on (a process called cross-compiling).
We need to create an additional configuration file, called .cargo/config.toml (note the leading dot), which tells cargo how to compile and run for each target. These options are separate because they can change depending on your development environment. Let's go ahead and create that file:
# .cargo/config.toml
[build]
target = "riscv32imac-unknown-none-elf"
We're telling the Rust compiler to target rv32imac, as advertised. The value of target is commonly referred to as the triple, though the astute reader may notice that the triple does not always contain three values. Our triple mentions ELF: that will be important later.
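To make the "triple" less mysterious, here is a small sketch that pulls our target name apart. The helper names (split_target, riscv_extensions) are made up for illustration and are not part of any toolchain API. For RISC-V, the letters after the bit width name ISA extensions: i is the base integer set, m is multiply/divide, a is atomics, and c is compressed instructions.

```rust
/// The pieces of a target name such as `riscv32imac-unknown-none-elf`.
#[derive(Debug, PartialEq)]
struct TargetParts<'a> {
    arch: &'a str,      // first field, e.g. "riscv32imac"
    rest: Vec<&'a str>, // every remaining field, however many there are
}

/// Hypothetical helper: split a target name on `-`.
fn split_target(target: &str) -> TargetParts<'_> {
    let mut fields = target.split('-');
    let arch = fields.next().unwrap_or("");
    TargetParts { arch, rest: fields.collect() }
}

/// Hypothetical helper: for an architecture field like `riscv32imac`,
/// return the register width and the extension letters that follow it.
fn riscv_extensions(arch: &str) -> Option<(u32, Vec<char>)> {
    let tail = arch.strip_prefix("riscv")?;
    let digits: String = tail.chars().take_while(|c| c.is_ascii_digit()).collect();
    let width = digits.parse().ok()?;
    let extensions = tail[digits.len()..].chars().collect();
    Some((width, extensions))
}
```

Splitting our target this way gives four fields, not three, which is exactly the quirk mentioned above.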
So now we should be good to go! Right?
$ cargo run
Compiling kernel v0.1.0 (.../p0)
error[E0463]: can't find crate for `std`
|
= note: the `riscv32imac-unknown-none-elf` target may not support the standard library
= note: `std` is required by `kernel` because it does not declare `#![no_std]`
= help: consider building the standard library from source with `cargo build -Zbuild-std`
# ... more errors, but we will tackle them one at a time
Ah, right. Rust programs by default are compiled together with the Rust standard library (std). Rust doesn't provide a pre-compiled std crate for bare-metal RISC-V targets like ours, and even if it did, std relies on many features that depend on the operating system (e.g. std::thread). We can't use the operating system to implement the operating system!
We'll have to modify our src/main.rs to tell Rust we don't want the standard library. Just like the compiler suggested, we can add #![no_std] to the top. #![no_std] is a crate-level attribute which changes the way the crate is compiled. In general, you will see # used to denote compiler options (similar to its use for preprocessor directives in C, but much more limited).
We'll also have to remove println! because it is defined by the standard library. Let's replace it with a simple loop for now.
// src/main.rs
#![no_std]
fn main() {
loop {}
}
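Opting out of std does not leave us empty-handed: the core crate is still available, and it includes the formatting machinery. As a rough sketch (FixedBuf and describe_hart are hypothetical names invented for this example, not something the kernel defines), here is text formatting that uses only core and a fixed-size buffer, with no heap and no operating system:

```rust
use core::fmt::{self, Write};

/// A fixed-capacity text buffer (hypothetical type for this example).
struct FixedBuf {
    bytes: [u8; 32],
    len: usize,
}

impl FixedBuf {
    fn new() -> Self {
        FixedBuf { bytes: [0; 32], len: 0 }
    }

    fn as_str(&self) -> &str {
        // Only whole `&str` values are ever copied in, so this is valid UTF-8.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}

impl Write for FixedBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > self.bytes.len() {
            return Err(fmt::Error); // out of space: report it instead of allocating
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}

/// Hypothetical helper: format a message using nothing but `core`.
fn describe_hart(id: u32) -> FixedBuf {
    let mut buf = FixedBuf::new();
    // `write!` lives in `core`; only `println!` needs `std`.
    let _ = write!(buf, "hart {}", id);
    buf
}
```

The same code compiles on a normal desktop target, which makes it easy to experiment with before moving it into the kernel.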
And if we try running again:
$ cargo run
Compiling kernel v0.1.0 (.../p0)
error: `#[panic_handler]` function required, but not found
error: could not compile `kernel` due to previous error
We need a panic handler! This is the function that gets called when the program invokes the panic! macro. The standard library typically provides one, but since we've opted out of the standard library, we will need to provide our own.
#![no_std]
// "import" PanicInfo from [core]
use core::panic::PanicInfo;
fn main() {
loop {}
}
#[panic_handler]
fn on_panic(info: &PanicInfo) -> ! {
loop {}
}
Let's discuss the on_panic function's signature.
- #[panic_handler] is an attribute (an instruction to the compiler), in this case to call on_panic when any code uses the panic! macro.
- info is its only argument. The type of info is an immutable reference to a PanicInfo struct.
- -> ! means on_panic never returns.
We also had to add a use statement to make PanicInfo available in this scope (the whole file). Alternatively, we could also have fully-qualified the type of info: &core::panic::PanicInfo.
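The -> ! return type is worth a quick experiment on its own. In this sketch (halt and value_or_halt are illustrative names, not part of our kernel), a function that never returns can be used wherever a value of any type is expected, because the compiler knows control never comes back from it:

```rust
/// A diverging function: the `!` return type promises it never returns.
fn halt() -> ! {
    loop {}
}

/// `halt()` has type `!`, which coerces to `u32`, so both match arms type-check.
fn value_or_halt(input: Option<u32>) -> u32 {
    match input {
        Some(value) => value,
        None => halt(),
    }
}
```

This is why on_panic can end in an infinite loop without producing a value: a function typed -> ! simply must not fall off its end.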
Ok, back to the compiler:
$ cargo run
Compiling kernel v0.1.0 (.../p0)
error: requires `start` lang_item
error: could not compile `kernel` due to previous error
We're missing another thing: somewhere for the program to start! It turns out that main is not actually the entry point for most programs: if you compile a program with std, there is a (very small) amount of setup that needs to happen before main. The real, true, actual entry point is typically called _start (or some variation of that) and is usually provided by the standard library or compiler for whatever language you are using. In C, this is usually crt0.
In Rust (and most higher-level languages), the compiler changes ("mangles") the name of every function to make it unique across the whole crate. Let's rename main to _start and tell the Rust compiler not to mangle the name (there really should only be one _start in the crate!).
#[no_mangle]
fn _start() {
loop {}
}
Then the compiler will complain that there's no main function. As it recommends, we can fix that by adding #![no_main] to the top of the file.
Finally, if we run cargo build everything should succeed.
$ cargo build
Compiling kernel v0.1.0 (.../p0)
Finished dev [unoptimized + debuginfo] target(s) in 0.17s
$ file target/riscv32imac-unknown-none-elf/debug/kernel
target/riscv32imac-unknown-none-elf/debug/kernel: ELF 32-bit LSB executable,
UCB RISC-V, RVC, soft-float ABI, version 1 (SYSV), statically linked,
with debug_info, not stripped
Nice! There's a lot of information here, but the important parts are ELF 32-bit, RISC-V, and statically linked. We've successfully compiled a freestanding Rust executable for our target architecture.
Running the program
If we try running now, we'll get a different output:
$ cargo run
target/riscv32imac-unknown-none-elf/debug/kernel: cannot execute binary file
Oops again. We're (probably) not working on a RISC-V machine, so we can't even run the program directly. We need to emulate a RISC-V machine, and to do so we turn to our friend, QEMU.
Unlike other virtualisation software you might be familiar with (such as Docker), QEMU is a whole-system emulation framework. Amazingly, it is actually decently performant (and more than fast enough for our use-case). It is not a physical simulation (it does not pretend to run circuits), but it gives us the illusion of running code on "bare-metal" hardware while still allowing us to debug comfortably from our local computer.
We need to tell Cargo to run our program using QEMU. Add a runner to your .cargo/config.toml file:
# .cargo/config.toml
# ... from before
[target.riscv32imac-unknown-none-elf]
runner = """ qemu-system-riscv32
-cpu rv32
-machine virt
-m 150M
-s
-nographic
-bios """
We'll add more flags as our kernel becomes more complete, but we'll start with the basics:
- -cpu rv32 means QEMU will emulate a 32-bit processor.
- -machine virt tells QEMU which platform to emulate.
- -m 150M means we want to emulate 150 MB of physical RAM.
- -s enables remote debugging by listening for a GDB connection on TCP port 1234 (we'll see why this is useful shortly).
- -nographic disables the QEMU graphic display (we might eventually add VGA output, at which point we'll need to remove this to see what's happening).
- -bios tells QEMU to skip the bootloader and just run our program directly on startup (as if it were a BIOS). We could also use -kernel to have QEMU run a full bootloader (which a real operating system like Linux would require), but that process is more complicated and unnecessary as we're just getting started.
$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `qemu-system-riscv32 -cpu rv32 -machine virt -m 150M -bios target/riscv32imac-unknown-none-elf/debug/kernel`
... and then it hangs. Did it work? How do we know what's happening?
What happens when the system starts
Boom. The switch is flipped. Current is flowing through the processor. What happens? The (full) answer is complicated and dives deep into microarchitectural components that aren't super interesting from an operating systems perspective, so we'll refine our question: when the processor starts execution, what is its initial state?
The answer depends on the platform, but for a virt machine the program counter is initially set to point to the reset vector (by default, address 0x1000). QEMU pretends like the bootloader loads our program into memory (notice that we haven't actually configured a hard drive yet, so this isn't quite realistic, but it's a useful simplification for now) and then jumps to the start of physical memory at 0x8000_0000.
But don't take my word for it. We can use gdb (the GNU debugger) to step through the process one instruction at a time!
You'll need to add -S (capitalised) to the QEMU run configuration to make QEMU wait to run until the debugger is connected. You can do this from the cargo run command:
$ cargo run -- -S
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `qemu-system-riscv32 -cpu rv32 -machine virt -m 150M -bios target/riscv32imac-unknown-none-elf/debug/kernel -S`
Everything after -- gets added to the runner command, so we just appended -S right after the -bios [target] flag. The output looks the same, but QEMU is actually paused, waiting for us to debug it.
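The command line shown in the output above can be modelled in a few lines. This is only a sketch of the observable behaviour, not Cargo's actual implementation, and runner_command is a name invented for the example: the runner string is split into words, the path to the compiled binary is appended, and anything after -- comes last.

```rust
/// Hypothetical model of how the final command line is assembled:
/// runner words first, then the binary path, then any extra arguments.
fn runner_command(runner: &str, binary: &str, extra: &[&str]) -> Vec<String> {
    let mut words: Vec<String> = runner
        .split_whitespace() // newlines in the multi-line TOML string count as whitespace
        .map(String::from)
        .collect();
    words.push(binary.to_string());
    words.extend(extra.iter().map(|arg| arg.to_string()));
    words
}
```

This ordering is why -bios has to be the last word of the runner: the binary path lands directly after it and becomes its argument, and -S follows as a separate flag.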
Next, we'll open gdb in another terminal:
$ gdb
...
(gdb) file target/riscv32imac-unknown-none-elf/debug/kernel
Reading symbols from target/riscv32imac-unknown-none-elf/debug/kernel...
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x00001000 in ?? ()
We're in the debugger now, connected to QEMU, and we can see the program counter. We can print any registers we want, look at any place in memory, etc.
Let's ask GDB to print the code in memory wherever we are.
(gdb) x/6i $pc
=> 0x1000: auipc t0,0x0
0x1004: addi a2,t0,40
0x1008: csrr a0,mhartid
0x100c: lw a1,32(t0)
0x1010: lw t0,24(t0)
0x1014: jr t0
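The address arithmetic in this listing is easy to check by hand. The sketch below (all names are invented for illustration) applies the RISC-V definitions: auipc adds its immediate, shifted left by 12 bits, to the current pc, and a load reads from its base register plus a signed offset.

```rust
/// `auipc rd, imm`: rd = pc + (imm << 12).
fn auipc(pc: u32, imm20: u32) -> u32 {
    pc.wrapping_add(imm20 << 12)
}

/// Address accessed by `lw rd, offset(base)` or computed by `addi rd, base, offset`.
fn base_plus_offset(base: u32, offset: i32) -> u32 {
    base.wrapping_add(offset as u32)
}

/// Addresses involved in the six-instruction stub at the reset vector.
struct ResetStub {
    t0_base: u32,   // t0 after `auipc t0,0x0`
    a2: u32,        // a2 after `addi a2,t0,40`
    a1_slot: u32,   // address read by `lw a1,32(t0)`
    jump_slot: u32, // address read by `lw t0,24(t0)`
}

fn trace_reset_stub(reset_vector: u32) -> ResetStub {
    let t0_base = auipc(reset_vector, 0);
    ResetStub {
        t0_base,
        a2: base_plus_offset(t0_base, 40),
        a1_slot: base_plus_offset(t0_base, 32),
        jump_slot: base_plus_offset(t0_base, 24),
    }
}
```

With the reset vector at 0x1000, the jump target is read from 0x1018, which is the word immediately after the six 4-byte instructions. In other words, the stub keeps its data right behind its code.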
So we're loading some values, then jumping to the address in t0. Let's step forward until we are about to jump:
(gdb) x/6xi 0x1000
=> 0x1000: auipc t0,0x0
0x1004: addi a2,t0,40
0x1008: csrr a0,mhartid
0x100c: lw a1,32(t0)
0x1010: lw t0,24(t0)
0x1014: jr t0
(gdb) si
0x00001004 in ?? ()
(gdb) si
0x00001008 in ?? ()
(gdb) si
0x0000100c in ?? ()
(gdb) si
0x00001010 in ?? ()
(gdb) si
0x00001014 in ?? ()
(gdb) x/i $pc
=> 0x1014: jr t0
(gdb) p/x $t0
$1 = 0x80000000
Great, so we're jumping to 0x80000000. And if we look at the instructions there, we should find our kernel!
(gdb) x/10i $t0
0x80000000: unimp
0x80000002: unimp
0x80000004: unimp
0x80000006: unimp
0x80000008: unimp
0x8000000a: unimp
0x8000000c: unimp
0x8000000e: unimp
0x80000010: unimp
0x80000012: unimp
Whelp... (you can use q to exit GDB, and C-a x to exit QEMU).
Linking the program
What went wrong?
So far, we've assumed that the standard compiler and linker settings will Just Work for our kernel. Clearly we'll need to put in a bit more effort to make it work. Let's start by understanding the problem.
How does QEMU know where the various parts of our kernel program should live in memory? We've discovered that _start isn't loaded at 0x8000_0000 and we started running some random garbage instead.
ELF, or Executable and Linkable Format, is a highly flexible type of file that encodes all of a program’s data. This includes the code required for execution, static and constant variables, references to libraries, and addressing information. We’ll explore ELF in more detail later, but for now, it’s enough to know that the output of the linker for a RISC-V binary target is an ELF file that QEMU reads to understand the kernel’s memory layout. In a non-virtualized system, a bootloader in ROM would handle this at runtime, so by the time execution reaches our kernel, it has been loaded into memory.
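To get a feel for what a loader reads out of an ELF file, here is a minimal sketch that decodes a few fields of a 32-bit little-endian ELF header. The offsets come from the ELF specification (the header is 52 bytes long); the type and function names are made up for this example, and this is not how QEMU itself is implemented.

```rust
/// A few fields of an ELF32 file header (hypothetical summary type).
#[derive(Debug, PartialEq)]
struct Elf32Summary {
    machine: u16, // e_machine at offset 18; 243 means RISC-V
    entry: u32,   // e_entry at offset 24: address of the first instruction
    phoff: u32,   // e_phoff at offset 28: file offset of the program headers
    phnum: u16,   // e_phnum at offset 44: number of program headers
}

fn read_u16(bytes: &[u8], at: usize) -> Option<u16> {
    let raw = bytes.get(at..at + 2)?;
    Some(u16::from_le_bytes([raw[0], raw[1]]))
}

fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let raw = bytes.get(at..at + 4)?;
    Some(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]))
}

/// Returns `None` unless the input starts with a 32-bit little-endian ELF header.
fn parse_elf32_header(bytes: &[u8]) -> Option<Elf32Summary> {
    // Magic number, then class 1 (32-bit) and data encoding 1 (little-endian).
    if !bytes.starts_with(&[0x7f, b'E', b'L', b'F', 1, 1]) {
        return None;
    }
    Some(Elf32Summary {
        machine: read_u16(bytes, 18)?,
        entry: read_u32(bytes, 24)?,
        phoff: read_u32(bytes, 28)?,
        phnum: read_u16(bytes, 44)?,
    })
}
```

On a desktop machine, you could feed it the bytes returned by std::fs::read on the kernel binary and compare the result with what readelf reports.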
Let's poke around our kernel's ELF file and see what's happening. To do so, we'll use readelf, a very useful program with a ton of functionality and options.
$ readelf -l target/riscv32imac-unknown-none-elf/debug/kernel
Elf file type is EXEC (Executable file)
Entry point 0x110b4
There are 4 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x00010034 0x00010034 0x00080 0x00080 R 0x4
LOAD 0x000000 0x00010000 0x00010000 0x000b4 0x000b4 R 0x1000
LOAD 0x0000b4 0x000110b4 0x000110b4 0x00004 0x00004 R E 0x1000
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0
Section to Segment mapping:
Segment Sections...
00
01
02 .text
03
If we look at the PhysAddr column, everything is positioned at pretty small addresses. We know that's not correct. Because we have some very specific requirements for memory layout, we need to write a linker script that specifies where everything should be. Linker scripts can be intimidating, so let's step through one together. Create a file called src/script.ld and add:
# src/script.ld
OUTPUT_ARCH("riscv")
ENTRY(_start)
We're linking for RISC-V, and the program's entry point is _start.
# ...
MEMORY {
ram (wxa) : ORIGIN = 0x80000000, LENGTH = 128M
}
There's one region of memory (as far as the linker is concerned). It starts at 0x8000_0000 and is 128 MiB long. Notice that this is less than the amount we requested from QEMU; that's OK! The attributes wxa mark the region as writable (w), executable (x), and allocatable (a).
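A quick calculation shows how the linker's view compares with the RAM we asked QEMU for. This sketch assumes QEMU's RAM starts at 0x8000_0000 (the start of physical memory mentioned earlier) and that the M suffix means mebibytes in both tools; the Region type is invented for the example.

```rust
const MIB: u64 = 1024 * 1024;

/// A contiguous range of addresses (hypothetical helper type).
#[derive(Clone, Copy)]
struct Region {
    origin: u64,
    length: u64,
}

impl Region {
    /// First address past the end of the region.
    fn end(self) -> u64 {
        self.origin + self.length
    }

    fn contains(self, addr: u64) -> bool {
        addr >= self.origin && addr < self.end()
    }

    /// True if every address of `self` also lies inside `outer`.
    fn fits_inside(self, outer: Region) -> bool {
        self.origin >= outer.origin && self.end() <= outer.end()
    }
}

/// The `ram` region from the linker script.
fn linker_ram() -> Region {
    Region { origin: 0x8000_0000, length: 128 * MIB }
}

/// The RAM requested with `-m 150M` (assumed to start at 0x8000_0000).
fn qemu_ram() -> Region {
    Region { origin: 0x8000_0000, length: 150 * MIB }
}
```

The linker's region ends at 0x8800_0000 while the emulated RAM runs to 0x8960_0000, so everything the linker places is backed by real (emulated) memory.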
# ...
PHDRS {
text PT_LOAD;
data PT_LOAD;
bss PT_LOAD;
}
PHDRS stands for "program headers." Our program has three of them: text is code, data is global variables and constant values (like strings), and bss is any global value that is initialised to zero. (Having a bss is a common optimisation since programs typically have a lot of initially-zero values.) These program headers (also called segments) are the chunks that are actually loaded into memory at runtime. Every section the compiler emits has to end up in one of these segments, so we need to tell the linker the relationship between sections and segments:
# ...
SECTIONS {
. = ORIGIN(ram); # start at 0x8000_0000
.text : { # put code first
*(.text.init) # start with anything in the .text.init section
*(.text .text.*) # then put anything else in .text
} >ram AT>ram :text # put this section into the text segment
PROVIDE(_global_pointer = .); # this is magic, google "linker relaxation"
.rodata : { # next, read-only data
*(.rodata .rodata.*)
} >ram AT>ram :text # goes into the text segment as well (since instructions are generally read-only)
.data : { # and the data section
*(.sdata .sdata.*) *(.data .data.*)
} >ram AT>ram :data # this will go into the data segment
.bss :{ # finally, the BSS
PROVIDE(_bss_start = .); # define a variable for the start of this section
*(.sbss .sbss.*) *(.bss .bss.*)
PROVIDE(_bss_end = .); # ... and one at the end
} >ram AT>ram :bss # and this goes into the bss segment
}
There's a lot of magic here and you should take some time to play around with it and understand the effect this has.
Next, we need to tell the compiler to use this linker script. We could modify the .cargo/config.toml file, or we can write a build.rs script (the details of both are not super important). We'll be doing the latter. Create a new file in the root directory (not in ./src!) called build.rs:
// build.rs
fn main() {
// Use the linker script.
println!("cargo:rustc-link-arg=-Tsrc/script.ld");
// Pass --omagic: don't page-align sections, and leave text and data readable and writable.
println!("cargo:rustc-link-arg=--omagic");
}
Now let's look at the ELF file:
$ cargo build
Compiling kernel v0.1.0 (.../p0)
Finished dev [unoptimized + debuginfo] target(s) in 0.19s
$ readelf --segments target/riscv32imac-unknown-none-elf/debug/kernel
Elf file type is EXEC (Executable file)
Entry point 0x80000000
There is 1 program header, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000094 0x80000000 0x80000000 0x00004 0x00004 R E 0x2
Section to Segment mapping:
Segment Sections...
00 .text
We only have one segment at the moment because our Rust program is so simple: it has no global variables or constants, just one function that loops forever, which compiles down to 4 bytes of code. So it looks like the linker is working!
Now, if we run QEMU and GDB again, we should see something different.
$ gdb target/riscv32imac-unknown-none-elf/debug/kernel
...
Reading symbols from target/riscv32imac-unknown-none-elf/debug/kernel...
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x00001000 in ?? ()
(gdb) si
0x00001004 in ?? ()
(gdb) si
0x00001008 in ?? ()
(gdb) si
0x0000100c in ?? ()
(gdb) si
0x00001010 in ?? ()
(gdb) si
0x00001014 in ?? ()
(gdb) si
kernel::_start () at src/main.rs:10
10 loop {}
(gdb) si
0x80000002 10 loop {}
(gdb) si
0x80000002 10 loop {}
(gdb) si
0x80000002 10 loop {}
Nice! So the debugger is showing us that we are running our Rust code! Mission accomplished!
Fixing some mistakes
There are two (subtle) correctness issues with our implementation so far. First, when the bootloader jumps to 0x8000_0000, it is essentially making a function call in the standard C calling convention (see the RISC-V manual chapter on "Calling Convention"). Rust uses a different (and currently, not formally specified) calling convention. We got lucky that for a very simple function like _start, they appear to generate the same code, but we need to tell the compiler that _start should be callable using the C calling convention before we add much else. Change the signature of _start to:
#[link_section = ".text.init"]
extern "C" fn _start() -> ! {
This means that _start is callable from external C code, and that it should never return.
EXERCISE: Run the program in GDB and set a breakpoint on `_start`. What is the value of the `sp` register at that point?
$ gdb target/riscv32imac-unknown-none-elf/debug/kernel
Reading symbols from target/riscv32imac-unknown-none-elf/debug/kernel...
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x00001000 in ?? ()
(gdb) break _start
Breakpoint 1 at 0x80000000: file src/main.rs, line 16.
(gdb) continue
Continuing.
Breakpoint 1, kernel::_start () at src/main.rs:16
16 asm!(
(gdb) p/x $sp
$1 = 0x0
EXERCISE: What happens if you attempt to dereference `0x0` in GDB? What about `0x9000_0000`? Why?
(gdb) x 0x90000000
0x90000000: Cannot access memory at address 0x90000000
(gdb) x 0x0
0x0: Cannot access memory at address 0x0
Second, we don't have a stack! That's bad, because the C calling convention requires sp to point to a valid (and aligned) stack. So before we can enter proper C code, we need to set up a stack.
While we're setting up, we only need a relatively small initialisation stack. Once we have some fancier data structures and a better understanding of the runtime environment, we can migrate to a bigger stack somewhere else.
We can ask the linker to reserve a small amount of space separate from the code or data for our stack. Add the following to src/script.ld:
SECTIONS {
# ... everything from before
# . is now at the end of all code/data
. = ALIGN(16); # our stack needs to be 16-byte aligned, per the C calling convention
PROVIDE(_init_stack_top = . + 0x1000); # reserve 0x1000 bytes for the initialisation stack
}
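The two linker expressions above boil down to simple integer arithmetic. As a sketch (the function names are invented, and the example addresses are made up rather than taken from a real link), rounding up to a 16-byte boundary and adding 0x1000 looks like this:

```rust
/// Round `addr` up to the next multiple of `align`, which must be a power of two.
/// This mirrors what `ALIGN(16)` does to the location counter.
fn align_up(addr: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (addr + align - 1) & !(align - 1)
}

/// Mirrors `_init_stack_top = ALIGN(16) + 0x1000` for a given end-of-sections address.
fn init_stack_top(end_of_sections: u32) -> u32 {
    align_up(end_of_sections, 16) + 0x1000
}
```

For example, if the last section ended at 0x8000_0004, the stack would occupy 0x8000_0010 up to 0x8000_1010. The symbol names the top rather than the bottom because the stack grows downward on RISC-V.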
The linker will now provide a symbol, _init_stack_top, which we can use to refer to the initialisation stack. Let's use this to set our sp as soon as we enter _start. We'll have to drop into assembly to mess with registers:
// src/main.rs
#[no_mangle]
extern "C" fn _start() -> ! {
use core::arch::asm;
asm!("la sp, _init_stack_top");
loop {}
}
asm! tells the Rust compiler to emit one (or more) assembly instructions in place. It is extremely powerful, but more restrictive than plain assembly, and we sometimes need to tell Rust what kinds of things our assembly can do. You can refer to https://doc.rust-lang.org/reference/inline-assembly.html for specifics.
If we try to compile, we get an error:
$ cargo build
Compiling kernel v0.1.0 (.../p0)
error[E0133]: use of inline assembly is unsafe and requires unsafe function or block
--> src/main.rs:9:5
|
9 | asm!("la sp, _init_stack_top");
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ use of inline assembly
|
= note: inline assembly is entirely unchecked and can cause undefined behavior
For more information about this error, try `rustc --explain E0133`.
We've reached our first unsafe! I'll spare you the extended discussion of Undefined Behavior (a favourite topic of Rustaceans) and give you some suggested reading:
- https://doc.rust-lang.org/nomicon/meet-safe-and-unsafe.html
- https://doc.rust-lang.org/nomicon/what-unsafe-does.html
- https://doc.rust-lang.org/nomicon/working-with-unsafe.html
We could add the unsafe block around asm!, but this still does not solve our violation of the C calling ABI, since sp would still not be valid at the moment QEMU jumps to _start. The compiler could still try to insert instructions before our asm! block to allocate space on the stack.
Instead, we will mark _start as an unsafe, naked function and make another function, called entry, which is our "real" entry point into Safe Rust. _start will do as little work as possible to establish the invariants that the C calling convention requires before jumping to entry.
We’ll also use the #[link_section] function attribute to force the linker to place _start in the .text.init section, right at the top of our :text segment.
// src/main.rs
#![feature(naked_functions)] // we need to use a new feature!
// ... from before
#[naked]
#[no_mangle]
#[link_section = ".text.init"]
unsafe extern "C" fn _start() -> ! {
use core::arch::asm;
asm!(
// before we use the `la` pseudo-instruction for the first time,
// we need to set `gp` (google linker relaxation)
".option push",
".option norelax",
"la gp, _global_pointer",
".option pop",
// set the stack pointer
"la sp, _init_stack_top",
// "tail-call" to {entry} (call without saving a return address)
"tail {entry}",
entry = sym entry, // {entry} refers to the function [entry] below
options(noreturn) // we must handle "returning" from assembly
);
}
extern "C" fn entry() -> ! {
loop {}
}
If you debug the program now, you should see it enter entry with a valid stack pointer. Now we're done! That's a lot for one blog post, so next time we'll try to get a message to print to the QEMU console.