The smallest Hello World program
January 2, 2025

Initially, I just wanted to see what the minimum binary size of a “Hello World” program written in Rust was. Why? Just out of curiosity – it’s probably just a simple compiler flag, right? Well, it turns out that some flags help, but you need to do more work to get a truly minimal binary, and most of that work doesn’t even have anything to do with Rust! Of course, optimizing for a minimal executable has many disadvantages, but there are valid use cases where space or transfer size matters.

As a first step, I’d like to know the lower bound for a “Hello World” program in general. To gain maximum control and ensure there is no compiler overhead, I will write it in assembly language. With this baseline, I can compare the result to future binaries written in Rust (or even Zig and C).

Let’s first create some rules for the “Hello World” program:

  • The program must be executable on any modern 64-bit x86 Linux machine

  • It should be able to be executed directly without being passed to any other program first (so no need to decompress it)

  • It should be a “correct” executable binary according to the specification

  • It should print “Hello World” to standard output and exit with the code 0 (success)

  • Performance isn’t important but it should display text quickly

Now to writing the x86 assembly: a normal “Hello World” program is actually not as simple as it sounds, because we need to interact with the operating system to print to the terminal. We could issue the system calls ourselves, but developers usually call libc’s printf function instead. Since we are after the smallest binary, that is not an option: printf does much more than just print to stdout, and we would have to link against libc, which creates a lot of overhead!

This is the smallest assembly program I can think of:

msg: db 'Hello, World!', 0xA

global _start
_start:
    mov rax, 1         ; syscall: sys_write
    mov rdi, 1         ; file descriptor: stdout
    lea rsi, [rel msg] ; pointer to message
    mov rdx, 14        ; message length
    syscall

    mov rax, 60        ; syscall: sys_exit
    xor edi, edi       ; exit code 0
    syscall

To give a brief explanation: first, we statically write the bytes of the newline-terminated “Hello, World!” string into the assembly. We then expose the entry point of the application to the linker and the ELF format (we will learn about this later) by declaring the _start label as global.

To actually print something, we invoke the sys_write system call. It is defined in the Linux source code as:

SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf, size_t, count)
{
    return ksys_write(fd, buf, count);
}

So it’s defined by some C macro and just calls another helper function. The signature will expand to the following:

long sys_write(unsigned int fd, const char __user *buf, size_t count);

The first parameter identifies the file descriptor (stdout in our case, which is 1). The second is a pointer to the data, and the third is the length of the data.

We still need a final system call to exit cleanly with code 0; then we’re done. I avoid using the .data section, as this would introduce extra section headers, metadata and alignment bytes.

To assemble this, I use the NASM assembler and link with ld:
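The same three arguments show up when the call is made from a higher-level language. As a small sketch (not part of the assembly build), Python's os.write is a thin wrapper around the write system call: it takes the file descriptor and a bytes object, and the bytes object supplies both the buffer and its length. The helper name say_hello is made up for this illustration.

```python
import os

MESSAGE = b"Hello, World!\n"  # 13 characters plus the newline = 14 bytes


def say_hello(fd: int = 1) -> int:
    """Write MESSAGE to the given file descriptor (1 is stdout).

    Returns the number of bytes the kernel reports as written.
    """
    return os.write(fd, MESSAGE)
```

When the whole message is written in one go, the return value is 14, the same number the assembly loads into rdx.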

$ nasm -f elf64 -o hello.o hello.asm 
$ ld -o hello hello.o                
$ chmod +x hello                     

Now let’s see if it works and print the size:

$ ./hello
Hello, World!
$ wc -c hello 
4728 hello

So 4728 bytes. Not bad, right? But can we make it smaller?

Well, first we could use a 32-bit architecture, because instructions take fewer bytes to encode, pointers only use 4 bytes, and binaries are 4-byte aligned (instead of 8-byte aligned). I tried it, and it dropped the binary size to 4548 bytes, which is 180 bytes smaller! But remember the rules? We limit ourselves to modern 64-bit architectures!

But why do we need so many bytes to do such a simple thing in the first place? Remember the two commands we used to build the executable? Let’s print the size after each step:

$ wc -c hello.o 
640 hello.o
$ wc -c hello   
4728 hello

Wait a moment! Why does our binary become more than 7 times larger when we link it with ld? And why do we need to link at all?

In short: the GNU linker (ld) takes one or more object files (for example, hello.o) generated by the assembler and combines them into an executable binary or a shared library (.so). It resolves symbols (such as the _start label) and hardcodes their final memory addresses in the binary. It also relocates some addresses and adds zero bytes as padding to optimize the memory layout. If we want to use shared libraries, it sets those up for us as well. Additionally, it emits symbols that make debugging possible. Finally, it records the application entry point so that the system can execute the file directly. Since this process involves understanding the object code and moving a lot of things around, optimizing the input object file (removing padding/alignment bytes or symbols) will not reduce the size of the final executable and may even break the linking process – trust me, I tried.

Instead, we can try to remove some information from the binary. For example, symbols help us debug our application – but we don’t need them because our code will always work perfectly on the first try. Let’s take a look at them:

$ nm hello
0000000000402000 T __bss_start
0000000000402000 T _edata
0000000000402000 T _end
0000000000401000 t msg
000000000040100e T _start

Some of these may look familiar from our assembly code; others are added by the linker. Using strip, we can get rid of them:

$ wc -c hello 
4728 hello
$ strip --strip-all hello
$ wc -c hello
4352 hello
$ ./hello
Hello, World!

Down to 4352 bytes! So we deleted a bunch of stuff and the executable still works fine.
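Where do the remaining 4352 bytes go? Here is one accounting that happens to add up exactly. It is only a sketch under assumptions: the linker placed the code at file offset 0x1000 to match its address 0x401000 from the nm output, the payload is the 47 bytes of message and machine code (the flat-binary size measured further down), and only three section headers survive stripping. None of this was read from the actual file.

```python
PAGE_SIZE = 0x1000          # 4096: nm shows msg at 0x401000, one page above 0x400000
PAYLOAD_SIZE = 47           # message (14 bytes) plus machine code, per the flat binary
SECTION_HEADER_SIZE = 64    # sizeof(Elf64_Shdr)

# Assumption: only the null entry, .text and .shstrtab survive stripping.
SECTION_COUNT = 3
SECTION_NAMES = b"\0.shstrtab\0.text\0"   # 17 bytes of section name strings


def estimate_stripped_size() -> int:
    """Add up the pieces of the stripped executable under the assumptions above."""
    payload_end = PAGE_SIZE + PAYLOAD_SIZE            # headers + padding, then code
    names_end = payload_end + len(SECTION_NAMES)      # 4160, already 8-byte aligned
    return names_end + SECTION_COUNT * SECTION_HEADER_SIZE


print(estimate_stripped_size())  # prints 4352
```

If this accounting is right, the bulk of the file is zero padding that pushes the code to offset 0x1000, which is why stripping symbols alone cannot get us much further.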

This is not bad at all, but in order to go further we have to understand each byte of the binary. The binary file format on Linux is called ELF. But who is this magic elf 🧝? It stands for “Executable and Linkable Format” and describes how binary files are laid out. You can read the specification yourself, but I’ll sum it up quickly. It contains a nice diagram that describes the layout of both our earlier object file (hello.o) and the linked executable.

So, although the small object file from before linking also follows the ELF format, most of the information is only added to the executable. We’ll look at the headers in more detail later. The program headers describe the segments to be loaded into memory at runtime. The section headers describe the static contents (the .text and .data sections). These sections define what needs to be loaded into memory for execution. The execution-view picture from Wikipedia visualizes this well, but you may need to zoom in.

We can also assemble the program without the ELF format into a so-called “flat binary” and get only the bytes of the hello-world code:

$ nasm -f bin -o hello.o hello.asm
$ wc -c hello.o
47 hello.o
$ ld -o hello hello.o               
ld:hello.o: file format not recognized; treating as linker script
ld:hello.o:1: syntax error

So the raw assembled binary is now only 47 bytes. However, we can no longer link it or execute it on the operating system, because it no longer has an ELF header! A flat binary like this is very useful if we want to build our own BIOS or operating system kernel.

But here we still need an executable file! Since we rely on neither the linker nor any other feature of the ELF format, we can build the ELF header ourselves. Using assembly, we can write the required bytes directly into the program code:

Now assemble and execute it:

$ nasm -f bin -o elf elf.asm; chmod +x elf
$ ./elf
Hello, World!
$ wc -c elf
167 elf

Great! It took me far too long to get this working, but I can recommend the ELF documentation on Wikipedia if you want to try it yourself (or just copy my header).
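For readers who want to see the byte layout without NASM, here is a rough Python sketch that assembles the same three pieces by hand: a 64-byte ELF header, one 56-byte program header, and the 47 bytes of message and code. It is not the elf.asm used above. The load address 0x400000, the hand-encoded machine code and all names are assumptions made for this illustration; the total merely comes out at the same 167 bytes.

```python
import os
import struct

LOAD_ADDRESS = 0x400000  # assumed base address for the single segment
EHDR_SIZE = 64           # sizeof(Elf64_Ehdr)
PHDR_SIZE = 56           # sizeof(Elf64_Phdr)

MESSAGE = b"Hello, World!\n"

# Hand-encoded machine code for the _start routine from the listing.
CODE = bytes.fromhex(
    "b801000000"      # mov eax, 1         (sys_write)
    "bf01000000"      # mov edi, 1         (stdout)
    "488d35e1ffffff"  # lea rsi, [rip-31]  (address of MESSAGE)
    "ba0e000000"      # mov edx, 14        (message length)
    "0f05"            # syscall
    "b83c000000"      # mov eax, 60        (sys_exit)
    "31ff"            # xor edi, edi       (exit code 0)
    "0f05"            # syscall
)


def build_elf() -> bytes:
    """Return a complete ELF image: header, one program header, payload."""
    payload = MESSAGE + CODE
    file_size = EHDR_SIZE + PHDR_SIZE + len(payload)
    entry = LOAD_ADDRESS + EHDR_SIZE + PHDR_SIZE + len(MESSAGE)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        b"\x7fELF\x02\x01\x01",  # magic, 64-bit, little-endian, version 1; zero-padded
        2,                        # e_type: executable
        0x3E,                     # e_machine: x86-64
        1,                        # e_version
        entry,                    # e_entry: first instruction after MESSAGE
        EHDR_SIZE,                # e_phoff: program header follows directly
        0, 0,                     # e_shoff, e_flags: no section header table
        EHDR_SIZE, PHDR_SIZE, 1,  # e_ehsize, e_phentsize, e_phnum
        0, 0, 0,                  # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,                        # p_type: PT_LOAD
        5,                        # p_flags: readable + executable
        0,                        # p_offset: map the file from its start
        LOAD_ADDRESS, LOAD_ADDRESS,
        file_size, file_size,     # p_filesz, p_memsz
        0x1000,                   # p_align
    )
    return ehdr + phdr + payload


def write_executable(path: str = "elf") -> None:
    """Write the image to disk and mark it executable."""
    with open(path, "wb") as f:
        f.write(build_elf())
    os.chmod(path, 0o755)
```

Calling write_executable() on a 64-bit x86 Linux machine should produce a file that can be run like the NASM-built one; len(build_elf()) is 64 + 56 + 47 = 167.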

Finally, we are down to 167 bytes! For the 32-bit architecture, I got this down to 129 bytes. There are ways to lower this number even further, but I think they violate our “per spec” requirement. For example, not all ELF header bytes are actually used (or the system may not care about them), so we can start our program earlier by reusing some ELF header bytes, as demonstrated by Brian Raiter. With this technique, we could probably get below 100 bytes.

What a journey! If you want to really dig into executable files, I can recommend the blog series “Making our own executable packer” by fasterthanlime. Have fun!

2024-12-30 16:23:04
