Manually compiling ESP8266 applications

29 Nov 2019 - tsp
Last update 11 Dec 2019
Reading time 23 mins

Not using FOTA
Building FOTA binaries

Disclaimer: There is no guarantee that this information is complete (it’s highly likely that something is missing). It’s just enough to build the applications that I’m associated with or write on my own, and it has been reconstructed by reading information from various sources. This post in no way claims to be correct (even though I personally think it is).

The following post emerged out of curiosity about what the complicated Makefiles and scripts of the ESP8266 SDK really do during the build process of an ESP8266 (or ESP32) application. Surprisingly (or in reality not so surprisingly) there is not much magic in the whole process. Basically it’s just a bunch of steps that collect all binary data sections that are flashed into the flash memory.

If one wants to do fast and simple experiments with the ESP8266, a finished prototyping board like the NodeMCU Amica, which is based on the ESP-12E component board, is a nice and fast solution (note: the link is an Amazon affiliate link, this page’s author profits from purchases).

There are two methods supported by the SDK, depending on whether the FOTA bootloader of Espressif is used or not. The FOTA bootloader is the easiest way of providing over the air firmware upgrades. It periodically polls Espressif’s cloud, checks if there is a new firmware revision available, downloads it and flashes it into the flash memory. The bootloader then selects the specific image to load, shadows the required regions into RAM, performs the mapping of flash into the address space and executes the code.

On the other hand one can decide not to use Espressif’s FOTA bootloader - or to use an own bootloader. When not using the bootloader there will be two files generated by the build process. These are eagle.flash.bin, which will get copied into instruction RAM by the (unmodifiable) ROM bootloader on bootup, and irom0text.bin, which is just mapped into the address space but not copied into RAM. Note that irom0text.bin has nothing to do with the real ROM of the ESP8266. This flash section might be way larger than the RAM region - but accesses are about 10 times slower and all content is read only. It’s the perfect region to fit string constants, etc. that are not required very often - or non timing critical functions.

One might also use a different bootloader (for example an own bootloader, Arduino’s eboot or one of the other available bootloaders). In this case the basic idea of building flash images is the same - linker parameters and addresses will be different though.

It’s also possible to use remaining flash regions for other stuff like SPIFFS partitions or other filesystems. Do not forget to register them inside the partition table that gets passed to system_partition_table_regist inside your user_pre_init callback - if you don’t supply a correct partition table layout the ROM library might wreak havoc or enter some kind of undefined behaviour.

Not using FOTA

In this case the build process is straightforward:

Build the user’s code object files from sources in the user directory and pack them into a library libuser.a
Do the same with the drivers inside the driver directory and pack them into a libdriver.a
Link them into a single ELF out file (using one of the predefined linker scripts) to calculate addresses.
Generate the dump and assembly listing by using objdump
Dump all required sections into separate binary files using objcopy
Dump a list of all symbols into a sym file by using nm (this is already done inside the gen_appbin.py Python script)
The most complicated step: Use all supplied information to build the flash image files. To do that the gen_appbin.py script locates all required symbols and generates flash image file headers. After that all headers and binary section data are concatenated into binary files (one to be copied into the iram0 section, one to be mapped into the address space, the irom0 section).

Building user and driver code

To build user and driver code the gcc compiler from the xtensa toolchain is used. For example to build user_main.c and store it in the .output/eagle/debug/obj/user_main.o subdirectory of the user folder the toolchain calls

xtensa-lx106-elf-gcc \
  -Os \
  -g \
  -Wpointer-arith \
  -Wundef \
  -Wl,-EL \
  -fno-inline-functions \
  -nostdlib \
  -mlongcalls \
  -mtext-section-literals \
  -ffunction-sections \
  -fdata-sections \
  -fno-builtin-printf \
  -fno-guess-branch-probability \
  -freorder-blocks-and-partition \
  -fno-cse-follow-jumps \
  -DICACHE_FLASH \
  -DSPI_FLASH_SIZE_MAP=6 \
  -I include -I ./ -I ../../include/ets -I ../include \
  -I ../../include -I ../../include/eagle -I ../../driver_lib/include \
  -o .output/eagle/debug/obj/user_main.o \
  -c \
  user_main.c

The options serve the following purposes:

-Os enables the optimizer and instructs it to optimize for size instead of speed
-g embeds debug data into the generated binary
-Wpointer-arith warns about anything that depends on the size of a function type or of void (for example sizeof applied to them), which is undefined according to the C specification. In C++ mode it also warns about calculations involving NULL. Note that these operations violate the C/C++ specifications so they shouldn’t be used anyway.
-Wundef warns whenever an undefined identifier is located inside a conditional preprocessor statement like #if.
-Wl,-EL selects little endian output for the linker.
-fno-inline-functions disables automatic inlining by the optimizer, which is normally only done with -O2 or higher
-nostdlib disables the usage of the standard C library
-mlongcalls will translate direct calls into indirect calls at the assembly stage in case it cannot be guaranteed that the call target is in range of the call.
-mtext-section-literals instructs the compiler to put literals into the text section (and not into other constant sections) to keep references as local as possible. Literals get moved into the vicinity of functions that reference them whenever possible.
-ffunction-sections enables the generation of a separate section for each function inside the generated code. This allows dead code elimination (DCE) to remove all unnecessary functions that are never called or referenced.
-fdata-sections enables the generation of a separate data section for each global variable in the source file. This allows the linker to discard variables that never get referenced (like dead code elimination).
-fno-builtin-printf prevents the optimizer from translating printf statements into more direct output functions. In case the compiler knows the used C library this translation can reduce the amount of parsing of the pattern string. The option is always required when one replaces the standard printf function or doesn’t use the standard C library.
-fno-guess-branch-probability disables the static branch prediction optimizer.
-freorder-blocks-and-partition instructs the optimizer to try to re-arrange code blocks to keep jumps more local. To do that code is separated into hot and cold basic blocks that get rearranged inside their sections appropriately.
-fno-cse-follow-jumps prevents the common subexpression elimination (CSE) optimizer from following jumps into different code regions.
-DICACHE_FLASH passes the preprocessor definition ICACHE_FLASH. In the SDK headers this enables the ICACHE_FLASH_ATTR attribute that places marked functions into the .irom0.text section.
-DSPI_FLASH_SIZE_MAP=6 passes a preprocessor definition for the SPI flash size map that’s used. 6 would select the 4096 KByte flash map (1024 KB + 1024 KB) supported by the Espressif SDK.
-I adds include directories to the preprocessor’s search path.
-o supplies the output object file to write into.

This call generates the ELF object files for each and every source file (separately). After that all object files get packed into a single object file archive by using ar:

xtensa-lx106-elf-ar ru .output/eagle/debug/lib/libuser.a .output/eagle/debug/obj/user_main.o

In this case user_main.o gets inserted into or replaced inside (r) the archive libuser.a. To insert only files that are newer than existing members of the archive file the u flag gets supplied. This method is selected to keep the same libuser.a over successive build steps and only update changed files. Of course that means that - without a clean operation - old object files are kept inside the archives. These do not end up inside the final binary due to dead code and unreferenced section removal by the linker later on.

Driver code is generated exactly the same way. There is no real difference between drivers and user code anyways. This is just a decision made by the SDK authors.

Linking user and driver code

In the next step both driver and user object file libraries as well as all required runtime libraries supplied by the SDK will be linked into a single ELF object file.

This normally ends up inside the .output subdirectory relative to the application’s root directory.

xtensa-lx106-elf-gcc \
  -L../lib -nostdlib \
  -T../ld/eagle.app.v6.ld \
  -Wl,--no-check-sections \
  -Wl,--gc-sections \
  -u call_user_start \
  -Wl,-static \
  -Wl,--start-group \
    -lc -lgcc -lhal -lphy -lpp -lnet80211 \
    -llwip -lwpa -lcrypto -lmain -ljson \
    -lupgrade -lssl -lpwm -lsmartconfig \
    user/.output/eagle/debug/lib/libuser.a \
    driver/.output/eagle/debug/lib/libdriver.a \
  -Wl,--end-group \
  -o .output/eagle/debug/image/eagle.app.v6.out

-L supplies additional search paths for libraries
-nostdlib prevents linking against the standard C library
-T is the most essential option here. It supplies the linker script that will be discussed below.
-Wl,--no-check-sections disables the linker’s check for overlapping section addresses. Since a custom linker script is used and some overlap is potentially desired this option is used.
-Wl,--gc-sections enables garbage collection of sections. All sections that are neither directly nor indirectly referenced from the entry point will get discarded. The linker follows all references to other sections to build a reference graph and discards all sections that cannot be traced back to the section containing the entry point. In this step all unnecessary libraries and functions as well as variables and constants get discarded.
-u call_user_start forces the symbol call_user_start to be entered as undefined into the produced ELF object file, so the linker pulls in the library module that defines it.
-Wl,-static disables linking against shared libraries that are obviously not supported on microcontrollers.
-Wl,--start-group and -Wl,--end-group define that all libraries referenced inside the group get searched repeatedly (i.e. in multiple passes). Normally references are resolved only in order and once, which would prevent for example two library modules from referencing each other (i.e. references could only go into one direction).
-lc, -lgcc, -lhal, -lphy, -lpp, -lnet80211, -llwip, -lwpa, -lcrypto, -lmain, -ljson, -lupgrade, -lssl, -lpwm, -lsmartconfig add the respective binary libraries supplied with the SDK to the object file. Because the linker runs garbage collection on sections only the library objects that are really used are included.

This produces the ELF object file binary eagle.app.v6.out that only contains required sections.

The linker script

The linker script used above depends on the flash configuration and usage of bootloader. In case eagle.app.v6.ld is used the boot_v1.2+ bootloader mode is used with non-FOTA and 4096KB(1024KB+1024KB) SPI size and mapping.

To get a fast overview of the memory map used by the ESP8266 one might take a look at the ESP8266 memory map.

When one looks into the linker script one can discover that there are 4 defined regions:

MEMORY
{
  dport0_0_seg :                        org = 0x3FF00000, len = 0x10
  dram0_0_seg :                         org = 0x3FFE8000, len = 0x14000
  iram1_0_seg :                         org = 0x40100000, len = 0x8000
  irom0_0_seg :                         org = 0x40210000, len = 0x5C000
}

dport0_0_seg is the memory mapped I/O region
dram0_0_seg is user data RAM available to user applications
iram1_0_seg is the instruction RAM section used by the bootloader to load flash memory < 40000h into RAM.
irom0_0_seg contains all code that won’t get copied into RAM on boot but will get mapped into the address space (read only)
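
To get a feeling for the four regions it can help to check into which one a given address falls. The following is only a small sketch that copies the origins and lengths from the MEMORY block above; the helper name region_of is made up for this illustration and is not part of the SDK.

```python
# Regions copied from the MEMORY block of the linker script shown above:
# name -> (origin, length)
REGIONS = {
    "dport0_0_seg": (0x3FF00000, 0x10),
    "dram0_0_seg": (0x3FFE8000, 0x14000),
    "iram1_0_seg": (0x40100000, 0x8000),
    "irom0_0_seg": (0x40210000, 0x5C000),
}


def region_of(address):
    """Return the name of the region containing address, or None."""
    for name, (origin, length) in REGIONS.items():
        if origin <= address < origin + length:
            return name
    return None


# An address just behind the start of instruction RAM
print(region_of(0x40100004))
# An address inside the boot ROM lies outside all four regions
print(region_of(0x4000B648))
```

The first lookup lands in iram1_0_seg, the second one returns None since the boot ROM is not one of the regions the linker script may place code or data into.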

Next ELF program headers are generated for all segments as well as the dram0 bss section (bss sections contain variables that are not initialized during load and so don’t have to be read into memory or saved in flash before accessing them).

PHDRS
{
  dport0_0_phdr PT_LOAD;
  dram0_0_phdr PT_LOAD;
  dram0_0_bss_phdr PT_LOAD;
  iram1_0_phdr PT_LOAD;
  irom0_0_phdr PT_LOAD;
}

The PT_LOAD type tells the ELF loader that these segments have to be loaded from the file. ELF would be capable of providing additional information, for example notes, dynamic linking information, the name of the used linker, etc., that won’t be used in this linker script.

After that the entry point call_user_start as well as the five exception vectors are defined:

ENTRY(call_user_start)
EXTERN(_DebugExceptionVector)
EXTERN(_DoubleExceptionVector)
EXTERN(_KernelExceptionVector)
EXTERN(_NMIExceptionVector)
EXTERN(_UserExceptionVector)
PROVIDE(_memmap_vecbase_reset = 0x40000000);

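The definition as EXTERN forces these symbols to be entered into the resulting object file as undefined (the same effect as the -u command line option). This makes the linker pull in the library modules that define the exception vectors even though no other code references them.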

The PROVIDE statements throughout the file add symbols to the symbol table that are not defined inside the input object files; a PROVIDE definition is only used if the symbol is referenced but not defined by any object. This is done so they don’t have to be defined inside the source files - and because they change depending on the flash map configuration. As one can see the reset vector points into the internal boot ROM of the ESP8266 (that’s not modifiable).

Next follows the cache configuration. This tells which regions get mapped with write-back (wb) or write-through (wt) strategies:

_memmap_cacheattr_wb_base = 0x00000110;
_memmap_cacheattr_wt_base = 0x00000110;
_memmap_cacheattr_bp_base = 0x00000220;
_memmap_cacheattr_unused_mask = 0xFFFFF00F;
_memmap_cacheattr_wb_trapnull = 0x2222211F;
_memmap_cacheattr_wba_trapnull = 0x2222211F;
_memmap_cacheattr_wbna_trapnull = 0x2222211F;
_memmap_cacheattr_wt_trapnull = 0x2222211F;
_memmap_cacheattr_bp_trapnull = 0x2222222F;
_memmap_cacheattr_wb_strict = 0xFFFFF11F;
_memmap_cacheattr_wt_strict = 0xFFFFF11F;
_memmap_cacheattr_bp_strict = 0xFFFFF22F;
_memmap_cacheattr_wb_allvalid = 0x22222112;
_memmap_cacheattr_wt_allvalid = 0x22222112;
_memmap_cacheattr_bp_allvalid = 0x22222222;
PROVIDE(_memmap_cacheattr_reset = _memmap_cacheattr_wb_trapnull);

Note that these values are again highly dependent on the used flash memory map.

After that sections get mapped into the segments that got defined at the beginning. Each entry of SECTIONS defines an output section. As one can see inside the linker script the output sections get assembled from the various input sections. For example

.irom0.text : ALIGN(4)
{
  _irom0_text_start = ABSOLUTE(.);

  *libat.a:(.literal.* .text.*)
  *libcrypto.a:(.literal.* .text.*)
  *libespnow.a:(.literal.* .text.*)
  *libjson.a:(.literal.* .text.*)
  *liblwip.a:(.literal.* .text.*)
  *libnet80211.a:(.literal.* .text.*)
  *libsmartconfig.a:(.literal.* .text.*)
  *libssl.a:(.literal.* .text.*)
  *libupgrade.a:(.literal.* .text.*)
  *libwpa.a:(.literal.* .text.*)
  *libwpa2.a:(.literal.* .text.*)
  *libwps.a:(.literal.* .text.*)

  *libmbedtls.a:(.literal.* .text.*)

  *libm.a:(.literal .text .literal.* .text.*)

  *(.irom0.literal .irom.literal .irom.text.literal .irom0.text .irom.text)
  _irom0_text_end = ABSOLUTE(.);
} >irom0_0_seg :irom0_0_phdr

assigns the literal and text sections from the listed libraries into the .irom0.text section in the specified order. At the end all sections that got assigned to irom0.literal, irom0.text, etc. also get merged into this section. The _irom0_text_end = ABSOLUTE(.) command assigns the absolute position to _irom0_text_end instead of using relative positioning. As one can also see the section gets aligned to a 32 bit boundary.

At the end of the linker script another linker script eagle.rom.addr.v6.ld gets included. This provides the absolute addresses of all functions that should be callable inside the non modifiable ROM area. For example

PROVIDE ( SHA1Final = 0x4000b648 );
PROVIDE ( SHA1Init = 0x4000b584 );
PROVIDE ( SHA1Transform = 0x4000a364 );
PROVIDE ( SHA1Update = 0x4000b5a8 );

These lines just create the four linker symbols SHA1Final, SHA1Init, SHA1Transform and SHA1Update and provide the absolute addresses of these functions that reference into the internal boot ROM.
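
Since these PROVIDE lines have a very regular shape it’s easy to turn them into a lookup table, for example to see which ROM function a given address belongs to. The following is just a sketch; the function name parse_provides and the regular expression are my own and only handle the simple name = hexadecimal constant form shown above.

```python
import re

# Matches lines of the form: PROVIDE ( name = 0x1234abcd );
PROVIDE_RE = re.compile(r"PROVIDE\s*\(\s*(\w+)\s*=\s*(0x[0-9a-fA-F]+)\s*\)\s*;")


def parse_provides(script_text):
    """Return a dict mapping symbol names to absolute addresses."""
    return {name: int(value, 16) for name, value in PROVIDE_RE.findall(script_text)}


SAMPLE = """
PROVIDE ( SHA1Final = 0x4000b648 );
PROVIDE ( SHA1Init = 0x4000b584 );
PROVIDE ( SHA1Transform = 0x4000a364 );
PROVIDE ( SHA1Update = 0x4000b5a8 );
"""

rom_symbols = parse_provides(SAMPLE)
for name, address in sorted(rom_symbols.items(), key=lambda item: item[1]):
    print(f"{address:08x} {name}")
```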

Extracting dump, assembly and sections

In the following step the sections that make up the image file get extracted from the ELF object file using objcopy. With the help of objdump an assembly language dump as well as a dump file get generated. These files are then used by the gen_appbin Python script to build the image file itself.

xtensa-lx106-elf-objdump -x -s .output/eagle/debug/image/eagle.app.v6.out > ../bin/eagle.dump
xtensa-lx106-elf-objdump -S .output/eagle/debug/image/eagle.app.v6.out > ../bin/eagle.S
xtensa-lx106-elf-objcopy --only-section .text -O binary .output/eagle/debug/image/eagle.app.v6.out eagle.app.v6.text.bin
xtensa-lx106-elf-objcopy --only-section .data -O binary .output/eagle/debug/image/eagle.app.v6.out eagle.app.v6.data.bin
xtensa-lx106-elf-objcopy --only-section .rodata -O binary .output/eagle/debug/image/eagle.app.v6.out eagle.app.v6.rodata.bin
xtensa-lx106-elf-objcopy --only-section .irom0.text -O binary .output/eagle/debug/image/eagle.app.v6.out eagle.app.v6.irom0text.bin

As one can see this simply dumps the sections previously defined in the linker scripts.
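
The four objcopy calls only differ in the section name and the output file name, so they can be generated in a loop. The following sketch only builds the argument lists (it does not run the toolchain); the function name objcopy_commands and its parameters are made up for this illustration.

```python
def objcopy_commands(elf_path, prefix="eagle.app.v6",
                     tool="xtensa-lx106-elf-objcopy"):
    """Build one objcopy argument list per section that gets dumped."""
    # (ELF section name, suffix used in the output file name)
    sections = [
        (".text", "text"),
        (".data", "data"),
        (".rodata", "rodata"),
        (".irom0.text", "irom0text"),
    ]
    return [
        [tool, "--only-section", section, "-O", "binary",
         elf_path, f"{prefix}.{suffix}.bin"]
        for section, suffix in sections
    ]


for command in objcopy_commands(".output/eagle/debug/image/eagle.app.v6.out"):
    print(" ".join(command))
```

Each of the lists could be passed to subprocess.run when the toolchain is available on the search path.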

Generating the application image `eagle.app.flash.bin`

This step is the most complicated. The Makefiles only call a small Python script:

python ../tools/gen_appbin.py .output/eagle/debug/image/eagle.app.v6.out 0 0 0 6 0

1st argument: The ELF file that should be parsed. Note that the dumped file names from above (for example eagle.app.v6.text.bin, etc.) are hard-coded inside the Python script.
2nd argument: Boot mode
3rd argument: Flash mode
4th argument: Flash clock divisor
5th argument: Flash map size (one of the preconfigured ones)
6th argument: user_bin

The python script itself first extracts all symbols from the ELF file:

xtensa-lx106-elf-nm -g .output/eagle/debug/image/eagle.app.v6.out > eagle.app.sym

This symbol file contains just a list of linker calculated addresses as well as the symbolic names. For example the dump used to write this example contains

40101300 T Cache_Read_Disable_2
40004678 A Cache_Read_Enable
40101340 T Cache_Read_Enable_2
401001b0 T Cache_Read_Enable_New
40100004 T call_user_start
40100254 T call_user_start_local

for the call_user_start symbol (that points into instruction RAM). This symbol file is used by the script to locate three symbols:

call_user_start
_data_start
_rodata_start
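
Locating these symbols in the nm output is a matter of splitting each line into address, type and name. The following is a minimal sketch of that lookup, not the code of gen_appbin.py itself; the function name parse_nm is made up.

```python
def parse_nm(nm_output):
    """Map symbol names to addresses from the output of nm -g."""
    symbols = {}
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            # Blank lines and undefined symbols (no address) are skipped
            continue
        address, _kind, name = parts
        symbols[name] = int(address, 16)
    return symbols


SAMPLE = """\
40101300 T Cache_Read_Disable_2
40004678 A Cache_Read_Enable
40101340 T Cache_Read_Enable_2
401001b0 T Cache_Read_Enable_New
40100004 T call_user_start
40100254 T call_user_start_local
"""

symbols = parse_nm(SAMPLE)
print(hex(symbols["call_user_start"]))
# Symbols that are not present can be detected with get()
print(symbols.get("_data_start"))
```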

Dependent on the boot mode it now prepares an image header:

Flash image header

Boot mode 0 (`boot_v1.1`)

Offset	Length	Content
0	1	`BIN_MAGIC_FLASH` (0xE9)
1	1	Constant 3
2	1	`FlashMode` (0:QIO, 1:QOUT, 2:DIO, 3:DOUT)
3	1	`(FlashSizemap << 4) | FlashClockDivider`
4	4	Address of entry point `call_user_start`

The FlashSizemap is again one of the known constants:

`FlashSizemap`	Flash Size	Mapping
0	512 KB	256 KB + 256 KB
1	256 KB
2	1024 KB	512 KB + 512 KB
3	2048 KB	512 KB + 512 KB
4	4096 KB	512 KB + 512 KB
5	2048 KB	1024 KB + 1024 KB
6	4096 KB	1024 KB + 1024 KB

as well as the clock divider:

`FlashClockDivider`	Flash clock
0	80 MHz / 2
1	80 MHz / 3
2	80 MHz / 4
15	80 MHz / 1
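
Putting the three tables together, the eight byte header for boot mode 0 can be sketched with Python’s struct module. This is only an illustration of the layout described above: the function name flash_header is made up, and the little endian byte order of the entry point is assumed to match the rest of the image.

```python
import struct

BIN_MAGIC_FLASH = 0xE9


def flash_header(flash_mode, size_map, clk_div, entry_addr):
    """Build the 8 byte image header for boot mode 0 as described above."""
    byte3 = ((size_map << 4) | clk_div) & 0xFF
    # B = one unsigned byte, I = unsigned 32 bit integer, < = little endian
    return struct.pack("<BBBBI", BIN_MAGIC_FLASH, 3, flash_mode, byte3, entry_addr)


# QIO mode, 4096 KB (1024 KB + 1024 KB) map, 80 MHz / 2, entry in IRAM
header = flash_header(flash_mode=0, size_map=6, clk_div=0, entry_addr=0x40100004)
print(header.hex())
```

For these values the size map and clock divider combine into the single byte 0x60.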

Boot mode 1 (`boot_v1.2+`)

Offset	Length	Content
0	1	`BIN_MAGIC_FLASH` (0xE9)
1	1	Constant 3
2	1	0
3	1	`user_bin`
4	4	Address of entry point `call_user_start`

Boot mode 2 (`none`)

In this boot mode an additional header is added in front of the other image data:

Offset	Length	Content
0	1	`BIN_MAGIC_IROM` (0xEA)
1	1	Constant 4
2	1	0
3	1	`user_bin`
4	4	Address of entry point `call_user_start`

Immediately following that header the content of irom0.text.bin is written into the output file. It is preceded by another header consisting of two 32 bit integers: start_offset = 0 and length = (file_length + 15) & (~15)

Offset	Length	Content
0	4	Start offset of IROM section (0)
4	4	Length rounded to multiples of 16

After that the same header as for boot mode 0 (boot_v1.1) as described above gets appended. All blocks that are not a multiple of 16 get filled with zero bytes.
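
The part that is specific to boot mode 2 can be sketched as follows. This only mirrors the description above (IROM header, offset and length header, payload, zero padding up to a multiple of 16) and is not the code of gen_appbin.py; the function name irom_prefix is made up and little endian byte order is assumed.

```python
import struct

BIN_MAGIC_IROM = 0xEA


def irom_prefix(irom_payload, user_bin, entry_addr):
    """Build the IROM header followed by the padded irom0 text block."""
    padded_len = (len(irom_payload) + 15) & ~15
    out = struct.pack("<BBBBI", BIN_MAGIC_IROM, 4, 0, user_bin, entry_addr)
    # Start offset 0 and the length rounded up to a multiple of 16
    out += struct.pack("<II", 0, padded_len)
    out += irom_payload
    out += b"\x00" * (padded_len - len(irom_payload))
    return out


prefix = irom_prefix(b"\x01" * 5, user_bin=1, entry_addr=0x40100004)
print(len(prefix), prefix[:8].hex())
```

A payload of 5 bytes is rounded up to 16, so together with the two headers of 8 bytes each the block is 32 bytes long.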

Binary section files

Following the header (or in boot mode none (2) after the irom header and irom0.text.bin) the .text, .data (only if _data_start as previously extracted is set) and .rodata sections are appended.

All section files get prepended with a start offset and length header:

Offset	Length	Content
0	4	Start offset (see below)
4	4	Length rounded to multiples of 4

The start offset is either set to 0x40100000 in case of the .text binary or to the previously fetched _data_start or _rodata_start from the symbol table. This header instructs the bootloader to load sections at the specified offsets.

Note that all payload bytes (including padding but excluding the headers) are combined into a simple checksum. The checksum gets initialized with the magic value 0xEF and is formed by simply XORing all data and padding bytes into it.
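
Appending one section together with its header and updating the running checksum could look like the following sketch. It follows the description above and is not the script’s code; the names append_section, CHECKSUM_INIT and TEXT_ADDRESS are chosen for this illustration and little endian byte order is assumed.

```python
import struct

CHECKSUM_INIT = 0xEF
TEXT_ADDRESS = 0x40100000


def append_section(image, payload, start_addr, checksum):
    """Append header, payload and padding; return new image and checksum."""
    padded_len = (len(payload) + 3) & ~3
    body = payload + b"\x00" * (padded_len - len(payload))
    # Only payload and padding enter the checksum, the header does not
    for byte in body:
        checksum ^= byte
    image += struct.pack("<II", start_addr, padded_len) + body
    return image, checksum


image, checksum = append_section(b"", b"\x01\x02\x04", TEXT_ADDRESS, CHECKSUM_INIT)
print(image.hex(), hex(checksum))
```

The returned checksum is passed into the next call so that it accumulates over the .text, .data and .rodata sections.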

Checksum header

The checksum header is formed by first padding the file with 0x00 bytes so that its length, including the one checksum byte that follows, reaches a multiple of the flash data line size (i.e. 16 bytes) - these padding bytes are not included in the checksum any more.

The last byte written is then the collected checksum created during the concatenation of the binary sections.
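
As a sketch this step could be written as shown below. The function name append_checksum is made up, and the behaviour for a file whose length plus one is already a multiple of 16 (no padding in this sketch) is an assumption of mine that may differ from what the script does.

```python
FLASH_DATA_LINE = 16


def append_checksum(image, checksum):
    """Pad with zero bytes and append the checksum as the last byte."""
    # Number of zero bytes needed so that image + padding + 1 checksum
    # byte is a multiple of the flash data line size
    pad = -(len(image) + 1) % FLASH_DATA_LINE
    return image + b"\x00" * pad + bytes([checksum & 0xFF])


finished = append_checksum(b"\xaa" * 20, 0xE8)
print(len(finished), hex(finished[-1]))
```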

Padding and irom0 (boot mode 1)

Only in boot mode 1 (boot_v1.2+) the flash file gets padded with 0xFF bytes up to a size of 0x10000 bytes (i.e. 64 KByte). After that the content of eagle.app.v6.irom0text.bin simply gets appended byte by byte.
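
This padding step is short enough to sketch in a few lines. The function name append_irom_mode1 and the error raised for an oversized image are my own additions and not taken from the script.

```python
IROM_FLASH_OFFSET = 0x10000


def append_irom_mode1(image, irom_payload):
    """Pad the image with 0xFF up to 64 KByte and append the irom0 text."""
    if len(image) > IROM_FLASH_OFFSET:
        raise ValueError("image is already larger than 0x10000 bytes")
    padding = b"\xff" * (IROM_FLASH_OFFSET - len(image))
    return image + padding + irom_payload


combined = append_irom_mode1(b"\x00" * 32, b"abc")
print(hex(len(combined)))
```

0xFF is the value of erased flash, so the padded area does not need to be programmed.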

Cyclic redundancy check (CRC, mode 1 and 2 only)

At the end, in boot modes boot_v1.2+ or none, a 32 bit cyclic redundancy checksum (CRC) gets appended to the file. It is calculated as the CRC32 of the whole previously generated flash image file and is again appended in little endian order.

Finishing up

After the script has generated eagle.app.flash.bin as described above the only thing left is copying the resulting files into the ../bin/ directory

mv eagle.app.flash.bin ../bin/eagle.flash.bin
mv eagle.app.v6.irom0text.bin ../bin/eagle.irom0text.bin

and cleaning up

rm eagle.app.v6.*

The script then tells us where to put the files into flash memory:

eagle.flash.bin-------->0x00000
eagle.irom0text.bin---->0x10000

Building FOTA binaries

The build process when building FOTA binaries is very similar to the one above. The main difference is that the generated files are called user1 and user2, depending on which partition they should be written into - and the linker scripts supply the corresponding addresses (hence the build has to be run two times - once for user1 and once for user2). The other main difference is that the Python script generating the flash image is called with user_bin set to 1 and 2 respectively. This leads to a minor change inside the flash image header (in boot_mode 1 or 2).

The linker script used is eagle.app.v6.new.2048.ld - one can see that the script is adapted to only use half of the available memory.

At the end of the script run the makefile tells us where to flash the images:

boot.bin------------>0x00000
user1.4096.new.6.bin--->0x01000

boot.bin------------>0x00000
user2.4096.new.6.bin--->0x101000

Note that one normally does not flash user2 directly - that’s normally done via the cloud service during an upgrade.

Note that boot.bin is the FOTA bootloader binary provided by Espressif that can be found at ${SDKROOT}/bin. The files are currently called boot_v1.2, boot_v1.6 and boot_v1.7.bin. The version numbers may increase in the future.

Initial data sections

Note that there are some additionally provided binaries that might be required to put the ESP8266 into a useable state:

blank.bin
esp_init_data_default_v05.bin
esp_init_data_default_v08.bin