Firmware reversing - intro

Tuesday, January 10, 2023    Post   1689 words   8 mins read

In computing, firmware is a specific class of computer software that provides the low-level control for a device’s specific hardware. Firmware, such as the BIOS of a personal computer, may contain basic functions of a device, and may provide hardware abstraction services to higher-level software such as operating systems. For less complex devices, firmware may act as the device’s complete operating system, performing all control, monitoring and data manipulation functions. Typical examples of devices containing firmware are embedded systems (running embedded software), home and personal-use appliances, computers, and computer peripherals.

Firmware is the software that communicates with and controls the hardware components of a specific device. It is the first code that runs on the device; it usually loads the operating system and provides services that let programs interact with the various hardware components.

Many of the devices you use on a daily basis run firmware: it is the actual code that runs on an IoT or embedded device. Reversing the firmware involves disassembling it and understanding the inner workings of the device. This can range from simple analysis (looking at various aspects of the device, such as its file system and various interfaces) to an in-depth look at the firmware itself, uncovering its internal details and algorithms.

Reversing (reverse-engineering) firmware is useful because it can reveal important information about the device, such as hard-coded data, security flaws in critical algorithms, and sometimes login credentials. The firmware may also contain encryption keys, API keys and other hard-coded secrets. It is also possible to modify the firmware and flash the device with the patched code to change the execution flow on the chip.
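
As a rough illustration of how hard-coded data shows up, the sketch below pulls runs of printable ASCII out of a binary blob, much like the Unix strings utility does. The function name, the minimum length of 6 and the sample bytes are assumptions made up for this example; they do not come from any real firmware.

```python
import re


def extract_strings(blob: bytes, min_len: int = 6) -> list:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]


# Made-up sample: binary noise around two readable strings.
sample = b"\x00\x01\xffadmin:password123\x00\x7f\x80API_KEY=abcd\x00\x02"

for s in extract_strings(sample):
    print(s)
```

On a real image you would read the file with open(path, "rb").read() and skim the output for credentials, URLs, version banners and similar hints.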

When dealing with the disassembled code, you will have to understand the internals of the device. One important aspect is its architecture. Most embedded systems use a RISC architecture because its instructions are small and simple, and each one typically completes in very few clock cycles. CISC processors such as x86, by contrast, offer complex instructions that do more work per instruction but often take several clock cycles to execute. A clear understanding of the architecture is important if you want to grasp the inner workings of the device.

Microcontroller-based devices are not only used in sensor networks. They are everywhere: in the refrigerator, the microwave oven and the alarm system, and there are dozens of them in your car as well as in your laptop or desktop computer.

Devices with firmware are divided into several classes:

Network Devices

  • Routers, switches, NAS (Network Attached Storage), VoIP phones.
  • Surveillance - alarms, cameras, CCTV, DVRs, NVRs.
  • Industrial Automation - PLCs.

Process monitoring and automation

  • Sensors, smart home devices, Z-Wave devices.

Home Appliances

  • Household Appliances - washing machines, refrigerators, dryers.
  • Entertainment equipment - TV, DVR, receivers, stereo, game consoles, MP3 players, cameras, cell phones, toys.

Other complex devices

  • Hard drives, printers.
  • Cars.
  • Medical devices.

When you don’t have the source code for the firmware (which is most of the time), you resort to reverse engineering. Reversing binary firmware files is a bit different from reversing Windows EXE and Linux ELF files because raw firmware images have no predefined structure. For bare-metal binaries, you will need the chipset specifications, and you will have to create a memory map in your disassembly tool (such as IDA Pro or Ghidra) to get a proper disassembly. The memory map also helps answer another very important question: which GPIOs and other peripherals the firmware communicates with. This information in turn helps you to understand the functionality of the device.
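
To make the idea of a memory map concrete, here is a minimal sketch that classifies an address by the region it falls in. The region names, start addresses and sizes describe an imaginary Cortex-M style chip and are purely hypothetical; for a real target they have to come from the datasheet or reference manual.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    name: str
    start: int
    size: int


# Hypothetical memory map for an imaginary chip (not a real part).
MEMORY_MAP = [
    Region("FLASH", 0x0800_0000, 0x0010_0000),
    Region("SRAM", 0x2000_0000, 0x0002_0000),
    Region("PERIPHERALS", 0x4000_0000, 0x2000_0000),
]


def region_for(addr: int) -> Optional[str]:
    """Return the name of the region containing addr, or None."""
    for region in MEMORY_MAP:
        if region.start <= addr < region.start + region.size:
            return region.name
    return None


print(region_for(0x0800_01C0))  # an address inside the code region
print(region_for(0x4002_0014))  # an address inside the peripheral region
```

When the disassembly shows a load or store to a constant address, a lookup like this tells you whether the code is touching flash, RAM or a peripheral register.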

Before loading the firmware into our favorite disassembler, we need tools that quickly determine a few things:

  • The byte order (endianness) of the architecture, which determines how multi-byte values are stored in memory.
  • The base address where the contents of the firmware are loaded.
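
Both questions can be explored with a few lines of Python. The sketch below assumes the image starts with an ARM Cortex-M style vector table, where the first word is the initial stack pointer and the second is the reset handler address with its lowest bit set for Thumb mode. The sample bytes, the candidate base addresses and the image size are invented for illustration.

```python
import struct


def decode_vectors(blob: bytes):
    """Decode the first two 32-bit words in both byte orders."""
    little = struct.unpack_from("<2I", blob, 0)
    big = struct.unpack_from(">2I", blob, 0)
    return little, big


def base_is_plausible(base: int, image_size: int, reset_vector: int) -> bool:
    """True if the reset handler would land inside the loaded image."""
    handler = reset_vector & ~1  # clear the Thumb bit
    return base <= handler < base + image_size


# Invented first 8 bytes of an image.
sample = bytes.fromhex("00000220c1010008")
little, big = decode_vectors(sample)

print([hex(w) for w in little])
print([hex(w) for w in big])
print(base_is_plausible(0x0800_0000, 0x1_0000, little[1]))
print(base_is_plausible(0x0000_0000, 0x1_0000, little[1]))
```

Only one of the two byte orders yields values that look like a RAM address followed by a code address, and only a base address that places the reset handler inside the image makes sense. This is a heuristic, not a proof, so treat the result as a starting point.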

In what cases is bare-metal firmware used?

If you are considering a device that performs simple tasks (e.g. temperature or light control), it does not require any complex software such as Linux; in those specific cases a bare-metal firmware is often used.

But what is bare-metal firmware?

Simply put, this type of firmware communicates directly with the hardware without the involvement of a driver or kernel. Bare-metal firmware doesn’t do a lot of complex things: it usually has no more than 3 or 4 tasks, and these tasks run in a loop because they are scheduled to run in a specific order or under specific conditions. The vendor SDK for these devices defines life-cycle functions that the programmer implements, and these functions are executed in a loop. A step up from a plain loop are small real-time operating systems (RTOS) such as FreeRTOS and Mbed OS, which provide task scheduling and let the device respond to events very quickly.
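
The loop structure is easy to picture with a small simulation. Real bare-metal firmware is normally written in C and runs forever; the Python sketch below only mimics the shape of such a loop, and the three tasks, the state dictionary and the temperature values are all invented for the example.

```python
def read_sensor(state: dict) -> None:
    # Stand-in for reading a hardware register.
    state["temperature"] = 20 + state["tick"] % 5


def update_output(state: dict) -> None:
    # Stand-in for driving a GPIO pin.
    state["heater_on"] = state["temperature"] < 22


def report(state: dict) -> None:
    # Stand-in for sending a status line over a serial port.
    state["log"].append((state["tick"], state["temperature"], state["heater_on"]))


# The tasks always run in this fixed order.
TASKS = [read_sensor, update_output, report]


def super_loop(iterations: int) -> dict:
    state = {"tick": 0, "temperature": 0, "heater_on": False, "log": []}
    # Firmware would use an endless loop; this one is bounded so it terminates.
    for tick in range(iterations):
        state["tick"] = tick
        for task in TASKS:
            task(state)
    return state


for entry in super_loop(5)["log"]:
    print(entry)
```

When you reverse a bare-metal image, finding this main loop and the handful of functions it calls is often the quickest way to get an overview of what the device does.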

More advanced systems like routers, smart home panels, drones, and medical devices do quite a bit more than bare-metal systems do.

Take this for example: a typical wireless router offers LAN and WiFi connectivity, can blacklist or whitelist certain MAC addresses, and, on some modern models, ships with anti-virus software and other security features. To implement all of these features, you need a complete operating system that supports all of these complex functions.

Linux is generally reported to be the most popular OS choice for embedded-system products. There are many reasons for this, including the fact that it is open source and very flexible.

Structure

A Linux-based firmware image usually contains at least three components: the boot loader, the kernel and the file system.

  • The boot loader is the part of the software that helps boot the kernel and passes various information to it.
  • Once the boot loader completes execution, the kernel takes over.
  • Once the kernel starts executing, it launches the user-space applications and services that make the operating system usable by the end user. Many of these run in the background.

Once all of this is done, the user is able to interact with the system. All user applications and application data are stored on the file system.

As stated earlier, firmware is tightly tied to a specific architecture: a given processor with its own peripherals, communication buses, characteristics and features. This is what makes reverse engineering a tedious task. The relevant details can be found in the architecture documentation, if available.

Architecture

ARM

ARM processors are a family of central processing units (CPUs) based on a reduced instruction set computer (RISC) architecture. ARM stands for Advanced RISC Machine (originally Acorn RISC Machine). ARM architectures represent a different approach to system hardware design than the more familiar desktop and server architectures such as x86.

Some of the basic instructions:

  • Arithmetic operations: ADD, SUB, MUL.
  • Load and store instructions: LDR, LDM, STR, STM.
  • Branch operations: B, BL, BX, BLX.

MIPS

MIPS is a RISC instruction set architecture mainly used in embedded systems such as gateways and routers. The name is an acronym for Microprocessor without Interlocked Pipeline Stages. It was developed by MIPS Technologies and has 32 general-purpose registers. It is very useful to learn because many embedded systems run on a MIPS processor.

Some of the basic instructions:

  • Arithmetic operations: add, addu, sub, subu, div, mult, multu.
  • Load and store instructions: lhu, lbu, lw, sw, sb, sh.
  • Branch operations: beq, bne, ble, bge (the last two are assembler pseudo-instructions).
  • Control flow instructions: j, jal, jr, jalr.
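
To get a feel for how such instructions look at the binary level, the sketch below splits a 32-bit MIPS R-type instruction word into its standard fields. The function name is made up for this example; the sample word 0x012A4020 is the encoding of add $t0, $t1, $t2.

```python
def decode_r_type(word: int) -> dict:
    """Split a 32-bit MIPS R-type instruction word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20..16, second source register
        "rd": (word >> 11) & 0x1F,      # bits 15..11, destination register
        "shamt": (word >> 6) & 0x1F,    # bits 10..6, shift amount
        "funct": word & 0x3F,           # bits 5..0, selects the operation
    }


fields = decode_r_type(0x012A4020)  # add $t0, $t1, $t2
print(fields)
```

Here rs is 9 ($t1), rt is 10 ($t2), rd is 8 ($t0) and funct is 0x20, the function code for add. A disassembler does this kind of field extraction for every instruction format of the architecture.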

Atmel AVR

The AVR is an 8-bit RISC architecture that is mainly used in automotive, security, and entertainment applications. The AVR is also the architecture used by Arduino development boards.

Some of the basic instructions:

  • Arithmetic operations: ADD, ADC, SUB, SBC, MUL, INC, DEC.
  • Input/Output: IN, OUT.
  • Load and store instructions: LD, LDI, LDS, ST, STS.

RISC-V

RISC-V is a free and open ISA based on RISC principles. It has integer and logical instructions as well as several memory instructions. RISC-V is a load/store architecture: only load and store instructions access memory, so the operands of all other integer instructions must be registers or immediates.

Some of the basic instructions:

  • Arithmetic operations: ADD, ADDI, SUB, SLT, SLTU, LUI.
  • Branch operations: BEQ, BNE, BLT, BGE.
  • Load and store instructions: LB, LH, LW, LBU, LHU, SB, SH, SW.

There are many more architectures; the ones above are just some of the better-known ones. Try to learn the main parts of each architecture, such as its basic instructions and how they shape program flow. A good way to learn about the different architectures is to write some simple high-level code and then compare it with the corresponding assembly.

Tools

  • QEMU - A valuable tool for working in a cross-architecture environment, which is usually the case for embedded-system developers. It allows you to emulate firmware binaries built for architectures such as ARM or MIPS on a host system with a different architecture, such as x86 or amd64. QEMU comes in handy when you want to examine the firmware but you don’t have the device, or when setting up a debugger for that system is very complicated.
  • Binwalk - A fast and easy-to-use tool for analyzing, reverse engineering and extracting firmware images. It tries to extract embedded files from any binary blob by searching for the signatures of many common file formats such as zip, tar, exe and ELF, matching them against its database of header signatures. It is most commonly used to extract the file system embedded in the firmware binary, such as Squashfs, yaffs2, Cramfs, ext*fs or jffs2.
  • at51 - Applications for reverse engineering 8051 firmware.
  • FIRMADYNE - Platform for emulation and dynamic analysis of Linux-based firmware.
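
The signature matching that Binwalk relies on can be illustrated with a toy scanner. This is only a sketch of the general idea, not how Binwalk is implemented: the four magic values are well-known file signatures, while the scan function and the sample blob are invented for this example.

```python
# A few well-known magic values and what they usually indicate.
SIGNATURES = {
    b"\x27\x05\x19\x56": "uImage header",
    b"hsqs": "SquashFS (little endian)",
    b"\x1f\x8b\x08": "gzip data",
    b"\x7fELF": "ELF executable",
}


def scan(blob: bytes) -> list:
    """Return (offset, description) for every signature found, sorted by offset."""
    hits = []
    for magic, name in SIGNATURES.items():
        offset = blob.find(magic)
        while offset != -1:
            hits.append((offset, name))
            offset = blob.find(magic, offset + 1)
    return sorted(hits)


# Invented blob: a header, some compressed data, then a file system.
sample = (
    b"\x27\x05\x19\x56" + b"\x00" * 60
    + b"\x1f\x8b\x08" + b"\xaa" * 29
    + b"hsqs" + b"\x00" * 8
)

for offset, name in scan(sample):
    print(f"0x{offset:08X}  {name}")
```

Short signatures produce false positives on real images, which is why a real tool validates the header fields that follow each match before trusting it.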

The Practical Reverse Engineering articles by Juan Carlos Jimenez are really useful in understanding what you are dealing with.

As you have probably noticed by now, reverse-engineering firmware is a big topic, and so far I have only touched on the basic architectures and some of the tools. Stay tuned for the next article in the series.