Low-Level (Assembly) Language for Reverse Engineering

Assembly language is the primary skill every reverse engineer needs before they can do their work; it is the ABC of reversing. Although it can seem difficult at first, it gradually becomes second nature. Assembly language is the human-readable form of the machine code that a processor executes. Humans can understand a program’s source code, but computers cannot: before the machine can run it, the source code must be compiled down to machine code, which maps almost one-to-one to assembly instructions. But what if the source code is not available? Then reading the disassembled code of a program is the most direct way to understand what it does.

This article briefly introduces assembly language, with an emphasis on the Intel x86 architecture, since it is one of the most popular and widely used architectures.


x86

Like any other programming language, assembly language has its own variables, syntax, operations, and functions. Each line of code processes only a small amount of data: a single instruction typically reads or writes one value of 1, 2, 4, or 8 bytes.

Registers

You may think of registers as variables in assembly language. Registers fall into the following categories:

  • General purpose registers
  • Segment registers
  • Flags register
  • Instruction pointer

In the original 16-bit x86 architecture, each general purpose register has a designated purpose and is WORD sized (16 bits), as follows:

  • Accumulator (AX)
  • Counter (CX)
  • Data (DX)
  • Base (BX)
  • Stack pointer (SP)
  • Base pointer (BP)
  • Source index (SI)
  • Destination index (DI)

System states and logical results of executed code are stored in the FLAGS register. Every bit of the FLAGS register has its own purpose.

All these registers have an “extended” 32-bit form, accessed with an “E” prefix (EAX, EBX, ECX, EDX, ESP, EIP, and EFLAGS). The same goes for 64-bit mode, where the registers are accessed with an “R” prefix (RAX, RBX, RCX, RDX, RSP, and RIP).
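As a rough illustration of how the register names relate, the Python sketch below treats the narrower names as views onto the low bits of one 64-bit value. The starting value is arbitrary, and the 8-bit names AL and AH (the low and high bytes of AX) are standard x86 names that are not listed above.

```python
# Sketch only: model RAX as a Python integer and derive the narrower views.
RAX = 0x1122334455667788  # arbitrary example value

EAX = RAX & 0xFFFFFFFF    # low 32 bits
AX = RAX & 0xFFFF         # low 16 bits
AL = RAX & 0xFF           # low 8 bits of AX
AH = (RAX >> 8) & 0xFF    # high 8 bits of AX

print(f"EAX={EAX:08X} AX={AX:04X} AH={AH:02X} AL={AL:02X}")
# prints: EAX=55667788 AX=7788 AH=77 AL=88
```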

The memory is divided into sections such as the code segment, stack segment, data segment, and other sections. The segment registers are used to identify the starting location of these sections, as follows:

  • Stack segment (SS)
  • Code segment (CS)
  • Data segment (DS)
  • Extra segment (ES)
  • F segment (FS)
  • G segment (GS)

Memory addressing

Every byte stored in the memory is assigned a memory address that identifies its location.

When processing data, we read it from or write it to registers and memory as a BYTE (8 bits), WORD (16 bits), DWORD (32 bits), or QWORD (64 bits). Multi-byte values are stored in little-endian or big-endian order depending on the platform; x86 is little-endian. Take the four bytes AA BB CC DD stored at consecutive, increasing addresses:

  • In little-endian, the chunk read into a DWORD is reversed: DDCCBBAAh.
  • In big-endian, the same chunk is read as AABBCCDDh.
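The difference is easy to check in Python. This is a minimal sketch using the standard `struct` module; the four example bytes are assumed to sit at increasing addresses in the order AA BB CC DD.

```python
import struct

# Four bytes as they would appear in memory, lowest address first.
data = bytes([0xAA, 0xBB, 0xCC, 0xDD])

little = struct.unpack("<I", data)[0]  # "<" = little-endian, "I" = 32-bit unsigned
big = struct.unpack(">I", data)[0]     # ">" = big-endian

print(f"little-endian DWORD: {little:08X}h")  # DDCCBBAAh
print(f"big-endian DWORD:    {big:08X}h")     # AABBCCDDh
```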

Basic instructions

Assembly language is made up of direct lines of code that follow this syntax:

label/address mnemonic operands ; comment

Opcode bytes

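Every instruction has an equivalent opcode (operation code) encoding. In the examples below, the first byte is the opcode and the remaining four bytes are the immediate operand, stored in little-endian order:

Opcode bytes    Instruction
B8 00000040     MOV EAX,40000000h
B9 04000000     MOV ECX,4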
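To make the byte layout concrete, here is a small Python sketch that decodes only this one instruction form (opcodes B8 to BF, "MOV r32, imm32" in 32-bit code). The function name `decode_mov_imm32` is made up for this example; a real disassembler handles far more cases.

```python
import struct

# For opcodes B8..BF, the low three bits of the opcode select the register.
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def decode_mov_imm32(code: bytes) -> str:
    """Decode a 5-byte 'MOV r32, imm32' encoding (hypothetical helper)."""
    opcode = code[0]
    if not 0xB8 <= opcode <= 0xBF or len(code) < 5:
        raise ValueError("not a MOV r32, imm32 encoding")
    # The immediate follows the opcode byte in little-endian order.
    (imm,) = struct.unpack_from("<I", code, 1)
    return f"MOV {REG32[opcode - 0xB8]},{imm:X}h"

print(decode_mov_imm32(bytes.fromhex("B800000040")))  # MOV EAX,40000000h
print(decode_mov_imm32(bytes.fromhex("B904000000")))  # MOV ECX,4h
```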

Copying data

The MOV instruction is used to copy data from a source to a destination:

mov eax, 0xaabbccdd ; copy the value 0xaabbccdd to eax
mov al, byte ptr [00000001] ; read one byte from memory into al
mov byte ptr [00000001], al ; write al to memory

MOV is used to read the value at a given address, while LEA (Load Effective Address) is used to get the address itself. Assuming the four bytes at address 00000060 are 60 61 62 63:

mov eax, dword ptr [00000060] ; stores 63626160h to eax
lea eax, dword ptr [00000060] ; stores 00000060h to eax
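The contrast between MOV and LEA can be mimicked in Python. This sketch assumes a made-up memory image in which the byte at address N holds the value N, which reproduces the 63626160h result; `mov_dword` and `lea` are illustrative names, not real APIs.

```python
import struct

# Hypothetical memory image: the byte at address N holds the value N.
memory = bytes(range(256))

def mov_dword(address: int) -> int:
    """Like 'mov eax, dword ptr [address]': read 4 bytes, little-endian."""
    return struct.unpack_from("<I", memory, address)[0]

def lea(address: int) -> int:
    """Like 'lea eax, [address]': return the address, never touch memory."""
    return address

print(f"{mov_dword(0x60):08X}h")  # 63626160h
print(f"{lea(0x60):08X}h")        # 00000060h
```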

Addition and subtraction

Addition (ADD) and subtraction (SUB) update the status flags, including OF, SF, ZF, and CF:

add ecx, ebx ; ecx = ecx + ebx
sub ecx, edx ; ecx = ecx - edx
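How the flags come out of an addition can be sketched in Python. The helper below models a 32-bit ADD under the assumption that both inputs are already in the range 0 to FFFFFFFFh; the name `add32` and the dictionary layout are choices made for this example.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def add32(a: int, b: int):
    """Model a 32-bit ADD: return (result, flags). Inputs must fit in 32 bits."""
    full = a + b
    result = full & MASK32
    flags = {
        "CF": full > MASK32,          # carry out of bit 31 (unsigned overflow)
        "ZF": result == 0,            # result is zero
        "SF": bool(result & SIGN32),  # sign bit of the result
        # Signed overflow: operands share a sign and the result's sign differs.
        "OF": bool(~(a ^ b) & (a ^ result) & SIGN32),
    }
    return result, flags

print(add32(0x7FFFFFFF, 1))  # signed overflow: OF and SF set, CF clear
print(add32(0xFFFFFFFF, 1))  # unsigned wrap to zero: CF and ZF set, OF clear
```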

Increment and decrement instructions

inc eax ; eax = eax + 1
dec eax ; eax = eax - 1

Multiplication and division instructions

mov eax, 0x80000000
mov ecx, 2
mul ecx ; unsigned multiply: edx:eax = eax * ecx, so edx = 1 and eax = 0
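The 64-bit result of MUL is split across EDX (high half) and EAX (low half), and DIV works on that same EDX:EAX pair in reverse. The Python sketch below models both for unsigned 32-bit operands; `mul32` and `div32` are illustrative names, and the `OverflowError` stands in for the divide error a real CPU raises when the quotient does not fit in EAX.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax: int, operand: int):
    """Model 'mul operand': return (edx, eax) holding the 64-bit product."""
    product = eax * operand
    return (product >> 32) & MASK32, product & MASK32

def div32(edx: int, eax: int, operand: int):
    """Model 'div operand': divide EDX:EAX, return (quotient, remainder)."""
    dividend = (edx << 32) | eax
    quotient, remainder = divmod(dividend, operand)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX")
    return quotient, remainder

edx, eax = mul32(0x80000000, 2)
print(f"edx={edx:08X} eax={eax:08X}")  # edx=00000001 eax=00000000
print(div32(edx, eax, 2))              # back to 0x80000000, remainder 0
```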

Other signed operations

NEG

This operation negates a value by taking its two’s complement.

MOVSX

This moves a smaller value into a larger register (for example, a BYTE to a WORD or a WORD to a DWORD), copying the sign bit into the upper bits.
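Both operations can be sketched in a few lines of Python. The helpers below assume 32-bit destinations, and their names are invented for this example.

```python
MASK32 = 0xFFFFFFFF

def neg32(value: int) -> int:
    """Model NEG on a 32-bit value: two's complement = invert bits, add 1."""
    return (~value + 1) & MASK32

def movsx_8_to_32(byte: int) -> int:
    """Model MOVSX from BYTE to DWORD: replicate bit 7 into bits 8..31."""
    byte &= 0xFF
    return (byte | 0xFFFFFF00) if (byte & 0x80) else byte

print(f"{neg32(1):08X}")             # FFFFFFFF, which is -1 in two's complement
print(f"{movsx_8_to_32(0x80):08X}")  # FFFFFF80, since -128 stays -128
print(f"{movsx_8_to_32(0x7F):08X}")  # 0000007F, positive values are unchanged
```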

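Bitwise algebra

NOT ; inverts every bit.
AND ; sets a bit only if it is set in both operands.
OR ; sets a bit if it is set in either operand.
XOR ; sets a bit if it is set in exactly one operand.
SHL/SAL ; shifts bits to the left.
SHR/SAR ; shifts bits to the right (SHR fills with zeros, SAR preserves the sign bit).
ROL ; rotates bits to the left.
ROR ; rotates bits to the right.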
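The difference between the two right shifts, and between shifting and rotating, shows up clearly on a value with the top bit set. This Python sketch models the 32-bit forms and assumes shift counts between 0 and 31; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def shr32(value: int, count: int) -> int:
    """Logical right shift: vacated high bits are filled with zeros."""
    return (value & MASK32) >> count

def sar32(value: int, count: int) -> int:
    """Arithmetic right shift: vacated high bits copy the sign bit."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative Python integer
    return (value >> count) & MASK32

def rol32(value: int, count: int) -> int:
    """Rotate left: bits leaving the top re-enter at the bottom."""
    count %= 32
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32

def ror32(value: int, count: int) -> int:
    """Rotate right, expressed as a left rotate by the complementary count."""
    return rol32(value, (32 - count % 32) % 32)

print(f"{shr32(0x80000000, 4):08X}")  # 08000000
print(f"{sar32(0x80000000, 4):08X}")  # F8000000
print(f"{rol32(0x80000001, 1):08X}")  # 00000003
print(f"{ror32(0x00000003, 1):08X}")  # 80000001
```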

Control flow

JMP:

Jumps to a given address:

jmp eax ; jump to the address held in eax
jmp dword ptr [00403000] ; jump to the address stored at 00403000

CALL and RET:

CALL pushes the address of the next instruction (the return address) onto the stack and jumps to the target, while RET pops that stored address from the stack and resumes execution there.
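The pairing of CALL and RET can be modeled with a Python list standing in for the stack. The addresses and the 5-byte instruction length are assumptions for the example (5 bytes is the size of a near CALL with a 32-bit relative target).

```python
stack = []  # Python list standing in for the CPU stack; the top is the end

def call(eip: int, instruction_length: int, target: int) -> int:
    """Model CALL: push the return address, then continue at the target."""
    stack.append(eip + instruction_length)
    return target

def ret() -> int:
    """Model RET: pop the return address and continue there."""
    return stack.pop()

eip = 0x00401000                 # hypothetical address of the CALL instruction
eip = call(eip, 5, 0x00402000)   # now executing the subroutine at 00402000
print(f"in subroutine: {eip:08X}, saved return address: {stack[-1]:08X}")
eip = ret()                      # back at the instruction after the CALL
print(f"after ret:     {eip:08X}")  # 00401005
```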

Conditional Jump:

Jxx instructions (such as JZ, JNZ, JB, or JG) jump only if the condition encoded in the mnemonic holds for the current state of the flags register.
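As a sketch of how a few common conditions read the flags, the Python below computes the flags a 32-bit SUB would produce and then evaluates several jump conditions against them. The helper name `sub_flags` and the table layout are made up for this example, and only a handful of the Jxx variants are covered.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def sub_flags(a: int, b: int) -> dict:
    """Flags a 32-bit 'sub a, b' would set (inputs must fit in 32 bits)."""
    diff = (a - b) & MASK32
    return {
        "ZF": diff == 0,
        "CF": a < b,                                  # unsigned borrow
        "SF": bool(diff & SIGN32),
        "OF": bool((a ^ b) & (a ^ diff) & SIGN32),    # signed overflow
    }

# Each condition is a predicate over the flags.
CONDITIONS = {
    "JE/JZ": lambda f: f["ZF"],                        # equal
    "JNE/JNZ": lambda f: not f["ZF"],                  # not equal
    "JB": lambda f: f["CF"],                           # unsigned below
    "JA": lambda f: not f["CF"] and not f["ZF"],       # unsigned above
    "JL": lambda f: f["SF"] != f["OF"],                # signed less
    "JG": lambda f: not f["ZF"] and f["SF"] == f["OF"],  # signed greater
}

def taken(a: int, b: int) -> list:
    """Names of the jumps that would be taken after subtracting b from a."""
    flags = sub_flags(a, b)
    return [name for name, cond in CONDITIONS.items() if cond(flags)]

print(taken(5, 5))
print(taken(0xFFFFFFFF, 1))  # above when unsigned, less when signed (-1 < 1)
```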


Stack manipulation

The stack is a memory area where data is temporarily stored. Data is added and removed in last-in, first-out order. Local variables of subroutines compiled from C programs are allocated on the stack, in a region called a stack frame, and are uninitialized until the code assigns them.

There are three main operations:

  • PUSH: pushes a value onto the top of the stack
  • POP: removes the value at the top of the stack and stores it in the operand
  • ENTER: creates a stack frame at the start of a subroutine; compilers more often emit the equivalent PUSH/MOV/SUB sequence

Stack frame creation

push ebp ; save the caller's ebp
mov ebp, esp ; ebp now marks the base of the new frame
sub esp, 8 ; reserve 8 bytes for locals (the stack grows toward lower addresses)
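Because the x86 stack grows toward lower addresses, pushing decreases ESP and reserving space for locals means subtracting from it. The Python sketch below walks through a standard three-instruction frame setup (push ebp, mov ebp, esp, sub esp, 8); the starting register values are hypothetical and a dictionary stands in for memory.

```python
# Hypothetical register values on entry to the subroutine.
esp = 0x0012FF80
ebp = 0x0012FFB0
memory = {}  # address -> 32-bit value; a stand-in for real memory

def push(value: int) -> None:
    """Model PUSH: move ESP down by 4 bytes, then store the value there."""
    global esp
    esp -= 4
    memory[esp] = value

push(ebp)   # push ebp      ; save the caller's frame pointer
ebp = esp   # mov ebp, esp  ; base of the new frame
esp -= 8    # sub esp, 8    ; reserve 8 bytes for local variables

print(f"ebp={ebp:08X} esp={esp:08X} saved ebp={memory[ebp]:08X}")
# prints: ebp=0012FF7C esp=0012FF74 saved ebp=0012FFB0
```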


Tools

1. Popular assemblers

  • MASM: short for Microsoft Macro Assembler, it has been in use for more than three decades. It is maintained by Microsoft and ships as a component of Visual Studio. It translates x86 assembly source code into executable code.
  • NASM: short for Netwide Assembler. Apart from some differences in syntax, directives, and variable definitions, NASM and MASM are fairly similar. A nice feature of NASM is how easy it is to identify code and data sections.
  • FASM: the Flat Assembler is comparable to NASM and MASM. It offers a built-in source editor, is available for Windows and Linux, and its code sections are simple to identify and configure.

2. x86 Debuggers

Debuggers are the tools programmers use to step through their code and verify that the software behaves as expected. With a debugger we can trace code line by line: as each instruction executes, we can watch how it modifies the registers and memory contents.

  • WinDbg
  • OllyDbg
  • x64dbg
