Post by Acid Burn on Jun 19, 2006 13:23:07 GMT
------------------------------------------------
C/O :: Acid Burn of Oblivion Realm
------------------------------------------------
Okay, I've decided to write this
introduction to x86 assembly as I know there are a number of people out
there who would like to know it though have either had trouble understanding
some of the tutorials they've come across, or just haven't got round to
learning.
First a few definitions:
note: Whenever I use the letter 'b', 'h' or 'd' after a number it signifies
that it is either a binary, hexadecimal or decimal number respectively.
Hexadecimal can also be shown by prefixing the number with '0x' instead of
appending the 'h' afterwards.
bit - the name given to the smallest size of data in a computer. A bit can
hold one of two values; 1 or 0.
nybble (also spelt nibble or called a semioctet) - 4 bits, it has a maximum
value of 1111b (Fh/15d)
byte (also called an octet) - 8 bits or 2 nybbles, it has a maximum value of
11111111b (FFh/255d)
word - a grouping of 2 bytes (4 nybbles or 16 bits)
double word (dword) - as the name suggests a dword is 2 words, 4 bytes, 8
nybbles or 32 bits in size
far word (fword, also called a triple word; pword) - 3 words, 6 bytes or 48
bits
quadruple word (qword) - 2 dwords, 4 words, 8 bytes or 64 bits
ten word (tword, also more accurately called a ten byte; tbyte) - 5 words,
10 bytes or 80 bits
double quadruple word (dqword, also called an octuple word; oword) - 2
quadruple words, 4 dwords, 8 words, 16 bytes or 128 bits
kilobyte (also kilooctet) - 1024 bytes (not 1000, as data sizes in computing
are powers of 2. This stems from the way all computers are based
fundamentally around the binary numbering system)
megabyte (also megaoctet) - 1024 kilobytes or 1,048,576 bytes
gigabyte (also gigaoctet) - 1024 megabytes, 1,048,576 kilobytes or
1,073,741,824 bytes
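This isn't assembly, but the unit sizes above can be sanity-checked with a couple of lines of Python (my own illustration, not part of the original definitions):

```python
# Each unit is 1024 times the previous one, i.e. successive powers of 2**10.
KB = 1024          # 2**10 bytes
MB = 1024 * KB     # 2**20 bytes
GB = 1024 * MB     # 2**30 bytes

print(KB, MB, GB)  # 1024 1048576 1073741824
```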
There are also larger data sizes than a gigabyte. These are (in order);
terabyte, petabyte, exabyte, zettabyte and yottabyte. Each one of these is
1024 times larger than its predecessor. In each of them the word byte can
also be replaced with octet in order to be unambiguous when referring to
the x86 computer architecture.
note: The size of a word, and so also of a dword and qword, varies between
different computer architectures. Throughout this tutorial when referring
to words, dwords and qwords I shall be using them (like above) as they
are defined for the x86 computer architecture (effectively most Intel and
AMD CPUs). Also, byte can refer to a piece of data greater or smaller than
8 bits long on other computer architectures. Octet however can only ever
mean 8 bits.
Now that that's over I'll start with the basics. Every piece of software in
its simplest form is made of binary; 1s and 0s, and just using binary was
how the first computers were programmed. Of course as I'm sure you can see
it soon becomes complicated to tell what's going on when you have a page
full of just 2 different characters, and such a program is near impossible
to write, let alone debug. As an example the below binary prints 'Hello
World!' when run in DOS.
101111100001011100000001101011001000010011000000011101000000100110110100
000011101011101100000010000000001100110100010000111010111111001000110000
111001001100110100010110110011010001100101001000011001010110110001101100
011011110010000001010111011011110111001001101100011001000010000100000000
Obviously it would be impossible for anyone to know what's going on in the
above program without carefully studying every part of it bit by bit (pardon
the pun). There was a simpler method later derived however, which is the
reason that all instructions are defined in bytes (8 bits), and must be a
multiple of this. This simpler method is to use the base-16 numbering system
(hexadecimal, commonly shortened to just hex) to define each byte
individually as a pair of hex digits. Employing this system converts our
above code into this:
BE 17 01 AC 84 C0 74 09 B4 0E BB 02 00 CD 10 EB F2 30 E4 CD 16 CD 19 48 65
6C 6C 6F 20 57 6F 72 6C 64 21 00
You can see that this second example is much more legible, and with proper
knowledge anyone could work out and debug the above code in a fraction of
the time it would take to do the same thing with the first example. It still
takes far too much time to program in however, and is still extremely
difficult, which is how the next evolution of programming came about;
assembly. Assembly is essentially the same as writing out the code in
hexadecimal, only the meaningless numbers have been replaced by short
mnemonics so that code can become truly readable. Below is the same program
as above though written in x86 assembly:
mov si, 0x117
lodsb
test al, al
jz short 0x111
mov ah, 0x0E
mov bx, 0x02
int 0x10
jmp short 0x103
xor ah, ah
int 0x16
int 0x19
db 'Hello World!',0x00
To help show how this works we'll break down the first opcode (a group of
hexadecimal values that together form one instruction) of the above program;
'BE 17 01'. This is created by the assembler from the user-entered
instruction 'mov si, 0x117'. The assembler knows that 'mov si' is the same
as the hexadecimal 'BE', and that this should be followed by the value of
the word after 'mov si', which is '17 01' (words, dwords and qwords are
stored 'backwards' due to the way they are fed in and then read by the
processor).
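As a quick sanity check of that 'backwards' byte ordering, here is a Python sketch (my own illustration, not part of the original post) using the standard struct module:

```python
import struct

# '<H' means a little-endian unsigned 16 bit word.
encoded = struct.pack('<H', 0x117)
print(encoded.hex())  # 1701 - the '17 01' byte order seen in the opcode
```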
At this point I feel it appropriate to mention how to manually convert
numbers between binary, decimal and hexadecimal:
_Decimal -> Binary_
We know that binary is base-2, so the number columns in binary go up like
so:
(etc.) 2^3 2^2 2^1 2^0
The above expressed as numbers is; 8 4 2 1. Now say we wanted to convert the
number 7 into binary. We firstly take this line of numbers (the highest
number in this line must be higher than the number we are converting) and
write it down like so:
8 4 2 1
We now divide the number we're converting by the highest number in this
sequence; 8. This gives us 7 / 8 = 0 r 7 (we're not working with fractions).
We now divide our remainder (7) by the next number along, 4. This gives us 1
r 3. We now repeat this for the next number along; 2, giving us 1 r 1. Now
repeat for the final number, 1; 1 / 1 = 1 r 0. Finally allocate a 0 to all
numbers that did not divide, and a 1 to all numbers that did. This gives us;
0111b, which can be shortened to 111b.
We can check this is right by again using the number sequence; 8 4 2 1. We
know that 1 + 2 + 4 = 7 so 111b = 7d. Remember however that in computing
every number is stored as whole bytes, so the number of binary digits must
be divisible by 8 (as 1 binary digit is 1 bit). This means that our above
number, 111b, becomes 00000111b (although it can normally be given to the
assembler as just 111b).
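The repeated-division method just described can be sketched in a few lines of Python (my own illustration, not from the original post; the column list is the same 8 4 2 1 sequence used above):

```python
def dec_to_bin(n, columns=(8, 4, 2, 1)):
    # Divide by each power-of-two column in turn, writing down the quotient
    # (1 or 0) and carrying the remainder to the next column.
    bits = ''
    for col in columns:
        q, n = divmod(n, col)
        bits += str(q)
    return bits

print(dec_to_bin(7))  # 0111
```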
_Decimal -> Hexadecimal_
The process of converting decimal to hexadecimal is very similar to
converting decimal to binary, except we use a base-16 number sequence
instead of a base-2 one.
Using the number 123d as an example we firstly write down the number
sequence:
16^2 16^1 16^0 -> 256 16 1
We now divide our number (123) by 256 which equals 0 r 123. Then the
remainder (123) of this by 16 which equals 7 r 11. Then the remainder of
this (11) by 1 which equals 11 r 0. This gives us the end result of 0 7 11.
Now remember that in hexadecimal A = 10, B = 11, C = 12, D = 13, E = 14 and
F=15. This means that we should express our result (0 7 11) as 0 7 B. Now we
remove any leading 0s, leaving us with the number 7Bh. Remember that a pair
of hexadecimal digits is a byte, so hexadecimal should always be expressed
in sets of two (e.g. 5h -> 05h), though like with binary this is not
normally essential for the assembler to interpret the number correctly.
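The same repeated-division idea works for hexadecimal with the 256 16 1 column sequence. A Python sketch of it (again my own illustration):

```python
def dec_to_hex(n, columns=(256, 16, 1)):
    # Divide by each power-of-16 column, mapping quotients 10-15 to A-F.
    digits = ''
    for col in columns:
        q, n = divmod(n, col)
        digits += '0123456789ABCDEF'[q]
    return digits.lstrip('0') or '0'

print(dec_to_hex(123))  # 7B
```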
_Binary -> Decimal_
This is very simple to do. Say we have the binary number 01010101. We again
start with our base-2 number sequence writing the respective binary digit
beneath each number:
128 64 32 16 8 4 2 1
0 1 0 1 0 1 0 1
Now we add up all the numbers in the sequence that have a one below them; 64
+ 16 + 4 + 1 = 85.
_Hexadecimal -> Decimal_
This is very similar to the binary to decimal conversion only with base-16.
Using the number B3h as an example we write out our number sequence followed
by the numbers we are converting like above:
256 16 1
0 B 3
We next multiply the top number by the bottom number on all the numbers
above (remember that B = 11) and add up the results. This gives us; (16 x
11) + (1 x 3) = 179.
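Both of the to-decimal conversions above can be checked in Python, whose int() accepts a base argument (a checking aid of mine, not part of the tutorial):

```python
# Same numbers as worked by hand above.
print(int('01010101', 2))  # 85
print(int('B3', 16))       # 179
```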
_Binary -> Hexadecimal_
Using the number 10101b as an example we first split our binary number up
into groups of four and then write the number sequence 8, 4, 2, 1 above each
group like so:
8421 8421
0001 0101
We next multiply the top number by the bottom number and add up the results
for each group. This gives us; (8 x 0) + (4 x 0) + (2 x 0) + (1 x 1) = 1 and
(8 x 0) + (4 x 1) + (2 x 0) + (1 x 1) = 5. Putting these two together gives
us the hexadecimal value of 15h.
_Hexadecimal -> Binary_
Using the number B3 as an example we first write the number's constituent
parts out in decimal so that B3 becomes 11 3 (as B = 11). We now write each
of these numbers above its own copy of the number sequence 8, 4, 2, 1 as
shown below:
11 3
8 4 2 1 8 4 2 1
We now divide each of the top numbers by the highest number in the sequence
below it (8), so 11 / 8 = 1 r 3 and 3 / 8 = 0 r 3. We then write this result
(either 1 or 0) below the number we just divided by and then divide the
remainder of these sums by the next number along in the sequence, and so on
until we have been through the entire sequence. We should then have a result
like below:
11 3
8 4 2 1 8 4 2 1
1 0 1 1 0 0 1 1
Finally we combine these two results to form the number 10110011b.
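Both directions of the nibble-grouping trick above can be sketched in Python (my own illustration; each hex digit maps to exactly one four-bit group):

```python
def hex_to_bin(h):
    # Convert each hex digit to its own 4-bit group, as done by hand above.
    return ''.join(format(int(digit, 16), '04b') for digit in h)

def bin_to_hex(b):
    # Pad to a multiple of 4 bits, then translate each nibble to a hex digit.
    b = b.zfill((len(b) + 3) // 4 * 4)
    nibbles = [b[i:i + 4] for i in range(0, len(b), 4)]
    return ''.join(format(int(n, 2), 'X') for n in nibbles)

print(hex_to_bin('B3'))     # 10110011
print(bin_to_hex('10101'))  # 15
```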
By now you're probably agitated to start writing your first bit of x86
assembly (or maybe not), though there are just a few more things you need to
know before you can start.
One of these is about registers. Registers can be considered as the assembly
equivalent of variables, though there are different registers and types of
registers for different purposes. The fundamental ones I shall explain
below, then introduce some of the others later on in this document.
Firstly the general purpose registers (GPR's); ax, bx, cx and dx. These are
16 bit registers and are known as the accumulator, base address, count and
data registers respectively.
note: On a 32 bit machine (all modern CPUs, so the 80386 and above)
these registers are extended to the 32 bit registers; eax, ebx, ecx and
edx, and on a 64 bit machine (most new AMD CPUs at the time of this
writing) these 32 bit registers are extended further into; rax, rbx, rcx
and rdx. For ease I shall explain these registers as if they are 64 bit.
If you are only using a 32 bit system the below information is exactly
the same, just disregard the 64 bit register extensions.
All four of the GPRs do have slightly different purposes, though in most
cases can just be treated as variables within which you can store whatever
value you want. The differences between them are basically that the
accumulator register should be used wherever possible for calculations, as
many opcodes have a shorter variant for dealing specifically with the
accumulator. The base register doesn't really have a specific function
anymore, though derives its name from the 'xlat' opcode where it still has
some specific functionality; it can also be used as an index register
(discussed in a second) under certain circumstances. The count register is
called so because it is designed for keeping count of something, and this
use can be seen in the 'loop' opcode, which subtracts 1 from the count
register every time it is run. The data register is designed primarily to
combine with the accumulator to form a 32 bit (dx:ax), 64 bit (edx:eax) or
128 bit (rdx:rax) value in certain mathematical operations.
Below I have broken down the rax register into its constituent parts. The
other GPR's can be broken down in exactly the same way (just replace the 'a'
in the registers name with either 'b', 'c' or 'd'):
|------------------rax------------------|
                    |--------eax--------|
                              |---ax----|
                              | ah | al |
To help demonstrate what the above diagram is trying to represent, if for
example we set the rax register as the 64 bit integer
'1111111111111111111111111111111100000000000000001111111100000000b' then the
eax register would now have the value '00000000000000001111111100000000b',
the ax register would have the value '1111111100000000b', the ah register
would have the value '11111111b', and the al register would have the value
'00000000b'.
I haven't yet discussed what the ah and al registers are. These are just 8
bit registers that make up the 8 high-order and 8 low-order bits of the 16
bit ax register. So if the ax register is set as '0000000000000000b' and we
then add 3 to the ah register so that it becomes '00000011b' (since 11b is
the same as 3d), then the ax register's value would now be
'0000001100000000b'. Assuming the rax register had been set to
'100000000000000000000000000000000b' when this modification to the ah
register occurred, then the rax register's new value would be
'0000000000000000000000000000000100000000000000000000001100000000b'.
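The way the narrower registers alias the low-order bits of rax can be modelled with bit masks in Python (an illustration of mine, not how the hardware is implemented), using the same 64 bit value as the first example above:

```python
# High 32 bits all set, low 32 bits 0x0000FF00, as in the example above.
rax = 0b1111111111111111111111111111111100000000000000001111111100000000

eax = rax & 0xFFFFFFFF     # low-order 32 bits of rax
ax  = rax & 0xFFFF         # low-order 16 bits
ah  = (rax >> 8) & 0xFF    # bits 8-15 (high byte of ax)
al  = rax & 0xFF           # bits 0-7  (low byte of ax)

print(format(eax, '032b'))  # 00000000000000001111111100000000
print(format(ax, '016b'))   # 1111111100000000
print(format(ah, '08b'))    # 11111111
print(format(al, '08b'))    # 00000000
```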
On a 64 bit machine there are also eight other general purpose registers
(r8 through r15), although I do not know enough about them to comment.
There are also stack and index registers that both have a more specific task
than that of the GPR's.
The three index registers are the; si (source index), di (destination
index), and ip (instruction pointer) registers (along with their 32 bit and
64 bit forms, prefixed with an 'e' or 'r' respectively). The source index
register is intended for use in keeping track of where to read from in
stream operations. The destination index register is the opposite of si,
being a place to store where data is to be written to in stream operations.
Both these registers can also be used as GPRs. The instruction pointer
register is different to the other two index registers in that it cannot be
directly written to. It keeps track of the memory address at which the
current opcode being executed is, and also has the 32 bit and 64 bit
variants; eip and rip for protected/unreal mode and long mode (explained
later).
The stack registers; sp (stack pointer) and bp (base pointer) (or esp/rsp
and ebp/rbp) are used to point to the top of the stack and the base of the
stack (the stack is explained shortly). They are useful in many operations,
though can also be used as GPRs. However the base pointer register is the
only one you're likely to use much as a GPR.
Segment registers are another type of register. For now just know that there
are six of them; the ss (stack segment), cs (code segment), ds (data
segment), es (extra (data) segment), fs (another data segment) and gs
(another data segment). The stack segment register defines the segment
being used for the stack, the code segment register defines the segment in
which the code currently being executed is, and the four other segment
registers can each be used to point to different segments containing data. A
segment is an area in memory that can be between 1 byte and 4GB in size;
segments are used when working with a segmented memory model (as opposed to
the flat or real-address mode memory models). Different memory models will
be explained later.
The final register I shall be explaining at this point is the rflags
register. As you have probably already guessed the low-order 32 bits of the
rflags register is the eflags register, and the low-order 16 bits of this
register is called the flags register. The eflags register is the only part
of the rflags register that is ever used (the rest of rflags being Intel
reserved), and even much of this is Intel reserved and so I believe it
remains permanently fixed as one value (although this cannot be assumed). I
have drawn the eflags register below:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
[ 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ID | VIP| VIF| AC | VM | RF
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
| 0 | NT | I O P L | OF | DF | IF | TF | SF | ZF | 0 | AF | 0 | PF | 1 | CF ]
0. Carry Flag
2. Parity Flag
4. Auxiliary Carry Flag
6. Zero Flag
7. Sign Flag
8. Trap Flag
9. Interrupt Enabled Flag
10. Direction Flag
11. Overflow Flag
12-13. I/O Privilege Level
14. Nested Task
16. Resume Flag
17. Virtual-8086 Mode
18. Alignment Check
19. Virtual Interrupt Flag
20. Virtual Interrupt Pending
21. ID Flag
It is not important for you to know what all of these are at the moment
though they are all 1 bit flags (excluding IOPL), and so can hold either a 1
or a 0; yes or no. These help the processor determine what action to take
with conditional opcodes. For example the command 'jz' stands for 'jump if
zero', meaning that if the zero flag is set to 1 then code execution should
jump to the memory address given after the jz opcode. In other words if
zf = 1 then ip is changed to the number written after the jz command. If you
were to use the conditional jump 'jz' you would also need an opcode before
'jz' that determines whether or not the zero flag is set; a command such as
'test ax, ax', which means; if ax = 0 then zf = 1, else zf = 0.
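A rough Python model of that test/jz pair (my sketch of the logic, not real processor behaviour):

```python
def sets_zero_flag(ax):
    # 'test ax, ax' ANDs the register with itself (discarding the result)
    # and sets ZF to 1 exactly when that result is zero.
    return (ax & ax) == 0

# 'jz' then jumps only when ZF = 1, i.e. only when ax was 0.
print(sets_zero_flag(0))       # True
print(sets_zero_flag(0x1234))  # False
```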
The stack is a defined area of memory with no defined end that can be used
to store data, normally by using the push and pop opcodes that shall be
explained later. Data can be 'pushed' (copied) onto the top of the stack
from a specified register, and 'popped' (copied) off of the top of the stack
into a specified register. When data is pushed onto the stack the stack is
said to grow downwards (with the top of the stack visually being at the
bottom). When this happens the stack pointer register is decreased to point
to the new top of the stack. Similarly when data is popped off of the stack
into a specified register the stack pointer is then increased to point to
the value before last that was pushed onto the stack. For example if we
wanted to switch the contents of the ax and bx registers through use of the
stack we could do:
push ax
push bx
pop ax
pop bx
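The push/pop swap above behaves like this Python list-as-stack sketch (illustrative only; a real stack lives in memory and grows downwards):

```python
stack = []
ax, bx = 1, 2

stack.append(ax)  # push ax
stack.append(bx)  # push bx
ax = stack.pop()  # pop ax - takes the last value pushed (the old bx)
bx = stack.pop()  # pop bx - takes the old ax

print(ax, bx)  # 2 1
```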
Because of the way the stack works it is said to use a LIFO/FILO system;
Last In First Out, or First In Last Out. It is possible to have multiple
stacks and each stack may have (theoretically) a maximum size of either 64KB
(16 bit) or 4GB (32 bit) although we won't go into that now.
note: to anyone wondering what the maximum size is under 64 bit architecture
no clear cut answer can be given. At first glance you may think 2^64 bytes,
or 16 exabytes, which would make sense. However the architectural limit for
physical memory on the AMD64 is only 2^52 bytes, or 4 petabytes. You could
however still have a virtual address space of 16EB (not very useful when the
maximum memory you can have is 4PB), except there is a catch; current chips
only support at most around 256TB (2^48 bytes) of virtual address space,
and the most memory you can fit on a single node is about 16GB (2^34 bytes),
making my previous three sentences completely pointless (thanks ricebowl for
that information). =)
Okay, now all the basic theory is out of the way we can finally get on to
learning how to use the assembler and start writing our first few lines of
assembly.
Just to outline the difference between an assembler and a compiler at this
point for anyone still unclear, a compiler takes instructions and converts
them into several machine instructions, whereas a 'true' assembler takes
assembly mnemonics and does a direct 1:1 conversion of them into machine
code.
There are several assemblers to choose from, the main ones being;
a86
I don't know anything about it other than it's not free and I've never
(knowingly) met anyone who uses it.
Gas (GNU Assembler) - www.gnu.org/
I have never used this although it supposedly has poor error checking and
also uses the AT&T syntax. For those people who prefer AT&T syntax however,
it is very popular.
FASM (Flat Assembler)
I don't know much about this assembler although it seems fairly similar to
nasm.
MASM (Microsoft Macro Assembler) - www.masm32.com/
This assembler is very popular though I really dislike it. As the name
suggests it is only for windows and is designed around macros. It cannot
assemble a flat binary file and is lacking in support for a number of
opcodes forcing you to enter the raw hex code instead of mnemonics in some
instances. It does have a good pre-processor though, as well as there being
loads of includes and tutorials on the internet specifically aimed at this
assembler.
NASM (Netwide Assembler) - sourceforge.net/projects/nasm
My favourite assembler. Aside from being one of the most popular assemblers
out there and having great documentation it also likes to keep things as
'pure' assembly, although also contains a reasonable pre-processor.
TASM (Borland Turbo Assembler) - info.borland.com/borlandcpp/cppcomp/tasmfact.html
Like masm except it's not free. It is apparently slightly better although I
can't say I'm experienced enough with it to comment on that.
YASM
A partial nasm rewrite. I'm not sure what the advantage is of using this
over the better supported and more well known nasm but you may want to try
it.
There are also several other assemblers such as GoASM, RosASM and SPASM, and
by all means try them. However if you wish to use includes in your future
projects you may have problems finding ones that work with the more niche
assemblers.
In case you are wondering what AT&T syntax is after reading the above let me
explain. Every assembler uses a 'slightly' different syntax, although their
syntax is always closely related to either AT&T or Intel syntax and can be
referred to as using one of these two syntaxes. To show the difference
between them I have included the same code in both AT&T and Intel syntax
below:
_Syntax_
_Intel_ _AT&T_
mul ebx mull %ebx,%eax
lodsb lodsb %ds:(%esi),%al
inc [eax-4] incl -0x4(%eax)
mov ebp, esp movl %esp,%ebp
I have heard users of AT&T syntax argue that it is much more legible than
Intel syntax. Personally I not only feel that AT&T syntax is harder to
read, but that reading it can be compared to having your eyeballs spooned
out by a Vietnamese leper, although it is all personal preference I
suppose. It is also argued that it is more logical, which I can see to some
extent. For a better comparison let's look at this line of code in
particular:
add eax, 1 addl $1,%eax
The main difference between these two lines of code is the ordering of the
operands, although both instructions read 'add 1 to eax' (I will try and
ignore the fact that while the AT&T ordering may seem more logical, using
expressions like -0x4(%eax) for [eax-4] confuses the hell out of me,
although it is supposedly done that way as it suggests indexing). Anyway,
this line also demonstrates my main qualm with using AT&T syntax, which is
the ordering. If we look at the hex equivalent of the above mnemonic we will
see that it is '83 C0 01', with '83' being the add-with-immediate opcode,
'C0' selecting the eax register, and '01' obviously being the number 1. You
will note from this that on a 1:1 basis between hexadecimal and assembly it
is Intel syntax that matches the order in which the bytes are actually
stored. Speaking of byte order, the x86 architecture is little endian, as
explained below.
Endianness (a.k.a. byte order) denotes the order in which bytes are stored
and can be categorised as either big-endian, little-endian or middle-endian
(a.k.a. mixed-endian). If we take the number 0x12345678 as an example; using
a big-endian system this would be stored in memory as 12 34 56 78 with the
most significant byte first, using a little-endian system it would be stored
as 78 56 34 12 with the least significant byte first, and using a middle
endian system it would be stored as somewhere in-between the two; either
34 12 78 56 or 56 78 12 34 depending on the specific machine architecture.
note: The bit ordering of the numbers does not change with different
endianness, only the byte ordering. So the number 1111110000000011b under a
big-endian system architecture will be stored as 1111110000000011b, and under
a little-endian system architecture will be stored as 0000001111111100b.
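The two byte orders for 0x12345678 can be demonstrated with Python's struct module (a checking aid of mine, not part of the original post):

```python
import struct

n = 0x12345678
print(struct.pack('>I', n).hex())  # big-endian:    12345678
print(struct.pack('<I', n).hex())  # little-endian: 78563412
```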
A related convention (though one of assembler syntax rather than of byte
order) is operand ordering. In Intel syntax the destination operand always
precedes the source operand. That is to say that if for example we had the
opcode 'mov eax, ebx' (move the value of ebx into eax so that eax now
equals ebx) our target (the destination operand, eax) comes before the
value that will remain unchanged (the source operand). AT&T syntax reverses
this, placing the source operand first, regardless of the endianness of the
machine.
The term endianness originates from the book 'Gulliver's Travels', where two
factions are at war over which end of an egg you should crack open; the big
end or the little end.
This concludes part 1 of this tutorial series. Please give me any feedback on what you think so far.
C/O :: Acid Burn of Oblivion Realm
------------------------------------------------
Okay, I've decided to write this
introduction to x86 assembly as I know there are a number of people out
there who would like to know it though have either had trouble understanding
some of the tutorials they've come across, or just haven't got round to
learning.
First a few definitions:
note: Whenever I use the letter 'b', 'h' or 'd' after a number it signifies
that it is either a binary, hexadecimal or decimal number respectively.
Hexadecimal can also be shown by appending '0x' before the number instead of
the 'h' afterwards.
bit - the name given to the smallest size of data in a computer. A bit can
hold one of two values; 1 or 0.
nybble (also spelt nibble or called a semioctet) - 4 bits, it has a maximum
value of 1111b (Fh/15d)
byte (also called an octet) - 8 bits or 2 nybbles, it has a maximum value of
11111111b (FFh/255d)
word - a grouping of 2 bytes (4 nybbles or 16 bits)
double word (dword) - as the name suggests a dword is 2 words, 4 bytes, 8
nybbles or 32 bits in size
far word (fword, also called a triPle word; pword) - 3 word's, 6 bytes or 48
bits
quadruple word (qword) - 2 dword's, 4 word's, 8 bytes or 64 bits
ten word (tword, also more accurately called a ten byte; tbyte) - 5 word's,
10 bytes or 80 bits
double quadruple word (dqword, also called an octuple word; oword) – 2
quadruple word's, 4 dword's, 8 word's, 16 bytes or 128 bits
kilobyte (also kilooctet) - 1024 bytes (not 1000 as all data sizes must have
a root of 2 when talking about computers. This stems from the way all
computers are based fundamentally around the binary numbering system)
megabyte (also megaoctet) - 1024 kilobytes or 1,048,576 bytes
gigabyte (also gigaoctet) - 1024 megabytes, 1,048,576 kilobytes or
1,073,741,824 bytes
There are also larger data sizes than a gigabyte. These are (in order);
terabyte, petabyte, exabyte, zettabyte and yottabyte. Each one of these is
1024 times larger than its predecessor. In each of them the word byte can
also be replaced with octet in order to be more descript when referring to
the x86 computer architecture.
note: The size of a word, and so also dword and qword varies between
different computer architectures. Throughout this tutorial when referring
to word's, dword's and qword's I shall be using them (like above) as they
are defined for the x86 computer architecture (effectively most Intel and
AMD CPU's). Also, byte can refer to a piece of data of size greater or
smaller than 8 bits long on other computer architectures. Octet however can
only be used to describe 8 bits.
Now that that's over I'll start with the basics. Every piece of software in
its simplest form is made of binary; 1's and 0’s, and just using binary was
how the first computers were programmed. Of course as I'm sure you can see
it soon becomes complicated to tell what's going on when you have a page
full of just 2 different characters, and such a program is near impossible
to write, let alone debug. As an example the below binary prints 'Hello
World!' when run in dos.
101111100001011100000001101011001000010011000000011101000000100110110100000
111010111011000000100000000011001101000100001110101111110010001100001110010
110011010001011011001101000110010100100001100101011011000110110001101111001
000001010111011011110111001001101100011001000010000100000000
Obviously it would be impossible for anyone to know what's going on in the
above program without carefully studying every part of it bit by bit (pardon
the pun). There was a simpler method later derived however, which is the
reason that all instructions are defined in bytes (16 bits), and must be a
multiple of this. This simpler method is to use the base-16 numbering system
(hexadecimal; also commonly referred to as hex, although hex in reality
means base-6 – confusion between the 2 is unlikely however, as base-6 is
very rarely used) to define each byte individually. Employing this system
converts our above code into this:
BE 17 01 AC 84 C0 74 09 B4 0E BB 02 00 CD 10 EB F2 30 E4 CD 16 CD 19 48 65
6C 6C 6F 20 57 6F 72 6C 64 21 00
You can see that this second example is much more legible, and with proper
knowledge anyone could work out and debug the below code in a fraction of
the time it would take to do the same thing with the first example. It still
takes far too much time to program in however, and is still extremely
difficult, which is how the next evolution of programming came about;
assembly. Assembly is essential the same as writing out the code in
hexadecimal only the meaningless numbers have been replaced by short
mnemonics so that code can become truly readable. Below is the same program
as above though written in x86 assembly:
mov si, 0x117
lodsb
test al, al
jz short 0x111
mov ah, 0x0E
mov bx, 0x02
int 0x10
jmp short 0x103
xor ah, ah
int 0x16
int 0x19
db 'Hello World!',0x00
To help show how this works we'll break down the first opcode (a group of
hexadecimal values that together form one instruction) of the above program;
'BE 17 01'. This is created by the assembler from the user inputted
instruction 'mov si, 0x117'. The assembler knows that 'mov si' is the same
as the hexadecimal 'BE', and that this should be followed by the value of
the word after 'mov si', which is '17 01' (words, dwords and qwords are
stored 'backwards' due to the way they are fed in and then read by the
processor).
At this point I feel it appropriate to mention how to manually convert
numbers between binary, decimal and hexadecimal:
_Decimal -> Binary_
We know that binary is base-2, so the number columns in binary go up like
so:
(etc.) 2^3 2^2 2^1 2^0
The above expressed as numbers is; 8 4 2 1. Now say we wanted to convert the
number 7 into binary. We firstly take this line of numbers (the highest
number in this line must be higher than the number we are converting) and
write it down like so:
8 4 2 1
We now divide the number we’re converting by the highest number in this
sequence; 8. This gives us 7 / 8 = 0 r 7 (we’re not working with fractions).
We now divide our remainder (7) by the next number along, 4. This gives us 1
r 3. We no repeat this for the next number along; 2, giving us 1 r 1. Now
repeat for the final number, 1; 1 / 1 = 1 r 0. Finally allocate a 0 to all
numbers that did not divide, and a 1 to all numbers that did. This gives us;
0111b, which can be shortened to 111b.
We can check this is right by again using the number sequence; 8 4 2 1. We
know that 1 + 2 + 4 = 7 so 111b = 7d. Remember however that in computing
every number is expressed as bytes, so the number of binary characters must
be divisible by 8; 1 byte (as 1 binary character is 1 bit). This means that
our above number, 111b goes to 00000111b (although this can normally be
expressed as 111b to the assembler).
_Decimal -> Hexadecimal_
The process of converting decimal to hexadecimal is very similar to
converting decimal to binary, except we use a base-16 number sequence
instead of a base-2 one.
Using the number 123d as an example we firstly write down the number
sequence:
16^2 16^1 16^0 -> 256 16 1
We now divide our number (123) by 256 which equals 0 r 123. Then the
remainder (123) of this by 16 which equals 7 r 11. Then the remainder of
this (11) by 1 which equals 11 r 0. This gives us the end result of 0 7 11.
Now remember that in hexadecimal A = 10, B = 11, C = 12, D = 13, E = 14 and
F=15. This means that we should express our result (0 7 11) as 0 7 B. Now we
remove any leading 0’s, leaving us with the number 7Bh. Remember that a pair
of hexadecimal numbers is a byte so hexadecimal should always be expressed
in set of two’s (e.g. 5h -> 05h), though like with binary this is not
normally essential for the assembler to interpret the number correctly.
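The same idea works in base 16. Here is a quick Python sketch (again,
dec_to_hex is just a name I've made up for illustration):

```python
def dec_to_hex(n):
    """Convert a decimal integer to a hexadecimal string using the
    base-16 repeated-division method described above."""
    digits = "0123456789ABCDEF"
    if n == 0:
        return "0"
    place = 1
    while place * 16 <= n:   # find the highest power of 16 not above n
        place *= 16
    out = ""
    while place >= 1:
        out += digits[n // place]   # quotient becomes the hex digit
        n %= place                  # carry the remainder forward
        place //= 16
    return out

print(dec_to_hex(123))  # 7B
```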
_Binary -> Decimal_
This is very simple to do. Say we have the binary number 01010101. We again
start with our base-2 number sequence writing the respective binary digit
beneath each number:
128 64 32 16 8 4 2 1
0 1 0 1 0 1 0 1
Now we add up all the numbers in the sequence that have a one below them; 64
+ 16 + 4 + 1 = 85.
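The column-sum method above translates directly into Python (bin_to_dec is
my own name for the sketch):

```python
def bin_to_dec(bits):
    """Sum the column value (1, 2, 4, 8, ...) of every 1-bit,
    exactly as in the worked example above."""
    total = 0
    for place, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** place
    return total

print(bin_to_dec("01010101"))  # 85
```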
_Hexadecimal -> Decimal_
This is very similar to the binary to decimal conversion only with base-16.
Using the number B3h as an example we write out our number sequence followed
by the numbers we are converting like above:
256 16 1
0 B 3
We next multiply the top number by the bottom number on all the numbers
above (remember that B = 11) and add up the results. This gives us; (16 x
11) + (1 x 3) = 179.
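And the base-16 version of the same column multiplication, sketched in
Python (hex_to_dec is an illustrative name only):

```python
def hex_to_dec(h):
    """Multiply each hex digit by its column value (1, 16, 256, ...)
    and sum the results, as in the B3h example above."""
    digits = "0123456789ABCDEF"
    total = 0
    for place, ch in enumerate(reversed(h.upper())):
        total += digits.index(ch) * 16 ** place
    return total

print(hex_to_dec("B3"))  # 179
```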
_Binary -> Hexadecimal_
Using the number 10101b as an example we first split our binary number up
into groups of four and the write the number sequence 8, 4, 2, 1 above each
group like so:
8421 8421
0001 0101
We next multiply the top number by the bottom number and add up the results
for each group. This gives us; (8 x 0) + (4 x 0) + (2 x 0) + (1 x 1) = 1 and
(8 x 0) + (4 x 1) + (2 x 0) + (1 x 1) = 5. Putting these two together gives
us the hexadecimal value of 15h.
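The group-of-four trick can be checked with a short Python sketch
(bin_to_hex is my own name; the 8-4-2-1 multiplication is spelled out
rather than using any library shortcut):

```python
def bin_to_hex(bits):
    """Split the bits into nybbles and convert each via 8-4-2-1."""
    digits = "0123456789ABCDEF"
    # pad on the left so the bits split evenly into groups of four
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    out = ""
    for i in range(0, len(bits), 4):
        nyb = bits[i:i + 4]
        value = (8 * int(nyb[0]) + 4 * int(nyb[1]) +
                 2 * int(nyb[2]) + 1 * int(nyb[3]))
        out += digits[value]
    return out

print(bin_to_hex("10101"))  # 15
```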
_Hexadecimal -> Binary_
Using the number B3 as an example we first write the number's constituent
parts in decimal, so that B3 becomes 11 3 (as B = 11). We now write each of
these numbers above its own copy of the number sequence 8, 4, 2, 1 as shown
below:
11 3
8 4 2 1 8 4 2 1
We now divide each of the top numbers by the highest number in the sequence
below it (8), so 11 / 8 = 1 r 3 and 3 / 8 = 0 r 3. We then write this result
(either 1 or 0) below the number we just divided by and then divide the
remainder of these sums by the next number along in the sequence, and so on
until we have been through the entire sequence. We should then have a result
like below:
11 3
8 4 2 1 8 4 2 1
1 0 1 1 0 0 1 1
Finally we combine these two results to form the number 10110011b.
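For completeness, the reverse direction as a Python sketch (hex_to_bin is my
own name; it runs the repeated-division step over each hex digit exactly as
described above):

```python
def hex_to_bin(h):
    """Expand each hex digit into its 8-4-2-1 nybble."""
    digits = "0123456789ABCDEF"
    out = ""
    for ch in h.upper():
        value = digits.index(ch)        # e.g. B -> 11
        for place in (8, 4, 2, 1):      # divide down the 8-4-2-1 sequence
            out += "1" if value // place else "0"
            value %= place
    return out

print(hex_to_bin("B3"))  # 10110011
```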
By now you're probably agitated to start writing your first bit of x86
assembly (or maybe not), though there are just a few more things you need to
know before you can start.
One of these is about registers. Registers can be considered as the assembly
equivalent of variables, though there are different registers and types of
registers for different purposes. The fundamental ones I shall explain
below, then introduce some of the others later on in this document.
Firstly the general purpose registers (GPR's); ax, bx, cx and dx. These are
16 bit registers and are known as the accumulator, base address, count and
data registers respectively.
note: that on a 32 bit machine (all modern CPU's, so the 80386 and above)
these registers are extended to the 32 bit registers; eax, ebx, ecx and
edx, and on a 64 bit machine (most new AMD CPU's at the time of this
writing) these 32 bit registers are extended further into; rax, rbx, rcx
and rdx. Also, for ease I shall explain these registers as if they are 64
bit. If you are only using a 32 bit system the below information is exactly
the same, just disregard the 64 bit register extensions.
All four of the GPR's do have slightly different purposes, though in most
cases can just be treated as variables within which you can store whatever
value you want. The differences between them are basically that the
accumulator register should be used wherever possible for calculations as
many opcodes have a 1 byte variant for dealing specifically with the ax
register. The base register doesn't really have a specific function anymore,
though derives its name from the 'xlat' opcode where it still has some
specific functionality, it can also be used as an index register (discussed
in a second) under certain circumstances. The count register is called so
because it is designed for keeping count of something, and this use can be
seen in the 'loop' opcode which subtracts 1 from the count register every
time it is run. The data register is designed primarily to act as a
high-order extension to the accumulator register, giving a combined 32 bit
(dx:ax), 64 bit (edx:eax) or 128 bit (rdx:rax) value in certain mathematical
operations.
Below I have broken down the rax register into its constituent parts. The
other GPR's can be broken down in exactly the same way (just replace the 'a'
in the register's name with either 'b', 'c' or 'd'):
|------------------rax------------------|
                    |--------eax--------|
                              |----ax---|
                              | ah | al |
To help demonstrate what the above diagram is trying to represent, if for
example we set the rax register as the 64 bit integer
'1111111111111111111111111111111100000000000000001111111100000000b' then the
eax register would now have the value '00000000000000001111111100000000b',
the ax register would have the value '1111111100000000b', the ah register
would have the value '11111111b', and the al register would have the value
'00000000b'.
I haven't yet discussed what the ah and al registers are. These are just 8
bit registers that make up the 8 high-order and 8 low-order bits of the 16
bit ax register. So if the ax register is set as '0000000000000000b' and we
then add 3 to the ah register so that it becomes '00000011b' (since 11b is
the same as 3d), then the ax register's value would now be
'0000001100000000b'. Assuming the rax register had been set to
'100000000000000000000000000000000b' when this modification to the ah
register occurred, then the rax registers new value would be
'0000000000000000000000000000000100000000000000000000001100000000b'.
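The overlap between rax, eax, ax, ah and al is just a matter of which bits
you look at, which can be modelled with masks and shifts. The Python below
only illustrates the arithmetic; these are plain integers being sliced, not
real registers:

```python
# The 64 bit value used in the example above:
# 32 ones, 16 zeros, 8 ones, 8 zeros.
rax = int("1" * 32 + "0" * 16 + "1" * 8 + "0" * 8, 2)

eax = rax & 0xFFFFFFFF      # low 32 bits
ax  = rax & 0xFFFF          # low 16 bits
ah  = (rax >> 8) & 0xFF     # bits 8-15 (high byte of ax)
al  = rax & 0xFF            # bits 0-7  (low byte of ax)

print(format(eax, "032b"))  # 00000000000000001111111100000000
print(format(ax, "016b"))   # 1111111100000000
print(format(ah, "08b"))    # 11111111
print(format(al, "08b"))    # 00000000
```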
On a 64 bit machine there are also eight other general purpose registers
although I do not know enough about them to comment.
There are also stack and index registers that both have a more specific task
than that of the GPR's.
The three index registers are the; si (source index), di (destination
index), and ip (instruction pointer) registers (along with their 32 bit and
64 bit forms, prefixed with an 'e' or 'r' respectively). The source index
register is intended for use in keeping track of where to read from in
stream operations. The destination index register is the opposite of si,
being a place to store where data is to be written to in stream operations.
Both these registers can also be used as GPR's. The instruction pointer
register is different to the other two index registers in that it cannot be
directly written to. It keeps track of the memory address of the opcode
currently being executed, and also has the 32 bit and 64 bit variants; eip
and rip for protected/unreal mode and long mode (explained later).
The stack registers; sp (stack pointer) and bp (base pointer) (or esp/rsp
and ebp/rbp) are used to point to the top of the stack, and the base of the
stack (the stack is explained shortly). They are useful in many operations,
though can also be used as GPR's. However the base pointer register is the
only one you're likely to use much as a GPR.
Segment registers are another type of register. For now just know that there
are six of them; the ss (stack segment), cs (code segment), ds (data
segment), es (extra (data) segment), fs (another data segment) and gs
(another data segment). The stack segment register defines the segment
being used for the stack, the code segment register defines the segment in
which the code currently being executed is, and the four other segment
registers can each be used to point to different segments containing data. A
segment is an area in memory that can be between 1 byte and 4GB in size;
segments are used when using a segmented memory model (as opposed to the
flat or real-address mode memory models). Different memory models will be
explained later.
The final register I shall be explaining at this point is the rflags
register. As you have probably already guessed, the low-order 32 bits of the
rflags register form the eflags register, and the low-order 16 bits of this
register form the flags register. The eflags register is the only part
of the rflags register that is ever used (the rest of rflags being Intel
reserved), and even much of this is Intel reserved and so I believe it
remains permanently fixed as one value (although this cannot be assumed). I
have drawn the eflags register below:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
[ 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ID | VIP| VIF| AC | VM | RF
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
| 0 | NT | I O P L | OF | DF | IF | TF | SF | ZF | 0 | AF | 0 | PF | 1 | CF ]
0. Carry Flag
2. Parity Flag
4. Auxiliary Carry Flag
6. Zero Flag
7. Sign Flag
8. Trap Flag
9. Interrupt Enabled Flag
10. Direction Flag
11. Overflow Flag
12-13. I/O Privilege Level
14. Nested Task
16. Resume Flag
17. Virtual-8086 Mode
18. Alignment Check
19. Virtual Interrupt Flag
20. Virtual Interrupt Pending
21. ID Flag
It is not important for you to know what all of these are at the moment
though they are all 1 bit flags (excluding IOPL), and so can hold either a 1
or a 0; yes or no. These help the processor determine what action to take
with conditional opcodes. For example the command 'jz' stands for 'jump if
zero' and what it means is that if the zero flag is set to 1 then code
execution should jump to the address given as the operand of the jz opcode.
In other words if zf = 1 then ip is changed to the address written after the
jz command. If you were to use the conditional jump 'jz' you would also need
an opcode before 'jz' that will determine whether or not the zero flag is
set, a command such as 'test ax, ax' which means; if ax = 0 then zf = 1,
else zf = 0.
The stack is a defined area of memory with no defined end that can be used
to store data, normally by using the push and pop opcodes that shall be
explained later. Data can be 'pushed' (copied) onto the top of the stack
from a specified register, and 'popped' (copied) off of the top of the stack
into a specified register. When data is pushed onto the stack the stack is
said to grow downwards (with the top of the stack visually being at the
bottom). When this happens the stack pointer register is decreased to point
to the new top of the stack. Similarly when data is popped off of the stack
into a specified register the stack pointer is then increased to point to
the value before last that was pushed onto the stack. For example if we
wanted to switch the contents of the ax and bx registers through use of the
stack we could do:
push ax
push bx
pop ax
pop bx
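The same swap can be modelled with a Python list standing in for the stack
(append = push, pop = pop); this is only an illustration of the ordering,
not how the real stack is implemented:

```python
stack = []
ax, bx = 1, 2

stack.append(ax)   # push ax
stack.append(bx)   # push bx
ax = stack.pop()   # pop ax  (receives bx's old value)
bx = stack.pop()   # pop bx  (receives ax's old value)

print(ax, bx)      # 2 1
```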
Because of the way the stack works it is said to use a LIFO/FILO system;
Last In First Out, or First In Last Out. It is possible to have multiple
stacks and each stack may have (theoretically) a maximum size of either 64KB
(16 bit) or 4GB (32 bit) although we won't go into that now.
note: to anyone wondering what the maximum size is under 64 bit architecture
no clear cut answer can be given. At first glance you may think 2^64 bytes,
or 16 exabytes which would make sense. However the architectural limit for
physical memory on the AMD64 is only 2^53 bytes, or 8 petabytes. You could
however still have a virtual address space of 16EB (not very useful when the
maximum memory you can have is 8PB), except there is a catch; current chips
only support at most around 256TB's (2^48 bytes) of virtual address space,
and the most memory you can fit on a single node is about 16GB (2^34 bytes),
making my previous three sentences completely pointless (thanks ricebowl for
that information). =)
Okay, now all the basic theory is out of the way we can finally get on to
learning how to use the assembler and start writing our first few lines of
assembly.
Just to outline the difference between an assembler and a compiler at this
point for anyone still unclear; a compiler takes high-level language
statements and converts each one into several machine instructions, whereas
a 'true' assembler takes assembly mnemonics and does a direct 1:1 conversion
of them into machine code.
There are several assemblers to choose from, the main ones being;
a86
I don't know anything about it other than it's not free and I've never
(knowingly) met anyone who uses it.
Gas (GNU Assembler) - www.gnu.org/
I have never used this although it supposedly has poor error checking and
also uses the AT&T syntax. For those people who prefer AT&T syntax however,
it is very popular.
FASM (Flat Assembler)
I don't know much about this assembler although it seems fairly similar to
nasm.
MASM (Microsoft Macro Assembler) - www.masm32.com/
This assembler is very popular though I really dislike it. As the name
suggests it is only for Windows and is designed around macros. It cannot
assemble a flat binary file and is lacking in support for a number of
opcodes forcing you to enter the raw hex code instead of mnemonics in some
instances. It does have a good pre-processor though, as well as there being
loads of includes and tutorials on the internet specifically aimed at this
assembler.
NASM (Netwide Assembler) - sourceforge.net/projects/nasm
My favourite assembler. Aside from being one of the most popular assemblers
out there and having great documentation it also likes to keep things as
'pure' assembly, although also contains a reasonable pre-processor.
TASM (Borland Turbo Assembler) - info.borland.com/borlandcpp/cppcomp/tasmfact.html
Like masm except it's not free. It is apparently slightly better although I
can't say I'm experienced enough with it to comment on that.
YASM
A partial nasm rewrite. I'm not sure what the advantage is of using this
over the better supported and more well known nasm but you may want to try
it.
There are also several other assemblers such as GoASM, RosASM and SPASM and
by all means try them. However if you wish to use includes in your future
projects you may have problems finding ones that work with the more niche
assemblers.
In case you are wondering what AT&T syntax is after reading the above let me
explain. Every assembler uses a 'slightly' different syntax, although that
syntax is always closely related to either AT&T or Intel syntax and can be
referred to as using one of these two. To show the difference
between them I have included the same code in both AT&T and Intel syntax
below:
_Syntax_
_Intel_ _AT&T_
mul ebx mull %ebx,%eax
lodsb lodsb %ds:(%esi),%al
inc [eax-4] incl -0x4(%eax)
mov ebp, esp movl %esp,%ebp
I have heard users of AT&T syntax argue that it is much more legible than
Intel syntax. Personally I not only feel that AT&T syntax is harder to
read, but that reading it can be compared to having your eyeballs spooned
out by a Vietnamese leper, although it is all personal preference I
suppose. It is also argued that it is more logical, which I can see to some
extent. For a better comparison let's look at this line of code in
particular:
add eax, 1 addl $1,%eax
The main difference between these two lines of code is the ordering of the
operands, although both instructions read 'add 1 to eax' (I will try and
ignore the fact that while the ordering of AT&T for this may seem more
logical, using expressions like -0x4(%eax) for [eax-4] confuses the hell
out of me, although it is supposedly done that way as it suggests that it
is indexing). Anyway, this line also demonstrates my main qualm with using
AT&T syntax, which is the ordering. If we look at the hex equivalent of the
above mnemonic we will see that it is '83 C0 01', with '83' being
'add/addl', 'C0' denoting the eax register, and '01' obviously being the
number 1. You will note from this that on a 1:1 basis between hexadecimal
and assembly it is Intel syntax that is in the right order. The x86
architecture is, incidentally, little endian, as explained below.
Endianness (a.k.a. byte order) denotes the order in which bytes are stored
and can be categorised as either big-endian, little-endian or middle-endian
(a.k.a. mixed-endian). If we take the number 0x12345678 as an example; using
a big-endian system this would be stored in memory as 12 34 56 78 with the
most significant byte first, using a little-endian system it would be stored
as 78 56 34 12 with the least significant byte first, and using a middle
endian system it would be stored as somewhere in-between the two; either
34 12 78 56 or 56 78 12 34 depending on the specific machine architecture.
note: The bit ordering of the numbers does not change with different
endianness, only the byte ordering. So the number 1111110000000011b under a
big-endian system architecture will be stored as 1111110000000011b, and under
a little-endian system architecture will be stored as 0000001111111100b.
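Python's struct module can demonstrate the two main byte orders using the
0x12345678 example from above:

```python
import struct

value = 0x12345678
big    = struct.pack(">I", value)   # ">" = big-endian, "I" = 32 bit uint
little = struct.pack("<I", value)   # "<" = little-endian

print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12
```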
On the x86 machine architecture the destination operand always precedes the
source operand in an instruction. That is to say that if for example we had
the opcode 'mov eax, ebx' (move the value of ebx into eax so that eax now
equals ebx) our target (the destination operand, eax) comes before the value
that will remain unchanged (the source operand). Note that this operand
ordering is a convention of Intel assembly syntax rather than a consequence
of endianness; AT&T syntax simply chose the opposite, source-first,
ordering.
The term endianness originates from the book 'Gulliver's Travels', where two
factions are at war over which end of an egg you should crack open; the big
end or the little end.
This concludes part 1 of this tutorial series. Please give me any feedback on what you think so far.