Encoding Intel x86/IA-32 Assembler Instructions

On the post Debugging hello, world, someone asked why the instruction jmp 114 translates into hexadecimal EB12. To answer this, we are going to turn to the “lovely” and venerable Intel Architecture Software Developer’s Manual (IASDM), Volume 2. This volume describes the instruction set of the Intel Architecture processors (x86/IA-32) and the opcode structure. I’ll review some terms involved here:

x86, IA-32, opcode, machine language, mnemonics (such as JMP in JMP 114).

Unlike in high-level languages, there is usually a one-to-one correspondence between basic assembly statements and machine language instructions. Nevertheless, in some cases, an assembler may provide pseudo-instructions that expand into several machine language instructions to provide commonly needed functionality, or into no instruction at all, such as DB in

db 0d,0a,"hello, world!",0d,0a,"$"

which directly translates into the sequence of characters (in hexadecimal):

0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 0D 0A 24
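As a quick check of that byte sequence, here is a minimal Python sketch. The helper name db is hypothetical (it only imitates what the DB pseudo-instruction does with its operands) and is not part of DEBUG or of any library.

```python
# Hypothetical helper: imitate DB by concatenating byte values and ASCII strings.
def db(*items):
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            out += item.encode("ascii")   # one byte per character
        else:
            out.append(item)              # a single byte value such as 0x0D
    return bytes(out)

data = db(0x0D, 0x0A, "hello, world!", 0x0D, 0x0A, "$")
print(" ".join(f"{b:02X}" for b in data))  # the bytes in hexadecimal
print(len(data))                           # number of bytes emitted
```

The length printed is 18 bytes, which is the amount of data the initial jmp has to skip.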

Therefore, the pseudo-instruction DB acts only as a data marker for the assembler. Now, for clarity, I’ll repeat the code of Debugging “hello, world” here:

-a 100
CS:0100 jmp 114         ; Jump over the 18 bytes of the string
CS:0102 db 0d,0a,"hello, world!",0d,0a,"$"
CS:0114 mov ah,9       ; Print function
CS:0116 mov dx,102
CS:0119 int 21
CS:011B mov ah, 0      ; Terminate the program
CS:011D int 21
CS:011F
-g =100

Translation of the second line is direct, as shown above. What about jmp 114? Well, we want to jump over the data (18 bytes, one byte per character in the string). The IASDM tells us (Appendix B) that the opcode for a short unconditional jump within the same segment is 11101011, which in hexadecimal is EB. We need to provide the operand to complete the instruction. The operand is a displacement counted from the address of the next instruction (102 here), so to land on 114 it must be 114 - 102 = 12 in hexadecimal, that is, 18 in decimal, exactly the length of the string data. That’s why jmp 114 translates into EB12. Note that the operand for this jmp is an 8-bit signed displacement, i.e., the operand is not an explicit address.
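The displacement arithmetic can be sketched in Python as follows. The function name encode_jmp_short is an assumption made for illustration; the sketch covers only the short (8-bit displacement) form discussed above.

```python
def encode_jmp_short(instr_addr, target_addr):
    """Encode a short JMP (opcode EB) placed at instr_addr that jumps to target_addr."""
    next_addr = instr_addr + 2            # the instruction itself takes 2 bytes
    disp = target_addr - next_addr        # displacement counted from the next instruction
    if not -128 <= disp <= 127:
        raise ValueError("target is out of range for a short jump")
    return bytes([0xEB, disp & 0xFF])     # backward jumps use two's complement

print(encode_jmp_short(0x100, 0x114).hex().upper())  # EB12
```

A backward jump works the same way: a jmp at 100 that targets 100 itself gets the displacement -2, stored as FE.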

Translation of the other instructions is straightforward, and again we only have to follow the IASDM. Let’s analyze the encoding of mov ah,9 anyway. In this case we have an immediate operand (a constant, 9). Thus, for moving an immediate operand to a register, the encoding adopts this form:

1011 w reg : immediate data

There, w represents the bit for operand size. That bit specifies whether the data is a byte or full-sized (where full-sized is either 16 or 32 bits). As we’ll be using 8-bit operands, we set the bit to 0. For its part, reg is a 3-bit sequence identifying the destination register. Table B-3 of the IASDM dictates that if w = 0, then register AH is encoded as binary 100. Thus, the encoding of mov ah,9 is

10110100 00001001

which in hexadecimal is expressed as B409. The next instruction, mov dx,102, follows a similar approach:

1011 1 010 0000 0001 0000 0010

In this case, however, w is set to 1, because the destination DX is a 16-bit register (and the operand 102 would not fit in a single byte anyway). The 3-bit sequence for DX is 010. Needless to say, 0000 0001 0000 0010 is the binary representation of the hexadecimal value 102 (16 bits are required). Expressed in hexadecimal, we would have BA0102. However, the bytes of the operand have to be stored in reverse order (least significant byte first, i.e., little-endian), and thereby the right encoding for the instruction is BA0201.
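Both mov encodings can be reproduced with a small Python sketch. The function name encode_mov_imm and the two lookup tables are assumptions made for this example; the register numbers follow the usual x86 numbering, and the sketch assumes 16-bit code as in DEBUG, so full-sized means 16 bits.

```python
# Register numbers for the reg field (standard x86 numbering).
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def encode_mov_imm(reg, value):
    """Encode 'mov reg, immediate' using the form 1011 w reg : immediate data."""
    reg = reg.lower()
    if reg in REG8:
        w, code, size = 0, REG8[reg], 1     # byte operand
    elif reg in REG16:
        w, code, size = 1, REG16[reg], 2    # full-sized (16-bit) operand
    else:
        raise ValueError(f"unknown register: {reg}")
    opcode = 0b10110000 | (w << 3) | code
    # The immediate is stored least significant byte first (little-endian).
    return bytes([opcode]) + value.to_bytes(size, "little")

print(encode_mov_imm("ah", 9).hex().upper())      # B409
print(encode_mov_imm("dx", 0x102).hex().upper())  # BA0201
```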

Next, INT n (interrupt type n) is encoded as 1100 1101 : type. Therefore, int 21 is encoded as 1100 1101 0010 0001 (CD21 in hexadecimal). The encoding of mov ah, 0 as B400 follows directly from our previous explanations. Finally, we can translate our little “hello, world!” into binary code directly:

-e 100 EB 12 0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64
-e 110 21 0D 0A 24 B4 09 BA 02 01 CD 21 B4 00 CD 21 0D
-g =100
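To cross-check the listing above, the following Python sketch rebuilds the program from the individual encodings and compares it with the bytes entered with the e command. All helper names (jmp_short, mov_ah, mov_dx, int_n) are hypothetical and cover only the instructions used in this program.

```python
def jmp_short(addr, target):
    return bytes([0xEB, (target - (addr + 2)) & 0xFF])     # EB : 8-bit displacement

def mov_ah(imm8):
    return bytes([0xB4, imm8])                              # 1011 0 100 : imm8

def mov_dx(imm16):
    return bytes([0xBA]) + imm16.to_bytes(2, "little")      # 1011 1 010 : imm16

def int_n(n):
    return bytes([0xCD, n])                                 # 1100 1101 : type

ORG = 0x100                                  # load address used in the listing
message = b"\r\nhello, world!\r\n$"          # the 18 bytes produced by DB
program = (
    jmp_short(ORG, ORG + 2 + len(message))   # jmp 114
    + message
    + mov_ah(9) + mov_dx(ORG + 2) + int_n(0x21)   # print the string at 102
    + mov_ah(0) + int_n(0x21)                     # terminate the program
)

dump = bytes.fromhex(
    "EB 12 0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64"
    "21 0D 0A 24 B4 09 BA 02 01 CD 21 B4 00 CD 21 0D"
)
print(program == dump[:len(program)])        # True
```

The program itself is 31 bytes long; the final 0D in the second e line lies after the last int 21 and is never executed.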

And that’s all. I think that my explanations have been clear. But I’m always open to any suggestions and corrections. Thanks for reading.

Meta

2010 – July, 17th

Thanks to BaBax for pointing out an error in the encoding of db 0d,0a,"hello, world!",0d,0a,"$". I had inadvertently included the address of the character “!” in the encoding. Thanks, BaBax, for the correction.

20 thoughts on “Encoding Intel x86/IA-32 Assembler Instructions”

CarlosV says:

July 16, 2010 at 6:02 pm

I think that assembly is not my thing anymore… And in the remote chance I code in assembly again, I’d go for RISC architectures.
NaruFan says:

July 16, 2010 at 6:58 pm

Thanks for the post… I didn’t know the IA32 – IA64 thing. But now the translation from assembly to binary code seems pretty straightforward to me.

Heck, perhaps we could now code directly in binary!

🙂
NaruFan says:

July 16, 2010 at 8:11 pm

@Carlos_Vasquez: “And in the remote chance I code in assembly again”

Tell that to my hardware architecture professor!
Alvaro Canales says:

July 16, 2010 at 9:07 pm

So far, so good. Your article is very clear, and now the doubts about JMP 114 translation should be out.

However, please, please, don’t forget to change the layout colors. Those ‘grayed’ assembly comments are killing my eyes.
El_Hombre_Que_Programaba says:

July 16, 2010 at 10:10 pm

…it’s been a while since I programmed in assembly, but the post brings back good memories… and the explanation is very well done

in practice, we can open a text file and write the characters in hexadecimal, change the extension to an executable one, and we should have a working program, with no assembler or compiler…

now, I wouldn’t spend any more time on MS-DEBUG, and I’d go for something much better like Gas (GNU as)…
thehAllMark says:

July 16, 2010 at 10:16 pm

aaaggghhh… who needs assembly nowadays ?!
alejandro says:

July 16, 2010 at 11:20 pm

Thanks everybody for dropping by, and thanks for your comments.

@El_Hombre_Que_Programaba: Although I’m not sure yet, I doubt the next 2 or 3 articles will be about assembly. And most of what is covered in the article is general in nature (not restricted to MS-DEBUG). The reference to MS-DEBUG comes from the previous post.
Mario Cantera says:

July 16, 2010 at 11:33 pm

@El_Hombre_Que_Programaba: How will you manage to enter the final 0D in the text file?
alejandro says:

July 16, 2010 at 11:41 pm

The final ‘Carriage return’ is really dispensable. The program ends after executing INT 21.
GForceSpeed says:

July 17, 2010 at 2:03 am

Wow, now tell us about coding in pure binary. Who does need C++, Python, Perl and such inefficient things?
BaBax says:

July 17, 2010 at 2:16 am

It’s ok! Without the knowledge of assembler mnemonic it is
hard to understand what a “high-tech computer” is doing when
it is booted.
Good explanation!

By the way, there is an error in the part

———————————————————-
“which directly translates into the sequence of characters
(in hexadecimal):

0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64 110 21 0D 0A 24”
———————————————————-

The number 110 is the address of the byte 0x21 (the
character “!”) and doesn’t belong to the string.

The 0x110 is out of range of standard ASCII and the whole
string would look like “hello, worldÉ!” if you are using
Extended ASCII Codes. Furthermore, with the 0x110 the
whole string would be 19 characters long (as shown) and not
18.
alejandro says:

July 17, 2010 at 7:57 am

You’re right, BaBax. I’m sorry for the mistake. I have corrected the encoding, and included a note recognizing your contribution.

Thank you very much.
Vosk says:

July 17, 2010 at 8:45 am

Thanks for the explanation of this topic.
Pingback: Writing Programs with Echo (DOS) | Chocolates para Lucía
Jaramillo says:

July 26, 2010 at 10:52 pm

Excellent!
JaviMartin says:

July 27, 2010 at 6:10 pm

Great, great explanation, very clear.
__hexacode__ says:

July 27, 2010 at 8:50 pm

great post indeed

now we should code directly in binary
Denthre says:

July 28, 2010 at 8:50 pm

nice explanation… retro feeling
felipe-ramirez says:

August 5, 2010 at 5:04 pm

thanks god I studied laws
Pingback: Cracking Cocoa Apps for Dummies « fruitfly
