coLinux, int 80 on Windows and other rants

Generally speaking, an Application Binary Interface (ABI) is the interface between an application program and the operating system. Conceptually, it’s related to the more well-known API concept. But ABIs are a low-level notion, while APIs are more leaned toward the application source code level.

Recently, a friend sent me an email exposing some problems he faced when trying to assemble on Cygwin a code originally targeted at Linux. The problem, as he stated, was that int 0x80 didn’t perform as expected. Well, plenty of explanations are pertinent:

Cygwin

Cygwin allows to run a collection of Unix tools on Windows, including the GNU development toolchain. However, at its core, cygwin is a library which translates the POSIX system call API into the pertinent Win32 system calls (system calls are often abbreviated as syscalls). Therefore, cygwin is a software layer between applications using POSIX system calls and the Win32 operating systems, which allows porting some Unix applications to Windows. This way you can, for instance, have the Apache daemon working as a Windows service. Other very attractive feature of Cygwin is its interactive environment: you can run your shell quite nicely, and run your Autoconf scripts, for example. However, porting means recompiling. There is no binary compatibility, and your program cannot run in computers without Cygwin (without CYGWIN1.DLL, more precisely). Furthermore, albeit some progress has been made, Cygwin is relatively slow (it’s a POSIX compatibility layer, after all.) If possible, I prefer to recompile my applications directly with MinGW. For me, this allows for a faster development cycle. Note, though, that Cygwin can compile MinGW-compatible executables. It’s just that, as aforesaid, I prefer to work with MinGW directly. I only work on Windows if I have to develop applications for Windows. But Linux’s development tools are the best, and we can access several of them by using MinGW. I think that Cygwin is best suited for general cross-development and for handling complicated software porting.

System Calls and int 0x80

A system call is a request by an active process for a service performed by the operating system kernel. Remember that a process is an executing (running) instance of a program, and the active process is the process currently using the CPU. The active process may perform a system call to request creation of other process, for instance. Or perhaps the process needs to communicate with a peripheral device. In Linux on x86, int 0x80 is the assembly language instruction that is used to invoke system calls. int 0x80 is a software interrupt, as it will be raised by a software process, not by hardware devices. Before invoking such interruption, our program has to store the system call number (which allows the operating system to know what service your program is specifically requesting ) in the proper register of the CPU. Every interrupt is a signal to the operating system, notifying it about the occurrence of an event that must be computationally handled.

Continue reading “coLinux, int 80 on Windows and other rants”

hello world, C and GNU as

A thing all these programs had in common was their use of the 09h function of INT 21h for printing the “hello, world!” string. But it’s time to move forward. Now I plan to use the lovely C printf function.

GNU Head

Finally, it’s time to switch to the fabulous GNU as. We’ll forget about DEBUG for some time. Thanks DEBUG. GNU as, Gas, or the GNU Assembler, is obviously the assembler used by the GNU Project. It is part of the Binutils package, and acts as the default back-end of gcc. Gas is very powerful and can target several computer architectures. Quite a program, then. As most assemblers, Gas’ input is comprised of directives (also referred to as Pseudo Ops), comments, and of course, instructions. Instructions are very dependent on the target computer architecture. Conversely, directives tend to be relatively homogeneous.

1 Syntax

Originally, this assembler only accepted the AT&T assembler syntax, even for the Intel x86 and x86-64 architectures. The AT&T syntax is different to the one included in most Intel references. There are several differences, the most memorable being that two-operand instructions have the source and destinations in the opposite order. For example, instruction mov ax, bx would be expressed in AT&T syntax as movw %bx, %ax, i.e., the rightmost operand is the destination, and the leftmost one is the source. Other distinction is that register names used as operands must be preceded by a percent (%) sign. However, since version 2.10, Gas supports Intel syntax by means of the .intel_syntax directive. But in the following we’ll be using AT&T syntax.

Continue reading “hello world, C and GNU as”

Writing Programs with Echo (DOS)

How do you input those characters as parameters for the echo command? I found no way of doing that. If you know a way, please drop me a line.

Is that possible? Yes, it is. It’s just a matter of redirecting echo output to a file. Writing the program with echo should be a straightforward task if we are able to produce the sequence of characters corresponding to the intended binary, executable file. Is that useful? Surely not. But it’s a healthy way to waste your time 🙂 This can be achieved by writing the characters of the executable file, using a simple text editor like notepad or even the old MS-DOS Editor. Of course, the program should be relatively small or we would adventure into the dangerous lands of masochism. By using the echo command of DOS we will be following the conceited style of doing things 🙂 But we’ll restrict this post to the simple hello, world! program we have been reviewing in previous entries.

Continue reading “Writing Programs with Echo (DOS)”

Encoding Intel x86/IA-32 Assembler Instructions

Translation of the second line is a direct and solved issue. What about jmp 114? Well, we want to jump over the data (18 bytes, one byte per each character in the string.) IASDM tell us (Appendix B) that the opcode for unconditional jumps in the same segment is 11101011, which in hexadecimal, is expressed as EB.

On the post Debugging hello, world, someone asked about the reason for translating the instruction jmp 114 into hexadecimal EB12. To answer this, we are going to recur to the “lovely” and elder Intel Architecture Software Developer Manual (IASDM), Volume 2. This volume describes the instructions set of the Intel Architecture processor (x86/IA-32) and the opcode structure. I’ll review some terms involved here:

    x86: It refers to the instruction set of the Intel-compatible CPU architectures (chips produced by Intel, AMD, VIA, and others) inaugurated by Intel’s original 16-bit 8086 CPU. A decision which proved wise was to make each new instance of x86 processors almost fully backwards compatible.
    IA-32: It is Intel’s 32-bit implementation of the x86 architecture; IA-32 distinguishes this implementation from the preceding 16-bit x86 processors. Note that when the 64-bit era arrived, Intel launched its Itanium processor, which discards compatibility with the IA-32 instruction set. Such 64-bit architecture description and implementation is referred to as IA-64, meaning “Intel Architecture, 64-bit”, but even though the names are similar, IA-32 and IA-64 are very different architectures and instructions sets. However, AMD’s response to Intel 64-bit processors, uses an instruction set that, in essence, is composed of 64-bit extensions to IA-32, i.e., it’s a superset of the x86 instruction set. Such instruction set is referred to as AMD64 (initially, x86-64.) Later, Intel cloned it under the name Intel 64. AMD’s processors Athlon 64, Terium, Opteron, Sempron, etc., are based on AMD64.
    Opcode: An opcode (operation code) is the part of a machine language instruction (pure binary code) specifying the operation to be performed. The other portion of the instruction is the operand, which is optional and represents the data to be operated on. In assembly language, mnemonics are used to represent the opcodes. Concretely, and according to the IASDM, a mnemonic is a reserved name for a class of instruction opcodes which have the same function. For example, in JMP 114, the mnemonic is JMP, and the operand is 114 (remember, 114 in hexadecimal, which is 276 in decimal.)

Continue reading “Encoding Intel x86/IA-32 Assembler Instructions”