Why does adding people to a late software project make it later?

That statement is known as Brooks’s Law, formulated by the renowned computer scientist and software engineer Frederick P. Brooks. The original statement, found in his 1975 classic The Mythical Man-Month, is “adding manpower to a late software project makes it later”. Basically, the idea is that adding more analysts, designers or programmers to a project running behind its original schedule will delay it even more.

Broadly speaking, the rationale behind Brooks’s Law is related to knowledge management. First, when new personnel are added to the project, some resources have to be diverted into training and informing the newcomers about the project’s status, vision and philosophy. That alone delays the project. Further, when the number of people participating in a project increases, so does the number of communication paths, so more resources (including time) are required to distribute information. Regarding this point, you may be interested in reading my entry on “Knowledge Sharing” in Software Design, Trials and Errors.
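
As a back-of-the-envelope illustration (my own, not from Brooks’s book): the number of pairwise communication paths among n people is n(n-1)/2, which grows quadratically. A minimal C sketch:

    #include <stdio.h>

    /* Pairwise communication paths among n people: n(n-1)/2. */
    static int paths(int n) { return n * (n - 1) / 2; }

    int main(void)
    {
        /* Adding people multiplies coordination cost, not just head count. */
        printf("%2d people -> %3d paths\n",  5, paths(5));   /* 10  */
        printf("%2d people -> %3d paths\n", 10, paths(10));  /* 45  */
        printf("%2d people -> %3d paths\n", 20, paths(20));  /* 190 */
        return 0;
    }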

Reentrant Routine

A routine or procedure P is reentrant (also called pure code) if it can be “re-entered” while it is already in execution. Basically, it means that P can be executed two or more times simultaneously or, alternatively, that P can be safely executed concurrently. There are some conditions P must satisfy in order to be reentrant, which we may check in the Wikipedia entry for reentrant functions.

Some programs necessarily have to be reentrant; device drivers, for instance. A device driver has to be reentrant because another interrupt may be raised while the driver is running. Reentrancy also enables code sharing. For example, if a program consists of 600 KB of code and 200 KB of data, and n users are simultaneously running the program, we would require n × 600 KB of physical memory for the code if the program is not reentrant. But if the code is reentrant, we can share a single copy among the n users, saving a lot of memory.
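
To make this concrete, here is a minimal C sketch (my own illustration, with made-up function names): the first version keeps its result in a static buffer, so concurrent callers would clobber each other; the second keeps all state in caller-provided storage and is reentrant.

    #include <stdio.h>

    /* NOT reentrant: every call shares the same persistent buffer. */
    static char *int_to_string_unsafe(int n)
    {
        static char buf[32];            /* shared, persistent state */
        snprintf(buf, sizeof buf, "%d", n);
        return buf;
    }

    /* Reentrant: all state lives in the caller's storage. */
    static char *int_to_string(int n, char *buf, size_t size)
    {
        snprintf(buf, size, "%d", n);
        return buf;
    }

    int main(void)
    {
        char a[32], b[32];
        /* Two "simultaneous" uses do not interfere. */
        printf("%s %s\n", int_to_string(1, a, sizeof a),
                          int_to_string(2, b, sizeof b));
        printf("%s\n", int_to_string_unsafe(3));
        return 0;
    }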

Stub

What’s a stub? It obviously depends on the context. A Stub may even be a relative of the Danish poet Ambrosius Stub. After all, code is poetry.

In computing, I know of four contexts where the word stub has a well-established meaning:

  1. Web Sites: A stub is a web page in progress, i.e., a page which provides minimal information and is intended for later development. For instance, a Wikipedia stub is a short article in need of expansion.
  2. Coding: During development, we sometimes use a “skeleton” function (or procedure, or method) to simulate some intended (but not yet implemented) functionality. For instance, the function may stand in for a complex algorithm to be developed later, or simulate a procedure running on a remote host. Such a placeholder function is called a stub function. Stub functions come in handy for quick prototyping and testing (a minimal sketch follows this list).
  3. Distributed Systems: In distributed systems, a service interface defines the services available to programs, and these services are distributed among several networked machines. A program in machine A may request a service by calling a procedure; however, the procedure may be offered by a remote host, say, machine B. Remote Procedure Calls (RPC) are a paradigm of distributed systems aimed at abstracting the communication between hosts in a network. The goal of RPC is to hide the details of the remote call: the remote call should look like a local one, i.e., the program in machine A invokes the procedure in machine B as it would invoke a local procedure. Under the hood, though, we obviously have to transmit information from the client (caller) to the server (callee), and in the other direction. Now, how do we hide the fact that we are calling a procedure located in another machine? This is the basic idea of RPC:
    • In the address space of the client, we represent the server procedure by means of a local procedure called the client stub. Likewise, the server is linked to a server stub, which will receive the messages from the client stub.
    • When machine A requests a service which is provided by machine B, a call is made to the client stub (which has the same name as the procedure in B). As the client stub lies in the same address space as the caller, the invocation is handled locally, and the program sees it as a local one. However, the client stub marshals the received parameters and sends them, through the network, to the server stub. In turn, the server stub unmarshals the parameters and performs the call to the real procedure in the server. When the server procedure finishes, results or exception data travel back from server to client. By the way, marshalling is the process of taking a collection of data items (such as the procedure name and its arguments) and grouping them according to some predefined representation, suitable for transmission over the network. The server must know and conform to this representation in order to unmarshal the received data and recover the transmitted information (a minimal marshalling sketch follows this list).

    Albeit conceptually simple, RPCs pose some interesting (nasty) implementation problems, such as passing pointer arguments (remember that client and server have different address spaces).

  4. Computer Networking: A stub network is a network, or part of a network, with only one communication path to external networks (non-local hosts). For instance, if we connect to our Internet Service Provider through a single router, our local network is a stub network with respect to our provider.
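
Here is a minimal sketch of a stub function in C (context 2). The function name and the canned return value are made up for illustration; the point is that callers can be written and tested before the real algorithm exists.

    #include <stdio.h>

    /* Stub: stands in for a complex edit-distance algorithm to be
       developed later. It ignores its inputs and returns a canned value. */
    static int edit_distance(const char *a, const char *b)
    {
        (void)a;
        (void)b;
        return 3;   /* plausible placeholder result */
    }

    int main(void)
    {
        printf("distance = %d\n", edit_distance("color", "colour"));
        return 0;
    }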
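
And a toy illustration of the client stub and marshalling from context 3. Everything here is hypothetical: the wire format (a 4-byte procedure id followed by two 4-byte big-endian arguments), the procedure id, and the add() procedure itself. A real RPC system would generate the stub and actually send the message over a socket.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>
    #include <arpa/inet.h>   /* htonl: host-to-network byte order */

    /* Marshal a request: procedure id plus two integer arguments. */
    static size_t marshal_add(uint32_t proc_id, int32_t a, int32_t b,
                              unsigned char *buf)
    {
        uint32_t net;
        net = htonl(proc_id);     memcpy(buf + 0, &net, 4);
        net = htonl((uint32_t)a); memcpy(buf + 4, &net, 4);
        net = htonl((uint32_t)b); memcpy(buf + 8, &net, 4);
        return 12;                /* bytes ready to transmit */
    }

    /* Client stub: same signature the caller expects from a local add(). */
    static int add(int a, int b)
    {
        unsigned char msg[12];
        size_t len = marshal_add(1 /* hypothetical id of "add" */, a, b, msg);
        /* A real stub would send msg over the network and block for the
           reply; here we only show the marshalled request. */
        printf("marshalled %zu bytes for remote add(%d, %d)\n", len, a, b);
        return 0;                 /* the reply would carry the real result */
    }

    int main(void)
    {
        add(2, 3);                /* looks exactly like a local call */
        return 0;
    }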

There is another related context for stubs: in electronics, stub sections are mostly used for impedance matching in transmission lines. But I’m not too familiar with this meaning of “stub”.

A Central Abstraction: The Process

Abstractions

I do strongly believe in abstraction being the root of computing (however, you may want to read Is abstraction the key to computing? as a motivation for a different perspective on the role of abstraction in computing). Modern hardware and software systems include a lot of features and perform so many tasks that it is impossible to understand, build and use them without resorting to abstractions. For instance, let’s take a look at the CPU: it is the central part of a general-purpose computing system, and is also an extremely complex system in itself. Functionally, a CPU is an instruction-crunching device: it processes one instruction after another, following the steps of fetch, decode, execute and writeback (in von Neumann architectures). In other words, the CPU retrieves an instruction from memory, decodes it, executes it, and puts the results of the operation back into memory. Further, the CPU has no clue (and actually does not care) about the higher-level semantics of the instruction it may be executing at a specific time. For example, the CPU may be executing an instruction related to a spell-checking task, and a few instructions later it may be executing an instruction related to another task, say, MP3 playing. It only follows orders, and just executes the instruction it is told to execute.
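
A toy fetch-decode-execute loop in C may make the cycle concrete. The two-byte “instruction set” below is invented for illustration, not any real ISA; note that the loop never knows (or cares) which task its instructions belong to.

    #include <stdio.h>
    #include <stdint.h>

    /* Invented toy ISA: one opcode byte followed by one operand byte. */
    enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };

    int main(void)
    {
        uint8_t mem[] = { OP_LOAD, 5, OP_ADD, 7, OP_HALT, 0 };
        uint8_t pc = 0, acc = 0, running = 1;

        while (running) {
            uint8_t op  = mem[pc];      /* fetch */
            uint8_t arg = mem[pc + 1];
            pc += 2;
            switch (op) {               /* decode and execute */
            case OP_LOAD: acc = arg;    break;
            case OP_ADD:  acc += arg;   break;  /* writeback into acc */
            case OP_HALT: running = 0;  break;
            }
        }
        printf("acc = %d\n", acc);      /* 12 */
        return 0;
    }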

Nowadays, computing systems are expected to perform more and more tasks on behalf of their users, and several tasks must be performed concurrently. As in the previous example, the system might be running the spell-checker and the media player simultaneously. In multiprogrammed systems we can achieve pseudoparallelism by switching (multiplexing) the CPU among all the user’s activities (true parallelism is only possible in multi-processor or multi-core systems). Remember that multiprogramming requires that the CPU be allocated to each of the system’s tasks for a period of time and deallocated when some condition is met.
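
A crude sketch of that multiplexing (my own toy example, with the “scheduler” reduced to a round-robin loop and each function call standing in for a time slice):

    #include <stdio.h>

    static void spell_check_step(int t) { printf("slice %d: spell-checker\n", t); }
    static void mp3_play_step(int t)    { printf("slice %d: media player\n", t); }

    int main(void)
    {
        /* One CPU, two tasks: alternate time slices between them. */
        for (int tick = 0; tick < 4; ++tick) {
            if (tick % 2 == 0)
                spell_check_step(tick);
            else
                mp3_play_step(tick);
        }
        return 0;
    }
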
Continue reading “A Central Abstraction: The Process”

hello world, C and GNU as

One thing all these programs had in common was their use of function 09h of INT 21h for printing the “hello, world!” string. But it’s time to move forward. Now I plan to use the lovely C printf function.
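
For reference, the plain C counterpart of what we are after is, of course, the classic:

    #include <stdio.h>

    int main(void)
    {
        printf("hello, world!\n");
        return 0;
    }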


Finally, it’s time to switch to the fabulous GNU as. We’ll forget about DEBUG for some time. Thanks, DEBUG. GNU as, Gas, or the GNU Assembler, is obviously the assembler used by the GNU Project. It is part of the Binutils package, and acts as the default back-end of gcc. Gas is very powerful and can target several computer architectures. Quite a program, then. Like most assemblers, Gas takes input comprised of directives (also referred to as pseudo-ops), comments, and, of course, instructions. Instructions are highly dependent on the target computer architecture; directives, by contrast, tend to be relatively homogeneous.

Syntax

Originally, this assembler only accepted the AT&T assembler syntax, even for the Intel x86 and x86-64 architectures. The AT&T syntax differs from the one found in most Intel references. There are several differences, the most memorable being that two-operand instructions take the source and destination operands in the opposite order. For example, the instruction mov ax, bx would be expressed in AT&T syntax as movw %bx, %ax, i.e., the rightmost operand is the destination, and the leftmost one is the source. Another distinction is that register names used as operands must be preceded by a percent (%) sign. However, since version 2.10, Gas supports Intel syntax by means of the .intel_syntax directive. In what follows, though, we’ll be using AT&T syntax.
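
As a quick taste of the source-first operand order, here is a small C sketch using GCC’s extended inline assembler, which also expects AT&T syntax by default (x86/x86-64 and GCC-compatible compilers only):

    #include <stdio.h>

    int main(void)
    {
        int src = 42, dst = 0;
        /* AT&T order: source operand (%1 = src) first,
           destination (%0 = dst) last. */
        __asm__("movl %1, %0" : "=r"(dst) : "r"(src));
        printf("dst = %d\n", dst);   /* 42 */
        return 0;
    }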

Continue reading “hello world, C and GNU as”

hello, world

Personally, reading “hello, world” evokes warm, orange afternoons, with my eyes strained (and soothed) by code. Nice, and overly inefficient, Pascal code. In some images, a few BASIC snippets interleave, but those are not that nice to remember…

In calm thoughts, these two words (with the comma) bring to mind plenty of images. More often than not, I hold “hello, world” in fond remembrance. For this post I’ve slightly modified the default WordPress post title in favor of the original Kernighan’s form: no capitalization, and a comma. Through the years, it seems to me that this sequence lightens my worries when coping with new languages, systems, things. Somehow, the mind has understood that once “hello, world” is done, then grasping the entire system is achievable. Kind of Pavlovian conditioning, I guess.

In K&R’s C tutorial, this feel-at-ease perception is also intended:

The only way to learn a new programming language is by writing programs in it. The first program to write is the same for all languages: Print the words hello, world. This is the basic hurdle; to leap over it you have to be able to create the program text somewhere, compile it successfully, load it, run it, and find out where your output went.

This way, “hello, world” should be our first step toward taming the new beast (the language).
Continue reading “hello, world”