Sat May 6, 2017
One of the most important tasks of the kernel, in a client-server model, is to provide a means for clients and servers to send and receive messages. Clients need to be able to talk to servers, servers need to be able to talk back to clients, and servers need to talk to servers.
I. Communication Mechanisms
There are four parts to this communication infrastructure:
1. Ports
When a process that wants to be a server is created, it requests a port number from the kernel. Ports are the endpoints where clients and servers send their messages, kind of like a mailbox address. Each port has its own queue of messages being sent to, or returned by, the server.
2. The Translator
When a client wants to talk to a server, it needs a way to find which port number that server is bound to. This requires a separate server that has access to the kernel's directory of port numbers; this server translates server names to port numbers.
3. Message Passing Commands
These are the actual syscalls for all user processes, clients and servers, to use to send and receive messages.
4. Interrupt Communication
There must also be a way for the interrupt handlers to communicate to other processes. These are separate from user space message passing commands because they are used within the kernel.
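To make the port abstraction above concrete, here is a minimal sketch of a port with its own message queue, as a small ring buffer. Everything here (port_t, the queue sizes, the field names) is an illustrative assumption, not part of the design itself.

```c
#include <assert.h>
#include <string.h>

#define MSG_QUEUE_LEN 16
#define MSG_SIZE      64

typedef struct {
    int  owner_pid;                       /* the server that requested this port */
    char queue[MSG_QUEUE_LEN][MSG_SIZE];  /* messages waiting for the server */
    int  head, tail, count;               /* ring-buffer bookkeeping */
} port_t;

/* Enqueue a message for the port's owner; returns 0 on success, -1 if full. */
int port_enqueue(port_t *p, const char *msg)
{
    if (p->count == MSG_QUEUE_LEN)
        return -1;
    strncpy(p->queue[p->tail], msg, MSG_SIZE - 1);
    p->queue[p->tail][MSG_SIZE - 1] = '\0';
    p->tail = (p->tail + 1) % MSG_QUEUE_LEN;
    p->count++;
    return 0;
}

/* Dequeue the oldest message into out; returns 0 on success, -1 if empty. */
int port_dequeue(port_t *p, char *out)
{
    if (p->count == 0)
        return -1;
    strncpy(out, p->queue[p->head], MSG_SIZE);
    p->head = (p->head + 1) % MSG_QUEUE_LEN;
    p->count--;
    return 0;
}
```

The mailbox analogy maps directly: the queue holds letters, and head/tail are where the server reads from and where senders drop off.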
II. OS Comparisons
Mach/Hurd does things somewhat similarly to the basic client-server system in that the microkernel (Mach) handles the IPC. It uses ports and a translator, but its differences are significant enough to take a moment to look at.
Remote Procedure Calls (RPCs)
Mach utilizes RPCs, which are transport-transparent procedure calls. It is basically a way of making the functions exported by a server accessible to clients (local or remote). Mach facilitates this by providing an interface for connecting client RPC calls to the server's RPC functions. This is really nice for developers writing on the Mach platform. Internally, however, the kernel has to go through complex steps to make it all work.
The Mach/Hurd IPC is asynchronous in that a thread sending a message isn’t blocked until the thread it sent it to receives it. This can easily cause scheduling headaches and other problems, as well as add a lot of overhead (such as complex buffer code).
When a potential server requests a port, it calls on the translator. The translator will actually mount the server into Hurd's Virtual File System, so it is treated like any other directory.
So if a client wanted to talk to pfinet (a TCP/IP internet driver), it would call the root translator (which is seen by all applications) for the port right to /servers/socket/pfinet. A port right is the right, or permission, to connect to a port. The root translator will then find the pfinet translator, which will open the port by giving the port right to the client. At this point the client can make RPC calls to open TCP/IP sockets or send commands such as ping.
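The lookup the root translator performs can be sketched as resolving a filesystem path to a port right through a mount table. The table layout, the names, and the representation of a port right as a plain int are all hypothetical here:

```c
#include <assert.h>
#include <string.h>

typedef int port_right_t;          /* a permission to connect to one port */
#define NO_RIGHT (-1)

typedef struct {
    const char  *path;             /* where the server sits in the VFS */
    port_right_t right;            /* the right handed out to authorized clients */
} translator_entry_t;

/* The root translator's view of mounted server translators (example data). */
static translator_entry_t mounts[] = {
    { "/servers/socket/pfinet", 7 },
    { "/servers/exec",          8 },
};

/* Resolve a path to a port right, as a client's request would be resolved. */
port_right_t root_translator_lookup(const char *path)
{
    for (unsigned i = 0; i < sizeof mounts / sizeof mounts[0]; i++)
        if (strcmp(mounts[i].path, path) == 0)
            return mounts[i].right;
    return NO_RIGHT;
}
```

In the real system the root translator would delegate to the pfinet translator rather than answer from a flat table, but the path-to-right mapping is the essential idea.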
Any user who wants to run a server process can have it request a translator of its own. Upon bootup, though, most servers' translators are started by the root translator, such as pfinet's. This whole system has very powerful implications: since all of the servers are accessible through the file system, filesystem commands such as ls can give the user access to different parts of a server when run in the appropriate directory. This is controlled and restricted, however, by standard UNIX file permissions.
Qubit and Plan9
Plan9's translator, called namer, is a user space server that facilitates the translation of a server name to its port number. When a server registers itself with namer, namer assigns it a port number that is provided by the kernel. The kernel keeps track of these associations in a lookup table.
When a process wants to send a message to a server, it must connect to that server and get its port number. This connection request, via msg_connect(), uses namer to get the port number. Once connected, the process can use that port number when sending messages. When the process does send a message, using msg_send(), the syscall uses the port/server lookup table in the kernel to translate the port number to the server it is associated with.
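A minimal sketch of namer and the name/port lookup table, assuming bindings live in a flat kernel table (the function names and table layout here are illustrative, not Plan9's actual code):

```c
#include <assert.h>
#include <string.h>

#define MAX_BINDINGS 32

typedef struct {
    char name[32];      /* server name registered with namer */
    int  port;          /* port number issued by the kernel */
} binding_t;

static binding_t table[MAX_BINDINGS];
static int n_bindings;
static int next_port = 1;

/* namer: register a server name, returning the kernel-issued port number. */
int namer_register(const char *name)
{
    if (n_bindings == MAX_BINDINGS)
        return -1;
    strncpy(table[n_bindings].name, name, sizeof table[n_bindings].name - 1);
    table[n_bindings].port = next_port++;
    return table[n_bindings++].port;
}

/* The msg_connect() path: translate a server name to its port number. */
int namer_lookup(const char *name)
{
    for (int i = 0; i < n_bindings; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].port;
    return -1;
}
```

Each msg_connect() walks this table once; after that the client caches the port number and only msg_send() touches the kernel.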
This method requires a lot of bouncing back and forth between user space and the kernel; an alternative is to integrate namer into the kernel. It will still issue, or bind, a port to a server upon request; however, ports are represented by strings instead of numbers. And instead of these strings being some arbitrary lookup number for the server, they actually contain the address (within the file system) of the server. This accomplishes two things:
1) The user space to kernel bounces are diminished.
2) It eliminates the server name/port number lookup table.
Message Passing Commands
msg_connect(port) - Connecting to a port first, before the thread can send messages to it, is important because the kernel has to authenticate the thread. For example, you don't want a userspace program to be able to connect straight to the wd disk server, because it could cause some real damage. Instead, the user program has to connect to fs, which can connect to wd because it knows how to properly talk to wd without corrupting files.
msg_send(port, &msg) - This is where the thread can actually send a message msg to the server. Since the Qubit IPC is synchronous, every message send causes the sender to block. It can send as many as it wants, one at a time. When it wakes up, the message that msg points to has been changed to contain the server's reply.
msg_receive(&msg) - This allows server threads to receive a message msg. Each receive blocks the thread, and it can do as many as it likes, one at a time.
msg_reply(&msg) - This is how the server replies to a message sent to it. It also enables the server to pass data back to the sender by changing what's inside the original msg.
msg_close() - This closes the connection to the port; the process can no longer send or receive messages through it.
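To illustrate the synchronous semantics of these commands, here is a toy single-process simulation in which msg_send() "blocks" by invoking the connected server's handler directly, and the reply overwrites the original msg in place. The msg_t layout and the handler-registration scheme are assumptions for illustration, not Qubit's actual implementation:

```c
#include <assert.h>
#include <string.h>

typedef struct {
    char text[64];                 /* payload; replaced in place by the reply */
} msg_t;

typedef void (*server_fn)(msg_t *);

static server_fn connected;        /* the single connection this toy supports */

int msg_connect(server_fn server) { connected = server; return 0; }

/* Synchronous send: the caller "blocks" until the reply lands in *msg. */
int msg_send(msg_t *msg)
{
    if (!connected)
        return -1;
    connected(msg);                /* the server runs and rewrites msg */
    return 0;
}

int msg_close(void) { connected = 0; return 0; }

/* An example server: replies to "ping" with "pong" via the in-place msg. */
void echo_server(msg_t *msg)
{
    if (strcmp(msg->text, "ping") == 0)
        strcpy(msg->text, "pong");  /* what msg_reply() accomplishes */
}
```

The key property being modeled is that the sender never sees its own message again: by the time msg_send() returns, msg holds the server's reply.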
Interrupt Service Request (ISR)
Interrupts are implemented in the ISR. This is basically a set of message calls that allow interrupt handlers to directly inject messages into subscribed servers' queues without blocking (kmesg_send() for kernel-message send, etc.). Clients never talk directly to interrupts, and they never have to. Besides, the interrupt handler has to have a port to send messages to, and clients can't have ports. They always use servers as a go-between, thus further simplifying the ISR.
Qubit's implementation takes advantage of shared interrupts, meaning that more than one server can subscribe to an interrupt. The handler also keeps two lists of these subscribers: a ready list and a pending list. If a server is ready to receive a message from the handler, it is on the ready list. When the interrupt calls the handler, the handler notifies every server on the ready list and copies them to the pending list. When a server takes the handler's message off its queue, that server is moved back over to the ready list. This model not only does away with missed interrupts, but also allows for a much cleaner and more flexible way of dealing with multiple subscribers.
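The ready/pending bookkeeping might be sketched like this, with subscribers tracked by id and a state flag standing in for membership in the two lists (all names here are hypothetical):

```c
#include <assert.h>

#define MAX_SUBS 8

typedef enum { SUB_READY, SUB_PENDING } sub_state_t;

typedef struct {
    int         id[MAX_SUBS];      /* subscribed server ids */
    sub_state_t state[MAX_SUBS];   /* which list each one is on */
    int         count;
} isr_subs_t;

/* A server subscribes to the shared interrupt; it starts out ready. */
int isr_subscribe(isr_subs_t *s, int server_id)
{
    if (s->count == MAX_SUBS)
        return -1;
    s->id[s->count] = server_id;
    s->state[s->count++] = SUB_READY;
    return 0;
}

/* Interrupt fires: notify every ready server and move it to pending.
 * Returns how many servers were notified. */
int isr_fire(isr_subs_t *s)
{
    int notified = 0;
    for (int i = 0; i < s->count; i++)
        if (s->state[i] == SUB_READY) {
            /* kmesg_send() would inject the message into this queue here */
            s->state[i] = SUB_PENDING;
            notified++;
        }
    return notified;
}

/* A server took its message off the queue: move it back to ready. */
void isr_ack(isr_subs_t *s, int server_id)
{
    for (int i = 0; i < s->count; i++)
        if (s->id[i] == server_id)
            s->state[i] = SUB_READY;
}
```

A server still digesting one interrupt simply sits on the pending list and is skipped, rather than losing the interrupt outright.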
Typical IPC Flow
Let's say a thread wants to use the keyboard. The keyboard is an I/O device and uses interrupts; the go-between server is console (whose program actually resides in the file system at location //console). The procedure this thread would go through is as follows:
1. msg_connect(//console) to connect to the console server.
2. msg_send(//console, &msg), where msg contains a message that the console server understands as "get keyboard input". The program blocks until console replies with the actual keyboard input inside msg.
3. msg_close(//console) to close the connection, after looping through send/receive/send/receive enough times to get all that it needs.
In the case of the console server, it would look something like this:
1. msg_receive(&msg), which will block the server until it receives a message.
2. msg_send() to the keyboard interrupt, blocking and putting the returned data into msg.
3. msg_reply(&msg), which will wake up (i.e. set_runnable()) the blocked client thread, with the new data in msg.
4. The server will then loop back to msg_receive() and block, waiting for more messages.
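The server side of this flow can be sketched in C with stubbed syscalls, feeding the server from an in-memory script so the receive/service/reply loop is visible. The stubs, the script, and the msg_t layout are assumptions for illustration:

```c
#include <assert.h>
#include <string.h>

typedef struct { char text[32]; } msg_t;

/* Stubbed syscalls: a canned inbox stands in for real blocking receives. */
static const char *inbox[] = { "get-key", "get-key", "quit" };
static int in_idx, replies;

int msg_receive(msg_t *m)          /* would block; here it pops the script */
{
    strcpy(m->text, inbox[in_idx++]);
    return 0;
}

int msg_reply(msg_t *m)            /* would set_runnable() the blocked sender */
{
    (void)m;
    replies++;
    return 0;
}

/* The console server's loop: receive, service, reply, repeat.
 * Returns the number of replies sent (for demonstration only). */
int console_server(void)
{
    msg_t m;
    for (;;) {
        msg_receive(&m);           /* 1. block until a message arrives */
        if (strcmp(m.text, "quit") == 0)
            break;
        strcpy(m.text, "a");       /* 2. the msg_send() to the keyboard
                                         interrupt would fill this in */
        msg_reply(&m);             /* 3. wake the client, new data in msg */
    }
    return replies;
}
```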
Due to the nature of multithreaded programming, special attention must be paid to the critical regions of code. These critical regions are parts of a program that access a shared resource. For instance, the message queue for a server is shared by everyone; in other words, any other client or server can send a message to it whenever they please (if they have the appropriate privileges). The part of the program that processes a message send request by putting a message on the server’s queue, and then incrementing the index pointer on that queue, is a critical region. Keeping in mind that a preemptive scheduler (such as Qubit’s) can interrupt a process anywhere, imagine what may happen if that process were interrupted after adding a message to a queue, but before incrementing the index pointer! Now imagine if the new process to run was sending a message to the same queue. Since the index pointer was never incremented, the message from the previous process becomes overwritten!
This is an example of a race condition: two processes are racing for access to a shared resource, and the one that gets there first alters the state of the resource, adversely affecting the other's work on that resource.
To overcome this common hazard, the kernel must ensure that all message commands are atomic, and thus run as one operation without interruption. The only case where race conditions could still become a problem is on multiprocessors, because two different message syscalls can then be running at once (on two different processors).
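As a user-space analogy for making the enqueue critical region atomic, here is the "store the message, then bump the index" sequence guarded by a lock; the pthread mutex stands in for whatever the kernel actually does (e.g. disabling preemption around the syscall), and the queue layout is an assumption. Without the lock, two senders could both write to slot tail before either increments it, overwriting one message exactly as described above:

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_LEN 4096

static int queue[QUEUE_LEN];
static int tail;                   /* the index pointer from the text */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;

/* Atomic version of the critical region: store + increment as one unit. */
void enqueue_atomic(int msg)
{
    pthread_mutex_lock(&qlock);
    queue[tail] = msg;             /* add the message to the queue ...      */
    tail++;                        /* ... and advance the index, inseparably */
    pthread_mutex_unlock(&qlock);
}

/* A sender thread enqueuing many messages, as a client might. */
void *sender(void *arg)
{
    for (int i = 0; i < 1000; i++)
        enqueue_atomic(*(int *)arg);
    return 0;
}
```

With the lock, two concurrent senders always end up with every message in the queue and the index advanced once per message, even on a multiprocessor.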