SBL Simulator

This chapter describes the SBL simulator: how to use it and how it works.

User's Guide to the SBL Simulator

The simulator runs on top of PVM version 3.3.4 or later. It can be run on more than one machine, but all machines must have the same virtual address size. It has been tested on a network of DEC Alpha workstations and on a network of Linux PCs.

To run the simulator, you first need to install the PVM software. Refer to the PVM documentation to install this package properly. The simulator works with internet sockets only, so you need to define NOUNIXDOM in the PVM configuration file before you compile PVM. SBL checks this condition and, if it is not satisfied, prints error messages indicating exactly what should be done.

Each SBL process is represented by one PVM task. Standard output and standard error of processes started with SBL_Spawn go to a temporary file (in the standard configuration it is /tmp/pvml.userid on each node).

The simulator consists of the following source files:

The following restrictions are introduced by this simulator:

Making SBL Simulator

To make the SBL library, go to the sbl/src directory and run "make depend" followed by "make".

The CLIENTSERVER compilation variable is defined in the SBL makefile (in the DEFINES macro). When this variable is defined, SBL handles cases in which processes die unexpectedly. In such cases, PVM delivers notifications to the surviving processes, and SBL is able to clean up its state. With CLIENTSERVER undefined, the SBL library is oblivious to the deaths of other processes.

The correct semantics of SBL is obtained with CLIENTSERVER defined, at the cost of a few spurious error messages printed by PVM daemons. If you run only multicomputer programs, where processes do not die unexpectedly, you can undefine CLIENTSERVER in the SBL makefile and avoid these messages. For more discussion, see the section "How CLIENTSERVER Definition Affects SBL" below.

The SBLMALLOC compilation variable is defined in the SBL makefile. When this variable is defined, the SBL-supplied malloc is used instead of the standard malloc. You should not undefine this variable, because PVM may call malloc from inside functions called by the SBLSIG handler.

Starting a New Program

To start a new program, you first need to start PVM (a PVM daemon on each node).

Next, you can start your own program in the standard way (for example, from a shell prompt).

If the nodes in your PVM machine share a file system (for example, NFS-mounted), you can keep one copy of an executable and start it with an absolute path (this path must identify the executable on all nodes). Alternatively, you can place executables in the default PVM directory for binaries (in the standard configuration it is ~/pvm3/$(PVM_ARCH)/bin).

Debugging

SBL supports collecting event traces (logs) of SBL processes. Logging is non-invasive, and does not require network communication until a log is printed to a file.
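One way to picture logging that needs no network communication until the log is printed is an in-memory event buffer that is only written out on request. The sketch below illustrates that idea only; it is not the SBL implementation, and the names log_add_event, log_print, LOG_CAPACITY and EVENT_MAXLEN are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define LOG_CAPACITY 1024   /* hypothetical limit on stored events */
#define EVENT_MAXLEN 128    /* hypothetical limit on one event line */

static char log_events[LOG_CAPACITY][EVENT_MAXLEN];
static size_t log_count = 0;

/* Records one event in memory only; no file or network I/O happens here.
 * Returns 0 on success, -1 when the buffer is full.
 * Descriptions longer than EVENT_MAXLEN - 1 characters are truncated. */
int log_add_event(const char *evdesc)
{
    assert(evdesc != NULL);
    if (log_count == LOG_CAPACITY)
        return -1;
    snprintf(log_events[log_count], EVENT_MAXLEN, "%s", evdesc);
    log_count++;
    return 0;
}

/* Writes the buffered events to out, one event per line.
 * Returns the number of events written. */
size_t log_print(FILE *out)
{
    for (size_t i = 0; i < log_count; i++)
        fprintf(out, "%s\n", log_events[i]);
    return log_count;
}
```

A process would call log_add_event as events occur and log_print once, with a file opened in the log directory, when the log is wanted.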

Each process can produce its own log. To do so, SBL should be compiled with DEBUG defined and LOGDIR set to the path where log files should go. By default this is /u/$(LOGNAME)/tmp. Note that the same directory is used by every process, so it should be accessible on all nodes running a given SBL program (for example, through NFS).

A log is written to a file by calling SBL_PrintLog(). This call produces the event log for the calling process only, so to produce all log files you need to call it inside each process. The event log of a process pid executing on a node with hostname Hname goes to the text file $(LOGDIR)/<pid>.<Hname>.log. For example, if process 9428 belonging to user rda executed on node blizzard, then the log file is /u/rda/tmp/9428.blizzard.log.
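The naming rule can be sketched as a small helper that reproduces the example path /u/rda/tmp/9428.blizzard.log. The function sbl_log_path is hypothetical and is not part of the SBL API; it only shows how the directory, process id and hostname combine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "<logdir>/<pid>.<hostname>.log" into buf.
 * Returns 0 on success, -1 if buf is too small or formatting fails.
 * Hypothetical helper for illustration; not an SBL function. */
int sbl_log_path(char *buf, size_t bufsize,
                 const char *logdir, long pid, const char *hostname)
{
    assert(buf != NULL && logdir != NULL && hostname != NULL);
    int n = snprintf(buf, bufsize, "%s/%ld.%s.log", logdir, pid, hostname);
    return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}
```

In a real program the pid would come from getpid() and the hostname from gethostname().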

A log file is a text file in which each line represents one event. Each event has a name (a text string) followed by event-specific text. Events can be logically grouped into receive events, send events, and miscellaneous events (spawn, task exit, export, and user-defined events). Both send and receive events print, after the event name, the id of the sender, followed by the id of the receiver associated with this particular event, followed by the sender's sequence number. This number is incremented each time a process sends a message (system or user-level). The triple (sender-id, receiver-id, seqno) uniquely identifies a message across all log files of a given SBL program.

Note that for a receive event, the receiver is always the process to which the log belongs. For send events, the owner of the log is the sender process.
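Because the triple (sender-id, receiver-id, seqno) identifies a message across log files, a send event in one log can be paired with the receive event in another by comparing triples. The sketch below assumes a whitespace-separated line layout of event name, sender id, receiver id and sequence number; that layout, the event names SEND and RECV used in the usage note, and the helper names are assumptions, since the exact log format is not given here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The triple that identifies one message across all logs. */
struct msg_key {
    char sender[64];
    char receiver[64];
    unsigned long seqno;
};

/* Parses "<event-name> <sender-id> <receiver-id> <seqno>" (assumed layout).
 * Returns 0 on success, -1 if the line does not carry a full triple. */
int parse_msg_key(const char *line, struct msg_key *key)
{
    assert(line != NULL && key != NULL);
    int fields = sscanf(line, "%*s %63s %63s %lu",
                        key->sender, key->receiver, &key->seqno);
    return fields == 3 ? 0 : -1;
}

/* Returns nonzero when both events refer to the same message. */
int same_message(const struct msg_key *a, const struct msg_key *b)
{
    return a->seqno == b->seqno
        && strcmp(a->sender, b->sender) == 0
        && strcmp(a->receiver, b->receiver) == 0;
}
```

For instance, a line "SEND 9428:blizzard 9511:tornado 7" from the sender's log and a line "RECV 9428:blizzard 9511:tornado 7" from the receiver's log would parse to equal keys.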

How an SBL process is identified inside log files depends on how many nodes are used by the PVM machine on which a given SBL program executes. If only one node is used, an id is just a UNIX process id; if more than one node is used, an id consists of a UNIX process id followed by a colon followed by a hostname.
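The two id forms can be illustrated with a short formatting routine. The function format_proc_id and its nnodes parameter are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats a process id as it would appear in a log:
 * "<pid>" on a one-node machine, "<pid>:<hostname>" otherwise.
 * Returns 0 on success, -1 if buf is too small.
 * Hypothetical helper for illustration; not an SBL function. */
int format_proc_id(char *buf, size_t bufsize,
                   long pid, const char *hostname, int nnodes)
{
    assert(buf != NULL && hostname != NULL && nnodes >= 1);
    int n = (nnodes > 1)
        ? snprintf(buf, bufsize, "%ld:%s", pid, hostname)
        : snprintf(buf, bufsize, "%ld", pid);
    return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}
```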

User-defined events can be added to a log by calling SBL_AddLogEvent(char *evdesc). This adds the string evdesc to the calling process's log as the description of a user event.

The following events are defined:

See the twonodes sample program for an illustration of how to use logging and how to add user-defined events.

Sample User Programs

All sample programs work regardless of whether SBL was compiled with CLIENTSERVER defined.

twonodes Program

twonodes.c is a simple program illustrating how to use a number of SBL calls.

Two processes are started: a master and a slave. Both are started from the same executable (built from twonodes.c). The master is the main process and takes no arguments. The slave is started automatically by the master with one argument, the string "slave" (note that you should not start the slave directly).

The master process first uses SBL_Hosts to get the machine configuration and prints it on standard output. Next, the slave is started. The master then exports one receive buffer (rbuf) of size one SHRIMP word, with buffer id rbufId set to 2. After this export, the master spins, waiting for a write from the slave.

The slave imports the receive buffer created by the master and writes to it once. Note that the slave uses SBL_Parent to get the node and process id of its master, which are needed for the import call.

unimport Program

unimport.c is a variant of twonodes in which the master spins at the end, waiting for the slave to unimport the receive buffer. If CLIENTSERVER was defined when SBL was compiled, the slave does not need to unimport the receive buffer explicitly.

notify Program

notify.c is a variant of twonodes in which the slave sends a message with notification to the master.

The master spins at the end, waiting for a flag to become 1. This flag is set when the notification arrives and the message handler associated with the exported receive buffer is called. Note that the message itself does not set the flag (unlike in the original twonodes), as the data of the message is a single integer equal to zero.

ring Program

The file ring.c contains the source of the ring program, which builds a ring of processes and passes one token (an integer) along this ring.

The first process, the master, is started with one argument: the total number of processes to start. This number should be greater than 1.

The master and the slaves execute from the same file. To distinguish between the master and the slaves, the first argument to a slave is the string "slave" and the second argument is the number of processes. You should not attempt to start a slave process yourself.
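The argument convention just described can be sketched as follows. This is not the code from ring.c; the names parse_ring_args and ring_role are hypothetical, and rejecting a process count below 2 follows the requirement that the number be greater than 1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum ring_role { RING_MASTER, RING_SLAVE, RING_BAD_ARGS };

/* Decides the role of a process from its command line:
 *   master: prog <nprocs>
 *   slave:  prog slave <nprocs>
 * On success stores the process count in *nprocs.
 * Hypothetical helper for illustration; not taken from ring.c. */
enum ring_role parse_ring_args(int argc, char **argv, long *nprocs)
{
    assert(argv != NULL && nprocs != NULL);
    const char *count;
    enum ring_role role;

    if (argc == 3 && strcmp(argv[1], "slave") == 0) {
        role = RING_SLAVE;
        count = argv[2];
    } else if (argc == 2) {
        role = RING_MASTER;
        count = argv[1];
    } else {
        return RING_BAD_ARGS;
    }

    char *end;
    *nprocs = strtol(count, &end, 10);
    /* Reject empty or non-numeric input and counts below 2. */
    if (end == count || *end != '\0' || *nprocs < 2)
        return RING_BAD_ARGS;
    return role;
}
```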

The master starts nprocs - 1 slaves and initializes them by sending one initialization message to each slave. This message goes to the initialization receive buffer (InitRbuf) on each slave (note that the master has to import the initialization buffers of all slaves). The initialization message contains the node and process ids of all processes started by ring. Additionally, it contains process arguments specific to a given slave (the order number of this slave on the ring).

To build the ring, each process exports its element of the ring (receive buffer RingElem) and imports a proxy for the ring element of the next process. Finally, each slave process receives an integer, increments it, and forwards it along the ring. The master process prints the incoming token on standard output (the value of the token should be equal to nprocs).
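The token arithmetic can be checked with a single-process simulation of the ring. The sketch assumes the master has order number 0 and injects a token with initial value 1; both are assumptions chosen so that nprocs - 1 increments yield nprocs, and the function names are hypothetical.

```c
#include <assert.h>

/* Order number of the next process on a ring of nprocs processes. */
int ring_next(int rank, int nprocs)
{
    assert(nprocs > 1 && rank >= 0 && rank < nprocs);
    return (rank + 1) % nprocs;
}

/* Simulates one trip of the token around the ring.
 * Rank 0 plays the master and injects a token of value 1 (assumption);
 * every slave increments the token once and forwards it.
 * Returns the value that arrives back at the master. */
int ring_pass_token(int nprocs)
{
    int token = 1;
    for (int rank = ring_next(0, nprocs); rank != 0;
         rank = ring_next(rank, nprocs))
        token++;   /* one increment per slave */
    return token;
}
```

With four processes, three slaves increment the token, so the master sees the value 4.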

Notes on Simulator Implementation

The simulator uses asynchronous I/O and the SBLSIG signal to interrupt processes when there is a pending message to be received. SBLSIG is defined in sbl.c to be the UNIX SIGIO signal.

Signals are needed because of a semantic gap between the programming model supported by PVM and message passing in SHRIMP: PVM has an explicit receive operation, whereas in SHRIMP messages are received by hardware and no software intervention is necessary. Since the signal handler is provided by SBL, the use of signals is transparent to user programs, with the following exceptions: user programs should not use the SBLSIG signal, should block signals with care, and should not call PVM directly. The last restriction is necessary because PVM routines are not reentrant, so if a signal handler is invoked while one of them is executing, unpredictable things may happen.

To avoid such problems inside SBL, the SBLSIG signal is blocked each time an SBL function is called and unblocked after the call returns. Care must be taken to avoid deadlock with this scheme, as blocking the signal turns off blind (i.e. signal-generated) message receiving. If an SBL function must block (spin-wait), SBLSIG is unblocked to enable receiving.

Non-reentrant functions from other libraries (like stdio) cannot be called from within a signal handler. For this reason SBL supplies its own malloc-related functions, which replace the system-specific malloc. This malloc is SBLSIG signal-safe and can be used from inside a signal handler.

How CLIENTSERVER Definition Affects SBL

Compiling SBL with CLIENTSERVER defined comes at the cost of spurious error messages printed by PVM daemons. Since the SBL library does not know the order in which user processes will die, the SBL library in each process requests notifications about the deaths of other processes from the PVM daemons. This means that a PVM daemon sometimes attempts to send a notification to a process which is already dead. In such cases, PVM prints an error message (which should not be printed in my opinion):

    [t80040000] sendmessage() what? to t0
    [t80040000] waitc_get() cod dm_notifyack from t80040000 wid 8 not found

The following cases illustrate how SBL behavior depends on whether CLIENTSERVER was defined at the time of SBL compilation:
Case Description: Exporter Dead
Case Description: Importer Dead
Case Description: Importer Dead while Unexport in Progress

Copyright (c) 1995, Princeton University