Programs using bproc should include the bproc header file
sys/bproc.h and be linked with -lbproc. This package
builds both static and dynamic versions of libbproc.
bproc_init initializes the bproc library. It reads the
current machine state from /var/run/bproc. (This machine state is
only available on the master node.) It also reads an initial node
mapping from $HOME/.bprocnodes if it exists.
Returns the number of nodes in the system. This is the number of slave nodes (not including the front end). The nodes are numbered 0 through n-1.
Returns true if node is up.
Saves the IP address of node in the structure pointed to by s.
Note that bproc_init has to be called on the master node in order for
this information to be available.
The library allows the user to create a node mapping that sits on
top of the real node numbers. This allows the user to always see the
nodes in use as nodes 0 through n-1 regardless of the physical
nodes behind them. Mappings are represented as an array of integers.
Element zero holds the real node number that node zero maps onto,
element one holds the real node number for node one, and so on.
bproc_init() reads an initial node mapping.
This sets the node mapping being used by libbproc. map is a pointer
to an array which lists the real node numbers that node numbers 0
through numnodes-1 should map onto.
This clears any node mapping that might be present. After this call, all node numbers will be treated as physical node numbers.
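The translation that a node mapping performs can be sketched in a few lines of C. This is only an illustration of the idea described above, not libbproc's implementation: the names sketch_set_node_map, sketch_clear_node_map and sketch_physical_node are hypothetical, and the sketch keeps a pointer to the caller's array rather than copying it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's internal mapping state. */
static const int *node_map = NULL;   /* NULL means no mapping is active */
static int node_map_len = 0;

/* Install a mapping: logical node i maps onto physical node map[i].
 * The array is not copied, so it must outlive the mapping. */
void sketch_set_node_map(const int *map, int numnodes)
{
    assert(map != NULL && numnodes >= 0);
    node_map = map;
    node_map_len = numnodes;
}

/* Remove the mapping so node numbers are treated as physical numbers. */
void sketch_clear_node_map(void)
{
    node_map = NULL;
    node_map_len = 0;
}

/* Translate a logical node number to a physical one.
 * Returns -1 if the number is negative or outside the active mapping. */
int sketch_physical_node(int logical)
{
    if (logical < 0)
        return -1;
    if (node_map == NULL)
        return logical;              /* no mapping: identity */
    if (logical >= node_map_len)
        return -1;
    return node_map[logical];
}
```

With the array {7, 2, 5} installed, logical node 0 resolves to physical node 7 and logical node 3 is out of range. After the mapping is cleared, node 3 resolves to itself.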
Bproc provides a number of mechanisms for creating processes on remote nodes. It is probably better to think of these mechanisms as moving processes from the front end to the remote node. The rexec mechanism is like doing a move then exec with lower overhead. The rfork mechanism is implemented as an ordinary fork on the front end and then a move to the remote node before the system call returns. Execmove does an exec and then move before the exec returns to the new process.
Movement to another machine on the system is voluntary and is not transparent. Once a process has been moved, all its open files are lost except for STDOUT and STDERR. These two are replaced with a single socket. (Their output is combined.) There is an IO daemon that will forward between the other end of that connection and whatever the original STDOUT was connected to. No pseudo tty operations are done.
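The effect of replacing STDOUT and STDERR with a single socket can be imitated locally with dup2. The sketch below is an assumption-laden illustration, not bproc's code: it uses a local UNIX socketpair in place of the real connection to the IO daemon, and the names sketch_make_channel and sketch_combine_output are hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a connected pair of sockets. In this sketch sv[0] plays the
 * forwarding end and sv[1] the end handed to the moved process.
 * Returns 0 on success, -1 on failure. */
int sketch_make_channel(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
}

/* Point two descriptors (normally STDOUT_FILENO and STDERR_FILENO) at
 * the same socket, so that writes to either one arrive on one stream.
 * Returns 0 on success, -1 on failure. */
int sketch_combine_output(int out_fd, int err_fd, int sock)
{
    assert(out_fd != err_fd);
    if (dup2(sock, out_fd) == -1)
        return -1;
    if (dup2(sock, err_fd) == -1)
        return -1;
    return 0;
}
```

Once both descriptors refer to the same socket, the reader at the other end can no longer tell which bytes came from which stream, which is why the output is described as combined.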
The move is completely visible to the process after it has moved,
except for process ID space operations. Process ID space operations
include fork(), wait(), kill(), etc. All file operations
will operate on files local to the node that the process has been
moved to. Memory that was shared on the front end will no longer be
shared.
Processes currently cannot move twice. The process movement API is only provided on the master node.
Bug: Any child processes that a process had before moving will no longer be visible to it after moving. SIGCHLDs will be delivered when they exit, but it will be impossible to pick up their exit status with wait().
This call is like execve in that it replaces the current process with
a new one. The new process is created on node and the local process
becomes the ghost representing it. All arguments are interpreted on
the remote machine. The binary and all libraries it needs must be
present on the remote machine. This function returns -1 on failure
and does not return on success.
This call will move the current process to the remote node number given by node. The flags argument determines the details of the memory space move. See the VMADump section for details on the flags argument. Returns 0 on success, -1 on failure.
The semantics of this function are designed to mimic fork() except that the child process created will end up on the node given by the node argument. What happens behind the scenes is that the process forks a child and that child performs a bproc_move() to move itself to the remote node.
By combining these two operations in a system call, we can prevent zombies and SIGCHLD's in the case that the fork is successful but the move is not.
On success, this function returns the process ID of the new child process; on failure it returns -1.
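To see why combining the fork and the move matters, consider a user-space approximation. The sketch below is hypothetical throughout: sketch_rfork, the move_fn callback and the two stub moves are illustration names, and the callback merely stands in for the real move step. The parent waits on a pipe for the child's report and reaps the child if the move failed, so the caller sees either a child pid or -1.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the move step: 0 on success, -1 on failure. */
typedef int (*move_fn)(int node);

/* Stub moves for experimenting with both outcomes. */
int sketch_move_always_succeeds(int node) { (void)node; return 0; }
int sketch_move_always_fails(int node)    { (void)node; return -1; }

/* Fork, then have the child attempt the move.
 * Parent: returns the child's pid, or -1 if the fork or the move failed.
 * Child:  returns 0 after a successful move, like fork(). */
pid_t sketch_rfork(int node, move_fn move)
{
    int report[2];
    char ok = 0;
    ssize_t n;
    pid_t pid;

    assert(move != NULL);
    if (pipe(report) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(report[0]);
        close(report[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: try to move, then tell the parent how it went. */
        close(report[0]);
        ok = (char)(move(node) == 0);
        if (write(report[1], &ok, 1) != 1)
            _exit(1);
        close(report[1]);
        if (!ok)
            _exit(1);
        return 0;
    }

    /* Parent: block until the child reports (or dies without reporting). */
    close(report[1]);
    do {
        n = read(report[0], &ok, 1);
    } while (n == -1 && errno == EINTR);
    close(report[0]);

    if (n != 1 || !ok) {
        /* Reap the failed child so the caller never sees a zombie. */
        waitpid(pid, NULL, 0);
        return -1;
    }
    return pid;
}
```

Even this version cannot stop SIGCHLD from being delivered when the failed child exits; it only hides the zombie. Doing both steps inside one system call avoids both problems.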
This function allows migration of ordinary binaries by allowing you to exec a new process and move the new process before it "wakes up".
Returns -1 on failure; does not return on success.
VMADump (short for Virtual Memory Area Dumper) is a kernel module distributed with bproc that dumps a process's state to a file descriptor or restores it from one. It will read or write to pipes, sockets, etc. as well as ordinary files. These functions are used internally by bproc to move processes around. The saved state includes:
The following interface is provided for vmadump in libbproc:
This takes the current process and dumps it to the file descriptor
fd. It returns the number of bytes written to fd. When the process
is undumped, this function will return 0. The flags argument
determines which memory regions will have their data dumped and
which ones will be stored as file references.
Writable memory regions are never stored as file references.
If given, read only mmaps from files in /lib and /usr/lib will not be stored as file references.
If given, read only mmaps from the executable file will not be stored as file references.
If given, other read only mmaps not falling into the categories above will not be stored as file references.
If given, no read only mmaps will be stored as file references. This is the safest option if in doubt. This is the logical OR of the other flags.
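The rules above amount to a small decision function. The following sketch shows one way to express them; it is not VMADump's code. The flag names and values (SKETCH_DUMP_LIBS and so on) are hypothetical placeholders, since the real constants come from the bproc header, and the order in which the categories are tested is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag names and values, for illustration only. */
enum {
    SKETCH_DUMP_LIBS  = 1 << 0,  /* dump read-only mmaps from /lib, /usr/lib */
    SKETCH_DUMP_EXEC  = 1 << 1,  /* dump read-only mmaps from the executable */
    SKETCH_DUMP_OTHER = 1 << 2,  /* dump all other read-only mmaps */
    SKETCH_DUMP_ALL   = SKETCH_DUMP_LIBS | SKETCH_DUMP_EXEC | SKETCH_DUMP_OTHER
};

/* True if path lies inside directory dir (dir has no trailing slash). */
static bool under_dir(const char *path, const char *dir)
{
    size_t n = strlen(dir);
    return strncmp(path, dir, n) == 0 && path[n] == '/';
}

/* Decide how a file-backed region is saved: true means "store a file
 * reference", false means "dump the region's data". */
bool sketch_store_as_file_ref(const char *path, const char *exe_path,
                              bool writable, int flags)
{
    assert(path != NULL && exe_path != NULL);
    if (writable)
        return false;                        /* writable data is always dumped */
    if (strcmp(path, exe_path) == 0)
        return !(flags & SKETCH_DUMP_EXEC);
    if (under_dir(path, "/lib") || under_dir(path, "/usr/lib"))
        return !(flags & SKETCH_DUMP_LIBS);
    return !(flags & SKETCH_DUMP_OTHER);
}
```

With no flags set, every read-only mapping is stored as a file reference, which is compact but requires the same files to exist on the destination node. With all three flags combined, nothing is stored as a reference.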
This attempts to undump an image from fd. This function is not very error tolerant. If something goes wrong halfway through undumping, it will return with a half-undumped process. If successful, the current process is replaced with the image from the dump, much like exec.