Transit Note #19
MBTA: Network Level Transactions
Andre DeHon
Original Issue: June 1990
Last Updated: Tue Nov 9 12:54:46 EST 1993
Goal
Stepping a few levels up in abstraction from the raw network transport
layer (RN1), we must consider the lowest level of semantic operations
which occur across the network. The exact nature of the optimal set of
primitives is probably somewhat dependent on the model of computation used.
In any case, at this point in time we do not presume to know what the
optimal network transactions are. Determining the set of useful low level
network transactions will be one of the aspects of our study.
In constructing the experimental multiprocessor MBTA (tn17), we
must support some low-level transactions to gain both functionality and
reasonably efficient network usage. The goal, then, is to
efficiently support a small set of very primitive operations which seem of
universal value. These basic primitives should allow us to emulate others
which may be of interest for the sake of experimentation. The supported
primitives should be simple to implement in minimal hardware; this
simplicity is paramount to keeping the hardware complexity of MBTA to a
minimum.
Essential Primitives
The following primitives are most probably essential. I believe
they make up a minimally sufficient set.
- read
- write
- noop
- reset
- rop
The remainder of this section discusses each of these operations.
Read
The read operation simply reads data from the
specified memory locations on the destination node. This is a primitive
operation and performs a raw read; that is, it has no interaction with the
node other than to obtain data. Read should be able to proceed
without intervention from the processor at the node. In addition to
efficiency, the read operation's independence from the processor allows reads
to occur while the processor is shut down. This is a desirable feature for
debugging and for processor booting (see Section ).
Write
Like read, the write operation simply
performs a primitive write to the specified memory location on the
destination node. Again, the operation is a raw operation and does not
interact with the node. For the same reasons of booting and debugging,
writes should be independent of the processor.
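As a rough illustration of the raw read and write semantics described
above, the following Python sketch models a node as a flat array of words.
The Node class, the net_read and net_write names, and the 16-word memory
size are all hypothetical and are not taken from the MBTA design; the only
point is that neither operation consults the processor state.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical model of one node: word-addressed memory plus a processor flag."""
    memory: list = field(default_factory=lambda: [0] * 16)
    processor_running: bool = False  # raw read/write never look at this flag

def _check(node, addr, count):
    # Reject accesses that fall outside the node's memory.
    if addr < 0 or addr + count > len(node.memory):
        raise IndexError("access outside node memory")

def net_read(node, addr, count=1):
    """Raw read: return `count` words starting at `addr`; no other effect on the node."""
    _check(node, addr, count)
    return node.memory[addr:addr + count]

def net_write(node, addr, words):
    """Raw write: store `words` starting at `addr`; no other effect on the node."""
    _check(node, addr, len(words))
    node.memory[addr:addr + len(words)] = words

node = Node()                       # processor is shut down
net_write(node, 4, [0xDE, 0xAD])    # proceeds without the processor
assert net_read(node, 4, 2) == [0xDE, 0xAD]
```

Because the processor flag is never tested, the same two calls serve for
loading boot code into a halted node and for inspecting its memory afterwards.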
Noop
The noop operation does nothing. A noop
operation is interesting for testing the network. In sending the
noop, the source node will get checksums from the network components
between the source and destination. The checksum data will be useful for
testing and diagnosing the state of the network.
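One way the returned checksums might be used for diagnosis is sketched
below in Python. Everything here is an assumption made for illustration:
the checksum function is a placeholder (sum modulo 256), each network
component is modeled as a function that forwards a list of words, and the
names send_noop and first_bad_stage are hypothetical.

```python
def checksum(words):
    """Placeholder checksum (sum mod 256); the real network checksum is not specified here."""
    return sum(words) & 0xFF

def send_noop(path, words):
    """Send a noop along `path`; collect the checksum of what each stage forwarded."""
    reports = []
    for stage in path:
        words = stage(words)            # a healthy stage forwards the words unchanged
        reports.append(checksum(words))
    return reports

def healthy(words):
    return list(words)

def stuck_bit(words):
    # Models a faulty component that forces the low bit of every word to 1.
    return [w | 0x01 for w in words]

def first_bad_stage(reports, expected):
    """Return the index of the first stage whose checksum disagrees, or None."""
    for i, report in enumerate(reports):
        if report != expected:
            return i
    return None

probe = [0x10, 0x22]
reports = send_noop([healthy, stuck_bit, healthy], probe)
assert first_bad_stage(reports, checksum(probe)) == 1
```

The source node compares each reported checksum against the one it computed
locally; the first mismatch localizes the fault to a particular stage of the path.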
Reset
Making reset a network initiated
operation allows independent processor booting under software control.
This operation is essential to the MBTA booting scheme described in
Section .
Rop
The rop operation is the catch-all
operation to allow emulation of any other network operation. When the
network controller receives an rop operation, it passes it on to the
network processor and allows it to handle the data associated with the
network message. The remote operation handler and message content are
freely specifiable at the software level giving considerable freedom and
flexibility.
For example, all of the following should be easily supported with the
rop operation.
- message passing
- process spawning and migration
- memory allocation
- garbage collection
- fetch-and-op operations
- coherent protocol operations
If each MBTA node has a dedicated processor for servicing rop requests
over the network, it will sit in a tight server loop answering network
requests. This will save the overhead associated with an interrupt
handler.
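The server loop described above might look roughly like the following
Python sketch. The handler table, the rop_handler decorator, the message
layout (a handler identifier plus a payload), and the two example handlers
are all hypothetical; they only illustrate how software-defined handlers
could emulate operations such as fetch-and-op or message passing. A real
server loop would poll the network interface indefinitely, whereas this
sketch stops when its queue is empty so that it terminates.

```python
from collections import deque

handlers = {}   # hypothetical table: handler id -> software-defined function

def rop_handler(op_id):
    """Register a function as the handler for rop messages tagged `op_id`."""
    def register(fn):
        handlers[op_id] = fn
        return fn
    return register

memory = {"counter": 0, "inbox": []}

@rop_handler("fetch_and_add")
def fetch_and_add(payload):
    # Emulates a fetch-and-op primitive entirely in software.
    old = memory[payload["key"]]
    memory[payload["key"]] = old + payload["delta"]
    return old

@rop_handler("message")
def deliver(payload):
    # Emulates simple message passing.
    memory["inbox"].append(payload["text"])
    return None

def serve(incoming):
    """Tight polling loop: dispatch each queued rop to its handler, no interrupts."""
    replies = []
    while incoming:
        op_id, payload = incoming.popleft()
        replies.append(handlers[op_id](payload))
    return replies

queue = deque([
    ("fetch_and_add", {"key": "counter", "delta": 5}),
    ("message", {"text": "hello"}),
    ("fetch_and_add", {"key": "counter", "delta": 2}),
])
replies = serve(queue)
assert replies == [0, None, 5]
```

Adding a new network operation then amounts to registering another handler,
with no change to the network controller.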
Other Candidate Primitives
The following is a list of additional primitives I originally considered
for inclusion. However, the more I consider the original goals, the more I
consider their inclusion of only marginal benefit. The basic primitives
described in the previous section should be sufficient to move all the
interesting issues up to the software level where maximum flexibility for
experimentation is afforded.
- start
- stop
- write request
- process spawn
- allocate memory
- write permission grant
Look Ma, No EPROM! (Booting MBTA)
To meet the goals of simplicity and minimum component count stated in
(tn18), we would like to be able to construct MBTA nodes
without non-volatile EPROM memory. Additionally, if MBTA can be built
without non-volatile memory, reconfiguration will be much easier. With a
reasonable host interface and the primitives described above
(Section ) we can achieve this goal.
Basic Boot Sequence
The host interface has complete control
over the memory in one or more designated boot nodes (tn20). With
this control, the host interface can fill a node's memory with boot code.
The host interface can control a reset sequence for the network which
should also reset the network interfaces for each node in the network.
Once the boot code is installed in the boot node's memory in this manner,
the host interface can reset the processor on the boot node. The boot node
processor can then perform its normal boot sequence from its static memory
which has been loaded by the host interface. The single booted node can
then use network level write operations to place boot code in the
memories of all other nodes. Once a node is initialized with data, a
reset network operation can be used to start the processor computing. In
this manner, the entire network can be booted from data downloaded over a
host interface which connects to a single node in the network.
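The ordering of events in this boot sequence can be sketched as a small
Python simulation. The Node class, the boot_machine function, and the log
entries are hypothetical names chosen for illustration; node memory is
modeled as a plain list and the network write and reset operations as
direct method calls.

```python
class Node:
    """Hypothetical node model: memory contents plus a processor-running flag."""
    def __init__(self):
        self.memory = []
        self.running = False

    def write(self, words):
        # Stands in for either host-interface access or a network write.
        self.memory = list(words)

    def reset(self):
        # After reset the processor starts executing from its (loaded) memory.
        self.running = True

def boot_machine(nodes, boot_index, boot_code):
    """Boot every node from code downloaded to a single boot node; return an event log."""
    log = [("host-network-reset", None)]       # host resets network interfaces
    boot = nodes[boot_index]
    boot.write(boot_code)                      # host fills the boot node's memory
    log.append(("host-load", boot_index))
    boot.reset()                               # host resets the boot node's processor
    log.append(("host-reset", boot_index))
    for i, node in enumerate(nodes):
        if i == boot_index:
            continue
        node.write(boot_code)                  # network write from the boot node
        log.append(("net-write", i))
        node.reset()                           # network reset starts the processor
        log.append(("net-reset", i))
    return log

nodes = [Node() for _ in range(4)]
log = boot_machine(nodes, 0, [0x01, 0x02, 0x03])
assert all(n.running for n in nodes)
```

Note that every node other than the boot node is written before it is
reset, so no processor starts running from uninitialized memory.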
Variations
Some network testing can be performed prior to booting the non-boot nodes
using the noop network operation.
In large networks, the task of installing boot code on nodes and initiating
them can fan out in a tree-like manner allowing parallel bootstrapping.
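To give a feel for the benefit of the tree-like fan-out, the Python sketch
below builds a boot schedule under two simplifying assumptions that are not
part of the original design: every already-booted node boots exactly one
new node per round, and all rounds take equal time. The function name
parallel_boot_schedule is hypothetical.

```python
def parallel_boot_schedule(num_nodes, boot_index=0):
    """Return a list of rounds; each round is a list of (booter, target) pairs."""
    booted = [boot_index]
    pending = [n for n in range(num_nodes) if n != boot_index]
    rounds = []
    while pending:
        pairs = []
        for booter in booted:
            if not pending:
                break
            pairs.append((booter, pending.pop(0)))  # booter writes code, then resets target
        booted.extend(target for _, target in pairs)  # new nodes help in the next round
        rounds.append(pairs)
    return rounds

rounds = parallel_boot_schedule(8)
assert len(rounds) == 3      # versus 7 sequential steps from a single boot node
```

Under these assumptions the number of booted nodes doubles each round, so
the number of rounds grows logarithmically with the size of the machine
rather than linearly.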
Post-mortem analysis can be performed by shutting down all processors, then
using the boot node processor to retrieve data items from memory anywhere
in the machine using the network read operation.
Acknowledgments
While the network interrupt operation (rop) is a fairly obvious
catch-all mechanism, my thoughts were influenced by John Kubiatowicz's
interprocessor interrupt mechanism for Alewife [Kub90].
Feedback from Henry Minsky on the use and implementation of these ideas
was encouraging.
See Also...
References
- [DeH90a] Andre DeHon. MBTA: Message formats. Transit Note 21, MIT
  Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA
  02139, June 1990.
- [DeH90b] Andre DeHon. MBTA: Modular bootstrapping transit architecture.
  Transit Note 17, MIT Artificial Intelligence Laboratory, 545 Technology
  Square, Cambridge MA 02139, April 1990.
- [DeH90c] Andre DeHon. MBTA: Network interface (input). Transit Note 24,
  MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge
  MA 02139, July 1990. Obsolete; see Transit Note #31.
- [DeH90d] Andre DeHon. MBTA: Network interface (output). Transit Note 23,
  MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge
  MA 02139, July 1990. Obsolete; see Transit Note #31.
- [DeH90e] Andre DeHon. MBTA: Thoughts on construction. Transit Note 18,
  MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge
  MA 02139, June 1990.
- [DeH90f] Andre DeHon. Memory transactions. Transit Note 13, MIT
  Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA
  02139, May 1990.
- [DeH90g] Andre DeHon. T-station: The MBTA host interface. Transit Note
  20, MIT Artificial Intelligence Laboratory, 545 Technology Square,
  Cambridge MA 02139, June 1990.
- [DS90a] Andre DeHon and Tom Simon. MBTA: Node architecture. Transit Note
  25, MIT Artificial Intelligence Laboratory, 545 Technology Square,
  Cambridge MA 02139, July 1990.
- [DS90b] Andre DeHon and Tom Simon. MBTA: Node architecture selection.
  Transit Note 22, MIT Artificial Intelligence Laboratory, 545 Technology
  Square, Cambridge MA 02139, June 1990.
- [Kub90] John Kubiatowicz. Special mechanisms for multi-model support.
  Alewife Systems Memo 4, MIT Artificial Intelligence Laboratory, 545
  Technology Square, Cambridge MA 02139, March 1990.
- [Min90] Henry Q. Minsky. RN1 data router. Transit Note 26, MIT
  Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA
  02139, July 1990.