Transit Note #19

MBTA: Network Level Transactions

Andre DeHon

Original Issue: June 1990

Last Updated: Tue Nov 9 12:54:46 EST 1993

Goal

Stepping a few levels up in abstraction from the raw network transport layer (RN1), we must consider the lowest level of semantic operations that occur across the network. The exact nature of the optimal set of primitives probably depends somewhat on the model of computation used. In any case, we do not presume at this point to know what the optimal network transactions are. Determining the set of useful low-level network transactions will be one aspect of our study.

In constructing the experimental multiprocessor MBTA (tn17), we must support some low-level transactions to gain both functionality and moderately efficient network usage. The goal, then, is to efficiently support a small set of very primitive operations which seem of universal value. These basic primitives should allow us to emulate others which may be of interest for the sake of experimentation. The supported primitives should be simple to implement in minimal hardware; this simplicity is paramount to keeping the hardware complexity of MBTA to a minimum.

Essential Primitives

The following primitives are most probably essential: read, write, noop, reset, and rop. I believe they make up a minimally sufficient set.

The remainder of this section discusses each of these operations.

Read

The read operation simply reads data from the specified memory locations on the destination node. This is a primitive operation and performs a raw read; that is, it has no interaction with the node other than to obtain data. Read should be able to proceed without intervention from the processor at the node. Besides being efficient, this independence from the processor allows reads to occur while the processor is shut down, which is desirable for debugging and for processor booting (see the section on booting MBTA below).

Write

Like read, the write operation simply performs a primitive write to the specified memory location on the destination node. Again, the operation is raw and does not interact with the node. For the same reasons of booting and debugging, writes should be independent of the processor.
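As an illustration of raw reads and writes that bypass the processor, the following Python sketch models a node whose memory is reachable from the network whether or not its processor is running. The Node class, its field names, and the word-addressed memory are assumptions made for this example; they are not part of the MBTA design.

```python
class Node:
    """Hypothetical model of one node: memory plus a processor run flag."""

    def __init__(self, mem_words=16):
        self.memory = [0] * mem_words
        self.processor_running = False  # the processor may be shut down

    # Both operations touch only memory; neither consults the processor.
    def net_read(self, addr, count=1):
        """Raw read: return `count` words starting at `addr`."""
        if addr < 0 or addr + count > len(self.memory):
            raise IndexError("read outside node memory")
        return self.memory[addr:addr + count]

    def net_write(self, addr, words):
        """Raw write: store `words` starting at `addr`."""
        words = list(words)
        if addr < 0 or addr + len(words) > len(self.memory):
            raise IndexError("write outside node memory")
        self.memory[addr:addr + len(words)] = words


node = Node()
node.net_write(4, [0xDE, 0xAD])   # succeeds with the processor halted
print(node.net_read(4, 2))        # [222, 173]
```

Because neither method checks processor_running, the same calls serve debugging of a halted node and loading of a node that has not yet booted.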

Noop

The noop operation does nothing. A noop operation is interesting for testing the network. In sending the noop, the source node will get checksums from the network components between the source and destination. The checksum data will be useful for testing and diagnosing the state of the network.
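One way the returned checksums might be used for diagnosis is sketched below in Python. Everything specific here is an assumption for illustration: the checksum function (a sum modulo 256), the idea that each component reports a checksum of the words arriving at its input, and the function names. The note itself does not define these details.

```python
def checksum(words):
    """Assumed checksum: sum of the words modulo 256."""
    return sum(words) % 256


def send_noop(path, payload):
    """Send `payload` through `path`; return one checksum per component.

    `path` is a list of functions, each modelling a network component that
    forwards the words it receives (a faulty one may alter them).
    """
    reported = []
    words = list(payload)
    for stage in path:
        reported.append(checksum(words))  # what this stage saw on its input
        words = stage(words)
    return reported


def first_mismatch(reported, payload):
    """Index of the first stage whose checksum differs from the source's."""
    expected = checksum(payload)
    for index, value in enumerate(reported):
        if value != expected:
            return index
    return None


def good(words):
    return list(words)


def stuck_bit(words):
    return [w | 0x01 for w in words]  # models a fault forcing the low bit high


payload = [0x10, 0x20, 0x30]
reported = send_noop([good, stuck_bit, good], payload)
print(reported, first_mismatch(reported, payload))
```

In this model a mismatch first seen at stage i means the data was corrupted somewhere between the input of stage i-1 and the input of stage i, which narrows the search to one component or link.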

Reset

Making reset a network-initiated operation allows independent processor booting under software control. This operation is essential to the MBTA booting scheme described in the section on booting MBTA below.

Rop

The rop operation is the catch-all operation that allows emulation of any other network operation. When the network controller receives an rop operation, it passes it on to the network processor and allows it to handle the data associated with the network message. The remote operation handler and message content are freely specifiable at the software level, giving considerable freedom and flexibility.

e.g. All of the following should be easily supported with the rop operation.

If each MBTA node has a dedicated processor for servicing rop requests over the network, that processor can sit in a tight server loop answering network requests. This saves the overhead associated with an interrupt handler.
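A minimal Python sketch of such a server loop follows. The handler table, the message layout of (handler name, arguments), and the fetch_and_add handler are all hypothetical choices made for this example; fetch_and_add is simply a convenient operation to emulate and is not one the note names. A real server would poll forever, whereas this loop drains a queue so that the example terminates.

```python
from collections import deque

handlers = {}  # hypothetical table: rop handler name -> Python function


def rop_handler(name):
    """Register a software-defined handler under `name`."""
    def register(func):
        handlers[name] = func
        return func
    return register


@rop_handler("fetch_and_add")
def fetch_and_add(memory, addr, delta):
    """Illustrative emulated operation: add `delta`, return the old value."""
    old = memory[addr]
    memory[addr] = old + delta
    return old


def serve(memory, inbox):
    """Drain `inbox` in a tight polling loop, with no interrupt handler."""
    replies = []
    while inbox:
        name, args = inbox.popleft()
        replies.append(handlers[name](memory, *args))
    return replies


memory = [0] * 8
inbox = deque([("fetch_and_add", (3, 5)), ("fetch_and_add", (3, 2))])
replies = serve(memory, inbox)
print(replies, memory[3])
```

Adding a new emulated operation only requires registering another function, which is the flexibility the rop primitive is meant to leave to software.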

Other Candidate Primitives

The following is a list of additional primitives I originally considered for inclusion. However, the more I consider the original goals, the more I consider their inclusion to be of only marginal benefit. The basic primitives described in the previous section should be sufficient to move all the interesting issues up to the software level, where maximum flexibility for experimentation is afforded.

Look Ma, No EPROM! (Booting MBTA)

To meet the goals of simplicity and minimum component count stated in (tn18), we would like to be able to construct MBTA nodes without non-volatile EPROM memory. Additionally, if MBTA can be built without non-volatile memory, reconfiguration will be much easier. With a reasonable host interface and the essential primitives described above, we can achieve this goal.

Basic Boot Sequence

The host interface has complete control over the memory in one or more designated boot nodes (tn20). With this control, the host interface can fill a node's memory with boot code. The host interface can control a reset sequence for the network which should also reset the network interfaces for each node in the network. Once the boot code is installed in the boot node's memory in this manner, the host interface can reset the processor on the boot node. The boot node processor can then perform its normal boot sequence from its static memory which has been loaded by the host interface. The single booted node can then use network level write operations to place boot code in the memories of all other nodes. Once a node is initialized with data, a reset network operation can be used to start the processor computing. In this manner, the entire network can be booted from data downloaded over a host interface which connects to a single node in the network.
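The order of operations in this boot sequence can be sketched in Python as follows. The Node class, the log strings, and the boot_machine function are assumptions for illustration only. The sketch also leaves out the network-wide reset of the network interfaces and models the host's direct memory access with the same write method used for network writes.

```python
class Node:
    """Hypothetical node: memory, a run flag, and two network primitives."""

    def __init__(self, name):
        self.name = name
        self.memory = []
        self.running = False

    def write(self, code):       # stands in for the write primitive
        self.memory = list(code)

    def reset(self):             # stands in for the reset primitive
        if not self.memory:
            raise RuntimeError(f"{self.name}: reset with no boot code")
        self.running = True


def boot_machine(boot_node, others, boot_code):
    """Follow the boot order described above; return a log of the steps."""
    log = []
    # 1. The host interface fills the boot node's memory, then resets it.
    boot_node.write(boot_code)
    log.append(f"host: write {boot_node.name}")
    boot_node.reset()
    log.append(f"host: reset {boot_node.name}")
    # 2. The booted node writes code to every other node, then resets it.
    for node in others:
        node.write(boot_node.memory)
        log.append(f"{boot_node.name}: write {node.name}")
        node.reset()
        log.append(f"{boot_node.name}: reset {node.name}")
    return log


nodes = [Node("n0"), Node("n1"), Node("n2")]
log = boot_machine(nodes[0], nodes[1:], [0x01, 0x02])
print("\n".join(log))
```

The reset method refuses to start a node with empty memory, which mirrors the requirement that boot code be installed before the processor is released.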

Variations

Some network testing can be performed prior to booting the non-boot nodes using the noop network operation.

In large networks, the task of installing boot code on nodes and initiating them can fan out in a tree-like manner allowing parallel bootstrapping.
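To make the benefit of the tree-like fan-out concrete, the Python sketch below builds a boot schedule under one simplifying assumption: in each round, every node that is already running installs code on and resets exactly one new node. The function name and the round-based model are hypothetical and ignore network contention.

```python
def boot_rounds(node_count):
    """Return a schedule: one list of (booter, target) pairs per round."""
    booted = [0]                       # node 0 is the host-loaded boot node
    pending = list(range(1, node_count))
    schedule = []
    while pending:
        this_round = []
        for booter in list(booted):    # snapshot: new nodes help next round
            if not pending:
                break
            target = pending.pop(0)
            this_round.append((booter, target))
            booted.append(target)
        schedule.append(this_round)
    return schedule


for number, pairs in enumerate(boot_rounds(8), start=1):
    print(number, pairs)
```

Under this model the number of running nodes doubles each round, so 8 nodes boot in 3 rounds, whereas a single boot node working alone would need 7 sequential install-and-reset steps.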

Post-mortem analysis can be performed by shutting down all processors, then using the boot node processor to retrieve data items from memory anywhere in the machine using the network read operation.

Acknowledgments

While the network interrupt operation (rop) is a fairly obvious catch-all mechanism, my thoughts were influenced by John Kubiatowicz's interprocessor interrupt mechanism for Alewife [Kub90].

Feedback from Henry Minsky on the use and implementation of these ideas was encouraging.

See Also...

References

DeH90a
Andre DeHon. Mbta: Message formats. Transit Note 21, MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA 02139, June 1990.

DeH90b
Andre DeHon. Mbta: Modular bootstrapping transit architecture. Transit Note 17, MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA 02139, April 1990.

DeH90c
Andre DeHon. Mbta: Network interface (input). Transit Note 24, MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA 02139, July 1990. Obsolete; See Transit Note #31.

DeH90d
Andre DeHon. Mbta: Network interface (output). Transit Note 23, MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA 02139, July 1990. Obsolete; See Transit Note #31.

DeH90e
Andre DeHon. Mbta: Thoughts on construction. Transit Note 18, MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA 02139, June 1990.

DeH90f
Andre DeHon. Memory transactions. Transit Note 13, MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA 02139, May 1990.

DeH90g
Andre DeHon. T-station: The mbta host interface. Transit Note 20, MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA 02139, June 1990.

DS90a
Andre DeHon and Tom Simon. Mbta: Node architecture. Transit Note 25, MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA 02139, July 1990.

DS90b
Andre DeHon and Tom Simon. Mbta: Node architecture selection. Transit Note 22, MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA 02139, June 1990.

Kub90
John Kubiatowicz. Special mechanisms for multi-model support. Alewife Systems Memo 4, MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA 02139, March 1990.

Min90
Henry Q. Minsky. Rn1 data router. Transit Note 26, MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge MA 02139, July 1990.

MIT Transit Project