Transit Note #75
METRO LINK
METRO Network Interface

Andre DeHon

Original Issue: September 1992

Last Updated: Fri Nov 5 13:33:34 EST 1993

Overview

METRO LINK (MLINK for short) provides an interface between a METRO-based network and the processor and memory on an MBTA node. The core unit for MLINK can be configured either as a network input (net-in), dealing with all network traffic destined for the node, or as a network output (net-out), dealing with traffic originating from the node. This note describes the function and behavior of MLINK.

Division of Labor

METRO LINK is designed to handle several primitive operations directly without need for intervention from the processor. For more complicated operations, it simply serves as an interface between the processor and network, acting under the direct control of the processor.

MLINK handles virtually all of the necessary low-level issues of communication. It is intended to handle the portions of the network interface which must be implemented in hardware and are well understood now. The network interface is especially intended to handle operations which need to be implemented efficiently in hardware in order to obtain a reasonable level of performance. To this end, the network interface handles:

converting messages between the double-word-wide MBTA node format and METRO's word-wide format
selecting a ``random'' network port for outgoing transactions
generating and verifying end-to-end message checksums
generating random control bytes (e.g. TURN, DROP)
node reset
raw reads and writes to node memory
noop and status network operations
typical message retransmission
dummy cycle generation

Component I/O

The network interface component is shown in Figure . (tn25) shows how network interfaces are integrated into an MBTA node. Table describes the network interface's data and control signals. Table summarizes the pin requirements for this component.

The network interface component needs to be synchronized to both the network and the node. To keep the frequency of the node in line with that of the network, the network interface provides NODE_CLK_OUT, which is used as the source for clocking the node. NODE_CLK_OUT runs at half the frequency of the network clock (NCLK). To avoid skew problems with the node, the signals which interface with the node are synchronized to the input clock, NODE_CLK, which is presumed to be the result of buffering NODE_CLK_OUT so it can be fanned out to all components on the node requiring clocks. See (tn37) for further details on the MBTA clocking strategy.

Network Interface Integration

We can probably win (at least with respect to area versus pin requirements) by integrating the four network interfaces into a single component. As shown in Table , 105 of the 120 pins on each network interface can be shared among all four network interfaces. Three pins become unnecessary when the network interfaces reside on the same die. Only 12 pins need to be unique for each network interface. A combined part would occupy four times the area and have twice the perimeter. This part would require only 105 + (4 × 12) = 153 signal pins. This number is also quite fortuitous. Since it is under 160, we can package this combined part in one Transit-DSPGA372 package (tn33).

Processor Interface

Memory Mapped Addressing

The processor communicates with each network interface as a memory mapped device. Table shows the relevant communication offset addresses within each network interface's designated memory region. Each network interface has its own memory region assigned by the node bus controller (tn30). Bits 7:4 of the address are used to distinguish which network interface is being addressed. W and R are used to indicate active processor address cycles intended for the network interfaces. All communication directly between the processor and a network interface takes place on the low data bus (D<31:0>); as such, all network interface addresses are at least double word aligned.

The processor will only address the network interface during the processor's designated memory cycle. The processor will never communicate with a network interface during a borrowed memory cycle.

Table lists the internal registers in the network interface. Many of these can be read or changed through addresses shown in Table . Each register is described elsewhere in this document where its function is relevant.

Note that Table marks the 3rd lowest nibble (bits 11:8) as . These bits specify which network part is being addressed by the memory operation. Table summarizes the meanings of the various values of .

Buffer Pointers

The network interface uses the node's SRAM for the source and destination of data sent over the network. The processor tells the network interface where to put incoming data for a remote handler invocation network operation or from a remote read by setting the out_buf_ptr. Similarly, the processor tells the network interface where to find outgoing data by setting the interface's in_buf_ptr. Once set, these pointers remain in effect until changed. As such, it is only necessary for the processor to respecify a pointer when a target memory location changes. These pointers should not be changed while the network interface is performing an operation which uses them to reference memory.

Remote Address

The remote_address register specifies the destination address on the remote node for raw write operations. This register is only used when the interface is configured as a network output and performing a network write operation or remote handler invocation.

Route Word

The actual values used for setting up a route through the network are specified by the route-word. MLINK sends the bottom one to four words of the route-word into the network at the head of the message as configured in the configuration register (Section ).

The routing specification will be highly dependent on the details of the routers being used, their configuration, and the topology of the network. By allowing the node to specify the routing word entirely, this information does not need to be hardcoded into the network interface. This allows a single METRO LINK design to service a wider range of METRO implementations and network topologies.

The node may want to use a lookup table to map from destination addresses to routing specifications. Another option is to generate the routing specification when the destination is determined and store it with the appropriate data-structures so it is readily available when needed.
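The lookup-table option can be sketched as below. This is only an illustration: the names route_entry, route_set and route_lookup are hypothetical, and the 32-bit width assumed for a route word is a guess rather than something this note specifies.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical routing-table entry; none of these names come from the
 * MLINK design. The 32-bit route word width is an assumption. */
typedef struct {
    uint32_t route_word[4]; /* MLINK sends between one and four route words */
    uint8_t  n_words;       /* how many to send (1..4); 0 means "no route known" */
} route_entry;

#define MAX_NODES 256       /* DST is one byte, so at most 256 nodes */

static route_entry route_table[MAX_NODES];

/* Record the routing specification for a destination node.
 * Returns 0 on success, -1 if the word count is out of range. */
int route_set(uint8_t dst, const uint32_t *words, uint8_t n_words)
{
    if (n_words < 1 || n_words > 4)
        return -1;
    for (uint8_t i = 0; i < n_words; i++)
        route_table[dst].route_word[i] = words[i];
    route_table[dst].n_words = n_words;
    return 0;
}

/* Look up the routing specification for a destination node.
 * Returns NULL if no route has been recorded for it. */
const route_entry *route_lookup(uint8_t dst)
{
    return route_table[dst].n_words ? &route_table[dst] : NULL;
}
```

A table indexed directly by destination costs memory for every possible node but makes the lookup a single indexed load, which suits a value that must be written before every operation.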

Note that the DST specification in the operation (Section ) is only used to verify that the message ended up at the right place -- and does not affect the route selected through the network.

Operations

The processor tells the network interface to perform an operation by writing to the OPERATION or OPERATION_STG address. All transactions are initiated or aborted by issuing an operation.

The general format of an operation is:

LEN OP FUNCTION DST

Each portion of the operation word is one byte in length. DST specifies the destination node for the specified operation. LEN specifies the length in double words of the operation. OP specifies the operation to be performed at the remote node when a primitive operation is being performed. FUNCTION specifies whether to initiate or abort an operation. Additionally, when the OPERATION_STG address is used to initiate an operation, the number of route words and number of registers portions of the configuration register are reloaded. This allows MLINK to function efficiently in the case where the paths through the network are not all the same length and may contain a varying number of routers.
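As a rough illustration of the four-byte layout, the helpers below pack and unpack an operation word. The byte order is an assumption: the "LEN OP FUNCTION DST" listing is read here from most significant byte to least significant byte, which this note does not state explicitly. The function names are hypothetical.

```c
#include <stdint.h>

/* Pack the four one-byte fields into a 32-bit operation word.
 * Assumed order (not confirmed by the note): LEN in bits 31:24,
 * OP in bits 23:16, FUNCTION in bits 15:8, DST in bits 7:0. */
uint32_t mlink_pack_op(uint8_t len, uint8_t op, uint8_t function, uint8_t dst)
{
    return ((uint32_t)len << 24) |
           ((uint32_t)op << 16) |
           ((uint32_t)function << 8) |
           (uint32_t)dst;
}

/* Field extractors, using the same assumed layout. */
uint8_t mlink_op_len(uint32_t word)      { return (uint8_t)(word >> 24); }
uint8_t mlink_op_op(uint32_t word)       { return (uint8_t)(word >> 16); }
uint8_t mlink_op_function(uint32_t word) { return (uint8_t)(word >> 8); }
uint8_t mlink_op_dst(uint32_t word)      { return (uint8_t)word; }
```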

FUNCTION

Figure shows the interpretation of the bits within the FUNCTION byte. The highest bit is used to decide whether to start or abort an operation. As described above, the lower four bits are only loaded when the OPERATION_STG address is used. These values are loaded into the same register as the corresponding fields in the configuration register and will hence supersede anything previously written using the configuration address.

ABORT instructs MLINK to drop the current operation and return to its idle state as soon as possible. When in transmit mode, MLINK will drop the connection immediately by sending a DROP and returning to idle. When receiving, if the BCB is active, MLINK will use BCB to attempt to get the connection shut down and return to idle as soon as transmission stops.

OP

OP specifies the network operation to be performed. Operation encodings are shown in Table .

LEN

LEN specifies the length in double words of data to be transferred during a network operation. This is required for all operations. For operations which transfer a fixed amount of data (i.e. noop, reset, status), it should be consistent with the amount of data expected.

DST

DST specifies the intended destination node for the network message. Note that this is needed to guard against incorrect delivery rather than to specify a route through the network. The routing word (Section ) will be used for the actual route through the network.

Configuration

this text/details need to be updated to reflect: (1) two configuration registers, (2) new selection parameters incl. rnd/deterministic selection...

The network interface has a number of configurable options. The number of dummy cycles between real network data is set by dummy-cycles. The number of retransmissions net-out will attempt is specified by retries. The number of network stages is selected by setting stages. The node number is configured by setting node-number. Figure shows the composition of the configuration register. Individual portions of the word cannot be set independently. To change just part of the configuration, read the configuration, change the desired bits, and write the configuration back. The unused bits in the configuration word are available for other configuration options which may come up during design and prototyping. Space is specifically left next to the number of dummy cycles so this parameter can be expanded if early experience with MBTA indicates the number of allocated bits is insufficient.

this paragraph is incomplete -- it will get updated later when things settle down -- See Figure .
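The read, change, write-back procedure for updating part of the configuration might look like the sketch below. Because the field positions are not given in this copy of the note, the helpers take the field mask and shift as parameters; the function names are hypothetical.

```c
#include <stdint.h>

/* Replace one field of a configuration word, leaving all other bits alone.
 * mask selects the field in place; shift is the position of its lowest bit.
 * Both are supplied by the caller because the real layout is not known here. */
uint32_t config_set_field(uint32_t config, uint32_t mask, unsigned shift, uint32_t value)
{
    return (config & ~mask) | ((value << shift) & mask);
}

/* Read-modify-write through a pointer to the memory-mapped configuration
 * register: read the whole word, change one field, write the whole word back. */
void config_update(volatile uint32_t *config_reg, uint32_t mask, unsigned shift, uint32_t value)
{
    uint32_t current = *config_reg;
    *config_reg = config_set_field(current, mask, shift, value);
}
```

For example, if the retries field happened to occupy bits 11:8 (a made-up position), setting it to 3 would be config_update(reg, 0x00000F00, 8, 3).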

N.B. These things will most likely be loadable under boundary scan control in the future.

Status

The status_ptr points to the memory location for the status buffer. Net-out will place the result of each network retransmission in successive double words in memory starting at the address stored in status_ptr. The status_ptr has no use when the interface is configured as a network input. For each failed network attempt, one double-word is written to memory.

Each time a connection is attempted, the status word is updated. When a connection fails and MLINK is configured to offload error status, the status word will be written out at the current status pointer. The status pointer is incremented with each trial so that the connection attempt history is available after the connection is made or MLINK gives up on attempting the connection. When a connection is successfully opened, the status word will be written out if configured to do so by offload successful status. Any operation which turns the network more than once from the forward direction will only store the status from the final turn -- connection-wise, this data should be identical to that acquired on the first turn (will we actually have any of these?).

Status Word

The status word indicates the result of each attempt to open a connection through the network. The status word is formatted as:

this may be a bit out of date -- see tcf code and update

Some text describing this would probably be nice.

Checksum Words

mumble where things are

State

Reading STATE will return state information of the network interface. The format and meaning of this word will be defined as the component is implemented. Some subset of these bits should indicate what the interface is expecting from the processor. This may also be useful for keeping the processor in synch with the component. The state as a whole should be useful in diagnostic testing.

The state currently indicates the following:

When errors occur such that the network interface is forced to signal the processor that an error has occurred using its line, the processor should be able to determine the error by reading STATE. The current list of possible errors is shown in Table .

N.B. All of the errors shown so far are essentially fatal. When one of these errors occurs, either the processor and the network interface are in inconsistent states or there is a bug in the source program. The assertion of indicates that such an error has occurred; the processor should halt and signal the error to the host so the source of the error can be located and debugged. At present, there is no way to turn off, short of doing a hard reset...we might want to rectify this.

Currently not noting pointer reloads while operations are in progress. We might want to set something up to monitor that, as well.

Acknowledgments

The state address can also be used to check the successful completion of an operation. As such it is used in two slightly different ways depending on the network operation performed. After any operation which turns the network around for an acknowledgment but not for data (i.e. noop, reset, write, or remote handlers), it indicates whether the ack returned reported the success or failure of the operation. After any operation which sends a response over the network (i.e. read and status), it indicates whether or not the reply checksum was correct. The second lowest state bit indicates whether or not the ack or final checksum has been received. This bit is cleared at the beginning of an operation and is set when the ack arrives (actually, the final ack when retries are configured). The lowest bit is only valid when this second lowest bit is set. The lowest bit indicates the actual success or failure of the operation. When set, the operation succeeded (i.e. the ack was true or the checksum was valid); when cleared, the operation failed (i.e. the ack was false or the checksum was invalid).
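The interpretation of the two lowest state bits can be sketched as follows. The bit positions come from the description above; the macro, type and function names are hypothetical, and the remaining bits of STATE are simply ignored.

```c
#include <stdint.h>

#define MLINK_STATE_SUCCESS 0x1u /* lowest bit: ack true or checksum valid */
#define MLINK_STATE_DONE    0x2u /* second lowest bit: ack or final checksum received */

typedef enum { MLINK_PENDING, MLINK_FAILED, MLINK_SUCCEEDED } mlink_result;

/* Decode a value read from STATE. The success bit is only consulted
 * once the done bit is set, since it is not valid before then. */
mlink_result mlink_decode_state(uint32_t state)
{
    if (!(state & MLINK_STATE_DONE))
        return MLINK_PENDING;
    return (state & MLINK_STATE_SUCCESS) ? MLINK_SUCCEEDED : MLINK_FAILED;
}
```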

End of Cycle Counter

Each network interface counts the number of dummy cycles so it will know when to send and expect real data over the network. Each emulation cycle is composed of eight real network cycles and hence 8 sets of dummy cycles. The end of cycle counter keeps track of the number of dummy cycles and the real network cycles. Dummy cycles count from 0 modulo the configured number of dummy cycles plus one. The dummy cycle counter is incremented every node cycle. The phase counter increments every network cycle and counts from 0 modulo 8. Each reset of the phase counter denotes a node cycle and hence increments the dummy counter. The end of cycle counter is formatted as:
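Independent of the register layout, the counting rule described in this section can be sketched in software as below. This follows the text literally: a phase counter modulo 8 advanced every network cycle, and a dummy counter modulo (configured dummy cycles + 1) advanced whenever the phase counter wraps. Treating the moment both counters return to zero as the end of the cycle is an interpretation, and all names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t  phase;        /* phase counter: network cycles, 0..7 */
    uint32_t dummy;        /* dummy counter: 0..dummy_cycles */
    uint32_t dummy_cycles; /* configured number of dummy cycles */
} eoc_counter;

/* Advance the counters by one network cycle.
 * Returns true when both counters have wrapped back to zero, which this
 * sketch takes (as an assumption) to mark the end of the cycle. */
bool eoc_tick(eoc_counter *c)
{
    c->phase = (uint8_t)((c->phase + 1) % 8);
    if (c->phase != 0)
        return false;                 /* phase counter has not wrapped yet */
    c->dummy = (c->dummy + 1) % (c->dummy_cycles + 1);
    return c->dummy == 0;
}
```

Under this reading, the two counters together repeat every 8 × (dummy cycles + 1) network cycles.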

Net-out Port Randomization

The two network outputs used in an MBTA node function logically as a single network output interface which randomly selects between network ports for transmissions. RND_IN and RND_OUT are used to select the output port, and hence the associated net-out, for a particular transmission attempt.

Operation Initiation

When the processor initiates a network transaction, it writes the operation generically to net-out. Both network outputs receive the operation. They both xor RND_IN and RND_OUT together. If the result of the xor is the same as the network interface's UNIT designation, the network interface handles the network transmission. In this manner, exactly one net-out attempts to transmit the network transaction.
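The selection rule amounts to a one-bit comparison, sketched below. RND_IN, RND_OUT and UNIT are modelled as single-bit values; the function name is hypothetical.

```c
#include <stdbool.h>

/* Decide whether the net-out with the given UNIT designation handles the
 * transmission: it does when RND_IN xor RND_OUT equals its UNIT bit.
 * All three inputs are treated as single bits (0 or 1). */
bool netout_selected(unsigned rnd_in, unsigned rnd_out, unsigned unit)
{
    return ((rnd_in ^ rnd_out) & 1u) == (unit & 1u);
}
```

Since both net-outs see the same two random bits and have opposite UNIT values, exactly one of them evaluates this to true for any input, which is what guarantees a single transmitter.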

Retransmission

If the previous attempt to open a connection fails, another attempt must be made to open the connection. The network outputs need, once again, to randomly select a network port. The net-out which made the failed connection attempt, asserts to indicate that retransmission is necessary. The other net-out does nothing except wait for the next operation or retransmission. On the network cycle following the assertion of , both net-outs xor RND_IN and RND_OUT together and select which network output will handle the retransmission. The assertion of also signals the idle net-out to increment its retries counter.

Net-In Status

After receiving a TURN byte, net-in transmits STATUS and CHECKSUM. mumble status see Table ; mumble checksum.

mumble bit meanings

Memory Interface

Each network interface has an opportunity to access memory once every eight real network cycles. Since the memory is 64 bits wide, this is just frequently enough to transfer data at the full network data rate when necessary. During an eight network cycle memory round, each logical network interface has a designated access cycle on each shared bus ( i.e. address and data busses). The portion of the round belonging to each logical network interface is shown in Figure .

When a network interface wants to use memory during its access cycle, it asserts the want bus (WB) signal during its designated WB cycle prior to presenting the data to be read or written. Along with asserting WB, the network interface should assert the appropriate word write enables (<1:0>). When writing to either or both words of the specified memory location, the appropriate word write enable should be asserted. For memory reads, both word write enables should be deasserted. The host bus controller (tn30) deals with turning the WB and <1:0> signals into the appropriate enables for the SRAM memory. The network interface does not support byte writes. Both WB and <1:0> should be asserted only during the network interface part's respective cycle on the write enable bus.

Node memory operation timing differs somewhat when there are no dummy cycles from when there are dummy cycles. With no dummy cycles, the network interface will generally be performing back to back memory cycles in the pipelined fashion required by the node bus. Figure shows what the bus use from a single network interface looks like. This pattern of usage repeats as necessary for each memory interaction. As mentioned above, each network interface has its own designated cycle for use of the data and address busses, so each network interface uses this pattern appropriately out of phase with its peers. During the R/W Addr cycle the address of the next read data or the previous write data is presented (see (tn25)).

When dummy cycles are present, each network interface only references memory during the beginning of each emulation cycle. As such, it is not possible to optimize back to back memory cycles. Instead, within the two node cycles following the beginning of the cycle, each network interface performs a complete read or write operation. The processor is then free to steal cycles during the remainder of the emulation cycle knowing that the network interfaces will not require use of the node busses until the beginning of the next emulation cycle. Figure shows the end of cycle and bus timing when dummy cycles are present. As noted, the point at which the EC signal is asserted with respect to the phase of a network interface depends on the network interface. For network-input 0 (which is out of phase with the processor and hence the only unit from which the processor will be stealing bus cycles), EC is asserted during its designated address phase one node cycle before the network input uses its address bus. This allows the bus controller adequate warning so that the address bus will be available if the network interface wishes to perform a read operation.

Network Interfacing

Message Components

The bytes of a network message can be classified as follows:

Destination

The destination specifies the node in the network to which the network message is directed.

N.B. This limits the number of nodes to 256. This should not pose any long term limitation since we will certainly have revised many of these details (including going to a larger address space) by the time we build a machine with more than 256 nodes.

Operations

As defined in section .

Address

For read and write operations, there will be three bytes of address to specify the address of data on the remote node.

Data

The data associated with each operation will be transmitted with each word broken into four byte chunks.

Length

For operations of non-fixed size, a length byte specifies the number of consecutive memory words being transferred (or to be transferred).

Current issue: Should we allow operations of odd-word lengths? It's not clear if it's worth the hair. Would we be hurt by the restriction to only multiples of double-word transfers?

Checksum

The integrity of each network transaction is verified with a forward checksum [DeH90a]. The checksum is a 16-bit CRC checksum and is transmitted in two consecutive bytes (CHKSUM1 and CHKSUM2). The forward checksum uses the same CRC checksum generator used by RN1B [Min91].
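To illustrate how a 16-bit CRC is accumulated byte by byte and then split into two checksum bytes, here is a generic sketch. It is a stand-in, not the RN1B generator: this note does not give RN1B's polynomial, so the common polynomial 0x1021 with a zero initial value is assumed, and sending the high byte as CHKSUM1 is likewise an assumption. All function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Fold one message byte into a running 16-bit CRC, most significant bit
 * first. Polynomial 0x1021 is a stand-in; the RN1B polynomial is not
 * stated in this note. */
uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)((uint16_t)byte << 8);
    for (int bit = 0; bit < 8; bit++) {
        if (crc & 0x8000u)
            crc = (uint16_t)((crc << 1) ^ 0x1021u);
        else
            crc = (uint16_t)(crc << 1);
    }
    return crc;
}

/* CRC over a whole buffer, starting from an initial value of zero. */
uint16_t crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = crc16_update(crc, data[i]);
    return crc;
}

/* Split the 16-bit result into the two transmitted bytes.
 * High byte first is assumed, not confirmed by the note. */
void crc16_split(uint16_t crc, uint8_t *chksum1, uint8_t *chksum2)
{
    *chksum1 = (uint8_t)(crc >> 8);
    *chksum2 = (uint8_t)(crc & 0xFFu);
}
```

A byte-at-a-time update function matches the way the checksum has to be produced in practice, since message bytes are generated and consumed one per network cycle rather than being available as a whole buffer.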

Control

Some operations require no data in a response. Ack provides a succeed/fail response to indicate the completion of such operations. Ack is used generically to refer to responses which can be ack_t or ack_f (see Table ).

Often, the node may not be able to respond immediately to a network operation. When the node cannot supply the requested data to the source immediately, it must be capable of telling the source to wait. To allow this specification, METRO includes a distinguished DATA-IDLE specification which keeps the connection open, but is out of band of the normal data stream so MLINK can tell that it is not to be treated as normal data. After an operation is requested, DATA-IDLE will be transmitted to the source until the destination node can field a reply. Once the destination node is ready to reply, it resumes by sending the reply data.

METRO messages

METRO defines the message components shown below. The ninth bit shown here is the control bit [EDP+92].

Messages

This section describes the format of each network transaction using the components described in the previous sections.

Some things to note:

MLINK will send between 1 and 4 routing words based on how it is configured (Section ). The receiving MLINK will see at most one of these routing words.
During connection establishment (source transmits to destination), the checksum is calculated starting with DST, which is not transmitted, and is calculated through the data word immediately preceding the checksum.
There may be any number of DATA-IDLE words between CHKSUMB and STATUS due to variable turn delay [EDP+92].
Dummy cycles are inserted between all node transmitted data following OP and preceding CHKSUM1 and within the reply data up to the final checksum or ack. TURN/DROP control follows directly after checksum/ack transmission so no corrupted data may slip through the connection between the checksum/ack and the TURN/DROP. There are never dummy cycles between CHKSUM1 and CHKSUM2.
All sequences shown here assume a path is available on the first try through the network. If the transaction was blocked at some routing component in the network, a DROP would be generated by the blocked router following its CHKSUMB.

Noop and Reset

Following is a noop or reset operation sequence as seen from the interface between the sending node and the network. denotes the number of network stages.

This same sequence for a noop or reset operation looks like the following from the interface between the network and receiving node.

Discussion

Note that the status checksum groups numbered 1 through come from the successive routers in the network. The status/checksum pair labeled comes from the network interface at the destination node. See for a description of these status and checksum bytes.

Noop and reset transactions should always succeed. Thus, the ACK should always be ACK_T.

Read

A read transaction proceeds as follows:

From the interface between the network and receiving node, this read sequence looks like:

Discussion

The final checksum is necessary to make certain that the return data arrived uncorrupted.

Write

A write transaction proceeds as follows:

From the interface between the network and receiving node, this write sequence looks like:

Discussion

The ack here is necessary to provide a final opportunity for the receiving node to indicate that it was not able to deal with the write transaction and the operation should be repeated. This is important in the case where the data arrives corrupted.

It is necessary to specify the length ( LEN) of the data to be written in order to guarantee that faults in the network ( e.g. a control bit stuck asserted) do not cause a write operation to write over important data in the node's memory. A checksum is included immediately after the address and length specification to protect the receiving node's memory. This checksum comes before the data and is used to assure that the address and length have been received correctly before anything is overwritten in memory. This prevents transmission errors from overwriting random sections of a node's memory.

Status

A status transaction proceeds as follows:

From the interface between the network and receiving node, this status sequence looks like:

Discussion

Exact content of status data is still being determined.

Remote Handler Invocation

A remote handler invocation transaction proceeds as follows:

From the interface between the network and receiving node, this ROP sequence looks like:

Discussion

As in the case with the write operation, the inclusion of the CHKSUM following LEN is necessary to prevent faults from allowing MLINK to write over useful data in memory.

Exactly what happens after the turn is currently a subject of debate. In the past, we wanted to support holding the connection open for a reply as well as turning the network an arbitrary number of times. The utility and desirability of this is not clear at present. Comments welcomed.

Network Operations

In this section the following conventions will be used to distinguish required and optional processor operations:

It is always optional to specify new buffer pointers. Checking acknowledgments is never required, but always recommended.

All the sequences in this section concentrate on the i/o operations between the processor and the network interface. Intervening computation by the processor is categorically omitted.

Originating Transaction

Only network outputs will actually originate network operations. This section briefly describes the way the processor uses net-out to issue network transactions.

Checking the success of a network operation is not explicitly shown in the sequences which follow. In general, the processor will want to read the net-out's STATE to check on its progress and perhaps look at the status words in memory. When a network output fails to open a connection within the configured number of retries, the network output will cease to attempt retransmission. The processor should recognize this occurrence when it checks the state of the network output.

Noop or Reset

A noop or reset sequence proceeds as:

write new status_ptr
write new route_word
issue operation by writing: DST 0x00 noop NETOP to OPERATION
check acknowledgment by reading ack until one returns

The function could also be NETOPNOTURN if no response is expected -- but that is probably not very useful since it provides no indication if the message was blocked in the network.

Following is a C-rendition of the above sequence using a busy-wait on the acknowledgment:
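A minimal sketch of such a sequence is given here. The register block layout, the OP_NOOP and FN_NETOP encodings, and the byte order of the operation word are all placeholders assumed for illustration; the real offsets and encodings are defined in the tables this note refers to. The two state bits follow the description in the Acknowledgments section.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical view of a net-out's memory-mapped registers.
 * Field order and spacing are placeholders, not the real offsets. */
typedef struct {
    volatile uint32_t status_ptr;
    volatile uint32_t route_word;
    volatile uint32_t operation;
    volatile uint32_t state;
} mlink_regs;

#define OP_NOOP       0x00u /* placeholder encoding for the noop operation */
#define FN_NETOP      0x80u /* placeholder encoding for the NETOP function */
#define STATE_SUCCESS 0x1u  /* lowest state bit: operation succeeded */
#define STATE_DONE    0x2u  /* second lowest state bit: ack received */

/* Issue a noop to node dst and busy-wait for the acknowledgment.
 * Returns true if the ack reported success. */
bool mlink_noop(mlink_regs *net_out, uint8_t dst, uint32_t status_buf, uint32_t route)
{
    uint32_t state;

    net_out->status_ptr = status_buf;          /* optional: new status_ptr */
    net_out->route_word = route;               /* optional: new route_word */

    /* LEN = 0x00 for a noop; field placement within the word is assumed. */
    net_out->operation = ((uint32_t)0x00 << 24) |
                         ((uint32_t)OP_NOOP << 16) |
                         ((uint32_t)FN_NETOP << 8) |
                         (uint32_t)dst;

    do {                                       /* busy-wait on the ack */
        state = net_out->state;
    } while (!(state & STATE_DONE));

    return (state & STATE_SUCCESS) != 0;
}
```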

In general, it would probably be more useful to store away a pointer to a handler to deal with the operation when it completes or fails and let the processor go on to doing something else rather than busy-waiting on the return ack as shown above.

Status

A status sequence proceeds as:

write new status_ptr
write new in_buf_ptr
write new remote_addr
write new route_word
issue operation by writing: DST 0x01 status NETOP to OPERATION
check acknowledgment by reading ack until one returns

Read

A read sequence proceeds as:

write new status_ptr
write new in_buf_ptr
write new remote_addr
write new route_word
issue operation by writing: DST LEN read NETOP to OPERATION
check acknowledgment by reading ack until one returns

Write

A write sequence proceeds as:

write new status_ptr
write new out_buf_ptr
write new remote_addr
write new route_word
issue operation by writing: DST LEN write NETOP to OPERATION
check acknowledgment by reading ack until one returns

Remote Handler Invocation

A remote handler sequence proceeds as:

write new status_ptr
write new out_buf_ptr (??)
write new remote_addr
write new in_buf_ptr
write new route_word
issue operation by writing: DST LEN ROOP REMOTE_HANDLER to OPERATION
check reply ack by reading ack until one returns

Many things still to be decided here.

Autonomous Transaction Handling

When configured as a network input, the network interface will autonomously handle all of the incoming low-level network transactions described in section except the remote handler invocation transaction, which implicitly requires the processor's control. (???)

Non-Memory Transactions

These transactions require no node resources.

Noop

When a NOOP network transaction is received, net-in drops the connection after returning its status and checksum bytes. See Section for information on the status and checksum bytes.

Reset

When a RESET network transaction is received, net-in drops the connection after releasing the signal on the node and returning its status and checksum bytes.

Note that this transaction does not hang around to verify that the node boots successfully. It is easy to arrange things such that once the node is booted far enough to send messages under its processor's control, it can send a reply back to the booting node. Additionally, the STATUS message can be used to check if the processor's pin is asserted.

Status

When a STATUS network transaction is received, net-in returns a single status word indicating status information about the associated node. After sending the status word, net-in returns a checksum and drops the connection.

Memory Transactions

Net-in can directly handle the raw memory transactions described in (tn21). This, along with the RESET transaction, allows the node to be booted over the network without EPROMs (tn19) [DeH90b]. The node bandwidth is sufficient to handle these raw operations at the full network data rate (see Section ). The format of data received and transmitted over the network during any of these transactions is given in .

Read

Upon receiving a read transaction, net-in returns the requested words at the emulation rate ( i.e. one word per emulation cycle). Following the last word, net-in sends a forward checksum on the data transmitted before closing the connection.

Write

Write transactions are handled similarly to read operations. One word is written into memory each emulation cycle. When the network is turned around following the transmission of the write data, net-in transmits an ack to indicate whether or not the write completed successfully. ack_f may occur for any of the following reasons:

the processor aborts the net-in operation
the processor forces the net-in to indicate an error
either of the checksums in the write transaction is incorrect

On incoming write operations, the checksum on the address and length fields of the message must be correct before net-in will write any data to memory. This checksum is necessary to guarantee that random portions of a node's memory are not trashed by transmission errors (Section ).

Receiving and Handling Transactions

In addition to autonomous transactions, network inputs must handle remote handler invocations so the processor can respond accordingly.

The following is the way ROPs used to work in concept. This will probably change.

When an ROP is received, net-in places the contents of the message in memory at the address specified by the out_buf_ptr. The processor recognizes the arrival of the ROP by checking on the state of net-in. Once received, net-in will hold the connection open sending idle cycles over the network until the processor sets up a response. During an ROP, the network can be turned around as many times as the software requires. Once the initial message is received, ROPs are handled in much the same way as net-out handles ROPs (see Section ).

An ROP sequence proceeds as:

1. processor recognizes arrival of ROP by reading the network interface's STATE address
2. if network connection is turned around again by remote node, decide whether or not to keep connection open; if connection should close, issue close operation by writing: 0x00 0x00 0x00 ENDNOW to OPERATION and processor is finished with transaction
3. write new status_ptr
4. write new out_buf_ptr
5. write new in_buf_ptr
6. Either:
   - issue a response operation by writing: 0x00 0x00 0x00 RESPONDFINAL to OPERATION and processor is finished with transaction
   - issue a response operation by writing: 0x00 0x00 0x00 RESPOND to OPERATION
7. wait for arrival of response and proceed with step 2
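
The processor's side of this sequence can be sketched as a small loop. This is a sketch under assumptions, not driver code: MockNetIn is a hypothetical stand-in that only records register writes, the numeric values chosen for ENDNOW, RESPOND, and RESPONDFINAL are invented for illustration, and the per-turn decisions are supplied up front in a list rather than computed by handler software.

```python
ENDNOW, RESPOND, RESPONDFINAL = 0x01, 0x02, 0x03   # hypothetical encodings

class MockNetIn:
    """Stand-in for net-in's memory-mapped registers; not the real hardware."""
    def __init__(self):
        self.log = []                     # every register write, in order

    def read_state(self):
        return "ROP_ARRIVED"              # hypothetical STATE value

    def write(self, register, value):
        self.log.append((register, value))

def service_rop(net_in, plan, pointers):
    """Run the ROP sequence. `plan` holds one decision per turn of the
    connection: "respond", "final", or "close"."""
    net_in.read_state()                                   # recognize arrival via STATE
    for decision in plan:
        if decision == "close":                           # decide to close the connection
            net_in.write("OPERATION", (0x00, 0x00, 0x00, ENDNOW))
            return
        for name in ("status_ptr", "out_buf_ptr", "in_buf_ptr"):
            net_in.write(name, pointers[name])            # write the three new pointers
        if decision == "final":                           # respond and finish
            net_in.write("OPERATION", (0x00, 0x00, 0x00, RESPONDFINAL))
            return
        net_in.write("OPERATION", (0x00, 0x00, 0x00, RESPOND))
        # The response arrives, then the loop returns to the keep-open decision.

net_in = MockNetIn()
pointers = {"status_ptr": 0x100, "out_buf_ptr": 0x200, "in_buf_ptr": 0x300}
service_rop(net_in, ["respond", "close"], pointers)
```

With the plan shown, the processor responds once, then closes the connection when the remote node turns it around again, so the log ends with the ENDNOW write.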

Router Checksums

Here are some thoughts about checksums in the METRO / METRO LINK network.

Unique Checksums

Observation 1

Each router in the path between a source and a destination will see a different data stream and hence produce a different checksum.

This follows immediately from the fact that each router sees different routing bits. When we rotate the data to shift in new routing-bits, this makes the routers see different rotations of the data. When we swallow the head byte to get a fresh routing byte, the subsequent routers do not see the swallowed byte. Further, in tree machines [DeH91d], exactly what each router a given number of hops from the source sees will depend on the height of crossover in the trees.

Consequence

To check the router checksums, one must compute a separate checksum for each router in the path from source to destination. Further, to check the checksums on the fly in hardware, this means one needs a separate checksum computation unit for each router in the worst-case path between source and destination.
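
A small illustration of why one checksum per router is needed. The sketch assumes the head-byte-swallowing case only (each router consumes one leading routing byte) and uses the standard library's CRC-CCITT routine as a stand-in for whatever checksum a router actually computes. The names router_views and router_checksums are hypothetical.

```python
import binascii

def router_views(message, hops):
    """Stream seen by each router, assuming each swallows one head byte."""
    return [message[hop:] for hop in range(hops)]

def router_checksums(message, hops):
    """One stand-in checksum per router along the path."""
    return [binascii.crc_hqx(view, 0) for view in router_views(message, hops)]

message = bytes([0x12, 0x34, 0x56]) + b"payload"    # three routing bytes, then data
sums = router_checksums(message, 3)
```

The three entries of sums are all different, so a source that wants to verify them on the fly needs three independent checksum computations for this three-hop path.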

Role of Checksums

Observation 2

The only checksum which tells the destination that a data-stream is uncorrupted is the forward checksum.

Observation 3

What matters in determining the successful transmission of data to a destination is the integrity of what the destination node sees.

From these observations, we conclude that the critical indication of success is the reply from the destination node. If the destination node accepts the message as complete and replies with a legal reply, then that is the authoritative indication of success. We do need to encode the reply so that it is sufficiently unlikely that a reply indicating failure can be corrupted into one indicating success.
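
One common way to make a failure reply hard to corrupt into a success reply is to choose the two codes far apart in Hamming distance. The sketch below only illustrates that idea. The byte values are hypothetical and are not the encodings used by METRO LINK, which this note does not specify.

```python
def hamming(a, b):
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")

ACK_OK = 0b10100101      # hypothetical success code
ACK_F = 0b01011010       # hypothetical failure code: the bitwise complement

distance = hamming(ACK_OK, ACK_F)
```

With complementary 8-bit codes, all eight bits would have to flip in transit for a failure reply to be read as success.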

Consequence

The forward checksum is the most important checksum in terms of determining the success of message transmission.

The only thing which the reverse checksums tell us is where in the network a message may have been corrupted. Further, this information is based on full-speed data transmission between network routers.

Recommendations

Increase Forward Checksum to the full 16-bit CRC checksum
Punt hardware checksum verification for router checksums
Replace hardware router checksum verification with
- Registers to store reverse checksums
- Selectable mode in which reverse checksums are offloaded to node memory

This allows us to move the checking of router checksums into software. Presumably, this would only be necessary in the rare cases when data is actually being corrupted. Moving it into software also allows any given METRO LINK to work with a larger range of networks since it is not necessary to code the data-permutations in effect for each router in the network into the network interface hardware. This also makes METRO LINK completely independent of the checksum used by a particular router implementation. In fact, the router may have a mode where it transmits data back other than checksum information and METRO LINK will save it out in the same manner.
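
Software checking of offloaded reverse checksums might look like the following sketch. It assumes the reverse checksums have been saved to node memory as a list, one per router in path order, that each router swallows one head byte, and that the router checksum is CRC-CCITT. All three are assumptions for illustration, and locate_corruption is a hypothetical name.

```python
import binascii

def locate_corruption(message, reported):
    """Return the index of the first router whose reported checksum
    disagrees with the locally computed one, or None if all agree."""
    for hop, seen in enumerate(reported):
        expected = binascii.crc_hqx(message[hop:], 0)   # stream as router `hop` should see it
        if seen != expected:
            return hop
    return None

message = bytes([0x12, 0x34, 0x56]) + b"payload"
clean = [binascii.crc_hqx(message[hop:], 0) for hop in range(3)]
damaged = list(clean)
damaged[1] ^= 0x0001                                    # pretend router 1 saw different data

first_bad = locate_corruption(message, damaged)
```

Because the per-router data permutation lives in this routine rather than in hardware, supporting a different router or network only requires changing how expected is computed.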

Issues

This section raises a number of recent/current issues. Many of these are unresolved and feedback is strongly encouraged.

What primitives should hardware support?

The current theory is that hardware supports the following:

raw read/write operations
remote reset/boot
remote status
remote function invocation

These are probably minimally sufficient. There may be other operations that would be much more efficient if implemented directly in hardware. However, at this point it is not clear which operations fall into this category. We have considered providing some form of primitive read-modify-write operation, but the atomicity complications have us leaning toward not handling such operations unless there are some very good reasons.

How are network messages/operations initiated?

As described so far, everything is done using some combination of writes to memory and writes directly to the network interface. For the most part, we believe the writes directly to the interface are not a problem. It might be inefficient for some messages to have to write the data out to memory first, then launch the operation. Thus, it might be worthwhile to be able to launch short messages directly from the network interface. This will, of course, require additional hardware resources on the network interface and there will have to be some limit on message sizes which can be handled this way. So there are many questions here:

Is the current scheme reasonable for most network usage?
Is it worth a little extra work to handle short message launches without going through memory?
How long would the direct message buffer have to be to make it worthwhile? (How long should it be?)

What happens to operations when they arrive at the destination?

Here, we are concerned primarily with remote function invocations.

Does MLINK hold onto the handler pointer, keep the network open, and wait for a response? (n.b. if the network is not held open, another message will arrive... perhaps overwriting information associated with the pending one, so holding the network is a necessity in this case.)
Does each MLINK maintain its own queue of incoming messages in memory and release the network as soon as possible?
Does it make sense to have any modes wherein MLINK holds the network for reply data?

How does the processor arrange to service incoming messages (which need service)?

This is related to the previous question.

If MLINK queues operations up, then the processor will have to read them from the queue -- have some way of telling when the queue is empty -- and will probably have to inform MLINK when it is through with a given message.
Otherwise, the processor could poll the net-ins to determine when there are pending operations.
Alternately, it might read a NEXT_INSTRUCTION_HANDLER from each net-in. MLINK would then return either the address of the handler or some preprogrammed continuation address.

What should we do with errors noted during net-in message reception?

Unlike net-out, net-in has no control over when it is busy, so writing the error information out to memory is not really an option. Nor are successive messages to the same net-in necessarily related in any way.

Where should the destination MLINK's status be returned?

Status is currently returned in the first byte of the pair returned by the MLINK. This requires that the forward checksum errors be noted and inserted into the outgoing status byte within one cycle. Now that the routers are not putting status in the first two bits of the first checksum word, it might make sense to rearrange so the status bits occur in the second checksum word.

Should we require all network ops to be double word entities, or should we allow odd length read/write/handler-invocations?

I do not think we are willing to allow any transfers to odd word addresses, so this is only a question about length.

How long a message should we support?

We currently support 256 words. If we drop odd support, that could go to 256 double words = 512 words. Any additional length would require two length bytes be transmitted with each message instead of one (or some other restriction on the possible lengths).
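
The arithmetic behind these limits, as a short sketch. It assumes every value of the length field names a distinct, nonzero length (for example, by storing length minus one), which this note does not state explicitly. The function name max_message_words is hypothetical.

```python
def max_message_words(length_bytes=1, words_per_unit=1):
    """Longest message, in words, expressible with the given length field."""
    return (256 ** length_bytes) * words_per_unit

current = max_message_words(1, 1)        # one length byte, word granularity
double_word = max_message_words(1, 2)    # one length byte, double-word granularity
two_bytes = max_message_words(2, 1)      # two length bytes, word granularity
```

These evaluate to 256, 512, and 65536 words respectively, matching the figures above for the first two cases.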

Does the idempotence restriction limit what we can express efficiently?

See [DeH92] for the issue and possibilities here.

References

DeH90a: Andre DeHon. Forward Checksum. Transit Note 6, MIT Artificial Intelligence Laboratory, May 1990.
DeH90b: Andre DeHon. MBTA: Boot Sequence. Transit Note 28, MIT Artificial Intelligence Laboratory, July 1990.
DeH90c: Andre DeHon. MBTA: Message Formats. Transit Note 21, MIT Artificial Intelligence Laboratory, June 1990.
DeH90d: Andre DeHon. MBTA: Modular Bootstrapping Transit Architecture. Transit Note 17, MIT Artificial Intelligence Laboratory, April 1990.
DeH90e: Andre DeHon. MBTA: Network Initialization. Transit Note 27, MIT Artificial Intelligence Laboratory, July 1990.
DeH90f: Andre DeHon. MBTA: Network Level Transactions. Transit Note 19, MIT Artificial Intelligence Laboratory, June 1990.
DeH90g: Andre DeHon. MBTA: Thoughts on Construction. Transit Note 18, MIT Artificial Intelligence Laboratory, June 1990.
DeH90h: Andre DeHon. T-Station: The MBTA Host Interface. Transit Note 20, MIT Artificial Intelligence Laboratory, June 1990.
DeH91a: Andre DeHon. MBTA: Clocking Strategy. Transit Note 37, MIT Artificial Intelligence Laboratory, January 1991.
DeH91b: Andre DeHon. MBTA: Network Interface Implementation Notes. Transit Note 36, MIT Artificial Intelligence Laboratory, January 1991.
DeH91c: Andre DeHon. MBTA: Quick Overview. Transit Note 38, MIT Artificial Intelligence Laboratory, January 1991.
DeH91d: Andre DeHon. Practical Schemes for Fat-Tree Network Construction. In Carlo H. Sequin, editor, Advanced Research in VLSI: International Conference 1991, pages 307-322. MIT Press, March 1991.
DeH92: Andre DeHon. The Case of the Corrupted Acknowledgment. Transit Note 76, MIT Artificial Intelligence Laboratory, September 1992.
DKD90: Fred Drenckhahn, Thomas Knight Jr., and Andre DeHon. Stack Packaging Components. Transit Note 33, MIT Artificial Intelligence Laboratory, December 1990.
DS90a: Andre DeHon and Thomas Simon. MBTA: Node Architecture. Transit Note 25, MIT Artificial Intelligence Laboratory, July 1990.
DS90b: Andre DeHon and Thomas Simon. MBTA: Node Bus Controller. Transit Note 30, MIT Artificial Intelligence Laboratory, August 1990.
EDP+92: Eran Egozy, Andre DeHon, Samuel Peretz, Henry Minsky, and Thomas F. Knight Jr. METRO Architecture. Transit Note 73, MIT Artificial Intelligence Laboratory, August 1992.
Min91: Henry Q. Minsky. RN1 Data Router -- B revision. Transit Note 45, MIT Artificial Intelligence Laboratory, May 1991.

MIT Transit Project
