Transit Note #81

METRO LINK Programmer's Quick Reference

Andre DeHon

Original Issue: March 1993

Last Updated: Fri Nov 5 13:09:56 EST 1993

Purpose

This is a quick guide for programmers writing code to directly interface to the METRO LINK network interface. This document presents the basic programmer model necessary for using MLINK, but does not go into detail on message transport (see (tn75)). Presumably, only people building higher-level software interfaces will actually be writing code to manipulate the MLINK directly at this level.

Basics

In the current design scheme, each MBTA node has two MLINKs which act as network inputs and two which act as network outputs. When configured properly, nodes can operate with one network input and one network output. Network outputs and inputs behave somewhat differently, but their basic structure is fairly similar. Except where noted, the discussion which follows is relevant to both network inputs and network outputs.

The magic addresses and constants which are referenced in this discussion are contained in the include files mbta.h and metro_link.h which are shown in Appendices and , respectively.

The MLINK network interfaces serve to connect the nodes to a METRO network. Their primary function is to launch messages from node memory and receive messages into node memory. In order to effect message transport and control how messages are handled, MLINK provides interfaces for:

  1. configuration (scan based configuration)
  2. message launch (network output)
  3. monitoring message progress
  4. monitoring MLINK operational status
  5. message queuing (network input)

All operations on MLINK components are through a memory-mapped interface. The node processor writes to special MLINK addresses to configure the network interfaces and initiate network operations and reads from special addresses to monitor network and message status.

MLINK Addresses

Each MLINK address is composed of two parts:

  1. component selection base address
  2. register selection address

These two components are combined to address a particular register on one or more MLINK parts. Symbolically, the base addresses are summarized in Table . Each of the four MLINK components on a node has its own base address. Additionally, composite addresses exist for referring to more than one MLINK at a time.

The register addresses are summarized in Table . Reading and writing these registers controls the basic operation of each network interface.
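The two-part address composition can be sketched as follows. This is a minimal illustration only: the numeric values, the EX_ names, and the assumed width of the register-select field are hypothetical placeholders, and the real constants should be taken from mbta.h and metro_link.h.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL values for illustration; the real base and register
 * addresses are defined in mbta.h and metro_link.h. */
#define EX_NET_OUT_0_BASE    0x00100000u
#define EX_NET_OUT_BOTH_BASE 0x00300000u  /* composite: both network outputs */
#define EX_REG_STATE         0x04u
#define EX_REG_MASK          0xFFu        /* assumed register-select width */

/* Combine a component-selection base with a register-selection offset. */
static uint32_t ex_mlink_addr(uint32_t base, uint32_t reg)
{
    assert((base & EX_REG_MASK) == 0);   /* base must not overlap the register field */
    assert((reg & ~EX_REG_MASK) == 0);   /* register must fit in the register field */
    return base | reg;
}
```

With these placeholder values, the STATE register of the composite net-out pair would be addressed as ex_mlink_addr(EX_NET_OUT_BOTH_BASE, EX_REG_STATE).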

Configuration

The network interface is configured for operation in three parts. Configuration information which is basically static throughout operation is configured via the Test-Access Port. A few control bits of configuration are available to the node processor through the memory-mapped interface using the CONFIGURATION address. Configuration which may be specific to each operation launch can be set during operation launch using the OPERATION_STG address. The scan and processor based configuration are described in this section and the operation configuration is described along with operation launching in the following section.

Scan Based Configuration

Table summarizes the scan loadable configuration. metro_link.h defines the symbolic lengths for the non-bit fields. These lengths should be considered implementation dependent as they are likely to change in future versions. The exact configuration of the scan chain should be detailed in the datasheets for each MLINK component (e.g. (tn91), (tn92)).

Each node needs to know its own node number to verify that the messages received were, in fact, intended for the receiving node.

A network output can be configured to retry failed operations up to a specified number of trials. If the operation can be completed successfully within the specified number of trials, no processor intervention is required for the retransmissions. If the network output cannot deliver the message in the configured number of trials, it stops trying and the processor can determine the next course of action.

The number of dummy cycles between real cycles can be configured. This should be set consistently between all nodes and interfaces in the system. Once emulation is started (SINIT is asserted), a number of dummy bytes is transmitted between every byte of real data. This feature is intended to allow the network to be throttled to more fairly match up with node emulation.

A network input can be configured to swallow the first byte it receives in the same manner as a router (tn45) [EDP +92]. It is often necessary to discard the final routing byte which appears at the head of a message to expose the message body for the receiving MLINK.

If the network routers are composed from pairs of cascaded routers, the dual router configuration bit should be set so that the network interface will deal appropriately with the pair of control bits. Note that METROJR (tn90) is a 4-bit router; configurations which use a pair of METROJR routers will require this bit set.

Back drop, or fast path reclamation (see (tn73)), can be enabled and disabled. The back-drop-allowed bit controls whether or not MLINK should be allowed to initiate a back drop request.

When set, Listen-to-bcb tells the MLINK to accommodate back drop requests from the network. Note that this is distinct from back-drop-allowed, which controls whether or not MLINK generates back drop requests.

The remote function address transmitted across the network is only three bytes long. When the address is saved into memory on the remote node, the top byte is taken from the PA3 value configured on the receiving network input. This option gives us flexibility about the placement of code and allows the net-in to construct an address in memory which can be used directly to invoke a remote function without further interpretation.
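As a small sketch of the address construction just described, the following function rebuilds a full address from the configured PA3 byte and the three-byte address carried in the message. It assumes a 32-bit address word in which PA3 occupies the most significant byte; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build the stored handler address: top byte from the PA3 value configured
 * on the receiving net-in, low three bytes from the transmitted address.
 * Assumes 32-bit addresses; the name is illustrative only. */
static uint32_t ex_compose_handler_addr(uint8_t pa3, uint32_t addr24)
{
    assert((addr24 & 0xFF000000u) == 0);  /* only three bytes travel on the wire */
    return ((uint32_t)pa3 << 24) | addr24;
}
```

For example, with PA3 configured to 0x80, a transmitted address of 0x001234 would be stored as 0x80001234.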

To accommodate potentially long interconnect, one can configure each network interface to account for the number of pipeline cycles associated with turning a connection. The value stored in the variable turn delay is the number of cycles in the round-trip time between MLINK and its attached router. This variable turn delay option is exactly analogous to METRO's variable turn delay (tn73). The variable turn delay will probably always be set to zero for the near future.

Also like METRO, an MLINK can be disabled. When an MLINK is disabled, the drive-when-disabled configuration bit controls whether or not it will send idling data to the attached router.

A net-in METRO LINK component can be configured to interrupt the processor when it has no free buffer pointers for storing incoming remote function invocations.

Processor Configuration

Table shows the composition of the processor accessible CONFIGURATION register. The symbolic values are defined in metro_link.h and should be considered implementation dependent.

When the clear perror bit is set it tells MLINK to clear its error interrupt signal. The signal is cleared the whole time this bit is set, so it should be reset to zero after being cleared if the processor wishes to receive notification of any further errors.
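The set-then-reset sequence for the clear perror bit might look like the sketch below. The bit position is a hypothetical placeholder (the real value is in metro_link.h), and the code assumes the caller keeps a shadow copy of the other CONFIGURATION bits rather than reading them back from the part.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL bit position; see metro_link.h for the real encoding. */
#define EX_CLEAR_PERROR (1u << 0)

/* Acknowledge an error interrupt: hold clear-perror high for one write,
 * then drop it so that later errors can interrupt again. 'shadow' is the
 * caller's copy of the remaining CONFIGURATION bits (e.g. offload mode). */
static void ex_ack_perror(volatile uint32_t *config_reg, uint32_t shadow)
{
    assert(config_reg != 0);
    *config_reg = shadow | EX_CLEAR_PERROR;   /* error signal held clear */
    *config_reg = shadow & ~EX_CLEAR_PERROR;  /* re-arm error notification */
}
```

Leaving the bit set after the first write would mask every subsequent error, which is why the second write matters.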

Offload configuration is used to specify the verbosity of data offloaded by a network output when attempting to send a network operation. When configured to OFFLOAD_NO_STATUS, no status information is saved. Only the status following failed transmission attempts is saved when set to OFFLOAD_FAIL_STATUS. All status, including the final success, is offloaded when configured to OFFLOAD_ALL_STATUS. The OFFLOAD_CHECKSUMS configuration allows all the status plus the checksums received to be offloaded into memory. When recorded, the sequence of status and checksum data which is offloaded is saved into sequential memory words starting at the address specified in the STATUS_PTR. In normal operation, we expect one will configure the network output in the OFFLOAD_FAIL_STATUS state.

Launching Operations

The MLINK network interfaces understand five kinds of network operations:

  1. MLINK_NOOP -- dummy operation
  2. MLINK_RESET -- reset a remote node
  3. MLINK_READ -- read a value from a remote node's memory
  4. MLINK_WRITE -- write a value to a remote node's memory
  5. Remote Function Invocation -- queue up address+data on the receiving node for use as an Active Messages style [E +92] remote function call.

In all cases, the operations are launched by a write to the OPERATION or OPERATION_STG address. In general the write should be done to both network outputs (i.e. NET_OUT_BOTH). The pair of network outputs will then arbitrate between themselves to decide which one sends the operation. When retry is necessary, they will re-arbitrate, allowing randomized selection of the network output during retransmission.

Support Registers

The data used for each network operation is determined by the contents of several common registers. The route word specifies up to four bytes which comprise the series of words used as the message route header. Note that MLINK assigns no semantics to the route word and does not interpret it in any way. The route word data should be constructed based on the kind and configuration of network routers in use. The in-buf pointer specifies the base address where data received by the node should be placed. The out-buf pointer specifies the base address from which data being sent to the remote node should be taken. The remote-address pointer refers to the address on the remote node for remote memory reads and writes. The remote-address also serves as the function-address for remote function invocations. All of these registers can be loaded by writing to their designated address (See Table ).

Configuration

Each operation launch is controlled by several configuration values. The configuration values are loaded when the launch is performed by a write to the OPERATION_STG address (See Table ). Once one operation launch has occurred, if the configuration values do not change from operation to operation, the OPERATION address can be used to launch subsequent operations without reloading the configuration values.

The number of routing words should be set to reflect the number of routing words which should be sent into the network at the beginning of a network operation. Depending on the number of routers in the network, their width, and the pipelining structure, the number of routing words may vary from network to network. The number of stages must be configured as well. To allow flexibility in the router implementation, these two configuration options are controlled independently.

When random-unit-select is set, the pair of MLINK network outputs choose randomly amongst themselves to service operations. When it is not set, the unit whose unit number matches the selected-unit bit will always service the operation. These bits provide a mechanism for reverting to deterministic selection of the network output when one (or its attached router or interconnect) is known to be faulty or needs to be taken off-line for testing.

Launching

The processor tells the network interface to perform an operation by writing to the OPERATION or OPERATION_STG address. All transactions are initiated or aborted by issuing an operation.

The general format of an operation is given in Table . The symbolic values are defined in metro_link.h and should be considered implementation dependent. DST specifies the destination node for the specified operation. LEN specifies the length in double words of the operation. OP specifies the operation to be performed at the remote node when a primitive operation is being performed. Init operation specifies whether to initiate or abort an operation. Additionally, when the OPERATION_STG address is used to initiate an operation, the operation configuration information is reloaded (See Section ). This allows MLINK to function efficiently in the case where all paths through the network are not the same length and may contain varying number of routers in each path.
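To make the operation format concrete, here is a minimal sketch of packing DST, LEN, OP, and the initiate/abort flag into one word. The field positions and widths below are invented for illustration; the actual layout is the one given in the operation format table and metro_link.h.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL field layout; consult metro_link.h for the real one. */
#define EX_OP_DST_SHIFT  0
#define EX_OP_DST_MASK   0xFFFFu
#define EX_OP_LEN_SHIFT  16
#define EX_OP_LEN_MASK   0xFFu
#define EX_OP_CODE_SHIFT 24
#define EX_OP_CODE_MASK  0xFu
#define EX_OP_INIT_BIT   (1u << 31)   /* set: initiate, clear: abort */

/* Pack destination node, length (in double words), and operation code. */
static uint32_t ex_pack_op(uint32_t dst, uint32_t len, uint32_t op, int initiate)
{
    assert(dst <= EX_OP_DST_MASK && len <= EX_OP_LEN_MASK && op <= EX_OP_CODE_MASK);
    uint32_t word = (dst << EX_OP_DST_SHIFT)
                  | (len << EX_OP_LEN_SHIFT)
                  | (op  << EX_OP_CODE_SHIFT);
    if (initiate)
        word |= EX_OP_INIT_BIT;
    return word;
}
```

The resulting word would then be written to the OPERATION or OPERATION_STG address of the network outputs.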

Note that the operation address can also be used to abort an ongoing operation by using the abort bit in the function field. We do not expect this feature to be used heavily. It provides a mechanism for forcing a network input or output back to an idle state. MLINK will try to shut down any pending connections in a moderately clean manner, so there may still be a delay before the interface is ready to perform a new operation.

Operations

NOOP/RESET

The noop and reset operations are the most trivial network operations. The noop operation simply sends a minimal length message to the destination and obtains status information from the destination MLINK and the intervening routing components. The reset operation, when successful, causes the receiving network input to reset the node processor allowing it to boot. Figure shows examples of noop operation launch, and Figure shows a reset launch (using the LMC PCL language semantics [Log92] for the write operation).

READ

The primitive read operation allows a node to request data at a specific location in a remote node's memory. The memory is read from the destination node at the address specified by the remote-address register. When the read data arrives on the originating node, it is placed at the location specified by the in-buf pointer. Remember that there are no coherence semantics associated with this read operation. Figure shows a sample read operation launch.

WRITE

The primitive write operation allows a node to place data at a specific location in a remote node's memory. The data is taken from the originating node's memory at the address specified by the out-buf pointer. On the remote node, the data is placed at the address specified by the remote-address pointer. Like the read, this is an incoherent operation. Figure shows an example write launch.
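The ordering of register writes for a write launch can be sketched in C as below. This is not the note's PCL sample: the register block is modeled as a plain struct so the sketch can run anywhere, and the register names and indices are hypothetical. On a real node the block would sit at the NET_OUT_BOTH base address.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL register indices standing in for the support registers
 * and the OPERATION address described above. */
enum { EX_ROUTE_WORD, EX_OUT_BUF, EX_REMOTE_ADDR, EX_OPERATION, EX_NREGS };

typedef struct {
    volatile uint32_t reg[EX_NREGS];
} ex_mlink_regs;

/* Load the support registers first, then write the operation word.
 * The operation write comes last because it is what triggers the launch. */
static void ex_launch_write(ex_mlink_regs *net_out_both, uint32_t route,
                            uint32_t out_buf, uint32_t remote_addr,
                            uint32_t op_word)
{
    assert(net_out_both != 0);
    net_out_both->reg[EX_ROUTE_WORD]  = route;        /* uninterpreted route header */
    net_out_both->reg[EX_OUT_BUF]     = out_buf;      /* local source of the data */
    net_out_both->reg[EX_REMOTE_ADDR] = remote_addr;  /* destination on remote node */
    net_out_both->reg[EX_OPERATION]   = op_word;      /* launch */
}
```

A read launch would follow the same pattern with the in-buf pointer in place of the out-buf pointer.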

Remote Function Invocation

For launch purposes, remote function invocations proceed much like remote writes. The remote address is used for the address of the remote function handler rather than a destination memory location. Function data comes from the local node at the address indicated by out-buf pointer. The operation encoding can be any encoding that is distinct from read, write, noop, and reset (see metro_link.h). MLINK places no semantic interpretation on the operation value, leaving this option up to the software system. Figure shows a sample remote function launch. As will be clear in Section , MLINK only transfers the handler address and data and enqueues them on the remote node. The software system is responsible for what actually happens to the arriving invocation request.

Monitoring MLINK Operations

Once an operation is launched, the network outputs should be left alone to send the message. MLINK can only keep track of one message to transmit at a time, so any outgoing buffering should be done at a higher level in software. While a message operation is in progress, reading the STATE register address of the pair of network outputs (i.e. NET_OUT_BOTH) will provide an indication of the message status. Table summarizes the fields of the state register. The symbolic field definitions are given in metro_link.h.

Whenever the processor-wait bit is set, MLINK is waiting for the processor to perform some action (e.g. launch a new operation). When the bit is not set, MLINK is busy handling an outgoing or incoming message. The success bit indicates whether or not the last operation completed successfully. When a network output completes an operation and is waiting on the processor, the processor should check the success bit to determine whether or not the previous operation completed properly. Once the processor-wait bit is set, it is safe for the processor to look at any status recorded for the last operation. The number of attempts will indicate the current trial number while a network output is busy attempting to deliver a message or the total number of attempts required when MLINK is finished. This trial number should be used to determine how many status words were written for the most recent network launch. The state also includes the operation identification for the current operation in progress.

The processor-errors field encodes a few conditions under which the network interface is forced to abort operations due to processor interference. The processor should never start a new operation while one is already in progress or change configuration registers while an operation is in progress. If a network interface notices such an occurrence, it will assert its interrupt pin to notify the processor of this fatal error. The processor error state bits crudely encode the event which caused the error to be signalled.

The routine mbta_get_no_status_when_done reprinted in Appendix will busy wait for the network outputs to return to idle then fetch the status information stored in memory, depending on the offload mode. Note that a single read to the state of both network outputs will only be answered by the one currently active. When retransmission is necessary, the two network outputs arbitrate with each other to decide on who is active and both keep proper track of the number of trials attempted. In practice, one will want to perform other computations or bookkeeping while the network output is busy rather than busy waiting.
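A bounded polling loop in the spirit of that routine might look like the following sketch. It is not the appendix code: the STATE bit positions are hypothetical, and the network output is replaced by a small fake device so the loop can be exercised off-target.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL STATE field layout; the real one is in metro_link.h. */
#define EX_STATE_PROC_WAIT      (1u << 0)
#define EX_STATE_SUCCESS        (1u << 1)
#define EX_STATE_ATTEMPTS_SHIFT 4
#define EX_STATE_ATTEMPTS_MASK  0xFu

/* Stand-in for the net-out pair: reports busy for a few reads, then done. */
typedef struct {
    int      busy_reads_left;
    uint32_t final_state;
} ex_fake_net_out;

static uint32_t ex_read_state(ex_fake_net_out *dev)
{
    if (dev->busy_reads_left > 0) {
        dev->busy_reads_left--;
        return 0;                      /* processor-wait clear: still busy */
    }
    return dev->final_state;
}

/* Poll until processor-wait is set. Returns 1 on success, 0 on failure,
 * -1 if max_polls reads elapse first. *attempts receives the trial count. */
static int ex_wait_done(ex_fake_net_out *dev, int max_polls, unsigned *attempts)
{
    assert(dev != 0 && attempts != 0);
    for (int i = 0; i < max_polls; i++) {
        uint32_t s = ex_read_state(dev);
        if (s & EX_STATE_PROC_WAIT) {
            *attempts = (s >> EX_STATE_ATTEMPTS_SHIFT) & EX_STATE_ATTEMPTS_MASK;
            return (s & EX_STATE_SUCCESS) ? 1 : 0;
        }
    }
    return -1;
}
```

The poll limit is an addition for the sketch; it keeps a wedged interface from hanging the caller and gives a natural place to interleave other work.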

Table summarizes the information stored in each status double word. When configured to offload checksums, every other double word stored to the status-pointer address contains the router checksums collected along with the preceding status double word. Table summarizes the possible problem indications which may be stored in the problem-indications field of the status word.

Input Message Queue Handling

When remote function invocation requests arrive at a network input MLINK, the incoming message is stored away in a buffer in memory. The address of the buffer is taken from the front of the receiving MLINK's buffer address queue. Each network input keeps its own address queue of buffers for incoming messages. The processor is responsible for adding addresses to these queues when they run short.

Queue Operations

The processor can add an entry to a network input's address queue by writing the address to the NEXT_QUEUE_PTR address. The network inputs do not coordinate on queue maintenance, so the queue pointers should be unique between the two network interfaces. The processor can check the status of a network input's queue by reading the QUEUE_PTRS address on the network input. Table shows the encoding of this register. Again, the symbolic values are detailed in metro_link.h and should be considered implementation dependent. As shown, the register contains the current value of the head and tail pointers of the queue and bits indicating if the queue is full or depleted.

In the current implementation (NIACT-ORBIT (tn92)), the queues can hold up to seven entries. When the tail entry is 7 greater (mod 8) than the head entry, the queue is full and the queue full bit is set. When the entries are equal, the no-pointers bit is set and the queue is empty. The tail pointer is automatically incremented when a new value is written to the NEXT_QUEUE_PTR address. The head pointer is incremented after MLINK finishes writing an uncorrupted remote function request into the buffer specified by the head pointer. The processor should use the current value of the head pointer to determine which buffers have been filled with incoming message data and are ready for processing. Figures and show typical code for servicing the MLINK queues.
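The head/tail arithmetic described above can be checked with a short sketch. It assumes the head and tail values have already been extracted from QUEUE_PTRS as 3-bit counters (0 through 7); the function names are illustrative and are not taken from metro_link.h.

```c
#include <assert.h>

#define EX_QUEUE_SLOTS 8u   /* pointers count mod 8; at most 7 entries are held */

/* Number of buffer addresses currently queued (free buffers awaiting messages). */
static unsigned ex_queue_count(unsigned head, unsigned tail)
{
    assert(head < EX_QUEUE_SLOTS && tail < EX_QUEUE_SLOTS);
    return (tail + EX_QUEUE_SLOTS - head) % EX_QUEUE_SLOTS;
}

/* Full: tail is 7 greater (mod 8) than head. */
static int ex_queue_full(unsigned head, unsigned tail)
{
    return ex_queue_count(head, tail) == EX_QUEUE_SLOTS - 1u;
}

/* Empty (no pointers): head and tail are equal. */
static int ex_queue_empty(unsigned head, unsigned tail)
{
    return ex_queue_count(head, tail) == 0u;
}
```

Reserving one of the eight positions is what lets "equal" mean empty and "7 apart" mean full without a separate count.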

Enqueued Data

The data stored in the buffer is formatted as shown in Figure . The first word is composed from the remote address and the configured high address byte (see Section ). This word should be directly usable as an address to invoke the remote function handler. The second word is composed from the operation identification and the data length. The third and following words are the data which was sent along with the message. Data words appear sequentially in the same order they appeared in the outgoing memory buffer.
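A receiving node might unpack a filled buffer along the lines of the sketch below. The word order follows the description above, but the split of the second word between operation identification and length is a hypothetical layout, as are the names and the assumption of 32-bit words.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL split of the second word; see the buffer format figure
 * and metro_link.h for the real one. */
#define EX_BUF_LEN_MASK   0xFFFFu
#define EX_BUF_OPID_SHIFT 16

typedef struct {
    uint32_t        handler_addr;  /* word 0: directly usable handler address */
    uint32_t        op_id;         /* word 1: operation identification */
    uint32_t        len;           /* word 1: data length */
    const uint32_t *data;          /* words 2..: message data, in sending order */
} ex_invocation;

/* Decode one enqueued remote function invocation request. */
static ex_invocation ex_parse_buffer(const uint32_t *buf)
{
    assert(buf != 0);
    ex_invocation inv;
    inv.handler_addr = buf[0];
    inv.op_id        = buf[1] >> EX_BUF_OPID_SHIFT;
    inv.len          = buf[1] & EX_BUF_LEN_MASK;
    inv.data         = &buf[2];
    return inv;
}
```

What the software does with the decoded request, including whether it jumps to handler_addr directly, is left to the software system.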

Service Request

To obviate the need for continuous polling, each METRO LINK component has a service request line. The component will assert this line whenever it is in need of service from the processor. The net-out component asserts service request whenever it is in an idle state waiting on the processor. The net-in component asserts service request whenever its address queue buffer is not full.

Cycle Counter

Each NET-OUT MLINK component has a running cycle counter. This counter is cleared by the MLINK's reset and increments once every network clock cycle.

Synchronization

Since the processor and the METRO LINK network interfaces on a node access the same memory, it is necessary to properly synchronize the regions of memory which they share. When METRO LINK is given a buffer pointer (in-buf, out-buf, status, input buffer queue pointer) it assumes it has the right to read or write that pointer as necessary. In some cases, the processor will want to read a buffer after METRO LINK has placed data in it. Notably:

  1. queue input buffer following message arrival through a net-in
  2. status following an operation launch attempt with the net-out configured to offload status
  3. in-buf following a remote read by a net-out

The NEXT_PTR field in QUEUE_PTRS is incremented only after the buffer data at the specified pointer address is verified correct and completely written into memory. This pointer increment should be used as the guard for accessing the associated buffer data. Note that the full indication and consequently the service request line are derived from the pointer values, so the deassertion of full or the assertion of service request serves to properly notify the processor that message arrival and storage are complete. Once a buffer is written and the NEXT_PTR field is incremented to the subsequent address, MLINK will not touch the associated buffer memory any more.

The attempt count (in the STATE register) is only incremented after the respective data it represents has been offloaded to memory. It is safe to read status data up to the current value of the attempt counter. For an example, see how the sample code in Appendix uses the attempt count to guard status reads.
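One way to turn the attempt count into a safe read bound is sketched below. The per-mode counts are an inference from the offload mode descriptions earlier in this note, not a statement of the hardware's exact behavior, and the enum is a stand-in for the OFFLOAD_ constants in metro_link.h.

```c
#include <assert.h>

/* Stand-ins for the OFFLOAD_* modes; values here are arbitrary. */
typedef enum {
    EX_OFFLOAD_NO_STATUS,
    EX_OFFLOAD_FAIL_STATUS,
    EX_OFFLOAD_ALL_STATUS,
    EX_OFFLOAD_CHECKSUMS
} ex_offload_mode;

/* Number of double words at STATUS_PTR that are safe to read once the
 * attempt count reads 'attempts'. ASSUMES one status double word per
 * recorded attempt, and that the attempt count includes the final,
 * successful trial. */
static unsigned ex_status_dwords(ex_offload_mode mode, unsigned attempts, int success)
{
    switch (mode) {
    case EX_OFFLOAD_NO_STATUS:   return 0;
    case EX_OFFLOAD_FAIL_STATUS: return (success && attempts > 0) ? attempts - 1 : attempts;
    case EX_OFFLOAD_ALL_STATUS:  return attempts;
    case EX_OFFLOAD_CHECKSUMS:   return 2 * attempts;  /* status, checksums, status, ... */
    }
    return 0;
}

/* In checksum mode, the status for trial i sits at an even index and its
 * checksums follow in the next double word. */
static unsigned ex_status_index(ex_offload_mode mode, unsigned trial)
{
    assert(mode != EX_OFFLOAD_NO_STATUS);
    return (mode == EX_OFFLOAD_CHECKSUMS) ? 2 * trial : trial;
}
```

Reading no further than this bound keeps the processor out of memory that MLINK may still be writing.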

Processor wait guards the completion of net-out operations. When processor wait is asserted, the memory from a remote read should be intact at the designated in-buf pointer. When processor wait is asserted, it is safe to launch a new operation.

Summary

In summary, to support communications over MLINK network interface components, the software must:

  1. Configure each MLINK appropriately for the intended use and attached network configuration
  2. Launch operations on network-outputs and monitor their progress and success/failure
  3. Fill queues with buffer addresses on the network inputs as the buffers are used
  4. Note when network input queue buffers have been filled with data and service them appropriately

All operations on the MLINK components are performed through memory-mapped reads and writes to a block of network interface addresses. No intervention is required on the receiving node to handle primitive NOOP, READ, WRITE, and RESET operations. The software on the receiving node is fully responsible for handling remote function invocation requests once they have been queued to node memory.

Samples

At present, all of the test routines which exercise the network interface are written in PCL for the LMC models for simulation under Verilog-XL. These routines live in /home/tr/designs/cni/test_common and all have the .prog extension. The programs there should provide a good basic set of examples demonstrating how to program MLINK network interfaces.

See Also...

mbta.h

metro_link.h

mbta_get_no_status

References

DeH91
Andre DeHon. MBTA: Quick Overview. Transit Note 38, MIT Artificial Intelligence Laboratory, January 1991.

DeH92
Andre DeHon. METRO LINK -- METRO Network Interface. Transit Note 75, MIT Artificial Intelligence Laboratory, September 1992.

DeH93a
Andre DeHon. METROJR-ORBIT Datasheet. Transit Note 90, MIT Artificial Intelligence Laboratory, August 1993.

DeH93b
Andre DeHon. NIACT-ORBIT Datasheet. Transit Note 92, MIT Artificial Intelligence Laboratory, August 1993.

DeH93c
Andre DeHon. NOACT-ORBIT Datasheet. Transit Note 91, MIT Artificial Intelligence Laboratory, August 1993.

E +92
Thorsten von Eicken et al. Active Messages: a Mechanism for Integrated Communication and Computation. In Proceedings of the 19th Annual Symposium on Computer Architecture, Queensland, Australia, May 1992.

EDP +92
Eran Egozy, Andre DeHon, Samuel Peretz, Henry Minsky, and Thomas F. Knight Jr. METRO Architecture. Transit Note 73, MIT Artificial Intelligence Laboratory, August 1992.

Log92
Logic Modeling Corporation. SmartModel Library Reference Manual, 1992.

Min91
Henry Q. Minsky. RN1 Data Router -- B revision. Transit Note 45, MIT Artificial Intelligence Laboratory, May 1991.

MIT Transit Project