Transit Note #36

Network Interface

Implementation Notes

Andre DeHon

Original Issue: January 1991

Last Updated: Mon Nov 8 14:52:15 EST 1993

Overview

This document is a series of notes on the implementation of the network interface component.

Node Clocking

Basic Configuration

The network interface receives its primary clock, NCLK, from the network. NCLK is the same clock seen by the RN1 routing components composing the network. This clock is used for all FSMs and most registers inside the network interface. NCLK is divided by two in frequency and provided as a 5V CMOS-level signal to the node. The node then buffers this clock and feeds versions of it to all parts requiring a clock. An instance of the buffered node clock is fed back into the network interface. This input, NODE_CLK, is what the network interface actually uses on all registers which communicate directly with the node. This scheme allows a bounded amount of skew between NCLK and NODE_CLK while keeping the frequencies of the two clocks closely coupled. Figure shows this clock arrangement. See (tn37) for a more global description of the MBTA clocking strategy.
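As a rough illustration (and not part of the design itself), the Python sketch below models NODE_CLK as a divide-by-two of NCLK with a fixed buffering delay. The 10ns NCLK period matches the network cycle time mentioned later in these notes; the skew value is an arbitrary placeholder.

    # Sketch only: model the divide-by-two relationship between NCLK and the
    # buffered NODE_CLK returned by the node. Parameter values are assumed.
    NCLK_PERIOD_NS = 10.0        # network clock period (10ns network cycle)
    NODE_BUFFER_SKEW_NS = 3.0    # assumed node buffering delay (placeholder)

    def nclk_rising_edges(n):
        """Times of the first n rising edges of NCLK."""
        return [i * NCLK_PERIOD_NS for i in range(n)]

    def node_clk_rising_edges(n_nclk_edges):
        """NODE_CLK runs at half the NCLK frequency and lags it by the
        buffering skew, so it rises on every other NCLK edge, plus the skew."""
        return [t + NODE_BUFFER_SKEW_NS
                for i, t in enumerate(nclk_rising_edges(n_nclk_edges))
                if i % 2 == 0]

    print(nclk_rising_edges(6))      # [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
    print(node_clk_rising_edges(6))  # [3.0, 23.0, 43.0]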

Phase Relationships

Figure shows the phase delays between the various clock signals shown in Figure . This diagram underscores the point that there will be a non-trivial phase difference between NCLK and NODE_CLK. Table names the timing parameters used throughout this description.

Bridging between Clocks

Since the network interface runs on two distinct clocks, care must be taken in communicating between portions of the design clocked on different clocks. Figure shows the configuration for synchronizing signals between the two clocks. All network interface flip-flops, except for those communicating directly with the node, are clocked with NCLK. Those communicating with the node are clocked with NODE_CLK. We synchronize control signals from the FSMs to flip-flops running from NODE_CLK by using a single D flip-flop clocked by NCLK. This strategy has the effect of holding all control signals past the end of an NCLK cycle so they are valid at the end of the corresponding NODE_CLK cycle.

Bounds on Clock Skew

This strategy puts a few constraints on the skew between NCLK and NODE_CLK. First, the control signals clocked into the synchronizing D flip-flop must be valid on the output of the D flip-flop prior to the rising edge of NODE_CLK. This requires that the phase difference be at least as large as the time for data to settle on the flip-flop outputs following a rising clock edge ().

Additionally, the rising edge of NODE_CLK must occur sufficiently before the next rising edge of NCLK to meet the hold times required for the control signal.
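These two bounds can be captured as a pair of inequalities. The sketch below encodes them with assumed parameter names (the timing-parameter table referenced above is not reproduced here); the skew is taken as the delay from an NCLK rising edge to the corresponding NODE_CLK rising edge.

    # Sketch only: check the two skew constraints described above.
    # All parameter names are assumed, not taken from the timing table.
    def skew_ok(t_skew_ns, t_clk_to_q_ns, t_hold_ns, nclk_period_ns=10.0):
        """Return True if the NCLK -> NODE_CLK skew satisfies both bounds."""
        # 1. The synchronizing flip-flop's output must settle before the
        #    NODE_CLK rising edge: the skew must exceed the clock-to-Q delay.
        lower_bound_ok = t_skew_ns >= t_clk_to_q_ns
        # 2. NODE_CLK must rise far enough before the next NCLK edge that the
        #    control signal's hold time at the NODE_CLK register is met.
        upper_bound_ok = nclk_period_ns - t_skew_ns >= t_hold_ns
        return lower_bound_ok and upper_bound_ok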

Clocking Discipline

With the synchronization strategy just described, we must still follow a discipline in providing data and control to these flip-flops.

Finite State Machines

This is currently very out of date. -- 8/17/93 -- andre

Three FSMs control the flow of data through the network interface component. The proc-io FSM handles read and write operations directly from the node processor to the network interface. Node-out handles the task of writing out network data to the node memory in accordance with the node's bus discipline. The central FSM coordinates all activity between the node and network.

The canonical source for these state machines currently resides in:

/home/ar/transit/mbta/ni/ucb/. All state machines are qualified with the extension .scm.

Proc-io FSM

This finite state machine recognizes bus transactions requiring the node's attention. As such it services all read and write operations performed to the network interface.

Figure shows the basic behaviour of the proc-io FSM.

Node-out FSM

The node-out FSM deals with staging data and coordinating with the node bus controller to write data out to node memory.

Figure shows the basic flow of control in the node-out FSM. To make the diagram readable, signal assertions have been omitted, leaving only the basic state machine flow. Loopback arrows with no qualification are taken when the conditions on the exit arrows are not satisfied.

The primary difficulty in this state machine is dealing with the arbitrary phase skew (relative to the component's bus cycles) of data arriving on the part. This skew arises because of skew in the sender due to which network part is sending (and it could be any of the 4) and how many stages of network exist; this is compounded by the fact that there can be an arbitrary number of dummy cycles. This means that the data can arrive on the part at virtually any phase relative to the component's cycles on the data bus. It could happen that the component gets a full chunk of data to write out just in time to stage it and write the data out on its bus cycle. It could also happen that the data arrives just one network cycle (10ns) too late, requiring that the data be held an entire emulation cycle before being offloaded to memory.

The consequence of this is that some efficiency in hardware and latency through the part is sacrificed to allow this generality. Given a particular number of dummy cycles and phase of arriving data, I'm sure it is possible to derive an optimal sequence of data manipulations that minimizes latency; however, I don't believe it is possible to derive one that works with the necessary generality. The data must be double buffered before being fed to the final output buffer. This inefficiency arises because the final output buffer is clocked on NODE_CLK and hence its input must be valid across two cycles; however, since the phase can be shifted by any amount, it is not possible to guarantee that the incoming data will be stable for any two clock cycles. Instead, we must be able to transfer incoming data to staged data in one cycle. The staged data can then take the two clocks necessary to transfer to the final output register. Also, it would be possible in some cases to note that data will be available a set number of clock cycles in the future and start arbitrating for bus access before the data is actually ready to be transferred to the staging registers. However, since there can be an arbitrary number of dummy cycles, it is not possible to make this prediction with sufficient generality in a simple manner. Thus, the current solution is inefficient in these respects. Perhaps it can be optimized somewhat after everything else is working.
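The following is a rough behavioural sketch (not the actual design) of this double-buffered path: a fully assembled word is captured into a staging register within a single NCLK cycle, and the final output register, clocked on the half-rate NODE_CLK, loads from the staging register. The FSM logic that decides exactly when each transfer may occur is omitted, and all names are illustrative.

    # Sketch only: double buffering between the NCLK and NODE_CLK domains.
    class NodeOutPath:
        def __init__(self):
            self.staged = None   # staging register, clocked on NCLK
            self.output = None    # final output register, clocked on NODE_CLK

        def nclk_edge(self, incoming_word, word_complete):
            """On an NCLK edge, capture a newly assembled word, whatever phase
            it arrived at, so it can sit still while the slower clock catches up."""
            if word_complete:
                self.staged = incoming_word

        def node_clk_edge(self, load_output):
            """On a NODE_CLK edge, transfer the staged word to the output
            register once the node-out FSM signals that it is safe to do so."""
            if load_output:
                self.output = self.staged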

Note in Figure that there is a branch of the FSM taken when DUMMY0 is not asserted. DUMMY0 flags the particular case in which the number of dummy cycles is set to zero. In this case, write operations are overlapped, with the enable and want bus signals for the next write asserted with the address for the last write. In all other cases (i.e. the number of dummy cycles is greater than zero), writes start and complete in the two node cycles following end of cycle and are not overlapped. See the end of cycle timing diagrams in (tn31).

Central FSM

Shared Bus Usage

(tn31) describes when each network interface part and the processor reference the shared data, address, and control busses. In setting this up in the network interfaces, I found a table like the one shown in Table useful.

Resets and Initialization

SINIT should not reset the phase of the network parts with respect to the phase set by HINIT. Resetting the phase could leave the 80960 out of phase after SINIT.

Things reset by HINIT (only):

Things reset by SINIT (only):

Things reset by CINIT = HINIT or SINIT (only):

Status Bits

Status bits are collected by the status_shift_registers. At present, the shift registers stop collecting status information on the first error (i.e. shifting in status bits after an error has been detected could cause the failure bit to be erroneously reset).
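A minimal sketch of this freeze-on-first-error behaviour is given below; the names are illustrative and not taken from the actual status_shift_registers implementation.

    # Sketch only: a status shift register that stops collecting after the
    # first error, so later status bits cannot clear the recorded failure.
    class StatusShiftRegister:
        def __init__(self, width):
            self.bits = [0] * width
            self.error_seen = False

        def shift_in(self, status_bit, error_detected):
            if self.error_seen:
                return                   # frozen after the first error
            self.bits = [status_bit] + self.bits[:-1]
            if error_detected:
                self.error_seen = True   # latch the error; stop collecting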

See Also...

References

DeH90a
Andre DeHon. Forward Checksum. Transit Note 6, MIT Artificial Intelligence Laboratory, May 1990. [tn6 HTML link] [tn6 FTP link].

DeH90b
Andre DeHon. MBTA: Boot Sequence. Transit Note 28, MIT Artificial Intelligence Laboratory, July 1990. [tn28 HTML link] [tn28 FTP link].

DeH90c
Andre DeHon. MBTA: Message Formats. Transit Note 21, MIT Artificial Intelligence Laboratory, June 1990. [tn21 HTML link] [tn21 FTP link].

DeH90d
Andre DeHon. MBTA: Modular Bootstrapping Transit Architecture. Transit Note 17, MIT Artificial Intelligence Laboratory, April 1990. [tn17 HTML link] [tn17 FTP link].

DeH90e
Andre DeHon. MBTA: Network Initialization. Transit Note 27, MIT Artificial Intelligence Laboratory, July 1990. [tn27 HTML link] [tn27 FTP link].

DeH90f
Andre DeHon. MBTA: Network Interface. Transit Note 31, MIT Artificial Intelligence Laboratory, August 1990. [tn31 HTML link] [tn31 FTP link].

DeH90g
Andre DeHon. MBTA: Network Level Transactions. Transit Note 19, MIT Artificial Intelligence Laboratory, June 1990. [tn19 HTML link] [tn19 FTP link].

DeH90h
Andre DeHon. MBTA: Thoughts on Construction. Transit Note 18, MIT Artificial Intelligence Laboratory, June 1990. [tn18 HTML link] [tn18 FTP link].

DeH90i
Andre DeHon. T-Station: The MBTA Host Interface. Transit Note 20, MIT Artificial Intelligence Laboratory, June 1990. [tn20 HTML link] [tn20 FTP link].

DeH91
Andre DeHon. MBTA: Clocking Strategy. Transit Note 37, MIT Artificial Intelligence Laboratory, January 1991. [tn37 HTML link] [tn37 FTP link].

DS90a
Andre DeHon and Thomas Simon. MBTA: Node Architecture. Transit Note 25, MIT Artificial Intelligence Laboratory, July 1990. [tn25 HTML link] [tn25 FTP link].

DS90b
Andre DeHon and Thomas Simon. MBTA: Node Bus Controller. Transit Note 30, MIT Artificial Intelligence Laboratory, August 1990. [tn30 HTML link] [tn30 FTP link].

Min90
Henry Q. Minsky. RN1 Data Router. Transit Note 26, MIT Artificial Intelligence Laboratory, July 1990. [tn26 HTML link] [tn26 FTP link].

MIT Transit Project