Network Interface
Implementation Notes
Andre DeHon
Original Issue: January 1991
Last Updated: Mon Nov 8 14:52:15 EST 1993
This document is a series of notes on the implementation of the network interface component.
Figure shows the phase delays between the various clock signals shown in Figure . This diagram underscores the point that there will be a non-trivial phase difference between NCLK and NODE_CLK. Table names the timing parameters used throughout this description.
Since the network interface runs on two distinct clocks, care must be taken in communicating between portions of the design clocked on different clocks. Figure shows the configuration for synchronizing signals between the two clocks. All network interface flip-flops, except for those communicating directly with the node, are clocked with NCLK. Those communicating with the node are clocked with NODE_CLK. We synchronize control signals from the FSMs to flip-flops running from the NODE_CLK by using a single D flip-flop clocked by NCLK. This strategy has the effect of holding all control signals past the end of an NCLK cycle so they are valid at the end of the corresponding NODE_CLK cycle.
This strategy puts a few constraints on the skew between NCLK and NODE_CLK. First, the control signals clocked into the synchronizing D flip-flop must be valid on the output of the D flip-flop prior to the rising edge of NODE_CLK. This requires that the phase difference be at least as large as the time for data to settle on the flip-flop outputs following a rising clock edge ().
Additionally, the rising edge of NODE_CLK must occur sufficiently before the next rising edge of NCLK to meet the hold times required for the control signal.
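The two constraints above can be summarized in a small check. This is an illustrative sketch only, not part of the actual design database; the parameter names are placeholders, not names used in the design.

```python
# Sketch: checks the two NCLK/NODE_CLK skew constraints described above.
# All parameter names are illustrative placeholders.

def skew_ok(phase_diff_ns, t_clk_to_q_ns, t_hold_ns, nclk_period_ns):
    """Return True if the NCLK -> NODE_CLK phase difference satisfies
    both synchronization constraints.

    phase_diff_ns   -- delay from a rising NCLK edge to the following
                       rising NODE_CLK edge
    t_clk_to_q_ns   -- settling time of the synchronizing D flip-flop output
    t_hold_ns       -- hold time required on the control signal
    nclk_period_ns  -- NCLK period
    """
    # 1. The synchronizer output must settle before the rising edge
    #    of NODE_CLK.
    setup_met = phase_diff_ns >= t_clk_to_q_ns
    # 2. NODE_CLK must rise sufficiently before the next NCLK edge
    #    (which may change the control signal) to meet the hold time.
    hold_met = (nclk_period_ns - phase_diff_ns) >= t_hold_ns
    return setup_met and hold_met
```

A phase difference that is too small violates the first constraint; one that is too large violates the second, so the allowable skew is a window rather than a single bound.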
With the synchronization strategy just described, we must still follow a discipline in providing data and control to these flip-flops.
Three FSMs control the flow of data through the network interface component. The proc-io FSM handles read and write operations directly from the node processor to the network interface. Node-out handles the task of writing out network data to the node memory in accordance with the node's bus discipline. The central FSM coordinates all activity between the node and network.
The canonical source for these state machines currently resides in:
/home/ar/transit/mbta/ni/ucb/. All state machines are qualified with the extension .scm.
This finite state machine recognizes bus transactions requiring the node's attention. As such it services all read and write operations performed to the network interface.
Figure shows the basic behaviour of the proc-io FSM.
The node-out FSM deals with staging data and coordinating with the node bus controller to write data out to node memory.
Figure shows the basic flow of control in the node-out FSM. To make the diagram readable, signal assertions have been omitted, leaving only the basic state machine flow. Loopback arrows with no qualification are taken when the conditions on the exit arrows are not satisfied.
The primary difficulty in this state machine is dealing with the arbitrary phase skew (relative to the component's bus cycles) of data arriving on the part. This skew arises from skew in the sender, which depends on which network part is sending (it could be any of the 4) and on how many stages of network exist; this is compounded by the fact that there can be an arbitrary number of dummy cycles. As a result, data can arrive at the part at virtually any phase relative to the component's cycles on the data bus. The component may get a full chunk of data to write out just in time to stage it and write it out on its bus cycle. Alternatively, the data may arrive just one network cycle (10ns) too late, requiring that it be held an entire emulation cycle before being offloaded to memory.
The consequence is that some efficiency in hardware and latency through the part is sacrificed to allow this generality. Given a fixed number of dummy cycles and a known phase of arriving data, I'm sure it is possible to derive an optimal sequence of data manipulations that minimizes latency -- however, I don't believe it is possible to derive one that works with the generality necessary. The data must be double buffered before being fed to the final output buffer. This inefficiency arises because the final output buffer is clocked on NODE_CLK and hence its input must be valid across two cycles; however, since the phase can be shifted by any amount, it is not possible to guarantee that the data will be stable for any two clock cycles. Instead, we must be able to transfer incoming data to staged data in one cycle. The staged data can then take the two clocks necessary to transfer to the final output register. It would also be possible in some cases to note that data will be available a set number of clock cycles in the future and start arbitrating for bus access before the data is actually ready to be transferred to the staging registers. However, since there can be an arbitrary number of dummy cycles, it is not possible to make this prediction with sufficient generality in a simple manner. Thus, the current solution is inefficient in these respects. Perhaps it can be optimized some after everything else is working.
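The double-buffering discipline above can be sketched as a small behavioral model. This is not the actual FSM or RTL; it only models the register-to-register movement, and the class and method names are illustrative.

```python
# Minimal sketch of the double buffering described above: incoming data is
# captured into a staging register in a single NCLK cycle (at whatever phase
# it arrives), and the staged word then takes the two NCLK cycles available
# to it to reach the final output register clocked on NODE_CLK.
# Names are illustrative, not taken from the design.

class OutputPath:
    def __init__(self):
        self.staging = None   # written in a single NCLK cycle
        self.output = None    # final output register, clocked on NODE_CLK

    def nclk_tick(self, incoming=None):
        """On an NCLK edge, capture incoming data (if any) into the staging
        register; this works regardless of the arrival phase."""
        if incoming is not None:
            self.staging = incoming

    def node_clk_tick(self):
        """On a NODE_CLK edge, transfer the staged word to the final output
        register; the staging register held it stable for the two NCLK
        cycles the slower clock spans."""
        if self.staging is not None:
            self.output = self.staging
            self.staging = None
        return self.output
```

The point of the model is that `nclk_tick` can accept data on any cycle, while `node_clk_tick` only ever reads data that has been stable in `staging`, mirroring why the extra buffer stage is unavoidable when the arrival phase is unconstrained.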
Note in Figure that there is a branch of the FSM taken when DUMMY0 is not asserted. DUMMY0 flags the particular case in which the number of dummy cycles is set to zero. In this case, write operations are overlapped: the enable and want bus signals for the next write are asserted along with the address for the last write. In all other cases (i.e., when the number of dummy cycles is greater than zero), writes start and complete in the two node cycles following end of cycle and are not overlapped. See the end of cycle timing diagrams in (tn31).
(tn31) describes when each network interface part and the processor reference the shared data, address, and control busses. In setting this up in the network interfaces, I found a table as shown in Table useful.
SINIT should not reset the phase of the network parts with respect to the phase set by HINIT. Resetting the phase could leave the 80960 out of phase after SINIT.
Things reset by HINIT (only):
Things reset by SINIT (only):
Things reset by CINIT= HINIT or SINIT (only):
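The reset-domain partitioning above reduces to a simple rule: CINIT is the OR of HINIT and SINIT. The sketch below is illustrative only; the group names are placeholders for the state enumerated in the lists.

```python
# Sketch of which reset domains fire for a given combination of reset
# signals. CINIT = HINIT or SINIT, per the lists above.

def apply_resets(hinit: bool, sinit: bool) -> dict:
    """Return which state groups are reset. Group names are placeholders
    standing in for the state enumerated in the lists above."""
    cinit = hinit or sinit
    return {
        "hinit_only_state": hinit,
        "sinit_only_state": sinit,
        "cinit_state": cinit,
    }
```

For example, a soft reset (SINIT alone) clears the SINIT-only and CINIT state but leaves the HINIT-only state, and in particular the network-part phase, untouched.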
Status bits are collected by the status_shift_registers. At present, the shift registers stop collecting status information on the first error (i.e., shifting in status bits after an error has been detected could cause the failure bit to be erroneously reset).
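The stop-on-first-error behavior can be sketched as follows. This is an illustrative model, not the actual shift-register logic; the width and names are assumptions.

```python
# Sketch of the status shift register's freeze-on-error behavior: once an
# error bit has been shifted in, the register stops shifting, so later
# status bits cannot push the failure indication out or reset it.
# Names and width are illustrative.

class StatusShiftRegister:
    def __init__(self, width=8):
        self.bits = [0] * width
        self.frozen = False

    def shift_in(self, status_bit):
        if self.frozen:
            return               # hold state after the first error
        self.bits = [status_bit] + self.bits[:-1]
        if status_bit:
            self.frozen = True   # error detected: stop collecting

    def error_seen(self):
        return self.frozen
```

Freezing rather than continuing to shift is what prevents the erroneous reset mentioned above: a later stream of good status bits would otherwise shift the error indication out of the register.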