Transit Note #4

Shared Randomness

Andre DeHon

Original Issue: May 1990

Last Updated: Tue Nov 9 12:28:58 EST 1993

Problem

As mentioned in (tn1), we would like to be able to cascade routing components in parallel to construct networks with wider data paths. When components are cascaded, synchronization problems arise in keeping the parallel components working together: the cascaded components must switch in the same manner. This poses a potential problem when components randomly select outputs. The problem is exacerbated further down the network, where cascaded components may get inputs from different numbers of routing components due to faults in the network.

Analyzing the Problem

Essentially, we need to be able to keep the state of a set of parallel cascaded routing components identical. While the number of state bits in each router is kept to a minimum, this number is too large to make the state externally visible for sharing among routers. We must be clever about how we guarantee cascaded components stay in sync.

Since each router is self-routing, each router needs to see essentially the same routing bits. To achieve this, we must ensure that each byte in the first word of the parallel data stream is identical and specifies the path through the network. While this is easy, there is an additional requirement that makes the problem hard.

In order for the subsequent routers in the network to see identical inputs across a cascade of parallel routers, all routers in a cascade must route outputs through the same physical ports (i.e., properly corresponding ports between routers). Additionally, the subsequent stages should get inputs from all the routers in the cascade. If one of the routers in a cascade receives fewer inputs during an allocate cycle than its neighbors, that router may allocate output ports incorrectly, causing problems at subsequent routing stages.

The random selection of an output port among logically equivalent output ports must be identical between routing components in a cascade.

This leaves us with two problems inherent in cascading routing components.

  1. The indeterminacy of the selection of physical routing ports due to pseudo-random port selection makes it difficult to keep cascaded routing components in sync.

  2. As long as the individual byte chunks of the parallel data path can fail independently, downstream routers in a single cascade can see different allocations when faults occur.

Solutions

Sharing

We want to have a reasonably random selection among output ports and consistency among a cascade of routing components. To achieve this, we can have a random bus that feeds into each routing component in the cascade. Each router has a single random out line that supplies one pseudo-random bit. This bit is probably generated as an XOR function of some or all of the data bits passing through the component. Each line in the random bus is driven by the random out of a different component in the cascade.

Each chip ``randomly'' selects among outputs based on the value it reads from the random bus. That is, each of the components performs the same logical manipulations on the value read from the random bus in order to determine the random selection for each set of logically equivalent output ports. The simplest scheme would be to use a set of bits from the random bus for each set of outputs.
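The following sketch (a hypothetical software model, not the RN1 hardware) illustrates the simplest scheme above: each router slices the shared bus value identically, dedicating one group of bits to each set of logically equivalent output ports, so every router in the cascade arrives at the same "random" selections.

```python
# Hypothetical model of shared-random port selection; the radix and
# dilation values assume an RN1-style component.

RADIX = 4      # number of sets of logically equivalent output ports
DILATION = 2   # ports per logically equivalent set

def select_ports(random_bus_value):
    """Map the shared random-bus value to one port selection per
    output set by slicing off a fixed group of bits for each set."""
    bits_per_set = (DILATION - 1).bit_length()   # bits to pick among DILATION ports
    mask = (1 << bits_per_set) - 1
    return [(random_bus_value >> (s * bits_per_set)) & mask
            for s in range(RADIX)]

# Two routers reading the same bus value always make identical choices:
assert select_ports(0b1010) == select_ports(0b1010)
assert select_ports(0b1010) == [0, 1, 0, 1]
```

Because the function is deterministic in the bus value, agreement across the cascade follows directly from all components reading the same bus.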

One good heuristic for deciding the width of the random bus would be to give it enough bits that the random bus bits could be used unqualified to specify the random information for all the output ports on the component. If the component has a dilation of d, each set of logically equivalent output ports will require log2(d) bits. The switch will have radix r. The total number of bits required, and hence the desired width of the random bus, will be:

    w = r x log2(d)
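As a quick check of this sizing heuristic (assuming the reconstructed width formula w = r x log2(d)), the arithmetic can be written out directly:

```python
import math

def random_bus_width(radix, dilation):
    """Random-bus width heuristic: log2(dilation) bits to choose
    among each set of equivalent ports, times radix sets."""
    return radix * int(math.log2(dilation))

# An RN1-style radix-4, dilation-2 component needs a 4-bit random bus,
# which matches a cascade of 4 components each driving one random out.
assert random_bus_width(4, 2) == 4
```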

Figure shows an example of the shared random configuration. This shows 4 RN1 style components (crossbars with dilation 2) cascaded to provide a 32-bit wide data path.

Pseudo-Atomic Multicomponent Failure

To keep the outputs of a router cascade in sync, we can connect the output control bits of corresponding output ports on the routing components together in a wired-and configuration. This guarantees that the outputs of a cascade of components will only request allocations in sync. If a single component in the cascade fails to allocate an output, the others will fail atomically with it.

Each output port also reads the control bit out line. If the component finds this line low when it has attempted to drive it high, the component drops the allocation of that port by resetting the in-use indication in the crosspoint array. This allows the components allocating the output port to sync with the component(s) not allocating it at the end of the clock cycle in which the allocate problem occurred and thus avoid propagating the problem any further. The output port is then uniformly not in use for each component in the cascade on the next cycle.
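The wired-and back-off described above can be sketched as follows (an illustrative model under the assumption of active-high control bits that any component can pull low):

```python
# Hypothetical one-cycle model of the wired-AND output control bit.

def resolve_allocations(drives):
    """drives[i] is True if component i attempted to allocate the port.

    The shared line carries the AND of every component's drive; a
    component that drove high but reads the line low drops its
    allocation, so the port ends the cycle uniformly in use or
    uniformly free across the cascade."""
    line = all(drives)                      # wired-AND of all drives
    return [drive and line for drive in drives]

# One component misses the allocate request: all components back off.
assert resolve_allocations([True, True, False, True]) == [False] * 4
# All components agree: the allocation stands across the cascade.
assert resolve_allocations([True, True, True, True]) == [True] * 4
```

The key property is that the result is identical for every component, which is exactly the synchronization invariant the cascade needs.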

In the current RN1 implementation, the addition of this kind of deallocation would be trivial. This requires no additional pins to implement.

Figure shows the wired-and configuration on the output control bits.

Analyzing Solution

Randomness

The shared random bus guarantees that all components in a cascade have the same set of random data. This resolves the problem of getting a cascade of components to agree on random bit determination.

Uniform Allocation Across Cascade

The wired-and control bit deals with most problems that occur when faults prevent a cascade of routing chips from either presenting a uniform set of allocated outputs or receiving a uniform set of allocate requests. Clearly, the wired-and guarantees that allocate requests leaving a cascade either atomically succeed or fail. If there are failures in the wires or inputs between the outputs of one cascade of chips and the inputs of the next stage cascade, the later stage should detect the problem in allocation with its wired-and output and correct it there. In this manner the problem is always detected within one network stage and the routers are synchronized.

If there are bit-flips, stuck bits, or other problems in the bits indicating routing direction, the wired-and will often catch the misrouting and inhibit the fragmented allocation from propagating. When the network is lightly loaded, it will always block the bad allocations.

Screw Case

There is only one obvious screw case that this scheme cannot detect. When multiple simultaneous allocate requests are presented to the cascade, a bit error in the routing bits to one router, occurring simultaneously with one or more requests for the same output direction, can splice two streams of data together. The bit error causes a misroute in one or more routers. In some routers, the outputs in one direction are all allocated to the non-error allocate requests; in the other router(s), one of those outputs is given to the request in error. The wired-and will detect the problem with a partially allocated set of outputs for the intended direction. However, since the control bits are set in the direction carrying both the erroneous and the correct requests, the routers cannot tell that two streams have been merged.
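To make the blind spot concrete, the following toy model (hypothetical port names, not RN1 signals) shows why the wired-and sees nothing wrong: it can only flag ports on which the components' allocation sets disagree, and in the splice case every router ends up with the same set of allocated ports, just carrying different streams.

```python
# Illustrative model: the wired-AND can only flag ports where the
# components' allocations differ, not which stream each port carries.

def wired_and_flags(port_allocations):
    """port_allocations[i] is the set of output ports component i
    allocated this cycle; return the ports the wired-AND drops
    because the components disagree about them."""
    common = set.intersection(*port_allocations)
    union = set.union(*port_allocations)
    return union - common

# A genuine partial allocation IS caught: router 1 missed port X1.
assert wired_and_flags([{"X0", "X1"}, {"X0"}, {"X0", "X1"}, {"X0", "X1"}]) == {"X1"}

# The splice screw case is NOT: a bit error makes router 3 put the
# wrong stream on X1, but every router still allocates {X0, X1},
# so the wired-AND sees no discrepancy.
assert wired_and_flags([{"X0", "X1"}] * 4) == set()
```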

If this error occurs in the middle of the network, it is quite probable that the routing bits will differ at subsequent stages and the error will eventually be caught and suppressed. If it occurs in the final stage of routing, the data will arrive spliced at the destination. At this point, a checksum on the entire data word can be used to detect the fact that an error has occurred (tn6).

Since allocate requests will probably occur in only a small fraction of a router's cycles, this screw case can occur only infrequently. When it does occur, the only problem it poses is providing a path of bad data through the network. The endpoints can use checksums and status information to detect and discard the bad data. The wired-and still keeps the allocation of resources identical between components in a cascade, assuring that the cascade can allocate subsequent requests correctly.
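An end-to-end check of the kind alluded to above can be sketched as follows. This is purely illustrative; the actual forward-checksum scheme is described in (tn6), and the toy additive checksum here is only an assumption for the example. The point is that a splice of two streams changes the data word, so a checksum recomputed at the destination fails even though no single stage saw an allocation discrepancy.

```python
# Toy end-to-end check (NOT the tn6 forward-checksum scheme): the
# source appends a checksum over the data word, the destination
# recomputes it, and any mid-stream splice is detected.

def checksum(words):
    """Simple 8-bit additive checksum over a list of byte values."""
    total = 0
    for w in words:
        total = (total + w) & 0xFF
    return total

stream_a = [0x12, 0x34, 0x56]            # the intended stream
stream_b = [0x9A, 0xBC, 0xDE]            # the stream spliced in by error
spliced  = stream_a[:2] + stream_b[2:]   # merged mid-stream

# The destination compares against the checksum sent with stream A:
assert checksum(spliced) != checksum(stream_a)
```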

Path Collapsing

This scheme may perform even better when combined with path collapsing (tn1). As soon as faults are recognized by the wired-and, the path can be incrementally collapsed across the cascade back to the source. If the problem occurred at a prior stage in the network, this would allow even faster recovery.

For example, consider a data stream splice occurring in the second stage of routing. Perhaps this splice is detected in the fourth stage of routing. The fourth stage drops the offending streams and propagates the drop back to the third stage. The drop then propagates stage by stage back through the network. This frees the network resources carrying the meaningless spliced data earlier than simply allowing the data to cross the entire network and letting the destination recognize the problem. The amount of benefit this offers will depend on the average length of data streams and the frequency of data stream splices.
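Under the assumption that the drop propagates back one stage per cycle (an assumption for illustration; the actual collapse timing is a property of the path-collapsing mechanism in (tn1)), the recovery schedule can be written out directly:

```python
# Hypothetical timing model of incremental path collapse: the
# detecting stage frees its ports immediately, and each earlier
# stage frees its ports one cycle later.

def collapse_schedule(detect_stage):
    """Return (cycle, stage_freed) pairs as the drop propagates
    from the detecting stage back toward the source (stage 1)."""
    return [(cycle, detect_stage - cycle)
            for cycle in range(detect_stage)]

# A splice detected at stage 4 frees stages 4, 3, 2, 1 on
# successive cycles, instead of holding them all until the stream
# crosses the whole network.
assert collapse_schedule(4) == [(0, 4), (1, 3), (2, 2), (3, 1)]
```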

Arbitrarily Cascadeable

Both the shared random bus and the wired-and allow us to cascade an arbitrary number of routing components. This cascadability is trivial to see for the wired-and. For the random bus, we need only note a few properties:

Physical Locality Requirement

This solution does require that components in a cascade be located physically close to each other, since they must share a few wires. This sharing prevents us from building byte-wide stacks and simply using them in parallel to achieve wider data paths.

Additional Resource Summary

This scheme will require only a few additional pins. Each chip will need a random out line and inputs for the full width of the random bus. Thus the total number of additional pins is the random bus width plus one.

The changes needed to use a random bus will be minor. They may even manage to use less silicon area than the current random scheme.

The control bit outputs will have to be modified so they can be pulled down by a wired-and, if they cannot be already.

Further Thought

Certainly, a few things might require some additional thought here. Are there any screw cases I missed? What is the frequency of the data splicing I described? How much does it degrade performance? It will probably be necessary to test the behavior of this scheme under ``typical'' loading conditions to get rough answers to these questions. We may want to provide more information in the status returned by the routing component. In particular, we might like to distinguish the case in which connections were dropped due to disagreement about output port allocation within a cascade.

References

DeH90a
Andre DeHon. Forward Checksum. Transit Note 6, MIT Artificial Intelligence Laboratory, May 1990.

DeH90b
Andre DeHon. RN1: Things to Improve Upon. Transit Note 1, MIT Artificial Intelligence Laboratory, April 1990.

MIT Transit Project