/******************************************************************************
 * Copyright (c) 2011, 2012 University of Pennsylvania
 *
 * Permission to use, copy, modify, and distribute this software and
 * its documentation for any purpose, without fee, and without a
 * written agreement is hereby granted, provided that the above copyright
 * notice and this paragraph and the following two paragraphs appear in
 * all copies.
 *
 * IN NO EVENT SHALL THE UNIVERSITY OF PENNSYLVANIA BE LIABLE TO ANY PARTY FOR
 * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 * THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF PENNSYLVANIA
 * HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * THE UNIVERSITY OF PENNSYLVANIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
 * AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON
 * AN "AS IS" BASIS, AND THE UNIVERSITY OF PENNSYLVANIA HAS NO OBLIGATIONS TO
 * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 *******************************************************************************/

CONTENTS

I.    FILES
II.   MODULE HIERARCHY
III.  PACKET FORMAT
IV.   HOW TO BUILD MESH NETWORK USING SWITCH MODULES
V.    ROUTING/ARBITRATION FUNCTIONS
VI.   BLUESPEC/VERILOG INTERFACE
VII.  TRACE BASED TESTBENCH
VIII. REFERENCE


I. FILES

|--README                         --README
|--design_parameters.par          --Design parameters
|--network_builder.pl             --parse design parameters; build the
|                                   network testbench (networkTb.bsv);
|                                   write design_parameters.defines
|--Makefile                       --make sim OR make verilog
|--                               --two types of primitive modules
|  |                                the default routing algorithm is
|  |                                dimension ordered routing (DOR)
|  |                              --all parametrized modules should remain
|  |                                intact
|  |--queue.bsv
|  |--splitN.bsv
|  |--mergeN.bsv
|  |--splitN_queue.bsv
|  |--mergeN_queue.bsv
|  |--switchN.bsv
|  |--bsPE.bsv                    --dummy PE module
|  |--switchC.bsv                 --special purpose switch modules
|  |--switchEV.bsv
|  |--switchEH.bsv
|  |--switch.bsv
|  `--Makefile                    --makefile used to compile switch level
|                                   modules
|
|--                               --two types of primitive modules
|  |                                the default routing algorithm is
|  |                                west side first routing (WSF)
|  `-(same contents as the DOR modules)
|
|--
|  |--trace_gen.pl                --generate trace files for simulation
|  |--gs_trace_gen44.pl           --generate 4x4 GraphStep trace files
|  |--gs_trace_gen88.pl           --generate 8x8 GraphStep trace files
|  |--analyzer.pl                 --used to analyse simulation log file
|  |--design_parameters.defines   --detailed design parameters for bsv
|  |                                modules
|  |--[compiled bsv files(*.bsv, *.ba, *.bo, *.o, *.h, etc.)]
|  |                              --bsv files from will be
|  |                                copied into and compiled into
|  |                                executable file(run ./networkTb)
|  |--networkTb.bsv               --Top module, generated by
|  |                                network_builder.pl
|  |--Ctrl_Injection_Rate.bin     --6-bit binary parameter to control the
|  |                                flit injection rate
|  |--[trace_files(*.bin)]        --trace files for each dummy PE
|  |--[log_file]                  --simulation log files, need to be
|  |                                specified when running the simulation
|  |                                (./networkTb > mesh.log)
|  `--Makefile                    --makefile used to compile network
|                                   testbench
|
|--                               --simulation directory for WSF mesh
|  `--
|
|--
|  |--implementation.sh           --scripts for ISE synthesis & PAR flow
|  |--xst_switch.script           --XST synthesis script
|  |--[Verilog modules(*.v)]      --verilog modules for switches/network
|  |--[ISE synthesis files]       --various ISE files
|  `--Makefile                    --makefile used to clean up verilog
|                                   modules
|--
|  |--                            --trace files for 4x4 GraphStep benchmarks
|  |  |-[name]_true_random_even.4.4.msg_record
|  |  |-[name]_true_random_even_load_balance_load_balance.4.4.msg_record
|  |  |-[name]_true_mlpart.1.2.false.false.1.4.4.msg_record
|  |  `-[name]_true_mlpart.1.2.false.false.1_load_balance_load_balance.4.4.msg_record
|  `--                            --trace files for 8x8 GraphStep benchmarks


II. MODULE HIERARCHY

-----------------------------------------------------------------------------
|                        Corner/Edge/Interior switch                        |
|           (switch.bsv; switchC.bsv; switchEV.bsv; switchEH.bsv)           |
|                                                                           |
|     ----------------         --------------         ----------------      |
|     |    Design    |         |  Routing   |         | Arbitration  |      |
|     |  Parameters  |         |  Function  |         |   Function   |      |
|     ----------------         --------------         ----------------      |
|             |                       |                       |             |
|             V                       V                       V             |
| ------------------------------------------------------------------------- |
| |                     general switch (switchN.bsv)                      | |
| |                                                                       | |
| | ---------------------------------     ------------------------------- | |
| | |      input-queued split       |     |     input-queued merge      | | |
| | |      (splitN_queue.bsv)       |     |     (mergeN_queue.bsv)      | | |
| | |      ---------    ----------  |     |   ---------                 | | |
---------->| queue | -> | splitN |----------> | queue |                 | | |
| | |      ---------    ----------  |     |   ---------                 | | |
| | |                               |     |       .        ----------   | | |
| | ---------------------------------     |       .     -> | mergeN |------->
| |                                       |       .        ----------   | | |
| |                                       |   ---------                 | | |
| |                                       |   | queue |                 | | |
| |                                       |   ---------                 | | |
| |                 .                     |                             | | |
| |                 .                     ------------------------------- | |
| |                 .                                    .                | |
| |                                                      .                | |
| |                                                      .                | |
| |                                                                       | |
| ------------------------------------------------------------------------- |
|                                                                           |
-----------------------------------------------------------------------------

The split/merge primitives are highly parametrized to make the design
flexible under various network configurations. Input-queued split/merge
modules combine FIFO queues with primitives at each input port, which
simplifies the internal interconnection when building general switches.
The general switch module is also fully parametrized and can be regarded as
an X*X crossbar without built-in routing and arbitration functions. The
special purpose switch modules (e.g. the DOR mesh switch on vertical edges)
contain all design parameters and functions needed to shape the general
switch module.


III. PACKET FORMAT

A packet in the split/merge network is composed of one head flit and a
variable number of body flits.

  Head flit: |1| destX | destY | PacketLength | Payload |
  Body flit: |0|                Payload                 |


IV. HOW TO BUILD MESH NETWORK USING SWITCH MODULES

1. assign routing and arbitration functions for corner, edge and interior
   modules; (refer to the next section for function details)

2. define design parameters in design_parameters.par;

3. build top module using network_builder.pl;
   The top module is the network testbench which assembles switches as well
   as dummy PEs for trace-based simulation.
     $perl network_builder.pl

4. compile bsv modules into executable or verilog files;
     $make simulate_1cyc OR $make simulate_2cyc
     $make verilog OR $make verilog_2cyc

5a. for simulation:
   a. goto sim directory;
        $cd sim_1cyc OR $cd sim_2cyc
   b. generate trace files for the network;
      1) for uniform random traffic:
           $perl trace_gen.pl <0-fixed pkt length:1-random pkt len>
      2) for GraphStep traffic:
           $perl gs_trace_gen44.pl , <0/1(rnd/mlpart)>, <0/1(non-lb/lb)>, ,
   c. set the traffic injection rate (a 6-bit binary value from 0 to 63;
      the value is divided by 64 to give the traffic injection rate);
        $echo "111111" > Ctrl_Injection_Rate.bin
   d. run the simulation;
        $./networkTb > mesh.log
   e. analyze the simulation log file;
        $perl analyzer.pl mesh.log

5b. for synthesis:
   a. goto verilog directory;
        $cd verilog_1cyc OR $cd verilog_2cyc
   b. modify the xst script file (xst_switch.script);
   c. modify user constraint (mkSwitch.ucf);
   d. modify synthesis & PAR properties in script (implementation.sh);
   e. run the synthesis & PAR flow;
        $sh implementation.sh


V. ROUTING/ARBITRATION FUNCTIONS

Routing and arbitration functions are passed as parameters to the general
switch module (mkSwitchN) when building special purpose switches.

1. routing function:

   function Bit#(sel_width) route_func(Bit#(x_len) destX, Bit#(y_len) destY,
                                       Bit#(x_len) myX, Bit#(y_len) myY,
                                       Vector#(num_fanout, Bit#(addr_width)) addr,
                                       int index);

   INPUT:
     destX - X coordinate of the packet destination
     destY - Y coordinate of the packet destination
     myX   - X coordinate of the switch
     myY   - Y coordinate of the switch
     addr  - used to indicate fullness of output buffers;
             used for output selection policy in adaptive routing
     index - used to index split primitives/I/O ports inside the switch;
             the default ordering of splits for a 5*5 switch in the mesh
             network is: 0-PE, 1-W, 2-E, 3-N, 4-S;
             (following the rule: PE < X directions < Y directions for
             corner and edge switches)
   OUTPUT:
     output port index (same order of I/O ports as mentioned above)

   E.g. routing function for the interior switch (switch.bsv), using DOR:
   -----------------------------------------------------------
   function Bit#(2) route4_dor_func(Bit#(X_len) destX, Bit#(Y_len) destY,
                                    Bit#(X_len) myX, Bit#(Y_len) myY,
                                    Vector#(4, Bit#(Addr_width)) addr,
                                    int index);
      //directions in [] are not selectable for DOR routing
      case (index) matches
         0:begin//PE->W,E,N,S
            if (destX < myX) return 0;
            else if (destX > myX) return 1;
            else if (destY < myY) return 2;
            else return 3;
         end
         1:begin//W->PE,W,N,S
            if (destX == myX && destY == myY) return 0;
            else if (destX != myX) return 1;
            else if (destY < myY) return 2;
            else return 3;
         end
         2:begin//E->PE,E,N,S
            if (destX == myX && destY == myY) return 0;
            else if (destX != myX) return 1;
            else if (destY < myY) return 2;
            else return 3;
         end
         3:begin//N->PE,[W],[E],N
            if (destX == myX && destY == myY) return 0;
            else return 3;
         end
         4:begin//S->PE,[W],[E],S
            if (destX == myX && destY == myY) return 0;
            else return 3;
         end
      endcase
   endfunction
   -----------------------------------------------------------

2. arbitration function:

   function Bit#(sel_width) arbitrate_func(Bit#(sel_width) curSel,
                                           Vector#(num_fanout, bit) valid_in,
                                           Vector#(num_fanout, Bit#(addr_width)) addr_in);

   INPUT:
     curSel   - used to indicate currently selected input port
     valid_in - used to indicate the data availability from each input port
     addr_in  - used to indicate the number of entries in each input buffer.
                The buffer fullness policy will choose the port containing
                smallest number of entries.
   OUTPUT:
     input port index

   E.g. arbitration function for the interior switch (switch.bsv), using a
   buffer-fullness based policy with a deterministic tie-break rule:
   -----------------------------------------------------------
   function Bit#(2) arbitrate4_func (Bit#(2) curSel,
                                     Vector#(4, bit) valid,
                                     Vector#(4, Bit#(Addr_width)) addr);
      Bit#(Addr_width) tmp = 0;
      Bit#(2) tmpSel = 0;
      //compare ports 0 and 1; port 0 wins a tie
      if (addr[0] >= addr[1]) begin
         tmp = addr[0];
         tmpSel = 0;
      end
      else begin
         tmp = addr[1];
         tmpSel = 1;
      end
      //compare the current winner with port 2
      if (tmp >= addr[2]) begin
      end
      else begin
         tmp = addr[2];
         tmpSel = 2;
      end
      //compare the current winner with port 3
      if (tmp >= addr[3])
         tmpSel = tmpSel;
      else
         tmpSel = 3;
      //if the chosen port has no data, fall back to the
      //highest-indexed port that is valid
      if (valid[tmpSel] == 0) begin
         for (Integer i = 0;i < 4;i = i + 1) begin
            if (valid[i] == 1)
               tmpSel = fromInteger(i);
         end
      end
      return tmpSel;
   endfunction
   -----------------------------------------------------------


VI. BLUESPEC/VERILOG INTERFACE

1. Bluespec switch module interface:
   The split/merge switches use a valid-backpressure flow control method,
   which uses two additional wires for control signals for each data
   channel. For Bluespec modules, the switch interface has the following
   format:

   //for input signals;
   method Action update(bpX_in, validX_in, dataX_in);
   //for output signals;
   method bit bpX_out;
   method bit validX_out;
   method Flit_width dataX_out;

2. Verilog switch module interface:
   Compared with the Bluespec interface, the Verilog module has two
   additional global signals, CLK and RST_N, and implicit signals EN_ and
   RDY_ for Bluespec methods.
   For the detailed Bluespec to Verilog name mapping, please refer to:
   Bluespec user-guide Chap. 9, "Verilog back end".


VII. TRACE BASED TESTBENCH

1. Simulation method
   For the detailed simulation method, please refer to my master's thesis
   "Split/Merge Based Network-On-Chip Design Targeting FPGA Platforms",
   Chap. 6.1, methodology.

2. Trace file format
   The trace files for cycle accurate simulation can be generated using
   trace_gen.pl. It creates two types of trace files: peMemory.X.Y.bin
   (one for each dummyPE; X and Y are the PE coordinates) and recMemory.bin.

   peMemory.X.Y.bin contains the message traces for one dummyPE. Its
   contents are:
     |1|000000000|              //beginning of each barrier-synchronized
                                //traffic step
     |0|destX|destY|num_flits|  //packet
     |1|111111111|              //EOF

   recMemory.bin contains the total number of flits for each dummyPE. The
   testbench uses this file to decide when all packets have arrived at
   their destinations and to terminate the simulation.


VIII. REFERENCE

@InProceedings{split_merge_fpt2012,
  author    = {Yutian Huan and Andr\'e DeHon},
  title     = {{FPGA} Optimized Packet-Switched {NoC} using Split and Merge Primitives},
  booktitle = {Proceedings of the International Conference on Field-Programmable Technology},
  year      = 2012,
  month     = {December},
  publisher = {IEEE},
  URL       = {http://ic.ese.upenn.edu/abstracts/split_merge_fpt2012.html}
}
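
Note on setting the injection rate (the Ctrl_Injection_Rate.bin step of the
simulation flow): the helper below is a minimal sketch and is not part of
the distribution. The function name to_bin6 is hypothetical. The sketch
assumes that the testbench reads Ctrl_Injection_Rate.bin as an ordinary
binary number with the most significant bit first; the only value shown in
this README, 111111, does not settle the bit order. It converts an integer
setting N in 0..63 into the 6-bit string, so that the requested injection
rate is N/64.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the switch modules).
# Converts an integer injection-rate setting in 0..63 into a 6-bit
# binary string. ASSUMPTION: most significant bit is written first.
to_bin6() {
    n=$1
    if [ "$n" -lt 0 ] || [ "$n" -gt 63 ]; then
        echo "rate setting must be in 0..63" >&2
        return 1
    fi
    bits=""
    i=0
    # Peel off the least significant bit six times, prepending each bit
    # so that the final string is ordered MSB first.
    while [ "$i" -lt 6 ]; do
        bits="$((n % 2))$bits"
        n=$((n / 2))
        i=$((i + 1))
    done
    printf '%s\n' "$bits"
}
```

Under these assumptions, "$to_bin6 48 > Ctrl_Injection_Rate.bin" would
request an injection rate of 48/64 = 0.75, and a sweep over several rates
can call the function once before each "./networkTb > mesh.log" run.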