Elastic Buffer Flow Control
Further research has been conducted on elastic buffer (EB) flow control for on-chip networks, exploring the design space of EB routers. EB routers are bufferless packet-switched routers: they offer the area and energy benefits of circuit-switched routers without the latency and cost overhead of setting up and tearing down circuits. EB routers use the master and slave latches of flip-flops (FFs) as independent storage locations. The pipeline FFs already present in the channels therefore provide the buffering, in place of input buffers at the routers.
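To make the idea of using the two latches of a flip-flop as independent storage locations concrete, the following is a minimal behavioral sketch, not the authors' RTL. It assumes a ready/valid handshake and models one channel FF as a two-slot queue; the class name `ElasticBuffer`, the method `cycle` and the convention that `ready` is sampled before the dequeue are all assumptions made for illustration.

```python
from collections import deque


class ElasticBuffer:
    """Behavioral model of one elastic buffer (hypothetical, for illustration)."""

    CAPACITY = 2  # one slot per latch: master and slave

    def __init__(self):
        self.slots = deque()

    @property
    def ready(self):
        # Upstream may send a flit only if a latch is free.
        return len(self.slots) < self.CAPACITY

    @property
    def valid(self):
        # Downstream may take a flit only if one is stored.
        return len(self.slots) > 0

    def cycle(self, in_flit, downstream_ready):
        """Advance one clock cycle and return (accepted, out_flit)."""
        # Assumption: ready is sampled at the start of the cycle,
        # before any flit leaves in the same cycle.
        accepted = in_flit is not None and self.ready
        out_flit = None
        if self.valid and downstream_ready:
            out_flit = self.slots.popleft()
        if accepted:
            self.slots.append(in_flit)
        return accepted, out_flit


eb = ElasticBuffer()
print(eb.cycle("A", downstream_ready=False))  # stored in the first latch
print(eb.cycle("B", downstream_ready=False))  # stored in the second latch
print(eb.cycle("C", downstream_ready=False))  # refused: both latches are full
```

When the downstream side stalls, the FF absorbs up to two flits before backpressure reaches the sender, which is how a chain of channel FFs can stand in for a router input buffer in this model.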
Figure 1: The enhanced two-stage router
In this work, we evaluate two representative EB router designs: the enhanced two-stage router and the single-stage router. The enhanced two-stage router is shown in Figure 1. It prioritizes network throughput, reducing cycle time by 42% compared to the baseline two-stage router, and addresses the baseline router’s two most important weaknesses. First, it replaces the three-slot output EB with a two-slot EB. This reduces complexity and cost because three-slot EBs are implemented as FIFOs, and it also allows the router to meet output port timing constraints more easily. Second, routing is now performed in a look-ahead manner, in parallel with the other first-stage calculations. This removes routing computation from the critical path in configurations where the first stage determines the cycle time.
Figure 2: The single-stage router
The single-stage router is illustrated in Figure 2. It prioritizes latency instead of throughput and avoids pipelining overhead by merging the two stages of the enhanced two-stage router. Look-ahead routing is performed in parallel, in the same manner as the enhanced two-stage router.
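Look-ahead routing, used by both routers, means that a router computes the output port a flit will need at the next hop, so that the next router does not have to wait for its own routing computation. The sketch below illustrates this for dimension-ordered routing on a 2D mesh. The X-before-Y order, the port names (`E`, `W`, `N`, `S`, `LOCAL`), the coordinate orientation and the function names are assumptions chosen for illustration and are not taken from the router implementations.

```python
# Coordinate change caused by leaving through each (hypothetical) port.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1), "LOCAL": (0, 0)}


def xy_route(cur, dst):
    """Dimension-ordered output port at router `cur`: X first, then Y."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "LOCAL"


def lookahead_route(cur, out_port, dst):
    """Output port the flit will need at the NEXT router, computed at `cur`."""
    sx, sy = STEP[out_port]
    next_router = (cur[0] + sx, cur[1] + sy)
    return xy_route(next_router, dst)


# Follow one flit from (0, 0) to (2, 1); each router precomputes the next port.
cur, dst = (0, 0), (2, 1)
port = xy_route(cur, dst)  # only the source has to route for itself
while port != "LOCAL":
    next_port = lookahead_route(cur, port, dst)
    print(cur, "leaves via", port, "and carries", next_port, "for the next hop")
    cur = (cur[0] + STEP[port][0], cur[1] + STEP[port][1])
    port = next_port
```

Because `lookahead_route` depends only on information the flit already carries, it can be evaluated alongside the other per-hop logic rather than before it.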
Figure 3: Cycle time
Figure 4: Energy per bit
Evaluation was conducted on RTL implementations of the three designs (the baseline two-stage, enhanced two-stage and single-stage routers), synthesized with Synopsys Design Compiler and placed and routed with Cadence Silicon Encounter. We used a low-power 45nm commercial library under worst-case conditions. We assumed 5x5 2D mesh routers using dimension-ordered routing. The datapath width was swept from 29 to 192 bits. Packet size was held constant at 512 bits.
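Since the packet size is fixed while the datapath width is swept, the number of flits per packet changes with the width. The short calculation below illustrates this under the simplifying assumption that every one of the 512 bits is payload carried on the datapath (no separate header overhead). Only the endpoints 29 and 192 come from the text; the intermediate widths are arbitrary sample points.

```python
PACKET_BITS = 512  # constant packet size stated in the text


def flits_per_packet(width_bits, packet_bits=PACKET_BITS):
    """Number of flits needed to carry one packet (integer ceiling division)."""
    return (packet_bits + width_bits - 1) // width_bits


# 29 and 192 are the sweep endpoints; 64 and 128 are illustrative only.
for width in (29, 64, 128, 192):
    print(f"{width:>3}-bit datapath: {flits_per_packet(width)} flits per packet")
```

Narrow datapaths therefore pay a larger serialization cost per packet, which is one reason the comparison is made across a range of widths.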
As shown in Figure 3 and Figure 4, the enhanced two-stage router reduces cycle time by 42% compared to the baseline two-stage router. On the other hand, the single-stage router requires 29% less energy to transfer a single bit than the enhanced two-stage router. Moreover, the enhanced two-stage router occupies 20% less area than the baseline two-stage router. If each router operates at its maximum clock frequency, the single-stage router offers a zero-load latency comparable to that of the enhanced two-stage router. However, if both operate at an equal clock frequency, the zero-load latency of the enhanced two-stage router is 32% higher.
The optimal router choice depends on the clock frequency used for the routers. If all routers operate at the same clock frequency, the single-stage router is superior in terms of area and latency. If each router operates at its maximum frequency, the optimal choice for area is the enhanced two-stage router. For designs prioritizing latency, the choice depends on how the channel latency in clock cycles is affected by the clock frequency increase. Finally, although the baseline two-stage router provides the smallest energy per transferred bit, it is very close to the single-stage router, which is preferable in terms of cycle time, latency and area.
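The dependence on clock frequency can be illustrated with a simple zero-load latency model: hops times per-hop cycles (router stages plus channel cycles), plus serialization, all multiplied by the cycle time. This is a rough sketch only. Every number below (hop count, cycle times, channel cycles, flit count) is hypothetical and chosen to make the arithmetic easy to follow; none of it is measured data from the evaluation.

```python
def zero_load_latency_ps(hops, router_stages, channel_cycles, flits, cycle_ps):
    """Zero-load latency in picoseconds under a simple per-hop cycle model."""
    cycles = hops * (router_stages + channel_cycles) + flits
    return cycles * cycle_ps


HOPS, FLITS = 4, 8  # hypothetical path length and packet length in flits

# Same clock for both routers (hypothetical 1000 ps cycle).
single_same = zero_load_latency_ps(HOPS, 1, 1, FLITS, 1000)
two_same = zero_load_latency_ps(HOPS, 2, 1, FLITS, 1000)

# Two-stage router at a faster clock (hypothetical 800 ps cycle),
# first with the channel still taking 1 cycle, then taking 2 cycles.
two_fast = zero_load_latency_ps(HOPS, 2, 1, FLITS, 800)
two_fast_long_channel = zero_load_latency_ps(HOPS, 2, 2, FLITS, 800)

print("single-stage, shared clock:     ", single_same, "ps")
print("two-stage, shared clock:        ", two_same, "ps")
print("two-stage, faster clock:        ", two_fast, "ps")
print("two-stage, faster, 2-cyc channel", two_fast_long_channel, "ps")
```

In this toy model the extra pipeline stage costs latency at a shared clock, a shorter cycle time can win it back, and the advantage shrinks again if the faster clock makes each channel traversal take more cycles.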
Future research on this topic focuses on exploring EB router architectures with additional FIFO buffers at the router inputs. These buffers increase throughput, but the gain in performance comes with increased area and energy costs. The purpose of this work is to determine under which network configurations input-buffered EB routers are preferable.
For further information, please contact George Michelogiannakis.