Allocator Circuits
We have developed a full-custom, high-speed wavefront allocator to improve router cycle time in high-radix topologies. In an implementation of a 10x10 allocator, our design achieved a 2-3x reduction in latency over a fully synthesized allocator.
Figure 1 shows the schematic of our wavefront allocator cell. We optimized the x- and y-token propagation paths using dual-rail domino logic and exploited the one-hot nature of the priority inputs to remove redundant transistors that logic simplification alone cannot detect. The resulting cell has only one domino gate in its critical path and a maximum pull-down depth of two.
Figure 1: High-Speed Wavefront Allocator Cell Schematics
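For readers unfamiliar with wavefront allocation, the following Python sketch models the behavior of an array of such cells; it says nothing about the domino circuit itself. It assumes the textbook formulation: each row holds an x-token, each column holds a y-token, and a cell grants its request only while both tokens are still available. Cells on the same wrapped diagonal never share a row or column, so they are evaluated together, starting from the priority diagonal. The function name `wavefront_allocate` and the use of an integer `priority` index (standing in for the one-hot priority input) are illustrative assumptions.

```python
def wavefront_allocate(requests, priority):
    """Behavioral model of an n x n wavefront allocator (illustrative only).

    requests: n x n matrix of truthy/falsy values; requests[i][j] means
              requester i wants resource j.
    priority: index of the diagonal evaluated first (assumed encoding of the
              one-hot priority input).
    Returns a sorted list of granted (row, col) pairs.
    """
    n = len(requests)
    x_token = [True] * n  # row i has not been granted yet
    y_token = [True] * n  # column j has not been granted yet
    grants = []
    for step in range(n):
        diagonal = (priority + step) % n
        # Cells with (i + j) mod n == diagonal share no row or column,
        # so they can be decided independently within one wavefront step.
        for i in range(n):
            j = (diagonal - i) % n
            if requests[i][j] and x_token[i] and y_token[j]:
                grants.append((i, j))
                x_token[i] = False  # consume the row token
                y_token[j] = False  # consume the column token
    return sorted(grants)


# Hypothetical 3x3 request matrix used only to show the effect of priority.
example_requests = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
grants_p0 = wavefront_allocate(example_requests, priority=0)
grants_p1 = wavefront_allocate(example_requests, priority=1)
```

With this example, starting from diagonal 0 yields two grants, while starting from diagonal 1 yields three, which is why the priority diagonal is normally rotated between allocations.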
Figure 2 shows how the delay of our 10x10 allocator, implemented in a 45nm process, changes with area. As devices are upsized, the delay asymptotically approaches a minimum of approximately 9.5 FO4 delays. In contrast, the fastest 10x10 allocator we were able to synthesize in the same technology has a delay of 27 FO4 delays, roughly 2.8x that minimum.
Figure 2: Delay vs. Area of a 10x10 Wavefront Allocator
Channel Circuits
We have developed low-power channel circuits that reduce the energy required to transport data across a chip by 4-5x compared to conventional signaling methods. Our design features a self-calibrating mechanism that makes it more tolerant of device mismatch in advanced CMOS processes.
Figure 3 shows the schematic of our low-power channel repeater. Under normal operation (RESET = 0, CAL = 0, PC# = 1), the VDD-referenced, differential low-swing input signal is regenerated by an input-isolated clocked sense amplifier. The result is held in an RS-latch, the output of which controls a pair of capacitor-coupled output drivers. The coupling capacitors limit the output voltages to a range nominally between VDD and a small negative offset from VDD, determined by the ratio of the coupling capacitance to the line capacitance.
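To make the dependence on the capacitance ratio concrete, the sketch below estimates the output swing with a first-order capacitive-divider model. This is an assumption made for illustration, not a description of the actual circuit: it supposes the driver side of the coupling capacitor switches by a full VDD and ignores parasitics, leakage, and line resistance. The function name `coupled_swing` and the component values are hypothetical.

```python
def coupled_swing(vdd, c_couple, c_line):
    """First-order estimate of the swing below VDD on a capacitively driven line.

    Assumes an ideal capacitive divider: a full-VDD step on the driver side of
    the coupling capacitor is shared between c_couple and c_line. Both
    capacitances must use the same unit.
    """
    if c_couple <= 0 or c_line <= 0:
        raise ValueError("capacitances must be positive")
    return vdd * c_couple / (c_couple + c_line)


# Hypothetical values: 1.0 V supply, 50 fF coupling capacitor, 450 fF line.
swing = coupled_swing(vdd=1.0, c_couple=50.0, c_line=450.0)
low_level = 1.0 - swing  # nominal low output level under this model
```

Under this model, a coupling capacitor that is small relative to the line capacitance yields a swing that is a correspondingly small fraction of VDD.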