Last modified on 05/07/10 21:29:51


Allocator Circuits

We have developed a full-custom, high-speed wavefront allocator to improve router cycle time in high-radix topologies. In a 10x10 implementation, our design achieved a 2-3x reduction in latency compared with a fully synthesized allocator.

Figure 1 shows the schematic of our wavefront allocator cell. We optimized the x- and y-token propagation paths using dual-rail domino logic and exploited the one-hot nature of the priority inputs to remove transistors whose redundancy cannot be detected through logic simplification alone. The resulting cell has only one domino gate in its critical path and a maximum pull-down depth of two.

Figure 1: High-Speed Wavefront Allocator Cell Schematics
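The behavior that each cell implements can be illustrated with a small software model. The sketch below is a generic behavioral model of wavefront allocation, not a description of the circuit in Figure 1: the function name `wavefront_allocate`, the `priority_diagonal` parameter, and the mapping of x-tokens to rows and y-tokens to columns are assumptions made for illustration. A cell grants a request only if it still holds both its row token and its column token, and a grant consumes both.

```python
def wavefront_allocate(requests, priority_diagonal=0):
    """Behavioral model of an n x n wavefront allocator (illustrative only).

    requests[i][j] is True when input i requests output j.
    Returns a grant matrix with at most one grant per row and per column.
    """
    n = len(requests)
    row_token = [True] * n  # assumed x-token: row i has not been granted yet
    col_token = [True] * n  # assumed y-token: column j has not been granted yet
    grants = [[False] * n for _ in range(n)]

    # Visit the n wrapped diagonals, starting with the priority diagonal.
    for d in range(n):
        diag = (priority_diagonal + d) % n
        for i in range(n):
            j = (diag - i) % n  # cells on this diagonal satisfy (i + j) % n == diag
            if requests[i][j] and row_token[i] and col_token[j]:
                grants[i][j] = True
                row_token[i] = False  # consume the row token
                col_token[j] = False  # consume the column token
    return grants


requests = [[True, True],
            [True, False]]
print(wavefront_allocate(requests, priority_diagonal=0))  # [[True, False], [False, False]]
print(wavefront_allocate(requests, priority_diagonal=1))  # [[False, True], [True, False]]
```

Cells on the same wrapped diagonal never share a row or a column, so they can all decide at once. Only one diagonal has priority at a time, which is why the priority inputs are one-hot.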

Figure 2 shows how the delay of our 10x10 allocator, implemented in a 45nm process, changes with area. As devices are upsized, the delay asymptotically approaches a minimum of approximately 9.5 FO4. In contrast, the fastest 10x10 allocator we were able to synthesize in the same technology has a delay of 27 FO4.

Figure 2: Delay vs. Area of a 10x10 Wavefront Allocator
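As a quick check, the two delay figures quoted above are consistent with the 2-3x latency reduction stated earlier. The variable names below are illustrative only.

```python
# Delays quoted in the text, in FO4 units.
custom_delay_fo4 = 9.5   # asymptotic minimum of the full-custom allocator
synth_delay_fo4 = 27.0   # fastest synthesized 10x10 allocator

speedup = synth_delay_fo4 / custom_delay_fo4
print(f"{speedup:.2f}x")  # 2.84x
```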

Channel Circuits

We have developed low-power channel circuits that reduce the energy required to transport data across a chip by 4-5x compared to conventional signaling methods. Our design features a self-calibrating mechanism that makes it more tolerant of device mismatch in advanced CMOS processes.

Figure 3 shows the schematic of our low-power channel repeater. Under normal operation (RESET = 0, CAL = 0, PC# = 1), the VDD-referenced, differential low-swing input signal is regenerated by an input-isolated clocked sense amplifier. The result is held in an RS-latch, the output of which controls a pair of capacitor-coupled output drivers. The coupling capacitors limit the output voltages to a nominal range between VDD and a small negative offset from VDD, which is determined by the ratio of the coupling capacitance to the line capacitance.

Figure 3: Capacitor-coupled low-swing repeater with self-calibration circuitry
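To first order, the dependence of the output swing on the capacitance ratio can be estimated with an ideal capacitive divider. The sketch below is such an estimate under stated assumptions: ideal capacitors, a full VDD step on the driver side of the coupling capacitor, and no parasitics or leakage. The function name `line_swing` and all component values are hypothetical and are not taken from our design.

```python
def line_swing(vdd, c_couple, c_line):
    """First-order swing on a line driven through a series coupling capacitor.

    Ideal capacitive divider: a full VDD step on the driver side moves the
    line by VDD * Cc / (Cc + Cline).
    """
    return vdd * c_couple / (c_couple + c_line)


# Hypothetical values, chosen only to illustrate the ratio.
VDD = 1.0          # volts
C_COUPLE = 20e-15  # farads
C_LINE = 180e-15   # farads

swing = line_swing(VDD, C_COUPLE, C_LINE)
low_level = VDD - swing  # the line swings between VDD and this level
print(f"swing = {swing * 1e3:.0f} mV, low level = {low_level:.2f} V")
```

With these example values the line would swing between 1.0 V and roughly 0.9 V. A smaller coupling capacitor relative to the line capacitance gives a smaller swing.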

As CMOS processes scale down, device mismatch continues to worsen, which raises the minimum differential voltage that can be reliably regenerated and therefore used for signaling. At the same time, the supply voltage has decreased from generation to generation. The ever-shrinking gap between the supply and the minimum differential voltage makes it increasingly difficult to save energy using low-swing techniques. Our design addresses this problem by taking advantage of the large capacitance already present in the long channel wires. By selectively injecting charge into the wires, we can control the reference voltage of the two differential lines independently, calibrating away any static offsets at the input of the sense amplifier.

The self-calibrating mechanism of our repeater is illustrated in Figure 4. We simulated a segment of the channel for 50 cycles using an alternating data pattern (010101…) and deliberately introduced a 100mV offset at the receiver inputs. At the start of the simulation RESET = 1, so PC# = 0, CAL = 1, and both lines (IN1, IN1#) are precharged to VDD. Between cycles 2 and 5, RESET is de-asserted and the transmitter begins transmitting the data pattern, as shown by the line voltages on IN1 and IN1#. Due to the artificial offset, the receiver mistakenly regenerates every bit as a logical-1 during this period (BIT and BIT#).

Figure 4: Self-Calibration Simulation

In cycle 6 the SEL signal is asserted at the transmitter side to indicate the start of the self-calibration process, so both IN1 and IN1# are pulled towards VDD. By cycle 7 SEL has propagated to the receiver, and PC# and CAL are driven to 0 and 1 respectively. The PC# signal, which is asserted for only one cycle, replenishes any charge that might have leaked off the channel wires between successive calibrations and also resets the DONE signal. The CAL signal drives both differential outputs high, independent of the data in the RS-latch. This is necessary for the calibration of the downstream repeater, and CAL is held for the duration of the calibration.

Once DONE is de-asserted (cycle 8), the receiver regenerates every cycle and, depending on the outcome, adjusts the line voltage on either IN1 or IN1# in tiny steps by connecting small, pre-discharged trimming capacitors to the channel wires. In this experiment the IN1 line is favored by the offset, so the receiver always regenerates to a logical-1. As a result, the IN1 line voltage is gradually pulled down between cycle 8 and cycle 35. In cycle 36 the receiver regenerates to logical-0 for the first time. This signifies that the offset is fully compensated for by the reduced line voltage on IN1, so DONE is asserted to prevent further line voltage adjustments.

In cycles 41 and 42 the SEL signal is de-asserted at the transmitter and receiver respectively, and the repeaters return to their normal mode of operation. The receiver now correctly regenerates the alternating data pattern, as can be seen on BIT and BIT# between cycles 42 and 50.
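The trimming loop described above (regenerate, pull down the favored line by one step, stop at the first change in outcome) can be sketched as a simple behavioral model. This is an illustration under stated assumptions rather than a model of the actual circuit: the function name `calibrate` is hypothetical, each trim step is treated as a fixed voltage decrement, and stopping on the first change in the regenerated bit is a generalization of the IN1-favored case described in the text. The 3.6 mV step is a hypothetical value, since the text does not state the real trim step; it was chosen so that a 100 mV offset takes 28 steps, the same span as cycles 8 to 35.

```python
def calibrate(offset_uv, step_uv, max_cycles=1000):
    """Behavioral sketch of the offset-trimming loop (illustrative only).

    offset_uv: static receiver offset in microvolts; positive favors IN1.
    step_uv:   assumed fixed voltage drop per trim step, in microvolts.
    Returns (trim steps applied, IN1 shift, IN1# shift), shifts relative to VDD.
    """
    in1_uv = 0    # IN1 line voltage relative to its precharged level
    in1b_uv = 0   # IN1# line voltage relative to its precharged level
    previous_bit = None
    for cycle in range(max_cycles):
        # Both lines carry the same nominal level during calibration, so the
        # regenerated bit reflects only the offset and the trims so far.
        bit = 1 if (in1_uv - in1b_uv + offset_uv) > 0 else 0
        if previous_bit is not None and bit != previous_bit:
            return cycle, in1_uv, in1b_uv  # outcome flipped: assert DONE
        if bit == 1:
            in1_uv -= step_uv   # IN1 is favored, so pull IN1 down one step
        else:
            in1b_uv -= step_uv  # IN1# is favored, so pull IN1# down one step
        previous_bit = bit
    raise RuntimeError("calibration did not converge")


steps, in1_shift, in1b_shift = calibrate(offset_uv=100_000, step_uv=3_600)
print(steps, in1_shift, in1b_shift)  # 28 -100800 0
```

In this model the residual offset after calibration is always smaller than one trim step, which is why small trimming capacitors are desirable.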

Figure 5 shows the tradeoff between transmission energy and propagation delay for both the conventional channel and our capacitor-coupled channel. The results shown are for minimum-pitch wires. For wires of non-minimum pitch, the energy savings of the low-swing channels will be even more pronounced because the wires account for a larger share of the overall energy consumption. The result for a low-swing channel design using a second low-voltage supply is also plotted. Compared to the conventional channel, our design offers a 4-5x energy saving depending on the speed of operation. At its maximum speed our design has a delay of about 400ps/mm versus about 250ps/mm for the conventional channel, but it does not require any explicit retiming elements when crossing clock boundaries, since the repeaters are themselves retiming elements.

Figure 5: Energy Delay Tradeoff of conventional and low-swing channels


For further information, please contact James Chen.