Power Analysis for High-Speed I/O Transmitters

Hamid Hatamkhani, Chih-Kong Ken Yang 

University of California, Los Angeles 

9 Marisol, Newport Coast, CA, 92657, USA 

Tel: 1-949-246-6129, E-mail: hhatam@icsl.ucla.edu 

Abstract 

This paper studies the design tradeoffs that minimize the power dissipation of multi-Gbps parallel I/O transmitters. A macromodel of a transmitter that can be optimized for power is presented. Also discussed is a means of accounting for the impact on power dissipation of deterministic jitter due to on-chip buffering. The model allows analysis under varying design constraints and circuit architectures. The optimization results provide some guidance on the choice of architecture and data rate to achieve large aggregate I/O bandwidths.

Introduction 

As the off-chip bandwidth of ICs is anticipated to exceed 1Tb/s within the next 5 years [7], power dissipation by the I/O is an increasing concern. To maintain reasonable power consumption, both the signaling and the transmitter architecture must be carefully considered and designed for low power.

This paper considers three sources of power consumption: the signaling power (power driven off-chip), the driver and pre-driver power (the buffering required to drive the signal), and the signal-conditioning power (the power required to maintain good signal integrity). By using a detailed circuit model that relates the output data eye (transmitter performance) to transistor size, we can optimize the design for power given various link specifications such as data rates, signaling levels, signal integrity requirements, and process technology. This paper studies the trade-off between the link specifications and power dissipation for a multi-Gbps transmitter. The optimization indicates the data rates that minimize power dissipation and the minimum power dissipation achievable for large aggregate bandwidth links.

Transmitter Architecture 

The basic transmitter architecture in this study, shown in Fig. 1, assumes several design conditions. The input data arrives at half the rate of the transmitted output. The capacitance driven by each input data bit is less than the equivalent capacitance of a gate of width 8*Lmin (for NMOS). The clock loading is also a gate width of 8*Lmin. Additional retiming latches and switches are used to precondition the data for pre-emphasis. A pre-driver and driver produce the output signal with the desired output swing, common-mode voltage, and output impedance.

Fig.1: Transmitter block diagram (blocks: Mux, Buf, 2x1 Sw, PDrv, Drv; signals: X[n], X[n−1])

Two common signaling levels are investigated: low common-mode (LCM) and high common-mode (HCM). With LCM, the output driver devices are in triode and provide impedance matching (Fig. 2c and Fig. 2d). With HCM, the output driver devices work in saturation and perform current steering to the output. A 50-ohm resistor provides the impedance matching (Fig. 2a and Fig. 2b).

The swing at the output of the driver is determined either by the specification or by calculating the attenuation of the channel and the sensitivity of the receiver. In most of the paper, the nominal output swing is 250mV. Other swings are investigated later in the paper.

Fig.2: Output drivers: (a) single-ended HCM (b) differential HCM (c) single-ended LCM (d) differential LCM.

The study considers not only the basic transmitters but also the impact of signal integrity. To reduce ISI, a first-order (2-tap) pre-emphasis filter is used in the study:

$$Y[n] = X[n] - \eta X[n-1] \qquad (1)$$

Our analysis studies a digitally implemented pre-emphasis. Switches (4 bits) choose between X[n] and X[n−1] to program η. The output driver is split into binary-weighted branches similar to a current-mode (HCM) or resistive (LCM) D/A converter [1,2]. Additionally, to account for output-swing control (HCM) and output-impedance control (LCM), the pre-driver is also designed and properly sized to use a feedback control voltage.
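The 2-tap filter and its digital programming can be sketched as follows. This is a minimal illustration, not the circuit studied in the paper: symbols are assumed to be ±1, the first symbol is assumed to have no predecessor, and `segmented_driver` is a hypothetical mapping in which the 4-bit code is the summed weight of the branches switched to X[n−1]. The text does not state how the switch settings translate to η, so that mapping is an assumption.

```python
def pre_emphasis(symbols, eta):
    """Two-tap pre-emphasis: y[n] = x[n] - eta * x[n-1]."""
    out = []
    prev = 0.0  # assumption: nothing precedes the first symbol
    for x in symbols:
        out.append(x - eta * prev)
        prev = x
    return out


def segmented_driver(symbols, code, n_bits=4):
    """Hypothetical binary-weighted driver.

    Branch k has weight 2**k. If bit k of `code` is 1 the branch drives
    -x[n-1], otherwise it drives x[n]. The output is the weighted sum,
    normalized by the total weight (2**n_bits - 1).
    """
    total = 2 ** n_bits - 1
    tap = code            # summed weight of branches on the delayed tap
    main = total - code   # summed weight of branches on the main tap
    out = []
    prev = 0.0
    for x in symbols:
        out.append((main * x - tap * prev) / total)
        prev = x
    return out


if __name__ == "__main__":
    data = [1, 1, -1, -1, -1, 1]
    print(pre_emphasis(data, eta=0.25))
    print(segmented_driver(data, code=3))
```

Under this assumed mapping, a code of 3 gives main and delayed weights of 12 and 3, which is the same waveform as η = 0.25 scaled by 12/15.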

Transmitter Model 

The transmitter power optimization is constrained by the desired output data-eye opening. The output voltage swing and common-mode level determine the driver architecture and transistor sizing. The signal-integrity considerations (specifically, swing control, impedance control, and pre-emphasis) determine the pre-driver and switching design.

The output timing margin defines the maximum tolerable jitter, both deterministic jitter (DJ) and random jitter (RJ). Since RJ is due to the clock source (e.g., a PLL) and supply/substrate noise, it is not currently considered in the model. DJ at the transmitter output (before the filtering of the channel) is primarily due to on-chip ISI that results from on-chip low-pass filtering¹. The filtering is due to transistor sizing and fan-out within the transmitter. One of the key optimizations included in this model is therefore efficient device sizing to minimize power while constraining DJ. The DJ model for a first-order system is:

$$DJ = -\frac{\tau}{T_{bit}} \ln\left(1 - e^{-T_{bit}/\tau}\right) \qquad (2)$$

where τ is the time constant of the system.
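As a rough numerical check, the first-order model of Eq. (2) can be evaluated directly. The sketch below reads Eq. (2) as DJ expressed as a fraction of the bit time, DJ = −(τ/Tbit)·ln(1 − exp(−Tbit/τ)). The helper `max_tau_for_dj` is a hypothetical addition (not from the paper) that uses bisection to find the largest time constant meeting a DJ budget, relying on DJ growing monotonically with τ.

```python
import math


def dj_first_order(tau, t_bit):
    """Eq. (2): DJ as a fraction of the bit time for a single-pole system."""
    return -(tau / t_bit) * math.log(1.0 - math.exp(-t_bit / tau))


def max_tau_for_dj(dj_target, t_bit, iters=60):
    """Largest tau whose DJ stays within dj_target (hypothetical helper).

    Bisection over tau in [0.001, 10] bit times; DJ increases with tau.
    """
    lo, hi = 1e-3 * t_bit, 10.0 * t_bit
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dj_first_order(mid, t_bit) > dj_target:
            hi = mid
        else:
            lo = mid
    return lo


if __name__ == "__main__":
    t_bit = 1.0  # normalized bit time
    for tau in (0.3, 0.5, 1.0):
        print(f"tau/Tbit = {tau:.1f} -> DJ = {dj_first_order(tau, t_bit):.3f}")
    print("tau/Tbit for DJ = 11%:", round(max_tau_for_dj(0.11, t_bit), 3))
```

The steep growth of DJ with τ/Tbit is what ties the DJ budget to device sizing: a tighter budget forces a smaller time constant and therefore smaller fanouts.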

1 Some amount (commonly

For several stages of logic, we approximate the deterministic jitter of stage i with the following model:

$$
\begin{cases}
DJ_i = DJ_{i-1} - \dfrac{A\,t_{d_i}\ln\left(1 - e^{-X}\right) + B\,t_{d_{i-1}}\ln\left(1 - e^{-Y}\right)}{T_{bit}} \\[6pt]
X = \dfrac{C\,(1 - DJ_{i-1})\,T_{bit}}{t_{d_i}}, \qquad Y = \dfrac{D\,(1 - DJ_{i-1})\,T_{bit}}{t_{d_{i-1}}}
\end{cases}
\qquad (3)
$$

A, B, C and D can be obtained by a mean-square fit to simulated data from various logic blocks in the transmitter (e.g., inverter chains and transmission gates). The delays (t_di and t_di−1) are modeled to within 15% error using an α-power model and the Elmore delay formula [3,4]. Fig. 3 reports the DJ relative error, normalized to t_FO4/T_b, for two inverter chains. Chain 1 is composed of six inverters sized with constant fanout (FO-4) at 3Gbps. Chain 2 is six inverters with variable fanout (4, 9/8, 3, 1.5, 2 and 4) at 3.5Gbps. The simulated data is from a 0.18-µm technology. The variable-fanout case clearly stresses our model, resulting in an error of 20%. However, such large variations in fanout are not expected, so the 20% can be considered an error bound.
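The stage-by-stage accumulation of Eq. (3) can be sketched as a simple recurrence. This is an illustrative reading only: the coefficient on the second logarithm is taken to be B, the coefficient values below are placeholders rather than fitted values from the paper, and the first entry of `delays` is assumed to be the stage that drives the chain.

```python
import math


def stage_dj(dj_prev, td_i, td_prev, t_bit, A, B, C, D):
    """One step of Eq. (3): DJ after stage i, as a fraction of the bit time."""
    x = C * (1.0 - dj_prev) * t_bit / td_i
    y = D * (1.0 - dj_prev) * t_bit / td_prev
    added = (A * td_i * math.log(1.0 - math.exp(-x))
             + B * td_prev * math.log(1.0 - math.exp(-y))) / t_bit
    return dj_prev - added  # the log terms are negative, so DJ grows


def chain_dj(delays, t_bit, A, B, C, D, dj_in=0.0):
    """Accumulate DJ along a chain; delays[0] is the driving stage."""
    dj = dj_in
    history = []
    for i in range(1, len(delays)):
        dj = stage_dj(dj, delays[i], delays[i - 1], t_bit, A, B, C, D)
        history.append(dj)
    return history


if __name__ == "__main__":
    # Placeholder coefficients (NOT fitted values) and unit stage delays,
    # with a bit time of 4 stage delays.
    coeffs = dict(A=0.5, B=0.5, C=1.0, D=1.0)
    for n, dj in enumerate(chain_dj([1.0] * 7, t_bit=4.0, **coeffs), start=1):
        print(f"after stage {n}: DJ = {100 * dj:.2f}% of bit time")
```

In an optimization loop, the per-stage delays would come from the delay model and the final DJ value would be compared against the budget (for example 11% of the bit time).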

Fig.3: DJ normalized relative error (%) versus inverter number in the chain. Chain1: 6 inverters sized in FO4. Chain2: 6 inverters with fanout of: 4, 9/8, 3, 1.5, 2, 4

The DJ model is used in an optimization to find device sizes that satisfy the final output eye constraints. After determining the optimal device sizes based on the voltage and timing constraints, the total power can be calculated as the sum of the dynamic (on-chip CV²f and crowbar) power and the off-chip signaling power. Note that since active power is dominant, we do not consider leakage.
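The power bookkeeping described above can be sketched in a few lines. The parameter names, the activity factor, and the treatment of crowbar current as a fixed fraction of the CV²f term are assumptions made for illustration; the paper does not give these details. Signaling power is passed in as a number because its expression depends on the driver type.

```python
def dynamic_power(c_switched, vdd, f_clk, activity=0.5, crowbar_fraction=0.1):
    """On-chip dynamic power in watts.

    activity         -- assumed switching activity factor
    crowbar_fraction -- assumed short-circuit power as a fraction of CV^2f
    """
    cv2f = activity * c_switched * vdd ** 2 * f_clk
    return cv2f * (1.0 + crowbar_fraction)


def total_power(c_switched, vdd, f_clk, p_signaling, **kwargs):
    """Total transmitter power: dynamic plus off-chip signaling (no leakage)."""
    return dynamic_power(c_switched, vdd, f_clk, **kwargs) + p_signaling


if __name__ == "__main__":
    # Illustrative numbers only: 1 pF switched, 1.8 V supply, 4 GHz toggling,
    # and a hypothetical 1 mW of signaling power.
    p = total_power(1e-12, 1.8, 4e9, p_signaling=1e-3)
    print(f"total power = {1e3 * p:.2f} mW")
```

Because the switched capacitance comes out of the sizing optimization, a tighter DJ constraint shows up here as a larger `c_switched`, and a different assumed `activity` shifts the dynamic term proportionally.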

We compare the optimization results from our model with Spice simulations for the power dissipation of basic transmitters. Fig.4 shows good agreement between the results from Spice and the model. In this figure, simulation reports smaller values for power than the model. This is because the switching activity assumed in the model differs from that in the simulation. Figure 5 reports the % error of our model relative to Spice as we increase the data rate. The bit time in the plot is normalized to t_FO4/T_b. The error is within ±10%.

Since the signaling power remains essentially constant when driving a constant load impedance, one of the primary tradeoffs in power is on-chip, through optimizing the sizing for a given DJ.

Fig.4: Power values (mW) from the model and Spice simulation versus (bit time)/(FO4 delay) for single-ended HCM and LCM architecture with pre-emphasis. DJ=11% of bit time

Fig.5: Normalized DJ relative error (%) for single-ended HCM and LCM architecture with pre-emphasis. DJ=11% of bit time.

Achieving a smaller DJ requires smaller fanouts in gate sizing, which in turn increases capacitance and power dissipation. Figure 6 shows the power of a single-ended LCM transmitter with pre-emphasis for varying DJ specifications. Note that reducing the DJ to less than 11% causes a considerable increase in power.

From our initial analysis, power can be optimized by maximizing the DJ within the desired constraint. Furthermore, setting a proper DJ constraint (no less than 9%) can impact power significantly.

Analysis Using the Optimization Model 

Our analytical model can provide further insight into the design tradeoffs. Already shown is the power penalty of HCM compared to LCM. The model can also be used to illustrate the impact of pre-emphasis. There is clearly a tradeoff between power consumption and data rate. For a given desired aggregate data rate, there is an optimal data rate per pin².

2 This is equivalent to energy-per-bit


Fig.6: DJ (% of bit time) vs. power for a single-ended LCM architecture with pre-emphasis at 4Gbps. Process is 0.18µ.

The impact of varying the output signal swing or DJ constraints can also be studied. Furthermore, by adjusting the model fitting parameters for a different technology, we can compare the optimal data rate for different technologies.

Fig.7 shows the results of the model for LCM and HCM architectures. Since current standards limit the DJ to less than 13% of bit time [5,6], we use 11% of bit time for the maximum tolerable DJ from on-chip buffering. For the LCM architecture, applying pre-emphasis is more power hungry at higher data rates. This can be explained by the larger logic-related capacitance, which makes dynamic power dominant. Differential signaling reduces L·di/dt noise considerably at the expense of a larger dynamic power budget. However, notice that the penalty is not significant, especially in the HCM architecture, where signaling power is dominant. From the figure it is evident that HCM is not an attractive choice for power reduction at data rates below 4.5Gbps (bit time of 222ps in 0.18-µm technology).
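Since the figures are plotted against bit time in FO4 delays while the text quotes data rates in Gbps, a small conversion helper makes the two easy to compare. The FO4 delay is process dependent and is not stated in the text, so it is left as a required argument; the 75 ps used in the example is a purely hypothetical value, not the paper's.

```python
def bit_time_ps(rate_gbps):
    """Bit time in picoseconds for a per-pin data rate in Gb/s."""
    return 1000.0 / rate_gbps


def bit_time_in_fo4(rate_gbps, fo4_ps):
    """Bit time expressed in FO4 delays; fo4_ps must be supplied by the user."""
    return bit_time_ps(rate_gbps) / fo4_ps


def rate_from_fo4(bit_time_fo4, fo4_ps):
    """Per-pin data rate in Gb/s for a bit time given in FO4 delays."""
    return 1000.0 / (bit_time_fo4 * fo4_ps)


if __name__ == "__main__":
    print(f"4.5 Gbps -> {bit_time_ps(4.5):.0f} ps")  # 222 ps, as quoted above
    FO4_PS_ASSUMED = 75.0  # hypothetical FO4 delay, for illustration only
    print(f"4.5 Gbps -> {bit_time_in_fo4(4.5, FO4_PS_ASSUMED):.2f} FO4 (assumed FO4)")
```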


Fig. 7: Power dissipation in 0.18µ CMOS process for HCM and LCM (simple; with pre-emphasis; with pre-emphasis, differential). DJ=11% of bit time

We next use the model to analyze the impact of the position of the multiplexing in the transmitter architecture. The multiplexing function can be placed at various points along the transmitter (e.g., buffering, pre-emphasis switches, pre-driver, and driver). Figure 9 compares two possible placements: within the buffering and at the output. The output multiplexing implementations for both HCM and LCM are shown in Fig. 8. X1 and X2 are the half-rate inputs, each driven to the output on the appropriate level of the clock signal. Multiplexing at the output achieves higher-speed operation by reducing the on-chip data rate. However, this comes at the expense of more power consumption because of larger device sizes and clock power. Figure 9 illustrates the increase in power. The on-chip logic power essentially doubles, as shown in the LCM case.


Fig.8: Output multiplexing (a) HCM (b) LCM

Fig.9: Comparison of two different multiplexing placements, Mux(middle) and Mux(output), in single-ended LCM and HCM architecture with pre-emphasis. DJ=11% of bit time

To achieve a data channel of a given aggregate data rate, transceivers are parallelized in a wide bus. The optimum data rate depends on the power budget and the number of I/O pins.

Fig.10: Power dissipation (W) for sending 1Tbps data in LCM architecture (simple; with pre-emphasis; with pre-emphasis, differential; annotated bit times: 2.5FO4, 3.3FO4, 3.6FO4). DJ=11% of bit time. Process is 0.18µ

If the number of pins is not constrained, we can use the model to find the data rate per pin that minimizes power. In the LCM architecture, signaling power is dominant only at low data rates; as the data rate increases, dynamic power becomes dominant. Fig.10 illustrates the tradeoff and shows that the optimum bit time for each I/O when transmitting a constant aggregate data rate of 1Tbps is 3.6FO4 for a differential transmitter using pre-emphasis. In the HCM architecture, since power consumption is mainly due to signaling, it is more desirable to increase the data rate in order to reduce the number of signaling I/O outputs. The optimum bit time for each I/O when transmitting a constant aggregate data rate of 1Tbps is 2.4FO4 for a single-ended HCM architecture with pre-emphasis. However, it is 4.7 times more power hungry than the single-ended LCM architecture with pre-emphasis. Therefore, LCM is the more attractive choice for power reduction if the number of pins is not constrained. We concentrate on this architecture for the remaining analyses.
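The search for an optimum per-pin bit time at a fixed aggregate rate can be sketched as a one-dimensional minimization of energy per bit. The per-pin power function below is a toy stand-in, not the paper's macromodel: a constant signaling term plus a dynamic term that grows sharply as the bit time approaches a hypothetical minimum `t0`. All parameter values and units are arbitrary and chosen only to show the shape of the tradeoff.

```python
def energy_per_bit(t_bit, pin_power):
    """Energy per bit for one pin: power of the pin times its bit time."""
    return pin_power(t_bit) * t_bit


def total_power(aggregate_rate, t_bit, pin_power):
    """Total power for a bus: (aggregate_rate * t_bit) pins, each at pin_power."""
    return aggregate_rate * energy_per_bit(t_bit, pin_power)


def best_bit_time(candidates, pin_power):
    """Bit time that minimizes energy per bit, hence total power."""
    return min(candidates, key=lambda t: energy_per_bit(t, pin_power))


def toy_pin_power(p_sig, a=4.0, t0=1.0):
    """Toy per-pin power model (hypothetical): p_sig + a / (t_bit - t0)."""
    return lambda t_bit: p_sig + a / (t_bit - t0)


if __name__ == "__main__":
    grid = [1.5 + 0.1 * k for k in range(86)]  # bit times, in FO4 delays
    low_sig = best_bit_time(grid, toy_pin_power(p_sig=1.0))
    high_sig = best_bit_time(grid, toy_pin_power(p_sig=4.0))
    print(f"signaling-light pin: optimum near {low_sig:.1f} FO4")
    print(f"signaling-heavy pin: optimum near {high_sig:.1f} FO4")
```

Even in this toy model, a larger constant signaling term pushes the optimum toward shorter bit times, which is the qualitative behavior described for HCM relative to LCM.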

Raising the output swing increases signaling power dissipation. However, the larger swing also leads to bigger devices at the output driver. With both signaling and dynamic power increasing, the direction of the shift in the optimal bit time is not immediately obvious. Our optimization shows that the increase in dynamic power is less than the increase in signaling power, and therefore the optimum shifts to a higher data rate of 1/(3FO4). By the same reasoning, decreasing the output swing shifts the optimum data rate to smaller values. Figure 11 shows the corresponding change in power and optimal bit time with varying signal swing. It is worthwhile to note that power increases super-linearly with signal swing but not quite quadratically, since signaling power is not the only source of power dissipation.

Fig.11: Power dissipation for sending 1Tbps data in a single-ended LCM architecture with pre-emphasis for various signal swings. DJ=11% of bit time. Process is 0.18µ

Finally, the impact of technology scaling is analyzed. As we would expect, a technology shrink improves both speed and power by reducing both capacitance and supply voltage. Therefore, from Eqs. (2) and (3), the optimum data rate for signaling at a given aggregate bandwidth in the LCM architecture is expected to scale roughly the same as gate delay. Figure 12 validates the scaling in power and bit time. However, it shows that the optimal bit time in an LCM architecture scales slightly faster than gate delay (2.6FO4 versus 3.3FO4 in 0.13µm and 0.18µm, respectively). The reason is that the output swing is not changed in the analysis, causing a higher ratio of signaling power to dynamic power. This effect would not be nearly as noticeable in HCM transmitters.

Fig.12: Power dissipation (W) for sending 1Tbps data in single-ended LCM architecture with pre-emphasis in 0.18 and 0.13 processes (annotated bit times: 3.3FO4 and 2.6FO4). DJ=11% of bit time

Conclusion 

This paper presents a macromodel of a transmitter that can optimize power dissipation under various data rates and signal data-eye constraints. This model is used to provide information on the choice of architecture, signal swing, and optimum data rate. Using the model, we show that the LCM architecture is more efficient for low-power design than the HCM architecture, especially when the output swing is not large (less than 350mV). We also show that output multiplexing leads to higher bandwidth, but at the expense of 2x the on-chip power dissipation because of larger device sizes and clock power. The impact is less for HCM because of the smaller device sizes and the passive pull-up. When transmitting a given aggregate data rate across multiple pins, the optimal bit time is larger for LCM (roughly 3.5FO4 delays) than it is for HCM (2.4FO4). With technology scaling, the optimal data rate scales faster than the FO4 gate delay because of the higher ratio of signaling power to dynamic power. The analysis reassures us that transmitting >1Tb/s of aggregate data rate off-chip is

clearly possible without excessive power consumption (
