## MODEL AND DESIGN OF BIPOLAR AND MOS CURRENT-MODE LOGIC:

## CML, ECL and SCL Digital Circuits

## Massimo Alioto Gaetano Palumbo



Kluwer Academic Publishers


# Model and Design of Bipolar and MOS Current-Mode Logic 

CML, ECL and SCL Digital Circuits

by
Massimo Alioto
University of Siena,
Italy

and
Gaetano Palumbo
University of Catania,
Italy

Springer

A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 1-4020-2878-4 (HB)
ISBN 1-4020-2888-1 (e-book)

Published by Springer,
P.O. Box 17, 3300 AA Dordrecht, The Netherlands.

Sold and distributed in North, Central and South America by Springer,
101 Philip Drive, Norwell, MA 02061, U.S.A.
In all other countries, sold and distributed
by Springer,
P.O. Box 322, 3300 AH Dordrecht, The Netherlands.

Printed on acid-free paper
springeronline.com
All Rights Reserved
© 2005 Springer
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Printed in the Netherlands.

# To our: <br> Maria Daniela and (mom) Giusi <br> Michela, Francesca \& Chiara 

## CONTENTS

Acknowledgment
Preface

1. DEVICE MODELING FOR DIGITAL CIRCUITS
(by Gianluca Giustolisi and Rosario Mita)
1.1 PN JUNCTION
1.1.1 Reverse Bias Condition
1.1.2 Forward Bias Condition
1.2 BIPOLAR-JUNCTION TRANSISTORS
1.2.1 Basic Operation
1.2.2 Early Effect or Base Width Modulation
1.2.3 Charge Effects in the Bipolar Transistor
1.2.4 Small Signal Model
1.3 MOS TRANSISTORS
1.3.1 Basic Operation
1.3.2 Triode or Linear Region
1.3.3 Saturation or Active Region
1.3.4 Body Effect
1.3.5 p-channel Transistors
1.3.6 Charge Effects in Saturation Region
1.3.7 Charge Effects in Triode Region
1.3.8 Charge Effects in Cutoff Region
1.3.9 Small Signal Model
1.3.10 Second Order Effects in MOSFET Modeling
2. CURRENT-MODE DIGITAL CIRCUITS
2.1 THE BIPOLAR CURRENT-MODE INVERTER: BASIC PRINCIPLES
2.2 THE BIPOLAR CURRENT-MODE INVERTER: INPUT-OUTPUT CHARACTERISTICS AND NOISE MARGIN
2.2.1 Differential input/output
2.2.2 Single-ended input/output
2.2.3 Considerations on the non zero input current
2.2.4 Remarks and comparison of differential/single-ended gates
2.3 THE BUFFERED BIPOLAR CURRENT-MODE (ECL) INVERTER
2.4 THE MOS CURRENT-MODE INVERTER
2.4.1 Static modeling of the PMOS active load
2.4.2 Input-output characteristics
2.4.3 Evaluation of the noise margin
2.4.4 Validation of the static model
2.4.5 The buffered MOS Current-Mode inverter and remarks
2.5 FUNDAMENTAL CURRENT-MODE LOGIC GATES
2.5.1 Principle of operation of Current-Mode gates: the series gating concept
2.5.2 Some examples of Current-Mode series gates
2.5.3 Supply voltage limitations in bipolar Current-Mode gates
2.5.4 MOS Current-Mode series gates and supply voltage limitations
2.6 TYPICAL APPLICATIONS OF CURRENT-MODE CIRCUITS
2.6.1 Radio Frequency applications
2.6.2 Optic-fiber communications
2.6.3 High-resolution mixed-signal ICs
3. DESIGN METHODOLOGIES FOR COMPLEX CURRENT-MODE LOGIC GATES
3.1 BASIC CONCEPTS ON THE DESIGN OF A SERIES GATE
3.1.1 Evaluation of function $F\left(X_{1} \ldots X_{n}\right)$ implemented by a given topology
3.1.2 Series-gate implementation of an assigned function $F\left(X_{1} \ldots X_{n}\right)$
3.1.3 Limitations of the general series-gate design approach
3.2 A GRAPHICAL REDUCTION METHOD
3.2.1 Basic concepts on the graphical approach in [CJ89]
3.2.2 A design example
3.3 AN ANALYTICAL FORMULATION OF THE DESIGN STRATEGY IN [CJ89]
3.3.1 Analytical interpretation of CPE/NPE
3.3.2 Analytical simplification through CPE/NPE: an example
3.3.3 Circuit implementation of the simplified function after CPE-NPE
3.4 A VEM-BASED REDUCTION METHOD
3.5 INPUT ORDERING VERSUS DESIGN GOAL
4. MODELING OF BIPOLAR CURRENT-MODE GATES
4.1 INTRODUCTION TO MODELING METHODOLOGIES
4.2 AN EFFICIENT APPROACH FOR CML GATES
4.3 SIMPLE MODELING OF THE CML INVERTER
4.3.1 Accuracy of the CML simple model
4.4 ACCURATE MODELING OF THE CML INVERTER
4.4.1 Accuracy of the CML accurate model
4.5 SIMPLE AND ACCURATE MODELING OF THE ECL INVERTER
4.5.1 Validation and improvement of the ECL model
4.6 SIMPLE MODELING OF BIPOLAR CML MUX/XOR GATES
4.6.1 Validation of the MUX/XOR model
4.6.2 Extension to the MUX/XOR when upper transistors switch
4.7 ACCURATE MODELING OF BIPOLAR CML MUX/XOR GATES AND EXTENSION TO ECL GATES
4.8 EVALUATION OF CML/ECL GATES INPUT CAPACITANCE
4.9 BIPOLAR CURRENT-MODE D LATCH
5. OPTIMIZED DESIGN OF BIPOLAR CURRENT-MODE GATES
5.1 INTRODUCTION TO OPTIMIZED METHODOLOGY IN CML GATES
5.2 OPTIMIZED DESIGN OF THE CML INVERTER
5.2.1 Design with minimum transistor area
5.2.2 Design with non-minimum transistor area
5.2.3 Design examples
5.3 OPTIMIZED DESIGN OF THE ECL INVERTER
5.4 COMPARISON BETWEEN THE CML AND THE ECL INVERTER
5.5 OPTIMIZED DESIGN OF BIPOLAR CURRENT-MODE MUX/XOR AND D LATCH
5.5.1 Design of MUX/XOR CML gates with minimum transistor area
5.5.2 Design of MUX/XOR CML gates with non-minimum transistor area and examples
5.5.3 Design of the CML D latch
5.5.4 Design examples
5.6 SUMMARY AND REMARKS
6. MODELING OF MOS CURRENT-MODE GATES
6.1 INTRODUCTION TO THE DELAY MODELING OF MOS CURRENT-MODE GATES
6.2 MODELING OF THE SOURCE-COUPLED INVERTER
6.2.1 Circuit model of the PMOS active load
6.2.2 Delay model of the SCL inverter for a step input
6.2.3 Extension of the delay model to arbitrary input waveforms
6.3 MODELING OF THE SOURCE-COUPLED INVERTER WITH OUTPUT BUFFERS
6.4 MODELING OF THE SOURCE-COUPLED MUX/XOR GATE
6.4.1 Delay model of the MUX/XOR gate without output buffer
6.4.2 Validation of the model of MUX/XOR gate without output buffer
6.4.3 MUX/XOR with the upper transistor switching
6.4.4 Delay model of the MUX/XOR gate with output buffers
6.5 EVALUATION OF SCL GATES INPUT CAPACITANCE AND EXTENSION TO THE D LATCH
7. OPTIMIZED DESIGN OF MOS CURRENT-MODE GATES
7.1 INTRODUCTION TO OPTIMIZED DESIGN OF SCL GATES
7.2 OPTIMIZED DESIGN METHODOLOGY IN SCL GATES WITHOUT OUTPUT BUFFERS
7.2.1 Power-efficient design
7.2.2 High-speed design
7.2.3 Low-power design
7.2.4 Remarks on the delay dependence on bias current and logic swing
7.3 TRANSISTOR SIZING TO MEET NOISE MARGIN SPECIFICATION
7.3.1 Design criteria for $V_{\text {SWING }}$ and $A_{V}$ to meet a noise margin specification
7.3.2 Transistor sizing versus $I_{S S}$
7.3.3 Summary and remarks on the transistor sizing versus $I_{S S}$
7.4 OPTIMIZED DESIGN OF THE SOURCE-COUPLED INVERTER
7.4.1 Delay expression versus bias current and logic swing in region $M$
7.4.2 Delay expression versus bias current and logic swing in region $L$ and $H$
7.4.3 Extension of the delay model in the region $M$ to region $L$ and $H$: a unified expression of delay and remarks
7.4.4 Design criteria and examples
7.4.5 Intuitive understanding of the delay dependence on logic swing and voltage gain in practical design cases
7.5 OPTIMIZED DESIGN OF THE SOURCE-COUPLED INVERTER WITH OUTPUT BUFFERS
7.5.1 Buffer used as a level shifter
7.5.2 Buffer used to improve speed
7.6 OPTIMIZED DESIGN OF THE SOURCE-COUPLED MUX/XOR AND D LATCH
7.6.1 MUX/XOR delay expression versus bias current and logic swing with the lower transistors switching
7.6.2 MUX/XOR delay expression versus bias current and logic swing for input applied to upper transistors
7.6.3 Delay dependence on logic swing
7.6.4 Extension to D latch
7.7 OPTIMIZED DESIGN OF THE SOURCE-COUPLED MUX/XOR AND D LATCH WITH OUTPUT BUFFERS
7.8 COMPARISON OF GATES ANALYZED AND EXTENSION TO ARBITRARY SCL LOGIC GATES
8. APPLICATIONS AND REMARKS ON CURRENT-MODE DIGITAL CIRCUITS
8.1 RING OSCILLATORS
8.1.1 Bipolar CML ring oscillators
8.1.2 Validation of the oscillation frequency in a CML ring oscillator
8.1.3 Remarks on the oscillation amplitude in a CML ring oscillator
8.1.4 CMOS SCL ring oscillators
8.2 FREQUENCY DIVIDERS
8.2.1 Design of the first stage
8.2.2 Design of successive stages
8.2.3 Design considerations and examples
8.3 LOW-VOLTAGE BIPOLAR CURRENT-MODE TOPOLOGIES
8.3.1 Low-voltage CML by means of the triple-tail cell
8.3.2 Analysis of the low-voltage CML D latch static operation
8.3.3 Delay of the low-voltage CML D latch
8.3.4 Comparison of the low-voltage and traditional CML D latch designed for high speed
8.3.5 Comparison of the low-voltage and traditional CML D latch designed for a low power consumption
8.3.6 Summary of results and remarks
8.4 OPTIMIZED DESIGN STRATEGIES FOR CASCADED BIPOLAR CURRENT-MODE GATES
8.4.1 Design of CML non-critical paths with a constraint on the overall bias current
8.4.2 Design of CML critical paths with a constraint on the overall bias current
8.4.3 Design of CML critical paths with a constraint on the overall bias current and equal transistors' emitter area
REFERENCES
ABOUT THE AUTHORS

## ACKNOWLEDGMENT

The authors wish to thank Prof. Salvatore Pennisi for his help during the correction of the draft.

We would like to thank our families and parents for their endless support and interest in our careers.

Massimo Alioto
Gaetano Palumbo

## PREFACE

Current-Mode digital circuits have been extensively analyzed and used since the early days of digital ICs. In particular, bipolar Current-Mode digital circuits emerged as an approach to realize digital circuits with the highest speed. Besides its speed performance, CMOS Current-Mode logic has been rediscovered because it allows logic gate implementations which, in contrast to classical VLSI CMOS digital circuits, generate a low level of noise. Thus, CMOS Current-Mode gates can be efficiently used inside analog and mixed-signal ICs, which require a low-noise silicon environment. For these reasons, many works and results reinforcing the importance of Current-Mode digital circuits have been published to date.

The authors have devoted considerable effort to the topic of Current-Mode digital circuits over the last six years, and their original results have significantly enhanced both the modeling and the related design methodologies. Since the fundamental Current-Mode logic building block is the classical differential amplifier, the winning idea, which represents the starting point of the authors' research, was to change the classical point of view typically followed in the investigation and design of Current-Mode digital circuits. In particular, they properly exploited classical paradigms developed and used in the analog circuit domain (a topic in which one of the authors has gained extensive experience). This change of perspective made it possible to collect many results in the domain of Current-Mode digital circuits. Such results represent a complete set of tools to be used during the modeling and design process of these high-performance digital circuits; the tools are accurate, yet simple enough to be used in a pencil-and-paper approach.

The main focus of this book will be to provide the reader with a deep understanding of modeling and design strategies of Current-Mode digital circuits, as well as to organize in a coherent manner all the authors' results in the domain of Current-Mode digital circuits. Hence, the book allows the reader not only to understand the operating principle and the features of bipolar and MOS Current-Mode digital circuits, but also to design optimized digital gates.

The book can be used as a reference by practicing engineers working in this area and as a textbook by senior undergraduate, graduate and postgraduate students (already familiar with electronic circuits and logic gates) who want to extend their knowledge and cover all aspects of the analysis and design of Current-Mode digital circuits. Thus, the prerequisites for a good understanding of the book are basic electronics and familiarity with digital circuit design.

Although the material is presented in a formal and theoretical manner, much emphasis is devoted to a design perspective. Indeed, the book can be a valid reference for designers of high-performance digital ICs.

To further link the book's theoretical aspects with practical issues, and to give the reader an idea of the real orders of magnitude involved in actual technologies, numerical examples together with SPICE simulations are included in the book.

The outline of the text is as follows:
An introduction to the operating principles of Bipolar and MOS transistors is presented in the first invited chapter, where circuit models are also introduced and developed with emphasis on those more suited for Current-Mode digital circuits. Then Current-Mode logic, along with its typical applications, is discussed in the second chapter.

In Chapter 3 the various techniques to define the topology of Current-Mode gates (series gates) are reviewed, highlighting their strengths and weaknesses.

All the remaining chapters organically report the results developed by the authors. In particular, the propagation delay models for bipolar Current-Mode gates are discussed in Chapter 4. Both a simple and an accurate model are derived for the CML and ECL inverter, MUX/XOR and D latch. The resulting models, which are simple, help the reader to gain an in-depth view of the fundamental parameters affecting the speed performance of a single gate. By using these models, gate-level design strategies to achieve the best performance in terms of speed are derived in Chapter 5. After a brief introduction, the approach is customized for the CML and ECL inverter, MUX/XOR and D latch. Moreover, a comparison between the CML and the ECL logic is also carried out.

Chapter 6 changes the context, moving into the MOS technology domain. Timing models for the SCL inverter without and with buffer, MUX/XOR and D latch are developed and discussed in detail. The models consider in depth the peculiarities of modern CMOS processes, and hence include submicron effects. Indeed, they are analytically derived by using standard BSIM3v3 model parameters. Starting from these delay models and considering the static behavior of the SCL gates, design strategies are derived in detail in Chapter 7 for the inverter without and with buffer, MUX/XOR and D latch.

Chapter 8 completes the modeling and design aspects for special classes of circuit blocks, such as ring oscillators, frequency dividers and low-voltage gate topologies. Moreover, this chapter extends the optimized design procedure to chains of gates.

## Chapter 1

## DEVICE MODELING FOR DIGITAL CIRCUITS

Gianluca Giustolisi and Rosario Mita

This chapter will deal with the operation and modeling of semiconductor devices in order to give the reader a basis for understanding, in a simple and efficient manner, the operation of the main building blocks of digital circuits.

### 1.1 PN JUNCTION

A semiconductor is a crystal lattice structure with free electrons and/or free holes or, which is the same, with negative and/or positive carriers. The most common semiconductor is silicon which, having a valence of four, allows its atoms to share four free electrons with neighboring atoms thus forming the covalent bonds of the crystal lattice.

In intrinsic silicon, thermal agitation can endow a few electrons with enough energy to escape their bonds. In the same way, they leave an equal number of holes in the crystal lattice that can be viewed as free charges with an opposite sign. At room temperature, we have $1.5 \cdot 10^{10}$ carriers of each type per $\mathrm{cm}^{3}$. This quantity is referred to as $n_{i}$ and is a function of temperature as it doubles for every $11{ }^{\circ} \mathrm{C}$ increase in temperature [S81], [MK86].

This intrinsic quantity of free charges is not sufficient for the building of microelectronic devices and must be increased by doping the intrinsic silicon. This means adding negative or positive free charges to the pure material. Several doping materials can be used to increase free charges. Specifically, when doping pure silicon with a pentavalent material (that is, doping with atoms of an element having a valence of five) we have almost one extra free electron that can be used to conduct current for every one atom of impurity. Likewise, doping the pure silicon with atoms having a valence of three, gives us almost one free hole for every impurity atom. A pentavalent atom donates electrons to the intrinsic silicon and is known as a donor. In contrast, a trivalent atom accepts electrons and is known as an acceptor. Typical pentavalent impurities, also called n-type dopants, are arsenic, As, and phosphorus, P, while the most used trivalent impurity, also called p-type dopant, is boron, B. Silicon doped with a pentavalent impurity is said to be n-type silicon, while silicon doped with a trivalent impurity is called p-type silicon.

If we suppose that a concentration $N_{D}\left(N_{A}\right)$ of donor (acceptor) atoms is used to dope the silicon, with $N_{D} \gg n_{i}\left(N_{A} \gg n_{i}\right)$, the concentration of free electrons (holes) in the n-type (p-type) material, $n_{n}\left(p_{p}\right)$, can be assumed as

$$
\begin{equation*}
n_{n} \approx N_{D} \quad\left(p_{p} \approx N_{A}\right) \tag{1.1}
\end{equation*}
$$

Since some free electrons (holes) recombine with holes (electrons), the concentration of holes (electrons) in the n-type (p-type) material, $p_{n}\left(n_{p}\right)$, is also reduced to

$$
\begin{equation*}
p_{n} \approx \frac{n_{i}^{2}}{N_{D}} \quad\left(n_{p} \approx \frac{n_{i}^{2}}{N_{A}}\right) \tag{1.2}
\end{equation*}
$$
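As a quick numerical check of (1.1) and (1.2), the following Python sketch evaluates the majority and minority carrier concentrations. The doping levels ($10^{15}$ and $10^{20}$ cm$^{-3}$) and the function name are illustrative assumptions, not values prescribed by the derivation.

```python
# Sketch of eqs. (1.1)-(1.2); all concentrations are in cm^-3.
N_I = 1.5e10  # intrinsic carrier concentration at room temperature (from the text)


def carrier_concentrations(doping, n_i=N_I):
    """Return (majority, minority) concentrations, assuming doping >> n_i."""
    majority = doping              # eq. (1.1)
    minority = n_i ** 2 / doping   # eq. (1.2)
    return majority, minority


# Assumed example doping levels (hypothetical values chosen for illustration).
n_n, p_n = carrier_concentrations(1e15)   # n-type side, N_D = 1e15 cm^-3
p_p, n_p = carrier_concentrations(1e20)   # p-type side, N_A = 1e20 cm^-3

print(f"n-type: n_n = {n_n:.2e} cm^-3, p_n = {p_n:.2e} cm^-3")
print(f"p-type: p_p = {p_p:.2e} cm^-3, n_p = {n_p:.2e} cm^-3")
```

The heavier the doping, the fewer minority carriers remain: under these assumed values the heavily doped p side is left with only a few free electrons per cm$^{3}$.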

Joining a p-type to an n-type semiconductor as in Fig. 1.1 makes a pn junction, or more commonly a diode. The p-side terminal is called anode (A) while the n-side terminal is called cathode (K).

Note that the p-type section is denoted with $\mathrm{p}+$, meaning that this side is doped more heavily (in the order of $10^{20}$ carriers$/\mathrm{cm}^{3}$) than its n-type counterpart (in the order of $10^{15}$ carriers$/\mathrm{cm}^{3}$), that is $N_{A} \gg N_{D}$. This is not a limitation since most pn junctions are built with one side more heavily doped than the other.

Close to the junction, free electrons on the n side are attracted by free positive charges on the p side so they diffuse across the junction and recombine with holes. Similarly, holes on the p side are attracted by electrons on the n side, diffuse across the junction and recombine with free electrons on the n side.

This phenomenon leaves behind positive ions (or immobile positive charges) on the n side, and negative ions (or immobile negative charges) on the p side, thus creating a depletion region across the junction where no free carriers exist. Moreover, since charge neutrality obliges the total amount of charge on one side to be equal to the total amount of charge on the other, the width of the depletion region is greater on the more lightly doped side, that is, in our case where $N_{A} \gg N_{D}$, we have $x_{n} \gg x_{p}$.


Fig. 1.1. pn junction.

Due to immobile charges, an electric field appears from the $n$ side to the $p$ side and generates the so-called built-in potential of the junction. This potential prevents further net movement of free charges across the junction under open circuit and steady-state conditions. It is given by [S81], [MK86]

$$
\begin{equation*}
\Phi_{0}=V_{T} \ln \left(\frac{N_{A} N_{D}}{n_{i}^{2}}\right) \tag{1.3}
\end{equation*}
$$

$V_{T}$ being the thermal voltage defined as

$$
\begin{equation*}
V_{T}=\frac{k T}{q} \tag{1.4}
\end{equation*}
$$

where $T$ is the absolute temperature in kelvin ($\approx 300 \mathrm{~K}$ at room temperature), $k$ is Boltzmann's constant $\left(1.38 \cdot 10^{-23} \mathrm{~J} \mathrm{~K}^{-1}\right)$ and $q$ is the charge of an electron $\left(1.602 \cdot 10^{-19} \mathrm{C}\right)$. At room temperature, $V_{T}$ is approximately equal to 26 mV. Typical values of the built-in potential are around 0.9 V.
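The values quoted above can be reproduced with a few lines of Python. This is a minimal sketch of (1.3) and (1.4); the function names are hypothetical, and the doping levels are the orders of magnitude mentioned for Fig. 1.1, used here as assumed example inputs.

```python
import math

K_BOLTZMANN = 1.38e-23   # J/K (from the text)
Q_ELECTRON = 1.602e-19   # C (from the text)
N_I = 1.5e10             # cm^-3, room-temperature value (from the text)


def thermal_voltage(temperature_k=300.0):
    """Thermal voltage V_T = kT/q, eq. (1.4), in volts."""
    return K_BOLTZMANN * temperature_k / Q_ELECTRON


def built_in_potential(n_a, n_d, temperature_k=300.0, n_i=N_I):
    """Built-in potential, eq. (1.3), in volts; doping in cm^-3.

    n_i is kept at its room-temperature value here, so the result is only
    meaningful near 300 K unless a matching n_i is supplied.
    """
    return thermal_voltage(temperature_k) * math.log(n_a * n_d / n_i ** 2)


v_t = thermal_voltage()
phi_0 = built_in_potential(n_a=1e20, n_d=1e15)
print(f"V_T   = {v_t * 1e3:.1f} mV")
print(f"Phi_0 = {phi_0:.2f} V")
```

With these inputs the thermal voltage comes out close to 26 mV and the built-in potential slightly below 0.9 V, in line with the typical values stated in the text.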

Under open circuit and steady-state conditions, it can be shown that the widths of depletion regions are given by the following equations

$$
\begin{equation*}
x_{n}=\left[\frac{2 \varepsilon_{s i} \varepsilon_{0} \Phi_{0}}{q} \frac{N_{A}}{N_{D}\left(N_{A}+N_{D}\right)}\right]^{1 / 2} \approx\left[\frac{2 \varepsilon_{s i} \varepsilon_{0} \Phi_{0}}{q N_{D}}\right]^{1 / 2} \tag{1.5a}
\end{equation*}
$$

$$
\begin{equation*}
x_{p}=\left[\frac{2 \varepsilon_{s i} \varepsilon_{0} \Phi_{0}}{q} \frac{N_{D}}{N_{A}\left(N_{A}+N_{D}\right)}\right]^{1 / 2} \approx\left[\frac{2 \varepsilon_{s i} \varepsilon_{0} \Phi_{0} N_{D}}{q N_{A}^{2}}\right]^{1 / 2} \tag{1.5b}
\end{equation*}
$$

where $\varepsilon_{0}$ is the permittivity of free space $\left(8.854 \cdot 10^{-12} \mathrm{~F} / \mathrm{m}\right), \varepsilon_{s i}$ is the relative permittivity of silicon (equal to 11.8) and where the approximations hold if $N_{A} \gg N_{D}$.

Dividing (1.5a) by (1.5b) yields

$$
\begin{equation*}
\frac{x_{n}}{x_{p}}=\frac{N_{A}}{N_{D}} \tag{1.6}
\end{equation*}
$$

which justifies the fact that $x_{n}$ is greater than $x_{p}$ if p-type semiconductor is more heavily doped than n-type.

The charge stored in the depletion region, per unit device area, is found by multiplying the width of the depleted area by the concentration of the immobile charge, which can be considered equal to $q$ times the doping concentration. So for both the sides of the device we have

$$
\begin{align*}
& Q^{+}=q N_{D} x_{n}=\left(2 q \varepsilon_{s i} \varepsilon_{0} \Phi_{0} \frac{N_{A} N_{D}}{N_{A}+N_{D}}\right)^{1 / 2}  \tag{1.7a}\\
& Q^{-}=q N_{A} x_{p}=\left(2 q \varepsilon_{s i} \varepsilon_{0} \Phi_{0} \frac{N_{A} N_{D}}{N_{A}+N_{D}}\right)^{1 / 2} \tag{1.7b}
\end{align*}
$$

Note that the charge stored on the n side equals the charge stored on the p side, as is expected due to the charge neutrality.
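The width and charge expressions (1.5) and (1.7) can be checked numerically. The sketch below works in SI units and assumes example dopings of $10^{20}$ and $10^{15}$ cm$^{-3}$ together with $\Phi_{0}=0.9$ V; these inputs and the function name are illustrative assumptions.

```python
import math

Q_ELECTRON = 1.602e-19   # C
EPS_0 = 8.854e-12        # F/m
EPS_SI = 11.8            # relative permittivity of silicon


def depletion_widths(n_a, n_d, phi_0):
    """Return (x_n, x_p) in metres from eqs. (1.5a)-(1.5b); doping in m^-3."""
    k = 2.0 * EPS_SI * EPS_0 * phi_0 / Q_ELECTRON
    x_n = math.sqrt(k * n_a / (n_d * (n_a + n_d)))
    x_p = math.sqrt(k * n_d / (n_a * (n_a + n_d)))
    return x_n, x_p


# Assumed example values (1 cm^-3 = 1e6 m^-3).
N_A = 1e26    # 1e20 cm^-3
N_D = 1e21    # 1e15 cm^-3
PHI_0 = 0.9   # V, typical built-in potential quoted in the text

x_n, x_p = depletion_widths(N_A, N_D, PHI_0)
q_plus = Q_ELECTRON * N_D * x_n    # eq. (1.7a), C/m^2
q_minus = Q_ELECTRON * N_A * x_p   # eq. (1.7b), C/m^2

print(f"x_n = {x_n * 1e6:.3f} um, x_p = {x_p * 1e6:.2e} um")
print(f"x_n / x_p = {x_n / x_p:.3e} (N_A / N_D = {N_A / N_D:.3e})")
print(f"Q+ = {q_plus:.3e} C/m^2, Q- = {q_minus:.3e} C/m^2")
```

Under these assumptions the depletion region extends about one micrometre into the lightly doped n side and is negligible on the p+ side, while the two charges coincide as required by charge neutrality.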

All the above equations are valid in the case of abrupt junctions. For graded junctions, that is where the doping concentration changes smoothly from p to n, a better model for the charge is obtained by replacing the exponent $1/2$ in (1.5) and (1.7) with $(1-m)$ [AM88], where $m$ is a technology-dependent parameter (typical $m$ values are around $1/3$).

### 1.1.1 Reverse Bias Condition

By grounding the anode and applying a voltage $V_{R}$ to the cathode, we reverse-bias the device. Under such a condition the current flowing through the diode is mainly determined by the junction area and is independent of $V_{R}$. In many cases this current is considered negligible and the device is modeled as an open circuit. However, the device also has a charge stored in the junction that changes with the voltage applied and causes a capacitive effect, which cannot be ignored at high frequencies. The capacitive effect is due to the so-called junction capacitance.

Specifically, when the diode is reverse biased as in Fig. 1.2, free electrons on the n side are attracted by the positive potential $V_{R}$ and leave behind positive immobile charges. Similarly, free holes in the p region move towards the anode leaving behind negative immobile charges. This means that the depletion region increases and that the built-in potential increases exactly by the amount of applied voltage, $V_{R}$.

Given that the built-in potential is increased by $V_{R}$, both the width and the charge of the depletion region can be found by substituting $\Phi_{0}+V_{R}$ for $\Phi_{0}$ in (1.5) and (1.7), respectively. In particular, the stored charge becomes

$$
\begin{equation*}
Q^{+}=Q^{-}=\left[2 q \varepsilon_{s i} \varepsilon_{0}\left(\Phi_{0}+V_{R}\right) \frac{N_{A} N_{D}}{N_{A}+N_{D}}\right]^{1-m} \tag{1.8}
\end{equation*}
$$

This charge denotes a non-linear charge-voltage characteristic of the device, modeled by a non-linear capacitor called junction capacitance.

For small changes in the applied voltage around a bias value, $V_{R}$, the capacitor can be viewed as a small-signal capacitance, $C_{j}$, whose expression is found by differentiating (1.8) with respect to $V_{R}$

$$
\begin{equation*}
C_{j}=\frac{d Q^{+}}{d V_{R}}=\frac{C_{j 0}}{\left(1+\frac{V_{R}}{\Phi_{0}}\right)^{m}} \tag{1.9}
\end{equation*}
$$

where

$$
\begin{equation*}
C_{j 0}=\left(\frac{q \varepsilon_{s i} \varepsilon_{0}}{2 \Phi_{0}} \frac{N_{A} N_{D}}{N_{A}+N_{D}}\right)^{1-m} \tag{1.10}
\end{equation*}
$$

is a capacitance per unit of area and depends only on the doping concentration.
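To get a feeling for the numbers, the following sketch evaluates (1.9) and (1.10) in SI units. It computes $C_{j0}$ from the doping only for the abrupt-junction case ($m=1/2$); the doping levels, $\Phi_{0}=0.9$ V and the function names are assumptions made for illustration.

```python
import math

Q_ELECTRON = 1.602e-19   # C
EPS_0 = 8.854e-12        # F/m
EPS_SI = 11.8            # relative permittivity of silicon


def zero_bias_capacitance(n_a, n_d, phi_0):
    """C_j0 per unit area (F/m^2), eq. (1.10) with m = 1/2 (abrupt junction)."""
    n_eff = n_a * n_d / (n_a + n_d)
    return math.sqrt(Q_ELECTRON * EPS_SI * EPS_0 * n_eff / (2.0 * phi_0))


def junction_capacitance(c_j0, v_r, phi_0, m=0.5):
    """Small-signal junction capacitance at reverse bias v_r, eq. (1.9)."""
    return c_j0 / (1.0 + v_r / phi_0) ** m


# Assumed example values (doping in m^-3).
N_A, N_D, PHI_0 = 1e26, 1e21, 0.9

c_j0 = zero_bias_capacitance(N_A, N_D, PHI_0)
# 1 F/m^2 corresponds to 1e3 fF/um^2.
print(f"C_j0 = {c_j0 * 1e3:.4f} fF/um^2")
for v_r in (0.0, 1.0, 2.7, 5.0):
    c_j = junction_capacitance(c_j0, v_r, PHI_0)
    print(f"V_R = {v_r:3.1f} V -> C_j / C_j0 = {c_j / c_j0:.3f}")
```

For an abrupt junction the capacitance halves when the reverse bias reaches $3 \Phi_{0}$, which illustrates how weakly $C_{j}$ depends on the applied voltage.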

For large changes of the reverse voltage across the junction capacitance, as mostly happens in digital circuits, the small-signal capacitance must be replaced by a large-signal equivalent linear capacitance which displaces the same charge over the voltage swing of interest [R96]

$$
\begin{equation*}
\bar{C}_{j}=\frac{\Delta Q^{+}}{\Delta V_{R}}=\frac{Q^{+}\left(V_{2}\right)-Q^{+}\left(V_{1}\right)}{V_{2}-V_{1}} \tag{1.11}
\end{equation*}
$$

where $V_{2}$ and $V_{1}$ are the highest and the lowest reverse voltages applied on the pn junction, respectively.

Evaluating (1.8) at $V_{R}=V_{2}$ and $V_{R}=V_{1}$ and substituting into (1.11), it follows that

$$
\begin{equation*}
\bar{C}_{j}=K_{e q} C_{j 0} \tag{1.12}
\end{equation*}
$$

where $C_{j 0}$ is again given by (1.10) and

$$
\begin{equation*}
K_{e q}=\Phi_{0}^{m} \frac{\left(\Phi_{0}+V_{2}\right)^{1-m}-\left(\Phi_{0}+V_{1}\right)^{1-m}}{(1-m)\left(V_{2}-V_{1}\right)} \tag{1.13}
\end{equation*}
$$
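The factor $K_{e q}$ can be checked against a direct numerical average of (1.9). In the sketch below $V_{1}$ and $V_{2}$ are taken as positive reverse voltages, consistently with (1.8) and (1.9), so the bracketed terms appear as $\Phi_{0}+V$; the function names and the example swing are assumptions.

```python
import math


def k_eq(v1, v2, phi_0, m):
    """Closed-form ratio of large-signal capacitance to C_j0.

    v1 and v2 are reverse voltages (positive numbers), v2 > v1.
    """
    num = (phi_0 + v2) ** (1.0 - m) - (phi_0 + v1) ** (1.0 - m)
    return phi_0 ** m * num / ((1.0 - m) * (v2 - v1))


def k_eq_numeric(v1, v2, phi_0, m, steps=10000):
    """Average of C_j / C_j0 from eq. (1.9) over [v1, v2] (midpoint rule)."""
    dv = (v2 - v1) / steps
    total = sum((1.0 + (v1 + (i + 0.5) * dv) / phi_0) ** (-m)
                for i in range(steps))
    return total * dv / (v2 - v1)


PHI_0 = 0.9  # V, typical built-in potential quoted in the text
for m in (0.5, 1.0 / 3.0):
    closed = k_eq(0.0, 2.7, PHI_0, m)
    numeric = k_eq_numeric(0.0, 2.7, PHI_0, m)
    print(f"m = {m:.3f}: K_eq = {closed:.4f} (numeric average {numeric:.4f})")
```

Since $C_{j}$ decreases with reverse bias, $K_{e q}$ is smaller than one for any swing starting at zero bias: the equivalent linear capacitance is a fraction of $C_{j 0}$.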



Fig. 1.2. Reverse-biased pn junction.

### 1.1.2 Forward Bias Condition

With reference to Fig. 1.3, by grounding the cathode and applying a voltage $V_{D}$ to the anode, we forward-bias the device. Under this condition the built-in potential is reduced by the amount of voltage applied.

Consequently, the width of the depletion regions and the charge stored in the junction are reduced, too.

If $V_{D}$ is large enough, the reduction in the potential barrier ensures the electrons in the n side and the holes in the p side are attracted by the anode and the cathode, respectively, thus crossing the junction. Once free charges cross the depletion region, they become minority carriers on the other side and a recombination process with majority carriers begins. This recombination reduces the minority carrier concentrations that assume a decreasing exponential profile. The concentration profile is responsible for the current flow near the junction, which is due to a diffusive phenomenon that is called diffusion current. On moving away from the junction, some current flow is given by the diffusion current and some is due to majority carriers that, coming from the terminals, replace those carriers recombined with minority carriers or diffused across the junction. This latter current is termed drift current.

This process causes a current to flow through the diode that is exponentially related to voltage $V_{D}$ as follows

$$
\begin{equation*}
I_{D}=A_{D} J_{S} \exp \left(\frac{V_{D}}{V_{T}}\right) \tag{1.14}
\end{equation*}
$$

where $A_{D}$ is the junction area and $J_{S}$ the scale current density which is inversely proportional to the doping concentrations. The product $A_{D} J_{S}$ is often expressed in terms of a scale current and denoted as $I_{s}$.
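The exponential law (1.14) implies that the diode current grows by a factor of ten for roughly every 60 mV of additional forward bias at room temperature. The sketch below illustrates this; the scale current of $10^{-16}$ A is a hypothetical value chosen only for the example.

```python
import math

V_T = 0.026   # V, thermal voltage at room temperature (from the text)


def diode_current(v_d, i_s=1e-16, v_t=V_T):
    """Forward diode current, eq. (1.14), with I_S = A_D * J_S in amperes.

    i_s = 1e-16 A is an assumed, illustrative scale current.
    """
    return i_s * math.exp(v_d / v_t)


for v_d in (0.60, 0.66, 0.72, 0.78):
    print(f"V_D = {v_d:.2f} V -> I_D = {diode_current(v_d):.3e} A")

decade_step = V_T * math.log(10.0)
print(f"Voltage step per decade of current: {decade_step * 1e3:.1f} mV")
```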


Fig. 1.3. Forward-biased pn junction.

As far as the charge stored in the device is concerned, we have two contributions under the forward bias condition. The first is given by the charge stored in the depletion region that yields a junction capacitance that can be expressed by (1.9) or (1.12) for small signal variation or large signal swing of the forward voltage applied, respectively. In any case, this contribution is very small and the junction capacitance negligible.

The second contribution takes into account the charge due to minority carrier concentrations close to the junction that are responsible for the diffusion current. This component yields a diffusion capacitance, $C_{d}$, which is proportional to the current $I_{D}$ as follows [S81], [MK86]

$$
\begin{equation*}
C_{d}=\tau_{T} \frac{I_{D}}{V_{T}} \tag{1.15}
\end{equation*}
$$

where $\tau_{T}$ is a technology parameter known as the transit time of the diode.
Observe that $C_{d}$ is a small-signal capacitance which is valid around the given bias voltage. Therefore, the diffusion capacitance must be corrected when a large forward voltage swing is applied across the pn junction, because $I_{D}$ depends strongly on the applied forward voltage, as shown by (1.14).

Similar to the junction capacitance, a large-signal equivalent diffusion capacitance can be defined as follows [R96]

$$
\begin{equation*}
\bar{C}_{d}=\frac{\tau_{T}}{V_{2}-V_{1}} A_{D} J_{S}\left(e^{\frac{V_{2}}{V_{T}}}-e^{\frac{V_{1}}{V_{T}}}\right)=\frac{V_{T}}{V_{2}-V_{1}}\left[C_{d}\left(V_{2}\right)-C_{d}\left(V_{1}\right)\right] \tag{1.16}
\end{equation*}
$$
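The large-signal diffusion capacitance is the diffusion charge $\tau_{T} I_{D}$ displaced over the swing, divided by the swing itself. The following sketch compares it with the small-signal values of (1.15) at the two ends of the swing; the transit time (10 ps), the scale current ($10^{-16}$ A) and the voltage swing are hypothetical values used only for illustration.

```python
import math

V_T = 0.026      # V, thermal voltage at room temperature (from the text)
I_S = 1e-16      # A, assumed scale current A_D * J_S
TAU_T = 10e-12   # s, assumed transit time


def diode_current(v_d):
    """Forward diode current, eq. (1.14)."""
    return I_S * math.exp(v_d / V_T)


def diffusion_capacitance(v_d):
    """Small-signal diffusion capacitance at bias v_d, eq. (1.15)."""
    return TAU_T * diode_current(v_d) / V_T


def large_signal_diffusion_capacitance(v1, v2):
    """Equivalent linear diffusion capacitance over the swing [v1, v2]."""
    return V_T / (v2 - v1) * (diffusion_capacitance(v2)
                              - diffusion_capacitance(v1))


V1, V2 = 0.60, 0.75   # assumed forward voltage swing
c_bar = large_signal_diffusion_capacitance(V1, V2)
print(f"C_d(V1) = {diffusion_capacitance(V1):.3e} F")
print(f"C_d(V2) = {diffusion_capacitance(V2):.3e} F")
print(f"C_d_bar = {c_bar:.3e} F")
```

Because $I_{D}$ is exponential in the forward voltage, the equivalent capacitance lies between the two small-signal values and is dominated by the upper end of the swing.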

### 1.2 BIPOLAR-JUNCTION TRANSISTORS

Bipolar junction transistors (BJTs) were widespread until the end of the seventies, when MOS technology started to become popular thanks to the fact that a larger number of transistors could be put together in a single integrated circuit. Although bipolar digital circuit designs occupy a small slice of the digital market, they are still the technology of choice in very high performance applications. With respect to MOS, bipolar transistors have the advantage of a larger transconductance factor, $g_{m}$, and a larger output resistance, $r_{c}$, so they exhibit better performance in terms of current driving capability and achievable voltage gain. Moreover, the very high unity-gain frequency of bipolar transistors (in the order of $20-50 \mathrm{GHz}$) makes them suitable for high-speed digital applications.


Fig. 1.4. Bipolar-Junction Transistor cross section.

A typical simplified BJT cross-section is shown in Fig. 1.4, where the so-called npn vertical transistor is depicted. It can be seen as two back-to-back diodes because it is made up of two n-regions separated by a p-region called base. The actual base region is the gray p-region in the figure whose width, $W_{B}$, is small with respect to the other dimensions and in modern bipolar processes is between 0.5 and $0.8 \mu \mathrm{m}$. This region has a medium doping concentration, in the order of $10^{17}$ carriers$/\mathrm{cm}^{3}$. The emitter is the heavily doped $\mathrm{n}+$ region in the figure. It has a width of a few $\mu \mathrm{m}$ and its doping concentration is in the order of $10^{21}$ carriers$/\mathrm{cm}^{3}$. Finally, the actual collector region is the gray $\mathrm{n}-$ epitaxial layer in the figure. The collector doping concentration is in the order of $10^{15}$ carriers$/\mathrm{cm}^{3}$. To reduce the resistive path that connects the actual collector region to the collector contact, a heavily doped buried layer is grown below the device. The gray area represents the region where the so-called transistor effect takes place and is the actual npn transistor. Since this area extends vertically, the transistor is said to be vertical. Finally, note that, unlike MOSFETs, the transistor is not symmetric.

### 1.2.1 Basic Operation

The operation mode of the transistor depends on the voltages applied to the device terminals. In digital circuits, the transistor preferably operates either in cut-off mode (i.e., both junctions are reverse-biased) or in forward-active mode (i.e., the emitter-base junction is forward-biased and the collector-base junction is reverse-biased). Operation in the saturation region (i.e., both junctions forward-biased) is avoided, as the circuit performance tends to deteriorate.

In cut-off mode, both diodes are reverse-biased; therefore, the terminal currents are limited to the junction saturation currents, which are extremely small, and the device is considered to be in the off-state.

To understand the forward-active operation of a bipolar transistor, let us consider the simplified scheme in Fig. 1.5, where the emitter terminal is connected to ground. The base-emitter junction acts as a diode and a current flows if the junction is forward biased. In such a situation, that is $V_{B E}>0$, a current of majority carriers (holes in this case) flows from the base region across the base-emitter junction. Meanwhile, a current of electrons flows from the emitter across the base-emitter junction and enters the base, thus diffusing towards the base-collector junction. Due to the different doping levels, the electrons that diffuse into the base far outnumber the holes that diffuse into the emitter [S81], [MK86].


Fig. 1.5. Simplified scheme of a BJT.

If $V_{C}$ is larger than $0.2-0.3 \mathrm{~V}$, the excess of electrons in the base is subject to a negative electric field imposed by the collector voltage. When those electrons appear at the base-collector junction, they are pushed into the collector region. Since the base width, $W_{B}$, is small, electrons coming from the emitter do not have the possibility to recombine with holes in the base and almost all are pushed into the collector.

In such a situation, the small base current is mainly determined by holes while electrons coming from the emitter mainly determine the large collector current. Consequently, the emitter current is the sum of those two contributions.

The collector current, $I_{C}$, is caused by the base-emitter voltage and, as for a diode, it has an exponential relationship that is

$$
\begin{equation*}
I_{C}=A_{E} J_{C S} \exp \left(\frac{V_{B E}}{V_{T}}\right) \tag{1.17}
\end{equation*}
$$

where $A_{E}$ is the emitter area and $J_{C S}$ is a constant term that represents a current density and is inversely proportional to the base width, $W_{B}$, and its doping concentration. The product $A_{E} J_{C S}$ is often expressed in terms of the current scale factor $I_{C S}$.

In addition, the base current is exponentially related to the base-emitter voltage and has an expression similar to (1.17). Consequently, at a first approximation, the ratio between the collector and the base current is constant and independent of both voltages and currents. This ratio is commonly referred to as $\beta_{F}$, that is

$$
\begin{equation*}
\beta_{F}=\frac{I_{C}}{I_{B}} \tag{1.18}
\end{equation*}
$$

Due to the small amount of base current with respect to the large collector current, the value of $\beta_{F}$ is typically between 50 and 200.

The ratio between the collector current and the emitter current, $I_{E}$, is denoted by $\alpha_{F}$ and results as

$$
\begin{equation*}
\alpha_{F}=\frac{I_{C}}{I_{E}} \tag{1.19}
\end{equation*}
$$

Since $I_{E}=I_{C}+I_{B}$, the constant $\alpha_{F}$ can be expressed in terms of $\beta_{F}$, that is

$$
\begin{equation*}
\alpha_{F}=\frac{\beta_{F}}{\beta_{F}+1} \approx 1 \tag{1.20}
\end{equation*}
$$

which is close to unity for high values of $\beta_{F}$.

In saturation mode, the collector-emitter voltage, $V_{C E}$, approaches a value of about $0.2-0.3 \mathrm{~V}$, commonly referred to as $V_{\text {CEsat }}$, and the base-collector junction becomes forward biased at a voltage $V_{C B, o n} \approx 0.5 \mathrm{~V}$. In such a situation, holes from the base start to diffuse into the collector, and the collector current is no longer related to the base current by (1.18). Specifically, the base-emitter junction behaves like a diode whose current $I_{B}$ exponentially depends on $V_{B E}$, while the collector-emitter branch behaves like a voltage source whose value is set to $V_{\text {CEsat }}$.
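
As a rough numerical illustration of (1.17)-(1.20), the following Python sketch evaluates the forward-active terminal currents. The function name and the default values of `i_cs`, `beta_f` and `v_t` are assumptions chosen for illustration only; they do not refer to any specific process.

```python
import math

def bjt_forward_active(v_be, i_cs=1e-16, beta_f=100.0, v_t=0.02585):
    """Terminal currents of an npn BJT in forward-active mode.

    i_cs (A), beta_f and v_t (V) are illustrative assumptions.
    """
    i_c = i_cs * math.exp(v_be / v_t)   # (1.17), with I_CS = A_E * J_CS
    i_b = i_c / beta_f                  # (1.18)
    i_e = i_c + i_b                     # emitter current is the sum of the two
    alpha_f = i_c / i_e                 # (1.19)
    return i_c, i_b, i_e, alpha_f

i_c, i_b, i_e, alpha_f = bjt_forward_active(0.75)
print(f"I_C = {i_c:.3e} A, I_B = {i_b:.3e} A, alpha_F = {alpha_f:.4f}")
```

Whatever the bias point, the computed $\alpha_{F}$ coincides with $\beta_{F} /\left(\beta_{F}+1\right)$, as stated by (1.20).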

### 1.2.2 Early Effect or Base Width Modulation

In (1.17) the collector current is independent of the collector voltage. However, this is true only at a first order approximation since the dependence in fact exists. Referring to Fig. 1.5, we note that the effective base width, $W_{B}^{\prime}$, that should be used for evaluating $J_{C S}$, is different from the designed base width, $W_{B}$, due to the presence of two depletion regions. The base-emitter depletion region is caused by a forward biasing. Therefore, it is small and almost independent of the voltage applied. In contrast, reverse biasing creates the base-collector depletion region, which, consequently, is larger and strongly depends on the voltage applied. Specifically, the collector voltage modulates the base-collector depletion region, thus decreasing and influencing the effective base width, $W_{B}^{\prime}$.

To take this effect into account, a corrective term is introduced in (1.17) that becomes [GT86], [GM93]

$$
\begin{equation*}
I_{C}=A_{E} J_{C S} \exp \left(\frac{V_{B E}}{V_{T}}\right)\left(1+\frac{V_{C E}}{V_{A}}\right) \tag{1.21}
\end{equation*}
$$

where the constant $V_{A}$ is commonly referred to as the Early voltage and has a typical value between 50 and 100 V. In most applications, especially in digital circuits, this effect is negligible.
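
To give a feeling for how small the correction in (1.21) is, the sketch below evaluates the relative change of $I_{C}$ when $V_{C E}$ moves from 1 V to 2 V at constant $V_{B E}$. The Early voltage of 50 V is an assumed value at the lower end of the range quoted above.

```python
def early_factor(v_ce, v_a=50.0):
    """Corrective term (1 + V_CE / V_A) of (1.21); v_a is an assumed Early voltage."""
    return 1.0 + v_ce / v_a

# Relative change of I_C when V_CE moves from 1 V to 2 V at constant V_BE
change = early_factor(2.0) / early_factor(1.0) - 1.0
print(f"relative change of I_C: {100 * change:.2f} %")
```

Under these assumptions the collector current changes by about 2 %, which is consistent with neglecting the effect in first-order digital analysis.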

### 1.2.3 Charge Effects in the Bipolar Transistor

As described previously, depending upon the operating region, the base-emitter and base-collector junctions can be biased in forward or reverse mode. This affects the charge stored in the depletion layers and, in turn, the equivalent capacitance. In particular, when these junctions are reverse-biased and the voltage undergoes large changes, their dynamic behavior can be modeled by means of a linearized junction capacitance given by

$$
\begin{equation*}
\bar{C}_{j}=K_{e q} C_{j 0} \tag{1.22}
\end{equation*}
$$

where $C_{j 0}$ is the zero-bias value of the junction capacitance ( $C_{j 0 b e}$ and $C_{j 0 b c}$ for the base-emitter and base-collector junction, respectively) and $K_{e q}$ is the correction term, defined in (1.13), which depends on the junction grading coefficient ( $m_{b e}$ and $m_{b c}$ ) and the built-in potentials ( $\Phi_{0 b e}$ and $\Phi_{0 b c}$ ).

Observing the BJT cross-section depicted in Fig. 1.4, it can be seen that the collector is isolated from the substrate by a reverse-biased diode, which can be modeled, once again, by a parasitic nonlinear junction capacitance. It is worth noting that this capacitive effect might dominate the performance of the transistor, since the large junction area leads to a very high capacitance, $C_{j 0 c s}$.

In the active region, the base-emitter junction is forward-biased so, as in a forward-biased diode, the stored charge is given by two contributions. The first one takes into account the charge in the small depletion region, which leads to a negligible junction capacitance. The second contribution is given by the minority carrier concentrations accumulated in both the base and the emitter. As in a forward-biased diode, this contribution leads to a small-signal diffusion capacitance, $C_{d b e}$, expressed by

$$
\begin{equation*}
C_{d b e}=\tau_{T b} \frac{I_{C}}{V_{T}} \tag{1.23}
\end{equation*}
$$

where $\tau_{T b}$ is a technology parameter commonly referred to as the base-transit-time constant.

In a digital context, the small-signal diffusion capacitance should be replaced by a large-signal equivalent diffusion capacitance whose expression is given by

$$
\begin{equation*}
\bar{C}_{d b e}=\frac{\tau_{T b}}{V_{2}-V_{1}} A_{E} J_{C S}\left(e^{\frac{V_{2}}{V_{T}}}-e^{\frac{V_{1}}{V_{T}}}\right)=\frac{V_{T}}{V_{2}-V_{1}}\left[C_{d b e}\left(V_{2}\right)-C_{d b e}\left(V_{1}\right)\right] \tag{1.24}
\end{equation*}
$$

However, since in the active region the voltage swing across the forward-biased junction is very small, for manual analysis it is convenient to use the small-signal model expressed by (1.23).
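
The claim that the large-signal value collapses onto the small-signal one for a small swing can be checked numerically. The sketch below uses only (1.23) and the right-hand form of (1.24); the thermal voltage, the transit time and the current scale factor are assumed values, and $V_{1}$, $V_{2}$ are taken to be the extremes of the base-emitter voltage swing.

```python
import math

V_T = 0.02585      # thermal voltage in V (assumed, about 300 K)
TAU = 10e-12       # base transit time in s (assumed)
I_CS = 1e-16       # current scale factor in A (assumed)

def c_dbe(v_be):
    """Small-signal diffusion capacitance, (1.23)."""
    i_c = I_CS * math.exp(v_be / V_T)
    return TAU * i_c / V_T

def c_dbe_large(v1, v2):
    """Large-signal equivalent capacitance, right-hand form of (1.24)."""
    return V_T / (v2 - v1) * (c_dbe(v2) - c_dbe(v1))

# For a vanishing swing the two models agree
ratio = c_dbe_large(0.75, 0.75 + 1e-6) / c_dbe(0.75)
print(f"large-signal / small-signal for a 1 uV swing: {ratio:.6f}")

# For a 100 mV swing the equivalent value lies between the two end-point values
print(c_dbe(0.70), c_dbe_large(0.70, 0.80), c_dbe(0.80))
```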

Finally, in the saturation region, since both junctions are forward-biased, both junction capacitances can be neglected and a new diffusion capacitance, $C_{d b c}$, arises due to the carriers injected from the collector. This capacitance is higher than that given by (1.23) and greatly affects the switching time when a transistor must be driven out of the saturation region. This large capacitive contribution slows the BJT's transient behavior and justifies the fact that, in digital circuits, BJTs are operated either in cut-off or in forward-active mode.

### 1.2.4 Small Signal Model

As already mentioned, bipolar transistors exhibit high performance if and only if they are switched between the cut-off and the forward-active region. For this reason it is useful to define a small-signal model valid in both cases. In particular, the cut-off model will be an extension of the forward-active model assuming zero currents into the terminals.

This model is similar to the hybrid- $\pi$ model and is shown in Fig. 1.6. The most important parameter is the voltage-controlled current source, $g_{m} v_{b e}$, whose transconductance, $g_{m}$, is defined as

$$
\begin{equation*}
g_{m}=\frac{\partial I_{C}}{\partial V_{B E}}=\frac{\partial}{\partial V_{B E}} I_{C S} \exp \left(\frac{V_{B E}}{V_{T}}\right)=\frac{I_{C}}{V_{T}} \tag{1.25}
\end{equation*}
$$

The small signal resistance, $r_{b e}$, is defined as

$$
\begin{equation*}
r_{b e}=\left(\frac{\partial I_{B}}{\partial V_{B E}}\right)^{-1}=\left(\frac{1}{\beta_{F}} \frac{\partial I_{C}}{\partial V_{B E}}\right)^{-1}=\frac{\beta_{F}}{g_{m}} \tag{1.26}
\end{equation*}
$$

The resistor $r_{o}$ models the Early effect and is defined as

$$
\begin{equation*}
r_{o}=\left(\frac{\partial I_{C}}{\partial V_{C E}}\right)^{-1}=\left[\frac{\partial}{\partial V_{C E}} I_{C S} \exp \left(\frac{V_{B E}}{V_{T}}\right)\left(1+\frac{V_{C E}}{V_{A}}\right)\right]^{-1} \approx \frac{V_{A}}{I_{C}} \tag{1.27}
\end{equation*}
$$

All the above dc small-signal components are intrinsic to any BJT, since they depend on the npn structure itself.
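
A minimal sketch of (1.25)-(1.27) follows. The bias current of 1 mA and the default values of `beta_f`, `v_a` and `v_t` are assumptions used only to show typical orders of magnitude.

```python
def bjt_small_signal(i_c, beta_f=100.0, v_a=50.0, v_t=0.02585):
    """Intrinsic small-signal parameters of a BJT biased at collector current i_c (A)."""
    g_m = i_c / v_t        # transconductance, (1.25)
    r_be = beta_f / g_m    # base-emitter resistance, (1.26)
    r_o = v_a / i_c        # output resistance, (1.27)
    return {"g_m": g_m, "r_be": r_be, "r_o": r_o}

p = bjt_small_signal(1e-3)
print(f"g_m = {p['g_m'] * 1e3:.2f} mS, r_be = {p['r_be']:.0f} ohm, r_o = {p['r_o'] / 1e3:.0f} kohm")
```

All three parameters scale with the bias current: doubling $I_{C}$ doubles $g_{m}$ and halves both $r_{b e}$ and $r_{o}$.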


Fig. 1.6. Small signal model for a BJT in active region.

The model in Fig. 1.6 also includes a base resistance, $r_{b}$, which arises in real implementations. Specifically, $r_{b}$ models the resistive path that exists between the effective transistor base region (i.e. the gray area in Fig. 1.4) and the base contact (i.e. the $\mathrm{p}+$ doped region). This path presents a small ohmic resistance of a few tens or hundreds of ohms. Compared with $r_{b e}$, $r_{b}$ has a small value and, in low-frequency operation, it can be neglected since the applied base-emitter voltage is practically equal to $v_{b e}$. In high-frequency circuits (i.e. in RF applications), part of the base current flows across $C_{b e}$, thus reducing the effective impedance of the base-emitter branch. Because of the presence of $r_{b}$, $v_{b e}$ can then be significantly different from the applied base-emitter voltage, thus affecting transistor properties considerably. In practice, $r_{b}$ cannot be neglected if a high-speed circuit is being analyzed or designed.

Note that there is also an ohmic resistance in series with the actual collector (whose value is lowered by the $\mathrm{n}+$ buried layer) but its presence is not as crucial as the base resistance is.

As far as the capacitive contribution is concerned, we have two main intrinsic capacitors, $C_{b e}$ and $C_{b c}$, as well as a capacitor, $C_{c s}$, which exists in integrated implementations only. Specifically, $C_{b e}$ is the base-emitter capacitor and is expressed by

$$
\begin{equation*}
\bar{C}_{b e} \approx \tau_{T b} \frac{I_{C}}{V_{T}}+K_{e q} C_{j 0 b e} \tag{1.28}
\end{equation*}
$$

which reduces to $K_{e q} C_{j 0 b e}$ in cut-off region, and $C_{b c}$, which represents the base-collector capacitive contribution, is expressed by

$$
\begin{equation*}
\bar{C}_{b c}=K_{e q} C_{j 0 b c} \tag{1.29}
\end{equation*}
$$

Capacitor $C_{c s}$ arises from the reverse-biased pn junction between the collector and the substrate and is modeled by the following expression

$$
\begin{equation*}
\bar{C}_{c s}=K_{e q} C_{j 0 c s} \tag{1.30}
\end{equation*}
$$

### 1.3 MOS TRANSISTORS

Currently, Metal-Oxide-Semiconductor Field-Effect Transistors (MOSFETs or simply MOS transistors) are the most commonly used components in digital integrated circuit implementations since their characteristics make them more attractive than other devices such as, for example, BJTs. Specifically, their simple realization and low cost, the possibility of having complementary devices with similar characteristics, their small geometry and, consequently, the feasibility of integrating a large number of devices in a small area, their infinite input resistance at the gate terminal and the possibility of building digital cells with no static dissipation, all motivate the great success of MOS transistors in modern technologies.

A simplified cross section of an n-channel MOS (n-MOS) transistor is shown in Fig. 1.7. It is built on a lightly doped p-type substrate (p-) that separates two heavily doped n-type regions ( $\mathrm{n}+$ ) called source and drain. A dielectric of silicon oxide and a polysilicon gate are grown over the separation region. The region below the oxide is the transistor channel and its length, that is the distance that separates the source and the drain, is the channel length, denoted by $L$. In present digital MOS technologies the channel length is typically lower than $0.18 \mu \mathrm{~m}$. In a p-channel MOS (p-MOS) all the regions are complementary doped.

There is no physical difference between the source and the drain, as the device is symmetric; the designations source and drain depend only on the voltages applied. In an n-MOS the source is the terminal at the lower potential while, in a p-MOS, the source is the terminal at the higher potential.


Fig. 1.7. Simplified cross section of an n-MOS transistor.

### 1.3.1 Basic Operation

To understand the basic operation of MOS transistors we shall analyze the behavior of an n-MOS depending on the voltages applied at its terminals.

If source, drain and substrate are grounded, the device works as a capacitor. Specifically, the gate and the substrate below the $\mathrm{SiO}_{2}$ interface are two plates electrically insulated by the silicon oxide.

If we apply a negative voltage to the gate, negative charges will be stored in the polysilicon while positive charges will be attracted to the channel region thus increasing the channel doping to $\mathrm{p}+$. This situation leads to an accumulated channel. Source and drain are electrically separated because they form two back-to-back diodes with the substrate. Even if we positively bias either the source or the drain, only a negligible current (the leakage current) will flow from the biased $\mathrm{n}+$ regions to the substrate.


Fig. 1.8. Cross section of an n-MOS transistor when the channel is present: a) bidimensional, b) tridimensional.

By applying a positive voltage to the gate, positive charges will be stored in the gate. Below the silicon oxide, if the gate voltage is small, positive free charges of the p- substrate will be repelled from the surface, thus depleting the channel area. A further increase in the gate voltage leads to negative free charges being attracted to the channel, which thereby becomes an n region. In this condition the channel is said to be inverted.

The gate-source voltage for which the concentration of electrons under the gate equals the concentration of holes in the p- substrate far from the gate is said to be the transistor threshold voltage, $V_{T, n}$.

At a first approximation, if the gate-source voltage, $V_{G S}$, is below the threshold voltage, no current can exist between the source and the drain and the transistor is said to be in the cutoff region. In contrast, if the gate-source voltage is greater than the threshold voltage, an $n$ channel joins the drain and the source and a current can flow between these two electrically connected regions.

Actually, for gate voltages around $V_{T, n}$, the charge does not change abruptly and a small amount of current can flow even for small negative values of $V_{G S}-V_{T, n}$. This condition is termed weak inversion and the transistor is said to work in subthreshold region.

When the channel is present, as in Fig. 1.8, the accumulated negative charge is proportional to the amount by which the gate-source voltage exceeds the threshold and depends on the oxide thickness, $t_{o x}$, since the transistor works as a capacitor. Specifically, the charge density of electrons in the channel is given by [S81], [MK86]

$$
\begin{equation*}
Q_{n}=C_{o x}\left(V_{G S}-V_{T, n}\right) \tag{1.31}
\end{equation*}
$$

where $C_{o x}$ is the gate capacitance per unit area defined as

$$
\begin{equation*}
C_{o x}=\frac{\varepsilon_{o x} \varepsilon_{0}}{t_{o x}} \tag{1.32}
\end{equation*}
$$

and $\varepsilon_{o x}$ is the relative permittivity of $\mathrm{SiO}_{2}$ ($\varepsilon_{o x}$ is approximately 3.9), while $\varepsilon_{0}$ is the vacuum permittivity.
The total capacitance and the total charge are obtained by multiplying both the equations (1.31) and (1.32) by the device area, as follows

$$
\begin{align*}
& C_{g s}=W L C_{o x}  \tag{1.33a}\\
& Q_{T-n}=W L C_{o x}\left(V_{G S}-V_{T, n}\right) \tag{1.33b}
\end{align*}
$$

where $W$ is the channel width of the MOS transistor as depicted in Fig. 1.8.
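
To fix orders of magnitude, (1.32) and (1.33) can be evaluated for a hypothetical device. The oxide thickness, the geometry and the bias values in the sketch below are assumptions, not data from the text; only $\varepsilon_{o x} \approx 3.9$ comes from the discussion above.

```python
EPS_0 = 8.854e-12   # vacuum permittivity in F/m
EPS_OX = 3.9        # relative permittivity of SiO2, as stated above

def c_ox(t_ox):
    """Gate capacitance per unit area, (1.32), in F/m^2."""
    return EPS_OX * EPS_0 / t_ox

def channel_charge(w, l, t_ox, v_gs, v_tn):
    """Total gate capacitance (1.33a) in F and channel charge (1.33b) in C."""
    c_gs = w * l * c_ox(t_ox)
    return c_gs, c_gs * (v_gs - v_tn)

# Hypothetical device: W = 1 um, L = 0.18 um, t_ox = 4 nm, V_GS = 1.2 V, V_T,n = 0.5 V
c_gs, q_tot = channel_charge(1e-6, 0.18e-6, 4e-9, 1.2, 0.5)
print(f"C_ox = {c_ox(4e-9) * 1e3:.2f} fF/um^2, C_gs = {c_gs * 1e15:.2f} fF, Q = {q_tot * 1e15:.2f} fC")
```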

### 1.3.2 Triode or Linear Region

Increasing the drain voltage, $V_{D}$, causes a current to flow from the drain to the source through the channel. A nonzero drain voltage modifies the charge density, but for small $V_{D}$ the channel charge does not change appreciably and can again be expressed by (1.31). Under this condition, the device operates as a resistor of length $L$ and width $W$ with a conductivity proportional to $Q_{n}$. Therefore, the relationship between the voltage $V_{D S}$ and the drain-source current, $I_{D}$, can be written as [LS94]

$$
\begin{equation*}
I_{D}=\mu_{n} Q_{n} \frac{W}{L} V_{D S} \tag{1.34}
\end{equation*}
$$

where $\mu_{n}$ is the mobility of electrons near the silicon surface.


Fig. 1.9. MOS transistor channel for large $V_{D}$.

Substituting (1.31) in (1.34) yields

$$
\begin{equation*}
I_{D}=\mu_{n} C_{o x} \frac{W}{L}\left(V_{G S}-V_{T, n}\right) V_{D S} \tag{1.35}
\end{equation*}
$$

Larger drain voltages modify the charge density profile in the channel. Specifically, referring to Fig. 1.9, we can express the channel charge density as a function of channel length. For $x=0$, that is, close to the source, (1.31) holds, while for $x=L$, that is, close to the drain, we have

$$
\begin{equation*}
Q_{n}(L)=C_{o x}\left(V_{G D}-V_{T, n}\right) \tag{1.36}
\end{equation*}
$$

Assuming a linear profile, the charge density has the following expression

$$
\begin{equation*}
Q_{n}(x)=\frac{Q_{n}(L)-Q_{n}(0)}{L} x+Q_{n}(0) \tag{1.37}
\end{equation*}
$$

The current can be expressed in a form similar to (1.34) but with a different charge expression. If the charge density profile is linear, the average charge density can be used instead. The average charge density results in

$$
\begin{equation*}
\bar{Q}_{n}=\frac{Q_{n}(L)+Q_{n}(0)}{2}=C_{o x}\left(V_{G S}-V_{T, n}-\frac{V_{D S}}{2}\right) \tag{1.38}
\end{equation*}
$$

and substituting this value in (1.34) leads to

$$
\begin{equation*}
I_{D}=\mu_{n} \bar{Q}_{n} \frac{W}{L} V_{D S}=\mu_{n} C_{o x} \frac{W}{L}\left(V_{G S}-V_{T, n}-\frac{V_{D S}}{2}\right) V_{D S} \tag{1.39}
\end{equation*}
$$

The current $I_{D}$ is linearly related to $V_{G S}$ and has a quadratic dependence on $V_{D S}$. Under this condition the device is said to operate in triode or linear region. Note also that (1.39) is reduced to (1.35) for small values of $V_{D S}$.
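
The second equality in (1.38) follows from (1.31) and (1.36) once the gate-drain voltage is written as $V_{G D}=V_{G S}-V_{D S}$, a relation that holds by definition of the terminal voltages:

$$
\begin{equation*}
\begin{aligned}
\bar{Q}_{n} & =\frac{C_{o x}}{2}\left[\left(V_{G D}-V_{T, n}\right)+\left(V_{G S}-V_{T, n}\right)\right] \\
& =\frac{C_{o x}}{2}\left[2\left(V_{G S}-V_{T, n}\right)-V_{D S}\right]=C_{o x}\left(V_{G S}-V_{T, n}-\frac{V_{D S}}{2}\right)
\end{aligned}
\end{equation*}
$$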


Fig. 1.10. MOS transistor channel for $V_{D G}>-V_{T, n}$.

### 1.3.3 Saturation or Active Region

A further increase of $V_{D}$ can lead to the condition of a gate-drain voltage equal to $V_{T, n}$. In this case the charge density close to the drain, $Q_{n}(L)$, becomes zero and the current $I_{D}$ reaches its maximum value. This condition is shown in Fig. 1.10.

At a first approximation, the current does not change with $V_{D S}$ beyond this point, since the charge concentration in the channel remains constant and the electron carriers are velocity saturated. Under this condition the transistor is said to work in the saturation or active region.

Denoting by $V_{D S s a t}$ the drain-source voltage at which the charge density $Q_{n}(L)$ becomes zero, we can find an equivalent relationship that expresses the pinch-off condition by substituting $V_{D G}=V_{D S}-V_{G S}$ into $V_{D G}>-V_{T, n}$. Specifically, we get

$$
\begin{equation*}
V_{D S}>V_{D S s a t} \tag{1.40}
\end{equation*}
$$

where

$$
\begin{equation*}
V_{D S s a t}=V_{G S}-V_{T, n} \tag{1.41}
\end{equation*}
$$

Substituting the value $V_{D S}=V_{D S s a t}$ defined in (1.41) into (1.39) gives the current expression in the pinch-off case and results as

$$
\begin{equation*}
I_{D}=\frac{\mu_{n} C_{o x}}{2} \frac{W}{L}\left(V_{G S}-V_{T, n}\right)^{2} \tag{1.42}
\end{equation*}
$$

As mentioned above, (1.42) is valid at a first approximation. In fact, increasing $V_{D}$ yields an increase in the pinch-off region as well as a decrease in channel length. This effect is commonly known as channel length modulation. To take this effect into account, a corrective term is used to complete (1.42) which becomes

$$
\begin{equation*}
I_{D}=\frac{\mu_{n} C_{o x}}{2} \frac{W}{L}\left(V_{G S}-V_{T, n}\right)^{2}\left[1+\lambda\left(V_{D S}-V_{D S s a t}\right)\right] \tag{1.43}
\end{equation*}
$$

The parameter $\lambda$ is referred to as the channel length modulation factor and, at a first approximation, it is inversely proportional to the channel length, $L$.
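
The three operating regions can be collected into a single first-order function. The sketch below combines the cutoff condition, the triode expression (1.39) and the saturation expression (1.43), with the boundary given by (1.41). The parameter values (`k_n` standing for $\mu_{n} C_{o x}$, `w_over_l`, `v_tn`, `lam`) are assumed for illustration, and the subthreshold current is neglected.

```python
def nmos_drain_current(v_gs, v_ds, k_n=200e-6, w_over_l=10.0, v_tn=0.5, lam=0.05):
    """First-order n-MOS drain current in A (source tied to bulk).

    k_n = mu_n * C_ox in A/V^2; all default values are illustrative assumptions.
    """
    v_ov = v_gs - v_tn                  # equals V_DSsat, (1.41)
    if v_ov <= 0.0:
        return 0.0                      # cutoff, subthreshold current neglected
    if v_ds < v_ov:
        # triode region, (1.39)
        return k_n * w_over_l * (v_ov - v_ds / 2.0) * v_ds
    # saturation region with channel length modulation, (1.43)
    return 0.5 * k_n * w_over_l * v_ov ** 2 * (1.0 + lam * (v_ds - v_ov))

for v_ds in (0.1, 0.5, 1.0, 1.8):
    print(v_ds, nmos_drain_current(1.0, v_ds))
```

Because the corrective term in (1.43) is written in terms of $V_{D S}-V_{D S s a t}$, the triode and saturation expressions meet at $V_{D S}=V_{D S s a t}$, so the resulting characteristic is continuous.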

### 1.3.4 Body Effect

All the equations derived above were based on the assumption that the source and the substrate (or the bulk) were connected together. Although this is a rather common condition, in general the voltages of these two terminals can be different. In this event a second-order effect occurs, commonly referred to as the body effect [GM93]. A different voltage between the source and the bulk is modeled as an increase in the threshold voltage, which assumes the following expression [GM93], [LS94], [JM97]

$$
\begin{equation*}
V_{T, n}=V_{T, n 0}+\gamma\left(\sqrt{V_{S B}+2\left|\phi_{F}\right|}-\sqrt{2\left|\phi_{F}\right|}\right) \tag{1.44}
\end{equation*}
$$

with $V_{S B}$ being the source-bulk voltage, $V_{T, n 0}$ the threshold voltage with zero $V_{S B}$, $\phi_{F}$ the Fermi potential of the substrate and $\gamma$ a constant referred to as the body-effect constant. The Fermi potential is defined as [S81]

$$
\begin{equation*}
\left|\phi_{F}\right|=\frac{k T}{q} \ln \left(\frac{N_{A}}{n_{i}}\right) \tag{1.45}
\end{equation*}
$$

while the value of $\gamma$ depends on the substrate doping concentration as follows [S81]

$$
\begin{equation*}
\gamma=\frac{\sqrt{2 q N_{A} \varepsilon_{s i}}}{C_{o x}} \tag{1.46}
\end{equation*}
$$
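
Equations (1.44)-(1.46) can be chained to see how the threshold voltage moves with $V_{S B}$. In the sketch below the substrate doping, the oxide thickness, the zero-bias threshold and the intrinsic carrier concentration are all assumed values; the relative permittivity of silicon (about 11.7) is a standard figure that is not given in the text.

```python
import math

Q = 1.602e-19          # electron charge in C
EPS_0 = 8.854e-12      # vacuum permittivity in F/m
EPS_SI = 11.7 * EPS_0  # permittivity of silicon in F/m
N_I = 1.45e16          # intrinsic carrier concentration in m^-3 (assumed, about 300 K)

def fermi_potential(n_a, v_t=0.02585):
    """|phi_F| from (1.45); n_a in m^-3, v_t = kT/q in V."""
    return v_t * math.log(n_a / N_I)

def body_effect_constant(n_a, c_ox):
    """gamma from (1.46); c_ox in F/m^2."""
    return math.sqrt(2.0 * Q * n_a * EPS_SI) / c_ox

def threshold_voltage(v_sb, v_tn0, gamma, phi_f):
    """V_T,n from (1.44)."""
    return v_tn0 + gamma * (math.sqrt(v_sb + 2.0 * phi_f) - math.sqrt(2.0 * phi_f))

# Hypothetical process: N_A = 1e17 cm^-3, t_ox = 4 nm, V_T,n0 = 0.5 V
n_a = 1e23
c_ox = 3.9 * EPS_0 / 4e-9
phi_f = fermi_potential(n_a)
gamma = body_effect_constant(n_a, c_ox)
for v_sb in (0.0, 0.5, 1.0):
    print(v_sb, round(threshold_voltage(v_sb, 0.5, gamma, phi_f), 3))
```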

### 1.3.5 p-channel Transistors

For a p-channel transistor we can use the same equations derived in the previous sections, provided that a negative sign is placed in front of every voltage variable.

Therefore, $V_{G S}$ becomes $V_{S G}$, $V_{D S}$ becomes $V_{S D}$, $V_{T, n}$ becomes $-V_{T, p}$, and so on. Note that in a p-MOS transistor the threshold voltage is negative. The condition for a p-MOS to be in the saturation region is now $V_{S D}>V_{S G}+V_{T, p}$. Current equations (1.39) and (1.43) still hold, but the current now flows from the source to the drain.

### 1.3.6 Charge Effects in Saturation Region

Charge effects of a MOS transistor in saturation region include several capacitive effects, each of them with its own physical meaning that can be understood by analyzing the detailed n-MOS cross section in Fig. 1.11.

The most important capacitor is the gate-source capacitor, whose value is given by two different terms. The first term takes into account the capacitive effect between the gate and the channel, which is electrically connected to the source. At a first approximation, the corresponding capacitor, $C_{g_{-} c h}$, is a linear capacitor that depends on the oxide thickness as well as on the device area. It can be demonstrated that its value is approximately given by [R96], [LS94]

$$
\begin{equation*}
C_{g_{-} c h} \approx \frac{2}{3} W L C_{o x} \tag{1.47}
\end{equation*}
$$



Fig. 1.11. Detailed n-MOS cross-section.

The second term that contributes to the gate-source capacitance is given by the overlap that exists between the gate and the source $\mathrm{n}+$ region. This overlap is unavoidable and results from the fact that during the fabrication process the doping element also spreads horizontally. Naming $L_{D}$ the overlap diffusion length, the resulting parasitic capacitor, $C_{g s_{-} o v}$, is given by

$$
\begin{equation*}
C_{g s_{-} o v}=W L_{D} C_{o x} \tag{1.48}
\end{equation*}
$$

Hence, the capacitor $C_{g s}$ is expressed by the sum of (1.47) and (1.48), that is

$$
\begin{equation*}
C_{g s} \approx W\left(\frac{2}{3} L+L_{D}\right) C_{o x} \tag{1.49}
\end{equation*}
$$

The same boundary effect that determines the gate-source overlap capacitance yields the gate-drain capacitance that is given by

$$
\begin{equation*}
C_{g d}=C_{g d \_o v}=W L_{D} C_{o x} \tag{1.50}
\end{equation*}
$$

The second largest capacitor is the source-bulk capacitor, which can be split into three contributions all of them given by the depletion capacitances of reverse biased pn junctions. The first, $C_{s b}$, takes into account the junction capacitance between the $\mathrm{n}+$ source area and the bulk. Assuming that voltages move rapidly over a wide range, its expression is similar to (1.12) that is

$$
\begin{equation*}
\bar{C}_{s b}=K_{e q}^{I} W L_{X} C_{j 0, s b} \tag{1.51}
\end{equation*}
$$

where $K_{e q}^{I}$ is the correction term, defined in (1.13), and $C_{j 0, s b}$ is defined as the zero-voltage source-bulk junction capacitance per unit area.

The second contribution, $C_{j c h-b}$, takes into account the depletion region between the channel and the bulk. In this case too we have an expression similar to (1.51), that is

$$
\begin{equation*}
\bar{C}_{j c h-b}=K_{e q}^{I} W L C_{j 0, s b} \tag{1.52}
\end{equation*}
$$

where $K_{e q}^{I}$ is again the correction term which, at a first approximation, is equal to that given in (1.51).

The third term is referred to as the source-bulk sidewall capacitance and is denoted as $C_{s b, s w}$. This capacitance is due to the presence of a highly $\mathrm{p}+$ doped region (field implant) that exists under the thick field oxide (FOX) and prevents the leakage current from flowing between two adjacent transistors. The value of $C_{s b, s w}$ can be particularly large if the field implant is heavily doped as in modern technologies. The expression of $C_{s b, s w}$ is then

$$
\begin{equation*}
\bar{C}_{s b, s w}=K_{e q}^{I}\left(W+2 L_{X}\right) C_{j 0, s b_{-} s w} \tag{1.53}
\end{equation*}
$$

where the term $\left(W+2 L_{X}\right)$ is the perimeter of the source junction, excluding the side adjacent to the channel, and $C_{j 0, s b_{-} s w}$ is the zero-voltage junction capacitance per unit length.

Consequently, the source-bulk capacitance is given by the sum of (1.51), (1.52) and (1.53), that is

$$
\begin{equation*}
\bar{C}_{j s b}=K_{e q}^{I}\left[W\left(L+L_{X}\right) C_{j 0, s b}+\left(W+2 L_{X}\right) C_{j 0, s b_{\_} s w}\right] \tag{1.54}
\end{equation*}
$$

The fourth capacitor in the model in Fig. 1.11 is the drain-bulk capacitor, $C_{d b}$. This is similar to the source-bulk capacitance except for the fact that the channel does not make any contribution. Therefore equations similar to (1.51) and (1.53) can be written as follows

$$
\begin{align*}
& \bar{C}_{d b}=K_{e q}^{I I} W L_{X} C_{j 0, d b}  \tag{1.55a}\\
& \bar{C}_{d b, s w}=K_{e q}^{I I}\left(W+2 L_{X}\right) C_{j 0, d b \_s w} \tag{1.55b}
\end{align*}
$$

and the drain-bulk capacitance results as

$$
\begin{equation*}
\bar{C}_{d b}=K_{e q}^{I I}\left[W L_{X} C_{j 0, d b}+\left(W+2 L_{X}\right) C_{j 0, d b \_s w}\right] \tag{1.56}
\end{equation*}
$$

### 1.3.7 Charge Effects in Triode Region

Charge effects in the triode region are not easy to determine because the channel is directly connected to both the source and the drain, resulting in a distributed RC network over the whole length of the device. Moreover, because the junction capacitances are voltage-dependent, the capacitive elements are highly non-linear. This is another factor that makes the model too complicated for hand analysis.

A simplified model, which is quite accurate for small $V_{D S}$ and for long-channel devices, can be obtained by evaluating the total channel charge contribution and by assuming half of this contribution to be referred to the source and half to the drain [LS94].

Specifically, since the total gate-channel capacitance is given by

$$
\begin{equation*}
C_{g_{-} c h}=W L C_{o x} \tag{1.57}
\end{equation*}
$$

gate-source and gate-drain capacitances, including the overlap contribution, can be modeled as

$$
\begin{equation*}
C_{g s}=C_{g d}=W\left(\frac{L}{2}+L_{D}\right) C_{o x} \tag{1.58}
\end{equation*}
$$

Unfortunately, in short-channel devices the evaluation of $C_{g d}$ in the linear region cannot be carried out in the same manner as for long-channel devices, since the equal decomposition of the channel capacitance into gate-source and gate-drain capacitances no longer applies in submicron technologies [CH99]. As a result, a more accurate capacitance model is given by

$$
\begin{equation*}
C_{g d}=W\left(\frac{3}{4} A_{\mathrm{bulk}, \max } L+L_{D}\right) C_{o x} \tag{1.59}
\end{equation*}
$$

where $A_{\text {bulk,max }}$ is a BSIM3v3 parameter slightly greater than unity, as will be shown in Section 6.2.1. Obviously, the gate-source capacitance contribution is evaluated as

$$
\begin{equation*}
C_{g s}=W\left(\frac{4-3 A_{\text {bulk,max }}}{4} L+L_{D}\right) C_{o x} \tag{1.60}
\end{equation*}
$$

so that the sum of $C_{g d}$ and $C_{g s}$ equals the total gate capacitance, $W\left(L+2 L_{D}\right) C_{o x}$.
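
The sum can be verified directly from (1.59) and (1.60), since the terms containing $A_{\text {bulk,max }}$ cancel:

$$
\begin{equation*}
C_{g d}+C_{g s}=W\left(\frac{3 A_{\mathrm{bulk}, \max }}{4} L+L_{D}+\frac{4-3 A_{\mathrm{bulk}, \max }}{4} L+L_{D}\right) C_{o x}=W\left(L+2 L_{D}\right) C_{o x}
\end{equation*}
$$

Setting $A_{\text {bulk,max }}=2 / 3$ in (1.59) and (1.60) would give back the equal split of (1.58); a value slightly greater than unity therefore assigns most of the channel contribution to $C_{g d}$.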

Also, the channel-bulk contribution is shared between the source and the drain and capacitances $C_{s b}$ and $C_{d b}$ become

$$
\begin{align*}
& \bar{C}_{s b}=K_{e q}^{I}\left[W\left(L_{X}+\frac{L}{2}\right) C_{j 0, s b}+\left(W+2 L_{X}\right) C_{j 0, s b_{-} s w}\right]  \tag{1.61a}\\
& \bar{C}_{d b}=K_{e q}^{I I}\left[W\left(L_{X}+\frac{L}{2}\right) C_{j 0, s b}+\left(W+2 L_{X}\right) C_{j 0, s b_{-} s w}\right] \tag{1.61b}
\end{align*}
$$

### 1.3.8 Charge Effects in Cutoff Region

In the cutoff region, since the channel is not present, both $C_{g d}$ and $C_{g s}$ are due only to the overlap contribution, that is

$$
\begin{equation*}
C_{g s}=C_{g d}=W L_{D} C_{o x} \tag{1.62}
\end{equation*}
$$

Source-bulk and drain-bulk capacitances are similar to those given in (1.61) with the difference that the channel does not make any contribution, that is

$$
\begin{align*}
& \bar{C}_{s b}=K_{e q}^{I}\left[W L_{X} C_{j 0, s b}+\left(W+2 L_{X}\right) C_{j 0, s b_{-s w}}\right]  \tag{1.63a}\\
& \bar{C}_{d b}=K_{e q}^{I I}\left[W L_{X} C_{j 0, s b}+\left(W+2 L_{X}\right) C_{j 0, s b_{-s w}}\right] \tag{1.63b}
\end{align*}
$$

The absence of the channel gives rise to a new capacitor, $C_{g b}$, which connects the gate and the bulk. Its value is given by the oxide capacitance multiplied by the device area, that is

$$
\begin{equation*}
C_{g b}=W L C_{o x} \tag{1.64}
\end{equation*}
$$

### 1.3.9 Small Signal Model

Although in almost all digital circuits signals move in wide voltage ranges, there are several digital configurations (e.g. the Source Coupled Logic family) where the MOS device is biased in saturation region and a small signal is applied and processed as a digital value. In these cases it may be useful to consider a low frequency small signal model.

### 1.3.9.1 Saturation region

The low-frequency small signal model for a MOS transistor operating in the active region is shown in Fig. 1.12.


Fig. 1.12. Low-Frequency small signal model for a MOS transistor in active region.

The most important small signal component is the dependent current generator, $g_{m} v_{g s}$, whose transconductance, $g_{m}$, is defined as

$$
\begin{equation*}
g_{m}=\frac{\partial I_{D}}{\partial V_{G S}}=\mu_{n} C_{o x} \frac{W}{L}\left(V_{G S}-V_{T, n}\right) \tag{1.65}
\end{equation*}
$$

Solving (1.42) with respect to $V_{G S}-V_{T, n}$ and substituting the result into (1.65) leads to the well-known expression for the transconductance

$$
\begin{equation*}
g_{m}=\sqrt{2 \mu_{n} C_{o x} \frac{W}{L} I_{D}} \tag{1.66}
\end{equation*}
$$

The second dependent current source, $g_{m b} v_{s b}$, accounts for the body effect and its transconductance is defined as

$$
\begin{equation*}
g_{m b}=-\frac{\partial I_{D}}{\partial V_{S B}}=-\frac{\partial I_{D}}{\partial V_{T, n}} \frac{\partial V_{T, n}}{\partial V_{S B}} \tag{1.67}
\end{equation*}
$$

The first derivative in (1.67) results as

$$
\begin{equation*}
\frac{\partial I_{D}}{\partial V_{T, n}}=-\mu_{n} C_{o x} \frac{W}{L}\left(V_{G S}-V_{T, n}\right)=-g_{m} \tag{1.68}
\end{equation*}
$$

while the second one is obtained by differentiating (1.44) with respect to $V_{S B}$, thus yielding

$$
\begin{equation*}
\frac{\partial V_{T, n}}{\partial V_{S B}}=\frac{\partial}{\partial V_{S B}}\left[V_{T, n 0}+\gamma\left(\sqrt{V_{S B}+2\left|\phi_{F}\right|}-\sqrt{2\left|\phi_{F}\right|}\right)\right]=\frac{\gamma}{2 \sqrt{V_{S B}+2\left|\phi_{F}\right|}} \tag{1.69}
\end{equation*}
$$

Therefore, substituting (1.68) and (1.69) in (1.67) we get

$$
\begin{equation*}
g_{m b}=\frac{\gamma g_{m}}{2 \sqrt{V_{S B}+2\left|\phi_{F}\right|}} \tag{1.70}
\end{equation*}
$$

Note that this value is nonzero even if the quiescent value of $V_{S B}$ equals zero. Specifically, the body effect arises only if a small signal, $v_{s b}$, is present between the source and the bulk terminals. In general $g_{m b}$ is $0.1-0.2$ times $g_{m}$ and can be neglected in a non-detailed analysis.
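
The relationships (1.65), (1.66) and (1.70) can be checked numerically with the short Python sketch below. It assumes that (1.42) is the usual square-law expression $I_{D}=\frac{\mu_{n} C_{o x}}{2} \frac{W}{L}\left(V_{G S}-V_{T, n}\right)^{2}$, which is what (1.72) reduces to for $\lambda=0$. All numerical values (`K_N`, `W_OVER_L`, `V_OV`, `GAMMA`, `PHI_F`) are hypothetical.

```python
import math

def gm_from_overdrive(k_n, w_over_l, v_ov):
    """Transconductance from (1.65): g_m = k_n (W/L) (V_GS - V_T)."""
    return k_n * w_over_l * v_ov

def gm_from_current(k_n, w_over_l, i_d):
    """Transconductance from (1.66): g_m = sqrt(2 k_n (W/L) I_D)."""
    return math.sqrt(2.0 * k_n * w_over_l * i_d)

def gmb(g_m, gamma, v_sb, phi_f):
    """Body-effect transconductance from (1.70)."""
    return gamma * g_m / (2.0 * math.sqrt(v_sb + 2.0 * abs(phi_f)))

# Hypothetical values (illustrative assumptions only).
K_N = 170e-6      # mu_n * C_ox in A/V^2
W_OVER_L = 10.0
V_OV = 0.3        # overdrive V_GS - V_T in V
GAMMA = 0.3       # body-effect coefficient in V^0.5
PHI_F = 0.35      # Fermi potential in V
V_SB = 0.0        # quiescent source-bulk voltage in V

I_D = 0.5 * K_N * W_OVER_L * V_OV**2          # assumed square law (1.42)
g_m_a = gm_from_overdrive(K_N, W_OVER_L, V_OV)
g_m_b = gm_from_current(K_N, W_OVER_L, I_D)
g_mb = gmb(g_m_a, GAMMA, V_SB, PHI_F)

print(f"g_m from (1.65) = {g_m_a * 1e6:.1f} uS, from (1.66) = {g_m_b * 1e6:.1f} uS")
print(f"g_mb / g_m = {g_mb / g_m_a:.3f}")
```

With these assumed values the ratio $g_{m b} / g_{m}$ falls inside the 0.1-0.2 range quoted above, even though the quiescent $V_{S B}$ is zero.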

The last model parameter is the resistor $r_{d}$, which takes into account the channel length modulation or, which is the same, the dependence of the drain current on $V_{D S}$. It is related to the large signal equations by

$$
\begin{equation*}
\frac{1}{r_{d}}=\frac{\partial I_{D}}{\partial V_{D S}} \tag{1.71}
\end{equation*}
$$

Substituting the current expression (1.43) into (1.71) yields

$$
\begin{gather*}
\frac{1}{r_{d}}=\frac{\partial I_{D}}{\partial V_{D S}}=\frac{\partial}{\partial V_{D S}} \frac{\mu_{n} C_{o x}}{2} \frac{W}{L}\left(V_{G S}-V_{T, n}\right)^{2}\left[1+\lambda\left(V_{D S}-V_{D S s a t}\right)\right]  \tag{1.72}\\
=\lambda \frac{\mu_{n} C_{o x}}{2} \frac{W}{L}\left(V_{G S}-V_{T, n}\right)^{2} \approx \lambda I_{D}
\end{gather*}
$$

and finally

$$
\begin{equation*}
r_{d} \approx \frac{1}{\lambda I_{D}} \tag{1.73}
\end{equation*}
$$
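
To see how good the approximation in (1.73) is, the sketch below compares the exact derivative in (1.72) with $1 /\left(\lambda I_{D}\right)$. It assumes $V_{D S s a t}=V_{G S}-V_{T, n}$ (the long-channel saturation condition) and uses hypothetical parameter values.

```python
def drain_current_sat(k_n, w_over_l, v_ov, lam, v_ds):
    """Saturation current as written inside (1.72).
    V_DSsat is taken equal to the overdrive v_ov (assumption)."""
    return 0.5 * k_n * w_over_l * v_ov**2 * (1.0 + lam * (v_ds - v_ov))

def rd_exact(k_n, w_over_l, v_ov, lam):
    """Output resistance from the exact derivative in (1.72)."""
    return 1.0 / (lam * 0.5 * k_n * w_over_l * v_ov**2)

def rd_approx(lam, i_d):
    """Output resistance from (1.73)."""
    return 1.0 / (lam * i_d)

# Hypothetical values (illustrative assumptions only).
K_N, W_OVER_L, V_OV, LAM, V_DS = 170e-6, 10.0, 0.3, 0.1, 1.0

i_d = drain_current_sat(K_N, W_OVER_L, V_OV, LAM, V_DS)
r_exact = rd_exact(K_N, W_OVER_L, V_OV, LAM)
r_approx = rd_approx(LAM, i_d)

print(f"r_d exact  = {r_exact / 1e3:.1f} kOhm")
print(f"r_d approx = {r_approx / 1e3:.1f} kOhm")
print(f"ratio exact/approx = {r_exact / r_approx:.3f}")
```

The two values differ by the factor $1+\lambda\left(V_{D S}-V_{D S s a t}\right)$, which stays close to unity as long as $\lambda$ is small.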

### 1.3.9.2 Triode region

The low frequency small signal model for a MOS in triode region is a resistor whose value can be determined by differentiating (1.39) with respect to $V_{D S}$, that is

$$
\begin{equation*}
\frac{1}{r_{d}}=\frac{\partial I_{D}}{\partial V_{D S}}=\mu_{n} C_{o x} \frac{W}{L}\left(V_{G S}-V_{T, n}-V_{D S}\right) \tag{1.74}
\end{equation*}
$$

If $V_{D S}$ is small, (1.74) is often approximated by

$$
\begin{equation*}
r_{d} \approx \frac{1}{\mu_{n} C_{o x} \frac{W}{L}\left(V_{G S}-V_{T, n}\right)} \tag{1.75}
\end{equation*}
$$
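
A minimal sketch, with hypothetical device values, of how the small-$V_{D S}$ approximation (1.75) departs from (1.74) as $V_{D S}$ grows:

```python
def rd_triode(k_n, w_over_l, v_ov, v_ds):
    """Triode-region resistance from (1.74)."""
    return 1.0 / (k_n * w_over_l * (v_ov - v_ds))

def rd_triode_small_vds(k_n, w_over_l, v_ov):
    """Small-V_DS approximation from (1.75)."""
    return 1.0 / (k_n * w_over_l * v_ov)

# Hypothetical values (illustrative assumptions only).
K_N, W_OVER_L, V_OV = 170e-6, 10.0, 0.5

r_0 = rd_triode_small_vds(K_N, W_OVER_L, V_OV)
for v_ds in (0.01, 0.05, 0.10):
    r = rd_triode(K_N, W_OVER_L, V_OV, v_ds)
    print(f"V_DS = {v_ds:.2f} V: r_d = {r:.0f} Ohm, (1.75) gives {r_0:.0f} Ohm, "
          f"ratio = {r / r_0:.3f}")
```

The ratio between the two expressions is $\left(V_{G S}-V_{T, n}\right) /\left(V_{G S}-V_{T, n}-V_{D S}\right)$, so the approximation underestimates $r_{d}$ and is acceptable only while $V_{D S}$ is much smaller than the overdrive voltage.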

### 1.3.9.3 Cut-off region

In the cutoff region the resistance $r_{d}$ is assumed to be infinite so the equivalent model is purely capacitive.

### 1.3.10 Second Order Effects in MOSFET Modeling

The main second order effects that should be taken into account when determining a MOS large signal model are reported in this section. Their effects are always present and are especially prominent in short-channel devices.

In the following we shall neglect the subscript $n$, which refers to nMOS transistors.

### 1.3.10.1 Channel length reduction due to overlap

Referring to Fig. 1.11, we see that the designed channel length, $L$, is reduced due to the overlap. Assuming a symmetric device with equal overlap, $L_{D}$, at both the source and the drain, the amount of reduction is equal to $2 L_{D}$, that is, the effective channel length, $L_{e f f}$, is equal to

$$
\begin{equation*}
L_{e f f}=L-2 L_{D} \tag{1.76}
\end{equation*}
$$



Fig. 1.13. Channel length modulation.

Obviously, the influence of the overlap is greater in short channel devices as it strongly affects the real channel. As a consequence, in all the previous equations, (1.76) should be used for the channel length.

A similar equation holds for the width, $W$, as well ( $W_{\text {eff }}=W-2 W_{D}$ ).

### 1.3.10.2 Channel length modulation

The channel length modulation was discussed in previous sections and was modeled by the channel length modulation factor, $\lambda$, in (1.43). In this model, the pinch-off point was assumed to be close to the drain end.

A more effective modeling would take into consideration the fact that, in practice, the pinch-off point moves towards the source as $V_{D S}$ increases due to the variation in the drain depletion region. As a consequence, the effective channel length, $L_{e f f}$, is further reduced as shown in Fig. 1.13. Defining $L_{\text {pinch }}$ as the distance between the drain end and the pinch-off point we get

$$
\begin{equation*}
L_{e f f}=L-2 L_{D}-L_{\text {pinch }} \tag{1.77}
\end{equation*}
$$

The value of $L_{\text {pinch }}$ is a function of $V_{D S}, V_{D S s a t}$ and the doping concentration of the channel. Substituting, for example, (1.77) in (1.42) we observe that, due to a shorter channel, the drain current increases with $V_{D S}$.

Obviously, this effect is particularly evident in short-channel devices [JM97].

### 1.3.10.3 Mobility reduction due to vertical electric field

As is well known, the mobility, $\mu$, relates the electric field, $E$, to the drift velocity of carriers, $v_{d}$, as [S81], [MK86]

$$
\begin{equation*}
v_{d}=\mu E \tag{1.78}
\end{equation*}
$$

In our previous model we assumed the mobility to be a constant. Actually the value of this parameter depends on several physical factors, the most important of which is related to the carrier-scattering mechanisms.

The carrier scattering in the channel is greatly influenced by the vertical electric field induced by the gate voltage. Consequently mobility changes with $V_{G S}$. A semi-empirical equation used to model the mobility reduction due to vertical fields in NMOS transistors is [GM93]

$$
\begin{equation*}
\mu_{s}=\frac{\mu_{0}}{1+\vartheta\left(V_{G S}-V_{T, n}\right)} \tag{1.79}
\end{equation*}
$$

where $\mu_{s}$ is now the new mobility (or better the new surface mobility), $\mu_{0}$ is the mobility in the case of low fields and $\vartheta$ is the mobility degradation factor whose value can be related to oxide thickness as $2.3 / t_{o x}$ where $t_{o x}$ is expressed in nm [GM93]. An analogous expression holds for PMOS transistors.

It can be shown that this effect can be modeled as a series resistance, $R_{S}$, in the source of the MOS where

$$
\begin{equation*}
R_{S} \approx \frac{\vartheta}{\mu_{0} C_{o x} W / L_{e f f}} \tag{1.80}
\end{equation*}
$$

### 1.3.10.4 Mobility reduction due to lateral electric field

Mobility is further reduced due to the high lateral electric field. Since, at a first approximation, the electric field is proportional to $V_{D S} / L_{\text {eff }}$, this effect is more pronounced in short-channel devices.

The linear relationship (1.78), which relates the drift velocity to the electric field, no longer holds for high fields because the mobility strongly depends on the field itself and decreases as the field increases. Specifically, at high electric fields, the drift velocity of carriers deviates from the linear dependency in (1.78) and even saturates. To account for this physical phenomenon, the mobility, $\mu_{s}$, in (1.79) is corrected as follows [GM93]

$$
\begin{equation*}
\mu_{e f f}=\frac{\mu_{s}}{1+\frac{\mu_{s}}{v_{\max }} \frac{V_{D S}}{L_{e f f}}} \tag{1.81}
\end{equation*}
$$

where $\mu_{e f f}$ is the effective mobility, $V_{D S} / L_{e f f}$ represents the lateral field and the term $v_{\max }$ is the maximum drift velocity of the carriers. A typical value of $v_{\max }$ is on the order of $10^{5} \mathrm{~m} / \mathrm{s}$.

This velocity limitation can be responsible for the saturation in MOS transistors since a MOS can enter the active region before $V_{D S}$ reaches the value of $V_{G S}-V_{T, n}$. Consequently, (1.41) must be adjusted to account for the carrier saturation velocity.
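
The combined effect of the vertical-field correction (1.79) and the lateral-field correction (1.81) can be illustrated with the following Python sketch. The low-field mobility, oxide thickness, bias voltages and channel length are hypothetical; $\vartheta$ is computed as $2.3 / t_{o x}$ with $t_{o x}$ in nm, as stated above, and is assumed to be expressed in $\mathrm{V}^{-1}$.

```python
def surface_mobility(mu_0, theta, v_ov):
    """Vertical-field mobility reduction from (1.79)."""
    return mu_0 / (1.0 + theta * v_ov)

def effective_mobility(mu_s, v_ds, l_eff, v_max):
    """Lateral-field mobility reduction from (1.81)."""
    return mu_s / (1.0 + (mu_s / v_max) * (v_ds / l_eff))

# Hypothetical values (illustrative assumptions only).
MU_0 = 0.04          # low-field mobility in m^2/(V s)
T_OX_NM = 7.5        # oxide thickness in nm
THETA = 2.3 / T_OX_NM
V_OV = 1.0           # V_GS - V_T in V
V_DS = 1.0           # V
L_EFF = 0.25e-6      # m
V_MAX = 1.0e5        # maximum drift velocity in m/s

mu_s = surface_mobility(MU_0, THETA, V_OV)
mu_eff = effective_mobility(mu_s, V_DS, L_EFF, V_MAX)
v_drift = mu_eff * V_DS / L_EFF      # drift velocity from (1.78) with mu_eff

print(f"mu_0 = {MU_0:.4f}, mu_s = {mu_s:.4f}, mu_eff = {mu_eff:.4f} m^2/(V s)")
print(f"drift velocity = {v_drift:.3e} m/s (v_max = {V_MAX:.1e} m/s)")
```

With the corrected mobility, the drift velocity $\mu_{e f f} V_{D S} / L_{e f f}$ approaches $v_{\max }$ for very high fields but never exceeds it, which is the saturation behavior described in the text.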

### 1.3.10.5 Drain Induced Barrier Lowering (DIBL)

This effect is due to the strong lateral electric field and affects the threshold voltage. The principal model assumes the channel is created by the gate voltage only. Actually, a strong lateral field from the drain can also help to attract electrons towards the surface. Strictly speaking, the drain voltage influences the surface charge and helps the gate voltage to form the channel. This effect is modeled as a reduction in the threshold voltage (that is, a barrier lowering), obtained by modifying (1.44) as [GM93], [JM97]

$$
\begin{equation*}
V_{T, n}=V_{T, n 0}+\gamma\left(\sqrt{V_{S B}+2\left|\phi_{F}\right|}-\sqrt{2\left|\phi_{F}\right|}\right)-\sigma_{D} V_{D S} \tag{1.82}
\end{equation*}
$$

where $\sigma_{D}$ is a corrective factor responsible for the dependence of the threshold voltage on $V_{D S}$. The DIBL is also more pronounced in short-channel devices and its main effect is a further reduction in the output resistance.

### 1.3.10.6 Threshold voltage dependency on transistor dimensions

As transistor dimensions are reduced, the fringing field at border edges can also affect the threshold voltage [JM97]. Referring to Fig. 1.14 and without entering into a detailed physical explanation, applying a voltage $V_{G}$ to the gate creates a channel. However, due to border effects, only charges in the darker trapezoidal area are linked to the gate voltage. The threshold voltage definition in (1.44) refers all the charges in the rectangular area below the silicon to the gate voltage, as in Fig. 1.8. Since the threshold voltage depends on the channel charge linked to the gate voltage, it is apparent that the previous model overestimates the value of $V_{T, n}$. This border effect is not critical in long-channel devices, but in a short-channel transistor it can be significant.

To model this phenomenon, the threshold voltage in (1.44) is modified to

$$
\begin{equation*}
V_{T, n}=V_{T, n 0}-\gamma \sqrt{2\left|\phi_{F}\right|}+F_{s} \gamma \sqrt{V_{S B}+2\left|\phi_{F}\right|} \tag{1.83}
\end{equation*}
$$

where $F_{s}$ is a corrective factor that represents the ratio between the trapezoidal and the rectangular areas used to model the channel. As a consequence, $V_{T}$ is less than its original value in (1.44).


Fig. 1.14. Border effects in MOS transistors.

In a similar way, the threshold voltage depends on transistor width if this dimension becomes comparable to the edge effect regions, that is, in narrow-channel (i.e., with short width) devices.

In this case, after applying a voltage to the gate, border effects deplete a wider region thus increasing the threshold voltage. This effect is modeled by adding the term $F_{n}\left(V_{S B}+2\left|\phi_{F}\right|\right)$ to the original $V_{T, n}$ in (1.44). $F_{n}$ is a corrective factor that approaches zero in the case of wide channels.

Taking into account (1.82) and (1.83) the final form for the threshold voltage becomes

$$
\begin{equation*}
V_{T, n}=V_{T, n 0}-\gamma \sqrt{2\left|\phi_{F}\right|}+F_{s} \gamma \sqrt{V_{S B}+2\left|\phi_{F}\right|}-\sigma_{D} V_{D S}+F_{n}\left(V_{S B}+2\left|\phi_{F}\right|\right) \tag{1.84}
\end{equation*}
$$
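
The following Python sketch evaluates (1.84) and shows how it collapses to the long-channel body-effect expression when the corrective factors take their ideal values ($F_{s}=1$, $\sigma_{D}=0$, $F_{n}=0$). All numerical values are hypothetical and serve only to illustrate the direction of each correction.

```python
import math

def threshold_voltage(v_t0, gamma, phi_f, v_sb, v_ds,
                      f_s=1.0, sigma_d=0.0, f_n=0.0):
    """Threshold voltage from (1.84).
    Defaults (f_s=1, sigma_d=0, f_n=0) give the long-channel form."""
    two_phi = 2.0 * abs(phi_f)
    return (v_t0
            - gamma * math.sqrt(two_phi)
            + f_s * gamma * math.sqrt(v_sb + two_phi)
            - sigma_d * v_ds
            + f_n * (v_sb + two_phi))

# Hypothetical values (illustrative assumptions only).
V_T0, GAMMA, PHI_F = 0.5, 0.4, 0.35
V_SB, V_DS = 0.5, 1.5

vt_long = threshold_voltage(V_T0, GAMMA, PHI_F, V_SB, V_DS)
vt_short = threshold_voltage(V_T0, GAMMA, PHI_F, V_SB, V_DS,
                             f_s=0.9, sigma_d=0.02, f_n=0.01)

print(f"long-channel V_T      = {vt_long:.4f} V")
print(f"with second-order fix = {vt_short:.4f} V")
```

In this example the short-channel factor $F_{s}$ and the DIBL term lower the threshold, while the narrow-channel term $F_{n}$ raises it slightly.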

### 1.3.10.7 Hot carrier effects

High lateral electric fields can generate high velocity carriers, also called hot carriers. In short-channel devices, due to their high velocity, electron-hole pairs can be generated in the channel by impact ionization and avalanching. As a consequence, in n-MOS, a current of holes can flow from the drain to the substrate.

Moreover, some hot carriers with enough energy can tunnel the gate oxide thus causing either a dc gate current or, if trapped in the oxide, a threshold voltage alteration. This latter phenomenon can drastically limit the long-term reliability of MOS transistors.

A further hot carrier effect is the so-called punch-through. It happens when the depletion regions of source and drain are so close to each other that hot carriers with enough energy can overcome the short-channel region, thus causing a current that is no longer limited by the drift equations. It is as if the channel were no longer present in the device and both source and drain areas were connected together. This phenomenon is limited by increasing the substrate doping, which consequently limits the depletion region extensions. This effect not only lowers drain impedance but can also cause transistor breakdown.

## Chapter 2

## CURRENT-MODE DIGITAL CIRCUITS

In this chapter, operation of Current-Mode logic gates implemented in both bipolar and CMOS technology is addressed. Their static behavior is analytically modeled, and topology of the most often used gates is introduced. Finally, typical applications of Current-Mode gates are discussed.

### 2.1 THE BIPOLAR CURRENT-MODE INVERTER: BASIC PRINCIPLES

The bipolar Current-Mode Logic (CML) gates are based on the emitter coupled pair of npn transistors biased by a constant current source $I_{S S}$. In particular, let us consider the CML inverter gate shown in Fig. 2.1. Its input voltages $v_{i 1}$ and $v_{i 2}$ are applied to the base of transistors Q1-Q2, while output voltages $v_{o 1}$ and $v_{o 2}$ are taken from their collector nodes. Transistors Q3-Q4 implement a simple current mirror that provides the current source $I_{S S}$ to the emitter coupled pair Q1-Q2, while resistors $R_{E 3}$ and $R_{E 4}$ are introduced to improve matching between transistors Q3 and Q4 of the current mirror and are typically designed so that their voltage drop is around 100 mV [GM93]. For the sake of simplicity, the current mirror will be represented as the ideal current source $I_{S S}$ from now on. It is worth noting that a negative supply voltage $-V_{D D}$ is used in Fig. 2.1 for reasons related to noise immunity, as will be discussed in Section 2.2.4.

By assuming transistors Q1-Q2 to be matched and to operate in the linear region, as well as neglecting the Early effect, the ratio of their collector currents $i_{C 1}$ and $i_{C 2}$ can be expressed as a function of the differential input voltage, $v_{d}=v_{i 1}-v_{i 2}$, according to [SS91], [GM93]

$$
\begin{equation*}
\frac{i_{C 1}}{i_{C 2}} \cong e^{\frac{v_{d}}{V_{T}}} \tag{2.1}
\end{equation*}
$$



Fig. 2.1. CML inverter.

Some numerical values of (2.1) are reported in Table 2.1. It is worth noting that assuming transistors Q1-Q2 to work in the linear region is always correct in practical cases, as their switching speed is unacceptably reduced when operating in the saturation region [R96], because the transistor base charge that must be provided or extracted during the switching strongly increases in the saturation region.

By inspecting Fig. 2.1 and neglecting the base current, we can write

$$
\begin{equation*}
i_{C 1}+i_{C 2} \cong I_{S S} \tag{2.2}
\end{equation*}
$$

Hence, the bias current is almost completely steered to one of the two transistors for differential voltages $v_{d}$ in the order of some $V_{T}$, e.g. about 100 mV at room temperature. Therefore, emitter coupled pair Q1-Q2 efficiently implements a voltage-controlled current switch that steers bias current $I_{S S}$ according to the input logic value. Load resistors $R_{C}$ perform a current-to-voltage conversion and generate the complementary output voltages $v_{o 1}$ and $v_{o 2}$.

TABLE 2.1

| $v_{d} / V_{T}$ | $i_{C 1} / i_{C 2}$ | $v_{d} / V_{T}$ | $i_{C 1} / i_{C 2}$ |
| :---: | :---: | :---: | :---: |
| 1 | 2.7 | -1 | $(2.7)^{-1}$ |
| 2 | 7.4 | -2 | $(7.4)^{-1}$ |
| 3 | 20.1 | -3 | $(20.1)^{-1}$ |
| 4 | 54.6 | -4 | $(54.6)^{-1}$ |
| 5 | 148.4 | -5 | $(148.4)^{-1}$ |
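
The entries of Table 2.1 follow directly from (2.1). The short Python sketch below regenerates them and, using (2.2), also reports the fraction of $I_{S S}$ flowing in Q1; the helper names `current_ratio` and `q1_fraction` are introduced here only for illustration.

```python
import math

def current_ratio(vd_over_vt):
    """Collector current ratio i_C1 / i_C2 from (2.1)."""
    return math.exp(vd_over_vt)

def q1_fraction(vd_over_vt):
    """Fraction of I_SS flowing in Q1, combining (2.1) and (2.2)."""
    r = current_ratio(vd_over_vt)
    return r / (1.0 + r)

for k in range(1, 6):
    print(f"v_d/V_T = {k}: i_C1/i_C2 = {current_ratio(k):6.1f}, "
          f"i_C1/I_SS = {q1_fraction(k):.4f}")
```

For $v_{d}=4 V_{T}$, i.e. about 100 mV at room temperature, more than 98% of the bias current already flows in one transistor, which supports the current-switch interpretation given above.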

An in-depth analysis of bipolar Current-Mode gates, which includes the static behavior and the noise immunity, is developed in the next section.

### 2.2 THE BIPOLAR CURRENT-MODE INVERTER: INPUT-OUTPUT CHARACTERISTICS AND NOISE MARGIN

In CML gates, input and output voltages can be differential or single-ended. Their static behavior is discussed in the following subsections in terms of logic swing $V_{S W I N G}$, small-signal voltage gain $A_{V}$ and noise margin $N M$. Analysis is carried out by assuming the current provided to the driven gates to be negligible, as will be justified in Subsection 2.2.3.

### 2.2.1 Differential input/output

When differential signaling is assumed, the input and output voltage of the CML inverter gate are defined as the difference

$$
\begin{align*}
& v_{i}=v_{i 1}-v_{i 2}  \tag{2.3a}\\
& v_{o}=v_{o 1}-v_{o 2}=-R_{C}\left(i_{C 1}-i_{C 2}\right) \tag{2.3b}
\end{align*}
$$

From relationships (2.1) and (2.2), currents $i_{C 1}$ and $i_{C 2}$ are easily found to be equal to [GM94]

$$
\begin{align*}
& i_{C 1}=I_{S S} \frac{e^{\frac{v_{i}}{V_{T}}}}{1+e^{\frac{v_{i}}{V_{T}}}}  \tag{2.4a}\\
& i_{C 2}=I_{S S} \frac{1}{1+e^{\frac{v_{i}}{V_{T}}}} \tag{2.4b}
\end{align*}
$$

and substituting them into output voltage (2.3b) we get

$$
\begin{equation*}
v_{o}=-R_{C} I_{S S} \frac{e^{\frac{v_{i}}{V_{T}}}-1}{1+e^{\frac{v_{i}}{V_{T}}}} \tag{2.5}
\end{equation*}
$$

whose inspection reveals that the logic threshold voltage $V_{L T}$ (i.e. the input voltage such that $v_{o}=v_{i}$ ) is equal to

$$
\begin{equation*}
V_{L T}=0 \tag{2.6}
\end{equation*}
$$

and the output transfer characteristics (2.5) is symmetric with respect to it. As an example, a plot of relationship (2.5) versus $v_{i}$ at room temperature (i.e., by assuming $V_{T} \approx 25 \mathrm{mV}$ ) and for $R_{C} I_{S S}$ equal to 0.25 V is reported in Fig. 2.2.

It is simple to verify that the minimum output voltage $V_{O L}$ achieved when input is high (i.e., $v_{i} \gg V_{T}$ and current $I_{S S}$ is steered to the left-hand output node) is equal to

$$
\begin{equation*}
V_{O L}=-R_{C} I_{S S} \tag{2.7}
\end{equation*}
$$

while the maximum output voltage $V_{O H}$ obtained when input is low (i.e., $v_{i} \ll-V_{T}$ and current $I_{S S}$ is steered to the right-hand output node) is

$$
\begin{equation*}
V_{O H}=R_{C} I_{S S} \tag{2.8}
\end{equation*}
$$

Thus the resulting logic swing is

$$
\begin{equation*}
V_{S W I N G}=V_{O H}-V_{O L}=2 R_{C} I_{S S} \tag{2.9}
\end{equation*}
$$



Fig. 2.2. Transfer characteristics of a CML inverter with differential input/output for $R_{C} I_{S S}=0.25 \mathrm{~V}$.

Regarding the small-signal voltage gain $A_{V}$ around the logic threshold, it is evaluated by linearizing the circuit in Fig. 2.1 around the bias point $v_{i}=0$, where the bias currents of transistors Q1 and Q2 are both equal to $I_{S S} / 2$ due to the symmetry; thus their small-signal transconductance $g_{m}$ is equal to $I_{S S} /\left(2 V_{T}\right)$, from relationship (1.25). Moreover, the small-signal circuit obtained by substituting the linearized transistor model in Fig. 1.6 into Fig. 2.1 can be simplified into a common emitter stage consisting of transistor Q1 (Q2) with a load resistor $R_{C}$. This is because the half-circuit concept holds for the circuit in Fig. 2.1, since the circuit is symmetric and the input is differential [SS91]. As a consequence, the magnitude $A_{V}$ of the voltage gain around the logic threshold is

$$
\begin{equation*}
A_{V}=g_{m} R_{C}=\frac{V_{S W I N G}}{4 V_{T}} \tag{2.10}
\end{equation*}
$$

which is a function of only logic swing. As an example, when $V_{\text {SWING }}$ ranges from 400 mV to 1 V , which are typical values, voltage gain (2.10) ranges from 4 to 10 .
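
The static behavior described by (2.5)-(2.10) can be reproduced with a few lines of Python. The sketch below uses $V_{T}=25 \mathrm{mV}$ and $R_{C} I_{S S}=0.25 \mathrm{~V}$, as in Fig. 2.2, and estimates the gain at the logic threshold by a numerical derivative. The identity $\left(e^{x}-1\right) /\left(e^{x}+1\right)=\tanh (x / 2)$ is used only to keep the evaluation numerically safe for large inputs; the function names are illustrative.

```python
import math

V_T = 0.025  # thermal voltage at room temperature in V, as assumed in the text

def v_out_diff(v_i, rc_iss, v_t=V_T):
    """Differential output voltage from (2.5), written with tanh(x/2)."""
    return -rc_iss * math.tanh(v_i / (2.0 * v_t))

def gain_at_threshold(rc_iss, v_t=V_T, h=1e-6):
    """Magnitude of dv_o/dv_i at v_i = 0, by central difference."""
    dv = v_out_diff(h, rc_iss, v_t) - v_out_diff(-h, rc_iss, v_t)
    return abs(dv) / (2.0 * h)

RC_ISS = 0.25                 # V, same value as in Fig. 2.2
V_SWING = 2.0 * RC_ISS        # logic swing from (2.9)

print(f"V_OL ~ {v_out_diff(+0.5, RC_ISS):+.4f} V, V_OH ~ {v_out_diff(-0.5, RC_ISS):+.4f} V")
print(f"numerical gain at v_i = 0 : {gain_at_threshold(RC_ISS):.3f}")
print(f"V_SWING / (4 V_T) (2.10)  : {V_SWING / (4.0 * V_T):.3f}")
```

The numerical slope at $v_{i}=0$ coincides with the value predicted by (2.10), and the output settles to $\mp R_{C} I_{S S}$ for inputs of a few hundred millivolts.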

To evaluate noise margin $N M$, it is necessary to introduce some notations by referring to the output transfer characteristics of a generic logic gate in Fig. 2.3, whose points having slope equal to -1 are named ( $V_{I L \max }, V_{O H \min }$ ) and ( $V_{I H \min }, V_{O L \max }$ ). According to Fig. 2.4, the noise margins at the low level, $N M_{L}$, and at the high level, $N M_{H}$, are respectively defined as

$$
\begin{align*}
& N M_{L}=V_{I L \max }-V_{O L \max }  \tag{2.11a}\\
& N M_{H}=V_{O H \min }-V_{I H \min } \tag{2.11b}
\end{align*}
$$

that are equal when the transfer characteristics is symmetric, and whose minimum value defines noise margin

$$
\begin{equation*}
N M=\min \left(N M_{L}, N M_{H}\right) \tag{2.12}
\end{equation*}
$$



Fig. 2.3. DC output voltage $v_{o}$ versus input voltage $v_{i}$ in a generic logic gate.

Parameter $V_{I H \min }$ of a CML inverter can be found by differentiating relationship (2.5) with respect to $v_{i}$ and setting the result to -1 . Solving this equality for $v_{i}$ leads to

$$
\begin{equation*}
V_{I H \min } \cong V_{T} \ln \left(\frac{V_{S W I N G}}{V_{T}}-2\right) \tag{2.13}
\end{equation*}
$$

where $V_{I L \max }$ was assumed to be lower than $-2 V_{T}$ and relationship (2.9) was used.


Fig. 2.4. Parameters of the transfer characteristic.

Since relationship (2.5) is symmetric with respect to the logic threshold, the low and high noise margins, given in (2.11a) and (2.11b), are equal. Then, introducing the approximation $V_{O H \min } \approx R_{C} I_{S S}=V_{\text {SWING }} / 2$, the noise margin is

$$
\begin{equation*}
N M=N M_{L}=N M_{H} \cong \frac{V_{S W I N G}}{2}-V_{T} \ln \left(\frac{V_{S W I N G}}{V_{T}}-2\right) \tag{2.14}
\end{equation*}
$$

and, as expected, strongly depends on logic swing.

Some numerical values of the noise margin in (2.14) at room temperature versus the logic swing are reported in Table 2.2. Since minimum acceptable values of noise margin are typically in the order of 100 mV or greater, logic swing must be set at least to 200 mV from Table 2.2 (typical values in high-speed circuits are in the order of $400-500 \mathrm{mV}$ ). In practical cases, relationship (2.14) can be used to size logic swing for a noise margin requirement assigned from considerations at the system level.

To improve the noise margin, high values of the logic swing should be used, even though $V_{\text {SWING }}$ has an upper bound. This can be seen by observing that, when the input voltage $v_{i}$ is high, transistor Q1 has its base and collector voltage equal to 0 and $-R_{C} I_{S S}$, respectively. Therefore, as discussed in Section 1.2.1, the operation in the saturation region is avoided if the base-collector junction is reverse-biased, which is ensured by keeping its voltage lower than $V_{C B, o n} \approx 0.5 \mathrm{~V}$ [GM93]-[MG87], or equivalently by satisfying the following inequality

$$
\begin{equation*}
R_{C} I_{S S} \leq V_{C B, o n} \approx 0.5 \mathrm{~V} \tag{2.15}
\end{equation*}
$$

Therefore from (2.9) the maximum logic swing allowed is about 1 V . This result is based on the assumption that the series collector parasitic resistor $r_{c}$ (see the SPICE model in Fig. 4.1) is low enough to warrant its neglect. However, relationship (2.15) is easily generalized to arbitrary values of $r_{c}$ by considering an equivalent resistor at the collector node equal to $R_{C}+r_{c}$, as resistor $r_{c}$ is in series with $R_{C}$.

TABLE 2.2

| $R_{C} I_{S S}(\mathrm{mV})$ | $V_{\text {SWING }}(\mathrm{mV})$ | $N M(\mathrm{mV})$ |
| :---: | :---: | :---: |
| 200 | 400 | 134 |
| 250 | 500 | 178 |
| 300 | 600 | 223 |
| 350 | 700 | 269 |
| 400 | 800 | 315 |
| 450 | 900 | 362 |
| 500 | 1000 | 410 |
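
As a sanity check, the closed-form expression (2.14) can be evaluated for the rows of Table 2.2 with the Python sketch below, assuming $V_{T}=25 \mathrm{mV}$ as elsewhere in this section. The function name `noise_margin` and the dictionary of tabulated values are introduced here only for the comparison.

```python
import math

V_T = 0.025  # thermal voltage at room temperature in V

def noise_margin(v_swing, v_t=V_T):
    """Approximate noise margin from (2.14); all voltages in volts."""
    return v_swing / 2.0 - v_t * math.log(v_swing / v_t - 2.0)

# Tabulated values copied from Table 2.2: V_SWING (mV) -> NM (mV).
TABLE_2_2 = {400: 134, 500: 178, 600: 223, 700: 269,
             800: 315, 900: 362, 1000: 410}

for swing_mv, nm_table_mv in TABLE_2_2.items():
    nm_mv = 1e3 * noise_margin(swing_mv * 1e-3)
    print(f"V_SWING = {swing_mv:4d} mV: NM from (2.14) = {nm_mv:6.1f} mV, "
          f"table = {nm_table_mv} mV")
```

Under this assumption the computed values agree with the tabulated ones to within about 1 mV. The same function can be inverted numerically (for instance by bisection) to size the logic swing for an assigned noise margin.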

### 2.2.2 Single-ended input/output

In a single-ended CML inverter gate, the base voltage of transistor Q2 in Fig. 2.1 is kept at a constant voltage $V_{R E F}$, implemented by a temperature-compensated voltage reference. The input voltage is $v_{i 1}$ and output voltage is $v_{o 1}$, which is equal to

$$
\begin{equation*}
v_{o 1}=-R_{C} i_{C 1} \tag{2.16}
\end{equation*}
$$

whose high and low voltage levels are obtained when bias current $I_{S S}$ is almost completely steered to transistor Q2 and Q1, respectively

$$
\begin{align*}
& V_{O H}=0  \tag{2.17}\\
& V_{O L}=-R_{C} I_{S S} \tag{2.18}
\end{align*}
$$

and the resulting logic swing is half that of the differential case

$$
\begin{equation*}
V_{S W I N G}=V_{O H}-V_{O L}=R_{C} I_{S S} \tag{2.19}
\end{equation*}
$$

which, to avoid operation in the saturation region, cannot be greater than about 500 mV , from relationship (2.15).

The base terminal of transistor Q2 is biased to a constant voltage $V_{R E F}$, which must be set to a suitable value that makes the transfer characteristics in (2.16) symmetric, so as to maximize noise margin $N M$. This is achieved for a logic threshold $V_{L T}$ at half the entire swing

$$
\begin{equation*}
V_{L T}=-\frac{R_{C} I_{S S}}{2} \tag{2.20}
\end{equation*}
$$

According to (2.20), when $v_{i 1}$ is set to $V_{L T}$, the output must also be equal to it, which is achieved when the base voltages of Q1-Q2 are equal (since in this case $I_{S S}$ divides equally between Q1 and Q2 and hence $v_{o}$ is equal to (2.20)); thus voltage $V_{R E F}$ must also be set equal to $V_{L T}$. With this value of $V_{R E F}$, from (2.4a) and (2.16) the transfer characteristics results in

$$
\begin{equation*}
v_{o}=-R_{C} I_{S S} \frac{e^{\frac{v_{i}+\frac{R_{C} I_{S S}}{2}}{V_{T}}}}{1+e^{\frac{v_{i}+\frac{R_{C} I_{S S}}{2}}{V_{T}}}} \tag{2.21}
\end{equation*}
$$

which is symmetric with respect to the logic threshold. As an example, plot of relationship (2.21) versus $v_{i}$ is reported in Fig. 2.5 at room temperature for $R_{C} I_{S S}$ equal to 0.25 V .
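
A minimal Python sketch of (2.21), using $V_{T}=25 \mathrm{mV}$ and $R_{C} I_{S S}=0.25 \mathrm{~V}$ as in Fig. 2.5, verifies that the output equals the input at $V_{L T}=-R_{C} I_{S S} / 2$ and that the characteristic is symmetric about that point. The function name is illustrative, and $e^{x} /\left(1+e^{x}\right)$ is evaluated in the equivalent form $1 /\left(1+e^{-x}\right)$.

```python
import math

V_T = 0.025  # thermal voltage at room temperature in V

def v_out_single(v_i, rc_iss, v_t=V_T):
    """Single-ended output voltage from (2.21), with V_REF = -rc_iss / 2."""
    x = (v_i + rc_iss / 2.0) / v_t
    return -rc_iss / (1.0 + math.exp(-x))

RC_ISS = 0.25            # V, same value as in Fig. 2.5
V_LT = -RC_ISS / 2.0     # logic threshold from (2.20)

print(f"v_o(V_LT) = {v_out_single(V_LT, RC_ISS):+.4f} V (V_LT = {V_LT:+.4f} V)")
for delta in (0.025, 0.050, 0.100):
    hi = v_out_single(V_LT + delta, RC_ISS)
    lo = v_out_single(V_LT - delta, RC_ISS)
    print(f"delta = {delta * 1e3:5.1f} mV: v_o = {hi:+.4f} V / {lo:+.4f} V, "
          f"mean = {(hi + lo) / 2.0:+.4f} V")
```

The mean of the two outputs taken at equal distances from $V_{L T}$ is always $V_{L T}$ itself, which is the symmetry property exploited in the noise margin evaluation.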

As is well known, the small-signal voltage gain around the logic threshold in (2.20) in the single-ended case is half that in the differential case [GM93]

$$
\begin{equation*}
A_{V}=\frac{1}{2} g_{m} R_{C}=\frac{V_{\text {SWING }}}{4 V_{T}} \tag{2.22}
\end{equation*}
$$

which is a function of only the logic swing. As an example, when $V_{\text {SWING }}$ ranges from 200 mV to 500 mV , which are typical values, voltage gain (2.22) ranges from 2 to 5 .

To analytically express noise margin at the high level $N M_{H}$ in (2.11b), let us evaluate parameter $V_{I H \min }$ by differentiating relationship (2.21) with respect to $v_{i}$ and setting the result to -1 . Solving for $v_{i}$ leads to

$$
\begin{align*}
V_{I H \min } & \approx V_{R E F}+V_{T} \cdot \ln \left(\frac{V_{S W I N G}}{V_{T}}-2\right) \\
& =-\frac{R_{C} I_{S S}}{2}+V_{T} \ln \left(\frac{V_{S W I N G}}{V_{T}}-2\right) \tag{2.23}
\end{align*}
$$

where $V_{I L \max }$ was assumed to be lower than $-2 V_{T}$ and relationship (2.20) was used.


Fig. 2.5. Transfer characteristics of a CML inverter with single-ended input/output for $R_{C} I_{S S}=0.25 \mathrm{~V}$.

By approximating $V_{O H \min } \approx 0$, from (2.23) the noise margin is

$$
\begin{equation*}
N M=N M_{L}=N M_{H} \cong \frac{V_{S W I N G}}{2}-V_{T} \ln \left(\frac{V_{\text {SWING }}}{V_{T}}-2\right) \tag{2.24}
\end{equation*}
$$

since voltage $V_{R E F}$ was chosen to make the DC characteristics in (2.21) symmetric with respect to the logic threshold. It is useful to observe that the noise margin of the single-ended CML gate in (2.24) has the same expression as that of the differential one, but the logic swing and its maximum value from (2.15) in the former (see eq. (2.19)) are half of those in the latter, given by (2.9), for an assigned value of $R_{C} I_{S S}$. For this reason, for an assigned value of $R_{C} I_{S S}$, the noise margin in the single-ended case is about half that of the differential gate (more precisely, the former is slightly lower than half of the latter). Some numerical values of the noise margin (2.24) at room temperature versus the logic swing are reported in Table 2.3 for the single-ended gate.

TABLE 2.3

| $R_{C} I_{S S}(\mathrm{mV})$ | $V_{\text {SWING }}(\mathrm{mV})$ | $N M(\mathrm{mV})$ |
| :---: | :---: | :---: |
| 200 | 200 | 55 |
| 250 | 250 | 73 |
| 300 | 300 | 92 |
| 350 | 350 | 113 |
| 400 | 400 | 134 |
| 450 | 450 | 156 |
| 500 | 500 | 178 |

To achieve acceptable values of noise margin, which are typically in the order of 100 mV or greater, logic swing must be set at least to $350-400 \mathrm{mV}$ from Table 2.3. In practical cases, relationship (2.24) is used to size logic swing for a given noise margin requirement. Finally, it is interesting to note that the noise margin of both differential and single-ended CML gates decreases linearly with increasing temperature through factor $V_{T}$, which is proportional to it.

### 2.2.3 Considerations on the non-zero input current

In the previous subsections, differential and single-ended inverter gates have been analyzed in terms of their static behavior. Such analysis has been carried out by assuming that the static input current required by the driven gates is negligible. Actually, the input current of a driven gate (equal to the base current of its input transistor) is non-zero only when the previous gate's output node voltage is high (i.e., equal to zero), since in the opposite case the input transistor is in the cut-off region and draws a null current. In the former case, the input current of a driven gate is equal to the emitter current $I_{S S}$ of the input transistor divided by $\left(\beta_{F}+1\right)$ from (1.18). Thus, when a single-ended CML gate drives $N$ equal gates, the high output voltage becomes

$$
\begin{equation*}
V_{O H}=-R_{C} N \frac{I_{S S}}{\beta_{F}+1} \approx-R_{C} N \frac{I_{S S}}{\beta_{F}} \tag{2.25}
\end{equation*}
$$

while the low output level is still given by relationship (2.18), thus logic swing results in

$$
\begin{equation*}
V_{S W I N G}=R_{C} I_{S S}\left[1-\frac{N}{\beta_{F}}\right] \tag{2.26}
\end{equation*}
$$

which can be substituted into relationship (2.24) to evaluate the effective noise margin achieved. From (2.26), its reduction with respect to the ideal case with a zero gate input current in Section 2.2.2 is negligible for a fan-out $N$ sufficiently low with respect to $\beta_{F}$. For practical values of $\beta_{F}$ and $N$, the effect of the gate input current on $N M$ is usually negligible.

Regarding differential gates, both $V_{O H}$ and $\left|V_{O L}\right|$ are reduced by the same amount given in (2.25), thus logic swing turns out to be

$$
\begin{equation*}
V_{S W I N G}=2 R_{C} I_{S S}\left(1-\frac{N}{\beta_{F}}\right) \tag{2.27}
\end{equation*}
$$

for which the same considerations reported above still hold.
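
The impact of a finite fan-out can be quantified by substituting (2.26) or (2.27) into the noise margin expression, as in the Python sketch below. The values of $\beta_{F}$, $N$ and $R_{C} I_{S S}$ are hypothetical, and $V_{T}=25 \mathrm{mV}$ is assumed.

```python
import math

V_T = 0.025  # thermal voltage at room temperature in V

def swing_with_fanout(rc_iss, n, beta_f, differential=False):
    """Logic swing with N driven gates: (2.26), or (2.27) if differential."""
    swing = rc_iss * (1.0 - n / beta_f)
    return 2.0 * swing if differential else swing

def noise_margin(v_swing, v_t=V_T):
    """Noise margin expression shared by (2.14) and (2.24)."""
    return v_swing / 2.0 - v_t * math.log(v_swing / v_t - 2.0)

# Hypothetical values (illustrative assumptions only).
RC_ISS, N, BETA_F = 0.4, 4, 100.0

swing_ideal = swing_with_fanout(RC_ISS, 0, BETA_F)
swing_loaded = swing_with_fanout(RC_ISS, N, BETA_F)
nm_ideal = noise_margin(swing_ideal)
nm_loaded = noise_margin(swing_loaded)

print(f"single-ended swing: {swing_ideal * 1e3:.0f} mV -> {swing_loaded * 1e3:.0f} mV")
print(f"noise margin      : {nm_ideal * 1e3:.1f} mV -> {nm_loaded * 1e3:.1f} mV")
```

With these assumed values the swing drops by 4% and the noise margin by a few millivolts, consistent with the statement that the effect is usually negligible when $N \ll \beta_{F}$.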

### 2.2.4 Remarks and comparison of differential/single-ended gates

In the following, comparison of the two topologies is carried out to justify why differential gates are generally preferred to single-ended ones.

Analysis in Sections 2.2.1 and 2.2.2 has shown that differential CML gates have a better noise immunity by a factor slightly greater than 2 , for a given value of $R_{C} I_{S S}$. As will be shown in Chapter 5, product $R_{C} I_{S S}$ significantly affects speed and its trade-off with power consumption. In particular, it will be shown that a low value of factor $R_{C} I_{S S}$ is advantageous in terms of speed when a low bias current is used and in terms of the speed-power trade-off when a high bias current is adopted. Therefore, for a given noise margin, differential gates allow for a reduction of $R_{C} I_{S S}$ by a factor greater than 2 compared to single-ended gates, thereby improving by the same factor the speed in a low power design or the power efficiency in a high performance design.

There are further arguments concerning immunity to supply noise that make differential gates superior to their single-ended counterparts. To understand this point, let us consider a supply noise voltage $v_{noise}$ superimposed on the ground voltage of a CML gate. By assuming $v_{noise}$ to be a small signal, its effect on output voltages $v_{o 1}$ and $v_{o 2}$ can be evaluated by means of small-signal analysis. To be more specific, the resulting variation of $v_{o 1}$ ($v_{o 2}$) is given by the voltage divider between resistor $R_{C}$ and the equivalent small-signal resistance seen from the collector of transistor Q1 (Q2) to ground. Since in practical cases the latter is much greater than the former, $v_{noise}$ is almost entirely transferred to both output nodes, i.e. the small-signal component of $v_{o 1}$ and $v_{o 2}$ is equal to $v_{noise}$. Therefore, in a single-ended CML gate the output noise voltage is essentially equal to the supply noise, while in a differential gate its effect on the output is insignificant, since $v_{noise}$ affects both voltages $v_{o 1}$ and $v_{o 2}$ in approximately the same manner. In any case, due to the very low logic swing, the effect of supply noise on the output voltage is reduced by using the ground voltage as the high supply voltage, since it is usually less noisy than other available voltage sources, as depicted in Fig. 2.1.

Regarding supply noise superimposed on the negative supply voltage, inspection of Fig. 2.1 shows that it has no effect on the output voltage of a CML gate, since $I_{S S}$ is an ideal current source (even if it were not ideal, the effect of the noise signal would be a common-mode signal with no effect on the differential output). Moreover, differential gates do not need a reference voltage, which in single-ended gates reduces the noise margin (due to unavoidable fluctuations and tolerances on its voltage) and requires a further power consumption contribution.

In both differential and single-ended topologies, the output logic level can be inverted with no extra circuitry, since in the former the inversion is simply obtained by exchanging the output nodes, while in the latter it is achieved by taking the output signal from the opposite output node. This increased degree of freedom often allows for reducing the gate count when implementing a given logic function, as will be discussed in Chapter 3.

From the previous observations, which can be easily extended to more complex gates, it is apparent why differential gates are largely preferred in practical applications: they have a better noise immunity and either a higher speed or a more favorable trade-off with power consumption. Therefore, in the following sections only differential gates will be considered.

### 2.3 THE BUFFERED BIPOLAR CURRENT-MODE (ECL) INVERTER

The Emitter Coupled Logic (ECL) inverter is obtained by adding an output buffer (i.e., a common collector stage) to each output node of the CML inverter, as shown in Fig. 2.6. In particular, output buffers are biased by a constant current $I_{C C}$, while the internal CML gate is biased by the current $I_{S S}$. Traditionally, output buffers are introduced to enhance the driving capability of the gate, and their effects on the gate delay will be dealt with in Chapters 4 and 5. Alternatively, output buffers may be added to down-shift the common-mode output voltage by a base-emitter voltage $V_{B E 3,4}$, as will be shown in Section 2.5.3.


Fig. 2.6. ECL inverter topology.

In the differential case, the input voltage is defined by relationship (2.3a) as for the CML inverter, while the output voltage is given by

$$
\begin{equation*}
v_{o}=v_{o 1}-v_{o 2}=\left[-R_{C} i_{C 1}-V_{B E, 3}\right]-\left[-R_{C} i_{C 2}-V_{B E, 4}\right]=-R_{C}\left(i_{C 1}-i_{C 2}\right) \tag{2.28}
\end{equation*}
$$

which is equal to the transfer characteristics of the differential CML inverter in (2.3b). As a consequence, the differential ECL inverter has the same static behavior as the CML gate, i.e. the same logic swing, small-signal gain and noise margin already evaluated in Section 2.2.1. The only difference is that both output node voltages are down-shifted by the constant voltage $V_{B E 3,4}$.

In the single-ended case, input and output voltages are $v_{i 1}$ and $v_{o 1}$, thus one of the two buffers can be omitted (if the complementary output is not required), and voltage $v_{i 2}$ is set to a constant value $V_{R E F}$ that lies in the middle of the logic swing, as already discussed in Section 2.2.2. Since both voltage levels $V_{O H}$ and $V_{O L}$ are down-shifted by $V_{B E 3,4}$ with respect to those of the CML inverter in Section 2.2.2, the logic swing and noise margin of the ECL inverter are still given by (2.9) and (2.14). It is worth noting that, due to the shift of $V_{O H}$ and $V_{O L}$, from (2.20) the logic threshold and the reference voltage $V_{R E F}$ must be set to

$$
\begin{equation*}
V_{R E F}=V_{L T}=-\frac{R_{C} I_{S S}}{2}-V_{B E 3,4} \tag{2.29}
\end{equation*}
$$

In practical cases, differential ECL gates are preferred to single-ended ones for the same reasons clarified in Section 2.2.4.

### 2.4 THE MOS CURRENT-MODE INVERTER

The MOS Current-Mode inverter gate is based on the source-coupled pair of NMOS transistors biased by a constant current $I_{S S}$ implemented by an NMOS current mirror, as shown in Fig. 2.7. In this figure, a positive supply voltage $V_{D D}$ is used, as required by the analog circuitry that is usually integrated within the same chip [KA92].

The operation of the differential MOS Current-Mode inverter is similar to that of the bipolar CML counterpart. In particular, assuming transistor operation in the saturation region, currents $i_{D 1}$ and $i_{D 2}$ of transistors M1-M2 can be expressed as a function of the differential input voltage $v_{i}=v_{i 1}-v_{i 2}$ as [SS91]

$$
\begin{equation*}
i_{D 1,2}=\frac{I_{S S}}{2} \pm \frac{v_{i}}{2} \sqrt{\mu_{n} C_{O X} \frac{W_{n}}{L_{n}} I_{S S}-\left(\frac{\mu_{n} C_{O X}}{2} \frac{W_{n}}{L_{n}} v_{i}\right)^{2}} \quad \text { if }\left|v_{i}\right| \leq \sqrt{\frac{2 I_{S S}}{\mu_{n} C_{O X} \frac{W_{n}}{L_{n}}}} \tag{2.30}
\end{equation*}
$$

where $W_{n}$ and $L_{n}$ are the effective NMOS transistor channel width and length, $C_{O X}$ is the oxide capacitance per unit area, and $\mu_{n}$ is the NMOS carrier mobility. From (2.30), the bias current $I_{S S}$ is completely steered to one of the two output branches for a magnitude of the differential input voltage greater than $\sqrt{2 I_{S S} / \mu_{n} C_{O X}\left(W_{n} / L_{n}\right)}$. The current steered in the MOS Current-Mode inverter is converted into the differential output voltage $v_{o}=v_{o 1}-v_{o 2}$ through the PMOS active load M3-M4.


Fig. 2.7. Source-Coupled inverter gate.

Since the modeling of MOS transistors is not as straightforward as for bipolar devices, the output transfer characteristics of the circuit in Fig. 2.7 will be evaluated through a simplified analysis and a linearization of the PMOS load, as shown in the following.

### 2.4.1 Static modeling of the PMOS active load

The current-to-voltage conversion in the circuit in Fig. 2.7 is performed by the two PMOS transistors M3-M4, both of which have a source-gate voltage equal to $V_{D D}$ and a much smaller source-drain voltage (in the order of hundreds of millivolts). Therefore, transistors M3-M4 work in the triode region, and can thus be modeled as an equivalent linear resistor $R_{D}$.

To evaluate $R_{D}$, let us consider the expression of the drain current $i_{D}$ of a PMOS transistor working in the triode region used in the BSIM3v3 MOSFET model, which represents the standard model for deep submicron CMOS technologies [CH99]

$$
\begin{equation*}
i_{D}=\frac{I_{D S A T 0}}{1+R_{D S} \frac{I_{D S A T 0}}{V_{S D}}} \tag{2.31}
\end{equation*}
$$

where the parameter $R_{D S}=\left(R_{D S W} \cdot 10^{-6}\right) / W$ depends on the empirical model parameter $R_{D S W}$, which accounts for the source/drain parasitic resistance and heavily affects the I-V relationship in today's CMOS processes with lightly doped drain (LDD). It is worth noting that $R_{D S}$ does not represent a physical resistor, but only a corrective factor.

The ratio $I_{D S A T 0} / V_{S D}$ in relationship (2.31) can be evaluated by considering the expression of current $I_{D S A T 0}$, valid for both NMOS and PMOS transistors working in the linear region

$$
\begin{equation*}
I_{D S A T 0}=\mu_{e f f} C_{O X} \frac{W}{L}\left(\left|V_{G S}\right|-\left|V_{T}\right|-A_{\text {bulk }} \frac{\left|V_{D S}\right|}{2}\right)\left|V_{D S}\right| \tag{2.32}
\end{equation*}
$$

whose parameters are reported in the following with the subscript $p$ if referred to a PMOS transistor, and subscript $n$ in the case of an NMOS transistor. In (2.32), parameter $V_{T}$ is the threshold voltage (not the thermal voltage), $V_{G S}$ and $V_{D S}$ are the source-gate and source-drain voltages, and $\mu_{e f f}$ is the effective carrier mobility defined as [CH99]

$$
\begin{align*}
\mu_{e f f}= & \frac{\mu_{0}}{1+\left(U_{A}+U_{C}\left|V_{S B}\right|\right)\left(\frac{\left|V_{G S}\right|+\left|V_{T}\right|}{T_{O X}}\right)+U_{B}\left(\frac{\left|V_{G S}\right|+\left|V_{T}\right|}{T_{O X}}\right)^{2}} \\
& \cdot \frac{1}{1+\frac{\left|V_{D S}\right|}{E_{S A T} L}} \tag{2.33}
\end{align*}
$$

where $E_{S A T}$ is the critical electric field at which carrier velocity becomes saturated, $U_{A}, U_{B}$ and $U_{C}$ are model parameters, $V_{S B}$ is the source-bulk voltage, and $T_{O X}$ is the oxide thickness. It is worth noting that, in the denominator of (2.33), the terms including $V_{G S}$ model the mobility degradation due to the vertical electric field in the MOS transistor, while those including $V_{D S}$ model the carrier velocity saturation due to the lateral electric field. In particular, in the case of the active load PMOS transistors, we have to set $V_{S G}=V_{D D}$ and $V_{S B}=0$ in the expression (2.33) of the effective mobility $\mu_{e f f, p}$.

Parameter $A_{\text {bulk }}$ in (2.32) is slightly greater than unity and is given by

$$
\begin{align*}
A_{\text {bulk }}= & \frac{1}{1+K_{E T A}\left|V_{S B}\right|}\left\{1+\frac{K_{1 O X}}{2 \sqrt{\phi_{S}-\left|V_{S B}\right|}}\left[\frac{A_{0} L}{L+2 \sqrt{X_{J} X_{\text {dep }}}}\right.\right.  \tag{2.34}\\
& \left.\left.\cdot\left(1-A_{G S}\left(\left|V_{G S}\right|-\left|V_{T}\right|\right)\left(\frac{L}{L+2 \sqrt{X_{J} X_{\text {dep }}}}\right)^{2}\right)+\frac{B_{0}}{W+B_{1}}\right]\right\}
\end{align*}
$$

which depends on $W, L$ and various other BSIM3v3 model parameters, and can be simplified by considering its maximum value, $A_{\text {bulk, max }}$. This can be obtained by setting $W$ to its minimum value in (2.34) and maximizing the resulting function with respect to $L$, with straightforward calculations. As an example, for the $0.35-\mu \mathrm{m}$ CMOS process with main parameters reported in Table 2.4 and $V_{D D}=3.3 \mathrm{~V}$, the PMOS transistor parameter $A_{b u l k, m a x, p}$ results in 1.34.

TABLE 2.4

| Parameter | Value |
| :---: | :---: |
| $C_{O X}$ | $4.6 \mathrm{fF} / \mu \mathrm{m}^{2}$ |
| $\mu_{n 0} C_{O X}$ | $175 \mu \mathrm{~A} / \mathrm{V}^{2}$ |
| $\mu_{p 0} C_{O X}$ | $60 \mu \mathrm{~A} / \mathrm{V}^{2}$ |
| $(W / L)_{\min }$ | $0.6 \mu \mathrm{~m} / 0.3 \mu \mathrm{~m}$ |
| $V_{T n 0}$ (short channel) | 0.54 V |
| $V_{T p 0}$ (short channel) | -0.72 V |
| maximum $V_{D D}$ | 3.3 V |

From these considerations, by assuming $V_{S D}$ to be small, the terms $A_{\text {bulk,p }} V_{S D} / 2$ in (2.32) and $V_{S D} / E_{S A T, p} L_{p}$ in (2.33) can be neglected. Therefore (2.32) becomes

$$
\begin{equation*}
I_{D S A T 0}=\frac{V_{S D}}{R_{\mathrm{int}}} \tag{2.35}
\end{equation*}
$$

where we have defined

$$
\begin{equation*}
R_{\mathrm{int}}=\frac{1}{\mu_{e f f, p} C_{O X} \frac{W_{p}}{L_{p}}\left(V_{D D}-\left|V_{T, p}\right|\right)} \tag{2.36}
\end{equation*}
$$

which represents the "intrinsic" resistance of the PMOS transistor in the triode region (equal to that derived in (1.75) for a long-channel MOS transistor), since it expresses the behavior of the MOS transistor in the triode region without accounting for the parasitic drain/source resistance.

Now, the expression of $i_{D}$ in (2.31) can be simplified by expanding it in Taylor series truncated at the first-order term

$$
\begin{equation*}
i_{D}=I_{D S A T 0}\left(1-\frac{R_{D S}}{R_{\mathrm{int}}}\right) \tag{2.37}
\end{equation*}
$$

From (2.37), the equivalent resistance of the PMOS transistors $R_{D}=V_{S D} / i_{D}$ results in

$$
\begin{equation*}
R_{D}=\frac{R_{\mathrm{int}}}{1-\frac{R_{D S}}{R_{\mathrm{int}}}} \tag{2.38}
\end{equation*}
$$
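
The load model (2.36)-(2.38) can be sketched numerically as follows. This is only an illustration: the function names are hypothetical, $\mu_{e f f, p}$ is replaced by the low-field value $\mu_{p 0}$ of Table 2.4 (which underestimates $R_{\mathrm{int}}$), and the value of $R_{D S}$ is an arbitrary assumption. For comparison, the sketch also evaluates the resistance that follows from (2.31) and (2.35) without truncating the Taylor series, i.e. $R_{\mathrm{int}}+R_{D S}$.

```python
def r_int(mu_cox, w_over_l, v_dd, v_t_abs):
    """Intrinsic triode resistance of the PMOS load, eq. (2.36), in ohms.

    mu_cox is mu_eff,p * C_OX in A/V^2; voltages are in volts.
    """
    return 1.0 / (mu_cox * w_over_l * (v_dd - v_t_abs))


def r_d_first_order(rint, r_ds):
    """Equivalent load resistance from the first-order expansion, eq. (2.38)."""
    return rint / (1.0 - r_ds / rint)


def r_d_untruncated(rint, r_ds):
    """Load resistance from (2.31) with I_DSAT0 = V_SD / R_int, no truncation."""
    return rint + r_ds


if __name__ == "__main__":
    # Table 2.4 values; mu_eff,p ~ mu_p0 is an assumption (low-field mobility).
    rint = r_int(mu_cox=60e-6, w_over_l=2.0, v_dd=3.3, v_t_abs=0.72)
    r_ds = 300.0  # hypothetical R_DS in ohms, chosen only for illustration
    print(f"R_int            = {rint:.0f} ohm")
    print(f"R_D (2.38)       = {r_d_first_order(rint, r_ds):.0f} ohm")
    print(f"R_D untruncated  = {r_d_untruncated(rint, r_ds):.0f} ohm")
```

The two estimates of $R_{D}$ agree to first order in $R_{D S} / R_{\mathrm{int}}$, so the difference between them gives a quick indication of whether the truncation in (2.37) is acceptable for a given transistor size.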

### 2.4.2 Input-output characteristics

The output voltage transfer characteristics $v_{o}\left(v_{i}\right)$ of the MOS Current-Mode inverter can be evaluated by substituting the equivalent resistance $R_{D}$ expressed in (2.38) into the circuit in Fig. 2.7. Thus, the differential output voltage $v_{o}$ is equal to

$$
\begin{equation*}
v_{o}=v_{o 1}-v_{o 2}=-R_{D}\left(i_{D 1}-i_{D 2}\right) \tag{2.39}
\end{equation*}
$$

which is formally equal to the transfer characteristics (2.3b) of the bipolar Current-Mode inverter. By evaluating the transistor currents through relationship (2.30) and substituting them into (2.39), the output transfer characteristics result in

$$
v_{o}\left(v_{i}\right)= \begin{cases}R_{D} I_{S S} & \text { if } v_{i}<-\sqrt{\frac{2 I_{S S}}{\mu_{n} C_{O X} \frac{W_{n}}{L_{n}}}}  \tag{2.40}\\ -v_{i} R_{D} I_{S S} \sqrt{\frac{\mu_{e f f, n} C_{O X}}{I_{S S}} \frac{W_{n}}{L_{n}}-\left(\frac{\mu_{e f f, n} C_{O X}}{2 I_{S S}} \frac{W_{n}}{L_{n}} v_{i}\right)^{2}} & \text { if }\left|v_{i}\right| \leq \sqrt{\frac{2 I_{S S}}{\mu_{n} C_{O X} \frac{W_{n}}{L_{n}}}} \\ -R_{D} I_{S S} & \text { if } v_{i}>\sqrt{\frac{2 I_{S S}}{\mu_{n} C_{O X} \frac{W_{n}}{L_{n}}}}\end{cases}
$$

whose typical behavior is plotted versus input voltage $v_{i}$ in Fig. 2.8. It is worth noting that (2.40) is symmetrical with respect to zero, thus the logic threshold is equal to

$$
\begin{equation*}
V_{L T}=0 \tag{2.41}
\end{equation*}
$$

and low output voltage $V_{O L}$ and high output voltage $V_{O H}$ are

$$
\begin{align*}
& V_{O L}=-R_{D} I_{S S}  \tag{2.42a}\\
& V_{O H}=R_{D} I_{S S} \tag{2.42b}
\end{align*}
$$

thus logic swing is equal to

$$
\begin{equation*}
V_{S W I N G}=V_{O H}-V_{O L}=2 R_{D} I_{S S} \tag{2.43}
\end{equation*}
$$
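
The static characteristic (2.40) and the resulting levels (2.42)-(2.43) can be checked with a short Python sketch. A single transconductance factor `k_n` $=\mu C_{O X} W_{n} / L_{n}$ is used in all three branches, which is a simplifying assumption, and the numerical values in the example are hypothetical.

```python
import math


def scl_transfer(v_i, r_d, i_ss, k_n):
    """Differential output voltage v_o(v_i) of the SCL inverter, eq. (2.40).

    r_d in ohms, i_ss in amperes, k_n = mu * C_OX * W_n / L_n in A/V^2.
    """
    v_max = math.sqrt(2.0 * i_ss / k_n)  # input that fully steers I_SS
    if v_i < -v_max:
        return r_d * i_ss
    if v_i > v_max:
        return -r_d * i_ss
    root = math.sqrt(k_n / i_ss - (k_n * v_i / (2.0 * i_ss)) ** 2)
    return -v_i * r_d * i_ss * root


if __name__ == "__main__":
    r_d, i_ss, k_n = 10e3, 50e-6, 7e-4  # assumed example values
    for v_i in (-1.0, -0.2, 0.0, 0.2, 1.0):
        print(f"v_i = {v_i:+.2f} V  ->  v_o = {scl_transfer(v_i, r_d, i_ss, k_n):+.4f} V")
```

The function is odd in $v_{i}$ and saturates at $\pm R_{D} I_{S S}$, in agreement with (2.41)-(2.43).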

As already done in Section 2.2.1 for the CML gate, the small-signal voltage gain around the logic threshold (2.41) results in

$$
\begin{equation*}
A_{V}=g_{m, n} R_{D}=\frac{V_{S W I N G}}{2} \sqrt{\frac{\mu_{0, n} C_{O X}}{\left(1+\frac{V_{T, n}}{E_{S A T, n} L_{n}}\right)} \frac{W_{n}}{L_{n}} \frac{1}{I_{S S}}} \tag{2.44}
\end{equation*}
$$

where (2.43) was substituted, and the NMOS transconductance $g_{m, n}$ was evaluated by using its long-channel expression (1.66), but properly accounting for short-channel effects through the effective mobility $\mu_{e f f, n}$. The latter was derived from (2.33) by assuming $V_{G S}$ of M1-M2 to be small enough to neglect mobility degradation, thus simplifying relationship (2.33) into $\mu_{e f f, n}=\mu_{0, n} /\left(1+\frac{V_{D S}}{E_{S A T, n} L_{n}}\right)$. Moreover, since $v_{i 1}=v_{i 2}=v_{o 1}=v_{o 2}=V_{D D}-V_{S W I N G} / 4$ and $I_{D 1,2}=I_{S S} / 2$ when the gate is biased around the logic threshold, voltage $V_{D S}$ of transistors M1-M2 is equal to their $V_{G S}$, which in (2.44) was underestimated by $V_{T, n}$ for the sake of simplicity (i.e., by setting $V_{D S} \cong V_{T, n}$ in $\mu_{e f f, n}$).


Fig. 2.8. Transfer characteristics of a differential SCL inverter.

As for the CML inverter, the half swing $R_{D} I_{S S}$ must be kept low enough to keep NMOS transistors M1-M2 out of the triode region. Unlike the bipolar gate, this condition must be satisfied to avoid a reduction of the NMOS driving capability and of the small-signal gain $A_{V}$, rather than to avoid excess charge in the base region. In particular, when the gate voltage of an NMOS transistor is high (i.e. equal to $V_{D D}$), the drain voltage is equal to $V_{D D}-R_{D} I_{S S}$, thus from (1.40)-(1.41) the triode region is avoided if the gate-drain voltage $V_{G D}$ is lower than the threshold voltage

$$
\begin{equation*}
V_{G D}=V_{D D}-\left[V_{D D}-R_{D} I_{S S}\right]=R_{D} I_{S S} \leq V_{T, n} \tag{2.45}
\end{equation*}
$$

which imposes an upper bound to $R_{D} I_{S S}$, and hence to the logic swing, from (2.43).
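
The gain expression (2.44) and the swing bound implied by (2.45) can be evaluated with the following sketch. The function names are hypothetical, and the product $E_{S A T, n} L_{n}$ is passed as a single argument whose value is an assumption, since it is not listed in Table 2.4.

```python
import math


def small_signal_gain(v_swing, mu0_cox, w_over_l, i_ss, v_t, e_sat_l):
    """Small-signal gain A_V around the logic threshold, eq. (2.44).

    mu0_cox: mu_0,n * C_OX in A/V^2; e_sat_l: E_SAT,n * L_n in volts.
    """
    mu_eff_cox = mu0_cox / (1.0 + v_t / e_sat_l)
    return 0.5 * v_swing * math.sqrt(mu_eff_cox * w_over_l / i_ss)


def max_logic_swing(v_t):
    """Upper bound on V_SWING = 2 R_D I_SS implied by (2.45) and (2.43)."""
    return 2.0 * v_t


if __name__ == "__main__":
    v_t = 0.54        # V_Tn0 from Table 2.4
    e_sat_l = 2.0     # assumed E_SAT,n * L_n in volts (hypothetical)
    a_v = small_signal_gain(v_swing=1.0, mu0_cox=175e-6, w_over_l=4.0,
                            i_ss=50e-6, v_t=v_t, e_sat_l=e_sat_l)
    print(f"A_V = {a_v:.2f}, swing bound = {max_logic_swing(v_t):.2f} V")
```

Since $A_{V}$ scales with $V_{S W I N G}$ for a given aspect ratio and bias current, the bound (2.45) also limits the gain that can be reached without increasing $W_{n} / L_{n}$ or reducing $I_{S S}$.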

### 2.4.3 Evaluation of the noise margin

Due to the symmetry of the output characteristics (2.40), the noise margin $N M$ is equal to $N M_{L}$ or equivalently to $N M_{H}$, which is evaluated by calculating the input value $V_{I H \text { min }}$ that makes the derivative of (2.40) equal to $-1$, and the associated value of the output voltage $V_{O H \text { min }}$. After simple calculations, $V_{I H \text { min }}$ results in [APP02]

$$
\begin{align*}
V_{I H \text { min }} & =\sqrt{\frac{2 I_{S S}}{\mu_{e f f, n} C_{O X} \frac{W_{n}}{L_{n}}}-\frac{I_{S S}}{2 \mu_{e f f, n} C_{O X} \frac{W_{n}}{L_{n}}} \frac{1}{A_{V}^{2}}\left(\sqrt{1+8 A_{V}^{2}}+1\right)}  \tag{2.46}\\
& \cong \sqrt{\frac{2 I_{S S}}{\mu_{e f f, n} C_{O X} \frac{W_{n}}{L_{n}}}\left(1-\frac{1}{\sqrt{2} A_{V}}\right)}
\end{align*}
$$

where (2.44) was used and $A_{V} \gg 1 / \sqrt{8}$ was assumed. Therefore, by approximating $V_{O H \text { min }}$ with $V_{O H}$ in (2.42b), the noise margin results in

$$
\begin{align*}
N M & =V_{O H \text { min }}-V_{I H \text { min }}=R_{D} I_{S S}\left(1-\frac{\sqrt{2}}{A_{V}} \sqrt{1-\frac{1}{\sqrt{2} A_{V}}}\right) \\
& \cong \frac{V_{S W I N G}}{2}\left(1-\frac{\sqrt{2}}{A_{V}}\right) \tag{2.47}
\end{align*}
$$

where $A_{V} \gg 1 / \sqrt{2}$ was assumed. By inspection of (2.47), the noise margin of an SCL gate is proportional to half the logic swing, and roughly equal to it if $A_{V}$ is in the order of 4 to 5.
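
The expressions (2.46) and (2.47) can be cross-checked numerically. The sketch below computes $V_{I H \min }$ from the first line of (2.46), verifies that the slope of the middle branch of (2.40) is $-1$ at that point, and compares the two forms of (2.47). Function names and example values are assumptions; the gain is taken as $A_{V}=R_{D} \sqrt{k_{n} I_{S S}}$ with a single factor `k_n` $=\mu_{e f f, n} C_{O X} W_{n} / L_{n}$, and $A_{V}>1$ is required for the square root in (2.46) to be real.

```python
import math


def gain(r_d, i_ss, k_n):
    """A_V = g_m,n * R_D with g_m,n = sqrt(k_n * I_SS), as in (2.44)."""
    return r_d * math.sqrt(k_n * i_ss)


def v_ih_min(r_d, i_ss, k_n):
    """Input voltage with unity-magnitude slope, first line of (2.46)."""
    a_v = gain(r_d, i_ss, k_n)
    correction = (math.sqrt(1.0 + 8.0 * a_v ** 2) + 1.0) / a_v ** 2
    return math.sqrt(2.0 * i_ss / k_n - i_ss / (2.0 * k_n) * correction)


def transfer_slope(v_i, r_d, i_ss, k_n):
    """Analytic derivative dv_o/dv_i of the middle branch of (2.40)."""
    x2 = v_i ** 2 * k_n / (2.0 * i_ss)  # squared input, normalized to full steering
    return -gain(r_d, i_ss, k_n) * math.sqrt(2.0) * (1.0 - x2) / math.sqrt(2.0 - x2)


def noise_margin(r_d, i_ss, k_n):
    """NM = V_OH - V_IHmin, i.e. (2.47) before the final approximation."""
    return r_d * i_ss - v_ih_min(r_d, i_ss, k_n)


def noise_margin_approx(r_d, i_ss, k_n):
    """Second line of (2.47): (V_SWING / 2) * (1 - sqrt(2) / A_V)."""
    return r_d * i_ss * (1.0 - math.sqrt(2.0) / gain(r_d, i_ss, k_n))


if __name__ == "__main__":
    r_d, i_ss, k_n = 20e3, 25e-6, 2.8e-3  # assumed values
    print("A_V       =", gain(r_d, i_ss, k_n))
    print("slope     =", transfer_slope(v_ih_min(r_d, i_ss, k_n), r_d, i_ss, k_n))
    print("NM        =", noise_margin(r_d, i_ss, k_n))
    print("NM approx =", noise_margin_approx(r_d, i_ss, k_n))
```

Because the square-root factor dropped in the last step of (2.47) is smaller than one, the approximate expression never exceeds the value obtained from the first line.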

### 2.4.4 Validation of the static model

The approximate model of static parameters $V_{S W I N G}, A_{V}$ and $N M$ discussed in the previous subsections was compared to simulation results obtained for an SCL inverter by using a $0.35-\mu \mathrm{m}$ CMOS process, whose main parameters are summarized in Table 2.4 [APP02]. To this end, several DC simulations were performed with $V_{D D}=3.3 \mathrm{~V}$ and $I_{S S}$ ranging from $5 \mu \mathrm{~A}$ to $100 \mu \mathrm{~A}$, by choosing the transistor aspect ratios to get $A_{V}$ ranging from 2 to 7, and $R_{D} I_{S S}$ from 200 mV to 700 mV (i.e., a maximum value slightly lower than $V_{T, n}$).

The simulated and predicted results are plotted in Figs. 2.9a, 2.9b and 2.9c, which report the scatter plots of the logic swing, the magnitude of the voltage gain and the resulting noise margin, respectively.


Fig. 2.9a. Scattering plot of predicted vs. simulated logic swing.


Fig. 2.9b. Scattering plot of predicted vs. simulated voltage gain magnitude.


Fig. 2.9c. Scattering plot of predicted vs. simulated noise margin.

From Figs. 2.9a-2.9c, it is evident that the predicted values are close to the simulated ones. More specifically, the maximum error between the model results and the simulated ones for $V_{S W I N G}, A_{V}$ and $N M$ is equal to $24.7 \%, 25.9 \%$ and $24.7 \%$, respectively, and in typical cases the error is significantly lower, as can be deduced from the average and standard deviation values of the error reported in Table 2.5. It is worth noting that the model always underestimates $V_{\text {SWING }}$ and overestimates $A_{V}$.

TABLE 2.5

| Model error | Maximum (\%) | Average (\%) | Standard deviation (\%) |
| :---: | :---: | :---: | :---: |
| $V_{\text {SWING }}$ | 24.7 | 14.7 | 5 |
| $A_{V}$ | 25.9 | 12.9 | 4 |
| $N M$ | 24.7 | 7.6 | 5.7 |

### 2.4.5 The buffered MOS Current-Mode inverter and remarks

As already discussed for bipolar ECL gates, to improve the SCL gate driving capability or to shift the common-mode value of the output node voltages, an output buffer can be added to each output node, as shown in Fig. 2.10, where $v_{o}=v_{o 1}-v_{o 2}$ is the differential output voltage of the gate, and $v_{i, b u f 1}$ and $v_{i, b u f 2}$ are the input voltages of the two buffers, respectively.


Fig. 2.10. SCL inverter gate with output buffers.

The output buffer is a source-follower stage biased by the current source $I_{S F}$, which by inverting (1.42) down-shifts output voltages of the internal SCL gate by a gate-source voltage $V_{G S}$ of transistors M5-M6 set by current $I_{S F}$, according to

$$
\begin{equation*}
V_{G S, 56}=V_{T, n}+\sqrt{\frac{I_{S F}}{\frac{\mu_{e f f, b u f} C_{O X}}{2} \frac{W_{b u f}}{L_{b u f}}}} \tag{2.48}
\end{equation*}
$$

where $W_{b u f} / L_{b u f}$ is the aspect ratio of buffer transistors M5-M6. The effective mobility $\mu_{e f f, n}$ in (2.33) can be simplified into

$$
\begin{align*}
\mu_{e f f, b u f}= & \frac{\mu_{0}}{1+\left[U_{A}+U_{C}\left(V_{D D}-V_{T, n}\right)\right]\left(\frac{2 V_{T, n}}{T_{O X}}\right)+U_{B}\left(\frac{2 V_{T, n}}{T_{O X}}\right)^{2}} \\
& \cdot \frac{1}{1+\frac{\left|V_{T, n}+\frac{1}{2} V_{S W I N G}\right|}{E_{S A T} L}} \tag{2.49}
\end{align*}
$$

where $V_{D S}$ and $V_{S B}$ are approximated by their maximum values (i.e., $V_{G S, 56}+V_{S W I N G} / 2$ and $V_{D D}-V_{G S, 56}$, respectively), and $V_{G S, 56}$ is underestimated by $V_{T, n}$.

The static parameters $V_{S W I N G}, A_{V}$ and $N M$ of the SCL inverter in Fig. 2.10 can be derived by properly modifying the results obtained in the previous section. Indeed, the small-signal gain of the common-drain stage (i.e., the ratio $v_{o} /\left(v_{i, b u f 1}-v_{i, b u f 2}\right)$) is equal to [LS94]

$$
\begin{equation*}
\frac{v_{o}}{v_{i, b u f 1}-v_{i, b u f 2}}=\frac{1}{1+\frac{g_{m b, b u f}}{g_{m, b u f}}} \tag{2.50}
\end{equation*}
$$

where $g_{m b, \text { buf }}$ and $g_{m, \text { buf }}$ are respectively the body effect transconductance and the transistor transconductance, and their ratio is almost constant and close to unity ${ }^{1}$. Thus, the voltage gain $A_{V}$ and the logic swing $V_{S W I N G}$ are obtained by multiplying those of an SCL gate in (2.43)-(2.44) by (2.50). Obviously, these parameters must be substituted in (2.47) to achieve the noise margin of the circuit in Fig. 2.10.

Predicted values of $V_{\text {SWING }}, A_{V}$ and $N M$ were compared to simulation results, in the same conditions as in Section 2.4.4. Simulations reveal that the logic swing, voltage gain and noise margin agree well with the model and the error is due only to the internal SCL gates, due to the good accuracy of relationship (2.50).

For the sake of completeness, the transistor transconductance $g_{m, b u f}$ in (2.50) can be evaluated by using its long-channel expression (1.66) and substituting into it the effective mobility $\mu_{e f f, b u f}$ in (2.49).

Until now, SCL gates have been assumed to include a PMOS active load. Nevertheless, other kinds of load could be considered, such as a physical resistor $R_{D}$ or a diode-connected NMOS/PMOS transistor. The first solution is not feasible since resistors need a wide silicon area to be integrated, and in addition they are affected by a parasitic capacitance greater than that of the PMOS transistor for the practical values required (as an example, for the $0.35-\mu \mathrm{m}$ CMOS process considered, this occurs for $R_{D}$ greater than $1 \mathrm{k} \Omega$, which is lower than typical values used). The diode-connected transistor load has other drawbacks, among which are the loss of a threshold voltage in the output levels and the floating output node for a high output level [R96]. Furthermore, the MOS diode load is slower than the PMOS active load for practical bias currents (for the process considered, this occurs for $I_{S S}$ greater than $1 \mu \mathrm{~A}$, which is lower than typical values). For these reasons, only the PMOS active load will be considered in the following.

### 2.5 FUNDAMENTAL CURRENT-MODE LOGIC GATES

Differential Current-Mode gates are usually implemented according to the series-gating approach [T89] [W90], i.e. by properly stacking source-coupled transistor pairs, as will be clarified in the following subsections. Since MOS and bipolar Current-Mode gates have the same principle of operation, i.e. the switching of the source/emitter-coupled transistor pair, the remainder of the section focuses mainly on bipolar gates. Results and topologies can be immediately extended to MOS Current-Mode gates by substituting n-channel devices for npn transistors in bipolar gates.

### 2.5.1 Principle of operation of Current-Mode gates: the series gating concept

As depicted in Fig. 2.11, Current-Mode bipolar (MOS) gates are made up of a bias current source whose current is steered to one of the two output resistors by a network of bipolar npn (n-channel) transistors, according to the value of its input signals. To allow correct operation, in which $I_{S S}$ flows entirely through only one of the two output resistors and the two output nodes are thus at opposite levels, there must be exactly one conductive path from $I_{S S}$ to the output nodes for every possible input value. It is worth noting that current $I_{S S}$ does not entirely reach one of the two output nodes, since each emitter-coupled pair outputs a (collector) current that is reduced by the common-base current gain $\alpha_{F}$ with respect to the input (emitter) current. Since $\alpha_{F} \approx 1$ in practical cases, a switching emitter-coupled pair is biased by a current approximately equal to $I_{S S}$, thus the noise margin in a complex gate for all possible inputs is the same as that of a simple inverter in Section 2.2. This allows for extending results in Sections 2.2-2.3 to arbitrary CML/ECL gates. The same observation holds for MOS Current-Mode logic, since each source-coupled pair generates an output (drain) current equal to the input (source) current.


Fig. 2.11. General topology of a CML gate.

To understand how bipolar series gates are implemented, let us analyze the basic AND ($\cdot$) and OR ($+$) operations on two input signals, $A$ and $B$. In particular, observe that the function $A \cdot B$ is simply achieved by stacking two emitter-coupled pairs as in Fig. 2.12a, since current $i_{A \cdot B}$ is high ${ }^{2}$ only when both transistors Q1-Q3 are ON, which occurs if $A=B=1$. The current obtained $i_{A \cdot B}$ is then converted into an output voltage through a resistor $R_{C}$. It is worth noting that the current-to-voltage conversion is an inverting operation, i.e. the output voltage is low when the output current is high. Since a differential output voltage is required, the other unused branches (i.e. the collectors of Q2-Q4 in Fig. 2.12a) must be connected to generate the current complementary to $i_{A \cdot B}$, which in Fig. 2.12a is referred to as $\overline{i_{A B}}$ and is approximately equal to $I_{S S}-i_{A \cdot B}$. This branch is then connected to the other output resistor $R_{C}$ to perform the current-to-voltage conversion. The complement of the function (e.g. to compensate for the inversion introduced in the current-to-voltage conversion) is trivially obtained by exchanging the two output nodes.

The OR operation can still be performed by the stacked topology in Fig. 2.12a by rearranging signals through De Morgan's laws. In particular, since $A+B=\overline{\bar{A} \cdot \bar{B}}$, the logic OR of $A$ and $B$ is simply obtained by complementing (i.e. exchanging the differential signals of) the output, as well as inputs $A$ and $B$ (see Fig. 2.12b). In cases where signals $A, B$ are applied to emitter-coupled pairs Q1-Q2 and Q3-Q4 which are not stacked, their OR can be simply performed by connecting the collectors of Q1 and Q3 (and those of Q2 and Q4 to obtain the complement). The resulting topology is reported in Fig. 2.12c.
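
The series gating principle can be illustrated with a purely behavioral Python sketch, in which each emitter-coupled pair is an ideal current switch ($\alpha_{F}=1$ is assumed). The function names, the normalization $I_{S S}=1$ and the choice of which input drives the lower pair are assumptions made only for illustration.

```python
from itertools import product


def steer(i_in, select):
    """Ideal emitter-coupled pair (alpha_F = 1 assumed).

    Returns (i_on, i_off): all of i_in goes to the first branch when
    `select` is true, otherwise to the second one.
    """
    return (i_in, 0.0) if select else (0.0, i_in)


def and2_currents(a, b, i_ss=1.0):
    """Two stacked pairs; returns (i_AB, complementary current)."""
    i_b, i_not_b = steer(i_ss, b)   # lower pair (input assignment assumed)
    i_ab, i_rest = steer(i_b, a)    # upper pair, fed by the B branch
    return i_ab, i_rest + i_not_b   # unused branches are tied together


def or2_currents(a, b, i_ss=1.0):
    """OR via De Morgan: complement both inputs and exchange the outputs."""
    i_nor, i_or = and2_currents(not a, not b, i_ss)
    return i_or, i_nor


if __name__ == "__main__":
    for a, b in product((False, True), repeat=2):
        print(a, b, and2_currents(a, b), or2_currents(a, b))
```

For every input combination the two returned currents sum to $I_{S S}$, which reflects the requirement of a single conductive path from the current source to the output nodes.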


Fig. 2.12a. Logic AND between two input signals, $A$ and $B$.


Fig. 2.12b. Logic OR between two input signals, $A$ and $B$ associated with stacked transistor pairs.


Fig. 2.12c. Logic OR between two input signals, $A$ and $B$, associated with non-stacked transistor pairs.

In the following subsection, some of the most used Current-Mode series gates are introduced. After analyzing the operation of such gates, issues related to the minimum supply voltage required for a correct functioning are addressed.

### 2.5.2 Some examples of Current-Mode series gates

By applying the concepts developed in the previous subsection, the bipolar AND2 gate has the topology reported in Fig. 2.13. From this circuit, the NAND2 gate is obtained by inverting the output nodes, the NOR2 gate is achieved by inverting the input signals, while the OR2 gate is achieved by inverting both the input and output signals, from De Morgan's laws.


Fig. 2.13. AND2 gate topology.

Operation of the AND gate in Fig. 2.13 can be easily verified by observing that output is high when $I_{S S}$ is steered to the left-hand output node by transistors Q1-Q3, which are active only when $A=B=1$.

From an application point of view, a more important Current-Mode circuit is the XOR gate, as will be discussed in Section 2.6.1. Its topology is easily found by expanding the XOR function (represented by the operator $\oplus$)

$$
\begin{equation*}
A \oplus B=\bar{A} \cdot B+A \cdot \bar{B} \tag{2.51}
\end{equation*}
$$

where each term can be implemented by means of the topology in Fig. 2.12b. The two currents obtained can be OR-ed by summing them according to Fig. 2.12c, i.e. by connecting both branches to the same output node, while connecting the unused branches to the complementary output, as shown in Fig. 2.14. This topology is equivalent to the well-known Gilbert Quad [PM91].

Analogously, the 2:1 multiplexer (MUX) gate with control signal $\phi$ has the topology in Fig. 2.15, and its operation is easily understood by observing that the output is set by the emitter-coupled pair Q3-Q4 or Q5-Q6, depending on whether current $I_{S S}$ is steered by transistor Q1 or Q2, i.e. when control signal $\phi$ is high or low.
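
The XOR and MUX gates described above can be checked with the same kind of behavioral current-steering sketch. Ideal switches ($\alpha_{F}=1$) and $I_{S S}=1$ are assumed, and the assignment of the data inputs `d1` and `d0` to the upper pairs is a hypothetical choice, since it depends on the details of Fig. 2.15.

```python
from itertools import product


def steer(i_in, select):
    """Ideal current switch: (i_in, 0) if select is true, else (0, i_in)."""
    return (i_in, 0.0) if select else (0.0, i_in)


def xor2_currents(a, b, i_ss=1.0):
    """Two-level gate implementing (2.51); returns (i_xor, i_xnor)."""
    i_a, i_not_a = steer(i_ss, a)            # lower pair driven by A
    i_a_b, i_a_nb = steer(i_a, b)            # upper pair above the A branch
    i_na_b, i_na_nb = steer(i_not_a, b)      # upper pair above the not-A branch
    return i_a_nb + i_na_b, i_a_b + i_na_nb  # cross-connected collectors


def mux2_currents(phi, d1, d0, i_ss=1.0):
    """2:1 MUX: phi selects which upper pair receives I_SS."""
    i_phi, i_not_phi = steer(i_ss, phi)      # lower pair Q1-Q2
    i_1, i_1_bar = steer(i_phi, d1)          # pair active when phi is high
    i_0, i_0_bar = steer(i_not_phi, d0)      # pair active when phi is low
    return i_1 + i_0, i_1_bar + i_0_bar


if __name__ == "__main__":
    for a, b in product((False, True), repeat=2):
        print("XOR", a, b, xor2_currents(a, b))
    for phi, d1, d0 in product((False, True), repeat=3):
        print("MUX", phi, d1, d0, mux2_currents(phi, d1, d0))
```

In both gates exactly one upper pair carries the bias current at any time, so the output levels are the same as those of the simple inverter.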


Fig. 2.14. CML XOR gate topology.


Fig. 2.15. CML MUX gate topology.

The 1:2 demultiplexer (DEMUX) gate with control signal $\phi$ can be built as in Fig. 2.16 by following the same reasoning. To be more specific, the emitter-coupled pair Q3-Q4 or Q5-Q6 is activated by current $I_{S S}$ steered by transistors Q1-Q2, according to the control signal value. The inactive transistor pair leaves both of its output nodes at a high voltage.


Fig. 2.16. CML DEMUX gate topology.

As an example of sequential blocks, let us consider the D latch gate implementing the logic function

$$
\begin{equation*}
O U T=C K \cdot D+\overline{C K} \cdot O U T_{\text {previous }} \tag{2.52}
\end{equation*}
$$

where $C K$ is usually the clock signal, which enables input $D$ when it is high and keeps the output at its previous value when it is low. Thus, the bias current must be steered to an emitter-coupled pair driven by input $D$ when $C K=1$, and to a bistable sequential circuit when $C K=0$. As is well known [R96], the latter block can be implemented through positive feedback by cascading two inverter gates, as depicted in Fig. 2.17. In practical cases, the two cascaded inverter gates are actually implemented by resorting to the feedback emitter-coupled pair in Fig. 2.17, since each of the two transistors along with a load resistor is an inverting stage. The resulting topology of the Current-Mode D latch is reported in Fig. 2.18.
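
The logic function (2.52) can be exercised with a minimal behavioral model. The sketch below is not a circuit simulation: it only evaluates (2.52) on a sequence of $(C K, D)$ samples, and the helper names are hypothetical.

```python
def d_latch(ck, d, out_previous):
    """Logic function of the D latch, eq. (2.52)."""
    return bool((ck and d) or ((not ck) and out_previous))


def simulate(samples, initial=False):
    """Apply a sequence of (CK, D) samples and return the output trace."""
    out = initial
    trace = []
    for ck, d in samples:
        out = d_latch(ck, d, out)  # transparent when CK=1, holds when CK=0
        trace.append(out)
    return trace


if __name__ == "__main__":
    stimulus = [(True, True), (False, False), (True, False), (False, True)]
    print(simulate(stimulus))
```

When $C K$ is low the value of $D$ is ignored, which corresponds to the bias current being steered to the cross-coupled pair that regenerates the stored output.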


Fig. 2.17. Positive feedback inverter gates implementing the memory circuit.


Fig. 2.18. CML D latch gate.

Until now, CML topologies of fundamental gates have been considered, and their ECL counterparts can easily be derived by adding the two output buffers as in Fig. 2.19, as already done for the simple inverter.


Fig. 2.19. ECL series gates.

### 2.5.3 Supply voltage limitations in bipolar Current-Mode gates

As discussed in the previous subsections, series gates are implemented by multiple levels of stacked emitter-coupled pairs. The transistor pairs connected to the current source $I_{S S}$ belong to the lower level, while pairs connected to load resistors $R_{C}$ are at the upper level. Let $n$ be the number of levels, i.e. the maximum number of stacked pairs from the current source to the load resistors.

In CML gates, the inputs of transistor pairs at the upper level are driven by the output of the previous gate. However, emitter-coupled pairs at lower levels cannot be directly connected to the output node of the driving gate, in order to keep all transistors out of the saturation region. This is a fundamental requirement in CML gates, since saturated transistors have a very low transconductance and a substantial amount of stored base charge, which dramatically slows down the switching of the circuit [R96]. This is not acceptable in bipolar gates, which are to be used in high-speed applications, as will be discussed in Section 2.6.

To avoid the saturation region, the input voltages of lower transistor pairs are progressively reduced through a level shifter circuit with a voltage level shift $V_{\text {SHIFT }}$ [T89]. To understand this point, let us consider two contiguous levels $i$ and $(i+1)$ in the generic $n$-level CML series gate in Fig. 2.20, whose input voltage $v_{i}$ is equal to the output voltage $v_{o}$ of the previous gate reduced by $(i-1)$ times the voltage shift (i.e., $v_{i}=v_{o}-(i-1) \cdot V_{\text {SHIFT }}$ ). Let us assume transistors $\mathrm{Q}_{i}$ and $\mathrm{Q}_{i+1}$ to be ON (due to the high input voltages $v_{i}=V_{D D}-(i-1) \cdot V_{\text {SHIFT }}$ and $v_{i+1}=V_{D D}-i \cdot V_{\text {SHIFT }}$, respectively). The resulting base-collector voltage of the lower transistor $\mathrm{Q}_{i+1}$ is ${ }^{3}$

$$
\begin{equation*}
V_{B C, i+1}=v_{i+1}-\left(v_{i}-V_{B E}\right)=-V_{\text {SHIFT }}+V_{B E} \tag{2.53}
\end{equation*}
$$

Thus to maintain it lower than $V_{C B, o n}=0.5 \mathrm{~V}$ (which avoids the operation in the saturation region), the voltage level shift must be kept greater than

$$
\begin{equation*}
V_{\text {SHIFT }} \geq V_{B E}-V_{C B, o n}=V_{C E, s a t} \approx 0.3 \mathrm{~V} \tag{2.54}
\end{equation*}
$$

where $V_{B E}$ was assumed to be about 0.8 V .
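As a quick numerical check of (2.53)-(2.54), the sketch below evaluates the base-collector voltage of the lower transistor for a given level shift and compares it with $V_{C B, o n}$. The numerical values are those quoted in the text; the function names are assumptions introduced only for illustration.

```python
V_BE = 0.8     # base-emitter voltage of an ON transistor, as assumed in the text (V)
V_CB_ON = 0.5  # base-collector voltage limit used in the text (V)


def base_collector_voltage(v_shift, v_be=V_BE):
    """Eq. (2.53): base-collector voltage of the lower transistor Q_{i+1}."""
    return -v_shift + v_be


def min_level_shift(v_be=V_BE, v_cb_on=V_CB_ON):
    """Eq. (2.54): smallest level shift keeping Q_{i+1} out of saturation."""
    return v_be - v_cb_on


def avoids_saturation(v_shift, v_be=V_BE, v_cb_on=V_CB_ON):
    """True when the base-collector voltage does not exceed V_CB,on."""
    return base_collector_voltage(v_shift, v_be) <= v_cb_on


print(round(min_level_shift(), 3))   # about 0.3 V
print(avoids_saturation(V_BE))       # shift of one V_BE drop
print(avoids_saturation(0.1))        # shift too small
```

A shift of one $V_{B E}$ drop leaves a base-collector voltage of zero, which is well within the limit set by (2.54).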
In practical cases, the basic level shift is implemented by means of the topology in Fig. 2.21 [R96], [GMC91], where all transistors are biased by the current source $I_{C C}$. In this figure, which refers to the case with 4 levels, the upper level is directly taken from the output of another CML gate, and the successive levels are downshifted by $V_{B E}, 2 V_{B E}$ and $3 V_{B E}$ (i.e., $V_{S H I F T}$ is set to $V_{B E}$ ). The number of npn transistors required in an $n$-level shifter is equal to ( $n-1$ ). The main limitation of the circuit in Fig. 2.21 is its slow switching for a high number of levels and for large fan-out values [GMC91]. To overcome this problem, the alternative solution in Fig. 2.22 based on cascaded emitter follower stages is used. The speed increase is achieved at the cost of higher power consumption, since a current source $I_{C C}$ is required for each $V_{B E}$ drop.


Fig. 2.20. Evaluation of minimum level shift between two adjacent levels.

[^3]

Fig. 2.21. A circuit solution to implement a 4-level shifter.


Fig. 2.22. A faster circuit solution to implement a 4-level shifter.

The level shifter circuits presented above can be applied to ECL circuits with only slight modifications. Indeed, the only difference with respect to CML circuits is that even the upper level is downshifted by a $V_{B E}$ drop. In other terms, an ECL gate is obtained from a CML one by adding one transistor to the level shifter, and taking the input of the former one level below with respect to the latter (e.g., in Figs. 2.21-2.22 the input at the upper level is taken as $v_{o, 2}$, that at the second level as $v_{o, 3}$, and so on).

In practical cases, the number of logic levels $n$ in series gates is limited by the available supply voltage. This limit can be understood by considering a CML $n$-level series gate with a level shifter driving its lowest level, whose transistor in the ON state is assumed to be Q2, as depicted in Fig. 2.23. To guarantee the correct behavior of the current mirror, the collector-emitter voltage across transistor Q1 has to be greater than $V_{C E, s a t}$. Equivalently, the supply voltage must be greater than the sum of the $(n-1)$ base-emitter drops of the transistors in the level shifter and that of transistor Q2, plus the $V_{C E, s a t}$ of transistor Q1 and the small voltage across resistor $R_{E 1}$, as discussed in Section 2.1. Thus, for a given supply voltage $V_{D D}$, the number of logic levels $n$ has the following upper bound

$$
\begin{equation*}
n \leq \frac{V_{D D}-V_{C E, s a t}-V_{R E, 1}}{V_{B E, o n}} \approx \frac{V_{D D}}{V_{B E, o n}} \tag{2.55}
\end{equation*}
$$



Fig. 2.23. CML circuit to evaluate the supply limitation.

An analogous result is found for ECL gates, which have one more level-shifter $V_{B E}$ drop, thus leading to the following upper bound

$$
\begin{equation*}
n \leq \frac{V_{D D}-V_{C E, s a t}}{V_{B E, o n}}-1 \approx \frac{V_{D D}}{V_{B E, o n}}-1 \tag{2.56}
\end{equation*}
$$

Inspection of relationships (2.55)-(2.56) reveals that the number of logic levels essentially depends on the ratio of the supply voltage to the $V_{B E}$ drop. While the former tends to be reduced for reasons related to power consumption [CB95a], the latter does not scale even in more advanced technologies [CB95b], [RP96], thus the number of logic levels tends to decrease. For example, the supply voltage in current applications can be as low as 3 V (or slightly lower), thus from (2.55)-(2.56) three logic levels are allowed for CML gates, and two for ECL gates. Therefore, to allow at least two levels of series gating, as required in the implementation of logic functions, it is apparent that the supply voltage will no longer scale in the near future. This explains why series gates are not a suitable circuit solution in applications that require a low supply voltage, in which case alternative logic styles must be used. Further details will be provided in Chapter 8, where low-voltage logic styles will be analyzed and modeled.
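The 3 V example can be reproduced numerically from (2.55)-(2.56). The sketch below is a minimal illustration: it takes $V_{B E, o n}=0.8 \mathrm{~V}$ and $V_{C E, s a t}=0.3 \mathrm{~V}$ from the previous discussion, and assumes the small drop across $R_{E 1}$ to be zero unless stated otherwise; the function names are hypothetical.

```python
import math


def max_levels_cml(v_dd, v_be_on=0.8, v_ce_sat=0.3, v_re=0.0):
    """Upper bound (2.55) on the number of series-gating levels of a CML gate.

    v_re is the small voltage across R_E1; setting it to zero is an assumption.
    """
    return math.floor((v_dd - v_ce_sat - v_re) / v_be_on)


def max_levels_ecl(v_dd, v_be_on=0.8, v_ce_sat=0.3):
    """Upper bound (2.56) for an ECL gate (one more V_BE drop than CML)."""
    return math.floor((v_dd - v_ce_sat) / v_be_on - 1)


for v_dd in (5.0, 3.0):
    print(v_dd, max_levels_cml(v_dd), max_levels_ecl(v_dd))
```

For $V_{D D}=3 \mathrm{~V}$ the two bounds evaluate to three levels for CML and two for ECL, in agreement with the figures quoted above.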

### 2.5.4 MOS Current-Mode series gates and supply voltage limitations

Since MOS Current-Mode gates have a principle of operation similar to that of bipolar ones, considerations introduced in the previous subsection can easily be generalized to MOS circuits. In particular, implementation of logic functions through MOS Current-Mode gates can be carried out by exploiting the series-gating approach presented in Section 2.5.1. Therefore, topologies of bipolar gates presented in Section 2.5.2 can be extended to MOS technology by simply substituting each npn transistor with an NMOS, and each load resistor by a PMOS active load, as already done for the SCL inverter in Section 2.4.

Like the bipolar counterparts, MOS Current-Mode series gates require level shifters to operate correctly, i.e. to keep NMOS transistors out of the triode region. By following the same procedure as in the previous subsection, the voltage shift between adjacent levels must be greater than

$$
\begin{equation*}
V_{\text {SHIFT }} \geq V_{G S}-V_{T, n}=V_{D S, \text { sat }} \tag{2.57}
\end{equation*}
$$

In practical cases, the two solutions in Figs. 2.21 and 2.22 with npn transistors replaced by NMOS devices are used. Accordingly, the maximum number of series-gating levels allowed for an assigned $V_{D D}$ is given by (2.55) and (2.56) for gates without and with output buffers, respectively, where the NMOS gate-source voltage $V_{G S}$ replaces the base-emitter voltage $V_{B E}$ of npn transistors. Since the gate-source voltage reduces in more advanced CMOS processes due to the scaling of the threshold voltage (albeit slowly, for reasons related to the subthreshold current [CB95b], [RP96]), MOS Current-Mode gates are more amenable than bipolar circuits to low-voltage applications.

### 2.6 TYPICAL APPLICATIONS OF CURRENT-MODE CIRCUITS

Current-Mode logic is currently used in a number of applications, due to its high-speed potential, as well as the reduced switching noise [KKI97], [M97], [K98]. Such properties make Current-Mode logic more suitable than standard CMOS logic in applications ranging from RF circuits to fiber-optic communications and high-resolution mixed-signal CMOS circuits, as discussed in the following.

### 2.6.1 Radio Frequency applications

A fundamental block in RF applications is the Phase-Locked Loop (PLL), which allows for frequency synthesis, clock generation, data recovery and synchronization [R961], [H96], [LR00], [DS03]. The block diagram of a PLL is shown in Fig. 2.24 (in some applications the input frequency divider is omitted).

The periodic input signal is usually generated by a crystal reference, and its frequency is divided by $M$ by means of the input frequency divider. The phase of the signal obtained is compared to that of the feedback signal through a mixer, which is generally implemented by a Gilbert Cell (i.e. the XOR gate in Fig. 2.14). The charge pump is essentially an amplifier driving an RC loop filter, which is introduced to stabilize the closed-loop circuit.

The filtered signal drives a Voltage Controlled Oscillator (VCO) generating a periodic output signal, whose frequency is divided through the feedback frequency divider by a factor of $N$. Among these blocks, the phase detector and the frequency dividers are frequently implemented through Current-Mode logic circuits. The VCO can also be implemented by a Current-Mode ring oscillator (whose design is discussed in Chapter 8), when the noise specification does not require the use of an LC oscillator. Since the speed of a PLL is mainly limited by the feedback frequency divider, as well as by the VCO, it is essential to properly design Current-Mode gates to maximize their operating frequency.
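The role of the two dividers can be made explicit with a small calculation. When the loop is locked, the two signals compared by the phase detector have the same frequency, i.e. the reference frequency divided by $M$ equals the output frequency divided by $N$. This relation is standard PLL behavior rather than a result stated in the text, and the function name and numerical values below are assumptions chosen only for illustration.

```python
from fractions import Fraction


def pll_output_frequency(f_ref_hz, m, n):
    """Output frequency of a locked PLL: f_ref / M = f_out / N."""
    return Fraction(f_ref_hz) * n / m


# Hypothetical example: 10 MHz crystal reference, M = 5, N = 900.
f_out = pll_output_frequency(10_000_000, 5, 900)
print(float(f_out) / 1e9, "GHz")
```

In this hypothetical case the feedback divider must operate on a 1.8 GHz input, which illustrates why it is a speed-critical block.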

Frequency dividers are usually implemented by cascading a high-speed prescaler circuit (typically a divide-by-8 circuit) and a low-speed divider, whose modulus may be variable. Techniques to achieve a high-speed prescaler and ring oscillator while consciously managing the trade-off with power consumption are dealt with in the following chapters.


Fig. 2.24. Schematic of a Phase-Locked Loop (PLL).

Frequency dividers are often implemented as a cascade of divide-by-two stages. In general, frequency dividers can be classified into regenerative and static types, depending on the principle of operation of the divide-by-two cell. The regenerative frequency divider (RFD), shown in Fig. 2.25, is made up of a mixer with a periodic input signal having a frequency $f_{\text {in }}$ and an output signal at frequency $f_{\text {out }}$ [IIS89], [FBA90], [KUO92], [R98]. The latter is fed back to the mixer, which generates two sidebands at frequencies $f_{\text {in }}-f_{\text {out }}$ and $f_{\text {in }}+f_{\text {out }}$, as well as the harmonics $2\left(f_{\text {in }} \pm f_{\text {out }}\right), 3\left(f_{\text {in }} \pm f_{\text {out }}\right)$, etc. The low-pass filter cuts off all of these components except the one at the lowest frequency $f_{\text {in }}-f_{\text {out }}$, which is amplified by the subsequent amplifier ${ }^{4}$. At the steady state, the input frequency $f_{\text {in }}-f_{\text {out }}$ of the amplifier and its output frequency $f_{\text {out }}$ are equal, thus leading to the divide-by-two behavior

$$
\begin{equation*}
f_{\text {out }}=\frac{f_{\text {in }}}{2} \tag{2.58}
\end{equation*}
$$

[^4]By following the same reasoning, it can easily be verified that the sideband at frequency $f_{\text {in }}+f_{\text {out }}$ is rejected by the feedback loop.


Fig. 2.25. Schematic of a regenerative frequency divider.

The maximum input frequency allowed by the regenerative divider depends on the speed of the mixer, or equivalently on the delay $\tau_{P D}$ of an XOR gate. More specifically, the XOR gate implementing the mixer in Fig. 2.25 must be able to switch every half period of the output signal, i.e. every period of the input signal. Since the input period must be greater than or equal to the XOR delay, the maximum input frequency allowed by the regenerative divider is equal to

$$
\begin{equation*}
f_{i n, \max }=\frac{1}{\tau_{P D}} \tag{2.59}
\end{equation*}
$$

Strategies to design the Current-Mode XOR gate for a high speed or an efficient trade-off with power consumption will be addressed in Chapters 5 and 7 for the bipolar and CMOS gates, respectively.

Regenerative dividers must satisfy a further frequency constraint in order to operate correctly according to (2.58), which is based on the assumption that harmonics at frequencies $2\left(f_{\text {in }} \pm f_{\text {out }}\right), 3\left(f_{\text {in }} \pm f_{\text {out }}\right)$, etc. are strongly attenuated by the mixer circuit. This is achieved when all harmonics generated by the mixer are rejected, or equivalently when the harmonic at the lowest frequency $2\left(f_{\text {in }}-f_{\text {out }}\right)$ leads to an output sideband at a frequency greater than the maximum output frequency $f_{\text {out, max }}=f_{\text {in,max }} / 2$. At the steady state, the harmonic $2\left(f_{\text {in }}-f_{\text {out }}\right)$ must be equal to the output frequency $f_{\text {out }}$ of the amplifier, thus its effect on the output is an undesired sideband at a frequency $3 f_{i n} / 2$. As a consequence, the loop rejects this sideband when the input frequency is kept high enough that $3 f_{i n} / 2>f_{i n, \text { max }} / 2$, which imposes a lower limit $f_{i n, \text { min }}$ equal to

$$
\begin{equation*}
f_{i n, \text { min }}=\frac{f_{i n, \text { max }}}{3} \tag{2.60}
\end{equation*}
$$
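Relationships (2.59) and (2.60) together define the input frequency range in which a regenerative divider operates correctly. The sketch below computes this range for a given XOR delay; the 50 ps delay and the function names are assumptions used only as an example.

```python
def rfd_input_range(tau_pd):
    """Input frequency range of a regenerative divider for an XOR delay tau_pd (s)."""
    f_max = 1.0 / tau_pd   # eq. (2.59)
    f_min = f_max / 3.0    # eq. (2.60)
    return f_min, f_max


def rfd_divides_correctly(f_in, tau_pd):
    """True when f_in lies above the lower limit (2.60) and within (2.59)."""
    f_min, f_max = rfd_input_range(tau_pd)
    return f_min < f_in <= f_max


# Hypothetical XOR gate delay of 50 ps.
f_min, f_max = rfd_input_range(50e-12)
print(f"{f_min / 1e9:.2f} GHz to {f_max / 1e9:.2f} GHz")
print(rfd_divides_correctly(10e9, 50e-12))  # inside the range
print(rfd_divides_correctly(5e9, 50e-12))   # below the lower limit
```

The ratio between the upper and lower limits is fixed at 3 regardless of the gate delay, so a faster XOR gate shifts the whole range upwards.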

Static frequency dividers are based on cascaded Master-Slave T flip-flops (T-FF), which switch every two edges of the periodic input signal applied to their clock input, thus performing a divide-by-two operation [FBA90], [K91], [KOS91], [IIT95]. In practical cases, T-FFs consist of two cross-coupled feedback D latches driven by opposite clock signals, as depicted in Fig. 2.26. After two clock edges, the input of each latch crosses both gates and turns to the opposite value, due to the inversion associated with the cross coupling.
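The divide-by-two behavior of two cross-coupled D latches can be verified with a logic-level simulation. The sketch below reuses the latch function of (2.52); the choice of which latch is transparent on which clock phase, the initial state and the function names are assumptions made for illustration and are not taken from Fig. 2.26.

```python
def d_latch(ck, d, out_previous):
    """D latch logic function, eq. (2.52)."""
    return (ck & d) | ((1 - ck) & out_previous)


def divide_by_two(ck_seq, master=0, slave=0):
    """Two latches on opposite clock phases with an inverting cross coupling.

    Assumed polarity: the master is transparent when CK = 0 and samples the
    complemented slave output; the slave is transparent when CK = 1.
    """
    trace = []
    for ck in ck_seq:
        new_master = d_latch(1 - ck, 1 - slave, master)
        new_slave = d_latch(ck, master, slave)
        master, slave = new_master, new_slave
        trace.append(slave)
    return trace


clock = [0, 1] * 4               # four clock periods
print(clock)
print(divide_by_two(clock))      # output completes two periods
```

The output changes once per clock period, so it takes two input periods to complete one output period.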


Fig. 2.26. Schematic of a T-FF divide-by-two stage.

In contrast to regenerative dividers, static circuits are able to work at arbitrarily low input frequencies. This greater flexibility is achieved at the cost of a lower speed, as the maximum input frequency allowed is lowered by a factor of 2. To understand this point, let us observe that the time needed by a latch to generate its output after the transition of the input signal is equal to the CK-Q latch delay $\tau_{P D}$ (i.e. the delay between the transition of the clock input and that of the output). Thus, the time available to each latch (i.e. half an input period) must be greater than or equal to the CK-Q delay, in order to correctly generate the latch output. As a consequence, the maximum input frequency is

$$
\begin{equation*}
f_{i n, \text { max }}=\frac{1}{2 \tau_{P D}} \tag{2.61}
\end{equation*}
$$

which is halved compared to (2.59), assuming the delays of the XOR gate and the D latch to be comparable (actually, as will be shown in Chapters 4-5, the latter is slower, thus the speed advantage of the RFD is greater than 2). Strategies to improve the speed in (2.61) and manage the trade-off with power consumption will be discussed in Chapter 8.
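The speed advantage of the regenerative divider over the static one follows directly from the ratio of (2.59) to (2.61). The following sketch evaluates this ratio; the gate delays used in the example are hypothetical and the function name is an assumption.

```python
def rfd_speed_advantage(tau_xor, tau_latch):
    """Ratio of the maximum input frequencies (2.59) and (2.61)."""
    f_max_regenerative = 1.0 / tau_xor       # eq. (2.59), XOR delay
    f_max_static = 1.0 / (2.0 * tau_latch)   # eq. (2.61), latch CK-Q delay
    return f_max_regenerative / f_max_static


# Hypothetical delays: equal delays first, then a latch 25% slower than the XOR.
print(rfd_speed_advantage(40e-12, 40e-12))
print(rfd_speed_advantage(40e-12, 50e-12))
```

With equal delays the ratio is 2, and it grows in proportion to the latch delay when the latch is slower than the XOR gate.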

In many RF applications, the frequency divider may also be required to have a programmable modulus $N$, as in the case of dual-modulus prescalers [SPW91], [CHL92], [MSO92], [SMS94], [VK95], [CW98], [HFP01], [KBW01]. In this case, the frequency divider is a Finite State Machine based on D flip-flops (D-FF) and combinational logic that updates the count and sets the modulus. Essentially, its speed depends on the speed of the D-FFs used, which in practical cases are of the Master-Slave type, shown in Fig. 2.27. The signal to be divided drives the clock terminals of the D latches, while the combinational logic properly sets their inputs.


Fig. 2.27. Schematic of a D-FF Master-Slave.

In sequential circuits, flip-flops affect the speed through two timing parameters: the CK-Q delay and the setup time $t_{\text {SETUP }}$ [R96]. By definition, the CK-Q delay of the D-FF in Fig. 2.27 is equal to the CK-Q delay $\tau_{P D}$ of the single latch, which will be modeled in Chapters 4 and 6 for a bipolar and a CMOS latch, respectively. Instead, the flip-flop setup time (i.e. the amount of time before a clock transition during which the inputs must be kept constant) is equal to the D-Q delay of the single latch, i.e. the delay between the transition of $I N$ and the subsequent output transition (more details are provided in Chapter 8). Indeed, the input of the Slave latch at the falling clock edge is the output of the Master latch, which is generated after the input transition by the latch D-Q delay. As a consequence, input $I N$ must settle to the correct value a D-Q delay before the clock transition, i.e. the flip-flop setup time is equal to the (Master) latch D-Q delay.

### 2.6.2 Optic-fiber communications

Another important application field of Current-Mode logic is the implementation of integrated circuits for signal multiplexing/demultiplexing in optic-fiber systems, whose typical structure is depicted in Fig. 2.28 [ARL95], [195], [SR01], [R02].


Fig. 2.28. Block diagram of a fiber-optic link.

In Fig. 2.28, the E/O block is a semiconductor laser diode converting electrical signals to optical ones, while the photodetector O/E performs the opposite conversion. To exploit the wide bandwidth of the optic-fiber channel, parallel input signals are serially transferred to the optic fiber through a multiplexer (MUX) at a clock rate $f$, and are amplified by an erbium-doped fiber amplifier (EDFA). Serial data crossing the optic fiber are then transferred in parallel through a demultiplexer (DEMUX). A clock recovery circuit is needed to resynchronize the clock signal of the receiver circuit to that of the transmitter. In practical cases, the MUX, as well as the clock recovery circuit with the DEMUX, is implemented in a single chip to achieve a high speed. For current bipolar and CMOS technologies, data rates as high as $40 \mathrm{~Gb} / \mathrm{s}$ [RM96], [RDR01] and $10 \mathrm{~Gb} / \mathrm{s}$ [TUF01], [NIE03] have been achieved, respectively.

The schematic of an 8:1 MUX is reported in Fig. 2.29a, where each 2:1 MUX is implemented by the cell shown in Fig. 2.29b. Since the circuit in Fig. 2.29a is driven at half the clock frequency, a retiming D-FF is added at the output node to reduce the jitter contribution due to a clock duty cycle different from $50 \%$ [SD93], even at the cost of a lower speed.


Fig. 2.29a. Block diagram of an 8:1 MUX.

The 8:1 MUX in Fig. 2.29a consists of 2:1 MUX cells connected in a tree-like fashion, according to three logic levels [L95]. The one providing the output signal works at full rate, while the other ones work at a halved speed with respect to the previous one. Such progressively halved operating frequencies are obtained from the (halved) clock signal through divide-by-two dividers, the first of which could be regenerative, while the following ones are static. The 2:1 MUX cell in Fig. 2.29b consists of a Master-Slave D-FF for input IN1 and a Master-Slave-Slave D-FF for input IN2. The latter contains a further latch that delays the arrival of IN2 at the multiplexer by a half period, as required by the alternate selection of IN1 and IN2.

The schematic of a 1:8 DEMUX used in Fig. 2.28 is analogous to that of the MUX in Fig. 2.29a, with 1:2 DEMUX cells being connected in a tree-like manner, as shown in Fig. 2.30a [L96], [LL96], [S96]. The schematic of the fundamental 1:2 DEMUX is depicted in Fig. 2.30b.


Fig. 2.29b. Schematic of a $2: 1 \mathrm{MUX}$ used in the $8: 1 \mathrm{MUX}$.


Fig. 2.30a. Block diagram of a 1:8 DEMUX.


Fig. 2.30b. Schematic of a 1:2 DEMUX used in the 1:8 DEMUX.

In the $1: 2$ DEMUX cell in Fig. 2.30b, the two output signals are taken from the output buffers, respectively. It is worth noting that a 1:2 DEMUX in Fig. 2.30a belonging to a logic level far from the output can have a much lower speed than the last one. The same consideration holds in MUX cells depicted in Fig. 2.29a.
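The tree organization of the 8:1 MUX and 1:8 DEMUX can be illustrated at the data level, ignoring latches, retiming and clocking. In the sketch below each 2:1 MUX alternately selects its two inputs (thus doubling the data rate), and each 1:2 DEMUX performs the opposite operation. The function names, the input ordering and the test data are assumptions introduced for illustration; the actual bit ordering depends on how the cells in Figs. 2.29a and 2.30a are wired.

```python
def mux2(in1, in2):
    """2:1 MUX: alternately select IN1 and IN2, doubling the data rate."""
    out = []
    for a, b in zip(in1, in2):
        out += [a, b]
    return out


def demux2(stream):
    """1:2 DEMUX: split a serial stream into two half-rate streams."""
    return stream[0::2], stream[1::2]


def mux_tree(channels):
    """Combine 2**k parallel channels through k levels of 2:1 MUX cells."""
    while len(channels) > 1:
        channels = [mux2(channels[i], channels[i + 1])
                    for i in range(0, len(channels), 2)]
    return channels[0]


def demux_tree(stream, n_outputs):
    """Split a serial stream into n_outputs (a power of two) parallel channels."""
    channels = [stream]
    while len(channels) < n_outputs:
        channels = [half for ch in channels for half in demux2(ch)]
    return channels


# Eight hypothetical parallel channels carrying two symbols each.
parallel = [[i, i + 10] for i in range(8)]
serial = mux_tree(parallel)
print(serial)
print(demux_tree(serial, 8) == parallel)
```

Each level of the tree handles streams at half the rate of the next one, which is why only the cell closest to the serial side has to work at full speed.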

### 2.6.3 High-resolution mixed-signal ICs

In general, logic circuits must satisfy assigned constraints in terms of speed, power consumption and silicon area. Moreover, additional requirements on the switching noise must be taken into account in the design process of CMOS mixed-signal ICs, which consist of both a digital and an analog section sharing the same substrate. Mixed-signal circuits are cost-effective in a number of applications, such as video signal processing, magnetic disk recording channel processors, oversampled A/D and D/A converters [KCA90], [KA92]. In such cases, the switching noise generated by the logic gates couples with the analog circuitry and degrades its resolution.

Although it exhibits appealing speed performance, noise margin and ease of design, as well as a low static power consumption and a small area, the CMOS static logic style generates a considerable amount of noise due to the supply current spikes needed during the switching of logic gates. Such current spikes determine voltage drops in parasitic resistors and inductors associated with the supply rails, bonding pads, bonding wires, package pins, as well as in the substrate resistance [ACKS93].

Until now, solutions at various levels of abstraction have been proposed to partially attenuate the effect of the switching noise on the operation of the analog circuits in mixed-signal ICs. From the technology point of view, using a silicon-on-insulator (SOI) or a highly-doped epitaxial wafer CMOS technology can reduce the amount of noise coupled with the analog circuitry, though increasing costs.

From a physical point of view, optimal floorplanning and a safe layout style can improve the immunity of analog circuits to switching noise [S02]. Regarding the floorplan, the distance of analog blocks from digital ones should be maximized. As far as the layout is concerned, some precautions should be taken, such as making the substrate almost equipotential through widespread substrate contacts, as well as shielding analog blocks by means of diffused guardbands, as depicted in Fig. 2.31.

From a circuit point of view, analog circuits should be designed by exploiting differential topologies, since they are intrinsically more immune to (common-mode) external noise. Topologies with a high Power Supply Rejection Ratio (PSRR) should also be used to attenuate the effect of the supply switching noise on analog signals.


Fig. 2.31. A mixed-signal IC and parasitic effects.

At the system level, the effect of switching noise can be mitigated by reducing the common impedance $Z_{\text {supply }}$ of the paths from the supply to the analog and digital sections. This is easily understood by observing that digital circuits basically affect the operation of the analog section through their supply current $I_{\text {digital }}$, which determines a voltage drop on the common impedance $Z_{\text {supply }}$ equal to $Z_{\text {supply }} \cdot I_{\text {digital }}$, which perturbs the analog supply voltage. In practical cases, the common impedance $Z_{\text {supply }}$ is minimized by using separate analog and digital power distribution networks, as well as separate bonding pads, bonding wires, package pins and printed circuit board runs. Moreover, the effect of the digital section on the analog circuitry is minimized by resorting to multiple pins and bonding wires to reduce their parasitic inductance, as well as to on-chip bypass capacitors, at the expense of silicon area and number of pads and pins.

When a very high resolution must be achieved (e.g. 16-18 bit), remedies that attenuate the transmission of the switching noise to analog circuits discussed above are not sufficient, and integration of both analog and digital blocks on the same chip requires the generated switching noise to be lowered. Thus, CMOS static logic is no longer an amenable solution, and alternative logic styles must be selected [SKD96], [NA97], [KH00]. To be more specific, switching noise is reduced by reducing digital supply current variations, and a logic style drawing a constant supply current would be highly desirable. Such property is achieved in Current-Mode logic, since each gate constantly requires a supply current equal to its bias current, as already discussed in the previous subsections. For this reason, CMOS Source Coupled Logic has been successfully used in high-resolution mixed-signal circuits [DKS90], [LWO91], [KDN91], [F97], [JMS97]. A quantitative evaluation of the switching noise produced by CMOS Current-Mode logic will be discussed in Section 3.5.
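The benefit of a constant supply current can be illustrated with a simplified numerical model, in which the common supply impedance is represented by a series resistance and inductance. All component values and current waveforms below are hypothetical, and the R-L model with a finite-difference derivative is an assumption of this sketch; the quantitative evaluation of switching noise is the subject of Section 3.5.

```python
def supply_drop(current, r_supply, l_supply, dt):
    """Voltage drop R*i + L*di/dt on a series R-L supply impedance (sampled)."""
    drops = []
    previous = current[0]
    for i in current:
        drops.append(r_supply * i + l_supply * (i - previous) / dt)
        previous = i
    return drops


def ripple(values):
    """Peak-to-peak variation of a sampled waveform."""
    return max(values) - min(values)


# Hypothetical waveforms with the same 2.5 mA average supply current.
spiky_current = [0.0, 0.0, 10e-3, 0.0, 0.0, 10e-3, 0.0, 0.0]   # static CMOS-like
constant_current = [2.5e-3] * 8                                 # Current-Mode-like

R_SUPPLY, L_SUPPLY, DT = 1.0, 1e-9, 1e-9   # ohm, henry, second (assumed values)

print(ripple(supply_drop(spiky_current, R_SUPPLY, L_SUPPLY, DT)))
print(ripple(supply_drop(constant_current, R_SUPPLY, L_SUPPLY, DT)))
```

In this idealized model the constant current produces only a static drop with no variation, whereas the current spikes produce a time-varying perturbation of the supply voltage.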

As discussed in the previous sections, Current-Mode logic gates can be implemented in a differential or in a single-mode fashion, of which the former allows for a better performance in terms of speed and power efficiency. Moreover, a differential implementation allows logic gates to reject (common-mode) supply disturbances, since they have the same effect on both output nodes. This improvement in immunity to noise allows the noise margin requirement to be relaxed, and can represent an advantage in cases where a logic swing reduction is beneficial, which are addressed in Chapter 7.

It is worth noting that differential bipolar Current-Mode gates have the same features as CMOS gates in terms of switching noise and immunity to supply noise. This is exploited in RF circuits operating at very high frequencies, at which impedance of parasitic capacitances significantly lowers, thus making transmission of spurious signals easier. Thanks to a better noise immunity, noise margin (and hence logic swing) can be reduced, thereby enhancing speed and power-delay trade-off performance, as will be discussed in Chapter 5.

Until now, Current-Mode logic gates have been presented and analyzed in terms of their principle of operation and static behavior, as well as from an application point of view, reporting some specific and widely used logic gates. In the next chapter, techniques to implement arbitrary logic functions with a minimum amount of hardware are reviewed.

## Chapter 3

## DESIGN METHODOLOGIES FOR COMPLEX CURRENT-MODE LOGIC GATES

In this chapter, general methodologies to map a given logic function into a series gate are discussed and applied to practical examples. After introducing some basic concepts, a graphic method [CJ89], its analytical formulation and a strategy based on VEMs [MKA92] are described by assuming a given input variables ordering. Finally, guidelines to properly choose the input ordering are discussed.

### 3.1 BASIC CONCEPTS ON THE DESIGN OF A SERIES GATE

Let us consider a given $n$-variable logic function $F\left(X_{1} \ldots X_{n}\right)$ of input signals $X_{1}, \ldots, X_{n}$. According to Fig. 2.11, this function is implemented in a series gate by properly choosing the topology of its npn/NMOS network, which consists of stacked emitter/source coupled transistor pairs. In the following, for the sake of brevity we will refer only to bipolar gates, and the extension to MOS circuits is immediately obtained by substituting an NMOS transistor for each npn bipolar transistor.

In general, the npn network in Fig. 2.11 must provide a unique conductive path from $I_{S S}$ to the output nodes for all possible input values, as discussed in Section 2.5.1. For this reason, emitter-coupled pairs are not allowed to share their emitter node with other transistor pairs (indeed, if two transistor pairs shared their emitter, the two conducting transistors, one for each pair, would draw the same current, thus determining two different current paths). From this observation, the current source $I_{S S}$ must be connected to the emitter of only one transistor pair Q1-Q2. Analogously, each of the two collector nodes of Q1-Q2 is connected to the emitter of another transistor pair which is stacked on the first one, i.e. it lies in an upper logic level. By iterating this reasoning, a tree of stacked emitter-coupled pairs is obtained, whose transistor pairs at the highest level have their collector node connected to one of the two output nodes. Note that the collector nodes of two different transistors may be connected to the same transistor pair.

To understand how the npn network topology is related to the function $F\left(X_{1} \ldots X_{n}\right)$ to be implemented, let us assume in the following that input $X_{1}$ is applied to transistors belonging to the highest level (i.e. the first one), input $X_{2}$ is applied to the immediately lower level (i.e. the second level) and so on, up to input $X_{n}$ associated with the lowest level ${ }^{1}$ (i.e. the last one), as depicted in Fig. 3.1.


Fig. 3.1. Correspondence among input signals and logic levels.

[^5]

By assuming $\alpha_{F} \approx 1$, the collector current of the generic transistor Qa in a series gate can be associated with a logic level $I_{Q a}$ equal to 0 (when it is equal to zero) or 1 (when it is equal to $I_{S S}$ ). The same consideration holds for the logic level $I_{O}$ (or $\overline{I_{O}}$ ) associated with the current flowing through the load resistance connected to the output node $v_{o}$ (or $\overline{v_{o}}$ ). It is worth noting that, since the output voltage $v_{o}$ is low when the output current logic level $I_{O}$ is high, the logic level $V_{O}$ associated with output voltage is the complement of that associated with the output current

$$
\begin{equation*}
V_{O}=\overline{I_{O}} \tag{3.1}
\end{equation*}
$$

From this relationship, a simple method can be derived to evaluate the function $F\left(X_{1} \ldots X_{n}\right)$ implemented by an assigned npn network topology, as will be discussed in the following subsection.

### 3.1.1 Evaluation of function $F\left(X_{1}, \ldots, X_{n}\right)$ implemented by a given topology

In general, in an $n$-level series gate, the collector current $I_{Q a}$ of transistor Qa at the $j$-th level is high only if a conductive path between this node and the current source exists. This conductive path, which will be referred to as an active path in the following, consists of at most $(n-j+1)$ stacked transistors driven by input signals $x_{j} \ldots x_{n}$ which are all at the high level (in the following, lower case variables $x_{i}$ will be used to refer to a literal, $X_{i}$, or its complement, $\overline{X_{i}}$ )

$$
\begin{equation*}
\left(x_{j} \cdot x_{j+1} \cdot \ldots \cdot x_{n}\right)_{\text {active_path }}=1 \tag{3.2}
\end{equation*}
$$

From (3.2), an active path is unambiguously associated with the product of literals $x_{j} \ldots x_{n}$ that is equal to 1 (i.e. the set of input signals driving the transistors of the active path), since there is always one active path at a time for each input value. For the sake of clarity, some examples are reported in Fig. 3.2. According to these considerations, $I_{Q a}$ is equal to 1 if the first, or the second, etc. among all possible paths from the collector node of transistor Qa to the current source is active. Therefore, $I_{Q a}$ is analytically expressed by the OR of the products of literals $x_{j} \ldots x_{n}$ driving the transistors of all possible paths from the collector of transistor Qa to the current source.

By reiterating the same reasoning for the current $\overline{I_{O}}$ flowing through the load resistance connected to the output node $\bar{v}_{o}$, from (3.1) it follows that function $F\left(X_{1} \ldots X_{n}\right)$ is equal to the OR of the products of literals $x_{1} \ldots x_{n}$ associated with (i.e., driving the transistors of) all possible paths from the
output node $\overline{v_{o}}$ to the current source. Obviously, the same result is obtained by analytically expressing the current $I_{O}$ flowing through the load resistance connected at the output node $v_{o}$, and then complementing the result, from eq. (3.1). In practical cases, of the two currents $\overline{I_{O}}$ and $I_{O}$, it is convenient to evaluate that having a lower number of paths to the current source, when identifying the function implemented by a given topology.


Fig. 3.2. Correspondence of product of literals and active paths.

As an example, let us evaluate the function implemented by the series gate in Fig. 2.13. To simplify calculations, it is convenient to evaluate the boolean expression of current $\overline{I_{O}}$ rather than $I_{O}$, since in the former case there is only one possible active path (i.e. Q3-Q1), while in the latter case there are two possible active paths (i.e. Q4-Q1 and Q2). The only possible active path Q3-Q1 connecting the output node $\overline{v_{o}}$ and the current source consists of transistors driven by signals $B$ and $A$, which is associated with the product $A B$, hence function $F(A, B)$ results in

$$
\begin{equation*}
F(A, B)=\overline{I_{O}}=A \cdot B \tag{3.3a}
\end{equation*}
$$

where relationship (3.1) was used. Of course, the same expression is obtained by expressing current $I_{O}$ and then complementing it. Indeed, the first path Q4-Q1 is associated with the product $A \cdot \bar{B}$, while the second path Q 2 is associated with $\bar{A}$, thus function $I_{O}$ results in the OR of these terms, yielding

$$
\begin{equation*}
F(A, B)=\overline{I_{O}}=\overline{A \cdot \bar{B}+\bar{A}}=(\overline{A \cdot \bar{B}}) \cdot A=(\bar{A}+B) \cdot A=A \cdot B \tag{3.3b}
\end{equation*}
$$

where the De Morgan laws were applied.
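The path-based evaluation used in (3.3a) and (3.3b) can be checked by brute force. The following is only an illustrative sketch: the helper `path_or` and the data layout are assumptions made here, while the paths themselves (Q3-Q1 towards $\overline{v_{o}}$, Q4-Q1 and Q2 towards $v_{o}$) are those listed in the text for Fig. 2.13.

```python
from itertools import product

def path_or(paths, values):
    """OR of the literal products associated with a list of paths.

    paths:  list of paths, each a list of (signal, required_value) literals
    values: dict mapping each signal name to 0 or 1
    """
    return int(any(all(values[s] == v for s, v in path) for path in paths))

# Paths of the gate in Fig. 2.13, as listed in the text.
PATHS_TO_VO_BAR = [[("B", 1), ("A", 1)]]           # Q3-Q1, product A.B
PATHS_TO_VO = [[("B", 0), ("A", 1)], [("A", 0)]]   # Q4-Q1 and Q2

def truth_table():
    """Rows (A, B, IO_bar, IO) for all four input values."""
    rows = []
    for a, b in product((0, 1), repeat=2):
        values = {"A": a, "B": b}
        rows.append((a, b,
                     path_or(PATHS_TO_VO_BAR, values),
                     path_or(PATHS_TO_VO, values)))
    return rows

for row in truth_table():
    print(row)
```

For every input, the value obtained from the single path to $\overline{v_{o}}$ should coincide with the complement of the value obtained from the two paths to $v_{o}$, which is the statement made by (3.3a) and (3.3b).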
Even though the case of a combinational series gate has been discussed until now, extension to sequential blocks is straightforward. Indeed, in practical cases, Current-Mode sequential blocks are frequently D latches (Fig. (2.18)) or Master-Slave D-FF (Fig. (2.27)), which have already been described. In addition, in the infrequent cases where combinational logic is embedded into a sequential series gate, the same topology as the D latch gate is still used, since it is sufficient to replace the only combinational emitter coupled pair Q3-Q4 in Fig. (2.18) by the desired combinational function $F\left(X_{1} \ldots X_{n}\right)$, as depicted in Fig. 3.3.

According to Fig. 3.3, the resulting series gate has $(n+1)$ levels. For the above considerations, only combinational functions will be discussed in the following, without loss of generality.

Until now, the identification of the function implemented by an assigned circuit topology has been addressed. However, in practical design cases, one has to identify the topology which implements a given boolean function $F\left(X_{1} \ldots X_{n}\right)$. Basic concepts on this design aspect will be introduced in the following subsection.


Fig. 3.3. Sequential series gate with embedded combinational logic.

### 3.1.2 Series-gate implementation of an assigned function $F\left(X_{1} \ldots X_{n}\right)$

The unambiguous correspondence discussed in Section 3.1.1 between products of literals $x_{1} \cdot \ldots \cdot x_{n}$ and active paths from output node $\overline{v_{o}}$ (or $v_{o}$ ) to the current source provides a simple method to build an npn network which implements a given combinational function $F\left(X_{1} \ldots X_{n}\right)$. Indeed, each of the $2^{n}$ possible input values is associated with the unique product $x_{1} \cdot x_{2} \cdot \ldots \cdot x_{n}$ being equal to 1, in which each literal is complemented if the corresponding input bit is 0 (for example, input 0110 is associated with product $\overline{X_{1}} \cdot X_{2} \cdot X_{3} \cdot \overline{X_{4}}$ ). This product $x_{1} \cdot x_{2} \cdot \ldots \cdot x_{n}$ is in turn associated with a unique active path consisting of transistors driven by the corresponding literals. Therefore, an unambiguous correspondence between input values and stacked transistors' paths exists.
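The mapping from an input value to its product of literals can be written down directly. The sketch below is illustrative only; the function name and the textual notation (`~Xk` for a complemented literal) are assumptions introduced here.

```python
def literal_product(bits):
    """Literals of the product that equals 1 for the given input value.

    bits: string such as "0110", read as X1 X2 ... Xn.
    A literal is written "Xk" when bit k is 1 and "~Xk" when it is 0.
    """
    return [f"X{k}" if b == "1" else f"~X{k}"
            for k, b in enumerate(bits, start=1)]

def all_products(n):
    """One product per input value, i.e. 2**n products in total."""
    return {format(i, f"0{n}b"): literal_product(format(i, f"0{n}b"))
            for i in range(2 ** n)}

print(literal_product("0110"))
```

Each list of literals also identifies the signals driving the transistors of the corresponding path, from the first level down to the $n$-th level.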

From the previous considerations, an arbitrary combinational function can be built by implementing an npn network having all transistor paths associated with the $2^{n}$ possible inputs (or, equivalently, the corresponding literal products) and then properly connecting each of the upper collector branches to $\overline{v_{o}}$ or $v_{o}$ to set the output to the desired value. Assuming input $X_{j}$ to be applied to the $j$-th level with $j=1 \ldots n$ as in Fig. 3.1, such an npn network has a tree-like structure with an emitter coupled pair connected to each collector node of transistors lying at the lower level [CJ89], thereby doubling the number of transistors in a logic level compared to the lower level. For the sake of clarity, such a general npn network is shown in Fig. 3.4 for an arbitrary 3-variable function $F\left(X_{1}, X_{2}, X_{3}\right)$. In this figure, it is possible to identify all paths associated with the 8 possible input values. For example, the path Q7-Q3-Q1 is associated with (000), Q8-Q3-Q1 is associated with (100), and so on.


Fig. 3.4. General topology implementing an arbitrary 3-variable function.

To map the value of a boolean function $F$ to the series-gate output voltage $v_{o}$ (or equivalently its current $\overline{I_{O}}$ ), a proper choice of connections of collector nodes of the highest transistors to one of the two output nodes (represented by the gray box in Fig. 3.4) is needed. To understand this point, it is useful to express function $\overline{I_{O}}=V_{O}=F\left(X_{1} \ldots X_{n}\right)$ in the standard sum-of-product form [C95], i.e. as the OR of products including all literals $x_{1} \cdot x_{2} \cdot \ldots \cdot x_{n}$ (usually referred to as minterms). Since, as shown in Section 3.1.1, $\overline{I_{O}}$ is equal to the OR of the products $x_{1} \cdot x_{2} \cdot \ldots \cdot x_{n}$ associated with all possible paths from $\overline{v_{o}}$ to the current source (i.e. the paths whose transistor at the highest level has its collector terminal connected to the output node $\overline{v_{o}}$ ), a given function $\overline{I_{O}}=F\left(X_{1}, \ldots, X_{n}\right)$ is simply obtained by identifying the transistor paths associated with its minterms and then connecting the collector node of their highest transistor to the node $\overline{v_{o}}$. The collector terminals of the other transistors at the highest level must be connected to the output node $v_{o}$.

Obviously, the same series gate topology is obtained by considering the current $I_{O}$ rather than $\overline{I_{O}}$, with the collectors of the highest transistors belonging to paths associated with minterms of $I_{O}$ being connected to node $v_{o}$, and all other collectors being connected to node $\overline{v_{o}}$. In practical cases, it is convenient to implement the function having the lowest number of minterms (i.e. lower than or equal to $2^{n-1}$ ) between $I_{O}$ and $\overline{I_{O}}$.

As an example, let us consider the implementation of the 4-variable function $F\left(X_{1}, X_{2}, X_{3}, X_{4}\right)$ reported in the truth table in Table 3.1. From inspection of this table, the function $\overline{I_{O}}=F$ has 6 minterms, which is lower than $2^{4-1}=8$, thus it is convenient to implement current $\overline{I_{O}}$ rather than $I_{O}$.
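The choice between $\overline{I_{O}}$ and $I_{O}$ only requires counting the ones in the truth table. The following minimal sketch assumes that the $F$ column of Table 3.1 is given as a string of bits ordered from input 0000 to input 1111; the names `F_COLUMN`, `minterms` and `current_to_implement` are hypothetical.

```python
# F column of Table 3.1 for inputs 0000, 0001, ..., 1111 (X1 is the leftmost bit).
F_COLUMN = "0010000001001111"

def minterms(column):
    """Input values (as bit strings X1...Xn) for which the column is 1."""
    n = (len(column) - 1).bit_length()
    return [format(i, f"0{n}b") for i, bit in enumerate(column) if bit == "1"]

def current_to_implement(column):
    """Pick the current whose function has at most 2**(n-1) minterms.

    Returns "IO_bar" when F itself should be implemented, "IO" when it is
    more convenient to implement the complement of F.
    """
    n = (len(column) - 1).bit_length()
    return "IO_bar" if len(minterms(column)) <= 2 ** (n - 1) else "IO"

print(minterms(F_COLUMN), current_to_implement(F_COLUMN))
```

The list returned by `minterms` contains the input values whose paths have to be connected to $\overline{v_{o}}$ when $\overline{I_{O}}$ is chosen.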

According to Table 3.1, the standard sum-of-product form of function $F$ is

$$
\begin{align*}
F\left(X_{1}, X_{2}, X_{3}, X_{4}\right)= & \overline{X_{1}} \overline{X_{2}} X_{3} \overline{X_{4}}+X_{1} \overline{X_{2}} \overline{X_{3}} X_{4}+X_{1} X_{2} \overline{X_{3}} \overline{X_{4}}  \tag{3.4}\\
& +X_{1} X_{2} \overline{X_{3}} X_{4}+X_{1} X_{2} X_{3} \overline{X_{4}}+X_{1} X_{2} X_{3} X_{4}
\end{align*}
$$

Implementation of this function starts from the general 4-variable npn network topology in Fig. 3.5a, where connections of collector terminals have to be specified. In this figure, the paths associated with the 6 minterms are highlighted with a gray line.

According to Table 3.1, or equivalently to relationship (3.4), function $F$ is specified by connecting the collectors of transistors Q19, Q24, Q18, Q26, Q22 and Q30 to the output node $\overline{v_{o}}$, and the remaining collectors to the output node $v_{o}$. The resulting series gate is reported in Fig. 3.5b.

TABLE 3.1

| $X_{1} X_{2} X_{3} X_{4}$ | associated minterm | $F\left(X_{1}, X_{2}, X_{3}, X_{4}\right)$ | $\overline{F\left(X_{1}, X_{2}, X_{3}, X_{4}\right)}$ |
| :---: | :---: | :---: | :---: |
| 0000 | $\overline{X_{1}} \overline{X_{2}} \overline{X_{3}} \overline{X_{4}}$ | 0 | 1 |
| 0001 | $\overline{X_{1}} \overline{X_{2}} \overline{X_{3}} X_{4}$ | 0 | 1 |
| 0010 | $\overline{X_{1}} \overline{X_{2}} X_{3} \overline{X_{4}}$ | 1 | 0 |
| 0011 | $\overline{X_{1}} \overline{X_{2}} X_{3} X_{4}$ | 0 | 1 |
| 0100 | $\overline{X_{1}} X_{2} \overline{X_{3}} \overline{X_{4}}$ | 0 | 1 |
| 0101 | $\overline{X_{1}} X_{2} \overline{X_{3}} X_{4}$ | 0 | 1 |
| 0110 | $\overline{X_{1}} X_{2} X_{3} \overline{X_{4}}$ | 0 | 1 |
| 0111 | $\overline{X_{1}} X_{2} X_{3} X_{4}$ | 0 | 1 |
| 1000 | $X_{1} \overline{X_{2}} \overline{X_{3}} \overline{X_{4}}$ | 0 | 1 |
| 1001 | $X_{1} \overline{X_{2}} \overline{X_{3}} X_{4}$ | 1 | 0 |
| 1010 | $X_{1} \overline{X_{2}} X_{3} \overline{X_{4}}$ | 0 | 1 |
| 1011 | $X_{1} \overline{X_{2}} X_{3} X_{4}$ | 0 | 1 |
| 1100 | $X_{1} X_{2} \overline{X_{3}} \overline{X_{4}}$ | 1 | 0 |
| 1101 | $X_{1} X_{2} \overline{X_{3}} X_{4}$ | 1 | 0 |
| 1110 | $X_{1} X_{2} X_{3} \overline{X_{4}}$ | 1 | 0 |
| 1111 | $X_{1} X_{2} X_{3} X_{4}$ | 1 | 0 |

Inspection of Fig. 3.5b shows that the resulting topology has many useless transistors, and thus can be significantly simplified to reduce the gate area and delay. For example, the emitter coupled pairs with short-circuited collectors Q15-Q16 and Q27-Q28 do not perform any logic operation, since they always steer the emitter current to the same output branch, regardless of their driving input value $X_{1}$. Accordingly, the two emitter coupled pairs can be deleted without affecting the gate operation. This also shows that in general a given boolean function can be implemented by different series-gate topologies.


Fig. 3.5a. General topology implementing a 4-variable function.


Fig. 3.5b. Series gate topology implementing function $F$ in eq. (3.4).

### 3.1.3 Limitations of the general series-gate design approach

The series-gate design strategy discussed in the previous subsection is general and systematic, but is also very inefficient in terms of transistor count. Indeed, in an $n$-level series gate the resulting npn network has an overall number of transistors given by

$$
\begin{equation*}
\sum_{j=1}^{n} 2^{j}=2^{n+1}-2 \approx 2^{n+1} \tag{3.5}
\end{equation*}
$$

which increases exponentially with the number of logic levels. This rapid increase in the transistor count is highly undesirable in terms of silicon area and speed performance (due to the higher parasitic capacitances, as will be discussed in Chapters 4-6). Furthermore, as was shown in the example in Section 3.1.2, the design strategy presented leads to an npn network having an unnecessarily high number of transistors. Therefore, to avoid such transistor redundancy, it is essential to use alternative design strategies that are capable of implementing an arbitrary function in a series gate having a minimum transistor count. In the following, two different approaches proposed in the literature will be considered to minimize the series gate transistor count: a graphical procedure (Section 3.2), along with its analytical formulation (Section 3.3), and a VEM-based approach (Section 3.4). Moreover, in an $n$-level series gate there are many different possible input orderings, and their choice will be dealt with in Section 3.5 for different design goals.
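As a quick numerical check of (3.5), the sum over the levels can be compared with its closed form. The function name below is an assumption made for this sketch.

```python
def general_tree_transistors(n):
    """Transistors in the general n-level npn tree: 2**j transistors at level j."""
    return sum(2 ** j for j in range(1, n + 1))

for n in range(1, 7):
    # Level-by-level sum next to the closed form of eq. (3.5).
    print(n, general_tree_transistors(n), 2 ** (n + 1) - 2)
```

For $n=4$ this gives the 30 transistors of the general topology in Fig. 3.5a.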

### 3.2 A GRAPHICAL REDUCTION METHOD

Among the minimization techniques proposed in the literature, such as [CP86], [CJ89] and [MKA92], the graphical approach introduced in [CJ89] is discussed in this section.

### 3.2.1 Basic concepts on the graphical approach in [CJ89]

The design strategy in [CJ89] is based on the observation that redundant emitter-coupled pairs can often be found in the general series-gate npn network. To understand this point, consider the case where the general npn network topology contains two equal subcircuits $N_{1}$ and $N_{2}$ (connected to the same output nodes) being driven by two different currents $i_{1}$ and $i_{2}$, as depicted in Fig. 3.6a. It is apparent that the two subcircuits can be simplified into a single network $N_{12}$ (equal to $N_{1}$ and $N_{2}$ ) driven by the current $\left(i_{1}+i_{2}\right)$, as depicted in Fig. 3.6b.

The minimization method in [CJ89] consists of two formal ways to eliminate redundant emitter-coupled pairs in a general tree npn network topology (see Figs. 3.4-3.5a), and both of them are based on the circuit simplification described in Fig. 3.6b. To be more specific, let us consider an emitter-coupled pair Q1-Q2 driven by input $X_{j}$ with the two collector terminals connected to two identical subcircuits $N_{1}$ and $N_{2}$, as depicted in Fig. 3.7a. Such an emitter-coupled pair is defined as a complementary pair in [CJ89].


Fig. 3.6a. Equal subcircuits driven by two different currents.


Fig. 3.6b. Simplification of two equal subcircuits into a single one.

By applying the simplification in Fig. 3.6, the two subcircuits can be lumped into the single network $N_{12}$ driven by the emitter-coupled pair Q1-Q2, whose collector terminals are short-circuited, as in Fig. 3.7b. In this case, assuming $\alpha_{t} \approx 1$, the current $i_{C 1}+i_{C 2}$ provided to the network $N_{12}$ is equal to the input current $i_{x}$ of the transistor pair Q1-Q2, regardless of the value of the input $X_{j}$. As a consequence, the emitter-coupled pair Q1-Q2 can be eliminated, as shown in Fig. 3.7b. This simplification step is called CPE (Complementary Pair Elimination) and is performed for all transistor pairs belonging to the same level at a time, usually starting from the top level and ending at the lowest level. However, the same simplifications are obtained by starting from the latter one and ending at the top level.


Fig. 3.7a. Emitter-coupled pair driving two identical subcircuits.

The second elimination step is the NPE (Normal Pair Elimination), and is applied to normal pairs, which are defined as emitter-coupled pairs lying in the same series-gate level. To understand this simplification, let us consider the two normal pairs Q1-Q2 and Q3-Q4 in Fig. 3.8a driving equal subcircuits $N_{1}$ and $N_{2}$, and being driven by the two different currents $i_{x}$ and $i_{y}$, respectively. According to Fig. 3.6, the two normal pairs can be lumped into a single emitter-coupled pair driven by the sum of the currents $i_{x}+i_{y}$, as depicted in Fig. 3.8b. This elimination of normal pairs (NPE) is usually performed for all transistor pairs belonging to the same level at a time, starting from the top level and ending at the lowest level. However, the same simplifications are obtained by starting from the latter one and ending at the top level. Obviously, the NPE can be applied to all levels with the exception of the bottom level, since it has no subcircuits lying at a lower level.


Fig. 3.7b. Complementary Pair Elimination (CPE) of Q1-Q2.


Fig. 3.8a. Normal pairs driving the same subcircuit.

This minimization strategy was developed by applying the CPE to each logic level starting from the top and ending at the lowest level, and then applying the NPE to the same levels. It is worth noting that the two simplifications may be applied in a different order without changing the topology of the resulting circuit. To better understand the procedure, a practical example is presented in the following subsection.


Fig. 3.8b. Normal Pair Elimination of Q3-Q4.

### 3.2.2 A design example

Let us consider the series-gate implementation of the function $F$ expressed by eq. (3.4).

As a preliminary step, the general 4-variable npn network topology in Fig. 3.5a has to be built. Subsequently, the CPE must be applied to each transistor level. In the following, the CPE (NPE) applied to the $j$-th level will be referred to as CPEj (NPEj). When the CPE1 is applied, the two emitter-coupled pairs Q15-Q16 and Q27-Q28 with short-circuited collectors are substituted by a short circuit, according to Fig. 3.7b. The resulting circuit is depicted in Fig. 3.9a, where the CPE2 can be applied to eliminate the complementary pairs Q11-Q12 and Q25-Q26.

The simplified circuit after applying the CPE2 is depicted in Fig. 3.9b, where it can be noticed that no further simplification can be achieved by applying CPE3 and CPE4, since there are no other complementary pairs driving equal subcircuits.

Now, the NPE1 has to be applied to Fig. 3.9b, where normal pairs Q17-Q18, Q21-Q22, Q23-Q24 and Q29-Q30 drive the same subcircuit (in the specific case of the first level, it consists of the connections to the output nodes) and thus can be lumped into Q17-Q18, thereby connecting the collector terminals of the driving transistors Q8, Q10, Q5, Q14 to the emitter of Q17-Q18. The resulting simplified circuit is reported in Fig. 3.9c.


Fig. 3.9a. Series gate topology after CPE1.

The circuit in Fig. 3.9c can be further simplified by applying the NPE2, since normal pairs Q7-Q8 and Q13-Q14 drive equal subcircuits (the collector of the left-hand transistor directly drives the output node $v_{o}$, and the right-hand transistor drives the transistor pair Q17-Q18). Therefore, the two emitter-coupled pairs can be lumped into the transistor pair Q13-Q14, after connecting the collector of Q6 to that of Q3, as depicted in Fig. 3.9d. Since no normal pairs driving the same subcircuit exist, no further simplification is obtained by applying the NPE3, thus the circuit in Fig. 3.9d is the minimized series gate implementing the assigned function $F$, which is depicted in a more compact manner in Fig. 3.9e.


Fig. 3.9b. Series gate topology after CPE2.


Fig. 3.9c. Series gate topology after NPE1.


Fig. 3.9d. Series gate topology after NPE2.


Fig. 3.9e. Final simplified circuit implementing function $F$.

Comparison of Figs. 3.5-3.9e shows that the transistor count has been reduced from 30 to 14 through the procedure described in Section 3.2.1, which is a significant advantage in terms of area, and is also beneficial in terms of delay.
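The effect of the two elimination rules on the transistor count can be reproduced with a short program. The sketch below is not the graphical procedure of [CJ89]: it is an assumed reformulation in which each subcircuit is represented by its truth table, a pair is dropped when its two collectors see identical subcircuits (the CPE condition), and pairs at the same level driving identical subcircuits are counted once (the NPE condition). The function name and the data representation are hypothetical.

```python
def reduced_pairs(on_set, n):
    """Emitter-coupled pairs surviving CPE and NPE, as (level, subfunction) keys."""
    # Truth table of F; index bits are X1 (most significant) ... Xn (least significant).
    table = tuple(int(format(i, f"0{n}b") in on_set) for i in range(2 ** n))
    pairs = set()

    def visit(sub, level):
        # sub is the truth table, over X1..X_level, of the subcircuit fed by one branch.
        if level == 0:
            return  # an output node has been reached
        low, high = sub[0::2], sub[1::2]  # subcircuits seen by the two collectors
        if low == high:
            visit(low, level - 1)         # CPE: the pair at this level is redundant
        elif (level, sub) not in pairs:
            pairs.add((level, sub))       # NPE: identical pairs at a level are shared
            visit(low, level - 1)
            visit(high, level - 1)

    visit(table, n)
    return pairs

F_ON_SET = {"0010", "1001", "1100", "1101", "1110", "1111"}  # minterms of eq. (3.4)
pairs = reduced_pairs(F_ON_SET, 4)
per_level = {j: sum(1 for level, _ in pairs if level == j) for j in range(1, 5)}
print(2 * len(pairs), per_level)
```

Under these assumptions the example function ends up with seven pairs, i.e. 14 transistors, in agreement with the count quoted above for Fig. 3.9e.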

### 3.3 AN ANALYTICAL FORMULATION OF THE DESIGN STRATEGY IN [CJ89]

The minimization strategy discussed in the previous section is based on a graphical approach leading to the elimination of redundant emitter-coupled pairs through the two basic steps CPE and NPE. However, analytical minimization procedures are usually preferred since they can be applied with less effort, and also afford a better understanding of the concepts on which the strategy is based. In the following, an interpretation of CPE and NPE simplification steps is discussed to derive an analytical formulation of the strategy presented in Section 3.2.

### 3.3.1 Analytical interpretation of CPE/NPE

An analytical interpretation of the CPE can be derived by exploiting the correspondence of each path from the output node $\overline{v_{o}}$ to the current source with a unique minterm. This minterm has the same literals as the inputs driving the transistors belonging to the path, as discussed in Section 3.1.2. Furthermore, it is useful to observe that the npn network implements both boolean functions $F$ and $\bar{F}$, hence subcircuit $N_{1}$ in Fig. 3.7a implements some minterms of both functions. To be more specific, subcircuit $N_{1}$ implements all minterms of $F$ that can be expressed as $g\left(X_{1} \ldots X_{j-1}\right) \cdot \overline{X_{j}} \cdot i_{x}$, and all minterms of $\bar{F}$ which can be expressed as $\overline{g\left(X_{1} \ldots X_{j-1}\right)} \cdot \overline{X_{j}} \cdot i_{x}$ (where $g$ is a function of $X_{1} \ldots X_{j-1}$, and $i_{x}$ is the product of the literals $x_{j+1} \ldots x_{n}$ associated with the path from the emitter of Q1-Q2 to the current source). Moreover, from Fig. 3.7a, $N_{2}$ implements all minterms of $F$ and $\bar{F}$ which are expressed by $g\left(X_{1} \ldots X_{j-1}\right) \cdot X_{j} \cdot i_{x}$ and $\overline{g\left(X_{1} \ldots X_{j-1}\right)} \cdot X_{j} \cdot i_{x}$, respectively. Therefore, lumping subcircuits $N_{1}$ and $N_{2}$ into $N_{12}$ is analytically equivalent to collecting all pairs of minterms of $F$ (as well as $\bar{F}$ ) containing the product $\overline{X_{j}} \cdot x_{j+1} \cdot \ldots \cdot x_{n}$ and $X_{j} \cdot x_{j+1} \cdot \ldots \cdot x_{n}$, respectively, into a single term

$$
\begin{equation*}
g\left(X_{1} \ldots X_{j-1}\right) \cdot\left(\overline{X_{j}}+X_{j}\right) \cdot i_{x}=g\left(X_{1} \ldots X_{j-1}\right) \cdot i_{x} \tag{3.6}
\end{equation*}
$$

where simplification $\overline{X_{j}}+X_{j}=1$ justifies the elimination of transistors Q1-Q2 in Fig. (3.7b). Obviously, the CPEj leads to a simplification only if $N_{1}$ and $N_{2}$ are equal, i.e. if for each term of $F$ (or $\bar{F}$ ) including product $\overline{X_{j}} \cdot x_{j+1} \cdot \ldots \cdot x_{n}$ there is a corresponding one containing $X_{j} \cdot x_{j+1} \cdot \ldots \cdot x_{n}$. Instead, when at least one product without the corresponding one exists, the CPEj does not lead to any simplification in the circuit under design.
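One possible way to mechanize the CPEj condition is sketched below. It is an assumed formalization, not taken from [CJ89]: each product is stored as a set of (index, value) literals, products are grouped by their lower literals $x_{j+1} \ldots x_{n}$ (the product $i_{x}$), and literal $x_{j}$ is dropped within a group only when the products found with $\overline{X_{j}}$ and with $X_{j}$ have exactly the same upper parts. The names `minterm`, `cpe` and `F_TERMS` are hypothetical.

```python
from collections import defaultdict

def minterm(bits):
    """ "0110" -> frozenset of (index, value) literals, read as X1 ... Xn."""
    return frozenset((k, int(b)) for k, b in enumerate(bits, start=1))

def cpe(terms, j):
    """Apply CPEj to a set of products, following relationship (3.6)."""
    groups = defaultdict(lambda: {0: set(), 1: set()})
    result = set()
    for t in terms:
        lits = dict(t)
        if j not in lits:
            result.add(t)  # literal x_j absent: nothing to do for this product
            continue
        lower = frozenset((k, v) for k, v in t if k > j)  # product i_x
        upper = frozenset((k, v) for k, v in t if k < j)  # one product of g
        groups[lower][lits[j]].add(upper)
    for lower, halves in groups.items():
        if halves[0] == halves[1]:
            # N1 and N2 are equal: collect the pairs and drop literal x_j
            result |= {upper | lower for upper in halves[0]}
        else:
            for v in (0, 1):
                result |= {upper | {(j, v)} | lower for upper in halves[v]}
    return result

# Minterms of F taken from Table 3.1.
F_TERMS = {minterm(b) for b in ("0010", "1001", "1100", "1101", "1110", "1111")}
after_cpe2 = cpe(cpe(F_TERMS, 1), 2)
print(len(F_TERMS), len(after_cpe2))
```

The steps are meant to be applied in the order CPE1, CPE2, and so on, so that every product still contains literals $x_{j} \ldots x_{n}$ when CPEj is applied.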

In regard to NPE, its analytical interpretation is easily found by inspecting Figs. 3.8a-3.8b, in which transistor pairs Q1-Q2 and Q3-Q4 are lumped in a single transistor pair Q1-Q2, and equal subcircuits $N_{1}-N_{2}$ are lumped into a single subcircuit $N_{12}$. Subcircuit $N_{1}$ in Fig. 3.8a implements all minterms of $F$ that can be expressed as $g\left(X_{1} \ldots X_{j}\right) \cdot i_{x}$, as well as all minterms of $\bar{F}$ which can be expressed as $\overline{g\left(X_{1} \ldots X_{j}\right)} \cdot i_{x}$. Analogously, subcircuit $N_{2}$ implements all minterms of $F$ and $\bar{F}$ that can be expressed as $g\left(X_{1} \ldots X_{j}\right) \cdot i_{y}$ and $\overline{g\left(X_{1} \ldots X_{j}\right)} \cdot i_{y}$, respectively. Therefore, the NPEj simplification, which allows for lumping transistors Q1-Q2 and Q3-Q4 into Q1-Q2 and equal subcircuits $N_{1}-N_{2}$ into $N_{12}$, is analytically equivalent to collecting all products having equal literals $x_{1}, \ldots, x_{j}$ and different $x_{j+1}, \ldots, x_{n}$

$$
\begin{equation*}
g\left(X_{1} \ldots X_{j}\right) \cdot i_{x}+g\left(X_{1} \ldots X_{j}\right) \cdot i_{y}=g\left(X_{1} \ldots X_{j}\right) \cdot\left(i_{x}+i_{y}\right) \tag{3.7}
\end{equation*}
$$

As observed for the CPE, the NPEj leads to such a simplification only if $N_{1}$ and $N_{2}$ are equal, i.e. if for each term of $F$ (or $\bar{F}$ ) in the form $g\left(X_{1} \ldots X_{j}\right) \cdot i_{x}$ (being $i_{x}$ a product of $x_{j+1} \ldots x_{n}$ ) there is a corresponding one equal to $g\left(X_{1} \ldots X_{j}\right) \cdot i_{y}$ (being $i_{y}$ a different product of $x_{j+1} \cdot \ldots \cdot x_{n}$ ). Equivalently, the simplification of two minterms $g\left(X_{1} \ldots X_{j}\right) \cdot i_{x}$ and $g\left(X_{1} \ldots X_{j}\right) \cdot i_{y}$ is possible only if there is not any other minterm having the same $i_{x}$ (or, equivalently, $i_{y}$ ) but different literals $x_{1} \ldots x_{j}$.
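The NPEj condition can be read as a grouping problem: two lower products $i_{x}$ and $i_{y}$ can be collected as in (3.7) when they multiply exactly the same set of upper products. The sketch below follows this reading; the representation of products and the names `term` and `npe_groups` are assumptions introduced here, and a dash in a pattern marks a literal already removed by a previous CPE step.

```python
from collections import defaultdict

def term(pattern):
    """ "1-01" -> product X1 ~X3 X4 as a frozenset of (index, value) literals."""
    return frozenset((k, int(c)) for k, c in enumerate(pattern, start=1) if c != "-")

def npe_groups(terms, j):
    """Group the lower products (literals x_{j+1}..x_n) that can be collected by NPEj.

    Returns a dict mapping a set of upper products (literals x_1..x_j) to the
    set of lower products that multiply exactly that set.
    """
    uppers_of = defaultdict(set)
    for t in terms:
        lower = frozenset(l for l in t if l[0] > j)
        upper = frozenset(l for l in t if l[0] <= j)
        uppers_of[lower].add(upper)
    groups = defaultdict(set)
    for lower, uppers in uppers_of.items():
        groups[frozenset(uppers)].add(lower)
    return dict(groups)

# Products of F after CPE, i.e. the five terms of eq. (3.8).
TERMS_38 = {term(p) for p in ("1-01", "0010", "1100", "1110", "1111")}
for uppers, lowers in npe_groups(TERMS_38, 2).items():
    print(len(uppers), len(lowers))
```

A group containing more than one lower product corresponds to normal pairs that can be lumped into one; a lower product that is alone in its group cannot be collected with any other.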

### 3.3.2 Analytical simplification through CPE/NPE: an example

In this subsection, function $F$ in (3.4) is simplified through CPE and NPE in an analytical manner, according to their interpretation discussed in the previous subsection. For the sake of compactness, the AND operator "." will be omitted.

When applying CPE1 to function $F$ in (3.4), no simplification is obtained, since there are no minterms differing only in literal $x_{1}$. From a topological point of view, this means that it is not possible to simplify any transistor pair connected to $\overline{v_{o}}$, as confirmed in Figs. 3.5b and 3.9a. Instead, CPE2 allows for simplifying terms $X_{1} \overline{X_{2}} \overline{X_{3}} X_{4}$ and $X_{1} X_{2} \overline{X_{3}} X_{4}$, since they differ only in literal $x_{2}$ and there are no other terms including product $\overline{X_{3}} X_{4}$. According to relationship (3.6), after applying CPE2, function $F$ becomes

$$
\begin{align*}
F\left(X_{1}, X_{2}, X_{3}, X_{4}\right)= & X_{1} \overline{X_{3}} X_{4}+\overline{X_{1}} \overline{X_{2}} X_{3} \overline{X_{4}}+X_{1} X_{2} \overline{X_{3}} \overline{X_{4}}  \tag{3.8}\\
& +X_{1} X_{2} X_{3} \overline{X_{4}}+X_{1} X_{2} X_{3} X_{4}
\end{align*}
$$

This is equivalent to lumping transistor pairs Q23-Q24 and Q25-Q26 into Q23-Q24 and subsequently eliminating the complementary pair Q11-Q12, as in Fig. (3.9b). In regard to CPE3, it does not lead to any simplification of function $F$, since in (3.8) there are no pairs of products differing only in literal $x_{3}$. Even though there are two terms differing only in $x_{4}$, i.e. $X_{1} X_{2} X_{3} \overline{X_{4}}$ and $X_{1} X_{2} X_{3} X_{4}$, CPE4 does not simplify (3.8), as there are other terms in the form $x_{1} x_{2} x_{3} X_{4}$ which do not have a corresponding term $x_{1} x_{2} x_{3} \overline{X_{4}}$. The resulting topology after CPE is that in Fig. 3.9b.

After simplifying $F$ through CPEs, let us apply the NPE simplification. In particular, NPE1 applied to function $F$ in (3.8) leads to

$$
\begin{align*}
F\left(X_{1}, X_{2}, X_{3}, X_{4}\right) & =X_{1}\left(\overline{X_{3}} X_{4}+X_{2} \overline{X_{3}} \overline{X_{4}}+X_{2} X_{3} \overline{X_{4}}\right. \\
& \left.+X_{2} X_{3} X_{4}\right)+\overline{X_{1}}\left(\overline{X_{2}} X_{3} \overline{X_{4}}\right) \tag{3.9}
\end{align*}
$$

which is equivalent to lumping Q17-Q18, Q21-Q22, Q23-Q24 and Q29-Q30 into the transistor pair Q17-Q18 driven by transistors Q8, Q10, Q5 and Q14, as in Fig. 3.9c. Analogously, NPE2 allows for collecting terms having equal literals $x_{1}$, $x_{2}$ and different other literals. In particular, let us consider function $F$ after NPE1 in (3.9), in which three terms including the same factor $X_{2}$ could potentially be collected, i.e. $X_{1} X_{2} \overline{X_{3}} \overline{X_{4}}$, $X_{1} X_{2} X_{3} X_{4}$ and $X_{1} X_{2} X_{3} \overline{X_{4}}$, even though only the first two can be collected to simplify the circuit. With reference to the previously introduced notation, this can be understood by observing that the three terms have the common factor $g\left(X_{1}, X_{2}\right)=X_{1} X_{2}$ and differ in the other literals through products $i_{x}=\overline{X_{3}} \overline{X_{4}}$, $i_{y}=X_{3} X_{4}$ and $i_{z}=X_{3} \overline{X_{4}}$, respectively. The first two terms can be collected according to (3.7), while $X_{1} X_{2} X_{3} \overline{X_{4}}$ cannot be simplified since there is the other minterm $\overline{X_{1}} \overline{X_{2}} X_{3} \overline{X_{4}}$ having the same $i_{z}=X_{3} \overline{X_{4}}$ but different literals $x_{1} x_{2}$. Accordingly, after applying NPE2, (3.9) simplifies into

$$
\begin{align*}
F\left(X_{1}, X_{2}, X_{3}, X_{4}\right) & =X_{1}\left[X_{2}\left(\overline{X_{3}} \overline{X_{4}}+X_{3} X_{4}\right)+X_{2} X_{3} \overline{X_{4}}\right. \\
& \left.+\overline{X_{3}} X_{4}\right]+\overline{X_{1}}\left[\overline{X_{2}}\left(X_{3} \overline{X_{4}}\right)\right] \tag{3.10}
\end{align*}
$$

which is topologically equivalent to lumping transistor pairs Q7-Q8 and Q13-Q14 into the emitter-coupled pair Q7-Q8 driven by transistors Q3 and Q6, as reported in Fig. 3.9d. Since in (3.10) there are no other terms in brackets which include the same factor $X_{3}\left(\overline{X_{3}}\right)$, NPE3 does not lead to any further simplification.
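Since CPE and NPE only regroup products, expressions (3.4), (3.8) and (3.10) must take the same value for every input. A brute-force comparison is sketched below; the function names are hypothetical and `n(v)` denotes the complement of a bit.

```python
from itertools import product

def n(v):
    """Complement of a bit."""
    return 1 - v

def f_34(x1, x2, x3, x4):
    """Standard sum-of-product form, eq. (3.4)."""
    return int(bool(
        n(x1) * n(x2) * x3 * n(x4) + x1 * n(x2) * n(x3) * x4
        + x1 * x2 * n(x3) * n(x4) + x1 * x2 * n(x3) * x4
        + x1 * x2 * x3 * n(x4) + x1 * x2 * x3 * x4))

def f_38(x1, x2, x3, x4):
    """Function after CPE2, eq. (3.8)."""
    return int(bool(
        x1 * n(x3) * x4 + n(x1) * n(x2) * x3 * n(x4)
        + x1 * x2 * n(x3) * n(x4) + x1 * x2 * x3 * n(x4) + x1 * x2 * x3 * x4))

def f_310(x1, x2, x3, x4):
    """Function after NPE2, eq. (3.10)."""
    inner = x2 * (n(x3) * n(x4) + x3 * x4) + x2 * x3 * n(x4) + n(x3) * x4
    return int(bool(x1 * inner + n(x1) * (n(x2) * (x3 * n(x4)))))

INPUTS = list(product((0, 1), repeat=4))
print(all(f_34(*x) == f_38(*x) == f_310(*x) for x in INPUTS))
```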

Even though function $F$ was manipulated through CPE-NPE in this example, it is apparent that the same topological simplifications are obtained by simplifying function $\bar{F}$ rather than $F$. Therefore, of the two functions, in practical cases it is convenient to manipulate that having the lowest number of minterms. For the sake of clarity, simplification of function $\bar{F}$ is also discussed in the following, by starting from its standard sum-of-product form obtained from Table 3.1

$$
\begin{align*}
\bar{F}\left(X_{1}, X_{2}, X_{3}, X_{4}\right)= & \overline{X_{1}} \overline{X_{2}} \overline{X_{3}} \overline{X_{4}}+\overline{X_{1}} \overline{X_{2}} \overline{X_{3}} X_{4} \\
& +\overline{X_{1}} \overline{X_{2}} X_{3} X_{4}+\overline{X_{1}} X_{2} \overline{X_{3}} \overline{X_{4}} \\
& +\overline{X_{1}} X_{2} \overline{X_{3}} X_{4}+\overline{X_{1}} X_{2} X_{3} \overline{X_{4}}  \tag{3.11}\\
& +\overline{X_{1}} X_{2} X_{3} X_{4}+X_{1} \overline{X_{2}} \overline{X_{3}} \overline{X_{4}} \\
& +X_{1} \overline{X_{2}} X_{3} \overline{X_{4}}+X_{1} \overline{X_{2}} X_{3} X_{4}
\end{align*}
$$

From inspection of this expression, CPE1 simplifies minterms $X_{1} \overline{X_{2}} \overline{X_{3}} \overline{X_{4}}$ and $\overline{X_{1}} \overline{X_{2}} \overline{X_{3}} \overline{X_{4}}$ (since they differ only in literal $x_{1}$ and there are no other terms including product $\overline{X_{2}} \overline{X_{3}} \overline{X_{4}}$ ), as well as $\overline{X_{1}} \overline{X_{2}} X_{3} X_{4}$ and $X_{1} \overline{X_{2}} X_{3} X_{4}$ (they also differ only in literal $x_{1}$ and there are no other terms including product $\overline{X_{2}} X_{3} X_{4}$ )

$$
\begin{align*}
\bar{F}\left(X_{1}, X_{2}, X_{3}, X_{4}\right)= & \overline{X_{2}} \overline{X_{3}} \overline{X_{4}}+\overline{X_{1}} X_{2} \overline{X_{3}} \overline{X_{4}}+\overline{X_{1}} \overline{X_{2}} \overline{X_{3}} X_{4} \\
& +\overline{X_{2}} X_{3} X_{4}+\overline{X_{1}} X_{2} \overline{X_{3}} X_{4}+\overline{X_{1}} X_{2} X_{3} \overline{X_{4}}  \tag{3.12}\\
& +\overline{X_{1}} X_{2} X_{3} X_{4}+X_{1} \overline{X_{2}} X_{3} \overline{X_{4}}
\end{align*}
$$

This is equivalent to eliminating transistor pairs Q15-Q16 and Q27-Q28, whose collectors are all connected to the output node $v_{o}$. When applying CPE2, terms $\overline{X_{1}} \overline{X_{2}} \overline{X_{3}} X_{4}$ and $\overline{X_{1}} X_{2} \overline{X_{3}} X_{4}$ differ only in literal $x_{2}$ and there are no other terms containing product $\overline{X_{3}} X_{4}$, thus they can be simplified according to (3.6) into the single term $\overline{X_{1}} \overline{X_{3}} X_{4}$

$$
\begin{align*}
\bar{F}\left(X_{1}, X_{2}, X_{3}, X_{4}\right) & =\overline{X_{2}} \overline{X_{3}} \overline{X_{4}}+\overline{X_{2}} X_{3} X_{4}+\overline{X_{1}} \overline{X_{3}} X_{4} \\
& +\overline{X_{1}} X_{2} \overline{X_{3}} \overline{X_{4}}+\overline{X_{1}} X_{2} X_{3} \overline{X_{4}}  \tag{3.13}\\
& +\overline{X_{1}} X_{2} X_{3} X_{4}+X_{1} \overline{X_{2}} X_{3} \overline{X_{4}}
\end{align*}
$$

which is topologically equivalent to lumping transistor pairs Q23-Q24 and Q25-Q26 into Q23-Q24 and subsequently eliminating the complementary pair Q11-Q12, as in Fig. (3.9b). When CPE3 is applied to function $\bar{F}$, even though terms $\overline{X_{1}} X_{2} \overline{X_{3}} \overline{X_{4}}$ and $\overline{X_{1}} X_{2} X_{3} \overline{X_{4}}$ in (3.13) differ only in $x_{3}$, there is another term including $\overline{X_{4}}$ (i.e. $X_{1} \overline{X_{2}} X_{3} \overline{X_{4}}$ ) which does not have the corresponding term differing only in $x_{3}$ (i.e. $X_{1} \overline{X_{2}} \overline{X_{3}} \overline{X_{4}}$ ). Therefore CPE3 does not simplify such terms, and the same argument holds for CPE4 (indeed, even though terms $\overline{X_{1}} X_{2} X_{3} \overline{X_{4}}$ and $\overline{X_{1}} X_{2} X_{3} X_{4}$ differ only in literal $x_{4}$, there are several terms including $X_{4}$ which do not have a corresponding term differing only in $x_{4}$ ). Detailed application of NPE is omitted and left to the reader.

The final step of the simplification procedure based on analytical CPE-NPE is the series-gate implementation of the simplified function, which is addressed in the following subsection.

### 3.3.3 Circuit implementation of the simplified function after CPE-NPE

After applying the analytical CPE-NPE, function $F$ must be mapped into a series-gate topology by simply exploiting the correspondence of each minterm $x_{1} x_{2} \ldots x_{n}$ with the path connecting the output node $\bar{v}_{o}$ to the current source having transistors driven by the same literals. This is easily obtained by resorting to considerations in Section 3.1 and starting the design from the first level.

To better understand this step, the simplified function (3.10) of the example presented in Section 3.3.2 is implemented in the following. From (3.10), there are two terms proportional to literal $x_{1}$, thus the first level includes two transistor pairs Q1-Q2 and Q3-Q4, according to Fig. 3.10. It is worth noting that the term containing factor $X_{1}\left(\overline{X_{1}}\right)$ is implemented by connecting transistor Q2 (Q4) driven by the same signal to output $\bar{v}_{o}$ according to (3.1).

Now, let us consider the second level by implementing the factors multiplying $X_{1}$ and $\overline{X_{1}}$ in the first term. In particular, factor $\overline{X_{2}}\left(X_{3} \overline{X_{4}}\right)$ multiplying $\overline{X_{1}}$ contains one term including literal $\overline{X_{2}}$, which is implemented by the transistor pair Q5-Q6, where the collector of transistor Q5 driven by $\overline{X_{2}}$ is connected to the transistor pair Q3-Q4 driven by $\overline{X_{1}}$. Analogously, factor $X_{2}\left(\overline{X_{3}}\, \overline{X_{4}}+X_{3} X_{4}\right)+X_{2} X_{3} \overline{X_{4}}+\overline{X_{3}} X_{4}$ multiplying $X_{1}$ contains three terms, which are implemented by the corresponding transistor pairs Q7-Q8, Q5-Q6 and Q9-Q10, as depicted in Fig. 3.10. It is worth noting that the third term does not contain literal $x_{2}$, thus it is implemented by the transistor pair Q9-Q10 lying at the third level (i.e., by skipping the second level).

In regard to the third level, the term $\overline{X_{3}} \overline{X_{4}}+X_{3} X_{4}$ multiplying $X_{2}$ is implemented by the transistor pair Q11-Q12 and the existing Q9-Q10, with the collector of Q11 and Q10 being connected to the emitter of Q7-Q8. The fourth level consists of only the transistor pair Q13-Q14 since it is the lowest one. Once the insertion of transistor pairs has been completed, all floating collector nodes must be connected to the other output node, $v_{o}$, thereby
leading to the final series gate topology in Fig. 3.10. As expected, this circuit is equal to that in Fig. 3.9e obtained by applying the graphical design strategy discussed in Section 3.2. It is apparent that an identical topology would have been obtained by implementing function $\bar{F}$, rather than $F$.


Fig. 3.10. Final series gate topology implementing simplified function $F$ in (3.10).

### 3.4 A VEM-BASED REDUCTION METHOD

In this section, the tabular approach based on Variable-Entered Mapping (VEM) proposed in [MKA92] is presented. The Variable-Entered Mapping is a well-known technique to minimize multiplexer implementations of boolean functions, and can be exploited to minimize the number of transistors in a series gate. Indeed, the npn network of a series gate can be thought of as the connection of transistor pairs, each of which implements a 2:1 current multiplexer, as the current applied to their emitter is steered to one side according to the input value.

Let us consider an $n$-variable logic function $F\left(X_{1} \ldots X_{n}\right)$ of input signals $X_{1} \ldots X_{n}$, with $X_{1}$ being applied to transistors belonging to the first level, $X_{2}$ being applied to the second level, and so on. Function $F$ can be minimized through the VEM technique [MKA92], [C95], which starts from its truth table with $X_{n}$ placed into the leftmost column followed to the right by variables $X_{n-1} \ldots X_{1}$. The simplification technique iteratively eliminates variable $X_{j}$ with $j=1 \ldots n$ by grouping contiguous rows in pairs and lumping each pair into a single row, whose output entry expresses the dependence on $X_{j}$. Accordingly, after the $j$-th step the number of rows of the truth table is reduced by a factor of $2^{j}$ through suppression of $X_{1} \ldots X_{j}$. After $n$ steps, the VEM technique leads to a single-row truth table, i.e. the expression of the minimized function $F$ (and $\bar{F}$ ).

As an example, let us minimize function $F$ in (3.4) by starting from its truth table in Table 3.1, which has to be rewritten in order to place variable $X_{n}$ into the leftmost column and $X_{1}$ to the rightmost one, as reported in Table 3.2a.

TABLE 3.2a

| $X_{4} X_{3} X_{2} X_{1}$ | $F\left(X_{1}, X_{2}, X_{3}, X_{4}\right)$ | $\overline{F\left(X_{1}, X_{2}, X_{3}, X_{4}\right)}$ |
| :---: | :---: | :---: |
| 0000 | 0 | 1 |
| 0001 | 0 | 1 |
| 0010 | 0 | 1 |
| 0011 | 1 | 0 |
| 0100 | 1 | 0 |
| 0101 | 0 | 1 |
| 0110 | 0 | 1 |
| 0111 | 1 | 0 |
| 1000 | 0 | 1 |
| 1001 | 1 | 0 |
| 1010 | 0 | 1 |
| 1011 | 1 | 0 |
| 1100 | 0 | 1 |
| 1101 | 0 | 1 |
| 1110 | 0 | 1 |
| 1111 | 1 | 0 |

The elimination of $X_{1}$ is achieved by grouping pairs of contiguous entries differing only in the value of $X_{1}$ into a single one. For example, the first two rows in Table 3.2a refer to input values 0000 and 0001, which differ in the value of $X_{1}$, equal to 0 and 1, respectively. Since the output is equal to 0 in both cases, the two rows can be simplified into the first row of the reduced truth table (i.e. with only input variables $X_{4}, X_{3}, X_{2}$ ) in Table 3.2b, where the output is equal to 0. The third and fourth rows in Table 3.2a, which refer to input values 0010 and 0011 again differing only in $X_{1}$, can be simplified into the second row in Table 3.2b, where the output has the same value as $X_{1}$ for both input values. By following the same reasoning, reduction of the input variable $X_{1}$ leads to the truth table in Table 3.2b, which also reports the simplification of function $\bar{F}$. Although in this case its results are exactly the complement of those obtained for function $F$, this is not true in general for the following steps, and hence the two functions will be minimized separately.

TABLE 3.2b

| $X_{4} X_{3} X_{2}$ | $F\left(X_{2}, X_{3}, X_{4}\right)$ | $\overline{F\left(X_{2}, X_{3}, X_{4}\right)}$ |
| :---: | :---: | :---: |
| 000 | 0 | 1 |
| 001 | $X_{1}$ | $\overline{X_{1}}$ |
| 010 | $\overline{X_{1}}$ | $X_{1}$ |
| 011 | $X_{1}$ | $\overline{X_{1}}$ |
| 100 | $X_{1}$ | $\overline{X_{1}}$ |
| 101 | $X_{1}$ | $\overline{X_{1}}$ |
| 110 | 0 | 1 |
| 111 | $X_{1}$ | $\overline{X_{1}}$ |

Again, Table 3.2b can be reduced by eliminating input variable $X_{2}$, i.e. by grouping consecutive rows differing only in the value of $X_{2}$ into pairs and simplifying each of them into a single row. For example, the first two rows refer to input values 000 and 001, which differ in the value of $X_{2}$. In the two cases, the output is equal to 1 only if both $X_{1}$ and $X_{2}$ are equal to 1, thus the two rows can be simplified into the first one in Table 3.2c, which also reports the complete truth table after elimination of $X_{2}$, as well as that of function $\bar{F}$, which is not the complement of function $F$.${ }^{3}$

${ }^{3}$ As observed before, these results are not exactly the complement of those obtained for function $F$. To be more specific, analytical results of $\bar{F}$ would be exactly the complement of those found for function $F$ if property $A+\bar{A} B=A+B$ were used. However, this simplification must not be introduced in VEMs, since it no longer allows for minimizing the MUX-based (and hence series-gate) implementation.

The truth table of $F$ and $\bar{F}$ obtained after eliminating variable $X_{3}$ from Table 3.2c is reported in Table 3.2d. Then, the expressions of functions $F$ and $\bar{F}$ are derived from simple inspection of Table 3.2d:

$$
\begin{align*}
F & =\left[X_{1} X_{2} \overline{X_{3}}+\left(\overline{X_{1}}\, \overline{X_{2}}+X_{1} X_{2}\right) X_{3}\right] \overline{X_{4}}+\left[X_{1} \overline{X_{3}}+X_{1} X_{2} X_{3}\right] X_{4}  \tag{3.14a}\\
\bar{F} & =\left[\left(\overline{X_{1}} X_{2}+\overline{X_{2}}\right) \overline{X_{3}}+\left(X_{1} \overline{X_{2}}+\overline{X_{1}} X_{2}\right) X_{3}\right] \overline{X_{4}} \\
& +\left[\overline{X_{1}}\, \overline{X_{3}}+\left(\overline{X_{1}} X_{2}+\overline{X_{2}}\right) X_{3}\right] X_{4} \tag{3.14b}
\end{align*}
$$

TABLE 3.2c

| $X_{4} X_{3}$ | $F\left(X_{3}, X_{4}\right)$ | $\overline{F\left(X_{3}, X_{4}\right)}$ |
| :---: | :---: | :---: |
| 00 | $X_{1} X_{2}$ | $\overline{X_{1}} X_{2}+\overline{X_{2}}$ |
| 01 | $\overline{X_{1}}\, \overline{X_{2}}+X_{1} X_{2}$ | $X_{1} \overline{X_{2}}+\overline{X_{1}} X_{2}$ |
| 10 | $X_{1}$ | $\overline{X_{1}}$ |
| 11 | $X_{1} X_{2}$ | $\overline{X_{1}} X_{2}+\overline{X_{2}}$ |

TABLE 3.2d

| $X_{4}$ | $F\left(X_{4}\right)$ | $\overline{F\left(X_{4}\right)}$ |
| :---: | :---: | :---: |
| 0 | $X_{1} X_{2} \overline{X_{3}}+\left(\overline{X_{1}}\, \overline{X_{2}}+X_{1} X_{2}\right) X_{3}$ | $\left(\overline{X_{1}} X_{2}+\overline{X_{2}}\right) \overline{X_{3}}+\left(X_{1} \overline{X_{2}}+\overline{X_{1}} X_{2}\right) X_{3}$ |
| 1 | $X_{1} \overline{X_{3}}+X_{1} X_{2} X_{3}$ | $\overline{X_{1}}\, \overline{X_{3}}+\left(\overline{X_{1}} X_{2}+\overline{X_{2}}\right) X_{3}$ |
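
The row-lumping steps that lead from Table 3.2a to Table 3.2d can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure of [MKA92]: the names `merge`, `vem_reduce` and `evaluate`, as well as the tuple encoding of the entries, are assumptions introduced here. Each pair of contiguous rows is lumped into a constant, a literal, a complemented literal or a 2:1 multiplexer entry, and the final single-row expression is checked against the original truth table.

```python
# Truth table of F from Table 3.2a; the row index is the binary number X4 X3 X2 X1.
F_ROWS = [0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1]


def merge(var, lo, hi):
    """Lump the rows with var = 0 (lo) and var = 1 (hi) into a single entry."""
    if lo == hi:
        return lo                    # the output does not depend on var
    if lo == 0 and hi == 1:
        return var                   # the entry is the literal itself
    if lo == 1 and hi == 0:
        return ("not", var)          # the entry is the complemented literal
    return ("mux", var, lo, hi)      # lo * not(var) + hi * var


def vem_reduce(rows, n):
    """Return the reduced tables obtained after eliminating X1, X2, ..., Xn."""
    tables, entries = [], list(rows)
    for j in range(1, n + 1):
        var = f"X{j}"
        entries = [merge(var, entries[k], entries[k + 1])
                   for k in range(0, len(entries), 2)]
        tables.append(entries)
    return tables


def evaluate(expr, env):
    """Evaluate a VEM entry for the 0/1 assignment stored in env."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    if expr[0] == "not":
        return 1 - env[expr[1]]
    _, var, lo, hi = expr
    return evaluate(hi if env[var] else lo, env)


tables = vem_reduce(F_ROWS, 4)
minimized = tables[-1][0]            # single-row table, i.e. the minimized F
print(tables[0])                     # entries of the F column of Table 3.2b
```

The first reduced table reproduces the $F$ column of Table 3.2b, and the nested multiplexer entries of the last step correspond to the bracketed structure of (3.14a).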

The series gate topology implementing the function in (3.14) can be identified by following the procedure discussed in Section 3.3.3. It is worth noting that function (3.14) is written by first collecting terms including literal $x_{4}$, then $x_{3}$, etc., while in Section 3.3.3, function $F$ (or $\bar{F}$ ) was written by first collecting terms including literal $x_{1}$, then $x_{2}$ and so on. Therefore, function (3.14) must be first expanded in a sum-of-product form and then rewritten by first collecting terms containing $x_{1}$, then $x_{2}$ and so on. In this example, manipulation of function $F$ in (3.14a) leads to the same expression given in (3.10), thus the series gate topology obtained with the VEM minimization is equal to that obtained by applying CPE/NPE in Figs. 3.9e and 3.10. It can also be shown that the same topology is obtained by minimizing $\bar{F}$, rather than $F$. Actually, the equivalence of topologies obtained by resorting to CPE/NPE and VEM is general and is justified in the following.

The equivalence of the VEM-based minimization technique and that presented in Sections 3.3-3.4 can be understood by noting that, as was shown in the previous example, the simplified function $F(\bar{F})$ after VEM minimization is in the form

$$
\begin{equation*}
F=F\left(X_{1}, X_{2}, X_{3}, 0\right) \overline{X_{4}}+F\left(X_{1}, X_{2}, X_{3}, 1\right) X_{4} \tag{3.15a}
\end{equation*}
$$

where $F\left(X_{1}, X_{2}, X_{3}, 0\right)$ (as well as $\left.F\left(X_{1}, X_{2}, X_{3}, 1\right)\right)$ can be written again in the form

$$
\begin{equation*}
F\left(X_{1}, X_{2}, X_{3}, 0\right)=F\left(X_{1}, X_{2}, 0,0\right) \overline{X_{3}}+F\left(X_{1}, X_{2}, 1,0\right) X_{3} \tag{3.15b}
\end{equation*}
$$

Generalizing the result, all terms in each bracket are collected into two terms including factor $X_{j}$ and $\overline{X_{j}}$, which is the same operation in (3.7) performed by NPEj. To be more specific, it is easy to verify that when applying NPE starting from the bottom level and ending at the top level, NPEj leads to the same analytical simplifications as the $j$-th step of the VEM-based minimization. Furthermore, CPEj is intrinsically performed at the $j$-th step of the VEM minimization, since terms differing only in literal $x_{j}$ are lumped into a single term, which is equivalent${ }^{4}$ to performing the simplification in (3.6).

### 3.5 INPUT ORDERING VERSUS DESIGN GOAL

In the previous sections, minimization methods to reduce the transistor count in a series gate have been discussed by adopting a given input ordering, i.e. with input $X_{1}$ driving transistors belonging to the top level, $X_{2}$ driving those of the immediately lower level and so on. However, in general, in an $n$-variable function $F\left(X_{1} \ldots X_{n}\right)$ there are $n!$ possible input orderings, which are reduced to $n!/k!$ if this function is symmetric in $k$ variables [MKA92]. In the following, criteria to identify the most convenient input ordering for a design aiming at high speed, minimum transistor count, low switching noise or a reduced number of series-gate levels are discussed.

Let us consider the case of a high-speed design, such as that of a specific gate lying in the critical path. In this case, it is useful to observe that in general the gate delay depends on the logic level to which the switching input is applied. Moreover, the gate input signals do not arrive at the same time, since they are generated by previous gates lying in different paths. To improve the speed performance, the gate output signal OUT must switch as soon as possible, therefore the latest arriving input signal $X_{\text {latest }}$ must be applied to the gate input that exhibits minimum delay. In the specific case of Current-Mode circuits, as will be shown in Chapters 4 and 6, the delay is minimum for inputs applied to the top level of the series gate topology. Accordingly, the inputs $X_{1} \ldots X_{n}$ must be reordered to connect the latest arriving signal to the top level and the progressively earlier signals to lower levels.
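
The high-speed criterion amounts to sorting the inputs by arrival time. A minimal sketch follows, in which the function name `order_by_arrival`, the signal names and the arrival times are hypothetical values chosen only for illustration.

```python
def order_by_arrival(arrival_times):
    """Return the input names sorted from the latest to the earliest arrival.

    The first name is meant to drive the top level of the series gate,
    the last one the bottom level.
    """
    return sorted(arrival_times, key=arrival_times.get, reverse=True)


# Hypothetical arrival times (in picoseconds) of three gate inputs.
arrivals = {"A": 120.0, "B": 310.0, "C": 45.0}
levels = order_by_arrival(arrivals)
print(levels)  # top level first
```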

When minimizing the transistor count is the main target, it is useful to observe that it depends on the input ordering [MKA92]. Therefore, the most convenient input ordering is found by exhaustively minimizing function $F\left(X_{1} \ldots X_{n}\right)$ for all possible different orderings, and then selecting that leading to the lowest number of transistors. In such cases, the comparison can be easily automated to reduce the design effort.
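
The exhaustive comparison can be automated along the following lines. This is a sketch under a stated assumption: the number of distinct non-constant entries generated by the row-lumping reduction is used as a proxy for the number of transistor pairs. For the ordering adopted in Section 3.4 this proxy gives 7, which agrees with the seven pairs Q1-Q2 to Q13-Q14 mentioned for Fig. 3.10, but the proxy is not taken from [MKA92]. The names `merge`, `count_pairs`, `reorder` and `best_ordering` are introduced here for illustration.

```python
from itertools import permutations

# Truth table of F from Table 3.2a; the row index is the binary number X4 X3 X2 X1.
F_ROWS = [0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1]


def merge(var, lo, hi):
    """Lump the rows with var = 0 (lo) and var = 1 (hi) into a single entry."""
    if lo == hi:
        return lo
    if lo == 0 and hi == 1:
        return var
    if lo == 1 and hi == 0:
        return ("not", var)
    return ("mux", var, lo, hi)


def count_pairs(rows, n):
    """Count the distinct non-constant entries met during the reduction
    (assumed proxy for the number of transistor pairs)."""
    nodes, entries = set(), list(rows)
    for level in range(1, n + 1):
        var = f"L{level}"            # variable assigned to this level
        entries = [merge(var, entries[k], entries[k + 1])
                   for k in range(0, len(entries), 2)]
        nodes.update(e for e in entries if not isinstance(e, int))
    return len(nodes)


def reorder(rows, order):
    """Rebuild the truth table with variable X_order[0] at level 1,
    X_order[1] at level 2, and so on."""
    n = len(order)
    table = []
    for idx in range(2 ** n):
        # Bit `level` of idx is the value of the variable placed at that level.
        original = sum(((idx >> level) & 1) << (order[level] - 1)
                       for level in range(n))
        table.append(rows[original])
    return table


def best_ordering(rows, n):
    """Exhaustively search all n! orderings for the lowest pair count."""
    return min((count_pairs(reorder(rows, p), n), p)
               for p in permutations(range(1, n + 1)))


reference = count_pairs(F_ROWS, 4)   # ordering X1 (top) ... X4 (bottom)
best_count, best_order = best_ordering(F_ROWS, 4)
print(reference, best_count, best_order)
```

Since the search grows as $n!$, it is practical only for the small fan-in typical of series gates.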

Another design option, namely the minimization of the switching noise, is an important target in practical applications involving high-resolution CMOS ICs, as discussed in Section 2.6.3. In these cases, the supply current can be kept almost constant by using Current-Mode gates, whose supply current is approximately their bias current $I_{S S}$. To understand the fundamental criteria to select the best input ordering which meets this goal, let us analyze how the switching noise is produced in CMOS Current-Mode circuits by considering the SCL inverter in Fig. 3.11, where capacitance $C_{V D D 1}\left(C_{V D D 2}\right)$ is the sum of capacitances between the output node $v_{o 1}\left(v_{o 2}\right)$ and the supply, whereas $C_{G N D 1}\left(C_{G N D 2}\right)$ is the sum of capacitances between the output node $v_{o 1}\left(v_{o 2}\right)$ and ground. Detailed calculation of parasitic capacitances will be developed in Chapter 5, and is not necessary to understand the following considerations.

Without loss of generality, let us assume an abrupt input switching of $v_{i n 1}$ $\left(v_{i n 2}\right)$ from low to high (high to low), which determines an output switching of $v_{o 1}\left(v_{o 2}\right)$ from high to low (low to high). During the switching transient, the ground current changes from its steady-state value $I_{S S}$ due to the additive current contributions $i_{C G N D 1}$ and $i_{C G N D 2}$ flowing through capacitances $C_{G N D 1}$ and $C_{G N D 2}$. Being equal to the ground current, the supply current has a time-varying component equal to $i_{C G N D 1}+i_{C G N D 2}$, whose peak amplitude $i_{V D D}$ must be kept within assigned bounds to avoid excessive switching noise. As will be demonstrated in Chapter 5, output voltages $v_{o 1}$ and $v_{o 2}$ have an exponential waveform with a time constant $R_{D} \cdot C_{\text {out }}$, where $C_{\text {out }}$ is the overall capacitance at each node and $R_{D}$ is the equivalent PMOS resistance evaluated in Section 2.4.1. Since the overall capacitance at the output nodes $v_{o 1}$ and $v_{o 2}$ is respectively $\left(C_{V D D 1}+C_{G N D 1}\right)$ and $\left(C_{V D D 2}+C_{G N D 2}\right)$, the two output voltages result as

$$
\begin{align*}
& v_{o 1}(t)=V_{D D}-V_{S W I N G}\left(1-e^{-\frac{t}{R_{D}\left(C_{V D D 1}+C_{G N D 1}\right)}}\right)  \tag{3.16a}\\
& v_{o 2}(t)=V_{D D}-V_{S W I N G}\, e^{-\frac{t}{R_{D}\left(C_{V D D 2}+C_{G N D 2}\right)}} \tag{3.16b}
\end{align*}
$$



Fig. 3.11. Circuit for the evaluation of the switching noise in a Source-Coupled inverter gate.

According to (3.16), the maximum amplitude $i_{V D D}$ of the time-varying contribution $i_{C G N D 1}+i_{C G N D 2}$ of the supply current is given by its value for $t=0$

$$
i_{V D D}=i_{C G N D 1}(0)+i_{C G N D 2}(0)=\left.C_{G N D 1} \frac{\partial v_{o 1}}{\partial t}\right|_{t=0}+\left.C_{G N D 2} \frac{\partial v_{o 2}}{\partial t}\right|_{t=0}
$$

$$
\begin{align*}
& =\frac{V_{S W I N G}}{R_{D}}\left[\frac{C_{G N D 1}}{C_{V D D 1}+C_{G N D 1}}-\frac{C_{G N D 2}}{C_{V D D 2}+C_{G N D 2}}\right]  \tag{3.17}\\
& =I_{S S}\left[\frac{C_{G N D 1}}{C_{V D D 1}+C_{G N D 1}}-\frac{C_{G N D 2}}{C_{V D D 2}+C_{G N D 2}}\right]
\end{align*}
$$

which is proportional to the value of the bias current, and can be heavily reduced by matching the two capacitive terms with opposite sign in (3.17), i.e. by equalizing capacitances at the two output nodes. Except for the matching inaccuracy due to process tolerances, the capacitances at the two output nodes are equal in an SCL inverter gate. Indeed, it is symmetrical and drives symmetrical (differential) gates, thus parasitic transistor capacitances and gate input capacitances at the two output nodes are equal.
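
To get a feeling for (3.17), the expression can be evaluated numerically. The sketch below simply codes the last line of (3.17); the function name `supply_current_peak` and the bias current and capacitance values are hypothetical, chosen only to contrast a matched and a mismatched case.

```python
def supply_current_peak(i_ss, c_vdd1, c_gnd1, c_vdd2, c_gnd2):
    """Peak of the time-varying supply current component according to (3.17)."""
    return i_ss * (c_gnd1 / (c_vdd1 + c_gnd1) - c_gnd2 / (c_vdd2 + c_gnd2))


I_SS = 100e-6  # hypothetical bias current, in amperes

# Perfectly matched output nodes: the two capacitive terms cancel.
matched = supply_current_peak(I_SS, 10e-15, 10e-15, 10e-15, 10e-15)

# Hypothetical 20% mismatch on C_GND1 only.
mismatched = supply_current_peak(I_SS, 10e-15, 12e-15, 10e-15, 10e-15)

print(matched, mismatched)
```

In the mismatched case the result is a few percent of $I_{S S}$, which illustrates why balancing the capacitances at the two output nodes is effective.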

For more complex gates, the considerations introduced above still hold, and the switching noise is again minimized by matching the capacitances at the two output nodes. However, as opposed to the simple case of the inverter gate, in general the parasitic capacitances at the output nodes are different, since the number of transistors connected to one output node differs from that connected to the other. For example, the circuit in Fig. 3.9e has three transistors connected to output node $v_{o}$ and two transistors to the other. Since the number of transistors connected to each output node depends on the input ordering, the switching noise in SCL gates is minimized by considering all possible input orderings and selecting that leading to the most balanced number of branches connected to the output nodes [MKA92], [ACK93].

Finally, it is interesting to observe that in some specific cases the number of series-gate levels in an $n$-input gate can be reduced with respect to $n$ by properly selecting the input ordering, which allows for reducing the supply voltage (see Section 2.5.3). In particular, let us consider the transistor pairs belonging to two contiguous levels, the $j$-th level associated with input $X_{j}$ and the $(j+1)$-th immediately lower one associated with $X_{j+1}$. In general, transistors lying at the $(j+1)$-th level cannot be driven by signals generated for the stacked $j$-th level, but a voltage level downshifting is needed to avoid operation in the triode region. However, if transistors driven by $X_{j+1}$ are placed into the same series-gate level as transistors driven by $X_{j}$ (i.e. they are both driven by signals generated for the $j$-th level), correct operation is still guaranteed in the particular case where transistors driven by $X_{j}$ are never stacked to transistors driven by $X_{j+1}$. Equivalently, the number of series-gate levels can be reduced by unity if all possible active paths contain either transistors driven by $X_{j}$ or transistors driven by $X_{j+1}$. From an analytical point of view, this occurs when all products of the minimized function $F\left(X_{1} \ldots X_{n}\right)$ written in the sum-of-product form alternatively contain either factor $x_{j}$ or
$x_{j+1}$. As an example, the MUX gate in Fig. 2.15 has 3 inputs but only 2 series-gate levels, since in its expression

$$
\begin{equation*}
O U T(A, B, \Phi)=\Phi A+\bar{\Phi} B \tag{3.18}
\end{equation*}
$$

each product contains either $A$ or $B$.

Summarizing, if there is a pair of input variables which are never included in the same product in the minimized function $F\left(X_{1}, \ldots, X_{n}\right)$, a reduction in the number of levels is allowed, provided that the two input variables are associated with contiguous levels. Accordingly, in such cases an input ordering with contiguous inputs $x_{j}$ and $x_{j+1}$ must be adopted. Further reduction in the number of levels is allowed if there are several pairs of input variables meeting the above requirements, and also in this case the two signals of each pair must be assigned to contiguous inputs.
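
The condition summarized above is easy to check automatically once the minimized function is available in sum-of-product form. In the sketch below, each product is represented by the set of variables it contains (polarity is ignored); this representation and the function name `shareable_pairs` are assumptions made for illustration.

```python
from itertools import combinations


def shareable_pairs(products):
    """Return the pairs of variables that never appear in the same product.

    `products` is an iterable of sets of variable names, one set per
    product of the minimized sum-of-product expression.
    """
    products = [frozenset(p) for p in products]
    variables = sorted(set().union(*products))
    return [(a, b) for a, b in combinations(variables, 2)
            if not any(a in p and b in p for p in products)]


# MUX gate of (3.18): OUT = PHI*A + not(PHI)*B
mux = [{"PHI", "A"}, {"PHI", "B"}]
print(shareable_pairs(mux))
```

For the MUX gate the only pair returned is $(A, B)$, consistent with the two-level implementation of a three-input gate.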

## Chapter 4

## MODELING OF BIPOLAR CURRENT-MODE GATES

In this chapter, the delay modeling of CML and ECL gates is addressed. After briefly reviewing the previously proposed models, a more efficient approach is presented and applied to practical circuits.

### 4.1 INTRODUCTION TO MODELING METHODOLOGIES

Various approaches have been proposed in the literature to determine the delay expression of CML and ECL circuits. The most significant modeling approaches are based on the sensitivity analysis [TS79], [CBA88], [F90], [FBA90], the average branch current [KB92], [YC92], and the circuit linearization [SE94], [H95], [SE96]. Moreover, other strategies can be found in [GMO90].

The approach based on the sensitivity analysis was proposed in [TS79] in a simple form, and reused in several other works such as [CBA88] for the CML and ECL inverter, in [F90] for the ECL inverter and in [FBA90] for the ECL XOR gate with some approximations. By using the sensitivity analysis, the propagation delay of CML and ECL gates is represented as a weighted sum of all circuit time constants and transit times of transistors, according to

$$
\begin{equation*}
\tau_{P D}=\sum_{i=1}^{N} \sum_{j=1}^{M} k_{i j} R_{i} C_{j}+\sum_{l=1}^{P} k_{l} \tau_{F_{l}} \tag{4.1}
\end{equation*}
$$

where $N, M$ and $P$ are the number of resistors, capacitors and transistors, respectively, that are included in the half circuit (due to the symmetry, only
the half circuit can be used). The weighting factors $k_{i j}$ in relationship (4.1) are the key parameters of the sensitivity analysis, and are determined by observing that

$$
\begin{equation*}
k_{i j}=\frac{\partial^{2} \tau_{P D}}{\partial R_{i} \partial C_{j}} \tag{4.2}
\end{equation*}
$$

where the derivative of delay $\tau_{P D}$ can be numerically evaluated by determining the delay change due to the increment of both $R_{i}$ and $C_{j}$, through four circuit simulations.

The propagation delay is expressed as the sum of $(N \cdot M+P)$ terms. As an example, the simple case of a CML inverter has $N=4$ and $M=6$, thus leading to 24 terms [CBA88] and 96 simulations (one term was eliminated using circuit considerations).
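
The numerical evaluation of (4.2) can be illustrated with a finite-difference estimate of the mixed derivative, which requires four delay evaluations per coefficient. In the sketch below the circuit simulator is replaced by a hypothetical closed-form delay function (`toy_delay`), so the code shows only the bookkeeping, not a real simulation flow; all names and numerical values are assumptions.

```python
def mixed_sensitivity(delay, r, c, dr, dc):
    """Estimate d2(delay)/(dR dC) from four delay evaluations."""
    return (delay(r + dr, c + dc) - delay(r + dr, c)
            - delay(r, c + dc) + delay(r, c)) / (dr * dc)


def toy_delay(r, c):
    """Hypothetical stand-in for a circuit simulation: one RC time constant."""
    return 0.69 * r * c


# Hypothetical nominal values and increments.
k = mixed_sensitivity(toy_delay, r=1e3, c=1e-13, dr=10.0, dc=1e-15)

n_terms = 4 * 6               # N * M time-constant terms of the CML inverter
n_simulations = 4 * n_terms   # four simulations per coefficient
print(k, n_terms, n_simulations)
```

With four evaluations for each of the 24 coefficients, the count of 96 simulations quoted above is recovered.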

The large number of simulations is the main drawback of the method. Besides, according to our experience and that of other authors, this approach is not independent of process parameters. Indeed, time constant factors $k_{i j}$ and $k_{l}$ have to be evaluated each time the technology changes (in fact, all papers have different coefficients). In addition, a model with such a high number of terms is not useful either to really understand the behavior of the gate or for an initial pencil-and-paper design.

To partially overcome these limitations, a simplification in the number of terms resulting from the sensitivity analysis was proposed in [FBA90]. The simplification is performed by neglecting the less significant terms by comparing the time constant values for the specific process used. Although this allows for reducing the number of terms to 10 for the case of the CML inverter gate, the procedure is not general, since it depends on the technology used. Moreover, the simplification in [FBA90] does not reduce the number of simulations required, since it must be performed after simulation.

The average branch current analysis was introduced in [KB92] and [YC92], and subsequently used in [SE96]. In this analysis strategy, the charge control transistor model is used to derive the differential equations describing the logic gates, which are solved by approximating currents with their mean value. This strategy was used to analytically model the propagation delay associated with the emitter follower in ECL circuits. By means of the average branch current analysis, a simple delay expression was derived in [KB92], and a quite similar expression was subsequently obtained in [SE96] by slightly different calculations. In [YC92] the delay is expressed by multiple expressions (whose validity depends on the logic swing value), which are much more complex than those in [KB92] and [SE96] and are thus of little practical use. Although the emitter follower delay model in [KB92] and [SE96] is simple, it does not take into account the dependence of delay on the bias current, which is significant in practical current ranges, as will be shown in Section 4.2.

The approach to the delay modeling presented in [SE94], [H95], [SE96], is based on the linearization of the device models. In particular, [SE94] proposes a CML delay model written as the superposition of the delay contributions associated with capacitances, justifying it by the linear dependence of delay on the time constants which is demonstrated by the sensitivity analysis strategy. Each term is found by solving in the time domain the circuit including only one capacitor at a time, but no closed-form expression is found. Indeed, multiple expressions are derived for each delay contribution, whose validity depends on the input waveform rise time. This makes the procedure in [SE94] not so appealing for modeling and design purposes, especially for a pencil-and-paper approach. In addition, the model is not guaranteed to be process-independent, since an empirical correction is introduced in the final delay expression to improve fitting with simulations.

In [SE96], a similar approach is used to model the delay of CML and ECL series gates. However, some assumptions are not very reasonable. For example, small-signal expressions are used to model the transistor capacitances, and the delay associated with them is weighted in a different manner with respect to that of the load capacitance. This explains the low accuracy of the model, which may differ from simulation results by as much as $50 \%$, according to the authors' experience.

The approach followed in [H95] starts from the linearization of the CML inverter gate by replacing transistors by their small-signal model. Subsequently, the delay is represented as the superposition of the delay contributions associated with each single capacitance, as justified by Elmore's theory [E48]. The resulting expression is very simple, but there are some weak points in the delay evaluation strategy. For example, the base-emitter capacitance of the bipolar transistor is evaluated by means of its small-signal expression, in contrast with others [TS79], [SE96], [R96] that introduce a better linearized form. In addition, the base-collector capacitance is not split into the intrinsic and extrinsic contributions, as will be discussed below in regard to SPICE models. For these reasons, a high inaccuracy (as high as $60 \%$ ) of the model by Harada can be shown by comparison with simulations (the real inaccuracy was not highlighted in the original paper since only two simulations were performed).

Other modeling methodologies were proposed in the literature [GMO90], [BK95a], but they are much more complex and are not expressed in a closed form, therefore they are less suitable for design than those discussed above.

It is worth noting that, among the approaches described above, only the sensitivity analysis is capable of modeling the delay of both CML and ECL gates. Indeed, the average branch current analysis was only used to evaluate the delay of the emitter follower in ECL gates, while the other methodologies can only be applied to CML gates. Moreover, none of the discussed modeling approaches is simple, general and accurate enough to be helpful in managing the design trade-offs in CML and ECL gates. For these reasons, the authors developed a novel strategy, presented in the following, that overcomes these limitations [AP99].

### 4.2 AN EFFICIENT APPROACH FOR CML GATES

The authors' approach starts from the observation that, in a bipolar Current-Mode gate, the transistors operate in the linear region, as saturation is avoided to allow high-speed performance. This suggests that transistors can be modeled by the small-signal model valid in the linear region, that has to be properly evaluated to account for a wide variation of currents and voltages. In the following, we consider the small-signal transistor SPICE model reported in Fig. 4.1.


Fig. 4.1. SPICE small-signal model of the bipolar transistor.

The small-signal parameters $g_{m}, r_{\pi}$ and $r_{o}$ in Fig. 4.1 are the transconductance, the base-emitter resistance and the output resistance of the transistor. Due to the symmetry of the input-output DC characteristics, it is reasonable to linearize the circuit around the logic threshold $v_{d}=0$, or equivalently with the bias current equally divided into the two output branches. Under this assumption, the usual expressions $g_{m}=I_{S S} / 2 V_{T}$ and $r_{\pi}=2 \beta_{F} V_{T} / I_{S S}$ hold for the transistors driven by the switching input. Moreover,
the transistor output resistance $r_{o}$ can be always neglected in practical cases, since it is much higher than the resistance $R_{C}$ seen by the collector to ground.
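
The small-signal parameters at the logic threshold follow directly from the expressions above. A minimal numerical sketch is given below; the function name `small_signal`, the default thermal voltage and the bias values are assumptions for illustration only.

```python
def small_signal(i_ss, beta_f, v_t=0.02585):
    """Return (g_m, r_pi) for a transistor carrying I_SS / 2.

    i_ss   : bias current of the gate, in amperes
    beta_f : forward current gain
    v_t    : thermal voltage in volts (assumed room-temperature value)
    """
    g_m = i_ss / (2.0 * v_t)
    r_pi = 2.0 * beta_f * v_t / i_ss
    return g_m, r_pi


# Hypothetical bias point: I_SS = 1 mA, beta_F = 100.
g_m, r_pi = small_signal(1e-3, 100.0)
print(g_m, r_pi)
```

As a sanity check, the product $g_{m} r_{\pi}$ equals $\beta_{F}$ regardless of the bias current.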

The parasitic resistances $r_{b}, r_{e}$ and $r_{c}$ model the resistance associated with base, emitter and collector diffusion. The distributed base-collector junction capacitance is accounted for by two lumped capacitances connected to the outer and the inner base node, $C_{b c x}$ and $C_{b c i}$, that are the extrinsic and intrinsic contribution. To be more specific, the total base-collector capacitance $C_{b c}$ is split into $C_{b c x}$ and $C_{b c i}$ according to the model parameter $X C J C$ that ranges between 0 and 1:

$$
\begin{align*}
C_{b c x} & =(1-X C J C) \cdot C_{b c}  \tag{4.3a}\\
C_{b c i} & =X C J C \cdot C_{b c} \tag{4.3b}
\end{align*}
$$

The base-emitter capacitance $C_{b e}$ consists of the diffusion and junction contributions, $C_{D}$ and $C_{j e}$, while $C_{c s}$ models the collector-substrate junction capacitance. It is worth noting that the well-known small-signal capacitance expressions are inadequate and the capacitances have to be properly linearized, since voltages move rapidly over a wide range.

Once the transistors are replaced by the model in Fig. 4.1, the equivalent linear circuit of a gate can be further simplified. Indeed, by observing that the circuit is generally symmetric and assuming differential signaling (as in practical cases, as discussed in Chapter 2), analysis can be limited to its half circuit.

To simplify the delay evaluation, the linearized half circuit can be approximated by a single-pole system with time constant $\tau$, that can be evaluated by means of the open-circuit time constant method [CG73], [MG87]. By assuming a step input waveform ${ }^{1}$, the delay $\tau_{P D}$ can be expressed as

$$
\begin{equation*}
\tau_{P D}=0.69 \tau=0.69 \sum_{i=1}^{M} R_{C i} C_{i} \tag{4.4}
\end{equation*}
$$

where $R_{C i}$ is the resistance seen by capacitance $C_{i}$ when the others are considered open circuits. It is worth noting that the assumption of single-pole behavior has led to a linear dependence of the delay on the time constants, which is consistent with observations reported in the literature [CBA88], [F90], [FBA90], [SE94], [SE96].

From inspection of relationship (4.4), the approach followed makes it possible to represent the propagation delay with very few terms, whose number is equal to that of the circuit capacitances. Moreover, each term has an evident physical meaning, since it is the time constant associated with the related capacitance [H97].
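
Relationship (4.4) translates directly into a one-line computation once the open-circuit resistance seen by each capacitance is known. The sketch below uses hypothetical resistance and capacitance values, and the function name `cml_delay` is introduced only for illustration; the actual expressions of $R_{C i}$ depend on the gate and are not reproduced here.

```python
def cml_delay(time_constants):
    """Propagation delay from (4.4): 0.69 times the sum of R_Ci * C_i.

    `time_constants` is an iterable of (R_Ci, C_i) pairs in ohms and farads.
    """
    return 0.69 * sum(r * c for r, c in time_constants)


# Hypothetical open-circuit resistances and capacitances of a half circuit.
contributions = [
    (500.0, 20e-15),   # e.g. a capacitance at the output node
    (200.0, 50e-15),   # e.g. a capacitance at the base node
    (500.0, 30e-15),   # e.g. the load capacitance
]
tau_pd = cml_delay(contributions)
print(tau_pd)  # delay in seconds
```

Because each term is kept separate, the dominant contribution can be identified by inspecting the individual products.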

The general delay expression of CML gates (4.4) provides a better insight into the dependence of delay on design parameters (i.e., the bias current $I_{S S}$ and the load resistance $R_{C}$ ) and process parameters. This dependence will be made more explicit for various gates in the next sections. In addition, the dependence on the bias current will be carefully analyzed to develop design criteria to consciously manage the power-delay trade-off.

### 4.3 SIMPLE MODELING OF THE CML INVERTER

The CML inverter topology is shown in Fig. 4.2, where the capacitance $C_{L}$ models the external load due to the input capacitance of driven gates (or a generic output load) and the wiring capacitance. Its delay can be modeled by applying the strategy described in the previous sections, by replacing each transistor by the linear circuit in Fig. 4.1 and limiting analysis to the half circuit. Therefore, the equivalent circuit in Fig. 4.3 of a CML inverter is obtained, where the transistor output resistance $r_{o}$ is neglected since it is usually much higher than the load resistance $R_{C}$.

By linearizing the circuit around the logic threshold $v_{d}=0$, parameters $g_{m}$ and $r_{\pi}$ in Fig. 4.3 are evaluated as $I_{S S} / 2 V_{T}$ and $2 \beta_{F} V_{T} / I_{S S}$, respectively. The junction capacitances $C_{b c x}, C_{b c i}, C_{j e}$ and $C_{c s}$ must be evaluated by using relationships (1.12)-(1.13), where voltages $V_{1}$ and $V_{2}$ are the minimum and maximum direct junction voltages, and are evaluated at the steady state when a high or a low input value is applied. For the sake of clarity, voltages $V_{1}$ and $V_{2}$ are analytically evaluated in Table 4.1 for the junction capacitances of the circuit in Fig. 4.2. For example, the maximum direct voltage $V_{2}$ seen by $C_{b c x}$ occurs when the base voltage is high (i.e., equal to the ground voltage) and consequently the collector voltage is equal to $-R_{C} I_{S S}=-V_{\text {SWING }} / 2$, thus yielding $V_{2}=V_{\text {SWING }} / 2$. Instead, when the base voltage is low (i.e., equal to $-V_{S W I N G} / 2$ ), the collector voltage is at the ground voltage, leading to $V_{1}=-V_{\text {SWING }} / 2$. Maximum and minimum direct voltages across the other capacitances can be analogously evaluated.

In regard to the capacitance $C_{j e}$, its maximum voltage is equal to the base-emitter voltage $V_{B E, o n}$ reached when the collector current equals the whole bias current $I_{S S}$ (i.e., when a high input voltage is applied)

$$
\begin{equation*}
V_{B E, o n}=V_{T} \ln \frac{I_{S S}}{\alpha_{F} I_{E S}} \tag{4.5}
\end{equation*}
$$

where (1.17) was inverted and the other transistor was assumed to be in cutoff. In relationship (4.5), $\alpha_{F}$ is the common-base current gain in (1.19), and $I_{E S}$ is the saturation current of the base-emitter junction, defined as the product $A_{E} \cdot J_{C S}$ in eq. (1.17). Due to the negligible dependence on the collector current, the maximum voltage $V_{B E, o n}$ can be approximated to a constant value, which is evaluated for an intermediate bias current. The minimum voltage across the base-emitter junction results in $\left(V_{B E, o n}-R_{C} I_{S S}\right)$, since for a low input voltage the transistor base voltage is reduced by $R_{C} I_{S S}$ compared to the value for a high input, whereas the emitter voltage is the same as before (it is set to $-V_{B E, \text { on }}$ by the transistor in the ON state, regardless of the input voltage).


Fig. 4.2. CML inverter.

The base-emitter diffusion capacitance $C_{D}$ is associated with the variation $\Delta Q_{B}$ of the base charge due to minority carriers, which goes from $\tau_{F} I_{S S}$ (when $I_{S S}$ is completely steered to the transistor) to about 0 (when $I_{S S}$ is completely steered to the other transistor). As discussed for $C_{j e}$, this charge variation is due to the base-emitter voltage variation $V_{2}-V_{1}=R_{C} I_{S S}$, therefore capacitance $C_{D}$ can be linearized as the ratio $\Delta Q_{B} / \Delta V_{B E}=\tau_{F} / R_{C}$. However, as pointed out in [R96], this expression is affected by a significant error, and an empirical factor of two has to be introduced, leading to

$$
\begin{equation*}
C_{D}=2 \frac{\tau_{F}}{R_{C}} \tag{4.6}
\end{equation*}
$$
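The two relationships above lend themselves to a quick numerical check. The sketch below evaluates (4.5) and (4.6) under stated assumptions: the thermal voltage is taken as 25.85 mV, $I_{E S}$ is taken equal to the SPICE parameter IS of the BiCMOS transistor, $\alpha_{F}=\beta_{F} /\left(1+\beta_{F}\right)$, and the bias current of 1 mA is only illustrative. The function names are hypothetical.

```python
import math

V_T = 0.02585  # thermal voltage near 300 K, in volts (assumed)


def v_be_on(i_ss, alpha_f, i_es):
    """Base-emitter voltage (4.5) when the transistor carries the whole I_SS."""
    return V_T * math.log(i_ss / (alpha_f * i_es))


def diffusion_capacitance(tau_f, r_c):
    """Linearized diffusion capacitance (4.6), with the empirical factor of two."""
    return 2.0 * tau_f / r_c


beta_f = 100.0
alpha_f = beta_f / (1.0 + beta_f)
i_ss = 1e-3                    # illustrative bias current, in amperes
r_c = 0.5 / (2.0 * i_ss)       # from R_C * I_SS = V_SWING / 2 with V_SWING = 500 mV
v_on = v_be_on(i_ss, alpha_f, 8.91e-18)   # I_ES set to SPICE IS (assumption)
c_d = diffusion_capacitance(26.4e-12, r_c)
print(f"V_BE,on = {v_on:.3f} V, C_D = {c_d * 1e15:.1f} fF")
```

Because $R_{C}$ scales as $1 / I_{S S}$ for a fixed logic swing, the diffusion capacitance obtained in this way grows linearly with the bias current.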

TABLE 4.1

| capacitance | $V_{1}$ | $V_{2}$ |
| :--- | :--- | :--- |
| $C_{b c x}$ | $-V_{S W I N G} / 2$ | $V_{S W I N G} / 2$ |
| $C_{b c i}$ | $-V_{S W I N G} / 2$ | $V_{S W I N G} / 2$ |
| $C_{j e}$ | $V_{T} \cdot \ln \left(I_{S S} / \alpha_{F} I_{E S}\right)-V_{S W I N G} / 2$ | $V_{T} \cdot \ln \left(I_{S S} / \alpha_{F} I_{E S}\right)$ |
| $C_{c s}$ | $-V_{D D}$ | $-V_{D D}+V_{S W I N G} / 2$ |



Fig. 4.3. Equivalent linear circuit of the CML inverter.

By approximating the equivalent circuit of the CML inverter in Fig. 4.3 to a single-pole system with time constant $\tau$, and evaluating it by the open-circuit time constant method, the delay $\tau_{P D}$ can be expressed as

$$
\begin{align*}
\tau_{P D} & =0.69\left\{\frac{r_{e}+r_{b}}{1+g_{m} r_{e}} C_{b e}+r_{b} C_{b c i}\left[1+\frac{g_{m}\left(r_{c}+R_{C}\right)}{1+g_{m} r_{e}}\right]\right. \\
& \left.+\left(r_{c}+R_{C}\right)\left(C_{b c i}+C_{b c x}+C_{c s}\right)+R_{C} C_{L}\right\} \tag{4.7}
\end{align*}
$$

The propagation delay is the sum of four main terms, which have a simple circuit meaning and can be evaluated with pencil and paper. The first term is the contribution made by the base-emitter capacitance, the second is due to a Miller effect on the intrinsic base-collector capacitance, the third is a contribution which arises at the inner collector node (i.e., before parasitic resistance $r_{c}$ ) and the last one is due to the load capacitance at the output node. It is worth noting that the term $g_{m} /\left(1+g_{m} r_{e}\right)$ is the equivalent transconductance of the transistor having a series resistance $r_{e}$ at the emitter node.
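The four terms of (4.7) can be evaluated with a few lines of code. The following is a sketch under stated assumptions, not a reference implementation: the thermal voltage is taken as 25.85 mV, the BiCMOS values of Tables 4.2 and 4.3a are used, the base-collector capacitance is split into intrinsic and extrinsic parts through the SPICE parameter XCJC, and the bias current of 1 mA is only an example. The function and dictionary names are hypothetical.

```python
V_T = 0.02585  # thermal voltage near 300 K, in volts (assumed)


def cml_inverter_delay(i_ss, r_load, c_l, p):
    """Delay (4.7) of the CML inverter; p holds transistor parameters in SI units."""
    g_m = i_ss / (2.0 * V_T)                       # linearized around v_d = 0
    c_be = p["c_je"] + 2.0 * p["tau_f"] / r_load   # C_je plus C_D from (4.6)
    gm_eq = g_m / (1.0 + g_m * p["r_e"])           # transconductance degenerated by r_e
    r_out = p["r_c"] + r_load
    t_be = (p["r_e"] + p["r_b"]) / (1.0 + g_m * p["r_e"]) * c_be
    t_miller = p["r_b"] * p["c_bci"] * (1.0 + gm_eq * r_out)
    t_coll = r_out * (p["c_bci"] + p["c_bcx"] + p["c_cs"])
    t_load = r_load * c_l
    return 0.69 * (t_be + t_miller + t_coll + t_load)


xcjc = 0.146                      # from the BiCMOS SPICE model
c_bc = 1.013 * 27e-15             # K * C_j0 for the base-collector junction
bicmos = {
    "r_b": 476.0, "r_c": 110.0, "r_e": 10.0, "tau_f": 26.4e-12,
    "c_je": 1.916 * 45.9e-15,
    "c_bci": xcjc * c_bc,
    "c_bcx": (1.0 - xcjc) * c_bc,
    "c_cs": 0.477 * 72.8e-15,
}

i_ss = 1e-3                       # illustrative bias current
r_load = 0.5 / (2.0 * i_ss)       # R_C for a 500 mV logic swing
for c_l in (0.0, 100e-15, 1e-12):
    d = cml_inverter_delay(i_ss, r_load, c_l, bicmos)
    print(f"C_L = {c_l * 1e15:6.0f} fF -> tau_PD = {d * 1e12:.1f} ps")
```

Keeping the four contributions as separate variables makes it easy to see which term dominates at a given bias current and load.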

### 4.3.1 Accuracy of the CML simple model

To evaluate the accuracy of relationship (4.7), a comparison between the modeled propagation delay and SPICE simulations was carried out. To generalize the comparison, two different technologies were taken into consideration. The first is a BiCMOS technology whose npn bipolar transistor has a transition frequency equal to 6 GHz , the second is the HSB2 high-speed bipolar technology (by courtesy of ST Microelectronics) with an npn transition frequency equal to 20 GHz . The SPICE model of the $6-\mathrm{GHz}$ and $20-\mathrm{GHz}$ transistors considered is reported in Fig. 4.4a and 4.4b, respectively.

Some parameters extracted from the models in Figs. 4.4a and 4.4b are reported in Table 4.2, where $I_{\tau F}$ represents the bias current after which the transit time starts increasing due to high injection level effects, and was evaluated by a few DC simulations. In practical cases, the bias current should not significantly exceed $I_{\tau F}$ because the transit time degradation determines an increase in the diffusion capacitance (4.6), and in turn a speed degradation.

.MODEL QN
+NPN
+XTB=1.92 EG=1.15 XTI=3.84
+IS=8.91E-18 BF=100 NF=1
+VAF=100 IKF=0.0192 ISE=5.18E-18 NE=1.45
+BR=0.0936 NR=1
+VAR=8 IKR=0.00289 ISC=5.46E-18 NC=1.07
+RB=476 RBM=166 RE=9.53 RC=110
+CJE=4.59E-14 VJE=1.07 MJE=0.5 FC=0.5
+CJC=2.7E-14 VJC=0.646 MJC=0.35 XCJC=.146
+CJS=7.28E-14 VJS=0.45 MJS=0.3
+TF=2.64E-11 XTF=27.3 VTF=3 ITF=0.0201
+PTF=30 TR=6.54E-08

Fig. 4.4a. SPICE model of the bipolar transistor (BiCMOS process).

.MODEL C12TYP
+NPN
+IS=7.40E-018 BF=1.00E+002 BR=1.00E+000 NF=1.00E+000
+NR=1.00E+000 TF=6.00E-012 TR=1.00E-008 XTF=1.00E+001
+VTF=1.50E+000 ITF=2.30E-002 PTF=3.75E+001 VAF=4.50E+001
+VAR=3.00E+000 IKF=3.10E-002 IKR=3.80E-003 ISE=2.80E-016
+NE=2.00E+000 ISC=1.50E-016 NC=1.50E+000 RE=5.26E+000
+RB=5.58E+001 IRB=0.00E+000 RBM=1.55E+001 RC=8.09E+001
+CJE=3.21E-014 VJE=1.05E+000 MJE=1.60E-001 CJC=2.37E-014
+VJC=8.60E-001 MJC=3.40E-001 XCJC=2.30E-001 CJS=1.95E-014
+VJS=8.20E-001 MJS=3.20E-001 EG=1.17E+000 XTB=1.70E+000
+XTI=3.00E+000 KF=0.00E+000 AF=1.00E+000 FC=5.00E-001

Fig. 4.4b. SPICE model of the bipolar transistor (HSB2 process).

TABLE 4.2

|  | BiCMOS | HSB2 |
| :---: | :---: | :---: |
| $\tau_{F}$ | 26.4 ps | 6 ps |
| $r_{b}$ | $476 \Omega$ | $56 \Omega$ |
| $r_{c}$ | $110 \Omega$ | $81 \Omega$ |
| $r_{e}$ | $10 \Omega$ | $5 \Omega$ |
| $I_{\tau F}$ | 1.4 mA | 2.4 mA |

By using the parameters in Figs. 4.4a and 4.4b, assuming a logic swing of 500 mV and a supply voltage of 5 V , the resulting coefficients $K_{j}$ used to evaluate the transistor junction capacitances are listed in Tables 4.3a and 4.3b, that refer to the BiCMOS and HSB2 process, respectively.

TABLE 4.3a

(BiCMOS process)

| Capacitance | $V_{2}(\mathrm{~V})$ | $V_{1}(\mathrm{~V})$ | $\phi(\mathrm{V})$ | $m$ | $C_{j 0}(\mathrm{fF})$ | $K$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $C_{b c x}$ | 0.250 | -0.250 | 0.646 | 0.35 | 27 | 1.013 |
| $C_{b c i}$ | 0.250 | -0.250 | 0.646 | 0.35 | 27 | 1.013 |
| $C_{j e}$ | 0.841 | 0.591 | 1.07 | 0.5 | 45.9 | 1.916 |
| $C_{c s}$ | -4.75 | -5 | 0.45 | 0.3 | 72.8 | 0.477 |

TABLE 4.3b

(HSB2 process)

| Capacitance | $V_{2}(\mathrm{~V})$ | $V_{1}(\mathrm{~V})$ | $\phi(\mathrm{V})$ | $m$ | $C_{j 0}(\mathrm{fF})$ | $K$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $C_{b c x}$ | 0.250 | -0.250 | 0.86 | 0.34 | 23.7 | 1.007 |
| $C_{b c i}$ | 0.250 | -0.250 | 0.86 | 0.34 | 23.7 | 1.007 |
| $C_{j e}$ | 0.846 | 0.596 | 1.05 | 0.16 | 32.1 | 1.21 |
| $C_{c s}$ | -4.75 | -5 | 0.82 | 0.32 | 19.5 | 0.538 |
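The coefficients $K$ in Tables 4.3a and 4.3b can be checked numerically. The sketch below assumes that relationship (1.13) is the usual voltage average of the depletion-capacitance law $\left(1-V / \phi\right)^{-m}$ between $V_{1}$ and $V_{2}$; this assumed form is not restated in this chapter, but it reproduces the tabulated values of $C_{b c x}$ and $C_{c s}$ to within rounding. The function name `junction_k` is hypothetical.

```python
def junction_k(v1, v2, phi, m):
    """Average of (1 - V/phi)**(-m) over V in [v1, v2] (assumed form of (1.13))."""
    upper = (1.0 - v1 / phi) ** (1.0 - m)
    lower = (1.0 - v2 / phi) ** (1.0 - m)
    return phi * (upper - lower) / ((v2 - v1) * (1.0 - m))


# BiCMOS process, rows C_bcx and C_cs of Table 4.3a
k_bcx = junction_k(-0.25, 0.25, 0.646, 0.35)
k_cs = junction_k(-5.0, -4.75, 0.45, 0.3)

# Linearized capacitance = K * C_j0
c_bcx = k_bcx * 27e-15
print(f"K_bcx = {k_bcx:.3f}, K_cs = {k_cs:.3f}, C_bcx = {c_bcx * 1e15:.1f} fF")
```

A coefficient above unity indicates a junction that spends part of the transition in forward bias, while a strongly reverse-biased junction such as the collector-substrate one yields a coefficient well below unity.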

After substituting these values, the delay predicted by relationship (4.7) versus the bias current is plotted in Figs. 4.5a and 4.5b for the two technologies, with a load capacitance $C_{L}$ equal to $0 \mathrm{~F}, 100 \mathrm{fF}$ and 1 pF .

The error of the model (4.7) with respect to SPICE simulations versus the bias current $I_{S S}$ and with a load capacitance $C_{L}$ equal to $0 \mathrm{~F}, 100 \mathrm{fF}$ and 1 pF , is plotted in Figs. 4.6a and 4.6b for the BiCMOS and HSB2 technology, respectively.

The worst-case error is $18 \%$ and $42 \%$ in the unrealistic case of a zero load capacitance for the BiCMOS and HSB2 process, respectively. Moreover, outside the high-level injection region (i.e., for $I_{S S}<I_{\tau F}$ ) the mean error is equal to $5 \%$ and $17 \%$. The error decreases as the load capacitance increases. This is because the contribution due to the linear capacitance $C_{L}$ becomes dominant, and the real circuit behavior is closer to that of a one-pole circuit whose time constant is equal to that resulting at the output node. Finally, it is worth noting that although the results for high-level injection bias currents are less significant, the error is still low.


Fig. 4.5a. Predicted and simulated delay vs. $I_{S S}$ for the BiCMOS process.


Fig. 4.5b. Predicted and simulated delay vs. $I_{S S}$ for the HSB2 process.


Fig. 4.6a. Error of (4.7) vs. $I_{S S}$ for the BiCMOS process.


Fig. 4.6b. Error of (4.7) vs. $I_{S S}$ for the HSB2 process.

### 4.4 ACCURATE MODELING OF THE CML INVERTER

The simple model discussed in the previous section has proved to be accurate enough for pencil-and-paper calculations. However, greater accuracy would be desirable if the model were applied in timing analysis software, which is used to estimate the overall delay of complex logic networks in place of computationally expensive circuit simulators.

The main reason for the difference between the simulated and predicted delay concerns the non-linear behavior of the circuit. In particular, the terms affected by non-linearity in relationship (4.7) are the capacitances. Thus, the accuracy could be improved by introducing new values of coefficients $K_{j}$ that multiply the capacitances, instead of those obtained from relationship (1.13). To be more specific, let us use the delay expression (4.7) by evaluating the capacitance coefficients from a few simulation runs by means of a numerical procedure. By representing each junction capacitance with its zero-bias value and introducing a corrective coefficient for each capacitance, from (4.7) we obtain the improved delay model

$$
\begin{align*}
\tau_{P D} & =0.69\left\{\frac{r_{e}+r_{b}}{1+g_{m} r_{e}}\left(K_{b e} C_{j e o}+K_{D} C_{D}\right)+r_{b} K_{b c i} X_{C J C} C_{b c o}\left[1+\frac{g_{m}\left(r_{c}+R_{C}\right)}{1+g_{m} r_{e}}\right]\right. \\
& \left.+\left(r_{c}+R_{C}\right)\left(K_{b c i} X_{C J C} C_{b c o}+K_{b c x}\left(1-X_{C J C}\right) C_{b c o}+K_{c s} C_{c s o}\right)+R_{C} C_{L}\right\} \tag{4.8}
\end{align*}
$$

The five coefficients $K_{j}$ associated with the capacitances have to be evaluated to fit simulation results of delay, by minimizing the error between analytical and simulated results. Among the many ways to obtain the coefficients $K_{j}$, the most efficient is that of minimizing the functional (4.9), since it requires few simulation runs

$$
\begin{equation*}
S\left(K_{b c i}, K_{b c x}, K_{c s}, K_{b e}, K_{D}\right)=\sum_{j=1}^{n}\left|\frac{\tau_{P D S p i c e}\left(I_{S S j}\right)-\tau_{P D}\left(I_{S S j}\right)}{\tau_{P D S p i c e}\left(I_{S S j}\right)}\right| \tag{4.9}
\end{equation*}
$$

where the least-squares formulation was avoided in order to improve the convergence speed, since each term is much lower than unity. In (4.9), the number of simulations $n$ is itself a variable but, according to our experience, it can be set to the number of capacitances (i.e., $n=5$ ). Since we consider the load capacitance in the worst condition (i.e., $C_{L}=0$, as discussed in the previous section), simulations differ only in the bias current value. Minimization of functional $S$ can be achieved with any numerical software tool, such as Mathcad©, Matlab© or Mathematica©. Unlike traditional sensitivity analysis, our approach only needs a few SPICE simulations, since only the circuit time constants associated with the capacitors are considered in the delay expression (4.8), rather than all possible products of resistances and capacitances in (4.1).
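To make the structure of functional (4.9) concrete, the sketch below evaluates the sum of relative errors and minimizes it by a plain grid search. It is only an illustration under loud assumptions: `toy_model` is a hypothetical one-coefficient stand-in for (4.8), the reference delays are generated from that same toy model rather than from SPICE, and the grid search replaces the numerical tools mentioned above.

```python
def functional_s(coeffs, model, currents, tau_ref):
    """Sum of relative absolute errors between reference and modeled delays (4.9)."""
    return sum(abs((t_ref - model(i, coeffs)) / t_ref)
               for i, t_ref in zip(currents, tau_ref))


def toy_model(i_ss, coeffs):
    """Hypothetical delay model with one fitting coefficient k (not (4.8) itself)."""
    (k,) = coeffs
    r_load = 0.25 / i_ss                    # R_C for a 500 mV logic swing
    return 0.69 * (k * r_load * 50e-15 + 20e-12)


# Bias currents used for the BiCMOS process in the text, in amperes
currents = [0.1e-3, 0.2e-3, 0.6e-3, 1.0e-3, 1.4e-3]

# Stand-in for the simulated delays: the toy model evaluated with k = 1.3
tau_ref = [toy_model(i, (1.3,)) for i in currents]

# Grid search over k from 0.5 to 2.0 in steps of 0.01
candidates = [0.5 + 0.01 * n for n in range(151)]
best_k = min(candidates,
             key=lambda k: functional_s((k,), toy_model, currents, tau_ref))
print(f"best k = {best_k:.2f}")
```

With five coefficients, as in (4.8), a grid search becomes impractical and a general-purpose minimizer is the natural choice; the functional itself is evaluated in the same way.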

### 4.4.1 Accuracy of the CML accurate model

As an example, coefficients $K$ were evaluated by using the bias current values (in mA) $[0.1,0.2,0.6,1,1.4]$ and $[0.1,0.4,1.2,1.6,2.2]$ to minimize the functional $S$ in (4.9) for the BiCMOS and HSB2 technology, respectively. The coefficients $K$ obtained are listed in Table 4.4 for both technologies. From the comparison with the values reported in Table 4.3a and 4.3b, it follows that the most evident difference between the simple and the improved model is due to the coefficients $K_{j}$ of the capacitances $C_{c s}$ and $C_{j e}$ for the BiCMOS process (which differ by a factor of two) and to the coefficients $K_{j}$ of the capacitance $C_{c s}$ for the HSB2 process (which differs by a factor higher than three).

TABLE 4.4

| coefficients | 6 GHz process | 20 GHz process |
| :---: | :---: | :---: |
| $K_{b c x}$ | 0.816 | 0.998 |
| $K_{b c i}$ | 0.999 | 0.995 |
| $K_{c s}$ | 0.942 | 1.789 |
| $K_{b e}$ | 1.009 | 0.984 |
| $K_{D}$ | 1.076 | 1.100 |

The model error versus the bias current $I_{S S}$ and with a load capacitance of $0 \mathrm{~F}, 100 \mathrm{fF}$ and 1 pF , is plotted in Figs. 4.7a and 4.7b for the BiCMOS and HSB2 technology, respectively. It is worth noting that there is no difference in the accuracy of the model for the two technologies. The largest error is lower than $5 \%$ for both of them, and outside the high-level injection region the mean error is lower than $2 \%$. This confirms that the improved model is well suited for the implementation of a timing analyzer.


Fig. 4.7a. Error of (4.8) vs. $I_{S S}$ for the BiCMOS process.


Fig. 4.7b. Error of (4.8) vs. $I_{S S}$ for the HSB2 process.

### 4.5 SIMPLE AND ACCURATE MODELING OF THE ECL INVERTER

The modeling approach discussed in the previous sections can also be applied to ECL gates. To show this, let us consider the ECL inverter gate depicted in Fig. 4.8, which is a symmetrical gate made up of a CML circuit followed by two common-collector (CC) stages acting as output buffers. Again, since transistors operate in the linear region (or at the boundary of the cut-off region), they can be replaced by their small-signal model in Fig. 4.1. Moreover, due to the symmetry and the assumption of differential signaling, the analysis can again be restricted to the half circuit.

Since the ECL gate is made up of cascaded blocks (i.e., the CML circuit and the output buffer), we can assume the ECL propagation delay to be composed of two separate contributions: the CML propagation delay $\tau_{P D(C M L)}$ and the common-collector propagation delay $\tau_{P D(C C)}$. To simplify the analysis, we can split the two contributions by applying the Thevenin theorem at the output of the CML stage. In particular, the intrinsic CML delay (i.e., without considering the CC stage) is represented by the delay of the Thevenin voltage with respect to the time when the input switches. The loading effect of the CC stage on the CML stage is accounted for by introducing the output impedance of the latter while driving the CC stage and evaluating its delay. Therefore, the CML propagation delay derives from that evaluated in the previous section by setting $C_{L}=0$, while the propagation delay of the CC stage must be evaluated by driving it with the Thevenin equivalent circuit seen at the output of the CML stage. To further simplify the circuit analysis, let us approximate the CML output impedance to its load resistance $R_{C}$.

From the above considerations, after using the transistor model in Fig. 4.1, the delay of the CC stage is evaluated by analyzing the equivalent circuit in Fig. 4.9. It is worth noting that in this circuit the base-collector parasitic is not split, since in this case it is not magnified by the Miller effect and makes a very small contribution.


Fig. 4.8. ECL inverter.


Fig. 4.9. Equivalent linear circuit of the CC stage driven by the Thevenin equivalent of the CML stage.

The transfer function of the circuit in Fig. 4.9 is given by

$$
\begin{equation*}
\frac{V_{o}}{V_{i n}} \approx \frac{1+\frac{C_{b e}}{g_{m}} s}{1+\left[\frac{R_{C}+r_{b}}{g_{m} r_{\pi}} C_{L}+\frac{C_{b e}+C_{L}}{g_{m}}+\left(R_{C}+r_{b}\right) C_{b c}\right] s+\frac{R_{C}+r_{b}}{g_{m}}\left(C_{b e} C_{L}\right) s^{2}} \tag{4.10}
\end{equation*}
$$

where the product $C_{b c}\left(C_{b e}+C_{L}\right)$ has been neglected with respect to $C_{b e} C_{L}$ in the coefficient of $s^{2}$. Usually, (4.10) has two complex conjugate poles, which justify the well-known ringing behavior [LS94], [KB92]. By neglecting the zero of (4.10), which is at a frequency much higher than that of the poles, the normalized step response of (4.10) in the time domain is

$$
\begin{equation*}
y(t)=1-\frac{1}{\sqrt{1-\xi^{2}}} e^{-\xi \omega_{n} t} \sin \left(\sqrt{1-\xi^{2}} \omega_{n} t+\operatorname{arctg} \frac{\sqrt{1-\xi^{2}}}{\xi}\right) \tag{4.11}
\end{equation*}
$$

where the pole frequency, $\omega_{n}$, and the damping factor, $\xi$, are found to be

$$
\begin{equation*}
\omega_{n}=\sqrt{\frac{g_{m}}{\left(R_{C}+r_{b}\right) C_{b e} C_{L}}} \tag{4.12a}
\end{equation*}
$$

$$
\begin{align*}
\xi & =\frac{1}{2}\left[\frac{1}{r_{\pi}} \sqrt{\frac{R_{C}+r_{b}}{g_{m}}} \sqrt{\frac{C_{L}}{C_{b e}}}+\sqrt{\frac{1}{g_{m}\left(R_{C}+r_{b}\right)} \frac{\left(C_{b e}+C_{L}\right)^{2}}{C_{b e} C_{L}}}\right. \\
& \left.+\sqrt{g_{m}\left(R_{C}+r_{b}\right) \frac{C_{b c}^{2}}{C_{b e} C_{L}}}\right] \tag{4.12b}
\end{align*}
$$

The time, normalized to the inverse of the pole frequency, at which a fixed percentage of the steady-state output voltage is reached depends only on the damping factor. In particular, the normalized propagation delay $\tau_{P D n}$ (i.e., $\tau_{P D n}=\tau_{P D} \cdot \omega_{n}$ ) can be found by setting $y(t)$ equal to 0.5 in (4.11). The solution, plotted in Fig. 4.10, is quite linear and ranges from 1 to 1.6 for typical values of $\xi$ from 0.1 to 0.8 [AP99].

Fig. 4.10. Normalized delay of the CC stage $\tau_{P D n}$ vs. damping factor $\xi$.
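The normalized delay can also be obtained numerically from (4.11). The following sketch solves $y=0.5$ by bisection on the interval between zero and the first peak of the step response, where the response is monotonic. The function names are hypothetical, and the code is valid only for underdamped responses ($0<\xi<1$).

```python
import math


def step_response(x, xi):
    """Normalized step response (4.11), with x = omega_n * t and 0 < xi < 1."""
    wd = math.sqrt(1.0 - xi * xi)
    phase = math.atan2(wd, xi)
    return 1.0 - math.exp(-xi * x) / wd * math.sin(wd * x + phase)


def normalized_delay(xi, level=0.5):
    """Smallest x at which the step response reaches the given level."""
    lo = 0.0
    hi = math.pi / math.sqrt(1.0 - xi * xi)   # first peak of the response
    for _ in range(60):                        # bisection on a monotonic interval
        mid = 0.5 * (lo + hi)
        if step_response(mid, xi) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for xi in (0.1, 0.2, 0.4, 0.6, 0.8):
    print(f"xi = {xi:.1f} -> tau_PDn = {normalized_delay(xi):.3f}")
```

Dividing the returned value by $\omega_{n}$ from (4.12a) gives the delay of the CC stage in seconds.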

Considering in the following the worst case of $\tau_{P D n}=1.6$, and using relationship (4.12a) for the pole frequency, we get

$$
\begin{equation*}
\tau_{P D(c c)}=\frac{\tau_{P D n}}{\omega_{n}} \approx 1.6 \sqrt{\frac{R_{C}+r_{b}}{g_{m}} C_{b e} C_{L}} \tag{4.13}
\end{equation*}
$$

The use of relationship (4.13) allows evaluation of the common-collector delay in a quite simple way, and takes into account the dependence of the delay on the bias current. By substituting the usual small-signal expressions of transconductance $g_{m}=I_{C C} / V_{T}$ and the base-emitter capacitance $C_{b e}=C_{j e}+C_{D}$ (where $C_{D}=g_{m} \tau_{F}$ and $C_{j e}$ are the diffusion and junction contributions), relationship (4.13) becomes

$$
\begin{equation*}
\tau_{P D(c c)}=\frac{\tau_{P D n}}{\omega_{n}} \approx 1.6 \sqrt{\left(R_{C}+r_{b}\right)\left(\frac{C_{j e} V_{T}}{I_{C C}}+\tau_{F}\right) C_{L}} . \tag{4.14}
\end{equation*}
$$
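As a brief sketch, relationship (4.14) can be coded directly and checked against (4.13). The thermal voltage of 25.85 mV, the small-signal value of $C_{j e}$ (40 fF) and the operating point below are assumptions chosen only for illustration; $r_{b}$ and $\tau_{F}$ are the HSB2 values of Table 4.2. The function names are hypothetical.

```python
import math

V_T = 0.02585  # thermal voltage near 300 K, in volts (assumed)


def cc_delay(i_cc, r_load, r_b, c_je, tau_f, c_l, tau_pdn=1.6):
    """Common-collector delay (4.14) for the worst-case normalized delay of 1.6."""
    return tau_pdn * math.sqrt((r_load + r_b) * (c_je * V_T / i_cc + tau_f) * c_l)


def cc_delay_from_gm(g_m, r_load, r_b, c_be, c_l, tau_pdn=1.6):
    """Same delay written as in (4.13), in terms of g_m and C_be."""
    return tau_pdn * math.sqrt((r_load + r_b) / g_m * c_be * c_l)


i_cc = 1e-3          # illustrative CC bias current
c_je = 40e-15        # hypothetical small-signal junction capacitance
d_414 = cc_delay(i_cc, 250.0, 56.0, c_je, 6e-12, 100e-15)

g_m = i_cc / V_T
c_be = c_je + g_m * 6e-12
d_413 = cc_delay_from_gm(g_m, 250.0, 56.0, c_be, 100e-15)
print(f"(4.14): {d_414 * 1e12:.2f} ps, (4.13): {d_413 * 1e12:.2f} ps")
```

The two forms coincide, as expected, since (4.14) is obtained from (4.13) by substitution only.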

Since the base-emitter voltage varies only slightly during a transition, the small-signal value of capacitances $C_{j e}$ and $C_{D}$ can be used. From inspection of the CC stage delay model (4.14), its dependence on the bias current $I_{C C}$ is small and becomes less significant as $I_{C C}$ increases, since the diffusion capacitance tends to dominate over $C_{j e}$. Hence the propagation delay of the ECL gate can be written as

$$
\begin{align*}
\tau_{P D} & =\tau_{P D(C M L)}+\tau_{P D(C C)} \\
& =0.69\left\{\frac{r_{e}+r_{b}}{1+g_{m 1} r_{e}} C_{b e 1}+r_{b} C_{b c i 1}\left[1+\frac{g_{m 1}\left(r_{c}+R_{C}\right)}{1+g_{m 1} r_{e}}\right]\right.  \tag{4.15}\\
& \left.+\left(r_{c}+R_{C}\right)\left(C_{b c i 1}+C_{b c x 1}+C_{c s 1}\right)\right\}+1.6 \sqrt{\frac{R_{C}+r_{b}}{g_{m 2}} C_{b e 2} C_{L}}
\end{align*}
$$

where the indexes 1 and 2 refer to the CML and the CC stage, respectively. It is worth noting that, unlike the CML case, the minimum and maximum direct voltages across the base-collector junction are now $V_{1}=-V_{S W I N G} / 2-V_{B E, o n}$ and $V_{2}=V_{S W I N G} / 2-V_{B E, o n}$, where $V_{B E, o n}$ is the base-emitter voltage of the CC stage transistor, which is approximately independent of the bias current, as discussed in Section 4.3.

### 4.5.1 Validation and improvement of the ECL model

In order to evaluate the accuracy of the simple model proposed, the delay model (4.15) was compared to SPICE simulations for the bipolar technologies discussed above, by assuming the same conditions as in Section 4.3. The error found with the HSB2 technology is plotted in Figs. 4.11a and 4.11b for $C_{L}$ equal to 100 fF and 1 pF , respectively, by biasing the common collector with $0.4 \mathrm{~mA}, 1 \mathrm{~mA}$ and 1.8 mA . It is seen that the worst-case percentage error is much lower than $20 \%$. Analogous results are obtained for the BiCMOS process.


Fig. 4.11a. Error of the ECL simple model (4.15) vs. $I_{S S}$ for the HSB2 process and $C_{L}=100 \mathrm{fF}$.


Fig. 4.11b. Error of the ECL simple model (4.15) vs. $I_{S S}$ for the HSB2 process and $C_{L}=1 \mathrm{pF}$.

The simple delay model discussed so far can be improved in terms of accuracy by adopting a numerical procedure similar to that used for the CML gate. More specifically, the propagation delay is represented as

$$
\begin{equation*}
\tau_{P D}=H_{1} \tau_{P D(C M L)}+H_{2} \sqrt{\frac{g_{m 2} \tau_{F}+C_{j e 2}}{g_{m 2}}} C_{L}^{\alpha_{1}}\left(R_{C}+r_{b}\right)^{\alpha_{2}} \tag{4.16}
\end{equation*}
$$

where $\tau_{P D(C M L)}$ is given by (4.8) by setting $C_{L}=0$. The model has 9 unknown coefficients: $\alpha_{1}$ and $\alpha_{2}$, which regard only the CC term, $H_{1}$ and $H_{2}$, which weight the CML and CC contributions, and the 5 coefficients $K$ inside $\tau_{P D(C M L)}$ that were evaluated in the previous section.

Coefficients $\alpha_{1}, \alpha_{2}, H_{1}$ and $H_{2}$ in relationship (4.16) can be found by minimizing the functional in (4.17)

$$
\begin{equation*}
S_{E C L}\left(H_{1}, H_{2}, \alpha_{1}, \alpha_{2}\right)=\sum_{j=1}^{n}\left|\frac{\tau_{P D S p i c e}\left(I_{S S j}, I_{C C j}, C_{L}\right)-\tau_{P D}\left(I_{S S j}, I_{C C j}, C_{L}\right)}{\tau_{P D S p i c e}\left(I_{S S j}, I_{C C j}, C_{L}\right)}\right| \tag{4.17}
\end{equation*}
$$

A good choice of the number of simulations $n$ is 5 . Moreover, the simulations should be run by setting the load capacitance to low and high values ( 100 fF and 1 pF , for the adopted processes), and properly distributing the bias currents $I_{S S}$ and $I_{C C}$ in the range outside high-level injection.

Coefficients $\alpha_{1}, \alpha_{2}, H_{1}$ and $H_{2}$ were evaluated for the BiCMOS and HSB2 process by running five simulations, uniformly covering the entire region of currents $I_{S S}$ and $I_{C C}$ in which the model is valid. The coefficients found are reported in Table 4.5.

TABLE 4.5

| coefficients | 6 GHz process | 20 GHz process |
| :--- | :--- | :--- |
| $\alpha_{1}$ |  |  |

In this case the main difference between the simple and the improved model is due to the values of parameters $H_{1}$ and $H_{2}$. Several simulations were run with different bias current and load capacitance values to evaluate the error of the accurate ECL model. The error found with the HSB2 process by biasing the common collector with $0.4 \mathrm{~mA}, 1 \mathrm{~mA}$ and 1.8 mA is plotted in Figs. 4.12a and 4.12b, for $C_{L}$ equal to 100 fF and 1 pF , respectively. The worst-case error is lower than $10 \%$, while the typical error is around $5 \%$. Similar results were found for the BiCMOS process.


Fig. 4.12a. Error of the ECL accurate model (4.16) vs. $I_{S S}$ for the HSB2 process and $C_{L}=100 \mathrm{fF}$.


Fig. 4.12b. Error of the ECL accurate model (4.16) vs. $I_{S S}$ for the HSB2 process and $C_{L}=1 \mathrm{pF}$.

### 4.6 SIMPLE MODELING OF BIPOLAR CML MUX/XOR GATES

The CML MUX and XOR gates are shown in Figs. 4.13 and 4.14, respectively. Since these circuits are two-input logic gates, different delay values should be considered, depending on the switching input [AP00]. As a general consideration, the delay is greater for inputs associated with transistors that are at a lower level, as stated in [FBA90] (i.e., Q1-Q2 in Figs. 4.13 and 4.14), as will be justified below.

Let us first consider the case where the input driving transistors Q1-Q2 switches (i.e., $\phi$ in the MUX gate and $B$ in the XOR gate), while the others are kept constant. Owing to the symmetry of the MUX and XOR gates, we can assume without loss of generality that input $A$ is high (and input $B$ of the MUX is low, so as to have an output transition), since the opposite case is analogous. As a consequence, transistors Q3 and Q6 are ON and can be replaced by the model in Fig. 4.1, while Q4 and Q5 are OFF and their effect on the circuit is accounted for only by their junction capacitances (equal to those of Q3 and Q6).


Fig. 4.13. CML MUX gate.

Since differential operation is assumed, the half-circuit model of both MUX and XOR gates in Fig. 4.15 is obtained, where the small-signal parameters $g_{m}$ and $r_{\pi}$ are evaluated around the logic threshold as for the inverter, thus resulting equal to $I_{S S} / 2 V_{T}$ and $2 \beta_{F} V_{T} / I_{S S}$. The base-emitter diffusion capacitance of the switching pair Q1-Q2 is still given by (4.6), while that of transistors Q3 and Q6 is evaluated by the well-known small-signal expression $g_{m} \tau_{F}$, since the voltage variation at the emitter node of the upper transistors is small. For the same reason, small-signal values [GM93] for $C_{c s 1}, C_{j e 3}, C_{D 3}$ and $C_{j e 4}$ can be used. All the other junction capacitances are evaluated through relationships (1.12)-(1.13).


Fig. 4.14. CML XOR gate

Finally, it is worth noting that we are assuming a realistic case in which input $A$ is driven by another CML gate. Hence, in the upper input we have to include the equivalent output resistance of the previous gate, represented by the two resistances $R_{C}$ in series with $r_{b 3}$ and $r_{b 5}$. However, this equivalent output resistance must not be considered in the switching lower input, because it is driven by an ideal voltage generator which represents the voltage variation at the input node. Loading effects at that node can be properly taken into account by evaluating the propagation delay of the previous gate.


Fig. 4.15. Equivalent linear circuit of the CML MUX/XOR gate.

By assuming a dominant pole behavior, and thus using relationship (4.4), neglecting terms $r_{\pi}$ with respect to terms $1 / g_{m}$ and lumping $C_{b c x 3}, C_{b c i 3}$ and $C_{b c x 5}, C_{b c i 5}$ into $C_{b c 3}$ and $C_{b c 5}$, respectively, the MUX/XOR equivalent circuit in Fig. 4.15 gives the following delay

$$
\begin{align*}
\tau_{P D} & =0.69\left\{\frac{r_{e l}+r_{b l}}{1+g_{m l} r_{e l}} C_{b e 1}+r_{b l} C_{b c i 1}\left[1+\frac{g_{m l}\left(r_{c l}+R_{e q}\right)}{1+g_{m l} r_{e l}}\right]+\left(r_{c l}+R_{e q}\right)\left(C_{b c 1}+C_{c s 1}\right)\right. \\
& +\frac{1}{g_{m u}} C_{b e 3}+\left(2 R_{C}+r_{c u}+r_{b u}\right)\left(C_{b c 3}+C_{b c 5}\right)  \tag{4.18}\\
& \left.+\left(R_{C}+r_{c u}\right)\left(C_{c s 3}+C_{c s 5}\right)+\left(R_{C}+r_{b u}+2 r_{e u}\right) C_{j e 4}+R_{C} C_{L}\right\}
\end{align*}
$$

where subscripts $l$ and $u$ refer to lower and upper transistors, respectively, and $R_{e q}$ is the equivalent resistance at the emitter of Q3, given by

$$
\begin{equation*}
R_{e q}=r_{e u}+\frac{r_{\pi u}+r_{b u}+R_{C}}{1+\beta_{F}} \approx \frac{1}{g_{m}}+\frac{R_{C}}{\beta_{F}} \tag{4.19}
\end{equation*}
$$

From relationship (4.18), the propagation delay is the sum of eight main terms which have a simple circuit meaning. Three terms are due to the lower transistors, four to the upper transistors and one to the load.
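The eight terms of (4.18) can be listed explicitly in code, which helps when checking which contribution dominates. The sketch below is written under stated assumptions: the thermal voltage is 25.85 mV, the same $g_{m}=I_{S S} / 2 V_{T}$ is used for lower and upper transistors, $R_{e q}$ is taken in its approximate form (4.19), the resistances are the HSB2 values of Table 4.2, and all capacitance values in `caps` are hypothetical placeholders. The function and key names are illustrative only.

```python
V_T = 0.02585  # thermal voltage near 300 K, in volts (assumed)


def mux_xor_delay_terms(i_ss, r_load, c_l, beta_f, lower, upper, caps):
    """Return the eight time constants of (4.18), before the 0.69 factor."""
    g_m = i_ss / (2.0 * V_T)                # used for both lower and upper devices
    r_eq = 1.0 / g_m + r_load / beta_f      # approximate form of (4.19)
    r_x = lower["r_c"] + r_eq
    gm_eq = g_m / (1.0 + g_m * lower["r_e"])
    return [
        (lower["r_e"] + lower["r_b"]) / (1.0 + g_m * lower["r_e"]) * caps["be1"],
        lower["r_b"] * caps["bci1"] * (1.0 + gm_eq * r_x),
        r_x * (caps["bc1"] + caps["cs1"]),
        caps["be3"] / g_m,
        (2.0 * r_load + upper["r_c"] + upper["r_b"]) * (caps["bc3"] + caps["bc5"]),
        (r_load + upper["r_c"]) * (caps["cs3"] + caps["cs5"]),
        (r_load + upper["r_b"] + 2.0 * upper["r_e"]) * caps["je4"],
        r_load * c_l,
    ]


device = {"r_b": 56.0, "r_c": 81.0, "r_e": 5.0}       # HSB2, Table 4.2
caps = {                                               # hypothetical values, in farads
    "be1": 90e-15, "bci1": 5e-15, "bc1": 22e-15, "cs1": 11e-15,
    "be3": 150e-15, "bc3": 25e-15, "bc5": 23e-15,
    "cs3": 10e-15, "cs5": 10e-15, "je4": 36e-15,
}
terms = mux_xor_delay_terms(1e-3, 250.0, 100e-15, 100.0, device, device, caps)
tau_pd = 0.69 * sum(terms)
print(f"tau_PD = {tau_pd * 1e12:.1f} ps")
```

The first three entries of `terms` belong to the lower transistors, the next four to the upper transistors and the last one to the load, in the same order as in (4.18).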

### 4.6.1 Validation of the MUX/XOR model

The MUX/XOR delay in (4.18) was compared to SPICE simulations in the same conditions as in Section 4.3, that determine the junction capacitances given in Table 4.6a and Table 4.6b for the BiCMOS and HSB2 technology, respectively.

TABLE 4.6a

(BiCMOS process)

| Capacitance | $V_{2}(\mathrm{~V})$ | $V_{1}(\mathrm{~V})$ | $\phi(\mathrm{V})$ | $m$ | $C_{j 0}(\mathrm{fF})$ | $K$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $C_{b c x 1}$ | 0.254 | 0 | 0.646 | 0.35 | 27 | 0.933 |
| $C_{b c i 1}$ | 0.254 | 0 | 0.646 | 0.35 | 27 | 0.933 |
| $C_{c s 1}$ | $-4.18^{\wedge}$ | - | 0.45 | 0.3 | 72.8 | 0.4973 |
| $C_{j e 1}$ | 0.841 | 0.591 | 1.07 | 0.5 | 45.9 | 1.467 |
| $C_{j e 4}$ | $0.57^{\wedge}$ | - | 1.07 | 0.5 | 45.9 | 1.467 |
| $C_{b c 3}$ | 0.25 | -0.25 | 0.646 | 0.35 | 27 | 1.083 |
| $C_{j e 3}$ | $0.82^{\wedge}$ | - | 1.07 | 0.5 | 45.9 | 2.082 |
| $C_{c s 3}, C_{c s 5}$ | -4.75 | -5 | 0.45 | 0.3 | 72.8 | 0.477 |
| $C_{b c 5}$ | 0.25 | -0.25 | 0.646 | 0.35 | 27 | 0.942 |

$^{\wedge}$ This capacitance was evaluated using its small-signal expression.

TABLE 4.6b

(HSB2 process)

| Capacitance | $V_{2}(\mathrm{~V})$ | $V_{1}(\mathrm{~V})$ | $\phi(\mathrm{V})$ | $m$ | $C_{j 0}(\mathrm{fF})$ | $K$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $C_{b c x 1}$ | 0.254 | 0 | 0.86 | 0.34 | 23.7 | 0.95 |
| $C_{b c i 1}$ | 0.254 | 0 | 0.86 | 0.34 | 23.7 | 0.95 |
| $C_{c s 1}$ | $-4.18^{\wedge}$ | - | 0.82 | 0.32 | 19.5 | 0.561 |
| $C_{j e 1}$ | 0.841 | 0.591 | 1.05 | 0.16 | 32.1 | 1.21 |
| $C_{j e 4}$ | $0.57^{\wedge}$ | - | 1.05 | 0.16 | 32.1 | 1.136 |
| $C_{b c 3}$ | 0.25 | -0.25 | 0.86 | 0.34 | 23.7 | 1.057 |
| $C_{j e 3}$ | $0.82^{\wedge}$ | - | 1.05 | 0.16 | 32.1 | 1.282 |
| $C_{c s 3}, C_{c s 5}$ | -4.75 | -5 | 0.82 | 0.32 | 19.5 | 0.538 |
| $C_{b c 5}$ | 0.25 | -0.25 | 0.86 | 0.34 | 23.7 | 0.956 |

$^{\wedge}$ This capacitance was evaluated using its small-signal expression.

The error of the simple model compared to the simulated delay of the XOR gate versus the bias current $I_{S S}$ with load capacitance equal to $0 \mathrm{~F}$, 100 fF and 1 pF , is plotted in Figs. 4.16a and 4.16b for the BiCMOS and HSB2 technology, respectively. The worst-case error is always lower than $15 \%$, and decreases as the load capacitance increases, because $C_{L}$ tends to dominate over the parasitic capacitances. As expected, the simulated delay of the MUX gate was found to be equal to that of the XOR gate.


Fig. 4.16a. Error of (4.18) vs. $I_{S S}$ for the BiCMOS process.


Fig. 4.16b. Error of (4.18) vs. $I_{S S}$ for the HSB2 process.

### 4.6.2 Extension to the MUX/XOR when upper transistors switch

The results presented above refer to the delay associated with the switching of transistors Q1 and Q2. Now, let us consider the delay associated with the switching of the transistors at the upper level (Q3-Q6), that is expected to be lower than the previous one. Indeed, when the input signals of the upper transistors switch, transistors Q1-Q2 have already switched, and their capacitances do not have to be considered in the general delay expression (4.4). Therefore, the resulting delay is surely lower than (4.18), since the latter also includes the contributions of capacitances associated with Q1-Q2.

To evaluate the delay associated with signals driving the upper transistors, assume that the bias current is already entirely steered to transistors Q3-Q4 through transistor Q1 (i.e., its base voltage is high), and that the signal driving Q3-Q4 switches. Since transistors Q5-Q6 are OFF, they affect the delay only through their parasitic capacitance $C_{e q}$ seen from the output nodes to ground. As a consequence, the circuit can be schematized as the CML inverter made up of transistors Q3-Q4 loaded by a capacitance $C_{e q}$ in parallel to the load capacitance $C_{L}$. Therefore, the delay associated with the upper transistors is simply obtained from relationship (4.7) as

$$
\begin{align*}
\tau_{P D} & =0.69\left\{\frac{r_{e}+r_{b}}{1+g_{m} r_{e}} C_{b e 3,4}+r_{b} C_{b c i 3,4}\left[1+\frac{g_{m}\left(r_{c}+R_{C}\right)}{1+g_{m} r_{e}}\right]\right.  \tag{4.20}\\
& \left.+\left(r_{c}+R_{C}\right)\left(C_{b c i 3,4}+C_{b c x 3,4}+C_{c s 3,4}\right)+R_{C}\left(C_{L}+C_{e q}\right)\right\}
\end{align*}
$$

where the capacitance $C_{eq}$ accounts for the collector-substrate and base-collector contributions, $C_{cs}$ and $C_{bc}$,

$$
\begin{equation*}
C_{e q}=C_{b c i 5,6}+C_{b c x 5,6}+C_{c s 5,6} \tag{4.21}
\end{equation*}
$$

that are equal to those of Q3-Q4. Obviously, capacitances of Q3-Q4 are evaluated as those of Q1-Q2 for the inverter in Table 4.3a-b. The error found with respect to SPICE simulations is equal to that of the inverter previously discussed, as expected.
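As a minimal sketch of how (4.20) and (4.21) can be evaluated numerically, the function below mirrors the two relationships term by term. The function name, the argument names and all numerical values are hypothetical placeholders chosen only for illustration, not data of the two processes; the capacitances of Q5-Q6 are taken equal to those of Q3-Q4, as stated above.

```python
def upper_level_delay(r_e, r_b, r_c, R_C, g_m,
                      C_be, C_bci, C_bcx, C_cs, C_L):
    """Delay (4.20) when the upper pair Q3-Q4 switches (SI units)."""
    # (4.21): Q5-Q6 are OFF and load the outputs with their junction
    # capacitances, assumed here to be equal to those of Q3-Q4.
    C_eq = C_bci + C_bcx + C_cs
    k = 1.0 + g_m * r_e
    return 0.69 * (
        (r_e + r_b) / k * C_be
        + r_b * C_bci * (1.0 + g_m * (r_c + R_C) / k)
        + (r_c + R_C) * (C_bci + C_bcx + C_cs)
        + R_C * (C_L + C_eq)
    )


# Hypothetical device values, for illustration only.
params = dict(r_e=5.0, r_b=200.0, r_c=50.0, R_C=500.0, g_m=0.01,
              C_be=60e-15, C_bci=5e-15, C_bcx=15e-15, C_cs=30e-15)

tau_unloaded = upper_level_delay(C_L=0.0, **params)
tau_loaded = upper_level_delay(C_L=100e-15, **params)
print(f"unloaded: {tau_unloaded * 1e12:.1f} ps, "
      f"100 fF load: {tau_loaded * 1e12:.1f} ps")
```

The external load enters only through the term $0.69 R_{C} C_{L}$, so the two results differ by exactly that amount.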

### 4.7 ACCURATE MODELING OF BIPOLAR CML MUX/XOR GATES AND EXTENSION TO ECL GATES

To improve the accuracy of the CML MUX/XOR model (4.18) and (4.20), it is useful to apply the methodology presented in Section 4.4 (by introducing five coefficients to be evaluated by a few simulation runs with a numerical procedure). To be more specific, the model associated with lower transistors (4.18) becomes

$$
\begin{align*}
\tau_{P D} & =K_{1}\left\{\frac{r_{e l}+r_{b l}}{1+g_{m l} r_{e l}}\left(C_{j e 1}+K_{2} C_{D 1}\right)+r_{b l} C_{b c i 1}\left[1+\frac{g_{m l}\left(r_{c l}+R_{e q}\right)}{1+g_{m l} r_{e l}}\right]\right. \\
& +\left(r_{c l}+R_{e q}\right)\left(C_{b c 1}+C_{c s 1}\right)+\frac{1}{g_{m u}} C_{b e 3} \\
& +\left(2 R_{C}+r_{c u}+r_{b u}\right)\left(K_{3} C_{b c o 3}+K_{5} C_{b c o 5}\right) \\
& \left.+K_{4}\left(R_{C}+r_{c u}\right)\left(C_{c s o 3}+C_{c s o 5}\right)+\left(R_{C}+r_{b u}+2 r_{e u}\right) C_{j e 4}\right\}+0.69 R_{C} C_{L} \tag{4.22}
\end{align*}
$$

where the values for the base-collector and collector-substrate capacitances of Q3 and Q5 are those used in the small signal model (i.e., in zero-bias condition). It should be noted that, unlike the inverter case, coefficients $K_{1} \ldots K_{5}$ are not introduced for all the parasitic capacitances, so that the procedure keeps its intrinsic simplicity. Thus, to minimize the error without increasing the number of parameters, four of the coefficients are associated with the most critical capacitances, while one is used for all the propagation delay terms (i.e. all capacitances) except that associated with the load capacitance. To evaluate the five coefficients, the functional (4.9) is to be minimized after performing five simulations.

The accurate model parameters are summarized in Table 4.7 for both technologies. The error found with respect to simulations is always lower than $5 \%$, and is typically about $2 \%$.

TABLE 4.7

|  | BiCMOS process | HSB2 process |
| :---: | :---: | :---: |
| $K_{1}$ | 0.27 | 0.452 |
| $K_{2}$ | 3.633 | 1.895 |
| $K_{3}$ | 1.224 | 1.471 |
| $K_{4}$ | 4.509 | 2.169 |
| $K_{5}$ | 1.034 | 2.171 |

Regarding the delay associated with upper transistors (4.20), that is equal to that of the inverter, the same considerations made in Section 4.4 still hold, and coefficients are again those reported in Table 4.4.

The ideas discussed until now about the MUX/XOR CML gates can be immediately extended to the ECL counterparts, as done for the inverter in Section 4.5. To be more explicit, the ECL delay is evaluated by using relationships (4.14)-(4.15) and substituting the CML inverter delay by the MUX/XOR CML delay. Comparison with SPICE simulations shows that this model provides results that are even better than those of the inverter model, since the maximum error reduces to $13 \%$ and $14 \%$ for the BiCMOS and HSB2 process, respectively.

The simple ECL delay model can be further improved by introducing the strategy in Section 4.5 based on relationship (4.16) and unknown coefficients. More specifically, coefficients $K_{1} \ldots K_{5}$ in Table 4.7 are introduced in the inner CML gate and the four coefficients $\alpha_{1}, \alpha_{2}, H_{1}$ and $H_{2}$ are introduced in the overall delay expression (4.16). The resulting error is always lower than $4 \%$ and $7 \%$ for the BiCMOS and HSB2 process, respectively.

### 4.8 EVALUATION OF CML/ECL GATES INPUT CAPACITANCE

In the previous sections, the delay of a logic gate was evaluated by assuming a generic load capacitance $C_{L}$, that in practical cases consists of the wiring capacitance and the input capacitance of the following gates. Therefore, to evaluate the delay of cascaded gates, it is fundamental to evaluate the input capacitance of a CML/ECL gate. To this end, let us consider the emitter-coupled pair of a CML/ECL inverter, that can be simplified into the linear circuit in Fig. 4.3.

To simplify the evaluation of the input capacitance $C_{i n}$ of the circuit in Fig. 4.3, let us neglect the small parasitic resistance $r_{e}$ and apply the Miller theorem to the intrinsic and extrinsic base-collector capacitance. As a result, the input impedance can be represented with the linear circuit in Fig. 4.17.


Fig. 4.17. Simplified circuit for evaluating the input capacitance of a CML/ECL gate.

In the circuit in Fig. 4.17, the input impedance is the parallel of that of $C_{bcx}\left(1-A_{V}\right)$ and the series of $r_{b}$ and $Z_{x}$. If resistance $r_{b}$ were equal to zero, the capacitive component of the input impedance would be the parallel of all capacitances $C_{bcx}\left(1-A_{V}\right)$, $C_{bci}\left(1-A_{V}\right)$ and $C_{be}$ (in parallel to a resistance $r_{\pi}$). However, resistance $r_{b}$ is far from being zero for current bipolar technologies, as was observed in regard to the delay model in Section 4.3, and cannot be neglected with respect to $Z_{x}$. Moreover, resistance $r_{b}$ tends to significantly increase the impedance at the right of $C_{bcx}\left(1-A_{V}\right)$. As a consequence, as a first-order approximation, the input capacitance can be modeled by only the capacitance

$$
\begin{equation*}
C_{i n} \approx C_{b c x}\left(1+g_{m} R_{C}\right)=C_{b c x}\left(1+\frac{I_{S S} R_{C}}{2 V_{T}}\right) \tag{4.23}
\end{equation*}
$$

which tends to underestimate the equivalent input capacitance, as the load effect on the previous gate is actually heavier, since the input impedance is actually lower than that of (4.23), due to the parallel of $r_{b}+Z_{x}$. It is worth noting that the same expression holds for the other CML/ECL logic gates.

Relationship (4.23) gives 138 fF and 109 fF for the BiCMOS and HSB2 technologies, respectively. The input capacitance of a CML/ECL gate can be evaluated from SPICE simulations by driving it through an inverter, and then evaluating the linear capacitance value that leads to the same delay. The simulated results obtained for the input capacitance of a CML/ECL gate are 120 fF and 100 fF for the BiCMOS and the HSB2 technologies, respectively. Therefore, relationship (4.23) is within $15 \%$ of the input capacitance extracted from SPICE simulations. In agreement with relationship (4.23), the simulated input capacitance of a CML/ECL gate was found to be essentially independent of the bias current for an assigned logic swing (or equivalently the product $R_{C} I_{SS}$).
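The independence of (4.23) from the bias current at a fixed product $R_{C} I_{SS}$ can be illustrated with a short sketch. The function name, the value of $C_{bcx}$, the value of the product $R_{C} I_{SS}$ and the thermal voltage of 25.9 mV are assumptions made only for this example.

```python
def input_capacitance(C_bcx, I_SS, R_C, V_T=0.0259):
    """First-order input capacitance (4.23), SI units."""
    return C_bcx * (1.0 + I_SS * R_C / (2.0 * V_T))


C_bcx = 20e-15     # hypothetical extrinsic base-collector capacitance (F)
product = 0.25     # hypothetical fixed product R_C * I_SS (V)

# Sweep the bias current while keeping R_C * I_SS constant.
values = [input_capacitance(C_bcx, I_SS, product / I_SS)
          for I_SS in (0.2e-3, 1e-3, 5e-3)]
for C_in in values:
    print(f"C_in = {C_in * 1e15:.1f} fF")
```

Since only the product $R_{C} I_{SS}$ appears in (4.23), the three printed values coincide.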

The approximate relationship (4.23) was found by neglecting parasitic resistance $r_{e}$ and approximating the input impedance to a single capacitance, i.e. the resistive contribution was neglected. According to the above considerations, this leads to a sufficient accuracy for practical purposes. However, a more accurate expression will be derived in Section 8.1.1, which is useful when a better accuracy is required.

### 4.9 BIPOLAR CURRENT-MODE D LATCH

The input capacitance evaluation discussed above allows for extending the model of the MUX/XOR gate in Sections 4.6 and 4.7 to the CML and ECL D latch gate, the first of which is depicted in Fig. 4.18 (the ECL gate is obtained by adding the output buffers). The only difference between the MUX/XOR gate and the D latch is due to the different load at the output nodes. Indeed, transistors Q5 and Q6, which are connected in a positive-feedback loop, load the output nodes with their equivalent input capacitance. Moreover, the effect of the positive feedback can be neglected since it is significant only for loop gain greater than unity, i.e. when the differential input voltage of Q5-Q6 (equal to the output voltage) crosses the gate logic threshold, after which the gate is assumed to have already switched by definition of gate delay. Since the feedback affects the circuit operation only at the end of the switching transient, we conclude that for most of the switching time the effect of feedback is negligible. As a consequence, the D latch delay is equal to that of the MUX/XOR gate loaded by a capacitance equal to the sum of the external capacitance $C_{L}$ and the input capacitance (4.23) of the emitter-coupled pair Q5-Q6. Therefore, the D latch delay when the clock signal $CK$ switches (often referred to as the CK-Q latch delay [P01]) is given by relationship (4.18) where the load capacitance $C_{L}$ is replaced by an equivalent capacitance $C_{L}^{\prime}$

$$
\begin{equation*}
C_{L}^{\prime}=C_{L}+C_{b c x 5,6}\left(1+\frac{I_{S S} R_{C}}{2 V_{T}}\right) \tag{4.24}
\end{equation*}
$$

and the same observation holds for the delay associated with input $D$ (often referred to as the D-Q latch delay), where the same substitution has to be carried out in relationship (4.20).


Fig. 4.18. CML D latch gate.

For the considered technologies and under the conditions discussed above, the transistor capacitances are equal to those of the MUX/XOR gate in Table 4.6a-b. The only parameter which has to be modified in those tables is the coefficient $K_{j}$ of $C_{bc5,6}$, which is equal to 2.025 and 2.013 for the BiCMOS and HSB2 technology, respectively.

The simple CK-Q delay model (4.18) with (4.24) and the simulated delay of the D latch versus the bias current $I_{SS}$ with a load capacitance $C_{L}$ equal to $0 \mathrm{~F}, 100 \mathrm{fF}$ and 1 pF, are plotted in Figs. 4.19a and 4.19b for the BiCMOS and HSB2 technologies, respectively.


Fig. 4.19a. Simple CK-Q model (4.18) with (4.24) and simulated delay vs $I_{S S}$ for the BiCMOS process.


Fig. 4.19b. Simple CK-Q model (4.18) with (4.24) and simulated delay vs. $I_{SS}$ for the HSB2 process.

The error, plotted in Figs. 4.20a and 4.20b, is always lower than $20 \%$, and decreases as the load capacitance increases, as for the MUX/XOR gate. Analogous results are obtained for the D-Q delay, for which the maximum error is always lower than $18 \%$.


Fig. 4.20a. Error of the simple CK-Q delay model (4.18) with (4.24) vs. $I_{S S}$ for the BiCMOS process.


Fig. 4.20b. Error of the simple CK-Q delay model (4.18) with (4.24) vs. $I_{S S}$ for the HSB2 process.

The accuracy of the simple model can be improved by introducing suitable unknown coefficients, as for the MUX/XOR gate. Since the loading effects of Q5 and Q6 are estimated in an approximated manner, the accurate model is obtained from the one developed for the MUX/XOR by adding a further term which accurately takes into account the loading effect associated with the input capacitance of transistors Q5 and Q6. According to relationships (4.22) and (4.24), in the model of the MUX/XOR gate we replace the load capacitance $C_{L}$ by

$$
\begin{equation*}
C_{L}^{\prime}=C_{L}+C_{6} \tag{4.25}
\end{equation*}
$$

where the unknown capacitance $C_{6}$ is found by minimizing the functional (4.17). For the two technologies considered, the values of the unknown coefficients $K_{1} \ldots K_{5}$ and $C_{6}$ are reported in Table 4.8.

TABLE 4.8

|  | BiCMOS process | HSB2 process |
| :---: | :---: | :---: |
| $K_{1}$ | 0.6 | 0.901 |
| $K_{2}$ | 1.485 | 0.858 |
| $K_{3}$ | 0.922 | 1.014 |
| $K_{4}$ | 0.939 | 1.014 |
| $K_{5}$ | 0.922 | 1.014 |
| $C_{6}$ | $165 \mathrm{fF}$ | $53.7 \mathrm{fF}$ |

The error found for the accurate model of the CK-Q D latch delay, plotted in Figs. 4.21a and 4.21b for the two processes considered, is always lower than $5 \%$, and is typically about $2 \%$. Thus the same accuracy as the MUX/XOR is achieved. Analogous results are obtained for the accurate D-Q delay model.


Fig. 4.21a. Error of the accurate CK-Q delay model (4.22) with (4.25) vs. $I_{S S}$ for the BiCMOS process.


Fig. 4.21b. Error of the accurate CK-Q delay model (4.22) with (4.25) vs. $I_{SS}$ for the HSB2 process.

## Chapter 5

## OPTIMIZED DESIGN OF BIPOLAR CURRENT-MODE GATES

In this chapter, a general methodology to maximize the speed performance and to manage the power-delay trade-off in CML and ECL gates is presented and applied to several fundamental gates.

### 5.1 INTRODUCTION TO OPTIMIZED METHODOLOGY IN CML GATES

The bias current of a bipolar Current-Mode gate determines its power consumption and significantly affects the delay, as shown in Chapter 4. Since the power consumption in bipolar Current-Mode gates is rather large in practical cases, an efficient strategy to manage the power-delay trade-off and to optimize the speed performance is highly desirable. Such a design strategy can be derived by analytically expressing the power-delay trade-off, or equivalently a delay expression as an explicit function of the bias current $I_{S S}$.

The delay expressions introduced in Chapter 4 depend on process and design parameters, that consist of the bias current $I_{S S}$ and the load resistance $R_{C}$. In the design of bipolar gates, a suitable value of the logic swing has to be chosen from considerations on noise margin as discussed in Chapter 2, thus the product $R_{C} I_{S S}$ is constant and equal to $V_{S W I N G} / 2$ from relationship (2.9). As a consequence, delay expressions reported in Chapter 4 can be expressed as a function of only $I_{S S}$, once the logic swing has been preliminarily set.

It will be shown in the following subchapters that the delay of a generic CML gate as a function of the bias current can be expressed as

$$
\begin{equation*}
\tau_{P D}=a I_{S S}+\frac{b}{I_{S S}}+c \tag{5.1}
\end{equation*}
$$

where coefficients $a, b$ and $c$ depend on the logic swing, the power supply voltage and process parameters. We will show that coefficient $a$ depends on the base-emitter diffusion capacitance of switching transistors, while coefficient $b$ is due to an equivalent capacitance at the output node.

Delay (5.1) can be minimized by properly setting the bias current, as can be shown by differentiating (5.1) with respect to $I_{SS}$ and setting the result to zero. The minimum propagation delay, $\tau_{PDop}$, is achieved by setting the bias current to

$$
\begin{equation*}
I_{\text {SSop }}=\sqrt{\frac{b}{a}} \tag{5.2}
\end{equation*}
$$

and it is equal to

$$
\begin{equation*}
\tau_{P D o p}=2 \sqrt{a b}+c \approx 2 \sqrt{a b} \tag{5.3}
\end{equation*}
$$

where the term $c$ can be usually neglected, as will be discussed in the next subchapters.
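The minimization leading to (5.2)-(5.3) can be checked numerically with a brief sketch. The coefficient values below are hypothetical and serve only to exercise the formulas; the brute-force sweep is an illustrative cross-check, not part of the design procedure.

```python
import math


def delay(I_SS, a, b, c):
    """Delay (5.1) as a function of the bias current."""
    return a * I_SS + b / I_SS + c


def optimum_bias(a, b):
    """Optimum bias current (5.2)."""
    return math.sqrt(b / a)


def minimum_delay(a, b, c):
    """Minimum delay (5.3), keeping the term c."""
    return 2.0 * math.sqrt(a * b) + c


# Hypothetical coefficients (a in s/A, b in s*A, c in s).
a, b, c = 4.0e-8, 1.0e-13, 5.0e-12

I_op = optimum_bias(a, b)
tau_op = minimum_delay(a, b, c)

# Cross-check: sweep the current from 0.2 to 4 times I_op in 1% steps.
grid = [I_op * (0.2 + 0.01 * k) for k in range(381)]
I_best = min(grid, key=lambda I: delay(I, a, b, c))

print(f"I_SSop = {I_op * 1e3:.3f} mA, tau_PDop = {tau_op * 1e12:.1f} ps")
print(f"grid minimum at {I_best * 1e3:.3f} mA")
```

The grid minimum falls on the sample closest to $I_{SSop}$, and the delay evaluated there agrees with (5.3).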

Relationship (5.1) can also be used to express the power-delay product $P D P$, that quantitatively measures the efficiency of the power-delay tradeoff [O99], [CBF01], giving

$$
\begin{equation*}
P D P=V_{D D} I_{S S} \tau_{P D} \approx V_{D D}\left(a I_{S S}^{2}+b\right) \tag{5.4}
\end{equation*}
$$

$P D P$ always increases while increasing the bias current, which means that the power efficiency increases when the bias current is reduced. However, it is useful to observe that the power-delay product can be reduced only to some extent, since it tends to be constant for very low bias currents such that $a I_{S S}^{2} \ll b$.
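As a worked step under the same assumption that the term $c$ is negligible, substituting (5.1) into the definition of $PDP$ and evaluating the result at the optimum current (5.2) and for vanishing bias current suggests how far the power-delay product can be reduced:

$$
\begin{align*}
PDP & =V_{DD} I_{SS}\left(a I_{SS}+\frac{b}{I_{SS}}+c\right)=V_{DD}\left(a I_{SS}^{2}+b+c I_{SS}\right) \approx V_{DD}\left(a I_{SS}^{2}+b\right) \\
\left.PDP\right|_{I_{SS}=I_{SSop}} & \approx V_{DD}\left(a \frac{b}{a}+b\right)=2 V_{DD} b, \qquad \lim _{I_{SS} \rightarrow 0} PDP=V_{DD} b
\end{align*}
$$

Within this approximation, lowering the bias current below its optimum value can at most halve the power-delay product obtained at the optimum.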

Reducing power dissipation is a key issue in modern digital circuits. Hence, a design strategy to achieve both high speed and reduced power dissipation is a target to pursue. A high-speed performance can be obtained by setting the bias current to its optimum value (5.2), while a reduction of power consumption can be only obtained by using a lower bias current [AP991], [AP00]. These opposite requirements can be accomplished by observing that, from relationship (5.4), the power-delay product decreases with the bias current through a square law, thus for a given bias current reduction a lower delay increase is expected. In other words, it is possible to
reduce the power dissipation while paying for a smaller speed penalty with respect to the optimum case (5.2)-(5.3).

To evaluate the power-delay trade-off and quantitatively express the speed penalty due to the bias current reduction, we introduce the ratio $T_{PD}$ between the propagation delay and its optimum value $\tau_{PDop}$ (i.e., $T_{PD}=\tau_{PD} / \tau_{PDop}$). We will also normalize the current to its optimum value $I_{SSop}$ (i.e., $I_{N}=I_{SS} / I_{SSop}$), and from relationships (5.1) and (5.3) we get

$$
\begin{equation*}
T_{P D}=\frac{\sqrt{a b}\left(I_{N}+\frac{1}{I_{N}}\right)+c}{2 \sqrt{a b}+c} \approx \frac{1}{2}\left(I_{N}+\frac{1}{I_{N}}\right) \tag{5.5}
\end{equation*}
$$

where the coefficient $c$ was assumed to be lower than $\sqrt{a b}$, as occurs in practical cases. It is worth noting that, even in the unrealistic case $c=\sqrt{a b}$, the error on the approximated expression of (5.5) with respect to (5.1) for values of $I_{N}$ typically greater than 0.5 is lower than $7 \%$.
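The size of the error introduced by the approximation in (5.5) can be estimated with a short sketch. The helper names and the parameter `ratio_c`, defined here as $c / \sqrt{a b}$, are introduced only for this example; the sweep range $0.5 \leq I_{N} \leq 2$ is an assumption.

```python
def t_pd_exact(i_n, ratio_c):
    """Left-hand form of (5.5); ratio_c stands for c / sqrt(a*b)."""
    return ((i_n + 1.0 / i_n) + ratio_c) / (2.0 + ratio_c)


def t_pd_approx(i_n):
    """Right-hand (approximated) form of (5.5)."""
    return 0.5 * (i_n + 1.0 / i_n)


def relative_error(i_n, ratio_c):
    exact = t_pd_exact(i_n, ratio_c)
    return (t_pd_approx(i_n) - exact) / exact


# Case c = sqrt(a*b), with I_N swept from 0.5 to 2.0 in steps of 0.01.
worst = max(relative_error(0.5 + 0.01 * k, 1.0) for k in range(151))
print(f"worst-case relative error: {worst:.2%}")
```

In this sweep the largest deviation occurs at the ends of the range and is about $7 \%$, while it vanishes at $I_{N}=1$.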

The expression (5.5) of $T_{P D}$ is independent of the circuit and process parameters. Moreover, inspection of its plot, depicted in Fig. 5.1, shows that a reduction in the bias current around its optimum value (i.e., $I_{N}$ lower than 1) determines only a small increase in the resulting propagation delay, as expected.


Fig. 5.1. $T_{P D}$ versus $I_{N}$ through relationship (5.5).

As an example, Fig. 5.1 shows that a $50 \%$ bias current reduction with respect to the optimum value (i.e., $I_{N}=0.5$ ) leads to a delay increase by only $25 \%$. Moreover, a further bias current reduction leads to a greater delay increase that is not usually acceptable, as can be seen for a $60 \%$ bias current reduction which gives a $45 \%$ increase in the propagation delay with respect to the optimum case. This is because for $I_{N}<0.5$ the term $1 / I_{N}$ in relationship (5.5) tends to dominate over $I_{N}$, yielding

$$
\begin{equation*}
T_{P D} \approx \frac{1}{2} \frac{1}{I_{N}} \tag{5.6}
\end{equation*}
$$

which shows that, for a given reduction of the bias current by a factor $1 / I_{N}$ with respect to $I_{SSop}$, the related increase of the delay factor is about halved. From a power-delay trade-off point of view, this is easily explained by observing that for $I_{N} \ll 0.5$ (i.e. $a I_{SS}^{2} \ll 0.25 \cdot b$) the power-delay product (5.4) no longer decreases as the bias current decreases, but tends to be constant.

In our opinion, a good trade-off between power and speed, which will be used in the following, is achieved by setting $I_{N}=0.6$ (i.e., a bias current equal to $60 \%$ of the optimum value). This choice determines a propagation delay which is only $10 \%$ worse than the optimum value, while it reduces the power dissipation by $40 \%$ with respect to that needed in an optimum design [AP991].
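The trade-off points discussed in this section can be tabulated from the approximated form of (5.5). The sketch below is illustrative: the function name is hypothetical, and the power saving is taken as $1-I_{N}$ on the assumption that power scales with the bias current at a fixed supply voltage.

```python
def normalized_delay(i_n):
    """Approximated T_PD from (5.5) for a normalized bias current I_N."""
    return 0.5 * (i_n + 1.0 / i_n)


rows = []
for i_n in (1.0, 0.6, 0.5, 0.4):
    power_saving = 1.0 - i_n                      # power scales with I_SS
    delay_penalty = normalized_delay(i_n) - 1.0   # increase over the optimum
    rows.append((i_n, power_saving, delay_penalty))
    print(f"I_N = {i_n:.1f}: power saving {power_saving:5.1%}, "
          f"delay penalty {delay_penalty:5.1%}")
```

With the approximated form, $I_{N}=0.5$ and $I_{N}=0.4$ reproduce the $25 \%$ and $45 \%$ penalties quoted above, while $I_{N}=0.6$ gives roughly $13 \%$; a non-negligible term $c$, which this form neglects, lowers the normalized penalty.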

### 5.2 OPTIMIZED DESIGN OF THE CML INVERTER

In the previous section, a general design methodology of CML circuits has been discussed without reference to the specific gate considered. Now, the inverter gate is analyzed from a design point of view, by resorting to the concepts and notation introduced above. Since in general the transistor emitter area is a design parameter, the case of minimum-area transistors is first dealt with, and then extended to the more general case of a larger area.

### 5.2.1 Design with minimum transistor area

Consider a CML inverter gate, whose design parameters are the bias current $I_{SS}$, the load resistance $R_{C}$ and the transistor emitter area $A_{E}$. The latter parameter $A_{E}$ defines the transistor parasitic resistances and capacitances and, for the sake of simplicity, is assumed to be equal to the minimum value allowed by the adopted process; it is therefore a constant in the design. Moreover, as clarified in Section 5.1, the load resistance $R_{C}$ has to be set equal to $V_{SWING} /\left(2 I_{SS}\right)$ to achieve the desired logic swing preliminarily chosen (typically around $400-500 \mathrm{mV}$), therefore the only design parameter is the bias current.

The explicit delay dependence on the bias current can be found by properly manipulating the delay model (4.7). In particular, by assuming $g_{m} r_{e} \ll 1$ as in practical cases ${ }^{1}$ and substituting $R_{C}=V_{SWING} /\left(2 I_{SS}\right)$, the delay expression (4.7) can be written in the form of relationship (5.1) with coefficients $a, b$ and $c$ being equal to

$$
\begin{align*}
& a=0.69\left(4 \frac{r_{e}+r_{b}}{V_{\text {SWING }}} \tau_{F}+r_{b} C_{b c i} \frac{r_{c}}{2 V_{T}}\right) \approx 2.76 \frac{r_{e}+r_{b}}{V_{\text {SWING }}} \tau_{F}  \tag{5.7a}\\
& b=0.35 \cdot V_{\text {SWING }}\left(C_{b c i}+C_{b c x}+C_{c s}+C_{L}\right)  \tag{5.7b}\\
& c=0.69\left[r_{b} C_{b c i}\left(1+\frac{V_{\text {SWING }}}{4 V_{T}}\right)+r_{c}\left(C_{b c}+C_{c s}\right)+\left(r_{e}+r_{b}\right) C_{j e}\right] \tag{5.7c}
\end{align*}
$$

where coefficient $a$ (responsible for the delay increase for $I_{S S}>I_{S S o p}$ ) is essentially due to the base-emitter diffusion capacitance (4.6) through transit time $\tau_{F}$, while coefficient $b$ is due to the total capacitance at the output node (represented by the term in brackets in (5.7b)).

From relationships (5.7) and (5.2)-(5.3), the resulting values of the optimum bias current $I_{SSop}$ and the corresponding minimum delay $\tau_{PDop}$ are

$$
\begin{align*}
& I_{\text {SSop }}=\sqrt{\frac{b}{a}}=\frac{V_{S W I N G}}{2} \sqrt{\frac{C_{b c}+C_{c s}+C_{L}}{2\left(r_{e}+r_{b}\right) \tau_{F}}}  \tag{5.8}\\
& \tau_{P D o p} \approx 2 \sqrt{a b} \approx 2 \sqrt{\left(r_{e}+r_{b}\right) \tau_{F}\left(C_{b c i}+C_{b c x}+C_{c s}+C_{L}\right)} \tag{5.9}
\end{align*}
$$

Relationship (5.8) shows that the optimum bias current, and hence the power consumption, increases proportionally with the logic swing, whereas the minimum achievable delay (5.9) is independent of the logic swing. Thus, the logic swing should be kept as low as possible, in order to exploit the speed potential of the technology used without wasting excessive power.
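To see how closely the closed forms (5.8)-(5.9) follow the full coefficients (5.7), the sketch below evaluates both for one set of device values. All names and numbers are hypothetical placeholders, the thermal voltage is assumed to be 25.9 mV, and $C_{bc}$ is taken as $C_{bci}+C_{bcx}$.

```python
import math

V_T = 0.0259  # assumed thermal voltage (V)


def inverter_coefficients(r_e, r_b, r_c, tau_F, C_bci, C_bcx, C_cs,
                          C_je, C_L, V_swing):
    """Coefficients a, b, c of (5.1) from relationships (5.7a)-(5.7c)."""
    a = 0.69 * (4.0 * (r_e + r_b) / V_swing * tau_F
                + r_b * C_bci * r_c / (2.0 * V_T))
    b = 0.35 * V_swing * (C_bci + C_bcx + C_cs + C_L)
    c = 0.69 * (r_b * C_bci * (1.0 + V_swing / (4.0 * V_T))
                + r_c * (C_bci + C_bcx + C_cs)
                + (r_e + r_b) * C_je)
    return a, b, c


# Hypothetical device values, for illustration only (SI units).
p = dict(r_e=5.0, r_b=200.0, r_c=50.0, tau_F=10e-12, C_bci=5e-15,
         C_bcx=15e-15, C_cs=30e-15, C_je=40e-15, C_L=100e-15, V_swing=0.5)

a, b, c = inverter_coefficients(**p)
I_op_full = math.sqrt(b / a)              # (5.2) with the full a and b
tau_op_full = 2.0 * math.sqrt(a * b)      # (5.3) neglecting c

C_out = p["C_bci"] + p["C_bcx"] + p["C_cs"] + p["C_L"]
r_sum = p["r_e"] + p["r_b"]
I_op_closed = p["V_swing"] / 2.0 * math.sqrt(C_out / (2.0 * r_sum * p["tau_F"]))
tau_op_closed = 2.0 * math.sqrt(r_sum * p["tau_F"] * C_out)   # (5.9)

print(f"I_SSop: full {I_op_full * 1e3:.3f} mA, closed form {I_op_closed * 1e3:.3f} mA")
print(f"tau_PDop: full {tau_op_full * 1e12:.2f} ps, closed form {tau_op_closed * 1e12:.2f} ps")
```

For these placeholder values the closed forms stay within a few percent of the results obtained from the full coefficients, the difference being due to the second term of (5.7a).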

Relationships (5.8) and (5.9) clarify the usual belief that the speed of CML circuits is due to their small logic swing. Indeed, from (5.9) this assumption fails when the optimum bias current (5.8) is used, while it is correct when the delay dependence on $V_{\text {SWING }}$ is evaluated with an assigned bias current. This can be understood by observing that for $I_{SS}<I_{SSop}$ the dominant term in delay (5.1) is $b / I_{SS}$, which from (5.7b) can be lowered by decreasing the logic swing.

Relationship (5.9) can also be used to identify the main speed limits while optimizing a bipolar process. Indeed, by assuming the inverter to be driving other CML gates modeled by their input capacitance (4.23) (i.e., $C_{L}$ proportional to $C_{i n}$ ), the main capacitive contributions are the diffusion capacitance (through the transit time $\tau_{F}$ ), as well as the sum of the extrinsic base-collector capacitance and the collector-substrate capacitance. Finally, the main resistive contributions are the base and the emitter resistance, of which the first is usually the dominant term.

Of course, even though the optimum case has been discussed so far, the same design strategy to manage the power-delay trade-off as in Section 5.1 can be also used to reduce the power consumption without significantly degrading the speed performance.

### 5.2.2 Design with non-minimum transistor area

When non-minimum transistors are used, they are (or can be assumed to be) made up of the parallel of $N$ unitary transistors, each characterized by junction capacitances $C_{bci}, C_{bcx}, C_{cs}$ and $C_{je}$, as well as parasitic resistances $r_{b}, r_{e}$ and $r_{c}$. These unitary transistors connected in parallel can be modeled as a single transistor with each junction capacitance equal to that of the minimum-area transistor multiplied by a factor $N$, and parasitic resistances equal to those of the minimum-area transistor divided by the factor $N$ [AM88]. As a consequence, from inspection of (5.7), substituting the parasitic capacitances and resistances, relationship (5.1) must be rewritten as

$$
\begin{equation*}
\tau_{P D}=a \frac{I_{S S}}{N}+\left(b_{1}+\frac{b_{0}}{N}\right) \frac{N}{I_{S S}}+c \tag{5.10}
\end{equation*}
$$

where $a$ and $c$ are again given by (5.7a) and (5.7c), while $b_{0}$ and $b_{1}$ are

$$
\begin{align*}
& b_{0}=0.35 \cdot V_{\text {SWING }} C_{L}  \tag{5.11a}\\
& b_{1}=0.35 \cdot V_{\text {SWING }}\left(C_{b c i}+C_{b c x}+C_{c s}\right) \tag{5.11b}
\end{align*}
$$

The delay expression (5.10) is minimized by setting the bias current to

$$
\begin{equation*}
I_{\text {SSop }}(N)=N \sqrt{\frac{b_{1}+\frac{b_{0}}{N}}{a}}=N \sqrt{\frac{1+\frac{b_{0}}{N b_{1}}}{1+\frac{b_{0}}{b_{1}}}} I_{\text {SSop }}(1) \tag{5.12}
\end{equation*}
$$

that was found by equating to zero the derivative of (5.10) with respect to $I_{S S}$, and defining $I_{S S o p}(1)$ as the optimum current (5.8) found for minimum area (i.e, for $N=1$ ). The resulting minimum delay as a function of $N$, found by substituting (5.12) into (5.10), results in

$$
\begin{equation*}
\tau_{\text {PDop }}(N)=2 \sqrt{a\left(b_{1}+\frac{b_{0}}{N}\right)}+c \approx \sqrt{\frac{1+\frac{b_{0}}{N b_{1}}}{1+\frac{b_{0}}{b_{1}}}} \tau_{\text {PDop }}(1) \tag{5.13}
\end{equation*}
$$

where $\tau_{P D_{o p}}(1)$ is the minimum delay (5.9) obtained for $N=1$. By observing that the term in the square root of (5.13) is lower than unity, the delay of the CML gate can be reduced by increasing the transistor emitter area, but this is obtained at the cost of a greater power dissipation from (5.12). To be more specific, when the load capacitance $C_{L}$ is much greater than the parasitic capacitances at the output node, from (5.11) the ratio $b_{0} / b_{1}$ is much higher than unity. As a consequence, the delay reduction and the increase in the bias current are both proportional to $\sqrt{N}$ from relationships (5.12) and (5.13), which means that the power-delay product is equal to that already found assuming minimum-area transistors. In this case, the power efficiency of the CML is equal to that achieved by optimizing with a unitary transistor, since a given delay reduction is achieved at the price of an equal increase of the power consumption.

On the other hand, with a small capacitive load, the delay reduction is low compared to the increase in bias current, which is proportional to $N$ from (5.12). Therefore, an inconvenient amount of power dissipation is needed.
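The two load regimes described above can be compared with a small numerical sketch based on (5.10), (5.12) and (5.13). The coefficient values, the choice $N=4$ and the neglect of $c$ are assumptions made only for illustration.

```python
import math


def delay_n(I_SS, N, a, b0, b1, c):
    """Delay (5.10) with N unitary transistors in parallel."""
    return a * I_SS / N + (b1 + b0 / N) * N / I_SS + c


def scaling_factor(N, b0, b1):
    """Square-root term shared by (5.12) and (5.13)."""
    return math.sqrt((1.0 + b0 / (N * b1)) / (1.0 + b0 / b1))


def optimum_bias_n(N, a, b0, b1):
    """Optimum bias current (5.12), written through I_SSop(1)."""
    I_op1 = math.sqrt((b0 + b1) / a)
    return N * scaling_factor(N, b0, b1) * I_op1


a, b1, c = 1.0e-8, 1.0e-15, 0.0   # hypothetical values, c neglected
N = 4
results = {}
for label, b0 in (("heavy load", 1000.0 * b1), ("light load", 0.01 * b1)):
    I_1 = optimum_bias_n(1, a, b0, b1)
    I_N = optimum_bias_n(N, a, b0, b1)
    current_ratio = I_N / I_1
    delay_ratio = delay_n(I_N, N, a, b0, b1, c) / delay_n(I_1, 1, a, b0, b1, c)
    results[label] = (current_ratio, delay_ratio)
    print(f"{label}: current x{current_ratio:.2f}, delay x{delay_ratio:.2f}")
```

With the heavy load the current roughly doubles and the delay roughly halves for $N=4$, so their product is nearly unchanged; with the light load the current grows almost fourfold while the delay barely changes.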

Except under very tight speed constraints, the above approach is seldom useful, since the other fundamental target is usually to minimize the power dissipation, and a bias current greater than $I_{SSop}(1)$ is rarely tolerable. Thus, in practical cases a different approach should be followed, as explained in the following. In particular, let us consider relationship (5.10) for the fixed bias current value $I_{SSop}(1)$ and optimize it with respect to parameter $N$ (i.e., differentiate it with respect to $N$ and, setting the result to zero, solve for an integer $N$). The resulting optimum value of $N$ is

$$
\begin{equation*}
N^{\prime}=\operatorname{int}\left(\sqrt{\frac{a}{b_{1}}} I_{\text {SSop }}(1)\right)=\operatorname{int}\left(\frac{I_{\text {SSop }}(1)}{\left.I_{\text {SSop }}(1)\right|_{C_{L}=0}}\right) \tag{5.14}
\end{equation*}
$$

where the function $\operatorname{int}(x)$ gives the maximum integer value that is not greater than the argument $x$.

From (5.14), the number of unitary transistors needed to reduce the propagation delay for $I_{S S}=I_{S S o p}(1)$ is equal to the ratio of the optimum current with a unitary area and the optimum current with a unitary area evaluated without a load capacitance. Using these design criteria, after substituting (5.14) and (5.8) into (5.10), the resulting delay is

$$
\begin{equation*}
\tau_{P D}\left(I_{S S o p}(1), N^{\prime}\right)=2 \sqrt{a b_{1}}+b_{0} \sqrt{\frac{a}{b_{1}+b_{0}}}+c \tag{5.15}
\end{equation*}
$$

By comparing relationship (5.15) with that evaluated under unitary area (5.9), we see that the emitter area optimization with the bias current set to $I_{S S o p}(1)$ allows for a delay reduction at the cost of an increased area occupation.
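A minimal numerical sketch of the area optimization at fixed bias current follows. The coefficients are hypothetical, $c$ is set to zero for simplicity, and (5.15) is evaluated as written, i.e. treating $N^{\prime}$ as a continuous quantity, so a small difference from the delay obtained with the truncated integer value is to be expected.

```python
import math


def delay_n(I_SS, N, a, b0, b1, c):
    """Delay (5.10) with N unitary transistors in parallel."""
    return a * I_SS / N + (b1 + b0 / N) * N / I_SS + c


def optimum_area_factor(a, b0, b1):
    """N' from (5.14): truncated ratio of the two optimum currents."""
    I_op1 = math.sqrt((b0 + b1) / a)       # I_SSop(1) with the load
    I_op1_noload = math.sqrt(b1 / a)       # I_SSop(1) for C_L = 0
    return math.floor(I_op1 / I_op1_noload)


def delay_at_optimum_area(a, b0, b1, c):
    """Delay (5.15)."""
    return 2.0 * math.sqrt(a * b1) + b0 * math.sqrt(a / (b1 + b0)) + c


a, b0, b1, c = 1.0e-8, 2.0e-14, 1.0e-15, 0.0   # hypothetical values

I_op1 = math.sqrt((b0 + b1) / a)
N_prime = optimum_area_factor(a, b0, b1)
tau_unitary = delay_n(I_op1, 1, a, b0, b1, c)
tau_sized = delay_n(I_op1, N_prime, a, b0, b1, c)
tau_formula = delay_at_optimum_area(a, b0, b1, c)

print(f"N' = {N_prime}")
print(f"delay: N = 1 -> {tau_unitary * 1e12:.2f} ps, "
      f"N = N' -> {tau_sized * 1e12:.2f} ps, (5.15) -> {tau_formula * 1e12:.2f} ps")
```

In this example the delay at the same bias current drops by roughly $30 \%$ when moving from the unitary transistor to $N^{\prime}$ unitary transistors, and (5.15) agrees with the delay computed from (5.10) to within a fraction of a percent.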

The increase in the transistor area can also be helpful when the optimum current for the unitary case, $I_{S S o p}(1)$, leads to non-negligible high-injection level effects, that determine an increase in the transistor transit time and thus reduce the transistor speed performance [AM88]. However, it is worth noting that the increase with respect to the unitary area leads to a proportional increase in the input capacitance of the CML gate, from (4.23). Thus, if the gate under design is the main load of a previous gate, it may not be convenient to design a transistor with a non-unitary area.

Obviously, the same design criteria highlighted in Section 5.1 using a current which is $40 \%$ lower than the optimum value can be applied here to reduce the power consumption with respect to the optimum case, without significantly degrading the speed performance.

### 5.2.3 Design examples

To better understand the design procedure previously described, let us consider the design of a CML inverter gate, by using both the BiCMOS and the HSB2 technology discussed in Chapter 4, and assuming fan-out values equal to 1 and 10. By assuming the gates to be powered with 5 V and with a logic swing $V_{\text {SWING }}=500 \mathrm{mV}$, we obtain the data summarized in Tables 4.3-4.4.

As already discussed in Section 4.8, for the BiCMOS technology the two load conditions are equivalent to 120 fF and 1.2 pF load capacitance, respectively. The resulting optimum bias currents from (5.8) are 0.7 mA and 1.8 mA, which lead to a propagation delay of 129 ps and 277 ps (the values are very close to those given by SPICE simulations, which are 126 ps and 276 ps). For a fan-out of 10 the current is slightly higher than the high-injection level, but there is no appreciable degradation in the propagation delay. However, since the optimum current evaluated with $C_{L}=0$ is 0.4 mA, according to (5.14) the delay can be further reduced to 166 ps by setting the number of unitary transistors equal to 4. In this case, the simulated delay is $15 \%$ higher than that predicted analytically.
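The figures of the BiCMOS example can be turned into the remaining design quantities with a few lines. The sketch uses only the values quoted above together with $R_{C}=V_{SWING} /\left(2 I_{SS}\right)$, the supply power $V_{DD} I_{SS}$ and (5.14); the variable names are illustrative.

```python
import math

V_DD = 5.0        # supply voltage (V)
V_SWING = 0.5     # logic swing (V)

# (fan-out, optimum bias current in A, analytical delay in s) from the text.
designs = [(1, 0.7e-3, 129e-12), (10, 1.8e-3, 277e-12)]

summary = []
for fan_out, I_SS, tau in designs:
    R_C = V_SWING / (2.0 * I_SS)      # load resistance for the chosen swing
    power = V_DD * I_SS               # static power of the gate
    pdp = power * tau                 # power-delay product
    summary.append((fan_out, R_C, power, pdp))
    print(f"fan-out {fan_out}: R_C = {R_C:.0f} ohm, "
          f"P = {power * 1e3:.1f} mW, PDP = {pdp * 1e12:.2f} pJ")

# (5.14) for fan-out 10: optimum currents with and without the load.
N_prime = math.floor(1.8e-3 / 0.4e-3)
print(f"N' = {N_prime}")
```

The truncated current ratio gives the four unitary transistors used in the text for the fan-out of 10.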

For the HSB2 technology, the two fan-out conditions considered above are equivalent to 100 fF and 1 pF load capacitance, respectively. The optimum bias currents are 3.4 mA and 9.4 mA, which lead to a nominal propagation delay of 19 ps and 42 ps. While for the former case the analytical result is very close to the SPICE simulation, which gives 21 ps, with the bias current of 9 mA the transistors suffer from strong high-injection level effects, and the analytical propagation delay obtained is not realistic. According to (5.14), we set the number of unitary transistors to 3. Thus the delay is reduced to 30 ps, which differs from the simulated value by $10 \%$, and the strong high-injection condition is avoided.

A reduction of power dissipation can be achieved by decreasing the bias current by $40 \%$ with respect to the optimum value, which leads to only a small increase in the propagation delay, as discussed in Section 5.1.
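The size of that increase can be estimated from the delay form $\tau_{PD}=a I_{SS}+b / I_{SS}+c$ used throughout this chapter, whose minimum is reached at $I_{SSop}=\sqrt{b/a}$. The following Python sketch is only illustrative: the coefficient values are placeholders, not data for the BiCMOS or HSB2 process, and the function names are hypothetical.

```python
import math

def delay(i_ss, a, b, c):
    """Delay model tau_PD = a*I_SS + b/I_SS + c."""
    return a * i_ss + b / i_ss + c

def optimum_current(a, b):
    """Bias current that minimizes the model above: I_SSop = sqrt(b/a)."""
    return math.sqrt(b / a)

def penalty(k):
    """Current-dependent delay at I_SS = k*I_SSop relative to its minimum.

    a*(k*I_op) + b/(k*I_op) = sqrt(a*b)*(k + 1/k), and the minimum is
    2*sqrt(a*b), so the ratio (k + 1/k)/2 does not depend on a or b.
    """
    return (k + 1.0 / k) / 2.0

# Placeholder coefficients (assumed, not from the text): s/A, A*s, s.
A, B, C = 2.0e-8, 5.0e-14, 1.0e-11
I_OP = optimum_current(A, B)

if __name__ == "__main__":
    print(f"penalty at 0.6*I_SSop: {penalty(0.6):.3f}")
    print(f"delay at I_SSop:     {delay(I_OP, A, B, C) * 1e12:.1f} ps")
    print(f"delay at 0.6*I_SSop: {delay(0.6 * I_OP, A, B, C) * 1e12:.1f} ps")
```

With a $40 \%$ current reduction the current-dependent part of the delay grows by the factor $(0.6+1/0.6)/2$, i.e. by about $13 \%$; the constant term $c$ makes the relative increase of the total delay smaller still.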

### 5.3 OPTIMIZED DESIGN OF THE ECL INVERTER

The delay of an ECL gate given by (4.15) consists of two main terms (see Section 4.5), one associated with the inner CML gate and the other due to the output buffer, thus we split the optimized design into two steps.

First, the bias current $I_{S S}$ of the CML stage has to be computed to minimize its delay $\tau_{P D(C M L)}$. As discussed in Section 4.5, the delay contribution $\tau_{P D(C M L)}$ of the inner CML gate is evaluated by setting its load capacitance to zero, thus its optimum bias current is found by following the same strategy as in Section 5.2 and setting $C_{L}=0$ in (5.8). From (5.2), the optimum bias current is

$$
\begin{equation*}
I_{\text {SSop }}=\frac{V_{\text {SWING }}}{2} \sqrt{\frac{C_{b c 1}+C_{c s 1}}{2\left(r_{e}+r_{b}\right) \tau_{F}}} \tag{5.16}
\end{equation*}
$$

The delay contribution of the output buffer $\tau_{P D(C C)}$ in (4.15) should also be reduced to minimize the ECL inverter delay. However, relationship (4.15) tends asymptotically to a minimum value for the output buffer bias current $I_{C C}$ approaching infinity. Therefore, a design strategy to keep (4.15) as low as possible with reasonable values of $I_{C C}$ is required. To this end, observe that, during the falling output signal transition, if the emitter voltage cannot follow the base voltage at the same speed, the CC stage cuts off, and this results in a much larger delay than the value given by relationship (4.15), which was derived by assuming the CC stage to be working in the linear region. As a consequence, we have to guarantee that the CC stage works in the linear region most of the time.

A simple design strategy allowing the output buffer transistor to work in the linear region was proposed in [KB92]. This strategy is based on the consideration that the average current which discharges the load capacitance is $C_{L} \Delta V_{O} / \Delta t$, where $\Delta V_{O}$ is the voltage swing at each output node (equal to $V_{S W I N G} / 2$ ) and $\Delta t$ is the time required for the output transition (approximately equal to $2 \tau_{P D(C C)}$, since the delay $\tau_{P D(C C)}$ is needed for a half output transition). Thus, in [KB92] the bias current $I_{C C}$ is set equal to the average discharge current (i.e., $I_{C C}=C_{L} V_{S W I N G} / 2 \tau_{P D(C C)}$ ), and a bias current greater than this critical value would not substantially reduce the delay. However, this choice has two drawbacks. The first is the dependence of $\tau_{P D(C C)}$ on $I_{C C}$, which, through the expression $I_{C C}=C_{L} V_{S W I N G} / 2 \tau_{P D(C C)}$, makes $I_{C C}$ depend on itself. The second is the linear dependence of $I_{C C}$ on the load capacitance, which can lead to excessive power dissipation for the practical ranges of $C_{L}$. To overcome these drawbacks, an alternative approach is outlined below.

The delay of the common collector in (4.15) can be written as

$$
\begin{align*}
\tau_{P D(C C)} & =1.6 \sqrt{\left(R_{C}+r_{b}\right)\left(\tau_{F}+\frac{V_{T}}{I_{C C}} C_{j e 2}\right) C_{L}} \\
& =1.6 \sqrt{\left(\frac{\tau_{P D(C C) \min }}{1.6}\right)^{2}+\left(R_{C}+r_{b}\right) \frac{V_{T}}{I_{C C}} C_{j e 2} C_{L}} \tag{5.17}
\end{align*}
$$

where $\tau_{P D(C C) \min }$ is the asymptotic value of $\tau_{P D(C C)}$ reached for $I_{C C} \rightarrow \infty$ (i.e., $\tau_{P D(C C) \min }=1.6 \sqrt{\left(R_{C}+r_{b}\right) C_{L} \tau_{F}}$ ). Therefore, after defining the output buffer delay normalized to its asymptotic minimum, $\tau_{n}=\tau_{P D(C C)} / \tau_{P D(C C) \min }$, relationship (5.17) can be inverted to express the bias current as

$$
\begin{equation*}
I_{C C}=\frac{C_{j e 2}}{\left(\tau_{n}^{2}-1\right) \tau_{F}} V_{T} \tag{5.18}
\end{equation*}
$$

which gives the bias current $I_{C C}$ needed to achieve a given value of $\tau_{n}$. From simulations, it was found that the resulting $I_{C C}$ keeps the CC stage in the linear region most of the time if $\tau_{n}<1.4$. Therefore, under this condition, the delay evaluation in Section 4.5, and thus relationship (5.18), are valid.
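The inversion from (5.17) to (5.18) can be checked numerically. The sketch below is a minimal illustration: the device values are placeholders chosen only to give plausible magnitudes, the thermal voltage is taken as roughly 26 mV, and the function names are hypothetical.

```python
import math

V_T = 0.026  # thermal voltage in volts near room temperature (approximate)

def cc_delay(i_cc, r_c, r_b, tau_f, c_je2, c_l, v_t=V_T):
    """Output-buffer delay, first line of (5.17)."""
    return 1.6 * math.sqrt((r_c + r_b) * (tau_f + v_t / i_cc * c_je2) * c_l)

def cc_delay_min(r_c, r_b, tau_f, c_l):
    """Asymptotic buffer delay reached for I_CC -> infinity."""
    return 1.6 * math.sqrt((r_c + r_b) * c_l * tau_f)

def icc_for_tau_n(tau_n, c_je2, tau_f, v_t=V_T):
    """Buffer bias current giving the normalized delay tau_n, from (5.18)."""
    if tau_n <= 1.0:
        raise ValueError("tau_n must exceed 1 (tau_n = 1 needs infinite current)")
    return c_je2 * v_t / ((tau_n ** 2 - 1.0) * tau_f)

# Placeholder device values (assumed for illustration only).
R_C, R_B, TAU_F, C_JE2, C_L = 150.0, 50.0, 10e-12, 30e-15, 100e-15

if __name__ == "__main__":
    i_cc = icc_for_tau_n(1.1, C_JE2, TAU_F)
    ratio = cc_delay(i_cc, R_C, R_B, TAU_F, C_JE2, C_L) / cc_delay_min(R_C, R_B, TAU_F, C_L)
    print(f"I_CC = {i_cc * 1e3:.3f} mA gives tau_n = {ratio:.3f}")
```

Note that (5.18) contains neither $C_{L}$ nor $R_{C}$: for a target $\tau_{n}$ the required current depends only on $C_{je2}$, $\tau_{F}$ and $V_{T}$.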

As a rule of thumb to achieve a low CC propagation delay (i.e., close to the asymptotic value) with a low power dissipation, it is suggested to set $\tau_{n}=1.1$, which leads to a CC delay worse by $10 \%$ with respect to the asymptotic value. It is worth noting that, for practical load capacitance values, (5.18) gives a lower value of the bias current than that resulting from [KB92]. This can be seen by comparing relationship (5.18) with that proposed in [KB92] (i.e., $I_{C C}=C_{L} V_{S W I N G} / 2 \tau_{P D(C C)}$ ). Indeed, straightforward calculations show that the latter is lower only for load capacitance values such that

$$
\begin{equation*}
C_{L}<\frac{10}{\left(\tau_{n}^{2}-1\right)^{2}}\left(\frac{V_{T}}{V_{S W I N G}}\right)^{2} \frac{C_{j e 2}^{2}}{C_{D 2}} \approx \frac{C_{j e 2}^{2}}{C_{D 2}} \tag{5.19}
\end{equation*}
$$

where the right-hand side represents a capacitance much lower than the input capacitance of a bipolar current-mode gate (4.23), since $C_{j e 2} / C_{D 2}$ is surely lower than unity and $C_{j e 2}$ is lower than the input capacitance (4.23). Therefore, in practical cases where at least a unitary fan-out is considered, the load capacitance exceeds the bound in (5.19), and the bias current given by (5.18) is the lower of the two.
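The comparison between the two choices of $I_{CC}$ can also be explored numerically. The sketch below makes one explicit assumption that is not in the text: to break the self-dependence of the [KB92] expression, $\tau_{PD(CC)}$ is set to $\tau_{n}$ times its asymptotic minimum, so that both currents refer to the same target delay. Device values are placeholders and the function names are hypothetical.

```python
import math

V_T = 0.026  # thermal voltage in volts (approximate)

def icc_eq_518(tau_n, c_je2, tau_f, v_t=V_T):
    """Buffer bias current from (5.18); independent of the load."""
    return c_je2 * v_t / ((tau_n ** 2 - 1.0) * tau_f)

def icc_kb92(c_l, tau_n, r_c, r_b, tau_f, v_swing):
    """I_CC = C_L*V_SWING/(2*tau_PD(CC)) with tau_PD(CC) = tau_n*tau_min (assumed)."""
    tau_cc = tau_n * 1.6 * math.sqrt((r_c + r_b) * tau_f * c_l)
    return c_l * v_swing / (2.0 * tau_cc)

def crossover_load(tau_n, r_c, r_b, tau_f, c_je2, v_swing, v_t=V_T):
    """Load at which the two currents coincide.

    Under the assumption above, icc_kb92 = k*sqrt(C_L) with
    k = V_SWING/(3.2*tau_n*sqrt((R_C + r_b)*tau_F)), so C_L = (I_518/k)**2.
    """
    k = v_swing / (3.2 * tau_n * math.sqrt((r_c + r_b) * tau_f))
    return (icc_eq_518(tau_n, c_je2, tau_f, v_t) / k) ** 2

# Placeholder values (assumed for illustration only).
P = dict(tau_n=1.1, r_c=150.0, r_b=50.0, tau_f=10e-12, c_je2=30e-15, v_swing=0.5)

if __name__ == "__main__":
    c_x = crossover_load(**P)
    print(f"crossover load: {c_x * 1e15:.1f} fF")
    for c_l in (0.1 * c_x, 100e-15, 1e-12):
        kb = icc_kb92(c_l, P["tau_n"], P["r_c"], P["r_b"], P["tau_f"], P["v_swing"])
        print(f"C_L = {c_l * 1e15:8.1f} fF: KB92 {kb * 1e3:.3f} mA, "
              f"(5.18) {icc_eq_518(P['tau_n'], P['c_je2'], P['tau_f']) * 1e3:.3f} mA")
```

With these placeholder values the crossover falls at a few tens of femtofarads, so for loads of 100 fF and above the load-independent current from (5.18) is the smaller one.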

By following the approach discussed, the power dissipation of the optimized ECL inverter is found to be independent of the load capacitance. It is, in fact, composed of the CML and CC stage contributions which are constant with respect to $C_{L}$. In order to reduce the power dissipation, the bias current $I_{S S}$ can be conveniently chosen lower than the optimum value (5.16) by $40 \%$, as discussed in Section 5.1.

To better understand the design criteria introduced for the bias current of the output buffers, let us consider the HSB2 technology and design the ECL inverter gate for a fan-out of 1 and 10. The optimum bias currents are independent of the load, and their values are $I_{S S o p}=1.7 \mathrm{~mA}$ and $I_{C C}=0.8 \mathrm{~mA}$. The delay of the differential stage is also independent of the load, and its value is 12 ps. Thus the resulting propagation delays are 30 ps and 72 ps for a load capacitance of 0.1 pF and 1 pF, respectively. These values are close to the 28 ps and 86 ps given by SPICE.

Regarding the transistor area, the transistors of the inner CML gate must be minimum sized. Indeed, the CML delay is evaluated for $C_{L}=0$, thus coefficient $b_{0}$ in (5.11a) is equal to zero and, from relationship (5.13), the delay cannot be reduced by increasing the emitter area, or equivalently the factor $N$.

From simulations, it can be seen that there exists an optimum emitter area of the output buffer transistors that minimizes the delay. This is because the area increase leads to a reduction of resistance $r_{b}$ in the buffer delay (4.14) (even though this is not significantly beneficial, since $R_{C} \gg r_{b}$ ) and at the same time to an increase in the capacitance $C_{b c}$ (that was neglected in (4.14)). Therefore, due to these two opposite effects on the buffer delay, an optimum exists. However, since the buffer delay is only slightly affected by the emitter area, this optimization leads to a very small delay reduction, on the order of a few percent, as found from simulations.

From these considerations, it is clear that the output buffer transistors' area should be increased only to avoid high-injection level effects. To be more specific, the high-injection condition must be avoided at the average current that flows in the buffer transistors, which is evaluated below according to the method in [KB92]. Since the buffer delay is defined as the time needed by each output node to cross half of its overall voltage swing (that is equal to $V_{S W I N G} / 2$ ), the average value of the current flowing in the load capacitance during the charge transition is

$$
\begin{equation*}
\bar{i}_{C_{L}}=\frac{V_{S W I N G}}{4} \frac{C_{L}}{\tau_{P D(C C)}} \tag{5.20}
\end{equation*}
$$

where $\bar{i}_{C_{L}}$ is equal to the difference between the average emitter current, $\bar{i}_{E}$, and the bias current $I_{C C}$. As a result, by substituting the optimized bias current (5.18) into (5.20), the average emitter current is equal to

$$
\begin{equation*}
\bar{i}_{E}=I_{C C}+\frac{V_{S W I N G}}{4} \frac{C_{L}}{\tau_{P D(C C)}}=\frac{C_{j e 2}}{\left(\tau_{n}^{2}-1\right) \tau_{F}} V_{T}+\frac{V_{S W I N G}}{4} \frac{C_{L}}{\tau_{P D(C C)}} \tag{5.21}
\end{equation*}
$$

To avoid high-injection level effects, the emitter area of the buffer transistors has to be set to the ratio of the average emitter current (5.21) to the maximum current beyond which high-injection effects occur in a minimum-area device, which is a characteristic parameter of the process used. For both the BiCMOS and HSB2 technologies, the emitter area increase is not required, because of the high value of the current beyond which high-injection effects occur.
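As a hedged illustration of this sizing rule, the sketch below evaluates (5.21) and the resulting area factor. Rounding the ratio up to an integer number of unit transistors is an assumption of the sketch, and the high-injection onset current `I_HI_MAX` is a hypothetical placeholder, not a process datum. The buffer delay of 60 ps is inferred from the HSB2 example above (72 ps total minus 12 ps for the differential stage, at 1 pF).

```python
import math

def average_emitter_current(i_cc, v_swing, c_l, tau_cc):
    """Average emitter current of the buffer transistor, from (5.21)."""
    return i_cc + v_swing / 4.0 * c_l / tau_cc

def buffer_area_factor(i_e_avg, i_hi_max):
    """Emitter area as a multiple of the minimum area.

    Assumption of this sketch: the ratio is rounded up to an integer,
    and never goes below one unit transistor.
    """
    return max(1, math.ceil(i_e_avg / i_hi_max))

I_CC = 0.8e-3      # A, from the HSB2 example
V_SWING = 0.5      # V
C_L = 1e-12        # F, fan-out of 10 in the HSB2 example
TAU_CC = 60e-12    # s, inferred as 72 ps - 12 ps
I_HI_MAX = 5e-3    # A, hypothetical onset of high injection for a unit device

I_E_AVG = average_emitter_current(I_CC, V_SWING, C_L, TAU_CC)

if __name__ == "__main__":
    print(f"average emitter current: {I_E_AVG * 1e3:.2f} mA")
    print(f"area factor: {buffer_area_factor(I_E_AVG, I_HI_MAX)}")
```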

### 5.4 COMPARISON BETWEEN THE CML AND ECL INVERTER

It is a common belief that ECL gates exhibit superior speed performance compared to CML gates. However, this does not appear so evident by comparing these gates with an equal bias current per gate. Indeed, the CML bias current is entirely spent to provide current $I_{S S}$, while in the ECL gate the current is divided between $I_{S S}$ and the bias current $I_{C C}$ of each of the two output buffers. Therefore, in the ECL gate, the delay of the inner CML may be lower than that of the CML gate (due to the lower capacitive load), but the output buffer introduces a further delay (4.14). As a consequence, a quantitative analysis is required to better understand the performance of CML and ECL gates in terms of delay, power consumption and their tradeoff.

To compare the speed performance of CML and ECL gates designed for minimum delay, the optimum CML delay (5.3) is rewritten using (5.11) as

$$
\begin{equation*}
\tau_{P D(C M L) o p}=2 \sqrt{a\left(b_{1}+0.35 V_{S W I N G} C_{L}\right)}+c \tag{5.22}
\end{equation*}
$$

Note that term $b_{1}$ represents term $b$ without the contribution due to the load capacitance

$$
\begin{equation*}
b_{1}=b-0.35 V_{\text {SWING }} C_{L} \tag{5.23}
\end{equation*}
$$

Thus the ECL delay can be rearranged as

$$
\begin{equation*}
\tau_{P D(E C L) o p}=2 \sqrt{a b_{1}}+c+1.6 \sqrt{\left(R_{C}+r_{b}\right) \tau_{F} C_{L}} \tag{5.24}
\end{equation*}
$$

By solving the inequality $\tau_{P D(C M L) o p}<\tau_{P D(E C L) o p}$, it is found that when $R_{C}>2.4 r_{e}+1.3 r_{b}$, which is generally true for high-speed technologies, $\tau_{P D(C M L)}$ is always lower than $\tau_{P D(E C L)}$ [AP991]. On the other hand, in the infrequent case $R_{C}<2.4 r_{e}+1.3 r_{b}$, the inequality $\tau_{P D(C M L) o p}<\tau_{P D(E C L) o p}$ is true for load capacitance values such that

$$
\begin{equation*}
C_{L}<15 \frac{\left(r_{e}+r_{b}\right)\left(R_{C}+r_{b}\right)}{\left(2.4 r_{e}+1.3 r_{b}-R_{C}\right)^{2}}\left(C_{b c}+C_{c s}\right) \tag{5.25}
\end{equation*}
$$

where the value given by the ratio is usually greater than one. The right-hand side of relationship (5.25) represents an impractically high load capacitance in high-speed circuits, thus in actual cases the CML gate is again faster than the ECL gate. Summarizing, with modern technologies CML gates always have a better speed performance than ECL gates and, contrary to the traditional assumption, this property also holds for high capacitive loads. To further illustrate this, the delays of the optimized CML and ECL gates are plotted versus the load capacitance for the BiCMOS and HSB2 technologies in Figs. 5.2a and 5.2b, where a minimum-delay design under the conditions previously used is assumed.
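The comparison between (5.22) and (5.24) can be reproduced with a few lines of Python. This is a sketch under stated assumptions: coefficient $a$ is taken in its approximate form $2.76\left(r_{e}+r_{b}\right) \tau_{F} / V_{SWING}$, coefficient $b_{1}$ is taken as $0.35 V_{SWING}\left(C_{bc}+C_{cs}\right)$ as implied by (5.26), and all device values are placeholders rather than BiCMOS or HSB2 data.

```python
import math

def inverter_coefficients(r_e, r_b, tau_f, v_swing, c_bc, c_cs):
    """Approximate coefficients a and b1 of a minimum-area inverter (assumed forms)."""
    a = 2.76 * (r_e + r_b) * tau_f / v_swing
    b1 = 0.35 * v_swing * (c_bc + c_cs)
    return a, b1

def cml_delay_opt(a, b1, c, v_swing, c_l):
    """Optimum CML delay, (5.22)."""
    return 2.0 * math.sqrt(a * (b1 + 0.35 * v_swing * c_l)) + c

def ecl_delay_opt(a, b1, c, r_c, r_b, tau_f, c_l):
    """Optimum ECL delay, (5.24): unloaded CML stage plus asymptotic buffer delay."""
    return 2.0 * math.sqrt(a * b1) + c + 1.6 * math.sqrt((r_c + r_b) * tau_f * c_l)

# Placeholder values (assumed for illustration only).
R_E, R_B, R_C = 5.0, 50.0, 150.0
TAU_F, V_SWING = 10e-12, 0.5
C_BC, C_CS, C_CONST = 15e-15, 25e-15, 5e-12
LOADS = [1e-15, 1e-14, 1e-13, 1e-12, 1e-11]

def compare(loads=LOADS):
    """Return (C_L, CML delay, ECL delay) for each load in the sweep."""
    a, b1 = inverter_coefficients(R_E, R_B, TAU_F, V_SWING, C_BC, C_CS)
    return [(c_l,
             cml_delay_opt(a, b1, C_CONST, V_SWING, c_l),
             ecl_delay_opt(a, b1, C_CONST, R_C, R_B, TAU_F, c_l))
            for c_l in loads]

if __name__ == "__main__":
    for c_l, cml, ecl in compare():
        print(f"C_L = {c_l * 1e15:8.0f} fF: CML {cml * 1e12:6.1f} ps, ECL {ecl * 1e12:6.1f} ps")
```

The placeholder resistances satisfy $R_{C}>2.4 r_{e}+1.3 r_{b}$, and the sweep shows the CML delay below the ECL delay at every load, in line with the condition stated above.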

The speed advantage of the CML gate is achieved at the cost of a higher power consumption, in some cases. Indeed, the optimized CML and ECL gates require a bias current equal to

$$
\begin{align*}
& I_{C M L}=I_{\text {SSop }}=\frac{V_{\text {SWING }}}{2} \sqrt{\frac{C_{b c}+C_{c s}+C_{L}}{2\left(r_{e}+r_{b}\right) \tau_{F}}}  \tag{5.26}\\
& I_{E C L}=I_{C M L o p}+2 I_{C C}=\frac{V_{\text {SWING }}}{2} \sqrt{\frac{C_{b c 1}+C_{c s 1}}{2\left(r_{e}+r_{b}\right) \tau_{F}}}+\frac{2 C_{j e 2}}{\left(\tau_{n}^{2}-1\right) \tau_{F}} V_{T} \tag{5.27}
\end{align*}
$$

Fig. 5.2a. CML and ECL delay versus $C_{L}$ for the optimized design (BiCMOS technology). Vertical axis: delay (ps); curves: tpd(CML) and tpd(ECL), both with Iss=Issop.


Fig. 5.2b. CML and ECL delay versus $C_{L}$ for the optimized design (HSB2 technology).

From a straightforward analysis, it can be seen that the current value given by relationship (5.26) is lower than that given by (5.27) for load capacitance values such that

$$
\begin{align*}
C_{L} & <\frac{1}{\tau_{n}^{2}-1}\left[4 \frac{\left(r_{e}+r_{b}\right) C_{j e 2}}{\left(\tau_{n}^{2}-1\right) \tau_{F}} \frac{V_{T}}{V_{\text {SWING }}}\right.  \tag{5.28}\\
& \left.+\sqrt{\frac{2\left(r_{e}+r_{b}\right)\left(C_{b c 1}+C_{c s 1}\right)}{\tau_{F}}}\right] \frac{V_{T}}{V_{\text {SWING }}} C_{j e 2}
\end{align*}
$$

The right-hand side of relationship (5.28) is equal to 0.6 pF and 0.1 pF for the BiCMOS and HSB2 processes, respectively, as illustrated in detail in Figs. 5.3a and 5.3b, where the bias current required by the optimized CML and ECL gates is plotted versus the load capacitance.
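The load at which the two bias currents coincide can also be located by solving $I_{CML}=I_{ECL}$ directly from (5.26) and (5.27), rather than by evaluating (5.28). The sketch below does this with placeholder device values, so it does not reproduce the 0.6 pF and 0.1 pF figures quoted for the two processes; the function names are hypothetical and the same capacitance $C_{bc}+C_{cs}$ is assumed in both gates.

```python
import math

V_T = 0.026  # thermal voltage in volts (approximate)

def i_cml(v_swing, c0, c_l, r_e, r_b, tau_f):
    """Optimum CML bias current, (5.26), with c0 = C_bc + C_cs."""
    return v_swing / 2.0 * math.sqrt((c0 + c_l) / (2.0 * (r_e + r_b) * tau_f))

def i_ecl(v_swing, c0, r_e, r_b, tau_f, c_je2, tau_n, v_t=V_T):
    """Total ECL bias current, (5.27): unloaded CML stage plus two buffers."""
    i_ss = i_cml(v_swing, c0, 0.0, r_e, r_b, tau_f)
    i_cc = c_je2 * v_t / ((tau_n ** 2 - 1.0) * tau_f)
    return i_ss + 2.0 * i_cc

def crossover_load(v_swing, c0, r_e, r_b, tau_f, c_je2, tau_n, v_t=V_T):
    """C_L at which I_CML equals I_ECL.

    (5.26) reads I_CML = k*sqrt(c0 + C_L) with
    k = V_SWING/(2*sqrt(2*(r_e + r_b)*tau_F)), hence C_L = (I_ECL/k)**2 - c0.
    """
    k = v_swing / (2.0 * math.sqrt(2.0 * (r_e + r_b) * tau_f))
    target = i_ecl(v_swing, c0, r_e, r_b, tau_f, c_je2, tau_n, v_t)
    return (target / k) ** 2 - c0

# Placeholder values (assumed for illustration only).
P = dict(v_swing=0.5, c0=40e-15, r_e=5.0, r_b=50.0, tau_f=10e-12,
         c_je2=30e-15, tau_n=1.1)

if __name__ == "__main__":
    c_x = crossover_load(**P)
    print(f"I_ECL = {i_ecl(**P) * 1e3:.2f} mA, crossover at C_L = {c_x * 1e15:.1f} fF")
```

Below the crossover load the CML gate draws less current than the ECL gate; above it the CML current keeps growing as $\sqrt{C_{L}}$ while the ECL current stays constant.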

Fig. 5.3a. Total bias current of the CML and ECL inverters versus $C_{L}$ for optimized design (BiCMOS process). Vertical axis: total bias current (uA); curves: CML and ECL.


Fig. 5.3b. Total bias current of the CML and ECL inverters versus $C_{L}$ for optimized design (HSB2 process).

From inspection of Figs. 5.3a and 5.3b, as predicted by relationship (5.28), the CML inverter implemented with the BiCMOS technology needs less power than the ECL gate for a load capacitance lower than 0.6 pF, i.e. a fan-out of 5, as discussed in Section 4.8. Analogously, for the HSB2 technology the CML needs more power for a load capacitance higher than 0.1 pF, which corresponds to a unity fan-out. As a consequence, in practical cases where the fan-out is not lower than one, for the HSB2 technology the CML inverter always exhibits a greater dissipation than the ECL counterpart.

Finally, in order to analyze the trade-off between power and speed of the CML and ECL gates, their power-delay product is depicted in Figs. 5.4a and 5.4b. From inspection of these figures, it is apparent that the power-delay product of the CML gate is always better than that of the ECL gate. In the same way, it can be seen that the same consideration holds for the CML and ECL gates designed with a bias current $40 \%$ lower than the optimum value.

In summary, a comparison of the CML and ECL inverter designed for minimum delay has been carried out. The analytical results demonstrate that the CML inverter can achieve the lowest delay, even though this advantage is sometimes obtained at the cost of a greater power dissipation. Moreover, the power efficiency of the CML gate is always better than the ECL gate, which means that the speed advantage of the former is greater than its power increase.

Fig. 5.4a. Power-delay product of the CML and ECL inverter versus $C_{L}$ for optimized design (BiCMOS process). Vertical axis: power-delay product (pJ).

Fig. 5.4b. Power-delay product of the CML and ECL inverter versus $C_{L}$ for optimized design (HSB2 process). Vertical axis: power-delay product (pJ); curves: CML and ECL.

### 5.5 OPTIMIZED DESIGN OF BIPOLAR CURRENT-MODE MUX/XOR AND D LATCH

In this section, the methodology presented in Section 5.2 is applied to the design of the MUX/XOR and D latch gates for best speed or an efficient power-delay trade-off.

### 5.5.1 Design of MUX/XOR CML gates with minimum transistor area

Let us consider the MUX and XOR CML gates depicted in Figs. 5.5 and 5.6, respectively. For the sake of simplicity, let us assume the transistor emitter area to be minimum, while the more general case of a larger area will be dealt with in the next subsection.

As in the case of a CML inverter in Section 5.2, under the assumption of minimum-area transistors, the only design parameter is the bias current $I_{S S}$, since the product $R_{C} I_{S S}$ is constant and is determined from the logic swing preliminarily assigned. The delay of a MUX/XOR as a function of $I_{S S}$ can be easily obtained from analytical manipulation of the delay model discussed in Section 4.6. Since the delay of a MUX/XOR depends on whether the switching input is applied to the lower or the upper level transistors, the two cases will be discussed separately.


Fig. 5.5. CML MUX gate.


Fig. 5.6. CML XOR gate.

Let us first consider the case where the switching input is applied to the lower level transistors Q1-Q2, for which the corresponding delay is modeled by relationship (4.18). As discussed for the inverter, by assuming $g_{m} r_{e} \ll 1$ and substituting $R_{C}=V_{S W I N G} / 2 I_{S S}$, the delay expression (4.18) can be rewritten in the form of relationship (5.1) with coefficients $a, b$ and $c$ being equal to [AP00]

$$
\begin{align*}
a & =0.69\left(4 \frac{r_{e l}+r_{b l}}{V_{S W I N G}} \tau_{F}+r_{b l} C_{b c i 1} \frac{r_{c l}+r_{e u}+\frac{r_{b u}}{\beta+1}}{2 V_{T}}\right) \approx 2.76 \frac{r_{e l}+r_{b l}}{V_{S W I N G}} \tau_{F}  \tag{5.29a}\\
b & =0.69\left[\frac{V_{S W I N G}}{2}\left(2 C_{b c 3}+C_{c s 3}+C_{j e 4}+2 C_{b c 5}+C_{c s 5}+C_{L}\right)+2 V_{T} C_{j e 3}\right. \\
& \left.+\left(C_{b c 1}+C_{c s 1}\right)\left(2 V_{T}+\frac{V_{S W I N G}}{2 \beta}\right)\right] \\
& \approx 0.35 V_{S W I N G}\left[2 C_{b c 3}+C_{c s 3}+C_{j e 4}+2 C_{b c 5}+C_{c s 5}+C_{L}\right. \\
& \left.+4 \frac{V_{T}}{V_{S W I N G}}\left(C_{b c 1}+C_{c s 1}+C_{j e 3}\right)\right] \tag{5.29b}
\end{align*}
$$

$$
\begin{align*}
c & =0.69\left[r_{b l} C_{b c i 1}\left(2+\frac{V_{S W I N G}}{4 V_{T}(\beta+1)}\right)+\left(r_{c l}+r_{e u}+\frac{r_{b u}}{\beta+1}\right)\left(C_{b c 1}+C_{c s 1}\right)+\tau_{F}\right. \\
& \left.+\left(r_{e l}+r_{b l}\right) C_{j e 1}+\left(r_{c u}+r_{b u}\right)\left(C_{b c 3}+C_{b c 5}\right)+r_{c u}\left(C_{c s 3}+C_{c s 5}\right)+\left(r_{b u}+2 r_{e u}\right) C_{j e 4}\right] \\
& \approx 0.69\left[2 r_{b l} C_{b c i 1}+r_{c l}\left(C_{b c 1}+C_{c s 1}\right)+\left(r_{e l}+r_{b l}\right) C_{j e 1}+\tau_{F}\right. \\
& \left.+\left(r_{c u}+r_{b u}\right)\left(C_{b c 3}+C_{b c 5}\right)+r_{c u}\left(C_{c s 3}+C_{c s 5}\right)+\left(r_{b u}+2 r_{e u}\right) C_{j e 4}\right] \tag{5.29c}
\end{align*}
$$

where coefficient $a$ (responsible for the delay increase for $I_{S S}>I_{S S o p}$ ) is essentially due to the base-emitter diffusion capacitance, and is equal to that of an inverter (5.7a). Coefficient $b$ consists of two terms, one proportional to $V_{\text {SWING }}$ that accounts for the total capacitance at the output node, and the other proportional to $V_{T}$ that takes into account the total capacitance at the emitter node of Q3-Q6 (the contribution of the latter is lowered by a factor $4 V_{T} / V_{\text {SWING }}$ since these capacitances see the resistance (4.19), approximately equal to $1 / g_{m}$, that is lower than that seen by the output node, $R_{C}$, by the same factor). In analogy to the inverter gate, the weighted sum of capacitances in brackets in (5.29b) can be interpreted as an equivalent capacitance at the output node, i.e. it introduces the same delay contribution as if it were connected directly to the output node. By comparison with relationship (5.7b), this equivalent capacitance (and thus coefficient $b$ ) is roughly four times that of an inverter for $C_{L} \rightarrow 0$ and is equal to it when the external load is dominant (i.e., for $C_{L} \rightarrow \infty$ ).

From (5.2) and (5.29), the optimum bias current for the MUX/XOR CML gate is

$$
\begin{align*}
I_{\text {SSop }} & =\frac{V_{\text {SWING }}}{2}  \tag{5.30}\\
& \cdot \sqrt{\frac{2 C_{b c 3}+C_{c s 3}+C_{j e 4}+2 C_{b c 5}+C_{c s 5}+C_{L}+4 \frac{V_{T}}{V_{S W I N G}}\left(C_{b c 1}+C_{c s 1}+C_{j e 3}\right)}{2\left(r_{e l}+r_{b l}\right) \tau_{F}}}
\end{align*}
$$

that is essentially proportional to $V_{\text {SWING }}$, as for the inverter gate. Indeed, the last addend in the numerator under the square root in (5.30) is significantly lower than the others, because it has a lower number of capacitances and is multiplied by the term $4 V_{T} / V_{S W I N G}$ that is lower than unity for practical values of the logic swing. By comparison of relationship (5.30) with (5.8), the MUX/XOR optimum bias current for a negligible load capacitance is more than twice that of a CML inverter, but tends to the same optimum bias current for high values of the load capacitance.

Regarding the minimum delay, from (5.3) and (5.29), its expression is

$$
\begin{align*}
\tau_{P D o p} & =2 \sqrt{\left(r_{e l}+r_{b l}\right) \tau_{F}} \\
& \cdot \sqrt{2 C_{b c 3}+C_{c s 3}+C_{j e 4}+2 C_{b c 5}+C_{c s 5}+C_{L}+4 \frac{V_{T}}{V_{S W I N G}}\left(C_{b c 1}+C_{c s 1}+C_{j e 3}\right)} \\
& \approx 2 \sqrt{\left(r_{e l}+r_{b l}\right) \tau_{F}} \cdot \sqrt{2 C_{b c 3}+C_{c s 3}+C_{j e 4}+2 C_{b c 5}+C_{c s 5}+C_{L}} \tag{5.31}
\end{align*}
$$

It is worth noting that, for the optimum bias current, the optimum propagation delay (5.31) is roughly twice that of an inverter for low values of the load capacitance, and tends to it for high values of $C_{L}$. As observed for relationship (5.30), the last addend under the square root is significantly lower than the others, thus the minimum delay is basically independent of the logic swing. As a result, according to observations made for the inverter gate, the logic swing should be kept as low as possible to reduce the power consumption, since increasing it gives no benefit in speed performance.
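The equivalent capacitance in brackets in (5.29b), the optimum current (5.30) and the first form of the minimum delay (5.31) can be evaluated as in the following sketch. The capacitance and resistance values are placeholders chosen so that all unit transistors are identical; they are assumptions, not process data, and the function names are hypothetical.

```python
import math

V_T = 0.026  # thermal voltage in volts (approximate)

def mux_equiv_cap(c_bc3, c_cs3, c_je4, c_bc5, c_cs5, c_bc1, c_cs1, c_je3,
                  c_l, v_swing, v_t=V_T):
    """Weighted capacitance sum in brackets in the approximate form of (5.29b)."""
    out_node = 2 * c_bc3 + c_cs3 + c_je4 + 2 * c_bc5 + c_cs5 + c_l
    emitter_node = c_bc1 + c_cs1 + c_je3
    return out_node + 4.0 * v_t / v_swing * emitter_node

def mux_i_ss_op(c_eq, r_el, r_bl, tau_f, v_swing):
    """Optimum bias current for a lower-level switching input, (5.30)."""
    return v_swing / 2.0 * math.sqrt(c_eq / (2.0 * (r_el + r_bl) * tau_f))

def mux_delay_op(c_eq, r_el, r_bl, tau_f):
    """Minimum delay, first form of (5.31); the constant term c is not included."""
    return 2.0 * math.sqrt((r_el + r_bl) * tau_f * c_eq)

# Placeholder unit-transistor values (assumed for illustration only).
CAPS = dict(c_bc3=10e-15, c_cs3=15e-15, c_je4=20e-15, c_bc5=10e-15,
            c_cs5=15e-15, c_bc1=10e-15, c_cs1=15e-15, c_je3=20e-15)
R_EL, R_BL, TAU_F, V_SWING = 5.0, 50.0, 10e-12, 0.5

if __name__ == "__main__":
    for c_l in (0.0, 100e-15, 1e-12):
        c_eq = mux_equiv_cap(c_l=c_l, v_swing=V_SWING, **CAPS)
        print(f"C_L = {c_l * 1e15:6.0f} fF: C_eq = {c_eq * 1e15:7.1f} fF, "
              f"I_SSop = {mux_i_ss_op(c_eq, R_EL, R_BL, TAU_F, V_SWING) * 1e3:.2f} mA, "
              f"tau_op = {mux_delay_op(c_eq, R_EL, R_BL, TAU_F) * 1e12:.1f} ps")
```

With these placeholders and $C_{L}=0$, the equivalent capacitance is about 99 fF against $C_{bc}+C_{cs}=25$ fF for a single unit transistor, which reflects the factor of roughly four mentioned above.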

Regarding the delay of a MUX/XOR CML gate associated with the switching of an input that is applied to transistors at the upper level (Q3-Q4 or Q5-Q6), its expression (4.20) can be rewritten in the form (5.1) with coefficients given by

$$
\begin{align*}
a & =0.69\left(4 \frac{r_{e}+r_{b}}{V_{\text {SWING }}} \tau_{F}+r_{b} C_{b c i 3,4} \frac{r_{c}}{2 V_{T}}\right) \approx 2.76 \frac{r_{e}+r_{b}}{V_{\text {SWING }}} \tau_{F}  \tag{5.32a}\\
b & =0.35 \cdot V_{\text {SWING }}\left(C_{b c 3,4}+C_{c s 3,4}+C_{b c 5,6}+C_{c s 5,6}+C_{L}\right)  \tag{5.32b}\\
& =0.35 \cdot V_{\text {SWING }}\left(2 C_{b c 3,4}+2 C_{c s 3,4}+C_{L}\right) \\
c & =0.69\left[r_{b} C_{b c i 3,4}\left(1+\frac{V_{S W I N G}}{4 V_{T}}\right)+r_{c}\left(C_{b c 3,4}+C_{c s 3,4}\right)+\left(r_{e}+r_{b}\right) C_{j e 3,4}\right] \tag{5.32c}
\end{align*}
$$

From inspection of relationships (5.32), coefficient $a$ is equal to that of an inverter in (5.7a), while coefficient $b$ is roughly twice that of an inverter for $C_{L} \rightarrow 0$ and equal to it for $C_{L} \rightarrow \infty$. As a consequence, the optimum bias current

$$
\begin{equation*}
I_{S S o p}=\sqrt{\frac{b}{a}}=\frac{V_{S W I N G}}{2} \sqrt{\frac{2\left(C_{b c 3,4}+C_{c s 3,4}\right)+C_{L}}{2\left(r_{e}+r_{b}\right) \tau_{F}}} \tag{5.33}
\end{equation*}
$$

is greater than that of an inverter by a factor of roughly $\sqrt{2}$ for $C_{L} \rightarrow 0$ and equal to it for $C_{L} \rightarrow \infty$. The same consideration holds for the optimum MUX/XOR delay associated with an input that drives the transistors at the upper level

$$
\begin{equation*}
\tau_{P D o p} \approx 2 \sqrt{a b} \approx 2 \sqrt{\left(r_{e}+r_{b}\right) \tau_{F}\left(2 C_{b c 3,4}+2 C_{c s 3,4}+C_{L}\right)} \tag{5.34}
\end{equation*}
$$
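The two limiting cases can be verified with a short calculation. The sketch assumes identical unit transistors in the MUX/XOR gate and in the inverter, and takes the inverter optimum current in the form of (5.26), i.e. with total capacitance $C_{bc}+C_{cs}+C_{L}$; the numerical values are placeholders.

```python
import math

def upper_to_inverter_ratio(c_bc, c_cs, c_l):
    """Ratio of the optimum current (5.33) to that of an inverter.

    Both currents share the factor V_SWING/2 and the denominator
    2*(r_e + r_b)*tau_F, so only the capacitance sums remain. The same
    ratio applies to the optimum delays (5.34).
    """
    return math.sqrt((2.0 * (c_bc + c_cs) + c_l) / (c_bc + c_cs + c_l))

if __name__ == "__main__":
    c_bc, c_cs = 10e-15, 15e-15  # placeholder values (assumed)
    for c_l in (0.0, 25e-15, 250e-15, 2.5e-12):
        print(f"C_L = {c_l * 1e15:7.0f} fF: ratio = {upper_to_inverter_ratio(c_bc, c_cs, c_l):.3f}")
```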

### 5.5.2 Design of MUX/XOR CML gates with non-minimum transistor area and examples

In Section 5.5.1, the assumption of minimum-area transistors was made to simplify the analysis. Now let us consider the more general case where the emitter area of the transistors Q1-Q2 at the lower level is $N_{1}$ times the minimum area, while that of transistors at the upper level Q3-Q6 is $N_{2}$ times the minimum area.

First, let us consider a MUX/XOR gate with switching inputs applied to transistors Q1-Q2, whose delay can be evaluated by remembering that the minimum-area transistor capacitances have to be multiplied by the factor $N_{1}$ $\left(N_{2}\right)$ for lower transistors (upper transistors), whereas resistances are divided by the same factor. Therefore, by following the same procedure as in Section 5.2.2 for the inverter gate, the delay can be written as

$$
\begin{equation*}
\tau_{P D}=a \frac{I_{S S}}{N_{1}}+\left(b_{0}+b_{1} N_{1}+b_{2} N_{2}\right) \frac{1}{I_{S S}}+c \tag{5.35}
\end{equation*}
$$

where $a$ and $c$ are still given by (5.29a) and (5.29c) that were derived under the assumption of minimum area, while $b_{0}, b_{1}$ and $b_{2}$ are

$$
\begin{align*}
& b_{0}=0.35 V_{S W I N G} C_{L}  \tag{5.36a}\\
& b_{1}=1.38 V_{T}\left(C_{b c 1}+C_{c s 1}+C_{j e 3}\right)  \tag{5.36b}\\
& b_{2}=0.35 V_{\text {SWING }}\left(2 C_{b c 3}+C_{c s 3}+C_{j e 4}+2 C_{b c 5}+C_{c s 5}\right) \tag{5.36c}
\end{align*}
$$

where $b_{0}$ is due to the load capacitance, $b_{1}$ is due to the capacitance at the emitter node of Q3-Q6, while $b_{2}$ is due to the capacitance at the output node, according to considerations presented in Section 5.5.1.

From relationship (5.35), the delay increases with the emitter area of the transistors at the upper level, in agreement with [SE96]. As a result, parameter $N_{2}$ has to be set to the minimum value (as close to unity as possible) that avoids high-injection effects.

As far as the area of transistors at the lower level is concerned, once $N_{2}$ is set on the basis of the considerations reported above, parameter $N_{1}$ can be optimized along with the bias current to reduce the delay (5.35). To be more specific, the optimum bias current that minimizes the delay as a function of $N_{1}$ is

$$
\begin{equation*}
I_{S S o p}\left(N_{1}\right)=\sqrt{\frac{b_{0}+b_{1} N_{1}+b_{2} N_{2}}{\frac{a}{N_{1}}}}=N_{1} \sqrt{\frac{1+\frac{b_{0}+b_{2} N_{2}}{b_{1} N_{1}}}{1+\frac{b_{0}+b_{2}}{b_{1}}}} I_{\text {SSop }}(1) \tag{5.37}
\end{equation*}
$$

while the optimum delay versus factor $N_{1}$ becomes

$$
\begin{equation*}
\tau_{\text {PDop }}\left(N_{1}\right)=2 \sqrt{\frac{a}{N_{1}}\left(b_{0}+b_{1} N_{1}+b_{2} N_{2}\right)}=\sqrt{\frac{1+\frac{b_{0}+b_{2} N_{2}}{b_{1} N_{1}}}{1+\frac{b_{0}+b_{2}}{b_{1}}}} \tau_{\text {PDop }}(1) \tag{5.38}
\end{equation*}
$$

where $I_{S S o p}(1)$ and $\tau_{P D o p}(1)$ refer to the case with minimum transistors Q1-Q2 (i.e., $N_{1}=1$ ). As already observed for the inverter gate in Section 5.2.2, the term in the square root of (5.38) is lower than unity, hence a delay reduction is allowed by increasing the emitter area of transistors Q1-Q2. This speed improvement is obtained at the cost of a greater power dissipation from (5.37). To better understand this point, it is worth noting that $N_{1}$ is always much lower than $\left(b_{0}+b_{2}\right) / b_{1}$ in relationships (5.37)-(5.38). This can be easily verified by analytically expressing $\left(b_{0}+b_{2}\right) / b_{1}$

$$
\begin{equation*}
\frac{b_{0}+b_{2}}{b_{1}}=\frac{V_{S W I N G}}{4 V_{T}} \frac{C_{L}+2 C_{b c 3}+C_{c s 3}+C_{j e 4}+2 C_{b c 5}+C_{c s 5}}{C_{b c 1}+C_{c s 1}+C_{j e 3}} \tag{5.39}
\end{equation*}
$$

that is very high with respect to practical values of $N_{1}$. Indeed, even in the pessimistic case of $C_{L}=0$, the ratio of capacitances in (5.39) is greater than 2, while term $V_{S W I N G} / 4 V_{T}$ is about 4 or greater since the logic swing is at least 400 mV. From these considerations, $\left(b_{0}+b_{2}\right) / b_{1} N_{1}$ is much greater than unity, hence the delay reduction and the bias current increase are both proportional to $\sqrt{N_{1}}$, from (5.37) and (5.38). As for the case of the inverter gate, the power efficiency is equal to that achieved by optimizing with a unitary transistor.
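The dependence of the optimum current and delay on $N_{1}$ can be checked against the delay model (5.35) directly. The coefficients in the sketch below are placeholders with plausible magnitudes, not values extracted for a real process, and the function names are hypothetical. Unlike (5.38), `delay_op` also adds the constant term $c$.

```python
import math

def delay(i_ss, n1, n2, a, b0, b1, b2, c):
    """MUX/XOR delay with scaled emitter areas, (5.35)."""
    return a * i_ss / n1 + (b0 + b1 * n1 + b2 * n2) / i_ss + c

def i_ss_op(n1, n2, a, b0, b1, b2):
    """Bias current minimizing (5.35) for a given N1, first form of (5.37)."""
    return math.sqrt((b0 + b1 * n1 + b2 * n2) * n1 / a)

def delay_op(n1, n2, a, b0, b1, b2, c):
    """Delay (5.35) evaluated at the optimum current of (5.37)."""
    return 2.0 * math.sqrt(a / n1 * (b0 + b1 * n1 + b2 * n2)) + c

# Placeholder coefficients (assumed): a in s/A, b's in A*s, c in s.
K = dict(a=3.0e-9, b0=1.75e-14, b1=2.2e-15, b2=2.6e-14)
C_CONST, N2 = 2.0e-11, 1

if __name__ == "__main__":
    for n1 in (1, 2, 4, 8):
        print(f"N1 = {n1}: I_SSop = {i_ss_op(n1, N2, **K) * 1e3:.2f} mA, "
              f"tau_op = {delay_op(n1, N2, c=C_CONST, **K) * 1e12:.1f} ps")
```

In this sweep each increase of $N_{1}$ lowers the delay and raises the current, and the product of current and current-dependent delay grows linearly with the total capacitance term, as expected from (5.37) and (5.38).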

In general, except for the case of very heavy speed constraints, the approach discussed above is seldom useful, since a bias current greater than $I_{S S o p}(1)$ is rarely tolerable. Thus, the alternative approach to the area optimization developed for the inverter gate in Section 5.2.2 has to be used. To be more explicit, let us minimize delay (5.35) with respect to parameter $N_{1}$ with the bias current fixed at $I_{S S o p}(1)$, thus leading to the following optimum value of $N_{1}$

$$
\begin{equation*}
N_{1}^{\prime}=\operatorname{int}\left(\sqrt{\frac{a}{b_{1}}} I_{\text {SSop }}(1)\right) \tag{5.40}
\end{equation*}
$$

where the function $\operatorname{int}(x)$ was defined in Section 5.2.2. For the design criterion (5.40), by substituting (5.30) into (5.35), the resulting delay is

$$
\begin{equation*}
\tau_{P D}\left(I_{S S o p}(1), N_{1}^{\prime}\right)=2 \sqrt{a b_{1}}+\left(b_{0}+b_{2} N_{2}\right) \sqrt{\frac{a}{b_{0}+b_{1}+b_{2} N_{2}}}+c \tag{5.41}
\end{equation*}
$$

By comparing relationship (5.41) to the delay with a unitary area (5.31), the emitter area optimization with the bias current set to $I_{S S o p}(1)$ allows for a delay reduction at the cost of an increased area. As already discussed for the inverter gate, this area increase leads to a proportional increase in the input capacitance (4.23) of the CML gate, which slows down the response of the previous gate. As a consequence, the area increase is helpful only when the optimum current for the unitary case, $I_{S S o p}(1)$, leads to non-negligible high-injection level effects.
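The fixed-current sizing of (5.40) and the delay (5.41) can be sketched as follows. Since $\operatorname{int}(x)$ is defined in Section 5.2.2, outside this passage, the sketch does not implement it: as a stated assumption it evaluates the two integers adjacent to the continuous optimum and keeps the faster one. Coefficient values are placeholders and the function names are hypothetical.

```python
import math

def delay(i_ss, n1, n2, a, b0, b1, b2, c):
    """MUX/XOR delay with scaled emitter areas, (5.35)."""
    return a * i_ss / n1 + (b0 + b1 * n1 + b2 * n2) / i_ss + c

def i_ss_op_unit(n2, a, b0, b1, b2):
    """Optimum current for N1 = 1, i.e. I_SSop(1)."""
    return math.sqrt((b0 + b1 + b2 * n2) / a)

def n1_continuous(i_ss, a, b1):
    """Argument of int() in (5.40): the real-valued N1 minimizing (5.35) at fixed I_SS."""
    return math.sqrt(a / b1) * i_ss

def delay_541(n2, a, b0, b1, b2, c):
    """Delay (5.41), reached for the real-valued N1 at I_SS = I_SSop(1)."""
    return 2.0 * math.sqrt(a * b1) + (b0 + b2 * n2) * math.sqrt(a / (b0 + b1 + b2 * n2)) + c

def best_integer_n1(i_ss, n2, a, b0, b1, b2, c):
    """Assumed stand-in for int(): pick the faster of the two neighbouring integers."""
    x = n1_continuous(i_ss, a, b1)
    candidates = {max(1, math.floor(x)), max(1, math.ceil(x))}
    return min(candidates, key=lambda n: delay(i_ss, n, n2, a, b0, b1, b2, c))

# Placeholder coefficients (assumed): a in s/A, b's in A*s, c in s.
K = dict(a=3.0e-9, b0=1.75e-14, b1=2.2e-15, b2=2.6e-14, c=2.0e-11)
N2 = 1
I1 = i_ss_op_unit(N2, K["a"], K["b0"], K["b1"], K["b2"])

if __name__ == "__main__":
    n1 = best_integer_n1(I1, N2, **K)
    print(f"I_SSop(1) = {I1 * 1e3:.2f} mA, continuous N1 = {n1_continuous(I1, K['a'], K['b1']):.2f}, chosen N1 = {n1}")
    print(f"delay N1=1: {delay(I1, 1, N2, **K) * 1e12:.1f} ps, "
          f"chosen N1: {delay(I1, n1, N2, **K) * 1e12:.1f} ps, "
          f"(5.41): {delay_541(N2, **K) * 1e12:.1f} ps")
```

Because (5.41) corresponds to the real-valued optimum, the delay obtained with an integer $N_{1}$ is never lower than it, and with these placeholder coefficients the difference is well below a picosecond.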

Regarding the MUX/XOR gate with switching inputs applied to transistors Q3-Q6, results are equal to those obtained for an inverter gate. Indeed, as clarified in Section 4.6, in this case the delay is equal to that of an inverter with properly modified capacitances. Therefore, exactly the same considerations as in Section 5.2.2 hold by substituting factor $N_{2}$ for parameter $N$, as well as by using unitary-area coefficients (5.32) instead of (5.7).

### 5.5.3 Design of the CML D latch

The results presented in the previous subsections can be easily extended to the case of a CML D latch. Indeed, as observed in Section 4.9, the only difference between the XOR and the D latch is an additional capacitance (4.23) associated with the cross-coupled transistors Q5-Q6. As will be shown below, this leads to an increase in coefficient $b$ with respect to the MUX/XOR gate, while the other coefficients in (5.29a) and (5.29c) do not change. Therefore, from (5.2) and (5.3), for negligible load capacitances, the increased load due to transistors Q5-Q6 leads to an optimum bias current and delay worse than those found for the XOR and the inverter gates. However, the difference is strongly reduced for load conditions in which the load capacitance is dominant. In the following, a more detailed analysis is presented.

When the delay is evaluated for an input applied to the lower transistors and for a minimum-area design, coefficient $b$ becomes

$$
\begin{align*}
b & \approx 0.35 V_{SWING}\left[2 C_{bc3}+C_{cs3}+C_{je4}+2 C_{bc5}+C_{cs5}+C_{L}\right.  \tag{5.42}\\
& \left.+C_{bcx5,6}\left(1+\frac{V_{SWING}}{4 V_{T}}\right)+4 \frac{V_{T}}{V_{SWING}}\left(C_{bc1}+C_{cs1}+C_{je3}\right)\right]
\end{align*}
$$

where, as usual, $g_{m} r_{e} \ll 1$ was assumed and $R_{C}=V_{SWING} / 2 I_{SS}$ was substituted. Hence, the optimum value of the bias current results in

$$
\begin{equation*}
I_{SSop}=\frac{V_{SWING}}{2} \sqrt{\frac{2 C_{bc3}+C_{cs3}+C_{je4}+2 C_{bc5}+C_{cs5}+C_{L}+C_{bcx5,6}\left(1+\frac{V_{SWING}}{4 V_{T}}\right)+4 \frac{V_{T}}{V_{SWING}}\left(C_{bc1}+C_{cs1}+C_{je3}\right)}{2\left(r_{e1}+r_{b1}\right) \tau_{F}}} \tag{5.43}
\end{equation*}
$$

and the corresponding delay is

$$
\begin{equation*}
\tau_{PDop}=2 \sqrt{\left(r_{e1}+r_{b1}\right) \tau_{F}} \cdot \sqrt{2 C_{bc3}+C_{cs3}+C_{je4}+2 C_{bc5}+C_{cs5}+C_{L}+C_{bcx5,6}\left(1+\frac{V_{SWING}}{4 V_{T}}\right)+4 \frac{V_{T}}{V_{SWING}}\left(C_{bc1}+C_{cs1}+C_{je3}\right)} \tag{5.44}
\end{equation*}
$$

When transistors Q1-Q2 are allowed to have an emitter area greater than the minimum one by a factor $N_{1}$, and transistors Q3-Q6 by a factor $N_{2}$, the only modification to coefficients (5.36) of the MUX/XOR gate delay (5.35) with an input applied to lower transistors Q1-Q2 has to be introduced in $b_{2}$, which becomes

$$
\begin{equation*}
b_{2}=0.35 V_{S W I N G}\left[2 C_{b c 3}+C_{c s 3}+C_{j e 4}+2 C_{b c 5}+C_{c s 5}+C_{b c x 5,6}\left(1+\frac{V_{S W I N G}}{4 V_{T}}\right)\right] \tag{5.45}
\end{equation*}
$$

therefore the considerations on the sizing of $N_{1}$ and $N_{2}$ for the MUX/XOR gate in Section 5.5.2 are again extended to the D latch.

For the delay associated with an input driving transistors Q3-Q4 and minimum-area design, coefficient $b$ is obtained by substituting relationship (4.23) into (5.32b)

$$
\begin{equation*}
b=0.35 \cdot V_{\text {SWING }}\left[2 C_{b c 3,4}+2 C_{c s 3,4}+C_{L}+C_{b c x 5,6}\left(1+\frac{V_{\text {SWING }}}{4 V_{T}}\right)\right] \tag{5.46}
\end{equation*}
$$

whereas the other coefficients are given by (5.32a) and (5.32c). Once again, coefficient $a$ is equal to that of an inverter (5.7a), while coefficient $b$ is more than four times larger than that of an inverter for $C_{L} \rightarrow 0$ (more explicitly, by a factor 6.3 and 5 for the BiCMOS and the HSB2 process, respectively) and equal to it for $C_{L} \rightarrow \infty$.

As a consequence, the optimum bias current is

$$
\begin{equation*}
I_{SSop}=\sqrt{\frac{b}{a}}=\frac{V_{SWING}}{2} \sqrt{\frac{2\left(C_{bc3,4}+C_{cs3,4}\right)+C_{L}+C_{bcx5,6}\left(1+\frac{V_{SWING}}{4 V_{T}}\right)}{2\left(r_{e}+r_{b}\right) \tau_{F}}} \tag{5.47}
\end{equation*}
$$

which is more than twice that of an inverter (more explicitly, greater by a factor of 2.5 and 2.2 for the BiCMOS and HSB2 process, respectively) for $C_{L} \rightarrow 0$ and equal to it for $C_{L} \rightarrow \infty$. The same increase is observed in the optimum delay, whose expression is

$$
\begin{equation*}
\tau_{PDop} \approx 2 \sqrt{\left(r_{e}+r_{b}\right) \tau_{F}\left[2 C_{bc3,4}+2 C_{cs3,4}+C_{L}+C_{bcx5,6}\left(1+\frac{V_{SWING}}{4 V_{T}}\right)\right]} \tag{5.48}
\end{equation*}
$$
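As a numerical companion to (5.47) and (5.48), the following minimal sketch evaluates the optimum bias current and delay of the D latch for an input driving Q3-Q4. The function name and all numerical values are illustrative assumptions (they are not taken from Tables 4.2-4.3), and the thermal voltage is assumed to be about 25.85 mV.

```python
import math

def d_latch_optimum(v_swing, c_bc34, c_cs34, c_bcx56, c_load,
                    r_e, r_b, tau_f, v_t=0.02585):
    """Return (I_SSop, tau_PDop) from relationships (5.47) and (5.48).

    All quantities are in SI units. v_t is the thermal voltage
    (assumed value at room temperature).
    """
    # Equivalent capacitance in the square brackets of (5.46)
    c_eq = (2.0 * (c_bc34 + c_cs34) + c_load
            + c_bcx56 * (1.0 + v_swing / (4.0 * v_t)))
    r_tau = (r_e + r_b) * tau_f
    i_ss_op = 0.5 * v_swing * math.sqrt(c_eq / (2.0 * r_tau))  # (5.47)
    tau_pd_op = 2.0 * math.sqrt(r_tau * c_eq)                  # (5.48)
    return i_ss_op, tau_pd_op

# Hypothetical device values, chosen only to exercise the formulas
i_op, tau_op = d_latch_optimum(
    v_swing=0.5, c_bc34=10e-15, c_cs34=20e-15, c_bcx56=10e-15,
    c_load=100e-15, r_e=10.0, r_b=100.0, tau_f=5e-12)
print(f"I_SSop = {i_op * 1e3:.2f} mA, tau_PDop = {tau_op * 1e12:.1f} ps")
```

Since both expressions depend on the same equivalent capacitance, increasing $C_{L}$ raises the optimum current and the optimum delay by the same factor, which is the behavior described above.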

When area factor $N_{2}$ of transistors Q3-Q4 is also a design parameter, the same criteria as in Section 5.5.2 can be used, by observing that the only difference with respect to the MUX/XOR gate is in coefficient $b_{2}$, which becomes

$$
\begin{align*}
b_{2} & =0.35 V_{SWING}\left[2 C_{bc3}+C_{cs3}+C_{je4}+2 C_{bc5}\right.  \tag{5.49}\\
& \left.+C_{cs5}+C_{bcx5,6}\left(1+\frac{V_{SWING}}{4 V_{T}}\right)\right]
\end{align*}
$$

whereas the other coefficients are given by relationships (5.29a), (5.29c), (5.36a) and (5.36b).

In regard to the ECL implementation of MUX/XOR gates, the design criteria for an inverter discussed in Section 5.3 are again valid by simply substituting the expressions of coefficients $a, b$ and $c$ of MUX/XOR gates for those of the inverter.

### 5.5.4 Design examples

To illustrate the proposed procedure in detail, the MUX/XOR gate was designed in both the BiCMOS and the HSB2 technologies, assuming a supply voltage of 5 V and $V_{SWING}=500 \mathrm{mV}$, so that Tables 4.2-4.3 can be used. Very similar results can be obtained for the D latch.

The fan-out was assumed to be equal to 1 and 10 , equivalent to a load capacitance of 120 fF and 1.2 pF for the BiCMOS process, and a capacitance of 100 fF and 1 pF for the HSB2 technology, respectively.

For the BiCMOS process with the input applied to the lower transistors, the two load conditions lead to an optimum bias current of 1 mA and 1.9 mA, respectively, which lead to a delay of 243 ps and 338 ps. These values are very close to those given by SPICE simulations, which yield 234 ps and 351 ps. Even though the current is slightly higher than 1.4 mA for a fan-out of 10, the transistors work only marginally at the high-injection level and there is no appreciable degradation in the propagation delay.

For the HSB2 process and the two fan-out conditions, the optimum bias current is 4.8 mA and 10 mA, respectively, which leads to a high power dissipation and is significantly greater than the high-injection level. Hence, we can use $60 \%$ of the optimum bias current, i.e. a bias current of 2.8 mA and 6 mA for the two fan-out conditions, respectively. In the former case minimum-size transistors can still be used and the resulting delay is 45 ps, which differs by $15 \%$ from the predicted value. In the latter case the bias current is excessively higher than the high-injection level and we have to use a transistor with an area 3 times the minimum size. Under this condition a simulated delay of 69 ps is obtained, which differs by $13 \%$ from the theoretical value.

### 5.6 SUMMARY AND REMARKS

In this chapter, a design methodology for CML and ECL gates has been discussed. This strategy allows for sizing the bias current, the load resistance and the transistor area to efficiently manage the power-delay trade-off, which is crucial in this kind of logic.

In regard to CML gates, the general delay expression (5.1) permits the same design approach to be used regardless of the specific logic gate considered. Indeed, although the general delay expression (5.1) was validated for the inverter, MUX, XOR and D latch gates, it can be immediately extended to other CML gates. This is because the delay can always be written as the sum of time constants (4.4); thus, for more complex gates, and even for more series-gating levels, the capacitances and the resistances seen by them have the same dependence on the bias current as in the gates considered. As a result, analysis based on relationship (5.1) is extremely general, and the only difference among the various gates is in the expression of coefficients $a, b$ and $c$ (the latter of which is negligible and does not affect the optimization).

In the design approach discussed, coefficient $a$ is due to the base-emitter diffusion capacitance, whose contribution to the delay increases with the bias current. Its expression is always the same, regardless of the considered gate. Coefficient $b$ is due to the other capacitances, which can be represented with an equivalent capacitance at the output node (i.e. as if the gate were a simple inverter), and this contribution to the delay decreases as the bias current increases. This explains why the delay can always be minimized with respect to the bias current by relationship (5.2), which makes the two contributions equal.

By following this approach, the power-delay trade-off was shown to be represented in the general form (5.5), which holds regardless of the specific logic gates considered. From this representation, it was shown that the power-delay product increases with the bias current. As a consequence, the power consumption can be reduced without significantly affecting the speed performance by reducing the bias current with respect to the optimum value. The rule of thumb proposed to achieve a favorable power efficiency is to set the bias current equal to $60 \%$ of the optimum value, which allows for a $40 \%$ power saving and only a $10 \%$ delay increase with respect to the optimum case.
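The rule of thumb above can be checked with a short sketch. It assumes that the general delay expression has the form $\tau_{PD}=a I_{SS}+b / I_{SS}+c$, which is consistent with the optimum $I_{SSop}=\sqrt{b / a}$ used in this chapter, and that power is proportional to the bias current. The coefficient values are purely illustrative, and $c$ is set to zero.

```python
import math

def delay(i_ss, a, b, c=0.0):
    """Assumed form of the general delay expression: a*I_SS + b/I_SS + c."""
    return a * i_ss + b / i_ss + c

def optimum_current(a, b):
    """Bias current that makes the two delay contributions equal."""
    return math.sqrt(b / a)

# Illustrative coefficients (a in s/A, b in s*A); not taken from the text
a, b = 5.0e-8, 5.0e-14
i_op = optimum_current(a, b)
tau_op = delay(i_op, a, b)

# Apply the 60 % rule of thumb
tau_60 = delay(0.6 * i_op, a, b)
delay_penalty = tau_60 / tau_op - 1.0   # relative delay increase
power_saving = 1.0 - 0.6                # power assumed proportional to I_SS

print(f"I_op = {i_op * 1e3:.2f} mA, tau_op = {tau_op * 1e12:.0f} ps")
print(f"delay penalty = {delay_penalty:.1%}, power saving = {power_saving:.0%}")
```

With $c=0$ the relative delay increase is $(0.6+1 / 0.6) / 2-1 \approx 13 \%$, independently of $a$ and $b$, which is of the same order as the roughly $10 \%$ figure quoted above; a positive $c$ reduces the relative penalty.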

Various criteria to size the transistors' emitter area were discussed. However, unless heavy speed constraints are imposed, the area should be increased, if at all, only to the minimum value that avoids high-injection level effects. When a very high-speed performance is required, the suggested approach is to use the optimum bias current evaluated for minimum emitter area, and then to properly increase the emitter area.

A design strategy was also proposed for ECL gates to obtain nearly the minimum achievable delay. The approach discussed is independent of the specific gate considered and allows a suitable output buffer bias current to be found. The bias current obtained is demonstrated to be independent of the load capacitance for high-speed design cases.

A comparison of the CML and ECL inverters designed for high speed was also carried out. Contrary to the usual belief, results demonstrate that the CML inverter can achieve the lowest delay, even though this advantage is sometimes obtained at the cost of a greater power dissipation. Moreover, the power efficiency of CML gates is always better than that of ECL gates, i.e. the speed advantage of CML gates is greater than their power increase with respect to ECL gates.

## Chapter 6

## MODELING OF MOS CURRENT-MODE GATES

In Chapter 4, a strategy to model the delay of bipolar Current-Mode gates has been discussed and applied to several gates. In this chapter, the same subject is investigated for CMOS Current-Mode logic circuits by extending the fundamental ideas introduced in Chapter 4.

### 6.1 INTRODUCTION TO THE DELAY MODELING OF MOS CURRENT-MODE GATES

The bipolar CML/ECL delay model can be extended to MOS SCL gates by properly simplifying the equations of the BSIM3v3 model currently used for submicron MOS transistors [CH99], whose complex equations are unsuitable for formulating an efficient model and understanding the delay dependence on design and process parameters. To simplify the general analysis of CMOS Current-Mode circuits, the PMOS active load included in every CMOS SCL gate is approximated as an equivalent resistor, as discussed in Section 2.4.1. Parasitic capacitances of MOS transistors will also be evaluated and linearized starting from the exact relationships of the BSIM3v3 model. Moreover, the relationship between the current and the gate-source voltage of NMOS transistors is simplified by introducing an equivalent linear transconductance. Once NMOS and PMOS transistors are properly linearized, the approach previously used for bipolar gates can be applied to CMOS SCL gates.

In this chapter, a methodology to evaluate the delay of MOS Current-Mode gates is developed and applied to the inverter, MUX, XOR and D latch. Gates both with and without an output buffer are considered, and the effect of the input waveform rise time on the delay is also treated. To the best of the authors' knowledge, except for their own works [AP01], [AP02], no other approaches to the delay modeling of CMOS Current-Mode gates have been proposed in the literature.

### 6.2 MODELING OF THE SOURCE-COUPLED INVERTER

Let us consider the SCL inverter gate in Fig. 6.1, where $I_{S S}$ is the bias current, $V_{D D}$ is the supply voltage and $C_{L}$ is the external load capacitance that accounts for the wiring capacitance and the input capacitance of the driven gates. It is worth noting that, unlike CML gates, a negative supply is rarely used because SCL gates are typically used in mixed-signal circuits, where the same positive supply voltage is used for analog blocks.


Fig. 6.1. Source-Coupled inverter gate.

In this section, the delay $\tau_{P D}$ of the SCL inverter gate in Fig. 6.1 is evaluated when the differential input $v_{i}=v_{i 1}-v_{i 2}$ switches. To simplify analysis, the PMOS active load is modeled as an RC circuit in Section 6.2.1. The timing analysis of the circuit in Fig. 6.1 is carried out in Subsection 6.2.2 by assuming a step input for the sake of simplicity, and is then extended to more general input waveforms in Section 6.2.3.

### 6.2.1 Circuit model of the PMOS active load

To simplify the circuit analysis of Fig. 6.1 and evaluate the SCL inverter delay, PMOS transistors M3-M4 can be approximated as a linear resistance $R_{D}$ given by relationship (2.38) in a DC condition. Moreover, to accurately model the transient behavior of the gate, the PMOS drain-bulk, $C_{d b, p}$, and gate-drain capacitance, $C_{g d, p}$, must also be accounted for.

In the BSIM3v3 model, the PMOS drain-bulk capacitance is associated with the depletion region of the drain-bulk junction, and consists of the two contributions in (1.55a) and (1.55b), the bottom capacitance and the sidewall capacitance [R96], [CH99]. Due to a wide variation of their voltage during switching, the two contributions must be evaluated by multiplying the zero-bias capacitances $C_{j}$ and $C_{j s w}$ by coefficients $K_{j, p}$ and $K_{j s w, p}$ respectively, according to relationship (1.13). Voltages $V_{1}$ and $V_{2}$ in (1.13) are evaluated as explained in Section 4.3 for the bipolar gates. For example, the minimum direct voltage $V_{1}$ seen by the drain-bulk capacitance of M2 occurs when the input voltage $v_{i}$ is high, hence the PMOS drain voltage is equal to $\left(V_{DD}-V_{SWING} / 2\right)$, thus resulting in $V_{1}=-V_{SWING} / 2$, since the bulk is connected to the supply $V_{DD}$. Instead, when $v_{i}$ is low, the PMOS drain voltage is $V_{DD}$, hence $V_{2}=0$.

From inspection of the MOS transistor layout in Fig. 6.2, the junction perimeter and area are $2\left(W_{p}+L_{d, p}\right)$ and $W_{p} L_{d, p}$, respectively, where $W_{p}$ is the effective channel width and $L_{d, p}$ is the other junction dimension (in Chapter 2, the symbol $L_{x}$ was used instead of $L_{d, p}$).


Fig. 6.2. Simplified layout of a MOS transistor.

From Fig. 6.2, the linearized drain-bulk capacitance of PMOS transistors results in

$$
\begin{equation*}
C_{d b, p}=K_{j, p} C_{j, p} W_{p} L_{d, p}+2 K_{j s w, p} C_{j s w, p}\left(W_{p}+L_{d, p}\right) \tag{6.1}
\end{equation*}
$$

In relationship (6.1), the bottom zero-bias capacitance $C_{j, p}$, as well as the built-in potential across the junction $p b$ and the grading coefficient $m j$ necessary to evaluate coefficient $K_{j}$, are standard BSIM3v3 model parameters. The same consideration holds for the parameters of the side-wall contribution (with subscript $s w$ ). Parameter $L_{d, p}$ is extrapolated from layout design rules.
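The evaluation of (6.1) can be sketched as follows. Relationship (1.13) is not reproduced in this section, so the linearization coefficient below uses the standard equivalent-capacitance form for a junction swinging between two voltages, which is assumed to match (1.13). The built-in potentials and grading coefficients are the PMOS values of Table 6.2; the capacitance values and their units (fF/$\mu$m$^{2}$ and fF/$\mu$m) are assumptions, since the table does not state units.

```python
def k_eq(v1, v2, phi, m):
    """Linearization coefficient of a junction capacitance over [v1, v2].

    v1, v2: direct (forward) junction voltages, negative in reverse bias.
    phi: built-in potential; m: grading coefficient (m != 1).
    Standard equivalent-capacitance form, assumed to match (1.13).
    """
    if v1 == v2:
        # Degenerate case: small-signal value at a single bias point
        return (1.0 - v1 / phi) ** (-m)
    return -(phi ** m) / ((v2 - v1) * (1.0 - m)) * (
        (phi - v2) ** (1.0 - m) - (phi - v1) ** (1.0 - m))

def c_db_p(w_p, l_dp, c_j, c_jsw, k_j, k_jsw):
    """Linearized PMOS drain-bulk capacitance, relationship (6.1)."""
    return k_j * c_j * w_p * l_dp + 2.0 * k_jsw * c_jsw * (w_p + l_dp)

# Voltage excursion of the PMOS drain-bulk junction: V1 = -V_SWING/2, V2 = 0
v_swing = 0.7
k_j = k_eq(-v_swing / 2.0, 0.0, phi=1.02, m=0.55)     # bottom contribution
k_jsw = k_eq(-v_swing / 2.0, 0.0, phi=1.02, m=0.39)   # sidewall contribution

# Dimensions in um; capacitances assumed to be 1.42 fF/um^2 and 0.38 fF/um
c_db = c_db_p(w_p=0.6, l_dp=1.1, c_j=1.42, c_jsw=0.38, k_j=k_j, k_jsw=k_jsw)
print(f"K_j = {k_j:.3f}, K_jsw = {k_jsw:.3f}, C_db,p = {c_db:.2f} fF")
```

Because the junction is reverse biased over the whole excursion, both coefficients are slightly lower than unity, i.e. the linearized capacitance is somewhat smaller than its zero-bias value.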

The PMOS gate-drain capacitance $C_{g d, p}$ is equal to the sum of the overlap and the channel contributions. The former is equal to the product of the overlap capacitance $C_{g d 0}$, which is a standard BSIM3v3 parameter, and the channel width $W_{p}$. The latter is the intrinsic contribution associated with the channel charge of the PMOS transistors working in the triode region, $C_{g d, p, int}$. Unfortunately, the effects of the channel charge in short-channel transistors cannot be treated in the same manner as for long-channel devices, since the decomposition of channel capacitance into gate-source and gate-drain capacitances no longer applies in submicron technologies [YEC83], [T98], [CH99]. In the BSIM3v3 capacitance model, the channel charge transfer is modeled by transcapacitances $C_{i j}$, defined as

$$
\begin{equation*}
C_{i j}=\frac{\partial Q_{i}}{\partial V_{j}} \tag{6.2}
\end{equation*}
$$

where $Q_{i}$ is the total charge associated with the MOSFET terminal $i$, and $V_{j}$ is the voltage of the terminal $j$. Since transcapacitances $C_{i j}$ are not reciprocal, i.e. $C_{i j} \neq C_{j i}$, in general the charge flow in submicron MOSFET transistors cannot be modeled by two-terminal capacitances [T98], [CH99]. However, for the specific case of the PMOS active load in SCL gates, an equivalent intrinsic capacitance $C_{g d, p, int}$ can still be introduced to model the channel charge flowing through the drain terminal, $Q_{d}$. The channel charge flow through the drain is only due to the variation of the drain voltage $V_{d}$ (the gate, source, and bulk voltages of transistors M3-M4 are fixed); therefore, it can be described by the capacitance $C_{d d}$, since by definition (6.2) it expresses the drain charge variation due to the variation of $V_{d}$. To evaluate the capacitance $C_{d d}$ of PMOS transistors, let us consider the expression of $Q_{d}$ in strong inversion for a $0 / 100$ charge partitioning (as indicated by parameter Xpart equal to 1 in the transistor model) [CH99]

$$
\begin{align*}
Q_{d} & =-W_{p} L_{p} C_{OX}\left[\frac{V_{SG}-\left|V_{T}\right|}{2}-\frac{3 A_{bulk} V_{SD}}{4}+\frac{\left(A_{bulk} V_{SD}\right)^{2}}{8\left(V_{SG}-\left|V_{T}\right|-\frac{A_{bulk}}{2} V_{SD}\right)}\right]  \tag{6.3}\\
& \approx-W_{p} L_{p} C_{OX}\left[\frac{V_{SG}-\left|V_{T}\right|}{2}-\frac{3 A_{bulk} V_{SD}}{4}\right]
\end{align*}
$$

where $V_{S D} \ll \frac{V_{S G}-\left|V_{T}\right|}{A_{\text {bulk }}}$ was assumed, and the BSIM3v3 parameter $A_{\text {bulk }}$ is defined as

$$
\begin{align*}
A_{bulk}= & \frac{1}{1+K_{ETA}\left|V_{SB}\right|}\left\{1+\frac{K_{1OX}}{2 \sqrt{\phi_{S}-\left|V_{SB}\right|}}\left[\frac{A_{0} L_{p}}{L_{p}+2 \sqrt{X_{J} X_{dep}}}\right.\right. \\
& \left.\left.\cdot\left(1-A_{GS}\left(\left|V_{GS}\right|-\left|V_{T}\right|\right)\left(\frac{L_{p}}{L_{p}+2 \sqrt{X_{J} X_{dep}}}\right)^{2}\right)+\frac{B_{0}}{W_{p}+B_{1}}\right]\right\} \tag{6.4}
\end{align*}
$$

which depends on $W_{p}, L_{p}$ and various other BSIM3v3 model parameters.

The dependence of (6.3) on $W_{p}$ and $L_{p}$ can be further simplified. Indeed, it can be evaluated in the worst case where $A_{bulk}$ is equal to its maximum value $A_{bulk,max}$. From relationship (6.4), parameter $A_{bulk}$ is maximized by setting $W_{p}$ to the minimum value allowed by the process (i.e. $W_{p}=W_{p, min}$) and by maximizing function (6.4) with respect to $L_{p}$ with straightforward calculations. As an example, for the PMOS transistor in a $0.35$-$\mu \mathrm{m}$ CMOS process and $V_{DD}=3.3 \mathrm{~V}$, the maximum value of $A_{bulk}$ is $A_{bulk,max}=1.34$, which is slightly greater than unity as expected [CH99].

By differentiating relationship (6.3) with respect to $V_{d}$ and approximating $A_{bulk}$ with its maximum value $A_{bulk,max}$, capacitance $C_{d d}$ results in

$$
\begin{equation*}
C_{d d}=\frac{\partial Q_{d}}{\partial V_{d}}=\frac{3}{4} A_{b u l k, \max } W_{p} L_{p} C_{O X} \tag{6.5}
\end{equation*}
$$

From inspection of relationship (6.5), it is apparent that the expression derived for long-channel devices (i.e., $C_{g d, p, int}=\frac{1}{2} W_{p} L_{p} C_{OX}$) is inadequate.
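The gap between (6.5) and the long-channel expression can be quantified with a short sketch. It uses $C_{OX}$ from Table 6.1 and the value $A_{bulk,max}=1.34$ quoted above; the PMOS dimensions $0.6 / 0.7$ $\mu$m are those of the practical example given later in this section, and the function names are illustrative.

```python
def c_dd(w_p, l_p, c_ox, a_bulk_max):
    """Intrinsic drain capacitance of the PMOS load, relationship (6.5)."""
    return 0.75 * a_bulk_max * w_p * l_p * c_ox

def c_gd_long_channel(w_p, l_p, c_ox):
    """Long-channel triode-region estimate, (1/2) W L C_OX."""
    return 0.5 * w_p * l_p * c_ox

C_OX = 4.6          # fF/um^2, Table 6.1
A_BULK_MAX = 1.34   # value quoted for the 0.35-um process

short = c_dd(0.6, 0.7, C_OX, A_BULK_MAX)        # fF
long_ch = c_gd_long_channel(0.6, 0.7, C_OX)     # fF
ratio = short / long_ch

print(f"C_dd = {short:.2f} fF, long-channel = {long_ch:.2f} fF, ratio = {ratio:.2f}")
```

The ratio equals $1.5 A_{bulk,max}$ regardless of the transistor dimensions, i.e. about 2 for this process, so the long-channel formula underestimates the intrinsic contribution by roughly a factor of two.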

From the above considerations, each of the two PMOS transistors implementing the active load of an SCL gate will be represented in the following by the RC circuit in Fig. 6.3.


Fig. 6.3. Equivalent linear circuit of the PMOS active load.

### 6.2.2 Delay model of the SCL inverter for a step input

To model the propagation delay of the SCL inverter in Fig. 6.1, it is useful to observe that the NMOS transistors M1-M2 work in the saturation region most of the time, and their source voltage is the same for both input logic values, since it is fixed by the NMOS transistor in the ON state (i.e. drawing a current $I_{S S}$ ). Thus, as already discussed for Current-Mode bipolar gates, the circuit can be linearized around the bias point with $v_{i}=0$ and then simplified by applying the half-circuit concept, since the circuit is symmetrical and a differential input is applied.

The equivalent linear half circuit obtained is shown in Fig. 6.4, where transistor M1 (M2) is represented by its small-signal model (subscript $n$ refers to NMOS transistors), while the PMOS transistor is replaced by the equivalent circuit in Fig. 6.3. The capacitive effects associated with NMOS transistors consist of the drain-bulk junction capacitance $C_{d b, n}$ and the gate-drain capacitance $C_{g d, n}$, which pertain to NMOS transistors working in the saturation region. The former is a junction capacitance and can be linearized in the same way as capacitance $C_{d b, p}$, i.e. by multiplying its zero-bias value by a coefficient $K_{j}$ evaluated with $V_{1}=-V_{DD}+V_{SWING} / 2$ and $V_{2}=-V_{DD}$. Capacitance $C_{g d, n}$ is due to the overlap of the gate layer and the drain diffusion, thus it is equal to the product of the channel width $W_{n}$ and the BSIM3v3 model parameter $C_{g d 0}$ representing the overlap gate-drain capacitance per unit channel width.


Fig. 6.4. Equivalent linear circuit of the SCL inverter gate.

The network in Fig. 6.4 is a first-order circuit with a time constant $\tau$ that can be evaluated by applying the open-circuit time constant method [CG73] and neglecting the high-frequency zero [LS94], whose effect on the transient behavior of the gate is a small negative initial overshoot during switching. By assuming the input waveform to be a unity step, the resulting delay is $0.69 \tau$, hence the propagation delay $\tau_{PD,SCL}$ of the SCL gate is

$$
\begin{equation*}
\tau_{PD,SCL}=0.69 R_{D}\left(C_{g d, n}+C_{d b, n}+C_{g d, p}+C_{d b, p}+C_{L}\right) \tag{6.6}
\end{equation*}
$$

The expression (6.6) is simple and, hence, can be profitably used in pencil-and-paper calculations. Moreover, it shows how delay depends on design and process parameters and allows the designer to get the necessary intuitive insight into the circuit behavior.
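As an illustration of such a pencil-and-paper calculation, the sketch below evaluates (6.6). It assumes that the load resistance satisfies $R_{D} I_{SS}=V_{SWING} / 2$, as implied by the output levels $V_{DD}$ and $V_{DD}-V_{SWING} / 2$; relationship (2.38) is not reproduced here, so this is only an approximation of $R_{D}$. The parasitic capacitances are left at zero in the example, so that only the contribution of the external load is shown.

```python
def load_resistance(v_swing, i_ss):
    """Equivalent PMOS load resistance, assuming R_D * I_SS = V_SWING / 2."""
    return v_swing / (2.0 * i_ss)

def scl_inverter_delay(r_d, c_gd_n, c_db_n, c_gd_p, c_db_p, c_load):
    """Step-input delay of the SCL inverter, relationship (6.6)."""
    return 0.69 * r_d * (c_gd_n + c_db_n + c_gd_p + c_db_p + c_load)

# Operating point: I_SS = 20 uA, V_SWING = 700 mV, C_L = 50 fF
r_d = load_resistance(0.7, 20e-6)

# Contribution of the external load alone (parasitics set to zero here)
tau_load_only = scl_inverter_delay(r_d, 0.0, 0.0, 0.0, 0.0, 50e-15)
print(f"R_D = {r_d / 1e3:.1f} kOhm, load-only delay = {tau_load_only * 1e12:.1f} ps")
```

For this operating point the external load alone accounts for about 604 ps; the transistor parasitics add to this value through the remaining terms of (6.6).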

To evaluate the accuracy of relationship (6.6), its results were compared to Spectre simulations by adopting a $0.35$-$\mu \mathrm{m}$ CMOS process, whose main parameters are reported in Table 6.1; the BSIM3v3 parameters used to evaluate the capacitances in (6.6) are shown in Table 6.2.

TABLE 6.1

| $C_{O X}$ | $4.6 \mathrm{fF} / \mu \mathrm{m}^{2}$ |
| :---: | :---: |
| $\mu_{n o} C_{O X}$ | $175 \mu \mathrm{~A} / \mathrm{V}^{2}$ |
| $\mu_{p o} C_{O X}$ | $60 \mu \mathrm{~A} / \mathrm{V}^{2}$ |
| $(W / L)_{\min }$ | $0.6 \mu \mathrm{~m} / 0.3 \mu \mathrm{~m}$ |
| $V_{Tn0}$ (short channel) | 0.54 V |
| $V_{Tp0}$ (short channel) | -0.72 V |
| maximum $V_{D D}$ | 3.3 V |

TABLE 6.2

| NMOS |  | PMOS |  |
| :--- | :--- | :--- | :--- |
| $p b$ | $6.9 \mathrm{E}-1$ | $p b$ | 1.02 |
| $m j$ | $3.1 \mathrm{E}-1$ | $m j$ | $5.5 \mathrm{E}-1$ |
| $c j$ | $9.3 \mathrm{E}-4$ | $c j$ | $1.42 \mathrm{E}-3$ |
| $p b s w$ | $6.9 \mathrm{E}-1$ | $p b s w$ | 1.02 |
| $m j s w$ | $1.9 \mathrm{E}-1$ | $m j s w$ | $3.9 \mathrm{E}-1$ |
| $c j s w$ | $2.8 \mathrm{E}-4$ | $c j s w$ | $3.8 \mathrm{E}-4$ |
| $L d$ | $1.1 \mathrm{E}-6$ | $L d$ | $1.1 \mathrm{E}-6$ |
| $c g s 0$ | $2.1 \mathrm{E}-10$ | $c g s 0$ | $2.1 \mathrm{E}-10$ |
| $c g d 0$ | $2.1 \mathrm{E}-10$ | $c g d 0$ | $2.1 \mathrm{E}-10$ |

Simulations were carried out under a variety of design and load conditions. The bias current was varied from $5 \mu \mathrm{~A}$ to $100 \mu \mathrm{~A}$, the transistor aspect ratios were sized to obtain the typical values $V_{SWING}=700 \mathrm{mV}$ and $A_{V}=4$, and the load capacitance $C_{L}$ was set to $0 \mathrm{~F}, 50 \mathrm{fF}, 200 \mathrm{fF}$ and 1 pF.

In Figs. 6.5a, 6.5b, 6.5c and 6.5d the simulated delay and that predicted by relationship (6.6) are plotted versus the bias current $I_{SS}$ for a load capacitance equal to $0 \mathrm{~F}, 50 \mathrm{fF}, 200 \mathrm{fF}$ and 1 pF, respectively. As expected, the delay decreases as the bias current $I_{SS}$ increases and asymptotically tends to a constant value.


Fig. 6.5a. Simulated and theoretical delay vs. bias current $I_{S S}$ with $C_{L}=0 \mathrm{~F}$.


Fig. 6.5b. Simulated and theoretical delay vs. bias current $I_{S S}$ with $C_{L}=50 \mathrm{fF}$.


Fig. 6.5c. Simulated and theoretical delay vs. bias current $I_{S S}$ with $C_{L}=200 \mathrm{fF}$.


Fig. 6.5d. Simulated and theoretical delay vs. bias current $I_{S S}$ with $C_{L}=1 \mathrm{pF}$.

The model (6.6) is in agreement with simulated results, as shown in Fig. 6.6, which reports the model error with respect to Spectre simulations versus $I_{SS}$ for the load capacitance values used. The maximum error in realistic cases (i.e., with non-zero load capacitance) is $15 \%$, and it is as high as $19 \%$ in the unrealistic case with $C_{L}=0 \mathrm{~F}$. To give an idea of reasonable values used in practical cases, consider an SCL gate with $I_{SS}=20$ $\mu \mathrm{A}, C_{L}=50 \mathrm{fF},(W / L)_{n}=8 / 0.3,(W / L)_{p}=0.6 / 0.7$ and $V_{DD}=3.3 \mathrm{~V}$. The predicted and simulated delays are equal to 620 ps and 730 ps, respectively, and differ by $15 \%$.


Fig. 6.6. Error of (6.6) with respect to simulated results vs. bias current $I_{SS}$.

As observed for CML gates, the analytical model presented so far could be further improved by introducing unknown coefficients that multiply the circuit capacitances and evaluating them through minimization of the error functional (4.6). However, the model accuracy obtained is better than that of CML circuits and it is sufficient for most purposes, thus the improved model is a less interesting option for SCL gates.

### 6.2.3 Extension of the delay model to arbitrary input waveforms

In general, the delay of a logic gate depends on the input signal waveform [R96]. For example, this dependence cannot be neglected in the case of static CMOS logic [ADR00], [BNK98]. Therefore, in the following some considerations are introduced to extend the validity of the model developed in Section 6.2.2 for a step input.

To understand the effect of a non-zero input rise time, let us first consider the case where a step input is applied to the SCL inverter. For the sake of simplicity, assume that the input voltage $v_{i}$ abruptly switches from low to high. Before switching, the voltages $v_{o 1}$ and $v_{o 2}$ are equal to $V_{DD}$ and $\left(V_{DD}-V_{SWING} / 2\right)$, while after switching the steady-state voltages are $\left(V_{DD}-V_{SWING} / 2\right)$ and $V_{DD}$, respectively. During the switching, the bias current is abruptly steered to M1-M3 due to the step input voltage, and the left-hand half circuit is equivalent to an RC circuit driven by a constant current $I_{SS}$, while the right-hand half circuit is an RC circuit with no input current ${ }^{1}$.

When an input waveform with a non-zero rise time is applied, the current steering is smoother, thus slowing down the output voltage response and increasing the time $t_{out,50}$ required to reach its $50 \%$ value (i.e., the delay tends to increase with respect to the step input case). At the same time, the time $t_{in,50}$ required by $v_{i}$ to reach its $50 \%$ value also increases with the input rise time (i.e., the delay tends to decrease); therefore, two opposite effects on the delay (defined as $t_{out,50}-t_{in,50}$) are observed. In SCL gates, the two effects tend to compensate each other, thus leading to a weak delay dependence on the input rise time.

The weak dependence of the SCL gate delay on the input rise time is confirmed by simulations. To be more specific, the delay of a single SCL gate was evaluated by using the CMOS process described above, for a load capacitance equal to $0 \mathrm{~F}, 50 \mathrm{fF}, 100 \mathrm{fF}, 200 \mathrm{fF}, 400 \mathrm{fF}$ and 1 pF, and a bias current ranging from $5 \mu \mathrm{~A}$ to $100 \mu \mathrm{~A}$. The considered gate was driven by a realistic input waveform, obtained from the output voltage of either a single SCL gate or a chain made up of two to four SCL gates. A very wide range of conditions was considered, including the cases with an input waveform faster or slower than the output waveform by up to two orders of magnitude. The results obtained are collected in the scatter plot of Fig. 6.7, where the step input delay of the considered SCL gate is reported on the x-axis and the actual delay for a realistic waveform on the y-axis. Analysis of Fig. 6.7 reveals that the actual delay is close to that with a step input. More specifically, the difference between the two delay values is always lower than $24 \%$, even for unrealistically long input rise times, and it is typically much lower, since its average value is only $6.8 \%$. This means that the delay of an SCL gate is not very sensitive to the input waveform, in contrast to an ideal first-order circuit behavior.
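For comparison, the behavior of an ideal first-order circuit can be sketched numerically. The example below assumes a linear low-pass stage with time constant $\tau$ driven by a linear ramp of duration `t_rise` (a simplifying assumption; the simulations above use realistic SCL waveforms), and computes the delay between the $50 \%$ crossings of input and output using the closed-form response and a bisection search.

```python
import math

def ramp_response(t, t_rise, tau):
    """Normalized output of a first-order low-pass stage driven by a 0-to-1
    linear ramp that lasts t_rise (closed-form solution)."""
    if t <= 0.0:
        return 0.0
    if t < t_rise:
        return (t - tau * (1.0 - math.exp(-t / tau))) / t_rise
    # After the ramp ends, the residual error decays exponentially
    residual = (tau / t_rise) * (1.0 - math.exp(-t_rise / tau))
    return 1.0 - residual * math.exp(-(t - t_rise) / tau)

def delay_50(t_rise, tau):
    """Delay between the 50 % crossings of input and output."""
    if t_rise <= 0.0:
        return tau * math.log(2.0)          # ideal step input
    lo, hi = 0.0, t_rise + 50.0 * tau       # output is monotonic on this range
    for _ in range(200):                    # bisection on the 50 % crossing
        mid = 0.5 * (lo + hi)
        if ramp_response(mid, t_rise, tau) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - 0.5 * t_rise

tau = 1.0
for t_rise in (0.0, 0.1, 1.0, 5.0, 20.0):
    print(f"t_rise = {t_rise:5.1f} tau -> delay = {delay_50(t_rise, tau):.3f} tau")
```

Under these assumptions the delay grows from about $0.69 \tau$ for a step input towards $\tau$ for very slow ramps, i.e. by more than $40 \%$, which is larger than the spread reported for SCL gates.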


Fig. 6.7. Scatter plot of the SCL inverter delay with actual input waveform vs. delay with step input waveform.

### 6.3 MODELING OF THE SOURCE-COUPLED INVERTER WITH OUTPUT BUFFERS

To improve the driving capability of the gate and therefore its switching speed, or to shift the common-mode value of the output node voltages for reasons discussed in Chapter 2, an output buffer can be added to each output node, as shown in Fig. 6.8, where $v_{o}=v_{o 1}-v_{o 2}$ is the differential output voltage of the gate, and $v_{i, buf1}$ and $v_{i, buf2}$ are the input voltages of the two buffers, respectively. Each output buffer is a source-follower stage biased by the current source $I_{SF}$.

The propagation delay $\tau_{P D, S C L b u f}$ of an SCL gate with the output buffer in Fig. 6.8 can be evaluated by applying the methodology described in Section 6.2 and extending it as done for ECL gates in Section 4.5. To be more specific, the delay of the inverter gate in Fig. 6.8 can be decomposed into the contribution $\tau_{P D, S C L}$ of the internal SCL gate and that of the buffer $\tau_{P D, \text { buf }}$ :

$$
\begin{equation*}
\tau_{PD,SCLbuf}=\tau_{PD,SCL}+\tau_{PD,buf} \tag{6.7}
\end{equation*}
$$



Fig. 6.8. Source-Coupled inverter gate with output buffers.

The delay contribution of the internal SCL gate, $\tau_{PD,SCL}$, is given by (6.6) in which $C_{L}$ must be set to zero, while that of the buffer, $\tau_{PD,buf}$, is evaluated by driving the source-follower stage with the Thevenin equivalent circuit seen at the output of the internal SCL gate, modeled with a voltage source $V_{th}$ and a resistance ${ }^{2}$ $R_{th}$. Hence, $\tau_{PD,buf}$ can be evaluated by analyzing the circuit in Fig. 6.9, which depicts the linearized buffer circuit driven by the Thevenin equivalent circuit of the SCL gate.


Fig. 6.9. Equivalent linear circuit of output buffers driven by the internal SCL inverter.

In Fig. 6.9, capacitors $C_{g d, buf}$ and $C_{g s, buf}$ represent the gate-drain and the gate-source contributions, and are evaluated as small-signal capacitances in the saturation region. To be more specific, $C_{g d, buf}$ is due to the overlap of the gate layer and the drain diffusion, thus it is equal to the product of the overlap capacitance per unit width, $C_{g d 0}$, and the effective channel width of the buffer transistors, $W_{buf}$. Regarding the capacitance $C_{g s, buf}$, since the buffer transistors work in the saturation region most of the time, it must be evaluated as $\frac{2}{3} W_{buf} L_{buf} C_{OX}$. Moreover, since it is connected between two nodes whose voltage gain in (2.50) is close to unity, we have to account for the Miller effect [MG87]. Hence, the equivalent Miller capacitance at the input of the buffer (i.e., in parallel to $C_{g d, buf}$) is given by $C_{g s, buf}\left[1-1 /\left(1+\frac{g_{mb,buf}}{g_{m,buf}}\right)\right]$, where $g_{mb,buf}$ and $g_{m,buf}$ are the body-effect transconductance and the transistor transconductance in (1.70) and (1.66), respectively, and their ratio is almost constant (typically, $g_{mb,buf} / g_{m,buf}$ ranges from 0.1 to 0.2, and for the $0.35$-$\mu \mathrm{m}$ CMOS process used $g_{mb,buf} / g_{m,buf} \cong 0.13$).

delay results as $0.69\left(b_{1}-a_{1}\right)$ [E48]. Circuit analysis, for example using the extra element theorem [M89], shows that adding $C_{out}$ to the internal SCL gate determines an equal increase in $a_{1}$ and $b_{1}$ by $R_{D} C_{out}$, therefore the delay is not affected by $C_{out}$.

The transconductance $g_{m,buf}$ of the transistor implementing the buffer can be evaluated from (1.66) as explained in Section 2.4.2, provided that in the effective electron mobility (2.33), $V_{DS}$ and $V_{SB}$ are approximated with their maximum values (i.e., $V_{GS}+V_{SWING}/2$ and $V_{DD}-V_{GS}$, respectively), and $V_{GS}$ is underestimated by $V_{T,n}$.

The circuit in Fig. 6.9 has a transfer function with two poles and one high-frequency zero, and exhibits a dominant-pole behavior for practical values of $I_{SS}$ and $C_{L}$. This is a fundamental difference with respect to ECL bipolar circuits, which have a second-order behavior, and is essentially due to the fact that MOS transistors usually have a much lower transconductance at a given bias current [LS94]. Detailed analysis shows that the second pole is greater than the first one by at least one order of magnitude.

From the previous considerations, the buffer circuit can be assimilated to a first-order system, whose delay is equal to 0.69 times its time constant. By applying the time constant method, the following expression of the buffer delay is obtained

$$
\begin{equation*}
\tau_{P D, b u f}=0.69\left[R_{D}\left(C_{g d, b u f}+C_{g s, b u f} \frac{\frac{g_{m b, b u f}}{g_{m, b u f}}}{1+\frac{g_{m b, b u f}}{g_{m, \text { buf }}}}\right)+\frac{C_{L}+C_{g s, b u f}}{g_{m, b u f}}\right] \tag{6.8}
\end{equation*}
$$

where the source-bulk capacitance was neglected with respect to $C_{L}$. From (6.8), the buffer delay is equal to the sum of two terms, whose circuital meaning is apparent: the first is proportional to $R_{D}$ and models the loading effect of the buffer on the internal SCL gate, the second is inversely proportional to $g_{m, b u f}$, and hence it depends on the buffer driving capability.
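The two terms of (6.8) can be evaluated separately to see which one dominates for a given design. The following is a minimal sketch, assuming SI units throughout; the function name and all numerical values are placeholders introduced here for illustration and are not taken from the text.

```python
def buffer_delay(r_d, c_gd_buf, c_gs_buf, gmb_over_gm, g_m_buf, c_load):
    """Buffer delay from (6.8), in seconds (all inputs in SI units)."""
    # Miller-reduced share of C_gs seen at the buffer input.
    c_miller = c_gs_buf * gmb_over_gm / (1.0 + gmb_over_gm)
    # First term: loading of the internal SCL gate (proportional to R_D).
    loading = r_d * (c_gd_buf + c_miller)
    # Second term: buffer driving capability (inversely proportional to g_m).
    driving = (c_load + c_gs_buf) / g_m_buf
    return 0.69 * (loading + driving)


# Placeholder values chosen only for illustration (assumptions, not book data).
example = dict(r_d=35e3, c_gd_buf=1e-15, c_gs_buf=3e-15,
               gmb_over_gm=0.13, g_m_buf=100e-6, c_load=50e-15)
tau_buf = buffer_delay(**example)
print(f"tau_PD,buf = {tau_buf * 1e12:.1f} ps")
```

With these placeholder numbers the second term is the larger one, which is the situation in which increasing $g_{m,buf}$ (i.e., the buffer bias current or aspect ratio) is most effective.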

The accuracy of (6.7)-(6.8) was tested by extensive Spectre simulations using the $0.35$-$\mu\mathrm{m}$ CMOS process described above, with bias current $I_{SF}$ ranging from $5~\mu\mathrm{A}$ to $100~\mu\mathrm{A}$, bias current $I_{SS}$ set to $5~\mu\mathrm{A}$, $20~\mu\mathrm{A}$, $50~\mu\mathrm{A}$ and $100~\mu\mathrm{A}$, and $V_{DD}$ set to 3.3 V. Moreover, the internal SCL transistors' aspect ratios were sized to obtain the typical values $V_{SWING}=700~\mathrm{mV}$ and $A_{V}=4$, while the buffer transistors' aspect ratios were set to $0.6/0.3$, $3/0.3$ and $6/0.3$, and the load capacitance $C_{L}$ was set to 0 F, 50 fF, 200 fF and 1 pF. As an example, the delay curves predicted by (6.7)-(6.8) and those simulated for $C_{L}=200~\mathrm{fF}$ and $(W/L)_{buf}=0.6/0.3$ are plotted in Figs. 6.10a-6.10d, where $I_{SS}$ is set to $5~\mu\mathrm{A}$, $20~\mu\mathrm{A}$, $50~\mu\mathrm{A}$ and $100~\mu\mathrm{A}$, respectively.

From inspection of Figs. 6.10a-6.10d, the delay model agrees well with simulated results. Indeed, among the cases considered, the largest error of $35\%$ was found for $C_{L}=200~\mathrm{fF}$, $I_{SS}=5~\mu\mathrm{A}$ and $I_{SF}=100~\mu\mathrm{A}$. Note that such a strong current imbalance between the internal SCL gate and the output buffers is not encountered in practical cases, since it leads to a buffer delay much lower than the internal SCL one (i.e., a very inefficient distribution of the total bias current between the internal SCL gate and the output buffers).

For other load and bias conditions the error was always found to be lower, and for realistic values of bias currents $I_{SS}$ and $I_{SF}$ that lead to similar delay values for the internal SCL gate and the output buffer, the error was always lower than $20\%$, and usually much lower. Indeed, the average error for $(W/L)_{buf}$ equal to $0.6/0.3$, $3/0.3$ and $6/0.3$ is $14.7\%$, $11.3\%$ and $12.6\%$, respectively.

To give an idea of reasonable values used in practical cases, consider an SCL gate with $I_{SS}=20~\mu\mathrm{A}$, $I_{SF}=40~\mu\mathrm{A}$, $C_{L}=50~\mathrm{fF}$, $(W/L)_{n}=8/0.3$, $(W/L)_{p}=0.6/0.7$, $(W/L)_{buf}=3/0.3$ and $V_{DD}=3.3~\mathrm{V}$, whose predicted and simulated delays of 341 ps and 412 ps, respectively, differ by $17\%$.

As was observed for the SCL inverter without output buffers in Section 6.2.2, the accuracy of the analytical model (6.7)-(6.8) could be improved by following an approach analogous to that used in Section 4.5 for the ECL inverter, i.e. by introducing unknown coefficients to be evaluated through minimization of the error functional (4.9). However, as already highlighted for the SCL inverter, the model accuracy is better than that of ECL circuits and is usually adequate, thus the improved model is not developed for the SCL inverter with output buffers.

Regarding the effect of the input waveform, the dependence of delay on the input rise time was observed to be even weaker than for the SCL inverter. Therefore, the same considerations as in Section 6.2.3 are valid, and the delay can always be approximated to that for a step input.


Fig. 6.10a. Predicted and simulated propagation delay vs. buffer bias current, $I_{SF}$, for an SCL inverter with output buffer for $W / L_{buf}=0.6 / 0.3$, $C_{L}=200~\mathrm{fF}$ and $I_{B}=5~\mu\mathrm{A}$.



Fig. 6.10b. Predicted and simulated propagation delay vs. buffer bias current, $I_{S F}$, for an SCL inverter with output buffer for $W / L_{\text {buf }}=0.6 / 0.3$, $C_{L}=200 \mathrm{fF}$ and $I_{B}=20 \mu \mathrm{~A}$.


Fig. 6.10c. Predicted and simulated propagation delay vs. buffer bias current, $I_{SF}$, for an SCL inverter with output buffer for $W / L_{buf}=0.6 / 0.3$, $C_{L}=200~\mathrm{fF}$ and $I_{B}=50~\mu\mathrm{A}$.



Fig. 6.10d. Predicted and simulated propagation delay vs. buffer bias current, $I_{S F}$, for an SCL inverter with output buffer for $W / L_{b u f}=0.6 / 0.3$, $C_{L}=200 \mathrm{fF}$ and $I_{B}=100 \mu \mathrm{~A}$.

### 6.4 MODELING OF THE SOURCE-COUPLED MUX/XOR GATE

In this section, the approach introduced in Sections 6.2-6.3 for the SCL inverter is applied to evaluate the delay of the Source-Coupled MUX and XOR gates, both with and without output buffers. The approach is consistent with that used for CML/ECL gates in Chapter 4.

### 6.4.1. Delay model of the MUX/XOR gate without output buffer

Let us consider the SCL MUX and XOR gates shown in Figs. 6.11-6.12, respectively, for which two different delay values will be considered, depending on whether the switching input is applied to the lower or upper transistors. Between them, the worst-case delay occurs when the switching input $v_{i}$ is applied to the transistors at the lower level (i.e., M1-M2), for the same reasons discussed in Section 4.6 for CML gates.

Let us consider the case where input $v_{i}$ is applied to transistors M1-M2, i.e. $v_{i}$ represents signals $\phi$ and $B$ in the MUX and XOR gate, respectively, while the others are kept constant. As clarified in Section 4.6, assume input $A$ to be high (and input $B$ of the MUX to be low to have an output transition) without loss of generality. Under this condition, biasing the gate at the logic threshold $v_{i}=0$, transistors M3 and M6 lie in the saturation region, while M4 and M5 are in cut-off. It is worth noting that the XOR gate has the same delay as the MUX, since its topology is obtained from the latter by setting $B=\bar{A}$, hence in the following only the MUX gate will be considered.


Fig. 6.11. SCL MUX gate.


Fig. 6.12. SCL XOR gate.

As already observed in Section 6.2.2, transistors M1-M2 work in the saturation region most of the time, and their source voltage is the same for both input logic values, thus the circuit can be linearized around the logic threshold and then simplified by applying the half-circuit concept.

In the resulting linearized half-circuit, shown in Fig. 6.13, transistors M1-M3 (M2-M6) are represented by their small-signal model including an equivalent transconductance $G_{M}$, PMOS transistor M7 (M8) is replaced by its equivalent circuit in Fig. 6.3, and input $A$ is assumed to be driven by an identical circuit, whose resistance $R_{D}$ is connected to the gate of M3.


Fig. 6.13. Equivalent linear half circuit of the SCL MUX/XOR gate.

In Fig. 6.13, the capacitive contributions of the transistors are: $C_{d b}$ and $C_{s b}$, that represent the (equal) drain-bulk and source-bulk junction capacitances, $C_{g d}$ that schematizes the channel and the overlap contribution between gate and drain, and $C_{g s}$ that is the gate-source capacitance. An external load capacitance $C_{L}$ accounts for the wiring capacitance and the input capacitance of the driven logic gates.

By approximating the circuit in Fig. 6.13 as a first-order network with time constant $\tau$, which can be evaluated by resorting to the time constant method [CG73], its delay for a step input equals $0.69 \tau$. Therefore, the propagation delay $\tau_{PD,SCL}$ of the SCL MUX gate is given by

$$
\begin{align*}
\tau_{PD,SCL} & =0.69\left[R_{D}\left(2 C_{gd,3}+C_{db,3}+C_{gd,5}+C_{db,5}+C_{gd,p}+C_{db,p}+C_{L}\right)\right.  \tag{6.9}\\
& \left.+\frac{1}{G_{M,n}}\left(C_{gd,1}+C_{db,1}+C_{gs,3}+C_{sb,3}+C_{gs,4}+C_{sb,4}\right)\right]
\end{align*}
$$

where the equivalent transconductance $G_{M,n}$ of M3 cannot be approximated by its small-signal value $g_{m,n}$ in the saturation region, because when M3 switches its voltages vary greatly around the bias point considered. Rather, the equivalent NMOS transconductance $G_{M,n}$ of M3 can be evaluated as the ratio of the total variation of current $i_{D,3}$ and the total variation of gate-source voltage $V_{GS,3}$ during a complete switching. Considering that the current of M3 varies from $I_{SS}$ to zero (or vice versa), and $V_{GS,3}$ varies from $V_{Tn,3}+\sqrt{I_{SS} /\left(\left.\frac{\mu_{eff,n} C_{OX}}{2} \frac{W}{L}\right|_{3}\right)}$ to $V_{Tn,3}$, the resulting expression of $G_{M,n}$ is

$$
\begin{equation*}
G_{M, n}=\sqrt{\left.\frac{\mu_{e f f, n} C_{O X}}{2} \frac{W}{L}\right|_{3} I_{S S}}=\frac{g_{m, n}}{2} \tag{6.10}
\end{equation*}
$$

where the expression (1.66) of the small-signal transconductance $g_{m, n}$ was used. As a result, in large-signal operation, the average transconductance of switching transistors is half of that in a small-signal condition around the logic threshold.
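The ratio that defines $G_{M,n}$ can be written out explicitly. In the following worked step, the shorthand $k_{3}$ is introduced here only for compactness and does not appear in the original derivation.

$$
\begin{aligned}
k_{3} & \equiv \left.\frac{\mu_{eff,n} C_{OX}}{2} \frac{W}{L}\right|_{3}, \qquad \Delta V_{GS,3}=\sqrt{\frac{I_{SS}}{k_{3}}} \\
G_{M,n} & =\frac{\Delta i_{D,3}}{\Delta V_{GS,3}}=\frac{I_{SS}}{\sqrt{I_{SS} / k_{3}}}=\sqrt{k_{3} I_{SS}}
\end{aligned}
$$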

By substituting the expression of the small-signal voltage gain (2.44) and relationship (6.10), the MUX/XOR delay model (6.9) for an input applied to lower transistors can be rewritten as

$$
\begin{align*}
\tau_{P D, S C L} & =0.69 R_{D}\left[3 C_{g d, 1}+2 C_{d b, 3}+C_{g d, p}+C_{d b, p}+C_{L}\right. \\
& \left.+\frac{2}{A_{V}}\left(2 C_{g d, 1}+3 C_{d b, 1}+C_{g s, 3}\right)\right] \tag{6.11}
\end{align*}
$$

where, since all NMOS transistors have an equal aspect ratio, it was considered that $C_{db,5}=C_{db,3}$ and $C_{gd,5}=C_{gd,3}=C_{gd,1}$. Moreover, it was assumed that $C_{sb,3}=C_{sb,4}=C_{db,1}$ given that the MOS transistor is symmetric, and $C_{gs,4}=C_{gd,1}$ because M4 is OFF and its gate-source capacitance is only due to the overlap contribution, like $C_{gd,1}$.

A simple interpretation of relationship (6.11) can be given by comparison with the expression of the SCL inverter delay (6.6). Indeed, all the capacitive terms in brackets in (6.11) are multiplied by $0.69 R_{D}$ and thus make the same contribution to the delay as if the corresponding capacitances were lumped at the output node. As a consequence, the terms in brackets in (6.11) represent the equivalent capacitance at the output node due to all transistors. More specifically, terms $\left(3 C_{gd,1}+2 C_{db,3}+C_{gd,p}+C_{db,p}+C_{L}\right)$ are associated with capacitances already connected at the output nodes, while terms $\left(2 C_{gd,1}+3 C_{db,1}+C_{gs,3}\right)$ are lowered by a factor $2 / A_{V}$ (that is usually lower than unity for typical values of $A_{V}$, as will be discussed in Chapter 7), since these capacitances see a resistance lower than $R_{D}$ by the same factor $\left(1 / G_{M,n}=2 R_{D} / A_{V}\right)$.
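The equivalent-capacitance reading of (6.11) can be sketched numerically as follows. This is a minimal illustration, assuming SI units; the function names and the capacitance values are hypothetical placeholders, not data from the text.

```python
def mux_equivalent_capacitance(c_gd1, c_db3, c_db1, c_gs3,
                               c_gd_p, c_db_p, c_load, a_v):
    """Equivalent output-node capacitance, i.e. the bracket of (6.11)."""
    # Capacitances physically connected to the output node.
    at_output = 3 * c_gd1 + 2 * c_db3 + c_gd_p + c_db_p + c_load
    # Capacitances at the internal node, scaled by the factor 2/A_V.
    at_internal_node = 2 * c_gd1 + 3 * c_db1 + c_gs3
    return at_output + (2.0 / a_v) * at_internal_node


def mux_delay_lower_input(r_d, **caps):
    """Delay (6.11) when the switching input drives the lower pair M1-M2."""
    return 0.69 * r_d * mux_equivalent_capacitance(**caps)


# Placeholder values for illustration only (assumptions, not book data).
caps = dict(c_gd1=2e-15, c_db3=4e-15, c_db1=5e-15, c_gs3=7e-15,
            c_gd_p=0.3e-15, c_db_p=1e-15, c_load=50e-15, a_v=4.0)
c_eq = mux_equivalent_capacitance(**caps)
tau_mux = mux_delay_lower_input(35e3, **caps)
print(f"C_eq = {c_eq * 1e15:.1f} fF, tau_PD = {tau_mux * 1e12:.0f} ps")
```

Raising `a_v` in this sketch shrinks only the internal-node contribution, which mirrors the role of the factor $2/A_{V}$ in (6.11).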

In relationship (6.11), drain-bulk and source-bulk junction capacitances are linearized according to relationship (6.1), where coefficients $K_{j}$ are evaluated on the basis of the values of voltages $V_{1}$ and $V_{2}$ in Table 6.3. It is worth noting that capacitance $C_{db,1}$ experiences a small voltage change, since it is due to the variation of $V_{GS,3}$, therefore it can be evaluated in a small-signal condition around the direct voltage $\left(-V_{DD}+V_{GS,3}\right)$, where $V_{GS,3}$ can be approximated to $V_{T,n}$ for the sake of simplicity.

TABLE 6.3

| capacitance | $V_{1}$ | $V_{2}$ |
| :--- | :--- | :---: |
| $C_{d b, 3}$ | $-V_{D D}$ | $-V_{D D}+V_{S W I N G} / 2$ |
| $C_{d b, p}$ | $-V_{S W I N G} / 2$ | 0 |
| $C_{db,1}$ | $-V_{DD}+V_{GS,3} \simeq-V_{DD}+V_{T,n}$ | $*$ |

${ }^{*} C_{d b, 1}$ was evaluated in a small-signal condition

The NMOS capacitances $C_{gd,1}$, $C_{gd,3}$ and $C_{gd,5}$ ($C_{gs,4}$) make an overlap contribution equal to the product of the capacitance per unit channel width $C_{gd0}$ ($C_{gs0}$) and their channel width, $W_{n}$. Moreover, capacitance $C_{gs,3}$ of transistor M3 is evaluated in the saturation region, and is thus equal to $2/3 \cdot W_{n} L_{n} C_{OX}$.

### 6.4.2. Validation of the model of MUX/XOR gate without output buffer

The delay model (6.11) of the SCL MUX/XOR gate was validated by extensive simulations with the $0.35$-$\mu\mathrm{m}$ CMOS process previously used. To encompass a wide range of cases, the bias current was varied from $5~\mu\mathrm{A}$ to $100~\mu\mathrm{A}$, while the transistors' aspect ratios were sized to obtain a logic swing ranging from 400 mV to the maximum value allowed for operation in the saturation region of M3-M6 (i.e., $2 V_{T,n} \approx 1.4~\mathrm{V}$, according to (2.43) and (2.45)), and a voltage gain ranging from 2 to 7. In addition, the load capacitance $C_{L}$ was set to 0 F, 50 fF, 200 fF and 1 pF.

Some of the results obtained are reported in Figs. 6.14a-6.14d, where transistors were sized to guarantee $V_{\text {SWING }}=700 \mathrm{mV}$ and $A_{V}=4$. In particular, Figs. 6.14a-6.14d report the plot of the MUX/XOR delay simulated and predicted by (6.11) with a load capacitance of $0 \mathrm{~F}, 50 \mathrm{fF}, 200 \mathrm{fF}$ and 1 pF , respectively.

The error of relationship (6.11) with respect to simulations under the same load conditions in Figs. 6.14a-6.14d is plotted in Fig. 6.15 versus $I_{S S}$. From inspection of Fig. 6.15, the error of the MUX/XOR gate models is always within $20 \%$ with $C_{L}=0 \mathrm{~F}$, and is typically lower. The same order of magnitude was found in the other simulations, thus the accuracy is usually adequate for modeling purposes, and the improvement of the model by introducing unknown coefficients as in Section 4.7 is rarely required.

With regard to the effect of the input waveform, the same results as in Section 6.2.3 were found, therefore the MUX/XOR SCL gate delay when the switching input is applied to the lower transistors can always be approximated to that for a step input in (6.11).


Fig. 6.14a. Simulated and predicted MUX/XOR delay vs. $I_{S S}$ for $C_{L}=0 \mathrm{~F}$.


Fig. 6.14b. Simulated and predicted MUX/XOR delay vs. $I_{S S}$ for $C_{L}=50 \mathrm{fF}$.


Fig. 6.14c. Simulated and predicted MUX/XOR delay vs. $I_{S S}$ for $C_{L}=200 \mathrm{fF}$.


Fig. 6.14d. Simulated and predicted MUX/XOR delay vs. $I_{SS}$ for $C_{L}=1~\mathrm{pF}$.



Fig. 6.15. Error of the MUX/XOR predicted delay with respect to simulations vs. $I_{S S}$.

### 6.4.3. MUX/XOR with the upper transistors switching

We have so far analyzed the case where the switching input $v_{i}$ is applied to lower transistors M1-M2. Now, let us evaluate delay when $v_{i}$ is applied to
transistors at the upper level (M3-M6), which is expected to be lower than the former.

Let us evaluate the delay associated with signals driving the upper transistors by assuming that the bias current is already entirely steered to transistors M3-M4 through transistor M1. By applying the switching input $v_{i}$ to M3-M4, this source-coupled pair can be analyzed like the inverter in Section 6.2 by linearizing the circuit and considering only a half circuit. As in Section 4.6, after defining $C_{e q}$ as the parasitic capacitance contribution of M5-M6 (that are OFF) to the output nodes, the circuit can be schematized with the SCL inverter M3-M4 loaded by a capacitance $C_{e q}+C_{L}$. Therefore, the delay associated with the upper transistors is simply obtained from relationship (6.6) as

$$
\begin{equation*}
\tau_{P D, S C L}=0.69 R_{D}\left(2 C_{g d, n}+2 C_{d b, n}+C_{g d, p}+C_{d b, p}+C_{L}\right) \tag{6.12}
\end{equation*}
$$

where it was considered that $C_{eq}=C_{gd,n}+C_{db,n}$, since M3-M6 have the same aspect ratio and thus the same capacitances $C_{gd,n}$ and $C_{db,n}$, regardless of whether they work in the saturation or in the cut-off region. The error of (6.12) with respect to simulations was found to be equal to that of the inverter previously discussed, as expected. Regarding the effect of the input waveform, as for the case previously analyzed, the MUX/XOR SCL gate delay when the switching input is applied to the upper transistors can always be approximated to that for a step input in (6.12).

### 6.4.4 Delay model of the MUX/XOR gate with output buffers

Output buffers can also be inserted in MUX/XOR gates to improve their driving capability or to shift the output common-mode voltage.

To evaluate the delay of the MUX/XOR gate with output buffers, the same approach as in Section 6.3 can be used and generalized to all SCL gates with output buffers. More specifically, the delay of a generic SCL gate with output buffers can be decomposed into the contribution of the internal SCL gate, $\tau_{PD,SCL}$, and that of the buffer, $\tau_{PD,buf}$, as shown in relationship (6.7). The internal SCL gate delay depends on its topology (which in turn depends on the logic function implemented), and has to be evaluated by following the general methodology described in Sections 6.2 and 6.4.1, i.e. as the delay for a zero load capacitance. The buffer contribution was already evaluated in Section 6.3 and expressed by relationship (6.8).

In the specific case of a MUX/XOR, the intrinsic SCL contribution is given by relationships (6.11)-(6.12) where $C_{L}$ must be set to zero. The loading effect of the buffer on the internal SCL gate is accounted for in the evaluation of $\tau_{PD,buf}$, since it is carried out by driving each buffer with the Thevenin equivalent circuit seen at the SCL output.

The delay model in (6.8) and (6.11) of SCL MUX/XOR gates with output buffers was tested by extensive Spectre simulations using the $0.35 \mu \mathrm{~m}$ CMOS process described above, under the same conditions as the inverter in Section 6.3. The internal SCL transistors' aspect ratios were sized to obtain the typical values $V_{S W I N G}=700 \mathrm{mV}$ and $A_{V}=4$, while the buffer transistors' aspect ratios were set to $0.6 / 0.3,3 / 0.3$ and $6 / 0.3$. The delay curves obtained are not reported for the sake of brevity.

The worst-case error with respect to simulations occurred for $C_{L}=200~\mathrm{fF}$, and is plotted in Fig. 6.16 versus the buffer bias current $I_{SF}$ for different values of $I_{SS}$. Among these curves, the largest error of $30\%$ (even lower than for the inverter gate) was found for $I_{SS}=5~\mu\mathrm{A}$ and $I_{SF}=100~\mu\mathrm{A}$, which is an unrealistic case, due to the strong current imbalance between the internal SCL gate and the output buffers. The typical error found is much lower than $30\%$, as can be deduced from the average error for $C_{L}$ equal to 0 F, 50 fF, 200 fF and 1 pF, which is $14.7\%$, $18.8\%$, $16.8\%$ and $12.1\%$, respectively.


Fig. 6.16. Error of predicted delay of the MUX/XOR gate with output buffers with respect to simulations vs. $I_{S F}$ for $C_{L}=200 \mathrm{fF}$.

### 6.5 EVALUATION OF SCL GATES INPUT CAPACITANCE AND EXTENSION TO THE D LATCH

In the previous subsections, the delay of a single SCL gate was computed by assuming a generic load capacitance $C_{L}$. In real circuits consisting of
cascaded logic gates, it is essential to evaluate the input capacitance of an SCL gate, since it loads the previous gates and affects their delay. For this reason, the input capacitance $C_{\text {input }}$ seen by each gate terminal of an NMOS source-coupled pair is evaluated in the following.

The source voltage of a switching source-coupled pair is independent of the input logic value, thus the input capacitance $C_{\text {input }}$ seen from the gate of each transistor can be assumed equal to its gate-source capacitance evaluated in the saturation region, given by

$$
\begin{equation*}
C_{\text {input }}=\frac{2}{3} W_{n} L_{n} C_{O X} \tag{6.13}
\end{equation*}
$$

The accuracy of the input capacitance estimation (6.13) was tested through several simulations. In particular, the linear load capacitance that makes the delay of a reference gate equal to the delay obtained when it is loaded by the gate under consideration was evaluated under many bias conditions. Results show that relationship (6.13) is always within $15\%$ of the simulated value, and typically differs by less than $10\%$.

As already observed for the CML D latch gate, the input capacitance expression (6.13) allows the delay model of the MUX/XOR gate developed in Section 6.4 to be extended to the D latch gate shown in Fig. 6.17.


Fig. 6.17. SCL D latch gate.

The only difference between the SCL MUX/XOR gate and the D latch is due to the different load at the output nodes. Indeed, by neglecting the positive-feedback loop involving transistors M5-M6 as clarified in Section 4.9 for bipolar gates, these transistors load both output nodes through their capacitances $C_{g d, n}$ and $C_{d b, n}$, as in the case of the MUX/XOR gate, and the capacitance seen from their gate terminal, that is given by relationship (6.13). Therefore, the D latch delay is equal to that of the MUX/XOR gate loaded by a capacitance equal to the sum of the external capacitance $C_{L}$ and the input capacitance (6.13) of the source-coupled pair M5-M6.

From the previous considerations, the CK-Q latch delay is given by relationship (6.11) where the load capacitance $C_{L}$ is replaced by an equivalent capacitance $C_{L}^{\prime}$,

$$
\begin{equation*}
C_{L}^{\prime}=C_{L}+C_{\text {input }} \tag{6.14}
\end{equation*}
$$

and, analogously, the D-Q latch delay is given by relationship (6.12) where (6.14) must be substituted. From (6.14), it is apparent that the delay of an SCL D latch gate is always greater than that of a MUX/XOR equally designed, and the speed penalty with respect to the latter is more pronounced for low values of $C_{L}$.
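The statement that the latch speed penalty is more pronounced for low $C_{L}$ follows directly from (6.12)-(6.14), and can be checked with a short sketch. The function names, the device capacitances and the value of $C_{OX}$ below are assumptions made here for illustration; only the aspect ratio $8/0.3$ and the load values echo figures quoted in the text.

```python
def input_capacitance(w_n, l_n, c_ox):
    """Gate input capacitance (6.13): saturation-region C_gs, in farads."""
    return (2.0 / 3.0) * w_n * l_n * c_ox


def upper_input_delay(r_d, c_gd_n, c_db_n, c_gd_p, c_db_p, c_load):
    """Delay (6.12) for an input applied to the upper transistors."""
    return 0.69 * r_d * (2 * c_gd_n + 2 * c_db_n + c_gd_p + c_db_p + c_load)


def latch_penalty(c_load, c_input, **gate):
    """Relative D-Q delay increase of the latch over the MUX/XOR gate."""
    tau_mux = upper_input_delay(c_load=c_load, **gate)
    # (6.14): the latch sees C_L' = C_L + C_input at its output nodes.
    tau_latch = upper_input_delay(c_load=c_load + c_input, **gate)
    return tau_latch / tau_mux - 1.0


# Placeholder device values (assumptions, not book data).
gate = dict(r_d=35e3, c_gd_n=2e-15, c_db_n=4e-15, c_gd_p=0.3e-15, c_db_p=1e-15)
c_in = input_capacitance(w_n=8e-6, l_n=0.3e-6, c_ox=4.6e-3)  # C_OX assumed
loads = (0.0, 50e-15, 200e-15, 1e-12)
penalties = [latch_penalty(c_l, c_in, **gate) for c_l in loads]
for c_l, p in zip(loads, penalties):
    print(f"C_L = {c_l * 1e15:6.0f} fF -> latch penalty = {100 * p:.1f} %")
```

Since the penalty in this sketch reduces to $C_{input}$ divided by the total capacitance at the output node, it necessarily decreases as $C_{L}$ grows.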

The delay model of the D latch discussed so far was validated by extensive simulations with the adopted $0.35-\mu \mathrm{m}$ CMOS process, under the same conditions as for the MUX/XOR SCL gate. Predicted and simulated delay curves are reported in Fig. 6.18 only for the worst case with $C_{L}=0 \mathrm{~F}$ for the sake of brevity, and for $C_{L}$ equal to $50 \mathrm{fF}, 200 \mathrm{fF}$ and 1 pF the delay curves were found to be very close to those in Figs. 6.14b-6.14d, as expected. The error of the delay model with respect to simulations is plotted in Fig. 6.19 versus the bias current $I_{S S}$ for a load capacitance of $0 \mathrm{~F}, 50 \mathrm{fF}$, 200 fF and 1 pF , respectively. From inspection of Fig. 6.19, the error of the D latch gate model is always within $23 \%$, and is typically lower.

Regarding the effect of the input waveform, the same results as for MUX/XOR SCL gates were found, thus the delay of an SCL D latch gate can always be approximated to that for a step input in (6.11)-(6.12) with (6.14).

The D latch model can simply be extended to the case with output buffers by decomposing delay into the contribution of the internal SCL gate, $\tau_{P D, S C L}$, and that of the buffer, $\tau_{P D, \text { buf }}$. As explained in Section 6.3, the internal SCL gate delay of the D latch gate is evaluated with a zero load capacitance, while the buffer contribution must be evaluated through relationship (6.8).


Fig. 6.18. Simulated and predicted D latch delay vs. $I_{S S}$ for $C_{L}=0 \mathrm{~F}$.


Fig. 6.19. Error of the D latch predicted delay with respect to simulations vs. $I_{S S}$.

The delay model of the SCL D latch with output buffers discussed above was validated by simulations with the adopted $0.35 \mu \mathrm{~m}$ CMOS process and under the same conditions used for the MUX/XOR gates in Section 6.4. The delay curves obtained are not reported for the sake of brevity, and the error is very close to that found for the MUX/XOR SCL gates in Section 6.4. Comparison with delay values obtained from realistic input waveforms
confirms the weak dependence of delay on the input rise time, thus the model discussed so far is also valid for general input waveforms.

## Chapter 7

## OPTIMIZED DESIGN OF MOS CURRENT-MODE GATES

In this chapter, a general methodology to maximize the speed performance and to manage the power-delay trade-off in SCL gates is presented and applied to several fundamental gates.

### 7.1 INTRODUCTION TO OPTIMIZED DESIGN OF SCL GATES

In the previous chapter, a general delay model for SCL gates has been developed and applied to some widely used gates. This model may be used to support the design process, but it is not very effective for this purpose since delay is expressed as a function of the PMOS equivalent resistance $R_{D}$, as well as of the transistors' capacitances, which in turn depend on their aspect ratio.

In practical design cases, transistors' aspect ratio and bias current must be set to meet given specifications on noise margin, as well as on delay and power dissipation, or a trade-off of the two. Therefore, to clearly express design trade-offs, it would be more useful to have an explicit delay dependence on noise margin $N M$ and power dissipation $P$. To this aim, in this chapter the dependence of transistors' aspect ratio on $N M$ and $P$ is first evaluated and then substituted into the delay model described in the previous chapter. Thus, interdependence of design parameters is made explicit through a single equation, that is useful to explore the design space and to manage the possible trade-offs.

Regarding the noise margin performance of SCL gates, the results obtained in Chapter 2 will be used. As far as the power consumption is concerned, as already discussed for CML/ECL gates in Chapter 5, the power
consumption $P$ of an SCL gate is essentially static and is determined by its bias current $I_{S S}$

$$
\begin{equation*}
P \cong V_{D D} I_{S S} \tag{7.1}
\end{equation*}
$$

which has to be kept as low as possible for a given speed performance and an acceptable noise immunity. Therefore, once an adequate noise margin is ensured, a strategy to consciously manage the power-delay trade-off is crucial in design of SCL gates.

In this chapter, a systematic design procedure to size the bias current and transistors' aspect ratio to satisfy assigned requirements on noise margin and delay is introduced. In particular, criteria are provided to design SCL gates in typical design cases, where a high speed, a low power consumption or an optimum trade-off between the two is required. Results are subsequently extended to the case with output buffers, by providing the optimum size of their transistors and bias current to minimize delay, assuming practical design conditions.

The design strategy proposed gives simple closed-form expressions of design parameters and explicitly relates delay and bias current (i.e., power consumption from (7.1)), thereby providing the designer with the required understanding of the trade-offs involved in the design. The delay dependence on the logic swing is also investigated, and in contrast with the usual assumption, the results show that the delay is not necessarily reduced by reducing the logic swing. The strategies developed are valid for all SCL gates and are independent of the CMOS process used, thus the guidelines provided afford a deeper understanding of SCL gates from a design point of view.

### 7.2 OPTIMIZED DESIGN METHODOLOGY IN SCL GATES WITHOUT OUTPUT BUFFERS

Consider an SCL gate without output buffers having a bias current $I_{S S}$ and a load capacitance $C_{L}$, with transistors' aspect ratios properly sized to obtain assigned values of logic swing $V_{\text {SWING }}$ and voltage gain $A_{V}$ that ensure a desired value of noise margin $N M$ to be achieved, as will be explained more clearly in Section 7.3.1.

It will be demonstrated in the next sections that the SCL gate considered has a delay $\tau_{P D, S C L}$ given by

$$
\begin{equation*}
\tau_{P D, S C L}=0.35 \cdot V_{S W I N G}\left(\frac{a}{V_{S W I N G}^{2}}+b \frac{V_{S W I N G}}{I_{S S}^{2}}+\frac{c+C_{L}}{I_{S S}}\right) \tag{7.2}
\end{equation*}
$$

where coefficients $a, b$ and $c$ depend on the gate considered, the CMOS process adopted (through standard BSIM3v3 model parameters) and the power supply voltage $V_{D D}$. In addition, coefficient $a$ is an increasing function of the voltage gain $A_{V}$ previously assigned.

Relationship (7.2) simply relates delay to the bias current $I_{SS}$, which determines the power consumption (7.1), as well as to the logic swing, which in turn defines the noise margin for a given $A_{V}$. Therefore, eq. (7.2) can be suitably used to design SCL gates, as it expresses the trade-offs among delay, power consumption and noise margin. In particular, it shows that the delay decreases as the bias current (or, equivalently, the power consumption) increases, whereas for $I_{SS} \rightarrow \infty$ it asymptotically tends to an ideal minimum value given by

$$
\begin{equation*}
\tau_{P D, S C L, \text { min }}=0.35 \frac{a}{V_{S W I N G}} \tag{7.3}
\end{equation*}
$$
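The approach of (7.2) to the asymptote (7.3) can be visualized with a few lines of code. This is a sketch under stated assumptions: the coefficient values `a`, `b` and `c` below are arbitrary placeholders chosen only to give plausible orders of magnitude, since the actual coefficients depend on the gate, the process and $V_{DD}$.

```python
def scl_delay(i_ss, v_swing, a, b, c, c_load):
    """Delay (7.2) of an SCL gate without output buffers (SI units)."""
    return 0.35 * v_swing * (a / v_swing**2
                             + b * v_swing / i_ss**2
                             + (c + c_load) / i_ss)


def scl_delay_min(v_swing, a):
    """Asymptotic minimum delay (7.3), reached for I_SS -> infinity."""
    return 0.35 * a / v_swing


# Placeholder coefficients (assumptions, not extracted from any process).
coeffs = dict(v_swing=0.7, a=5e-11, b=1e-20, c=1e-14, c_load=5e-14)
currents = (5e-6, 20e-6, 50e-6, 100e-6, 1e-3)
delays = [scl_delay(i, **coeffs) for i in currents]
tau_min = scl_delay_min(coeffs["v_swing"], coeffs["a"])
for i, tau in zip(currents, delays):
    print(f"I_SS = {i * 1e6:7.0f} uA: tau = {tau * 1e12:8.1f} ps "
          f"({tau / tau_min:.2f} x tau_min)")
```

The ratio printed in the last column shows the diminishing return of each further increase in bias current.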

In order to derive simple design criteria, three practical cases will be considered in the following subsections: power-efficient (7.2.1), high-speed (7.2.2) and low-power design (7.2.3). Results are summarized and commented in Section 7.2.4.

### 7.2.1 Power-efficient design

According to relationship (7.2), the delay of an SCL gate can be reduced by increasing the bias current (or, equivalently, the power consumption). The power-delay product $PDP$ [R96] can suitably measure the trade-off efficiency between delay and power dissipation. For an SCL gate, $PDP$ is equal to the product of expressions (7.1) and (7.2), and is given by

$$
\begin{equation*}
PDP_{SCL} \approx 0.35 V_{DD} V_{SWING}\left(\frac{a}{V_{SWING}^{2}} I_{SS}+b \frac{V_{SWING}}{I_{SS}}+c+C_{L}\right) \tag{7.4}
\end{equation*}
$$

An optimum balance between delay and power consumption is accomplished when (7.4) is minimum, i.e. for the bias current $I_{SS,opt\_PDP}$ given by

$$
\begin{equation*}
I_{SS,opt\_PDP}=\sqrt{\frac{b}{a}}\left(V_{SWING}\right)^{\frac{3}{2}} \tag{7.5}
\end{equation*}
$$

which was obtained by setting the derivative of (7.4) equal to zero and solving for $I_{SS}$. It is worth noting that the bias current (7.5) makes the first two terms in brackets in (7.2) equal. From relationship (7.5), the bias current that optimizes the power-delay trade-off is independent of the load. Moreover, by substituting relationship (7.5) into (7.4), the minimum power-delay product is

$$
\begin{equation*}
P D P_{\text {opt }, S C L} \approx 0.35 V_{D D} V_{S W I N G}\left(2 \frac{\sqrt{a b}}{\sqrt{V_{\text {SWING }}}}+c+C_{L}\right) \tag{7.6}
\end{equation*}
$$

that increases when $V_{\text {SWING }}$ increases. As a consequence, from (7.6) it is apparent that the logic swing has to be chosen as low as possible for a power-efficient design (more details on the acceptable range of $V_{\text {SWING }}$ will be provided in Section 7.3.1).
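The power-efficient design point can be checked numerically. The sketch below evaluates (7.4), (7.5) and (7.6) as written above; the coefficient values, the supply voltage and the function names are hypothetical and used only for illustration.

```python
import math


def pdp(i_ss, v_swing, v_dd, a, b, c, c_load):
    """Power-delay product (7.4)."""
    return 0.35 * v_dd * v_swing * (a * i_ss / v_swing**2
                                    + b * v_swing / i_ss
                                    + c + c_load)


def i_ss_opt_pdp(v_swing, a, b):
    """Bias current (7.5) that minimizes (7.4); it does not depend on C_L."""
    return math.sqrt(b / a) * v_swing**1.5


def pdp_min(v_swing, v_dd, a, b, c, c_load):
    """Minimum power-delay product (7.6)."""
    return 0.35 * v_dd * v_swing * (2 * math.sqrt(a * b) / math.sqrt(v_swing)
                                    + c + c_load)


# Hypothetical coefficients (SI units), chosen only for illustration.
A, B, C, C_LOAD = 5e-11, 1e-18, 5e-15, 20e-15
V_DD = 3.3

for v_swing in (0.4, 0.7, 1.0):
    i_opt = i_ss_opt_pdp(v_swing, A, B)
    print(f"V_SWING = {v_swing:.1f} V   I_opt = {i_opt * 1e6:7.2f} uA   "
          f"PDP_min = {pdp_min(v_swing, V_DD, A, B, C, C_LOAD) * 1e15:7.2f} fJ")
```

Evaluating `pdp` slightly below and slightly above `i_ss_opt_pdp` gives larger values than `pdp_min`, which is a quick way to confirm that (7.5) is indeed a minimum of (7.4).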

### 7.2.2 High-speed design

When a high-speed performance is the principal concern in designing an SCL gate, two situations may occur. In the first one, a delay constraint $\tau_{P D, S C L}$ derived from considerations at the gate level has to be met by properly setting the bias current. Analytically, the bias current is found by inverting (7.2) for $I_{S S}$, leading to

$$
\begin{equation*}
I_{S S}=0.17 \cdot V_{S W I N G} \frac{c+C_{L}}{\tau_{P D, S C L}-\tau_{P D, S C L, \text { min }}}\left[1+\sqrt{1+11.4 \cdot b \frac{\tau_{P D, S C L}-\tau_{P D, S C L, \text { min }}}{\left(c+C_{L}\right)^{2}}}\right] \tag{7.7}
\end{equation*}
$$

where relationship (7.3) was substituted to make the expression more compact. Obviously, no solution exists if the required delay is lower than $\tau_{P D, S C L, \text { min }}$.
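The inversion leading to (7.7) amounts to solving a quadratic equation in $I_{S S}$. The sketch below solves that quadratic directly instead of using the rounded constants 0.17 and 11.4, assuming that (7.2) has the same form as (7.30). Function names and numerical values are hypothetical.

```python
import math


def scl_delay(i_ss, v_swing, a, b, c, c_load):
    """Delay in the assumed form of (7.2)."""
    return 0.35 * v_swing * (a / v_swing**2
                             + b * v_swing / i_ss**2
                             + (c + c_load) / i_ss)


def i_ss_for_delay(tau_target, v_swing, a, b, c, c_load):
    """Bias current that meets a delay constraint, cf. (7.7).

    With delta = tau_target - tau_min, (7.2) becomes
        delta * I^2 - lin * I - quad = 0,
    whose positive root is returned.
    """
    tau_min = 0.35 * a / v_swing                 # (7.3)
    delta = tau_target - tau_min
    if delta <= 0:
        raise ValueError("required delay is not above the asymptotic minimum")
    lin = 0.35 * v_swing * (c + c_load)          # coefficient of 1/I_SS
    quad = 0.35 * b * v_swing**2                 # coefficient of 1/I_SS^2
    return (lin + math.sqrt(lin**2 + 4 * delta * quad)) / (2 * delta)


# Hypothetical coefficients (SI units), chosen only for illustration.
A, B, C, C_LOAD, V_SWING = 5e-11, 1e-18, 5e-15, 20e-15, 0.7

i_req = i_ss_for_delay(60e-12, V_SWING, A, B, C, C_LOAD)
print(f"I_SS = {i_req * 1e6:.2f} uA gives "
      f"{scl_delay(i_req, V_SWING, A, B, C, C_LOAD) * 1e12:.2f} ps")
```

Factoring `lin / (2 * delta)` out of the root reproduces the structure of (7.7), since $0.35 / 2 \approx 0.17$ and $4 / 0.35 \approx 11.4$.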

In the second case, no delay constraint is given and the delay has to be kept as close to the asymptotic value $\tau_{P D, S C L, \text { min }}$ as possible, in order to exploit the speed potential of the circuit and the process used, while keeping the bias current within a reasonable range. However, it is not evident which range of $I_{S S}$ leads to a delay almost equal to the asymptotic minimum, as the delay (7.2) always decreases as $I_{S S}$ increases. In practical cases, where a high-speed design is targeted, $\tau_{P D, S C L}$ must be close to $\tau_{P D, S C L, \text { min }}$, and the bias current $I_{S S}$ should only be increased as long as a significant speed improvement is achieved, which no longer holds when the delay approaches $\tau_{P D, S C L, \text { min }}$. A good compromise between the two opposite requirements can be found from analysis of relationship (7.2), as will be explained in the following.

The expression in brackets in relationship (7.2) consists of a constant term $a / V_{\text {SWING }}^{2}$ (which determines the asymptotic minimum (7.3)) and one that decreases as the bias current $I_{S S}$ increases (i.e., $b \frac{V_{S W I N G}}{I_{S S}^{2}}+\frac{c+C_{L}}{I_{S S}}$ ). For values of $I_{S S}$ sufficiently high such that the constant term dominates

$$
\begin{equation*}
\frac{a}{V_{S W I N G}^{2}} \geq b \frac{V_{S W I N G}}{I_{S S}^{2}}+\frac{c+C_{L}}{I_{S S}} \tag{7.8}
\end{equation*}
$$

a further increase in the bias current does not lead to a significant speed advantage, and a high speed is achieved since delay is close to its minimum value. Instead, for lower values of $I_{S S}$ such that (7.8) does not hold (i.e., the left-hand side is lower than the right-hand side), the terms depending on $I_{S S}$ dominate over the constant one, and delay is highly sensitive to a bias current increase. However, in this case a high speed is not achieved, since delay is far from being minimum. A good compromise between the two opposite cases is achieved when the equality is considered in (7.8), i.e. when the terms depending on $I_{S S}$ equal the constant one. Indeed, in this case delay is easily found to be only twice the minimum achievable

$$
\begin{equation*}
\tau_{P D, S C L}=0.69 \frac{a}{V_{\text {SWING }}} \tag{7.9}
\end{equation*}
$$

and a further bias current increase is not so beneficial. As a consequence, the strict equality in (7.8) represents a reasonable condition for keeping the delay close to its minimum value and avoiding wasting a uselessly high bias current. This condition is achieved for $I_{S S}$ equal to

$$
\begin{equation*}
I_{S S, \text { opt_delay }}=\frac{c+C_{L}}{2 a} V_{S W I N G}^{2}\left(1+\sqrt{1+4 \frac{a b}{\left(c+C_{L}\right)^{2}} \frac{1}{V_{S W I N G}}}\right) \tag{7.10}
\end{equation*}
$$

which is always greater than $I_{S S, \text { opt_PDP }}$ in (7.5), especially for a high load capacitance $C_{L}$. This can be seen by observing that relationship (7.10) reduces to (7.5) if $C_{L}=c=0$ is assumed, while for practical values of $C_{L}$ and $c$ the terms associated with them tend to increase it, especially for high values of $C_{L}$. As a consequence, design criterion (7.10) allows for a high speed performance at the cost of a worse $P D P$ than the minimum achievable (7.6).
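The high-speed design point can be verified by solving the equality in (7.8) directly as a quadratic equation in $I_{S S}$. The sketch below does so, assuming (7.2) has the same form as (7.30), and then checks that the resulting delay is twice the asymptotic minimum and that the current is not lower than (7.5). Function names and coefficient values are hypothetical.

```python
import math


def scl_delay(i_ss, v_swing, a, b, c, c_load):
    """Delay in the assumed form of (7.2)."""
    return 0.35 * v_swing * (a / v_swing**2
                             + b * v_swing / i_ss**2
                             + (c + c_load) / i_ss)


def i_ss_opt_pdp(v_swing, a, b):
    """Power-efficient bias current (7.5)."""
    return math.sqrt(b / a) * v_swing**1.5


def i_ss_opt_delay(v_swing, a, b, c, c_load):
    """Positive root of a*I^2/V^2 - (c + C_L)*I - b*V = 0,
    i.e. the equality in (7.8)."""
    cap = c + c_load
    return (v_swing**2 / (2 * a)) * (cap + math.sqrt(cap**2
                                                     + 4 * a * b / v_swing))


# Hypothetical coefficients (SI units), chosen only for illustration.
A, B, C, C_LOAD, V_SWING = 5e-11, 1e-18, 5e-15, 20e-15, 0.7

i_fast = i_ss_opt_delay(V_SWING, A, B, C, C_LOAD)
ratio = scl_delay(i_fast, V_SWING, A, B, C, C_LOAD) / (0.35 * A / V_SWING)
print(f"I_opt_delay = {i_fast * 1e6:.1f} uA, "
      f"I_opt_PDP = {i_ss_opt_pdp(V_SWING, A, B) * 1e6:.1f} uA, "
      f"delay / delay_min = {ratio:.3f}")
```

Writing the root with the term `cap` inside the parentheses keeps the function well defined for $c+C_{L}=0$, where it coincides with (7.5).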

Relationship (7.9) clearly expresses the effect of logic swing on delay in high-speed design. To be more specific, in high-speed design ${ }^{1}$ the logic swing has to be set as high as possible to improve the speed performance (more details on the acceptable range of $V_{\text {SWING }}$ will be provided in Section 7.3.1). Surprisingly, this is in contrast with the usual belief that the high-speed feature of SCL gates is due to the small logic swing [T01], which is probably due to a superficial extension of well-known properties of CML bipolar gates (see Section 5.2 for the effect of logic swing on their power-delay trade-off). As a result, regarding the delay dependence on the logic swing, MOS Current-Mode gates behave quite differently from bipolar ones, and an in-depth explanation of this difference is provided in Section 7.4.5. This substantial difference between CMOS and bipolar Current-Mode gates in high-speed applications must therefore be taken into account, in order to consciously exploit the speed potential of each technology.

### 7.2.3 Low-power design

In low-power design, e.g. when the gate speed is not the main target, or even in high performance applications for gates that do not lie in the critical path, the power consumption per gate allowed is usually an assigned parameter that is derived from requirements at the system level. Therefore, bias current is set to a very low value chosen from system considerations, and the only design parameter is the logic swing. More specifically, usual values of bias current are surely lower than the value (7.5) which balances power consumption and delay. From simple inspection of (7.2), condition $I_{S S} \ll I_{S S, \text { opt_PDP }}$ leads the term $b V_{S W I N G} / I_{S S}^{2}$ to dominate over $\left(c+C_{L}\right) / I_{S S}$. Moreover, under the same condition, considerations in Section 7.2.2 also lead the former term to dominate over $a / V_{S W I N G}^{2}$, since inequality $I_{S S} \ll I_{S S, \text { opt_delay }}$ surely holds (by remembering that $I_{S S, \text { opt_PDP }}<I_{S S, \text { opt_delay }}$ ). Thus, under low-power design, the delay expression can be approximated as

$$
\begin{equation*}
\tau_{P D, S C L} \cong 0.35 \cdot b\left(\frac{V_{S W I N G}}{I_{S S}}\right)^{2} \tag{7.11}
\end{equation*}
$$

[^12]which shows that in low-power design the logic swing has to be set as low as possible, as in the case of power-efficient design.

### 7.2.4 Remarks on the delay dependence on bias current and logic swing

To better understand the delay dependence on bias current and logic swing in SCL gates, it is useful to summarize results presented in previous sections for specific design cases, by collecting them in a unified treatment.

Following the considerations carried out above, the delay dependence on bias current can be approximated in three regions, according to

$$
\begin{equation*}
\tau_{P D, S C L}= \begin{cases}0.35 \cdot b\left(\frac{V_{S W I N G}}{I_{S S}}\right)^{2} & \text { if } I_{S S} \ll I_{S S, \text { opt_PDP }} \\ 0.35 \cdot V_{S W I N G} \frac{c+C_{L}}{I_{S S}} & \text { if } I_{S S, \text { opt_PDP }} \ll I_{S S} \ll I_{S S, \text { opt_delay }} \\ 0.35 \cdot \frac{a}{V_{S W I N G}} & \text { if } I_{S S} \gg I_{S S, \text { opt_delay }}\end{cases} \tag{7.12}
\end{equation*}
$$

which is represented in Fig. 7.1.


Fig. 7.1. General delay dependence on bias current in SCL gates.

It is observed that, in low-power design (i.e. for $I_{S S} \ll I_{S S, \text { opt_PDP }}$ ) the delay can be rapidly decreased by increasing $I_{S S}$, since it is proportional to $1 / I_{S S}^{2}$. In high-speed design, a less substantial delay reduction is achieved by increasing $I_{S S}$, since delay is proportional to $1 / I_{S S}$. Accordingly, the best power-delay compromise is found for the intermediate value of bias current $I_{S S, \text { opt_PDP }}$. A high speed (i.e. half the maximum achievable for $I_{S S} \rightarrow \infty$ ) is obtained for $I_{S S}=I_{S S, \text { opt_delay }}$. For greater values of $I_{S S}$, delay tends to be almost constant, thus small speed improvements are very expensive in terms of bias current, and hence of power consumption.

Regarding the delay dependence on logic swing, summarizing the three cases dealt with in the previous subsections, logic swing must be set as high as possible when a high speed is required, while it has to be reduced as much as possible when a low power consumption or a power-delay trade-off is targeted.
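The three regions of (7.12) correspond to the three additive terms of the delay, each of which dominates in one range of bias current. The sketch below makes this explicit by evaluating the three terms separately and reporting the largest one. It assumes that (7.2) has the same form as (7.30); the labels, function names and coefficient values are hypothetical.

```python
def delay_terms(i_ss, v_swing, a, b, c, c_load):
    """The three additive terms of the delay, labelled after (7.12)."""
    return {
        "low_power": 0.35 * b * (v_swing / i_ss) ** 2,        # ~ 1/I_SS^2
        "intermediate": 0.35 * v_swing * (c + c_load) / i_ss,  # ~ 1/I_SS
        "high_speed": 0.35 * a / v_swing,                      # constant
    }


def dominant_term(i_ss, v_swing, a, b, c, c_load):
    """Label of the largest term at the given bias current."""
    terms = delay_terms(i_ss, v_swing, a, b, c, c_load)
    return max(terms, key=terms.get)


# Hypothetical coefficients (SI units), chosen only for illustration.
A, B, C, C_LOAD, V_SWING = 5e-11, 1e-18, 5e-15, 20e-15, 0.7

for i_ss in (1e-6, 100e-6, 10e-3):
    total = sum(delay_terms(i_ss, V_SWING, A, B, C, C_LOAD).values())
    print(f"I_SS = {i_ss * 1e6:9.1f} uA   delay = {total * 1e12:10.2f} ps   "
          f"dominant: {dominant_term(i_ss, V_SWING, A, B, C, C_LOAD)}")
```

Since the full delay is the sum of the three terms, each expression in (7.12) underestimates the delay; the approximation is accurate only where one term is much larger than the other two.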

### 7.3 TRANSISTOR SIZING TO MEET NOISE MARGIN SPECIFICATION

In this section, the general delay expression (7.2) of SCL gates will be derived by applying the modeling strategy described in Chapter 6. To be more specific, the delay of SCL gates was expressed there as a function of the PMOS equivalent resistance $R_{D}$ and the transistors' capacitances, which in turn depend on their aspect ratios. In practical cases, aspect ratios have to be set to meet the noise margin requirements, for a given bias current. Therefore, to make the delay dependence on bias current and logic swing explicit, it is necessary to find the expression of the aspect ratios. To this aim, design criteria to find aspect ratios according to the noise margin required for an assigned bias current are discussed in the following.

### 7.3.1 Design criteria for $V_{\text {SWING }}$ and $A_{V}$ to meet a noise margin specification

Let us consider an SCL gate with an assigned bias current $I_{S S}$ and a noise margin requirement $N M$. From expression (2.47) of noise margin, this constraint can be met by suitably setting logic swing $V_{S W I N G}$ and voltage gain $A_{V}$.

To understand design aspects related to the choice of $A_{V}$, it is useful to observe that from (2.47) we have to choose $A_{V}$ sufficiently greater than $\sqrt{2}$, in order to avoid an excessive noise margin degradation with respect to the maximum value $V_{\text {SWING }} / 2$. However, from inspection of (2.44), high values of $A_{V}$ are achieved by increasing the NMOS transistors' aspect ratio, for a given value of $I_{S S}$ and $V_{S W I N G}$, thereby increasing parasitic capacitances and slowing down the circuit switching. In addition, (6.13) indicates that this also leads to an increase in the input capacitance, thus slowing down the driving gate as well. A good compromise to achieve an adequate noise margin without excessively degrading speed performance is obtained by setting $A_{V}$ equal to 4 or slightly greater (for instance, by choosing $A_{V}=4$, relationship (2.47) gives $N M=0.65 \cdot V_{\text {SWING }} / 2$ ). Therefore, in the following, $A_{V}$ is assumed to have been chosen in advance according to this criterion.

Regarding the logic swing, it must belong to a well-defined range $\left[V_{\text {SWING,min }}, V_{\text {SWING,max }}\right]$ to allow the circuit to operate correctly. Indeed, $V_{\text {SWING }}$ must be greater than its minimum allowed value $V_{\text {SWING,min }}$ that strictly satisfies the noise margin requirement $N M$ and that is found by inverting relationship (2.47)

$$
\begin{equation*}
V_{S W I N G, \min } \approx \frac{2 N M}{1-\frac{\sqrt{2}}{A_{V}}} \tag{7.13}
\end{equation*}
$$

Moreover, as discussed in Section 2.4.2, a high logic swing may lead the NMOS transistors to work in the linear region, thus degrading both speed and noise margin performance ${ }^{2}$. More specifically, the maximum value of $V_{\text {SWING }}$ allowed, $V_{\text {SWING,max }}$, is

$$
\begin{equation*}
V_{S W I N G, \max } \approx 2 V_{T, n} \tag{7.14}
\end{equation*}
$$

Summarizing, the voltage gain $A_{V}$ has to be chosen equal to 4 or slightly greater, while $V_{S W I N G}$ must be set according to the design strategies developed in Section 7.2, provided that it belongs to $\left[V_{\text {SWING,min }}, V_{\text {SWING,max }}\right]$. More specifically, in design cases where power consumption is of concern, logic swing has to be set equal to $V_{\text {SWING,min }}$, while for high-speed design it has to be chosen equal to $V_{\text {SWING,max }}$. Accordingly, in the following both $V_{\text {SWING }}$ and $A_{V}$ are supposed to be assigned on the basis of the noise margin specification.

[^13]Once desired values of $V_{\text {SWING }}$ and $A_{V}$ are chosen, they are obtained by properly setting transistors' aspect ratios, as explained in the following sections.
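The admissible range of the logic swing follows directly from (7.13) and (7.14). The sketch below assumes that the noise margin (2.47) has the form $N M=\frac{V_{S W I N G}}{2}\left(1-\frac{\sqrt{2}}{A_{V}}\right)$, which is the expression that (7.13) inverts. The function names, the noise margin requirement and the threshold voltage are hypothetical.

```python
import math


def noise_margin(v_swing, a_v):
    """Noise margin in the form inverted by (7.13) (assumed form of (2.47))."""
    return 0.5 * v_swing * (1 - math.sqrt(2) / a_v)


def v_swing_range(nm_required, a_v, v_tn):
    """Return (V_SWING,min, V_SWING,max) from (7.13) and (7.14)."""
    if a_v <= math.sqrt(2):
        raise ValueError("A_V must exceed sqrt(2) for a positive noise margin")
    v_min = 2 * nm_required / (1 - math.sqrt(2) / a_v)   # (7.13)
    v_max = 2 * v_tn                                     # (7.14)
    if v_min > v_max:
        raise ValueError("noise margin cannot be met with NMOS in saturation")
    return v_min, v_max


# Hypothetical requirement: NM = 200 mV, A_V = 4, V_T,n = 0.55 V.
v_min, v_max = v_swing_range(0.2, 4.0, 0.55)
print(f"low-power / power-efficient design: V_SWING = {v_min:.3f} V")
print(f"high-speed design:                  V_SWING = {v_max:.3f} V")
print(f"NM / (V_SWING / 2) at A_V = 4: {noise_margin(1.0, 4.0) / 0.5:.3f}")
```

The last line reproduces the factor of about 0.65 quoted above for $A_{V}=4$.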

### 7.3.2 Transistor sizing versus $I_{S S}$

Let us consider an SCL gate with a given bias current $I_{S S}$ and assume that $V_{S W I N G}$ and $A_{V}$ have been chosen from the noise margin specification, as clarified in the previous subsection. Now, let us find design equations to size transistors' aspect ratio yielding the desired values of $V_{\text {SWING }}$ and $A_{V}$.

For a given $I_{S S}$ value, an assigned value of the logic swing can be achieved by appropriately setting the PMOS equivalent resistance $R_{D}$. More specifically, by inverting the logic swing expression (2.43), the suitable value of $R_{D}$ is

$$
\begin{equation*}
R_{D}=\frac{V_{S W I N G}}{2 I_{S S}} \tag{7.15}
\end{equation*}
$$

from which the proper value $W_{p} / L_{p}$ of the PMOS aspect ratio must be evaluated. This value of $R_{D}$ may be higher or lower than that obtained for the minimum transistor size, $W_{p, \text { min }} / L_{p, \text { min }}$, depending on the assigned value of $I_{S S}$. To understand this point, we first define $R_{D, \text { min_size }}$ as the PMOS resistance obtained for minimum transistor size, and $I_{H I G H}$ as the corresponding current that gives the desired logic swing

$$
\begin{equation*}
I_{H I G H}=\frac{V_{\text {SWING }}}{2 R_{D, \text { min_size }}} \tag{7.16}
\end{equation*}
$$

that, for example, is equal to $24.6 \mu \mathrm{~A}$ for the $0.35-\mu \mathrm{m}$ technology adopted and setting typical value $V_{\text {SWING }}=700 \mathrm{mV}$.

If $I_{S S}<I_{H I G H}$, or equivalently $R_{D}>R_{D, \text { min_size }}$, the PMOS aspect ratio must be lower than the minimum value $W_{p, \text { min }} / L_{p, \text { min }}$. Therefore, we have to set $W_{p}$ to its minimum value $W_{p, \text { min }}$, while setting $L_{p}$ to a value greater than $L_{p, \text { min }}$ that can be evaluated by inverting the expression of $R_{D}$ in relationship (2.38). More specifically, to simplify the inversion of (2.38) for $L_{p}$, this relationship can be expanded in Taylor series truncated to the first order around zero

$$
\begin{align*}
R_{D} & \approx R_{\mathrm{int}}\left(1+\frac{R_{D S}}{R_{\mathrm{int}}}\right)  \tag{7.17}\\
& =\frac{L_{p}\left[1+\mu_{e f f, p} C_{O X} \frac{1}{L_{p}}\left(V_{D D}-\left|V_{T, p}\right|\right) R_{D S W} 10^{-6}\right]}{\mu_{e f f, p} C_{O X} W_{p, \text { min }}\left(V_{D D}-\left|V_{T, p}\right|\right)}
\end{align*}
$$

whose parameters have already been defined in Chapter 2. It is worth noting that series truncation to first order is justified by the observation that the resistance in the linear region of MOS transistors is usually dominated by the contribution of the intrinsic resistance $R_{\text {int }}$ (i.e., $R_{D S} / R_{\text {int }} \ll 1$ ). Relationship (7.17) can be easily solved for $L_{p}$, leading to

$$
\begin{equation*}
L_{p} \approx W_{p, \text { min }} \mu_{e f f, p} C_{O X}\left(V_{D D}-\left|V_{T, p}\right|\right)\left(\frac{V_{S W I N G}}{2 I_{S S}}-\frac{R_{D S W} 10^{-6}}{W_{p, \text { min }}}\right) \tag{7.18}
\end{equation*}
$$

If $I_{S S}>I_{H I G H}$, or equivalently $R_{D}<R_{D, \text { min_size }}$, the PMOS aspect ratio must be greater than the minimum value $W_{p, \text { min }} / L_{p, \text { min }}$. Therefore, we have to set $L_{p}$ to its minimum value $L_{p, \text { min }}$, while $W_{p}$ is found by inverting (2.38)

$$
\begin{equation*}
W_{p}=\frac{2 I_{S S}}{V_{S W I N G}} \cdot \frac{L_{p, \text { min }}}{\mu_{e f f, p} C_{O X}\left(V_{D D}-\left|V_{T, p}\right|\right)\left\{1-\frac{R_{D S W} 10^{-6}}{L_{p, \text { min }}}\left[\mu_{e f f, p} C_{O X}\left(V_{D D}-\left|V_{T, p}\right|\right)\right]\right\}} \tag{7.19}
\end{equation*}
$$
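The PMOS sizing procedure of (7.15), (7.16), (7.18) and (7.19) can be sketched as follows. As an assumption of this sketch, $R_{D, \text { min_size }}$ is evaluated with the first-order expression (7.17) rather than with (2.38), so the two branches are not exactly continuous at $I_{S S}=I_{H I G H}$. The symbol `k_p` stands for $\mu_{e f f, p} C_{O X}$ and `v_ov` for $V_{D D}-\left|V_{T, p}\right|$; all names and numerical values are hypothetical and do not describe a real process.

```python
def pmos_resistance(w, l, k_p, v_ov, r_dsw):
    """First-order R_D of (7.17): intrinsic term plus R_DSW * 1e-6 / W.

    w, l in metres; k_p in A/V^2; v_ov in volts; r_dsw in ohm*um.
    """
    return l / (k_p * w * v_ov) + r_dsw * 1e-6 / w


def size_pmos(i_ss, v_swing, k_p, v_ov, r_dsw, w_min, l_min):
    """Return (W_p, L_p) in metres giving the swing v_swing at bias i_ss."""
    r_d = v_swing / (2 * i_ss)                                    # (7.15)
    r_min_size = pmos_resistance(w_min, l_min, k_p, v_ov, r_dsw)
    i_high = v_swing / (2 * r_min_size)                           # (7.16)
    if i_ss < i_high:
        # Minimum width, channel longer than minimum.
        l_p = w_min * k_p * v_ov * (r_d - r_dsw * 1e-6 / w_min)   # (7.18)
        return w_min, l_p
    # Minimum length, channel wider than minimum.
    x = r_dsw * 1e-6 * k_p * v_ov / l_min
    w_p = (2 * i_ss / v_swing) * l_min / (k_p * v_ov * (1 - x))   # (7.19)
    return w_p, l_min


# Hypothetical parameters, chosen only for illustration.
K_P, V_OV, R_DSW = 60e-6, 2.7, 300.0
W_MIN, L_MIN, V_SWING = 0.6e-6, 0.35e-6, 0.7

for i_ss in (10e-6, 1e-3):
    w_p, l_p = size_pmos(i_ss, V_SWING, K_P, V_OV, R_DSW, W_MIN, L_MIN)
    print(f"I_SS = {i_ss * 1e6:7.1f} uA   "
          f"W_p = {w_p * 1e6:.2f} um   L_p = {l_p * 1e6:.2f} um")
```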

Now, let us discuss design criteria to guarantee a given voltage gain $A_{V}$. From relationship (2.44), once logic swing has been set by the PMOS transistor sizing according to the assigned bias current, voltage gain $A_{V}$ only depends on NMOS transistors' aspect ratio, that will be assumed to be the same for all NMOS transistors. Indeed, this is the usual case where voltage gain is equal for all source-coupled pairs, i.e. voltage gain and noise margin are the same for all inputs. The very uncommon case with different NMOS aspect ratios can be easily analyzed with slight modifications.

From the expression of the voltage gain (2.44), the NMOS aspect ratio $W_{n} / L_{n}$ must be increased as $I_{S S}$ increases to guarantee an assigned gain $A_{V}$, as can be seen by inverting (2.44) for $W_{n} / L_{n}$

$$
\begin{equation*}
\frac{W_{n}}{L_{n}}=\frac{4}{\mu_{e f f, n} C_{O X}}\left(\frac{A_{V}}{V_{S W I N G}}\right)^{2} I_{S S} \tag{7.20}
\end{equation*}
$$

showing direct proportionality to the bias current used, for an assigned value of $A_{V}$ and $V_{\text {SWING }}$. Moreover, (7.20) confirms theoretical considerations in Section 7.3.1, for which a high value of $A_{V}$ leads to an increase of the NMOS aspect ratio and thus of their parasitic capacitances.

Substituting (2.43) into relationship (7.20), the resulting $W_{n} / L_{n}$ could be lower than $W_{n, \text { min }} / L_{n, \text { min }}$. This occurs when the bias current is lower than the value $I_{\text {LOW }}$ which, by inverting (7.20), is found to be

$$
\begin{equation*}
I_{\text {LOW }}=\frac{1}{4} \frac{W_{n, \text { min }}}{L_{n, \text { min }}} \mu_{e f f, n} C_{O X}\left(\frac{V_{S W I N G}}{A_{V}}\right)^{2} \tag{7.21}
\end{equation*}
$$

For example, for the $0.35-\mu \mathrm{m}$ technology adopted and setting typical values $V_{S W I N G}=700 \mathrm{mV}$ and $A_{V}=4, I_{\text {LOW }}$ is equal to $1.45 \mu \mathrm{~A}$ (i.e., an impractically low value). However, actually it does not make sense to set the NMOS aspect ratio $W_{n} / L_{n}$ lower than its minimum (i.e. with $W_{n}=W_{n, \text { min }}$ and $L_{n}>L_{n, \text { min }}$ ), since defining the NMOS to be minimum sized (i.e. with $W_{n}=W_{n, \text { min }}$ and $L_{n}=L_{n, \text { min }}$ ) leads to a lower input capacitance, thus reducing the delay of the driving gate, and even keeps voltage gain higher than the desired value $A_{V}$, which is somewhat beneficial in terms of noise margin.

Summarizing the above considerations, the NMOS channel length $L_{n}$ always has to be set equal to the minimum value $L_{n, \text { min }}$ allowed by the CMOS process used. The NMOS channel width $W_{n}$ has to be sized according to relationship (7.20) for $I_{S S}>I_{\text {LOW }}$, while it must be set at its minimum for $I_{S S} \leq I_{\text {LOW }}$.
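The NMOS sizing rule just summarized can be sketched as follows, using (7.20) and (7.21). The helper `voltage_gain` simply inverts (7.20) for $A_{V}$ and is introduced here only to check the result. The symbol `k_n` stands for $\mu_{e f f, n} C_{O X}$; all names and numerical values are hypothetical.

```python
import math


def i_low(v_swing, a_v, k_n, w_min, l_min):
    """Bias current (7.21) below which the NMOS would be sub-minimum."""
    return 0.25 * (w_min / l_min) * k_n * (v_swing / a_v) ** 2


def size_nmos(i_ss, v_swing, a_v, k_n, w_min, l_min):
    """Return (W_n, L_n): minimum length always, width from (7.20)."""
    if i_ss <= i_low(v_swing, a_v, k_n, w_min, l_min):
        return w_min, l_min
    w_n = 4 * (l_min / k_n) * (a_v / v_swing) ** 2 * i_ss    # (7.20)
    return w_n, l_min


def voltage_gain(i_ss, v_swing, k_n, w, l):
    """A_V obtained by inverting (7.20) for a given aspect ratio."""
    return 0.5 * v_swing * math.sqrt(k_n * (w / l) / i_ss)


# Hypothetical parameters, chosen only for illustration.
K_N, W_MIN, L_MIN = 170e-6, 0.6e-6, 0.35e-6
V_SWING, A_V = 0.7, 4.0

i_lo = i_low(V_SWING, A_V, K_N, W_MIN, L_MIN)
for i_ss in (0.5 * i_lo, 100e-6):
    w_n, l_n = size_nmos(i_ss, V_SWING, A_V, K_N, W_MIN, L_MIN)
    print(f"I_SS = {i_ss * 1e6:7.2f} uA   W_n = {w_n * 1e6:.2f} um   "
          f"A_V = {voltage_gain(i_ss, V_SWING, K_N, w_n, l_n):.2f}")
```

Below $I_{\text {LOW }}$ the minimum-sized transistor yields a gain higher than the target, in agreement with the remark on noise margin made above.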

### 7.3.3 Summary and remarks on the transistor sizing versus $I_{S S}$

In the previous subsection, design criteria were provided to set transistors' aspect ratios in order to achieve desired values of $V_{\text {SWING }}$ and $A_{V}$ evaluated in Section 7.3.1. Expressions found are evaluated for a given (but unknown) bias current, therefore their dependence on $I_{S S}$ is made explicit.

For the sake of simplicity, design criteria to size aspect ratios of PMOS transistors to achieve an assigned value of $V_{\text {SWING }}$ are summarized below as a function of $I_{S S}$

$$
\left\{\begin{array}{l}
W_{p}=W_{p, \text { min }}  \tag{7.22a}\\
L_{p}=W_{p, \text { min }} \mu_{e f f, p} C_{O X}\left(V_{D D}-\left|V_{T, p}\right|\right)\left(\frac{V_{S W I N G}}{2 I_{S S}}-\frac{R_{D S W} 10^{-6}}{W_{p, \text { min }}}\right)
\end{array}\right.
$$

if $I_{S S}<I_{H I G H}$ and

$$
\begin{equation*}
\left\{\begin{array}{l}
W_{p}=\frac{2 I_{S S}}{V_{S W I N G}} \cdot \frac{L_{p, \text { min }}}{\mu_{e f f, p} C_{O X}\left(V_{D D}-\left|V_{T, p}\right|\right)\left\{1-\frac{R_{D S W} 10^{-6}}{L_{p, \text { min }}}\left[\mu_{e f f, p} C_{O X}\left(V_{D D}-\left|V_{T, p}\right|\right)\right]\right\}} \\
L_{p}=L_{p, \text { min }}
\end{array}\right. \tag{7.22b}
\end{equation*}
$$

if $I_{S S}>I_{H I G H}$, where $I_{H I G H}$ is given by relationship (7.16). The NMOS sizing found for assigned values of $A_{V}$ and $V_{S W I N G}$ as a function of the bias current is

$$
\begin{equation*}
W_{n}=W_{n, \text { min }} \tag{7.23a}
\end{equation*}
$$

if $I_{S S} \leq I_{\text {LOW }}$ and

$$
\begin{equation*}
W_{n}=4 \frac{L_{n, \text { min }}}{\mu_{e f f, n} C_{O X}}\left(\frac{A_{V}}{V_{S W I N G}}\right)^{2} I_{S S} \tag{7.23b}
\end{equation*}
$$

if $I_{S S}>I_{\text {LOW }}$, where $I_{\text {LOW }}$ is given by relationship (7.21), and the channel length $L_{n}$ is always set to its minimum value $L_{n, \text { min }}$. Intuitively, an increase of $I_{S S}$ determines a decrease of $R_{D}$ to maintain the same $V_{\text {SWING }}$, thus $W_{p} / L_{p}$ must be increased. Accordingly, to keep $A_{V}$ to the desired value, the decrease of $R_{D}$ must be compensated by increasing the NMOS transconductance through their aspect ratio $W_{n} / L_{n}$.

It is worth noting that the condition $I_{\text {LOW }}<I_{H I G H}$ is always satisfied. This can be shown by approximating $R_{D}$ to its intrinsic contribution $R_{\text {int }}$ in (2.36), which dominates over the contribution due to parasitic source/drain resistance. Under this approximation, the ratio $I_{H I G H} / I_{\text {LOW }}$ is

$$
\begin{align*}
& \frac{I_{H I G H}}{I_{\text {LOW }}}=2\left(A_{V}\right)^{2} \frac{\mu_{e f f, p}}{\mu_{e f f, n}} \frac{\frac{W_{p, \text { min }}}{L_{p, \text { min }}}}{\frac{W_{n, \text { min }}}{L_{n, \text { min }}}} \frac{V_{D D}-\left|V_{T, p}\right|}{V_{S W I N G}} \\
& \approx\left(A_{V}\right)^{2} \frac{V_{D D}-\left|V_{T, p}\right|}{V_{\text {SWING }}}>1 \tag{7.24}
\end{align*}
$$

where the mobility ratio between PMOS and NMOS transistor is in the order of $1 / 2$ for current technologies, the ratio of minimum aspect ratio of PMOS and NMOS transistors is about unity, $A_{V}$ is surely greater than unity and $V_{D D}-\left|V_{T, p}\right| \gg V_{\text {SWING }}$. Therefore, (7.24) is largely greater than unity, confirming that $I_{\text {LOW }}<I_{\text {HIGH }}$.

Design criteria to size transistors' aspect ratios versus the bias current are summarized in Table 7.1. In particular, design equations of transistor sizes change according to the three possible bias current ranges: low current (L) for $I_{S S}<I_{\text {LOW }}$, medium current (M) for $I_{L O W} \leq I_{S S} \leq I_{H I G H}$, and high current (H) for $I_{S S}>I_{H I G H}$. In practical cases, logic gates are typically biased in the M or H range.

TABLE 7.1

|  | $\mathbf{L}\left(I_{S S}<I_{L O W}\right)$ | $\mathbf{M}\left(I_{L O W} \leq I_{S S} \leq I_{H I G H}\right)$ | $\mathbf{H}\left(I_{S S}>I_{H I G H}\right)$ |
| :---: | :---: | :---: | :---: |
| $W_{n}$ | $W_{n, \text { min }}$ | eq. (7.23b) | eq. (7.23b) |
| $L_{n}$ | $L_{n, \text { min }}$ | $L_{n, \text { min }}$ | $L_{n, \text { min }}$ |
| $W_{p}$ | $W_{p, \text { min }}$ | $W_{p, \text { min }}$ | eq. (7.22b) |
| $L_{p}$ | eq. (7.22a) | eq. (7.22a) | $L_{p, \text { min }}$ |
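Table 7.1 can be encoded as a simple lookup. The sketch below classifies a bias current into region L, M or H and returns the sizing rule for each dimension. The dictionary layout and function names are hypothetical; the numerical thresholds in the usage example are the values of $I_{\text {LOW }}$ and $I_{H I G H}$ quoted above for the $0.35-\mu \mathrm{m}$ technology.

```python
# Sizing rules of Table 7.1, one entry per bias-current region.
SIZING_RULES = {
    "L": {"W_n": "W_n,min", "L_n": "L_n,min",
          "W_p": "W_p,min", "L_p": "eq. (7.22a)"},
    "M": {"W_n": "eq. (7.23b)", "L_n": "L_n,min",
          "W_p": "W_p,min", "L_p": "eq. (7.22a)"},
    "H": {"W_n": "eq. (7.23b)", "L_n": "L_n,min",
          "W_p": "eq. (7.22b)", "L_p": "L_p,min"},
}


def bias_region(i_ss, i_low, i_high):
    """Return 'L', 'M' or 'H' according to the ranges of Table 7.1."""
    if i_low >= i_high:
        raise ValueError("expected I_LOW < I_HIGH")
    if i_ss < i_low:
        return "L"
    if i_ss <= i_high:
        return "M"
    return "H"


I_LOW, I_HIGH = 1.45e-6, 24.6e-6   # values quoted in the text

for i_ss in (1e-6, 10e-6, 100e-6):
    region = bias_region(i_ss, I_LOW, I_HIGH)
    print(f"I_SS = {i_ss * 1e6:6.1f} uA -> region {region}: "
          f"{SIZING_RULES[region]}")
```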

### 7.4 OPTIMIZED DESIGN OF THE SOURCE-COUPLED INVERTER

In this section, the design of the SCL inverter gate is addressed. To this aim, it is shown that its delay can be written in the form of relationship (7.2) as a function of the bias current, as well as of the parameters $V_{S W I N G}$ and $A_{V}$ previously assigned according to the strategy discussed in Section 7.3.

Delay expression (6.6) derived in Chapter 6 is a function of PMOS equivalent resistance $R_{D}$ and transistors' capacitances. Therefore, to make delay dependence on $I_{S S}, V_{S W I N G}$ and $A_{V}$ explicit, transistor aspect ratio expressions evaluated in the previous section must be substituted in parasitic capacitances expressions. In particular, parasitic capacitances dependence on $I_{S S}$ changes according to the range (L, M or H) which the bias current belongs to.

### 7.4.1 Delay expression versus bias current and logic swing in region $M$

Consider an SCL inverter gate biased in the region M, where, according to Table 7.1, $W_{n}$ and $L_{p}$ are given by (7.23b) and (7.22a), respectively, while $L_{n}$ and $W_{p}$ are minimum. By substituting these transistor sizes into capacitances described in Section 6.2, expressions of capacitances $C_{g d, n}$, $C_{d b, n}, C_{g d, p}$ and $C_{d b, p}$ can be shown to have the same dependence on the bias current, that can be expressed in the following compact way

$$
\begin{equation*}
C_{x y}^{M}=\frac{a_{x y}^{M}}{V_{S W I N G}^{2}} I_{S S}+b_{x y}^{M} \frac{V_{S W I N G}}{I_{S S}}+c_{x y}^{M} \tag{7.25}
\end{equation*}
$$

where $x$ and $y$ are the terminals to which the capacitance considered refers, and the superscript refers to biasing region M. For instance, $C_{g d, n}^{M}$ represents the NMOS gate-to-drain capacitance expression for $I_{S S}$ ranging in the interval M, and its dependence on $I_{S S}$ is described by the associated coefficients $a_{g d, n}^{M}$, $b_{g d, n}^{M}$ and $c_{g d, n}^{M}$. As an example, coefficients of capacitance $C_{g d, n}^{M}$ in the region M are evaluated by substituting expression (7.23b) of $W_{n}$ into the general expression of $C_{g d, n}$ discussed in Section 6.2.2

$$
\begin{equation*}
C_{g d, n}=C_{g d 0, n} W_{n}=4 C_{g d 0, n} \frac{L_{n, \text { min }}}{\mu_{e f f, n} C_{O X}}\left(\frac{A_{V}}{V_{S W I N G}}\right)^{2} I_{S S} \tag{7.26}
\end{equation*}
$$

which, by comparison with relationship (7.25), leads to $b_{g d, n}^{M}$ and $c_{g d, n}^{M}$ equal to zero, and $a_{g d, n}^{M}$ equal to the expression reported in Table 7.2. As another example, consider the expression of capacitance $C_{g d, p}^{M}$ evaluated in Section 6.2.1 and make it more explicit by substituting the PMOS transistor sizes (7.22a) pertaining to region M, thus obtaining

$$
\begin{align*}
C_{g d, p}= & \frac{3}{8} A_{b u l k, \max } \mu_{e f f, p} C_{O X}^{2} W_{p, \min }^{2}\left(V_{D D}-\left|V_{T, p}\right|\right) \frac{V_{S W I N G}}{I_{S S}}+C_{g d 0, p} W_{p, \min }+ \\
& -\frac{3}{4} A_{b u l k, \max } \mu_{e f f, p} C_{O X}^{2} W_{p, \min }\left(V_{D D}-\left|V_{T, p}\right|\right) R_{D S W} 10^{-6} \tag{7.27}
\end{align*}
$$

from which, by comparison to (7.25), coefficient $a_{g d, p}^{M}$ results equal to zero, while $b_{g d, p}^{M}$ and $c_{g d, p}^{M}$ result equal to the relationships reported in Table 7.2.

By following the same procedure, analytical expressions of the three coefficients for all the transistor capacitances are explicitly reported in Table 7.2 (those not included are equal to zero).

## TABLE 7.2

| Coefficient | Expression |
| :--- | :---: |
| $a_{g d, n}^{M}$ | $4 A_{V}^{2} C_{g d 0, n} \frac{L_{n, \text { min }}}{\mu_{e f f, n} C_{O X}}$ |
| $a_{d b, n}^{M}$ | $4 A_{V}^{2}\left(K_{j, n} C_{j, n} L_{d, n}+2 K_{j s w, n} C_{j s w, n}\right) \frac{L_{n, \text { min }}}{\mu_{e f f, n} C_{O X}}$ |
| $c_{d b, n}^{M}$ | $2 K_{j s w, n} C_{j s w, n} L_{d, n}$ |
| $b_{g d, p}^{M}$ | $\frac{3}{8} A_{b u l k, \text { max }} \mu_{e f f, p} C_{O X}^{2} W_{p, \text { min }}^{2}\left(V_{D D}-\left\|V_{T, p}\right\|\right)$ |
| $c_{g d, p}^{M}$ | $C_{g d 0, p} W_{p, \text { min }}-\frac{3}{4} A_{\text {bulk, max }} \mu_{e f f, p} C_{O X}^{2} W_{p, \text { min }}\left(V_{D D}-\left\|V_{T, p}\right\|\right) R_{D S W} 10^{-6}$ |
| $c_{d b, p}^{M}$ | $K_{j, p} C_{j, p} L_{d, p} W_{p, \min }+2 K_{j s w, p} C_{j s w, p}\left(L_{d, p}+W_{p, \min }\right)$ |

The general capacitance expression (7.25) could have been expected before detailed calculations by considering that the design criteria for transistors' aspect ratios in the previous section lead to one dimension being minimum and the other varying with $I_{S S}$. In particular, for $I_{S S}$ belonging to the region M, from Table 7.1 and relationship (7.23b) it is apparent that in NMOS transistors the channel width is proportional to $I_{S S}$, while from (7.22a) the PMOS channel length depends on the inverse of $I_{S S}$. As a consequence, each capacitance of NMOS transistors consists of a constant term and one proportional to $I_{S S}$, while the PMOS capacitances contain a constant term and one inversely proportional to $I_{S S}$. These considerations provide an intuitive understanding of relationship (7.25).

From inspection of Table 7.2, coefficients $a_{x y}^{M}, b_{x y}^{M}$ and $c_{x y}^{M}$ only depend on the previously assigned value of $A_{V}$, as well as the process used and supply voltage that affects the evaluation of junction capacitances, hence in the design they are constant.

Relationship (7.25) can be used to express the sum of capacitances in the delay model (6.6) of the SCL inverter as an explicit function of bias current and logic swing, assuming the gate to be biased in region M. Indeed, by using capacitance expression (7.25) with the coefficients reported in Table 7.2, the sum of capacitances $C_{g d, n}+C_{d b, n}+C_{g d, p}+C_{d b, p}$ can be written as

$$
\begin{equation*}
C_{g d, n}+C_{d b, n}+C_{g d, p}+C_{d b, p}=\frac{a^{M}}{\left(V_{S W I N G}\right)^{2}} I_{S S}+b^{M} \frac{V_{S W I N G}}{I_{S S}}+c^{M} \tag{7.28}
\end{equation*}
$$

where coefficients $a^{M}, b^{M}$ and $c^{M}$ are defined as

$$
\begin{align*}
& a^{M}=a_{g d, n}^{M}+a_{d b, n}^{M}+a_{g d, p}^{M}+a_{d b, p}^{M}=a_{g d, n}^{M}+a_{d b, n}^{M}  \tag{7.29a}\\
& b^{M}=b_{g d, n}^{M}+b_{d b, n}^{M}+b_{g d, p}^{M}+b_{d b, p}^{M}=b_{g d, p}^{M}  \tag{7.29b}\\
& c^{M}=c_{g d, n}^{M}+c_{d b, n}^{M}+c_{g d, p}^{M}+c_{d b, p}^{M}=c_{d b, n}^{M}+c_{g d, p}^{M}+c_{d b, p}^{M} \tag{7.29c}
\end{align*}
$$

and those coefficients equal to zero were omitted. By substituting relationships (7.29) and (7.15) into eq. (6.6), delay of an SCL inverter biased in region M can be expressed as

$$
\begin{equation*}
\tau_{P D, S C L}=0.35 \cdot V_{\text {SWING }}\left[\frac{a^{M}}{\left(V_{S W I N G}\right)^{2}}+b^{M} \frac{V_{S W I N G}}{I_{S S}^{2}}+\frac{c^{M}+C_{L}}{I_{S S}}\right] \tag{7.30}
\end{equation*}
$$

that has the same form as relationship (7.2), as anticipated in Section 7.2.
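The chain from the per-capacitance coefficients to the delay can be sketched as follows. The code sums the coefficients as in (7.29), evaluates the capacitance sum (7.28), and compares two ways of computing the delay: the closed form (7.30), and the product $2 \cdot 0.35 \cdot R_{D}\left(C_{t o t}+C_{L}\right)$ with $R_{D}$ from (7.15). The latter is an assumed form of (6.6), chosen because it reproduces (7.30) exactly. All coefficient values and names are hypothetical; only the pattern of zero entries follows Table 7.2.

```python
def total_coefficients(per_cap):
    """Sum the (a, b, c) triples over the four capacitances, as in (7.29)."""
    a = sum(t[0] for t in per_cap.values())
    b = sum(t[1] for t in per_cap.values())
    c = sum(t[2] for t in per_cap.values())
    return a, b, c


def capacitance_sum(i_ss, v_swing, a, b, c):
    """Sum of transistor capacitances (7.28)."""
    return a * i_ss / v_swing**2 + b * v_swing / i_ss + c


def delay_closed_form(i_ss, v_swing, a, b, c, c_load):
    """Delay (7.30)."""
    return 0.35 * v_swing * (a / v_swing**2
                             + b * v_swing / i_ss**2
                             + (c + c_load) / i_ss)


def delay_from_rc(i_ss, v_swing, a, b, c, c_load):
    """Assumed form of (6.6): 2 * 0.35 * R_D * (C_tot + C_L)."""
    r_d = v_swing / (2 * i_ss)                               # (7.15)
    return 2 * 0.35 * r_d * (capacitance_sum(i_ss, v_swing, a, b, c) + c_load)


# Hypothetical (a, b, c) triples in region M; zeros as in Table 7.2.
PER_CAP_M = {
    "gd,n": (2e-11, 0.0, 0.0),
    "db,n": (3e-11, 0.0, 1e-15),
    "gd,p": (0.0, 1e-18, 2e-15),
    "db,p": (0.0, 0.0, 2e-15),
}

a_m, b_m, c_m = total_coefficients(PER_CAP_M)
tau = delay_closed_form(100e-6, 0.7, a_m, b_m, c_m, 20e-15)
print(f"a = {a_m:.2e}, b = {b_m:.2e}, c = {c_m:.2e}, "
      f"delay = {tau * 1e12:.2f} ps")
```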

### 7.4.2 Delay expression versus bias current and logic swing in region $L$ and $H$

Now let us consider an SCL gate biased in the region L, whose NMOS transistors have minimum size and PMOS transistors have $W_{p}=W_{p, \min }$ and $L_{p}$ given by relationship (7.22a), from Table 7.1. By substituting these sizes into capacitances' expressions discussed in Section 6.2 and following the same procedure as that used in Section 7.4.1, expressions of capacitances $C_{g d, n}$, $C_{d b, n}, C_{g d, p}$ and $C_{d b, p}$ are shown to have the same dependence on the bias current as in relationship (7.25), where superscript $M$ must be replaced by $L$. Analytical expressions of the non-zero coefficients for all the transistor capacitances are reported in Table 7.3, whose inspection shows that coefficients in Table 7.3 again depend only on the process used and supply voltage.

TABLE 7.3

| $c_{g d, n}^{L}$ | $C_{g d 0, n} W_{n, \text { min }}$ |
| :---: | :---: |
| $c_{d b, n}^{L}$ | $K_{j, n} C_{j, n} L_{d, n} W_{n, \text { min }}+2 K_{j s w, n} C_{j s w, n}\left(L_{d, n}+W_{n, \text { min }}\right)$ |
| $b_{g d, p}^{L}$ | $b_{g d, p}^{M}$ |
| $c_{g d, p}^{L}$ | $c_{g d, p}^{M}$ |
| $c_{d b, p}^{L}$ | $c_{d b, p}^{M}$ |

Following the same procedure as in region M, the sum of capacitances $C_{g d, n}+C_{d b, n}+C_{g d, p}+C_{d b, p}$ in region L is

$$
\begin{equation*}
C_{g d, n}+C_{d b, n}+C_{g d, p}+C_{d b, p}=\frac{a^{L}}{V_{S W I N G}^{2}} I_{S S}+b^{L} \frac{V_{S W I N G}}{I_{S S}}+c^{L} \tag{7.31}
\end{equation*}
$$

where coefficients $a^{L}, b^{L}$ and $c^{L}$ are

$$
\begin{align*}
& a^{L}=a_{g d, n}^{L}+a_{d b, n}^{L}+a_{g d, p}^{L}+a_{d b, p}^{L}=0  \tag{7.32a}\\
& b^{L}=b_{g d, n}^{L}+b_{d b, n}^{L}+b_{g d, p}^{L}+b_{d b, p}^{L}=b_{g d, p}^{L}  \tag{7.32b}\\
& c^{L}=c_{g d, n}^{L}+c_{d b, n}^{L}+c_{g d, p}^{L}+c_{d b, p}^{L} \tag{7.32c}
\end{align*}
$$

By substituting relationships (7.31)-(7.32) and (7.15) into eq. (6.6), the delay of an SCL inverter biased in region L can be written as

$$
\begin{align*}
\tau_{P D, S C L} & =0.35 \cdot V_{\text {SWING }}\left(\frac{a^{L}}{V_{S W I N G}^{2}}+b^{L} \frac{V_{S W I N G}}{I_{S S}^{2}}+\frac{c^{L}+C_{L}}{I_{S S}}\right) \\
& =0.35 \cdot V_{\text {SWING }}\left(b^{L} \frac{V_{\text {SWING }}}{I_{S S}^{2}}+\frac{c^{L}+C_{L}}{I_{S S}}\right) \tag{7.33}
\end{align*}
$$

whose dependence on the bias current is easily understandable by considering that $L_{p}$ in relationship (7.22a) consists of a constant term and one inversely proportional to the bias current, while other transistors' dimensions are minimum.

By reiterating the same procedure and substituting transistor sizes reported in Table 7.1, the delay of an SCL inverter biased in region H can be expressed as

$$
\begin{equation*}
\tau_{P D, S C L}=0.35 \cdot V_{S W I N G}\left(\frac{a^{H}}{V_{S W I N G}^{2}}+b^{H} \frac{V_{S W I N G}}{I_{S S}^{2}}+\frac{c^{H}+C_{L}}{I_{S S}}\right) \tag{7.34}
\end{equation*}
$$

where coefficients $a^{H}, b^{H}$ and $c^{H}$ are

$$
\begin{align*}
& a^{H}=a_{g d, n}^{H}+a_{d b, n}^{H}+a_{g d, p}^{H}+a_{d b, p}^{H}  \tag{7.35a}\\
& b^{H}=b_{g d, n}^{H}+b_{d b, n}^{H}+b_{g d, p}^{H}+b_{d b, p}^{H}=0  \tag{7.35b}\\
& c^{H}=c_{g d, n}^{H}+c_{d b, n}^{H}+c_{g d, p}^{H}+c_{d b, p}^{H}=c_{d b, n}^{H}+c_{d b, p}^{H} \tag{7.35c}
\end{align*}
$$

whose terms are explicitly reported in Table 7.4.

TABLE 7.4

| $a_{g d, n}^{H}$ | $a_{g d, n}^{M}$ |
| :---: | :---: |
| $a_{d b, n}^{H}$ | $a_{d b, n}^{M}$ |
| $c_{d b, n}^{H}$ | $c_{d b, n}^{M}$ |
| $a_{g d, p}^{H}$ | $2\left(C_{g d 0, p}+\frac{3}{4} A_{b u l k, \max } C_{O X} L_{p, \min }\right) V_{S W I N G} \frac{L_{p, \min }}{\mu_{e f f, p} C_{O X}\left(V_{D D}-\left\|V_{T, p}\right\|\right)\left[1-\frac{\mu_{e f f, p} C_{O X}}{L_{p, \min }} R_{D S W} 10^{-6}\left(V_{D D}-\left\|V_{T, p}\right\|\right)\right]}$ |
| $a_{d b, p}^{H}$ | $2\left(K_{j, p} C_{j, p} L_{d, p}+2 K_{j s w, p} C_{j s w, p}\right) V_{S W I N G} \frac{L_{p, \min }}{\mu_{e f f, p} C_{O X}\left(V_{D D}-\left\|V_{T, p}\right\|\right)\left[1-\frac{\mu_{e f f, p} C_{O X}}{L_{p, \min }} R_{D S W} 10^{-6}\left(V_{D D}-\left\|V_{T, p}\right\|\right)\right]}$ |
| $c_{d b, p}^{H}$ | $2 K_{j s w, p} C_{j s w, p} L_{d, p}$ |

(zero-valued coefficients are not reported)
### 7.4.3 Extension of the delay model in region $M$ to regions $L$ and $H$: a unified expression of delay and remarks

From relationships (7.30), (7.33) and (7.34), the delay of an SCL inverter in each biasing region can be expressed in the same form. However, a compact delay model (i.e., defined by a single equation) would be more useful for design, and to this aim it will be shown that the delay of an SCL inverter can be expressed in the compact form of relationship (7.2) regardless of the biasing region.

To obtain a compact delay model, it is useful to observe that the delay expression of the SCL inverter biased in region M can be approximately extended to the other regions. Indeed, according to Table 7.1, for low bias currents (both in region L and at the beginning of region M), the dominant contribution to delay is that of $C_{g d, p}$ due to the high value of $L_{p}$, whose expression (7.22a) is the same in both regions L and M. Therefore, the delay equation of region M can be extended to region L without significant error. Analogously, for high bias currents (both in region H and at the end of region M), capacitances $C_{d b, n}$ and $C_{g d, n}$ are dominant due to the high value of $W_{n}$, whose expression (7.23b) is the same in both regions M and H, which allows the model valid in region M to be extended to region H. As a result, the delay model (7.30), valid in region M, approximates the delay expression well regardless of the biasing region. This confirms the validity of relationship (7.2) for the SCL inverter, using the coefficients $a$, $b$ and $c$ of region M:

$$
\begin{align*}
& a=a^{M}=a_{g d, n}^{M}+a_{d b, n}^{M}  \tag{7.36a}\\
& b=b^{M}=b_{g d, p}^{M}  \tag{7.36b}\\
& c=c^{M}=c_{d b, n}^{M}+c_{g d, p}^{M}+c_{d b, p}^{M} \tag{7.36c}
\end{align*}
$$

As an example, for the $0.35-\mu \mathrm{m}$ CMOS process used and assuming $V_{\text {SWING }}=700 \mathrm{mV}, A_{V}=4$ and $V_{D D}=3.3 \mathrm{~V}$, the model developed in region M used for the other regions leads to the error in Fig. 7.2 (which is plotted versus $I_{S S}$ in logarithmic scale and under the worst case $C_{L}=0$ ).


Fig. 7.2. Error of the delay model across the three regions when the region-M model is used, for the worst case $C_{L}=0 \mathrm{~F}$.

From Fig. 7.2, it is apparent that the error is always lower than $14 \%$, and it can be shown that for more realistic load conditions it is in the order of a few percentage points.

From an analytical point of view, extension of relationship (7.30) to regions L and H is easily understood by comparing coefficients $a, b$ and $c$ in the three regions. To be more specific, extension from region M to H is justified by observing that $a^{H} \approx a^{M}$ (since at the boundary the term proportional to $a$ dominates over that proportional to $b$), while extension from region M to L is understood by observing that $b^{L}=b^{M}$ (at the boundary the term proportional to $b$ dominates over that proportional to $a$). As an example, these results are confirmed by the numerical values of the coefficients in the three regions for the $0.35-\mu \mathrm{m}$ CMOS process used and assuming $V_{\text {SWING }}=700 \mathrm{mV}, A_{V}=4$ and $V_{D D}=3.3 \mathrm{~V}$, reported in Table 7.5.

TABLE 7.5

| L | M | H |
| :--- | :--- | :--- |
| $a^{L}=0$ | $a^{M}=2.26 \mathrm{E}-10$ | $a^{H}=2.7 \mathrm{E}-10$ |
| $b^{L}=7.47 \mathrm{E}-20$ | $b^{M}=7.47 \mathrm{E}-20$ | $b^{H}=0$ |
| $c^{L}=2.48 \mathrm{E}-15$ | $c^{M}=1.82 \mathrm{E}-15$ | $c^{H}=1.23 \mathrm{E}-15$ |

Now, let us introduce a circuit interpretation of terms included in the general delay model (7.30) of an SCL inverter, since their meaning is less evident than the model (6.6) developed in the previous chapter.

From relationship (7.29), term $a^{M}$ is due only to NMOS transistors, while $b^{M}$ is due only to PMOS transistors. Moreover, in relationship (7.30) the term $c^{M}$ is due to both devices and can in general be neglected ${ }^{3}$. As a consequence, the term $a^{M} I_{S S} / V_{S W I N G}^{2} \approx C_{g d, n}+C_{d b, n}$ in (7.30) models the NMOS transistors' capacitance at the output node. Analogously, the term $b^{M} V_{\text {SWING }} / I_{S S} \approx C_{g d, p}+C_{d b, p}$ represents the equivalent capacitance at the output node associated with PMOS transistors.

From the circuit interpretation of the addends in delay model (7.30), some interesting observations can be derived. First, the capacitances at the output node associated with NMOS and PMOS transistors are equal when $a^{M} I_{S S} /\left(V_{S W I N G}\right)^{2}=b^{M} V_{S W I N G} / I_{S S}$. Solving for $I_{S S}$, this means that NMOS and PMOS capacitances are equal for the value of $I_{S S}$ given by relationship (7.5), which minimizes the power-delay product. Hence, we can interpret the power-efficient design criterion as the one leading to an equal contribution of NMOS and PMOS transistors to the delay.

In the high-performance design (i.e., $I_{SS} \gg I_{SS,\text{opt\_PDP}}$), the term $a^{M} I_{SS} / V_{SWING}^{2}$ dominates over $b^{M} V_{SWING} / I_{SS}$, and delay is essentially due to NMOS capacitances. The design criterion (7.8) with strict equality makes the equivalent NMOS capacitance equal to the load capacitance $C_{L}$ (thus, no significant advantage would be achieved for higher values of $I_{SS}$, since the NMOS capacitances would excessively self-load the gate). In contrast, in the low-power design (i.e., $I_{SS} \ll I_{SS,\text{opt\_PDP}}$) the dominant contribution is that from the PMOS transistors.
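The split between NMOS and PMOS contributions can be made concrete with a short numerical sketch. It evaluates the two equivalent capacitances of model (7.30) using the region-M coefficients of Table 7.5; the function names and the factor-of-ten sample currents are illustrative choices, not part of the original treatment.

```python
import math

# Region-M coefficients taken from Table 7.5 (V_SWING = 0.7 V, A_V = 4, V_DD = 3.3 V).
A_M = 2.26e-10   # a^M, associated with NMOS transistors
B_M = 7.47e-20   # b^M, associated with PMOS transistors
V_SWING = 0.7    # logic swing in volts


def nmos_cap(i_ss, v_swing=V_SWING, a=A_M):
    """Equivalent NMOS output capacitance a^M * I_SS / V_SWING^2 (farads)."""
    return a * i_ss / v_swing**2


def pmos_cap(i_ss, v_swing=V_SWING, b=B_M):
    """Equivalent PMOS output capacitance b^M * V_SWING / I_SS (farads)."""
    return b * v_swing / i_ss


def crossover_current(v_swing=V_SWING, a=A_M, b=B_M):
    """Bias current at which the NMOS and PMOS terms are equal (amperes)."""
    return math.sqrt(b / a) * v_swing**1.5


if __name__ == "__main__":
    i_eq = crossover_current()
    # The factors of 10 below are arbitrary illustration points, not design rules.
    for label, i_ss in (("low power", i_eq / 10),
                        ("crossover", i_eq),
                        ("high speed", 10 * i_eq)):
        print(f"{label:>10}: I_SS = {i_ss * 1e6:7.2f} uA  "
              f"C_nmos = {nmos_cap(i_ss) * 1e15:6.2f} fF  "
              f"C_pmos = {pmos_cap(i_ss) * 1e15:6.2f} fF")
```

Since the NMOS term grows linearly with $I_{SS}$ while the PMOS term falls as $1/I_{SS}$, moving one decade away from the crossover current changes their ratio by two decades, which is why each regime is dominated by a single device type.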

### 7.4.4 Design criteria and examples

As was demonstrated in the previous section, delay of an SCL gate can be modeled by (7.2) (or, equivalently, (7.30)). As a consequence, design criteria introduced in Section 7.2 can be applied to size design parameters of an SCL inverter gate.

Regarding the power-efficient design in Section 7.2.1, the bias current (7.5) that leads to an optimal power-delay trade-off becomes

$$
\begin{equation*}
I_{SS,\text{opt\_PDP}}=0.31 \sqrt{\mu_{eff,n} \mu_{eff,p}}\, C_{OX}^{\frac{3}{2}} \frac{1}{A_{V}} \sqrt{\frac{W_{p,\min}^{2}}{L_{n,\min}} \frac{A_{bulk,\max}\left(V_{DD}-\left|V_{T,p}\right|\right)}{C_{gd0,n}+K_{j,n} C_{j,n} L_{d,n}+2 K_{jsw,n} C_{jsw,n}}}\left(V_{SWING}\right)^{\frac{3}{2}} \tag{7.37}
\end{equation*}
$$

where relationship (7.29) together with the relationships in Table 7.2 were substituted. Regarding process parameters, bias current (7.37) for a power-efficient design is proportional to the geometric average of the NMOS and PMOS mobility, and depends on the oxide capacitance as $C_{O X}^{3 / 2}$. Moreover, regarding design parameters, (7.37) is proportional to $V_{SWING}^{3 / 2}$ and inversely proportional to the voltage gain $A_{V}$, as qualitatively predicted in Section 7.3.1. It is worth noting that the capacitance in the denominator within the square root can be interpreted as the capacitance per unit channel width of NMOS transistors (i.e. it includes both $C_{g d, n}$ and $C_{d b, n}$).

By substituting (7.29)-(7.30) into (7.6), the resulting (minimum) power-delay product under bias current (7.37) is

$$
\begin{align*}
PDP_{opt,SCL} & =0.35 \cdot V_{DD} I_{SS,\text{opt\_PDP}} V_{SWING} \cdot\left[\frac{a^{M}}{\left(V_{SWING}\right)^{2}}+b^{M} \frac{V_{SWING}}{I_{SS,\text{opt\_PDP}}^{2}}+\frac{c^{M}+C_{L}}{I_{SS,\text{opt\_PDP}}}\right] \\
& \approx 0.35 \cdot V_{DD} V_{SWING}\left(C_{L}+2 \sqrt{\frac{a^{M} b^{M}}{V_{SWING}}}\right) \tag{7.38}
\end{align*}
$$

where coefficient $c^{M}$ was neglected. From (7.38), for a high load capacitance $C_{L}$ that dominates over the second addend (i.e. $C_{L}$ is much higher than the parasitic capacitance at the output node, consisting of NMOS and PMOS contributions both equal to $\sqrt{a^{M} b^{M} / V_{SWING}}$, from (7.28) and Table 7.2), the power-delay product is roughly equal to $0.35 V_{D D} V_{S W I N G} C_{L}$; hence it can be proportionally reduced by reducing the logic swing, while satisfying the noise margin requirement. Analogously, the optimum $P D P$ can be reduced by reducing the supply voltage, while keeping it sufficiently high to allow correct operation of an assigned number of series gating levels, as discussed in Section 2.5.4. When the load capacitance is negligible as compared to parasitic capacitances, (7.38) is proportional to the voltage gain, as well as to the square root of the oxide capacitance and the logic swing.

As an example, for the $0.35-\mu \mathrm{m}$ technology considered and setting $A_{V}=4$, $V_{SWING}=700 \mathrm{mV}$, $I_{SS,\text{opt\_PDP}}$ is $10.7 \mu \mathrm{~A}$, with aspect ratios of NMOS and PMOS transistors equal to $4.2 / 0.3$ and $0.6 / 1.8$, respectively. The sum of the equal PMOS and NMOS parasitic capacitances, $2 \sqrt{a^{M} b^{M} / V_{SWING}}$, is equal to 9.8 fF. Assuming a load capacitance of 50 fF, the delay predicted by (7.30) with the data in Table 7.5 is 1.42 ns, which differs by $20 \%$ from the simulated value of 1.77 ns.
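The figures of this example can be retraced with a few lines of code. The sketch below evaluates the delay model (7.30) with the region-M coefficients of Table 7.5, the equal-capacitance bias current, and the resulting power-delay product; the function names are illustrative, and the results should land close to the 10.7 µA, 9.8 fF and 1.42 ns quoted above.

```python
import math

# Region-M coefficients from Table 7.5 (0.35-um process, A_V = 4, V_DD = 3.3 V).
A_M, B_M, C_M = 2.26e-10, 7.47e-20, 1.82e-15
V_SWING, V_DD = 0.7, 3.3    # volts
C_LOAD = 50e-15             # farads


def scl_delay(i_ss, v_swing, c_load, a=A_M, b=B_M, c=C_M):
    """SCL inverter delay according to model (7.30), in seconds."""
    return 0.35 * v_swing * (a / v_swing**2
                             + b * v_swing / i_ss**2
                             + (c + c_load) / i_ss)


def i_opt_pdp(v_swing, a=A_M, b=B_M):
    """Bias current that equalizes the NMOS and PMOS capacitance terms."""
    return math.sqrt(b / a) * v_swing**1.5


def parasitic_cap_at_opt(v_swing, a=A_M, b=B_M):
    """Sum of the (equal) NMOS and PMOS parasitic capacitances at I_opt."""
    return 2.0 * math.sqrt(a * b / v_swing)


if __name__ == "__main__":
    i_opt = i_opt_pdp(V_SWING)
    tau = scl_delay(i_opt, V_SWING, C_LOAD)
    print(f"I_SS,opt_PDP  = {i_opt * 1e6:.2f} uA")
    print(f"C_parasitic   = {parasitic_cap_at_opt(V_SWING) * 1e15:.2f} fF")
    print(f"delay (7.30)  = {tau * 1e9:.3f} ns")
    print(f"PDP           = {V_DD * i_opt * tau * 1e15:.1f} fJ")
```

Because the load term $C_{L}/I_{SS}$ dominates at this current, the computed power-delay product stays close to $0.35\, V_{DD} V_{SWING} C_{L}$ plus the parasitic contribution.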

Regarding the high-speed design discussed in Section 7.2.2, the bias current (7.10) that leads to an almost minimum delay with reasonable power consumption can be simplified in two cases which differ for the value of the load capacitance, $C_{L}$.

When a high value of $C_{L}$ loads the gate, such that $4 a b /\left[\left(c+C_{L}\right)^{2} V_{S W I N G}\right] \ll 1$ (for the CMOS process considered, this occurs when $C_{L}$ is greater than the minimum gate capacitance by more than one order of magnitude), the optimum bias current results to

$$
\begin{equation*}
I_{SS,\text{opt\_delay}} \approx \frac{C_{L}}{4 \frac{L_{n,\min}}{\mu_{eff,n} C_{OX}}\left(C_{gd0,n}+K_{j,n} C_{j,n} L_{d,n}+2 K_{jsw,n} C_{jsw,n}\right)}\left(\frac{V_{SWING}}{A_{V}}\right)^{2} \tag{7.39}
\end{equation*}
$$

Relationship (7.39) is found by substituting (7.29) into (7.10) and neglecting the second addend under the square root. From inspection of (7.39), bias current for high speed increases proportionally to $C_{L}, V_{\text {SWING }}^{2}$ and $1 / A_{V}^{2}$.

In the other case, when a very low load capacitance is assumed, $I_{SS,\text{opt\_delay}}$ equals $I_{SS,\text{opt\_PDP}}$ given by relationship (7.37), as already discussed in Section 7.2.2. From (7.9) and (7.29), the delay achieved for high-speed design in an SCL inverter is

$$
\begin{equation*}
\tau_{PD,SCL}=2.76 \frac{A_{V}^{2} \frac{L_{n,\min}}{\mu_{eff,n} C_{OX}}\left(C_{gd0,n}+K_{j,n} C_{j,n} L_{d,n}+2 K_{jsw,n} C_{jsw,n}\right)}{V_{SWING}} \tag{7.40}
\end{equation*}
$$

that is proportional to $A_{V}^{2}$, thereby confirming that reduction of the voltage gain is decisive in achieving a high speed performance.

As an example, under the same conditions of the power-efficient design, $I_{S S, \text { opt_delay }}$ results in the high value of $110 \mu \mathrm{~A}$, that leads to $P D P=82 \mathrm{fJ}$ (greater than value of 50 fJ given by (7.38) in the power-efficient case). The aspect ratios of NMOS and PMOS transistors are $41 / 0.3$ and $2.4 / 0.3$, respectively, while the predicted and simulated delay is 226 ps and 196.5 ps .
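The high-speed numbers can be cross-checked in terms of the lumped coefficient $a^{M}$ of Table 7.5. The sketch below assumes that (7.39) can be read as $I_{SS} \approx C_{L} V_{SWING}^{2} / a^{M}$, i.e. that $a^{M}$ equals the denominator of (7.39) multiplied by $A_{V}^{2}$; Table 7.2 is not reproduced in this section, so this identification is an assumption, supported only by the fact that it returns values close to those quoted above. At that current the load term of (7.30) equals the NMOS term, so the delay is approximated as $2 \cdot 0.35\, a^{M} / V_{SWING}$.

```python
# Region-M coefficient a^M from Table 7.5; all other values as in the example.
A_M = 2.26e-10
V_SWING, V_DD = 0.7, 3.3    # volts
C_LOAD = 50e-15             # farads


def i_opt_delay_high_load(c_load, v_swing, a=A_M):
    """High-C_L approximation of the high-speed bias current.

    ASSUMPTION: (7.39) is rewritten as C_L * V_SWING^2 / a^M.
    """
    return c_load * v_swing**2 / a


def delay_high_speed(v_swing, a=A_M):
    """Delay when the load term equals the NMOS term: 2 * 0.35 * a^M / V_SWING."""
    return 0.7 * a / v_swing


if __name__ == "__main__":
    i_hs = i_opt_delay_high_load(C_LOAD, V_SWING)
    tau = delay_high_speed(V_SWING)
    print(f"I_SS,opt_delay ~ {i_hs * 1e6:.0f} uA")
    print(f"delay          ~ {tau * 1e12:.0f} ps")
    print(f"PDP            ~ {V_DD * i_hs * tau * 1e15:.0f} fJ")
```

Compared with the power-efficient design, the bias current rises by about one order of magnitude while the power-delay product grows by well under a factor of two, which is the trade-off the example illustrates.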

Finally, regarding the low-power design case discussed in Section 7.2.3, the delay of an SCL inverter results to

$$
\begin{equation*}
\tau_{PD,SCL} \cong 0.13 \cdot A_{bulk,\max}\, \mu_{eff,p} C_{OX}^{2} W_{p,\min}^{2}\left(V_{DD}-\left|V_{T,p}\right|\right)\left(\frac{V_{SWING}}{I_{SS}}\right)^{2} \tag{7.41}
\end{equation*}
$$

### 7.4.5 Intuitive understanding of the delay dependence on logic swing and voltage gain in practical design cases

In the previous section, the delay dependence on parameters $V_{\text {SWING }}$ and $A_{V}$ was analytically discussed, and design criteria for such parameters were derived. In this section, an intuitive explanation of such results is provided to gain insight into design aspects of SCL gates.

In low-power design, as observed in Section 7.4.3, the capacitive contribution of PMOS transistors to the output node is much greater than that of NMOS transistors. This is because the PMOS channel length $L_{p}$ is much greater than its minimum value to ensure the desired logic swing. By reducing the logic swing for an assigned bias current, from (7.15) a smaller equivalent PMOS resistance $R_{D}$ must be implemented, which means that a lower value of $L_{p}$ must be used. As a consequence, the logic swing must be kept as low as possible to improve the speed performance, as analytically pointed out in Section 7.4.4. Analogous considerations hold for the power-efficient design.

In a high-speed design, NMOS capacitances are the main contribution to the capacitance at the output node, since from Table 7.1 NMOS channel width $W_{n}$ is much greater than its minimum value, to guarantee the desired $A_{V}$ in (2.44). From this relationship, it becomes apparent that the NMOS channel width (and hence NMOS parasitic capacitances) is reduced by increasing logic swing for a given bias current. This delay dependence on logic swing in a high-speed design makes the fundamental difference between CMOS Current-Mode gates and bipolar ones. Indeed, bipolar gates display a better speed performance when logic swing is reduced at a given bias current, since this leads to a reduction of load resistance $R_{C}$ and thus of time constants associated with capacitances at the output nodes.

In infrequent design cases where bias current is assigned regardless of the criteria discussed so far, an optimum value of logic swing that minimizes delay exists. For the sake of completeness, a brief discussion on this subject will be introduced in Section 7.6.3.

Regarding the delay dependence on the voltage gain, in a power-efficient design the voltage gain should be kept low according to (7.38), since for an assigned value of $I_{S S}$ and $V_{S W I N G}$ this allows reducing NMOS transistors' aspect ratio and thus their parasitic capacitances.

It is worth noting that the choice of $A_{V}$ does not heavily affect $P D P$, as can be noticed from the linear dependence in (7.38). Instead, in a high-speed design it is essential to set $A_{V}$ as low as possible since delay in (7.40) increases proportionally to $A_{V}^{2}$. This is easily explained by observing that in such design cases NMOS capacitances are the dominant contribution, hence from (2.44) a reduction of $A_{V}$ by a given factor $x$ entails a reduction of NMOS aspect ratio by $x^{2}$. Besides, the decrease in the noise margin (2.47) due to the reduction in $A_{V}$ is not of concern, since it can be compensated by the increase in $V_{S W I N G}$ required by the high-speed design criteria previously discussed. In low-power design cases, delay (7.41) is not affected by $A_{V}$ since the PMOS dominant capacitive contribution only depends on the logic swing, as already discussed in Section 7.3.2.

For the sake of clarity, all guidelines presented until now are summarized in Fig. 7.3 to make design of SCL gates easier.


Fig. 7.3. Summary of delay dependence on logic swing and voltage gain in SCL gates.

### 7.5 OPTIMIZED DESIGN OF THE SOURCE-COUPLED INVERTER WITH OUTPUT BUFFERS

As discussed in Chapter 2, the noise margins of SCL gates with and without output buffers are very close; therefore the expressions of the aspect ratios of transistors in the internal SCL inverter introduced in Section 7.3 are still valid. As a consequence, the delay of an SCL inverter with output buffers is given by relationship (6.7), where (7.2) with $C_{L}=0$ and (6.8) must be substituted for the delay of the internal SCL gate and of the output buffers, respectively

$$
\begin{align*}
\tau_{PD,SCLbuf}= & 0.35 \cdot V_{SWING}\left(\frac{a^{M}}{V_{SWING}^{2}}+b^{M} \frac{V_{SWING}}{I_{SS}^{2}}+\frac{c^{M}}{I_{SS}}\right) \\
& +0.69\left[R_{D}\left(C_{gd,buf}+h C_{gs,buf}\right)+\frac{C_{L}+C_{gs,buf}}{g_{m,buf}}\right] \\
= & 0.35 \cdot V_{SWING}\left(\frac{a^{M}}{V_{SWING}^{2}}+b^{M} \frac{V_{SWING}}{I_{SS}^{2}}\right. \\
& \left.+\frac{c^{M}+C_{gd,buf}+h C_{gs,buf}}{I_{SS}}\right)+0.69 \frac{C_{L}+C_{gs,buf}}{g_{m,buf}} \tag{7.42}
\end{align*}
$$

In the following, an expression of the buffer transistors' aspect ratio is derived and then applied to develop a design strategy for the SCL inverter with output buffers. In particular, there are two additional design parameters compared to the SCL inverter without buffers: the bias current $I_{S F}$ of the buffer and the aspect ratio $W_{\text {buf }} / L_{\text {buf }}$ of its transistors, whose appropriate sizing depends on whether buffers are added to intentionally introduce a level shift (Section 7.5.1) or to improve the speed performance (Section 7.5.2).

### 7.5.1 Buffer used as a level shifter

When the buffer is used to implement a level shifter for reasons discussed in Section 2.5, it reduces the common-mode output voltage of an SCL gate by a $V_{G S}$ voltage equal to

$$
\begin{equation*}
V_{S H I F T}=V_{T, n}+\sqrt{\frac{2 I_{S F}}{\mu_{e f f, b u f} C_{O X} \frac{W_{b u f}}{L_{b u f}}}} \tag{7.43}
\end{equation*}
$$

By inverting this expression, we determine the value of the ratio $W_{buf} / I_{SF}$ needed to achieve the assigned shift voltage $V_{SHIFT}$, resulting in

$$
\begin{equation*}
\frac{W_{b u f}}{I_{S F}}=\frac{2 L_{b u f, \min }}{\mu_{e f f, b u f} C_{O X}\left(V_{S H I F T}-V_{T, n}\right)^{2}} \tag{7.44}
\end{equation*}
$$

where $L_{b u f}$ has been assumed minimum sized, as occurs for practical values of $V_{\text {SHIFT }}$.
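The sizing rule (7.44) and the shift voltage (7.43) can be checked against each other with a short round-trip calculation. In the sketch below, every numeric value (the product $\mu_{eff,buf} C_{OX}$, the threshold voltage, the channel length and the buffer current) is a hypothetical placeholder chosen only for illustration, since the process data are not listed in this section.

```python
import math


def shift_voltage(i_sf, w_buf, l_buf, mu_cox, v_tn):
    """Level shift V_SHIFT of the source follower, relationship (7.43)."""
    return v_tn + math.sqrt(2.0 * i_sf / (mu_cox * w_buf / l_buf))


def width_per_current(v_shift, l_buf_min, mu_cox, v_tn):
    """Ratio W_buf / I_SF needed for a given V_SHIFT, relationship (7.44)."""
    if v_shift <= v_tn:
        raise ValueError("V_SHIFT must exceed the threshold voltage V_T,n")
    return 2.0 * l_buf_min / (mu_cox * (v_shift - v_tn) ** 2)


if __name__ == "__main__":
    # HYPOTHETICAL values, not taken from the text.
    MU_COX = 100e-6     # mu_eff,buf * C_OX in A/V^2
    V_TN = 0.6          # NMOS threshold voltage in volts
    L_BUF_MIN = 0.3e-6  # minimum channel length in metres
    I_SF = 50e-6        # buffer bias current in amperes
    V_SHIFT = 1.2       # target level shift in volts

    w_buf = width_per_current(V_SHIFT, L_BUF_MIN, MU_COX, V_TN) * I_SF
    print(f"W_buf = {w_buf * 1e6:.3f} um")
    print(f"V_SHIFT check = {shift_voltage(I_SF, w_buf, L_BUF_MIN, MU_COX, V_TN):.3f} V")
```

Because (7.44) fixes the ratio $W_{buf}/I_{SF}$ rather than either quantity alone, doubling the buffer current at constant $V_{SHIFT}$ simply doubles the required channel width.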

Now, as done for the SCL inverter without buffer, let us assume the current per gate $I_{\text {gate }}$ to be assigned. This current must be split into $I_{S S}$ and $I_{S F}$ according to a factor $\gamma=I_{S S} / I_{\text {gate }}$ that defines the fraction of the total bias current used in the internal SCL gate (accordingly, $I_{S F}=0.5(1-\gamma) I_{\text {gate }}$). For a given $I_{\text {gate }}$, factor $\gamma$ must be optimally chosen to minimize the delay, and extensive numerical analysis shows that in practical cases $\gamma$ is significantly lower than unity (typically 0.1 to 0.3) and hence $I_{S S}$ is so small $\left(I_{S S} \ll I_{H I G H}\right)$ that the term proportional to $1 / I_{SS}^{2}$ dominates over the others in the internal SCL gate delay model (7.42). Therefore, the delay can be approximated as

$$
\begin{equation*}
\tau_{P D, S C L b u f}=0.35 b^{M} \frac{V_{S W I N G}^{2}}{I_{S S}^{2}}+0.69 \frac{C_{L}}{g_{m, \text { buf }}} \tag{7.45}
\end{equation*}
$$

where $C_{gs,buf}$ was neglected with respect to load capacitance $C_{L}$. By using (7.44) and substituting $I_{SS}=\gamma I_{\text {gate }}$ and $I_{SF}=0.5(1-\gamma) I_{\text {gate }}$, relationship (7.45) can be minimized with respect to $\gamma$ by evaluating its derivative $\partial \tau_{P D, S C L b u f} / \partial \gamma$

$$
\begin{equation*}
\frac{\partial \tau_{PD,SCLbuf}}{\partial \gamma} \approx-0.35 \frac{2 b^{M} V_{SWING}^{2}}{\gamma^{3} I_{\text{gate}}^{2}}+0.69 \frac{C_{L}\left(V_{SHIFT}-V_{T,n}\right)}{I_{\text{gate}}} \tag{7.46}
\end{equation*}
$$

where $\gamma \ll 1$ has been assumed, as previously discussed. Equating (7.46) to zero and solving for $\gamma$, the value that minimizes $\tau_{P D, S C L b u f}$ is approximately given by

$$
\begin{equation*}
\gamma_{\text{opt}} \approx \sqrt[3]{\frac{b^{M} V_{SWING}^{2}}{C_{L} I_{\text{gate}}\left(V_{SHIFT}-V_{T,n}\right)}} \tag{7.47}
\end{equation*}
$$

which shows that, as the load capacitance or the gate current increases, the fraction of $I_{\text {gate }}$ used for $I_{S S}$ decreases as $\left(C_{L}\right)^{-1 / 3}$ and $\left(I_{\text {gate }}\right)^{-1 / 3}$, respectively. Extensive verification of (7.47) shows that it agrees well with the optimum $\gamma$ evaluated numerically from the exact expression (7.42): their difference is always lower than $25 \%$, and typically lower than $10 \%$. The effect of this difference on delay is even lower, typically a few percentage points, since the minimum of $\tau_{P D, S C L b u f}$ with respect to $\gamma$ is rather flat, as can be observed in Fig. 7.4. This figure shows delay versus $\gamma$ assuming $C_{L}=1 \mathrm{pF}, I_{\text {gate }}=30 \mu \mathrm{~A}$, $V_{\text {SHIFT }}=1.2 \mathrm{~V}$, under the conditions explained in the previous design examples. In this case, (7.47) provides $\gamma=0.16$, which differs from the exact minimum 0.148 by $7 \%$, while the optimum delay (9 ns) is overestimated only by 0.15\%.
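The flatness of the minimum can also be seen by scanning the approximate delay (7.45) over $\gamma$. The sketch below uses $b^{M}$ from Table 7.5 and the conditions of the example; the overdrive $V_{SHIFT}-V_{T,n}=0.3$ V is an assumed value (the threshold voltage is not given in this section), and the buffer transconductance is written as $2 I_{SF}/(V_{SHIFT}-V_{T,n})$, which follows from the square-law expression once $W_{buf}/I_{SF}$ is fixed by (7.44).

```python
B_M = 7.47e-20    # b^M from Table 7.5
V_SWING = 0.7     # volts
C_LOAD = 1e-12    # farads
I_GATE = 30e-6    # amperes
V_OV = 0.3        # ASSUMED value of V_SHIFT - V_T,n in volts


def delay_level_shifter(gamma, i_gate=I_GATE, c_load=C_LOAD,
                        v_swing=V_SWING, v_ov=V_OV, b=B_M):
    """Approximate delay (7.45) with I_SS = gamma * I_gate."""
    i_ss = gamma * i_gate
    i_sf = 0.5 * (1.0 - gamma) * i_gate
    g_m = 2.0 * i_sf / v_ov   # square-law g_m with W_buf/I_SF from (7.44)
    return 0.35 * b * v_swing**2 / i_ss**2 + 0.69 * c_load / g_m


def argmin_gamma(steps=9999):
    """Brute-force search of the gamma in (0, 1) that minimizes the delay."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return min(grid, key=delay_level_shifter)


if __name__ == "__main__":
    g_best = argmin_gamma()
    print(f"gamma at minimum = {g_best:.3f}")
    print(f"minimum delay    = {delay_level_shifter(g_best) * 1e9:.2f} ns")
    for g in (0.10, 0.16, 0.25):
        print(f"gamma = {g:.2f} -> delay = {delay_level_shifter(g) * 1e9:.2f} ns")
```

With this assumed overdrive the scan returns a minimum in the same range as the values discussed above, and the delay changes only marginally for $\gamma$ between roughly 0.1 and 0.25, which is why a moderately inaccurate $\gamma$ costs little speed.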


Fig. 7.4. Delay of an SCL inverter with buffer versus $\gamma$ for $C_{L}=1 \mathrm{pF}$, $I_{\text {gate }}=30 \mu \mathrm{~A}$, $V_{SHIFT}=1.2 \mathrm{~V}$.

The resulting minimum delay (7.45), with the optimum $\gamma$ in (7.47) assumed to be much lower than unity, is

$$
\begin{align*}
\tau_{PD,SCLbuf,\text{opt}} & \approx 0.35 \frac{\sqrt[3]{b^{M} C_{L}^{2}\left(V_{SHIFT}-V_{T,n}\right)^{2} V_{SWING}^{2}}}{I_{\text{gate}}^{4/3}} \\
& +0.69 \frac{C_{L}}{I_{\text{gate}}}\left(V_{SHIFT}-V_{T,n}\right) \tag{7.48}
\end{align*}
$$

which expresses the delay of the SCL inverter versus its total bias current $I_{\text {gate }}$ after optimally distributing it between the internal SCL gate and output buffers. From (7.48), it is apparent that it is always necessary to keep the logic swing as low as possible, while $A_{V}$ does not significantly affect delay. Moreover, delay for very low values of $I_{\text {gate }}$ is proportional to $I_{\text {gate }}^{-4 / 3}$, while for high bias current values it is inversely proportional to $I_{\text {gate }}$. For impractically high values of $I_{\text {gate }}$, the delay tends to a non-zero value that is not accounted for by (7.48). It is worth noting that from (7.48) it is possible to derive a power-efficient design criterion by minimizing $PDP=V_{DD} \cdot I_{SS}\, \tau_{PD,SCLbuf,\text{opt}}$, as already done for the SCL inverter without output buffers.

### 7.5.2 Buffer used to improve speed

In some cases, for a given total gate current $I_{\text {gate }}$, the choice of an SCL gate with buffers (after properly splitting $I_{\text {gate }}$ into $I_{S S}$ and $I_{S F}$) leads to a better speed performance than an SCL gate without buffer (biased with $I_{S S}=I_{\text {gate }}$). This occurs when the available gate current $I_{\text {gate }}$ is low and the load capacitance $C_{L}$ is large. Indeed, for very high values of $C_{L}$, delay (7.30) of an SCL gate is proportional to $1 / I_{S S}$, while the delay with output buffers (7.42) is proportional to $1 / \sqrt{I_{\text {gate }}}$ (since $g_{m, b u f}$ evaluated in Section 6.3 is proportional to $\sqrt{I_{\text {gate }}}$).

When buffers are used to improve the speed performance, the design parameters $I_{S S}, I_{S F}$ and $W_{b u f}$ must be evaluated. More specifically, design criteria are needed to split the assigned current per gate $I_{g a t e}$ into $I_{S S}$ and $I_{S F}$ as well as to size buffer transistor channel width $W_{\text {buf }}$ (differently from previous subsection, no constraint between $W_{b u f}$ and $I_{S F}$ exists). To this aim, let us rewrite expression of $\tau_{P D, S C L b u f}$ to make its dependence on $W_{\text {buf }}$ more explicit

$$
\begin{align*}
\tau_{P D, S C L b u f} & \cong 0.35 V_{S W I N G}\left[\frac{a^{M}}{V_{S W I N G}^{2}}+b^{M} \frac{V_{S W I N G}}{I_{S S}^{2}}\right. \\
& \left.+\frac{c^{M}+\left(C_{g d, b u f, \text { min }}+h C_{g s, b u f, \text { min }}\right) w}{I_{S S}}\right]+0.69 \frac{C_{L}+w C_{g s, b u f, \text { min }}}{\sqrt{w} g_{m, b u f, \text { min }}} \tag{7.49}
\end{align*}
$$

that has been obtained by expressing in (7.42) capacitances $C_{gd,buf}$ and $C_{gs,buf}$ as the product of their values at minimum $W_{buf}$ (named $C_{gd,buf,\min}$ and $C_{gs,buf,\min}$) and the factor

$$
\begin{equation*}
w=\frac{W_{b u f}}{W_{b u f, \min }} \tag{7.50}
\end{equation*}
$$

that represents the channel width normalized to the minimum allowed. Moreover, from (1.66) and (2.44), $g_{m,buf}$ is equal to its value $g_{m,buf,\min}$ at minimum $W_{buf}$ multiplied by $\sqrt{w}$ (minimum channel length has been assumed, as before). Since (7.49) has to be optimized with respect to both $\gamma$ and $w$, for the sake of simplicity we first assume $w=1$ (i.e., minimum $W_{buf}$) and evaluate the optimum ratio $\gamma=I_{SS} / I_{\text{gate}}$ as in the previous subsection, and then optimize $w$ itself.

In the following, let us assume that the term $b^{M} V_{SWING} / I_{SS}^{2}$ in (7.49) is dominant over the others in brackets (as in the previous subsection) and that $C_{L}$ is much greater than $C_{gs,buf}$ (generally satisfied in practical cases, since the addition of buffers makes sense only for high $C_{L}$). Accordingly, the derivative of (7.49) with respect to $\gamma$ with $w=1$ is

$$
\begin{align*}
\frac{\partial \tau_{PD,SCLbuf}}{\partial \gamma} & \approx \frac{\partial}{\partial \gamma}\left[0.35 V_{SWING}^{2} \frac{b^{M}}{\gamma^{2} I_{\text{gate}}^{2}}+\frac{0.69 C_{L}}{\sqrt{\mu_{eff,buf} C_{OX} \frac{W_{buf,\min}}{L_{buf,\min}}(1-\gamma) I_{\text{gate}}}}\right] \\
& \approx-0.35 V_{SWING}^{2} \frac{2 b^{M}}{\gamma^{3} I_{\text{gate}}^{2}}+\frac{0.69 C_{L}}{2 \sqrt{\mu_{eff,buf} C_{OX} \frac{W_{buf,\min}}{L_{buf,\min}} I_{\text{gate}}}} \tag{7.51}
\end{align*}
$$

that, equated to zero, gives the following value of $\gamma$ that minimizes $\tau_{P D, S C L b u f}$

$$
\begin{equation*}
\gamma_{opt} \approx \sqrt[3]{\frac{2 b^{M} V_{SWING}^{2}}{C_{L}} \sqrt{\mu_{eff,buf} C_{OX} \frac{W_{buf,\min}}{L_{buf,\min}}}}\; \frac{1}{\sqrt{I_{\text{gate}}}} \tag{7.52}
\end{equation*}
$$

From (7.52), the fraction $\gamma$ of $I_{\text{gate}}$ used in $I_{SS}$ (i.e., in the internal SCL gate) decreases as $\left(C_{L}\right)^{-1 / 3}$ as the load capacitance increases, and as $\left(I_{\text{gate}}\right)^{-1 / 2}$ as the gate current increases. Verification of (7.52) shows that, in the cases of interest (i.e., when the SCL gate with buffers is faster than that without buffer), it agrees well with the optimum $\gamma$: the difference is always lower than $30 \%$ of the latter (typically lower than 15\%), and the difference in the associated delay is a few percentage points. As an example, Fig. 7.5 shows the delay of an SCL gate versus $\gamma$ assuming $C_{L}=1 \mathrm{pF}, I_{\text {gate }}=30 \mu \mathrm{~A}$, under the conditions explained in the previous design examples. In this case, relationship (7.52) provides $\gamma=0.18$, while the exact value is 0.17, leading to about the same delay of 11.4 ns.
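A similar scan can be run for the speed-oriented buffer with minimum-size transistors ($w=1$). The sketch below keeps only the two dominant terms of (7.49), namely the $b^{M}$ term of the internal gate and the load term of the buffer, with $g_{m,buf}$ written as $\sqrt{\mu_{eff,buf} C_{OX} (W_{buf,\min}/L_{buf,\min})(1-\gamma) I_{gate}}$. The value of $\mu_{eff,buf} C_{OX} W_{buf,\min}/L_{buf,\min}$ is not given in this section; the figure used here is a hypothetical placeholder chosen so that the result falls in the range quoted above.

```python
import math

B_M = 7.47e-20       # b^M from Table 7.5
V_SWING = 0.7        # volts
C_LOAD = 1e-12       # farads
I_GATE = 30e-6       # amperes
K_BUF_MIN = 1.6e-4   # ASSUMED mu_eff,buf * C_OX * W_buf,min / L_buf,min in A/V^2


def delay_speed_buffer(gamma, i_gate=I_GATE, c_load=C_LOAD,
                       v_swing=V_SWING, k_buf=K_BUF_MIN, b=B_M):
    """Dominant terms of (7.49) for w = 1: PMOS-limited gate plus loaded buffer."""
    i_ss = gamma * i_gate
    g_m = math.sqrt(k_buf * (1.0 - gamma) * i_gate)   # I_SF = 0.5 (1 - gamma) I_gate
    return 0.35 * b * v_swing**2 / i_ss**2 + 0.69 * c_load / g_m


def argmin_gamma(steps=9999):
    """Brute-force search of the gamma in (0, 1) that minimizes the delay."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return min(grid, key=delay_speed_buffer)


if __name__ == "__main__":
    g_best = argmin_gamma()
    print(f"gamma at minimum = {g_best:.3f}")
    print(f"minimum delay    = {delay_speed_buffer(g_best) * 1e9:.2f} ns")
```

The buffer term varies only as $(1-\gamma)^{-1/2}$, so the position of the minimum is set mainly by how quickly the internal-gate term falls with $\gamma$.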


Fig. 7.5. Delay of an SCL gate with buffer versus $\gamma$ for $C_{L}=1 \mathrm{pF}$ and $I_{\text {gate }}=30 \mu \mathrm{~A}$.

Once factor $\gamma$ is optimized, a further optimization is possible by properly setting the normalized transistor channel width, $w$, after substituting $I_{S S}=\gamma_{\text {opt }} I_{\text {gate }}$ and $I_{S F}=0.5\left(1-\gamma_{o p t}\right) I_{\text {gate }}$. Intuitively, the contribution of the load capacitance to delay (7.49) can be reduced by increasing $w$, which in turn increases the terms associated with $C_{gd,buf}$ and $C_{gs,buf}$. Hence, an optimum value of $w$ exists, and can be found by differentiating (7.49) with respect to $w$

$$
\begin{align*}
\frac{\partial}{\partial w} \tau_{P D, S C L b u f} & \approx \frac{\partial}{\partial w}\left[0.35 \frac{V_{\text {SWING }}}{\gamma_{\text {opt }} I_{\text {gate }}}\left(C_{g d, \text { buf }, \text { min }}+h C_{g s, b u f, \text { min }}\right) w+\frac{0.69 C_{L}}{\sqrt{w} g_{m, b u f, \text { min }}}\right] \\
& \approx 0.35 \frac{V_{\text {SWING }}}{\gamma_{\text {opt }} I_{\text {gate }}}\left(C_{g d, b u f, \text { min }}+h C_{g s, b u f, \text { min }}\right)  \tag{7.53}\\
& -\frac{0.69 C_{L}}{2 w^{\frac{3}{2}} \sqrt{\mu_{e f f, b u f} C_{O X} \frac{W_{b u f, \text { min }}}{L_{b u f, \text { min }}}\left(1-\gamma_{\text {opt }}\right) I_{\text {gate }}}}
\end{align*}
$$

Setting (7.53) to zero and solving for $w$ leads to its optimum value, $w_{\text {opt }}$, that minimizes delay

$$
\begin{equation*}
w_{o p t} \approx \sqrt[3]{\frac{\gamma_{o p t}^{2}}{1-\gamma_{o p t}} \frac{C_{L}^{2} I_{g a t e}}{\mu_{e f f, b u f} C_{O X} \frac{W_{b u f, \text { min }}}{L_{b u f, \text { min }}} V_{S W I N G}^{2}\left(C_{g d, b u f, \text { min }}+h C_{g s, b u f, \text { min }}\right)^{2}}} \tag{7.54}
\end{equation*}
$$

In the cases of interest, (7.54) exhibits an error lower than $25 \%$ with respect to numerical evaluation, and the delay is within $9 \%$ of numerically minimized results thanks to the flat minimum. This can be noticed in Fig. 7.6, which shows the delay of an SCL gate versus $w$ assuming $C_{L}=1 \mathrm{pF}$, $I_{\text {gate }}=30 \mu \mathrm{~A}$, under the conditions explained in the previous design examples. In this case, equation (7.54) gives $w=68$, while the exact value is 63, and the corresponding delay values are 2.65 ns and 2.74 ns, respectively, differing by $3.3 \%$.
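The closed form (7.54) and the derivative (7.53) can be checked against each other with a few lines of Python. The following is only a sketch: the argument names (`k_buf_min` for $\mu_{eff,buf} C_{OX} W_{buf,min}/L_{buf,min}$ and `c_buf_min` for $C_{gd,buf,min}+h C_{gs,buf,min}$) and the numerical values in the usage lines are illustrative assumptions, not data of the process considered here.

```python
import math


def w_opt(gamma, c_load, i_gate, k_buf_min, v_swing, c_buf_min):
    """Closed-form optimum normalized buffer width, as in (7.54)."""
    num = gamma ** 2 * c_load ** 2 * i_gate
    den = (1.0 - gamma) * k_buf_min * v_swing ** 2 * c_buf_min ** 2
    return (num / den) ** (1.0 / 3.0)


def ddelay_dw(w, gamma, c_load, i_gate, k_buf_min, v_swing, c_buf_min):
    """Derivative of the w-dependent delay terms, as in (7.53)."""
    # Term that grows with w (buffer input capacitance loading the SCL gate).
    rising = 0.35 * v_swing / (gamma * i_gate) * c_buf_min
    # Transconductance of the minimum-size buffer transistor.
    gm_min = math.sqrt(k_buf_min * (1.0 - gamma) * i_gate)
    # Term that shrinks with w (load capacitance driven by the buffer).
    falling = 0.69 * c_load / (2.0 * w ** 1.5 * gm_min)
    return rising - falling


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    params = dict(gamma=0.18, c_load=1e-12, i_gate=30e-6,
                  k_buf_min=3e-4, v_swing=0.7, c_buf_min=1e-15)
    w = w_opt(**params)
    print("w_opt from (7.54):", w)
    print("residual of (7.53) at w_opt:", ddelay_dw(w, **params))
```

Because (7.54) replaces the factor $0.69/0.70$ by unity, the exact root of (7.53) is smaller than $w_{opt}$ by the factor $(0.69/0.70)^{2/3}$, that is, by less than $1 \%$.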


Fig. 7.6. Example of delay of an SCL gate with buffer versus $w$ for $C_{L}=1 \mathrm{pF}$ and $I_{\text {gate }}=30 \mu \mathrm{~A}$.

It is worth noting that the optimization of buffer transistors' size has led to a delay reduction by a factor of four, as compared to the case with minimum devices treated before in this subsection.

In general, (7.54) provides values in the order of several tens, which are not always acceptable due to the area increase. However, the minimum of $\tau_{P D, S C L b u f}$ with respect to $w$ is very flat, as shown in Fig. 7.6, therefore $w$ can be reduced without a significant delay increase. Typically, reducing $w$ by a factor of two leads to a delay increase of $10 \%$, while with a factor of four the delay increase is about $30 \%$. To accurately estimate the delay increase for a given reduction of $w$ with respect to (7.54), it is preferable to resort to (7.49).
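The flatness of the minimum can be made explicit with a short calculation. Considering only the two $w$-dependent terms retained in (7.53), which have the form $A w+B / \sqrt{w}$, the optimum condition gives $B=2 A w_{opt}^{3/2}$, so the ratio of this delay contribution at $w=x\, w_{opt}$ to its minimum is $(x+2 / \sqrt{x}) / 3$. The sketch below evaluates this ratio; the function name is an assumption introduced for illustration.

```python
import math


def delay_penalty(x):
    """Ratio of the w-dependent delay at w = x * w_opt to its minimum value.

    Derived from A*w + B/sqrt(w) with B = 2*A*w_opt**1.5 at the optimum.
    """
    return (x + 2.0 / math.sqrt(x)) / 3.0


if __name__ == "__main__":
    for x in (1.0, 0.5, 0.25):
        print(f"w = {x:.2f} * w_opt -> penalty = {delay_penalty(x):.3f}")
```

Halving $w$ gives a ratio of about 1.11, and reducing it by a factor of four gives about 1.42. If the remaining terms of (7.49) are positive and independent of $w$, the relative increase of the total delay is smaller than these figures, which is consistent with the typical values quoted above.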

### 7.6 OPTIMIZED DESIGN OF THE SOURCE-COUPLED MUX/XOR AND D LATCH

As anticipated in Section 7.2, delay of SCL gates can be expressed in the form reported in relationship (7.2). In particular, in the following this will also be demonstrated for MUX, XOR and D latch gates.

TABLE 7.6

| Coefficient | Expression |
| :---: | :---: |
| $a_{g d, 1}^{M}$ | $4 A_{V}^{2} C_{g d, n} \frac{L_{n, \min }}{\mu_{e f f, n} C_{O X}}$ |
| $a_{d b, 1}^{M}{ }^{\dagger}$ | $4 A_{V}^{2}\left(K_{j, n} C_{j, n} L_{d, n}+2 K_{j s w, n} C_{j s w, n}\right) \frac{L_{n, \min }}{\mu_{e f f, n} C_{O X}}$ |
| $c_{d b, 1}^{M}{ }^{\dagger}$ | $2 K_{j s w, n} C_{j s w, n} L_{d, n}$ |
| $a_{g s, 3}^{M}$ | $\frac{8}{3} A_{V}^{2} \frac{L_{n, \min }^{2}}{\mu_{e f f, n}}$ |
| $b_{g d, p}^{M}$ | $\frac{3}{8} A_{b u l k, \max } \mu_{e f f, p} C_{O X}^{2} W_{p, \min }^{2}\left(V_{D D}-\left\|V_{T, p}\right\|\right)$ |
| $c_{g d, p}^{M}$ | $C_{g d 0, p} W_{p, \min }-\frac{3}{4} A_{b u l k, \max } \mu_{e f f, p} C_{O X}^{2} W_{p, \min }\left(V_{D D}-\left\|V_{T, p}\right\|\right) R_{D S W} 10^{-6}$ |
| $c_{d b, p}^{M}$ | $K_{j, p} C_{j, p} L_{d, p} W_{p, \min }+2 K_{j s w, p} C_{j s w, p}\left(L_{d, p}+W_{p, \min }\right)$ |

(zero-valued coefficients are not reported)
${ }^{\dagger}$ Expressions of $a_{d b, 3}^{M}, b_{d b, 3}^{M}$ and $c_{d b, 3}^{M}$ differ from $a_{d b, 1}^{M}, b_{d b, 1}^{M}$ and $c_{d b, 1}^{M}$ only in the value of coefficients $K$, due to different bias conditions.

### 7.6.1 MUX/XOR delay expression versus bias current and logic swing with the lower transistors switching

As discussed in Chapter 2, the static parameters $V_{S W I N G}, A_{V}$ and $N M$ of a generic SCL gate are the same as those of an inverter with equal bias current and transistors' aspect ratios. Thus, the criteria introduced in Section 7.3 to size the transistors for assigned values of these parameters still apply, and so do the design equations for the transistors' aspect ratios reported in Table 7.1 for the three biasing regions $\mathrm{L}, \mathrm{M}$ and H (obviously, their boundaries (7.16) and (7.21) are equal to those of the inverter). Accordingly, the capacitances in the delay model (6.11) of the MUX/XOR SCL gate can easily be shown to have the same form as relationship (7.25). To be more specific, the non-zero coefficients $a_{x y}^{M}, b_{x y}^{M}$ and $c_{x y}^{M}$ of the capacitances in (7.25) in region M for the MUX/XOR gate with input applied to the lower transistors are reported in Table 7.6.

As already observed for the SCL inverter, Table 7.6 shows that coefficients $a_{x y}^{M}, b_{x y}^{M}$ and $c_{x y}^{M}$ of the MUX/XOR gate only depend on $A_{V}$, the process used and supply voltage, hence in the design they are constant. Relationship (7.25) with coefficients reported in Table 7.6 can be profitably used to express the sum of capacitances in (6.11), that becomes

$$
\begin{align*}
& 3 C_{g d, 1}+2 C_{d b, 3}+C_{g d, p}+C_{d b, p}+\frac{2}{A_{V}}\left(2 C_{g d, 1}+3 C_{d b, 1}+C_{g s, 3}\right)  \tag{7.55}\\
& =\frac{a^{M}}{V_{S W I N G}^{2}} I_{S S}+b^{M} \frac{V_{S W I N G}}{I_{S S}}+c^{M}
\end{align*}
$$

where coefficients $a^{M}, b^{M}$ and $c^{M}$ are defined as

$$
\begin{align*}
a^{M} & =3 a_{g d, 1}^{M}+2 a_{d b, 3}^{M}+a_{g d, p}^{M}+a_{d b, p}^{M}+\frac{2}{A_{V}}\left(2 a_{g d, 1}^{M}+3 a_{d b, 1}^{M}+a_{g s, 3}^{M}\right)  \tag{7.56a}\\
& =3 a_{g d, 1}^{M}+2 a_{d b, 3}^{M}+\frac{2}{A_{V}}\left(2 a_{g d, 1}^{M}+3 a_{d b, 1}^{M}+a_{g s, 3}^{M}\right)
\end{align*}
$$

$$
\begin{align*}
b^{M} & =3 b_{g d, 1}^{M}+2 b_{d b, 3}^{M}+b_{g d, p}^{M}+b_{d b, p}^{M}+\frac{2}{A_{V}}\left(2 b_{g d, 1}^{M}+3 b_{d b, 1}^{M}+b_{g s, 3}^{M}\right) \\
& =b_{g d, p}^{M}  \tag{7.56b}\\
c^{M} & =3 c_{g d, 1}^{M}+2 c_{d b, 3}^{M}+c_{g d, p}^{M}+c_{d b, p}^{M}+\frac{2}{A_{V}}\left(2 c_{g d, 1}^{M}+3 c_{d b, 1}^{M}+c_{g s, 3}^{M}\right)  \tag{7.56c}\\
& =2 c_{d b, 3}^{M}+c_{g d, p}^{M}+c_{d b, p}^{M}+\frac{6}{A_{V}} c_{d b, 1}^{M}
\end{align*}
$$

and where coefficients equal to zero were omitted. By substituting relationships (7.55)-(7.56) and (2.43) into (6.11), the delay of a MUX/XOR biased in region M can be expressed as in relationship (7.30) found for the inverter gate. For the $0.35-\mu \mathrm{m}$ CMOS process considered and assuming $V_{D D}=3.3 \mathrm{~V}$ and $A_{V}=4$, evaluation of relationships (7.56) leads to $a^{M}=8.96 \mathrm{E}-10$, $b^{M}=7.47 \mathrm{E}-20$, $c^{M}=2.93 \mathrm{E}-15$, as reported in Table 7.9.

Again, when the gate is biased in region L, capacitances can be expressed as in (7.25), thus delay can be written in the same form (7.2), in which coefficients $a, b$ and $c$ are given by relationship (7.57) and data in Table 7.7.

$$
\begin{align*}
a^{L} & =3 a_{g d, 1}^{L}+2 a_{d b, 3}^{L}+a_{g d, p}^{L}+a_{d b, p}^{L}+\frac{2}{A_{V}}\left(2 a_{g d, 1}^{L}+3 a_{d b, 1}^{L}+a_{g s, 3}^{L}\right)  \tag{7.57a}\\
& =0 \\
b^{L} & =3 b_{g d, 1}^{L}+2 b_{d b, 3}^{L}+b_{g d, p}^{L}+b_{d b, p}^{L}+\frac{2}{A_{V}}\left(2 b_{g d, 1}^{L}+3 b_{d b, 1}^{L}+b_{g s, 3}^{L}\right)  \tag{7.57b}\\
& =b_{g d, p}^{L} \\
c^{L} & =3 c_{g d, 1}^{L}+2 c_{d b, 3}^{L}+c_{g d, p}^{L}+c_{d b, p}^{L}+\frac{2}{A_{V}}\left(2 c_{g d, 1}^{L}+3 c_{d b, 1}^{L}+c_{g s, 3}^{L}\right) \tag{7.57c}
\end{align*}
$$

TABLE 7.7

| Coefficient | Expression |
| :---: | :---: |
| $c_{g d, 1}^{L}$ | $C_{g d 0, n} W_{n, \min }$ |
| $c_{d b, 1}^{L}{ }^{\dagger}$ | $K_{j, n} C_{j, n} L_{d, n} W_{n, \min }+2 K_{j s w, n} C_{j s w, n}\left(L_{d, n}+W_{n, \min }\right)$ |
| $c_{g s, 3}^{L}$ | $\frac{2}{3} W_{n, \min } L_{n, \min } C_{O X}$ |
| $b_{g d, p}^{L}$ | $b_{g d, p}^{M}$ |
| $c_{g d, p}^{L}$ | $c_{g d, p}^{M}$ |
| $c_{d b, p}^{L}$ | $c_{d b, p}^{M}$ |

(zero-valued coefficients are not reported)
${ }^{\dagger}$ The expression of $c_{d b, 3}^{L}$ differs from that of $c_{d b, 1}^{L}$ only in the value of coefficients $K$, due to different bias conditions.

Analogously, when the gate is biased in region $H$, delay can be written in the form (7.2), where coefficients $a, b$ and $c$ are given by relationship (7.58) and data in Table 7.8.

$$
\begin{align*}
a^{H} & =3 a_{g d, 1}^{H}+2 a_{d b, 3}^{H}+a_{g d, p}^{H}+a_{d b, p}^{H}+\frac{2}{A_{V}}\left(2 a_{g d, 1}^{H}+3 a_{d b, 1}^{H}+a_{g s, 3}^{H}\right)  \tag{7.58a}\\
b^{H} & =3 b_{g d, 1}^{H}+2 b_{d b, 3}^{H}+b_{g d, p}^{H}+b_{d b, p}^{H}+\frac{2}{A_{V}}\left(2 b_{g d, 1}^{H}+3 b_{d b, 1}^{H}+b_{g s, 3}^{H}\right)  \tag{7.58b}\\
& =0 \\
c^{H} & =3 c_{g d, 1}^{H}+2 c_{d b, 3}^{H}+c_{g d, p}^{H}+c_{d b, p}^{H}+\frac{2}{A_{V}}\left(2 c_{g d, 1}^{H}+3 c_{d b, 1}^{H}+c_{g s, 3}^{H}\right) \\
& =2 c_{d b, 3}^{H}+c_{d b, p}^{H}+\frac{6}{A_{V}} c_{d b, 1}^{H} \tag{7.58c}
\end{align*}
$$

Numerical values of the coefficients in the three biasing regions for the $0.35-\mu \mathrm{m}$ CMOS process considered, assuming $V_{D D}=3.3 \mathrm{~V}$ and $A_{V}=4$, are reported in Table 7.9. By comparing data in Tables 7.5 and 7.9, it is apparent that coefficient $a^{M}$ of the MUX/XOR gate is about four times that of the inverter gate, while $b^{M}$ is the same.

TABLE 7.8

| Coefficient | Expression |
| :---: | :---: |
| $a_{g d, 1}^{H}$ | $a_{g d, 1}^{M}$ |
| $a_{d b, 1}^{H}$ | $a_{d b, 1}^{M}$ |
| $c_{d b, 1}^{H}$ | $c_{d b, 1}^{M}$ |
| $a_{g s, 3}^{H}$ | $a_{g s, 3}^{M}$ |
| $a_{g d, p}^{H}$ | $2\left(C_{g d 0, p}+\frac{3}{4} A_{b u l k, \max } C_{O X} L_{p, \min }\right) \cdot \frac{L_{p, \min }}{\mu_{e f f, p} C_{O X}\left(V_{D D}-\left\|V_{T, p}\right\|\right)\left[1-\frac{\mu_{e f f, p} C_{O X}}{L_{p, \min }} R_{D S W} 10^{-6}\left(V_{D D}-\left\|V_{T, p}\right\|\right)\right]} \cdot V_{S W I N G}$ |
| $a_{d b, p}^{H}$ | $2\left(K_{j, p} C_{j, p} L_{d, p}+2 K_{j s w, p} C_{j s w, p}\right) \cdot \frac{L_{p, \min }}{\mu_{e f f, p} C_{O X}\left(V_{D D}-\left\|V_{T, p}\right\|\right)\left[1-\frac{\mu_{e f f, p} C_{O X}}{L_{p, \min }} R_{D S W} 10^{-6}\left(V_{D D}-\left\|V_{T, p}\right\|\right)\right]} \cdot V_{S W I N G}$ |
| $c_{d b, p}^{H}$ | $2 K_{j s w, p} C_{j s w, p} L_{d, p}$ |

(zero-valued coefficients are not reported)

TABLE 7.9

| L | M | H |
| :--- | :--- | :--- |
| $a^{L}=0$ | $a^{M}=8.96 \mathrm{E}-10$ | $a^{H}=9.4 \mathrm{E}-10$ |
| $b^{L}=7.47 \mathrm{E}-20$ | $b^{M}=7.47 \mathrm{E}-20$ | $b^{H}=0$ |
| $c^{L}=5.61 \mathrm{E}-15$ | $c^{M}=2.93 \mathrm{E}-15$ | $c^{H}=2.35 \mathrm{E}-15$ |

For the same reasons discussed for the inverter, the delay expression derived in region M can be extended to the other regions. As an example, for the $0.35-\mu \mathrm{m}$ CMOS process used and for $A_{V}=4$, $V_{S W I N G}=700 \mathrm{mV}$, the error of the delay model derived in region M with respect to the expressions rigorously derived in each region is plotted versus $I_{S S}$ (reported in logarithmic scale) in Fig. 7.7, assuming the worst case $C_{L}=0$. Even for impractically high values of $I_{S S}$, the error is always lower than $8 \%$, and it rapidly decreases to a few percentage points for more realistic load capacitance values.


Fig. 7.7. Error of the delay derived in region M with respect to the expressions rigorously derived in regions L and H versus $I_{S S}$ for the worst case $C_{L}=0 \mathrm{~F}$.

Now, let us give a circuit interpretation of the terms in the delay model of an SCL MUX/XOR gate, as done for the inverter in Section 7.4.3. Inspection of Table 7.6 shows that the term $a=a^{M}$ in (7.2) is due only to NMOS transistors, $b=b^{M}$ only to PMOS transistors and $c=c^{M}$ to both devices. Since the last one can be neglected with respect to the other two addends ${ }^{4}$, by using eq. (7.55) the terms $a^{M} I_{S S} / V_{S W I N G}^{2}$ and $b^{M} V_{S W I N G} / I_{S S}$ model the NMOS and PMOS transistors' capacitances, respectively. To be more specific, each of these capacitive terms in (7.2) is multiplied by $0.35 V_{S W I N G} / I_{S S}=0.69 R_{D}$, where $R_{D}$ is the equivalent resistance seen from the output node to ground, thus they make the same contribution to the delay as if all the corresponding capacitances were lumped at the output node. As a consequence, terms $a^{M} I_{S S} / V_{S W I N G}^{2}$ and $b^{M} V_{S W I N G} / I_{S S}$ represent the equivalent capacitance at the output node associated with NMOS and PMOS transistors, respectively. The interpretation in terms of dominant capacitances given for the inverter in Section 7.4.3 for the power-efficient, high-speed and low-power design criteria still holds (for example, the power-efficient criterion leads to equal contributions to delay of NMOS and PMOS capacitances).
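The circuit interpretation above can be illustrated numerically. The sketch below evaluates the equivalent NMOS and PMOS capacitances at the output node and the bias current at which they are equal, using the region-M coefficients of Table 7.9. It assumes SI units for the coefficients and a delay of the form $0.35\left(V_{SWING}/I_{SS}\right)$ times the sum of the capacitive terms plus an additive load $C_{L}$, as suggested by (7.55) and (7.60); function names and the operating point are illustrative.

```python
import math


def equivalent_capacitances(a, b, c, v_swing, i_ss):
    """Equivalent output-node capacitances (NMOS, PMOS, constant term)."""
    c_nmos = a * i_ss / v_swing ** 2
    c_pmos = b * v_swing / i_ss
    return c_nmos, c_pmos, c


def delay(a, b, c, v_swing, i_ss, c_load=0.0):
    """Delay assumed as 0.35 * V_SWING / I_SS times the total capacitance."""
    c_nmos, c_pmos, c_const = equivalent_capacitances(a, b, c, v_swing, i_ss)
    return 0.35 * v_swing / i_ss * (c_nmos + c_pmos + c_const + c_load)


def crossover_current(a, b, v_swing):
    """Bias current at which NMOS and PMOS contributions are equal."""
    return math.sqrt(b * v_swing ** 3 / a)


if __name__ == "__main__":
    a_m, b_m, c_m = 8.96e-10, 7.47e-20, 2.93e-15  # Table 7.9, region M
    v_swing = 0.7                                  # V, as in the Fig. 7.7 example
    i_x = crossover_current(a_m, b_m, v_swing)
    print("crossover current [A]:", i_x)
    print("capacitances at 50 uA [F]:",
          equivalent_capacitances(a_m, b_m, c_m, v_swing, 50e-6))
```

With these coefficients and $V_{SWING}=700 \mathrm{mV}$, the two contributions are equal at a bias current of about $5.3 \mu \mathrm{A}$; above it the NMOS term dominates, below it the PMOS term does.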

### 7.6.2 MUX/XOR delay expression versus bias current and logic swing for input applied to upper transistors

Let us evaluate the delay when the input signal is applied to the transistors at the upper level (M3-M6 in Figs. 6.11 and 6.12). As observed in Section 6.4.3, this delay is equal to that of an inverter with the modified capacitive contributions in (6.12). Moreover, as discussed in the previous section, the delay can be approximated by that in region M regardless of the biasing region. Thus, the delay is again written in the form (7.2) with coefficients $a, b$ and $c$ equal to

$$
\begin{align*}
& a^{M}=2 a_{g d, n}^{M}+2 a_{d b, n}^{M}+a_{g d, p}^{M}+a_{d b, p}^{M}=2 a_{g d, n}^{M}+2 a_{d b, n}^{M}  \tag{7.59a}\\
& b^{M}=2 b_{g d, n}^{M}+2 b_{d b, n}^{M}+b_{g d, p}^{M}+b_{d b, p}^{M}=b_{g d, p}^{M}  \tag{7.59b}\\
& c^{M}=2 c_{g d, n}^{M}+2 c_{d b, n}^{M}+c_{g d, p}^{M}+c_{d b, p}^{M}=2 c_{d b, n}^{M}+c_{g d, p}^{M}+c_{d b, p}^{M} \tag{7.59c}
\end{align*}
$$

where (7.25) was used to express capacitances. By comparing (7.59) and coefficients (7.29) of the inverter, it is apparent that coefficient $a^{M}$ of the MUX/XOR gate is twice that of the inverter, while $b^{M}$ has the same value. This means that the optimum bias current for a power-efficient or a high-speed design is lower than that of the inverter gate. The error of (7.2) using (7.59) compared to simulations was found to be equal to that of the inverter previously discussed.

### 7.6.3 Delay dependence on logic swing

Until now, the delay dependence on bias current has been evaluated for an assigned value of logic swing. For the sake of completeness, let us consider the delay dependence on the logic swing for a given bias current $I_{S S}$. From the general SCL delay expression (7.2), it is apparent that an optimum value of logic swing can be found by differentiating with respect to $V_{\text {SWING }}$ and setting the result to zero. The resulting optimum value of $V_{S W I N G}$ that minimizes delay for a bias current $I_{S S}$ is the solution of the following equation

$$
\begin{equation*}
\frac{2 b^{M}}{I_{S S}^{2}} V_{S W I N G}^{3}+\frac{c^{M}+C_{L}}{I_{S S}} V_{S W I N G}^{2}-a^{M}=0 \tag{7.60}
\end{equation*}
$$

which in general cannot be solved in closed form, and thus requires numerical analysis. For example, for $I_{S S}=50 \mu \mathrm{~A}$ and $C_{L}=200 \mathrm{fF}$, solving (7.60) for the MUX/XOR (with coefficients in Table 7.9) assuming an input applied to the lower transistors results in 221 mV. Simulations confirm this result, as can be deduced from Fig. 7.8, which reports the delay versus the logic swing: the minimum delay is achieved for $V_{\text {SWING }}$ equal to about 250 mV.


Fig. 7.8. MUX/XOR delay versus logic swing for $I_{S S}=50 \mu \mathrm{~A}$ and $C_{L}=200 \mathrm{fF}$.

To validate the dependence of the delay in (7.2) on the logic swing, simulations were also performed for $C_{L}$ equal to $0 \mathrm{~F}, 50 \mathrm{fF}, 200 \mathrm{fF}$ and 1 pF, with bias current equal to $5 \mu \mathrm{~A}, 20 \mu \mathrm{~A}, 50 \mu \mathrm{~A}$ and $100 \mu \mathrm{~A}$ (the resulting curves are not shown for the sake of brevity). The maximum error of (7.2) using coefficients in (7.56) is $17 \%$, and typically much lower, thus confirming the validity of the model discussed.

Even though in general relationship (7.60) cannot be solved in closed form, in practical design cases the optimum value of $V_{\text {SWING }}$ can be easily found. Indeed, it can be seen that the optimum $V_{\text {SWING }}$ tends to zero for $I_{S S} \rightarrow 0$, thus according to the low-power design considerations in Section 7.2.3, the logic swing is given by relationship (7.13). Moreover, the optimum $V_{\text {SWING }}$ tends to infinity for $I_{S S} \rightarrow \infty$, thus according to the high-speed design considerations in Section 7.2.2, the logic swing is given by relationship (7.14). For intermediate bias current values, the optimum value of $V_{\text {SWING }}$ belongs to the acceptable range $\left[V_{S W I N G, \min }, V_{S W I N G, \max }\right]$.
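The two limits can be checked directly on (7.60) with a dominant-balance argument. The following is a sketch, under the assumption that $a^{M}$, $b^{M}$ and $c^{M}+C_{L}$ are all positive. For small bias currents the cubic term balances $a^{M}$, while for large bias currents the quadratic term does:

$$
\begin{align*}
I_{S S} \rightarrow 0: & \quad \frac{2 b^{M}}{I_{S S}^{2}} V_{S W I N G}^{3} \approx a^{M} \;\Rightarrow\; V_{S W I N G, o p t} \approx\left(\frac{a^{M} I_{S S}^{2}}{2 b^{M}}\right)^{1 / 3} \rightarrow 0 \\
I_{S S} \rightarrow \infty: & \quad \frac{c^{M}+C_{L}}{I_{S S}} V_{S W I N G}^{2} \approx a^{M} \;\Rightarrow\; V_{S W I N G, o p t} \approx \sqrt{\frac{a^{M} I_{S S}}{c^{M}+C_{L}}} \rightarrow \infty
\end{align*}
$$

In the first case the neglected quadratic term scales as $I_{S S}^{1 / 3}$, and in the second case the neglected cubic term scales as $I_{S S}^{-1 / 2}$, so each approximation is self-consistent in its own limit.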

### 7.6.4 Extension to D latch

The D latch delay can be found as for the MUX/XOR gate, first by assuming the gate to be biased in region M, and then extending the result to the other regions in a similar manner. As discussed in Section 6.5, the delay of a D latch gate is equal to that of a MUX/XOR with an equivalent load capacitance (6.14) that accounts for the input capacitance of a source-coupled NMOS pair (6.13). By substituting relationship (7.25) and data in Table 7.6, input capacitance (6.13) can be expressed ${ }^{5}$ as the gate-source capacitance of an NMOS transistor working in the saturation region, e.g. $C_{g s, 3}$

$$
\begin{equation*}
C_{\text {input }}=C_{g s, 3}=\frac{a_{g s, 3}^{M}}{V_{S W I N G}^{2}} I_{S S} \tag{7.61}
\end{equation*}
$$

where from Table 7.6 it was observed that $b_{g s, 3}^{M}=c_{g s, 3}^{M}=0$. By substituting (7.61) into (6.14), and both into the delay expression (7.2), it is apparent that the D latch delay has the same expression as the MUX/XOR with coefficient $a^{M}$ being replaced by $a^{M^{\prime}}$

$$
\begin{equation*}
a^{M^{\prime}}=a^{M}+a_{g s, 3}^{M} \tag{7.62}
\end{equation*}
$$

which evaluates to $1.06 \mathrm{E}-9$ under the conditions discussed above, in the case with the switching input applied to the lower transistors. Analogously, coefficient (7.62) evaluates to $4.52 \mathrm{E}-10$ when the input is applied to the upper transistors.

Simulation results validate the extension. Indeed, the error was found to be of the same order of magnitude as that of the inverter and MUX/XOR. In conclusion, except for the small change discussed, all the design strategies discussed for the MUX/XOR gates still hold.

### 7.7 OPTIMIZED DESIGN OF THE SOURCE-COUPLED MUX/XOR AND D LATCH WITH OUTPUT BUFFERS

Regarding the case with output buffers, the design techniques introduced in Section 7.5 for the inverter can be extended to other SCL gates in a straightforward way. Indeed, the delay of SCL gates with output buffers is always expressed by (7.42) regardless of the specific gate considered, being the sum of the internal SCL circuit and buffer delay contributions. The only difference between such gates is in the expression of coefficients $a^{M}, b^{M}$ and $c^{M}$ in the internal SCL delay. As a consequence, the same design criteria for the two practical design cases dealt with in Section 7.5 apply. Obviously, different gates have in general a different value of optimum bias current distribution $\gamma$, as well as a different optimum buffer transistors' aspect ratio factor $w$.

When buffers are used to implement a level shifter in a MUX/XOR or D latch gate, the optimum value of $\gamma$ in relationship (7.47) is the same as for the inverter, since coefficient $b^{M}$ in (7.47) has the same value for the three gates. In other words, the amount of the bias current used in the internal SCL gate (and buffers) for an assigned gate current in a MUX/XOR or D latch is the same as in the inverter. Moreover, the resulting minimum delay in (7.48) for an assigned $I_{\text {gate }}$ is the same for the MUX/XOR, D latch and inverter gates. This is because the internal SCL gate has the same bias current (7.47) and is usually biased in its low-power region (see Fig. 7.1), thus its delay is mainly due to PMOS capacitances that are independent of the gate for assigned values of $A_{V}, V_{S W I N G}$ and $I_{S S}$. From (7.47) it is apparent that the buffer delay does not depend on the gate considered, since its bias current (equal to $\gamma I_{\text {gate }}$ ) is the same for all gates for a given $I_{\text {gate }}$.

The same observations hold when buffers are used to improve speed performance. Indeed, optimum current distribution (7.52) depends only on coefficient $b^{M}$, thus $\gamma$ does not depend on the gate considered. In addition, optimum transistor aspect ratio (7.54) of the buffer and resulting minimum delay is the same, regardless of the specific gate analyzed.

### 7.8 COMPARISON OF GATES ANALYZED AND EXTENSION TO ARBITRARY SCL LOGIC GATES

In the previous section, it has been shown that the delay of various SCL gates can be written in the form (7.2), which simply shows the design tradeoffs among speed, power consumption and noise immunity. In particular, this has been demonstrated for the inverter, MUX, XOR and D latch by observing that parasitic capacitances associated with transistors can always be expressed as in relationship (7.25), as well as evaluating the equivalent resistance seen by each capacitance. By reiterating the considerations reported in Section 6.4.1, in general equivalent resistances are equal to $R_{D}$ (i.e. when a capacitance is connected to the output nodes) or $1 / G_{M}$ (when a capacitance is not connected to the output nodes). As a consequence, for an arbitrary SCL gate, time constants have the same dependence on logic swing and bias current as the gates considered. Since the delay is proportional to the sum of time constants, it follows that the delay dependence in an SCL gate on $V_{S W I N G}$ and $I_{S S}$ is independent of the specific SCL gate considered, and therefore (7.2) is always valid. The same observations hold for gates with output buffers, since their delay simply adds to that of the internal SCL gate, as shown in relationship (7.42).

Now, it is useful to compare performance achievable by the gates considered, in order to understand how performance depends on their complexity. In particular, assuming inputs to be applied at lower transistors (i.e., we are considering the worst-case delay), coefficients of gates considered in relationship (7.2) are summarized in Table 7.10. Since coefficient $a^{M}$ depends on NMOS transistors, it obviously increases when a more complex logic gate (i.e., including a greater number of NMOS transistors) is considered, as confirmed by Table 7.10. Therefore, the minimum delay achievable for a high-speed design, modeled with relationship (7.9), increases for more complex gates, while the bias current needed decreases, according to (7.10). Regarding coefficient $b^{M}$, it only depends on PMOS transistors, therefore it is independent of the specific gate considered for assigned values of $V_{S W I N G}$ and $I_{S S}$. As a result, the speed performance (7.11) in a low-power design is equal for all SCL gates, for given values of $V_{S W I N G}$ and $I_{S S}$. Finally, in a power-efficient design the required bias current $I_{S S, \text { opt PDP }}$ in (7.5) decreases when a more complex gate is considered, while power-delay product gets worse due to the increase in coefficient $a^{M}$ (this means that the delay increase is greater than the power saving).

TABLE 7.10

| Inverter | MUX/XOR <br> (lower transistors) | D latch |
| :--- | :--- | :--- |
| $a^{M}=2.26 \mathrm{E}-10$ | $a^{M}=8.96 \mathrm{E}-10$ | $a^{M}=1.06 \mathrm{E}-9$ |
| $b^{M}=7.47 \mathrm{E}-20$ | $b^{M}=7.47 \mathrm{E}-20$ | $b^{M}=7.47 \mathrm{E}-20$ |
| $c^{M}=1.82 \mathrm{E}-15$ | $c^{M}=2.93 \mathrm{E}-15$ | $c^{M}=2.93 \mathrm{E}-15$ |

Finally, it is useful to note that delay model (7.2) suggests an alternative approach to characterize SCL logic cells in terms of design trade-offs without analytically evaluating coefficients $a, b$ and $c$. In other words, the approach introduced for the accurate model of bipolar gates can also be applied to SCL gates. Indeed, coefficients $a, b$ and $c$ only depend on process parameters, supply voltage and voltage gain $A_{V}$. Therefore, if such parameters are preliminarily set before designing logic gates, it is possible to evaluate the coefficients by performing only three simulation runs for different bias current values, widely distributed to cover the low-power, power-efficient and high-speed regions in Fig. 7.1, for a given logic swing. To be more specific, the procedure based on the minimization of functional $S$ in Section 4.4 can be used. Once coefficients $a, b$ and $c$ are found, the delay dependence on bias current and logic swing is expressed by (7.2), and thus all design considerations discussed so far can be applied with no modification. This approach can be very useful when a library of standard cells implemented with a new process must be characterized in terms of the delay dependence on bias current and logic swing.
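As an illustration of the three-run characterization, the sketch below recovers $a$, $b$ and $c$ from three (bias current, delay) pairs at a fixed logic swing. It assumes the delay has the form $0.35\left[a / V_{SWING}+b V_{SWING}^{2} / I_{SS}^{2}+\left(c+C_{L}\right) V_{SWING} / I_{SS}\right]$, consistent with (7.55) and (7.60), which is linear in the coefficients; with exactly three runs at distinct currents, a least-squares fit reduces to solving a $3 \times 3$ linear system. The "simulated" delays here are synthetic, generated from the Table 7.9 values only to check the round trip, and all function names are assumptions.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def model_delay(a, b, c, v_swing, i_ss, c_load=0.0):
    """Assumed delay model, linear in the coefficients a, b and c."""
    return 0.35 * (a / v_swing
                   + b * v_swing ** 2 / i_ss ** 2
                   + (c + c_load) * v_swing / i_ss)


def extract_coefficients(currents, delays, v_swing, c_load=0.0):
    """Recover (a, b, c) from three delay values at distinct bias currents."""
    rows = [[0.35 / v_swing,
             0.35 * v_swing ** 2 / i ** 2,
             0.35 * v_swing / i] for i in currents]
    a, b, c_total = solve3(rows, list(delays))
    return a, b, c_total - c_load


if __name__ == "__main__":
    truth = (8.96e-10, 7.47e-20, 2.93e-15)  # Table 7.9, region M
    currents = [2e-6, 10e-6, 100e-6]        # hypothetical bias points
    delays = [model_delay(*truth, 0.7, i) for i in currents]  # synthetic data
    print(extract_coefficients(currents, delays, 0.7))
```

In practice the three delay values would come from circuit simulations, and using more than three bias points with a least-squares fit would make the extraction less sensitive to the error of any single run.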

## Chapter 8

## APPLICATIONS AND REMARKS ON CURRENT-MODE DIGITAL CIRCUITS

In the previous chapters, design and modeling strategies of Current-Mode gates have been developed for single logic gates. Actual circuits are made up of cascaded logic gates, therefore their overall delay is equal to the sum of the delay contributions of gates belonging to the path considered, each of which is evaluated by representing the following gate as an equivalent linear load capacitor. In this chapter some circuits currently used in typical applications, such as ring oscillators and frequency dividers, are explicitly analyzed in Sections 8.1 and 8.2, respectively, by applying results presented in previous chapters. To be more specific, their speed performance is evaluated as a function of process and design parameters. Besides, an alternative approach to implement CML gates with a low supply voltage is dealt with in Section 8.3.

Regarding the design of general logic circuits consisting of cascaded gates, criteria developed for single logic gates are generalized in Section 8.4 to relate design variables to the overall performance.

### 8.1 RING OSCILLATORS

Ring oscillators have recently attracted the interest of circuit designers [WKG94], [M971], [SL98], [HR99], [HLL99]. Indeed, they are widely used in PLLs that are key elements of RF circuits and microprocessors to provide synchronization, data recovery and perform frequency synthesis [R96][R961].

A ring oscillator is a closed-loop chain of equal inverter gates (or inverting amplifiers), as shown in Fig. 8.1, with a negative feedback to provide oscillation. To guarantee oscillation, the negative feedback is necessary but not sufficient. To allow the oscillation to start, the unique stationary state (in which all gates are biased around their logic threshold) has to be unstable, and this happens when the circuit linearized around the bias point exhibits a negative phase margin, $m_{\phi}$.


Fig. 8.1. Ring oscillator.

As done in the previous chapter, by assuming each (inverting) stage to be modeled by a single-pole transfer function with time constant $\tau$ (i.e., a transfer function $H(s)=-A_{V} /(1+s \tau)$ ), the loop gain $L(s)$ (evaluated as the opposite of the transfer function obtained by breaking the loop at the input of any stage) is

$$
\begin{equation*}
L(s)=(-1)^{n+1}\left[\frac{A_{V}}{1+s \tau}\right]^{n}=\left[\frac{A_{V}}{1+s \tau}\right]^{n} \tag{8.1}
\end{equation*}
$$

where term $(-1)^{n+1}$ must necessarily be positive to ensure a negative feedback, hence an odd number of inverting stages must be used. However, since Current-Mode gates have a differential output, an even number of stages can also be used, provided that the output nodes of one stage are inverted. Utilizing an even number of stages is particularly useful when both a reference periodic signal and its quadrature component must be generated.

By definition, the phase margin expressed in degrees is 180 plus the phase of the loop gain at the transition frequency (i.e., the frequency where the magnitude of $L(j \omega)$ is equal to unity). Hence, from relationship (8.1) we get

$$
\begin{equation*}
m_{\phi}=180+\angle L\left(j \omega_{t}\right)=180-n \cdot \angle\left(1+j \omega_{t} \tau\right) \tag{8.2}
\end{equation*}
$$

where the phase is denoted by symbol $\angle$, and the transition frequency $\omega_{t}$ is equal to

$$
\begin{equation*}
\omega_{t}=\frac{\sqrt{A_{V}^{2}-1}}{\tau} \approx \frac{A_{V}}{\tau} \tag{8.3}
\end{equation*}
$$

The simplification holds for $A_{V}$ sufficiently greater than unity, such that $A_{V}^{2} \gg 1$. Therefore, by substituting relationships (8.1) and (8.3) into (8.2), the phase margin becomes

$$
\begin{equation*}
m_{\phi} \approx 180-n \cdot \arctan \left(A_{V}\right) \tag{8.4}
\end{equation*}
$$

From relationship (8.4), to guarantee a negative phase margin, which causes oscillation, the number of stages $n$ must be sufficiently high, according to the following condition

$$
\begin{equation*}
n>\frac{180}{\arctan \left(A_{V}\right)} \tag{8.5}
\end{equation*}
$$

which only depends on the voltage gain $A_{V}$ of the inverter gate. The right-hand side is always greater than 2 (since $\arctan \left(A_{V}\right)<90^{\circ}$), thus in practical cases it is necessary to use at least three stages. It is worth noting that relationship (8.5) slightly overestimates the minimum number of stages because higher poles and positive zeros tend to reduce the phase margin. The right-hand side of relationship (8.5) is plotted versus $A_{V}$ in Fig. 8.2, whose inspection shows that three stages ensure that oscillation occurs for a voltage gain greater than 1.7. Extensive simulations show that oscillation does not generally take place for $n=2$, except for a few bias current values for which the oscillation amplitude is very low (in the order of 10 mV), due to the contribution of higher-order poles and zeros.
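Condition (8.5) is easy to evaluate numerically. The following sketch, with function names chosen here for illustration, computes the bound on $n$, the smallest integer number of stages that satisfies it, the approximate phase margin (8.4), and the minimum gain needed for a given number of stages. It relies on the same single-pole stage model and on the approximation $\omega_{t} \tau \approx A_{V}$ used above.

```python
import math


def stage_bound(a_v):
    """Right-hand side of (8.5): 180 / arctan(A_V), with the angle in degrees."""
    return 180.0 / math.degrees(math.atan(a_v))


def min_stages(a_v):
    """Smallest integer n strictly greater than the bound in (8.5)."""
    return math.floor(stage_bound(a_v)) + 1


def phase_margin_deg(n, a_v):
    """Approximate phase margin (8.4), in degrees."""
    return 180.0 - n * math.degrees(math.atan(a_v))


def min_gain(n):
    """Voltage gain above which n stages satisfy (8.5); requires n >= 3."""
    return math.tan(math.radians(180.0 / n))


if __name__ == "__main__":
    print("bound for A_V = 5:", stage_bound(5.0))
    print("stages needed for A_V = 5:", min_stages(5.0))
    print("phase margin, n = 3, A_V = 5:", phase_margin_deg(3, 5.0))
    print("minimum gain for n = 3:", min_gain(3))
```

For $A_{V}=5$ the bound is about 2.29, so three stages suffice, and for $n=3$ the minimum gain is $\tan 60^{\circ} \approx 1.73$, in agreement with the value of 1.7 read from Fig. 8.2.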

The oscillation frequency of the ring oscillator is given by [R96]

$$
\begin{equation*}
f=\frac{1}{2 n \tau_{P D}} \tag{8.6}
\end{equation*}
$$

where $\tau_{P D}$ is the propagation delay of the fundamental gate. Therefore, the frequency estimation is reduced to the evaluation of the gate delay, which has to be carried out by remembering that each gate has an input rise time equal to that at the output and is loaded by an equal stage. In the following, the delay model developed in the previous chapters is used to derive an oscillation frequency model for ring oscillators based on bipolar (Section 8.1.1) and CMOS (Section 8.1.4) Current-Mode gates. From a design point of view, this model makes it possible to predict the achievable performance before the design, to correctly design the ring oscillator and to predict the oscillation frequency inaccuracy due to tolerances in model parameters.


Fig. 8.2. Minimum value of $n$ given by relationship (8.5) versus $A_{V}$.

### 8.1.1 Bipolar CML ring oscillators

Let us consider a ring oscillator based on the CML inverter in Fig. 4.2, whose small-signal gain $A_{V}$ is given by relationship (2.10), and typically ranges from 4 to 10. Choosing the typical value $V_{S W I N G}=500 \mathrm{mV}$ for high-speed applications, $A_{V}$ is equal to 5 from (2.10), and thus, from Fig. 8.2, three stages are sufficient to ensure oscillation (relationship (8.5) gives $n>2.3$ ).

Now, let us evaluate the oscillation frequency (8.6), or equivalently the gate delay under the constraint that the input rise time is equal to that of the output waveform. Since the CML inverter gate has a first-order behavior with a time constant given by relationship (4.7) (actually, from (4.4) it must be divided by 0.69), the delay dependence on the input rise time of a one-pole system is analyzed in the following.

Consider a one-pole system with a linear ramp input $v_{i n}(t)$ having a rise time $T$ and a maximum value equal to unity, as represented by the following relationship

$$
v_{i n}(t)=\left\{\begin{array}{lc}
\frac{t}{T} & t \leq T  \tag{8.7}\\
1 & t>T
\end{array}\right.
$$

The output response of the system, $y\left(t_{n}\right)$, is given by [AP99]

$$
y\left(t_{n}\right)= \begin{cases}t_{n}-\tau_{n}\left(1-e^{-\frac{t_{n}}{\tau_{n}}}\right) & t_{n} \leq 1  \tag{8.8}\\ 1-\tau_{n}\left(e^{\frac{1}{\tau_{n}}}-1\right) e^{-\frac{t_{n}}{\tau_{n}}} & t_{n}>1\end{cases}
$$

where $t_{n}$ and $\tau_{n}$ are the time and the time constant normalized to $T$, respectively (i.e., $t_{n}=t / T$ and $\tau_{n}=\tau / T$ ).

The normalized propagation delay $\tau_{P D n}=\tau_{P D} / T$ is the difference between $\tau_{P D n, \text { out }}$ and $\tau_{P D n, \text { in }}$, which are the times (normalized to $T$) at which the output and the input, respectively, reach half of their final value. The output term $\tau_{P D n, \text { out }}$ can be evaluated by setting relationship (8.8) equal to 0.5, while the input term $\tau_{P D n, \text { in }}$ is equal to 0.5. It is worth noting that, although $\tau_{P D n, \text { out }}$ depends only on $\tau_{n}$, it is difficult to solve $y\left(\tau_{P D n, \text { out }}\right)=0.5$ for $\tau_{P D n, \text { out }}$ because (8.8) is a nonlinear function. Moreover, since we do not know in advance whether $\tau_{P D n, \text { out }}$ is greater or lower than unity, we have to solve both parts of relationship (8.8), and then discard the one without a practically meaningful solution.

In order to evaluate the relation between the delay and the time constant, the ratio $\tau_{P D n} / \tau_{n}$ versus $\tau_{n}$ is evaluated numerically and plotted in Fig. 8.3 (dashed line). As expected, if $\tau_{n} \rightarrow \infty$ (i.e., we have an ideal step input) the ratio $\tau_{P D n} / \tau_{n}$ tends to the well-known value 0.69. In the opposite case, when $\tau_{n} \rightarrow 0$, which is the case of a very slow input ramp, the ratio $\tau_{P D n} / \tau_{n}$ tends to 1. The latter result is reached by solving the equation in the case $t_{n} \leq 1$ and approximating $y\left(t_{n}\right) \cong t_{n}-\tau_{n}$ in (8.8), since the exponential term is negligible. From inspection of Fig. 8.3, if $T>\tau$, approximating the propagation delay under a ramp input with the value obtained assuming a step input can lead to an error greater than $30 \%$ (the error is lower than $5 \%$ for $T<\tau$).

To achieve a simple and useful closed-form expression of the ratio $\tau_{P D n} / \tau_{n}$, one must rewrite this ratio as $0.69\left[1+g\left(\tau_{n}\right)\right]$. Hence, we have to approximate $g\left(\tau_{n}\right)$ with an equivalent function which tends to zero and to 0.45 for $\tau_{n} \rightarrow+\infty$ and $\tau_{n} \rightarrow 0$, respectively. A suitable function $g\left(\tau_{n}\right)$ which satisfies the asymptotic behavior is

$$
\begin{equation*}
g\left(\tau_{n}\right)=0.45 \frac{1+A \tau_{n}}{1+B \tau_{n}^{2}} \tag{8.9}
\end{equation*}
$$

where parameters $A$ and $B$, found by a numerical fitting of the curve, are set equal to 0.66 and 7.8 , respectively. Thus, function $f\left(\tau_{n}\right)$ which approximates the ratio $\tau_{P D n} / \tau_{n}$ is given by

$$
\begin{equation*}
f\left(\tau_{n}\right)=0.69\left[1+0.45 \frac{1+0.66 \tau_{n}}{1+7.8 \tau_{n}^{2}}\right] \tag{8.10}
\end{equation*}
$$

Relationship (8.10) is plotted with a solid line in Fig. 8.3. The proposed approximation fits ratio $\tau_{P D n} / \tau_{n}$ well. Indeed, the error found, plotted in Fig. 8.4 , is always lower than $2 \%$.
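The comparison between the exact ratio and expression (8.10) can be reproduced numerically. The following is a minimal sketch, not the procedure used for Figs. 8.3 and 8.4: it assumes a plain bisection on $y\left(t_{n}\right)=0.5$, and the function names and the logarithmic grid are illustrative choices.

```python
import math


def ramp_response(t_n: float, tau_n: float) -> float:
    """One-pole response (8.8) to the saturated ramp (8.7), times normalized to T."""
    if t_n <= 1.0:
        return t_n - tau_n * (1.0 - math.exp(-t_n / tau_n))
    # Algebraically equal to 1 - tau_n*(exp(1/tau_n) - 1)*exp(-t_n/tau_n),
    # rewritten so that exp(1/tau_n) cannot overflow for small tau_n.
    return 1.0 - tau_n * (1.0 - math.exp(-1.0 / tau_n)) * math.exp(-(t_n - 1.0) / tau_n)


def exact_ratio(tau_n: float) -> float:
    """tau_PDn / tau_n obtained by bisection on y(t_n) = 0.5."""
    lo, hi = 0.5, 1.0 + 10.0 * tau_n  # y(lo) < 0.5 < y(hi) for any tau_n > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ramp_response(mid, tau_n) < 0.5:
            lo = mid
        else:
            hi = mid
    # The input crosses half of its final value at t_n = 0.5.
    return (0.5 * (lo + hi) - 0.5) / tau_n


def fitted_ratio(tau_n: float) -> float:
    """Closed-form approximation (8.10)."""
    return 0.69 * (1.0 + 0.45 * (1.0 + 0.66 * tau_n) / (1.0 + 7.8 * tau_n ** 2))


def worst_fit_error(points: int = 400) -> float:
    """Largest relative error of (8.10) on a log grid 0.01 <= tau_n <= 100."""
    grid = [10.0 ** (-2.0 + 4.0 * k / (points - 1)) for k in range(points)]
    return max(abs(fitted_ratio(t) / exact_ratio(t) - 1.0) for t in grid)
```

Calling `worst_fit_error()` gives the largest deviation over the chosen grid, while `exact_ratio` evaluated at very large and very small $\tau_{n}$ recovers the two asymptotes discussed above.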


Fig. 8.3. Ratio of $\tau_{P D n} / \tau_{n}$ versus $\tau_{n}$ : exact (dashed line); expression (8.10) (solid line).

Fig. 8.4. Percentage error of expression (8.10) versus $\tau_{n}$.

Denormalizing (8.10), the general expression for the propagation delay of a first-order system versus the input rise time $T$ and the time constant $\tau$ is equal to

$$
\begin{equation*}
\tau_{P D}=\tau f\left(\frac{\tau}{T}\right) \cong 0.69 \tau\left[1+0.45 \frac{1+0.66 \frac{\tau}{T}}{1+7.8\left(\frac{\tau}{T}\right)^{2}}\right] \tag{8.11}
\end{equation*}
$$

To estimate the oscillation frequency of a ring oscillator, we can further simplify relationship (8.11) by introducing the condition $T=2 \tau_{P D}$ that ensures equal output and input rise time values. Thus solving (8.11) for $\tau / \tau_{P D}$ we get

$$
\begin{equation*}
\tau_{P D}=0.8 \tau \tag{8.12}
\end{equation*}
$$

where the time constant $\tau$ is given by relationship (4.7) (more precisely, it is equal to relationship (4.7) divided by 0.69). Of course, relationship (8.12) must be evaluated by properly modeling the loading effect of the subsequent gate with its input capacitance (4.23), i.e., by setting $C_{L}$ equal to it in (4.7). The resulting expression of relationship (8.12) is

$$
\begin{align*}
\tau_{P D}= & 0.8\left[\frac{r_{e}+r_{b}}{1+g_{m} r_{e}} C_{b e}+r_{b} C_{b c i}\left(1+\frac{g_{m}\left(r_{c}+R_{C}\right)}{1+g_{m} r_{e}}\right)+\right. \\
& \left.+\left(r_{c}+R_{C}\right)\left(C_{b c i}+C_{b c x}+C_{c s}\right)+R_{C} C_{b c x}\left(1+\frac{V_{\text {SWING }}}{4 V_{T}}\right)\right] \tag{8.13}
\end{align*}
$$

which, since it refers to a CML gate, can also be written in the form (5.1) for design purposes.
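The coefficient 0.8 in (8.12) can be checked numerically. The sketch below assumes that the condition $T=2 \tau_{P D}$ is substituted into (8.11), so that the unknown $x=\tau_{P D} / \tau$ satisfies $x=f(1 /(2 x))$ with $f$ given by (8.10); the bisection interval and the function names are illustrative choices.

```python
def fitted_ratio(tau_n: float) -> float:
    """Closed-form approximation (8.10) of tau_PD / tau versus tau_n = tau / T."""
    return 0.69 * (1.0 + 0.45 * (1.0 + 0.66 * tau_n) / (1.0 + 7.8 * tau_n ** 2))


def ring_delay_ratio() -> float:
    """Solve x = f(1 / (2 x)) for x = tau_PD / tau, i.e. (8.11) with T = 2 tau_PD."""

    def residual(x: float) -> float:
        return x - fitted_ratio(1.0 / (2.0 * x))

    lo, hi = 0.69, 1.0  # residual(lo) < 0 < residual(hi)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The returned value is the self-consistent ratio $\tau_{P D} / \tau$ for a stage whose input rise time equals $2 \tau_{P D}$.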

To further increase the estimation accuracy of the oscillation frequency, we have to evaluate the loading effect in each stage due to the subsequent gate more precisely. Indeed, the assumption that one has only the load capacitance given by (4.23) overestimates the loading effect of the subsequent CML input impedance, as already observed in Section 4.8. To this aim, let us consider the equivalent half circuit of a CML gate in Fig. 8.5 and, for simplicity, neglect capacitances $C_{c s}$ and $C_{L}$.


Fig. 8.5. Equivalent linear circuit of the CML inverter.

From Fig. 8.5, the impedance seen from the input of the gate (i.e., the base of the transistor) has a third-order transfer function that, after neglecting higher order terms, can be approximated to the first order impedance

$$
\begin{equation*}
Z_{i n} \cong \frac{R_{i n}}{1+s R_{i n} C_{i n}} \tag{8.14}
\end{equation*}
$$

where

$$
\begin{align*}
& R_{i n} \cong r_{\pi}\left(1+g_{m} r_{e}\right)  \tag{8.15}\\
& C_{i n} \cong \frac{C_{b e}+C_{b c i}+C_{b c x}\left(1+\frac{V_{S W I N G}}{4 V_{T}}\right)}{1+g_{m} r_{e}} \tag{8.16}
\end{align*}
$$

in which both $r_{e}$ and $r_{b}$ are assumed to be much lower than $r_{\pi}$, as usually occurs. It is apparent that (8.14) represents the impedance of resistance $R_{\text {in }}$ in parallel with capacitance $C_{i n}$. Therefore, we can include it in the time constant, and combining (8.13) and (8.14) we get

$$
\begin{align*}
\tau_{P D} & =0.8\left[\frac{r_{e}+r_{b}}{1+g_{m} r_{e}} C_{b e}+r_{b} C_{b c i}\left(1+\frac{g_{m}\left(r_{c}+R_{C} \| R_{i n}\right)}{1+g_{m} r_{e}}\right)+\right. \\
& \left.+\left(r_{c}+R_{C} \| R_{i n}\right)\left(C_{b c i}+C_{b c x}+C_{c s}\right)+R_{C} \| R_{i n} C_{i n}\right] \tag{8.17}
\end{align*}
$$

that, along with relationship (8.6), models the oscillation frequency of a ring oscillator based on bipolar Current-Mode gates.
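To show how relationships (8.15)-(8.17) are combined in practice, a minimal sketch follows. Several points are assumptions made only for illustration: the transconductance is taken as $g_{m}=I_{S S} /\left(2 V_{T}\right)$ with $r_{\pi}=\beta_{F} / g_{m}$, the load resistance follows from $V_{S W I N G}=2 R_{C} I_{S S}$, the capacitances are treated as bias-independent, the oscillation frequency is taken as $1 /\left(2 n \tau_{P D}\right)$, and the device values in the test are hypothetical rather than those of the BiCMOS or HSB2 processes.

```python
from dataclasses import dataclass

V_T = 0.0259  # thermal voltage at room temperature (V)


@dataclass
class Bjt:
    """Hypothetical small-signal parameters (ohm, farad), assumed bias-independent."""
    r_e: float
    r_b: float
    r_c: float
    c_be: float
    c_bci: float
    c_bcx: float
    c_cs: float
    beta_f: float


def parallel(r1: float, r2: float) -> float:
    return r1 * r2 / (r1 + r2)


def cml_stage_delay(q: Bjt, i_ss: float, v_swing: float = 0.5) -> float:
    """Stage delay (8.17) with the input impedance (8.14)-(8.16) of the next gate."""
    g_m = i_ss / (2.0 * V_T)       # assumption: each transistor carries I_SS / 2
    r_pi = q.beta_f / g_m          # assumption: standard hybrid-pi relation
    r_load = v_swing / (2.0 * i_ss)  # R_C from V_SWING = 2 R_C I_SS
    k = 1.0 + g_m * q.r_e
    r_in = r_pi * k                                                         # (8.15)
    c_in = (q.c_be + q.c_bci + q.c_bcx * (1.0 + v_swing / (4.0 * V_T))) / k  # (8.16)
    r_eq = parallel(r_load, r_in)  # R_C || R_in
    tau = ((q.r_e + q.r_b) / k * q.c_be
           + q.r_b * q.c_bci * (1.0 + g_m * (q.r_c + r_eq) / k)
           + (q.r_c + r_eq) * (q.c_bci + q.c_bcx + q.c_cs)
           + r_eq * c_in)
    return 0.8 * tau


def ring_frequency(n: int, tau_pd: float) -> float:
    """Oscillation frequency of an n-stage ring, assuming a period of 2 n tau_PD."""
    return 1.0 / (2.0 * n * tau_pd)
```

Sweeping `i_ss` with parameters extracted for a given process would give a curve of the kind compared with simulations in the next section.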

### 8.1.2 Validation of the oscillation frequency in a CML ring oscillator

The model of the oscillation frequency (8.6) with (8.17) was tested by means of SPICE simulations using the two bipolar technologies introduced in Chapter 4. As done before, the circuits were powered with 5 V and a 500-mV logic swing was set. Moreover, the bias current was varied from $100 \mu \mathrm{~A}$ to 1.4 mA and from $100 \mu \mathrm{~A}$ to 2.4 mA for the BiCMOS and HSB2 process, respectively, to avoid a degradation of the transistor transition frequency.

Figures 8.6 and 8.7 show the simulated oscillation frequency of a CML ring oscillator with a different number of stages and the theoretical one (given by (8.6) and (8.17)) versus the bias current $I_{S S}$ for the BiCMOS and HSB2 technology, respectively. The oscillation frequency error for the BiCMOS and HSB2 processes is summarized in Tables 8.1 and 8.2, which refer to the simplified input impedance (4.23) and to the accurate one (8.14), respectively. To evaluate the superiority of the model (8.14) compared to the simplified one (4.23), Table 8.3 summarizes the average errors for the analyzed cases.


Fig. 8.6. Oscillation frequency with 4, 6 and 8 stages (BiCMOS process).


Fig. 8.7. Oscillation frequency with 4,6 and 8 stages (HSB2 process).

TABLE 8.1

|  | BiCMOS technology |  |  | HSB2 technology |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $I_{S S}(\mathrm{~mA})$ | $n=4$ | $n=6$ | $n=8$ | $n=4$ | $n=6$ | $n=8$ |
| 0.1 | $29 \%$ | $22 \%$ | $20 \%$ | $63 \%$ | $46 \%$ | $42 \%$ |
| 0.2 | $12 \%$ | $9.3 \%$ | $8.7 \%$ | $50 \%$ | $54 \%$ | $55 \%$ |
| 0.4 | $-2.7 \%$ | $-3.6 \%$ | $-3.8 \%$ | $29 \%$ | $22.2 \%$ | $20 \%$ |
| 0.6 | $-9.4 \%$ | $-9.9 \%$ | $-9.9 \%$ | $16 \%$ | $11.1 \%$ | $10.5 \%$ |
| 0.8 | $-13.5 \%$ | $-13.8 \%$ | $-13.8 \%$ | $5.9 \%$ | $2.6 \%$ | $2.1 \%$ |
| 1 | $-16.8 \%$ | $-17 \%$ | $-17 \%$ | $-1.9 \%$ | $-3.9 \%$ | $-4.2 \%$ |
| 1.2 | $-19.9 \%$ | $-20 \%$ | $-20 \%$ | $-7.8 \%$ | $-9.3 \%$ | $-9.6 \%$ |
| 1.4 | $-23.1 \%$ | $-23.3 \%$ | $-23.3 \%$ | $-13.2 \%$ | $-14.2 \%$ | $-14.5 \%$ |
| 1.6 | $/$ | $/$ | $/$ | $-17.3 \%$ | $-18.3 \%$ | $-18.6 \%$ |
| 1.8 | $/$ | $/$ | $/$ | $-21 \%$ | $-21.8 \%$ | $-21.8 \%$ |
| 2 | $/$ | $/$ | $/$ | $-24.4 \%$ | $-25.1 \%$ | $-25.1 \%$ |
| 2.2 | $/$ | $/$ | $/$ | $-27.2 \%$ | $-27.6 \%$ | $-27.6 \%$ |
| 2.4 | $/$ | $/$ | $/$ | $-29.6 \%$ | $-30.3 \%$ | $-30.3 \%$ |

TABLE 8.2

|  | BiCMOS technology |  |  | HSB2 technology |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $I_{S S}(\mathrm{~mA})$ | $n=4$ | $n=6$ | $n=8$ | $n=4$ | $n=6$ | $n=8$ |
| 0.1 | $26 \%$ | $19 \%$ | $18 \%$ | $32 \%$ | $18.1 \%$ | $15.1 \%$ |
| 0.2 | $17 \%$ | $14 \%$ | $13 \%$ | $24 \%$ | $14.3 \%$ | $12.1 \%$ |
| 0.4 | $9 \%$ | $8 \%$ | $8 \%$ | $15 \%$ | $8.6 \%$ | $7 \%$ |
| 0.6 | $5 \%$ | $4.4 \%$ | $4.4 \%$ | $8.5 \%$ | $3.9 \%$ | $3.2 \%$ |
| 0.8 | $1.2 \%$ | $0.9 \%$ | $0.8 \%$ | $3.4 \%$ | $0.2 \%$ | $0.2 \%$ |
| 1 | $-2.5 \%$ | $-2.8 \%$ | $-2.8 \%$ | $-0.8 \%$ | $-2.9 \%$ | $-3.1 \%$ |
| 1.2 | $-6.5 \%$ | $-6.7 \%$ | $-6.7 \%$ | $-3.7 \%$ | $-5.4 \%$ | $-5.6 \%$ |
| 1.4 | $-10.7 \%$ | $-10.9 \%$ | $-10.9 \%$ | $-6.9 \%$ | $-8 \%$ | $-8.3 \%$ |
| 1.6 | $/$ | $/$ | $/$ | $-9.6 \%$ | $-10.7 \%$ | $-10.9 \%$ |
| 1.8 | $/$ | $/$ | $/$ | $-12.3 \%$ | $-13.1 \%$ | $-13 \%$ |
| 2 | $/$ | $/$ | $/$ | $-14.6 \%$ | $-15.4 \%$ | $-15.4 \%$ |
| 2.2 | $/$ | $/$ | $/$ | $-16.9 \%$ | $-17.4 \%$ | $-17.4 \%$ |
| 2.4 | $/$ | $/$ | $/$ | $-18.8 \%$ | $-19.5 \%$ | $-19.5 \%$ |

TABLE 8.3

|  | BiCMOS technology |  |  | HSB2 technology |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Load model | $n=4$ | $n=6$ | $n=8$ | $n=4$ | $n=6$ | $n=8$ |
| Simplified (4.23) | $15.8 \%$ | $14.8 \%$ | $14.5 \%$ | $23.5 \%$ | $22.0 \%$ | $21.6 \%$ |
| Accurate (8.14) | $9.7 \%$ | $8.3 \%$ | $8.1 \%$ | $12.8 \%$ | $10.6 \%$ | $10.1 \%$ |

Inspection of Tables 8.1-8.3 shows that, with the simplified load (4.23), the error is always lower than $30 \%$ for the BiCMOS technology but increases in some cases to values higher than $50 \%$ for the HSB2 technology. This high error is greatly reduced by resorting to the accurate model in (8.14). Indeed, even for the HSB2 technology the accurate model has an error slightly higher than $30 \%$ in only one case, and the error is much lower for values of bias current that lead to an efficient power-delay trade-off. The improved accuracy of the accurate model is also confirmed in Table 8.3, where its average error is lower by around 6 percentage points for the BiCMOS technology and by more than 10 percentage points for the higher-speed HSB2 technology. In relative terms, the average error is reduced by roughly $40 \%$ or more in both cases.
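As a cross-check on how the entries of Table 8.3 relate to the per-current errors, the short sketch below averages the BiCMOS, $n=4$ columns of Tables 8.1 and 8.2. It assumes that the averages in Table 8.3 are means of the absolute errors, which is an interpretation and not something stated explicitly in the tables.

```python
# BiCMOS, n = 4 errors (percent) for I_SS = 0.1, 0.2, 0.4, ..., 1.4 mA
simplified = [29, 12, -2.7, -9.4, -13.5, -16.8, -19.9, -23.1]  # Table 8.1
accurate = [26, 17, 9, 5, 1.2, -2.5, -6.5, -10.7]              # Table 8.2


def mean_abs(errors):
    """Mean of the absolute errors (assumed definition of the Table 8.3 entries)."""
    return sum(abs(e) for e in errors) / len(errors)


avg_simplified = mean_abs(simplified)
avg_accurate = mean_abs(accurate)
# Relative reduction of the average error when moving to the accurate load model
relative_reduction = 1.0 - avg_accurate / avg_simplified
```

The same function can be applied to the other columns to compare them with the corresponding entries of Table 8.3.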

### 8.1.3 Remarks on the oscillation amplitude in a CML ring oscillator

In general, the evaluation of the oscillation amplitude in a ring oscillator is not an easy task, due to the non-linear behavior of its stages. Even for CML ring oscillators, no closed-form expression can be given to estimate the amplitude of the output voltage, that is equal for all stages by symmetry.

In practical cases, it is often of interest to maximize the amplitude to improve the signal-to-noise ratio of the oscillator. Therefore, it is sufficient to understand when the amplitude is close to its maximum achievable value, which is equal to half the logic swing of all stages, i.e. $R_{C} I_{S S}$. Intuitively, this is achieved when a high number of stages $n$ is used, since in this case each gate has enough time to settle to the nominal high or low output voltage. This is because in a ring oscillator the input of each stage is constant for half a period $T / 2=n \tau_{P D}$. When a smaller value of $n$ is considered, the input of each gate is forced to change before the gate settles to the nominal output voltage, thus reducing the amplitude to values lower than $R_{C} I_{S S}$.

To simplify the analysis and evaluate the minimum number of stages that ensures an amplitude close to $R_{C} I_{S S}$, it is reasonable to assume that this occurs when the output voltage of each gate is able to switch from $10 \%$ to $90 \%$ of the entire logic swing in a half period $T / 2$. This means that the latter has to be greater than the output rise time, and we will analytically express this condition in the following. From (8.12) the half period is equal to $T / 2=n \tau_{P D}=0.8 n \tau$, while the output rise time $T_{\text {RISE }}$ can be approximated as that under a step input, which for a CML stage represented by a first-order system with time constant $\tau$ is equal to $2.2 \tau$ [MG87]. As a result, the condition that ensures an almost full-swing oscillation becomes

$$
\begin{equation*}
\frac{T}{2}=0.8 \cdot n \cdot \tau>T_{\text {RISE }} \approx 2.2 \cdot \tau \Rightarrow n>3 \tag{8.18}
\end{equation*}
$$

which means that a number of stages greater than 3 must be used to maximize the amplitude. This result is confirmed by simulations, which for a 500-mV logic swing (i.e., $R_{C} I_{S S}=250 \mathrm{mV}$) show an amplitude in the order of 100 mV for $n=3$, and around 245 mV for ring oscillators with a number of stages equal to 4 or greater.

### 8.1.4 CMOS SCL ring oscillators

The oscillation frequency model of ring oscillators developed in Section 8.1.1 can be analogously applied to SCL ring oscillators. In particular, the delay $\tau_{P D}$ of each stage, included in the frequency expression (8.6), is that of an SCL inverter (6.6) with a load capacitance equal to the input capacitance (6.13) of the subsequent stage (i.e. a purely capacitive input impedance, thus no change on its evaluation is needed). By remembering that delay of SCL gates is quite insensitive to the input rise time, it results in

$$
\begin{equation*}
\tau_{P D}=0.8 R_{D}\left[\left(C_{g d 1,2}+C_{d b 1,2}\right)+\left(C_{g d 3,4}+C_{d b 3,4}\right)+C_{i n}\right] \tag{8.19}
\end{equation*}
$$

that substituted into relationship (8.6) expresses the oscillation frequency of an SCL ring oscillator. Simulations confirm the validity of model (8.6) using (8.19). Indeed, under conditions used in the previous chapter, the maximum error found with respect to simulation results is lower than $25 \%$, and in typical cases where an advantageous power-delay trade-off is achieved (i.e., delay is not very close to its asymptotic value, as explained in Chapter 7) it is much lower (in the order of $10 \%$ ).

Obviously, relationship (8.19) could be expressed in the form (7.2), that is more suitable for design. Moreover, considerations in Section 8.1.3 regarding the minimum number of stages that ensures a full-swing oscillation hold for SCL ring oscillators.

### 8.2 CML FREQUENCY DIVIDERS

The static frequency divider is a fundamental block in various applications, such as mobile or satellite communication systems and multiple-Gb/s optical fiber systems. For such applications, high performance is essential, and low power consumption is highly desirable to extend battery lifetime in portable equipment and to make heat removal easier [K91], [I951], [F96]. As a consequence, design criteria that keep power dissipation as low as possible for an assigned speed requirement are of the utmost importance.

From an architecture point of view, a $1 / 2^{n}$ static frequency divider consists of $n$ cascaded divide-by-two stages, as depicted in Fig. 8.8, where the operating frequency at the input of each stage is indicated as a fraction of the input signal frequency $f_{I N}$.


Fig. 8.8. Architecture of a $1 / 2^{n}$ frequency divider.

The topology of each $1 / 2$ frequency divider consists of two cross-coupled CML D latches and a level shifter circuit, as shown in Fig. 8.9. The two cross-coupled D latches, realized in bipolar Current-Mode logic as in Fig. 4.18, implement a Master-Slave T Flip-Flop since the output is fed back to the input after inversion. The level shifter is realized with a common-collector stage as usual, and is inserted to prevent transistors Q1-Q2 of the D latches in Fig. 4.18 from working in the saturation region, as discussed in Chapter 2.


Fig. 8.9. Topology of a $1 / 2$ frequency divider.

In the simple case of a single $1 / 2$ frequency divider, the operating frequency is limited by the delay of the level shifter and of the D latch. More specifically, from Fig. 8.9, the input signal $I N$ must propagate through the level shifter and latch A within the positive half period, in order to set a correct input value in latch B at the beginning of the successive half period. Since the same consideration holds in the negative half period for latch B, both latches must have the same speed, because otherwise the operating frequency would be limited by the slower one.

From an analytical point of view, to let the output of latch A cross the logic threshold so that latch B has a correct input voltage in the following half period, the following condition on the input half period $T_{I N} / 2$ must be satisfied

$$
\begin{equation*}
\frac{T_{I N}}{2} \geq \tau_{P D, \text { level_shifter }}+\tau_{P D, \text { latch }} \tag{8.20}
\end{equation*}
$$

where $\tau_{P D, \text { level_shifter }}$ is the propagation delay of the level shifter and $\tau_{P D, \text { latch }}$ is the CK-Q latch delay, respectively. Therefore, the maximum operating frequency $f_{\max }$ of a single $1 / 2$ frequency divider is equal to

$$
\begin{equation*}
f_{\max }=\frac{1}{2\left(\tau_{P D, \text { level_shifter }}+\tau_{P D, \text { latch }}\right)} \tag{8.21}
\end{equation*}
$$

To achieve high-speed operation, the two delay contributions must be reduced as much as possible by properly setting the bias current of the latch $I_{\text {latch }}$ and that of the level shifter $I_{\text {level_shifter }}$.
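Relationships (8.20) and (8.21) translate directly into a delay budget. The following minimal sketch only restates them; the function names are illustrative, and the 7.35-GHz figure mentioned in the usage note is the model prediction quoted in Section 8.2.3.

```python
def divider_f_max(tau_level_shifter: float, tau_latch: float) -> float:
    """Maximum input frequency (8.21) of a single 1/2 divider, delays in seconds."""
    return 1.0 / (2.0 * (tau_level_shifter + tau_latch))


def delay_budget(f_in: float) -> float:
    """Largest level-shifter plus latch delay allowed by (8.20) at input frequency f_in."""
    return 1.0 / (2.0 * f_in)
```

For instance, `delay_budget(7.35e9)` returns the total delay that the first stage can afford at that input frequency, and the budget of each subsequent stage doubles because its input frequency is halved.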

Since the operating frequency of each stage is halved compared to that of the previous one, the delay of the generic stage can be twice as high as that of the previous one without degrading the speed of the divider. As a consequence, according to the model and design strategies developed in Chapter 5, it is possible to progressively reduce bias currents of each stage with respect to the previous one, allowing for a power saving [ADP02].

In the following, design equations are derived for bias currents in a frequency divider to achieve a high operating frequency by maximizing the speed performance of the first stage (Section 8.2.1), while minimizing overall power consumption in the successive stages that are unnecessarily fast (Section 8.2.2).

### 8.2.1 Design of the first stage

The first stage has to work at the highest frequency $f_{\max }$, and accordingly its bias currents $I_{\text {latch }}$ and $I_{\text {level_shifter }}$ have to be properly sized. The delay of the D latches $\tau_{P D, \text { latch }}$ pertains to the case where the switching input is applied to the lower transistors Q1-Q2 in Fig. 4.18, as discussed in Section 5.5.3. Therefore, this delay contribution is modeled by relationship (5.1) with coefficients given by (5.29a), (5.42) and (5.29c). As demonstrated in Section 5.1, the latch delay can be minimized by setting bias current $I_{\text {latch }}$ equal to

$$
\begin{equation*}
I_{\text {latch }, o p}=\sqrt{\frac{b}{a}} \tag{8.22}
\end{equation*}
$$

that from relationship (5.3) leads to the minimum delay achievable equal to $\tau_{P D, \text { latch }, o p}=2 \sqrt{a b}$. A better trade-off between delay and power can be achieved by resorting to design criteria in Section 5.1 based on Fig. 5.1. In particular, an almost maximum speed is achieved by assigning $I_{\text {latch }}$ equal to $0.6 \cdot I_{\text {latch,op }}$, that allows for a $40 \%$ power saving at the cost of only $10 \%$ increase of delay with respect to the optimum design.
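As a numerical illustration of (8.22), the sketch below uses the coefficients $a$ and $b$ quoted for the HSB2 example in Section 8.2.3. It assumes that $a$ and $b$ are expressed in SI units, so that the resulting current is in amperes; the variable names are illustrative.

```python
import math

# Coefficients quoted in Section 8.2.3 for the HSB2 process (assumed SI units)
a = 2.2e-9
b = 6.69e-14


def optimum_latch_current(a: float, b: float) -> float:
    """Bias current that minimizes the latch delay, relationship (8.22)."""
    return math.sqrt(b / a)


i_latch_op = optimum_latch_current(a, b)
# Power-efficient choice discussed above: 60% of the optimum current
i_latch_first_stage = 0.6 * i_latch_op
# Minimum delay 2*sqrt(a*b) stated above for I_latch = I_latch,op
tau_latch_min = 2.0 * math.sqrt(a * b)
```

With these inputs, `i_latch_op` and `i_latch_first_stage` can be compared with the currents used in the design example of Section 8.2.3.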

As far as level shifters are concerned, their delay is given by relationship (5.17) with $I_{C C}=I_{\text {level_shifter }}$, thus

$$
\begin{equation*}
I_{\text {level_shifter }}=\frac{C_{j e, \text { buf }}}{\left[\left(\frac{\tau_{P D, \text { level_shifter }}}{\tau_{P D, \text { level_shifter,min }}}\right)^{2}-1\right] \tau_{F}} V_{T}=4.76 \frac{C_{j e, \text { buf }}}{\tau_{F}} V_{T} \tag{8.23}
\end{equation*}
$$

where the ratio $\tau_{P D, \text { level_shifter }} / \tau_{P D, \text { level_shifter,min }}$ between the actual and the minimum achievable delay has been set to 1.1, as clarified in Section 5.3, thereby leading to a delay $10 \%$ greater than the minimum.
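The numerical coefficient in (8.23) can be reproduced with a few lines. The sketch below takes the bracketed term as the squared delay ratio minus one, which is the form that yields the coefficient 4.76 for a ratio of 1.1; the helper names are illustrative, and the 0.85-mA first-stage current used in the test is the value quoted in Section 8.2.3.

```python
def level_shifter_factor(delay_ratio: float) -> float:
    """Bias current of (8.23) in units of C_je,buf * V_T / tau_F."""
    if delay_ratio <= 1.0:
        raise ValueError("the delay ratio must exceed 1 (minimum delay is an asymptote)")
    return 1.0 / (delay_ratio ** 2 - 1.0)


def scaled_current(i_first: float, ratio_first: float, ratio_stage: float) -> float:
    """Level-shifter current of a later stage, given that of the first stage."""
    return i_first * level_shifter_factor(ratio_stage) / level_shifter_factor(ratio_first)
```

Since $C_{j e, b u f}$, $\tau_{F}$ and $V_{T}$ are common to all stages, only the factor returned by `level_shifter_factor` changes when the delay ratio is relaxed from one stage to the next.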

### 8.2.2 Design of successive stages

Once the first stage is designed, the bias current of the subsequent ones can be downscaled to reduce the power consumption, since each stage has an input frequency which is half that of the previous one. Therefore, the delay of each stage can be set twice as high as that of the previous one without decreasing the maximum frequency. This choice allows the stage bias current to be set at the minimum value compatible with the speed of the first stage. In particular, it is necessary to find the scaling law of $I_{\text {latch }}(i)$ and $I_{\text {level_shifter }}(i)$ of the $i$-th stage to double its delay $\tau_{P D}(i)$ with respect to the previous one.

Regarding the D latches of the $i$-th stage, let us consider the normalized delay expression (5.5) versus the bias current $I_{N}$ normalized to $I_{\text {latch,op }}$, which is equal for all stages since they have the same load. From relationship (5.5), the delay of the $i$-th stage's D latch is twice as high as that of the preceding $(i-1)$-th stage if

$$
\begin{equation*}
2\left[I_{N}(i-1)+\frac{1}{I_{N}(i-1)}\right]=I_{N}(i)+\frac{1}{I_{N}(i)} \tag{8.24}
\end{equation*}
$$

that, solved for $I_{N}(i) / I_{N}(i-1)$ (i.e., $\left.I_{\text {latch }}(i) / I_{\text {latch }}(i-1)\right)$, results in

$$
\begin{equation*}
\frac{I_{N}(i)}{I_{N}(i-1)}=1-\frac{1}{\left[I_{N}(i-1)\right]^{2}}\left(\sqrt{1+\left[I_{N}(i-1)\right]^{2}+\left[I_{N}(i-1)\right]^{4}}-1\right) \tag{8.25}
\end{equation*}
$$

Design equation (8.25) shows how to scale currents from the second to the last stage by iteration, and can be approximated to

$$
\begin{equation*}
\frac{I_{N}(i)}{I_{N}(i-1)} \approx 0.5-0.24\left[I_{N}(i-1)\right]^{1.6} \tag{8.26}
\end{equation*}
$$

where the approximation leads to an error lower than $3 \%$ in the range of interest (i.e., $I_{N}(i) \leq 1$, as discussed in Section 5.1). For the sake of clarity, relationship (8.26) is plotted in Fig. 8.10, from which the bias current scaling factor $I_{N}(i) / I_{N}(i-1)$ of $i$-th stage with respect to $(i-1)$-th one is derived from the normalized bias current $I_{N}(i-1)$ of the latter.
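The accuracy of approximation (8.26) can be checked against the exact solution of (8.24). In the sketch below, the exact scaling factor is obtained as the smaller root of the quadratic equation in $I_{N}(i)$ that follows from (8.24), divided by $I_{N}(i-1)$; the function names and the evaluation grid are illustrative choices.

```python
import math


def exact_scaling(i_prev: float) -> float:
    """I_N(i) / I_N(i-1) from (8.24), with i_prev = I_N(i-1)."""
    s = i_prev + 1.0 / i_prev
    # (8.24) reads y + 1/y = 2 s, i.e. y**2 - 2 s y + 1 = 0. The smaller root
    # s - sqrt(s**2 - 1) is written as a reciprocal for numerical robustness.
    y = 1.0 / (s + math.sqrt(s * s - 1.0))
    return y / i_prev


def approx_scaling(i_prev: float) -> float:
    """Approximation (8.26)."""
    return 0.5 - 0.24 * i_prev ** 1.6


def worst_relative_error(points: int = 100) -> float:
    """Largest relative error of (8.26) for 0 < I_N(i-1) <= 1."""
    grid = [k / points for k in range(1, points + 1)]
    return max(abs(approx_scaling(x) / exact_scaling(x) - 1.0) for x in grid)
```

Applying `approx_scaling` repeatedly, starting from the normalized current of the first stage, gives the latch currents of the successive stages.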

As far as the level shifter is concerned, its bias current can easily be sized by recursively doubling in (8.23) the ratio $\tau_{P D, \text { level_shifter }} / \tau_{P D, \text { level_shifter,min }}$ of the $i$-th stage with respect to the $(i-1)$-th in the evaluation of $I_{\text {level_shifter }}(i)$.


Fig. 8.10. Plot of relationship (8.26).

### 8.2.3 Design considerations and examples

It is worth noting that bias currents $I_{\text {latch }}(n)$ and $I_{\text {level_shifter }}(n)$ of the last stage decrease as the number of stages $n$ is increased, and for a high $n$ they can become excessively low. Indeed, for an assigned logic swing $V_{\text {SWING }}=2 R_{C} I_{\text {latch }}$, a very low $I_{\text {latch }}$ (in the order of a few microamperes) leads to a high load resistance $R_{C}$ (in the order of one hundred $\mathrm{k} \Omega$) that tends to occupy an excessive silicon area. Moreover, a very low $I_{\text {level_shifter }}$ could fall below the current level under which the transistor current gain $\beta_{F}$ rapidly decreases (well below $1 \mu \mathrm{~A}$, for the process considered).

From these considerations, when bias currents of the $i$-th stage reach values in the order of a few microamperes, one can avoid further downscaling bias currents of the successive stages, and this design option does not significantly affect the power saving offered by the design strategy. Indeed, bias currents of last stages whose current is not downscaled are negligible with respect to the power dissipation of first stages.

To illustrate the design strategy proposed and compare its results with simulations, let us consider the design of a $1 / 8$ static frequency divider using the HSB2 bipolar process. By assuming $V_{D D}=5 \mathrm{~V}$ and $V_{S W I N G}=500 \mathrm{mV}$, coefficients $a, b$ and $c$ are equal to $2.2 \mathrm{E}-9$, $6.69 \mathrm{E}-14$ and $1.4 \mathrm{E}-12$, respectively. From (8.22), the optimum latch current $I_{\text {latch,op }}$ is equal to 5.5 mA. For the first stage we use the power-efficient design criterion $I_{\text {latch }}(1)=0.6 \cdot I_{\text {latch,op }}=3.3 \mathrm{~mA}$. Moreover, the bias current of the level shifter $I_{\text {level_shifter }}(1)$ is 0.85 mA, from relationship (8.23) with the delay ratio $\tau_{P D, \text { level_shifter }} / \tau_{P D, \text { level_shifter,min }}$ set to 1.1. For the second stage, from (8.26) we get $I_{N}(2) / I_{N}(1)=0.39$, which leads to $I_{\text {latch }}(2)=1.2 \mathrm{~mA}$, and doubling the ratio $\tau_{P D, \text { level_shifter }} / \tau_{P D, \text { level_shifter,min }}$ with respect to the first stage (i.e., setting it to 2.2 in (8.23)) gives $I_{\text {level_shifter }}(2)$ equal to $50 \mu \mathrm{~A}$. For the third stage we obtain $I_{N}(3) / I_{N}(2)=0.45$ and hence $I_{\text {latch }}(3)=0.5 \mathrm{~mA}$, while in the level shifter we set $\tau_{P D, \text { level_shifter }} / \tau_{P D, \text { level_shifter,min }}=4.4$, obtaining $I_{\text {level_shifter }}(3)=10 \mu \mathrm{~A}$.

By using the evaluated bias currents, the maximum frequency obtained by SPICE simulations is 7.69 GHz , while that predicted by (8.21) and delay model used is equal to 7.35 GHz , that differs from the former by $4.4 \%$. Input and output waveforms at maximum operating frequency are plotted in Fig. 8.11.


Fig. 8.11. Input and output waveforms at maximum operating frequency for the $1 / 8$ frequency divider designed.

### 8.3 LOW-VOLTAGE BIPOLAR CURRENT-MODE TOPOLOGIES

Until now, the traditional series-gate approach was followed to implement CML gates. However, alternative approaches can be exploited to build CML gates with a reduced supply voltage [ROS94], [KKI97], [SMT98], [SPM00]. These alternative topologies are based on the consideration that the power consumption of a CML gate is equal to the product of bias current $I_{S S}$ and supply voltage $V_{D D}$. Reduction of $I_{S S}$ inevitably compromises the speed performance and in typical applications is not a viable solution, while reduction of $V_{D D}$ does not significantly affect speed, provided that all transistors work out of the saturation region.

In the traditional series-gate approach, $V_{D D}$ must be sufficiently high to ensure that all transistors work out of the saturation region, according to number of stacked levels, as discussed in Chapter 2. In [ROS94], [KKI97], [SMT98], some techniques that allow for a $V_{D D}$ reduction with respect to traditional CML gates are proposed. Of these approaches, the one reported in [ROS94] appears to be the most promising since it operates under the lowest supply voltage. Therefore, in the following only that in [ROS94] will be considered and accordingly it will be referred to as the low-voltage topology.

### 8.3.1 Low-voltage CML by means of the triple-tail cell

The supply voltage $V_{D D}$ of traditional CML gates can be lowered by reducing the number of series-gating levels. For example, according to considerations reported in Chapter 2, the minimum value of $V_{D D}$ allowed by a two-level CML gate is about 2 V , while that of a one-level gate is 1.1 V . It is worth noting that, since the base-emitter voltage of a bipolar transistor is not affected by the technology scaling, these minimum values of $V_{D D}$ do not depend on the process used.

The technique suggested in [ROS94] makes it possible to reduce the number of series-gating levels by introducing the triple-tail cell concept. As shown in Fig. 8.12, a generic emitter-coupled pair (Q1-Q2) in a series gate is activated (i.e., its input affects the output) when it is biased by the current $I_{S S}$ steered by the pair lying at the lower level (Q3-Q4) (i.e., when the base voltage of Q3 is high). This requires two levels of series gating.


Fig. 8.12. Pair Q1-Q2 activated by transistor Q3 in a series gate.

A different strategy can be used to activate Q1-Q2 with transistors Q3-Q4 lying at the same level. To be more specific, transistor pair Q1-Q2 can be deactivated by connecting a third transistor Q3 (having its emitter in common with that of Q1-Q2) with a high base voltage, thus steering the bias current to ground and turning off transistors Q1-Q2. On the contrary, when Q3 has a low base voltage, it does not affect the operation of Q1-Q2, thereby activating the pair. The three emitter-coupled transistors that implement the pair Q1-Q2 and the (de)activating transistor Q3 are called the triple-tail cell [ROS94], which is depicted in Fig. 8.13.


Fig. 8.13. The triple-tail cell.

When possible, replacing the stacked emitter-coupled pairs of a series gate (Fig. 8.12) with a triple-tail cell (Fig. 8.13) yields a CML gate with a reduced number of series-gating levels. For example, the low-voltage topologies of the MUX, XOR and D latch gates in Figs. 8.14-8.16 are obtained from the traditional CML topologies in Figs. 4.13, 4.14 and 4.18, respectively.


Fig. 8.14. Low-voltage MUX gate topology.


Fig. 8.15. Low-voltage XOR gate topology.


Fig. 8.16. Low-voltage D latch gate topology.

It is worth noting that, in a triple-tail cell, the deactivating transistor Q3 does not completely switch off transistors Q1-Q2. To better understand this point, assume without loss of generality that the input of Q1-Q2 is such that Q1 is conducting and Q2 is not (i.e., the base voltages of Q1 and Q2 are equal to zero and $-R_{C} I_{S S}$, respectively), and define $N$ as the ratio between the emitter area of Q3 and that of Q1-Q2 (i.e., $A_{E 3}=N \cdot A_{E 1,2}$). When Q3 deactivates Q1-Q2 (i.e., its base is high), Q1 and Q3 have the same base-emitter voltage, therefore the current of Q3 is greater than that of Q1 by the ratio $N$ of their emitter areas. Thus, assuming $\alpha_{F}=1$, the current flowing in Q1 is

$$
\begin{equation*}
i_{C 1}=\frac{I_{S S}}{2} \frac{1}{1+N} \tag{8.27}
\end{equation*}
$$

and can be minimized by increasing factor $N$, or equivalently the area of transistor Q3.

In summary, the low-voltage CML gates in Figs. 8.14-8.16 can be implemented with a single-level logic like the simple inverter, thereby lowering the minimum $V_{D D}$ by around a factor of 2 (i.e., from 2 V to 1.1 V). The effect of this supply voltage reduction on performance and power-delay trade-off will be analyzed in the next subsections for the particular case of a D latch gate, since it can be easily extended to the other low-voltage gates.

### 8.3.2 Analysis of the low-voltage CML D latch static operation

The low-voltage D latch topology, shown in Fig. 8.16, is made up of the two emitter-coupled pairs Q1-Q2 and Q3-Q4, biased with two current sources $I_{S S} / 2$. The transistor pair $\mathrm{Q} 1-\mathrm{Q} 2$ is driven by the differential input $D$, while the cross-coupled pair Q3-Q4 implements the memory element.

Depending on the level of the differential clock signal $C K$, one of the two transistor pairs is alternatively deactivated by transistors Q5 and Q6, whose emitter area $A_{E 5,6}$ is assumed to be greater than area $A_{E 1,2,3,4}$ of the other transistors by a factor $N$ (i.e., $N=A_{E 5,6} / A_{E 1,2,3,4}$ ). When $C K$ is low, transistor Q6 is OFF and the cross-coupled pair Q3-Q4 holds the previous logic value thanks to the positive feedback, and the latch is in the hold state. When $C K$ is high, transistor Q5 is OFF and Q6 is ON, thus deactivating the cross-coupled pair Q3-Q4. Hence, the output is set equal to input $D$ by the transistor pair Q1-Q2, and the latch is in the transparent state.

In the following, the static behavior of the gate will be analyzed in terms of logic swing. As will be demonstrated, the logic swing of the low-voltage D latch is given by

$$
\begin{equation*}
V_{S W I N G}=R_{C} I_{S S} \frac{N}{1+N} \tag{8.28}
\end{equation*}
$$

that is lower than that of the traditional D latch by a factor greater than two for an assigned value of $R_{C} I_{S S}$.

To demonstrate relationship (8.28), consider a low-voltage D latch working in the hold state, i.e. with Q5 in the ON state and Q6 in the OFF state, where Q3-Q4 store the previous output. Ideally, in this condition the output should not be affected by input $D$, i.e., transistors $\mathrm{Q} 1-\mathrm{Q} 2$ should both be in the cut-off region. In reality, transistors Q1-Q2 are only partially deactivated by transistor Q5, and the conducting one sinks a current given by relationship (8.27), that influences the output voltage in a manner that depends on whether input $D$ is equal or opposite to the stored value. For a better understanding of this aspect, assume input $D$ to be low without loss of generality.

When the stored value is equal to $D$ (i.e., at the low level, under the assumption made above), Q3 is ON and Q4 is OFF, hence the collector currents of Q3 and Q4 are $i_{C 3}=I_{S S} / 2$ and $i_{C 4}=0$, respectively. Moreover, transistor Q2 is OFF (i.e., $i_{C 2}=0$) and Q1 conducts the current (8.27), thus the differential output voltage $V_{o}-\bar{V}_{o}$ is equal to the low level given by

$$
\begin{align*}
\left.V_{O L, \text { hold }}\right|_{D=O U T} & =-R_{C}\left(i_{C 1}+i_{C 3}\right)+R_{C}\left(i_{C 2}+i_{C 4}\right)  \tag{8.29}\\
& =-\frac{R_{C} I_{S S}}{2}\left(1+\frac{1}{1+N}\right)
\end{align*}
$$

By repeating the same procedure and assuming input $D$ to be high, it is demonstrated that the high output voltage $\left.V_{O H, \text { hold }}\right|_{D=O U T}$ when the stored value is equal to $D$ is given again by (8.29) with an opposite sign. Therefore, the logic swing for $D=O U T$ in the hold mode results in

$$
\begin{equation*}
\left.V_{S W I N G, \text { hold }}\right|_{D=O U T}=R_{C} I_{S S}\left(1+\frac{1}{1+N}\right) \tag{8.30}
\end{equation*}
$$

In cases where the stored value is opposite to input $D$ (i.e., $D=\overline{O U T}$ ), by repeating the same reasoning, the logic swing becomes

$$
\begin{equation*}
\left.V_{S W I N G, \text { hold }}\right|_{D=\overline{O U T}}=R_{C} I_{S S} \frac{N}{1+N} \tag{8.31}
\end{equation*}
$$

So far, the analysis has assumed that the latch operates in the hold mode. When the latch is in the transparent mode (i.e., when $C K$ is high, hence transistor Q5 is OFF and Q6 is ON, thus deactivating pair Q3-Q4 and activating pair Q1-Q2, which sets the output equal to $D$), the logic swing can be shown to be equal to (8.30). By comparing (8.30) and (8.31), the worst-case logic swing (and noise margin) occurs in the hold mode when $D$ is opposite to the stored value $O U T$, and is equal to (8.31), which demonstrates (8.28).

From (8.31), the low-voltage topology has a lower logic swing, and hence a lower noise margin, than a traditional circuit for a given value of $R_{C} I_{S S}$. As a consequence, from (8.28), parameter $N$ has to be chosen as high as possible to avoid an excessive noise margin degradation [SMP00]. However, a high value of $N$ leads to a high emitter area of transistors Q5-Q6, thus increasing the input capacitance seen from input $C K$. As suggested in [ROS94], a good compromise between these opposing requirements is to set $N=2$. As the logic swing and parameter $N$ are assigned before the design, from (8.28) the product $R_{C} I_{S S}$ also becomes a constant in the design.
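The trade-off on $N$ can be made concrete with a short sketch based on (8.28). The helper names `worst_case_swing` and `required_rc_iss`, as well as the 500 mV target swing, are assumptions introduced here for illustration; only the formula itself comes from the text.

```python
def worst_case_swing(rc_iss: float, n: float) -> float:
    """Worst-case logic swing of the low-voltage D latch, relationship (8.28)."""
    return rc_iss * n / (1.0 + n)


def required_rc_iss(v_swing: float, n: float) -> float:
    """Product R_C * I_SS needed to obtain a given worst-case swing (inverse of (8.28))."""
    return v_swing * (1.0 + n) / n


# Hypothetical target swing of 0.5 V, evaluated for several area ratios N.
TARGET_SWING = 0.5
for n in (1, 2, 3, 4):
    rc_iss = required_rc_iss(TARGET_SWING, n)
    print(f"N = {n}: R_C*I_SS = {rc_iss:.3f} V, "
          f"check swing = {worst_case_swing(rc_iss, n):.3f} V")
```

For $N=2$ the product $R_{C} I_{S S}$ must be 1.5 times the desired worst-case swing; raising $N$ further reduces this overhead only slowly, while the clock input capacitance keeps growing with the area of Q5-Q6.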

### 8.3.3 Delay of the low-voltage CML D latch

Now let us evaluate the CK-Q and D-Q delay of a low-voltage D latch by applying the modeling strategy introduced in Chapter 4. It is useful to observe that, in practical cases such as in frequency dividers (see Section 8.2) or dual-modulus prescalers, the speed performance is usually limited by the CK-Q delay. As a consequence, in the following we shall focus on the CK-Q delay to describe the D latch speed performance, while the $\mathrm{D}-\mathrm{Q}$ delay will only be briefly discussed.

To evaluate the CK-Q delay of a D latch, let us consider the output transition that occurs when the D latch goes from the hold mode to the transparent mode due to a low-to-high transition of the input $C K$ (obviously $D$ is assumed to be opposite to the previously stored value, otherwise no transition would take place). From a circuit point of view, the clock signal goes high and abruptly activates the transistor pair Q1-Q2, that can be thought of as a simple CML inverter. At the same time, transistors Q3-Q4 are deactivated, so that they affect the transient response only through their parasitic capacitances. As a result, the circuit can be schematized as the CML inverter Q1-Q2 loaded by an equivalent capacitance $C_{e q}$ at the output nodes, that is equal to the capacitive contributions associated with transistors Q3-Q4 in parallel to the load capacitance $C_{L}$

$$
\begin{equation*}
C_{e q}=\left[C_{b c x 3,4}\left(1+g_{m} R_{C}\right)+C_{c s 3,4}\right]+C_{L} \tag{8.32}
\end{equation*}
$$

where it was observed that Q3-Q4 load the output nodes through their collector-substrate capacitance and that seen from their base in (4.23). Therefore, the CK-Q delay is given by relationship (4.7) with a load capacitance $C_{e q}$

$$
\begin{align*}
\tau_{P D} & =0.69\left\{\left(r_{e}+r_{b}\right) C_{b e}+r_{b} C_{b c i}\left[1+g_{m}\left(r_{c}+R_{C}\right)\right]\right. \\
& +\left(r_{c}+R_{C}\right)\left(C_{b c i}+C_{b c x}+C_{c s}\right)  \tag{8.33}\\
& \left.+R_{C}\left[C_{b c x 3,4}\left(1+g_{m} R_{C}\right)+C_{c s 3,4}+C_{L}\right]\right\}
\end{align*}
$$

By assuming $R_{C} I_{S S}=500 \mathrm{mV}$ and a minimum supply voltage $V_{D D}=1.1 \mathrm{~V}$, the capacitances obtained assuming the HSB2 technology are summarized in Table 8.4.

TABLE 8.4

| Capacitance | Value |
| :--- | :--- |
| $C_{j e}$ | 44.9 fF |
| $C_{c s}$ | 17.4 fF |
| $C_{b c x}$ | 22.1 fF |
| $C_{b c i}$ | 6.6 fF |

By expressing the dependence on $I_{S S}$ as usual, relationship (8.33) can be rewritten in the form (5.1) with coefficients given by

$$
\begin{align*}
& a=0.69\left(2 \frac{r_{e}+r_{b}}{V_{S W I N G}} \tau_{F}+\frac{r_{b} C_{b c i 1,2} r_{c}}{4 V_{T}}\right)  \tag{8.34a}\\
& b=0.69 V_{S W I N G}\left[C_{c s 1,2}+C_{b c 1,2}+C_{c s 3,4}+\left(1+\frac{V_{S W I N G}}{4 V_{T}}\right) C_{b c 3,4}+C_{L}\right]  \tag{8.34b}\\
& c=0.69\left[\left(r_{e}+r_{b}\right) C_{j e 1,2}+r_{b} C_{b c i 1,2}\left(1+\frac{V_{S W I N G}}{4 V_{T}}\right)+r_{c}\left(C_{b c 1,2}+C_{c s 1,2}\right)\right] \tag{8.34c}
\end{align*}
$$

Comparing the delay model (5.1) with coefficients (8.34) against SPICE simulations under the conditions discussed above, the error for the D latch implemented in the HSB2 process is always lower than $22 \%$. Figure 8.17 plots the error versus the bias current assuming $C_{L}=0 \mathrm{fF}, 100 \mathrm{fF}$ and 1 pF.


Fig. 8.17. Error of (5.1) using (8.34) vs. $I_{S S}$ for the HSB2 process.

For the sake of completeness, let us evaluate the D-Q delay when the D latch is in the transparent state. When $C K$ is high, transistor Q5 is OFF and Q6 is in the linear region, thereby deactivating the emitter-coupled pair Q3-Q4. Therefore, when $D$ switches, the circuit can be simplified into an inverter made up of transistors Q1-Q2, where the loading effect of Q3 and Q4 can be accounted for with a linear capacitance. The resulting delay expression is in the form (5.1), where the coefficients are given by

$$
\begin{align*}
& a=0.69\left(2 \frac{r_{e}+r_{b}}{V_{\text {SWING }}} \tau_{F}+\frac{r_{b} C_{b c i 3,4 c} r_{c}}{4 V_{T}}+2 \frac{r_{c}}{V_{\text {SWING }}} \frac{\tau_{F}}{1+N}\right)  \tag{8.35a}\\
& b=1.38 V_{S W I N G}\left[C_{L}+C_{c s 3,4}+C_{c s 1,2}+C_{b c 1,2}+\left(1+\frac{V_{S W I N G}}{2 V_{T}}\right) C_{b c 3,4}+C_{j e}\right] \tag{8.35b}
\end{align*}
$$

$$
\begin{align*}
c & =0.69\left[\left(r_{e}+r_{b}\right) C_{j e}+r_{b} C_{b c i}\left(1+\frac{V_{S W I N G}}{2 V_{T}}\right)+\right. \\
& \left.+r_{c}\left(C_{b c i}+C_{b c x}+C_{c s}\right)+\frac{2}{1+N} \tau_{F}\right] \tag{8.35c}
\end{align*}
$$

Comparing the delay model (5.1) with coefficients (8.35) against SPICE simulations under the previous conditions, the error is always within $20 \%$.

### 8.3.4 Comparison of the low-voltage and traditional CML D latch designed for high speed

Now let us compare performance achieved by the low-voltage and traditional D latch designed for a high speed, i.e. with an optimum bias current (5.2) that minimizes delay. To carry out a fair comparison, the supply voltage is set to the minimum value of 1.1 V and 2 V for the two topologies, and similar logic swing values are used. Therefore, from (2.9) using (8.28) and approximating their ratio to two, factor $R_{C} I_{S S}$ of the low-voltage latch will be assumed to be twice that of the traditional circuit.

The CK-Q delay of the low-voltage and traditional D latch is given by substituting coefficients (8.34) (low voltage topology) or (5.29) and (5.42) (traditional topology) into relationship (5.1). Assuming a logic swing of 500 mV and $C_{L}=100 \mathrm{fF}$ (i.e., about a unity fan-out), the numerical value obtained with the HSB2 process for both topologies is reported in Table 8.5. In the following, to avoid confusing the coefficients of the two topologies, subscript LV and TR will be used respectively to indicate the low-voltage and traditional D latch.

TABLE 8.5

|  | Low-voltage | Traditional |
| :---: | :---: | :---: |
| $a$ | $1.11 \mathrm{E}-9$ | $2.2 \mathrm{E}-9$ |
| $b$ | $1.14 \mathrm{E}-13$ | $6.69 \mathrm{E}-14$ |
| $c$ | $4.8 \mathrm{E}-12$ |  |

To understand and compare the speed performance and power-delay trade-off of the two topologies, we can compare coefficients $a$ and $b$. Comparing relationships (8.34) and (5.29a), ratio $a_{L V} / a_{T R}$ is approximately equal to $1 / 2$ (from Table 8.5, it is 0.51). Indeed, $a$ is mainly determined by the base-emitter diffusion capacitance, which in the low-voltage D latch is halved since each transistor pair is biased by a current $I_{S S} / 2$. As far as coefficient $b$ is concerned, ratio $b_{L V} / b_{T R}$ is equal to

$$
\begin{equation*}
\frac{b_{L V}}{b_{T R}}=2 \frac{C_{L}+C_{c s 3,4}+C_{c s 1,2}+C_{b c 1,2}+\left(1+\frac{V_{S W I N G}}{4 V_{T}}\right) C_{b c 3,4}}{C_{L}+C_{c s 3,4}+C_{c s 5,6}+4 C_{b c 3,4,5,6}+C_{j e 3,4}+\frac{V_{S W I N G}}{4 V_{T}} C_{b c x 5,6}} \tag{8.36}
\end{equation*}
$$

which turns out to be lower than 2 and approaches 2 for high values of $C_{L}$, since the denominator has essentially the same capacitive contributions as the numerator, plus a slightly higher number of small addends. To be more specific, the denominator has the additional term $C_{j e 3,4}$, as well as a greater number of base-collector capacitances $C_{b c}$. As an example, for the HSB2 technology, ratio (8.36) ranges from 1.6 to 1.9 for $C_{L}$ ranging from zero to 1 pF.

From eq. (5.3), the resulting ratio $\tau_{P D o p, L V} / \tau_{P D o p, T R}$ of the minimum delay achievable in the low-voltage and traditional circuit is

$$
\begin{equation*}
\frac{\tau_{P D o p, L V}}{\tau_{P D o p, T R}}=\sqrt{\frac{a_{L V} b_{L V}}{a_{T R} b_{T R}}} \tag{8.37}
\end{equation*}
$$

which, since $a_{L V} / a_{T R} \approx 1 / 2$ and $b_{L V} / b_{T R}$ is slightly lower than 2, is slightly lower than unity and tends to 1 for very high load capacitances (since $b_{L V} / b_{T R} \rightarrow 2$ ). This means that the two topologies essentially exhibit the same maximum achievable speed, especially for high load capacitances.

Regarding the power-delay trade-off in the high-speed design case, let us consider the power-delay product expression in (5.4) for the optimum bias current (5.2), which is equal to $2 V_{D D} b$. Therefore, setting the supply voltage to its minimum value for each latch topology, the power-delay product ratio of the low-voltage circuit and the traditional circuit results in

$$
\begin{equation*}
\frac{P D P_{L V}}{P D P_{T R}} \approx \frac{V_{D D, \text { min }, L V}}{V_{D D, \text { min }, T R}} \frac{b_{L V}}{b_{T R}}=0.55 \frac{b_{L V}}{b_{T R}} \tag{8.38}
\end{equation*}
$$

From the considerations on ratio $b_{L V} / b_{T R}$, the two topologies have a comparable power-delay product. To be more specific, for low values of $C_{L}$, the power-delay product ratio (8.38) tends to be lower than unity, which means that the low-voltage latch has a slightly better power efficiency. For high values of $C_{L}$, relationship (8.38) tends to 1.1, which means that for high load capacitances the traditional D latch tends to be more efficient.
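The ratios (8.37) and (8.38) can be evaluated directly from the coefficients $a$ and $b$ quoted in Table 8.5 (HSB2 process, $C_{L}=100$ fF). The sketch below is only an illustration: the function names are hypothetical, the optimum current is taken as $\sqrt{b / a}$, and coefficient $c$ is neglected as in (8.37).

```python
import math

# Coefficients a and b of the CK-Q delay model quoted in Table 8.5.
A_LV, B_LV = 1.11e-9, 1.14e-13   # low-voltage D latch
A_TR, B_TR = 2.2e-9, 6.69e-14    # traditional D latch

# Minimum supply voltages of the two topologies, as stated in the text.
VDD_LV, VDD_TR = 1.1, 2.0


def optimum_bias_current(a: float, b: float) -> float:
    """Bias current that minimizes a*I + b/I, i.e. sqrt(b/a)."""
    return math.sqrt(b / a)


def min_delay_ratio(a_lv: float, b_lv: float, a_tr: float, b_tr: float) -> float:
    """Ratio of minimum delays, relationship (8.37), neglecting coefficient c."""
    return math.sqrt((a_lv * b_lv) / (a_tr * b_tr))


def pdp_ratio(vdd_lv: float, vdd_tr: float, b_lv: float, b_tr: float) -> float:
    """Power-delay product ratio at the optimum bias current, relationship (8.38)."""
    return (vdd_lv / vdd_tr) * (b_lv / b_tr)


print(f"I_SSop (LV) = {optimum_bias_current(A_LV, B_LV) * 1e3:.2f} mA")
print(f"I_SSop (TR) = {optimum_bias_current(A_TR, B_TR) * 1e3:.2f} mA")
print(f"delay ratio = {min_delay_ratio(A_LV, B_LV, A_TR, B_TR):.3f}")
print(f"PDP ratio   = {pdp_ratio(VDD_LV, VDD_TR, B_LV, B_TR):.3f}")
```

With these coefficients both ratios come out slightly below unity, in line with the qualitative discussion above for a moderate load capacitance.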

In actual design cases, as was demonstrated in Section 5.1, a more efficient design choice is achieved by lowering the optimum bias current by a factor $I_{N}=I_{S S} / I_{S S o p}$ according to the power-delay curve in Fig. 5.1. In particular, to carry out a consistent comparison, let us consider an equal factor $I_{N}$ for the two latch topologies, that determines an equal delay increase compared to the optimum case. Therefore, delay and power-delay ratios are still equal to (8.37) and (8.38), respectively, and the above considerations still remain valid.

Considerations reported until now are valid for the CK-Q latch delay. By using the same approach, they can be extended to the D-Q delay, as briefly explained in the following. The D-Q delay of low-voltage and traditional D latch is given by (5.1) with coefficients (8.35) (low-voltage topology) or (5.32) and (5.46) (traditional topology), whose numerical values under previous conditions are summarized in Table 8.6.

TABLE 8.6

|  | Low-voltage | Traditional |
| :---: | :---: | :---: |
| $a$ | $1.61 \mathrm{E}-9$ | $2.2 \mathrm{E}-9$ |
| $b$ | $1.24 \mathrm{E}-13$ | $5.32 \mathrm{E}-14$ |
| $c$ | $8.78 \mathrm{E}-12$ | $4.53 \mathrm{E}-12$ |

When the D-Q delay is considered, ratio $a_{L V} / a_{T R}$ turns out to be higher than that evaluated for the CK-Q delay. Indeed, from comparison of relationships (5.32) with (5.29), coefficient $a_{T R}$ of the traditional gate is the same as for the CK-Q delay, while from comparison of (8.34a) with (8.35a) coefficient $a_{L V}$ for the D-Q delay has one more addend than for the CK-Q delay. For the HSB2 process and under the previous conditions, ratio $a_{L V} / a_{T R}$ is equal to 0.73. Moreover, ratio $b_{L V} / b_{T R}$ is slightly greater than 2, due to the additional base-emitter capacitance $C_{j e}$ in $b_{L V}$ expressed by (8.35b), and tends to 2 for high values of the load capacitance (ratio $b_{L V} / b_{T R}$ ranges from 2.47 to 2.09 when $C_{L}$ is varied from 0 F to 1 pF).

From these considerations, unlike the results obtained for the CK-Q delay, the minimum D-Q delay ratio is always greater than unity (this is also true for bias currents scaled with respect to the optimum values). This means that, for a high-performance design, the low-voltage D latch is worse than the traditional one in terms of D-Q delay, regardless of the load capacitance.

### 8.3.5 Comparison of the low-voltage and traditional CML D latch designed for low power consumption

For gates that do not lie in the critical path of the circuit being designed, the speed performance is not a concern. They can therefore be designed with a much lower bias current than the optimum value $I_{S S o p}$. In this case, the CK-Q delay expression (5.1) can be approximated as

$$
\begin{equation*}
\tau_{P D} \approx \frac{b}{I_{S S}} \tag{8.39}
\end{equation*}
$$

which shows that at low bias currents delay is inversely proportional to bias current.

To carry out a consistent comparison, let us consider the low-voltage and traditional D latch with the same bias current $I_{S S}$. As a consequence, the delay ratio between the former and the latter circuit becomes

$$
\begin{equation*}
\frac{\tau_{P D, L V}}{\tau_{P D, T R}} \approx \frac{b_{L V}}{b_{T R}} \tag{8.40}
\end{equation*}
$$

which, as discussed in the previous subsection, is slightly lower than two. This means that in low-power designs the traditional latch outperforms the low-voltage one by a factor slightly lower than two.

To evaluate the power efficiency, we again consider the power-delay product ratio for the two topologies, that from (5.4) results in

$$
\begin{equation*}
\frac{P D P_{L V}}{P D P_{T R}} \approx \frac{V_{D D, \text { min }, L V}}{V_{D D, \text { min }, T R}} \frac{\tau_{P D, L V}}{\tau_{P D, T R}} \approx 0.55 \frac{b_{L V}}{b_{T R}} \tag{8.41}
\end{equation*}
$$

which equals that obtained in the high-performance design, as can be found by comparison with relationship (8.38). Again, from relationship (8.41), the low-voltage circuit $P D P$ is $10 \%$ worse than the traditional one for very high load capacitances. In contrast, for low values of $C_{L}$ the power-delay product ratio (8.41) tends to be lower than unity, thus the low-voltage latch has a small advantage in terms of power efficiency.

To give an example, for the HSB2 process, a load capacitance of 100 fF and a bias current of $100 \mu \mathrm{~A}$, the delay ratio predicted by relationship (8.40) and the simulated value are 1.7 and 1.56 , respectively, while the power-delay product ratio (8.41) and the simulated ratio are 0.94 and 0.86 , respectively.
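The predicted values in this example can be reproduced from (8.39)-(8.41) with a few lines of Python. The sketch assumes the coefficients $b$ quoted in Table 8.5 for $C_{L}=100$ fF and the minimum supply voltages of 1.1 V and 2 V; the function names are hypothetical, and the simulated values cannot of course be reproduced this way.

```python
# Coefficient b of the CK-Q delay model quoted in Table 8.5 (HSB2, C_L = 100 fF).
B_LV = 1.14e-13   # low-voltage D latch
B_TR = 6.69e-14   # traditional D latch

VDD_LV, VDD_TR = 1.1, 2.0   # minimum supply voltages (V)
I_SS = 100e-6               # common bias current (A), as in the example


def low_power_delay(b: float, i_ss: float) -> float:
    """Delay in the low-power regime, relationship (8.39): tau ~ b / I_SS."""
    return b / i_ss


def power_delay_product(v_dd: float, i_ss: float, delay: float) -> float:
    """Power-delay product, with static power V_DD * I_SS."""
    return v_dd * i_ss * delay


tau_lv = low_power_delay(B_LV, I_SS)
tau_tr = low_power_delay(B_TR, I_SS)
pdp_lv = power_delay_product(VDD_LV, I_SS, tau_lv)
pdp_tr = power_delay_product(VDD_TR, I_SS, tau_tr)

print(f"tau_LV = {tau_lv * 1e9:.2f} ns, tau_TR = {tau_tr * 1e9:.2f} ns")
print(f"delay ratio (8.40) = {tau_lv / tau_tr:.2f}")
print(f"PDP ratio (8.41)   = {pdp_lv / pdp_tr:.2f}")
```

Since the bias current is the same for both latches, it cancels in both ratios, which therefore depend only on $b_{L V} / b_{T R}$ and on the supply voltages.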

Even though results discussed above refer to the CK-Q delay of the D latch, similar observations can be made for the D-Q delay. Indeed, relationship (8.40) states that the delay ratio is greater than two under a bias current which is much lower than the optimum value, meaning that the traditional latch outperforms the low-voltage one by a factor greater than two in low-power design.

### 8.3.6 Summary of results and remarks

In the previous sections, the low-voltage CML D latch topology has been analyzed and compared with the traditional implementation. The former allows the supply voltage to be reduced by a factor of about 0.55, which could possibly be exploited to achieve a power saving. However, the bias current must also be considered to make a meaningful comparison with the traditional topology in terms of the achievable speed performance and the power-delay trade-off.

By using the methodology developed in Chapter 5, the low-voltage D latch was designed and compared to the traditional topology in terms of the delay and power-delay trade-off, for a high-performance or low-power consumption design target. The results showed that the low-voltage D latch topology is advantageous in typical cases where a low CK-Q latch delay is required and a low fan-out is expected, since this circuit exhibits a moderate speed improvement (in the order of $20 \%$ ) with respect to the traditional implementation. However, a power increase must be paid for this speed improvement, since the low-voltage and traditional topologies have roughly the same power-delay product. This is because the latch delay is that of inverter Q1-Q2 (or Q3-Q4), properly loaded, whose bias current is only half of the total gate current $I_{S S}$, while the other half of the bias current is steered to ground by the deactivating transistor.

In the low-power design case, the traditional topology has a significant speed advantage over the low-voltage one (roughly by a factor of two), while the same considerations on the power-delay product as in the high-performance case hold.

As a result, the low-voltage topology never possesses a strong advantage in terms of delay or of power efficiency. In practical cases, the only significant advantage of the low-voltage circuit is that no output buffer (and thus its additional bias current) is required. In addition, when comparing the two D latch topologies, it should also be considered that the low supply voltage allowed by the low-voltage circuit imposes serious limits on the logic gates that can be implemented, since traditional series gates cannot operate correctly. Therefore, the low-voltage approach is a viable solution in low-fan-out high-speed applications requiring only gates that can be efficiently implemented with triple-tail cells, such as MUX, XOR and D latches.

These results can be easily generalized to other low-voltage CML gates, such as the MUX/XOR gate.

### 8.4 OPTIMIZED DESIGN STRATEGIES FOR CASCADED BIPOLAR CURRENT-MODE GATES

Until now, criteria have been introduced to optimally design a single Current-Mode gate and consciously manage the power-delay trade-off. When cascaded gates are considered, an analogous trade-off exists between the overall delay of a path and the overall power consumption, both of which are equal to the sum of the contributions associated with each gate. As in the case of single gates, overall delay and power consumption strongly depend on the overall bias current $I_{\text {TOT }}$ of gates belonging to the path considered.

Depending on the application, parameter $I_{T O T}$ can be either a design variable or a constant assigned from considerations on power consumption at the system level. In the first case, where the maximum speed allowed by the technology is required, the overall delay $\tau_{P D}$ has to be minimized by properly setting all bias currents in the cascaded gates to make each gate delay contribution minimum. Therefore, the problem of minimizing the overall delay reduces to the delay minimization of each single gate, whose bias current has to be optimized independently of the others ${ }^{1}$. This is achieved by resorting to the design strategies for single gates already discussed in Chapter 5. The resulting overall current $I_{T O T}$ is equal to the sum of the optimum currents, which for CML gates are given by relationship (5.2) and for output buffers by (5.18).

When the overall bias current is preliminarily set to meet a power consumption value assigned at the system level, a significant effort is typically spent in sizing each bias current in the cascaded gates to minimize the overall delay. From an analytical point of view, this translates into the problem of minimizing the delay expression under the constraint that the sum of the bias currents is equal to the desired value $I_{T O T}$. This can be done in a computationally efficient way by resorting to the delay models (5.1) and (5.17) of CML gates and output buffers as a function of bias currents. Indeed, these models have simple expressions and constant coefficients, therefore they are suitable for numerical optimization. From relationships (5.1) and (5.17), the resulting delay expression to be minimized is in the form

$$
\begin{equation*}
\tau_{P D}=\sum_{i=1}^{n}\left[a_{i} I_{S S, i}+\frac{b_{i}}{I_{S S, i}}+c_{i}+1.6 \sqrt{\left(\frac{V_{S W I N G}}{2 I_{S S, i}}+r_{b, i}\right)\left(\tau_{F}+\frac{V_{T}}{I_{C C, i}} C_{j e, i}\right) C_{L, i}}\right] \tag{8.42}
\end{equation*}
$$

where $n$ cascaded ECL gates were considered and subscript $i$ refers to the $i$-th gate, whereas for CML gates the output buffer delay (and thus the term under the square root) is zero. Relationship (8.42) must be minimized under the constraint

$$
\begin{equation*}
\sum_{i=1}^{n}\left(I_{S S, i}+2 I_{C C, i}\right)=I_{T O T} \tag{8.43}
\end{equation*}
$$

This approach is much more advantageous than the traditional one based on iterative simulations with a trial-and-error procedure, since the former is much less computationally expensive and time-consuming.

In some cases, the minimization of the overall delay under a current constraint can be carried out in a pencil-and-paper manner. In particular, this is possible when only CML gates belong to the path considered, i.e. it does not include any output buffer. In the following sections, the main results on analytical delay minimization under a bias current constraint for CML paths are illustrated.

### 8.4.1 Design of CML non-critical paths with a constraint on the overall bias current

Let us consider a specific path to be optimized that consists of $n$ cascaded CML gates. As an example, this is the case of cascaded low-voltage circuits, since they do not include any output buffer, or a chain of traditional CML gates whose outputs drive the upper-level inputs of the following ones. From (5.1), the delay of each gate is

$$
\begin{equation*}
\tau_{P D, i}=a_{i} I_{S S, i}+\frac{b_{i}}{I_{S S, i}}+c_{i} \approx a_{i} I_{S S, i}+\frac{b_{i}}{I_{S S, i}} \quad \text { for } i=1 \ldots n \tag{8.44}
\end{equation*}
$$

where subscript $i$ refers to the $i$-th gate of the path. The overall delay of the path is given by

$$
\begin{equation*}
\tau_{P D}=\sum_{i=1}^{n}\left(a_{i} I_{S S, i}+\frac{b_{i}}{I_{S S, i}}\right) \tag{8.45}
\end{equation*}
$$

which is minimized by properly setting bias current values $I_{S S, i}$, that must satisfy the following condition

$$
\begin{equation*}
\sum_{i=1}^{n} I_{S S, i}=I_{\text {TOT }} \tag{8.46}
\end{equation*}
$$

When the limited power (i.e., bias current) budget represents a stronger constraint than speed, such as in a non-critical path where speed is not of concern, the bias current of each gate is made sufficiently lower than its optimum value to allow neglecting terms $a_{i} I_{S S, i}$ with respect to terms $b_{i} / I_{S S, i}$ in relationships (8.44) and (8.45).

Hence, the overall path delay can be simplified to

$$
\begin{equation*}
\tau_{P D}=\sum_{i=1}^{n} \frac{b_{i}}{I_{S S, i}} \tag{8.47}
\end{equation*}
$$

Relationship (8.47) with constraint (8.46) is minimized if all terms $b_{i} / I_{S S, i}^{2}$ are equal

$$
\begin{equation*}
\frac{b_{1}}{I_{S S, 1}^{2}}=\frac{b_{2}}{I_{S S, 2}^{2}}=\ldots=\frac{b_{n}}{I_{S S, n}^{2}} \tag{8.48}
\end{equation*}
$$

Indeed, by using (8.46), we rewrite (8.47) as

$$
\begin{equation*}
\tau_{P D}=\frac{b_{1}}{I_{T O T}-\sum_{i=2}^{n} I_{S S, i}}+\sum_{i=2}^{n} \frac{b_{i}}{I_{S S, i}} \tag{8.49}
\end{equation*}
$$

and, setting to zero the derivatives of (8.49) with respect to terms $I_{S S, i}$ for $i=2 \ldots n$,

$$
\begin{equation*}
\frac{\partial \tau_{P D}}{\partial I_{S S, i}}=\frac{b_{1}}{\left(I_{T O T}-\sum_{j=2}^{n} I_{S S, j}\right)^{2}}-\frac{b_{i}}{I_{S S, i}^{2}}=\frac{b_{1}}{I_{S S, 1}^{2}}-\frac{b_{i}}{I_{S S, i}^{2}}=0 \quad \text { for } i=2 \ldots n \tag{8.50}
\end{equation*}
$$

relationship (8.48) is demonstrated.

The specific value of the current in each stage, $I_{S S, i}$, can be evaluated after a few algebraic manipulations. In particular, let us consider the optimum value of terms $I_{S S, i}^{2} / b_{i}$, which from (8.48) is the same for all gates, and define $X^{*}$ as its square root

$$
\begin{equation*}
X^{*}=\frac{I_{S S, i}}{\sqrt{b_{i}}} \quad i=1 \ldots n \tag{8.51}
\end{equation*}
$$

Evaluating $I_{S S, i}$ from (8.51) and substituting it into the current constraint (8.46), $X^{*}$ is found to be

$$
\begin{equation*}
X^{*}=\frac{I_{\text {TOT }}}{\sum_{i=1}^{n} \sqrt{b_{i}}} \tag{8.52}
\end{equation*}
$$

that, once substituted into (8.51) and solving for $I_{S S, i}$, leads to the following expression of the optimum bias currents that minimize delay under the power consumption constraint

$$
\begin{equation*}
I_{S S, i}=\frac{\sqrt{b_{i}}}{\sum_{j=1}^{n} \sqrt{b_{j}}} I_{T O T} \tag{8.53}
\end{equation*}
$$

From inspection of (8.53), the overall bias current must be distributed among the gates in proportion to the weight of each term $\sqrt{b_{i}}$ with respect to the sum of all terms $\sum_{j=1}^{n} \sqrt{b_{j}}$. This result is confirmed by intuition, since, as observed in Section 5.7, coefficients $b_{i}$ model the equivalent capacitance at the output node. This means that, to minimize the delay, gates having a greater parasitic capacitance must be provided with a greater bias current.

The resulting delay after optimization is obtained by substituting relationship (8.53) into (8.47)

$$
\begin{equation*}
\tau_{P D}=\frac{\left(\sum_{j=1}^{n} \sqrt{b_{j}}\right)^{2}}{I_{T O T}}=\frac{B}{I_{T O T}} \tag{8.54}
\end{equation*}
$$
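The allocation rule (8.53) and the resulting delay (8.54) are easy to check numerically. The sketch below is a minimal illustration: the function names and the coefficient values $b_{i}$ are hypothetical, and the delay uses the low-power approximation (8.47).

```python
import math


def distribute_bias_current(b: list[float], i_tot: float) -> list[float]:
    """Bias currents from (8.53): each gate gets a share proportional to sqrt(b_i)."""
    roots = [math.sqrt(b_i) for b_i in b]
    total = sum(roots)
    return [root / total * i_tot for root in roots]


def path_delay_low_power(b: list[float], currents: list[float]) -> float:
    """Overall path delay in the low-power regime, relationship (8.47)."""
    return sum(b_i / i_ss for b_i, i_ss in zip(b, currents))


# Hypothetical coefficients b_i (A*s) of a three-gate path and current budget.
b = [6.0e-14, 1.2e-13, 2.4e-13]
i_tot = 300e-6

optimal = distribute_bias_current(b, i_tot)
equal_split = [i_tot / len(b)] * len(b)

tau_opt = path_delay_low_power(b, optimal)
tau_eq = path_delay_low_power(b, equal_split)
tau_854 = sum(math.sqrt(b_i) for b_i in b) ** 2 / i_tot   # closed form (8.54)

print("currents (uA):", [round(i * 1e6, 1) for i in optimal])
print(f"delay, optimal split: {tau_opt * 1e9:.3f} ns (closed form {tau_854 * 1e9:.3f} ns)")
print(f"delay, equal split:   {tau_eq * 1e9:.3f} ns")
```

Comparing against an equal split of the same budget gives a feel for how much delay the square-root weighting recovers when the gates have unequal loads.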

It is interesting to observe that the power-delay interdependence in cascaded gates designed for low power consumption is the same as in single CML gates designed according to the same criterion (see Section 5.1). Thus, the same considerations made in Section 5.1 for single gates are immediately extended to non-critical paths. In particular, the power-delay product is equal to $B \cdot V_{D D}$ (i.e., it has the same expression as for single gates, $b \cdot V_{D D}$ ) and does not depend on the overall bias current.

### 8.4.2 Design of CML critical paths with a constraint on the overall bias current

Let us consider again a path to be optimized that consists of $n$ cascaded CML gates having a defined overall bias current, but unlike the previous case treated in Section 8.4.1, we assume that the bias current to be set in each stage can be close to the optimum value as speed, instead of power dissipation, is the main concern.

Following the same procedure discussed in Section 8.4.1 to reach (8.48), the delay is minimized if the following condition is satisfied

$$
\begin{equation*}
\frac{b_{1}}{I_{S S, 1}^{2}}-a_{1}=\frac{b_{2}}{I_{S S, 2}^{2}}-a_{2}=\ldots=\frac{b_{n}}{I_{S S, n}^{2}}-a_{n} \tag{8.55}
\end{equation*}
$$

under constraint (8.46). From (8.55), the gates lying in the path must be biased with currents $I_{S S, i}$ such that their associated factors $\left(b_{i} / I_{S S, i}^{2}-a_{i}\right)$ are all equal. In the following, this common value is referred to as $X$

$$
\begin{equation*}
X=\frac{b_{i}}{I_{S S, i}^{2}}-a_{i} \quad \text { for } i=1 \ldots n \tag{8.56}
\end{equation*}
$$

hence, by inverting relationship (8.56), the expression of bias currents is

$$
\begin{equation*}
I_{S S, i}=\sqrt{\frac{b_{i}}{a_{i}+X}} \quad \text { for } i=1 \ldots n \tag{8.57}
\end{equation*}
$$

Therefore, factor $X$ is evaluated from the constraint (8.46), which can be written as

$$
\begin{equation*}
\sum_{i=1}^{n} \sqrt{\frac{b_{i}}{a_{i}+X}}=I_{\text {TOT }} \tag{8.58}
\end{equation*}
$$

and can be solved for $X$ by using standard numerical methods for one-variable equations. Once factor $X$ is evaluated, bias currents are easily calculated by using relationship (8.57).
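One possible way to solve (8.58) numerically is plain bisection, since the left-hand side decreases monotonically with $X$ for $X \geq 0$. The sketch below is an illustration under that assumption; the function names and the coefficient values are hypothetical, and the budget $I_{TOT}$ is assumed to be lower than the sum of the optimum currents $\sqrt{b_{i} / a_{i}}$ so that $X>0$.

```python
import math


def bias_currents(x: float, a: list[float], b: list[float]) -> list[float]:
    """Bias currents from (8.57) for a given value of X."""
    return [math.sqrt(b_i / (a_i + x)) for a_i, b_i in zip(a, b)]


def solve_x(a: list[float], b: list[float], i_tot: float) -> float:
    """Solve constraint (8.58) for X by bisection (requires X > 0)."""
    if i_tot >= sum(bias_currents(0.0, a, b)):
        raise ValueError("I_TOT must be lower than the sum of the optimum currents")
    lo, hi = 0.0, max(a)
    # Expand the upper bound until the total current drops below the budget.
    while sum(bias_currents(hi, a, b)) > i_tot:
        hi *= 2.0
    # The total current decreases with X, so bisect on its sign relative to I_TOT.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(bias_currents(mid, a, b)) > i_tot:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical coefficients of a three-gate CML path and current budget.
a = [1.11e-9, 1.11e-9, 2.2e-9]
b = [1.14e-13, 2.0e-13, 6.69e-14]
i_tot = 15e-3

x = solve_x(a, b, i_tot)
currents = bias_currents(x, a, b)
print(f"X = {x:.3e}")
print("currents (mA):", [round(i * 1e3, 2) for i in currents])
print(f"sum = {sum(currents) * 1e3:.3f} mA")
```

Any other one-variable root finder would serve equally well; bisection is used here only because it needs no derivative and cannot diverge on a monotonic function.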

Evaluation of factor $X$ can also be carried out analytically in a high-speed design, where each bias current is assumed to be a significant fraction of the optimum bias current (5.2). This can be shown by rewriting (8.58) as

$$
\begin{equation*}
\sum_{i=1}^{n}\left[I_{S S o p, i} f\left(\frac{X}{a_{i}}\right)\right]=I_{T O T} \tag{8.59}
\end{equation*}
$$

where it was observed that $\sqrt{\frac{b_{i}}{a_{i}}}=I_{S S o p, i}$, and function $f(x)$ was defined as

$$
\begin{equation*}
f(x)=\frac{1}{\sqrt{1+x}} \tag{8.60}
\end{equation*}
$$

Inversion of (8.59) for $X$ becomes much easier if $f\left(X / a_{i}\right)$ is not much lower than unity (in the order of 0.2 or greater), as in the considered case of high-speed design, where bias currents are comparable to the optimum values. Under this assumption, function (8.60) can be approximated as

$$
\begin{equation*}
f(x) \approx 1.65-x^{0.12} \tag{8.61}
\end{equation*}
$$

that, compared to expression (8.60), has an error lower than $10 \%$ for $x$ ranging from 0.01 to 25 , or equivalently for $f(x) \in[0.2,0.99]$ (i.e., from (8.59)-(8.60) for a bias current ranging from $20 \%$ to $99 \%$ of the optimum value (5.2)), as shown in Fig. 8.18.
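The accuracy of approximation (8.61) can be checked numerically. The sketch below samples $x$ logarithmically over the stated range and reports the worst-case error relative to (8.60); the sampling density and the definition of the error as a relative deviation from the exact value are assumptions made here, so the result should be read only as an indication to compare against the bound quoted above.

```python
import math

def f_exact(x):
    """Function (8.60)."""
    return 1.0 / math.sqrt(1.0 + x)

def f_approx(x):
    """Approximation (8.61)."""
    return 1.65 - x ** 0.12

def max_relative_error(x_min=0.01, x_max=25.0, points=2000):
    """Worst-case |approx - exact| / exact on a log-spaced grid (hypothetical helper)."""
    worst = 0.0
    for k in range(points + 1):
        x = x_min * (x_max / x_min) ** (k / points)
        err = abs(f_approx(x) - f_exact(x)) / f_exact(x)
        worst = max(worst, err)
    return worst

print("worst-case relative error:", max_relative_error())
```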

By substituting approximation (8.61) in (8.59) and performing some simple calculations, factor $X$ is found to be

$$
\begin{equation*}
X=\left[\frac{1.65 \sum_{i=1}^{n} I_{S S o p, i}-I_{\text {TOT }}}{\sum_{i=1}^{n} \frac{I_{S S o p, i}}{a_{i}^{0.12}}}\right]^{\frac{1}{0.12}}=\left[\frac{1.65 \sum_{i=1}^{n} \sqrt{\frac{b_{i}}{a_{i}}}-I_{\text {TOT }}}{\sum_{i=1}^{n} \frac{b_{i}^{0.5}}{a_{i}^{0.62}}}\right]^{8.3} \tag{8.62}
\end{equation*}
$$
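
For reference, the intermediate steps behind (8.62) can be sketched as follows, using only (8.59) and (8.61):

$$
\begin{aligned}
I_{\text{TOT}} & \approx \sum_{i=1}^{n} I_{S S o p, i}\left[1.65-\left(\frac{X}{a_{i}}\right)^{0.12}\right]=1.65 \sum_{i=1}^{n} I_{S S o p, i}-X^{0.12} \sum_{i=1}^{n} \frac{I_{S S o p, i}}{a_{i}^{0.12}} \\
X^{0.12} & \approx \frac{1.65 \sum_{i=1}^{n} I_{S S o p, i}-I_{\text{TOT}}}{\sum_{i=1}^{n} \frac{I_{S S o p, i}}{a_{i}^{0.12}}}
\end{aligned}
$$

Raising both sides to the power $1 / 0.12 \approx 8.3$ gives the first form of (8.62); the second form follows from $I_{S S o p, i}=\sqrt{b_{i} / a_{i}}$, so that $I_{S S o p, i} / a_{i}^{0.12}=b_{i}^{0.5} / a_{i}^{0.62}$.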

Once $X$ is evaluated, the bias current of each gate is found by resorting to relationship (8.57).
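
The closed-form procedure can be sketched in a few lines. The coefficient values below are illustrative assumptions (arbitrary consistent units), chosen so that each bias current remains a significant fraction of its optimum value; the function names are hypothetical. Since (8.62) relies on approximation (8.61), the resulting currents meet the budget $I_{\text{TOT}}$ only approximately.

```python
import math

def closed_form_X(a, b, i_tot):
    """Factor X from the closed-form expression (8.62)."""
    i_op = [math.sqrt(bi / ai) for ai, bi in zip(a, b)]      # optimum currents sqrt(b_i / a_i)
    num = 1.65 * sum(i_op) - i_tot
    den = sum(io / ai ** 0.12 for io, ai in zip(i_op, a))
    if num <= 0.0:
        raise ValueError("budget too large for approximation (8.61) to apply")
    return (num / den) ** (1.0 / 0.12)

def bias_currents(a, b, X):
    """Bias currents from relationship (8.57)."""
    return [math.sqrt(bi / (ai + X)) for ai, bi in zip(a, b)]

# Illustrative coefficients only (not taken from the text).
a = [1.0, 2.0, 4.0]
b = [4.0, 2.0, 1.0]
I_TOT = 2.0

X = closed_form_X(a, b, I_TOT)
I_SS = bias_currents(a, b, X)
print("X (closed form) =", X)
print("sum of currents =", sum(I_SS), "target =", I_TOT)
```

If the budget must be met exactly, the closed-form value can serve as a starting guess for a numerical solution of (8.58).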


Fig. 8.18. Error of (8.61) with respect to (8.60) versus $x$.

Until now, general design criteria (i.e. for arbitrary coefficients $a_{i}$ and $b_{i}$ ) to optimize bias currents under a power consumption constraint in CML critical paths have been discussed. However, such results can be further simplified in the frequent case where all transistors belonging to the critical path have an equal emitter area, as will be shown in the following section.

8.4.3 Design of CML critical paths with a constraint on the overall bias current and equal transistors' emitter area

Let us consider $n$ cascaded CML gates having their transistors lying in the critical path with equal emitter area. This implies that coefficients $a_{i}$ of all gates are equal

$$
\begin{equation*}
a_{i}=a \quad \text { for } i=1 \ldots n \tag{8.63}
\end{equation*}
$$

because coefficients $a_{i}$ can always be expressed in the form (5.32a) regardless of the gate considered, and only depend on the transistor emitter area, for a given logic swing. In practical cases, equal emitter area values (or, equivalently, coefficients $a_{i}$ ) are often used, since, as observed in Section 5.7, an increase in the emitter area determines a proportional increase in the input capacitance, and is thus rarely beneficial.

Under assumption (8.63), delay (8.45) can be simplified into

$$
\begin{equation*}
\tau_{P D}=a \sum_{i=1}^{n} I_{S S, i}+\sum_{i=1}^{n} \frac{b_{i}}{I_{S S, i}}=a I_{T O T}+\sum_{i=1}^{n} \frac{b_{i}}{I_{S S, i}} \tag{8.64}
\end{equation*}
$$

where the first term is a constant. As a consequence, delay (8.64) is minimum under the same condition found in Section 8.4.1, which occurs when (8.48) is satisfied.

Rewriting relationship (8.48) in a more expressive way by multiplying all ratios by coefficient $a$

$$
\begin{equation*}
\frac{I_{S S, 1}^{2}}{\frac{b_{1}}{a}}=\frac{I_{S S, 2}^{2}}{\frac{b_{2}}{a}}=\ldots=\frac{I_{S S, n}^{2}}{\frac{b_{n}}{a}} \tag{8.65}
\end{equation*}
$$

and substituting expression (5.2) of the optimum current

$$
\begin{equation*}
\frac{I_{S S, 1}}{I_{S S o p, 1}}=\frac{I_{S S, 2}}{I_{S S o p, 2}}=\ldots=\frac{I_{S S, n}}{I_{S S o p, n}}=i_{N} \tag{8.66}
\end{equation*}
$$

an interesting interpretation can be given. Indeed, from (8.66), the delay is minimized when all bias currents $I_{S S, i}$ are set to an equal fraction $i_{N}$ of their optimum value $I_{S S o p, i}$

$$
\begin{equation*}
I_{S S, i}=i_{N} I_{S S o p, i} \tag{8.67}
\end{equation*}
$$

where factor $i_{N}$ is evaluated from the constraint on the overall bias current (8.46), and from (8.66) is given by

$$
\begin{equation*}
i_{N}=\frac{I_{\text {TOT }}}{\sum_{i=1}^{n} I_{S S o p, i}} \tag{8.68}
\end{equation*}
$$

By substituting (8.67) into (8.64) and using relationship (5.3), the resulting overall delay is

$$
\begin{align*}
\tau_{P D} & =\sum_{i=1}^{n}\left(a i_{N} I_{S S o p, i}+\frac{b_{i}}{i_{N} I_{S S o p, i}}\right)= \\
& =\sum_{i=1}^{n}\left(i_{N} \frac{\tau_{P D, o p, i}}{2}+\frac{\frac{\tau_{P D, o p, i}}{2}}{i_{N}}\right)=  \tag{8.69}\\
& =\frac{1}{2}\left(i_{N}+\frac{1}{i_{N}}\right) \tau_{P D, o p}
\end{align*}
$$

where $\tau_{P D, o p}$ is the minimum achievable path delay, i.e. the sum of the minimum achievable gate delays

$$
\begin{equation*}
\tau_{P D, o p}=\sum_{i=1}^{n} \tau_{P D, o p, i} \tag{8.70}
\end{equation*}
$$

After defining $T_{P D}$ as the path delay normalized to its minimum value in (8.70), relationship (8.69) becomes

$$
\begin{equation*}
T_{P D}=\frac{1}{2}\left(i_{N}+\frac{1}{i_{N}}\right) \tag{8.71}
\end{equation*}
$$

and analytically describes the power-delay trade-off in the path. It is worth noting that (8.71) is formally equal to (5.5) found for single CML gates and depicted in Fig. 5.1. This means that a path made up of CML gates with equal transistor emitter areas (or, equivalently, equal coefficients $a_{i}$ ) has a power-delay interdependence equal to that of a single gate, and the design techniques developed in Chapter 5 for single gates still apply to cascaded gates.
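
The equal-emitter-area procedure can be summarized in a short sketch. The numerical values and the function name are illustrative assumptions (arbitrary consistent units). The gate delay model $a I_{S S, i}+b_{i} / I_{S S, i}$ is the one implied by (8.64), and its minimum $2 \sqrt{a b_{i}}$, reached at $I_{S S o p, i}=\sqrt{b_{i} / a}$, is assumed here to coincide with the minimum gate delay of relationship (5.3).

```python
import math

def equal_area_design(a, b, i_tot):
    """Bias currents and path delay for equal coefficients a_i = a (hypothetical helper)."""
    i_op = [math.sqrt(bi / a) for bi in b]                 # optimum currents sqrt(b_i / a)
    i_n = i_tot / sum(i_op)                                # normalized current, (8.68)
    i_ss = [i_n * io for io in i_op]                       # bias currents, (8.67)
    tau = a * sum(i_ss) + sum(bi / ii for bi, ii in zip(b, i_ss))   # path delay, (8.64)
    tau_op = sum(2.0 * math.sqrt(a * bi) for bi in b)      # minimum path delay, (8.70)
    return i_n, i_ss, tau, tau_op

# Illustrative values only (not taken from the text).
i_n, i_ss, tau, tau_op = equal_area_design(1.0, [4.0, 1.0, 0.25], 1.75)

print("i_N =", i_n)
print("bias currents =", i_ss)
print("normalized delay from (8.64):", tau / tau_op)
print("normalized delay from (8.71):", 0.5 * (i_n + 1.0 / i_n))
```

The last two printed values agree, which reflects the fact that the normalized path delay depends on the current budget only through $i_{N}$.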

## REFERENCES

[ACK93]D. Allstot, S. Chee, S. Kiaei, M. Shristawa, "Folded sourcecoupled logic vs. CMOS static logic for low-noise mixed-signal ICs," IEEE Trans. on CAS - part I, vol. 40, no. 9, pp. 553-563, Sept. 1993.
[ADP02] M. Alioto, G. Di Cataldo, G. Palumbo, "Design of Low-Power High-Speed Bipolar Frequency Dividers", Electronics Letters, Vol. 38, No. 4, pp. 158-160, February 2002.
[ADR00]Auvergne, D., Daga, J.M., Rezzoug M. "Signal Transition Time Effect on CMOS Delay Evaluation,", IEEE Trans. on Circuits and Systems - part I, vol. 47, no. 9, pp. 1362-1369, Sept. 2000.
[AM88] P. Antognetti, G. Massobrio, Semiconductor Device Modeling with Spice, McGraw-Hill, 1988.
[AMP03]M. Alioto, R. Mita, G. Palumbo, "Performance Evaluation of the Low-Voltage CML D-Latch Topology", Integration - The VLSI Journal, Vol. 36, No. 4, pp. 191-209, November 2003.
[AP99] M. Alioto, G. Palumbo, "Highly Accurate and Simple Models for CML and ECL Gates", IEEE Trans. on CAD, Vol. 18, No. 9, pp. 1369-1375, September 1999.
[AP991] M. Alioto, G. Palumbo, "CML and ECL: Optimized Design and Comparison", IEEE Trans. on CAS part I, Vol. 46, No. 11, pp. 1330-1341, November 1999.
[AP00] M. Alioto, G. Palumbo, "Modeling and Optimized Design of Current Mode MUX/XOR and D Flip-Flop", IEEE Trans. on CAS part II, Vol. 47, No. 5, pp. 452-461, May 2000.
[AP01] M. Alioto, G. Palumbo, "Oscillation Frequency in CML and ESCL Ring Oscillators", IEEE Trans. on CAS part I, Vol. 48, No. 2, pp. 210-214, February 2001.
[AP03] M. Alioto, G. Palumbo, "Design Strategies for Source Coupled Logic Gates," IEEE Trans. on CAS - part I, Vol. 30, No. 4, pp.640-654, May 2003.
[AP031] M. Alioto, G. Palumbo, "Design of MUX, XOR and D-Latch SCL Gates", IEEE ISCAS 2003, Bangkok, pp. V261-V264, May 2003.
[APP02] M. Alioto - G. Palumbo - S. Pennisi, "Modeling of Source Coupled Logic Gates," International Journal of Circuit Theory and Applications, Vol. 30, No. 4, pp.459-477, July, 2002.
[ARL95] L. Andersson, B. Rudberg, P. Lewin, M. Reed, S. Planer, S. Sundaram "Silicon Bipolar Chipset for SONET/SDH $10 \mathrm{~Gb} / \mathrm{s}$ Fiber-Optic Communication Links," IEEE Journal of Solid-State Circuits, vol. 30, no. 3, pp. 210-217, March 1995.
[BNK98]L. Bisdounis, S. Nikolaidis, O. Koufopavlou, "Analytical transient response of propagation delay evaluation of the CMOS inverter for short channel devices," IEEE Jour. of Solid-State Circuits, vol. 33, no. 2, pp. 302-306, Feb. 1998.
[C92] C. Chuang, "Advanced Bipolar Circuits," IEEE Circuits \& Devices, pp. 32-36, November 1992.
[CBA88]E.-Chor, A. Brunnschweiler, P. Ashburn, "A Propagation-Delay Expression and its Application to the Optimization of Polysilicon Emitter ECL Processes," IEEE Jour. of Solid-State Circ., Vol. 23, No. 1, pp. 251-259, February 1988.
[CB95a] A. Chandrakasan, R. Brodersen, "Minimizing Power Consumption in Digital CMOS Circuits," Proc. of the IEEE, vol. 83, no. 4, pp. 498-523, 1995.
[CB95b] A. Chandrakasan, R. Brodersen, Low Power Digital CMOS Design, Kluwer Academic Publisher, 1995.
[CBF01] A. Chandrakasan, W. Bowhill, F. Fox, (Eds.), Design of HighPerformance Microprocessor Circuits, IEEE Press 2001.
[CCK95]C. Choy, C. Chan, M. Ku, "A feedback control circuit design technique to suppress power noise in high speed output driver," Proc. of ISCAS'95, pp. 307-310, April 1995.
[CCK97]C. Choy, C. Chan, M. Ku, J. Povazanec, "Design procedure of low-noise high-speed adaptive output drivers," Proc. of ISCAS'97, pp. 1796-1799, 1997.
[CG73] B. Cochrun, A. Grabel, "A method for the Determination of the Transfer Function of electronic Circuits," IEEE Trans. on Circuit Theory, Vol. CT-20, No. 1, pp. 16-20, Jan. 1973.
[CH99] Y. Cheng, C. Hu, MOSFET modeling \& BSIM3 user's guide, Kluwer Academic Publishers, 1999.
[CHL92] C. Choy, C. Ho, G. Lunn, B. Lin, G. Fung "A BiCMOS Programmable Frequency Divider," IEEE Trans. on Circuits and Systems part II, Vol. 39, No. 3, pp. 147-154, March 1992.
[CJ89] C. Choy, P. Jones, "Minimisation Tachnique for Series-Gated Emitter-Coupled Logic," IEE Proc. Part G, vol. 136, no. 3, pp. 105-113, June 1989.
[CP86] K. Chu, D. Pulfrey, "Design Procedures for Differential Cascode Voltage Switch Circuits," IEEE Jour. of Solid-state Circ., vol. SC21, no. 6, pp. 1082-1087, December 1986.
[CW98] H. Chang, J. Wu, "A 723-MHz 17.2-mW CMOS Programmable Counter," IEEE Journal of Solid-State Circuits, vol. 33, no. 10, pp. 1572-1575, Oct. 1998.
[DKS90] B. Del Signore, D. Kerth, N. Sooch, E. Swanson, "A monolithic 20-b delta-sigma A/D converter," IEEE Jour. Solid-State Circuits, vol. 25, pp. 1311-1317, Dec. 1990.
[DS03] B. De Muer, M. Steyaert, CMOS Fractional-N Synthesizers (Designfor High Spectral Purity and Monlithic Integration), Kluwer Academic Publishers, 2003.
[E48] W. Elmore, "The transient response of damped linear networks," $J$. Appl. Phys., vol. 19, pp. 55-63, Jan. 1948.
[EBR94] S. Embabi, D. Brueske, K. Rachamreddy, "A BiCMOS low-power current mode gate," IEEE Jour. of Solid-State Circuits, vol. 29, no. 6, pp. 741-745, June 1994.
[F90] W. Fang, "Accurate Analytical Delay Expressions for ECL and CML Circuits and Their Applications to Optimizing High-Speed Bipolar Circuits," IEEE Jour. of Solid-State Circ., Vol. 25, No. 2, pp. 572-583, April 1990.
[F96] A. Felder, et al., "46 Gb/s DEMUX, $50 \mathrm{~Gb} / \mathrm{s}$ MUX, and 30 GHz Static Frequency Divider in Silicon Bipolar Technology," IEEE Jour. Of Solid-State Circ., Vol. 31, No. 4, pp. 481-486, April 1996.
[F97] I. Fujimori et al., "A 5-V single chip delta-sigma audio A/D converter with 111 dB dynamic range," IEEE Jour. of Solid-State Circuits, vol. 32, pp. 329-336, Mar. 1997.
[FMP96] A. Felder, M. Moller, J. Popp, J. Bock, H. Rein "46 Gb/s DEMUX, $50 \mathrm{~Gb} / \mathrm{s}$ MUX, and 30 GHz Static Frequency Divider in Silicon Bipolar Technology," IEEE Jour. of Solid-State Circ., Vol. 31, No. 4, pp. 481-486, April 1996.
[FBA90]W. Fang, A. Brunnschweiler, P. Ashburn, "An Analytical Maximum Toggle Frequency Expression and its Application to Optimizing High-Speed ECL Frequency Dividers," IEEE Jour. of Solid-State Circ., Vol. 25, No. 4, pp. 920-931, August 1990.
[GM93] P. Gray, R. Meyer, Analysis and Design of Analog Integrated Circuits (third edition), John Wiley \& Sons, 1993.
[GMC91]H. Greub, J. McDonald, T. Creedon and T. Yamaguchi, "Highperformance standard cell library and modeling technique for differential advanced bipolar current tree logic," IEEE Jour. of Solid-State Circuits, vol. 26, no. 5, pp. 749-762, May 1991.
[GMO90]M. Ghannam, R. Mertens, R. Van Overstraeten, ""An Analytical for the Determination of the Transient Response of CML and ECL Gates," IEEE Trans. on Electron Devices, Vol. 37, No. 1, pp. 191201, January 1990.
[GT86] R. Gregorian, G. Temes, Analog MOS Integrated Circuits for signal processing, John Wiley \& Sons, 1986.
[H95] Y. Harada, "Delay Components of a Current Mode Logic Circuit and Their Current Dependency," IEEE Jour. of Solid-State Circ., Vol. 30, No. 1, pp. 54-60, January 1995.
[H96] J. Hauenschild, et al., "A Plastic Packaged $10 \mathrm{~Gb} / \mathrm{s}$ BiCMOS Clock and Data Recovering 1:4-Demultiplexer with External VCO," IEEE Jour. of Solid-State Circ., Vol. 31, No. 12, pp. 2056-2059, December 1996.
[H97] G. Hurkx, "The relevance of $f_{T}$ and $f_{M A X}$ for the Speed of a Bipolar CE Amplifier Stage," IEEE Trans. on Electron Devices, Vol. 44, No. 5, pp. 775-781, May 1997.
[HFP01] C. Hung, B. Floyd, B. Park, K. O, "Fully Integrated $5.35-\mathrm{GHz}$ CMOS VCOs and Prescalers," IEEE Trans. on Microwave Theory and Techniques, vol. 49, no. 1, Jan. 2001.
[HLL99] A. Hajimiri, S. Limotyrakis, T. Lee, "Jitter and Phase Noise in Ring Oscillators," IEEE Journal of Solid State Circuits, Vol. 34, No. 6, pp. 790-804, June 1999.
[HR99] F. Herzel, B. Razavi, "A Study of Oscillator Jitter Due to Supply and Substrate Noise," IEEE Trans. on Circuits and Systems part II, Vol. 46, No. 1, pp. 56-62, January 1999.
[I95] N. Ishihara, et al., "3.5-Gb/s x 4-Ch Si Bipolar LSI's for Optical Interconnections," IEEE Jour. of Solid-state Circ., Vol. 30, No. 12, pp. 1493-1500, December 1995.
[I951] K. Ishii, et al. "Very-High-Speed Si Bipolar Static Frequency Dividers with New T-Type Flip-Flops," IEEE Jour. Of Solid-state Circ., Vol. 30, No. 1, pp. 19-24, January 1995.
[IIS89] H. Ichino, N. Ishihara, M. Suzuki, S. Konaka, "18-GHz 1/8 Dynamic Frequency Divider Using Si Bipolar Technologies," IEEE Journal of Solid-State Circuits, vol. 24, no. 6, pp. 1723-1728, Dec. 1989.
[IIT95] K. Ishii, H. Ichino, M. Togashi, Y. Kobayashi, C. Yamaguchi "Very-High-Speed Si Bipolar Static Frequency Dividers with New T-Type Flip-Flops," IEEE Jour. of Solid-State Circ., Vol. 30, No. 1, pp. 19-24, January 1995.
[JM97] D. Johns, K. Martin, Analog Integrated Circuit Design, John Wiley \& Sons, 1997.
[JMS97] S. Jantzi, K. Martin, A. Sedra, "Quadrature bandpass $\Sigma \Delta$ modulator for digital radio," IEEE Jour. of Solid-State Circuits, vol. 32, pp. 1935-1949, 1997.
[JR97] K. Jayabalan, S. Rezaul Hasan, "Current-mode BiCMOS folded source-coupled logic circuits," Proc. of ISCAS'97, pp. 1880-1883, June 1997.
[K91] M. Kurisu, et al., "A Si Bipolar 21-GHz/320-mW Static Frequency Divider," IEEE Jour. of Solid-state Circ., Vol. 26, No. 11, pp. 1626-1630, November 1991.
[K98] K. Koike, et al. "High-Speed, Low-Power, Bipolar Standard Cell Design Methodology for Gbit/s Signal Processing," IEEE Jour. of Solid-state Circ., Vol. 33, No. 10, pp. 1536-1544, October 1998.
[KA92] S. Kiaei, D. Allstot, "Low-noise logic for mixed-mode VLSI circuits," Microelectronics Journal, vol. 23, no. 2, pp. 103-114, Apr. 1992.
[KB92] G. K. Konstadinidis, H. H. Berger, "Optimization of Buffer Stages in Bipolar VLSI Systems," IEEE Jour. of Solid-State Circ., Vol. 27, No. 7, pp. 1002-1013, July 1992.
[KBW01]H. Knapp, J. Bock, M. Wurzer, G. Ritzberger, K. Aufinger, L. Treitinger, " $2-\mathrm{GHz} / 2-\mathrm{mW}$ and $12-\mathrm{GHz} / 30-\mathrm{mW}$ Dual-Modulus

Prescalers in Silicon Bipolar Technology," IEEE Jour. of SolidState Circ., Vol. 36, No. 9, pp. 1420-1423, September 2001.
[KCA90]S. Kiaei, S. Chee, D. Allstot, "CMOS source-coupled logic for mixed-mode VLSI," Proc. Int. Symp. Circuits Systems, pp. 16081611, 1990.
[KDN91]B. Kup, E. Dijkmans, P. Naus, J. Sneep, "A bit-stream digital-toanalog converter with 18-b resolution," IEEE Jour. of Solid-State Circuits, vol. 26, pp. 1757-1763, Dec. 1991.
[KH00] J. Kundan, S. Hasan, "Enhanced folded source-coupled logic technique for low-voltage mixed-signal integrated circuits," IEEE Trans. on CAS - part II, vol. 47, no. 8, pp. 810-817, Aug. 2000.
[KH97] J. Kundan, S. Hasan, "Current mode BiCMOS folded sourcecoupled logic circuits," Proc. of ISCAS'97, pp. 1880-1883, June 1997.
[KKI97] K. Kishine, Y. Kobayashi, H. Ichino, "A High-Speed, Low-Power Bipolar Digital Circuit for Gb/s LSI's: Current Mirror Control Logic," IEEE Jour. of Solid-state Circ., Vol. 32, No. 2, pp. 215221, February 1997.
[KOS91] M. Kurisu, M. Ohuchi, A. Sawairi, M. Sugiyama, H. Takemura, T. Tashiro "A Si Bipolar 21-GHz/320-mW Static Frequency Divider," IEEE Jour. of Solid-State Circ., Vol. 26, No. 11, pp. 1626-1630, November 1991.
[KUO92]M. Kurisu, G. Uemura, M. Ohuchi, C. Ogawa, H. Takemura, T. Morikawa, T. Tashiro "A Si Bipolar 28-GHz Dynamic Frequency Divider," IEEE Jour. of Solid-State Circ., Vol. 27, No. 12, pp. 1799-1804w, November 1992.
[L95] Z. H. Lao, et al., "A $12 \mathrm{~Gb} / \mathrm{s}$ Si Bipolar 4:1-Multiplexer IC for SDH Systems," IEEE Jour. of Solid-state Circ., Vol. 30, No. 2, pp. 129-132, February 1995.
[L96] Z. Lao, et al., "Si Bipolar $14 \mathrm{~Gb} / \mathrm{s}$ 1:4-Demultiplexer IC for System Applications," IEEE Jour. of Solid-State Circ., Vol. 31, No. 1, pp. 54-59, January 1996.
[LHG84] H. Lee, D. Hodges, P. Gray, "A self-calibrating 15-bit CMOS A/D converter," IEEE Jour. of Solid-State Circuits, vol. 19, pp. 813819, Dec. 1984.
[LL96] Z. Lao, U. Langmann, "Design of a Low-Power $10 \mathrm{~Gb} / \mathrm{s}$ Si Bipolar 1:16-Demultiplexer IC," IEEE Jour. of Solid-State Circ., Vol. 31, No. 1, pp. 128-131, January 1996.
[LR00] C. Lam, B. Razavi, "A 2.6-GHz/5.2-GHz Frequency Synthesizer in $0.4-\mu \mathrm{m}$ CMOS Technology," IEEE Jour. of Solid-State Circ., Vol. 35, No. 5, pp. 788-794, May 2000.
[LS94] K. Laker, W. Sansen, Design of Analog Integrated Circuits and Systems, Mc Graw-Hill, 1994.
[LWO91]H. Leopold, G. Winkler, P. O’Leary, K. Ilzer, J. Jernej, "A monolithic CMOS 20-b analog-to-digital converter," IEEE Jour. of Solid-State Circuits, vol. 26, pp. 910-916, July 1991.
[M89] R. Middlebrook, "Null Double Injection and the Extra Element Theorem", IEEE Trans. on Education, vol. 32, no. 3, pp. 167-180, August 1989.
[M92] S. Masui, "Simulation of substrate coupling in mixed-signal MOS circuits," Proc. Of Symp. VLSI Circuits, pp. 42-43, 1992.
[M97] C. Maier, et al., "A 533-MHz BiCMOS Superscalar RISC Micorprocessor," IEEE Jour. of Solid-state Circ., Vol. 32, No. 11, pp. 1625-1633, November 1997.
[M971] J. McNeill, "Jitter in Ring Oscillators," IEEE Journal of Solid State Circuits, Vol. 32, No. 6, pp. 870-879, June 1997.
[MG87] J. Millman - A. Grabel, Microelectronics (Second Edition), McGraw-Hill, 1987.
[MK86] R. Muller, T. Kamins, Device Electronics for Integrated Circuits, John Wiley \& Sons, 1986.
[MKA92]S. Maskai, S. Kiaei, D. Allstot, "Synthesis techniques for CMOS folded source-coupled logic circuits," IEEE Jour. of Solid-State Circuits, vol. 27, no. 8, pp. 1157-1167, Aug. 1992.
[MSO92]M. Mizuno, H. Suzuki, M. Ogawa, K. Sato, H. Ichikawa, "A 3$\mathrm{mW} 1.0-\mathrm{GHz}$ Silicon-ECL Dual-Modulus Prescaler IC," IEEE Jour. of Solid-State Circuits, vol. 27, no. 12, pp. 1794-1798, December 1992.
[NA97] H. Ng, D. Allstot, "CMOS current steering logic for low-voltage mixed-signal integrated circuits," IEEE Trans. on VLSI Systems, vol. 5, no. 3, pp. 301-308, Sept. 1997.
[NIE03] H. Nosaka, K. Isshii, T. Enoki, T. Shibata, "A 10-Gb/s DataPattern Independent Clock and Data Recovery with a Two-Mode Phase Comparator," IEEE Jour. of Solid-State Circuits, vol. 38, no. 2, pp. 192-197, February 2003.
[O99] V. Oklobdzija, High-Performance System Design (Circuits and Logic), IEEE Press 1999.
[P01] H. Partovi, "Clocked Storage Elements", in A. Chandrakasan, W. Bowhill, F. Fox, (Eds.), Design of High-Performance Microprocessor Circuits, IEEE Press 2001.
[PBS97] P. Poras, T. Balsara, D. Steiss, "Performance of CMOS differential circuits," IEEE Jour. of Solid-State Circuits, vol. 31, pp. 841-846, June 1997.
[PM91] O. Pederson, K. Mayaram, Analog Integrated Circuits for Communication (principles, Simulation and Design), Kluwer Academic Publishers, 1991.
[R98] B. Razavi, RF Microelectronics, Prentice Hall, 1998.
[R96] J. Rabaey, Digital Integrated Circuits (A Design Perspective), Prentice Hall, 1996.
[R961] B. Razavi (Ed.), Monolithic Phase-Locked Loops and Clock Recovery Circuits (Theory and Design), IEEE Press 1996.
[R02] B. Razavi, "Prospect of CMOS Technology for High-Speed Optical Communication Circuits," IEEE Jour. of Solid-State Circ., Vol. 37, No. 9, pp. 1135-1145, September 2002.
[RDR01]M. Reinhold, C. Dorschky, R. Pullela, P. Mayer, F. Kunz, Y. Beeyens, T. Link, J. Mattia, "A Fully Integrated 40-Gb/s Clock and Data Recovery IC With 1:4 DEMUX in SiGe Technology," IEEE Journal of Solid-State Circuits, vol. 36, no. 12, pp. 1937-1945, Dec. 2001.
[RM96] H. Rein, M. Moller, "Design Considerations for Very-High-Speed Si-Bipolar IC's Operating up to $50 \mathrm{~Gb} / \mathrm{s}$," IEEE Jour. of SolidState Circ., Vol. 31, No. 8, pp. 1076-1090, August 1996.
[ROS94] B. Razavi, Y. Ota, R. Swartz, "Design techniques for low-voltage high speed digital bipolar circuits," IEEE Jour. of Solid-State Circ., Vol. 29, No. 2, pp. 332-339, March 1994.
[RP96] J. Rabaey, M. Pedram (Eds.), Low power design methodologies, Kluwer Academic Publishers, 1996.
[S81] S. Sze, Physics of Semiconductor Devices, John Wiley \& Sons, 1981.
[S02] R. Singh (Ed.), Signal Integrity Effects in Custom IC and ASIC Design, IEEE Press 2002.
[SD93] C. Stout, J. Doernberg, "10-Gb/s Silicon Bipolar 8:1 Multiplexer and 1:8 Demultiplexer," IEEE Journal of Solid-State Circuits, vol. 28, no. 3, pp. 339-343, March 1993.
[SE94] K. M. Sharaf, M. Elmasry, "An Accurate Analytical Propagation Delay Model for High-Speed CML Bipolar Circuits," IEEE Jour. of Solid-State Circ., Vol. 29, No. 1, pp. 31-45, January 1994.
[SE96] K. M. Sharaf, M. I. Elmasry, "Analysis and Optimization of SeriesGates CML and ECL High-Speed Bipolar Circuits," IEEE Jour. of Solid-State Circ., Vol.31, No. 2, pp. 202-211, February 1996.
[SK97] R. Saez, M. Kayal, M. Declercq, M. Schneider, "Design guidelines for CMOS current steering logic," Proc. of ISCAS'97, pp. 18721875, 1997.
[SKD96] R. Sàez, M. Kayal, M. Declercq, M. Schneider, "Digital circuit techniques for mixed analog/digital circuits applications," Proc. of ICECS'96, Rodos, Greece 1996.
[SL98] Shing-Tag Yan, H. Luong, "A 3-V 1.3-to-1.8-GHz CMOS Voltage-Controlled Oscillator with 0.3-ps Jitter," IEEE Trans. on Circuits and Systems part II, Vol. 45, No. 7, pp. 876-880, July 1998.
[SLM93] D. Su, M. Loinaz, S. Masui, B. Wooley, "Experimental results and modeling techniques for substrate noise in mixed-signal integrated circuits," IEEE Jour. of Solid-State Circuits, vol. 28, pp. 420-430, Apr. 1993.
[SP93] R. Senthinatan, J. Prince, "Application specific CMOS output driver circuit design techniques to reduce simultaneous switching noise," IEEE Jour. of Solid-State Circuits, vol. 28, no. 12, pp. 1383-1388, Dec. 1993.
[S96] F. Sato, et al., "A $2.4 \mathrm{~Gb} / \mathrm{s}$ Receiver and a 1:16 Demultiplexer in One Chip Using a Super Self-Aligned Selectively Grown SiGe Base (SSSB) Bipolar Transistor," IEEE Jour. of Solid-State Circ., Vol. 31, No. 10, pp. 1451-1456, October 1996.
[SMS94] T. Seneff, L. McKay, K. Sakamoto, N. Tracht, "A Sub-1 mA 1.5GHz Silicon Bipolar Dual Modulus Prescaler," IEEE Jour. Of Solid-State Circ., Vol. 29, No. 10, pp. 1206-1211, October 1994.
[SMT98] G. Schuppener, M. Mokhtari, H. Tehnhunen, "A Divide-by-4 Circuit Implemented in Low Voltage, High Speed Topology," Proc. ISCAS'98, Monterey, pp. 215-221, June 1998.
[SPM00] G. Schuppener, C. Pala, M. Mokhtari, "Investigation on LowVoltage Low-Power Silicon Bipolar Design Topology for HighSpeed Digital Circuits," IEEE Jour. of Solid-State Circ., Vol. 35, No. 7, pp. 1051-1054, July 2000.
[SPW91] N. Sheng, R. Pierson, K. Wang, R. Nubling, P. Asbeck, M. Chang, W. Edwards, D. Phillips, "A High-Speed Multimodulus HBT Prescaler for Frequency Synthesizer Applications," IEEE Jour. of Solid-State Circ., Vol. 26, No. 10, pp. 1362-1367, October 1991
[SR01] J. Savoj, B. Razavi, High-Speed CMOS Circuits for Optical Receivers, Kluwer academic Publisher, 2001.
[SS91] A. Sedra, K. Smith, Microelectronic Circuits (third edition), Saunders College Publishing, 1991.
[SVR94] B. Stanistic, N. Verghese, R. Rutenbar, L. Carley, D. Allstot, "Addressing substrate coupling in mixed-mode IC's: simulation and power distribution synthesis," IEEE Jour. of Solid-State Circuits, vol. 29, pp. 226-238, Mar. 1994.
[T98] Y. Tsividis, Operation and modeling of MOS transistors, McGrawHill, 1998.
[T89] R. Treadway, "DC Analysis of Current Mode Logic," IEEE Circuits and Device Magazine, pp. 21-35, March 1989
[TS79] D. D. Tang, P. M. Solomon, "Bipolar Transistor Design for Optimized Power-Delay Logic Circuits," IEEE Jour. of Solid-State Circ., Vol. SC-14, No. 4, pp. 679-684, August 1979.
[TUF01] A. Tanabe, M. Umetani, I. Fujiwara, T. Ogura, K. Kataoka, M. Okiara, H. Sakuraba, T. Endoh, F. Masuoka, " $0.18-\mu \mathrm{m}$ CMOS 10$\mathrm{Gb} / \mathrm{s}$ Multiplexer/Demultiplexer ICs Using Current Mode Logic with Tolerance to Threshold Voltage Fluctuation," IEEE Jour. of Solid-State Circuits, vol. 36, no. 6, June 2001.
[VK95] C. Vaucher, D. Kasperkovitz, "A Wide-Band Tuning System for Fully Integrated Satellite Receivers," IEEE Jour. of Solid-State Circ., Vol. 33, No. 7, pp. 987-997, July 1995.
[W90] G. Wilson, "Advances in Bipolar VLSI," Proceeding of IEEE, Vol. 78 No. 11, pp. 1707-1719, November 1990.
[WKG94]T. Weigandt, B. Kim, P. Gray, "Analysis of Timing Jitter in CMOS Ring Oscillators," IEEE Proc. ISCAS'94, London, pp. 2730, June 1994.
[YTC92] A. T. Yang, Y. Chang, "Physical Timing Modeling for Bipolar VLSI," IEEE Jour. of Solid-State Circ., Vol. 27, No. 9, pp. 12451254, September 1992.
[YEC83] P. Yang, B. Epler, P. Chatterjee, "An Investigation of the Charge Conservation Problem for MOSFET Circuit Simulation", IEEE Jour. of Solid-State Circuits, vo. 18, no. 1, pp. 128-138, February 1983.

## ABOUT THE AUTHORS

Massimo Alioto was born in Brescia, Italy, in 1972. He received the laurea degree in electronics engineering and the Ph.D. degree from the University of Catania, Italy, in 1997 and 2001, respectively.

Since 2001 he has been teaching undergraduate and graduate courses on basic electronics, microelectronics and digital electronics. In 2002, he joined the Dipartimento di Ingegneria dell'Informazione (DII) of the University of Siena as a research associate and, in the same year, became an assistant professor.

His primary research interests include the modeling and optimized design of bipolar and CMOS high-performance digital circuits in terms of high speed or low power dissipation, as well as arithmetic circuits. He has authored or co-authored more than 40 journal and conference papers.

Gaetano Palumbo was born in Catania, Italy, in 1964. He received the laurea degree in Electrical Engineering and a Ph.D. degree in 1988 and 1993, respectively, from the University of Catania.

Since 1993 he has conducted courses on Electronic Devices, Digital Circuit and Systems and basic Electronics. In 1994 he joined the DEES (Dipartimento Elettrico Elettronico e Sistemistico), now DIEES (Dipartimento di Ingegneria Elettrica Elettronica e dei Sistemi), at the University of Catania as a researcher, subsequently becoming associate professor in 1998. Since 2000 he has been a full professor in the same department.

His initial research interest was devoted to analog circuits, with particular emphasis on feedback circuits, compensation techniques, the current-mode approach and low-voltage circuits. After several years, his research has also embraced digital circuits, with emphasis on bipolar and MOS current-mode digital circuits, adiabatic circuits, high-performance building blocks focused on achieving optimum speed within the constraint of low-power operation, and arithmetic circuits. In all these fields he carries out research in collaboration with the Catania site of STMicroelectronics.

He was the co-author of the books "CMOS Current Amplifiers" and "Feedback Amplifiers: theory and design", both published by Kluwer Academic Publishers, in 1999 and 2002, respectively. He is a contributor to the Wiley Encyclopedia of Electrical and Electronics Engineering. In addition, he is the author or co-author of more than 200 scientific papers in international journals (over 85) and conferences, and of several patents.

From June 1999 to the end of 2001 he served as an Associate Editor of the IEEE Transactions on Circuits and Systems part I for the topic "Analog Circuits and Filters". Since 2004 he has been serving as an Associate Editor of the IEEE Transactions on Circuits and Systems part I for the topic "digital circuits and systems".

In 2003 he received the Darlington Award ${ }^{1}$. Prof. Palumbo is an IEEE Senior Member.

[^18]
[^0]:    ${ }^{1}$ All the derivatives are evaluated at the quiescent operating point.

[^1]:    ${ }^{1}$ Typical values of $g_{m b, b u f} / g_{m, b u f}$ range from 0.1 to 0.2; for the $0.35-\mu \mathrm{m}$ CMOS process used, ratio $g_{m b, b u f} / g_{m, b u f}$ is equal to 0.13, leading to $v_{o} /\left(v_{i, b u f 1}-v_{i, b u f 2}\right)=0.88$.

[^2]:    ${ }^{2}$ i.e. approximately equal to $I_{S S}$, as is obtained from $i_{C 3}=\alpha_{F, 1} \alpha_{F, 3} I_{S S} \approx I_{S S}$, by assuming the common-base current gain $\alpha_{F}$ to be about unity.

[^3]:    ${ }^{3}$ It is worth noting that the same value is obtained when $v_{i}$ is low, since in this case the role of transistor $\mathrm{Q}_{\mathrm{i}}$ is played by that sharing the emitter node.

[^4]:    ${ }^{4}$ Actually, the low-pass filter and the amplifier are not implemented, since the upper cut-off frequency and the voltage gain of the mixer are exploited.

[^5]:    ${ }^{1}$ In some specific cases, which will be considered in Section 3.5, more than one input might be associated with the same level.

[^6]:    ${ }^{2}$ Actually, it is easy to realize that this condition is always satisfied when applying CPE to the top level.

[^7]:    ${ }^{4}$ For example, the minterms associated with the first two input values in Table 3.2a, which give $F=0$ regardless of the value of $X_{1}$, are simplified into the single minterm associated with the first row in Table 3.2b.

[^8]:    1 The small delay dependence on the input rise time will be considered in Chapter

[^9]:    ${ }^{1}$ In actual cases, $g_{m} r_{e}$ tends to reduce the voltage gain $A_{V}=g_{m} R_{C} /\left(1+g_{m} r_{e}\right)$. Since CML gates have a small logic swing and thus a low noise margin, a decrease of $A_{V}$ is not acceptable, and is avoided by satisfying the condition $g_{m} r_{e} \ll 1$.

[^10]:    ${ }^{1}$ This explains why the half circuit concept can be applied. Indeed, both half circuits are driven by a constant current and have the same time constant, thus their output voltages move toward each other in a symmetrical way with respect to the logic threshold.

[^11]:    ${ }^{2}$ In the SCL gate only the resistive contribution of the output impedance was considered, since the output capacitance does not contribute to the delay. To better understand this point, consider the overall transfer function of the circuit in Fig. 6.9, which can be approximated as $\left(1+a_{1} s\right) /\left(1+b_{1} s\right)$ in the specific case of MOS CurrentMode gates, as will be shown above. By applying the Elmore delay approximation,

[^12]:    ${ }^{1}$ From relationship (7.9), this holds for both the delay obtained with high-speed criteria discussed, as well as for the minimum delay achievable (7.3) that only differs from (7.9) by a factor of 2.

[^13]:    ${ }^{2}$ This is because the transconductance in the linear region is lower than in the saturation region. In other terms, the transistor driving capability is reduced, thus determining a decrease in both speed and voltage gain from (2.44) (and hence in the noise margin (2.47)).

[^14]:    ${ }^{3}$ For example, in the CMOS process considered, $c^{M}$ is in the order of a gate capacitance of a minimum NMOS transistor, which is usually much lower than typical load capacitances in (7.30).

[^15]:    ${ }^{4}$ For example, for the MUX/XOR gate in the CMOS process considered and under the conditions discussed above, neglecting $c^{M}$ leads to an error which is always lower than $10 \%$ even for the worst case $C_{L}=0 \mathrm{~F}$. For practical load capacitance values, this error is even lower.

[^16]:    ${ }^{5}$ Actually, this expression holds in regions M and H, in which gates are usually biased.

[^17]:    ${ }^{1}$ Indeed, as for design strategies introduced in Chapter 5, delay of each gate is to be minimized for a fixed load capacitance, since it does not depend on other bias currents. This is because load of each gate consists of the input capacitances of subsequent gates, that do not depend on their bias currents.

[^18]:    ${ }^{1}$ best paper bridging the gap between theory and practice published in IEEE Trans. on CAS I or II, during the two calendar years preceding the award.

