# A 2.5-Gb/s Clock Recovery Circuit for NRZ Data in 0.4-$\mu$m CMOS Technology<sup>1</sup>

Seema Butala Anand and Behzad Razavi, Electrical Engineering Department, University of California, Los Angeles

#### **Abstract**

This paper describes a 2.5-Gb/s phase-locked clock recovery circuit utilizing a two-stage ring oscillator and a sample-and-hold phase detector. Fabricated in a 0.4-$\mu$m digital CMOS technology, the circuit produces a recovered clock with an rms jitter of 10.8 ps for a PRBS sequence of length $2^7 - 1$ while dissipating 50 mW from a 3.3-V supply.

#### I. INTRODUCTION

The telecommunication industry has created strong demand for high-speed serial-data communication networks, motivating research on low-cost fiber-optic receivers. A critical task in such receivers is the recovery of the implicit clock in the non-return-to-zero (NRZ) serial data stream. This paper describes a circuit using a phase-locked loop (PLL) technique for recovering a 2.5-GHz clock from random NRZ data.

The next section of the paper presents the clock recovery architecture and design issues. Section III describes the building blocks and Section IV summarizes the experimental results.

## II. ARCHITECTURE

The architecture of the clock recovery circuit is shown in Fig. 1. The loop consists of a phase detector (PD), a voltage-to-current (V/I) converter, a passive loop filter (LF), and a ring-based voltage-controlled oscillator (VCO). The VCO provides the main output through a set of open-drain buffers to  $50-\Omega$  termination resistors.



Fig. 1. PLL architecture.

At low supply voltages, the VCO gain, $K_{VCO}$, required to achieve a given tuning range becomes quite large. As a result, the ripple on the control voltage due to phase-detector activity creates greater jitter at the output. The conflict between a wide tuning range and a low VCO sensitivity is resolved by providing two control inputs: a fine control driven by the main loop and a coarse control that will be driven by a frequency-locked loop (FLL) in future implementations. Since the FLL remains relatively quiet (or can be disabled) after phase lock, the high sensitivity of the coarse control does not lead to high jitter.

<sup>1</sup>This work was supported in part by SRC under contract 64.001 and by Cypress Semiconductor.
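To make the trade-off concrete, the sketch below estimates the peak timing deviation produced by a sinusoidal ripple on a control line. It assumes a linear VCO, a ripple of amplitude $V_r$ at frequency $f_r$, and hypothetical numbers (a 1-GHz range spanned by a 1-V swing for a single control input, versus 100 MHz per volt for a fine input); none of these values come from the measured chip. With $f(t) = f_0 + K_{VCO} V_r \sin(2\pi f_r t)$, the phase excursion peaks at $K_{VCO} V_r / f_r$ radians, so the jitter scales directly with $K_{VCO}$.

```python
import math


def vco_gain_hz_per_v(tuning_range_hz: float, control_swing_v: float) -> float:
    """Average VCO gain needed to cover a tuning range with a given control swing."""
    return tuning_range_hz / control_swing_v


def peak_jitter_from_ripple(k_vco_hz_per_v: float, ripple_amp_v: float,
                            ripple_freq_hz: float, f0_hz: float) -> float:
    """Peak timing deviation (s) caused by a sinusoidal control-line ripple.

    Integrating f(t) - f0 = K * Vr * sin(2*pi*fr*t) gives a phase excursion with
    peak value K * Vr / fr radians; dividing by 2*pi*f0 converts it to seconds.
    """
    peak_phase_rad = k_vco_hz_per_v * ripple_amp_v / ripple_freq_hz
    return peak_phase_rad / (2 * math.pi * f0_hz)


# Hypothetical example values (not from the paper).
k_single = vco_gain_hz_per_v(1e9, 1.0)    # one input covering the whole range
k_fine = vco_gain_hz_per_v(100e6, 1.0)    # fine input driven by the main loop
for name, k in (("single input", k_single), ("fine input", k_fine)):
    jitter = peak_jitter_from_ripple(k, ripple_amp_v=1e-3,
                                     ripple_freq_hz=10e6, f0_hz=2.5e9)
    print(f"{name}: K_VCO = {k / 1e6:.0f} MHz/V, peak jitter = {jitter * 1e12:.2f} ps")
```

Under these assumptions, the same ripple produces ten times less jitter through the fine input, which is the motivation for leaving the high-gain coarse input to a quiet (or disabled) FLL.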

All the building blocks are fully differential to minimize the effect of supply and common-mode noise.

## III. BUILDING BLOCKS

In this section, the transistor-level implementation of each building block is described, emphasizing the design constraints imposed by the technology limitations.

#### A. VCO

The VCO consists of a differential ring oscillator with delay interpolation [1], providing a tuning range wide enough to encompass process and temperature variations [Fig. 2(a)]. In the 0.4-$\mu$m CMOS technology used here, the maximum oscillation frequency does not exceed 1.8 GHz for a four-stage ring and 2.4 GHz for a three-stage ring. Thus, to achieve reliable operation at 2.5 GHz, a two-stage topology is necessary. However, two simple differential pairs in a loop fail to oscillate: each stage contributes only one pole and hence less than $90^{\circ}$ of phase shift at the frequency where its gain falls to unity, whereas a two-stage ring requires $90^{\circ}$ per stage. To resolve this quandary, each stage employs a load that introduces excess phase at high frequencies.
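A quick consistency check on these numbers uses the common first-order ring-oscillator estimate $f_{osc} \approx 1/(2 N t_d)$ for $N$ stages of delay $t_d$; this relation is an assumption here, not the paper's model. Both quoted limits imply the same per-stage delay of about 69 ps, so reducing the stage count is the remaining lever. The second helper illustrates why a plain two-stage ring fails: a single-pole stage with dc gain $A_0$ reaches unity gain where its phase lag is only $\arctan\sqrt{A_0^2-1}$, which stays below $90^{\circ}$ for any finite gain.

```python
import math


def stage_delay_s(n_stages: int, f_osc_hz: float) -> float:
    """Per-stage delay implied by f_osc = 1 / (2 * N * t_d)."""
    return 1.0 / (2 * n_stages * f_osc_hz)


def single_pole_phase_at_unity_deg(dc_gain: float) -> float:
    """Phase lag of A0 / (1 + s/wp) where its magnitude falls to 1.

    |H| = 1 at w_u = wp * sqrt(A0**2 - 1), so the lag is atan(sqrt(A0**2 - 1)).
    """
    if dc_gain <= 1.0:
        raise ValueError("stage never reaches unity gain above dc")
    return math.degrees(math.atan(math.sqrt(dc_gain ** 2 - 1)))


td4 = stage_delay_s(4, 1.8e9)  # four-stage limit quoted in the text
td3 = stage_delay_s(3, 2.4e9)  # three-stage limit quoted in the text
print(f"t_d from 4 stages: {td4 * 1e12:.1f} ps, from 3 stages: {td3 * 1e12:.1f} ps")
for a0 in (2.0, 5.0, 20.0):
    print(f"A0 = {a0:>4}: lag at unity gain = {single_pole_phase_at_unity_deg(a0):.1f} deg")
```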

Figure 2(b) shows the implementation of each delay stage. The fast and slow paths share the load consisting of $M_7$-$M_8$ and the $R_1$-$C_1$ networks. Resistor $R_1$ is realized as a PMOS device operating in the triode region, and capacitance $C_1$ is simply a MOS gate capacitance that includes the gate-source capacitance of $M_7$ (or $M_8$). Transistors $M_9$ and $M_{10}$, which set the gains of the fast and slow paths, in fact consist of smaller transistors so as to provide fine and coarse tuning.

The $R_1$-$C_1$ network converts a single-pole stage into a second-order circuit. Considering only the fast path and neglecting channel-length modulation, we obtain the transfer function:

$$\frac{V_{out}}{V_{in}} = \frac{-g_{m1}(1+R_1C_1s)}{g_{m7} + (C_1 + C_L)s + R_1C_1C_Ls^2},$$

where $C_L$ denotes the load capacitance seen at each output node (including the drain junction capacitances of $M_1$, $M_5$, and $M_7$ and the input capacitance of the next stage). The circuit exhibits a zero at $-1/R_1C_1$ and two poles whose sum is given by

$$\omega_{p1} + \omega_{p2} = \frac{C_1 + C_L}{R_1 C_1 C_L}.$$

Fig. 2. (a) One stage of the VCO, (b) implementation of one stage with modified load.

Oscillation of the two-stage VCO depends on careful placement of the poles and the zero, with a requisite  $90^{\circ}$  phase shift at the unity-gain frequency,  $\omega_{\rm u}$ , for each stage. This in turn mandates that both pole frequencies be less than the frequency of the zero. Hence,

$$\frac{C_1+C_L}{R_1C_1C_L}<\frac{2}{R_1C_1},$$

and consequently,

$$C_1 < C_L.$$
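The pole-zero bookkeeping can be checked numerically. The sketch below solves $R_1C_1C_Ls^2 + (C_1+C_L)s + g_{m7} = 0$ for its two roots, places the zero at $-1/R_1C_1$, and evaluates the inequality above. The component values ($g_{m7} = 1$ mS, $R_1 = 1.5$ kΩ, $C_1 = 20$ fF, $C_L = 100$ fF) are hypothetical and chosen only so that both poles are real. Because the pole-sum inequality is a necessary rather than sufficient condition, the direct comparison of each pole with the zero is reported as well.

```python
import cmath


def stage_poles_and_zero(r1, c1, c_l, g_m7):
    """Roots of R1*C1*CL*s^2 + (C1 + CL)*s + gm7 = 0 plus the zero at -1/(R1*C1)."""
    a = r1 * c1 * c_l
    b = c1 + c_l
    disc = cmath.sqrt(b * b - 4 * a * g_m7)
    p1 = (-b + disc) / (2 * a)
    p2 = (-b - disc) / (2 * a)
    zero = -1.0 / (r1 * c1)
    return p1, p2, zero


def pole_sum_condition(r1, c1, c_l):
    """(C1 + CL)/(R1*C1*CL) < 2/(R1*C1), which reduces to C1 < CL."""
    return (c1 + c_l) / (r1 * c1 * c_l) < 2.0 / (r1 * c1)


# Hypothetical component values, not taken from the fabricated design.
R1, C1, CL, GM7 = 1.5e3, 20e-15, 100e-15, 1e-3
p1, p2, z = stage_poles_and_zero(R1, C1, CL, GM7)
print(f"poles: {abs(p1):.3e}, {abs(p2):.3e} rad/s; zero: {abs(z):.3e} rad/s")
print("pole-sum condition (C1 < CL):", pole_sum_condition(R1, C1, CL))
print("both poles below the zero:", max(abs(p1), abs(p2)) < abs(z))
```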

The above condition is easily met because  $M_7$  must be narrower than  $M_1$  to guarantee the required voltage gain and, therefore, its gate-source capacitance is less than that of  $M_1$  in the following stage. The drain junction capacitances at the output further strengthen the condition.

Figure 3 plots the simulated gain and phase of each stage as a function of frequency with and without the excess-phase network. Two important points can be observed: (1) the unity-gain frequency is *higher* with the $R_1$-$C_1$ network due to the inductive behavior of the load, and (2) the circuit satisfies Barkhausen's oscillation criteria at a single frequency ($\approx 2.5$ GHz) in the presence of $R_1$ and $C_1$ but falls short by $13^{\circ}$ (per stage) at $\omega_u$ without the excess-phase network.
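The first observation can be reproduced qualitatively by evaluating the stage transfer function (fast path only, channel-length modulation neglected) on the $j\omega$ axis. The sketch sweeps $|H(j\omega)|$ to find the unity-gain frequency and reports the phase lag beyond the stage inversion; setting $R_1 = 0$ recovers the plain load. The values $g_{m1} = 2$ mS, $g_{m7} = 1$ mS, $R_1 = 1.5$ kΩ, $C_1 = 20$ fF, and $C_L = 100$ fF are hypothetical and are not tuned to reproduce the 2.5-GHz or $13^{\circ}$ figures of Fig. 3.

```python
import cmath
import math


def stage_response(w, g_m1, g_m7, r1, c1, c_l):
    """H(jw) = -gm1 (1 + s R1 C1) / (gm7 + (C1 + CL) s + R1 C1 CL s^2) at s = jw.

    With r1 = 0 this reduces to the single-pole, diode-connected load.
    """
    s = 1j * w
    num = -g_m1 * (1 + s * r1 * c1)
    den = g_m7 + (c1 + c_l) * s + r1 * c1 * c_l * s * s
    return num / den


def unity_gain_freq(params, w_lo=1e6, w_hi=1e13, steps=2000):
    """First |H| = 1 crossing: coarse log sweep, then geometric bisection."""
    def mag(w):
        return abs(stage_response(w, *params))

    ratio = (w_hi / w_lo) ** (1.0 / steps)
    w_prev = w_lo
    for _ in range(steps):
        w = w_prev * ratio
        if mag(w_prev) >= 1.0 > mag(w):
            lo, hi = w_prev, w
            for _ in range(100):
                mid = math.sqrt(lo * hi)
                if mag(mid) >= 1.0:
                    lo = mid
                else:
                    hi = mid
            return math.sqrt(lo * hi)
        w_prev = w
    raise ValueError("no unity-gain crossing in the sweep range")


def excess_phase_lag_deg(w, params):
    """Phase lag at w, measured beyond the 180-degree inversion of the stage."""
    return -math.degrees(cmath.phase(-stage_response(w, *params)))


# Hypothetical values: (gm1, gm7, R1, C1, CL).
with_net = (2e-3, 1e-3, 1.5e3, 20e-15, 100e-15)
without_net = (2e-3, 1e-3, 0.0, 20e-15, 100e-15)
for label, p in (("with R1-C1", with_net), ("without R1-C1", without_net)):
    wu = unity_gain_freq(p)
    print(f"{label}: f_u = {wu / (2 * math.pi) / 1e9:.2f} GHz, "
          f"lag = {excess_phase_lag_deg(wu, p):.1f} deg")
```

With these values the unity-gain frequency rises when the network is added, consistent with observation (1), while the extra lag is only a few degrees; reaching $90^{\circ}$ per stage depends on the actual device sizing, which this sketch does not attempt.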



Fig. 3. Gain and phase response of each delay stage.

The phase noise due to the thermal noise of resistors $R_1$ in Fig. 2(b) may be a concern. However, simulations show that the phase noise due to all four resistors in the VCO is roughly $-143.2 \, \mathrm{dBc/Hz}$ at a 5-MHz offset, a value much less than the contribution of the other devices.
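Uncorrelated phase-noise contributions add on a power basis, which is why a term more than 20 dB below the dominant one barely moves the total. The sketch below combines dBc/Hz levels; the -120 dBc/Hz used for the other devices is a hypothetical placeholder, since the paper does not report that contribution.

```python
import math


def combine_dbc_hz(*levels_dbc_hz: float) -> float:
    """Sum uncorrelated phase-noise contributions (dBc/Hz) as powers."""
    total = sum(10 ** (level / 10) for level in levels_dbc_hz)
    return 10 * math.log10(total)


resistors = -143.2   # simulated contribution of the four R1 devices (from the text)
others = -120.0      # hypothetical contribution of all other devices
print(f"total: {combine_dbc_hz(resistors, others):.3f} dBc/Hz")
```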

### B. Phase Detector

The design of phase detectors for high-speed random NRZ data is a challenging task. In "linear" phase detectors such as that in [2], the output pulse width is proportional to the input phase difference, resulting in a constant loop gain during the lock transient and minimal charge-pump activity after phase lock is achieved. The difficulty, however, lies in generating pulse widths equal to a fraction of the clock period at speeds near the limits of the technology. By contrast, bang-bang PDs [1] employ simple flip-flops for maximum speed but provide only two output states, creating significant ripple on the control line in the locked condition and hence producing substantial jitter at the VCO output.
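The difference in activity after lock can be illustrated with idealized characteristics: a linear detector whose output is proportional to the phase error, and a bang-bang detector that keeps only its sign. The residual phase errors below are hypothetical small values around lock. The bang-bang output stays at full scale while the linear output shrinks with the error. These are textbook idealizations, not models of the circuits in [1] or [2].

```python
def linear_pd(phase_err_rad: float, k_pd: float = 1.0) -> float:
    """Idealized linear PD: output proportional to the phase error."""
    return k_pd * phase_err_rad


def bang_bang_pd(phase_err_rad: float) -> float:
    """Idealized bang-bang PD: only the sign of the phase error survives."""
    return 1.0 if phase_err_rad >= 0 else -1.0


def mean_abs(values):
    """Average magnitude, a rough proxy for control-line activity."""
    return sum(abs(v) for v in values) / len(values)


residual_errors = [0.01, -0.02, 0.015, -0.005]  # hypothetical, in radians
lin = [linear_pd(e) for e in residual_errors]
bb = [bang_bang_pd(e) for e in residual_errors]
print(f"mean |output|, linear: {mean_abs(lin):.4f}, bang-bang: {mean_abs(bb):.1f}")
```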

The phase detector used in this work combines the two methods so as to overcome the speed limitations of the former and avoid the high activity rate of the latter. As shown in Fig. 4(a), the PD is realized as a master-slave sample-and-hold circuit (an "analog D flip-flop"), whereby each rising data transition samples the instantaneous value of the VCO output. The circuit thus generates an output that is linearly proportional to the input phase difference in the vicinity of lock. This voltage then drives the V/I converter and the loop filter.
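An idealized behavioral model of this detector helps show why it is linear near lock. Assume, purely for illustration, that the VCO output is a sinusoid of amplitude $A$ whose zero crossing coincides with the data edge at lock; a rising data transition then samples $A\sin\Delta\phi \approx A\,\Delta\phi$, and the output is held until the next rising transition. Circuit non-idealities (finite tracking bandwidth, charge injection, droop) are ignored, and the function name is hypothetical.

```python
import math


def sample_and_hold_pd(data_bits, phase_err_rad, amplitude=1.0, v_init=0.0):
    """Behavioral master-slave S/H phase detector (idealized).

    On every 0 -> 1 data transition the output is updated with the sampled VCO
    value A*sin(phase error); between rising transitions the value is held.
    Returns one output value per bit after the first.
    """
    out = []
    held = v_init
    for prev, bit in zip(data_bits, data_bits[1:]):
        if prev == 0 and bit == 1:
            held = amplitude * math.sin(phase_err_rad)
        out.append(held)
    return out


bits = [0, 1, 1, 0, 0, 1, 0, 1]
print(sample_and_hold_pd(bits, 0.1))
for err in (0.05, 0.2, 0.8):
    v = sample_and_hold_pd([0, 1], err)[-1]
    print(f"phase error {err:.2f} rad -> output {v:.4f} (linear estimate {err:.4f})")
```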

The transistor implementation of the PD is depicted in Fig. 4(b). Each of the master and slave stages consists of a differential pair whose tail current and load devices turn off simultaneously, thereby storing the instantaneous value of $V_{VCO}$ on the parasitic capacitances $C_{P1}$-$C_{P4}$. To allow operation from a low supply voltage, the tail current is controlled by a current mirror and a PMOS differential pair. Transistor $M_T$ is a narrow and long device so that when it is on, $M_1$ and $M_2$ (which are wide and short) are forced into the triode region. This obviates the need for common-mode feedback (CMFB).

Fig. 4. (a) Master-slave sample-and-hold phase detector, (b) circuit implementation of (a).

In addition to a relatively linear behavior with respect to the input phase difference, the PD of Fig. 4 exhibits two other properties. First, the master-slave sample-and-hold circuit avoids a transparent path from $D_{in}$ to $V_{out}$, producing a voltage proportional to the phase difference for most of the period. Second, the path with large switching transients, namely, from $D_{in}$ to each stage, operates only at the data rate rather than the VCO rate. Consequently, the bandwidth of this path can be as low as 0.7 times the bit rate, i.e., $0.7 \times 2.5$ GHz = 1.75 GHz, allowing a low-power implementation in 0.4-$\mu$m CMOS technology.
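The bandwidth figure follows from a common rule of thumb that an NRZ path needs roughly 0.7 times the bit rate. Combined with the single-pole estimate $t_r \approx 0.35/f_{3dB}$ for the 10%-90% rise time, this corresponds to transitions lasting about half a bit period. Both relations are approximations assumed here rather than results from the paper.

```python
def data_path_bandwidth_hz(bit_rate_bps: float, factor: float = 0.7) -> float:
    """Rule-of-thumb bandwidth for an NRZ data path."""
    return factor * bit_rate_bps


def rise_time_10_90_s(bandwidth_hz: float) -> float:
    """10%-90% rise time of a single-pole response, t_r ~= 0.35 / f_3dB."""
    return 0.35 / bandwidth_hz


bit_rate = 2.5e9
bw = data_path_bandwidth_hz(bit_rate)
t_bit = 1 / bit_rate
print(f"bandwidth: {bw / 1e9:.2f} GHz")
print(f"rise time: {rise_time_10_90_s(bw) * 1e12:.0f} ps of a {t_bit * 1e12:.0f}-ps bit")
print(f"NRZ fundamental (alternating bits): {bit_rate / 2 / 1e9:.2f} GHz vs. 2.5-GHz VCO")
```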

## C. V/I Converter and Loop Filter

Figure 5 illustrates the differential V/I converter and loop filter. The current generated by  $M_1$  and  $M_2$  is folded up so as to produce an output common-mode level compatible with the VCO control path. A simple CMFB network defines the output CM level.



Fig. 5. V/I converter and loop filter.

The mismatch between the differential output currents of the V/I converter translates to a static phase error between the data and the VCO output. The circuit therefore incorporates relatively large devices to minimize this error. Note that this stage runs at a frequency equal to the *difference* between the input data rate and the VCO frequency, allowing a relaxed trade-off between speed, device dimensions, and power dissipation.
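A first-order estimate of this static phase error rests on assumptions that go beyond the text: the PD is treated as linear with gain $K_{PD}$ (V/rad), the V/I converter as a transconductance $G_m$ with an output-referred offset current $I_{os}$ caused by mismatch, and the loop filter as containing a capacitor, so that in lock the net dc current into it must vanish. Then $G_m K_{PD}\phi_e + I_{os} = 0$, giving $\phi_e = -I_{os}/(G_m K_{PD})$; larger devices reduce $I_{os}$ and hence $\phi_e$. All numbers below are hypothetical.

```python
import math


def static_phase_error_rad(i_offset_a: float, g_m_a_per_v: float,
                           k_pd_v_per_rad: float) -> float:
    """Phase error at which the PD-driven current cancels the offset current."""
    return -i_offset_a / (g_m_a_per_v * k_pd_v_per_rad)


def phase_to_time_s(phi_rad: float, f_clk_hz: float) -> float:
    """Convert a clock phase error to a timing offset."""
    return phi_rad / (2 * math.pi * f_clk_hz)


phi = static_phase_error_rad(i_offset_a=2e-6, g_m_a_per_v=1e-3, k_pd_v_per_rad=0.5)
print(f"phase error: {phi * 1e3:.1f} mrad = {phase_to_time_s(phi, 2.5e9) * 1e12:.3f} ps")
```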

# IV. EXPERIMENTAL RESULTS

The clock recovery circuit has been fabricated in a digital 0.4-$\mu$m CMOS technology. Shown in Fig. 6 is a photograph of the die, whose active area measures $1.2 \text{ mm} \times 0.8 \text{ mm}$. The circuit has been tested in a chip-on-board assembly while running from a 3.3-V power supply.



Fig. 6. Die photo of clock recovery circuit.

The free-running frequency characteristics of the VCO are shown in Fig. 7 for both coarse and fine tuning. Note the linear behavior across a wide range for both control inputs.



Figure 8 shows the measured output and the jitter histogram
in response to a 2.5-Gb/s PRBS sequence of length $2^7 - 1$. The rms and peak-to-peak jitter are 10.8 ps and 90.8 ps, respectively. Figure 9 illustrates the output for a sequence length of $2^{23}-1$, exhibiting 17.4 ps rms and 167 ps peak-to-peak jitter. The increase in jitter is attributed to an insufficient loop-filter time constant. Table 1 summarizes the measured performance of the clock recovery circuit, which is comparable with that of a clock recovery circuit designed in a 30-GHz bipolar technology [3].
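One way to see why the longer pattern is more demanding: a maximal-length PRBS of length $2^N-1$ contains a run of $N$ identical consecutive bits, during which the phase detector sees no data transitions and the loop filter alone must hold the control voltage. The sketch generates patterns with Fibonacci LFSRs (taps at stages 7 and 6, and at 15 and 14, i.e., the polynomials $x^7+x^6+1$ and $x^{15}+x^{14}+1$; this is a common choice assumed here, since the paper does not state the generator), measures the longest runs, and converts the measured rms jitter to unit intervals. The $2^{23}-1$ case is handled by the run-length property rather than generated, to keep the example fast.

```python
def prbs(order: int, taps: tuple, seed: int = 1) -> list:
    """One full period (2**order - 1 bits) of a Fibonacci-LFSR PRBS."""
    mask = (1 << order) - 1
    state = seed & mask
    bits = []
    for _ in range(mask):
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1  # XOR of the tapped stages
        bits.append(fb)
        state = ((state << 1) | fb) & mask
    return bits


def longest_run(bits: list, value: int) -> int:
    """Longest cyclic run of `value`; the pattern repeats, so scan it twice."""
    best = cur = 0
    for b in bits + bits:
        cur = cur + 1 if b == value else 0
        best = max(best, cur)
    return min(best, len(bits))


def jitter_ui(jitter_s: float, bit_rate_bps: float) -> float:
    """Express a timing jitter as a fraction of the bit period (unit interval)."""
    return jitter_s * bit_rate_bps


bit_rate = 2.5e9
p7 = prbs(7, (7, 6))
p15 = prbs(15, (15, 14))
print("PRBS 2^7-1: longest run of ones =", longest_run(p7, 1))
print("PRBS 2^15-1: longest run of ones =", longest_run(p15, 1))
for n in (7, 23):
    print(f"N = {n}: up to {n / bit_rate * 1e9:.1f} ns without a transition")
for label, j in (("2^7-1", 10.8e-12), ("2^23-1", 17.4e-12)):
    print(f"{label}: {jitter_ui(j, bit_rate):.4f} UI rms")
```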



Fig. 8. (a) Time-domain waveform at 2.5 GHz, (b) jitter histogram for a PRBS of length $2^7 - 1$.

Fig. 9. (a) Time-domain waveform at 2.5 GHz, (b) jitter histogram for a PRBS of length $2^{23}-1$.

| Parameter | Value |
|-----------|-------|
| Data rate | 2.5 Gb/s |
|  | 15 MHz |
|  | 50 MHz |
| Phase noise | -80 dBc/Hz |
| Jitter, PRBS $2^7-1$ | 10.7 ps, rms |
| Jitter, PRBS $2^{23}-1$ | 17.4 ps, rms |
| Power dissipation | 50 mW |
| Supply voltage | 3.3 V |
| Active area | $0.8 \text{ mm} \times 1.2 \text{ mm}$ |
| Technology | 0.4-μm CMOS |

Table 1. Performance summary.

## REFERENCES

- [1] B. Lai and R. C. Walker, "A Monolithic 622 Mb/s Clock Extraction Data Retiming Circuit," *ISSCC Digest of Technical Papers*, pp. 144-145, Feb. 1991.
- [2] C. Hogge, "A Self Correcting Clock Recovery Circuit," *IEEE Journal of Lightwave Technology*, vol. LT-3, pp. 1312-1314, Dec. 1985.
- [3] M. Soyuer, "A Monolithic 2.3-Gb/s 100-mW Clock and Data Recovery Circuit in Silicon Bipolar Technology," *IEEE Journal of Solid-State Circuits*, vol. 28, no. 12, pp. 1310-1313, Dec. 1993.