# **Clock Design Techniques Considering Circuit Reliability**

Yonghwan Kim, Minseok Kang, Kyoung-Hwan Lim, Sangdo Park, Deokjin Joo, Taewhan Kim

School of Electrical Engineering and Computer Science, Seoul National University, Korea

{oceanic, ikado, khlim, bundo, jdj, tkim}@ssl.snu.ac.kr

Abstract: This paper overviews clock design problems related to circuit reliability in deep submicron design technology. The topics include the clock polarity assignment problem for reducing peak power/ground noise, the clock mesh network design problem for tolerating clock delay variation, the electromagnetic interference (EMI) aware clock optimization problem, the adjustable delay buffer (ADB) allocation and assignment problem for supporting multiple voltage mode designs, and the state encoding problem for reducing peak current in sequential elements. The last topic belongs to FSM design and is not directly related to clock design, but reducing noise at the sequential elements driven by the clock signal can be viewed as part of the spectrum of reliable circuit design, from the clock source down to and including the sequential elements.

## I. INTRODUCTION

In a synchronous digital system, the clock signal defines a time reference for the movement of data in the system. The clock distribution network distributes the clock signal(s) from a clock source to all sequential elements that require it. Thus, the clock function is vital to the operation of a synchronous system. Clock-related design now receives more attention than ever before. This is because, as the clock frequency increases beyond 1 GHz at low supply voltage, even a small amount of noise on the clock signal can cause a transient functional error or even a drastic system failure. (The noise comes from many factors, such as current charge/discharge variation and temperature variation.)
One of the most important clock design issues is analyzing the behavior of clock signals and mitigating the adverse impact of clock noise on circuit reliability. This work overviews a number of important clock network optimization problems, and the techniques proposed for them, with regard to circuit reliability in deep submicron design technology. The following subsections cover (1) the clock polarity assignment problem for reducing peak current noise on the clock tree, (2) the clock mesh network design problem for tolerating clock skew variation, (3) the electromagnetic interference (EMI) aware clock optimization problem, (4) the adjustable delay buffer (ADB) allocation and assignment problem, which is useful in multiple voltage mode design environments, and finally (5) the FSM state encoding problem for reducing peak current in the sequential elements of an FSM. The last topic is not directly related to clock design, but reducing noise at the activation of sequential elements driven by the clock signal can be regarded as part of the spectrum of achieving reliable circuits, from the clock source down to and including the sequential elements.

## II. RELIABILITY AWARE CLOCK DESIGN TECHNIQUES

### A. Clock Polarity Assignment

In synchronous circuits, clock trees and their clocked loads are the major sources of on-chip noise (power/ground voltage fluctuations), since they switch simultaneously near the rising and/or falling edge of the clock signal and draw a significant amount of current from the power/ground rails. To mitigate this simultaneous switching, Vittal *et al.* [1] and Benini *et al.* [2] took advantage of clock skew to scatter the clock signal arrival times along the time domain.

Fig. 1. Current profiles for a buffer and an inverter. (a) A buffer draws a large current from the power rail at the rising edge of the clock and discharges current to the ground rail at the falling edge. (b) For an inverter, the opposite occurs.

Thereafter, Nieh *et al.*
[3] introduced an additional degree of freedom to this scattering by first proposing a polarity assignment technique. They showed that mixing buffers and inverters as clock buffering elements can disperse noise over the rising and falling edges of the clock, as shown in Fig. 1. The approach assigned half of the buffering elements to negative polarity and the other half to positive polarity by replacing one of the two buffers connected to the clock source with an inverter. However, although the total peak current is reduced significantly, power/ground noise is a local effect, so the problem remained largely unsolved. Samanta *et al.* [4] mixed buffers and inverters throughout the clock tree structure so that, in each local region, about half of the buffering elements have positive polarity and the other half negative polarity. This approach greatly reduces noise but is likely to introduce large clock skew. According to more recent data [5], mixing buffers and inverters for non-leaf elements resulted in an average clock skew of 592 ps.

Chen *et al.* [6], [7] observed that noise from leaf buffering elements dominates that from non-leaf buffering elements. This phenomenon is more prominent when the clock tree is not a binary tree, i.e., when one non-leaf buffering element has more than two leaf buffering elements attached. Hence, they proposed to assign polarity to leaf buffering elements only, assigning half of the leaf nodes to negative polarity with the objective of minimizing clock skew. However, this approach solved the noise reduction problem only heuristically, in that it does not consider buffer load, which affects the peak value of the noise current. Jang *et al.* [8] formulated and solved a combined problem of buffer sizing and polarity assignment, and proved that the polarity assignment problem is NP-complete. Their method enumerates the assignment combinations that satisfy the clock skew constraint and selects the one with the lowest noise.
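
As an illustration of the enumeration idea only, the following Python sketch searches all polarity combinations of a handful of leaf buffering elements, discards those that violate a clock skew bound, and keeps the one with the lowest peak noise. The noise model (a buffer contributes its peak current at the rising edge, an inverter at the falling edge), the function name `best_polarity`, and all delay and current values are assumptions made for this sketch; it omits buffer sizing and is not the algorithm of [8].

```python
from itertools import product


def best_polarity(leaves, skew_bound):
    """Exhaustively pick leaf polarities with the lowest peak noise.

    leaves: non-empty list of (base_arrival, peak_current, buf_delay, inv_delay),
            all hypothetical values for illustration.
    Returns (noise, polarities) or None if no assignment meets the skew bound.
    Polarity +1 means buffer, -1 means inverter.
    """
    best = None
    for polarity in product((+1, -1), repeat=len(leaves)):
        # Arrival time depends on whether a buffer or an inverter is used.
        arrivals = [
            base + (buf_d if p > 0 else inv_d)
            for (base, _, buf_d, inv_d), p in zip(leaves, polarity)
        ]
        if max(arrivals) - min(arrivals) > skew_bound:
            continue  # violates the clock skew constraint
        # Assumed model: buffers load the rising edge, inverters the falling edge.
        rise = sum(peak for (_, peak, _, _), p in zip(leaves, polarity) if p > 0)
        fall = sum(peak for (_, peak, _, _), p in zip(leaves, polarity) if p < 0)
        noise = max(rise, fall)
        if best is None or noise < best[0]:
            best = (noise, polarity)
    return best


# Four hypothetical leaves: (arrival in ps, peak current in mA, buffer delay, inverter delay).
LEAVES = [(100, 4.0, 10, 8), (102, 3.0, 10, 8), (101, 2.0, 10, 8), (103, 1.0, 10, 8)]

if __name__ == "__main__":
    print(best_polarity(LEAVES, skew_bound=4))
    print(best_polarity(LEAVES, skew_bound=3))
```

Because the search visits $2^n$ combinations for $n$ leaf elements, a sketch of this kind is practical only for very small instances; tightening the skew bound shrinks the feasible set and can raise the best achievable noise.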
Very recently, Joo and Kim [9] proposed a fine-grained polarity assignment to overcome two limitations of the prior works: they are unaware of the differences in signal delay (i.e., arrival time) to the leaf buffering elements, and they ignore the effect of the current fluctuation of non-leaf buffering elements on the total peak current waveform.

In addition, a number of works have considered other design factors to improve the quality of polarity assignment. Kang and Kim [10] considered statistical delay variation to increase yield during polarity assignment. Ryu and Kim [11] took a different approach by assigning polarity to leaf nodes first and then routing the clock tree. Lu and Taskin [12] exploited XOR gates, instead of inverters and buffers, so that polarity can be controlled at runtime.

### B. Clock Mesh Network Design

As CMOS process scaling continues under deep submicron technology while clock frequency increases, the effect of variation on clock skew becomes significant. It is known that delay variation on interconnect can cause up to 25% variation in clock skew [13], implying the necessity of controlling the variation effect during clock network design. A mesh-based clock distribution network, as depicted in Fig. 2, is one solution that can effectively mitigate clock skew variation, since the multiple paths from the clock source to a sink are able to compensate for different clock arrival times.

Fig. 2. Mesh-based clock distribution network.

However, the multiple paths are likely to generate short-circuit currents between mesh buffers that have different clock arrival times. Moreover, a mesh-based clock network requires many more wires than a tree-based clock network in order to support multiple clock signal paths. Consequently, most related works have focused on reducing power consumption and wire resources while maintaining tolerance to clock skew variation. Venkataraman *et al.*
[14] decided the location and size of buffers first, and then reduced mesh wires; they solved the mesh buffer placement and sizing problem by formulating it as a weighted set-cover problem. More precisely, for each node (i.e., buffer) in the clock mesh they defined a so-called covering region, which refers to the set of nodes that the node can drive. Subsequently, they found a minimal set of covering regions that covers all nodes in the clock mesh. They employed a greedy algorithm that iteratively picks the covering region covering the largest number of uncovered nodes. The mesh reduction problem was then solved by formulating it as the Steiner network problem [15]; they iteratively removed wire segments while maintaining a certain level of redundancy, i.e., every clock sink should have at least $k$ closest clock buffers within a distance of $L_{max}$. After mesh reduction, buffer sizes are readjusted as a post-processing step.

On the other hand, Rajaram and Pan [16] suggested an initial mesh planning step; they expressed the total wire length and worst-case clock skew as functions of mesh size, from which they derived an initial mesh size under a worst-case clock skew constraint. In addition, they improved the cost function used in the greedy algorithm in [14] by considering the potential effect of buffer insertion on the mesh and the low-pass filter characteristics of an RC mesh. They also proposed a network sensitivity based mesh reduction algorithm in which the delay sensitivity of each sink is calculated with respect to the width of each wire segment. Using this delay sensitivity, they removed the wire segments that have little effect on clock skew.

### C. Electromagnetic Interference (EMI) Aware Clock Design

Increasing requirements for high-performance circuits and the enforcement of strict governmental regulations call for new design solutions for EMI.
Electromagnetic radiation from an electronic circuit can be categorized into two forms [17]: differential-mode radiation and common-mode radiation. Differential-mode radiation occurs in current loops, which are related to on-chip wiring, the package and enclosure, and the board wiring, among other things. In contrast, common-mode radiation is the result of voltage level differences. Thus, common-mode radiation is closely related to the signals on data and clock wires.

EMI-aware clock optimization is a clock network synthesis method with the objective of reducing EM emission by controlling clock parameters. The key clock parameters are slew rate and clock skew. Slew rate describes how fast the clock signal, which typically has a trapezoidal waveform, rises and falls. With spectral analysis, the clock signal can be represented as the sum of a series of sine and cosine functions. The amplitude of the high-frequency components depends critically on the slew rate [18]. In other words, a clock signal with a fast slew rate has larger amplitude at high frequencies than one with a slower slew rate. Previous works demonstrate that decreasing the slew rate of the clock signal is very effective in reducing total EMI [18]–[20]. Pandini *et al.* [19] proposed a method to decrease the slew rate by manually removing buffers and inverters with higher driving strength from the target library. Hu and Guthaus [20] proposed a method of weakening buffer driving strength by setting a constraint on the minimum slew rate during clock tree synthesis. They proposed an incremental dynamic programming algorithm to determine buffer sizing and positioning for a clock tree, considering metrics such as the skew and power used in [19].

- 143 - ISOCC 2011

Another primary source of EM emission is simultaneous switching noise (SSN).
In a clock network, sink buffers and the connected flip-flops toggle at almost the same time, creating a large current pulse on the power network; this fast current variation increases di/dt, causing high SSN. Note that peak current can be controlled by spreading the activity of buffers and flip-flops over a time period, but doing so causes clock skew variation. Vuillod *et al.* [21] proposed reducing peak current by relaxing the clock skew constraint. Pandini *et al.* [19] performed a theoretical analysis of power/ground noise by describing the noise as a periodic triangular pulse, and demonstrated that increasing the skew bound is effective in reducing EM emission.

### D. Adjustable Delay Buffer (ADB) Allocation

Traditional clock optimization techniques are mainly based on an environment in which all elements in a circuit use one fixed operating voltage. However, recent technology trends require multiple supply voltages so that the voltage level applied to a circuit can be changed dynamically. In such a multi-voltage mode design, some part of the circuit may operate at high voltage when the associated module must complete its processing quickly, and at low voltage when the timing requirement can be relaxed and reducing power is more important. When the supply voltage switches from high to low (or low to high), the delay of all logic elements in that part of the chip, including the buffers in the clock network, also varies. One serious problem in multi-voltage mode design is therefore clock skew variation in the clock network.

To tackle the clock skew variation problem, Su *et al.* [22] proposed a clock skew optimization methodology that uses adjustable delay buffers (ADBs), which are specially designed buffers whose delay can be controlled dynamically. The general structure of an ADB is shown in Fig. 3. It is composed of a normal buffer, an internal capacitor bank, and a capacitor bank controller.
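
The clock skew variation across power modes can be made concrete with a small sketch. The Python code below uses hypothetical arrival times (in ps) for two sinks in two power modes and compares a fixed extra delay with a mode-dependent extra delay of the kind an ADB provides. The names `skew`, `apply_delays`, and `violating_modes`, the skew bound, and all numbers are assumptions for illustration, not data from [22].

```python
def skew(arrivals):
    """Clock skew: largest minus smallest sink arrival time."""
    return max(arrivals) - min(arrivals)


def apply_delays(arrivals_by_mode, extra_by_mode):
    """Add a per-mode, per-sink extra delay (0 where no delay is inserted)."""
    return {
        mode: [t + d for t, d in zip(arrivals, extra_by_mode[mode])]
        for mode, arrivals in arrivals_by_mode.items()
    }


def violating_modes(arrivals_by_mode, bound):
    """Return the sorted names of power modes whose skew exceeds the bound."""
    return sorted(m for m, a in arrivals_by_mode.items() if skew(a) > bound)


# Hypothetical arrival times of two sinks in two power modes.
ARRIVALS = {"high": [100, 110], "low": [150, 190]}
BOUND = 15  # assumed clock skew bound in ps

# A fixed 30 ps delay on sink 0 repairs the low mode but breaks the high mode.
FIXED = apply_delays(ARRIVALS, {"high": [30, 0], "low": [30, 0]})
# A mode-dependent delay on sink 0 (0 ps in high mode, 30 ps in low mode).
ADJUSTABLE = apply_delays(ARRIVALS, {"high": [0, 0], "low": [30, 0]})

if __name__ == "__main__":
    for name, arr in (("original", ARRIVALS), ("fixed", FIXED), ("adjustable", ADJUSTABLE)):
        print(name, violating_modes(arr, BOUND))
```

Under these assumed numbers, no single fixed delay on sink 0 satisfies both modes, whereas a delay that changes with the power mode does, which is the motivation for a buffer with dynamically controllable delay.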
The delay of an ADB can be adjusted by activating the internal capacitor bank. That is, the adjusted delay of an ADB is determined by the number of activated capacitors in the capacitor bank, which is controlled by the capacitor bank controller and its control input. The problem to be solved when using ADBs in multi-voltage mode design is to minimize the cost of the ADBs used, since an ADB has more transistors than a normal buffer due to the internal capacitor bank and controller. The approach in [22] focused on the power mode with the worst violation of the clock skew constraint and resolved the worst clock skew by adding ADBs in a greedy manner. The approach was then repeated until no power mode had a clock skew violation.

Fig. 3. Structure of Adjustable Delay Buffer [23].

While the approach can efficiently find an ADB allocation with no clock skew violation, an optimal ADB allocation is not guaranteed due to the inherent limitation of the iterative heuristic. Recently, Lim and Kim [24] improved the work in [22] by proposing a linear-time optimal algorithm for ADB allocation and delay value assignment. Initially, they replaced all buffers in the clock tree with ADBs and then removed unnecessary ADBs through a comprehensive analysis of the timing relation between ADBs. They performed the analysis in a bottom-up fashion in the clock tree and showed that the ADB allocation is polynomial-time optimal, bounded by $O(N \cdot K)$, where $N$ and $K$ are the number of buffers in the clock tree and the number of power modes, respectively.

### E. Noise Aware FSM State Encoding

Since a finite state machine (FSM) is a synchronous component controlled by the clock signal of the circuit, it also draws peak current on the power line ($V_{DD}$) and ground line ($V_{SS}$) of the circuit when a state transition occurs. Thus, it is also important to reduce the number of FSM flip-flops in the state register that switch simultaneously.
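
To make the notion of simultaneous switching concrete, the Python sketch below counts, for a given state encoding, the largest number of flip-flops that switch from 0 to 1 and from 1 to 0 in any single transition, and reports the larger of the two. This metric mirrors the objective attributed to [25] later in this subsection. The function name `peak_switching`, the four-state ring of transitions, and both encodings are hypothetical and are not the STGs of Fig. 4.

```python
def peak_switching(encoding, transitions):
    """Return the larger of the maximum 0-to-1 and maximum 1-to-0 bit
    switch counts over all state transitions.

    encoding: dict mapping state name to a bit string of equal length.
    transitions: iterable of (source_state, destination_state) pairs.
    """
    max_rise = 0
    max_fall = 0
    for src, dst in transitions:
        a, b = encoding[src], encoding[dst]
        rise = sum(x == "0" and y == "1" for x, y in zip(a, b))
        fall = sum(x == "1" and y == "0" for x, y in zip(a, b))
        max_rise = max(max_rise, rise)
        max_fall = max(max_fall, fall)
    return max(max_rise, max_fall)


# Hypothetical four-state FSM whose states are visited in a ring.
RING = [("S0", "S1"), ("S1", "S2"), ("S2", "S3"), ("S3", "S0")]
BINARY = {"S0": "00", "S1": "01", "S2": "10", "S3": "11"}
GRAY = {"S0": "00", "S1": "01", "S2": "11", "S3": "10"}

if __name__ == "__main__":
    print("binary encoding:", peak_switching(BINARY, RING))
    print("Gray encoding:  ", peak_switching(GRAY, RING))
```

For this assumed ring, the binary encoding has one transition in which both bits fall together, while the Gray-style encoding flips a single bit per transition, so the choice of codes alone changes the peak switching.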
Peak current in state encoding can be explained using the example given in Fig. 4, which shows two state transition graphs (STGs) of a finite state machine. The maximum switching in the left FSM (STG<sub>1</sub>) occurs when the state changes from $S_0$ to $S_4$ (three 0-to-1 transitions). Meanwhile, the maximum switching in the right FSM (STG<sub>2</sub>) occurs when the state changes from $S_3$ to $S_0$ (two 1-to-0 transitions). This example shows that a proper state encoding may reduce the circuit noise.

Fig. 4. Two state transition graphs (STGs) of an FSM.

Huang *et al.* [25] start from the encoding result produced by the scheme in [26], which targets reducing the total power (i.e., total switching activity) of an input STG. The idea for reducing peak switching among state transitions is to decide whether the value of each bit position in the previously encoded registers is to be complemented or not. For example, if the solution chooses to complement bit positions 0 and 3, then all the values in bit positions 0 and 3 of the state encoding are complemented. They formulated the problem of determining which bit positions to complement as an ILP, with the objective of minimizing the larger of the maximum number of bit positions that switch from 0 to 1 among state transitions and the maximum number of bit positions that switch from 1 to 0. For an $n$-state FSM with an $m$-bit code length, the approach explores a design space as large as $2^m$.

Lee *et al.* [27] improved Huang's work by considering the trade-off between power and noise: they first minimized peak switching and then minimized total switching (i.e., total dynamic power consumption). They formulated the peak switching minimization problem as a satisfiability (SAT) problem with pseudo-Boolean (PB) expressions, and solved it using PB-Solver [28].
Then, they iteratively updated the PB expression to extract a state transition that results in the highest switching, running PB-Solver with the additional CNF constraints.

On the other hand, Gu *et al.* [29], similar to the work in [25], started from an encoded FSM with minimal total switching, but they identified a set of transitions (called the working set S) that cause peak current and applied both state replication and state re-encoding to S. State replication duplicates a state and assigns another code to each replicated state, generating the corresponding state transitions, while state re-encoding changes the given code of a state to another unused code. Starting from an input FSM with minimum total switching, they reduced the peak switching by iteratively applying state replication and state re-encoding in a combined manner.

## III. CONCLUSION

As process technology scales down, the variation and sensitivity of noise and delay in the clock network keep getting worse, which has a drastic impact on circuit reliability. This paper reviewed several important clock-related design problems and the existing techniques that are essential to highly reliable circuit design. In addition to the design optimization and synthesis issues introduced here, there are other issues that are also important for tolerating or mitigating circuit noise. Examples include 3D clock design considering TSV design variation, and circuit reliability under power-gated or clock-gated design environments.

## ACKNOWLEDGMENT

This work was supported by Basic Science Research Program through National Research Foundation (NRF) grant (No.2010-0028711) and Global Frontier 2011 grant funded by Korea Ministry of Education, Science and Technology, and supported by MKE (Ministry of Knowledge Economy), Korea, under ITRC (Information Technology Research Center) support program supervised by NIPA (National IT Industry Promotion Agency) (NIPA-2011-C1090-1100-0010).

## REFERENCES

- [1] A. Vittal, H. Ha, F.
Brewer, and M. Marek-Sadowska, "Clock skew optimization for ground bounce control," in *ICCAD*, 1996, pp. 395–399.
- [2] L. Benini, P. Vuillod, A. Bogliolo, and G. D. Micheli, "Clock skew optimization for peak current reduction," *J. VLSI Signal Process. Syst.*, vol. 16, pp. 117–130, July 1997.
- [3] Y.-T. Nieh, S.-H. Huang, and S.-Y. Hsu, "Minimizing peak current via opposite-phase clock tree," in *DAC*, 2005, pp. 182–185.
- [4] R. Samanta, G. Venkataraman, and J. Hu, "Clock buffer polarity assignment for power noise reduction," in *ICCAD*, 2006, pp. 558–562.
- [5] J. Lu and B. Taskin, "Clock buffer polarity assignment considering capacitive load," in *ISQED*, 2010, pp. 765–770.
- [6] P.-Y. Chen, K.-H. Ho, and T. Hwang, "Skew aware polarity assignment in clock tree," in *ICCAD*, 2007, pp. 376–379.
- [7] P.-Y. Chen, K.-H. Ho, and T. Hwang, "Skew-aware polarity assignment in clock tree," *ACM Trans. Des. Autom. Electron. Syst.*, vol. 14, pp. 31:1–31:17, April 2009.
- [8] H. Jang, D. Joo, and T. Kim, "Buffer sizing and polarity assignment in clock tree synthesis for power/ground noise minimization," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 30, no. 1, pp. 96–109, January 2011.
- [9] D. Joo and T. Kim, "Wavemin: a fine-grained clock buffer polarity assignment combined with buffer sizing," in *DAC*, 2011.
- [10] M. Kang and T. Kim, "Clock buffer polarity assignment considering the effect of delay variations," in *ISQED*, 2010, pp. 69–74.
- [11] Y. Ryu and T. Kim, "Clock buffer polarity assignment combined with clock tree generation for power/ground noise minimization," in *ICCAD*, 2008, pp. 416–419.
- [12] J. Lu and B. Taskin, "Clock tree synthesis with XOR gates for polarity assignment," in *ISVLSI*, July 2010, pp. 17–22.
- [13] Y. Liu, S. Nassif, L. Pileggi, and A. Strojwas, "Impact of interconnect variations on the clock skew of a gigahertz microprocessor," in *DAC*, July 2000, pp. 168–172.
- [14] G. Venkataraman, Z. Feng, J.
Hu, and P. Li, "Combinatorial algorithms for fast clock mesh optimization," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 18, pp. 131–141, January 2010.
- [15] K. Jain, "A factor 2 approximation algorithm for the generalized Steiner network problem," *Combinatorica*, vol. 21, pp. 448–457, January 2001.
- [16] A. Rajaram and D. Z. Pan, "Meshworks: A comprehensive framework for optimized clock mesh network synthesis," *IEEE Journal on Technology in Computer Aided Design*, vol. 29, pp. 1945–1958, December 2010.
- [17] H. W. Ott, *Noise reduction techniques in electronic systems*, 2nd ed., R. M. Osgood, Jr., Ed. Wiley & Sons, 1988.
- [18] K. Hardin, J. Fessler, and D. Bush, "Spread spectrum clock generation for the reduction of radiated emissions," in *ISEMC*, August 1994, pp. 227–231.
- [19] D. Pandini, G. Repetto, and V. Sinisi, "Clock distribution techniques for low-EMI design," *Springer Lecture Notes in Computer Science*, vol. 4644, pp. 201–210, 2007.
- [20] X. Hu and M. Guthaus, "Clock tree optimization for electromagnetic compatibility (EMC)," in *ASPDAC*, January 2011, pp. 184–189.
- [21] P. Vuillod, L. Benini, A. Bogliolo, and G. D. Micheli, "Clock skew optimization for peak current reduction," in *ISLPED*, August 1996, pp. 265–270.
- [22] Y.-S. Su, W.-K. Hon, C.-C. Yang, S.-C. Chang, and Y.-J. Chang, "Value assignment of adjustable delay buffers for clock skew minimization in multi-voltage mode designs," in *ICCAD*, 2009, pp. 535–538.
- [23] A. Kapoor, N. Jayakumar, and S. P. Khatri, "A novel clock distribution and dynamic de-skewing methodology," in *ICCAD*, 2004, pp. 626–631.
- [24] K.-H. Lim and T. Kim, "An optimal algorithm for allocation, placement, and delay assignment of adjustable delay buffers for clock skew minimization in multi-voltage mode designs," in *ASPDAC*, 2011, pp. 503–508.
- [25] S. Huang, C. Chang, and Y. Nieh, "State re-encoding for peak current minimization," in *ICCAD*, 2006, pp. 33–38.
- [26] L. Benini and G.
Micheli, "State assignment for low power dissipation," *IEEE Journal of Solid-State Circuits*, vol. 30, no. 3, pp. 258–268, 1995.
- [27] Y. Lee, K. Choi, and T. Kim, "SAT-based state encoding for peak current minimization," in *ICCD*, 2009, pp. 432–435.
- [28] F. Aloul, A. Ramani, I. Markov, and K. Sakallah, "PB-solver: a backtrack-search pseudo-Boolean solver and optimizer," in *SAT*, 2002, pp. 346–353.
- [29] J. Gu, G. Qu, L. Yuan, and Q. Zhou, "Peak current reduction by simultaneous state replication and re-encoding," in *ICCAD*, 2010, pp. 592–595.