# Managing Clock Skews in Clock Trees with Local Clock Skew Requirements Using Adjustable Delay Buffers

Deokjin Joo (*jdj@snucad.snu.ac.kr*) and Taewhan Kim (*tkim@snucad.snu.ac.kr*), School of Electrical and Computer Engineering, Seoul National University, Seoul, Korea

Abstract: Meeting the skew constraint in clock trees has become much harder as IC design shifts to multiple power mode designs, in which the clock skew varies dynamically with the voltage levels of the power mode applied during execution. To address clock skew optimization in such designs, which are now mainstream for low-power design, many works have used adjustable delay buffers (ADBs), whose delay can be adjusted dynamically, and have attempted to replace as few clock tree buffers as possible with ADBs. However, none of these works considers the local clock skew requirements in clock trees; the trees are therefore optimized pessimistically, resulting in excess ADB insertion. In this work, we propose a solution to the ADB insertion problem that resolves the differences among local clock skew requirements in clock trees. Experiments with benchmark circuits show that the proposed solution reduces the number of ADBs by 21% on average compared with the conventional ADB insertion method, which is unaware of local skew, for clock trees with multiple power modes.

Keywords: clock skew; multiple power modes; adjustable delay buffer; local clock skew; optimization.

## I. INTRODUCTION

The clock is a centralized signal that controls synchronous data transfers across the chip. That is, all synchronous elements on a chip, such as flip-flops (FFs), sample their input data at the rising or falling edge of the clock signal. The locations that require the clock signal are generally called *clock sinks*, and *clock trees* are one of the typical network structures designed to distribute the clock signal to every clock sink. Clock trees exhibit *clock skew*, defined as the time difference between the latest and the earliest clock signal arrival times at the clock sinks. It is desirable to keep the clock skew as small as possible, since a large clock skew inflicts a direct performance penalty on the chip. Hence, controlling or optimizing the clock skew has been a primary concern in clock-related research.
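The skew definition above can be illustrated with a minimal Python sketch. The function names, the mode labels, and the arrival times (in ps) are hypothetical and chosen only for illustration; the per-mode evaluation assumes, as in multiple power mode designs, that each mode has its own set of arrival times.

```python
from typing import Dict, List


def clock_skew(arrival_times: List[float]) -> float:
    """Skew = latest arrival time minus earliest arrival time over all sinks."""
    return max(arrival_times) - min(arrival_times)


def worst_skew_over_modes(arrivals_by_mode: Dict[str, List[float]]) -> float:
    """Largest skew observed in any power mode (hypothetical helper)."""
    return max(clock_skew(times) for times in arrivals_by_mode.values())


# Hypothetical arrival times (ps) at three sinks under two power modes.
arrivals_by_mode = {
    "mode_a": [100.0, 112.0, 105.0],
    "mode_b": [100.0, 140.0, 118.0],
}

for mode, times in arrivals_by_mode.items():
    print(mode, clock_skew(times))
print("worst:", worst_skew_over_modes(arrivals_by_mode))
```

With these made-up numbers the same tree meets a 30 ps bound in one mode and violates it in the other, which is the situation ADBs are meant to address.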

Extensive research on clock tree optimization has addressed clock routing and clock buffer insertion/sizing to control or minimize the clock skew [1]–[5]. However, in multiple power mode designs, which are now mainstream for low-power design, the supply voltages to the chip modules vary dynamically according to the applied power mode, making clock skew control problematic. Recently, a number of works have proposed replacing some of the buffers in the clock tree with adjustable delay buffers (ADBs), which are buffers whose delay can be adjusted dynamically through their delay control inputs, to cope with the delay variations [6]–[10]. In [9], Kim, Joo, and Kim proposed an $O(n \log n)$-time algorithm which inserts a minimum number of ADBs into a clock tree while meeting a single skew bound. On the other hand, *local clock skew* is becoming important in clock network synthesis: the ISPD 2010 high performance clock network synthesis contest put emphasis on local clock skew control [11]. However, to our knowledge, no ADB insertion method has considered local clock skew requirements that may vary across the modules on a chip.

Fig. 1. A design example with a hierarchical clock tree that contains two clock subtrees, in which the left clock subtree synchronizes the three submodules (colored boxes) in *Module-1* at multiple voltage levels and the right clock subtree synchronizes the two submodules in *Module-2*. Suppose that *Module-1* and *Module-2* have different performance requirements, requiring clock skew bounds of 30ps and 40ps, respectively. In addition, suppose that the overall system should meet the global clock skew bound of 50ps. The previous ADB insertion algorithms optimize the clock tree with the tightest skew bound, which is 30ps, leading to pessimistic insertions of ADBs.

Fig. 1 shows a design example of a clock tree with local clock skew requirements. The tree contains two clock subtrees: the left one synchronizes the three submodules (colored boxes) in the region of *Module-1*, while the right one synchronizes the two submodules in *Module-2*. The different colors of the boxes represent the different supply voltages applied. *Module-1* and *Module-2* have different performance requirements, so their clock skew bounds are 30ps and 40ps, respectively. Moreover, there is a global clock skew requirement that ensures the correct operation of the whole chip. All of the previous ADB insertion algorithms embed ADBs according to the tightest skew requirement, which is 30ps in the example in Fig. 1, and thus insert more ADBs than necessary. In this paper, we take the local clock skew requirements into account and embed ADBs less pessimistically to minimize the number of ADBs inserted.
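The difference between checking each requirement separately and checking everything against the tightest bound can be sketched as follows. This is an illustrative Python sketch for a single power mode: the bounds of 30ps, 40ps, and 50ps are those of Fig. 1, while the arrival times and the helper names `skew` and `check_bounds` are hypothetical.

```python
from typing import Dict, List


def skew(times: List[float]) -> float:
    """Latest minus earliest arrival time."""
    return max(times) - min(times)


def check_bounds(arrivals: Dict[str, List[float]],
                 local_bounds: Dict[str, float],
                 global_bound: float) -> Dict[str, bool]:
    """Check every local bound on its own module and the global bound on all sinks."""
    result = {m: skew(t) <= local_bounds[m] for m, t in arrivals.items()}
    all_times = [t for times in arrivals.values() for t in times]
    result["global"] = skew(all_times) <= global_bound
    return result


# Hypothetical arrival times (ps) in one power mode.
arrivals = {"Module-1": [200.0, 225.0, 210.0], "Module-2": [215.0, 245.0]}
local_bounds = {"Module-1": 30.0, "Module-2": 40.0}  # bounds from Fig. 1
global_bound = 50.0                                   # bound from Fig. 1

print(check_bounds(arrivals, local_bounds, global_bound))

# A single-bound method must use the tightest bound for the whole tree.
tightest = min(min(local_bounds.values()), global_bound)
all_times = [t for times in arrivals.values() for t in times]
print("meets tightest bound on whole tree:", skew(all_times) <= tightest)
```

In this made-up case every hierarchical requirement is already met, yet the whole-tree skew of 45ps exceeds the tightest bound of 30ps, so a single-bound method would still have to insert ADBs.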

## II. ADB INSERTION WITH MULTIPLE CLOCK SUBTREES

In this work, for the single bound algorithm, we employ the best known algorithm proposed in [9], which we refer to as ADB-SB (ADB insertion for designs with single clock skew bound). The problem of inserting ADBs according to multiple clock skew bound requirements, which we refer to as ADB-MB (ADB insertion for designs with multiple clock skew bounds), can be divided into two subproblems: (subproblem-1) satisfying the local clock skew bounds and (subproblem-2) optimizing for the global clock skew bound to integrate the subtrees while minimizing the number of ADBs at the global level.

#### TABLE I

Experimental results of ADB-SB [9] and our ADB-MB. ISPD 2009 clock network synthesis contest benchmark circuits f11 and f22 were partitioned into two modules, and f31 and f34 were partitioned into four modules. The clock skew bound for module 3 (if it exists) is set to the same value as that of module 1, which is denoted in the column "modules 1,3"; "modules 2,4" has a similar meaning. Since ADB-SB is unaware of multiple clock skew bounds, it was given the tightest skew bound so that all the clock skew bound requirements are met.

| Circuit | #Sinks | #Buffers | #Modules | Skew bound: global | Skew bound: modules 1,3 | Skew bound: modules 2,4 | #ADBs: ADB-SB [9] | #ADBs: ADB-MB | Reduction (%) |
|---------|--------|----------|----------|--------------------|-------------------------|-------------------------|-------------------|---------------|---------------|
| f11     | 121    | 165      | 2        | 30                 | 40                      | 50                      | 8                 | 8             | 0.00          |
|         |        |          |          | 40                 | 30                      | 50                      |                   | 7             | 12.50         |
|         |        |          |          | 50                 | 30                      | 40                      |                   | 4             | 50.00         |
| f22     | 91     | 97       | 2        | 30                 | 40                      | 50                      | 3                 | 3             | 0.00          |
|         |        |          |          | 40                 | 30                      | 50                      |                   | 3             | 0.00          |
|         |        |          |          | 50                 | 30                      | 40                      |                   | 0             | 100.00        |
| f31     | 273    | 328      | 4        | 30                 | 40                      | 50                      | 15                | 15            | 0.00          |
|         |        |          |          | 40                 | 30                      | 50                      |                   | 11            | 26.67         |
|         |        |          |          | 50                 | 30                      | 40                      |                   | 11            | 26.67         |
| f34     | 157    | 210      | 4        | 30                 | 40                      | 50                      | 7                 | 7             | 0.00          |
|         |        |          |          | 40                 | 30                      | 50                      |                   | 6             | 14.29         |
|         |        |          |          | 50                 | 30                      | 40                      |                   | 5             | 28.57         |
| Average |        |          |          |                    |                         |                         |                   |               | 21.56         |

Without loss of generality<sup>1</sup>, consider a clock tree with global and local clock skew constraints, such as the one shown in Fig. 1.

*Subproblem-1:* Module-$i$ must satisfy both the global and the local clock skew constraints, $B_g$ and $B_i$, simultaneously. (1) If $B_i \leq B_g$, ADB-SB is first applied to the clock subtree of module-$i$ with $B_i$; $B_g$ is handled afterwards by solving subproblem-2. (2) If $B_i > B_g$, ADB-SB is applied with $B_g$. After this application of ADB-SB to the clock subtree of module-$i$, the subtree meets $B_g$ and therefore trivially satisfies the looser bound $B_i$.

*Subproblem-2:* (3) Each subtree is replaced by a *fully reduced timing tree*, as proposed in [8]: the subtree below the subtree root is reduced to two clock sinks, where one sink has the earliest arrival time of the clock subtree and the other has the latest arrival time. (4) Since the clock tree has no hierarchy at this point, subproblem-2 can be solved directly with ADB-SB.
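The control flow of steps (1) to (4) can be sketched in Python as below. This is only a skeleton under stated assumptions, not the authors' implementation: ADB-SB itself is not reproduced and is passed in as a callable `adb_sb(sinks, bound)` with a hypothetical interface returning the number of inserted ADBs and the adjusted arrival times; each sink is modeled as a tuple of arrival times with one entry per power mode; and taking the earliest and latest arrival time separately in each mode is an assumed reading of the reduction in [8]. Cases (1) and (2) collapse into applying ADB-SB with $\min(B_i, B_g)$.

```python
from typing import Callable, Dict, List, Tuple

Sink = Tuple[float, ...]  # arrival time of one sink, one entry per power mode
AdbSb = Callable[[List[Sink], float], Tuple[int, List[Sink]]]  # hypothetical interface


def reduce_subtree(sinks: List[Sink]) -> List[Sink]:
    """Step (3): collapse a subtree into two pseudo-sinks (earliest, latest).

    Assumption: earliest/latest are taken independently in each power mode.
    """
    modes = range(len(sinks[0]))
    earliest = tuple(min(s[m] for s in sinks) for m in modes)
    latest = tuple(max(s[m] for s in sinks) for m in modes)
    return [earliest, latest]


def adb_mb(subtrees: Dict[str, List[Sink]],
           local_bounds: Dict[str, float],
           global_bound: float,
           adb_sb: AdbSb) -> int:
    """Return the total ADB count reported by the supplied single-bound solver."""
    total = 0
    reduced: List[Sink] = []
    for name, sinks in subtrees.items():
        # Steps (1) and (2): use B_i if B_i <= B_g, otherwise B_g.
        bound = min(local_bounds[name], global_bound)
        count, adjusted = adb_sb(sinks, bound)
        total += count
        reduced.extend(reduce_subtree(adjusted))
    # Step (4): the reduced tree has no hierarchy; apply the global bound.
    top_count, _ = adb_sb(reduced, global_bound)
    return total + top_count


# Recording stub standing in for ADB-SB; it inserts nothing and only logs calls.
calls: List[Tuple[int, float]] = []


def recording_stub(sinks: List[Sink], bound: float) -> Tuple[int, List[Sink]]:
    calls.append((len(sinks), bound))
    return 0, sinks


subtrees = {  # hypothetical arrival times (ps) in two power modes
    "Module-1": [(200.0, 205.0), (225.0, 215.0), (210.0, 230.0)],
    "Module-2": [(215.0, 220.0), (245.0, 240.0)],
}
total = adb_mb(subtrees, {"Module-1": 30.0, "Module-2": 40.0}, 50.0, recording_stub)
print(calls)
```

The stub only demonstrates which bound each call receives: the two subtrees are processed with 30ps and 40ps, and the four pseudo-sinks of the reduced tree are processed with the global 50ps bound.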

## III. EXPERIMENTAL RESULTS

Table I summarizes the optimization results of ADB-MB compared with those of ADB-SB [9], which inserts a minimum number of ADBs under a single skew constraint. The clock trees were synthesized for the ISPD 2009 clock network synthesis contest benchmarks using the algorithm in [12]. The experimental results show that ADB-MB achieves a 21% reduction in ADB count on average compared with ADB-SB. This is because the conventional methods (e.g., ADB-SB) insert ADBs pessimistically according to the tightest clock skew bound.
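The reduction percentages and their average can be recomputed from the ADB counts in Table I. The short Python check below assumes that the blank ADB-SB cells carry over the count from the first row of each circuit, which is consistent with ADB-SB always being given the tightest bound; the variable and function names are illustrative.

```python
# ADB counts taken from Table I.
adb_sb_counts = {"f11": 8, "f22": 3, "f31": 15, "f34": 7}
adb_mb_counts = {
    "f11": [8, 7, 4],
    "f22": [3, 3, 0],
    "f31": [15, 11, 11],
    "f34": [7, 6, 5],
}


def reduction_pct(baseline: int, ours: int) -> float:
    """Percentage of ADBs saved relative to the ADB-SB baseline."""
    return 100.0 * (baseline - ours) / baseline


reductions = [
    reduction_pct(adb_sb_counts[circuit], count)
    for circuit, counts in adb_mb_counts.items()
    for count in counts
]
average = sum(reductions) / len(reductions)
print([round(r, 2) for r in reductions])
print(round(average, 2))  # 21.56
```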

## IV. CONCLUSION

This work proposed a solution to the problem of ADB insertion to resolve the differences among local clock skews in low-power clock trees, in which the clock skews vary at execution time as the applied power mode changes. Experiments with ISPD 2009 benchmark circuits showed that the proposed solution reduces the number of ADBs by 21% on average compared with the best known conventional ADB insertion method. It should be noted that for hierarchical clock trees with timing mismatches among local clock skews, the proposed solution can be used to resolve the timing mismatches effectively with a reduced number of ADBs.

<sup>1</sup>The designer may define multiple levels of requirements as needed, as long as the requirements are hierarchical.

Acknowledgment: This research was supported by the ITRC program of IITP by MSIP (IITP-2015-H8501-15-1005) in Korea, the NRF grant funded by the MSIP (2015R1A2A2A01004178), and the Brain Korea 21 Plus Project in 2015.

## REFERENCES

- [1] C. J. Alpert, A. Devgan, and S. T. Quay, "Buffer insertion with accurate gate and interconnect delay computation," in DAC, 1999.
- [2] J. Cong, C. Koh, and K. Leung, "Simultaneous buffer and wire sizing for performance and power optimization," in ISLPED, 1996.
- [3] C. C. N. Chu and M. D. F. Wong, "An efficient and optimal algorithm for simultaneous buffer and wire sizing," IEEE TCAD, 1999.
- [4] T. Okamoto and J. Cong, "Buffered Steiner tree construction with wire sizing for interconnect layout optimization," in ICCAD, 1996.
- [5] K. Wang, Y. Ran, H. Jiang, and M. Marek-Sadowska, "General skew constrained clock network sizing based on sequential linear programming," IEEE TCAD, 2005.
- [6] Y.-S. Su, W.-K. Hon, C.-C. Yang, S.-C. Chang, and Y.-J. Chang, "Clock skew minimization in multi-voltage mode designs using adjustable delay buffers," IEEE TCAD, 2010.
- [7] K.-Y. Lin, H.-T. Lin, and T.-Y. Ho, "An efficient algorithm of adjustable delay buffer insertion for clock skew minimization in multiple dynamic supply voltage designs," in ASPDAC, 2011.
- [8] K.-H. Lim, D. Joo, and T. Kim, "An optimal allocation algorithm of adjustable delay buffers and practical extensions for clock skew optimization in multiple power mode designs," IEEE TCAD, 2013.
- [9] J. Kim, D. Joo, and T. Kim, "An optimal algorithm of adjustable delay buffer insertion for solving clock skew variation problem," in DAC, 2013.
- [10] K. Park, G. Kim, and T. Kim, "Mixed allocation of adjustable delay buffers combined with buffer sizing in clock tree synthesis of multiple power mode designs," in DATE, 2014.
- [11] C. N. Sze, "ISPD 2010 high performance clock network synthesis contest: benchmark suite and results," in Proc. ISPD, 2010.
- [12] T.-Y. Kim and T. Kim, "Clock tree synthesis for TSV-based 3D IC designs," ACM TODAES, vol. 16, no. 4, pp. 48:1-48:21, Oct. 2011.