Electronic Design Automation
- Clock network synthesis
- Low power design: Power, clock gating
- Reliable and Variation-aware design
- C. E. Stroud, L.-T. Wang, and Y.-W. Chang, "Introduction," in Electronic Design Automation: Synthesis, Verification, and Testing (L.T. Wang, Y.-W. Chang, and K.-T. Cheng, Editors), Elsevier/Morgan Kaufmann, 2009.
- Placement and routing were performed with Synopsys IC Compiler. The input benchmark circuit is b15 from ITC99.
Clock network synthesis
Clock networks: In synchronous digital circuits, all memory elements, called flip-flops (FFs), sample their input data at the rising (or falling) edge of the clock signal. The circuits that distribute this signal are called clock networks. The figure on the right-hand side illustrates a clock tree, which is one kind of clock network. As the figure shows, the clock tree distributes the clock signal to ALL clock sinks (FFs) in the system.
- The clock signal must keep switching when the system is powered on, as all FFs depend on it. This makes clock trees one of the most active circuits on a chip.
- To enable high-speed operation of a chip, clock trees should deliver the clock signal with minimal differences in arrival time among the sinks.
- The difference between the latest and the earliest clock signal arrival times is called clock skew.
- Noise minimization
- Resonant clock synthesis
- Adjustable Delay Buffer (ADB) insertion algorithm
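The clock skew definition given earlier (latest arrival minus earliest arrival) can be written down in a few lines. This is a minimal sketch; the sink names and arrival times are made-up illustration values, not measurements from the figures.

```python
def clock_skew(arrival_times):
    """Return clock skew: latest minus earliest clock arrival time.

    arrival_times maps a sink (flip-flop) name to its clock arrival time.
    Any time unit works as long as it is used consistently.
    """
    times = list(arrival_times.values())
    return max(times) - min(times)


# Hypothetical arrival times in picoseconds for three flip-flops.
example_arrivals = {"ff1": 102, "ff2": 98, "ff3": 105}
example_skew = clock_skew(example_arrivals)  # 105 - 98 = 7
```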
Clock tree generation with noise minimization: Clock trees should have as little clock skew as possible, but low skew means that the clock buffers that amplify the clock signal (shown as triangles in the figures) switch simultaneously, which generates Simultaneous Switching Noise (SSN). SSN can cause delay variations in the circuit, which may cause a chip to fail. The clock polarity assignment technique mitigates this problem. Assigning polarity to a clock tree means driving part of the tree with a clock signal of opposite phase (figure on the left). Replacing some clock buffers with inverters, and replacing the affected flip-flops with ones triggered on the opposite edge, reduces simultaneous switching, because buffers and inverters generate noise at different edges of the clock signal (figure on the right). However, mixing inverters into the clock network induces clock skew. We addressed this by proposing a method that trades off clock skew against SSN.
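To make the skew versus SSN trade-off concrete, the sketch below uses a simple greedy heuristic. It is not the method proposed in our work, only an illustration under assumptions: each buffer has a single peak-current number, buffers of the same polarity add their noise together, and swapping a buffer for an inverter shifts its arrival time by a known amount. The function name and the tuple layout are hypothetical.

```python
def assign_polarity(buffers, skew_bound):
    """Greedy polarity assignment (illustrative only).

    buffers: list of (name, peak_current, arrival_time, inverter_shift),
    where inverter_shift is the assumed change in arrival time when the
    buffer is replaced by an inverter.
    Returns (polarity per buffer, worst-case group noise, resulting skew).
    """
    arrivals = {name: arrival for name, _, arrival, _ in buffers}
    polarity = {}
    pos_noise = neg_noise = 0.0
    # Place the noisiest buffers first so the two groups stay balanced.
    for name, current, arrival, shift in sorted(buffers, key=lambda b: -b[1]):
        if neg_noise < pos_noise:
            # Try the inverter; keep it only if skew stays within the bound.
            trial = dict(arrivals)
            trial[name] = arrival + shift
            if max(trial.values()) - min(trial.values()) <= skew_bound:
                arrivals = trial
                polarity[name] = "negative"
                neg_noise += current
                continue
        polarity[name] = "positive"
        pos_noise += current
    skew = max(arrivals.values()) - min(arrivals.values())
    return polarity, max(pos_noise, neg_noise), skew


# Hypothetical leaf buffers: (name, peak current, arrival time, shift).
leaf_buffers = [("b0", 4.0, 100, 2), ("b1", 3.0, 101, 2),
                ("b2", 2.0, 100, 2), ("b3", 1.0, 102, 2)]
polarity, peak_noise, skew = assign_polarity(leaf_buffers, skew_bound=5)
```

With all four buffers on the same polarity the summed peak current would be 10.0; in this example the greedy split brings the worst group down to 5.0 at the cost of a skew of 3. Tightening `skew_bound` rejects some inverter swaps and leaves more noise on one edge.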
Resonant clock synthesis: The heavy switching activity of the clock network consumes a large amount of a system's dynamic power. In particular, the clock mesh structure (an alternative clock network to clock trees) uses so much wire that its power consumption is unacceptable. The resonant clocking technique uses LC resonance to store energy in an inductor as magnetic energy and reuse it (the rightmost figure). We minimized the number of allocated LC tanks under a clock skew bound.
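An ideal LC tank resonates at f = 1 / (2π√(LC)), so the inductance needed to resonate a given clock capacitance at the clock frequency follows directly. The sketch below assumes a single lossless tank and lumped capacitance; the function names and the 1 GHz / 100 pF values are illustrative assumptions, not figures from our designs.

```python
import math


def resonant_frequency(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC tank, in hertz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def resonant_inductance(clock_freq_hz, capacitance_f):
    """Inductance (henries) that resonates capacitance_f at clock_freq_hz."""
    return 1.0 / ((2.0 * math.pi * clock_freq_hz) ** 2 * capacitance_f)


# Assumed example: a 1 GHz clock driving 100 pF of lumped clock capacitance.
inductance = resonant_inductance(1e9, 100e-12)  # roughly 0.25 nH
```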
Adjustable Delay Buffer Insertion: In a multiple power mode design, the supply voltage of the system varies with the mode, so the clock arrival times at the clock sinks also change dynamically. Adjustable delay buffers (ADBs) are used to control delay across the power modes. However, ADBs carry a large area and control overhead, which makes the ADB insertion problem harder. We proposed algorithms that optimize the number of allocated ADBs under a clock skew bound.
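As a rough illustration of the problem (not the algorithms we proposed), the sketch below takes the simplest possible view: ADBs may only be placed directly at sinks, an ADB can only add delay, and a sink gets an ADB if it arrives too early in at least one power mode. The function name, mode names, and arrival times are hypothetical.

```python
def adb_settings(arrivals, skew_bound):
    """Find sinks that need an ADB and the delay to add in each mode.

    arrivals: {mode: {sink: clock arrival time}}
    Returns {sink: {mode: extra delay}} for sinks that need an ADB.
    """
    settings = {}
    for mode, per_sink in arrivals.items():
        latest = max(per_sink.values())
        for sink, arrival in per_sink.items():
            # Delay needed to pull this sink inside [latest - bound, latest].
            extra = (latest - skew_bound) - arrival
            if extra > 0:
                settings.setdefault(sink, {})[mode] = extra
    return settings


# Hypothetical arrival times (ps) in two power modes.
mode_arrivals = {
    "high": {"ff1": 100, "ff2": 104, "ff3": 110},
    "low": {"ff1": 150, "ff2": 170, "ff3": 165},
}
needed = adb_settings(mode_arrivals, skew_bound=8)
adb_count = len(needed)
```

Here only `ff1` needs an ADB, set to add 2 ps in the `high` mode and 12 ps in the `low` mode. A real flow would also consider placing ADBs higher in the tree, where one ADB can correct many sinks.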
Low power design methodology
- Power gating
- Clock gating
- Multi-bit Flip-flop optimization
Power gating: We study power gating methodology for efficiently reducing chip leakage power in low power systems such as mobile or edge devices. Topics include power gating and retention strategies and the implementation of power-gated designs; minimizing the number of power switch cells under IR-drop constraints; and minimizing the number of retention registers by developing clustering algorithms.
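For the power switch count, a common first-order estimate treats the switch cells as identical resistors in parallel that share the peak current evenly, so the IR drop across them is I × R_on / N. The sketch below applies only that estimate; it ignores the power grid resistance and the placement of the switches, and the function name and numbers are assumptions for illustration.

```python
import math


def min_power_switches(peak_current_ma, r_on_ohm, max_ir_drop_mv):
    """Smallest N such that peak_current * r_on / N <= max_ir_drop.

    Units: milliamps, ohms, millivolts (mA x ohm = mV).
    """
    if peak_current_ma <= 0 or r_on_ohm <= 0 or max_ir_drop_mv <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(peak_current_ma * r_on_ohm / max_ir_drop_mv)


# Assumed example: 500 mA peak current, 20 ohm per switch, 50 mV budget.
switch_count = min_power_switches(500, 20, 50)
```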
Clock gating: Clock gating is a popular technique for reducing switching activity on the clock network, but the additional gating structures cause area overhead. To address this, we redesigned a special flip-flop for clock gating and proposed an algorithm that considers placement and timing information at the same time. At the chip level, the proposed method obtains better PPA (power, performance, area) results than conventional methods. To improve the results further, we are now considering incremental changes to HLS (high-level synthesis) via ECO (engineering change order) to reduce the number of toggles.
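The power benefit of gating can be estimated with the standard dynamic power relation P = C × V² × f applied to the clock pins, scaled by the fraction of cycles in which the gated clock is actually delivered. This sketch ignores the power and area of the gating cells themselves; the function name and all numbers are illustrative assumptions.

```python
def clock_pin_power(n_ffs, pin_cap_f, vdd, freq_hz, enable_prob=1.0):
    """Dynamic power (watts) spent toggling flip-flop clock pins.

    enable_prob is the fraction of cycles in which the gated clock is
    delivered; 1.0 means the clock is never gated.
    """
    if not 0.0 <= enable_prob <= 1.0:
        raise ValueError("enable_prob must be between 0 and 1")
    return n_ffs * pin_cap_f * vdd ** 2 * freq_hz * enable_prob


# Assumed example: 1000 FFs, 1 fF clock pin each, 1.0 V supply, 1 GHz clock.
ungated = clock_pin_power(1000, 1e-15, 1.0, 1e9)
gated = clock_pin_power(1000, 1e-15, 1.0, 1e9, enable_prob=0.25)
saving = 1.0 - gated / ungated
```

With the registers enabled in a quarter of the cycles, the clock-pin power in this model drops by three quarters.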
Multi-bit flip-flop optimization: Allocating multi-bit flip-flops (MBFFs), that is, merging flip-flops that are close to each other geometrically or logically, is recognized as an effective design optimization technique for reducing clock network power. By grouping two or more flip-flops (FFs) into a single MBFF, designers can share the common clock-driving logic (2 inverters in the above figure) rather than placing the same logic in each FF. Our research topics for MBFFs include (1) a "loosely coupled" structure that does not require the grouped FFs to be placed next to each other, (2) advanced MBFF grouping and allocation methodology, and more.
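A very simple way to picture geometric MBFF grouping is to pair up flip-flops that lie within some distance of each other, closest pairs first. The sketch below does exactly that for 2-bit MBFFs. It is a stand-in heuristic for illustration, not our grouping methodology, and it ignores timing and logical compatibility; the function name, coordinates, and distance limit are assumptions.

```python
from itertools import combinations


def pair_flip_flops(ffs, max_dist):
    """Greedily pair nearby flip-flops into 2-bit MBFFs.

    ffs: {name: (x, y)} placement coordinates.
    Returns (list of merged pairs, list of flip-flops left single).
    """
    def manhattan(a, b):
        return abs(ffs[a][0] - ffs[b][0]) + abs(ffs[a][1] - ffs[b][1])

    # Candidate pairs within range, closest first.
    candidates = sorted(
        (manhattan(a, b), a, b)
        for a, b in combinations(sorted(ffs), 2)
        if manhattan(a, b) <= max_dist
    )
    used, pairs = set(), []
    for _, a, b in candidates:
        if a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
    singles = [name for name in sorted(ffs) if name not in used]
    return pairs, singles


# Hypothetical placement of five flip-flops.
placement = {"f0": (0, 0), "f1": (1, 0), "f2": (10, 10),
             "f3": (10, 12), "f4": (30, 30)}
pairs, singles = pair_flip_flops(placement, max_dist=3)
# Two clock inverters per cell, as in the figure: count before and after.
inverters_before = 2 * len(placement)
inverters_after = 2 * (len(pairs) + len(singles))
```

In this example two pairs are merged and one flip-flop stays single, so the count of clock-driving inverters falls from 10 to 6.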
Power gating-aware HLS and logic design: In power gating, which reduces leakage, the three important design parameters are the wakeup time (the transition time from sleep to active mode), the peak current flowing to ground when the sleep transistors are turned on, and the sleep transistor overhead. The wakeup time affects the overall performance of the circuit, and reducing the peak current flowing to ground when the sleep transistors are turned on mitigates noise on the power distribution network. Furthermore, reducing the number of sleep transistors saves design area and reduces design complexity. Clearly, there are trade-offs among the number of sleep transistors used, the peak current flowing to ground, and the transition time from sleep to active mode. We optimized the number of sleep transistors by developing several clustering algorithms (figure on the left side). An efficient clustering algorithm reduces the wakeup time while satisfying peak current constraints (graph on the right side).
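The interaction between wakeup time and peak current can be pictured as a packing problem: blocks that wake up in the same stage add their inrush currents, each stage must stay under the peak current limit, and fewer stages mean a shorter wakeup. The sketch below uses a first-fit-decreasing heuristic as a stand-in. It is not one of our clustering algorithms, and it assumes that each block has a single peak-current value and that every stage takes the same time; the function name and values are hypothetical.

```python
def schedule_wakeup(block_currents, peak_limit):
    """Group blocks into wakeup stages under a peak current limit.

    block_currents: {block name: peak inrush current when switched on}
    Returns a list of stages, each a list of block names.
    """
    stages = []  # each entry is [total current, [block names]]
    ordered = sorted(block_currents.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, current in ordered:
        if current > peak_limit:
            raise ValueError(f"{name} alone exceeds the peak current limit")
        for stage in stages:
            # First fit: reuse the earliest stage with enough headroom.
            if stage[0] + current <= peak_limit:
                stage[0] += current
                stage[1].append(name)
                break
        else:
            stages.append([current, [name]])
    return [names for _, names in stages]


# Hypothetical blocks and inrush currents (mA), with a 10 mA limit.
wakeup_stages = schedule_wakeup({"a": 6, "b": 5, "c": 4, "d": 3, "e": 2}, 10)
```

Here five blocks fit into two stages instead of waking one block at a time, while neither stage exceeds the limit.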
Reliable and Variation-aware design
- Design-Technology Co-Optimization
- Variation-aware design
- Hardware performance monitor: SRAM, Logic
- Enabling near-threshold operation voltage
Design-Technology Co-Optimization: Traditionally, the path from developing a new semiconductor process technology to developing mass-production products is sequential. Consequently, problems introduced at an early stage take considerable effort and time to resolve at a late stage. DTCO (Design-Technology Co-Optimization) is needed to overcome this limitation of the sequential process. DTCO between the process and the design rules can yield better solutions in terms of both chip size and yield. Achieving this goal requires fast and accurate design rule (DR) evaluation and exploration.