Physical Design Flow for Faster TAT in Lower Technology Nodes

Sai Pranav G; Sujatha Hiremath

doi:10.17577/IJERTV11IS060299

Volume 11, Issue 06 (June 2022)


Open Access
Paper ID : IJERTV11IS060299
Published (First Online): 30-06-2022
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Sai Pranav G

Department of Electronics & Communication Engineering, RV College of Engineering Bengaluru 560059, Karnataka, India

Sujatha Hiremath

Assistant Professor

Department of Electronics & Communication Engineering, RV College of Engineering Bengaluru 560059, Karnataka, India

Abstract: Physical design is a set of design steps that converts a logical gate-level netlist into a physical layout that can be fabricated. It involves steps such as floorplanning, power planning, placement, clock tree synthesis and routing, and it ends with physical verification. The number of devices in an SoC is increasing with the enablement of lower technology nodes (such as 5nm and 4nm). Devices at 7nm and below are FinFET devices, and FinFETs bring many challenges, such as substrate connection, power supply, signal integrity of clock nets and congestion. Modern SoCs are also signed off in multiple multi-mode multi-corner (MMMC) timing scenarios. Physical implementation of modern designs is therefore compute-intensive and requires many iterations to reach sign-off. The design is partitioned into multiple blocks for physical implementation. This paper discusses block-level implementation strategies at each stage of physical design that achieve a faster turnaround time (TAT) and optimal power, performance and area (PPA) for the design block.

Keywords: Floorplanning; power planning; placement; clock tree synthesis; routing.

I. INTRODUCTION

The ASIC design flow is a set of steps that takes a logical-level design to a physically working chip. It is a silicon-aware design process that is widely used to produce semiconductor chips. The ASIC flow starts with specifications, where the target design requirements, operating parameters, constraints and models of the design are developed. The design is then coded in a hardware description language (HDL) such as Verilog, VHDL or SystemVerilog, which describes it at the register transfer level (RTL). The RTL is verified for functionality using various verification techniques and passed downstream to synthesis. Synthesis converts the RTL to a gate-level netlist by mapping the design to technology-specific cells. Design for testability (DFT) logic is then inserted into the gate-level netlist to enable testing of the chips after fabrication. The resulting netlist is partitioned into blocks for physical implementation.

After partitioning, the physical design flow is executed. Physical design is a set of steps that converts the netlist to a layout. It starts with floorplanning, where the area given to the block is planned for the subsequent steps of physical design. Floorplanning is followed by power planning, where the global distribution of VDD and VSS to every cell in the design is planned. The placement and routing steps then place and route all the standard cells (instances) of the design. Finally, the layout is checked for physical violations, which are then fixed. These steps become complex for industrial designs, so a proper and mature methodology is needed for faster turnaround of the design.

The next section discusses a physical design flow for block implementation that is suitable for lower technology nodes (such as 7nm, 5nm and 4nm). Section III describes optimization strategies at each stage of the flow that reduce computational cost and deliver a design with good PPA metrics. Section IV presents the results of applying these techniques at each stage of the physical design flow. Section V concludes the work with the effectiveness and implications of these methods at lower technology nodes.

II. PHYSICAL DESIGN FLOW

The physical design flow (fig 1) consists of floorplanning, power planning, placement, clock tree synthesis and routing. The inputs imported to perform physical design are: the design netlist (.v), constraints (.sdc), operating conditions/modes (MMMC), technology (.lef) and libraries (.lib).

Fig. 1. Physical Design flow
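Every mode is signed off at every relevant corner, so the number of MMMC analysis views grows multiplicatively. This is one reason sign-off runs are compute-intensive. The minimal Python sketch below enumerates the views as the cross product of modes and corners. The mode and corner names are hypothetical, and real flows often enable only a subset of the combinations.

```python
from itertools import product

# Hypothetical modes (each with its own .sdc) and corners (library/RC conditions).
MODES = ["func", "scan_shift", "scan_capture"]
CORNERS = ["ss_0p72v_m40c", "tt_0p80v_25c", "ff_0p88v_125c"]

def analysis_views(modes, corners):
    """Return one 'mode@corner' view name for every (mode, corner) pair."""
    return [f"{mode}@{corner}" for mode, corner in product(modes, corners)]

views = analysis_views(MODES, CORNERS)
print(len(views), "analysis views")
for view in views:
    print(view)
```

Adding a fourth corner to this sketch would add three more views, one per mode.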
A. Floorplanning

Floorplanning is the first step of physical design. In this step, the placement of the various cells within the specified area is planned. The quality of the floorplan determines how well the design converges in the downstream steps. To start, the floorplan requires the design netlist, area requirements, power requirements, timing constraints, macro placement information and I/O details. The floorplan consists of the core and the die: the die is the area allocated to the block, and the core is the area in which the logic is placed. The clearance between the core and the die is a technology-dependent parameter. It is specified on all sides of the block, as shown in fig 2: core to right, core to bottom, core to left and core to top. This clearance is used to place the I/O pins and to create the power rings for power distribution.[1] Multiple rows are created in the core area. The row height depends on the technology and is specified in the technology .lef file.
  
  Fig. 2. Core-to-die clearance and cell rows
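The core dimensions and the number of cell rows follow directly from three inputs: the die size, the four core-to-die clearances and the row height in the .lef file. The Python sketch below uses hypothetical dimensions (a 600 µm × 600 µm die, 10 µm clearances and a 0.24 µm row height). In practice, tools typically also snap the core boundary to the site and row grid.

```python
import math
from dataclasses import dataclass

@dataclass
class Floorplan:
    die_w: float   # die width (um)
    die_h: float   # die height (um)
    left: float    # core-to-left clearance (um)
    right: float   # core-to-right clearance (um)
    bottom: float  # core-to-bottom clearance (um)
    top: float     # core-to-top clearance (um)
    row_h: float   # standard-cell row height from the technology .lef (um), hypothetical

    def core_size(self):
        """Core width and height after removing the four clearances."""
        return (self.die_w - self.left - self.right,
                self.die_h - self.bottom - self.top)

    def num_rows(self):
        """Number of whole cell rows that fit in the core height."""
        _, core_h = self.core_size()
        return math.floor(core_h / self.row_h)

fp = Floorplan(die_w=600.0, die_h=600.0, left=10.0, right=10.0,
               bottom=10.0, top=10.0, row_h=0.24)
print("core (um):", fp.core_size(), "rows:", fp.num_rows())
```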
B. Power Planning
All cells in the design need VDD and VSS connections to operate, so a global strategy is required to supply power to every cell. This is done by creating the power distribution network (PDN). The PDN consists of the power rings, power stripes and special routes. Together, these three structures distribute power to the standard cells, memories and macro blocks in the design.

Modern designs use multiple metal layers, up to 14 in some processes. In such designs, the width (cross-section) of the metal layers increases from the bottom layers to the top layers. The top metal layers are therefore preferred for the power rings and stripes because of their lower resistance.[2] Power is then brought down to the lower metal layers through via stacks to reach the standard cells.
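The preference for the top layers can be seen from the resistance of a rectangular wire, R = ρL/(W·t). Over the same length, a wider and thicker top-layer wire has a far smaller resistance. The sketch below assumes bulk copper resistivity and hypothetical wire dimensions. Real thin wires at advanced nodes have a noticeably higher effective resistivity, so the lower-layer figure is optimistic.

```python
# Bulk copper resistivity: 1.72e-8 ohm*m = 0.0172 ohm*um (assumption; thin wires are higher).
RHO_CU_OHM_UM = 0.0172

def wire_resistance(length_um, width_um, thickness_um, rho=RHO_CU_OHM_UM):
    """Resistance in ohms of a rectangular wire: R = rho * L / (W * t)."""
    return rho * length_um / (width_um * thickness_um)

# Hypothetical cross-sections for a 100 um long wire.
lower = wire_resistance(100.0, width_um=0.02, thickness_um=0.04)  # thin lower layer
upper = wire_resistance(100.0, width_um=1.0, thickness_um=2.0)    # wide, thick top layer
print(f"lower layer: {lower:.1f} ohm, top layer: {upper:.2f} ohm")
```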

Fig. 3 shows a typical power plan for a 12-metal-layer process. The power ring is created in metal 12 and metal 11. The power stripes are created vertically in metal 12 over the core area. Via stacks are dropped from metal 12 to metal 1 to create the special routes, one each for VDD and VSS along every standard cell row.

Fig. 3. Power Planning

C. Placement

Placement is the process of placing standard cells in the standard cell rows of a floorplanned design. The placer places the cells in the rows (fig 4). Each row is an integer multiple of the site defined in the LEF file of the target technology. Placement is driven mainly by two targets: timing optimization and congestion minimization. In modern EDA tools, placement is done as an initial placement followed by placement optimization. Initial placement performs a coarse placement, and the optimization step then produces a higher-quality placement database. [3,4]

Fig. 4. Placement

During placement, an early route analysis is performed to verify the routability of the design. Timing analysis is also carried out using the available RC extraction, virtual route information and an ideal clock. Setup violations are of primary importance at this stage. The timing reports identify the paths that fail to meet the timing constraints.

D. Clock Tree Synthesis

Clock synchronization is one of the most important concerns in high-performance SoCs. Ideally, the clock arrives at all the leaf nodes (sinks) at exactly the same time. In practice, the built clock tree always has some skew. In lower technology node designs, the clock network can consume 40-50% of the chip's power. The two main strategies for designing the clock network are:
1. Locate all the clock inputs close together
2. Drive the clock from the same source and balance the network
The first strategy is not suitable because of physical limitations and the widely distributed locations of the clock sinks, so the second strategy is most often used. The clock tree is built by inserting buffers to balance the skew (fig 5). These are special clock buffers with higher drive strength and equal rise and fall delays.[5]

Fig. 5. Balancing clock tree

Various balancing strategies are used, such as traditional buffered CTS, multi-source CTS, clock mesh, standard H-tree and flexible H-tree.

E. Routing

At the routing stage, the connections between the pins of the standard cells are made using the metal layers (fig. 6). Routing follows CTS optimization and determines the exact routing paths. This is done in two major steps: global routing and detailed (local) routing. In global routing, the entire design is divided into smaller bins called G-cells, and the routing is planned on the G-cells. In detailed routing, the router assigns each connection to exact tracks.

Fig. 6. Routing
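Detailed routing finds an exact, obstacle-avoiding path for each connection on the routing tracks. Commercial routers are far more elaborate, handling multiple layers, vias, design rules, and rip-up and reroute. Still, the classic Lee maze-routing algorithm, a breadth-first search over a grid of track points, shows the basic idea. The grid below is a hypothetical single-layer example, not output from any tool.

```python
from collections import deque

def lee_route(grid, src, dst):
    """Shortest Manhattan path on a routing grid using Lee's (BFS) maze algorithm.

    grid: list of equal-length strings; '#' marks a blocked track point, '.' is free.
    src, dst: (row, col) tuples. Returns the list of points from src to dst, or None.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}          # visited points and their BFS predecessor
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            break
        r, c = cur
        # Wave expansion to the four horizontal/vertical neighbours.
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in prev):
                prev[(nr, nc)] = cur
                queue.append((nr, nc))
    if dst not in prev:
        return None             # target unreachable: the net cannot be routed
    # Backtrace from the target to the source.
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]

grid = ["....",
        ".##.",
        "...."]
path = lee_route(grid, (0, 0), (2, 3))
print(path)
```

BFS guarantees a shortest path in grid steps, which is why the wave-expansion approach is a common teaching model for detailed routing.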
III. STRATEGIES FOR FASTER TAT

The placement, CTS and routing steps can become complex for bigger designs with large instance counts. These steps are compute-intensive and require a lot of resources (CPU and memory). All the PnR steps (placement, CTS and routing) include optimization steps to deliver good QoR (quality of results). This section describes three methods that help achieve a faster turnaround of the design: initial IR drop analysis, placement and optimization interleaving, and scan-chain reordering.

A. Initial IR drop analysis

IR drop is the voltage drop across the metal wires of the power grid before the supply reaches the power pins of the standard cells. Severe IR drop can cause standard cells to operate improperly, so regions of high IR drop are a concern when the power grid is designed. To analyze the IR drop of the design, an initial rail analysis is run. The rail analysis reports the regions of the design with higher IR drop and displays them as a color map (fig. 7). Severe IR drop regions (red regions) can be fixed by increasing the density of the power stripes in those regions.
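The effect of stripe density can be illustrated with a one-dimensional model of a single standard-cell rail. In this model, stripe taps hold the rail at the supply voltage, each cell between taps draws the same current, and adjacent rail nodes are joined by equal resistances. The Python sketch below solves Kirchhoff's current law for this model by Gauss-Seidel iteration. The segment resistance, cell current and tap spacing are hypothetical values, and a real rail analysis solves the full multi-layer grid with actual current profiles. In this model, the worst-case drop grows with the square of the tap spacing, so halving the stripe pitch cuts it by about four times.

```python
def rail_ir_drop(num_nodes, tap_pitch, r_seg, i_cell, tol=1e-12, max_iter=100_000):
    """IR drop (V) at each node of a 1-D power rail, solved by Gauss-Seidel.

    Nodes 0, tap_pitch, 2*tap_pitch, ... connect to a power stripe and are held
    at zero drop; every other node sinks i_cell amperes. Adjacent nodes are
    joined by a segment of r_seg ohms. Both rail ends must be stripe taps.
    """
    if (num_nodes - 1) % tap_pitch:
        raise ValueError("both rail ends must land on a stripe tap")
    taps = set(range(0, num_nodes, tap_pitch))
    drop = [0.0] * num_nodes
    for _ in range(max_iter):
        delta = 0.0
        for k in range(1, num_nodes - 1):
            if k in taps:
                continue
            # KCL at node k: (d[k] - d[k-1]) + (d[k] - d[k+1]) = r_seg * i_cell
            new = 0.5 * (drop[k - 1] + drop[k + 1] + r_seg * i_cell)
            delta = max(delta, abs(new - drop[k]))
            drop[k] = new
        if delta < tol:
            break
    return drop

# Hypothetical rail: 0.5 ohm per segment, 0.5 mA per cell.
sparse = rail_ir_drop(33, tap_pitch=16, r_seg=0.5, i_cell=0.5e-3)
dense = rail_ir_drop(33, tap_pitch=8, r_seg=0.5, i_cell=0.5e-3)
print(f"worst drop: {max(sparse) * 1e3:.2f} mV sparse, {max(dense) * 1e3:.2f} mV dense")
```

With taps every 16 nodes, the worst drop in this model is 8 mV. Doubling the stripe density brings it down to 2 mV.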

Fig. 7. IR drop analysis

B. Placement and optimization interleaving

Standard-cell placement is performed as a coarse placement followed by an optimization step. Optimization is the process of iterating on a design until it meets its timing, area and power specifications. Depending on the stage of the design, optimization can include the following operations: adding buffers, resizing gates, restructuring the netlist, remapping logic, swapping pins, deleting buffers, moving instances, applying useful skew, layer optimization and track optimization.

In the traditional placement flow, initial placement is completed first and optimization follows it. This is expensive in both computational time and memory. A better approach is to interleave the placement and optimization steps (fig. 8). This makes placement optimization-aware and produces a design with better timing, congestion and density.

Fig. 8. Placement Optimization interleaving

C. Scan-chain reordering

Scan chains are inserted as part of the DFT logic to enable testing of the chips after manufacturing. Their primary purpose is to make the internal flip-flops controllable and observable. Test patterns are shifted in through the chain, and the captured responses are shifted out so that faults can be detected. The order of the flops in a chain can be altered to improve the physical implementation. Scan reordering is the process of reordering the scan chains to save routing resources. [6]
Fig. 9. Scan-chain reordering

Fig. 9 shows a typical three-element scan chain with the chain order 1-2-3. This order consumes more route length because the route has to detour from flop 3 to connect to flop 2 in order to follow the scan-chain order. The result is a longer route length and, consequently, longer timing paths. This can be optimized by reordering the scan chain: with the order 2-3-1, the interconnect length decreases, which also reduces the chance of timing violations. The new order is updated in the scan DEF of the design.
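Scan-chain reordering can be framed as finding a short path through the placed flop locations. The Python sketch below uses a simple greedy nearest-neighbour heuristic with Manhattan distance. This is a swapped-in illustration, not the algorithm of any particular tool or of [6]. The flop coordinates and scan-in location are hypothetical, chosen to mirror the 1-2-3 versus 2-3-1 example. The sketch also ignores constraints, such as clock domains and scan-partition boundaries, that production reordering must respect.

```python
def manhattan(a, b):
    """Manhattan (rectilinear) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def chain_length(order, pos, scan_in):
    """Total Manhattan wirelength of scan_in -> order[0] -> ... -> order[-1]."""
    length, prev = 0.0, scan_in
    for flop in order:
        length += manhattan(prev, pos[flop])
        prev = pos[flop]
    return length

def greedy_reorder(pos, scan_in):
    """Nearest-neighbour reordering: always hop to the closest unvisited flop."""
    remaining = set(pos)
    order, cur = [], scan_in
    while remaining:
        # Ties on distance are broken by flop id to keep the result deterministic.
        nxt = min(remaining, key=lambda f: (manhattan(cur, pos[f]), f))
        order.append(nxt)
        remaining.remove(nxt)
        cur = pos[nxt]
    return order

# Hypothetical placement: flop id -> (x, y) in placement units.
pos = {1: (3, 0), 2: (1, 0), 3: (2, 0)}
scan_in = (0, 0)
new_order = greedy_reorder(pos, scan_in)
print("1-2-3 length:", chain_length([1, 2, 3], pos, scan_in))
print("reordered", new_order, "length:", chain_length(new_order, pos, scan_in))
```

Nearest-neighbour ordering is fast but not optimal in general. Practical reordering engines also balance chain lengths and respect the scan DEF partitions.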

IV. RESULTS

The techniques described in section III are applied to industrial designs and evaluated for improvements in run time, memory usage and PPA metrics.

A. IR drop by rail analysis

The rail analysis is run after the power distribution network is created. It reports the IR drops in the design as regions (table I) and also displays them as a color map.

TABLE I. IR DROP REGIONS

Region	XY location (µm)	IR drop (V)
1	45.000 45.000 444.000 246.000	0.146
2	260.000 241.800 585.000 545.000	0.130
3	426.000 45.000 585.000 320.200	0.065

TABLE III. SCAN REORDERING QOR

Metric	No reordering	Reordering
Setup slack (ns)	-2.457	-2.419
Hold slack (ns)	-0.003	-0.0021

The three IR drop hotspots are fixed by increasing the density of the power stripes. The standard cells then function properly, because the previously weak power nets are now driven by multiple, stronger power connections.

B. Placement optimization interleaving

Interleaving placement and optimization gives better results with better use of resources. This strategy is applied to a medium-complexity industrial design and compared with the traditional flow (table II).

TABLE II. PLACE OPT INTERLEAVING

Scenario	CPU time	Peak memory (including subprocesses)
Traditional approach	1 h 23 min 33 s	13.064 GB
Place-opt interleaving	0 h 42 min 58 s	8.476 GB

On the medium-complexity block, resource usage improves significantly: CPU time is reduced by about 49% and peak memory by about 35%. Over multiple place-opt passes, the design converges, and congestion, timing and density improve.
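The relative savings can be reproduced from the values in Table II. This short Python calculation converts the CPU times to seconds and computes the percentage reductions.

```python
import re

def to_seconds(text):
    """Convert a run-time string such as '1 h 23 min 33 s' to seconds."""
    h, m, s = (int(x) for x in re.findall(r"\d+", text))
    return 3600 * h + 60 * m + s

def reduction_pct(before, after):
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before

trad_cpu = to_seconds("1 h 23 min 33 s")   # traditional approach
inter_cpu = to_seconds("0 h 42 min 58 s")  # place-opt interleaving
cpu_saving = reduction_pct(trad_cpu, inter_cpu)
mem_saving = reduction_pct(13.064, 8.476)  # peak memory in GB
print(f"CPU time: {cpu_saving:.1f}% lower, peak memory: {mem_saving:.1f}% lower")
```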
C. Scan-chain reordering

Scan-chain reordering is applied to a block with 223 scan chains, 53 of which are deep scan chains with long chain lengths. Reordering must be enabled before the placement step so that the placement engine can reorder the chains at the place stage. The QoR improvements are listed in table III. With reordering, setup slack improves from -2.457 ns to -2.419 ns and hold slack from -0.003 ns to -0.0021 ns.

V. CONCLUSION

Applying initial rail analysis, place-opt interleaving and scan-chain reordering significantly decreases the design turnaround time of the block-level implementation. The rail analysis indicates the quality of the power grid and provides a way to debug IR drop issues that can cause standard cells to malfunction. These techniques are particularly beneficial in lower technology nodes. The design turnaround time (TAT) can be reduced by assessing the quality of these runs early, so that fewer iterations are needed.

REFERENCES

[1] Li Li, Yuchun Ma, Ning Xu, Yu Wang and Xianlong Hong, "Floorplan and Power/Ground network co-design using guided incremental floorplanning," 2009 IEEE th International Conference on ASIC, 2009, pp. 747-750, doi: 10.1109/ASICON.2009.5351313.

[2] M. Zhao, R. V. Panda, S. S. Sapatnekar and D. Blaauw, "Hierarchical analysis of power distribution networks," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 2, pp. 159-168, Feb. 2002, doi: 10.1109/43.980256.

[3] I. Tseng, Z. C. Lee, V. Tripathi, C. M. Tommy Yip, Z. Chen and J. Ong, "A System for Standard Cell Routability Checking and Placement Routability Improvements," 2019 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), 2019, pp. 125-128, doi: 10.1109/APCCAS47518.2019.8953119.

[4] P. H. Madden, "Reporting of standard cell placement results," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 2, pp. 240-247, Feb. 2002, doi: 10.1109/43.980262.

[5] S. Roy, P. M. Mattheakis, L. Masse-Navette and D. Z. Pan, "Clock Tree Resynthesis for Multi-Corner Multi-Mode Timing Closure," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 4, pp. 589-602, April 2015, doi: 10.1109/TCAD.2015.2394310.

[6] M. Hirech, J. Beausang and Xinli Gu, "A new approach to scan chain reordering using physical design information," Proceedings International Test Conference 1998 (IEEE Cat. No.98CH36270), 1998, pp. 348-355, doi: 10.1109/TEST.1998.743173.
