8.4 Clock Tree Local Buffer Placement
A key aspect of the placement flow is the special handling required for the clock buffers in the netlist, which are typically added by the clock tree synthesis (CTS) step in the synthesis flow (see Section 7.9). The CTS algorithm attempts to balance the (estimated) loading on the branches of the clock tree in the network, whether the tree originates from a single clock pin or connects to a global clock grid. During cell placement, the common algorithmic approach is to select clusters of flops in close proximity and place a clock buffer from the final branch of the tree within the area spanned by each flop cluster. Once all clock tree endpoints are placed, a similar approach selects clusters of clock buffers and places a buffer from the preceding level of the tree appropriately; this process iterates recursively to the root level of the clock tree. Clock buffer placement therefore results in updates to the output netlist, as the (logically equivalent) sinks at each level of the tree may be swapped during the clustering phase of the placement algorithm. With clock gating introduced into the clock tree, however, the cells at each level of the tree are not necessarily logically equivalent; clustering of placed sinks needs to observe the gated clk_enable functionality.
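The bottom-up clustering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of any particular placement tool: the `Node` type, the greedy nearest-neighbor clustering, the `max_fanout` limit, and the centroid buffer location are all hypothetical choices made for the example. It also assumes that gating is applied only at the leaf level, so that buffers above the leaf level can be treated as logically equivalent.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import List


@dataclass
class Node:
    """A placed flop or clock buffer (hypothetical type for this sketch)."""
    name: str
    x: float
    y: float
    enable: str = ""  # assumed gating-domain label; "" means ungated
    children: List["Node"] = field(default_factory=list)


def cluster(nodes: List[Node], max_fanout: int) -> List[List[Node]]:
    """Greedily group nearby sinks; a group never mixes enable domains."""
    remaining = sorted(nodes, key=lambda n: (n.enable, n.x, n.y, n.name))
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        # Only sinks with the same clk_enable behavior may share a buffer.
        same = [n for n in remaining if n.enable == seed.enable]
        same.sort(key=lambda n: hypot(n.x - seed.x, n.y - seed.y))
        group = [seed] + same[:max_fanout - 1]
        chosen = {id(n) for n in group}
        remaining = [n for n in remaining if id(n) not in chosen]
        clusters.append(group)
    return clusters


def place_buffer(group: List[Node], name: str) -> Node:
    """Place a buffer at the centroid, i.e. inside the area spanned by the group."""
    cx = sum(n.x for n in group) / len(group)
    cy = sum(n.y for n in group) / len(group)
    # Assumption: gating happens in the leaf-level cell, so the new buffer
    # presents an ungated clock input to the level above it.
    return Node(name, cx, cy, enable="", children=list(group))


def build_clock_tree(flops: List[Node], max_fanout: int = 4) -> Node:
    """Cluster sinks level by level until a single root buffer remains."""
    if not flops or max_fanout < 2:
        raise ValueError("need at least one flop and max_fanout >= 2")
    level, depth = list(flops), 0
    while True:
        groups = cluster(level, max_fanout)
        level = [place_buffer(g, f"buf_L{depth}_{i}") for i, g in enumerate(groups)]
        depth += 1
        if len(level) == 1:
            return level[0]
```

A production flow would additionally legalize each buffer location against the placement rows and existing cells, and would weigh the estimated load on each branch rather than using a fixed fanout limit; both steps are omitted here.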
For block placement with preplaced hard IP macros, the related clock buffers may also be preplaced accordingly. For relative placement groups, clock buffers may be included in the group definition. An increasing design trend is to provide multi-bit registers as atomic cells in the library, to minimize the clock routing and loading among bits. These registers are also likely to be part of relative placement groups with clock buffers (and decoupling capacitance cells).
During block routing, the attention to clock signals focuses on balancing the arrival latency at endpoints, primarily through R*C interconnect segment allocation. Performance optimization features in the routing flow may result in changes to the drive strength of logic path cells and flops; clock buffer tree cells may likewise need to receive drive strength updates in routing. For drive strength increases, any resulting cell area overlaps with the cell locations from the placement output need to be (incrementally) resolved during routing.
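As a rough illustration of latency balancing through R*C segment allocation, the sketch below uses the first-order Elmore delay of a driver with a distributed RC wire and a lumped load. The Elmore model is an assumption introduced here for illustration; production routers rely on extracted parasitics and more accurate delay calculation. The function names and parameters are hypothetical.

```python
from math import sqrt


def elmore_delay(length_um, r_per_um, c_per_um, c_load, r_drv=0.0):
    """Elmore delay (s) of a driver, a distributed RC wire, and a lumped load."""
    wire_r = r_per_um * length_um
    wire_c = c_per_um * length_um
    # The driver resistance sees all downstream capacitance; the wire
    # resistance sees half of its own capacitance plus the load.
    return r_drv * (wire_c + c_load) + wire_r * (wire_c / 2.0 + c_load)


def length_for_delay(target_s, r_per_um, c_per_um, c_load, r_drv=0.0):
    """Wire length (um) whose Elmore delay equals target_s.

    Solves a*L^2 + b*L + c = 0 for the non-negative root.
    """
    a = r_per_um * c_per_um / 2.0
    b = r_per_um * c_load + r_drv * c_per_um
    c = r_drv * c_load - target_s
    if a <= 0.0:
        raise ValueError("wire resistance and capacitance must be positive")
    if c > 0.0:
        raise ValueError("target is below the zero-length delay of this driver")
    return (-b + sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def extra_length_to_balance(slow_len_um, fast_len_um, **wire):
    """Additional length the shorter branch needs to match the longer one."""
    target = elmore_delay(slow_len_um, **wire)
    return length_for_delay(target, **wire) - fast_len_um
```

Calling `extra_length_to_balance` with the lengths of two branches and a shared set of wire parameters (`r_per_um`, `c_per_um`, `c_load`, `r_drv`) gives the added routing length on the faster branch that would equalize the two delays under this model, provided both branches use the same layer, width, driver, and load. In practice a router can also change layer or wire width, which changes the per-unit R and C rather than the length.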