18.06.2013 Views

Microsoft PowerPoint - 3.5presentation.ppt - Cadence Design Systems


INVENTIVE

Synthesis Strategies for Better QoR Using RC

Nandini Chintala

Cadence Design Systems


Outline

Introduction
Requirements for Synthesis
Background on RTL Compiler
Metrics – Area, Timing, Power
Flow exploration
Effective strategies
Conclusions

© 2007 Cadence Design Systems, Inc. All rights reserved worldwide.


Introduction

Modern designs have aggressive requirements
  Chips must be smaller, cooler, and faster
Implementing the RTL as a netlist is the first step
  Global-focus synthesis yields faster chips; single-pass multi-Vt synthesis and multiple supply voltage (MSV) design yield cooler chips
The logic netlist determines the quality of the design implementation
Advanced techniques for low power and timing closure
Capacity allows for top-down synthesis


Synthesis Using RTL Compiler

Multidimensional synthesis: timing, area, and power are optimized concurrently
  Identify and map critical logic for timing
  Identify and map non-critical logic for power and area

[Figure: RTL, timing and power constraints, switching activity, and low-, medium-, and high-Vt libraries feed a multi-objective optimization that produces an optimized netlist balanced across timing, power, and area.]


Typical Flow

Set target library: set_attr library name /
Read HDL files: read_hdl ${FILE_LIST}
Elaborate the design: elaborate
Set timing and design constraints (timing constraints, power constraints)
Apply optimization directives
synthesize -to_generic: generic optimizations such as mux optimizations and datapath selection
synthesize -to_map -no_incr: global mapping
synthesize -to_map -incr: incremental mapping
Reports and interface to P&R


Mapping Stages

synthesize -to_generic
  Generic structuring: generic optimizations such as mux optimizations and datapath selection
synthesize -to_map -no_incr
  Target setting: targets are derived for each cost group; each clock definition creates a cost group
  Global mapping: global optimization for area, timing, and power, driven by the targets
  Remaps (area_map, ...): area recovery stage for non-critical paths
synthesize -to_map -incr
  Incremental: path-based optimization, DRC fixing, critical-region resynthesis for timing, and sequential resynthesis


Synthesis Strategies Summary

| Technique | Area | Timing | Power | Methodology impact | Methodology change |
|---|---|---|---|---|---|
| RTL coding | ++ | = | ++ | High | Clock gating, FSM encoding |
| Constraints | ++ | ++ | ++ | High | Clean constraints result in a better netlist |
| Selective ungrouping | ++ | ++ | ++ | Medium | Analyze opportunities for better optimization |
| Path groups | ++ | ++ | ++ | Low | Synthesis script |
| Multi-Vt libraries | ++ | ++ | ++ | Low | Multi-Vt libraries loaded in synthesis |
| Cell selection | ++ | ++ | ++ | Medium | Limit cell selection |
| Target manipulation | = | ++ | ++ | Medium | Analyze targets |
| Mux optimizations | ++ | ++ | = | Medium | Analyze mux choices |
| WLM selections | ++ | ++ | ++ | Medium | Analyze WLM choices |
| Clock gate analysis | = | ++ | ++ | High | Analyze clock gating |


Typical Flow: Strategy Hooks

The strategies plug into the typical flow above as follows:
RTL coding: for example, report power -rtl
Optimization directives: path adjust, path groups, mux guidance, override target, cell selection, datapath selection, ungrouping, area constraints
Synthesis steps (synthesize -to_generic, -to_map -no_incr, -to_map -incr): effort levels


Creating Cost Groups

Cost groups: buckets of logic that share the same target
  Divide and conquer
  One target per cost group
  Mapping is target based
  A highly negative target sets an unrealistic goal
  Use cost groups to isolate impossible paths from the other paths
Target manipulation

define_cost_group -name C2C
path_group -from [all::all_seqs] -to [all::all_seqs] -group C2C -name C2C
define_cost_group -name I2C
path_group -from [all::all_inps] -to [all::all_seqs] -group I2C -name I2C
define_cost_group -name I2O
path_group -from [all::all_inps] -to [all::all_outs] -group I2O -name I2O
define_cost_group -name C2O
path_group -from [all::all_seqs] -to [all::all_outs] -group C2O -name C2O
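To make the cost-group idea concrete, here is a small Python model (not RC Tcl) that buckets timing paths into the four groups defined above by startpoint and endpoint type and keeps each group's worst slack. The `Path` class, the `"seq"`/`"input"`/`"output"` labels, and the slack values are hypothetical. What it illustrates: the impossible I2O path at -400 ps stays in its own bucket, so it cannot distort the target of the register-to-register (C2C) group.

```python
from dataclasses import dataclass

@dataclass
class Path:
    start_kind: str  # hypothetical label: "input" or "seq"
    end_kind: str    # hypothetical label: "seq" or "output"
    slack_ps: float

# Group names follow the slide: launch side, then capture side.
GROUP_OF = {
    ("seq", "seq"): "C2C",
    ("input", "seq"): "I2C",
    ("input", "output"): "I2O",
    ("seq", "output"): "C2O",
}

def worst_slack_per_group(paths):
    """Bucket paths by (startpoint, endpoint) kind and keep each bucket's worst slack."""
    worst = {}
    for p in paths:
        group = GROUP_OF[(p.start_kind, p.end_kind)]
        worst[group] = min(worst.get(group, p.slack_ps), p.slack_ps)
    return worst

paths = [
    Path("seq", "seq", -20.0),
    Path("seq", "seq", 35.0),
    Path("input", "seq", 10.0),
    Path("input", "output", -400.0),  # an "impossible" I/O path
    Path("seq", "output", 5.0),
]
print(worst_slack_per_group(paths))
# {'C2C': -20.0, 'I2C': 10.0, 'I2O': -400.0, 'C2O': 5.0}
```

Without the split, a single bucket would report -400 ps, and a target derived from it would be the kind of unrealistic goal the slide warns about.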


Path Adjust Flow – Unique To RC

Target and path adjust
  Timing lint report
  Apply target-based path adjust (PA) for effective mapping
  Important for area- and power-efficient implementation of non-critical paths
  Affects synthesize -to_map only

path_adjust -to CPU/EBOX/iu0/idaP0ADW/idaP0PreByp2D_reg* -delay -150
path_adjust -to CPU/EBOX/iu1/idaP1ADW/idaP0PreByp2D_reg* -delay -150
path_adjust -from [all_ins] -to [all_outs] -delay 500 -name PA_I2O


Mux Selection And Mapping

Mux optimization
  Binary mux selection: pragma driven
  Area and timing driven, but not congestion driven
  Attributes are available to bias mux mapping
Important for congestion
  Identify the cells with high pin density

[Figure: a 2:1 multiplexer with data inputs a, b and select s, implemented either as a complex gate or as a mux gate.]
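The two implementations in the figure compute the same Boolean function, which is why the mapper is free to choose between them on area, timing, or congestion grounds. A quick Python check of that equivalence, assuming the mux passes `b` when `s = 1` (the slide does not say which input is selected) and modeling the complex gate as an inverted select feeding an AND-OR structure; both function names are hypothetical:

```python
from itertools import product

def mux_cell(a: int, b: int, s: int) -> int:
    """Behavioral 2:1 mux cell: assumed to pass b when s=1 and a when s=0."""
    return b if s else a

def complex_gate_mux(a: int, b: int, s: int) -> int:
    """Same function as an AND-OR (AO22-style) gate: (a AND NOT s) OR (b AND s)."""
    not_s = 1 - s  # the inverter on the select line
    return (a & not_s) | (b & s)

# Exhaustive check over all 8 input combinations.
assert all(mux_cell(a, b, s) == complex_gate_mux(a, b, s)
           for a, b, s in product((0, 1), repeat=3))
```

Because the function is identical, the choice only changes pin count, cell area, and delay, which is where congestion-aware biasing matters.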


Wire Delays – Physically Aware

Wire delays become significant as process geometries shrink
  Use the Physical Layout Estimator (PLE) method
  Dynamic wire-load estimation instead of static library wire-load models
  Gate sizing for real wires

Physical Layout Estimator (PLE)
• PLE uses the actual design and physical library information.
• Dynamically calculates wire delays for different logic structures in the design.
• Correlates better with place and route.

set_attr lef_library
set_attr cap_table_file
set_attr interconnect_mode ple /

Wire-load Models
• Wire-load models are statistical.
• Wire loads are calculated from the nearest calibrated area.
• Correlation is difficult even with custom wire-load models.

set_attr interconnect_mode wireload /
set_attr wireload_mode top /
set_attr force_wireload [find /mylib -wireload S160K] /designs/*
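A statistical wire-load model can be sketched as a fanout-to-length lookup plus a model chosen by design area, in the spirit of Liberty `wire_load` and `wire_load_selection` groups. All table values and the `S10K`/`S1M` names are hypothetical (only `S160K` comes from the slide), and this is not how RC stores or evaluates models internally. It shows why correlation is hard: every net with the same fanout gets the same capacitance, wherever it is actually placed.

```python
# Hypothetical wire-load model: (fanout, estimated net length) pairs,
# plus a slope used to extrapolate beyond the largest fanout in the table.
FANOUT_LENGTH = [(1, 10.0), (2, 18.0), (3, 25.0), (4, 31.0)]
SLOPE = 6.0            # extra length per fanout beyond the table
CAP_PER_LENGTH = 0.2   # capacitance per unit length (arbitrary units)

# Hypothetical area-calibrated models: (maximum design area, model name).
AREA_TABLE = [(10_000.0, "S10K"), (160_000.0, "S160K"), (1_000_000.0, "S1M")]

def select_model(design_area: float) -> str:
    """Pick the first model calibrated for an area at least as large as the design."""
    for max_area, name in AREA_TABLE:
        if design_area <= max_area:
            return name
    return AREA_TABLE[-1][1]  # larger than every calibration point: use the biggest

def net_length(fanout: int) -> float:
    """Estimated length depends only on fanout, never on placement."""
    table = dict(FANOUT_LENGTH)
    last_fanout = max(table)
    if fanout in table:
        return table[fanout]
    if fanout > last_fanout:
        return table[last_fanout] + (fanout - last_fanout) * SLOPE
    raise ValueError("fanouts below the table are not modeled in this sketch")

def wire_cap(fanout: int) -> float:
    return net_length(fanout) * CAP_PER_LENGTH
```

Forcing a model, as `force_wireload` does on the slide, corresponds to bypassing `select_model` and always using one table.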


Wireload Selection Based On the Design

Timing-critical design
  PLE
  Scale factors introduce pessimism into the PLE-calculated wire capacitance and resistance
  The factors can be derived from those used in Encounter

set_attr lef_library
set_attr cap_table_file
set_attr interconnect_mode ple /
set_attribute scale_of_cap_per_unit_len 1.2 /
set_attribute scale_of_res_per_unit_len 1.2 /

Non-timing-critical design
  Zero-wireload-model synthesis
  Smallest design: great for area
  The netlist is later sized and restructured for the wires introduced by placement
  Slightly over-constrain using path adjust in synthesis

set_attribute force_wireload none /designs/*
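Why a 1.2 factor on both resistance and capacitance per unit length adds noticeable pessimism: for a uniformly distributed RC wire the Elmore delay is r·c·L²/2, so scaling r and c by 1.2 each scales the wire delay by 1.2 × 1.2 = 1.44. A minimal Python check with hypothetical wire values; it ignores driver resistance and pin loads, which a real delay calculation includes:

```python
def distributed_rc_delay(r_per_len: float, c_per_len: float, length: float) -> float:
    """Elmore delay of an unloaded, uniformly distributed RC wire: r*c*L^2/2."""
    return 0.5 * r_per_len * c_per_len * length ** 2

R, C, L = 0.5, 0.2, 100.0  # hypothetical per-unit-length values and wire length
base = distributed_rc_delay(R, C, L)
scaled = distributed_rc_delay(R * 1.2, C * 1.2, L)  # both scale factors set to 1.2
ratio = scaled / base
```

The capacitance seen by the driving gate grows only by 1.2x, but the wire's own RC delay grows by about 1.44x, so long nets are penalized most.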


Clock Gating – Is It Important?

Clock gating
  Number of clock-gating elements versus root-level clock gating
  RC allows multi-level clock gating
  Clone and declone clock gates after clock-gating insertion
  Post-map analysis of clock-gating quality of results
  Area versus dynamic power savings

report clock_gating -detail
report power -clock_tree -buffers -leaf_max_fanout
set_attr lp_clock_gating_max_flops
/designs/*
set_attr lp_clock_gating_min_flops
/designs/*
clock_gating declone/share
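The area-versus-dynamic-power trade-off can be illustrated with a toy Python model of clock-gating decisions. It assumes, as an interpretation of the attribute names rather than RC documentation, that groups below a minimum flop count are left ungated and that groups above a maximum flop count are split across several cloned gates. `EnableGroup`, `plan_clock_gating`, and all power numbers are hypothetical; the ICG count stands in for added area.

```python
import math
from dataclasses import dataclass

@dataclass
class EnableGroup:
    flops: int            # flops sharing one enable
    enable_ratio: float   # fraction of cycles the enable is active

def plan_clock_gating(groups, min_flops, max_flops, flop_clk_power, icg_power):
    """Return (number of ICGs, clock power before, clock power after) for a toy model.

    Assumption: a group below min_flops stays ungated; a group above max_flops
    is split over several ICGs so no gate drives more than max_flops flops.
    """
    icgs = 0
    before = after = 0.0
    for g in groups:
        full = g.flops * flop_clk_power  # clock power if the group is always clocked
        before += full
        if g.flops < min_flops:
            after += full  # too small to pay for an ICG
            continue
        n_icg = math.ceil(g.flops / max_flops)
        icgs += n_icg
        # Gated flops see the clock only when enabled; each ICG toggles every cycle.
        after += full * g.enable_ratio + n_icg * icg_power
    return icgs, before, after

groups = [EnableGroup(2, 0.5), EnableGroup(32, 0.1), EnableGroup(100, 0.25)]
icgs, before, after = plan_clock_gating(groups, min_flops=3, max_flops=64,
                                        flop_clk_power=1.0, icg_power=2.0)
```

Raising the minimum removes ICGs (less area) at the cost of clock power; lowering the maximum clones more gates, which helps clock-tree fanout but adds area.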


Optimizing Total Negative Slack and Cell Selection

TNS
  If the worst negative slack (WNS) is negative, should I care about total negative slack (TNS)?
  TNS optimization works over a critical range of paths, not only the worst path
  Effective for path-based incremental optimization
  Focus on the mapping result first
  Reduces the number of paths violating timing
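The answer to the question above is yes: two results can share the same WNS while differing greatly in how many endpoints fail. A minimal Python sketch using the standard definitions (WNS is the worst endpoint slack, TNS is the sum of the negative slacks only); the endpoint slacks are hypothetical:

```python
def wns_tns(slacks_ps):
    """Return (WNS, TNS, number of violating endpoints) for a list of endpoint slacks."""
    wns = min(slacks_ps)
    tns = sum(s for s in slacks_ps if s < 0)
    violators = sum(1 for s in slacks_ps if s < 0)
    return wns, tns, violators

run_a = [-50, -45, -40, -30, 10, 20]  # hypothetical endpoint slacks (ps)
run_b = [-50, 5, 8, 12, 10, 20]
```

Both runs have a WNS of -50 ps, but `run_a` has a TNS of -165 ps over four failing endpoints while `run_b` has one failing endpoint, so `run_b` is far closer to closure. Some reports clamp WNS to 0 when every slack is positive; this sketch does not.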

Cell Selection
  Review the don't-use lists used during synthesis
  Avoid bad cells


Optimizing For Leakage Power Using Multi-Vt Libraries

Leakage power optimization
  Load multi-Vt libraries
  RC uses the leakage power numbers characterized in the library
  Results depend on the accuracy of the library characterization
  Useful for sprinkling in leaky cells
Strategies
  Limit global mapping to high-Vt cells only
  Incremental synthesis using standard-Vt cells
  Effective for leakage optimization and gives a better structure for timing
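The second strategy (map with high-Vt cells, then sprinkle in leakier standard-Vt cells only where timing needs them) can be sketched as a greedy swap on a single path. This is a simplified illustration, not RC's algorithm: it works on one path rather than a timing graph, and the `Cell` class, `sprinkle_svt` function, delays, and leakage values are hypothetical. It assumes every Svt variant is faster and leakier than its Hvt counterpart.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    name: str
    delay_hvt: float  # ps
    delay_svt: float  # ps, assumed faster than Hvt
    leak_hvt: float   # nW
    leak_svt: float   # nW, assumed leakier than Hvt
    vt: str = "hvt"   # start from an all-Hvt mapping

    def delay(self) -> float:
        return self.delay_svt if self.vt == "svt" else self.delay_hvt

    def leak(self) -> float:
        return self.leak_svt if self.vt == "svt" else self.leak_hvt

def sprinkle_svt(path, required_ps):
    """Swap Hvt cells to Svt, best delay gain per unit of added leakage first,
    until the path meets required_ps or no Hvt cells remain."""
    while sum(c.delay() for c in path) > required_ps:
        candidates = [c for c in path if c.vt == "hvt"]
        if not candidates:
            break  # timing cannot be met even with all Svt cells
        best = max(candidates,
                   key=lambda c: (c.delay_hvt - c.delay_svt) / (c.leak_svt - c.leak_hvt))
        best.vt = "svt"
    return path

path = [Cell("u1", 50, 40, 1.0, 5.0), Cell("u2", 60, 42, 1.0, 4.0),
        Cell("u3", 40, 35, 1.0, 6.0), Cell("u4", 55, 45, 1.0, 3.0)]
sprinkle_svt(path, required_ps=190)
```

Here the all-Hvt path takes 205 ps; swapping only `u2` brings it to 187 ps, so three of the four cells stay low-leakage.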


Selective Ungrouping

Selective ungrouping
  ChipWare components
  Deeply nested hierarchy
  Threshold based or instance-name based
  Module naming is controlled during generic mapping
  Ungroup ChipWare components and very small instances

set_attribute gen_module_prefix "CW_" /
ungroup -threshold 5000
ungroup [get_attr instance $subdesign]
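The selection logic behind the two ungroup styles can be sketched in Python: flatten instances whose module name carries the generated prefix, plus instances below a size threshold. Whether RC's `-threshold` counts cells or area is an assumption here (this sketch uses a cell count), and the `Inst` class, `pick_ungroup` function, and the example hierarchy are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Inst:
    name: str
    module: str
    cells: int  # leaf cells inside this instance (assumed size metric)
    children: list = field(default_factory=list)

def pick_ungroup(root, threshold, prefix="CW_"):
    """Return hierarchical paths of instances to flatten: modules whose name
    starts with the prefix, or instances with fewer than threshold cells."""
    picked = []
    stack = [(root, "")]
    while stack:
        inst, parent_path = stack.pop()
        path = f"{parent_path}/{inst.name}" if parent_path else inst.name
        for child in inst.children:
            stack.append((child, path))
        if inst is not root and (inst.module.startswith(prefix) or inst.cells < threshold):
            picked.append(path)
    return sorted(picked)

top = Inst("top", "top", 100_000, [
    Inst("alu", "alu", 20_000, [Inst("add", "CW_add_0", 300)]),
    Inst("ctrl", "ctrl", 1_200),
    Inst("dec", "decoder", 8_000),
])
to_flatten = pick_ungroup(top, threshold=5000)
```

With a threshold of 5000, the generated adder is picked by its prefix and `ctrl` by its size, while the larger `alu` and `dec` blocks keep their hierarchy.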


Effort Levels – Generic, Mapping, Incremental Mapping

Effort levels
  High effort is not necessarily a good strategy
  The effort level is determined by the targets for the cost groups
Global synthesis
Bottom-up synthesis
  Don't discount it completely
  It works well for certain designs


Bottom-up Flow Implementation

Top level: Read RTL, Load constraints, Synthesis, Derive environment
Unit level: Unit-level RTL, Unit constraints/reset target, Synthesis
Top level: Read unit netlists, Stitch top level, Clock-gating analysis

Bottom-up flow implementation
  Unit-level constraint generation
  Unit-level synthesis
  Top-level hookup of unit netlists


Flow Explorations

A small example design was used to test the RC synthesis strategies.

| | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
|---|---|---|---|---|---|
| Instance count | 37312 | 39109 | 45728 | 38611 | 37228 |
| Hvt cells (%) | 37312 (100) | 39109 (100) | 30548 (66.8) | 30386 (78.7) | 29423 (79) |
| Svt cells (%) | 0 (0) | 0 (0) | 15177 (33.2) | 8222 (21.3) | 7802 (21) |
| WNS (ps) | -49 | -5 | -12 | | |
| Power (mW) | 15.99 | 14.93 | 33.34 | 17.7 | 14.97 |
| Leakage (mW) | 9.46 | 9.07 | 27 | 6.02 | 12.14 |

Run 1: load Hvt libs; no power constraints; map with no timing
Run 2: load Hvt libs; optimized constraints
Run 3: load multi-Vt libs; multi-Vt mapping
Run 4: Hvt; map high -no_incr; override target; incremental with constraints; load Svt; incremental
Run 5: Hvt; map with no constraints; Svt; incremental
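A quick Python sanity check of the Svt percentages in the Flow Explorations data, computing each run's Svt share from its Hvt and Svt instance counts. The counts are copied from the source; the function name is illustrative. The shares match the reported percentages to one decimal place.

```python
# Hvt and Svt instance counts per run, from the Flow Explorations data.
runs = {
    1: (37312, 0),
    2: (39109, 0),
    3: (30548, 15177),
    4: (30386, 8222),
    5: (29423, 7802),
}

def svt_share_percent(hvt: int, svt: int) -> float:
    """Svt cells as a percentage of all Vt-classified cells, rounded to 0.1%."""
    return round(100.0 * svt / (hvt + svt), 1)

shares = {run: svt_share_percent(h, s) for run, (h, s) in runs.items()}
# {1: 0.0, 2: 0.0, 3: 33.2, 4: 21.3, 5: 21.0}
```

Runs 4 and 5, which introduce Svt cells only during incremental synthesis, end up with about 21% Svt, versus 33.2% when multi-Vt libraries are used from the start in Run 3.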


Conclusions

Exploration of synthesis flows and strategies
  Better QoR was achieved by exploring different synthesis flows


THANK YOU!