rsz: Global sizing#10599
Conversation
Signed-off-by: Eren Dogan <erendogan@google.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a Lagrangian-Relaxation-driven global sizing and Vt assignment optimization phase (GLOBAL_SIZING) to the resizer. It adds the LRSubproblem class to evaluate per-gate Lagrangian subproblems concurrently and the GlobalSizingPolicy class to drive the outer optimization loop. It also introduces worker-safe overloads for gate delay calculations in Resizer to support parallel evaluation. The review comments correctly identify critical null-pointer dereference risks in both LRSubproblem::isDataArc and GlobalSizingPolicy::isDataArc where edge->role() can be null.
|
@QuantamHD @mguthaus FYI |
Signed-off-by: Eren Dogan <erendogan@google.com>
|
I've merged with master and started a secure CI where the repair_timing in cts.tcl uses the above phases. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c3e619379
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| = r.best_is_downsize ? downsize_accept_tol : upsize_accept_tol; | ||
| if (r.best_cost < r.baseline_cost * (1.0f - tol)) { | ||
| sta::LibertyCell* prev = network_->libertyCell(r.inst); | ||
| if (subproblem_->applyReplacement(r.inst, r.best_cell)) { |
There was a problem hiding this comment.
Re-check DRC before applying stale Jacobi decisions
When a driver and one of its fanout loads are both resized in the same Jacobi sweep, each GateDecision was DRC-filtered against the pre-sweep snapshot, but this serial apply step installs the cell without revalidating against the live design after earlier replacements. A load upsized earlier in the loop can increase the driver's actual output load after the driver's snapshot passed checkOutputMaxCap/slew, so GLOBAL_SIZING can leave max-cap/max-slew violations that repair_timing does not check at finish.
Useful? React with 👍 / 👎.
| = r.best_is_downsize ? downsize_accept_tol : upsize_accept_tol; | ||
| if (r.best_cost < r.baseline_cost * (1.0f - tol)) { | ||
| sta::LibertyCell* prev = network_->libertyCell(r.inst); | ||
| if (subproblem_->applyReplacement(r.inst, r.best_cell)) { |
There was a problem hiding this comment.
Stop applying upsizes once max utilization is reached
For repair_timing -setup -max_utilization ... -phases GLOBAL_SIZING, this loop can keep accepting area-increasing replacements without any resizer_.overMaxArea() guard or rollback. Other setup policies stop when the area limit is hit, but here the phase may run past the requested utilization and then fail in OptimizationPolicy::finish() with RSZ-0025, even though a partial set of replacements could have respected the user limit.
Useful? React with 👍 / 👎.
|
The two significant issues in the public CI are cva6/asap7 crashed: and ibex/ihp fails in legalization To reproduce, use your branch with |
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
|
Don't most of the public ci designs meet timing? |
no |
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
|
@maliberty @jhkim-pii I think this is ready now. |
|
@codex review |
| lr_params_.include_clock_network = utl::readEnvarBool( | ||
| kIncludeClockNetworkEnv, lr_params_.include_clock_network); | ||
| lr_params_.setup_slack_margin = utl::readEnvarFloat( | ||
| kLrSetupSlackMarginEnv, lr_params_.setup_slack_margin); | ||
| lr_params_.max_iterations | ||
| = utl::readEnvarInt(kLrMaxIterationsEnv, lr_params_.max_iterations); | ||
| lr_params_.beta = utl::readEnvarFloat(kLrBetaEnv, lr_params_.beta); | ||
| lr_params_.mu_exponent | ||
| = utl::readEnvarFloat(kLrMuExponentEnv, lr_params_.mu_exponent); | ||
| lr_params_.lambda_floor | ||
| = utl::readEnvarFloat(kLrLambdaFloorEnv, lr_params_.lambda_floor); | ||
| lr_params_.timing_bias | ||
| = utl::readEnvarFloat(kLrTimingBiasEnv, lr_params_.timing_bias); | ||
| lr_params_.budget_safety_factor = utl::readEnvarFloat( | ||
| kLrBudgetSafetyFactorEnv, lr_params_.budget_safety_factor); | ||
| } |
There was a problem hiding this comment.
Passing controls through envars is fine for development testing but should be removed for merge. These should be promoted to command options.
There was a problem hiding this comment.
Perhaps add these to set_opt_config/reset_opt_config/report_opt_config?
There was a problem hiding this comment.
Or we may take this approach. Replacing the envars to constants and merge the PR. For development, revert back to envars locally.
There was a problem hiding this comment.
What's wrong with command args?
There was a problem hiding this comment.
I think command arguments are good, as long as there aren't too many of them.
There was a problem hiding this comment.
So long as they have reasonable defaults they are preferrable to hidden implicit ones. These seem like auto-tunable knobs.
There was a problem hiding this comment.
I will remove these and add Tcl command options.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c39faddb64
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (static_cast<size_t>(id) >= lambda_.size()) { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Rebuild LR state after replacement creates new edges
When a sweep replaces cells, this path explicitly skips any timing edge whose id was minted after allocate() sized lambda_; those same new edges are also ignored when snapshots sum output pressure, so subsequent LR iterations can treat already-resized gates as having no timing multiplier even if they are still on the critical path. In multi-iteration GLOBAL_SIZING runs that change Liberty timing arcs, this makes later sizing decisions use an incomplete objective rather than the current timing graph.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
This is a known performance issue. It won't crash but potentially ignore new arcs. I will fix this in the future.
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>
Signed-off-by: Eren Dogan <erendogan@google.com>


Summary
This PR introduces a new repair timing policy, Lagrangian relaxation-based global sizing.
Type of Change
Impact
The new global sizing policy is disabled by default now. Initial QoR experiments show promising results when enabled with
repair_timing -phases "GLOBAL_SIZING LEGACY LAST_GASP".Verification
./etc/Build.sh).