Add Configuration Maximums page for Kubermatic Virtualization #2190
Open
mihiragrawal wants to merge 20 commits into
Document the validated configuration maximums of a KubeV cluster as discovered by the ConfigMax benchmark tool. Includes:
- a customer-facing "Validated maximums" table (accepted vs sustained),
- an internal engineering reference with the vSphere ConfigMax comparison and full measurement provenance (marked for review/trim),
- a "How we measure" section covering discovery / target / workload-SLO modes and the distress signals that stop a run,
- a per-capability bottleneck summary,
- a "Run it yourself" guide with an annotated ConfigMaxRun YAML and the key-parameters reference.
Lands in content/kubermatic-virtualization/main (next release).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Details: Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
… methodology
Replace the accepted/sustained model with the programmed + verified-functional ceiling dataset (June 2026 runs), and split the page into a public part (marketing table, technical reference, run-it-yourself incl. CLI) and an internal engineering reference (methodology, distress probes, per-test method cards, tuning baseline, bottleneck/caveats registers, journey, glossary) behind a review divider.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Marketing and technical tables now lead with the actual resource (VPCs, Subnets, NetworkPolicies, SecurityGroups, Services); each description opens with the platform-neutral concept line for readers coming from other virtualization stacks.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The "What it means" column now carries only the bolded platform-neutral concept for each row; the details live in the technical reference table.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Run dates and durations remain in the per-test method cards.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Replace the qualitative 4x description with the per-run p99 numbers from the 2026-05-08 validation runs (cross-host p99 rising from 1.6 ms to 6.9 ms at the cliff, then a 3-8 ms sustained band; reproduced in the second run).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
State the full measured curve: 1.5-1.75 ms cross-host p99 through 70 tenants, 6.9 ms at 80, and a 3-8 ms band from 90 to 120 tenants, together with the cliff points of all three validation runs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Baseline, steady range, cliff, and degraded band each get a row with tenant count and measured cross-host p99, replacing the single dense sentence.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
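The commits above quote a cross-host p99 per tenant batch. As a rough illustration of how such a figure can be derived from raw RTT samples, here is a minimal nearest-rank p99 sketch. The percentile method ConfigMax actually uses is not stated in this PR, so nearest-rank is an assumption, and `p99_by_batch` is a hypothetical helper.

```python
from typing import Mapping, Sequence


def p99(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of latency samples (milliseconds).

    Assumption: nearest-rank method; the real tool may interpolate instead.
    """
    if not samples_ms:
        raise ValueError("p99 of an empty sample set is undefined")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # ceil(0.99 * n) computed in integer arithmetic; this is a 1-based rank.
    rank = (99 * n + 99) // 100
    return ordered[rank - 1]


def p99_by_batch(batches: Mapping[int, Sequence[float]]) -> dict[int, float]:
    """Hypothetical helper: map tenant count -> p99 of that batch's samples."""
    return {tenants: p99(samples) for tenants, samples in sorted(batches.items())}
```

Computing the rank with integers avoids floating-point rounding surprises at exact multiples of 100 samples.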
Values come from the v8a per-batch trace; the same-host baseline is explicitly marked as not recorded (only cross-host latency was captured in that run). Also notes that the same-host cliff arrives one batch after the cross-host one.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Drop the tracked-connections row (a Linux kernel parameter, not a product capability; its data stays in the internal method cards); label the latency row as the idle-cluster floor; move the ~80-active-tenants degradation result out of the capacity table, with a footnote pointing to the degradation section.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The re-run with an independent apiserver-VIP bystander probe reached the full 120,000 cap cleanly: 120,001 policies / 355,469 ACLs settled, with zero probe failures in ~4 h. The earlier stop at 25,101 is confirmed as a one-off harness-pod network loss, not a data-plane wall. Updates the marketing and technical rows, the method card, and the bottleneck and caveats registers, and adds the journey entry.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
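Telling a harness-pod network loss apart from a data-plane wall relies on a second, independent vantage point. A minimal sketch of that attribution, assuming two boolean probe results per check; `classify_probe_failure` and `StopCause` are hypothetical names, not part of ConfigMax, and the decision table is one plausible reading of the commit above.

```python
from enum import Enum


class StopCause(Enum):
    HEALTHY = "healthy"
    HARNESS_NETWORK_LOSS = "harness-network-loss"
    DATA_PLANE_WALL = "data-plane-wall"


def classify_probe_failure(harness_ok: bool, bystander_ok: bool) -> StopCause:
    """Attribute a probe failure using an independent bystander probe.

    harness_ok:   probe result as seen from the benchmark harness pod
    bystander_ok: independent probe against the apiserver VIP
    (Both inputs and this decision table are illustrative assumptions.)
    """
    if not bystander_ok:
        # The independent path fails too: treat it as a real data-plane limit.
        return StopCause.DATA_PLANE_WALL
    if not harness_ok:
        # Only the harness pod lost connectivity; the cluster path still works,
        # so this should not be recorded as a ceiling.
        return StopCause.HARNESS_NETWORK_LOSS
    return StopCause.HEALTHY
```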
The page was marked chapter = true, which the theme styles with enlarged chapter-intro typography, so the whole page rendered with a larger font than sibling docs. Drop the chapter flag and the manual H1 (the theme renders the title) so the page matches the other content pages and gains the standard in-page TOC.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Restructure the single long page into a section with sub-pages, matching the Architecture section's layout:

  configuration-maximums/          overview, test environment, reading guide
    validated-maximums/            headline table + technical reference
    degradation/                   tenant-scaling degradation result
    running-configmax/             prerequisites
      operator/                    ConfigMaxRun walkthrough, parameters, profiles
      cli/                         standalone binary
    engineering-reference/         internal-review warning + map
      ceiling-methodology/         definition, run loop, distress probes
      degradation-methodology/
      method-cards/                per-test method cards
      tuning-and-findings/         tuning baseline, bottleneck/caveats registers
      glossary/

Content is moved verbatim; only cross-references were changed, from in-page anchors to ref shortcode links between the new pages. Refs use ./ and ../ relative paths because the theme's ref shortcode resolves bare names site-wide and errors on ambiguous section names (e.g. "cli").
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
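To make the ambiguity concrete, here is a deliberately simplified model of the lookup described in the commit above: relative refs resolve from the current section, while bare names match site-wide on the last path element. This is not Hugo's actual `ref` implementation, and the second `cli` section in `SITE` is hypothetical.

```python
import posixpath

# Hypothetical site: section paths under content/. The last entry is an
# invented second "cli" section, standing in for any same-named section
# elsewhere in the site.
SITE = {
    "kubermatic-virtualization/main/configuration-maximums/running-configmax",
    "kubermatic-virtualization/main/configuration-maximums/running-configmax/operator",
    "kubermatic-virtualization/main/configuration-maximums/running-configmax/cli",
    "other-product/main/cli",
}


def resolve_ref(current: str, ref: str, site: set[str]) -> str:
    """Simplified model of ref lookup (not Hugo's real implementation)."""
    if ref.startswith(("./", "../")):
        # Relative refs resolve from the current section's path.
        target = posixpath.normpath(posixpath.join(current, ref))
        if target not in site:
            raise LookupError(f"{ref!r} from {current!r}: no such page")
        return target
    # Bare names are matched site-wide on the last path element.
    matches = sorted(p for p in site if posixpath.basename(p) == ref)
    if len(matches) != 1:
        raise LookupError(f"{ref!r} is ambiguous or missing: {matches}")
    return matches[0]
```

In this model a bare `"cli"` matches two sections and fails, while `"./cli"` from `running-configmax` or `"../cli"` from `operator` resolves to exactly one page.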
Per review feedback: the technical reference table now carries the data each run actually collected (run date, duration, and peak component readings against their danger lines: etcd database fill, control-plane database memory, rule-compiler CPU, host memory, programming pace, probe results), so a technical reader can see what the cluster was doing at the moment each ceiling was recorded. Adds the shared danger-line legend and a pointer to the per-test method cards for the full named-component data.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
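As an illustration of reading peak values against danger lines, the sketch below reports each component's peak as a fraction of its threshold, worst first. The `Reading` type and `headroom_report` are hypothetical, and no real threshold values are implied; the actual danger lines are the ones in the shared legend on the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    component: str      # e.g. "etcd database fill" (label only)
    peak: float         # peak value observed during the run
    danger_line: float  # threshold in the same unit as `peak` (hypothetical)

    @property
    def fill(self) -> float:
        """Peak as a fraction of the danger line (1.0 means at the line)."""
        return self.peak / self.danger_line


def headroom_report(readings: list[Reading]) -> list[str]:
    """Format readings worst-first, flagging any at or over their danger line."""
    lines = []
    for r in sorted(readings, key=lambda r: r.fill, reverse=True):
        status = "OVER" if r.fill >= 1.0 else "ok"
        lines.append(f"{r.component}: {r.fill:.0%} of danger line ({status})")
    return lines
```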
…ference
Per review: the technical reference table's audience is technical readers, so the measured-data column now names the actual components (ovn-central, ovn-northd, ovs-ovn, kube-ovn-controller, ACLs, OVN LB VIPs) instead of platform-neutral paraphrases. Add the latency observations each run captured (gateway-ping RTT at 11.8k subnets, VPC provisioning latency) and a pod-to-pod latency row with the full idle-cluster measurement, including the 14k-orphan-subnet contrast that motivates the clean-cluster rule. VM-to-VM latency under tenant load stays on the Degradation page, referenced below the table.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…lidraw diagram
The two probes previously listed as missing (per-node ovs-ovn memory, kube-ovn-controller restart/crash) are now implemented and active in every run. The probe table gains both rows, and the honest-list entry records how they were validated: a forced-trip run plus a full 10k-VPC ceiling re-run with zero false trips and an unchanged settled count. The ASCII run-loop block is replaced by the Excalidraw render, per the diagram rule.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
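The run loop referred to here (keep adding load until the cap or until a distress probe trips, counting only objects that were programmed and verified functional) can be sketched as follows. The callables, batch semantics, and stop-reason strings are assumptions for illustration and do not describe ConfigMax's real interface.

```python
from typing import Callable, Mapping

# A probe returns True while the component it watches is healthy.
Probe = Callable[[], bool]


def find_ceiling(
    apply_and_verify: Callable[[int, int], bool],
    probes: Mapping[str, Probe],
    batch_size: int,
    cap: int,
) -> tuple[int, str]:
    """Return (settled count, stop reason) for a simplified ceiling run.

    apply_and_verify(start, n) creates objects start..start+n-1 and returns
    True only if all of them were programmed and verified functional.
    A batch that fails verification, or after which any probe trips, is not
    counted, so the result is the last count that settled cleanly.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    settled = 0
    while settled < cap:
        n = min(batch_size, cap - settled)
        if not apply_and_verify(settled, n):
            return settled, "batch failed functional verification"
        tripped = sorted(name for name, healthy in probes.items() if not healthy())
        if tripped:
            return settled, "distress: " + ", ".join(tripped)
        settled += n
    return settled, "cap reached"
```

Discarding the batch during which a probe tripped is a conservative choice: the recorded ceiling is always a count the cluster held without distress.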
Readers need the current behavior, not the chronology of past mistakes and their fix dates. Probe history becomes present-tense design notes; method-card and tuning caveats keep the load-bearing facts (requirements, scaling rationale, safety warnings) and drop the earlier-run stories. Measurement provenance (run dates on published numbers) stays.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The 2026-06-12 D1 run supersedes the ~80±10-tenant cliff: it reached 120 tenants / 600 VMs (the run cap) with flat VM-to-VM p99 (406-488 µs vs. a 430 µs baseline), with all 24 boundaries measured. The old cliff was traced to memory-starved per-node networking agents plus non-enforcing policies; it is kept as a superseded section together with the root cause. The methodology page now documents the dual stop rule (2 ms floor AND 4x own baseline), the probe-lost abort, and the baseline completeness guards; the method card, validated-maximums notice, and journey table are updated to match.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
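The dual stop rule, probe-lost abort, and baseline completeness guard named here combine into a per-step decision. The sketch below is one reading of that description: whether the comparisons are inclusive or strict, and how boundaries are named, are assumptions, and `evaluate_step` is a hypothetical function rather than the tool's API.

```python
from typing import Mapping, Optional

FLOOR_MS = 2.0         # never stop while p99 is below this absolute floor
BASELINE_FACTOR = 4.0  # and only when p99 is at least 4x the boundary's own baseline


def evaluate_step(
    expected: set[str],
    baselines_ms: Mapping[str, float],
    current_p99_ms: Mapping[str, Optional[float]],
) -> str:
    """Decide whether a tenant-scaling step continues, stops, or aborts."""
    # Baseline completeness guard: every boundary needs its own baseline.
    missing = sorted(expected - set(baselines_ms))
    if missing:
        return "invalid: missing baseline for " + ", ".join(missing)
    breached = []
    for boundary in sorted(expected):
        p99 = current_p99_ms.get(boundary)
        if p99 is None:
            # Probe-lost abort: a missing measurement is never treated as a pass.
            return "abort: probe lost on " + boundary
        # Dual stop rule: both conditions must hold for this boundary.
        if p99 >= FLOOR_MS and p99 >= BASELINE_FACTOR * baselines_ms[boundary]:
            breached.append(boundary)
    return "stop: " + ", ".join(breached) if breached else "continue"
```

With readings in the 406-488 µs range against a 430 µs baseline, every boundary stays well below the 2 ms floor, so the step continues; the floor also keeps a small absolute rise on a very low baseline from counting as a cliff.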
Readers need the current result, not the story of how it was reached: drops the superseded-cliff section, validation dates in prose, and roadmap chatter; keeps the sizing lessons as plain guidance.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Internal cluster name removed from the published pages.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
What
New documentation page: Configuration Maximums for Kubermatic Virtualization, under content/kubermatic-virtualization/main/configuration-maximums/. It documents the validated configuration maximums of a KubeV cluster (VMs, networks, firewall rules, subnets, routes, etc.) as discovered by the in-cluster ConfigMax benchmark tool.
Lands in main (next release); v1.1.0 is frozen.
What's in it
ConfigMaxRun YAML, apply/watch/read commands, and a key-parameters reference.
For the reviewer (@Moath: decisions needed)
Verification
Built locally with the project's Dockerized Hugo (quay.io/kubermatic/hugo:0.159.1-0): exit 0, no errors. The page renders and appears in the main section nav.
🤖 Generated with Claude Code