feat(kubernetes): support HA gateway rebalancing#1868

Open
TaylorMutch wants to merge 4 commits into main from 1021-ha-gateway-rebalancing/tm

@TaylorMutch TaylorMutch commented Jun 11, 2026

Collaborator

Summary

Adds HA gateway rebalancing support for Kubernetes deployments so client and supervisor traffic can survive gateway replica scale-up, scale-down, and pod rotation.

This PR targets main directly and currently includes the reconciler lease commit from #1577. Review focus should be ad9f04d7 unless #1577 lands first.

Related Issue

Closes #1021

Related: #1012, #1429, #1577, #1731, #1488

Changes

  • Adds gateway peer authentication and peer routing for HA supervisor relay handoff.
  • Adds Kubernetes compute lease/reconciler ownership behavior for multi-replica gateways.
  • Adds Helm peer Service/RBAC rendering and Skaffold HA/Envoy dev profile support.
  • Adds Kubernetes HA rebalancing e2e coverage and removes the noisy readyz e2e smoke.
  • Updates architecture and local cluster/debug skills for HA gateway development.
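
For readers unfamiliar with lease-based ownership among replicas, the following is a minimal, generic sketch of the idea and not the implementation in this PR. The `LeaseStore` class, the `compute/sandbox-a` key, the replica names, and the TTL are all hypothetical, and an in-memory dictionary stands in for whatever shared store the gateways actually use.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    holder: str        # replica identity, e.g. a pod name (hypothetical)
    expires_at: float  # time after which another replica may take over


class LeaseStore:
    """In-memory stand-in for a shared lease store (illustrative only)."""

    def __init__(self) -> None:
        self._leases: Dict[str, Lease] = {}

    def try_acquire(self, key: str, replica: str, now: float, ttl: float) -> bool:
        """Acquire or renew the lease for `key`; True if `replica` now owns it."""
        lease = self._leases.get(key)
        if lease is None or lease.holder == replica or lease.expires_at <= now:
            self._leases[key] = Lease(holder=replica, expires_at=now + ttl)
            return True
        return False

    def holder(self, key: str, now: float) -> Optional[str]:
        """Return the current owner of `key`, or None if unowned or expired."""
        lease = self._leases.get(key)
        if lease is None or lease.expires_at <= now:
            return None
        return lease.holder


store = LeaseStore()
# gateway-0 takes ownership of reconciling one resource.
first = store.try_acquire("compute/sandbox-a", "gateway-0", now=0.0, ttl=15.0)
# gateway-1 is refused while the lease is still live.
contested = store.try_acquire("compute/sandbox-a", "gateway-1", now=5.0, ttl=15.0)
# If gateway-0 is rotated and stops renewing, gateway-1 takes over after expiry.
takeover = store.try_acquire("compute/sandbox-a", "gateway-1", now=20.0, ttl=15.0)
```

In this sketch only the lease holder would run the reconciler for a given key, and ownership moves to another replica once renewals stop, which is the property that matters during scale-down and pod rotation.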

Testing

  • mise run pre-commit passes
  • Local Kubernetes HA validation with Envoy Gateway, external PostgreSQL, and gateway scale/rotation was exercised during development
  • GitHub test:e2e-kubernetes label should run the Kubernetes HA E2E job

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

@TaylorMutch TaylorMutch requested review from a team, derekwaynecarr and mrunalp as code owners June 11, 2026 04:47
@TaylorMutch TaylorMutch added the test:e2e-kubernetes label (Requires Kubernetes end-to-end coverage) Jun 11, 2026
@github-actions

Label test:e2e-kubernetes applied for ad9f04d. Open the existing run and click Re-run all jobs to execute with the label set. The run will execute Kubernetes HA E2E after building the required gateway and supervisor images once. This is an optional proof-of-life suite; failures are visible in the workflow run but do not publish a required CI gate status.

Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
@TaylorMutch TaylorMutch force-pushed the 1021-ha-gateway-rebalancing/tm branch from 24c1003 to 3e590e6 Compare June 11, 2026 17:35
Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
      - op: add
        path: /deploy/helm/releases/0/valuesFiles/-
        value: ci/values-high-availability.yaml
  - name: ha-envoy

Member

When I first read the docs, it was not clear that ha-envoy includes the high-availability profile. Could we call this out explicitly (perhaps by renaming the profile), or make the two profiles composable?
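
As background for the composability question: Skaffold profile patches are JSON Patch (RFC 6902) operations, and an `add` whose path ends in `/-` appends to the target array. Two profiles that each append a values file could therefore, in principle, be stacked. The Python sketch below only illustrates that append behavior. The `apply_add` helper, the release name `gateway`, and the second values file `ci/values-envoy.yaml` are hypothetical, and the helper skips the `~0`/`~1` escaping that a full JSON Pointer implementation would handle.

```python
import copy
from typing import Any, Dict


def apply_add(doc: Dict[str, Any], path: str, value: Any) -> Dict[str, Any]:
    """Apply one RFC 6902 `add` operation and return a patched copy of `doc`."""
    result = copy.deepcopy(doc)
    *parents, last = path.lstrip("/").split("/")
    node: Any = result
    for token in parents:
        # List segments are addressed by index, mapping segments by key.
        node = node[int(token)] if isinstance(node, list) else node[token]
    if isinstance(node, list):
        if last == "-":
            node.append(value)          # "-" means "append to the end"
        else:
            node.insert(int(last), value)
    else:
        node[last] = value
    return result


# Hypothetical base config with one Helm release and no extra values files.
base = {"deploy": {"helm": {"releases": [{"name": "gateway", "valuesFiles": []}]}}}
path = "/deploy/helm/releases/0/valuesFiles/-"

# The patch shown in the diff, followed by a hypothetical Envoy-only patch.
ha = apply_add(base, path, "ci/values-high-availability.yaml")
both = apply_add(ha, path, "ci/values-envoy.yaml")
values_files = both["deploy"]["helm"]["releases"][0]["valuesFiles"]
```

If the profiles were split this way, they could be activated together with Skaffold's comma-separated profile flag (for example `-p high-availability,ha-envoy`), assuming the first profile is in fact named `high-availability`.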

Labels

test:e2e-kubernetes Requires Kubernetes end-to-end coverage

Development

Successfully merging this pull request may close these issues.

feat(k8s, helm): Enable running OpenShell Gateway with multiple replicas

2 participants