fix(rag): stop flooding the gateway on non-retryable model errors by Sayt-0 · Pull Request #3091 · docker/docker-agent

Sayt-0 · 2026-06-12T14:47:10Z

Summary

A mis-configured model name in the RAG config (e.g. claude-sonnet-4-7) caused floods of requests to the AI gateway even though every request was guaranteed to fail with a permanent error (404 not_found_error). As @krissetto noted, we need to detect non retry-able errors and stop making reranking and embeddings requests for all the chunks.

Verified reproduction: with 5 files and an invalid embedding model, main sends 5 doomed requests (one per file, error silently swallowed, Initialize reports success); this branch sends exactly 1 and surfaces the error.

Both request paths are fixed, reusing the existing pkg/modelerrors.ClassifyModelError classification (same logic the runtime uses for retry/fallback decisions):

1. Indexing (embeddings + semantic LLM calls) — the per-chunk flood

Previously, indexing errors were logged and skipped, so every remaining file/chunk still issued its doomed request.

pkg/rag/strategy/indexing_errors.go (new): classifyModelCallError marks permanent model failures (4xx incl. 429, auth, invalid model) with an abort sentinel; transient errors (5xx, timeouts) and context cancellation are untouched.
pkg/rag/strategy/vector_store.go: the abort propagates through all four indexing paths:
- Initialize: returning the error cancels the errgroup, stopping parallel file-indexing goroutines; an error event is emitted for the TUI
- buildEmbeddingInputs: the semantic-embeddings LLM builder no longer silently falls back to raw chunk content on permanent errors (which previously made one failing LLM call per chunk)
- CheckAndReindexChangedFiles
- file-watcher re-index loop

2. Reranking — the per-query flood

Rerank failures were logged and silently retried on every agent query, so a bad reranking model name produced a failing request per query, forever.

pkg/rag/manager.go: the duplicated rerank fallback logic is extracted into Manager.rerank, which permanently disables the reranker for the lifetime of the manager after a non-retryable error (with a clear error log pointing at the model configuration). Transient errors keep the existing fallback-and-retry behavior.

Behavior matrix

Error	Indexing	Reranking
404 / 401 / 400 (invalid model, auth)	abort run immediately	disable reranker, keep original results
429 (rate limit)	abort run (don't hammer)	disable reranker
5xx / timeouts (transient)	unchanged: skip file, continue	unchanged: fallback, retry next query
context cancellation	bubble up	unchanged, reranker stays enabled

Tests

All new tests fail against main (they capture the bug) and pass with this fix:

pkg/rag/strategy/vector_store_test.go: error classification table; Initialize stops after 1 embedding request on 404 (vs one per file before); transient 500 still attempts every file; permanent semantic-LLM error aborts with 0 embedding requests; transient LLM error keeps the raw-content fallback.
pkg/rag/manager_rerank_test.go: reranker called once then disabled on 404; not disabled on 5xx; not disabled on context cancellation.

Validation

task build
task test (only pre-existing environmental failure: TestLoadExamples requires Docker Model Runner)
task lint — 0 issues

When an embedding or semantic LLM call fails with a permanent error (invalid model name, auth failure, rate limit), indexing previously logged the error and kept going, issuing the same doomed request for every remaining file and chunk and flooding the AI gateway. Classify model call errors with modelerrors.ClassifyModelError and abort the whole indexing run (initial indexing, change re-index, and file-watcher re-index) on non-retryable failures. Transient errors (5xx, timeouts) keep the previous skip-and-continue behavior. Fixes #3082

A mis-configured reranking model (e.g. invalid model name) previously caused every agent query to issue a reranking request guaranteed to fail, since rerank errors were logged and silently retried on each query. Extract the duplicated rerank fallback logic into Manager.rerank and permanently disable the reranker for the lifetime of the manager after a non-retryable error. Transient errors (5xx, timeouts) and context cancellation keep the existing fallback-and-retry-next-query behavior. Fixes #3082

docker-agent

Assessment: 🟢 APPROVE

This PR correctly addresses the gateway flooding issue (#3082) by introducing an abort sentinel for non-retryable model errors during indexing and permanently disabling the reranker after non-retryable errors. The implementation is sound: the fmt.Errorf("%w: %w", ...) multi-error wrapping is valid in Go 1.20+, errors.Is traverses the chain correctly, and the atomic.Bool field for rerankDisabled is thread-safe. One minor gap was found in the watchLoop path (see inline comment).

docker-agent · 2026-06-12T14:53:55Z

 					})
+					if isIndexingAborted(err) {
+						slog.ErrorContext(ctx, "Stopping re-indexing due to non-retryable model error", "strategy", s.name, "error", err)
+						break


[LOW] watchLoop keeps running after abort — re-index will be retried on every subsequent filesystem event

The break on line 946 exits the inner for i, file := range filesToReindex loop inside the processChanges closure, stopping the current batch. However, watchLoop's outer for { select { ... } } loop continues running. On the next filesystem write/create event, processChanges is called again, attempts to re-index with the same broken model, hits the same non-retryable error, emits another error event, and breaks again.

This partially defeats the flood-prevention goal for the watcher path: the number of doomed requests is reduced from all files per event to one request per event, but remains unbounded over time. The Initialize and CheckAndReindexChangedFiles paths are correctly fixed (abort propagates to the caller), but the watcher path has no persistent "disabled" state equivalent to rerankDisabled.Store(true).

Suggested fix: introduce a persistent indexingDisabled atomic.Bool on VectorStore (similar to Manager.rerankDisabled) and check it at the top of processChanges (or at the start of watchLoop) after storing true on abort. Alternatively, cancel the watcher's context on abort.

Sayt-0 added 2 commits June 12, 2026 16:35

Sayt-0 requested a review from a team as a code owner June 12, 2026 14:47

docker-agent reviewed Jun 12, 2026

View reviewed changes

aheritier added area/rag For work/issues that have to do with the RAG features kind/fix PR fixes a bug (maps to fix: commit prefix) labels Jun 12, 2026

dgageot approved these changes Jun 12, 2026

View reviewed changes

dgageot merged commit ffb5187 into main Jun 12, 2026
13 checks passed

dgageot deleted the fix/3082-abort-indexing-on-nonretryable-errors branch June 12, 2026 15:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(rag): stop flooding the gateway on non-retryable model errors#3091

fix(rag): stop flooding the gateway on non-retryable model errors#3091
dgageot merged 2 commits into
mainfrom
fix/3082-abort-indexing-on-nonretryable-errors

Sayt-0 commented Jun 12, 2026

Uh oh!

docker-agent left a comment

Uh oh!

docker-agent Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Sayt-0 commented Jun 12, 2026

Summary

1. Indexing (embeddings + semantic LLM calls) — the per-chunk flood

2. Reranking — the per-query flood

Behavior matrix

Tests

Validation

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🟢 APPROVE

Uh oh!

docker-agent Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants