Skip to content

fix(rag): stop flooding the gateway on non-retryable model errors#3091

Merged
dgageot merged 2 commits into
mainfrom
fix/3082-abort-indexing-on-nonretryable-errors
Jun 12, 2026
Merged

fix(rag): stop flooding the gateway on non-retryable model errors#3091
dgageot merged 2 commits into
mainfrom
fix/3082-abort-indexing-on-nonretryable-errors

Conversation

@Sayt-0

@Sayt-0 Sayt-0 commented Jun 12, 2026

Copy link
Copy Markdown
Member

Summary

Fixes #3082

A mis-configured model name in the RAG config (e.g. claude-sonnet-4-7) caused floods of requests to the AI gateway even though every request was guaranteed to fail with a permanent error (404 not_found_error). As @krissetto noted, we need to detect non retry-able errors and stop making reranking and embeddings requests for all the chunks.

Verified reproduction: with 5 files and an invalid embedding model, main sends 5 doomed requests (one per file, error silently swallowed, Initialize reports success); this branch sends exactly 1 and surfaces the error.

Both request paths are fixed, reusing the existing pkg/modelerrors.ClassifyModelError classification (same logic the runtime uses for retry/fallback decisions):

1. Indexing (embeddings + semantic LLM calls) — the per-chunk flood

Previously, indexing errors were logged and skipped, so every remaining file/chunk still issued its doomed request.

  • pkg/rag/strategy/indexing_errors.go (new): classifyModelCallError marks permanent model failures (4xx incl. 429, auth, invalid model) with an abort sentinel; transient errors (5xx, timeouts) and context cancellation are untouched.
  • pkg/rag/strategy/vector_store.go: the abort propagates through all four indexing paths:
    • Initialize: returning the error cancels the errgroup, stopping parallel file-indexing goroutines; an error event is emitted for the TUI
    • buildEmbeddingInputs: the semantic-embeddings LLM builder no longer silently falls back to raw chunk content on permanent errors (which previously made one failing LLM call per chunk)
    • CheckAndReindexChangedFiles
    • file-watcher re-index loop

2. Reranking — the per-query flood

Rerank failures were logged and silently retried on every agent query, so a bad reranking model name produced a failing request per query, forever.

  • pkg/rag/manager.go: the duplicated rerank fallback logic is extracted into Manager.rerank, which permanently disables the reranker for the lifetime of the manager after a non-retryable error (with a clear error log pointing at the model configuration). Transient errors keep the existing fallback-and-retry behavior.

Behavior matrix

Error Indexing Reranking
404 / 401 / 400 (invalid model, auth) abort run immediately disable reranker, keep original results
429 (rate limit) abort run (don't hammer) disable reranker
5xx / timeouts (transient) unchanged: skip file, continue unchanged: fallback, retry next query
context cancellation bubble up unchanged, reranker stays enabled

Tests

All new tests fail against main (they capture the bug) and pass with this fix:

  • pkg/rag/strategy/vector_store_test.go: error classification table; Initialize stops after 1 embedding request on 404 (vs one per file before); transient 500 still attempts every file; permanent semantic-LLM error aborts with 0 embedding requests; transient LLM error keeps the raw-content fallback.
  • pkg/rag/manager_rerank_test.go: reranker called once then disabled on 404; not disabled on 5xx; not disabled on context cancellation.

Validation

  • task build
  • task test (only pre-existing environmental failure: TestLoadExamples requires Docker Model Runner)
  • task lint — 0 issues

Sayt-0 added 2 commits June 12, 2026 16:35
When an embedding or semantic LLM call fails with a permanent error
(invalid model name, auth failure, rate limit), indexing previously
logged the error and kept going, issuing the same doomed request for
every remaining file and chunk and flooding the AI gateway.

Classify model call errors with modelerrors.ClassifyModelError and
abort the whole indexing run (initial indexing, change re-index, and
file-watcher re-index) on non-retryable failures. Transient errors
(5xx, timeouts) keep the previous skip-and-continue behavior.

Fixes #3082
A mis-configured reranking model (e.g. invalid model name) previously
caused every agent query to issue a reranking request guaranteed to
fail, since rerank errors were logged and silently retried on each
query.

Extract the duplicated rerank fallback logic into Manager.rerank and
permanently disable the reranker for the lifetime of the manager after
a non-retryable error. Transient errors (5xx, timeouts) and context
cancellation keep the existing fallback-and-retry-next-query behavior.

Fixes #3082
@Sayt-0 Sayt-0 requested a review from a team as a code owner June 12, 2026 14:47

@docker-agent docker-agent left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

This PR correctly addresses the gateway flooding issue (#3082) by introducing an abort sentinel for non-retryable model errors during indexing and permanently disabling the reranker after non-retryable errors. The implementation is sound: the fmt.Errorf("%w: %w", ...) multi-error wrapping is valid in Go 1.20+, errors.Is traverses the chain correctly, and the atomic.Bool field for rerankDisabled is thread-safe. One minor gap was found in the watchLoop path (see inline comment).

})
if isIndexingAborted(err) {
slog.ErrorContext(ctx, "Stopping re-indexing due to non-retryable model error", "strategy", s.name, "error", err)
break

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] watchLoop keeps running after abort — re-index will be retried on every subsequent filesystem event

The break on line 946 exits the inner for i, file := range filesToReindex loop inside the processChanges closure, stopping the current batch. However, watchLoop's outer for { select { ... } } loop continues running. On the next filesystem write/create event, processChanges is called again, attempts to re-index with the same broken model, hits the same non-retryable error, emits another error event, and breaks again.

This partially defeats the flood-prevention goal for the watcher path: the number of doomed requests is reduced from all files per event to one request per event, but remains unbounded over time. The Initialize and CheckAndReindexChangedFiles paths are correctly fixed (abort propagates to the caller), but the watcher path has no persistent "disabled" state equivalent to rerankDisabled.Store(true).

Suggested fix: introduce a persistent indexingDisabled atomic.Bool on VectorStore (similar to Manager.rerankDisabled) and check it at the top of processChanges (or at the start of watchLoop) after storing true on abort. Alternatively, cancel the watcher's context on abort.

@aheritier aheritier added area/rag For work/issues that have to do with the RAG features kind/fix PR fixes a bug (maps to fix: commit prefix) labels Jun 12, 2026
@dgageot dgageot merged commit ffb5187 into main Jun 12, 2026
13 checks passed
@dgageot dgageot deleted the fix/3082-abort-indexing-on-nonretryable-errors branch June 12, 2026 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/rag For work/issues that have to do with the RAG features kind/fix PR fixes a bug (maps to fix: commit prefix)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Mis-configured RAG reranking model name floods the AI gateway during re-index

4 participants