fix(rag): stop flooding the gateway on non-retryable model errors#3091
Conversation
When an embedding or semantic LLM call fails with a permanent error (invalid model name, auth failure, rate limit), indexing previously logged the error and kept going, issuing the same doomed request for every remaining file and chunk and flooding the AI gateway. Classify model call errors with modelerrors.ClassifyModelError and abort the whole indexing run (initial indexing, change re-index, and file-watcher re-index) on non-retryable failures. Transient errors (5xx, timeouts) keep the previous skip-and-continue behavior. Fixes #3082
A mis-configured reranking model (e.g. invalid model name) previously caused every agent query to issue a reranking request guaranteed to fail, since rerank errors were logged and silently retried on each query. Extract the duplicated rerank fallback logic into Manager.rerank and permanently disable the reranker for the lifetime of the manager after a non-retryable error. Transient errors (5xx, timeouts) and context cancellation keep the existing fallback-and-retry-next-query behavior. Fixes #3082
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
This PR correctly addresses the gateway flooding issue (#3082) by introducing an abort sentinel for non-retryable model errors during indexing and permanently disabling the reranker after non-retryable errors. The implementation is sound: the fmt.Errorf("%w: %w", ...) multi-error wrapping is valid in Go 1.20+, errors.Is traverses the chain correctly, and the atomic.Bool field for rerankDisabled is thread-safe. One minor gap was found in the watchLoop path (see inline comment).
| }) | ||
| if isIndexingAborted(err) { | ||
| slog.ErrorContext(ctx, "Stopping re-indexing due to non-retryable model error", "strategy", s.name, "error", err) | ||
| break |
There was a problem hiding this comment.
[LOW] watchLoop keeps running after abort — re-index will be retried on every subsequent filesystem event
The break on line 946 exits the inner for i, file := range filesToReindex loop inside the processChanges closure, stopping the current batch. However, watchLoop's outer for { select { ... } } loop continues running. On the next filesystem write/create event, processChanges is called again, attempts to re-index with the same broken model, hits the same non-retryable error, emits another error event, and breaks again.
This partially defeats the flood-prevention goal for the watcher path: the number of doomed requests is reduced from all files per event to one request per event, but remains unbounded over time. The Initialize and CheckAndReindexChangedFiles paths are correctly fixed (abort propagates to the caller), but the watcher path has no persistent "disabled" state equivalent to rerankDisabled.Store(true).
Suggested fix: introduce a persistent indexingDisabled atomic.Bool on VectorStore (similar to Manager.rerankDisabled) and check it at the top of processChanges (or at the start of watchLoop) after storing true on abort. Alternatively, cancel the watcher's context on abort.
Summary
Fixes #3082
A mis-configured model name in the RAG config (e.g.
claude-sonnet-4-7) caused floods of requests to the AI gateway even though every request was guaranteed to fail with a permanent error (404not_found_error). As @krissetto noted, we need to detect non retry-able errors and stop making reranking and embeddings requests for all the chunks.Verified reproduction: with 5 files and an invalid embedding model,
mainsends 5 doomed requests (one per file, error silently swallowed,Initializereports success); this branch sends exactly 1 and surfaces the error.Both request paths are fixed, reusing the existing
pkg/modelerrors.ClassifyModelErrorclassification (same logic the runtime uses for retry/fallback decisions):1. Indexing (embeddings + semantic LLM calls) — the per-chunk flood
Previously, indexing errors were logged and skipped, so every remaining file/chunk still issued its doomed request.
pkg/rag/strategy/indexing_errors.go(new):classifyModelCallErrormarks permanent model failures (4xx incl. 429, auth, invalid model) with an abort sentinel; transient errors (5xx, timeouts) and context cancellation are untouched.pkg/rag/strategy/vector_store.go: the abort propagates through all four indexing paths:Initialize: returning the error cancels the errgroup, stopping parallel file-indexing goroutines; an error event is emitted for the TUIbuildEmbeddingInputs: the semantic-embeddings LLM builder no longer silently falls back to raw chunk content on permanent errors (which previously made one failing LLM call per chunk)CheckAndReindexChangedFiles2. Reranking — the per-query flood
Rerank failures were logged and silently retried on every agent query, so a bad reranking model name produced a failing request per query, forever.
pkg/rag/manager.go: the duplicated rerank fallback logic is extracted intoManager.rerank, which permanently disables the reranker for the lifetime of the manager after a non-retryable error (with a clear error log pointing at the model configuration). Transient errors keep the existing fallback-and-retry behavior.Behavior matrix
Tests
All new tests fail against
main(they capture the bug) and pass with this fix:pkg/rag/strategy/vector_store_test.go: error classification table;Initializestops after 1 embedding request on 404 (vs one per file before); transient 500 still attempts every file; permanent semantic-LLM error aborts with 0 embedding requests; transient LLM error keeps the raw-content fallback.pkg/rag/manager_rerank_test.go: reranker called once then disabled on 404; not disabled on 5xx; not disabled on context cancellation.Validation
task buildtask test(only pre-existing environmental failure:TestLoadExamplesrequires Docker Model Runner)task lint— 0 issues