fix: pass question_date through to answer prompts (temporal questions ran undated) #46
Open
thameema wants to merge 1 commit into
… the real date
The benchmark loaders (e.g. LongMemEval) put question_date into
question.metadata.questionDate, and the answer phase reads
checkpoint.questions[id].questionDate to fill the {{questionDate}} /
'Question Date:' slot in every answer prompt. But the orchestrator's
initQuestion call never copied questionDate from the loaded question
into the checkpoint, so the value was always undefined and every
answer prompt rendered 'Question Date: Not specified'.
Temporal-reasoning questions ('what did I buy 10 days ago?') are
unanswerable without the question date, so temporal category scores
were understated for every provider measured on the harness.
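The fallback described above can be illustrated with a minimal sketch. The function name renderAnswerPrompt, the template text, and the example date are assumptions made for illustration, not the harness's actual code; only the {{questionDate}} slot and the 'Not specified' fallback come from the description.

```typescript
// Hypothetical stand-in for the harness's prompt templating (illustrative names only).
function renderAnswerPrompt(template: string, questionDate?: string): string {
  // When the checkpoint carries no date, the slot falls back to a placeholder.
  const value = questionDate ?? "Not specified";
  // A replacer function avoids any special handling of "$" in the value.
  return template.replace(/\{\{questionDate\}\}/g, () => value);
}

const promptTemplate =
  "Question Date: {{questionDate}}\nQuestion: What did I buy 10 days ago?";

// Before the fix: the checkpoint value is always undefined.
const undatedPrompt = renderAnswerPrompt(promptTemplate, undefined);

// After the fix: the loader's date reaches the prompt (example date is made up).
const datedPrompt = renderAnswerPrompt(promptTemplate, "2023-05-30");
```

With an undefined date the first line of the prompt reads 'Question Date: Not specified', which is what every provider saw before this change.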
058486c to 6dd52c6
The bug
Orchestrator.run() initializes each question's checkpoint entry here:
https://github.com/supermemoryai/memorybench/blob/main/src/orchestrator/index.ts#L241
but never copies questionDate from the loaded question into the checkpoint. Everything downstream is already wired for it:
- The LongMemEval loader sets metadata.questionDate from the dataset's question_date (src/benchmarks/longmemeval/index.ts:203)
- initQuestion accepts and stores metadata.questionDate (src/orchestrator/checkpoint.ts:205)
- The answer phase reads checkpoint.questions[id].questionDate and substitutes it into the {{questionDate}} / Question Date: slot of every provider's answer prompt (src/orchestrator/phases/answer.ts:120, src/prompts/defaults.ts:13)
Because the orchestrator drops it at the hand-off, the value is always undefined, and every answer prompt renders Question Date: Not specified, for every provider, on every run.
Impact
Temporal-reasoning questions like "What kitchen appliance did I buy 10 days ago?" are unanswerable without the question date: the model has the dated evidence in context but no anchor to resolve "10 days ago" against. So temporal-reasoning category scores are understated for all providers measured on the harness, and cross-provider comparisons on that category are mostly noise from the model guessing an anchor date.
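To make the anchor problem concrete, here is a small sketch of resolving "10 days ago" against two different anchor dates. The helper daysBefore and both dates are hypothetical and only illustrate the arithmetic; the harness itself leaves this resolution to the answering model.

```typescript
// Resolve "N days ago" against an anchor date (date-only, computed in UTC).
// Illustrative helper, not part of the harness.
function daysBefore(anchorIso: string, days: number): string {
  const anchor = new Date(anchorIso + "T00:00:00Z");
  // setUTCDate handles month and year rollover for out-of-range day values.
  anchor.setUTCDate(anchor.getUTCDate() - days);
  return anchor.toISOString().slice(0, 10);
}

// With the real question date as the anchor (made-up example date):
const resolvedWithRealAnchor = daysBefore("2023-05-30", 10); // "2023-05-20"

// With an anchor the model had to guess, the same phrase points elsewhere:
const resolvedWithGuessedAnchor = daysBefore("2023-06-15", 10); // "2023-06-05"
```

The retrieved evidence is identical in both cases; only the anchor differs, and the two resolved dates select different purchases.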
The fix
One line: pass q.metadata?.questionDate through to initQuestion, matching the signature it already accepts.
How we found it / measured effect
We hit this while evaluating our own memory system on the harness: temporal answers kept resolving relative dates against the wrong anchor despite correct retrieval. On a 20-question LongMemEval pilot (gpt-4o answering + judging), overall accuracy went 60% → 75% from this fix alone, with the gains concentrated in temporal-reasoning questions.
Found while evaluating memnos (github.com/thameema/memnos) on this harness.
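For readers without the diff at hand, the hand-off described under "The fix" can be sketched roughly as follows. Every type, field, and function here (LoadedQuestion, Checkpoint, initBefore, initAfter, and the shape of the initQuestion argument) is a simplified assumption for illustration; the real signatures live in src/orchestrator/checkpoint.ts and src/orchestrator/index.ts and may differ.

```typescript
// Simplified stand-ins for the harness's types; not its real API.
interface LoadedQuestion {
  questionId: string;
  question: string;
  metadata?: { questionDate?: string | undefined };
}

interface CheckpointQuestion {
  question: string;
  questionDate: string | undefined;
}

class Checkpoint {
  questions: Record<string, CheckpointQuestion> = {};

  // Assumed shape: accepts an optional questionDate and stores it as given.
  initQuestion(
    id: string,
    data: { question: string; questionDate?: string | undefined }
  ): void {
    this.questions[id] = {
      question: data.question,
      questionDate: data.questionDate,
    };
  }
}

// Before: the date is dropped at the hand-off, so the checkpoint holds undefined.
function initBefore(cp: Checkpoint, q: LoadedQuestion): void {
  cp.initQuestion(q.questionId, { question: q.question });
}

// After: the loaded question's date is passed through.
function initAfter(cp: Checkpoint, q: LoadedQuestion): void {
  cp.initQuestion(q.questionId, {
    question: q.question,
    questionDate: q.metadata?.questionDate,
  });
}
```

Optional chaining on q.metadata keeps the call safe for benchmarks whose loaders set no metadata; those questions simply keep an undefined date, as before.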