Skip to content

fix: pass question_date through to answer prompts (temporal questions ran undated)#46

Open
thameema wants to merge 1 commit into
supermemoryai:mainfrom
thameema:fix/question-date-plumbing
Open

fix: pass question_date through to answer prompts (temporal questions ran undated)#46
thameema wants to merge 1 commit into
supermemoryai:mainfrom
thameema:fix/question-date-plumbing

Conversation

@thameema

@thameema thameema commented Jun 11, 2026

Copy link
Copy Markdown

The bug

Orchestrator.run() initializes each question's checkpoint entry here:

https://github.com/supermemoryai/memorybench/blob/main/src/orchestrator/index.ts#L241

but never copies questionDate from the loaded question into the checkpoint. Everything downstream is already wired for it:

  • the LongMemEval loader sets metadata.questionDate from the dataset's question_date (src/benchmarks/longmemeval/index.ts:203)
  • initQuestion accepts and stores metadata.questionDate (src/orchestrator/checkpoint.ts:205)
  • the answer phase reads checkpoint.questions[id].questionDate and substitutes it into the {{questionDate}} / Question Date: slot of every provider's answer prompt (src/orchestrator/phases/answer.ts:120, src/prompts/defaults.ts:13)

Because the orchestrator drops it at the hand-off, the value is always undefined, and every answer prompt renders Question Date: Not specified — for every provider, on every run.

Impact

Temporal-reasoning questions like "What kitchen appliance did I buy 10 days ago?" are unanswerable without the question date: the model has the dated evidence in context but no anchor to resolve "10 days ago" against. So temporal-reasoning category scores are understated for all providers measured on the harness, and cross-provider comparisons on that category are mostly noise from the model guessing an anchor date.

The fix

One line: pass q.metadata?.questionDate through to initQuestion, matching the signature it already accepts.

How we found it / measured effect

We hit this while evaluating our own memory system on the harness: temporal answers kept resolving relative dates against the wrong anchor despite correct retrieval. On a 20-question LongMemEval pilot (gpt-4o answering + judging), overall accuracy went 60% → 75% from this fix alone, with the gains concentrated in temporal-reasoning questions.

Found while evaluating memnos (github.com/thameema/memnos) on this harness.

… the real date

The benchmark loaders (e.g. LongMemEval) put question_date into
question.metadata.questionDate, and the answer phase reads
checkpoint.questions[id].questionDate to fill the {{questionDate}} /
'Question Date:' slot in every answer prompt. But the orchestrator's
initQuestion call never copied questionDate from the loaded question
into the checkpoint, so the value was always undefined and every
answer prompt rendered 'Question Date: Not specified'.

Temporal-reasoning questions ('what did I buy 10 days ago?') are
unanswerable without the question date, so temporal category scores
were understated for every provider measured on the harness.
@thameema thameema force-pushed the fix/question-date-plumbing branch from 058486c to 6dd52c6 Compare June 11, 2026 03:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant