[Figure legend: Baseline, Decision, Mean conf vs context, KV cache, warmup]

incoming relative-pose edges · columns = source, rows = target

Note: the model assigns higher confidence to nearby (easier) frame pairs. When Frame 5 (tag 8) arrives after the distractors, its strongest edges connect back to Frames 1–3 — the model immediately recognizes the previously visited scene.
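As a rough illustration of how such a matrix can be read, the sketch below ranks the incoming edges of one target frame by confidence. The matrix values, the five-frame layout with a single distractor, and the helper name `strongest_incoming` are all hypothetical, chosen only to mirror the rows = target, columns = source convention. They are not outputs of the model, and the layout may not match the demo's actual sequence.

```python
# Toy confidence matrix: conf[target][source], i.e. rows = target, columns = source.
# All values are invented for illustration; they are NOT model outputs.
# Assumed layout: indices 0-2 = Frames 1-3 (same scene), index 3 = a distractor,
# index 4 = Frame 5 (revisits the first scene).
conf = [
    [0.0, 0.9, 0.8, 0.1, 0.6],
    [0.9, 0.0, 0.9, 0.1, 0.7],
    [0.8, 0.9, 0.0, 0.2, 0.8],
    [0.1, 0.1, 0.2, 0.0, 0.1],
    [0.6, 0.7, 0.8, 0.1, 0.0],
]


def strongest_incoming(conf, target, k=3):
    """Return the k source indices with the highest confidence into `target`."""
    row = conf[target]
    # Skip the diagonal: a frame has no relative-pose edge to itself.
    sources = [s for s in range(len(row)) if s != target]
    return sorted(sources, key=lambda s: row[s], reverse=True)[:k]


top = strongest_incoming(conf, target=4)
# Convert 0-based indices to 1-based frame numbers for display.
print("Strongest sources for Frame 5:", [s + 1 for s in top])
```

In this toy setup the top three sources for the last row are the three frames of the revisited scene, and the distractor ranks last, which is the pattern the note describes.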

Synchronized reconstruction