From the meeting on 14 May 2025: a literature review of dialogue benchmarks that focus on long-term memory.
Link to the full survey reading note: Reading - (Survey) Rethinking Memory in AI
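
Columns: # Sess = number of sessions, # Q = number of questions, Context Depth is in tokens. Core memory abilities: IE = information extraction, MR = multi-session reasoning, KU = knowledge updates, TR = temporal reasoning, ABS = abstention.

| Benchmark | Domain | # Sess | # Q | Context Depth | IE | MR | KU | TR | ABS |
|---|---|---|---|---|---|---|---|---|---|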
| MSC (Xu et al., 2022a) | Open-Domain | 5k | - | 1k | ✗ | ✗ | ✗ | ✗ | ✗ |
| DuLeMon (Xu et al., 2022b) | Open-Domain | 30k | - | 1k | ✗ | ✗ | ✗ | ✗ | ✗ |
| MemoryBank (Zhong et al., 2024) | Personal | 300 | 194 | 5k | ✓ | ✗ | ✗ | ✓ | ✗ |
| PerLTQA (Du et al., 2024) | Personal | 4k | 8593 | 1M∗ | ✓ | ✗ | ✗ | ✗ | ✓ |
| LoCoMo (Maharana et al., 2024) | Personal | 1k | 7512 | 10k | ✓ | ✓ | ✗ | ✓ | ✓ |
| DialSim (Kim et al., 2024) | TV Shows | 1k–2k | 1M | 350k | ✓ | ✓∗∗ | ✗ | ✓ | ✓ |
| LongMemEval (Wu et al., 2024) | Personal | 50k | 500 | 115k, 1.5M | ✓ | ✓ | ✓ | ✓ | ✓ |