FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

Submitted by Kate Moore on October 29, 2023

Source: https://hyunw.kim/fantom/

Current theory of mind (ToM) evaluations test models on passive narratives that inherently lack interactivity. The authors introduce FANToM 👻, a new benchmark designed to stress-test ToM in information-asymmetric conversational contexts via question answering. The benchmark draws on important theoretical requisites from psychology and on empirical considerations that are necessary when evaluating large language models (LLMs).
