Author: |
Kudo, Keito; Aoki, Yoichi; Kuribayashi, Tatsuki; Sone, Shusaku; Taniguchi, Masaya; Brassard, Ana; Sakaguchi, Keisuke; Inui, Kentaro |
Year of publication: |
2024 |
Subject: |
|
Document type: |
Working Paper |
Description: |
This study investigates the internal reasoning mechanism of language models during symbolic multi-step reasoning, motivated by the question of whether chain-of-thought (CoT) outputs are faithful to the models' internals. Specifically, we inspect when models internally determine their answers, in particular whether this happens before or after CoT begins, to establish whether they follow a post-hoc "think-to-talk" mode or a step-by-step "talk-to-think" mode of explanation. Through causal probing experiments on controlled arithmetic reasoning tasks, we find systematic internal reasoning patterns across models; for example, simple subproblems are solved before CoT begins, whereas more complicated multi-hop calculations are performed during CoT. |
Database: |
arXiv |
External link: |
|