Autor: |
Gao, Minghe, Li, Juncheng, Fei, Hao, Pang, Liang, Ji, Wei, Wang, Guoming, Lv, Zheqi, Zhang, Wenqiao, Tang, Siliang, Zhuang, Yueting |
Rok vydání: |
2023 |
Předmět: |
|
Druh dokumentu: |
Working Paper |
Popis: |
Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks. Unlike end-to-end models that need task-specific data, it advances in performing visual processing and reasoning in an unsupervised manner. Current visual programming methods generate programs in a single pass for each task where the ability to evaluate and optimize based on feedback, unfortunately, is lacking, which consequentially limits their effectiveness for complex, multi-step problems. Drawing inspiration from benders decomposition, we introduce De-fine, a training-free framework that automatically decomposes complex tasks into simpler subtasks and refines programs through auto-feedback. This model-agnostic approach can improve logical reasoning performance by integrating the strengths of multiple models. Our experiments across various visual tasks show that De-fine creates more robust programs. Moreover, viewing each feedback module as an independent agent will yield fresh prospects for the field of agent research. |
Databáze: |
arXiv |
Externí odkaz: |
|