Abstract: |
The capacity of Generative Artificial Intelligence (AI) models to formulate optimization problems is an interesting area of exploration in this rapidly evolving field. This study explores the capability of AI to interpret and formulate mathematical modeling problems from English descriptions. Five Large Language Models (LLMs) were selected: OpenAI's ChatGPT-4, Google's Gemini, Microsoft's Copilot, Anthropic's Claude, and Meta's open-source Llama-2. The research systematically compares human-expert formulations with those produced by the LLMs to better understand the strengths and shortcomings of each model. A diverse set of 26 linear programming problems was used for this evaluation, and the effectiveness of the AI tools was measured by the correctness of their formulations. ChatGPT-4 outperformed its competitors with a mean score of 88.55, followed by Copilot at 84.93, Gemini at 83.57, Claude at 81.21, and Llama-2 at 46.26. The test problems were linear programming examples drawn from a typical junior-level industrial engineering course and were graded using a rubric. Overall, ChatGPT-4 performed best, earning a "B+" grade, compared with Copilot ("B"), Gemini ("B-"), Claude ("B-"), and Llama-2 ("F"). These findings indicate considerable variation among current generative AI technologies in their ability to automatically formulate mathematical optimization problems. As these technologies continue to evolve rapidly, there are interesting opportunities to harness them in research, education, and practice.