Generating Reward Functions Using IRL Towards Individualized Cancer Screening
Authors: | Alex A. T. Bui, Panayiotis Petousis, William Hsu, Simon X. Han |
Year: | 2019 |
Subject: | Computer science, Machine learning, Artificial intelligence, Maximum entropy inverse reinforcement learning, Partially observable Markov decision processes, Markov decision process, Cancer screening, Breast cancer screening, Breast cancer, Lung cancer, Overdiagnosis |
Source: | Artificial Intelligence in Health: First International Workshop, AIH 2018, Stockholm, Sweden, July 13-14, 2018, Revised Selected Papers. Lecture Notes in Computer Science, vol 11326, pp 213-227. ISBN: 9783030127374 |
Description: | Cancer screening can benefit from individualized decision-making tools that decrease overdiagnosis. The heterogeneity of cancer screening participants underscores the need for more personalized methods. Partially observable Markov decision processes (POMDPs), when defined with an appropriate reward function, can be used to suggest optimal, individualized screening policies. However, determining an appropriate reward function can be challenging. Here, we propose the use of inverse reinforcement learning (IRL) to form reward functions for lung and breast cancer screening POMDPs. Using experts' (physicians') retrospective screening decisions for lung and breast cancer screening, we developed two POMDP models with corresponding reward functions. Specifically, the maximum entropy (MaxEnt) IRL algorithm with an adaptive step size was employed to learn rewards more efficiently, and was combined with a multiplicative model to learn state-action pair rewards for a POMDP. The POMDP screening models were evaluated on their ability to recommend appropriate screening decisions before the diagnosis of cancer. The reward functions learned with the MaxEnt IRL algorithm, when combined with POMDP models in lung and breast cancer screening, demonstrate performance comparable to experts. The Cohen's Kappa score of agreement between the POMDPs' and physicians' predictions was high in breast cancer and showed a decreasing trend in lung cancer. |
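The abstract's key technical step, a MaxEnt IRL weight update with an adaptive step size, can be sketched as follows. This is a minimal illustration only: the function name, the linear reward parameterization, and the AdaGrad-style adaptive rule are assumptions for the sketch, not the paper's exact algorithm (which also uses a multiplicative model for state-action rewards).

```python
import numpy as np

def maxent_irl_step(theta, expert_feat, learner_feat, grad_hist=None, base_lr=0.5):
    """One MaxEnt IRL gradient update with an adaptive step size (sketch).

    Assumes a linear reward r(s, a) = theta . phi(s, a). The MaxEnt IRL
    gradient of the log-likelihood is the difference between the expert's
    empirical feature expectations and the learner's feature expectations
    under the current policy. The adaptive rule here is AdaGrad-style,
    chosen as a plausible stand-in for the paper's adaptive step size.
    """
    if grad_hist is None:
        grad_hist = np.zeros_like(theta)
    grad = expert_feat - learner_feat          # MaxEnt IRL gradient direction
    grad_hist = grad_hist + grad ** 2          # accumulate squared gradients
    step = base_lr / (np.sqrt(grad_hist) + 1e-8)  # per-coordinate step size
    return theta + step * grad, grad_hist

# Toy usage: the expert favors feature 0, the current policy favors feature 1,
# so the update should raise theta[0] and lower theta[1].
theta, hist = maxent_irl_step(np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

In a full pipeline the learner's feature expectations would be recomputed after each update by re-solving the (PO)MDP under the new reward, repeating until the two expectation vectors match.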
Database: | OpenAIRE |
External link: |