Description: |
Neural text generation models that are conditioned on a given input (e.g., machine translation and image captioning) are typically trained by maximum likelihood estimation of the target text. However, models trained in this manner often make various types of errors at inference time. In this study, we propose suppressing an arbitrary type of error by training the text generation model in a reinforcement learning framework, using a trainable reward function that discriminates between references and sentences containing the targeted type of error. We create such negative examples by artificially injecting the targeted errors into the references. In the experiments, we focus on two error types: repeated and dropped tokens in model-generated text. The experimental results demonstrate that our method suppresses these generation errors and achieves significant improvements on two machine translation and two image captioning tasks. |
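
The negative-example construction described above can be illustrated with a minimal sketch. The abstract does not specify the actual injection procedure, so everything here is an assumption made for illustration: whitespace tokenization, a single injected error per sentence, uniform random position selection, and the hypothetical helper names `inject_repeat`, `inject_drop`, and `make_negative`.

```python
import random


def inject_repeat(tokens, rng):
    """Return a copy of `tokens` with one randomly chosen token duplicated in place."""
    tokens = list(tokens)
    if not tokens:
        return tokens
    i = rng.randrange(len(tokens))
    # Insert a second copy of token i directly after the original.
    return tokens[: i + 1] + [tokens[i]] + tokens[i + 1 :]


def inject_drop(tokens, rng):
    """Return a copy of `tokens` with one randomly chosen token removed."""
    tokens = list(tokens)
    if len(tokens) < 2:
        # Assumption: never produce an empty sentence.
        return tokens
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1 :]


def make_negative(reference, error_type, seed=0):
    """Build one artificial negative example from a reference sentence.

    `error_type` is "repeat" or "drop" (hypothetical labels for the two
    error types named in the abstract).
    """
    rng = random.Random(seed)
    tokens = reference.split()  # assumption: whitespace tokenization
    if error_type == "repeat":
        corrupted = inject_repeat(tokens, rng)
    elif error_type == "drop":
        corrupted = inject_drop(tokens, rng)
    else:
        raise ValueError(f"unknown error type: {error_type}")
    return " ".join(corrupted)


reference = "a man is riding a horse on the beach"
repeated = make_negative(reference, "repeat", seed=1)
dropped = make_negative(reference, "drop", seed=1)
```

Under these assumptions, each reference would be paired with a positive label and each corrupted copy with a negative label, giving training data for a discriminator whose score could then serve as the reward signal.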