Author:
Cetinkaya, Yusuf Mucahit; Lee, Yeonjung; Kulah, Emre; Toroslu, Ismail Hakki; Cowan, Michael A.; Davulcu, Hasan
Source:
IEEE Internet Computing, September 2024, Vol. 28, Issue 5, pp. 20-27, 8 p.
Abstract:
The rise of harmful online content underscores the urgent need for artificial intelligence (AI) systems that can detect and filter such content while fostering safer, healthier communication. This article introduces a novel approach to mitigating the toxic content generation propensities of large language models (LLMs) by fine-tuning them with a programmable stance-directed focus on core human values and the common good. We propose a streamlined keyword coding and processing pipeline that generates weakly labeled data to train AI models to avoid toxicity and champion civil discourse. We also developed a toxicity classifier and an aspect-based sentiment analysis model to assess and control the effectiveness of a humanizing AI model. We evaluate the proposed pipeline using a contentious real-world X (formerly Twitter) dataset on U.S. race relations. Our approach curbs the toxic content generation propensity of an unrestricted LLM by 85%.
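The record does not reproduce the paper's actual keyword codes, so the following is only a minimal sketch of how a keyword coding step could produce weakly labeled data of the kind the abstract describes. The seed lexicons, label names, and the weak_label function are all hypothetical stand-ins, not the authors' pipeline.

# Hypothetical sketch of keyword-coded weak labeling; lexicons and labels
# below are illustrative assumptions, not the authors' actual codes.
import re
from typing import Optional

# Assumed seed lexicons; in practice these would be stance-directed keyword
# codes curated for the target domain (here, U.S. race relations).
TOXIC_SEEDS = {"hate", "vermin", "subhuman"}
CIVIL_SEEDS = {"dialogue", "respect", "dignity", "empathy"}

TOKEN_RE = re.compile(r"[a-z']+")

def weak_label(text: str) -> Optional[str]:
    """Return 'toxic', 'civil', or None (abstain) based on keyword hits."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    toxic_hits = len(tokens & TOXIC_SEEDS)
    civil_hits = len(tokens & CIVIL_SEEDS)
    if toxic_hits > civil_hits:
        return "toxic"
    if civil_hits > toxic_hits:
        return "civil"
    return None  # no keyword signal: abstain rather than guess

if __name__ == "__main__":
    posts = [
        "We need honest dialogue and mutual respect.",
        "Those people are vermin.",
        "Game starts tonight at 8.",
    ]
    for post in posts:
        print(weak_label(post), "|", post)

Posts the function abstains on would simply be excluded from the weakly labeled training set; the labeled remainder could then feed the fine-tuning and classifier stages the abstract mentions.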
Database:
Supplemental Index
External Link: