Multimodal Deep Learning using Images and Text for Information Graphic Classification
Author: | Kathleen F. McCoy, Edward Kim |
---|---|
Publication year: | 2018 |
Subject: |
Information retrieval, Modality (human–computer interaction), Artificial neural network, Point (typography), Bar chart, Computer science, Deep learning, Software engineering, Metadata, Artificial intelligence, Graphics, Line (text file) |
Source: | ASSETS |
Description: | Information graphics, e.g. line or bar graphs, are often displayed in documents and popular media to support an intended message, but for a growing number of people, they are missing the point. The World Health Organization estimates that the number of people with vision impairment could triple in the next thirty years due to population growth and aging. If a graphic is not described or explained in the text, or is missing alt tags and other metadata (as is often the case in popular media), the intended message is lost or not adequately conveyed. In this work, we describe a multimodal deep learning approach that supports the communication of the intended message. The multimodal model uses both the pixel data and text data in a single neural network to classify the information graphic into an intention category that has previously been validated as useful for people who are blind or visually impaired. Furthermore, we collect a new dataset of information graphics and present qualitative and quantitative results that show our multimodal model exceeds the performance of any one modality alone, and even surpasses the capabilities of the average human annotator. |
Database: | OpenAIRE |
External link: |
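The abstract describes combining pixel data and text data in a single network that maps a graphic to an intention category. A minimal sketch of such a late-fusion classifier is below; it is an illustration only, not the paper's actual architecture — the feature dimensions, random projections standing in for learned encoders, and the class count are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): image-feature size,
# text-feature size, and the number of intention categories.
IMG_DIM, TXT_DIM, N_CLASSES = 128, 64, 6

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for learned modality encoders: each modality is projected
# into a shared hidden space. In a real model these would be trained
# CNN / text-encoder layers.
W_img = rng.standard_normal((IMG_DIM, 32))
W_txt = rng.standard_normal((TXT_DIM, 32))
W_out = rng.standard_normal((64, N_CLASSES))

def classify(image_feats, text_feats):
    """Late fusion: encode each modality, concatenate the two hidden
    vectors, then map the joint representation to category scores."""
    h_img = np.tanh(image_feats @ W_img)
    h_txt = np.tanh(text_feats @ W_txt)
    joint = np.concatenate([h_img, h_txt], axis=-1)  # fused representation
    return softmax(joint @ W_out)

probs = classify(rng.standard_normal(IMG_DIM), rng.standard_normal(TXT_DIM))
print(probs.shape)  # one probability per intention category
```

The concatenation step is what makes the model multimodal: the classifier sees both signals jointly, so a caption can disambiguate a chart whose pixels alone are ambiguous, and vice versa.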