Automatic Assessment of Document Quality in Web Collaborative Digital Libraries

Authors: Dalip, Daniel Hasan; Gonçalves, Marcos André; Cristo, Marco; Calado, Pável
Source: Journal of Data and Information Quality (ACM Digital Library); December 2011, Vol. 2, Issue 3, pp. 1-30 (30 pages)
Abstract: The old dream of a universal repository containing all of human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct, collaborative participation of people. Wikipedia is a prime example: an enormous repository of information with free access and open editing, created collaboratively by its community. However, this large amount of information, made available democratically and with virtually no control, raises questions about its quality. In this work, we explore a significant number of quality indicators and study their ability to assess the quality of articles from three Web collaborative digital libraries. Furthermore, we explore machine learning techniques to combine these quality indicators into a single assessment. Through experiments, we show that the most important quality indicators are also the easiest to extract, namely, the textual features related to the structure of the article. Moreover, to the best of our knowledge, this work is the first to present an empirical comparison of Web collaborative digital libraries on the task of assessing article quality.
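The abstract names neither the exact structural indicators nor the learner used to combine them. The following is only a minimal Python sketch under stated assumptions: it counts four hypothetical structural indicators (section headings, internal links, `<ref>` citations, embedded images) in MediaWiki-style markup and combines them into one score with k-nearest-neighbour regression, a technique swapped in here purely for brevity. The feature set, the `log1p` scaling, the toy training scores, and the helper names `structural_features`, `to_vector`, and `knn_score` are all illustrative assumptions, not the paper's method or data.

```python
import math
import re
from typing import Dict, List, Tuple

# Hypothetical indicator set (an assumption, not the paper's feature list).
FEATURES = ("sections", "links", "refs", "images")


def structural_features(wikitext: str) -> Dict[str, int]:
    """Count a few structural elements in MediaWiki-style markup."""
    return {
        # Headings such as "== History ==" or "=== Early life ===" on their own line
        "sections": len(re.findall(r"(?m)^==+[^=\n].*?==+[ \t]*$", wikitext)),
        # Internal links "[[...]]", excluding file/image embeds
        "links": len(re.findall(r"\[\[(?!(?:File|Image):)", wikitext)),
        # Inline citations: <ref>, <ref name="..."> or <ref name="..." />
        "refs": len(re.findall(r"<ref[\s>/]", wikitext)),
        # Embedded images/files
        "images": len(re.findall(r"\[\[(?:File|Image):", wikitext)),
    }


def to_vector(counts: Dict[str, int]) -> List[float]:
    # log1p damps very large counts (e.g. hundreds of links) before comparing articles
    return [math.log1p(counts[name]) for name in FEATURES]


def knn_score(query: List[float], train: List[Tuple[List[float], float]], k: int = 2) -> float:
    """k-nearest-neighbour regression: average the quality scores of the k
    training articles closest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)


# Made-up toy articles with made-up quality scores (1 = stub, 5 = best), not real data.
TOY = [
    ({"sections": 0, "links": 3, "refs": 0, "images": 0}, 1.0),
    ({"sections": 1, "links": 5, "refs": 1, "images": 0}, 1.5),
    ({"sections": 6, "links": 80, "refs": 30, "images": 3}, 4.5),
    ({"sections": 9, "links": 120, "refs": 60, "images": 5}, 5.0),
]
TRAIN = [(to_vector(counts), score) for counts, score in TOY]

sample = """'''Example''' is a [[topic]] with several [[links]].
== History ==
Some text.<ref>Source A</ref>
=== Early period ===
More text.<ref name="b">Source B</ref>
[[File:Example.png|thumb|Caption]]
"""

feats = structural_features(sample)
print(feats)                              # {'sections': 2, 'links': 2, 'refs': 2, 'images': 1}
print(knn_score(to_vector(feats), TRAIN))  # 1.25 (its two nearest toy articles are the stub-like ones)
```

Any learner that maps the indicator vector to a single score, such as a regression model, could replace `knn_score`; kNN was picked only because it fits in a few lines and makes the "combine indicators into one assessment" step easy to follow.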
Database: Supplemental Index