Towards A new Spam Filter Based on PV-DM (Paragraph Vector-Distributed Memory Approach)
Author: | Meryem Amar, Hicham Laanaya, Samira Douzi, Bouabid El Ouahidi |
---|---|
Publication year: | 2017 |
Subject: | Word embedding; Information retrieval; Computer science; Volume (computing); 020207 software engineering; 02 engineering and technology; ComputingMethodologies_PATTERNRECOGNITION; Order (business); Filter (video); 0202 electrical engineering, electronic engineering, information engineering; General Earth and Planetary Sciences; 020201 artificial intelligence & image processing; Distributed memory; Paragraph; Representation (mathematics); General Environmental Science |
Source: | FNC/MobiSPC |
ISSN: | 1877-0509 |
DOI: | 10.1016/j.procs.2017.06.130 |
Description: | The increasing volume of email has brought a growing problem of unsolicited messages, commonly referred to as spam. One of the most commonly used representations in spam filters is the bag-of-words (BoW) model. However, this approach has a number of weaknesses: word order is lost, so different emails that use the same words receive the same representation, and the relationships between words are ignored, which can lead to poor performance. This paper proposes a new spam filter based on PV-DM (Paragraph Vector-Distributed Memory) to overcome the limitations of the BoW representation. |
Database: | OpenAIRE |
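
The word-order weakness of bag-of-words noted in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `bag_of_words` helper and the two sample messages are hypothetical, and tokenization is assumed to be plain whitespace splitting.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase whitespace-separated tokens, discarding their order."""
    return Counter(text.lower().split())


# Two hypothetical messages built from the same words in a different order.
email_a = "you won the prize not spam"
email_b = "spam you won not the prize"

# The strings differ, but their bag-of-words representations are identical,
# so a filter that sees only these counts cannot tell the messages apart.
same_representation = bag_of_words(email_a) == bag_of_words(email_b)
print(email_a != email_b, same_representation)
```

A paragraph-vector model such as PV-DM instead learns a dense vector per message while predicting words from their context, which is the property the paper relies on to move beyond this limitation.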