Popis: |
[EN] The anonymity and strong security of the Tor network allow it to host a significant amount of criminal activity. Some Tor domains attract more traffic than others because they offer better products or services to their customers. Detecting the most influential domains in Tor can help uncover serious criminal activity. Therefore, in this paper, we present a novel supervised ranking framework for detecting the most influential domains. Our approach represents each domain with 40 features extracted from five sources (text, named entities, HTML markup, network topology, and visual content) and uses them to train a learning-to-rank (LtR) scheme that sorts the domains according to user-defined criteria. We experimented on a subset of 290 manually ranked drug-related websites from Tor and obtained the following results. First, among the explored LtR schemes, the listwise approach outperforms the benchmarked methods, with an NDCG of 0.93 for the top-10 ranked domains. Second, we showed quantitatively that our framework surpasses link-based ranking techniques. Third, we observed that using only the user-visible text features achieves performance comparable to using all the features, with a decrease of 0.02 in NDCG@5. The proposed framework might support law enforcement agencies in detecting the most influential domains related to possibly suspicious activities.
Open-access publication funded by the Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), under the Programa Operativo 2014ES16RFOP009 FEDER 2014-2020 DE CASTILLA Y LEÓN, Actuación:20007-CL - Apoyo Consorcio BUCLE.
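The NDCG figures in the abstract (NDCG of 0.93 for the top 10 domains, NDCG@5) measure how close a predicted ranking is to the ideal ordering of the manually assigned relevance grades. The following is a minimal sketch of NDCG@k under stated assumptions. It uses linear gains and a log2 position discount, which is a common formulation. The exact gain function used in the paper is not given here. The function names `dcg_at_k` and `ndcg_at_k` and the example grades are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """DCG of the first k items: sum of rel / log2(position + 1), positions starting at 1.

    Assumption: linear gain (rel), not the exponential 2**rel - 1 variant.
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances_in_predicted_order, k):
    """NDCG@k: DCG of the predicted order divided by DCG of the ideal order.

    `relevances_in_predicted_order` holds the ground-truth relevance grade of every
    item, listed in the order the ranking model placed them.
    """
    # The ideal order is the same grades sorted from most to least relevant.
    ideal = sorted(relevances_in_predicted_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # No relevant items, so NDCG is defined here as 0.
    return dcg_at_k(relevances_in_predicted_order, k) / ideal_dcg


# Hypothetical grades of six domains, in the order a model ranked them.
print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=5))
```

A perfectly ordered list scores 1.0. Placing a less relevant domain above a more relevant one lowers the score, and the penalty is larger for errors near the top positions.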