An Empirical Study of the Usage of Checksums for Web Downloads

Authors: Bernard, Gaël; Huguenin, Kévin; Chapuis, Bertil
Year of publication: 2022
DOI: 10.17605/osf.io/a9ykr
Description: Checksums, typically provided on webpages and generated from cryptographic hash functions (e.g., MD5, SHA256) or signature schemes (e.g., PGP), are commonly used on websites to enable users to verify that the files they download have not been tampered with when stored on possibly untrusted servers. In this paper, we shed light on the current practices regarding the usage of checksums for web downloads (hash functions used, visibility and validity of checksums, type of websites and files, presence of instructions, etc.), as this has been mostly overlooked so far. Using a snowball-sampling strategy for the 200,000 most popular domains of the Web, we first crawled a dataset of 8.5M webpages, from which we built, through an active-learning approach, a unique dataset of 277 diverse webpages that contain checksums. Our analysis of these webpages reveals interesting findings about the usage of checksums. For instance, it shows that broken hash functions are frequently used and that a non-negligible proportion of the checksums provided on webpages do not match that of their associated files.
Database: OpenAIRE
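
The verification step the description refers to, comparing a downloaded file against a checksum published on a webpage, can be sketched as follows. This is only an illustration under stated assumptions, not the tooling used in the study: the function names `file_digest` and `matches_published_checksum` are hypothetical, and the published value is assumed to be a plain hexadecimal string copied from the page.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large downloads are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_checksum(path, published, algorithm="sha256"):
    """Compare the file's digest with a checksum copied from a webpage.

    Assumption: `published` is a hex string; surrounding whitespace and
    letter case are normalised before the comparison.
    """
    expected = published.strip().lower()
    return file_digest(path, algorithm) == expected
```

A call such as `matches_published_checksum("installer.iso", "<hex value from the page>")` returns `False` when the file and the published checksum disagree, which is the kind of mismatch the description reports. Passing `algorithm="md5"` also works, but MD5 is one of the broken hash functions the description mentions, so a match under MD5 gives much weaker assurance against deliberate tampering.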