Abstrakt: |
Cryptocurrencies are rapidly growing and are increasingly accepted by major commercial vendors. However, along with their rising popularity, they have also become the go-to currency for illicit activities driven by the anonymity they provide. Cryptocurrencies such as the one on the Ethereum blockchain provide a way for entities to hide their real-world identities behind pseudonyms, also known as addresses. Hence, the purpose of this work is to uncover the level of anonymity in Ethereum by investigating multiclass classification models for Externally Owned Accounts (EOAs) of Ethereum. The researchers aim to achieve this by examining patterns of transaction activity associated with these addresses. Using a labelled Ethereum address dataset from Kaggle and the Ethereum crypto dataset by Google BigQuery, an address profiles dataset was compiled based on the transaction history of the addresses. The compiled dataset, consisting of 4371 samples, was used to tune and evaluate the Random Forest, Gradient Boosting and XGBoost classifier for predicting the category of the addresses. The best-performing model found for the problem was the XGBoost classifier, achieving an accuracy of 75.3% with a macro-averaged F1-Score of 0.689. Following closely was the Random Forest classifier, with an accuracy of 73.7% and a macro-averaged F1-Score of 0.641. Gradient Boosting came in last with 73% accuracy and a macro-averaged F1-Score of 0.659. Owing to the data limitations in this study, the overall scores of the best model were weaker in comparison to similar research, with the exception of precision, which scored slightly higher. Nevertheless, the results proved that it is possible to predict the category of an Ethereum wallet address such as Phish/Hack, Scamming, Exchange and ICO wallets based on its transaction behaviour. [ABSTRACT FROM AUTHOR] |