This repository contains the cryptojacking malware dataset and relevant information for the “SoK: Cryptojacking Malware” paper.
1. VT Dataset – Malware
The VT dataset consists of the hash values of 20,200 cryptojacking samples in CSV format.
We ran our queries against the entire VirusTotal database, which is accessible through the VirusTotal academic API. We then performed a case-insensitive search for the keyword “miner” in the VT scan reports of the samples in our database.
Reproducibility of VT Dataset:
- Download the hash values of the samples
- You can look up the sample hashes in the VT interface, or download the samples with your own credentials.
- Run keyword-search.py to find the samples of interest. For example, searching for the keyword “Coinhive” finds the samples that use the Coinhive service provider. Similarly, searching for “XMR” finds the samples that target Monero as the cryptocurrency.
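
As a rough illustration of the case-insensitive keyword search described above, the following minimal sketch filters samples by their detection labels. It is not the keyword-search.py script from this repository; the function name `find_samples`, the in-memory layout of the reports (a mapping from hash to a list of label strings), and the example hashes and labels are all assumptions made for illustration.

```python
from typing import Dict, List


def find_samples(reports: Dict[str, List[str]], keyword: str) -> List[str]:
    """Return the hashes whose labels contain `keyword`, ignoring case.

    `reports` is an assumed layout: sample hash -> list of detection labels
    taken from that sample's scan report.
    """
    needle = keyword.casefold()
    return sorted(
        sample_hash
        for sample_hash, labels in reports.items()
        if any(needle in label.casefold() for label in labels)
    )


# Placeholder hashes and labels, invented for this example only.
example_reports = {
    "hash_a": ["Trojan.CoinMiner.Generic", "JS/Coinhive.A"],
    "hash_b": ["Adware.Generic"],
    "hash_c": ["Application.BitCoinMiner.XMR"],
}

miners = find_samples(example_reports, "miner")
coinhive = find_samples(example_reports, "Coinhive")
xmr = find_samples(example_reports, "xmr")
```

With real data, the `reports` mapping would have to be built from the scan reports you retrieve with your own VT credentials.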
2. PublicWWW Dataset
The PublicWWW dataset consists of two domain lists and two keyword lists.
- “known_service_provider_domain_list.csv”: This file contains the domains that use publicly known service providers. For each domain, the list also includes the service provider name and the associated keyword. Please note that some of the domains use multiple service providers.
- “unknown_service_provider_domain_list.csv”: This file contains the domains with unknown service providers. The second column includes the keyword that is used to identify this domain.
- “service provider keywords.csv”: This file contains the keywords that can be used to identify the 14 service providers uniquely.
- “unknown service provider keywords.csv”: This file contains the keywords that can be used to identify a cryptominer whose associated service provider is unknown.
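
To show how the keyword lists above might be applied, here is a minimal sketch that checks a page's source for provider keywords. The function name `match_providers`, the keyword-to-provider mapping, and the keyword and provider strings are hypothetical placeholders; the actual column layout of the CSV files is not assumed here, and matching case-insensitively is a choice made for this example.

```python
from typing import Dict, List


def match_providers(page_source: str, keyword_to_provider: Dict[str, str]) -> List[str]:
    """Return the providers whose keyword occurs in `page_source`, ignoring case."""
    haystack = page_source.casefold()
    return sorted(
        {
            provider
            for keyword, provider in keyword_to_provider.items()
            if keyword.casefold() in haystack
        }
    )


# Placeholder keywords and provider names, not taken from the dataset.
example_keywords = {
    "provider-a.min.js": "ProviderA",
    "provider-b/miner.js": "ProviderB",
}

example_page = (
    '<script src="https://cdn.example/Provider-A.min.js"></script>'
    '<script src="https://cdn.example/provider-b/miner.js"></script>'
)

providers = match_providers(example_page, example_keywords)
```

A page can match more than one keyword, which corresponds to the note above that some domains use multiple service providers.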