Data Sources

SSL Certificates

Our data is provided by Rapid7, which generously performs (roughly) weekly HTTPS scans of the full IPv4 address space. The data can be downloaded from the University of Michigan Internet Scans Repository. For this study, we use the scans conducted between October 30, 2013 and April 28, 2014.

Alexa Domains

We filter the SSL certificates to consider only those whose advertised Common Name is one of the Alexa Top 1 Million domains.
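The filtering step above amounts to a set-membership test. The sketch below is illustrative only: it assumes the Alexa list is a headerless CSV of `rank,domain` rows (the layout of Alexa's `top-1m.csv`), that each certificate is a dict with a hypothetical `"cn"` key, and that matching is exact and case-insensitive. The study may normalize names (for example wildcards or a `www.` prefix) differently.

```python
import csv
import io


def load_alexa_domains(csv_file):
    """Read headerless rank,domain rows into a set of lower-cased domains."""
    return {row[1].strip().lower() for row in csv.reader(csv_file) if len(row) >= 2}


def filter_by_alexa(certs, alexa_domains):
    """Keep certificates whose Common Name exactly matches an Alexa domain.

    Each certificate is assumed to be a dict with a hypothetical "cn" key.
    Exact, case-insensitive matching is an assumption of this sketch.
    """
    return [c for c in certs if (c.get("cn") or "").lower() in alexa_domains]


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real top-1m list.
    alexa = load_alexa_domains(io.StringIO("1,google.com\n2,facebook.com\n"))
    certs = [{"cn": "google.com"}, {"cn": "example.org"}, {"cn": "Facebook.com"}]
    print(filter_by_alexa(certs, alexa))
```

Loading the list into a set keeps each lookup constant-time, which matters when the input is millions of certificates per scan.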

Vulnerability Scan

To determine whether a host is running a version of OpenSSL that was likely vulnerable in the past, we conduct our own scan. Please use the Contact Us link if you need access to this data set.

Processed Data

Verification of Certificate Chains

For each Alexa-domain-advertising certificate we encounter, we validate its certificate chain using openssl verify. For certificate chains that were advertised in the past, we use the Faketime library in combination with OpenSSL so that each chain is validated as of the time it was observed, rather than the current time.
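A minimal sketch of how such a check could be driven from Python, assuming the leaf certificate, its intermediates, and a trusted root bundle are stored as PEM files (all file names here are hypothetical). It wraps `openssl verify` in libfaketime's `faketime` command when a past date is supplied. This is not the study's actual tooling, and it requires `openssl` and `faketime` on the `PATH` to run.

```python
import subprocess
from datetime import datetime


def build_verify_command(leaf_pem, chain_pem, ca_file, as_of=None):
    """Build an `openssl verify` command, optionally wrapped in `faketime`.

    leaf_pem, chain_pem, and ca_file are paths to PEM files (hypothetical names).
    If `as_of` is given, libfaketime's `faketime` wrapper makes OpenSSL see
    that moment as the current time.
    """
    cmd = ["openssl", "verify", "-CAfile", ca_file, "-untrusted", chain_pem, leaf_pem]
    if as_of is not None:
        cmd = ["faketime", as_of.strftime("%Y-%m-%d %H:%M:%S")] + cmd
    return cmd


def chain_is_valid(leaf_pem, chain_pem, ca_file, as_of=None):
    """Run the check and accept only a zero exit code plus '<leaf>: OK' as the last line.

    Inspecting the output as well as the exit code is a defensive choice,
    since some older OpenSSL versions report certain errors yet still end
    with a bare "OK" line.
    """
    result = subprocess.run(
        build_verify_command(leaf_pem, chain_pem, ca_file, as_of),
        capture_output=True,
        text=True,
    )
    last_line = result.stdout.strip().splitlines()[-1:]
    return result.returncode == 0 and last_line == [f"{leaf_pem}: OK"]


if __name__ == "__main__":
    # Validate a chain as it would have been checked on the first scan date.
    print(build_verify_command("leaf.pem", "chain.pem", "roots.pem",
                               as_of=datetime(2013, 10, 30)))
```

As a design note, `openssl verify` also accepts `-attime` (a Unix timestamp) to set the verification time directly; wrapping the process in `faketime` instead shifts the clock for the whole process without changing the OpenSSL invocation.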

Certificate Database

We place all 628,692 valid, Alexa-domain-advertising certificates we find into a SQLite database. This database can be downloaded from this link (552 MB). The database contains several tables, each briefly described below.
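Once downloaded, the database can be explored with Python's built-in `sqlite3` module. The sketch below lists the tables and counts rows in one of them. The file name `certificates.sqlite` is a hypothetical placeholder, and the table names are discovered at run time rather than assumed.

```python
import sqlite3
from contextlib import closing


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def count_rows(conn, table):
    """Count rows in `table`, quoting the identifier since it cannot be a bound parameter."""
    quoted = '"' + table.replace('"', '""') + '"'
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
    return count


if __name__ == "__main__":
    # Hypothetical local file name for the downloaded database.
    with closing(sqlite3.connect("certificates.sqlite")) as conn:
        for table in list_tables(conn):
            print(table, count_rows(conn, table))
```

Running this against the downloaded file gives a quick sanity check, for example comparing the row count of the certificate table against the 628,692 figure above.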

Analysis

Most of the analysis in the paper is expressed as queries against the SQLite database described above. For example, Figure 5 is generated using the following queries: