SSL and TLS are used to secure the most commonly used Internet protocols. As a result, the ecosystem of SSL certificates has been thoroughly studied, leading to a broad understanding of the strengths and weaknesses of the certificates accepted by most web browsers. Prior work has naturally focused almost exclusively on “valid” certificates—those that standard browsers accept as well-formed and trusted—and has largely disregarded certificates that are otherwise "invalid." Surprisingly, however, this leaves the majority of certificates unexamined: we find that, on average, 65% of SSL certificates advertised in each IPv4 scan that we examine are actually invalid. In this paper, we demonstrate that despite their invalidity, much can be understood from these certificates. Specifically, we show why the web’s SSL ecosystem is populated by so many invalid certificates, where they originate from, and how they impact security. Using a dataset of over 80M certificates, we determine that most invalid certificates originate from a few types of end-user devices, and possess dramatically different properties than their valid counterparts. We find that many of these devices periodically reissue their (invalid) certificates, and develop new techniques that allow us to track these reissues across scans. We present evidence that this technique allows us to uniquely track over 6.7M devices. Taken together, our results open up a heretofore largely-ignored portion of the SSL ecosystem to further study.
The X.509 RFC defines a certificate as invalid if a client is unable to validate it at some point in time. There are multiple reasons that a client could find a certificate to be invalid: it could be outside of its validity period, it could have been revoked by its CA, its subject could be incorrect, its signature could be wrong, and so on. Because our dataset spans years (§4), we define a certificate as invalid if no client with a standard set of root certificates would ever be able to validate it (i.e., we ignore expiry warnings). The most common reason for invalidity that we have observed is certificates signed by an unknown or untrusted root; if the client does not trust the root of a certificate chain, it transitively does not trust the rest of the chain. Specifically, in our dataset, we found that 88.0% of invalid certificates are self-signed (i.e., the root of the chain is the leaf certificate itself) and a further 11.99% are signed by a different, untrusted certificate (i.e., the root of the chain is some other certificate that is not in the set of trusted root certificates).
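The definition above can be illustrated with a minimal sketch. It is not the code used in our measurements: the `Cert` record, its `fingerprint` and `signer` fields, and the `classify` function are hypothetical names introduced only for illustration, and signature checking is modeled by comparing fingerprints rather than by real cryptographic verification. As in the definition, expiry is deliberately not checked.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Cert:
    fingerprint: str  # hypothetical unique ID, e.g. a hash of the encoded certificate
    signer: str       # fingerprint of the certificate whose key signed this one


def classify(chain: List[Cert], trusted_roots: Set[str]) -> str:
    """Classify a chain; chain[0] is the leaf and each later entry signed the previous one.

    Validity periods are intentionally ignored, mirroring the definition in the text.
    """
    # Every certificate must be signed by the next certificate in the chain.
    for child, parent in zip(chain, chain[1:]):
        if child.signer != parent.fingerprint:
            return "broken-chain"

    root = chain[-1]
    # Trusted if the chain ends at, or is directly signed by, a trusted root.
    if root.fingerprint in trusted_roots or root.signer in trusted_roots:
        return "trusted"
    # The root of the chain is the leaf certificate itself.
    if len(chain) == 1 and root.signer == root.fingerprint:
        return "self-signed"
    # The chain ends at some other certificate that is not a trusted root.
    return "untrusted-root"
```

For example, `classify([Cert("a", "a")], {"r"})` falls into the self-signed category, while a leaf signed by an unknown certificate falls into the untrusted-root category.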
The figure below (Figure 2) shows the full /8 analysis of the discrepancies between the University of Michigan and Rapid7 scans. For our entire measurement period, we grouped the IP addresses into their advertised BGP prefixes using historic RouteViews data and identified the BGP prefixes that one dataset covers but the other does not. Prefixes that neither dataset was able to scan are colored gray, prefixes that only the University of Michigan dataset covers are colored green, and prefixes that only the Rapid7 dataset covers are colored red.
We believe these discrepancies are due to blacklisting by either the scan operators or the target networks (Rapid7 confirmed to us that they maintain a growing list of networks that have requested not to be scanned). Our paper also provides a numeric analysis of the BGP prefixes.
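The prefix comparison can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not our analysis code: the function names `longest_prefix` and `compare_coverage` are hypothetical, the prefix list is assumed to have already been extracted from RouteViews data, and a linear scan stands in for the prefix trie that a full IPv4 analysis would need. The label `both` is our own placeholder for prefixes covered by both datasets.

```python
import ipaddress
from typing import Dict, Iterable, List, Optional


def longest_prefix(ip: str, prefixes: List[ipaddress.IPv4Network]) -> Optional[ipaddress.IPv4Network]:
    """Return the most specific prefix containing ip, or None if no prefix matches."""
    addr = ipaddress.ip_address(ip)
    matches = [p for p in prefixes if addr in p]
    return max(matches, key=lambda p: p.prefixlen) if matches else None


def compare_coverage(prefix_strs: Iterable[str],
                     umich_ips: Iterable[str],
                     rapid7_ips: Iterable[str]) -> Dict[str, str]:
    """Label each prefix by which dataset(s) observed at least one address in it."""
    prefixes = [ipaddress.ip_network(p) for p in prefix_strs]
    seen = {p: set() for p in prefixes}
    for label, ips in (("umich", umich_ips), ("rapid7", rapid7_ips)):
        for ip in ips:
            prefix = longest_prefix(ip, prefixes)
            if prefix is not None:
                seen[prefix].add(label)

    # Colors follow the figure: gray = neither, green = UMich only, red = Rapid7 only.
    colors = {
        frozenset(): "gray",
        frozenset({"umich"}): "green",
        frozenset({"rapid7"}): "red",
        frozenset({"umich", "rapid7"}): "both",
    }
    return {str(p): colors[frozenset(s)] for p, s in seen.items()}
```

The addresses are assigned by longest-prefix match so that an address inside a more specific announcement is not also counted toward the covering prefix.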
|Name||Type||Size||SHA-256 Hash (Uncompressed)|
|Invalid Certificates||TSV (tab-separated values)||65 GB||87afe4f2d9c0901d607ee2bb0fc2a77e3a043470b2ed1a09df628a069e9fb62b|
|Valid Certificates||TSV (tab-separated values)||28 GB||f015814146b49afd38a43c05f3f1a3d9a841361f4fa1338dc07745f2de1b4b0d|
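Because the files are large, it is worth checking a download against the published SHA-256 hash of the uncompressed file. The following is a minimal sketch; the helper names `sha256_of_file` and `matches_published_hash` and the file path in the usage note are hypothetical, and the file is read in chunks so that it never has to fit in memory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published hash, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Assuming the uncompressed invalid-certificate file were saved as `invalid_certs.tsv` (a placeholder name), `matches_published_hash("invalid_certs.tsv", "87afe4f2...")` with the full hash from the table would indicate whether the download is intact.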
|Name||Language||Description|
|Comparison between Invalid and Valid Certificates||perl||Exports basic statistics for each certificate field.|
|Linking Certificates||python||Links multiple certificates that share a common field value and have non-overlapping lifetimes.|
|Cascade Linking Certificates||python||Iteratively links multiple certificates using all possible field values. The linking conditions are the same as for Linking Certificates.|
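The linking condition described in the table (a common field value and non-overlapping lifetimes) can be sketched as follows. This is a simplified illustration and not the released script: the `Observation` record and `link_by_field` function are hypothetical names, lifetimes are assumed to be given as first and last scan indices, and a group is linked only when none of its lifetimes overlap, which is our simplifying assumption here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Observation:
    cert_id: str     # hypothetical identifier, e.g. the certificate fingerprint
    value: str       # value of the field used for linking, e.g. the public key
    first_seen: int  # index of the first scan in which the certificate appeared
    last_seen: int   # index of the last scan in which it appeared


def link_by_field(observations: Iterable[Observation]) -> List[List[str]]:
    """Return chains of certificate IDs that share a field value and never overlap in time."""
    groups: Dict[str, List[Observation]] = defaultdict(list)
    for obs in observations:
        groups[obs.value].append(obs)

    linked = []
    for group in groups.values():
        if len(group) < 2:
            continue  # nothing to link
        group.sort(key=lambda o: o.first_seen)
        # With the group sorted by start, it is free of overlaps exactly when
        # each lifetime ends before the next one begins.
        if all(a.last_seen < b.first_seen for a, b in zip(group, group[1:])):
            linked.append([o.cert_id for o in group])
    return linked
```

A cascading variant would repeat this step for each candidate field and merge the resulting chains; that merging step is not shown here.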