While the overall SSL ecosystem is well-studied, the frequency with which certificates are revoked and the circumstances under which clients (e.g., browsers) check whether certificates are revoked are still not well-understood. In our IMC'15 paper, we took a close look at certificate revocations in the Web's PKI. Using 74 full IPv4 HTTPS scans, we found that a surprisingly large fraction (8%) of the certificates served have been revoked, and that obtaining certificate revocation information can often be expensive in terms of latency and bandwidth for clients. We then studied the revocation checking behavior of 30 different combinations of web browsers and operating systems; we found that browsers often do not bother to check whether certificates are revoked (including mobile browsers, which uniformly never check). We also examined the CRLSet infrastructure built into Google Chrome for disseminating revocations; we found that CRLSet only covers 0.35% of all revocations. Overall, our results paint a bleak picture of the ability to effectively revoke certificates today.
Our data is provided by Rapid7, which generously makes (roughly) weekly full IPv4 HTTPS scans available. The data can be downloaded from the University of Michigan Internet Scans Repository. For this study, we use the scans between October 30, 2013 and March 30, 2015. Overall, we observe 38,514,130 unique SSL certificates.
In Section 7.2, we compare the revocations in CRLs from Alexa Top 1 Million domains with the CRLSets.
We verify all observed certificates by first building the set of all intermediate certificates that can be verified relative to the roots. We discover 1,946 intermediate certificates, which we refer to as the Intermediate Set.
We then verify all leaf certificates using this set of intermediate and root certificates. We discover a total of 5,067,476 such leaf certificates, which we refer to as the Leaf Set.
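The two-step verification above can be pictured as a fixed-point computation: keep admitting CA certificates whose issuer is already trusted until nothing new is added, then accept the leaves issued by any trusted signer. The following is a minimal sketch under strong simplifying assumptions, not the procedure used in the study: certificates are toy records matched by name only, and the `Cert` type and `build_sets` function are hypothetical. Real verification must also check signatures, validity periods, and extensions.

```python
from typing import NamedTuple


class Cert(NamedTuple):
    """Toy certificate: names only, no keys or signatures (an assumption)."""
    subject: str
    issuer: str
    is_ca: bool


def build_sets(roots, observed):
    """Return (intermediate_set, leaf_set) for the observed certificates."""
    trusted = {c.subject for c in roots}  # names of signers we already trust
    root_set = set(roots)
    candidates = {c for c in observed if c.is_ca and c not in root_set}
    intermediates = set()

    # Fixed-point loop: admit CA certificates whose issuer is already
    # trusted, and repeat until a full pass admits nothing new.
    changed = True
    while changed:
        changed = False
        for cert in sorted(candidates - intermediates):
            if cert.issuer in trusted:
                intermediates.add(cert)
                trusted.add(cert.subject)
                changed = True

    # Leaves are non-CA certificates issued by a root or a verified
    # intermediate.
    leaves = {c for c in observed if not c.is_ca and c.issuer in trusted}
    return intermediates, leaves
```

Repeating the pass matters because an intermediate can only be admitted after its own issuer has been, and the scan data carries no guarantee about the order in which certificates are seen.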
To determine the revocation status of all valid certificates from these scans, we check the status of the Intermediate Set and Leaf Set every day, starting in October 2014.
For the certificates that include a CRL distribution point, we use this CRL to obtain revocation information for the certificate. We observe a total of 2,800 unique CRLs, and we configure a crawler to download each of these CRLs once per day between October 2, 2014 and March 31, 2015.
We observe a total of 499 unique OCSP responders across all certificates. We query the OCSP responders only for the 642 certificates that list an OCSP responder but no CRL distribution point. This data was collected on March 31, 2015.
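The choice between the two revocation sources amounts to a simple preference rule: use the CRL when a distribution point is present, and fall back to OCSP only when it is not. A minimal sketch of that rule follows, assuming each certificate has already been reduced to two lists of URLs. The function names and the `"crl"`, `"ocsp"`, and `"none"` labels are hypothetical, and the `"none"` case is added here only so that the function is total.

```python
from collections import Counter


def revocation_source(crl_urls, ocsp_urls):
    """Pick the revocation source for one certificate.

    crl_urls:  CRL distribution point URLs listed in the certificate.
    ocsp_urls: OCSP responder URLs listed in the certificate.
    """
    if crl_urls:
        return "crl"   # a CRL distribution point is present: use the CRL
    if ocsp_urls:
        return "ocsp"  # OCSP only: query the responder directly
    return "none"      # no revocation pointer at all


def tally_sources(certs):
    """Count certificates per source; certs yields (crl_urls, ocsp_urls)."""
    return Counter(revocation_source(crl, ocsp) for crl, ocsp in certs)
```

For example, `tally_sources` applied to the extracted URL lists of a set of certificates shows how many would be checked through each mechanism.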
To determine what fraction of certificates are hosted on servers that support OCSP Stapling, we use the IPv4 TLS Handshake scans conducted by the University of Michigan, which can be downloaded from this link. We examine the scan of March 28, 2015, and look for servers that were advertising certificates in the Leaf Set.
To examine CRLSets, Google's approach to disseminating revocations in Chrome, we fetch the CRLSet file once per day between September 23, 2014 and March 31, 2015. We also crawl 110 historical CRLSets originally published between July 18, 2013 and September 23, 2014. In total, our dataset contains 300 unique CRLSets.
We place all 5,067,476 valid certificates we find into a SQLite database. This database can be downloaded from this link (2.9 GB in total). There are a number of tables in this database, which are briefly described below.
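Before querying the database, it can help to list the tables it contains. The sketch below uses only Python's standard `sqlite3` module and SQLite's built-in `sqlite_master` catalog. The `list_tables` helper and the file name `revocation.db` are hypothetical; substitute the path of the downloaded file.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    # closing() ensures the connection is released once the query finishes.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # "revocation.db" is a placeholder file name, not the real one.
    for table in list_tables("revocation.db"):
        print(table)
```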