That was the simple question I asked myself on Saturday morning, thinking the answer would be simple to find. It wasn’t, and 48 hours later I had ended up building this Jupyter notebook to find out.
I really thought it would be as easy as pulling down the NVD data feeds and running a simple nvd['Published'].value_counts().head(10) to find out that 1098 of 146450 CVEs were published on 2004-12-31.
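For readers who want to see that first attempt in context, here is a minimal sketch. It assumes the feed has already been loaded into a pandas DataFrame called nvd with a Published column holding one date string per CVE; the rows and CVE labels below are made-up placeholders, not real NVD data.

```python
import pandas as pd

# Toy stand-in for the NVD feed (hypothetical rows, not real CVE data).
nvd = pd.DataFrame({
    "CVE": ["CVE-A", "CVE-B", "CVE-C", "CVE-D"],
    "Published": ["2004-12-31", "2004-12-31", "2020-04-15", "2019-12-18"],
})

# Count CVEs per publication date, most frequent first, and keep the top 10.
top_days = nvd["Published"].value_counts().head(10)

# The busiest day and its share of the whole data set.
busiest_day = top_days.index[0]
busiest_count = int(top_days.iloc[0])
print(f"{busiest_count} of {len(nvd)} CVEs were published on {busiest_day}")
```

value_counts() sorts by count in descending order by default, so the first entry is the busiest day.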
I even produced a nice little graph:
Except, looking at it, that data didn’t make much sense. With some more research and help, it became clear the data quality from NVD is pretty poor.
Using a tool called MissingNo to get a visualization makes it obvious that only about half the CVEs in the data are complete:
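The same completeness check can be approximated with numbers instead of a picture. This is only a sketch on made-up rows; it assumes a DataFrame called nvd with a BaseScore column, and the missingno call is left as a comment so the example runs with pandas alone.

```python
import pandas as pd

# Toy stand-in for the NVD feed (hypothetical rows, not real CVE data).
nvd = pd.DataFrame({
    "CVE": ["CVE-A", "CVE-B", "CVE-C", "CVE-D"],
    "Published": ["2004-12-31", "2004-12-31", "2020-04-15", "2019-12-18"],
    "BaseScore": [None, None, 7.5, 9.8],
})

# With missingno installed, the visual version would be:
#   import missingno as msno
#   msno.matrix(nvd)

# Fraction of non-missing values in each column (1.0 means fully populated).
completeness = nvd.notna().mean()
print(completeness)

# Number of rows with no missing values at all.
complete_rows = len(nvd.dropna())
print(f"{complete_rows} of {len(nvd)} rows are complete")
```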
When you drop CVEs that are missing the CVSS BaseScore to clean up the data, here is what the new graph looks like:
The “best” answer to “What day had the most CVEs published?” appears to be 2020-04-15, with 508 of 72964 CVEs published on that date.
Here is what the top 10 days look like:
2020-04-15    508
2018-07-09    431
2019-12-18    364
2018-06-11    349
2018-02-15    340
2017-08-08    316
2019-09-27    309
2020-03-12    307
2018-04-18    281
2017-04-24    281
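A rough sketch of the cleanup-then-recount step is below. It assumes the same hypothetical nvd DataFrame with Published and BaseScore columns; the rows are invented to show how dropping unscored CVEs can change which day comes out on top, and the notebook itself may do this differently.

```python
import pandas as pd

# Toy stand-in for the NVD feed (hypothetical rows, not real CVE data).
nvd = pd.DataFrame({
    "Published": [
        "2004-12-31", "2004-12-31", "2004-12-31",
        "2020-04-15", "2020-04-15", "2018-07-09",
    ],
    "BaseScore": [None, None, None, 7.5, 9.8, 5.3],
})

# Keep only CVEs that have a CVSS base score.
scored = nvd.dropna(subset=["BaseScore"])

# Re-rank publication dates using the cleaned data.
top_days = scored["Published"].value_counts().head(10)
best_day = top_days.index[0]
best_count = int(top_days.iloc[0])
print(f"{best_day}: {best_count} of {len(scored)} CVEs")
```

In this toy data, 2004-12-31 has the most rows before filtering, but none of them carry a score, so it disappears from the ranking once they are dropped.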
All that being said, I am not a data scientist, so I am open to any pull requests or suggestions on how to improve the data in the notebook I built.