Yield-Focused Vulnerability Score (YFVS)
Vulnerability scoring isn't the most exciting part of penetration testing, so I'm going to start this writeup with pictures of flashy radial bar graphs and a live demo you can play with.
Figure: Example YFVS 0.4 Scores
Did you notice the QR code for the raw score string? There's one for the dynamic URL too!
In the spring/summer of 2014, I was asked to help formalize our penetration testing programme at work. One of the big gaps that I saw — especially if we were going to hire contract pen-testers for short-term engagements — was that our scoring system was not very solid. Among other things, there were few firm criteria for selecting different values, so if a particular vulnerability's score was called into question, we'd have to discuss it with whoever applied that score to understand their reasoning — not a cheap option if that tester's contract was already over.
I looked for an existing system that was reasonably objective and that also returned scores that I felt were in line with the actual severity of real-world findings, and came up empty. There have been numerous discussions elsewhere about (for example) CVSS' shortcomings, but if you'd like to read my thoughts in particular, see YFVS Sidebar 1: Shortcomings of Existing Systems.
This writeup describes the current draft of the highly-experimental system I've developed in an attempt to fill the void left by the lack of a decent pen-testing vulnerability scoring system. It is not "ready for primetime", but I hope it will at least kick off some discussions that lead to a really solid system. While my specific goal was a system to use for pen-test vulnerabilities, I don't see any reason it couldn't work for vulnerabilities in general.
I am not actually a big fan of developing processes and procedures (I'd rather be reverse-engineering something so that I can figure out all of the neat other things it can do that its owner/vendor doesn't know about, especially if one of those "neat other things" is "unauthenticated remote code-execution"), but I see this as a huge gap in the toolset that pen-testers have today. We need to be able to represent the severity of issues in a way that's understandable to technical people outside of information security, as well as businesspeople and others whose expertise has nothing to do with Kerberos "golden tickets", XML external entity data exfiltration, or smashing the stack. That means giving them (at a minimum) a single overall score that they can use to determine potential impact to the business/organization, prioritize work on fixing the issue, and so on. I believe that there is value in providing a lower-level breakdown of the overall score for members of that audience who would like a deeper understanding, so I went one step further and built that in as well.
Warning: on a scale from 1 to 10, where 1 represents "under-engineered" and 10 represents "over-engineered", I would consider this system to be a 6 or 7. CVSS version 2.0 would be a 3, and 3.0 would be a 4. Microsoft's deprecated DREAD model would be a 1 or 2. There is a bit more time required up front to use it, but I believe that the results are much more accurate, that the example tool is much more user-friendly (and light years more like the dashboard of a Colonial Viper) than that of any other vuln scoring system, and that, because the results aren't really open to debate, the overall effort is the same or lower.
Here's what I set out to achieve:
High-level scoring model
Without delving into the individual elements (yet), the basic scoring model is made up of three factors:
Yield is given the most weight, followed by Ease-of-Exploitation, and finally Stealthiness (hence the name "Yield-Focused"). For details, see YFVS - Scoring Formula Details.
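As a rough illustration of what "Yield-Focused" means in practice, here is a minimal sketch of a weighted combination in which Yield dominates. This is not the YFVS formula (that lives in the Scoring Formula Details article). The weights, the 1-10 input range, and the function name are all hypothetical, chosen only to show the ordering of the three factors.

```python
# Illustration only: these weights are hypothetical and are NOT the real
# YFVS values. They only preserve the ordering described in the text:
# Yield > Ease-of-Exploitation > Stealthiness.
ILLUSTRATIVE_WEIGHTS = {
    "yield": 0.5,
    "ease_of_exploitation": 0.3,
    "stealthiness": 0.2,
}


def illustrative_score(yield_score, ease_score, stealth_score):
    """Combine three factor ratings (assumed here to be on a 1-10 scale)
    into one number using the hypothetical weights above."""
    for value in (yield_score, ease_score, stealth_score):
        if not 1 <= value <= 10:
            raise ValueError("factor ratings are assumed to be in 1-10")
    return (
        ILLUSTRATIVE_WEIGHTS["yield"] * yield_score
        + ILLUSTRATIVE_WEIGHTS["ease_of_exploitation"] * ease_score
        + ILLUSTRATIVE_WEIGHTS["stealthiness"] * stealth_score
    )


# A high-yield finding outranks one that is merely easy or merely stealthy.
high_yield = illustrative_score(10, 1, 1)
high_ease = illustrative_score(1, 10, 1)
high_stealth = illustrative_score(1, 1, 10)
print(high_yield > high_ease > high_stealth)
```

With any weights in that order, a finding that maxes out Yield alone ranks above one that maxes out only Ease-of-Exploitation, which in turn ranks above one that maxes out only Stealthiness.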
These three basic factors are always scored, and apply to the "vulnerable system": that is, if the scope of the score is "Apache Tomcat version 7.0.54", then the ratings for these values should apply to the default configuration of Tomcat 7.0.54. In other words, this group of three factors is similar to the CVSS "Impact" and "Exploitability" metrics. As with the CVSS Base Score Metrics, the values for these factors should be very unlikely to change over the lifetime of the vulnerability.
For scoring considerations that still apply to all deployed instances of the vulnerable system but which are likely to change over time, a single category of sub-scores called Prerequisites is used. These sub-scores capture (for a given moment in time) how much time, resources, and effort a potential attacker must expend in order to discover and exploit the vulnerability.
Many things that influence the severity of a vulnerability only appear once a vulnerable system is actually deployed in a real-world environment. There are two categories of this type:
To skip ahead to a walkthrough of how several example vulnerabilities and vulnerability chains/trees are scored, see YFVS - Example Comparison.
For the low-level nuts and bolts, you can read the YFVS - Scoring Formula Details article, but the process for transforming these categories into the final score is essentially:
With individual vulnerability scores scoped to the component which is actually vulnerable, those scores can then be processed and reported on in a variety of ways. I favour an approach in which the person or team which is directly responsible for the vulnerable component sees the full details by default. The management structure that they report to would (by default) have one or more of the following applied to the individual scores, but could also view the low-level details if they wished:
Some scenarios may require that the result be a number within a different range than 1-10. A range of 1-5 is fairly common, and in this case the conversion is easy: divide by two and round up.
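The 1-10 to 1-5 conversion can be sketched in a few lines. The function name is made up for this example, and I'm assuming the input may be fractional as well as whole.

```python
import math


def to_five_point(score):
    """Convert a 1-10 score to the 1-5 range: divide by two and round up."""
    if not 1 <= score <= 10:
        raise ValueError("expected a score in the range 1-10")
    return math.ceil(score / 2)


# Every pair of adjacent 1-10 values collapses onto one 1-5 value.
for original in range(1, 11):
    print(original, "->", to_five_point(original))
```

Rounding up (rather than to the nearest value) means a score is never understated by the conversion: 1 and 2 both become 1, and 9 and 10 both become 5.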
Mapping numeric values to "friendly" criticality names is also fairly common. When tuning the YFVS values, my goal was a more-or-less linear progression. My recommendation would be something like:
Things I considered but left out
I originally tried to account for potential motivations that an attacker might have for exploiting a particular vulnerability — sort of a "Benefits to Attacker" category to complement the "Consequences". This was in an effort to provide another quasi-"likelihood of exploitation" value (see YFVS Sidebar 2: Likelihood Ratings for a longer discussion on this). I found it very difficult to not only account for all of the potential benefits (some of which may be extremely non-obvious to the person scoring the vulnerability), but also to quantify them.
For example, one of the scoring elements was "Financial Motivation". The concept is fairly straightforward, but how is it quantified? There needs to be some sort of scale. Should it be in relation to how much absolute money can be obtained by exploiting a vulnerability? A criminal operating out of a relatively poor country may be motivated by a much lower absolute figure than one operating out of a first-world country. The best I could do was answers along the lines of:
I don't think anyone would take the time to calculate these values accurately.
So why not simplify it down to a yes/no option? Because at some organizations, this would result in debates over whether a tiny financial benefit (ten cents, for example) meant that the answer should be "yes" or "no". The system is supposed to be as objective as possible.
It was even harder trying to quantify physical benefits (e.g. for a vulnerability in a vending machine that allows free products to be obtained) or corporate/military espionage-type benefits to an attacker. I pretty much gave up on the concept at that point. If you have any better ideas, I'd be happy to hear them.
This is absolutely an early draft of a work in progress. Please send whatever feedback you have via the Contact form, or reply in one of the forums/discussion lists that I'm planning to spam with this. As noted above, the main goal is to kick off a discussion that leads to a system that's useful for a lot of people. That won't happen if I'm the only person working on it.
Are there any scoring factors that are missing? How about redundant factors or options that can be simplified away? Are there any gross errors with the numeric values assigned to various answers? What did I miss or get wrong?
1. This is a matter of personal philosophy, but it's one I believe in very strongly. There are two main reasons. First, the more complex a system becomes, the harder it is to accurately predict how it will behave. In other words, if you take an already insanely-complicated system like a modern web application, and then try to account for its security vulnerabilities by putting a web application firewall (which in many ways is typically even more complicated) in front of it, you've probably improved security overall, but trying to be sure that every possible permutation of the actual vulnerabilities is addressed is essentially impossible. Second (and this is somewhat related to the first point), it is very likely that there are still ways to exploit each of those vulnerabilities, even if the difficulty of doing so has been increased slightly. For example, a major multinational vendor recently responded to my report of an unauthenticated remote code execution (as a local administrator account) vulnerability in their product by telling me that their recommendation was to use firewall rules to block remote access to the port on which the vulnerable service ran (as opposed to, you know, removing the vulnerable component, which was completely non-functional in the current release anyway). While this does block the easiest route by which the vulnerability can be exploited, there are all sorts of additional ways to get to it: obtain a non-privileged account which can log on via RDP or as a batch job and proxy connections to the still-vulnerable service on the loopback address, find a vulnerability in one of the many other listening services which allows connections to be proxied, etc. Vulnerabilities generally result when the people building a system don't understand it as completely as an attacker does. If the people who built the system missed an attack vector, do you really think you're going to do significantly better when designing your workaround?