[ Beneath the Waves ]

Yield-Focused Vulnerability Score (YFVS)

Ben Lincoln

 

Vulnerability scoring isn't the most exciting part of penetration testing, so I'm going to start this writeup with pictures of flashy radial bar graphs and a live demo you can play with.

Example YFVS 0.4 Scores
Example 1 (3.3)
Example 2 (5.0)
Example 3 (7.9)
Example 4 (8.5)
Example 5 (9.2)

Did you notice the QR code for the raw score string? There's one for the dynamic URL too!

 

The live YFVS 0.4 score calculator requires JavaScript and an HTML5-capable browser. You can read more about the radial bar graphs in the Nightingale Charts article.

Background

In the spring/summer of 2014, I was asked to help formalize our penetration testing programme at work. One of the big gaps that I saw — especially if we were going to hire contract pen-testers for short-term engagements — was that our scoring system was not very solid. Among other things, there were few firm criteria for selecting different values, so if a particular vulnerability's score was called into question, we'd have to discuss it with whoever applied that score to understand their reasoning — not a cheap option if that tester's contract was already over.

I looked for an existing system to use and came up empty in terms of one that was reasonably objective and also returned scores that I felt were in line with the actual severity of real-world findings. There have been numerous discussions elsewhere about (for example) CVSS' shortcomings, but if you'd like to read my thoughts in particular, see YFVS Sidebar 1: Shortcomings of Existing Systems.

This writeup describes the current draft of the highly-experimental system I've developed in an attempt to fill the void of a decent pen-testing vulnerability scoring system. It is not "ready for primetime", but I hope it will at least kick off some discussions that lead to a really solid system. While my specific goal was a system to use for pen-test vulnerabilities, I don't see any reason it couldn't work for vulnerabilities in general.

I am not actually a big fan of developing processes and procedures (I'd rather be reverse-engineering something so that I can figure out all of the neat other things it can do that its owner/vendor doesn't know about, especially if one of those "neat other things" is "unauthenticated remote code-execution"), but I see this as a huge gap in the toolset that pen-testers have today. We need to be able to represent the severity of issues in a way that's understandable to technical people outside of information security, as well as businesspeople and others whose expertise has nothing to do with Kerberos "golden tickets", XML external entity data exfiltration, or smashing the stack. That means giving them (at a minimum) a single overall score that they can use to determine potential impact to the business/organization, prioritize work on fixing the issue, and so on. I believe that there is value in providing a lower-level breakdown of the overall score for members of that audience who would like a deeper understanding, so I went one step further and built that in as well.

Warning: on a scale from 1 to 10, where 1 represents "under-engineered" and 10 represents "over-engineered", I would consider this system to be a 6 or 7. CVSS version 2.0 would be a 3, and 3.0 would be a 4. Microsoft's deprecated DREAD model would be a 1 or 2. I believe that the results are much more accurate, and that the example tool is much more user-friendly (and light years more like the dashboard of a Colonial Viper) than that of any other vuln scoring system. I also believe that, because the results aren't really open to debate, the overall effort is the same or lower, but a bit more time is required up front to use it.

Goals

Here's what I set out to achieve:

  1. The system should be as objective as possible. There should be one correct answer to each question that impacts the score.
  2. Scoring should be performed in the context of how severely the vulnerability compromises the component, system, or systems that the rating is explicitly scoped to. This is in contrast to systems like CVSS that typically score vulnerabilities depending on how much they compromise the particular OS instance that the vulnerable component is running on. If the vulnerability is severe enough to "escape" the context of that scope (for example, SQL injection combined with the ability to obtain OS-level command execution via the database software), then the vulnerable system should be considered "completely compromised", because it has utterly failed from a security perspective. Therefore, most of the maximum Yield scores are reserved for cases where something outside the scope of the vulnerable system(/component/etc.) has been exposed.
  3. It should be possible to apply a score to a chain or tree of vulnerabilities if this is the most effective way to illustrate the true severity of a set of flaws when used together.
  4. The result should be a single number between 1 and 10 which represents with reasonable accuracy the severity of a vulnerability, but the input data which resulted in that score should be preserved and available at several levels of granularity.
  5. Where possible, elements of existing scoring systems which are successful should be mimicked to make the system more intuitive as well as to avoid unnecessary work. As a result, you'll see echoes of CVSS, DREAD, and other systems in YFVS, but ideally they're echoes of only the good parts of those other systems.
  6. It should be possible to arrive at a high score via a number of different means, depending on the specifics of a vulnerability.
  7. When applying score modifiers to account for the details of a specific implementation, workarounds (in the general sense of being things that are applied on top of a problem) should never be able to reduce the severity of a serious vulnerability as much as actually fixing the flaw would.[1]
  8. The system should reflect real-world security concerns, as opposed to being forced to fit into any particular theoretical model.
  9. Penetration testing/red team activities/"no-notice interoperability exercises" cover a broad range of vulnerabilities and exploitation thereof. The system should be able to account for software, hardware, mechanical, and other types of flaw, unless doing so would cause the system to become unwieldy.

High-level scoring model

Without delving into the individual elements (yet), the basic scoring model is made up of three factors:

  1. Yield — what sort of potential effects can be obtained by exploiting the vulnerability?
  2. Ease-of-Exploitation — is the vulnerability trivially exploitable, or are there factors that make it more challenging?
  3. Stealthiness — how hard will it be to detect (in near realtime) or investigate (forensically) exploitation of the vulnerability?

Yield is given the most weight, followed by Ease-of-Exploitation, and finally Stealthiness (hence the name "Yield-Focused"). For details, see YFVS - Scoring Formula Details.

These three basic factors are always scored, and apply to the "vulnerable system" — that is, if the scope of the score is "Apache Tomcat version 7.0.54", then the ratings for these values should apply to the default configuration of Tomcat 7.0.54. In other words, this group of three factors is similar to the CVSS "Impact" and "Exploitability" metrics. Similarly, as with the CVSS Base Score Metrics, the values for these factors should be very unlikely to change over the lifetime of the vulnerability.

For scoring considerations that still apply to all deployed instances of the vulnerable systems but which are likely to change over time, a single category of sub-scores called Prerequisites is used. These sub-scores capture (for a given moment in time) how much time, resources, and effort a potential attacker must expend in order to discover and exploit the vulnerability.

Many things that influence the severity of a vulnerability only appear once a vulnerable system is actually deployed in a real-world environment. There are two categories of this type:

  1. Consequences — the real-world/"big picture" effects that will result if the vulnerability is exploited in a particular deployed instance. For example, if the deployed instance of a vulnerable version of IIS is used to host an online store, then a breach of that system may be legally required to be reported to a government agency.
  2. Criticality Modifiers — aspects of a particular deployed instance which influence the more basic sub-scores (either positively or negatively). For example, a web application firewall may make exploitation of a SQL injection vulnerability more difficult.

To skip ahead to a walkthrough of how several example vulnerabilities and vulnerability chains/trees are scored, see YFVS - Example Comparison.

For the low-level nuts and bolts, you can read the YFVS - Scoring Formula Details article, but the process for transforming these categories into the final score is essentially:

  1. Apply certain Criticality Modifiers to the Y/E/S values to derive the Modified Yield, Modified Ease-of-Exploitation, and Modified Stealthiness.
  2. Take the worst-case scenario from either the Modified Yield or Consequences (whichever is higher). This is the Effective Yield.
  3. Average all of the Modified Ease-of-Exploitation and Prerequisite sub-scores together. This is the Effective Ease-of-Exploitation.
  4. Average all of the Modified Stealthiness and two of the potential Criticality Modifiers together. This is the Effective Stealthiness.
  5. Combine the Effective Ease-of-Exploitation and Effective Stealthiness values in a 3:1 ratio. For example, if Effective Ease-of-Exploitation is 7.3 and Effective Stealthiness is 6.2, the result would be 7.025.
  6. Subtract the value from the previous step from 10, then divide by 2 (to account for the inability of workarounds to completely address fundamental vulnerabilities).
  7. Subtract the value from the previous step from the Effective Yield. The result is the Overall Score.
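
Steps 2 through 7 above can be sketched in a few lines of Python. This is only a rough illustration, not the reference implementation: the function and parameter names are made up for this example, step 1 (applying Criticality Modifiers) is assumed to have already happened, Consequences is treated as a single number, and the averages are assumed to be plain unweighted means. The Scoring Formula Details article is the authority on all of those points.

```python
from statistics import fmean


def combine_ease_and_stealth(effective_ease, effective_stealth):
    """Step 5: weight Ease-of-Exploitation and Stealthiness 3:1."""
    return (3 * effective_ease + effective_stealth) / 4


def overall_score(modified_yield, consequences, ease_subscores, stealth_subscores):
    """Steps 2-7, given already-modified inputs (hypothetical signature).

    ease_subscores: Modified Ease-of-Exploitation plus Prerequisite sub-scores.
    stealth_subscores: Modified Stealthiness plus the relevant modifiers.
    """
    # Step 2: worst case of Modified Yield and Consequences.
    effective_yield = max(modified_yield, consequences)
    # Steps 3 and 4: assumed here to be simple unweighted averages.
    effective_ease = fmean(ease_subscores)
    effective_stealth = fmean(stealth_subscores)
    # Step 5: 3:1 combination.
    combined = combine_ease_and_stealth(effective_ease, effective_stealth)
    # Step 6: distance from 10, halved.
    reduction = (10 - combined) / 2
    # Step 7: subtract from the Effective Yield.
    return effective_yield - reduction
```

Using the figures from step 5, an Effective Ease-of-Exploitation of 7.3 and an Effective Stealthiness of 6.2 combine to 7.025, so the reduction in step 6 is 1.4875. With a hypothetical Effective Yield of 9.0, the Overall Score would be 7.5125. The sketch does not round or clamp the result.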

Post-score reporting

With individual vulnerability scores scoped to the component which is actually vulnerable, those scores can then be processed and reported on in a variety of ways. I favour an approach in which the person or team which is directly responsible for the vulnerable component sees the full details by default. The management structure that they report to would (by default) have one or more of the following applied to the individual scores, but could also view the low-level details if they wished:

  1. Summarize the data statistically. E.g. Team A (which reports to me) has 12 open vulnerabilities of score 8.5 or higher ("critical").
  2. Filter the data based on its score. E.g. if I am a manager directly above Team A and Team B, I will only see vulnerabilities of score 5 or higher. My manager will only see vulnerabilities of score 6 or higher, and their manager will only see vulnerabilities of score 7 or higher.
  3. Scale the vulnerability score based on how much of my responsibilities it applies to. This can be done in a number of ways — by number of logical systems which my teams are responsible for, by percentage of my teams' applications that are potentially impacted (regardless of how many logical hosts make up those systems), by percentage of systems responsible for corporate revenue (for executives), etc.
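
As a rough illustration of the first two options, here is a minimal Python sketch. The Finding record, the team labels, and the function names are hypothetical and exist only for this example; the thresholds (5, 6, and 7 per management level, and 8.5 for "critical") are taken from the examples in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record: which team owns the component, and its score."""
    team: str
    score: float


CRITICAL_THRESHOLD = 8.5
# Index 0 is the manager directly above the teams, 1 is their manager, etc.
LEVEL_THRESHOLDS = (5.0, 6.0, 7.0)


def count_critical(findings, team):
    """Option 1: statistical summary (open critical findings for one team)."""
    return sum(
        1 for f in findings
        if f.team == team and f.score >= CRITICAL_THRESHOLD
    )


def visible_at_level(findings, level):
    """Option 2: only show findings at or above that level's threshold."""
    threshold = LEVEL_THRESHOLDS[level]
    return [f for f in findings if f.score >= threshold]
```

Scaling by share of responsibility (option 3) is left out, because the right denominator depends on how an organization counts systems, applications, or revenue.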

Some scenarios may require that the result be a number within a different range than 1-10. 1-5 is fairly common, and in this case the conversion is easy: divide by two and round up.
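
In Python, that conversion could look like the following sketch (the function name is just for illustration):

```python
import math


def to_five_point_scale(score):
    """Convert a 1-10 score to a 1-5 score: divide by two and round up."""
    return math.ceil(score / 2)
```

Applied to the example scores at the top of this page, 3.3 becomes 2, 5.0 becomes 3, 7.9 becomes 4, and both 8.5 and 9.2 become 5.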

Mapping numeric values to "friendly" criticality names is also fairly common. When tuning the YFVS values, my goal was a more-or-less linear progression. My recommendation would be something like:

Things I considered but left out

I originally tried to account for potential motivations that an attacker might have for exploiting a particular vulnerability — sort of a "Benefits to Attacker" category to complement the "Consequences". This was in an effort to provide another quasi-"likelihood of exploitation" value (see YFVS Sidebar 2: Likelihood Ratings for a longer discussion on this). I found it very difficult to not only account for all of the potential benefits (some of which may be extremely non-obvious to the person scoring the vulnerability), but also to quantify them.

For example, one of the scoring elements was "Financial Motivation". The concept is fairly straightforward, but how is it quantified? There needs to be some sort of scale. Should it be in relation to how much absolute money can be obtained by exploiting a vulnerability? A criminal operating out of a relatively poor country may be motivated by a much lower absolute figure than one operating out of a first-world country. The best I could do was answers along the lines of:

I don't think anyone would take the time to calculate these values accurately.

So why not simplify it down to a yes/no option? Because at some organizations, this would result in debates over whether a tiny financial benefit (ten cents, for example) meant that the answer should be "yes" or "no". The system is supposed to be as objective as possible.

It was even harder trying to quantify physical benefits (e.g. for a vulnerability in a vending machine that allows free products to be obtained) or corporate/military espionage-type benefits to an attacker. I pretty much gave up on the concept at that point. If you have any better ideas, I'd be happy to hear them.

Feedback

This is absolutely an early draft of a work in progress. Please send whatever feedback you have via the Contact form, or reply in one of the forums/discussion lists that I'm planning to spam with this. As noted above, the main goal is to kick off a discussion that leads to a system that's useful for a lot of people. That won't happen if I'm the only person working on it.

Are there any scoring factors that are missing? How about redundant factors or options that can be simplified away? Are there any gross errors with the numeric values assigned to various answers? What did I miss or get wrong?

 
Footnotes
1. This is a matter of personal philosophy, but it's one I believe in very strongly. There are two main reasons. First, the more complex a system becomes, the harder it is to accurately predict how it will behave. In other words, if you take an already insanely-complicated system like a modern web application, and then try to account for its security vulnerabilities by putting a web application firewall (which in many ways is typically even more complicated) in front of it, you've probably improved security overall, but being sure that every possible permutation of the actual vulnerabilities is addressed is essentially impossible. Second (and this is somewhat related to the first point), it is very likely that there are still ways to exploit each of those vulnerabilities, even if the difficulty of doing so has been increased slightly. For example, a major multinational vendor recently responded to my report of an unauthenticated remote code execution (as a local administrator account) vulnerability in their product by telling me that their recommendation was to use firewall rules to block remote access to the port on which the vulnerable service ran (as opposed to, you know, removing the vulnerable component, which was completely non-functional in the current release anyway). While this does block the easiest route by which the vulnerability can be exploited, there are all sorts of additional ways to get to it: obtain a non-privileged account which can log on via RDP or as a batch job and proxy connections to the still-vulnerable service on the loopback address, find a vulnerability in one of the many other listening services which allows connections to be proxied, etc. Vulnerabilities generally result when the people building a system don't understand it as completely as an attacker does. If the people who built the system missed an attack vector, do you really think you're going to do significantly better when designing your workaround?
 