Information Disclosure
The Information Disclosure section covers attacks designed to acquire system-specific information about a web site. This information includes the software distribution, version numbers, and patch levels; it may also include the location of backup files and temporary files. In most cases, divulging this information is not required to fulfill the needs of the user. Most web sites will reveal some data, but it's best to limit the amount whenever possible: the more an attacker learns about a web site, the easier the system becomes to compromise.
Directory Indexing
Automatic directory listing/indexing is a web server function that lists all of the files within a requested directory if the normal base file (index.html/home.html/default.htm) is not present. When a user requests the main page of a web site, he normally types in a URL such as http://www.example.com, using the domain name and excluding a specific file. The web server processes this request and searches the document root directory for the default filename and sends this page to the client. If this page is not present, the web server will issue a directory listing and send the output to the client. Essentially, this is equivalent to issuing a "ls" (Unix) or "dir" (Windows) command within this directory and showing the results in HTML form. From an attack and countermeasure perspective, it is important to realize that unintended directory listings may be possible due to software vulnerabilities (discussed next in the example section) combined with a specific web request.
When a web server reveals a directory’s contents, the listing could contain information not intended for public viewing. Often web administrators rely on "Security Through Obscurity," assuming that if there are no hyperlinks to these documents, they will not be found, or no one will look for them. The assumption is incorrect. Today’s vulnerability scanners, such as Nikto, can dynamically add additional directories/files to include in their scan based upon data obtained in initial probes. By reviewing the /robots.txt file and/or viewing directory indexing contents, the vulnerability scanner can now interrogate the web server further with this new data. Although potentially harmless, directory indexing could allow an information leak that supplies an attacker with the information necessary to launch further attacks against the system.
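A scanner's use of /robots.txt can be sketched as follows. This is an illustrative Python snippet, not Nikto's actual logic, and the paths shown are hypothetical:

```python
# Parse Disallow entries from a robots.txt body to build a list of
# additional paths for a scanner to probe (illustrative only).
def robots_disallow_paths(robots_txt: str) -> list[str]:
    paths = []
    for line in robots_txt.splitlines():
        line = line.strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:
                paths.append(path)
    return paths

sample = """User-agent: *
Disallow: /admin/
Disallow: /backup/
"""
print(robots_disallow_paths(sample))  # → ['/admin/', '/backup/']
```

Ironically, a file intended to keep well-behaved crawlers out of sensitive directories hands an attacker a ready-made list of interesting targets.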
Directory Indexing Example
The following information could be obtained based on directory indexing data:
- Backup files—with extensions such as .bak, .old, or .orig.
- Temporary files—these are files that are normally purged from the server but for some reason are still available.
- Hidden files—with filenames that start with a "." (period).
- Naming conventions—an attacker may be able to identify the composition scheme used by the web site to name directories or files. Example: Admin versus admin, backup versus back-up, and so on.
- Enumerate user accounts—personal user accounts on a web server often have home directories named after their user account.
- Configuration file contents—these files may contain access control data and have extensions such as .conf, .cfg, or .config.
- Script contents—most web servers allow for executing scripts by either specifying a script location (e.g., /cgi-bin) or by configuring the server to try and execute files based on file permissions (e.g., the execute bit on *nix systems and the use of the Apache XBitHack directive). Due to these options, if directory indexing of cgi-bin contents is allowed, it is possible to download/review the script code if the permissions are incorrect.
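To make the categories above concrete, here is a small Python sketch that flags interesting entries in a directory listing; the filenames and extension lists are illustrative examples, not an exhaustive ruleset:

```python
# Classify filenames from a directory index into the categories above
# (backup files, hidden files, configuration files).
BACKUP_EXTS = (".bak", ".old", ".orig")
CONFIG_EXTS = (".conf", ".cfg", ".config")

def classify(name: str) -> str:
    if name.startswith("."):
        return "hidden"
    if name.endswith(BACKUP_EXTS):
        return "backup"
    if name.endswith(CONFIG_EXTS):
        return "config"
    return "other"

listing = ["index.html.bak", ".htpasswd", "app.conf", "logo.gif"]
print({name: classify(name) for name in listing})
```

This is essentially what a vulnerability scanner does with an unintended directory index: triage the filenames for likely leaks before fetching them.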
There are three different scenarios where an attacker may be able to retrieve an unintended directory listing/index:
- The web server is mistakenly configured to allow/provide a directory index. Confusion may arise over the net effect when a web administrator is configuring the indexing directives in the configuration file. It is possible to get an undesired result when implementing complex settings, such as wanting to allow directory indexing for a specific sub-directory while disallowing it on the rest of the server. From the attacker's perspective, the HTTP request is normal: they request a directory and see if they receive the desired content. They are not concerned with "why" the web server was configured in this manner.
- Some components of the web server allow a directory index even if it is disabled within the configuration file or if an index page is present. This is the only valid "exploit" example scenario for directory indexing. There have been numerous vulnerabilities identified on many web servers that will result in directory indexing if specific HTTP requests are sent.
- Search engines’ cache databases may contain historical data that would include directory indexes from past scans of a specific web site.
Apache Countermeasures for Directory Indexing
First of all, if directory indexing is not required for some specific purpose, then it should be disabled in the Options directive, as outlined in Chapter 4. If directory indexing is accidentally enabled, you can implement the following Mod_Security directive to catch this information in the output data stream. Figure 7.1 shows what a standard directory index web page looks like.
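A minimal sketch of the Options approach (the directory path is a placeholder; see Chapter 4 for the full hardening configuration):

```apache
# Remove the Indexes option so Apache returns 403 Forbidden
# instead of generating a directory listing.
<Directory "/usr/local/apache/htdocs">
    Options -Indexes
</Directory>
```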
Web pages that are dynamically created by the directory indexing function will have a title that starts with "Index of /". We can use this data as a signature and add the following Mod_Security directives to catch and deny this access to this data:
SecFilterScanOutput On
SecFilterSelective OUTPUT "\<title\>Index of /"
Figure 7.1 Standard directory index web page.
References
Directory Indexing Vulnerability Alerts
http://www.securityfocus.com/bid/1063
http://www.securityfocus.com/bid/6721
http://www.securityfocus.com/bid/8898
Nessus "Remote File Access" Plugin web page http://cgi.nessus.org/plugins/dump.php3?family=Remote%20file%20access
Web Site Indexer Tools
http://www.download-freeware-shareware.com/Internet.php?Theme=112
Search Engines as a Security Threat http://it.korea.ac.kr/class/2002/software/Reading%20List/Search%20Engines%20a%20a%20Security%20Threat.pdf
The Google Hacker's Guide http://johnny.ihackstuff.com/security/premium/The_Google_Hackers_Guide_v1.0.pdf
Information Leakage
Information Leakage occurs when a web site reveals sensitive data, such as developer comments or error messages, which may aid an attacker in exploiting the system. Sensitive information may be present within HTML comments, error messages, source code, or simply left in plain sight. There are many ways a web site can be coaxed into revealing this type of information. While leakage does not necessarily represent a breach in security, it does give an attacker useful guidance for future exploitation. Leakage of sensitive information may carry various levels of risk and should be limited whenever possible.
In the first case of Information Leakage (comments left in the code, verbose error messages, etc.), the leak may give the attacker contextual intelligence about the directory structure, SQL query structure, and the names of key processes used by the web site.
Often a developer will leave comments in the HTML and script code to help facilitate debugging or integration. This information can range from simple comments detailing how the script works, to, in the worst cases, usernames and passwords used during the testing phase of development.
Information Leakage also applies to data deemed confidential that isn't properly protected by the web site. These data may include account numbers, user identifiers (driver's license number, passport number, social security number, etc.), and user-specific data (account balances, addresses, and transaction history). Insufficient Authentication, Insufficient Authorization, and secure transport encryption also deal with protecting and enforcing proper controls over access to data. Some attacks fall outside the scope of web site protection, such as attacks against the client or "casual observer" concerns. Information Leakage in this context deals with the exposure of key user data deemed confidential or secret that should not be exposed in plain view, even to the user. Credit card numbers are a prime example: they need to be further protected from exposure or leakage even with proper encryption and access controls in place.
Information Leakage Example
There are three main categories of Information Leakage: comments left in code, verbose error messages, and confidential data in plain sight. Comments left in code:
<TABLE border="0" cellPadding="0" cellSpacing="0" height="59" width="591">
<TBODY>
<TR>
<!--If the image files are missing, restart VADER -->
<TD bgColor="#ffffff" colSpan="5" height="17" width="587"> </TD>
Here we see a comment left by the development/QA personnel indicating what one should do if the image files do not show up. The security issue is that the host name of the server, "VADER," is mentioned explicitly in the code.
An example of a verbose error message can be the response to an invalid query. A prominent example is the error message associated with SQL queries. SQL Injection attacks typically require the attacker to have prior knowledge of the structure or format used to create SQL queries on the site. The information leaked by a verbose error message can provide the attacker with crucial information on how to construct valid SQL queries for the backend database. The following was returned when placing an apostrophe into the username field of a login page:
An Error Has Occurred.
Error Message:
System.Data.OleDb.OleDbException: Syntax error (missing operator) in query expression 'username = ''' and password = 'g''. at System.Data.OleDb.OleDbCommand.ExecuteCommandTextErrorHandling (Int32 hr) at System.Data.OleDb.OleDbCommand.ExecuteCommandTextForSingleResult (tagDBPARAMS dbParams, Object& executeResult) at
In the first error statement, a syntax error is reported. The error message reveals the query parameters that are used in the SQL query: username and password. This leaked information is the missing link for an attacker to begin to construct SQL Injection attacks against the site.
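The mechanics behind this error can be sketched in Python. The query template below is hypothetical, but it shows why a single apostrophe breaks the statement's quoting and triggers the verbose error that names the query's columns:

```python
# Naive string-built SQL query (hypothetical template, for illustration
# only -- real code should use parameterized queries instead).
def build_query(username: str, password: str) -> str:
    return ("SELECT * FROM users WHERE username = '" + username +
            "' AND password = '" + password + "'")

# A lone apostrophe leaves an odd number of single quotes, producing
# a syntax error whose message reveals the column names to an attacker.
q = build_query("'", "g")
print(q)
print(q.count("'") % 2 == 1)  # → True (unbalanced quoting)
```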
Confidential data left in plain sight could be files that are placed on a web server with no direct html links pointing to them. Attackers may enumerate these files by either guessing filenames based on other identified names or perhaps through the use of a local search engine.
Apache Countermeasures for Information Leakage
Preventing Verbose Error Messages
Containing information leaks such as these requires Apache to inspect the outbound data sent from the web applications to the client. One way to do this, as we have discussed previously, is to use the OUTPUT filtering capabilities of Mod_Security. We can easily set up a filter to watch for common database error messages being sent to the client and then generate a generic 500 status code instead of the verbose message:
SecFilterScanOutput On
SecFilterSelective OUTPUT "An Error Has Occurred" status:500
Preventing Comments in HTML
While Mod_Security is efficient at identifying signature patterns, it does have one current shortcoming. Mod_Security cannot manipulate the data in the transaction. When dealing with information disclosures in HTML comment tags, it would not be appropriate to deny the entire request for a web page due to comment tags. So how can we handle this? There is a really cool feature in the Apache 2.0 version called filters: http://httpd.apache.org/docs-2.0/mod/mod_ext_filter.html. The basic premise of filters is that they read from standard input and print to standard output. This feature becomes intriguing from a security perspective when dealing with this type of information disclosure prevention.

First, we use the ExtFilterDefine directive to set up our output filter. In this directive, we tell Apache that this is an output filter, that the input data will be text, and that we want to use an OS command to act on the data. In this case, we can use the Unix Stream Editor program (sed) to strip out any comment tags. The last step is to use the SetOutputFilter directive to activate the filter in a LocationMatch directive. We can add the following data to the httpd.conf file to effectively remove all HTML comment tags, on-the-fly, as they are being sent to the client:
ExtFilterDefine remove_comments mode=output intype=text/html cmd="/bin/sed s/\<\!--.*--\>//g"
<LocationMatch /*>
SetOutputFilter remove_comments
</LocationMatch>
Pretty slick, huh? Just think, this is merely the tip of the iceberg as far as the potential possibilities for using filters for security purposes.
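For illustration, the effect of the sed substitution can be reproduced in Python. One caveat worth knowing: the greedy `.*` in the sed pattern above strips everything between the first `<!--` and the last `-->` on a line, which can swallow legitimate content between two comments; a non-greedy match (`.*?`, shown here) avoids that. Also, because sed operates line-by-line, comments spanning multiple lines will slip through either pattern.

```python
import re

# Strip HTML comments from a chunk of output, mirroring the sed filter.
# Non-greedy .*? avoids swallowing text between two comments on one line.
def strip_comments(html: str) -> str:
    return re.sub(r"<!--.*?-->", "", html)

line = '<td><!-- restart VADER -->ok<!-- second note --></td>'
print(strip_comments(line))  # → '<td>ok</td>'
```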
References
"Best practices with custom error pages in .Net," Microsoft Support http://support.microsoft.com/default.aspx?scid=kb;en-us;834452
"Creating Custom ASP Error Pages," Microsoft Support http://support.microsoft.com/default.aspx?scid=kb;en-us;224070
"Apache Custom Error Pages," Code Style http://www.codestyle.org/sitemanager/apache/errors-Custom.shtml
"Customizing the Look of Error Messages in JSP," DrewFalkman.com http://www.drewfalkman.com/resources/CustomErrorPages.cfm
ColdFusion Custom Error Pages http://livedocs.macromedia.com/coldfusion/6/Developing_ColdFusion_MX_Applications_with_CFML/Errors6.htm
Obfuscators: JAVA http://www.cs.auckland.ac.nz/~cthombor/Students/hlai/hongying.pdf
Path Traversal
The Path Traversal attack technique forces access to files, directories, and commands that potentially reside outside the web document root directory. An attacker may manipulate a URL in such a way that the web site will execute or reveal the contents of arbitrary files anywhere on the web server. Any device that exposes an HTTP-based interface is potentially vulnerable to Path Traversal.
Most web sites restrict user access to a specific portion of the file-system, typically called the "web document root" or "CGI root" directory. These directories contain the files intended for user access and the executables necessary to drive web application functionality. To access files or execute commands anywhere on the file system, Path Traversal attacks utilize special-character sequences.
The most basic Path Traversal attack uses the "../" special-character sequence to alter the resource location requested in the URL. Although most popular web servers will prevent this technique from escaping the web document root, alternate encodings of the "../" sequence may help bypass the security filters. These method variations include valid and invalid Unicode-encoding ("..%u2216" or "..%c0%af") of the forward slash character, backslash characters ("..\") on Windows-based servers, URL-encoded characters ("%2e%2e%2f"), and double URL encoding ("..%255c") of the backslash character.
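The layered decoding that makes these variations work can be demonstrated with Python's standard library; double URL encoding only resolves to the traversal sequence after two decoding passes, which is why a filter that decodes once and then checks for "../" can be bypassed:

```python
from urllib.parse import unquote

# Single URL encoding decodes in one pass.
print(unquote("%2e%2e%2f"))   # → '../'

# Double URL encoding needs two passes: %25 -> '%', then %5c -> '\'.
once = unquote("..%255c")
print(once)                   # → '..%5c'
print(unquote(once))          # → '..\'
```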
Even if the web server properly restricts Path Traversal attempts in the URL path, a web application itself may still be vulnerable due to improper handling of user-supplied input. This is a common problem of web applications that use template mechanisms or load static text from files. In variations of the attack, the original URL parameter value is substituted with the filename of one of the web application’s dynamic scripts. Consequently, the results can reveal source code because the file is interpreted as text instead of an executable script. These techniques often employ additional special characters such as the dot (".") to reveal the listing of the current working directory, or "%00" NUL characters in order to bypass rudimentary file extension checks.
Path Traversal Examples
Path Traversal Attacks Against a Web Server
GET /../../../../../some/file HTTP/1.0
GET /..%255c..%255c..%255csome/file HTTP/1.0
GET /..%u2216..%u2216some/file HTTP/1.0
Path Traversal Attacks Against a Web Application
Normal: GET /foo.cgi?home=index.htm HTTP/1.0
Attack: GET /foo.cgi?home=foo.cgi HTTP/1.0
In the previous example, the web application reveals the source code of the foo.cgi file because the value of the home variable was used as content. Notice that in this case, the attacker does not need to submit any invalid characters or any path traversal characters for the attack to succeed. The attacker has targeted another file in the same directory as index.htm.
Path Traversal Attacks Against a Web Application Using Special-Character Sequences
Original: GET /scripts/foo.cgi?page=menu.txt HTTP/1.0
Attack: GET /scripts/foo.cgi?page=../scripts/foo.cgi%00txt HTTP/1.0
In this example, the web application reveals the source code of the foo.cgi file by using special-character sequences. The "../" sequence was used to traverse one directory above the current one and enter the /scripts directory. The "%00" sequence was used both to bypass the file extension check and to snip off the extension when the file was read in.
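The null-byte trick can be sketched in Python. The extension check shown is a hypothetical naive one, and the truncation actually happens in the underlying C file APIs (which treat NUL as end-of-string) rather than in the scripting language itself:

```python
from urllib.parse import unquote

# Decoded attack value from the example request above.
page = unquote("../scripts/foo.cgi%00txt")

# A naive check sees the trailing "txt" and accepts the value...
print(page.endswith("txt"))   # → True

# ...but C-based file functions stop at the first NUL byte, so the
# path actually opened is everything before "\x00".
print(page.split("\x00")[0])  # → '../scripts/foo.cgi'
```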
Apache Countermeasures for Path Traversal Attacks
Ensure the user level of the web server or web application is given the least amount of read permissions possible for files outside of the web document root. This also applies to scripting engines or modules necessary to interpret dynamic pages for the web application. We addressed this step at the end of the CIS Apache Benchmark document when we updated the permissions on the different directories to remove READ permissions.
Normalize all path references before applying security checks. When the web server decodes path and filenames, it should parse each encoding scheme it encounters before applying security checks on the supplied data and submitting the value to the file access function. Mod_Security has numerous normalizing checks: URL decoding and removing evasion attempts such as directory self-referencing.
If filenames will be passed in URL parameters, then use a hard-coded file extension constant to limit access to specific file types. Append this constant to all filenames. Also, make sure to remove all NULL-character (%00) sequences in order to prevent attacks that bypass this type of check. (Some interpreted scripting languages permit NULL characters within a string, even though the underlying operating system truncates strings at the first NULL character.) This prevents directory traversal attacks within the web document root that attempt to view dynamic script files.
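These steps (strip NUL characters, append a hard-coded extension constant, normalize, then verify the result stays under the document root) can be sketched as follows; `DOC_ROOT` and the helper name are hypothetical:

```python
import os

DOC_ROOT = "/var/www/htdocs"  # hypothetical document root

def safe_path(user_value: str) -> str:
    # 1. Remove NUL characters to defeat %00 truncation tricks.
    cleaned = user_value.replace("\x00", "")
    # 2. Append a hard-coded extension constant.
    cleaned += ".txt"
    # 3. Normalize, then verify the path stays inside the docroot.
    full = os.path.normpath(os.path.join(DOC_ROOT, cleaned))
    if not full.startswith(DOC_ROOT + os.sep):
        raise ValueError("path traversal attempt")
    return full

print(safe_path("menu"))  # → '/var/www/htdocs/menu.txt'
```

Note that the docroot check runs only after normalization; checking the raw value for "../" first would miss encoded variants that decode later in the pipeline.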
Validate all input so that only the expected character set is accepted (such as alphanumeric). The validation routine should be especially aware of shell meta-characters such as path-related characters (/ and \) and command concatenation characters (&& for Windows shells and semi-colon for Unix shells). Set a hard limit for the length of a user-supplied value. Note that this step should be applied to every parameter passed between the client and server, not just the parameters expected to be modified by the user through text boxes or similar input fields. We can create a Mod_Security filter for the foo.cgi script to help restrict the type file that may be referenced in the "home" parameter.
SecFilterSelective SCRIPT_FILENAME "/scripts/foo.cgi" chain
SecFilterSelective ARG_home "!^[a-zA-Z].{15,}\.txt"
This filter will reject any value of the "home" parameter that does not match the expected format: a filename that begins with a letter, has more than 15 characters, and ends with a ".txt" extension.
References
"CERT Advisory CA-2001-12 Superfluous Decoding Vulnerability in IIS" http://www.cert.org/advisories/CA-2001-12.html
"Novell Groupwise Arbitrary File Retrieval Vulnerability" http://www.securityfocus.com/bid/3436/info/
Predictable Resource Location
Predictable Resource Location is an attack technique used to uncover hidden web site content and functionality. By making educated guesses, the attack is a brute force search looking for content that is not intended for public viewing. Temporary files, backup files, configuration files, and sample files are all examples of potentially leftover files. These brute force searches are easy because hidden files will often have common naming conventions and reside in standard locations. These files may disclose sensitive information about web application internals, database information, passwords, machine names, file paths to other sensitive areas, or possibly contain vulnerabilities. Disclosure of this information is valuable to an attacker. Predictable Resource Location is also known as Forced Browsing, File Enumeration, Directory Enumeration, and so forth.
Predictable Resource Location Examples
Any attacker can make arbitrary file or directory requests to any publicly available web server. The existence of a resource can be determined by analyzing the web server HTTP response codes. There are several Predictable Resource Location attack variations.
Blind Searches for Common Files and Directories
/admin/
/backup/
/logs/
/vulnerable_file.cgi
Adding Extensions to an Existing Filename (/test.asp)
/test.asp.bak
/test.bak
/test
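An attacker's candidate generation for a known filename can be sketched in Python; the extension list simply mirrors the /test.asp example above:

```python
# Generate predictable-resource candidates from a known file path,
# mirroring the /test.asp example above.
def candidates(path: str) -> list[str]:
    stem, _, ext = path.rpartition(".")
    guesses = [path + ".bak"]          # /test.asp.bak
    if stem:
        guesses.append(stem + ".bak")  # /test.bak
        guesses.append(stem)           # /test
    return guesses

print(candidates("/test.asp"))  # → ['/test.asp.bak', '/test.bak', '/test']
```

A scanner would request each candidate and treat any 200 response as a hit, which is why removing such leftover files from the docroot matters more than hiding them.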
Apache Countermeasures for Predictable Resource Location Attacks
To prevent a successful Predictable Resource Location attack and protect against sensitive file misuse, there are two recommended solutions. First, remove files that are not intended for public viewing from all accessible web server directories. Once these files have been removed, you can create security filters to identify if someone probes for these files. Here are some example Mod_Security filters that would catch this action:
SecFilterSelective REQUEST_URI "^/(scripts|cgi-local|htbin|cgi-bin|cgis|win-cgi|cgi-win|bin)/"
SecFilterSelective REQUEST_URI ".*\.(bak|old|orig|backup|c)$"
These two filters will deny access to both unused, but commonly scanned for, directories and also files with common backup extensions.