Path Traversal
This article analyzes the core principle, real-world impact, and remediation of path traversal (also known as directory traversal) vulnerabilities. The vulnerability arises when an application does not strictly validate file path parameters supplied by users, allowing attackers to use traversal sequences such as ../ to break out of the intended directory and read arbitrary sensitive files on the server (for example, password files or source code). Under certain conditions it can even lead to remote code execution (RCE). The key defenses are to avoid passing user input directly to file APIs and, where that is unavoidable, to combine whitelist input validation with path canonicalization as a two-layer check.
Type: PortSwigger Needs reading: No
1. 📌 Topic Summary
This document explains how path traversal (also known as directory traversal) vulnerabilities work and how to defend against them. It walks through six hands-on labs that show how to bypass different defenses: blocked traversal sequences, non-recursive stripping, superfluous URL decoding, and prefix/suffix validation.
2. 🧠 Core Principle
Underlying Mechanism:
A path traversal vulnerability arises when a web application concatenates user-supplied input (e.g., a file name) into a server-side file path and passes it to the underlying file system operations without strict validation.
Attackers exploit the operating system's path resolution rules by supplying traversal sequences (../ on Unix/Linux; both ../ and ..\ on Windows), so that the resolved path steps up and out of the base directory defined by the application and can reach the file system root and files in any other location.
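To make the mechanism concrete, here is a minimal sketch of what a naive "base directory + user input" join resolves to. The class name TraversalDemo, the method resolveNaively, and the base directory /var/www/images are illustrative assumptions, and the sketch assumes a Unix-style file system. It only manipulates path strings and never touches the disk.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class: shows what a naive "base + input" join resolves to.
final class TraversalDemo {
    // Assumed base directory, for illustration only.
    static final Path BASE = Paths.get("/var/www/images");

    // Joins the input onto the base and collapses "." and ".." lexically,
    // which mirrors the result the operating system reaches when it walks
    // the path (ignoring symbolic links).
    static String resolveNaively(String userInput) {
        return BASE.resolve(userInput).normalize().toString();
    }

    public static void main(String[] args) {
        System.out.println(resolveNaively("218.png"));             // stays inside the base
        System.out.println(resolveNaively("../../../etc/passwd")); // escapes the base
    }
}
```

On a Unix-like system the second call resolves to /etc/passwd: each ../ cancels one directory level of the base until the path reaches the root.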
Terminology:
- Path Traversal / Directory Traversal: A vulnerability that lets an attacker access files outside the intended directory.
- API: Application Programming Interface. Here it refers to the low-level functions the operating system provides for reading and writing files.
- URL: Uniform Resource Locator.
- PoC (Proof of Concept): Code or a payload that demonstrates a vulnerability.
- RCE (Remote Code Execution): The attacker executes arbitrary system commands on the target server. It is usually the end result of exploiting a file write or file inclusion vulnerability.
3. 🛠️ Practical Applications and Examples (How to Use)
Use Cases:
This vulnerability is commonly seen in scenarios where resources are dynamically loaded via URL parameters, such as an e-commerce website’s interface for displaying product images: https://insecure-website.com/loadImage?filename=218.png.
Specific Examples and PoCs (Combined with Practical Labs):
The following table summarizes the attack payload used in each defense scenario to read /etc/passwd, the standard Linux user information file:
| Lab Case | Defense in Place | Attack Payload | Bypass Principle |
|---|---|---|---|
| Basic Scenario | No defenses. | ../../../etc/passwd | Repeated ../ sequences step up to the file system root. |
| Absolute Path Bypass | Traversal sequences such as ../ are blocked, and the supplied name is treated as relative to a default directory. | /etc/passwd | The absolute path of the target file is supplied directly, so no traversal sequence is needed. |
| Non-Recursive Filtering | The application strips or replaces ../ in a single pass only. | ....//....//....//etc/passwd | Nested (doubled) sequences: once the inner ../ is removed, the remaining outer characters join to form a valid ../. |
| Superfluous URL Decoding | Standard traversal sequences are blocked, but the input is URL-decoded again after validation. | ..%252f..%252f..%252fetc/passwd | Double URL encoding: %25 decodes to %, which together with the following 2f forms %2f; the application or server then decodes %2f to /. |
| Path Prefix Validation | The parameter must start with the expected base directory path. | /var/www/images/../../../etc/passwd | The expected directory comes first to satisfy the check, followed by ../ sequences that step back out of it. |
| File Extension Validation | The parameter must end with the expected extension (e.g., .png). | ../../../etc/passwd%00.png | Null byte bypass: %00 is the URL-encoded null character. The application-level extension check passes, but the underlying C/C++ file system API treats the null character as the end of the string and ignores the trailing .png. |
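The sketch below imitates three of the flawed defenses from the table so that the bypasses can be reproduced locally. The class NaiveFilters and its methods are hypothetical stand-ins and are not the code of any lab. The example assumes Java 10 or later for the URLDecoder.decode(String, Charset) overload.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers that mimic the flawed defenses listed in the table.
final class NaiveFilters {
    // Non-recursive filter: removes "../" in a single pass and never rescans.
    static String stripOnce(String input) {
        return input.replace("../", "");
    }

    // Validate-then-decode: the check runs on a value the framework has
    // already decoded once, and the code then decodes it a second time.
    static String checkThenDecode(String onceDecoded) {
        if (onceDecoded.contains("../")) {
            throw new IllegalArgumentException("traversal sequence blocked");
        }
        return URLDecoder.decode(onceDecoded, StandardCharsets.UTF_8);
    }

    // Prefix check on the raw string, without canonicalizing it first.
    static boolean hasExpectedPrefix(String input) {
        return input.startsWith("/var/www/images/");
    }

    public static void main(String[] args) {
        System.out.println(stripOnce("....//....//....//etc/passwd"));
        System.out.println(checkThenDecode("..%2f..%2f..%2fetc/passwd"));
        System.out.println(hasExpectedPrefix("/var/www/images/../../../etc/passwd"));
    }
}
```

In each case the filter accepts the payload and hands a working traversal path to the file API. This is why the checks need to run on the final, canonical form of the path and not on the raw string.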
Code/Function Explanation:
- File (Java class): An abstract representation of file and directory pathnames. For example, new File(BASE_DIRECTORY, userInput) combines the base directory with the user input.
- getCanonicalPath() (Java method): Returns the canonical pathname string. It resolves relative segments such as ../ and ./ as well as symbolic links, and returns the real absolute path of the target file. It is a key function for defending against path traversal attacks.
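A small sketch of the difference between the path as written and its canonical form may help here. The class CanonicalPathDemo, its helper methods, and the file name secret.txt are hypothetical. The example creates a throwaway temporary directory to use as the base, so the printed paths vary between runs.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical demo of File.getCanonicalPath() collapsing "../" segments.
final class CanonicalPathDemo {
    // Returns the canonical form of base + userInput.
    static String canonicalize(File base, String userInput) {
        try {
            return new File(base, userInput).getCanonicalPath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a throwaway base directory and returns it in canonical form.
    static File newTempBase() {
        try {
            return Files.createTempDirectory("images").toFile().getCanonicalFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File base = newTempBase();
        // getPath() keeps the "../" exactly as supplied.
        System.out.println(new File(base, "../secret.txt").getPath());
        // getCanonicalPath() resolves it to a file next to the base directory.
        System.out.println(canonicalize(base, "../secret.txt"));
    }
}
```

The canonical form no longer contains any ../ segment, which is what makes a later containment check against the base directory meaningful.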
4. ⚠️ Risk and Impact
If this vulnerability is successfully exploited, it can have extremely serious consequences for the system:
- Sensitive Information Leakage: Attackers can read application source code, database credentials, and sensitive back-end configuration files (e.g., /etc/passwd on Linux or win.ini on Windows).
- Business Data Tampering: If the application is vulnerable to file writes as well as reads, attackers can modify application data or system configuration files.
- Full System Control: (Additional AI Explanation) Attackers can achieve Remote Code Execution (RCE) by writing SSH keys, overwriting scheduled tasks (Cron jobs), or uploading a WebShell, ultimately taking complete control of the server.
5. 🛡️ Defense and Mitigation Recommendations
The most effective defense is to avoid passing user-supplied input to the underlying file system APIs altogether. If the business logic makes this unavoidable, both of the following layers of defense must be applied:
- Strict Input Validation:
  - Best Practice: Use a whitelist that allows only predefined, safe filenames.
  - Alternative: If a whitelist is not feasible, validate with a regular expression that the input contains only permitted characters (e.g., alphanumeric characters only), and reject any input containing /, \, or %00.
- Path Canonicalization and Base Directory Verification:
  - Do not write your own logic to filter ../ (it is easily bypassed by the techniques used in the labs above).
  - Use the platform's standard file system API to canonicalize the path (resolving all traversal sequences), and then verify that the canonical absolute path still starts with the expected base directory.
  - Example of Java code for mitigation:
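
        import java.io.File;
        import java.nio.file.Path;

        // 1. Resolve the user input against the base directory
        File file = new File(BASE_DIRECTORY, userInput);
        // 2. Canonicalize both paths and compare whole path elements, so that a
        //    sibling such as /var/www/images_backup does not pass a string-prefix test
        Path base = new File(BASE_DIRECTORY).getCanonicalFile().toPath();
        Path target = file.getCanonicalFile().toPath();
        if (target.startsWith(base)) {
            // Process the file (it is inside the base directory)
        } else {
            // Reject the request and log it as a security incident
        }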
- Principle of Least Privilege (Additional AI Explanation):
  - Ensure that the service account running the web application (e.g., www-data) has read access only to the directories it needs (such as /var/www/images/) and is never granted access to system-level directories (such as /etc/).
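The two layers described above (input validation, then canonicalization with a base directory check) can be combined in one helper. The following is a minimal sketch under stated assumptions, not a definitive implementation. The class SafeFileResolver, its method names, and the rule "alphanumeric stem plus a fixed .png extension" are all hypothetical choices made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper combining both layers of defense.
final class SafeFileResolver {
    // Layer 1 rule (assumed): letters and digits followed by a fixed ".png".
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9]+\\.png");

    // Applies validation first, then the canonical containment check.
    static Path resolve(File baseDir, String userInput) throws IOException {
        if (userInput == null || !SAFE_NAME.matcher(userInput).matches()) {
            throw new SecurityException("rejected file name");
        }
        return resolveInside(baseDir, userInput);
    }

    // Layer 2: canonicalize both paths and compare whole path elements.
    static Path resolveInside(File baseDir, String userInput) throws IOException {
        Path base = baseDir.getCanonicalFile().toPath();
        Path target = new File(baseDir, userInput).getCanonicalFile().toPath();
        // Path.startsWith matches complete name elements, so a sibling such as
        // "images_old" is not mistaken for a location inside "images".
        if (!target.startsWith(base)) {
            throw new SecurityException("path escapes base directory");
        }
        return target;
    }

    // Convenience wrapper: true if the input passes both layers.
    static boolean isAccepted(File baseDir, String userInput) {
        try {
            resolve(baseDir, userInput);
            return true;
        } catch (IOException | SecurityException e) {
            return false;
        }
    }

    // Convenience wrapper: true if the input passes the containment check alone.
    static boolean isInside(File baseDir, String userInput) {
        try {
            resolveInside(baseDir, userInput);
            return true;
        } catch (IOException | SecurityException e) {
            return false;
        }
    }
}
```

Comparing Path objects instead of raw strings is a deliberate choice in this sketch: a plain String.startsWith check on /var/www/images would also accept /var/www/images_old/218.png, because the string prefix matches even though the directory differs.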