Path Traversal

This article explains the core principles, practical impact, and remediation of the path traversal (also known as directory traversal) vulnerability. The vulnerability arises when an application does not strictly validate file path parameters supplied by users. Attackers can then use traversal sequences such as ../ to escape the intended directory and read arbitrary sensitive files on the server, such as password files and source code. Under specific conditions, for example when the application can also write files, it can lead to Remote Code Execution (RCE). The key defense is to avoid passing user input directly to file APIs and, where that is unavoidable, to combine whitelist input validation with path canonicalization as a two-layer check.

Type: PortSwigger Required Reading: No

1. 📌 Topic Summary

This document explores the core mechanism and defense strategies of the path traversal (also known as directory traversal) vulnerability. It uses 6 practical labs to show how each of several defenses (absolute path blocking, non-recursive filtering, URL decoding, prefix/suffix validation) can be bypassed.

2. 🧠 Core Principle

Underlying Mechanism: When a web application directly concatenates user-provided input (such as a filename) into the server’s file path and passes it to the underlying file system operations without strict security verification, it can lead to a Path Traversal vulnerability. Attackers exploit the operating system’s directory parsing rules by inputting special traversal sequences (like ../ in Unix/Linux or ..\ in Windows), causing the parsed path to ‘jump out’ of the application’s intended base directory, thereby accessing the root directory of the file system and other arbitrary files.
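
The mechanism above can be sketched in a few lines of Java. This is a minimal illustration, not code from any real application: the class name `NaiveJoinDemo`, the method `resolveNaively`, and the base directory `/var/www/images` are assumptions made for the example, and the lexical `normalize()` call only approximates what the operating system does when it opens the path (it ignores symbolic links).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch only: class and method names are hypothetical.
class NaiveJoinDemo {
    // Assumed base directory for the example.
    static final String BASE_DIRECTORY = "/var/www/images";

    // Joins the base directory with untrusted input, then collapses "." and ".."
    // lexically to show where the joined path ends up.
    static String resolveNaively(String userInput) {
        Path joined = Paths.get(BASE_DIRECTORY, userInput);
        return joined.normalize().toString();
    }

    public static void main(String[] args) {
        System.out.println(resolveNaively("218.png"));
        System.out.println(resolveNaively("../../../etc/passwd"));
    }
}
```

On a Unix-like system the first call stays inside the base directory, while the second resolves to /etc/passwd, because each ../ cancels one component of the base path.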

Terminology Standards:

Path Traversal / Directory Traversal - A vulnerability that lets an attacker access files outside the directory the application intends to expose.
API - Application Programming Interface. Here it refers to the underlying functions for reading and writing files provided by the operating system or language runtime.
URL - Uniform Resource Locator.
PoC - Proof of Concept (code or a payload that demonstrates the vulnerability).
RCE - Remote Code Execution (AI supplementary note: the attacker exploits the vulnerability to execute arbitrary system commands on the target server; this is usually the ultimate impact of file write or file inclusion vulnerabilities).

3. 🛠️ Usage & Examples (How to Use)

Application Scenario: Commonly seen in scenarios where resources are dynamically loaded via URL parameters, such as an e-commerce website’s interface for displaying product images: https://insecure-website.com/loadImage?filename=218.png.

Specific Examples & PoC (Combined with Practical Labs): The following summarizes specific payloads for different security defense scenarios, used to read the standard Linux user information file /etc/passwd:

| Lab Case | Defense Mechanism | Payload | Bypass Principle |
| --- | --- | --- | --- |
| Basic Scenario | No defensive measures. | `../../../etc/passwd` | Uses consecutive `../` sequences to step back up to the filesystem root. |
| Absolute Path Bypass | Blocks the `../` sequence but treats the input as relative to a default working directory. | `/etc/passwd` | Supplies the absolute path of the target file directly, so no traversal sequence is needed. |
| Non-Recursive Filtering | The application strips `../` in a single pass only. | `....//....//....//etc/passwd` | Uses nested (doubled) sequences. After the inner `../` is removed, the remaining outer characters reassemble into a valid `../`. |
| Superfluous URL Decoding | Blocks standard traversal sequences but performs an additional URL decode after validation. | `..%252f..%252f..%252fetc/passwd` | Double URL encoding. `%25` decodes to `%`, which combines with `2f` to form `%2f`; the application/server then decodes that again to `/`. |
| Path Starting Point Validation | Requires the parameter to start with the expected base folder path. | `/var/www/images/../../../etc/passwd` | Begins with the expected directory to satisfy the check, then follows it with `../` sequences to step out. |
| File Extension Validation | Requires the parameter to end with an expected extension (e.g., `.png`). | `../../../etc/passwd%00.png` | Null byte bypass. `%00` is the URL-encoded null character. The application-level extension check passes, but the underlying C/C++ filesystem API treats the null character as the end of the string and ignores the trailing `.png`. |
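
Three of the bypasses in the table can be reproduced with plain string handling. The sketch below uses deliberately weak filters that are hypothetical stand-ins for the lab defenses (the class `WeakFilterDemo` and its method names are assumptions, not code from the labs), and it assumes Java 10 or later for `URLDecoder.decode(String, Charset)`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: these weak filters are hypothetical stand-ins
// for the defenses in the table, not code from any real application.
class WeakFilterDemo {
    // Non-recursive filter: removes "../" in a single left-to-right pass.
    static String stripOnce(String input) {
        return input.replace("../", "");
    }

    // Flawed order of operations: decode, check, then decode again before use.
    static String decodeCheckDecode(String raw) {
        // First decode, as a web server would do for a query parameter.
        String once = URLDecoder.decode(raw, StandardCharsets.UTF_8);
        if (once.contains("../")) {
            throw new IllegalArgumentException("traversal sequence blocked");
        }
        // Superfluous second decode, performed after the check.
        return URLDecoder.decode(once, StandardCharsets.UTF_8);
    }

    // Prefix check on the raw string, performed before any normalization.
    static boolean hasExpectedPrefix(String input) {
        return input.startsWith("/var/www/images/");
    }

    public static void main(String[] args) {
        System.out.println(stripOnce("....//....//....//etc/passwd"));
        System.out.println(decodeCheckDecode("..%252f..%252f..%252fetc/passwd"));
        System.out.println(hasExpectedPrefix("/var/www/images/../../../etc/passwd"));
    }
}
```

The first two calls both yield `../../../etc/passwd` despite the filter, and the third payload passes the prefix check even though it points outside the base directory once resolved.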

Code/Function Analysis:

File (Java Class): An abstract representation of file and directory pathnames. For example, new File(BASE_DIRECTORY, userInput) joins a base directory with user input.
getCanonicalPath() (Java Method): Returns the canonical pathname string. This method resolves relative path symbols such as ../ and ./ as well as symbolic links, and returns the true absolute path of the target file. It can throw an IOException, because resolving the path may require file system queries. It is the core function for defending against path traversal.
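
The difference between an absolute path and a canonical path is easy to miss, so here is a minimal sketch. The class `CanonicalPathDemo` and the helper `canonicalOf` are hypothetical names, and the system temp directory is used as the base only to keep the example self-contained.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch only: class and method names are hypothetical.
class CanonicalPathDemo {
    // Returns the canonical form of base + userInput, with ".." and "." resolved.
    static String canonicalOf(File base, String userInput) {
        try {
            return new File(base, userInput).getCanonicalPath();
        } catch (IOException e) {
            // Rethrown unchecked to keep the example short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"));
        File joined = new File(base, "images/../outside.txt");
        // getAbsolutePath() keeps the ".." segment as typed.
        System.out.println(joined.getAbsolutePath());
        // getCanonicalPath() resolves it, so the result can be compared safely.
        System.out.println(canonicalOf(base, "images/../outside.txt"));
    }
}
```

Only the canonical form is suitable for a containment check, since the absolute form still carries the traversal sequence.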

4. ⚠️ Risk & Impact

If this vulnerability is successfully exploited, the consequences for the system can be severe:

Sensitive Information Disclosure: Attackers can read application source code, database credentials, and sensitive configuration files of backend systems (such as Linux’s /etc/passwd or Windows’ win.ini).
Business Data Tampering: If the application is vulnerable on file writes as well as file reads, attackers can modify application data or system configuration files.
Complete System Takeover: (AI supplementary note) Attackers can achieve RCE (Remote Code Execution) and fully control the server by writing SSH keys, overwriting Cron jobs, or uploading WebShells.

5. 🛡️ Defense & Mitigation

The most effective defense strategy is to completely avoid passing user-provided input directly to the underlying file system API. If business logic makes it unavoidable, the following two-layer defense mechanism must be adopted:

Strict Input Validation:
- Best Practice: Use a whitelist mechanism to allow only pre-defined safe filenames.
- Alternative: If a whitelist cannot be used, the input must be validated via regular expressions to ensure it contains only allowed characters (e.g., alphanumeric characters only), and any input containing /, \, or %00 must be completely rejected.
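
Both validation options can be sketched as follows. The class `FilenameValidator`, the whitelist entries, the `.png` extension rule, and the 64-character limit are all assumptions chosen for illustration; a real application would substitute its own file names and rules. `Set.of` requires Java 9 or later.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Illustrative sketch only: the names and rules below are hypothetical.
class FilenameValidator {
    // Option 1: whitelist of the exact file names the application will serve.
    static final Set<String> ALLOWED = Set.of("218.png", "219.png", "logo.png");

    // Option 2: letters and digits only, followed by one fixed extension.
    // Slashes, backslashes, percent signs, and null characters cannot match.
    static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9]{1,64}\\.png");

    static boolean isWhitelisted(String name) {
        return name != null && ALLOWED.contains(name);
    }

    static boolean matchesSafePattern(String name) {
        // matches() requires the whole input to match, not just a substring.
        return name != null && SAFE_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWhitelisted("218.png"));
        System.out.println(matchesSafePattern("../../../etc/passwd%00.png"));
    }
}
```

The pattern is applied with `matches()` rather than `find()` so that a payload cannot pass by merely containing a valid-looking fragment.
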
Path Canonicalization and Base Directory Verification:
- Do not write your own logic to filter ../ (it is easily bypassed by the techniques shown in the labs above).
- Use the standard file system API provided by the platform to ‘canonicalize’ the path (resolve all traversal symbols), then verify that the canonicalized absolute path still starts with the expected base directory.
- Java Fix Example:

// 1. Canonicalize the base directory, then join it with the user input (java.io.File)
File base = new File(BASE_DIRECTORY).getCanonicalFile();
File file = new File(base, userInput).getCanonicalFile(); // may throw IOException
// 2. Compare whole path components; a plain String.startsWith would also accept
//    a sibling directory such as /var/www/images-backup
if (file.toPath().startsWith(base.toPath())) {
    // process file (the path is inside the base directory)
} else {
    // reject the request, log security event
}
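
The canonicalize-then-verify step can also be wrapped in a reusable helper. This is a minimal sketch under stated assumptions: the class `SafeFileResolver` and the method `resolveInside` are hypothetical names, and the helper compares `java.nio.file.Path` objects rather than strings so that a sibling directory whose name merely begins with the base name (for example `images-backup` next to `images`) is not accepted.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;

// Illustrative sketch only: class and method names are hypothetical.
class SafeFileResolver {
    // Resolves userInput under baseDir and returns the canonical file,
    // or throws SecurityException if the result lies outside baseDir.
    static File resolveInside(File baseDir, String userInput) {
        try {
            Path base = baseDir.getCanonicalFile().toPath();
            Path target = new File(baseDir, userInput).getCanonicalFile().toPath();
            // Path.startsWith compares whole name elements, not characters.
            if (!target.startsWith(base)) {
                throw new SecurityException("path escapes base directory");
            }
            return target.toFile();
        } catch (IOException e) {
            // Canonicalization failed; treat it as an error rather than guessing.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"), "images");
        System.out.println(resolveInside(base, "218.png"));
        try {
            resolveInside(base, "../../../etc/passwd");
        } catch (SecurityException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Throwing an exception on failure, instead of returning a boolean, makes it harder for a caller to forget the check and open the file anyway.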

Principle of Least Privilege (AI Supplementary Note):

Ensure that the service account running the web application (e.g., www-data) has only read access to the directories it needs (e.g., /var/www/images/), and do not grant it access to system-level directories (e.g., /etc/) beyond what the application strictly requires.
