Runc Container Escape Vulnerabilities (CVE-2025-31133 et al.) In-Depth Analysis

1. Executive Summary
In November 2025, the security community disclosed three critical vulnerabilities affecting runc: CVE-2025-31133, CVE-2025-52565, and CVE-2025-52881. Together they form a serious vulnerability cluster whose common core is the abuse of runc’s privileged operations on the Linux /proc filesystem (procfs) during container lifecycle management.
The three vulnerabilities stem from different flaws in runc, but exploitation has the same end goal: abusing the runc process, which runs as root on the host, to write data to sensitive “gadgets” in the host’s /proc filesystem. Successful exploitation can have catastrophic consequences, including complete container escape (root-privileged remote code execution on the host) or host Denial of Service (DoS) by triggering a kernel panic.
2. The Common Attack Vector: /proc
To deeply understand the severity of these vulnerabilities, we must first deconstruct their common attack target: the Linux /proc filesystem (procfs).
procfs is not a real filesystem stored on disk. It is a pseudo-filesystem that serves as a dynamic interface to Linux kernel data structures and runtime configuration parameters. Write operations to specific “files” in procfs are actually calls to kernel functions or modifications to the kernel’s live state.
During the standard container creation process, the runc process runs with full root privileges in the host namespace before the container namespace setup is complete. This means the runc process has unrestricted read/write access to the host’s procfs. The common core of these three vulnerabilities is hijacking this privileged access of the runc process, tricking it into writing data to attacker-chosen procfs paths.
Core Attack “Gadgets”
Attackers primarily target two high-value procfs write targets, known as “gadgets”:
- Gadget 1: /proc/sys/kernel/core_pattern (Achieving RCE)
  - Mechanism: This file controls how the Linux kernel handles core dumps when a process crashes. If the file’s content starts with a pipe character (|), the kernel interprets the rest of the string as the path to an executable program. When a process crashes, the kernel executes this program with root privileges (core dump handling is performed by the kernel and does not belong to any user namespace), passing the core dump data on standard input.
  - Exploitation: The attacker overwrites this file with content like |/path/to/malicious/script (where the script is located on a mount point accessible to the container), then triggers any process crash within the container to achieve arbitrary code execution (RCE) with root privileges on the host.
- Gadget 2: /proc/sysrq-trigger (Achieving DoS)
  - Mechanism: Writing specific characters to this file invokes the corresponding SysRq action. Writing c, for example, immediately crashes the kernel (a panic, usually followed by a reboot or a hang depending on configuration), so the file can be abused for DoS attacks.
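To make the pipe rule for Gadget 1 concrete, the following is a minimal, read-only sketch that classifies a core_pattern value. The function names parse_core_pattern and read_core_pattern are illustrative and not part of any tool; nothing here writes to procfs.

```python
def parse_core_pattern(value: str):
    """Return (is_pipe_handler, handler_path_or_None) for a core_pattern string."""
    text = value.rstrip("\n")
    if text.startswith("|"):
        # Everything after '|' is treated as a command line;
        # the first whitespace-separated token is the program path.
        parts = text[1:].split()
        return True, (parts[0] if parts else None)
    return False, None


def read_core_pattern(path: str = "/proc/sys/kernel/core_pattern"):
    """Read-only lookup of the current value; returns None if unavailable."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return None
```

On a host, calling parse_core_pattern(read_core_pattern() or "") shows whether a crash currently hands the core dump to a helper program, and which one. An unexpected handler path is a sign that the file may have been tampered with.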
Exploitation Prerequisites
All three attacks rely on a key prerequisite: the attacker must be able to trick runc into launching a container with custom mount configuration. While this sounds like a high-privilege operation, the security advisory explicitly notes that such configuration can be achieved not only through malicious Kubernetes Pod definitions or docker run commands, but also through RUN --mount=... directives in Dockerfiles.
It is currently speculated that this is reachable through docker buildx build: a RUN --mount=... directive first drives runc into the maskedPaths logic, and the build itself is then used to win the race condition.
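Because RUN --mount=... in a Dockerfile is one of the listed entry points, a build service may want to flag such directives for review. The sketch below is a simple heuristic scanner, not a complete Dockerfile parser; find_run_mounts is a hypothetical helper name, and it will also match --mount= text that appears inside the shell command of a RUN line.

```python
import re

MOUNT_FLAG = re.compile(r"--mount=(\S+)")


def find_run_mounts(dockerfile_text: str):
    """Return (starting_line_number, mount_spec) for RUN instructions using --mount=."""
    findings = []
    logical = ""   # one instruction, with backslash continuations joined
    start = 0
    for number, raw in enumerate(dockerfile_text.splitlines(), start=1):
        if not logical:
            start = number
        stripped = raw.rstrip()
        if stripped.endswith("\\"):
            logical += stripped[:-1] + " "
            continue
        logical += stripped
        if logical.lstrip().upper().startswith("RUN "):
            for spec in MOUNT_FLAG.findall(logical):
                findings.append((start, spec))
        logical = ""
    return findings
```

A finding is not proof of malicious intent, since build mounts are a common and legitimate feature. It only marks the Dockerfiles that exercise custom mount configuration during a build.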
3. Vulnerability Analysis (1): CVE-2025-31133
- CVSS Score: 7.3 (High) (CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:A/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H)
- Affected Versions: All known versions of runc
Root Cause
runc provides a security feature called maskedPaths, designed to “mask” certain highly sensitive host procfs or sysfs files (such as /proc/kcore, which provides access to all physical memory) within the container, making them unreadable and unwritable from inside the container.
The Intended (Normal) Flow of runc
The purpose of runc’s maskedPaths feature is to make certain procfs paths (e.g., /proc/kcore) “invisible” or “harmless” inside the container. The developer’s approach was:
- Goal: Prevent processes inside the container from reading /proc/kcore.
- Technique: Use the mount(2) system call to “overlay” a harmless, empty device (i.e., /dev/null) on top of the /proc/kcore path inside the container.
- Expected Result: Processes inside the container (e.g., cat /proc/kcore) actually access /dev/null, thus reading nothing, achieving the “masking” effect.
The technical approach runc uses to implement “masking” has a flaw: it bind-mounts the inode of the container’s internal /dev/null device onto the target maskedPaths path (e.g., /container/rootfs/proc/kcore), and that /dev/null lives in a filesystem the attacker can influence.
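The expected result of masking can be checked from inside a container: a masked file should look like the null device, which on Linux is character device major 1, minor 3. The sketch below assumes Linux and file-type masked paths only. The names is_dev_null and unmasked_paths are illustrative.

```python
import os
import stat


def is_dev_null(path: str) -> bool:
    """True if path, without following a final symlink, is char device 1:3."""
    try:
        info = os.lstat(path)
    except OSError:
        return False
    if not stat.S_ISCHR(info.st_mode):
        return False
    return (os.major(info.st_rdev), os.minor(info.st_rdev)) == (1, 3)


def unmasked_paths(paths):
    """Return the existing paths that do NOT look like a /dev/null bind mount."""
    return [p for p in paths if os.path.lexists(p) and not is_dev_null(p)]
```

For example, unmasked_paths(["/proc/kcore"]) returning a non-empty list inside a container would suggest that the masking for that path did not take effect as intended.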
Exploitation Method
This vulnerability is a classic Time-of-Check-to-Time-of-Use (TOCTOU) race condition.
1. Step 1 (Preparation): The attacker continuously monitors the state of the /dev/null file inside their container (or through a parallel container).
2. Step 2 (Race): In the tiny time window between when the runc process checks the container’s /dev/null (confirming it exists and is a device file) and when the runc process actually executes the mount(2) system call, the attacker quickly replaces the container’s /dev/null with a symbolic link (symlink) pointing to a malicious target. For example:
   unlink("/dev/null")
   symlink("/proc/sys/kernel/core_pattern", "/dev/null")
3. Step 3 (Trigger): The runc process continues executing mount("/dev/null", "/container/rootfs/proc/kcore",...).
4. Step 4 (Result): When the mount(2) system call is passed a symbolic link as its source, the Linux kernel will automatically dereference the symbolic link. Therefore, what the runc process actually executes (at the kernel level) is mount("/proc/sys/kernel/core_pattern", "/container/rootfs/proc/kcore",...).
5. Final Impact: Even worse, runc believes it’s mounting /dev/null, so it sets this mount as read-write. The attacker thus gains a writable mount point to the host’s /proc/sys/kernel/core_pattern file inside the container (at /proc/kcore). The attacker then writes an RCE payload to this file (e.g., |/path/to/shell), triggers a process crash, and completes the container escape.
Note:
Researchers also discovered a second exploitation variant of this vulnerability: if the attacker deletes /dev/null during the race window, runc’s code logic intentionally ignores this error and continues execution, causing the maskedPath operation to fail completely (becoming a no-op). While this doesn’t lead to RCE, it causes sensitive files like /proc/kcore or /proc/timer_list to no longer be masked, resulting in potential information disclosure.
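The check-then-use pattern behind this race can be reproduced harmlessly in a temporary directory. The sketch below is a deterministic simulation, not runc code: ordinary files stand in for /dev/null and core_pattern, and a callback stands in for the attacker winning the race. The safer variant shows one common mitigation (opening with O_NOFOLLOW and working on the descriptor), which is not necessarily what the runc patches do.

```python
import os
import tempfile


def racy_read(path, attacker_step=None):
    """Check the path by name, then open it by name: the TOCTOU shape."""
    if os.path.islink(path):            # time of check
        raise ValueError("refusing symlink")
    if attacker_step:
        attacker_step()                 # attacker wins the race here
    with open(path) as handle:          # time of use: follows symlinks
        return handle.read()


def safer_read(path, attacker_step=None):
    """Open with O_NOFOLLOW so a swapped-in symlink makes the open fail."""
    if attacker_step:
        attacker_step()
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd) as handle:
        return handle.read()


def demo():
    with tempfile.TemporaryDirectory() as root:
        harmless = os.path.join(root, "null")           # stand-in for /dev/null
        sensitive = os.path.join(root, "core_pattern")  # stand-in for the gadget
        open(harmless, "w").close()
        with open(sensitive, "w") as handle:
            handle.write("sensitive")

        def swap():
            os.unlink(harmless)
            os.symlink(sensitive, harmless)

        leaked = racy_read(harmless, swap)

        # Restore the harmless file, then repeat the swap against the safer variant.
        os.unlink(harmless)
        open(harmless, "w").close()
        try:
            safer_read(harmless, swap)
            blocked = False
        except OSError:
            blocked = True
        return leaked, blocked
```

Calling demo() returns the content reached through the swapped symlink together with a flag showing whether the O_NOFOLLOW variant refused the same swap.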
4. Vulnerability Analysis (2): CVE-2025-52565
- CVSS Score: 7.3 (High) (CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:A/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H)
- Affected Versions: runc >= 1.0.0-rc3
Root Cause: Logic Flaw in /dev/console Bind Mount
When creating a container, runc needs to set up a console (/dev/console) for it. It implements this as follows: runc requests a new PTY (pseudo-terminal) device on the host (e.g., /dev/pts/5), then bind-mounts this host PTY device to the /dev/console path inside the container namespace.
Exploitation Method
This vulnerability is almost identical to CVE-2025-31133 in concept and application, also exploiting a TOCTOU race condition.
1. Step 1 (Preparation): The attacker needs to predict or (through a parallel container) discover the PTY path that runc will use on the host (e.g., /dev/pts/5).
2. Step 2 (Race): In the time window between when runc creates the PTY device and when runc executes mount(2) to bind it to the container’s /dev/console, the attacker replaces the host’s PTY path (/dev/pts/5) with a symbolic link pointing to a malicious “gadget”. For example:
   unlink("/dev/pts/5")
   symlink("/proc/sysrq-trigger", "/dev/pts/5")
3. Step 3 (Trigger): The runc process executes mount("/dev/pts/5", "/container/rootfs/dev/console",...).
4. Step 4 (Result): The kernel dereferences the symbolic link. What the runc process actually executes (at the kernel level) is mount("/proc/sysrq-trigger", "/container/rootfs/dev/console",...).
5. Final Impact: The attacker gains write access to the host’s /proc/sysrq-trigger file inside the container (by writing to /dev/console). The attacker simply executes echo c > /dev/console inside the container to trigger a host kernel panic, achieving a powerful DoS attack.
This vulnerability has a timing advantage over CVE-2025-31133: the /dev/console bind mount operation occurs before runc applies maskedPaths and readonlyPaths. This means even if runc is configured to explicitly list /proc/sysrq-trigger as “read-only” or “masked”, this vulnerability can still succeed because the malicious mount is completed before these security measures take effect.
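One way to reason about this class of bug is that a path can change meaning after the object behind it was created, while an open file descriptor cannot. The sketch below compares the inode behind a descriptor with whatever the path currently names. It uses temporary files as stand-ins for the PTY and the gadget; path_still_matches is a hypothetical helper, and the comparison alone is still racy if the path is used again afterwards, so it is a detection aid rather than a fix.

```python
import os
import tempfile


def path_still_matches(fd: int, path: str) -> bool:
    """True if path (final symlink not followed) is the inode that fd refers to."""
    try:
        by_path = os.lstat(path)
    except OSError:
        return False
    by_fd = os.fstat(fd)
    return (by_fd.st_dev, by_fd.st_ino) == (by_path.st_dev, by_path.st_ino)


def demo():
    with tempfile.TemporaryDirectory() as root:
        pty_path = os.path.join(root, "pts5")          # stand-in for /dev/pts/5
        gadget = os.path.join(root, "sysrq-trigger")   # stand-in for the gadget
        for name in (pty_path, gadget):
            open(name, "w").close()

        fd = os.open(pty_path, os.O_RDONLY)            # descriptor held since creation
        try:
            before = path_still_matches(fd, pty_path)
            os.unlink(pty_path)                        # attacker replaces the path
            os.symlink(gadget, pty_path)
            after = path_still_matches(fd, pty_path)
        finally:
            os.close(fd)
        return before, after
```

In demo(), the descriptor keeps pointing at the original file while the path is redirected, which is why operating on descriptors rather than path names is generally the more robust design.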
5. Vulnerability Analysis (3): CVE-2025-52881
Root Cause: Incomplete Fix for CVE-2019-16884
To understand CVE-2025-52881, we must trace back to CVE-2019-16884.
- Historical Background (CVE-2019-16884): Attackers could trick runc into writing the LSM label (intended for /proc/self/attr/current to apply AppArmor or SELinux policies) to an attacker-controlled fake procfs file on tmpfs. This caused the LSM policy to not be applied at all to the container process.
- The Incomplete Fix: The fix for CVE-2019-16884 was very limited. It simply added a check to “verify that the target runc writes the LSM label to is indeed a procfs file.”
- CVE-2025-52881 Exploitation: Instead of using a fake file on tmpfs, attackers use shared mounts and race conditions to redirect runc’s write to another real, but harmless procfs file, such as /proc/self/sched (process scheduler information).
- Result: runc prepares to write the LSM policy. It checks the target (now redirected to /proc/self/sched). runc asks: “Is this a real file on procfs?”. The answer is “yes”. The check passes. runc then writes the LSM policy data (e.g., AppArmor profile name) to /proc/self/sched. The kernel accepts this write (because it’s a valid procfs file), but this is a no-op for LSM policy application. The LSM policy is discarded and never applied.
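The gap in the earlier fix can be modeled in a few lines. The sketch below is a string-level model only: the real check inspects the filesystem type rather than a path prefix, and weak_check and strict_check are hypothetical names. It contrasts asking "is this on procfs?" with asking "is this the file that was meant?".

```python
import posixpath


def weak_check(resolved_target: str) -> bool:
    """Models a check that only asks whether the target is somewhere under /proc."""
    return posixpath.normpath(resolved_target).startswith("/proc/")


def strict_check(resolved_target: str, intended: str) -> bool:
    """Additionally requires the write to land on the intended file."""
    return posixpath.normpath(resolved_target) == posixpath.normpath(intended)
```

With intended set to /proc/self/attr/current, a write redirected to /proc/self/sched passes weak_check but fails strict_check, which mirrors why the redirected LSM label write was accepted.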
Exploitation Method
This vulnerability has two powerful exploitation modes:
- Impact 1: LSM Bypass
  - Exploitation: The attacker uses the method described above to redirect the write that runc intends for /proc/self/attr/current to a harmless procfs file.
  - Result: runc believes it successfully applied the AppArmor or SELinux profile, but in reality, the profile was discarded. The container process will start in a completely unrestricted state (i.e., unconfined). This makes it a perfect “unlocker” for chaining with other vulnerabilities (like CVE-2025-31133 and CVE-2025-52565).
- Impact 2: Direct Container Escape/DoS
  - Exploitation: This “write redirection” vulnerability is not limited to LSM labels. It applies to all runc write operations to /proc, including sysctl parameters (writes to /proc/sys/...).
  - Result: Attackers can hijack a harmless sysctl parameter that runc writes (e.g., net.ipv4.ip_local_port_range) and redirect the write to a malicious procfs “gadget”, such as /proc/sys/kernel/core_pattern (achieving RCE) or /proc/sysrq-trigger (achieving DoS).
This makes CVE-2025-52881 itself an extremely powerful vulnerability capable of independently achieving container escape or DoS attacks.
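To illustrate why a harmless-looking sysctl write matters, the sketch below maps a sysctl name to its /proc/sys path and flags resolved paths that match the two gadgets discussed in this article. The helper names are illustrative, and the mapping is simplified: it ignores sysctl keys whose components contain dots, such as some network interface names.

```python
import posixpath

# The two procfs write targets described as gadgets in this article.
GADGETS = {"/proc/sys/kernel/core_pattern", "/proc/sysrq-trigger"}


def sysctl_to_path(name: str) -> str:
    """Map a dotted sysctl name to its /proc/sys path (simplified)."""
    return "/proc/sys/" + name.replace(".", "/")


def is_gadget(resolved_path: str) -> bool:
    """True if the normalized path is one of the known gadget files."""
    return posixpath.normpath(resolved_path) in GADGETS
```

The intended path for net.ipv4.ip_local_port_range is not a gadget, so the danger comes entirely from where the write actually lands after redirection, not from the sysctl that was requested.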
Conclusion
CVE-2025-31133, CVE-2025-52565, and CVE-2025-52881 together reveal a profound, systemic security flaw in modern container runtimes. Their core is the combination of runc’s privileged operations on procfs during container environment setup with race conditions and symbolic link abuse.
The Biggest Warning: LSM Failure
The most significant lesson from this incident is that widely trusted Linux Security Modules (LSM) such as AppArmor and SELinux are fragile: they can be completely bypassed by well-crafted attacks that target the runtime logic responsible for applying them (as in CVE-2025-52881).
Furthermore, SELinux’s ineffectiveness against CVE-2025-31133 (due to relabeling) and AppArmor’s ineffectiveness against CVE-2025-52565 (due to permissive default policies) are equally alarming. Organizations can no longer treat LSM as a “silver bullet” for container security.
Patch Analysis and Remediation Strategy
The OpenContainers project has released three patched runc versions to address all three vulnerabilities:
- runc v1.4.0-rc.3
- runc v1.3.3
- runc v1.2.8
Since these vulnerabilities can be exploited during container image builds (docker build) in CI/CD pipelines, services that offer image building need special attention.
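For fleet audits, the patched versions listed above can be turned into a small version check. The sketch below assumes plain version strings of the form X.Y.Z or X.Y.Z-rc.N (optionally prefixed with v). It also assumes that release lines newer than those listed contain the fix and that older lines do not, which should be confirmed against the official advisory. The names parse_version and is_patched are illustrative.

```python
import re

VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-rc\.?(\d+))?$")


def parse_version(text: str):
    """Parse a version string into a tuple that sorts release candidates first."""
    match = VERSION_RE.match(text.strip())
    if not match:
        raise ValueError(f"unrecognized version: {text!r}")
    major, minor, patch, rc = match.groups()
    # A final release sorts after every release candidate of the same X.Y.Z.
    pre = (1, 0) if rc is None else (0, int(rc))
    return (int(major), int(minor), int(patch)) + pre


# First fixed version per release line, taken from the list in this article.
FIRST_FIXED = {
    (1, 2): parse_version("1.2.8"),
    (1, 3): parse_version("1.3.3"),
    (1, 4): parse_version("1.4.0-rc.3"),
}


def is_patched(text: str) -> bool:
    version = parse_version(text)
    line = version[:2]
    if line in FIRST_FIXED:
        return version >= FIRST_FIXED[line]
    # Assumption: newer lines include the fix, older lines do not.
    return line > max(FIRST_FIXED)
```

Distribution packages sometimes backport fixes without changing the upstream version number, so a negative result from is_patched is a prompt to check the vendor advisory rather than a final verdict.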