runc Container Escape Vulnerabilities (CVE-2025-31133 et al.): An In-Depth Analysis


1. Executive Summary

In November 2025, three high-severity vulnerabilities affecting runc were disclosed: CVE-2025-31133, CVE-2025-52565, and CVE-2025-52881. They form a closely related cluster. All three exploit the privileged operations that runc performs on the Linux /proc filesystem (procfs) while managing the container lifecycle.

Each vulnerability stems from a different flaw in runc, but all three share the same goal: abusing the runc process, which runs as root on the host, to write data to sensitive procfs “gadgets”. Successful exploitation can lead to a complete container escape (arbitrary code execution as root on the host) or a host denial of service (DoS) caused by a kernel panic.

2. The Common Attack Vector: /proc

To understand the severity of these vulnerabilities, we must first examine their common target: the Linux /proc filesystem (procfs).

procfs is not stored on disk. It is a pseudo-filesystem that serves as a live interface to kernel data structures and runtime configuration parameters. Writing to certain procfs “files” invokes kernel handlers that directly change the running kernel’s state.

During container creation, runc performs its setup work with full root privileges, before the container’s own restrictions (such as dropped capabilities and LSM profiles) take effect. At that point runc can write to procfs entries that are global to the host, such as /proc/sys/kernel/core_pattern. The common core of these three vulnerabilities is hijacking this privileged access, tricking runc into writing data to attacker-chosen procfs paths.

Core Attack “Gadgets”

Attackers primarily target two high-value procfs write targets, known as “gadgets”:

  1. Gadget 1: /proc/sys/kernel/core_pattern (Achieving RCE)
  • Mechanism: This file controls how the kernel handles core dumps when a process crashes. If its content starts with a pipe character (|), the kernel treats the rest of the line as a user-space helper program plus optional arguments. When any process crashes, the kernel runs this helper as root in the host’s initial namespaces (not inside the crashing process’s container) and passes the core dump on its standard input.
  • Exploitation: The attacker overwrites this file with content such as |/path/to/malicious/script, where the script sits at a host path the attacker can write to (for example, a file inside the container’s root filesystem, addressed by its path on the host). The attacker then crashes any process inside the container to obtain arbitrary code execution as root on the host.
  2. Gadget 2: /proc/sysrq-trigger (Achieving DoS)
  • Mechanism: Writing certain characters to this file triggers kernel actions immediately. Writing c, for example, deliberately crashes the kernel (a kernel panic), which takes the host down and, depending on its configuration, reboots it. This makes the file an effective DoS gadget.
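To make the core_pattern mechanism concrete, the following Python sketch models how a core_pattern value is interpreted and flags an unexpected pipe helper during a read-only audit. It is a simplification under stated assumptions: % specifiers are left unexpanded, arguments are split on whitespace, and the allowlist passed to is_unexpected_pipe is a hypothetical site policy, not a kernel feature.

```python
from __future__ import annotations


def parse_core_pattern(pattern: str):
    """Return ("pipe", argv) for a pipe helper, or ("file", template) otherwise.

    Simplified model: % specifiers are not expanded and arguments are split
    on whitespace; the kernel's own parser handles more cases.
    """
    value = pattern.rstrip("\n")
    if value.startswith("|"):
        # Pipe form: the kernel runs argv[0] as root in the host's initial
        # namespaces and writes the core dump to its standard input.
        return ("pipe", value[1:].split())
    # File form: the core dump is written to a file named by this template.
    return ("file", value)


def is_unexpected_pipe(pattern: str, allowed_helpers: set[str]) -> bool:
    """Flag a pipe helper that is not on a (hypothetical) allowlist."""
    kind, value = parse_core_pattern(pattern)
    return kind == "pipe" and (not value or value[0] not in allowed_helpers)
```

On a host, the current value could be read from /proc/sys/kernel/core_pattern and passed in, for example is_unexpected_pipe(current, {"/usr/lib/systemd/systemd-coredump"}); the helper path there is only an example of a common handler.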

Exploitation Prerequisites

All three attacks rely on a key prerequisite: the attacker must be able to trick runc into launching a container with custom mount configuration. While this sounds like a high-privilege operation, the security advisory explicitly notes that such configuration can be achieved not only through malicious Kubernetes Pod definitions or docker run commands, but also through RUN --mount=... directives in Dockerfiles.

It has been speculated that docker buildx build could serve as a vector: a RUN --mount=... directive would steer runc into the maskedPaths logic, and the build would then be used to win the race condition.

3. Vulnerability Analysis (1): CVE-2025-31133

  • CVSS Score: 7.3 (High) (CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:A/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H)
  • Affected Versions: All known versions of runc

Root Cause

runc implements maskedPaths, a setting from the OCI runtime specification that “masks” highly sensitive procfs or sysfs paths inside the container (such as /proc/kcore, which exposes the system’s physical memory), making them unreadable and unwritable from within the container.

The Intended (Normal) Flow of runc

The purpose of runc’s maskedPaths feature is to make certain procfs paths (e.g., /proc/kcore) “invisible” or “harmless” inside the container. The developer’s approach was:

  1. Goal: Prevent processes inside the container from reading /proc/kcore.
  2. Technique: Use the mount(2) system call to overlay a harmless, empty device (/dev/null) on top of the /proc/kcore path inside the container.
  3. Expected Result: Processes inside the container (e.g., cat /proc/kcore) actually access /dev/null, read nothing, and the path is effectively “masked”.

The flaw lies in how runc implements this masking: it bind-mounts the container’s own /dev/null, a path inside the container’s root filesystem, onto the target maskedPaths path (e.g., /container/rootfs/proc/kcore). Because the container’s contents can influence what that /dev/null path refers to, the source of the bind mount is not guaranteed to be the real null device.
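The masking step can be modeled as a list of planned mount operations. This is a simplified sketch, not runc’s code: the function name plan_masks and the tuple format are illustrative assumptions, and no mounts are performed. Files are masked by bind-mounting the container’s own /dev/null, while directories (which a file cannot be bind-mounted over) are masked with a read-only tmpfs, as runc does.

```python
from __future__ import annotations

import os


def plan_masks(rootfs: str, masked_paths: list[str]) -> list[tuple[str, str, str]]:
    """Return (kind, source, target) mount steps for maskedPaths.

    Simplified model, not runc's code: nothing is mounted, and paths that do
    not exist in the rootfs are simply skipped.
    """
    # The bind source is the container's own /dev/null, i.e. a path inside the
    # container rootfs whose contents the container can influence. This is the
    # weakness CVE-2025-31133 targets.
    dev_null = os.path.join(rootfs, "dev/null")
    plan = []
    for path in masked_paths:
        target = os.path.join(rootfs, path.lstrip("/"))
        if os.path.isdir(target):
            # A file cannot be bind-mounted over a directory; use a read-only tmpfs.
            plan.append(("tmpfs-ro", "tmpfs", target))
        elif os.path.lexists(target):
            # Files are covered by bind-mounting /dev/null over them.
            plan.append(("bind", dev_null, target))
    return plan
```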

Exploitation Method

This vulnerability is a classic Time-of-Check-to-Time-of-Use (TOCTOU) race condition.

  1. Step 1 (Preparation): The attacker continuously monitors the state of the /dev/null file inside their container (or does so from a parallel container).
  2. Step 2 (Race): In the brief window between runc checking the container’s /dev/null (confirming that it exists and is a device file) and runc actually calling mount(2), the attacker replaces the container’s /dev/null with a symbolic link (symlink) pointing to a malicious target. For example:
  • unlink("/dev/null")
  • symlink("/proc/sys/kernel/core_pattern", "/dev/null")
  3. Step 3 (Trigger): runc proceeds with mount("/dev/null", "/container/rootfs/proc/kcore", ...).
  4. Step 4 (Result): When the source of a bind mount is a symbolic link, the kernel follows it. What runc actually performs at the kernel level is therefore mount("/proc/sys/kernel/core_pattern", "/container/rootfs/proc/kcore", ...).
  5. Final Impact: Worse, runc believes it is mounting /dev/null, so it sets this mount up as read-write. The attacker thus obtains a writable view of the host’s /proc/sys/kernel/core_pattern inside the container, at /proc/kcore. The attacker then writes an RCE payload to it (e.g., |/path/to/shell), triggers a process crash, and completes the container escape.
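The race in Steps 2 to 4 is an instance of a generic check-then-use pattern. The toy Python example below reproduces it harmlessly inside a temporary directory: a regular file stands in for /dev/null, open() plays the role of mount(2) (both follow symbolic links when resolving a path), and the race_window callback is a hypothetical hook that makes the attacker’s swap deterministic instead of timing-dependent.

```python
import os
import stat
import tempfile
from typing import Callable


def check_then_use(path: str, race_window: Callable[[], None] = lambda: None) -> str:
    """Deliberately racy: validate the path, then resolve it again to use it."""
    st = os.lstat(path)  # time of check: inspects the path itself
    if not stat.S_ISREG(st.st_mode):  # stand-in for runc's "is this /dev/null?" check
        raise ValueError(f"{path} failed the check")
    race_window()  # hypothetical hook standing in for the attacker's concurrent action
    with open(path) as f:  # time of use: path is resolved again, symlinks are followed
        return f.read()


def demo() -> str:
    """Swap the checked file for a symlink inside the window; the use reads the target."""
    with tempfile.TemporaryDirectory() as d:
        decoy = os.path.join(d, "null")
        sensitive = os.path.join(d, "sensitive")
        open(decoy, "w").close()
        with open(sensitive, "w") as f:
            f.write("sensitive contents")

        def swap() -> None:
            os.unlink(decoy)
            os.symlink(sensitive, decoy)

        return check_then_use(decoy, swap)
```

Calling demo() returns "sensitive contents": the check inspected the original regular file, but the later use resolved the path again and followed the substituted symlink.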

Note: Researchers also discovered a second variant of this vulnerability. If the attacker deletes /dev/null during the race window, runc intentionally ignores the resulting error and continues, so the maskedPaths operation silently becomes a no-op. While this does not lead to RCE, sensitive files such as /proc/kcore or /proc/timer_list are then left unmasked, enabling information disclosure.
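One common way to close this kind of window is to resolve the path exactly once, verify the object that was actually opened, and then operate only on that file descriptor. The sketch below is an illustrative hardening pattern, not a description of runc’s actual patch. It opens the path with O_PATH | O_NOFOLLOW, checks that it is a character device with the null device’s numbers (1:3 on Linux), and fails closed, so a missing /dev/null raises an error instead of being ignored.

```python
import os
import stat

# On Linux, /dev/null is the character device with major 1, minor 3.
NULL_MAJOR, NULL_MINOR = 1, 3


def open_verified_dev_null(path: str = "/dev/null") -> int:
    """Open path once and confirm it is the real null device.

    Returns an O_PATH file descriptor on success. A symlink, a regular file,
    or a missing path raises an exception instead of being silently skipped.
    """
    # With O_PATH | O_NOFOLLOW, a final-component symlink yields a descriptor
    # for the link itself, so the type check below rejects it.
    fd = os.open(path, os.O_PATH | os.O_NOFOLLOW | os.O_CLOEXEC)
    try:
        st = os.fstat(fd)  # inspects the object that was actually opened
        if not (stat.S_ISCHR(st.st_mode)
                and os.major(st.st_rdev) == NULL_MAJOR
                and os.minor(st.st_rdev) == NULL_MINOR):
            raise ValueError(f"{path} is not the null device")
    except BaseException:
        os.close(fd)
        raise
    return fd
```

A runtime could then pass /proc/self/fd/<fd> as the bind-mount source so that the kernel uses the verified inode rather than re-resolving /dev/null. O_NOFOLLOW only protects the final path component; intermediate symlinks call for stricter resolution, such as openat2(2) with RESOLVE_NO_SYMLINKS, which this sketch does not use.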

4. Vulnerability Analysis (2): CVE-2025-52565

  • CVSS Score: 7.3 (High) (CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:A/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H)
  • Affected Versions: runc >= 1.0.0-rc3

Root Cause: Logic Flaw in /dev/console Bind Mount

When creating a container, runc must set up its console (/dev/console). It does so by requesting a new PTY (pseudo-terminal) device (e.g., /dev/pts/5) and then bind-mounting this PTY device onto the /dev/console path inside the container.

Exploitation Method

This vulnerability is almost identical to CVE-2025-31133 in concept and application, also exploiting a TOCTOU race condition.

  1. Step 1 (Preparation): The attacker needs to predict or discover (for example, through a parallel container) the PTY path that runc will use (e.g., /dev/pts/5).
  2. Step 2 (Race): In the window between runc creating the PTY device and runc calling mount(2) to bind it onto the container’s /dev/console, the attacker replaces the PTY path (/dev/pts/5) with a symbolic link pointing to a malicious “gadget”. For example:
  • unlink("/dev/pts/5")
  • symlink("/proc/sysrq-trigger", "/dev/pts/5")
  3. Step 3 (Trigger): runc executes mount("/dev/pts/5", "/container/rootfs/dev/console", ...).
  4. Step 4 (Result): The kernel follows the symbolic link, so what runc actually performs at the kernel level is mount("/proc/sysrq-trigger", "/container/rootfs/dev/console", ...).
  5. Final Impact: The attacker gains write access to the host’s /proc/sysrq-trigger from inside the container, through /dev/console. Running echo c > /dev/console inside the container then triggers a host kernel panic, a severe DoS.

This vulnerability has a timing advantage over CVE-2025-31133: the /dev/console bind mount operation occurs before runc applies maskedPaths and readonlyPaths. This means even if runc is configured to explicitly list /proc/sysrq-trigger as “read-only” or “masked”, this vulnerability can still succeed because the malicious mount is completed before these security measures take effect.
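A related hardening idea applies to the /dev/console bind mount: runc already holds a file descriptor for the PTY it allocated, so before bind-mounting /dev/pts/N it could confirm that the path names that very same inode. The sketch below uses a hypothetical helper, open_matching (not runc’s code), that compares device and inode numbers with os.path.samestat and returns the verified O_PATH descriptor for use as the mount source.

```python
import os


def open_matching(path: str, expected_fd: int) -> int:
    """Open path without following a final symlink and require that it is the
    same file as expected_fd. Returns an O_PATH descriptor on success."""
    fd = os.open(path, os.O_PATH | os.O_NOFOLLOW | os.O_CLOEXEC)
    try:
        # samestat compares st_dev and st_ino, i.e. file identity, not names.
        if not os.path.samestat(os.fstat(fd), os.fstat(expected_fd)):
            raise ValueError(f"{path} does not refer to the expected file")
    except BaseException:
        os.close(fd)
        raise
    return fd
```

For a PTY pair from os.openpty(), open_matching(os.ttyname(slave_fd), slave_fd) would succeed, while a symlink planted at that path would be rejected, because O_NOFOLLOW makes the descriptor refer to the link itself.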

5. Vulnerability Analysis (3): CVE-2025-52881

Root Cause: Incomplete Fix for CVE-2019-16884

To understand CVE-2025-52881, we must trace back to CVE-2019-16884.

  • Historical Background (CVE-2019-16884): Attackers could trick runc into writing the LSM label (intended for /proc/self/attr/current, where it applies the AppArmor or SELinux policy) to a fake, attacker-controlled procfs file on tmpfs. As a result, the LSM policy was never applied to the container process.
  • The Incomplete Fix: The fix for CVE-2019-16884 was narrow. It only added a check that the file runc writes the LSM label to resides on procfs.
  • CVE-2025-52881 Exploitation: Instead of a fake file on tmpfs, attackers use shared mounts and race conditions to redirect runc’s write to a different, real but harmless procfs file, such as /proc/self/sched (process scheduler information).
  • Result: runc prepares to write the LSM label and checks its target, which has been redirected to /proc/self/sched. runc asks, “Is this file on procfs?” The answer is yes, so the check passes. runc then writes the LSM label (e.g., the AppArmor profile name) to /proc/self/sched. The write succeeds, but it has no effect on LSM policy: the label is discarded and never applied.
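The weakness of a “does this file live on procfs?” check can be illustrated directly: ask fstatfs(2) whether an open file’s filesystem type is procfs (PROC_SUPER_MAGIC, 0x9FA0). The Python sketch below is not runc’s code. It calls the C library through ctypes and assumes glibc on a 64-bit platform such as x86_64 or aarch64, where f_type is the first field of struct statfs and has the size of a C long.

```python
import ctypes
import ctypes.util
import os

PROC_SUPER_MAGIC = 0x9FA0  # procfs magic number from <linux/magic.h>

# Assumption: glibc on a 64-bit platform, where struct statfs starts with
# f_type stored as a C long. CDLL(None) falls back to the process's own symbols.
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)


def fs_magic(fd: int) -> int:
    """Return the filesystem type (f_type) of an open file descriptor."""
    buf = ctypes.create_string_buffer(256)  # larger than struct statfs on common ABIs
    if _libc.fstatfs(fd, buf) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ctypes.c_long.from_buffer(buf).value


def is_on_procfs(path: str) -> bool:
    """Coarse check: is the file on *some* procfs? It cannot tell which file."""
    fd = os.open(path, os.O_RDONLY | os.O_CLOEXEC)
    try:
        return fs_magic(fd) == PROC_SUPER_MAGIC
    finally:
        os.close(fd)
```

Both is_on_procfs("/proc/self/status") and is_on_procfs("/proc/self/stat") return True. The check confirms which filesystem a file is on, not which file it is, and that is exactly the gap that redirecting the write to a different procfs file exploits.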

Exploitation Method

This vulnerability has two powerful exploitation modes:

  1. Impact 1: LSM Bypass
  • Exploitation: The attacker uses the method described above to redirect the write that runc intends for /proc/self/attr/current to a harmless procfs file such as /proc/self/sched.
  • Result: runc believes it has applied the AppArmor or SELinux profile, but the profile was silently discarded, so the container process starts unconfined. This makes the flaw an ideal “unlocker” to chain with other vulnerabilities, such as CVE-2025-31133 and CVE-2025-52565.
  2. Impact 2: Direct Container Escape/DoS
  • Exploitation: The write redirection is not limited to LSM labels. It applies to all of runc’s writes to /proc, including sysctl parameters (writes under /proc/sys/...).
  • Result: Attackers can hijack a harmless sysctl write requested in the container configuration (e.g., setting net.ipv4.ip_local_port_range) and redirect it to a malicious procfs gadget: /proc/sys/kernel/core_pattern (achieving RCE) or /proc/sysrq-trigger (achieving DoS).
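Sysctl writes follow a simple naming convention: a dotted name from the container configuration is turned into a path under /proc/sys. The sketch below uses a hypothetical helper, simplified in that it rejects names containing “/” instead of translating them the way sysctl(8) does, to show that mapping along with basic name validation.

```python
def sysctl_proc_path(name: str) -> str:
    """Map a dotted sysctl name to its /proc/sys path (simplified).

    Assumption: names containing '/' (as some interface-name sysctls do) are
    rejected here rather than translated the way sysctl(8) does.
    """
    parts = name.split(".")
    # Empty components (e.g. "net..ipv4") or embedded slashes are refused.
    if not all(parts) or any("/" in part for part in parts):
        raise ValueError(f"invalid sysctl name: {name!r}")
    return "/proc/sys/" + "/".join(parts)
```

Validating the name is necessary but not sufficient. In CVE-2025-52881 both the name and the resulting path string are legitimate; it is the mounts behind that path that redirect the write, so defenses have to verify the file that was actually opened, not just the name.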

This makes CVE-2025-52881 itself an extremely powerful vulnerability capable of independently achieving container escape or DoS attacks.

6. Conclusion

Together, CVE-2025-31133, CVE-2025-52565, and CVE-2025-52881 reveal a systemic weakness in how container runtimes such as runc set up the container environment: each one combines runc’s privileged procfs operations with race conditions and symbolic-link or mount redirection.

The Biggest Warning: LSM Failure

The most significant lesson from this incident is that widely trusted Linux Security Modules (LSMs) such as AppArmor and SELinux can be silently bypassed when an attack targets the runtime logic responsible for applying them, as CVE-2025-52881 does.

Furthermore, SELinux’s ineffectiveness against CVE-2025-31133 (due to relabeling) and AppArmor’s ineffectiveness against CVE-2025-52565 (due to permissive default policies) are equally alarming. Organizations can no longer treat LSM as a “silver bullet” for container security.
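Given how silently an LSM profile can be dropped, one practical complement is to verify confinement from inside a running container, for example as a startup self-check. The sketch below parses the contents of /proc/self/attr/current. It is a heuristic under stated assumptions: AppArmor labels are taken to look like “profile (mode)”, SELinux contexts like “user:role:type:level”, and treating unconfined_t and spc_t as unconfined is this sketch’s own policy choice. On kernels with stacked LSMs, the label may need to be read from an LSM-specific file instead.

```python
def is_confined(label: str) -> bool:
    """Heuristically decide whether an LSM label denotes enforced confinement."""
    label = label.strip("\x00\n ")  # SELinux labels may carry a trailing NUL
    if not label or label == "unconfined":
        return False
    # AppArmor style: "profile_name (enforce)" or "profile_name (complain)"
    if label.endswith(")") and " (" in label:
        mode = label.rsplit(" (", 1)[1][:-1]
        return mode == "enforce"  # complain mode logs but does not block
    # SELinux style: "user:role:type:level"; the type decides confinement
    fields = label.split(":")
    if len(fields) >= 3:
        return fields[2] not in {"unconfined_t", "spc_t"}  # assumed-unconfined types
    return False
```

For example, passing the text read from /proc/self/attr/current for a process labeled unconfined returns False, which a deployment could treat as a signal that the expected profile never took effect.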

7. Patch Analysis and Remediation Strategy

The runc project (part of the Open Container Initiative) has released three patched runc versions that address all three vulnerabilities:

  • runc v1.4.0-rc.3
  • runc v1.3.3
  • runc v1.2.8
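Operators can check whether a given runc binary carries the fix by comparing its version against the three releases listed above. The sketch below assumes that any release at or above the fixed version within the 1.2, 1.3, and 1.4 lines is patched, that later minor lines include the fix, and that older lines (such as 1.1.x), for which no fixed release is listed, are vulnerable. Distribution packages may backport fixes without changing the upstream version number, so a negative result is a prompt to check the vendor advisory rather than proof.

```python
import re

# Matches "1.3.3" or "1.4.0-rc.3" anywhere in a string such as "runc version 1.3.3".
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-rc\.?(\d+))?")


def _key(version: str):
    """Turn a runc version string into a sortable (major, minor, patch, rc) tuple."""
    m = _VERSION_RE.search(version)
    if not m:
        raise ValueError(f"unrecognised runc version: {version!r}")
    major, minor, patch = (int(g) for g in m.group(1, 2, 3))
    # A final release sorts after every release candidate of the same version.
    rc = int(m.group(4)) if m.group(4) else float("inf")
    return (major, minor, patch, rc)


# Fixed releases named in the advisory, oldest line first.
FIXED = [_key("1.2.8"), _key("1.3.3"), _key("1.4.0-rc.3")]


def is_patched(version: str) -> bool:
    v = _key(version)
    for fixed in FIXED:
        if v[:2] == fixed[:2]:  # same major.minor release line
            return v >= fixed
    # Assumption: lines newer than the newest fixed line include the fix;
    # older lines without a listed fixed release are treated as vulnerable.
    return v[:2] > FIXED[-1][:2]
```

The version string can come from the first line of runc --version output; for example, is_patched("runc version 1.3.3") returns True.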

Because these vulnerabilities can also be exploited during image builds (for example, docker build runs in CI/CD pipelines that process a Dockerfile with RUN --mount=...), services that build images from untrusted Dockerfiles need particular attention.

References

  1. https://seclists.org/oss-sec/2025/q4/138
  2. https://github.com/advisories/GHSA-qw9x-cqr3-wc7r
  3. https://github.com/advisories/GHSA-9493-h29p-rfm2
