Security Infrastructure April 2026

Copy Fail: Here We Go Again

CVE-2026-31431 is a 9-year-old Linux kernel bug that gives any unprivileged local user root on every major distribution. No race condition required. 732 bytes of Python. If you're running shared-kernel infrastructure, you've seen this pattern before.

Here we go again.

Every few years, the Linux kernel produces a new catchy-named privilege escalation with a dedicated website, a logo, and the same familiar summary: unprivileged local user, root on every major distribution, sitting in the code for years. Dirty Cow in 2016. Dirty Pipe in 2022. Now Copy Fail in 2026. They all target the page cache. They all exploit the intersection of features that each looked reasonable in isolation. And they’re all exactly as bad as they sound.

CVE-2026-31431 is real, it has been patched, and if you’re running any Linux kernel from 2017 through last week’s update, you’re affected. Let me walk through what actually happened and why this one feels a little different.

What happened

The bug lives in the authencesn AEAD template in the Linux kernel’s crypto subsystem. authencesn is an IPsec implementation detail – it handles authentication with extended sequence numbers per RFC 4303. It has existed since 2011. For most systems, nothing ever calls it directly.

The problem isn’t authencesn alone. The problem is what happened when three independent features, introduced between 2011 and 2017, converged.

In 2015, the kernel’s AF_ALG interface – which exposes the crypto subsystem to unprivileged userspace via sockets – gained AEAD support, including a splice() path that lets you feed page cache pages directly into the crypto engine.

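In 2017, a performance optimization made algif_aead perform AEAD operations in place. Instead of copying data into separate source and destination buffers, the kernel chained page cache pages directly into the writable destination scatterlist. authencesn, meanwhile, uses its destination buffer as scratch space during decryption, so whatever gets chained into that buffer can be written to.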

The Theori researchers describe it plainly in their writeup: “Nobody connected the 2017 in-place optimization to authencesn’s scratch writes or to the splice path’s use of page cache pages.” Each change was isolated and reasonable. The vulnerability lived at their convergence.

The result: an unprivileged user opens an AF_ALG socket bound to authencesn, constructs sendmsg() plus splice() pairs that position page cache pages at the right offsets, triggers AEAD decryption – and the kernel writes four controlled bytes into the page cache copy of any file the user can read. Including setuid binaries like /usr/bin/su. The on-disk file is untouched. The page cache version is compromised immediately, system-wide, and every process reading that file now sees the attacker’s version.

The resulting exploit is 732 bytes of Python 3, standard library only. The researchers report that it needs no race condition and is 100% reliable across Ubuntu 24.04, Amazon Linux 2023, RHEL 10.1, and SUSE 16, on every kernel version since the 2017 optimization.
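
Because the write primitive reaches any file the attacker can read, setuid binaries are the obvious targets. If you want to see that candidate set on a host, a minimal shell sketch is below. The helper name `list_readable_setuid` is hypothetical, and "readable" is approximated by the world-read bit only; files readable through group membership or ACLs are not reported.

```shell
#!/bin/sh
# Sketch: list setuid files that every local user can read, i.e. candidate
# targets for a page-cache write primitive like the one described here.
# Approximation: only the world-read bit (0004) is checked; files readable
# through group membership or ACLs are not reported.
list_readable_setuid() {
    root=${1:-/}
    # -xdev: stay on the filesystem that contains $root.
    # -perm -4004: BOTH the setuid bit (4000) and the world-read bit (0004) set.
    find "$root" -xdev -type f -perm -4004 2>/dev/null
}

# Example (can take a while on large filesystems):
# list_readable_setuid /usr
```

Finding such binaries is normal on every distribution; the point is to see which privileged programs a page-cache write could target, not to detect a compromise.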

How it was found

This is the part of the story I want to stay with for a moment.

Theori researcher Taeyang Lee identified AF_ALG plus splice() as an underexplored attack surface – the idea that userspace could feed page cache pages into the crypto engine, where the kernel might do something unexpected with them, felt worth investigating. He built that intuition into Xint Code, an automated security analysis tool, and pointed it at the kernel’s crypto/ subsystem. The scan took approximately one hour. Copy Fail was the highest-severity finding.

The Hacker News (HN) thread picked up on this immediately. If a one-hour automated scan of a single subsystem surfaces a critical privilege escalation that has been sitting in production kernels for nine years, the uncomfortable implication is that there are probably more of these. The Xint team says explicitly that the same scan surfaced other high-severity bugs that are still in coordinated disclosure.

I don’t know what those look like yet. But this is not a one-time finding – it’s a change in the cost of finding this class of vulnerability.

Who is actually exposed

The severity breakdown from the researchers works like this.

High risk: multi-tenant systems where users share a kernel. Kubernetes clusters running workloads from multiple tenants, CI/CD runners executing pull requests from outside contributors, SaaS platforms running untrusted workloads in containers. If your isolation boundary is namespaces and cgroups, a compromised container can become root on the host. The HN discussion noted that at least one tester found rootless Podman and user namespaces did confine the exploit, so the container escape picture is more nuanced than the initial headlines suggested. Full container escape details are forthcoming from the Theori team.

Medium risk: single-tenant production servers. This is not a remote exploit by itself. An attacker needs local access first. But once they have it, privilege escalation is deterministic and fast. It chains cleanly with any other vector that gets an attacker a local foothold.

Lower risk: isolated single-user or single-tenant systems. An attacker who already has local access there can still reach root. Still bad. Just not a cross-tenant incident.

MicroVM-based isolation such as Firecracker, and sandboxes such as gVisor that run each workload against their own user-space kernel, keep this exploit away from the host kernel. To cross tenants, the exploit has to reach the host kernel. If it can’t, the blast radius is contained to your own workload.

What to do

The patch is straightforward. Update your kernel to include mainline commit a664bf3d603d. All major distributions have published or are publishing patched kernels.

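If you need an interim mitigation before you can patch, run the following as root. The first line stops algif_aead from being loaded on demand; the second unloads it if it is already loaded: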

echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif.conf
rmmod algif_aead 2>/dev/null || true
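
After running those commands you may want to confirm the resulting state. The sketch below assumes the rule file path used above; `algif_mitigation_status` is a hypothetical helper. It reads /proc/modules, which lists only loadable modules, so a compiled-in algif_aead is invisible to it.

```shell
#!/bin/sh
# Sketch: report whether the interim algif_aead mitigation appears active.
# Arguments default to the live system; pass other paths to test with fixtures.
# Caveat: a built-in algif_aead never appears in /proc/modules.
algif_mitigation_status() {
    modules=${1:-/proc/modules}
    rule=${2:-/etc/modprobe.d/disable-algif.conf}
    # /proc/modules has one loaded module per line, module name first.
    if grep -qs '^algif_aead ' "$modules"; then
        echo "loaded"    # still loaded: the rule alone does not unload it
    elif grep -qs '^install algif_aead /bin/false' "$rule"; then
        echo "blocked"   # not loaded, and on-demand loads run /bin/false instead
    else
        echo "no-rule"   # not loaded right now, but nothing prevents autoloading
    fi
}

algif_mitigation_status
```

A result of `loaded` usually means rmmod failed because something still holds the module; it stays loaded until it can be removed or the machine reboots.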

The practical impact of disabling algif_aead is minimal for most systems. dm-crypt, LUKS, IPsec, kTLS, OpenSSL, and SSH do not depend on this module and keep working without it. The workloads that require AF_ALG directly are rare: OpenSSL’s afalg engine, custom embedded crypto offload, and, as the HN thread observed, iwd and non-default cryptsetup configurations.

One note for RHEL and derivatives: on some kernels algif_aead may be compiled in rather than built as a loadable module. In that case neither the modprobe.d rule nor rmmod has any effect, and a kernel update is the only fix.
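
To tell which case applies, one option is to read the running kernel’s build configuration. This sketch assumes the distribution installs it as /boot/config-$(uname -r), which is common on major distributions but not universal. CONFIG_CRYPTO_USER_API_AEAD is the Kconfig option that builds algif_aead; the helper name `algif_aead_build_mode` is hypothetical.

```shell
#!/bin/sh
# Sketch: classify how the running kernel provides algif_aead, from its config.
# Assumes the config lives at /boot/config-<release>; pass another path if not.
algif_aead_build_mode() {
    config=${1:-/boot/config-$(uname -r)}
    # CONFIG_CRYPTO_USER_API_AEAD is the Kconfig option that builds algif_aead.
    case "$(grep -s '^CONFIG_CRYPTO_USER_API_AEAD=' "$config")" in
        *=y) echo builtin ;;   # compiled in: modprobe.d rules and rmmod cannot remove it
        *=m) echo module ;;    # loadable: the modprobe.d rule plus rmmod applies
        *)   echo unknown ;;   # option not set, or config file not found
    esac
}

algif_aead_build_mode
```

`builtin` points straight to the kernel update; `module` means the interim mitigation can work.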

The pattern

Dirty Cow was a race condition in the copy-on-write mechanism. Dirty Pipe was an uninitialized flags field in a pipe buffer structure that let an attacker overwrite arbitrary read-only files. Copy Fail is a scatterlist aliasing bug triggered through the crypto API. All three are page-cache write primitives. All three lived in the kernel for years – Dirty Cow for nine years before disclosure, Dirty Pipe for about two, Copy Fail for nine. All three, in retrospect, feel like the kind of thing a sufficiently focused review might have caught.

The kernel is enormous. The crypto/ subsystem alone is complex enough that the security community is openly questioning whether AF_ALG should exist at all – it adds significant attack surface for an interface that almost nobody uses. That’s a fair critique, but it also applies to a lot of kernel surface. You cannot retroactively un-expose an interface that has been default-enabled for a decade.

The lesson isn’t that Linux is uniquely bad at security. Large, general-purpose codebases accumulate feature intersections that nobody modeled when each piece was introduced. That’s a property of complexity, not of malice. The people who wrote authencesn, who added AF_ALG AEAD support, and who introduced the 2017 in-place optimization were all doing reasonable things with reasonable intent.

What these vulnerabilities keep surfacing is a structural question about where isolation boundaries live. Shared-kernel infrastructure – anything where multiple tenants share a single kernel – inherits every kernel vulnerability as a potential cross-tenant incident. That’s not a criticism of any specific provider. It’s a property of the architecture.

The fix is a kernel patch. The longer-term question is what your exposure window looks like between the moment a vulnerability is discovered and the moment a patch reaches your fleet. For shared-kernel environments, that window is when a “local privilege escalation” is really a cross-tenant incident waiting to happen.

Patch your kernels. It’s the right move regardless of anything else.