Glibc Now Uses 2MB Huge Pages on AArch64 for a Speed Boost

According to Phoronix, a new patch has been merged into the GNU C Library (glibc) that enables 2MB Transparent Huge Pages (THP) by default for memory allocations on AArch64, the architecture behind most modern ARM server and Apple Silicon chips. This change leverages Linux’s support for multi-sized THP (mTHP) and applies even if users haven’t manually enabled the glibc.malloc.hugetlb=1 tunable. The performance impact is significant: the developers observed a consistent 6.25% improvement on SPEC benchmarks, attributed to reduced page faults and kernel management overhead. The patch also hardcodes the THP size to 2MB on AArch64, avoiding a system call to query the size from sysfs. A key benefit is for systems with a 64KB base page size, where the traditional THP size is an impractical 512MB; they can now use the more manageable 2MB mTHP size. Even if the system’s THP sysctl is set to “never,” the heap will now be extended in 2MB chunks instead of 4KB, potentially reducing the frequency of heap-extension system calls by a factor of 512.
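Under the hood, the allocator’s huge-page path comes down to two things: getting a 2MB-aligned region and hinting the kernel with madvise(MADV_HUGEPAGE). The snippet below is a minimal standalone sketch of that mechanism, not glibc’s actual code; the region size and alignment arithmetic are purely illustrative.

```c
/* Minimal sketch (not glibc's code): reserve a 2MB-aligned anonymous region
 * and ask the kernel to back it with transparent huge pages. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define THP_SIZE (2UL * 1024 * 1024)   /* the 2MB size the patch hardcodes */

int main(void)
{
    /* Over-allocate so we can carve out a 2MB-aligned window. */
    size_t len = 4 * THP_SIZE;
    void *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) { perror("mmap"); return 1; }

    uintptr_t aligned = ((uintptr_t)raw + THP_SIZE - 1) & ~(THP_SIZE - 1);

    /* Hint that this region should use huge pages if THP policy allows it. */
    if (madvise((void *)aligned, 2 * THP_SIZE, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");

    /* Touch the memory so the pages are actually faulted in. */
    memset((void *)aligned, 0xab, 2 * THP_SIZE);
    printf("region at %p hinted for 2MB THP\n", (void *)aligned);

    munmap(raw, len);
    return 0;
}
```

Before this patch, getting similar behavior from malloc on AArch64 meant opting in explicitly, for example by launching a program with GLIBC_TUNABLES=glibc.malloc.hugetlb=1; the new default removes that step.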

The Performance Payoff

So, why does this matter? It’s all about efficiency. Every time an application needs a new page of memory, it takes a tiny interruption called a page fault. By using 2MB blocks instead of 4KB blocks, those interruptions happen 512 times less often for the same amount of memory (2MB / 4KB = 512). That means less work for the kernel and, crucially, less pressure on the Translation Lookaside Buffer (TLB), the critical cache that maps virtual to physical addresses. The patch specifically mentions the “contpte” benefit on AArch64, where the hardware can map 32 contiguous 64KB pages with a single TLB entry. Basically, the TLB can cover more memory with fewer entries. A 6.25% gain on SPEC isn’t just a rounding error; it’s a meaningful boost that comes “for free” to applications just by updating a core system library. That’s the kind of low-level systems optimization that data centers and cloud providers love.
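If you want to see that fault arithmetic on a real machine, here is a small, hedged C example (not part of the patch) that touches 256MB of freshly mapped anonymous memory and reports the minor page faults it triggered via getrusage(). With 4KB pages you would expect on the order of 65,536 faults; with 2MB mappings, closer to 128. Exact numbers depend on your kernel version and THP policy.

```c
/* Rough illustration of the fault-count arithmetic: touch 256MB of freshly
 * mapped memory and report the minor page faults incurred while doing so. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

int main(void)
{
    size_t len = 256UL * 1024 * 1024;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long before = minor_faults();
    memset(p, 1, len);                 /* first touch triggers the faults */
    long after = minor_faults();

    printf("minor faults while touching %zu MB: %ld\n",
           len >> 20, after - before);
    munmap(p, len);
    return 0;
}
```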

Strategy and Broader Impact

Here’s the thing: this isn’t just a technical tweak. It’s a strategic move that aligns with the growing dominance of ARM-based servers, especially from companies like Ampere, and the rise of Apple Silicon in professional environments. By baking this optimization directly into glibc, the default memory allocator gets a lot smarter on these platforms without any application changes. The beneficiaries are clear: anyone running high-performance computing, databases, or large-scale web services on AArch64 hardware. It reduces the need for manual tuning and makes the platform more competitive out-of-the-box against x86. For workloads that rely on dense, efficient computing, these system-level gains translate directly into better throughput and lower operational costs.

The Fine Print and Fragmentation

Now, it’s not all upside. The patch notes mention the cost is “internal fragmentation.” What does that mean? If your application only needs 5KB of memory, it might still get a 2MB chunk reserved for it, leaving most of that block unused. That’s wasted RAM. But the glibc developers are betting that for most server workloads, the performance benefits of fewer TLB misses and system calls far outweigh that memory waste. It’s a calculated trade-off. And the patch is clever about backward compatibility: if you’ve explicitly set the sysctl to “never” for THP, it won’t create huge pages, but it still uses the 2MB size for extending the program’s heap via `sbrk()`. So you still get the benefit of fewer, larger heap expansions. It’s a win-win at the system call level, even if the kernel isn’t handing out huge physical pages. This kind of thoughtful, layered optimization shows how mature the Linux ecosystem has become, squeezing out performance gains from the very bedrock of the software stack.
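For the sbrk() behavior described above, the idea is simply to round every requested heap extension up to a 2MB multiple before moving the program break. The sketch below illustrates that rounding; grow_heap() is a hypothetical helper, not a glibc function, and a real allocator tracks and reuses the surplus rather than leaving it idle.

```c
/* Minimal sketch, not glibc's actual heap-growth path: round a requested
 * heap extension up to a 2MB multiple before calling sbrk(), so the break
 * moves in large strides even when THP is disabled. */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <unistd.h>

#define THP_SIZE (2UL * 1024 * 1024)

/* Hypothetical helper: grow the heap by at least `need` bytes. */
static void *grow_heap(size_t need)
{
    size_t rounded = (need + THP_SIZE - 1) & ~(THP_SIZE - 1);
    void *old_brk = sbrk((intptr_t)rounded);
    if (old_brk == (void *)-1)
        return NULL;                    /* out of memory or brk unavailable */
    return old_brk;                     /* caller parcels out `need` from here */
}

int main(void)
{
    /* Ask for 5KB; the break still moves by a full 2MB. */
    void *p = grow_heap(5 * 1024);
    printf("heap extended at %p, break moved by %lu bytes\n", p, THP_SIZE);
    return 0;
}
```

The space past the 5KB request is exactly the internal fragmentation trade-off discussed above, at least until later allocations grow into it.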
