Paper 2026/093

Optimized Implementation of ML-KEM on ARMv9-A with SVE2 and SME

Hanyu Wei, Fudan University, Shanghai, China
Wenqian Li, Fudan University, Shanghai, China
Shiyu Shen, City University of Hong Kong, Hong Kong, China
Hao Yang, City University of Hong Kong, Hong Kong, China
Yunlei Zhao, Fudan University, Shanghai, China
Abstract

As quantum computing continues to advance, traditional public-key cryptosystems become increasingly vulnerable, necessitating a global transition to post-quantum cryptography (PQC). A primary challenge for both cryptographers and system architects is integrating PQC efficiently into high-performance computing platforms. ARM, a dominant processor architecture, has recently introduced ARMv9-A to accelerate modern workloads such as artificial intelligence and cloud computing. Through its Scalable Vector Extension 2 (SVE2) and Scalable Matrix Extension (SME), ARMv9-A provides dedicated vector and matrix hardware support for high-performance computing. This architectural evolution calls for efficient implementations of PQC schemes tailored to the new architecture. In this work, we present a highly optimized implementation of ML-KEM, the post-quantum key encapsulation mechanism (KEM) standardized by NIST as FIPS 203, on the ARMv9-A architecture. We redesign the polynomial computation pipeline to align closely with the vector and matrix execution units. Our optimizations encompass refined modular arithmetic and highly vectorized polynomial operations. Specifically, we propose two NTT variants tailored to the architectural features of SVE2 and SME, the vector-based NTT (VecNTT) and the matrix-based NTT (MatNTT), both of which exploit layer fusion and optimized data access patterns. Experimental results on the Apple M4 Pro processor show that VecNTT and MatNTT achieve speedups of up to $7.18\times$ and $7.77\times$, respectively, over the reference implementation. Furthermore, matrix-vector polynomial multiplication, the primary computational bottleneck of ML-KEM, is accelerated by up to $5.27\times$. Our full ML-KEM implementation achieves a 52.47% to 60.09% speedup in key encapsulation across all security levels.
To the best of our knowledge, this is the first work to implement and evaluate ML-KEM leveraging SVE2 and SME on real ARMv9-A hardware, providing a practical foundation for future PQC deployments on next-generation ARM platforms.
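For readers less familiar with the arithmetic that VecNTT, MatNTT, and the matrix-vector kernel accelerate, the following is a minimal scalar Python sketch of the ML-KEM number-theoretic transform (FIPS 203, Algorithm 9), the NTT-domain base-case multiplication (Algorithms 11 and 12), and a matrix-vector product built from them. It is not the authors' SVE2/SME code: it uses plain `% Q` reductions, and the function names (`ntt`, `multiply_ntts`, `matvec_ntt`) are illustrative assumptions. Optimized implementations such as the Kyber reference code replace `% Q` with Montgomery or Barrett reductions, and vectorized versions evaluate many butterflies per instruction.

```python
"""Scalar sketch of ML-KEM NTT arithmetic following FIPS 203 (Algs. 9, 11, 12).

Plain Python integers with `% Q`; an illustration, not an SVE2/SME kernel.
"""
import random

Q = 3329   # ML-KEM modulus
ZETA = 17  # primitive 256th root of unity modulo Q
N = 256    # number of polynomial coefficients


def bitrev7(x: int) -> int:
    """Reverse the low 7 bits of x (BitRev7 in FIPS 203)."""
    return int(format(x, "07b")[::-1], 2)


def ntt(f):
    """Forward NTT: seven layers of Cooley-Tukey butterflies (FIPS 203, Alg. 9)."""
    f = list(f)
    k = 1
    length = 128
    while length >= 2:
        for start in range(0, N, 2 * length):
            zeta = pow(ZETA, bitrev7(k), Q)  # twiddle factor for this block
            k += 1
            for j in range(start, start + length):
                t = zeta * f[j + length] % Q
                f[j + length] = (f[j] - t) % Q
                f[j] = (f[j] + t) % Q
        length //= 2
    return f


def multiply_ntts(f_hat, g_hat):
    """NTT-domain product: 128 products modulo X^2 - gamma_i (FIPS 203, Algs. 11-12)."""
    h = [0] * N
    for i in range(128):
        gamma = pow(ZETA, 2 * bitrev7(i) + 1, Q)
        a0, a1 = f_hat[2 * i], f_hat[2 * i + 1]
        b0, b1 = g_hat[2 * i], g_hat[2 * i + 1]
        h[2 * i] = (a0 * b0 + a1 * b1 * gamma) % Q
        h[2 * i + 1] = (a0 * b1 + a1 * b0) % Q
    return h


def matvec_ntt(a_hat, s_hat):
    """Product of a k x k matrix and a k-vector of NTT-domain polynomials."""
    result = []
    for row in a_hat:
        acc = [0] * N
        for a_ij, s_j in zip(row, s_hat):
            prod = multiply_ntts(a_ij, s_j)
            acc = [(x + y) % Q for x, y in zip(acc, prod)]
        result.append(acc)
    return result


if __name__ == "__main__":
    k = 3  # module rank used by ML-KEM-768
    A_hat = [[[random.randrange(Q) for _ in range(N)] for _ in range(k)] for _ in range(k)]
    # Small random coefficients as a stand-in for the scheme's CBD secret sampling.
    s_hat = [ntt([random.randrange(-2, 3) % Q for _ in range(N)]) for _ in range(k)]
    t_hat = matvec_ntt(A_hat, s_hat)
    print(len(t_hat), len(t_hat[0]))  # 3 256
```

Each pass of the `while` loop in `ntt` is one NTT layer whose butterflies pair coefficients `length` apart. Merging several such layers so that coefficients stay in registers between them is the general idea behind layer fusion; the paper's specific fusion and data-layout choices for SVE2 and SME are not reproduced here.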

Metadata
Available format(s)
PDF
Category
Implementation
Publication info
Preprint.
Keywords
Post-Quantum Cryptography; ML-KEM; NTT; SVE2; SME; ARMv9-A
Contact author(s)
hywei24 @ m fudan edu cn
liwq24 @ m fudan edu cn
crypto @ sher1e dev
crypto @ d4rk dev
ylzhao @ fudan edu cn
History
2026-01-23: approved
2026-01-20: received
Short URL
https://ia.cr/2026/093
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2026/093,
      author = {Hanyu Wei and Wenqian Li and Shiyu Shen and Hao Yang and Yunlei Zhao},
      title = {Optimized Implementation of {ML}-{KEM} on {ARMv9}-A with {SVE2} and {SME}},
      howpublished = {Cryptology {ePrint} Archive, Paper 2026/093},
      year = {2026},
      url = {https://eprint.iacr.org/2026/093}
}