A patch for the upcoming Linux 6.13 kernel introduces a new implementation of the CRC32C checksum calculation. The new code is far smaller, shrinking from 4546 bytes to just 418 bytes. Simplified loop logic and a reduced number of operations also make it noticeably faster, especially on systems that use Retpoline protection against Spectre attacks. On AMD Zen 2 processors performance improves by up to 11.8%, while Intel Emerald Rapids and Intel Haswell gain 6.4% and 4.8% respectively.
With Retpoline enabled, the effect of the optimization is even more pronounced: performance improves by 66.8% on Intel Emerald Rapids, by 35.0% on Intel Haswell, and by 29.5% on AMD Zen 2.
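For readers unfamiliar with the checksum itself, here is a minimal bit-by-bit sketch of CRC32C in Python. It is a slow reference version for illustration only and is not the kernel's code; the function name `crc32c_bitwise` is made up for this example.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial in bit-reversed form


def crc32c_bitwise(data: bytes, crc: int = 0) -> int:
    """Compute CRC32C one bit at a time (slow, but easy to verify)."""
    crc ^= 0xFFFFFFFF  # standard initial inversion
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, fold in the polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # standard final inversion


# The standard CRC32C check value for the ASCII string "123456789".
print(hex(crc32c_bitwise(b"123456789")))  # 0xe3069283
```

Passing a previous result as the `crc` argument continues the checksum over the next chunk of data, so a buffer can be processed in pieces.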
Before this change, the CRC32C code unrolled its loop 128 times, which bloated the code size. The large number of jump instructions in the unrolled loops made the code hard to optimize, particularly on modern processors that support out-of-order execution. The new implementation unrolls the loop only four times, which substantially reduces code size and improves speed.
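The idea of loop unrolling can be sketched with a table-driven CRC32C in Python. This only illustrates the technique and is not the kernel's implementation. The names `crc32c_rolled` and `crc32c_unrolled4` are made up for this example, and Python will not show the performance effect that unrolling has in machine code.

```python
def make_crc32c_table() -> list:
    """Build the 256-entry lookup table for the reflected CRC32C polynomial."""
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_crc32c_table()


def crc32c_rolled(data: bytes) -> int:
    """One byte per loop iteration: smallest code, one loop branch per byte."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_unrolled4(data: bytes) -> int:
    """Four bytes per loop iteration: the loop body is repeated four times,
    so the loop branch runs once per four bytes instead of once per byte."""
    crc = 0xFFFFFFFF
    main = len(data) - len(data) % 4  # length of the part divisible by 4
    i = 0
    while i < main:
        crc = TABLE[(crc ^ data[i]) & 0xFF] ^ (crc >> 8)
        crc = TABLE[(crc ^ data[i + 1]) & 0xFF] ^ (crc >> 8)
        crc = TABLE[(crc ^ data[i + 2]) & 0xFF] ^ (crc >> 8)
        crc = TABLE[(crc ^ data[i + 3]) & 0xFF] ^ (crc >> 8)
        i += 4
    for b in data[main:]:  # leftover 0 to 3 bytes
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


# Both variants compute the same checksum; only the loop structure differs.
assert crc32c_rolled(b"123456789") == crc32c_unrolled4(b"123456789")
```

A larger unroll factor repeats the loop body more times, which is why heavy unrolling inflates code size while a factor of four keeps it compact.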