Researchers from universities in the USA and Australia have unveiled two new Rowhammer-class attacks, GDDRHammer and GeForge, that let unprivileged CUDA kernels on NVIDIA GPUs flip individual bits in GDDR video memory chips. Unlike the earlier GPUHammer attack, the new attacks are not confined to GPU memory: they give access to the entire main memory in the CPU address space. The researchers have demonstrated exploits that obtain root access on the host system by running an unprivileged CUDA kernel on the GPU.
Both GDDRHammer and GeForge exploit the Rowhammer effect, in which intensive repeated access to neighboring memory rows makes specific video memory cells lose their charge prematurely, changing the bits stored in those cells. By manipulating the behavior of the GPU memory allocator (cudaMalloc), the attackers circumvent GPU memory isolation and map GPU virtual addresses to arbitrary physical addresses in GPU or CPU memory.
The attacks work by corrupting bit values in video memory, specifically in the GPU page tables that translate virtual addresses to physical ones. The two attacks differ in their target: GDDRHammer modifies the last-level page table, whereas GeForge modifies the last-level page directory.
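The difference between corrupting a page table entry and a page directory entry can be illustrated with a toy two-level translation. This is only a sketch: the page size, table size, memory layout and the names `translate` and `flip_bit` are assumptions made for the example, and real GPU page tables have more levels and a different entry format.

```python
import copy

# Toy model: assumed sizes, not the real GPU page table geometry.
PAGE_SIZE = 4096
ENTRIES_PER_TABLE = 512

def translate(memory, root_frame, vaddr):
    """Walk page directory -> page table -> data frame and return a physical address."""
    offset = vaddr % PAGE_SIZE
    vpn = vaddr // PAGE_SIZE
    pt_index = vpn % ENTRIES_PER_TABLE
    pd_index = vpn // ENTRIES_PER_TABLE
    page_table_frame = memory[root_frame][pd_index]   # directory entry -> frame of a page table
    data_frame = memory[page_table_frame][pt_index]   # table entry -> frame of the data page
    return data_frame * PAGE_SIZE + offset

def flip_bit(value, bit):
    """Model a single Rowhammer-style bit flip."""
    return value ^ (1 << bit)

# Each frame is modeled as a dict {entry index: stored frame number}.
memory = {
    0: {0: 1},    # frame 0: page directory, entry 0 points at the page table in frame 1
    1: {0: 8},    # frame 1: page table, entry 0 points at data frame 8
    3: {0: 12},   # frame 3: unrelated contents that happen to parse as a page table
}
vaddr = 0x10
original = translate(memory, 0, vaddr)

# Flip in a last-level page table entry: one page now points at another frame.
pte_case = copy.deepcopy(memory)
pte_case[1][0] = flip_bit(pte_case[1][0], 0)
after_pte_flip = translate(pte_case, 0, vaddr)

# Flip in a page directory entry: the walk now uses a different frame as the page table.
pde_case = copy.deepcopy(memory)
pde_case[0][0] = flip_bit(pde_case[0][0], 1)
after_pde_flip = translate(pde_case, 0, vaddr)

print(hex(original), hex(after_pte_flip), hex(after_pde_flip))
```

In this model a flip in the page table redirects a single page, while a flip in the page directory swaps in a whole different table, so every entry read through it is taken from whatever data occupies that frame.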
The same address translation tables are also used to give the GPU direct access to CPU memory. By changing the address in a GPU page table entry to a physical address in main RAM and setting the APERTURE flag, which switches the entry to CPU memory mapping mode, attackers can read and write main memory over the PCIe bus, provided the IOMMU is disabled.
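To make the role of the address and the aperture setting concrete, here is a minimal sketch of a page table entry encoder and decoder. The bit positions, the single-bit aperture field and all names (`VALID_BIT`, `APERTURE_SYSTEM_BIT`, `encode_pte`, `decode_pte`) are hypothetical and chosen only for illustration; they are not NVIDIA's actual entry format.

```python
from dataclasses import dataclass

# Hypothetical layout for illustration only; not the real GPU PTE format.
VALID_BIT = 1 << 0
APERTURE_SYSTEM_BIT = 1 << 1   # 0: GPU video memory, 1: host (CPU) memory
FRAME_SHIFT = 12               # low 12 bits are reserved for flags in this toy layout

@dataclass(frozen=True)
class DecodedPte:
    valid: bool
    targets_host_memory: bool
    physical_address: int

def encode_pte(physical_address, targets_host_memory):
    """Pack a page-aligned physical address and the aperture choice into one entry."""
    frame = physical_address >> FRAME_SHIFT
    pte = (frame << FRAME_SHIFT) | VALID_BIT
    if targets_host_memory:
        pte |= APERTURE_SYSTEM_BIT
    return pte

def decode_pte(pte):
    """Unpack an entry into its flags and the physical address it points at."""
    return DecodedPte(
        valid=bool(pte & VALID_BIT),
        targets_host_memory=bool(pte & APERTURE_SYSTEM_BIT),
        physical_address=(pte >> FRAME_SHIFT) << FRAME_SHIFT,
    )

# An ordinary entry pointing into video memory.
vram_entry = decode_pte(encode_pte(0x5000, targets_host_memory=False))

# An entry whose address and aperture both select host RAM.
host_entry = decode_pte(encode_pte(0x1234000, targets_host_memory=True))

print(vram_entry)
print(host_entry)
```

The point of the sketch is that the destination of an access is decided entirely by the contents of the entry, which is why control over those contents translates into control over which memory is read or written.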
The researchers demonstrated the attacks on high-performance professional cards such as the NVIDIA RTX A6000 and on consumer models such as the NVIDIA RTX 3060. A technique developed for bypassing Rowhammer protection, combined with the GPU's parallelism, significantly increased the rate of bit flips. Enabling ECC (error-correcting codes) can serve as a temporary mitigation, but it introduces additional overhead, and attacks such as ECCploit and ECC.fail can potentially bypass this protection.
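Why ECC is only a partial defense can be seen in a toy single-error-correcting, double-error-detecting (SECDED) code. The sketch below uses an extended Hamming(8,4) code, which is an assumption made for brevity: real GPU memory uses wider codes, and the code does not reproduce ECCploit or ECC.fail. It only shows the general limit that one flip is corrected, two are detected, and three can be silently "corrected" into wrong data.

```python
def encode(data_bits):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (index 0 = overall parity)."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data_bits
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code

def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    damaged = list(code)
    for pos in positions:
        damaged[pos] ^= 1
    return damaged

def decode(code):
    """Return (status, data_bits) using standard SECDED decoding rules."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall_parity_ok = sum(code) % 2 == 0
    if syndrome == 0 and overall_parity_ok:
        status = "clean"
    elif not overall_parity_ok:
        # Odd number of flips: assumed to be a single error and repaired.
        code[syndrome] ^= 1   # syndrome 0 means the overall parity bit itself
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two flips, detected but not fixable.
        status = "uncorrectable"
    return status, [code[3], code[5], code[6], code[7]]

data = [1, 0, 1, 1]
word = encode(data)

one_flip = decode(flip(word, 6))
two_flips = decode(flip(word, 3, 6))
three_flips = decode(flip(word, 3, 5, 6))

print(one_flip, two_flips, three_flips)
```

In the three-flip case the decoder reports a successful correction while returning data that differs from what was stored, which is the kind of gap that makes ECC a mitigation rather than a complete fix.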