MIT and NVIDIA Speed Up AI with Zeroes

Scientists from MIT and NVIDIA have developed two methods to accelerate the processing of sparse tensors, the data structures used for high-performance calculations in machine learning models. The methods have the potential to significantly improve the performance and energy efficiency of systems that run large-scale machine learning models, such as those behind generative artificial intelligence (AI).

Tensors are data structures that play a crucial role in machine learning models: they represent and manipulate multi-dimensional arrays of data. The new methods developed by the MIT and NVIDIA researchers focus on exploiting sparsity, that is, the zero values these tensors often contain. By skipping zeros, a system saves both computation and memory when handling sparse tensors. Exploiting sparsity in large tensors is not without challenges, however: efficiently locating the non-zero values is itself a complex task.
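
The principle of skipping zeros can be shown with a minimal sketch (illustrative only, not code from the papers): store a sparse vector as its non-zero entries and their positions, so that zeros cost neither memory nor multiplications.

```python
# Minimal illustration: keep only the non-zero entries of a vector so that
# zeros consume no storage and trigger no multiply operations.
from typing import List, Tuple


def to_sparse(dense: List[float]) -> List[Tuple[int, float]]:
    """Keep only non-zero values together with their positions."""
    return [(i, v) for i, v in enumerate(dense) if v != 0.0]


def sparse_dot(sparse_a: List[Tuple[int, float]], dense_b: List[float]) -> float:
    """Multiply-accumulate only where the sparse operand is non-zero."""
    return sum(v * dense_b[i] for i, v in sparse_a)


a = [0.0, 0.0, 3.0, 0.0, 2.0, 0.0, 0.0, 0.0]
b = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(sparse_dot(to_sparse(a), b))  # 5.0, computed with 2 multiplies instead of 8
```

Hardware accelerators apply the same principle to far larger, multi-dimensional tensors, where locating the non-zeros quickly is the hard part.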

To address these challenges, the researchers proposed two solutions. The first, presented in the paper introducing the HighLight accelerator (first source below), enables hardware to efficiently locate non-zero values across a variety of sparsity patterns. The second makes heavier use of on-chip storage buffers, reducing reliance on slower off-chip memory.

The first solution is an accelerator called HighLight, which can process a wide variety of sparsity patterns and still performs well on models with no zero values at all. It represents these patterns with a technique the researchers call “hierarchical structured sparsity,” which composes simple sparsity patterns at several levels to describe more complex ones.
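
As a rough illustration of the hierarchical idea, a two-level structured pattern might bound the non-zeros per small block and the non-zero blocks per larger group. The block sizes and budgets below are invented for illustration and are not HighLight's actual configuration.

```python
# Illustrative two-level structured sparsity check; the level sizes and
# "keep" counts are assumptions, not parameters from the HighLight paper.
import numpy as np


def conforms(vector: np.ndarray, m: int = 4, n: int = 2,
             blocks_per_group: int = 4, nonzero_blocks: int = 2) -> bool:
    """Level 1: each block of m values holds at most n non-zeros.
    Level 2: each group of blocks_per_group blocks holds at most
    nonzero_blocks blocks that contain any non-zero at all."""
    blocks = vector.reshape(-1, m)
    per_block = np.count_nonzero(blocks, axis=1)
    if np.any(per_block > n):
        return False
    block_is_nonzero = per_block > 0
    groups = block_is_nonzero.reshape(-1, blocks_per_group)
    return bool(np.all(groups.sum(axis=1) <= nonzero_blocks))


w = np.zeros(16)
w[[0, 3, 8, 9]] = 1.0   # two non-zero blocks, each with two non-zeros
print(conforms(w))      # True
```

Composing simple per-level rules like these lets hardware cheaply check and exploit a wide range of overall sparsity levels.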

The second approach, described in the paper on Tailors and Swiftiles (second source below), targets how a tensor is split into tiles that fit in the accelerator's on-chip buffer. Because sparse tiles rarely fill the buffer, Tailors deliberately “overbooks” it, admitting larger tiles than the buffer could nominally hold, while Swiftiles quickly estimates the ideal tile size, conserving computing resources. Together, the two techniques double processing speed and cut energy consumption by half compared with existing hardware accelerators.

Xue, one of the authors of the work, explains that Swiftiles lets the accelerator estimate the ideal tile size in a single pass, without repeated refinement, and that this is possible only because the hardware supports overbooking.
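
A hypothetical sketch of that single-pass estimate follows; the formula, the overbooking margin, and the numbers are assumptions for illustration, not the Swiftiles algorithm itself. The idea is that if the hardware tolerates the occasional tile that overflows the buffer, the tile size can be derived directly from the tensor's average density.

```python
# Hypothetical single-pass tile-size estimate: size tiles so that their
# *expected* non-zero count slightly overbooks the buffer, relying on the
# hardware to absorb the rare tiles that overflow.


def estimate_tile_size(buffer_slots: int, avg_density: float,
                       overbook_factor: float = 1.1) -> int:
    """Return a tile size (in elements) whose expected number of non-zeros
    is overbook_factor times the buffer capacity."""
    return int(buffer_slots * overbook_factor / avg_density)


# A buffer with 4,096 non-zero slots and a tensor that is 10% dense:
print(estimate_tile_size(4096, 0.10))  # about 45,000 elements per tile
```

Without overbooking, the estimate would have to be conservative and refined over multiple passes so that no tile ever exceeds the buffer.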

Looking ahead, the researchers plan to apply the idea of overbooking to other aspects of computer architecture and to improve the process of estimating the optimal degree of overbooking. Such advances could further improve the performance and energy efficiency of these systems and contribute to the progress of AI technologies.

Sources:
– [First method paper](https://arxiv.org/pdf/2305.12718.pdf)
– [Second method paper](https://arxiv.org/pdf/2310.00192.pdf)
