Artificial intelligence (AI) models, especially those used in medical image analysis and speech recognition, operate on highly complex data structures that demand enormous amounts of computation. This heavy processing is a key reason deep learning models consume so much energy.
To address this efficiency issue, researchers at MIT have developed an automated system that empowers deep learning developers to leverage two forms of data redundancy simultaneously. This dual approach substantially mitigates the computational load, bandwidth usage, and memory requirements necessary for machine learning tasks.
Traditionally, optimization techniques have been labor-intensive and restricted developers to exploiting either sparsity or symmetry, two forms of redundancy inherent in deep learning data structures. The new MIT system, however, generates algorithms that exploit both redundancies at once, yielding computational speedups of nearly 30 times in certain tests.
This user-friendly system is designed with a simplified programming approach, making it accessible for scientists who may not be deep learning experts but still seek to enhance the efficiency of the AI algorithms they deploy for data processing. The potential applications extend into scientific computing as well.
“For an extended period, properly capturing these data redundancies demanded considerable implementation effort. Our system allows scientists to express their desired computations abstractly, without needing to specify the exact computation steps,” explains Willow Ahrens, an MIT postdoctoral researcher and co-author of a paper detailing the system, which will be presented at the International Symposium on Code Generation and Optimization.
Ahrens wrote the paper alongside lead author Radha Patel ’23, SM ’24, and senior author Saman Amarasinghe, a professor in the Department of Electrical Engineering and Computer Science (EECS) and a principal researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Streamlining Computation
In the realm of machine learning, data is frequently represented as multidimensional arrays known as tensors. Think of a tensor as a more complex version of a matrix, which is essentially a two-dimensional grid of values. Tensors can possess multiple dimensions, making their manipulation trickier but also offering opportunities for optimization.
Neural networks leverage tensors through repeated matrix multiplications and additions, which are vital for discerning complex patterns in data. The enormous volume of required calculations in these multidimensional structures necessitates significant energy and computational resources.
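The repeated multiply-and-add pattern described above can be sketched in a few lines; this is a minimal illustration using NumPy, with made-up shapes, not code from the paper:

```python
import numpy as np

# A tiny dense layer: one matrix multiplication plus an addition,
# the core tensor operations a neural network repeats billions of times.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # batch of 4 inputs, 8 features each
W = rng.standard_normal((8, 3))   # weight matrix (8 inputs -> 3 outputs)
b = np.zeros(3)                   # bias vector

y = x @ W + b                     # one layer's worth of computation
print(y.shape)                    # (4, 3)
```

At realistic sizes, with millions of weights and many layers, this pattern is where most of the energy cost comes from.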
However, because of the arrangement of data within tensors, engineers can often enhance neural network performance by eliminating redundant calculations. For example, when analyzing user reviews from an e-commerce platform, many entries may be zero, indicating that not all users provided feedback on every product. This type of redundancy is known as sparsity and allows models to save time by only processing non-zero values.
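The savings from sparsity come from storing and touching only the nonzero entries. Here is a small sketch of that idea with a hypothetical ratings matrix (the variable names are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical ratings matrix: most users rate few products,
# so most entries are zero.
ratings = np.array([
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
])

# A dense computation touches every entry; a sparse representation
# keeps only the coordinates and values of the nonzeros.
nz_rows, nz_cols = ratings.nonzero()
nz_vals = ratings[nz_rows, nz_cols]

print(len(nz_vals), "nonzeros out of", ratings.size, "entries")
assert nz_vals.sum() == ratings.sum()  # same answer, far less data touched
```

Production systems use compressed formats (such as CSR or COO) for the same purpose, but the principle is identical: skip the zeros.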
Additionally, tensors can exhibit symmetry: the values in the structure’s upper half mirror those in the lower half across the diagonal. When this occurs, the model only needs to operate on one half of the tensor, roughly halving the computation.
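The symmetry shortcut can be seen in a classic case: a Gram matrix is always symmetric, so only its upper triangle needs to be computed and the rest can be mirrored. A minimal NumPy sketch (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))

# The Gram matrix G = A @ A.T is symmetric: G[i, j] == G[j, i].
# So it suffices to compute the upper triangle and mirror it.
n = A.shape[0]
G = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):        # roughly half the dot products
        G[i, j] = A[i] @ A[j]
        G[j, i] = G[i, j]        # copy instead of recomputing

assert np.allclose(G, A @ A.T)   # same result as the full computation
```

Exploiting this by hand for every operation in a large program is exactly the kind of tedious, error-prone work the researchers set out to automate.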
“Attempting to leverage both sparsity and symmetry in tandem can complicate matters significantly,” notes Ahrens.
To simplify this process, she and her colleagues devised a new compiler named SySTeC, designed to translate complex code into a more digestible format for machines. SySTeC optimizes computations by automatically utilizing both sparsity and symmetry in tensors.
The team began developing SySTeC by identifying three crucial optimizations that could be achieved through symmetry. Firstly, if the output tensor is symmetric, only half of the results need to be computed. Secondly, when the input tensor is symmetric, only one half needs to be read. Finally, if intermediate tensor operations produce symmetric results, redundant calculations can be bypassed.
Optimizing Simultaneously
Developers using SySTeC simply input their programs into the system, which then optimizes the code for all symmetry types. Following that, the second phase applies further transformations to focus on storing only non-zero data values, optimizing for sparsity as well.
The end result is efficient, ready-to-use code.
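To give a flavor of what combining the two optimizations means, here is a hand-written sketch (not SySTeC itself, and not its API): a matrix-vector product that stores only the nonzero entries of the upper triangle of a symmetric matrix, exploiting sparsity and symmetry at once. The function name and data layout are assumptions for illustration:

```python
import numpy as np

def symm_sparse_matvec(upper, x):
    """Multiply a symmetric matrix by x, given only its nonzero
    upper-triangle entries as a dict {(i, j): value} with i <= j."""
    y = np.zeros_like(x, dtype=float)
    for (i, j), v in upper.items():
        y[i] += v * x[j]
        if i != j:               # off-diagonal entries act on both halves
            y[j] += v * x[i]
    return y

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 4.0, 5.0]])  # symmetric, with some zeros

# Keep only nonzero upper-triangle entries: sparsity + symmetry together.
upper = {(i, j): A[i, j] for i in range(3)
         for j in range(i, 3) if A[i, j] != 0.0}
x = np.array([1.0, 2.0, 3.0])

assert np.allclose(symm_sparse_matvec(upper, x), A @ x)
```

Writing such code by hand for one operation is manageable; doing it correctly across an entire program, for every tensor and every kind of symmetry, is what the compiler automates.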
“This integrated approach allows us to maximize the benefits derived from both types of optimization. The beauty of symmetry is that as tensor complexity increases, so does the potential for computation savings,” Ahrens affirms.
In experiments, the researchers achieved remarkable speed increases of nearly 30 times with code generated by SySTeC.
The automation feature of this system is particularly advantageous for researchers looking to process data using entirely new algorithms they are developing from the ground up.
Looking ahead, the team aims to incorporate SySTeC with existing sparse tensor compiler systems to provide users with a more cohesive experience, as well as to address the needs of increasingly complex programming challenges.
This vital work receives support from Intel, the National Science Foundation, the Defense Advanced Research Projects Agency, and the Department of Energy.
Photo credit & article inspired by: Massachusetts Institute of Technology