Behrooz Tahmasebi, a doctoral student in MIT’s Department of Electrical Engineering and Computer Science (EECS) and the Computer Science and Artificial Intelligence Laboratory (CSAIL), was taking a mathematics course on differential equations in late 2021 when a flash of inspiration struck. In that class, he first learned about Weyl’s law, formulated 110 years earlier by the German mathematician Hermann Weyl. Tahmasebi realized that although the connection seemed tenuous at best on the surface, the law might bear on the computer science problem he was grappling with at the time. Weyl’s law provides a formula for measuring the complexity of the spectral information, or data, contained within the fundamental frequencies of a drum head or guitar string, he says.
At the same time, Tahmasebi was thinking about measuring the complexity of the input data to a neural network, wondering whether that complexity could be reduced by taking into account some of the symmetries inherent in the dataset. Such a reduction could, in turn, facilitate and speed up the machine learning process.
Conceived about 100 years before the machine learning boom, Weyl’s law has traditionally been applied to very different physical situations, such as the vibration of a string or the spectrum of electromagnetic (blackbody) radiation emitted by a heated object. Nonetheless, Tahmasebi believed that a customized version of the law could help with the machine learning problem he was pursuing. And if the approach panned out, the payoff could be significant.
He took the idea to his advisor, Stefanie Jegelka, an associate professor in EECS and an affiliate of CSAIL and the MIT Institute for Data, Systems, and Society, who believed it was definitely worth investigating. As Tahmasebi saw it, Weyl’s law was about measuring the complexity of data, and so was this project. But Weyl’s law, in its original form, said nothing about symmetry.
He and Jegelka have now succeeded in modifying Weyl’s law so that symmetry can be taken into account in assessing the complexity of a dataset. “To the best of my knowledge, this is the first time Weyl’s law has been used to determine how machine learning can be improved through symmetry,” says Tahmasebi.
The paper he and Jegelka wrote earned a “Spotlight” designation when it was presented at the Conference on Neural Information Processing Systems (NeurIPS) in December 2023, widely regarded as the world’s top conference on machine learning.
“This study shows that models that satisfy the symmetries of the problem are not only accurate but can also produce predictions with smaller errors using a small number of training points,” says Soledad Villar, an applied mathematician at Johns Hopkins University. “This is especially important in scientific domains, such as computational chemistry, where training data is scarce.”
In their paper, Tahmasebi and Jegelka explore how symmetries, or “invariances,” can help machine learning. Suppose, for example, that the goal of a particular computer run is to pick out every image that contains the numeral 3. The task becomes much easier, and faster, if the algorithm can identify the 3 regardless of where it sits in the frame, whether it is exactly centered or off to the side, and whether it is right-side up, upside down, or rotated to a random angle. An algorithm equipped with that capability can exploit the symmetries of translation and rotation: a 3, or any other object, is not itself changed by shifting its position or rotating it about an arbitrary axis. These properties are called invariances. The same logic applies to algorithms that identify dogs or cats: a dog is still a dog however it happens to appear within the image.
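As a loose illustration of the rotation case, consider the following Python sketch (a toy of our own, not the authors’ algorithm): averaging any scoring function over the actions of a symmetry group, here the four 90-degree rotations, yields a function that is invariant under that group.

```python
import numpy as np

def toy_score(image: np.ndarray) -> float:
    """A stand-in for any classifier score, e.g. 'how 3-like is this image?'"""
    # Made-up feature: a weighted pixel sum (not a real digit detector).
    weights = np.linspace(0.0, 1.0, image.size).reshape(image.shape)
    return float((image * weights).sum())

def rotation_invariant_score(image: np.ndarray) -> float:
    """Average toy_score over the four 90-degree rotations.

    Averaging any function over a group's actions yields a function
    that is invariant under that group.
    """
    return float(np.mean([toy_score(np.rot90(image, k)) for k in range(4)]))

img = np.random.rand(8, 8)
# The symmetrized score is unchanged when the input is rotated.
assert np.isclose(rotation_invariant_score(img),
                  rotation_invariant_score(np.rot90(img)))
```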
The authors explain that the point of the entire exercise is to exploit the inherent symmetry of datasets to reduce the complexity of machine learning tasks. As a result, the amount of data required for training may be reduced. Specifically, the new research answers the following question: How much less data is needed to train a machine learning model if the data contains symmetry?
There are two ways a symmetry in the data can be exploited for gain. The first concerns the size of the sample that must be examined. Suppose, for example, that you are tasked with analyzing an image that has mirror symmetry, the right half being an exact duplicate, or mirror image, of the left. In that case you don’t need to look at every pixel; you can get all the information you need from half of the image, for a factor-of-two improvement. If, similarly, the image can be divided into 10 identical parts, you get a 10-fold improvement. This kind of boost is linear.
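In code, the mirror-symmetry case looks like this minimal sketch (our own toy example, assuming exact left-right symmetry): a statistic of the whole image can be computed from half of its pixels.

```python
import numpy as np

def mean_intensity_from_half(image: np.ndarray) -> float:
    """Mean intensity of a left-right mirror-symmetric image,
    computed by looking at the left half of the pixels only."""
    width = image.shape[1]
    return float(image[:, : width // 2].mean())

# Build an exactly mirror-symmetric test image.
half = np.random.rand(16, 8)
symmetric = np.concatenate([half, half[:, ::-1]], axis=1)

# Half the pixels carry all the information: the two means agree.
assert np.isclose(mean_intensity_from_half(symmetric), symmetric.mean())
```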
To give another example, suppose you are combing through a dataset looking for sequences of blocks in seven colors: black, blue, green, purple, red, white, and yellow. Your job becomes much easier if you don’t care about the order in which the blocks are arranged. If order mattered, there would be 5,040 (that is, 7!) different orderings to search for. But if all you care about is that all seven colors appear, you have reduced the number of sequences to search for from 5,040 to just one.
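That collapse from 5,040 orderings to one is what a canonical form under permutation symmetry buys you; the short Python check below (our own illustration) verifies the count.

```python
from itertools import permutations

COLORS = ["black", "blue", "green", "purple", "red", "white", "yellow"]

def canonical(seq):
    """Sorting gives a canonical form: every reordering of the same
    blocks maps to one representative."""
    return tuple(sorted(seq))

orderings = set(permutations(COLORS))
print(len(orderings))                          # 5040, i.e. 7!
print(len({canonical(p) for p in orderings}))  # 1
```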
Tahmasebi and Jegelka discovered that it is possible to achieve a different kind of gain through symmetry that operates across multiple dimensions: exponential gain. This advantage is related to the notion that the complexity of the learning task increases exponentially with the dimensionality of the data space. Therefore, exploiting multidimensional symmetries can yield disproportionately large returns. “This is a new contribution that basically tells us that higher-dimensional symmetries are more important because they can give us exponential gains,” says Tahmasebi.
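To see why a symmetry that removes whole dimensions matters so much, consider this back-of-the-envelope toy (our own illustration, not the formula from the paper): covering a d-dimensional space at a fixed resolution takes a number of samples exponential in d, so quotienting out k dimensions of symmetry divides that cost by an exponential factor, whereas a finite symmetry like the mirror example only divides it by a constant.

```python
# Covering a d-dimensional space at resolution 1/n takes roughly n**d samples.
# Quotienting out k dimensions of symmetry divides that cost by n**k,
# an exponential gain, versus the constant factor of a finite symmetry
# such as the mirror example above.
n, d, k = 10, 6, 3
full = n ** d           # samples needed with no symmetry exploited
reduced = n ** (d - k)  # samples needed after removing k symmetric dimensions
print(full, reduced, full // reduced)  # 1000000 1000 1000
```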
The NeurIPS 2023 paper he wrote with Jegelka contains two mathematically proven theorems. “The first theorem shows that the general algorithm we provide can improve sample complexity,” says Tahmasebi. The second theorem complements the first, he adds: “It shows that this is the best possible gain; you can get nothing more.”
He and Jegelka provided a formula to predict the gain from a particular symmetry for a given application. The advantage of this formula, says Tahmasebi, is its generality. “It works for any symmetry and input space.” This can apply not only to symmetries known today, but also to symmetries yet to be discovered in the future. The latter prospect is not too far-fetched, considering that the search for new symmetries has long been a major driving force in physics. This suggests that the methodology introduced by Tahmasebi and Jegelka will get better over time as more symmetries are discovered.
According to Haggai Maron, a computer scientist at the Technion (Israel Institute of Technology) and NVIDIA who was not involved in the work, the approach presented in the paper “differs substantially from previous works on the subject by adopting a geometric perspective and employing tools from differential geometry. These theoretical contributions lend mathematical support to the emerging subfield of ‘geometric deep learning,’ with applications to graph learning, 3D data, and more. The paper helps establish a theoretical basis to guide further developments in this rapidly expanding research area.”