Large language models, like those powering popular artificial intelligence chatbots such as ChatGPT, are incredibly complex. Even though these models are being used as tools in many areas, including customer support, code generation, and language translation, scientists still don’t fully understand how they work.
To better understand what’s going on under the hood, researchers at MIT and elsewhere studied the mechanisms at work when these enormous machine-learning models retrieve stored knowledge.
They found surprising results: Large language models (LLMs) often use very simple linear functions to recover and decode stored facts. Moreover, the model uses the same decoding function for similar types of facts. A linear function, an equation with no exponents, captures the straightforward, straight-line relationship between two variables.
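As a rough sketch of that idea (illustrative only, with made-up dimensions and random stand-in vectors rather than anything from the paper), a “linear decoding function” for one relation amounts to a weight matrix and a bias vector applied to the model’s hidden representation of a subject:

```python
# Minimal illustration, not the authors' code: one relation, such as
# "plays the instrument," gets its own weight matrix W and bias vector b.
import numpy as np

hidden_dim = 1024                              # hypothetical hidden-state size
rng = np.random.default_rng(0)

W = rng.normal(size=(hidden_dim, hidden_dim))  # decoding weights for the relation
b = rng.normal(size=hidden_dim)                # decoding bias for the relation

subject_vec = rng.normal(size=hidden_dim)      # stand-in for the model's hidden
                                               # representation of "Miles Davis"

# The claimed mechanism: the representation of the object ("trumpet") is
# approximately an affine transformation of the subject's representation.
object_estimate = W @ subject_vec + b
```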
The researchers showed that, by identifying linear functions for a variety of facts, they can probe a model to see what it knows about new subjects and where within the model that knowledge is stored.
Using a technique they developed to estimate these simple functions, the researchers found that models often store the correct information even when they respond to prompts incorrectly. In the future, scientists could use this approach to find and correct falsehoods inside a model, which could reduce its tendency to give inaccurate or nonsensical answers.
“Even though these models are really complicated, nonlinear functions that are trained on lots of data and are very hard to understand, sometimes there are very simple mechanisms at work inside them. This is one example of that,” says Evan Hernandez, a graduate student in Electrical Engineering and Computer Science (EECS) and co-author of a paper detailing these findings.
Hernandez wrote the paper with co-author Arnab Sen Sharma, a computer science graduate student at Northeastern University; his advisor, Jacob Andreas, an associate professor of EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); senior author David Bau, an assistant professor of computer science at Northeastern; and other researchers at MIT, Harvard University, and the Israel Institute of Technology. The research will be presented at the International Conference on Learning Representations.
Finding facts
Most large-scale language models, also called transformer models, are neural networks. Neural networks, loosely based on the human brain, contain billions of interconnected nodes, or neurons, which are grouped into layers to encode and process data.
Much of the knowledge stored in a transformer can be represented as relations that connect subjects and objects. For example, “Miles Davis plays the trumpet” is a relation that connects the subject, Miles Davis, and the object, the trumpet.
As a transformer gains more knowledge, it stores additional facts about a given subject across multiple layers. When a user asks about that subject, the model must decode the most relevant fact to answer the query.
If someone prompts a transformer with “Miles Davis plays . . .” the model should respond with “trumpet,” not “Illinois” (the state where Miles Davis was born).
“Somewhere in the network computation there has to be a mechanism that finds the fact that Miles Davis plays the trumpet, extracts that information, and helps generate the next word. We wanted to understand what that mechanism was,” says Hernandez.
The researchers set up a series of experiments to probe LLMs, and found that, even though these models are extremely complex, they decode relational information using simple linear functions. Each function is specific to the type of fact being retrieved.
For example, the transformer would use one decoding function any time it outputs the instrument a person plays, and a different function each time it outputs the state where a person was born.
The researchers developed a method to estimate these simple functions and then calculated the functions for 47 different relationships, such as “capital of a country” and “lead singer of a band.”
Although there may be an infinite number of possible relationships, the researchers decided to study this particular subset because it is representative of the kinds of facts that can be recorded in this way.
They tested each function by changing the subject to see if it could recover the correct object information. For instance, the function for “capital of a country” should retrieve Oslo if the subject is Norway and London if the subject is England.
The functions retrieved the correct information more than 60 percent of the time, showing that some information in a transformer is encoded and retrieved in this way.
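The sketch below illustrates the “estimate a function, then test it on new subjects” pattern described above. It is not the paper’s estimation procedure: it fits the weights by ordinary least squares on synthetic vectors, and every name and dimension is made up. In the real setting, the subject and object vectors would be hidden states read out of a transformer.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_train, n_test = 64, 200, 40

# Synthetic stand-ins for hidden-state pairs under one relation
# (e.g., subject = "Norway", object = "Oslo" for "capital of a country").
true_W, true_b = rng.normal(size=(d, d)), rng.normal(size=d)
subj_train = rng.normal(size=(n_train, d))
obj_train = subj_train @ true_W.T + true_b + 0.1 * rng.normal(size=(n_train, d))

# Estimate W and b by least squares (a constant column absorbs the bias).
X = np.hstack([subj_train, np.ones((n_train, 1))])
coef, *_ = np.linalg.lstsq(X, obj_train, rcond=None)
W_hat, b_hat = coef[:d].T, coef[d]

# Test on unseen subjects: the prediction should land closest to that
# subject's own object vector among all test objects (top-1 retrieval).
subj_test = rng.normal(size=(n_test, d))
obj_test = subj_test @ true_W.T + true_b
pred = subj_test @ W_hat.T + b_hat
dists = np.linalg.norm(pred[:, None, :] - obj_test[None, :, :], axis=-1)
accuracy = np.mean(np.argmin(dists, axis=1) == np.arange(n_test))
print(f"top-1 retrieval accuracy: {accuracy:.2f}")
```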
“But not everything is encoded linearly. For some facts, even though the model knows them and will predict text consistent with those facts, we can’t find linear functions for them. This suggests the model is doing something more intricate to store that information,” he says.
Visualizing the model’s knowledge
The researchers also used these functions to determine what a model believes is true about different subjects.
In one experiment, they started with the prompt “Bill Bradley was a” and used the decoding functions for “plays sports” and “went to college” to see whether the model knew that Sen. Bradley was a basketball player who attended Princeton.
“We can show that even though the model may choose to focus on different information when it generates text, it does encode all of that information,” says Hernandez.
They used this probing technique to produce what they call an “attribute lens,” a grid that visualizes where specific information about a particular relation is stored within the transformer’s many layers.
Attribute lenses can be generated automatically, providing a streamlined way for researchers to learn more about a model. This visualization tool could help scientists and engineers correct stored knowledge and prevent an AI chatbot from giving false information.
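As a sketch of the idea (hypothetical shapes and function names, not the released tool), an attribute lens can be thought of as applying one relation’s linear decoder to the hidden state at every layer and token position, then scoring how strongly each decoded vector points toward the expected attribute:

```python
import numpy as np

def attribute_lens_grid(hidden_states, W, b, answer_vec):
    """hidden_states: (n_layers, n_tokens, d) activations for one prompt.
    Returns an (n_layers, n_tokens) grid of cosine similarities between the
    decoded vector at each cell and a vector for the expected attribute."""
    decoded = hidden_states @ W.T + b                          # apply the relation map
    decoded /= np.linalg.norm(decoded, axis=-1, keepdims=True)
    answer = answer_vec / np.linalg.norm(answer_vec)
    return decoded @ answer                                    # one score per cell

# Toy usage with random stand-ins for real transformer activations.
rng = np.random.default_rng(2)
n_layers, n_tokens, d = 12, 8, 64
grid = attribute_lens_grid(
    rng.normal(size=(n_layers, n_tokens, d)),     # fake activations
    rng.normal(size=(d, d)), rng.normal(size=d),  # fake relation decoder (W, b)
    rng.normal(size=d),                           # fake "expected answer" vector
)
print(grid.shape)  # (12, 8): ready to plot as a layers-by-tokens heat map
```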
Going forward, Hernandez and his collaborators want to better understand what happens when facts aren’t stored linearly. They also want to run experiments using larger models and study the precision of the linear decoding function.
“This is exciting work that reveals a missing piece in our understanding of how large language models recall factual knowledge during inference. Previous work showed that LLMs build information-rich representations of given subjects, from which specific attributes are extracted during inference. This work shows that the complex nonlinear computation of LLMs for attribute extraction can be well-approximated with a simple linear function,” says Mor Geva Pipek, an assistant professor in the School of Computer Science at Tel Aviv University.
This research was supported in part by Open Philanthropy, the Israel Science Foundation, and the Azrieli Foundation Early Career Faculty Fellowship.