Meta AI has released LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. According to the developers, LLaMA models can compete with, and in some cases surpass, the best existing models such as GPT-3, Chinchilla, and PaLM.
Large language models (LLMs) trained on large-scale data have shown they can perform a wide variety of tasks, from basic ones such as summarizing text, generating text from instructions, and writing poetry, to more complex ones such as writing descriptions for AI-generated art.
The LLaMA developers used a mix of sources as training data, including English CommonCrawl, C4, GitHub, Wikipedia, books, ArXiv, and Stack Exchange, covering a variety of domains. Unlike Chinchilla, PaLM, or GPT-3, LLaMA was trained only on publicly available data, which makes releasing it open source practical, whereas most existing models rely on data that is either not publicly available or not documented.
To improve training speed, the LLaMA model uses an efficient implementation of the causal multi-head attention operator that reduces memory usage and computation. To further improve training efficiency, the developers use checkpointing to reduce the number of activations that are recomputed during the backward pass; a sketch of these ideas follows below.
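The following is a minimal, hypothetical sketch of these two ideas using standard PyTorch 2.x primitives, not Meta's actual implementation (which relies on a custom memory-efficient attention kernel and a hand-tuned checkpointing scheme that saves expensive activations such as linear-layer outputs). Here, `F.scaled_dot_product_attention` with `is_causal=True` stands in for the fused causal attention operator, and `torch.utils.checkpoint` illustrates the general checkpointing mechanism of recomputing a block's activations in the backward pass instead of storing them. All module names and dimensions are illustrative.

```python
# Illustrative sketch: fused causal attention + activation checkpointing.
# Not Meta's code; uses stock PyTorch 2.x APIs.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint


class CausalSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq_len, head_dim).
        q, k, v = (
            y.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            for y in (q, k, v)
        )
        # Fused attention kernel; is_causal=True applies the causal mask inside
        # the kernel, so no (seq_len x seq_len) mask tensor is materialized.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.proj(out)


class Block(nn.Module):
    """One transformer block whose activations are checkpointed."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = CausalSelfAttention(d_model, n_heads)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.SiLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        def inner(x):
            x = x + self.attn(self.norm1(x))
            return x + self.mlp(self.norm2(x))

        # Recompute this block's activations during the backward pass instead
        # of storing them, trading extra compute for lower memory use.
        return checkpoint(inner, x, use_reentrant=False)


if __name__ == "__main__":
    block = Block(d_model=512, n_heads=8)
    x = torch.randn(2, 128, 512, requires_grad=True)
    block(x).sum().backward()  # gradients flow through the checkpointed block
```

Plain checkpointing as shown trades compute for memory; according to the paper, Meta's implementation goes further and selectively saves the activations that are expensive to compute, so less work is redone during the backward pass.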
Unlike previous research, Meta’s work on LLaMA shows that state-of-the-art performance can be achieved by training solely on publicly available data, rather than relying on proprietary datasets. The developers hope that publishing these models to the research community will help accelerate the development of large-scale language models, improve their reliability, and reduce known problems such as toxicity and bias.
Read more about the study in the paper.