Elon Musk’s xAI has unveiled Grok-1.5, a multimodal AI model designed to outperform competitors in understanding real-world scenarios.
Following in the footsteps of models like GPT-4V, Grok-1.5 adds visual processing capabilities that let it analyze everything from documents and diagrams to charts, screenshots, and photos.
Grok-1.5 also performs strongly on text, coding, and math tasks, scoring 50.6% on the MATH benchmark, 90% on GSM8K, and 74.1% on HumanEval.
This puts Grok-1.5 firmly in the LLM heavyweight ranks, although on average it scores slightly lower than Gemini 1.5 Pro, GPT-4, and Claude 3 Opus.
Grok-1.5 also handles longer contexts of up to 128K tokens, a 16x increase over its predecessor, though that still trails the context windows offered by Claude 3 Opus and Gemini 1.5 Pro.
In the Needle In A Haystack (NIAH) evaluation, Grok-1.5 demonstrated the ability to retrieve text embedded in contexts up to 128K tokens long.
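For readers unfamiliar with NIAH, the idea is to hide a short "needle" fact at a chosen depth inside a long block of filler text, ask the model to retrieve it, and repeat across a grid of context lengths and depths. The Python below is a minimal sketch of that idea under stated assumptions, not xAI's actual evaluation harness: the filler sentence, the passphrase, and the `ask_model` callable are hypothetical, and word counts stand in for tokens (real tests count tokens with the model's own tokenizer).

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical haystack filler and needle; real NIAH runs usually use long essays as filler.
FILLER = "The grass is green and the sky is blue."
NEEDLE = "The secret passphrase is 'blue-falcon-42'."
ANSWER = "blue-falcon-42"


def build_haystack(n_words: int, depth: float) -> str:
    """Return at least n_words of filler with NEEDLE inserted at a relative depth in [0, 1].

    Word count is a crude stand-in for token count (an assumption of this sketch).
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    words_per_sentence = len(FILLER.split())
    sentences = []
    count = 0
    while count < n_words:
        sentences.append(FILLER)
        count += words_per_sentence
    # Place the needle between sentences: depth 0.0 = start, 1.0 = end.
    position = round(depth * len(sentences))
    sentences.insert(position, NEEDLE)
    return " ".join(sentences)


def build_prompt(haystack: str) -> str:
    """Append the retrieval question after the long context."""
    return f"{haystack}\n\nQuestion: What is the secret passphrase mentioned in the text above?"


def score(response: str) -> bool:
    """Count a retrieval as successful if the answer string appears in the response."""
    return ANSWER in response


def sweep(
    context_lengths: Iterable[int],
    depths: Iterable[float],
    ask_model: Callable[[str], str],  # hypothetical: sends a prompt, returns the model's reply
) -> Dict[Tuple[int, float], bool]:
    """Run one retrieval trial per (context length, depth) pair."""
    depth_list = list(depths)
    results = {}
    for n in context_lengths:
        for d in depth_list:
            prompt = build_prompt(build_haystack(n, d))
            results[(n, d)] = score(ask_model(prompt))
    return results
```

A full NIAH chart is essentially the result of `sweep` plotted as a grid, with context length on one axis and needle depth on the other; a model that retrieves the needle everywhere up to 128K tokens would show success in every cell.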
But what xAI is pushing the hardest is Grok-1.5’s vision technology.
In xAI's demos, Grok-1.5 converts flowcharts into Python code, generates bedtime stories inspired by children's drawings, creates CSV datasets from screenshots, and even explains memes.
Grok-1.5 tops the leaderboard on some existing benchmarks, such as MathVista and TextVQA, and achieves the highest score on xAI's newly created benchmark, RealWorldQA.
Under the hood, Grok-1.5 was trained on a custom distributed training framework that lets xAI teams prototype ideas and train new architectures at scale with minimal effort.
xAI was established last year and includes some of the world's leading AI researchers, with the ambitious goal of "understanding the universe."
So far, that effort has produced the witty and eccentric Grok-1, which has told people how to synthesize drugs and has even criticized Musk and Tesla.
Grok is also connected to X's database of posts, which gives it several unique features.
Musk's xAI project also challenges the largely closed-source generative AI ecosystem by making its models openly available under an open-source license.
xAI's open approach, combined with Meta's similar strategy for countering competitors, could become a thorn in the side of OpenAI, Microsoft, Anthropic, and Google as they try to monetize their models.
RealWorldQA
Concurrently with the preview of Grok-1.5, xAI released RealWorldQA, a new benchmark consisting of over 700 images. Each image is accompanied by a question and a verifiable answer.
The dataset mainly consists of anonymized images captured from vehicles and other real-world situations.
The RealWorldQA dataset evaluates the spatial understanding capabilities of Grok-1.5 and other multimodal AI models.
Grok-1.5 outperforms its competitors on this xAI-created benchmark, and it will be interesting to see whether the benchmark catches on more widely.
Although it falls short of understanding the universe, Grok-1.5 establishes itself as another top model in an ever-growing lineup, and it illustrates how generative AI in its current form has reached its peak.