xAI previews Grok-1.5 and creates a new benchmark called RealWorldQA.

Elon Musk’s xAI has unveiled Grok-1.5, a multimodal AI model designed to outperform competitors in understanding real-world scenarios.

Following in the footsteps of other products like GPT-4V, the new Grok-1.5 introduces visual processing capabilities to analyze everything from documents and diagrams to charts, screenshots and photos.

Grok-1.5 also performed strongly on text, coding, and math tasks, scoring 50.6% on the MATH benchmark, 90% on the GSM8K benchmark, and 74.1% on the HumanEval benchmark.

This puts Grok-1.5 firmly in the LLM heavyweight ranks, though it scores slightly lower on average than Gemini 1.5 Pro, GPT-4, and Claude 3 Opus.

Grok-1.5 also offers longer context understanding, up to 128K tokens, a 16x increase over previous versions, but it lags far behind what Claude 3 Opus and Gemini 1.5 Pro boast.

The Needle In A Haystack (NIAH) evaluation demonstrated Grok-1.5’s ability to find embedded text within a context of up to 128K tokens in length.
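To make the idea concrete, here is a minimal sketch of how a needle-in-a-haystack check can be set up. It is not xAI's evaluation code: the filler text, the "needle" sentence, the question, and the `toy_model` stand-in are all hypothetical, and word counts are used as a rough proxy for tokens.

```python
from typing import Callable, Dict, List, Tuple

# All four constants below are illustrative assumptions, not xAI's test data.
FILLER = "The quick brown fox jumps over the lazy dog."
NEEDLE = "The secret passphrase is tangerine-42."
QUESTION = "What is the secret passphrase?"
EXPECTED = "tangerine-42"


def build_haystack(needle: str, total_words: int, depth: float) -> str:
    """Repeat the filler to total_words words and insert the needle
    at a relative depth (0.0 = start of context, 1.0 = end)."""
    filler_words = FILLER.split()
    words = [filler_words[i % len(filler_words)] for i in range(total_words)]
    position = int(depth * len(words))
    return " ".join(words[:position] + [needle] + words[position:])


def run_niah(
    ask_model: Callable[[str, str], str],
    context_lengths: List[int],
    depths: List[float],
) -> Dict[Tuple[int, float], bool]:
    """Return, for every (length, depth) pair, whether the model's
    answer contains the expected string."""
    results = {}
    for length in context_lengths:
        for depth in depths:
            context = build_haystack(NEEDLE, length, depth)
            answer = ask_model(context, QUESTION)
            results[(length, depth)] = EXPECTED.lower() in answer.lower()
    return results


def toy_model(context: str, question: str) -> str:
    """Hypothetical stand-in for a real model call: it simply scans
    the context for the passphrase sentence."""
    marker = "The secret passphrase is "
    start = context.find(marker)
    if start == -1:
        return "I could not find it."
    end = context.find(".", start)
    return context[start:end + 1]


if __name__ == "__main__":
    grid = run_niah(toy_model, [1_000, 8_000], [0.0, 0.5, 1.0])
    for (length, depth), found in sorted(grid.items()):
        print(f"{length} words, depth {depth}: {'found' if found else 'missed'}")
```

In a real run, `toy_model` would be replaced by a call to the model under test, and the grid of context lengths and insertion depths would extend up to the model's full context window.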

But what xAI is pushing the hardest is Grok-1.5’s vision technology.

Grok-1.5 converts flowcharts to Python code, generates bedtime stories inspired by children’s drawings, creates CSV datasets from screenshots, and even explains memes.

Grok-1.5 tops the leaderboard in some existing benchmarks such as MathVista and TextVQA, and has the highest score in xAI’s newly established benchmark, RealWorldQA.

Grok-1.5’s vision benchmark results. Source: xAI

Internally, Grok-1.5 is powered by a custom distributed training framework that allows xAI teams to prototype their ideas and train new architectures at scale with minimal effort.

xAI was established last year and includes some of the world’s leading AI researchers, with the ambitious goal of “understanding the universe.”

So far, we’ve had the witty and strange Grok-1, which has told people how to synthesize drugs and has criticized Musk and Tesla.

Grok is also connected to X’s database of posts, one of several unique features.

Musk’s xAI project challenges the largely closed-source ecosystem of generative AI by making its models generally usable in real-world environments under an open source license.

xAI’s open approach, combined with Meta’s similar intent to counter competitors, could become a thorn in the side of monetization efforts by OpenAI, Microsoft, Anthropic, and Google.

RealWorldQA

Concurrently with the preview of Grok-1.5, xAI released RealWorldQA, a new benchmark consisting of over 700 images. Each image is accompanied by a question and a verifiable answer.

The dataset mainly consists of anonymized images captured from vehicles and other real-world situations.

The RealWorldQA dataset evaluates the spatial understanding capabilities of Grok-1.5 and other multimodal AI models.
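As an illustration of what a "verifiable answer" allows, the following is a minimal sketch of scoring a model on image-question-answer triples. The record layout, the file names, the sample questions, and the exact-match rule are all assumptions for the example; the article does not describe how xAI scores RealWorldQA.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    image_path: str  # hypothetical path to a benchmark image
    question: str
    answer: str      # the single verifiable reference answer


def normalize(text: str) -> str:
    """Lowercase, trim, drop a trailing period, and collapse whitespace
    so trivially different answers still match."""
    return " ".join(text.lower().strip().rstrip(".").split())


def accuracy(examples: List[Example], predict: Callable[[str, str], str]) -> float:
    """Fraction of examples whose prediction exactly matches the
    reference answer after normalization (an assumed scoring rule)."""
    if not examples:
        return 0.0
    correct = sum(
        normalize(predict(ex.image_path, ex.question)) == normalize(ex.answer)
        for ex in examples
    )
    return correct / len(examples)


# Made-up examples in the spirit of vehicle-captured scenes.
SAMPLE = [
    Example("img_001.jpg", "Which direction does the road curve?", "Left"),
    Example("img_002.jpg", "Is there room to pass the parked car?", "Yes"),
    Example("img_003.jpg", "Which object is closer, the cone or the sign?", "The cone"),
]


def dummy_predict(image_path: str, question: str) -> str:
    """Hypothetical stand-in for a multimodal model call."""
    canned = {"img_001.jpg": "left.", "img_002.jpg": "No", "img_003.jpg": "the cone"}
    return canned[image_path]


if __name__ == "__main__":
    print(f"accuracy: {accuracy(SAMPLE, dummy_predict):.3f}")
```

Because every question has one checkable answer, a score like this can be computed automatically and compared across models without human grading.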

Grok-1.5 outperforms its competitors in this xAI-created benchmark, and it will be interesting to see whether the benchmark catches on.

Although it falls short of understanding the universe, Grok-1.5 will establish itself as another top model in an ever-growing lineup and will show how generative AI in its current form has reached its peak.
