Epoch AI Launches FrontierMath AI Benchmark to Test Capabilities of AI Models

Euna Torphy, 24.11.2024

Epoch AI, a California-based research institute, launched a new artificial intelligence (AI) benchmark last week. Dubbed FrontierMath, the benchmark tests large language models (LLMs) on their reasoning and mathematical problem-solving capabilities. The institute claims that existing math benchmarks are no longer very useful because of factors such as data contamination and AI models already achieving very high scores on them. Epoch AI claims that even the leading LLMs have scored less than two percent on the new benchmark.

Epoch AI Launches FrontierMath Benchmark

In a post on X (formerly known as Twitter), Epoch AI explained that it collaborated with more than 60 mathematicians to create hundreds of original, unpublished math problems. Epoch AI claims that these questions would take even mathematicians hours to solve. It cited the limitations of existing benchmarks such as GSM8K and MATH, on which AI models generally score highly, as the reason for developing the new benchmark.

The company claimed that the high scores achieved by LLMs are largely due to data contamination. This means the benchmark questions had already appeared in the models’ training data, making them easy for the models to solve.

FrontierMath addresses the problem by using new problems that are unique and have not been published anywhere, mitigating the risk of data contamination. Further, the benchmark covers a wide range of questions, including computationally intensive problems in number theory, real analysis, and algebraic geometry, as well as topics such as Zermelo–Fraenkel set theory. The AI firm says all the questions are “guess proof”, meaning they cannot be solved accidentally without strong reasoning.

Epoch AI highlighted that, to measure AI’s aptitude, benchmarks should be built around creative problem-solving in which the AI has to maintain reasoning over multiple steps. Notably, many industry veterans believe that existing benchmarks are not sufficient to correctly measure how advanced an AI model is.

Responding to the new benchmark in a post, Noam Brown, an OpenAI researcher who was behind the company’s o1 model, welcomed it and said, “I love seeing a new eval with such low pass rates for frontier models.”
