Why Google Gemini’s Pokémon success isn’t all it’s cracked up to be

    Earlier this year, we took a look at how and why Anthropic's Claude LLM was struggling to beat Pokémon Red (a game, let's remember, designed for young children). But while Claude 3.7 is still struggling to make consistent progress at the game weeks later, a similar Twitch-streamed effort using Google's Gemini 2.5 model managed to finally complete Pokémon Blue this weekend across over 106,000 in-game actions, earning accolades from followers, including Google CEO Sundar Pichai.

    Before you start using this achievement as a way to compare the relative performance of these two AI models—or even the advancement of LLM capabilities over time—there are some important caveats to keep in mind. As it happens, Gemini needed some fairly significant outside help on its path to eventual Pokémon victory.

Strap in to the agent harness

    Gemini Plays Pokémon developer JoelZ (who's unaffiliated with Google) will be the first to tell you that Pokémon is ill-suited as a reliable benchmark for LLM models. As he writes on the project's Twitch FAQ, "please don't consider this a benchmark for how well an LLM can play Pokémon. You can't really make direct comparisons—Gemini and Claude have different tools and receive different information. ... Claude's framework has many shortcomings so I wanted to see how far Gemini could get if it were given the right tools."
