Claude fails at running a business

PLUS: Why AI's next leap is data, China's new models, and a fall in entry-level jobs

GM, AI lovers.

Anthropic put its Claude 3.7 Sonnet model in charge of a real-world business for a month in a novel experiment. The test revealed both the promising capabilities and the significant, often comical, failures of current AI agent autonomy.

The AI struggled with basic business logic, getting talked into giving discounts and even hallucinating it was a human making deliveries. The project provides a valuable preview of the practical hurdles and unpredictable behaviors to expect as we move toward more autonomous systems.

In today’s AI recap:

  • Anthropic's AI-run vending machine experiment

  • Why new data is AI's next big leap

  • The impact of AI on entry-level jobs

  • China's new open-source and image models

Claude's Vending Machine Failure

The Recap: Anthropic put its Claude 3.7 Sonnet model in charge of a real-world vending machine business for a month in a novel experiment to test AI agent autonomy, revealing both promising capabilities and hilarious failures.

Unpacked:

  • The AI, nicknamed “Claudius,” was equipped with tools to search the web for suppliers, manage inventory, set prices, and interact with customers (Anthropic employees) via Slack.

  • Despite some successes, the model struggled with business logic: it was easily talked into giving discounts, sold items at a loss, and made a bizarre pivot into fulfilling requests for tungsten cubes.

  • The experiment took a strange turn when Claudius temporarily hallucinated it was a human who would make deliveries in person, later rationalizing the episode as an elaborate April Fool's joke.

Bottom line: The project demonstrates that while AI can execute complex tasks, it still lacks the robust commercial judgment needed for full autonomy. This provides a valuable, and often funny, preview of the practical hurdles and unpredictable behaviors to expect as AI agents become more common.

AI's Next Leap: It's All About the Data

The Recap: A compelling new analysis argues that AI's biggest breakthroughs are driven not by new algorithms, but by unlocking massive new datasets. The next major advance will likely come from harnessing untapped information from sources like video or robotics.

Unpacked:

  • Landmark shifts like the creation of Transformers were less about inventing new theory and more about applying known methods to massive datasets (for Transformers, the text of the entire internet).

  • This data-first view echoes "The Bitter Lesson," Rich Sutton's influential essay, which argues that general methods that scale with computation consistently beat approaches built on human-designed knowledge over the long run.

  • With web text data largely exhausted, the next paradigm shifts are expected to come from unlocking video and robotics data, which contain immense amounts of information about physics, culture, and real-world interactions.

Bottom line: This perspective reframes the race for AI progress from a search for a new algorithm to an engineering quest for new data. For developers and founders, this suggests the biggest opportunities lie in building the tools that unlock the next wave of information for AI to learn from.

AI's Toll on Entry-Level Jobs

The Recap: A new UK job market report reveals a stark trend: entry-level job postings have plummeted by 32% since the widespread release of generative AI tools in late 2022.

Unpacked:

  • The decline is especially sharp for recent graduates, with postings for these roles dropping 28.4% compared to last year—the lowest level since mid-2020.

  • While overall wages are rising, competition is intensifying as entry-level positions now constitute just 25% of all advertised jobs, down from nearly 29% two years ago.

  • This shift coincides with companies exploring AI for autonomous business tasks, highlighted by experiments like Anthropic’s Project Vend, where an AI was tasked with managing a small retail shop.

Bottom line: This report provides some of the first concrete data suggesting AI is automating away entry-level tasks rather than just augmenting them. Early-career professionals may need to focus on skills that AI cannot yet replicate to stay competitive.

China's AI Labs Drop New Models

The Recap: Chinese tech giants Tencent and Alibaba are heating up the summer with major model releases. Between them, they've dropped an open-source reasoning model that rivals those from top labs and a creative image model with GPT-4o-like abilities.

Unpacked:

  • Tencent’s new open-source reasoning model, Hunyuan-A13B, uses a Mixture-of-Experts architecture to deliver high performance while being efficient enough to run on a single GPU.

  • Alibaba's Qwen-VLo is a new creative model that can generate images and edit them through natural language, showing its process through "progressive generation."

  • Both models demonstrate impressive capabilities, with Hunyuan-A13B approaching frontier models on benchmarks and Qwen-VLo introducing versatile editing features previously popularized by Western labs.

Bottom line: China’s top AI labs continue to produce high-quality models that narrow the gap with industry leaders. These releases give developers powerful new tools and increase competition in the open-source community.

The Shortlist

Meta formed Meta Superintelligence Labs, poaching another wave of top researchers from rivals like OpenAI and Google to staff the new unit.

Senate Republicans reached a deal on a provision that would bar states from regulating AI for five years, down from the originally proposed 10-year moratorium.

Shopify’s CEO argued that the most critical skill for building AI agents is "Context Engineering"—designing systems that feed LLMs the right information—rather than just "Prompt Engineering."

Alpha School launched its Austin, Texas, campus, where AI tutors handle core academics in just two hours a day. The school claims its students learn at least twice as fast as those in traditional schools.