November | 2025 | numbersandcode

An AI investor’s thesis on Google

A few months ago, I wrote about how large models eat up the stack and the workflow and also expand an individual’s reach beyond their skill set (think engineers becoming product managers and GTM specialists). Carrying that analogy from individuals to companies, there is one company that could potentially eat the entire AI ecosystem: Google. They already have the full stack: the chips, the talent, the resources, and the models, not to mention loads and loads of data. It would not be inconceivable for them to spread their wings and embed themselves into more of the AI and business domains.

The impetus for this train of thought is a hackathon I went to that featured the app-building talents of their AI Studio product. Google has been on a marketing and publicity blitz over the past few months, holding a ton of meetups and hackathons to publicize this and other features. I’ve probably seen Paige Bailey more in the past few months than in the past 2 years combined.

In the hackathon, we were given 3 hours to vibe code, deploy a product, and put together a presentation with a 3-minute video (on YouTube, of course). On top of that, part of the criteria for winning was how our product performed on social media. In most hackathons, this would be impossible because most new offerings from startups don’t work half the time. If something were to be accomplished in 3 hours, there would be a template or a workshop-like program where teams are walked through a reference implementation, which most would hand in anyway.

This was purely starting from scratch with nothing but the build feature of AI Studio, and it worked. We vibe coded and deployed our app into production, as did so many other teams, and the variety of ideas that came to fruition was staggering.

So how does this go towards the thesis of Google eating everything?

It’s similar to Apple’s App Store or Amazon’s marketplace. As more apps get deployed with increasing sophistication in Google’s app ecosystem, Google gets to see what works and what doesn’t. They can then choose to buy, host, or duplicate the product. Either way, Google gets to expand their footprint throughout the AI economy (and collect all that data to boot).

So, what could get in their way? Plenty. Don’t forget Google is still a large company, and unless they’ve drastically revamped their culture and structure, they’re still prone to the same missteps that plague big behemoths. Throw in antitrust and competition from another 800-pound gorilla, the Elon Musk company universe (X, xAI, Tesla, Neuralink, SpaceX, …), which will stop at nothing short of total domination, and we should see plenty of fireworks in the next few years.

I haven’t mentioned the big frontier labs (OpenAI, Anthropic). They’ll still be around and possibly survive, but they’re not going to 10x, let alone 100x, from where they are. They still have to spend to expand, and the combination of Google, Elon, open source, and Chinese models/companies is going to eat into their margins. As an AI investor, my money, across all private and public markets, is going to be on Google.

RL Environments

A few days ago I attended a fireside chat featuring Edwin Chen from Surge AI, held at South Park Commons. One of the items he highlighted was how excited he was about RL Environments as a way to better train and evaluate frontier models. This caps off a journey that started a few weeks ago when Kasey Zhang and Osmosis hosted the RL IRL forum at Y Combinator, which coincided with an Unsloth/PyTorch/AMD hackathon that featured creating RL environments for Meta’s OpenEnv project.

RL IRL @Y Combinator

Kasey Zhang and Osmosis hosted a forum for applied RL where six talks were held over the course of the afternoon. A written summary is available here, along with recordings of some of the talks.

For this post I’d like to highlight the talk by Anshul Bhagi, in which he gives an overview of Turing’s work in creating Reinforcement Learning (RL) Environments for Enterprise Workflows. Turing focuses on building sophisticated environments and generating high-quality data to help frontier labs and enterprises evaluate and train advanced AI models, particularly for complex, long-range tasks in domains like software engineering, finance, and healthcare. The core of their business involves developing three main types of environments: those for evaluating and training coding agents, fully interactive UI environments that mimic applications, and general-purpose function-calling environments, often based on MCP servers. A significant challenge and key value proposition for Turing is partnering with domain experts to design tasks and data, ensuring the environments accurately reflect workflows of high economic value and provide well-calibrated difficulty for model improvement. Finally, customers primarily use the environments for model evaluation, generating Supervised Fine-Tuning (SFT) data, and RL training, with a trend towards increasingly complex, hybrid, multi-application workflows.

Contributing RL Environments to OpenEnv

Around the same week, I participated in a hackathon sponsored by Unsloth, PyTorch, and AMD. The original challenge was to train agents to pose and answer questions and stump the opponent. The hackathon was moved ahead a week with the additional challenge of creating new environments for the OpenEnv project.

I had a great time creating environments, and the OpenEnv project made it easy with a simple interface. My two environments helped agents train to recognize stock trading patterns and play the game of Mastermind, while the winners went above and beyond in terms of creativity and implementation:

3rd place:

@sub_zero5167 and @Zeus for a Survival Island game: https://maroon-demo.vercel.app/

2nd place:

@cpich3g and @dexter for PacMan: https://github.com/cpich3g/pacman-rl

1st place:

@osiris for GRPO on Julia, Ruby, Zig RL environments: https://medium.com/@yogeshsingla481/training-a-multi-language-3b-coder-lm-with-reinforcement-learning-grpo-f37577d5e5f7
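
For readers new to the idea, an RL environment mostly boils down to a reset/step loop: the agent acts, and the environment returns an observation, a reward, and a flag saying whether the episode is over. The block below is a minimal sketch of a Mastermind-style environment in that shape. It is not the environment submitted to the hackathon and it does not use the OpenEnv API; the names `MastermindEnv`, `StepResult`, `reset`, and `step` are hypothetical and chosen only for illustration, as is the reward of 1.0 for cracking the code.

```python
import random
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: tuple  # (exact matches, right color in the wrong position)
    reward: float
    done: bool


class MastermindEnv:
    """Toy Mastermind environment with a reset/step interface (hypothetical)."""

    def __init__(self, n_colors=6, code_length=4, max_turns=10, seed=None):
        self.n_colors = n_colors
        self.code_length = code_length
        self.max_turns = max_turns
        self.rng = random.Random(seed)
        self.secret = None
        self.turn = 0

    def reset(self, secret=None):
        # A fixed secret can be passed in for testing; otherwise sample one.
        if secret is None:
            secret = [self.rng.randrange(self.n_colors)
                      for _ in range(self.code_length)]
        self.secret = list(secret)
        self.turn = 0
        return (0, 0)  # no feedback before the first guess

    def _feedback(self, guess):
        exact = sum(g == s for g, s in zip(guess, self.secret))
        # Count color overlap regardless of position, then remove exact hits.
        overlap = sum(min(guess.count(c), self.secret.count(c))
                      for c in range(self.n_colors))
        return exact, overlap - exact

    def step(self, guess):
        if self.secret is None:
            raise RuntimeError("call reset() before step()")
        guess = list(guess)
        if len(guess) != self.code_length:
            raise ValueError("guess has the wrong length")
        self.turn += 1
        exact, partial = self._feedback(guess)
        solved = exact == self.code_length
        done = solved or self.turn >= self.max_turns
        # Sparse reward: 1.0 only when the code is cracked.
        return StepResult((exact, partial), 1.0 if solved else 0.0, done)


if __name__ == "__main__":
    env = MastermindEnv(seed=0)
    env.reset(secret=[0, 1, 2, 3])
    # One peg is exactly right (the 0); two colors (1 and 2) are misplaced.
    print(env.step([0, 2, 1, 5]))
```

A sparse reward like this keeps the sketch simple; a real training setup might shape the reward using the feedback pegs or penalize extra turns.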

Edwin Chen @South Park Commons

Finally, I attended a talk with Edwin Chen of Surge AI hosted by South Park Commons. The fireside chat delved into Edwin’s past, the founding and philosophy behind what Surge AI does, and the company’s deliberate choice to remain entirely bootstrapped, valuing long-term goals and product quality over short-term venture capital metrics.

However, what piqued my interest was his answer to what excites him these days, which was RL Environments. This is emphasized by a recent blog post by their research team. It’s a really dense read, but well worth it. In it the team explores how realistic reinforcement learning environments reveal gaps in AI’s ability to act as autonomous agents. In a simulated company called Corecraft Inc., nine leading models, including GPT-5 and Claude Sonnet 4.5, attempted 150 workplace tasks. Despite their conversational fluency, top models failed over 40% of the time, showing limits in practical reasoning and real-world execution.

Surge then defines a Hierarchy of Agentic Capabilities based on their learnings:

Tool use and planning — decomposing goals and executing steps.

Adaptability — adjusting to unexpected inputs or changing conditions.

Groundedness — staying factual and contextually consistent.

Common-sense reasoning — inferring correctly beyond training data.

While models perform basic planning well, they struggle with adaptability and reasoning. The blog concludes that the future of AI lies in environments that replicate real-world work to cultivate truly capable, grounded agents.

Summary

There you have it: a quick tour of the RL Environment landscape, as seen through three events over the course of a month. Hope the links help and happy reading!
