Experimenting with the LeRobot SO-101

Some good, some not so good, and clarifications on the documentation

Summary:

About a month ago I got a LeRobot kit from Wowrobo. I’ve assembled the leader and follower arms, teleoperated the units, and also collected data and trained a model to perform tasks. So far the documentation has been easy to follow. However, there have been some hiccups, and they are documented here:

Prerequisites

HF LeRobot documentation: https://huggingface.co/docs/lerobot/so101

Wowrobo kit – https://shop.wowrobo.com/products/so-arm101-diy-kit-assembled-version-1?variant=46588641607897

Issue #1: Power Supplies

Maybe it’s different for different motors (mine were Feetech), but the leader and follower arms have different power supplies, and they make a big difference. The leader has a 30W 5V supply, while the follower has a 36W 12V supply. This tripped me up several times when the board would lose motor IDs in the middle of a run.

Issue #2: Training

So you’ve collected your teleop data and are ready to train. There are several options:

  1. Locally – I would not recommend training a model locally, as this is VERY slow. Even on my M3 MacBook Pro it may take days to reach 10K steps.
  2. Google Colab – this is an alternative for those who are GPU poor, and the option I ultimately used. The HF instructions have a page that walks you through setting it up here. However, the free tier just gives you a T4, which works if you set batch_size=1, and then you’ll have to hope you won’t run out of memory. If you want the beefier A100, which will train 100k steps in about 5 hours, you’ll either have to upgrade to Colab Pro or pay as you go. The PAYG option works if you’re doing a one-time job, but you’ll have to babysit the notebook or it will disconnect (I observed this happening every 90 minutes), and then you’ll have to start over, unless you mounted the output to your own Google Drive (see below). Colab Pro is supposed to cause fewer disconnects, but your mileage may vary. As of today, $10 (100 credits) will train your model with some credits left over.
  3. GPU providers – there are plenty to choose from, but then you’ll have to do your own setup.
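Using the A100 numbers above (100k steps in about 5 hours) as a baseline, a back-of-the-envelope estimator can help plan a run. This is a sketch under my own observed figures, not a benchmark — the rate and the 90-minute disconnect interval are assumptions from my runs, and yours will vary with batch size, policy, and dataset.

```python
# Rough planning math for Colab training runs.
# Rate assumption: ~100k steps in ~5 hours on an A100 (from my run).
A100_STEPS_PER_HOUR = 100_000 / 5  # ~20k steps/hour

def estimated_hours(steps: int, steps_per_hour: float = A100_STEPS_PER_HOUR) -> float:
    """Estimated wall-clock hours to train the given number of steps."""
    return steps / steps_per_hour

def expected_disconnects(steps: int, disconnect_minutes: float = 90.0) -> int:
    """How many ~90-minute PAYG disconnects to expect during a run."""
    return int(estimated_hours(steps) * 60 // disconnect_minutes)
```

For example, a 100k-step run at this rate comes out to 5 hours and roughly three disconnect windows — which is why mounting Google Drive (next section) matters.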

Mounting your Google Drive in your Colab notebook: do this to save your checkpoints so you can resume if you’re disconnected.

from google.colab import drive
drive.mount('/content/drive')
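Once Drive is mounted, checkpoints written under the Drive-backed output dir survive a disconnect. Here is a small hypothetical helper to locate the newest checkpoint when you restart; it assumes the `checkpoints/<step>/pretrained_model` layout that the commands in this post produce, which is an assumption you should verify against your own output dir.

```python
from pathlib import Path
from typing import Optional

def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the newest <output_dir>/checkpoints/<step>/pretrained_model path,
    or None if no numeric checkpoint dirs exist yet."""
    ckpt_root = Path(output_dir) / "checkpoints"
    if not ckpt_root.is_dir():
        return None
    # Keep only numeric step dirs (ignores aliases like a "last" symlink).
    steps = [d for d in ckpt_root.iterdir() if d.name.isdigit()]
    if not steps:
        return None
    newest = max(steps, key=lambda d: int(d.name))
    return newest / "pretrained_model"
```

Point `--policy.path` at the returned directory when resuming or running inference.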

Issue #3: Missing model.safetensors file

When it came time to exercise my newly trained model, I discovered the model.safetensors file wasn’t there. I’m still not sure what happened, but check for this file as checkpoints are written; otherwise all that training is for naught.
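A quick way to catch this early is to scan the output dir for checkpoints that lack the weights file. This is a hypothetical sanity check, again assuming the `checkpoints/<step>/pretrained_model` layout used in this post:

```python
from pathlib import Path

def missing_safetensors(output_dir: str) -> list:
    """Return pretrained_model dirs that lack a model.safetensors file."""
    missing = []
    for model_dir in sorted(Path(output_dir).glob("checkpoints/*/pretrained_model")):
        if not (model_dir / "model.safetensors").is_file():
            missing.append(model_dir)
    return missing
```

Run it in a Colab cell between checkpoints; an empty list means every checkpoint so far has its weights.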

I used the following command line, which differs from the one published in the HF docs.

!python lerobot/src/lerobot/scripts/lerobot_train.py \
--dataset.repo_id=dpang/record-test \
--policy.type=act \
--output_dir=drive/MyDrive/outputs/train/act_so101_test \
--job_name=lr_20251211_0949 \
--policy.device=cuda \
--wandb.enable=true \
--policy.push_to_hub=true \
--policy.repo_id=dpang/my_policy \
--save_freq=1000 \
--batch_size=2

Issue #4: Running inference

In order to properly exercise the model, make sure to uncomment/add the teleop arguments to the command line provided by the HF instructions; otherwise you can’t reset the scene between episodes. I’m not sure why they’re commented out in the example — you really need them.

lerobot-record \
--robot.type=so101_follower \
--robot.port=/dev/tty.usbmodem5AB01812601 \
--robot.cameras="{front: {type: opencv, index_or_path: 0, width: 1920, height: 1080, fps: 30}, top: {type: opencv, index_or_path: 1, width: 1920, height: 1080, fps: 30}}" \
--robot.id=le_follower_arm \
--display_data=false \
--dataset.repo_id=dpang/eval_test \
--dataset.single_task="Push cup forwrd" \
--policy.path=/Users/dpang/dev/lerebotHackathon20250615/lerobot/outputs_push_cup/train/push_cup_test/checkpoints/100000/pretrained_model \
--teleop.type=so101_leader \
--teleop.port=/dev/tty.usbmodem5AB01788091 \
--teleop.id=le_leader_arm

Issue #5: Colab line to train pi05

So inference with the ‘act’ model went smoothly, but when it came time to try the ‘pi05’ model, things didn’t work as expected. The Colab command in the documentation here for training the model didn’t work. I used the line below instead.

!python lerobot/src/lerobot/scripts/lerobot_train.py \
--dataset.repo_id=dpang/record-test \
--policy.type=pi05 \
--batch_size=4 \
--steps=20000 \
--output_dir=drive/MyDrive/outputs/train/my_pi0_5 \
--job_name=my_pi0_5_training_20260116 \
--policy.device=cuda \
--wandb.enable=true \
--policy.repo_id=dpang/my_policy

In addition, I got error messages requiring authorization for the “paligemma-3b-pt-224” model. A link is provided to request said authorization, but then you’ll have to restart the notebook. Also, make sure to log into HF; otherwise you’ll error out trying to push the model to the Hub.

!huggingface-cli login

Here is a link to a successful run.

An AI investor’s thesis on Google

A few months ago, I wrote about how the large models eat up the stack and the workflow and also expand an individual’s reach beyond their skill set (think engineers becoming product managers and GTM specialists). Carrying that analogy from individuals to companies, we have a company that could potentially eat the entire AI ecosystem: Google. They already have the full stack: the chips, the talent, the resources, the models, not to mention loads and loads of data. It would not be inconceivable for them to spread their wings and embed themselves in more of the AI and business domains.

The impetus for this train of thought is a hackathon I went to that featured the app building talents of their AI Studio product. Google has been on a marketing and publicity blitz the past few months – holding a ton of meetups and hackathons publicizing this and other features. I’ve probably seen Paige Bailey more in the past few months than in the past 2 years combined.

In the hackathon, we were given 3 hours to vibe code, deploy a product, and put together a presentation with a 3-minute video (on YouTube, of course). On top of that, part of the criteria for winning was how our product performed on social media. In most hackathons, this would be impossible because most new offerings from startups don’t work half the time. If something were to be accomplished in 3 hours, there would be a template or a workshop-like program where teams are walked through a reference implementation, which most would hand in anyway.

This was purely starting from scratch with nothing but the build feature of AI Studio, and it worked. We vibe coded and deployed our app into production, as did so many other teams, and the variety of ideas that came to fruition was staggering.

So how does this go towards the thesis of Google eating everything?

It’s similar to Apple’s App Store or Amazon’s marketplace. As more apps get deployed with increasing sophistication in Google’s app ecosystem, Google gets to see what works and what doesn’t. They can then choose to buy, host, or duplicate the product. Either way, Google gets to expand their footprint throughout the AI economy (and collect all that data to boot).

So, what could get in their way? Plenty. Don’t forget Google is still a large company, and unless they’ve drastically revamped their culture and structure, they’re still prone to the same missteps that plague big behemoths. Throw in antitrust and competition from another 800-pound gorilla – the Elon Musk company universe (X, xAI, Tesla, Neuralink, SpaceX, …), which will stop at nothing until achieving total domination – and we should see plenty of fireworks in the next few years.

I haven’t mentioned the big frontier labs (OpenAI, Anthropic). They’ll still be around and possibly survive, but they’re not going to 10x, let alone 100x from where they are – they still have to spend to expand, and the combination of Google, Elon, open source, and Chinese models/companies are going to eat into their margins. As an AI investor, my money, across all private and public markets, is going to be on Google.

It all seems so quaint – looking back at GenAI posts from 3 years ago

I just had a chance to pull my head above water and spend some time putting some thoughts down after looking at my AI posts from 3 years ago. It’s amazing what has evolved since then and provides a framework to think about what comes next.

The posts in question are related to coding https://numbersandcode.wordpress.com/2022/12/16/coding-kaprekars-constant-using-chatgpt/, and image and video generation https://numbersandcode.wordpress.com/2022/06/22/dall-e/

One recurring theme when writing these articles is that there’ll always be a phrase, “unless you’ve been living under a rock” or “unless you’ve been a hermit” because every week some AI subject goes viral. So unless you’ve been sleeping for the past 20 years like Rip Van Winkle, you would know about…

Cursor from Anysphere and other coding apps like Windsurf, Factory, Devin from Cognition Labs, Lovable, Replit, Bolt…the list goes on and on, not to mention their valuations (Cursor – $9.9B https://techcrunch.com/2025/06/05/cursors-anysphere-nabs-9-9b-valuation-soars-past-500m-arr/ and Windsurf – bought by OpenAI for $3B)

Three years ago I was cutting and pasting code between the editor and ChatGPT. Now it’s built into the IDE or in the case of Replit and Lovable, everything is done online. The stack has compressed and with the current trends around agents and reasoning models, what’s preventing whole workflows and systems from being reproduced from a prompt? In the future, you could just have an idea and prompt a product into existence, complete with market research and GTM strategy, not to mention the implementation and product documentation in between.

…and unless you’re Sleeping Beauty, you would know about…

Veo3. Google’s new video generation model that’s been taking the internet by storm with numerous examples here and here. The improvements here are more dramatic – whereas code generation and putting together an algorithm was achievable, stringing videos together to make a believable ad wasn’t remotely possible three years ago. Now anyone can make movies, shorts, ads, and scenarios, and personalize them, possibly in realtime.

By now you would’ve figured out the framework needed to think about what could happen in the next 3 years. It’s compression and expansion – the collapse of the tech stack, workflows, functions, and organizations that are familiar in your world, and its ability to help you expand your reach beyond your perception.