Show HN: I taught AI to commentate Pong in real time

207 points by pncnmnp 2 months ago

mtlynch 2 months ago

This is a fun idea, but I feel like Pong is too simple for the execution to work.

I watched the video, and it seemed like everything it was saying, you could have just pre-programmed for the very limited state space of Pong. It reminded me a little bit of the stock John Madden and Pat Summerall sound bites that would play during 90s / early 2000s Madden games.

Could you apply the same idea to chess or Texas Hold 'Em? I feel like the additional complexity of those games could lead to more interesting commentary.

pncnmnp 2 months ago

Author here. I agree with you - the number of metrics I can experiment with in Pong is limited. Chess and Go are next for me.
Overall, the simplicity of this project has helped me test the waters before diving into more complex territories. The underlying pipeline isn't bad - the approach of collecting events, periodically generating metrics from them, prioritizing them, generating commentary text, queuing those outputs, and then synthesizing speech should serve as the core for similar work.
It's also given me some intuition on how I can construct an "ecosystem" of data surrounding live action, to add a layer of realism to the narratives.
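A minimal sketch of the stages listed above (events, metrics, prioritization, text, queue, speech), assuming nothing about the actual xpong code. Every name here (`Metric`, `generate_metrics`, `write_commentary`, `synthesize`, `run_pipeline`) is hypothetical, and the LLM and TTS calls are replaced with stubs:

```python
import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass(order=True)
class Metric:
    priority: int                        # lower number = more urgent
    name: str = field(compare=False)
    value: float = field(compare=False)


def generate_metrics(events):
    """Turn a batch of raw game events into prioritized metrics."""
    bounces = sum(1 for e in events if e == "bounce")
    goals = sum(1 for e in events if e == "goal")
    metrics = []
    if goals:
        metrics.append(Metric(0, "goals", goals))        # goals outrank rallies
    if bounces >= 10:                                     # assumed threshold
        metrics.append(Metric(1, "long_rally", bounces))
    return metrics


def write_commentary(metric):
    """Stand-in for the LLM call; a real system would prompt a model here."""
    return f"{metric.name}: {metric.value:g}"


def synthesize(text):
    """Stand-in for TTS; returns placeholder audio bytes."""
    return text.encode("utf-8")


def run_pipeline(events, max_lines=2):
    """Collect -> metrics -> prioritize -> text -> queue -> speech."""
    heap = generate_metrics(events)
    heapq.heapify(heap)
    speech_queue = deque()
    while heap and len(speech_queue) < max_lines:
        speech_queue.append(write_commentary(heapq.heappop(heap)))
    return [synthesize(line) for line in speech_queue]


events = ["bounce"] * 12 + ["goal"]
audio = run_pipeline(events)
```

Capping the queue with `max_lines` is one way to keep low-priority lines from piling up behind the speech synthesizer; the cap of 2 is arbitrary.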
- Tade0 2 months ago
  
  If I may, I would like to propose an, ahem, sport:
  https://m.youtube.com/@JellesMarbleRuns
  Greg Woods' commentary really brings this world of marble racing to life.
  - pncnmnp 2 months ago
    
    Hehehe! I love Jelle's Marble Runs - long-time subscriber. John Oliver introduced me to it - https://www.youtube.com/watch?v=z4gBMw64aqk
- rybosome 2 months ago
  
  This is a great premise, and that underlying pipeline you mention sounds like a generally useful system for live commentary with the appropriate abstractions.
  I’m curious to know more about how you retrieve from this ecosystem of data to add color. You mentioned nearest neighbor search, is that over game state? How is the data stored and queried?
  - pncnmnp 2 months ago
    
    Absolutely! I can elaborate on that part.
    The code starts by simulating 15 tournament years (like from 2010 to 2024), with each year containing 4 grand slam tournaments - held in a knockout format. There are 64 players in the pool, all starting with an initial ELO score.
    These players compete in the tournaments, with outcomes predicted based on their ELO ratings. ELO is then updated after each match. We rank players solely based on their ELO. Once the simulation completes, it generates a wealth of data. For each game, details such as points scored, points allowed, fastest ball speed, number of aces, point-by-point results, and more are simulated.
    We can then cache and use this information for a ton of color commentary. For example, we can identify the GOATs of the game, highlight players who are performing exceptionally well, pinpoint underdogs, find matches similar to the one currently being played, etc.
    However, I am just scratching the surface. Imagine having a function that considers "age" alongside ELO. Then, you could simulate performance based on age as well - and show things like the younger generation overtaking older players, or veterans still competing despite being past their prime. With a fn like this, you could simulate matches that span the past 75-100 years, generating a ton of nice data to analyze.
    Data itself is not fun - you need nice metrics too - for fun correlations! See https://en.wikipedia.org/wiki/Baseball_statistics. The metrics don’t have to be perfect, after all, humans aren’t perfect. The key is engagement.
    To find similar games, I store and cache all historical matches in a KD-tree, then use a NN search to find similar games - that's quite fast!
    Some commentary can also be dynamically generated at runtime - for example, locker-room whispers. It is important to provide GPT with a decent historical window to avoid generating contradictory info in such cases.
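
A rough sketch of the setup pncnmnp describes above: Elo-rated players generate a match history, and a KD-tree answers "which past match looks most like this one?". This is an illustration under assumptions, not the project's code. The four-player pool, the K-factor of 32, and the three-number feature vector (points won, points lost, longest rally) are all made up for the example, and the KD-tree is a small pure-Python one rather than whatever library the author used.

```python
import random


def expected_score(rating_a, rating_b):
    """Standard Elo win probability of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, a_won, k=32):
    """Return both ratings after one match (k=32 is an assumed K-factor)."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def build_kdtree(points, depth=0):
    """Build a KD-tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    point, left, right = node
    if best is None or sq_dist(point, target) < sq_dist(best, target):
        best = point
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best


rng = random.Random(0)
ratings = {name: 1500.0 for name in ("A", "B", "C", "D")}
history = []  # hypothetical features: (points_won, points_lost, longest_rally)
for _ in range(50):
    a, b = rng.sample(sorted(ratings), 2)
    a_won = rng.random() < expected_score(ratings[a], ratings[b])
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], a_won)
    history.append((11, rng.randint(0, 9), rng.randint(5, 40)))

tree = build_kdtree(history)
similar = nearest(tree, (11, 4, 33))   # past match most like the current one
```

In practice the features would probably need scaling to comparable ranges first, otherwise the column with the largest numbers dominates the distance.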
htrp 2 months ago

>Could you apply the same idea to chess or Texas Hold 'Em? I feel like the additional complexity of those games could lead to more interesting commentary.
The additional complexity in something like hold 'em lends itself extremely well to LLM generated commentary.
- vunderba 2 months ago
  
  Agreed. Adjusting the LLM temperature to tune how much the commentator speculates would also be a fun experiment: even though the AI commentator has access to all hands, the future draws are still "imperfect information".
93po 2 months ago

i think pong being so simple is why this is funny and interesting
- DigiEggz 2 months ago
  
  I agree with this idea. The reason I visited this is that the idea of commentating Pong is inherently amusing. It makes me realize I'd be down to watch competitive Pong.

QRe 2 months ago

Fun experiment. Main limitation I see is the delay between actions and commentary because of the whole script generation & TTS overhead. It seems like the commentary can quickly fall behind, especially in fast-paced sports.

haneul 2 months ago

Naw there are tricks you can use to pipeline these things so that apparent latency is under 500ms even with significant game state history awareness, and also to interrupt ongoing but freshly out of date commentary.
I couldn’t get it under 250ms though (for Rocket League), but the tech should be better now than it was in 2024.
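One simple version of the "drop freshly out-of-date commentary" trick, sketched under assumptions: each generated line carries the game-clock time of the event it describes, and anything older than a freshness budget is discarded instead of spoken. The `Line` and `CommentaryQueue` names and the 0.5 s budget are hypothetical, and this does not cover cutting off audio that is already playing.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    event_time: float   # game-clock time of the event the line describes


class CommentaryQueue:
    def __init__(self, max_age=0.5):
        self.max_age = max_age      # seconds; assumed freshness budget
        self.pending = deque()

    def push(self, line):
        self.pending.append(line)

    def next_line(self, now):
        """Return the oldest line that is still fresh, discarding stale ones."""
        while self.pending:
            line = self.pending.popleft()
            if now - line.event_time <= self.max_age:
                return line
        return None


q = CommentaryQueue(max_age=0.5)
q.push(Line("What a save!", event_time=10.0))
q.push(Line("And it's a goal!", event_time=10.4))
spoken = q.next_line(now=10.7)      # the save is 0.7 s old, so it is skipped
```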
pncnmnp 2 months ago

Author here. TTS and script generation can be a bit of an overhead for now, which is why I've worked with metric aggregates - 30+ bounces rather than exactly 33, for example. For this game, one might ideally want this overhead to be less than the time it takes for the ball to travel from one paddle to the other, which can be around 1–2 seconds. However, there may be another strategy to (maybe?) overcome this: start synthesizing numbers (ignoring the fractional part) using TTS and cache them for both commentators. Then, patch those audio clips together after the core part is synthesized. It should be doable, I think - I just haven't gotten to it yet. Note that matching the excitement and tempo of the core commentary with those numbers is key - otherwise, it will feel janky.
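A sketch of both ideas, assuming a stubbed TTS call: rounding a count down to an aggregate such as "30+", and pre-synthesizing integer clips once per commentator voice so only the surrounding words need fresh synthesis. `fake_tts`, the voice names, and the byte-concatenation "splice" are placeholders; real audio would need proper joining and the tempo matching mentioned above.

```python
def bucket_label(count, step=10):
    """Round a count down to a multiple of step, e.g. 33 -> '30+'."""
    return f"{(count // step) * step}+"


def fake_tts(voice, text):
    """Stand-in for a real TTS call; returns placeholder audio bytes."""
    return f"<{voice}:{text}>".encode("utf-8")


def build_number_cache(voices, numbers):
    """Pre-synthesize every number once per commentator voice."""
    return {(v, n): fake_tts(v, str(n)) for v in voices for n in numbers}


def speak_with_number(cache, voice, before, number, after):
    """Synthesize only the surrounding words and splice in the cached number."""
    return fake_tts(voice, before) + cache[(voice, number)] + fake_tts(voice, after)


cache = build_number_cache(["play_by_play", "color"], range(0, 100))
clip = speak_with_number(cache, "color", "That rally ran to", 33, "bounces!")
label = bucket_label(33)
```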

isaacremuant 2 months ago

The idea is very fun and kudos to the author, but it definitely feels lacking in the actual game commentary itself. My perception is that it throws in random fillers that don't quite feel like apt commentary.

If you're jokingly imitating filler from bad commentary I understand but I think I'd like more play by play and less color, but of course pong has a limited amount of inputs to work with for that commentary.

One thing that could very well work for the latency issue some commenters post is to just send the events and receive commentary outside of the rendering and playback so that it, within some max delay, can look more immediate and in sync.

Very fun idea. Hope to see it with more complex things with more inputs.

croemer 2 months ago

If you want to skip the demo's fairly boring pre-match talk, the fun starts here: https://youtu.be/i21wN6CDsE0?si=cdUs_xLCwE8B0ATq&t=153

kovezd 2 months ago

It's funny that even in a simple setting the AI couldn't avoid hallucinating, as this clearly isn't a champions match.
- croemer 2 months ago
  
  No, it's not a hallucination, it's explicitly what was shown in the 2.5 min you skipped :) https://www.youtube.com/watch?v=i21wN6CDsE0&t=16s
computerthings 2 months ago

[dead]

petercooper 2 months ago

I want this for when I'm working.

"Here we see Peter copying and pasting in some generic quick sort algorithm from.. somewhere. Stack Overflow? ChatGPT? Who knows. And he goes for the compile without writing any tests! Let's see if it compiles first time. And it's a noooooo! Bad luck, let's see how he gets out of this pickle. (I told you he should have written some tests.)"

qwertox 2 months ago

It raaaaaaannnnnn, no exceptiooooonss!!! Can you believe it??? Can you believe it how he compiled that code? What a beauty, what a beautiful job he's done...
I wonder if NotebookLM's podcast function could be used for this, to comment on code with the spirit of a Latin American soccer commentator. Because having it comment on code is already pretty useful if you don't want to explain to others what you have been doing. It can do that pretty well for you.
- A4ET8a8uTh0_v2 2 months ago
  
  Or we could go with David Attenborough :P
  dolphin mistral output:
  "In the digital ecosystem, where binary code intertwines with human cognition, there exists an important ritual known as the Coding Review. This intricate dance is not dissimilar to how our ancestors gathered around a communal fire, sharing stories and experiences in order to pass on wisdom and understanding of their world.
  The coding review takes place in a carefully-crafted digital habitat - often referred to as a development team's workspace. Here, the code, akin to DNA that carries the blueprint for all life forms, is meticulously examined by a group of highly specialized creatures known as developers and quality assurance analysts."
Retr0id 2 months ago

Someone could totally make this as a vscode (etc.) extension
pacifika 2 months ago

Like this https://www.youtube.com/watch?v=zgq236m0bcQ
layer8 2 months ago

Or an AI doing the part of the pair programmer who doesn’t have the keyboard.

jart 2 months ago

Do headline games like John Madden do this? That's a great use case for LLMs.

Fripplebubby 2 months ago

You might not know this if you don't actually play these games (Madden, 2K for NBA, MLB The Show), but the commentary is extremely high quality, sometimes comparable to the TV broadcast with riffs and tangents as well as describing the action. Over many years of producing these games they have continually refined the process. Of course, eventually you will hear repeating dialogue if you play the games enough, but I think the baseline quality is going to be _very_ hard to replicate with an LLM.
IshKebab 2 months ago

Yeah I was thinking the same. No more "They've really got to want to win this. This is a game of two halves. Etc."
Though tbh I found it still pretty annoying. Maybe just the tone of voice though, and it's clearly not actually connected to what's happening in the game.
I imagine the major sports game players are working on this.
jsheard 2 months ago

None that I can think of. The Finals has AI generated voiceovers for its announcers, but in that case the lines are pre-written and voice clips generated ahead-of-time so it just reeks of penny-pinching by cutting out real voice actors, rather than using the tech to do things that genuinely weren't possible before.
https://www.youtube.com/watch?v=kZ87wiHps9s

blakeburch 2 months ago

Really fun to see! I'd love to have something similar for esports, like League of Legends or Rocket League. So much of the commentary feels like filler with stats and statements about a player.

haneul 2 months ago

Have done an interactive commentator for rocket league that is also simultaneously your duo partner. Works quite well. This was in October 2024 so the tech is there and even better now.
vishalontheline 2 months ago

E-Sports needs more commentators from Latin America or the Middle East.

neilv 2 months ago

Is a lot of the generated commentary pure fabrication?

raffael_de 2 months ago
If there is a programmed connection to the physics for the in-game commentary then it should be here: https://github.com/pncnmnp/xpong/blob/main/main.py#L212
https://github.com/pncnmnp/xpong/blob/main/main.py#L289:
```
  "- **Shot Angles:** Derive each shot's angle from the (vx, vy) vector:\n"
  "    • Steep angles (>45°) become daring corner lobs or sharp cross-courts.\n"
  "    • Moderate angles (15°-45°) look like graceful arcs that test court coverage.\n"
  "    • Shallow angles (<15°) play out as direct, flat drives down the line.\n"
```
Didn't find where the ball's motion is communicated to the LLM.
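For comparison, if the angle buckets from that prompt were computed in code instead of being left to the model, it could look something like this. This is a sketch, not something found in the repo: the function names are made up, and it assumes the angle is measured from the horizontal (paddle-to-paddle) axis, which the prompt does not specify.

```python
import math


def shot_angle_degrees(vx, vy):
    """Angle between the ball's velocity and the horizontal axis, in [0, 90]."""
    return math.degrees(math.atan2(abs(vy), abs(vx)))


def classify_shot(vx, vy):
    """Map a velocity vector to the prompt's three angle buckets."""
    angle = shot_angle_degrees(vx, vy)
    if angle > 45:
        return "steep"       # >45 degrees
    if angle >= 15:
        return "moderate"    # 15 to 45 degrees
    return "shallow"         # <15 degrees


labels = [classify_shot(1, 2), classify_shot(-5, 2), classify_shot(10, -1)]
```

The label (or the raw angle) could then be passed to the LLM as part of the event data.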
- SillyUsername 2 months ago
  
  So like real commentary then :D
  - A4ET8a8uTh0_v2 2 months ago
    
    It is all in the delivery, as it were.
- investa 2 months ago
  
  "Real time"
  It does need some pointless anecdotes about past statistics, history of the game, training regimes, new managers and so on!

sim7c00 2 months ago

this is so funny my god haha. the intro is a bit dry but when the game is on it's fire haha :'). what an exhilarating match xD

oulipo 2 months ago

Perfect meta-commentary on why AI is 90% useless stuff. Main use-case of AI in the real-world is really not far from commenting a pong match, eg trying to painfully make something exciting out of something worthless and dull, but not succeeding

smus 2 months ago

I wouldn't say you taught the ai anything so much as wired some API calls together

ayongpm 2 months ago

Pretty cool. I can see how commentary could make even Pong more interesting. Maybe there’s room for a pro Pong competition, kind of like what Tetris has.

indigodaddy 2 months ago

How about an alternative commentary to the staring competitions? It must be wry and dry of course, and hard to beat the existing commentary, but might be interesting to see how it turns out.

https://youtu.be/SWgg20IqibM?si=xP5ZpcQu8P2V2ZTc

pawelduda 2 months ago

Looks like the idea of Morgan Freeman narrating life in real time is closer to reality than ever

netsharc 2 months ago

Hah, now I'm thinking of an AI that knows your personality and steers you to do things you've always been fearful or anxious about. "Steve sees a 10/10 girl, the kind he's always felt inadequate to talk to, but he's a different man now, and he's going to tell her 'Tickle your ass with a feather?' in 5... 4... 3... 2...".
Hah, my next startup is an AI-Assist Pick-Up Artist. But that's the "Lamborghini-desiring Crypto-Bro" package that's 49.95 USD/month, the entry level feature would encourage you to go to the gym and eat your vegetables.
AI voices talking to you... now the hallucinations are actually in your head!

hvardhan878 2 months ago

I wonder if you can clone the voice and tonality of Peter Drury and even make a game of Pong emotional.

danjl 2 months ago

What's with the circular ball?

antonvs 2 months ago

It’s 2025. We’ve been able to render small circles for a while now!
Seriously though, the entire graphics display is much more hi res than the original, and it’s not trying to emulate the original resolution. So one slightly more serious way to answer the question is, all the graphics are higher resolution, it’s just that you notice it more when it comes to the ball.
- danjl 2 months ago
  
  Ok, then where's the ray tracing? ;-)
  - antonvs 2 months ago
    
    Here it is! https://www.youtube.com/watch?v=Z7xoagOaZl8
    
    danjl 2 months ago
    
    Nice!

MontgomeryPy 2 months ago

Tom Brady's job as a color commentator may be in jeopardy ;)

DonHopkins 2 months ago

Commentator 1 (Greg “The Swatch Whisperer”): Welcome back, folks, to what can only be described as the pinnacle of human achievement: watching Disney Princess™ Pink paint dry. I haven’t been this excited since the 2002 Home Depot Black Friday Sale when I almost got my hands on a discontinued eggshell Martha Stewart Lavender.

Commentator 2 (Marsha “Two Coats” Hernandez): Greg, I still remember the way you wept in aisle 7. But let’s talk about today’s masterpiece—Disney Princess Pink, the shade officially inspired by the collective inner glow of Aurora, Cinderella, and, dare I say, Ariel's clam-bikini energy.

Greg: Absolutely, Marsha. And look at that glorious semi-damp sheen—like a freshly glazed donut at sunrise. It’s got a dreamy undertone of "your niece’s birthday party at 10 a.m. with a bouncy castle and too much Capri Sun."

Marsha: Oh-ho, what’s this? Is that… yes, I think the lower left quadrant is beginning to matte. Ladies and gentlemen, we may be witnessing the first signs of Stage 3: The Settling of the Pigment.

Greg (choked up): My god… I haven’t seen a transition like this since Elsa’s Let It Go phase. Remember that? How she emotionally dried her entire personality over a solo in under three minutes? Iconic.

Marsha: Speaking of queens, this paint owes everything to Belle’s bedroom in the lost “Live Laugh Library” deleted scene. That’s the shade they were going to use until someone spilled tea on the concept art. Literally. It was Chip. That kid is a menace.

Greg: I’m sorry but—hold on—this is huge. That patch near the window just tightened. We are witnessing micro-shrinkage. It’s subtle, it’s refined, it’s got the attitude of Mulan at a dim sum buffet. She came hungry, and this paint came to DRY.

Marsha: Greg, if this drying pace keeps up, we’re on track for a Suburban First-Timer Finish Time. I haven’t seen Disney Pink behave like this since the infamous 2017 "Frozen Themed Daycare Hallway Incident." They had to repaint in Tiana Teal—the shame.

Greg: And oh! There it is! That final middle patch—she’s going matte, folks. This wall is becoming a canvas of completion, a poetic stillness in a chaotic world. I feel like I just watched Cinderella get her slipper and a Roth IRA.

Marsha (tearfully): This… is why I do this job. For moments like this. For the shimmerless silence. For the slow, glorious commitment to finality.

Greg: And so we leave you, dear viewers, staring into a flat, fully-dry future. The room has changed… and so have we.

antonvs 2 months ago

> Swatch Whisperer
According to Google, you’re only the second person in recorded human history to use these two words together.
aspenmayer 2 months ago

> Greg: And oh! There it is! That final middle patch—she’s going matte, folks. This wall is becoming a canvas of completion, a poetic stillness in a chaotic world. I feel like I just watched Cinderella get her slipper and a Roth IRA.
> Marsha (tearfully): This… is why I do this job. For moments like this. For the shimmerless silence. For the slow, glorious commitment to finality.
> Greg: And so we leave you, dear viewers, staring into a flat, fully-dry future. The room has changed… and so have we.
I’m getting major Broomshakalaka vibes in the best possible way.
https://www.youtube.com/watch?v=zt2uIhAvQZ8