Scoopfeeds — Intelligent news, curated.

Why removing 'um' from a recording is harder than it sounds

Hacker News · Jun 12, 2026, 12:42 AM

Linguists have a word for the ums, uhs, ers, and elongated versions (ummmm, uhhhhh) that pad spoken English: disfluencies.
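
The preview doesn't say how erm finds disfluencies. As an illustration only, here is a minimal Python sketch. It assumes a transcript with word-level timestamps is already available, flags the filler words named above (including elongated spellings), and turns each one into a start/end cut. The `words` format, the `FILLER` pattern, and the `is_filler` and `filler_cuts` helpers are all hypothetical, not erm's API.

```python
import json
import re

# Hypothetical filler pattern: um, uh, er, erm, plus elongated spellings
# such as "ummmm", "uhhhhh", or "errr". Matching is case-insensitive.
FILLER = re.compile(r"^(?:u+m+|u+h+|e+r+m*)$", re.IGNORECASE)

# Punctuation a transcript might attach to a word ("um," or "erm.").
PUNCT = ".,!?;:\"'"


def is_filler(word):
    """True if the whole token (minus surrounding punctuation) is a filler."""
    return FILLER.match(word.strip(PUNCT)) is not None


def filler_cuts(words):
    """Turn timestamped words into a cut list of {"start", "end"} spans in seconds.

    `words` is assumed (hypothetically) to look like
    [{"word": "um", "start": 0.25, "end": 0.6}, ...].
    """
    return [
        {"start": w["start"], "end": w["end"]}
        for w in words
        if is_filler(w["word"])
    ]


if __name__ == "__main__":
    words = [
        {"word": "So", "start": 0.00, "end": 0.20},
        {"word": "ummmm,", "start": 0.25, "end": 0.90},
        {"word": "I", "start": 1.00, "end": 1.05},
        {"word": "built", "start": 1.05, "end": 1.30},
        {"word": "uh", "start": 1.35, "end": 1.60},
        {"word": "erm.", "start": 1.70, "end": 2.00},
    ]
    print(json.dumps(filler_cuts(words), indent=2))
```

Because this trusts the transcript's word boundaries exactly, any error in those timestamps goes straight into the cut list.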

I don't record a lot of voice audio, but a few friends do, and they tell me editing those out by hand is miserable. So I built erm to do it.

    uvx erm input.wav

That's the whole interface for the common case. It writes a cleaned .wav and a JSON cut list next to the input. This post walks through how it works, because the obvious approach doesn't sound very good and most of the code is the stuff that fixes that.
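
The preview doesn't spell out what goes wrong with the obvious approach. One common problem with simply deleting a span is a click at the join, where the waveform jumps abruptly from one value to another; a standard mitigation is a short crossfade across the splice. The minimal Python sketch below assumes mono float samples and a hypothetical JSON cut list of objects like `{"start": 1.0, "end": 1.5}` in seconds. The `apply_cut_list`, `keep_ranges`, and `splice` names and the 10 ms default fade are illustrative choices, not erm's actual code or cut-list format.

```python
import json


def keep_ranges(n_samples, cuts, sample_rate):
    """Convert cuts given in seconds into the sample ranges [start, end) to keep.

    Overlapping or unsorted cuts are handled by sorting and merging.
    """
    def to_index(seconds):
        # Clamp to the signal so out-of-range cuts can't produce bad slices.
        return min(n_samples, max(0, round(seconds * sample_rate)))

    spans = sorted((to_index(c["start"]), to_index(c["end"])) for c in cuts)
    keep, pos = [], 0
    for start, end in spans:
        if start > pos:
            keep.append((pos, start))  # audio between the previous cut and this one
        pos = max(pos, end)
    if pos < n_samples:
        keep.append((pos, n_samples))  # tail after the last cut
    return keep


def splice(samples, keep, fade_len):
    """Concatenate kept ranges, linearly crossfading fade_len samples at each join.

    The overlap means each join shortens the output by up to fade_len samples.
    With fade_len == 0 this is a plain hard cut.
    """
    out = []
    for start, end in keep:
        seg = samples[start:end]
        n = min(fade_len, len(out), len(seg))
        for i in range(n):
            t = (i + 1) / (n + 1)  # weight of the incoming segment, rising toward 1
            j = len(out) - n + i
            out[j] = out[j] * (1.0 - t) + seg[i] * t
        out.extend(seg[n:])
    return out


def apply_cut_list(samples, cut_list_json, sample_rate, fade_ms=10.0):
    """Remove every cut from mono float samples, crossfading across each splice."""
    cuts = json.loads(cut_list_json)
    fade_len = int(sample_rate * fade_ms / 1000)
    return splice(samples, keep_ranges(len(samples), cuts, sample_rate), fade_len)


if __name__ == "__main__":
    sr = 100  # unrealistically low rate so the numbers stay small
    audio = [1.0] * 100 + [0.0] * 50 + [-1.0] * 100  # speech, "um", speech
    cuts = '[{"start": 1.0, "end": 1.5}]'
    for fade in (0.0, 100.0):
        out = apply_cut_list(audio, cuts, sr, fade_ms=fade)
        jump = max(abs(b - a) for a, b in zip(out, out[1:]))
        print(f"fade_ms={fade}: {len(out)} samples, largest step {jump:.3f}")
```

Setting `fade_ms=0.0` gives the plain hard cut for comparison: the largest sample-to-sample step is the full jump between the two sides of the splice, while the crossfaded version spreads that change over the fade.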

Article preview — originally published by Hacker News. Full story at the source.
Aggregated and edited by the Scoop newsroom. We surface news from Hacker News alongside other reporting so you can compare coverage in one place.