Alright, let me tell you about this “swearing in TV” thing I messed around with. It was kinda fun, kinda dumb, but hey, that’s what side projects are for, right?

So, the basic idea was: could I detect swearing in a TV show and then, like, do something about it? Maybe mute it, maybe blur the screen. I wasn’t sure yet. First thing I did was Google around for existing libraries, since I figured someone must have tackled this problem before. I found a few Python libraries for audio analysis and some pre-trained models for speech recognition. That seemed like a good starting point.
Next step? I needed some TV show audio. I ripped the audio from a few episodes of “South Park” because, well, it’s South Park. Plenty of swearing to test with. I used some free audio editing software to extract the audio into WAV files. Then, I started messing with the Python libraries.
I picked a speech-to-text library (I think it was SpeechRecognition). I had to install a bunch of dependencies, which was a pain, but eventually, I got it working. I fed it the audio file and it spat out the transcript. The accuracy wasn’t amazing, but it was good enough to pick up most of the curse words.
Okay, so now I had a transcript. The next part was the swear word detection. I built a simple list of curse words – the usual suspects. Then I just looped through the transcript and checked if any of those words were present. Real basic stuff.
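For the curious, the word check can be sketched roughly like this. It isn't the original script, just a minimal version of the idea: the word list is a mild placeholder, and `find_swears` is a made-up name. It compares whole words rather than substrings, so something like "heckler" doesn't get flagged just because it contains "heck".

```python
import re

# Placeholder list with mild stand-ins; the real list had "the usual suspects".
SWEAR_WORDS = {"darn", "heck"}


def find_swears(transcript, swear_words=SWEAR_WORDS):
    """Return the flagged words in the order they appear in the transcript."""
    # Lowercase and split into word tokens so "HECK!" still matches "heck".
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return [token for token in tokens if token in swear_words]


found = find_swears("Oh darn, what the HECK is that heckler doing?")
print(found)
```

Using a set for the word list keeps each lookup cheap, which matters a little once the transcript covers a whole episode.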
Here’s where things got a bit trickier. I wanted to know when the swearing happened, not just if it happened. So, I had to figure out how to get timestamps. This meant digging deeper into the speech-to-text library. Turns out, it could give you the start and end times for each word. Awesome!
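Here's a rough sketch of the timestamp step. The exact shape of word-level timing output depends on which recognizer you use, so the list of dicts with `word`, `start`, and `end` keys (in seconds) is purely an assumption for illustration, as are the placeholder word list and the `swear_spans` name.

```python
# Placeholder list with mild stand-ins for the real curse words.
SWEAR_WORDS = {"darn", "heck"}


def swear_spans(timed_words, swear_words=SWEAR_WORDS):
    """Return (start, end) pairs in seconds for every flagged word."""
    spans = []
    for entry in timed_words:
        # Normalize case and strip punctuation the recognizer may attach.
        word = entry["word"].lower().strip(".,!?\"'")
        if word in swear_words:
            spans.append((entry["start"], entry["end"]))
    return spans


# Assumed recognizer output: one dict per word with timing in seconds.
timed_words = [
    {"word": "Oh", "start": 0.0, "end": 0.2},
    {"word": "darn,", "start": 0.25, "end": 0.6},
    {"word": "it", "start": 0.6, "end": 0.75},
]
spans = swear_spans(timed_words)
print(spans)
```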

Now I had a list of timestamps for all the curse words. The next step was to actually do something with that information. I decided to go with muting the audio. I used another Python library (pydub, maybe?) to manipulate the audio. I basically told it to replace the audio between the start and end timestamps of each swear word with silence.
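To show the muting idea without pulling in any audio library, here's a dependency-free sketch that works on raw samples instead of the library mentioned above. It assumes mono 16-bit PCM samples held in an `array`, and `mute_spans` is a hypothetical helper name, not something from the original script.

```python
from array import array


def mute_spans(samples, sample_rate, spans):
    """Return a copy of mono PCM samples with each (start, end) span zeroed.

    Spans are given in seconds; the input array is left untouched.
    """
    muted = array(samples.typecode, samples)
    for start, end in spans:
        # Convert seconds to sample indices and clamp to the audio length.
        first = max(0, round(start * sample_rate))
        last = min(len(muted), round(end * sample_rate))
        for i in range(first, last):
            muted[i] = 0
    return muted


# One second of a constant signal at a toy rate of 10 samples per second.
audio = array("h", [1000] * 10)
clean = mute_spans(audio, 10, [(0.2, 0.5)])
print(clean.tolist())
```

Zeroing the samples keeps the file the same length, so the audio would stay in sync with the video if you ever put the two back together.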
Putting it all together, I had a script that would:
- Take an audio file as input.
- Transcribe the audio.
- Detect swear words and their timestamps.
- Mute the swear words.
- Output a “clean” audio file.
Did it work perfectly? Hell no. The speech recognition wasn’t perfect, so it missed some swear words. And sometimes it muted words that weren’t even swear words because the transcription was off. Plus, the muting was kinda abrupt and noticeable.
To improve it I would need to:
- Use a better speech-to-text model.
- Fine-tune the swear word detection.
- Implement a smoother muting technique (maybe fade the audio in and out).
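The smoother muting idea could look something like the sketch below. I never actually built this part, so treat it as one possible approach: instead of dropping straight to silence, the volume ramps down linearly just before the swear word and ramps back up right after it. It assumes mono PCM samples in an `array`; the `mute_with_fades` name and the 50 ms default fade are made up for the example.

```python
from array import array


def mute_with_fades(samples, sample_rate, start, end, fade=0.05):
    """Silence [start, end) seconds, fading out before it and back in after."""
    out = array(samples.typecode, samples)
    first = max(0, round(start * sample_rate))
    last = min(len(out), round(end * sample_rate))
    fade_len = max(1, round(fade * sample_rate))

    # Only touch the muted region plus the fade windows on either side.
    for i in range(max(0, first - fade_len), min(len(out), last + fade_len)):
        if i < first:
            gain = (first - i) / fade_len      # 1.0 at window start, falling
        elif i < last:
            gain = 0.0                         # fully muted
        else:
            gain = (i - last + 1) / fade_len   # rising back to 1.0
        out[i] = int(out[i] * gain)
    return out


# Toy signal: 10 samples per second, mute 0.4s to 0.6s with 0.2s fades.
audio = array("h", [1000] * 10)
smooth = mute_with_fades(audio, 10, 0.4, 0.6, fade=0.2)
print(smooth.tolist())
```

A linear ramp is the simplest option. If two swear words sit close together, their fade windows can overlap, so merging nearby spans first would probably be worth doing.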
I thought about blurring the screen, but that sounded like way more work. I’d have to sync the video and audio, analyze the video frames, and blur out the right parts of the screen at the right times. Too much effort for a weekend project.

The point is, it was a fun little experiment. It showed me how much work goes into something like censoring TV shows. And I learned a bit about audio processing and speech recognition along the way.
I haven’t touched the code in months, but maybe I’ll revisit it someday. Who knows, maybe I’ll build the world’s first truly swear-free TV experience!