Okay, so check it out: I was messing around with some old data the other day, specifically the 2013 Kentucky Derby results. Thought it would be a simple thing, right? Just grab the info and maybe do some basic analysis. Boy, was I wrong.

First, I tried to find a clean dataset. You know, something in CSV or even a decent table online. Nope. Had to scrape it off some dusty old horse racing site. The HTML was a mess: tables all over the place and inconsistent formatting. I used Python with BeautifulSoup to wrangle it. Spent a good hour just cleaning up the data and getting it into a usable structure. Total headache!
Here’s what I actually did:
- Scraped the data: Used requests to get the HTML, then BeautifulSoup to parse it. The hardest part was figuring out the right CSS selectors to target the table with the results.
- Cleaned the data: Oh man, this was rough. Some cells had extra spaces, others had weird characters. Used regular expressions to clean it up. Also, had to convert the finishing times from strings to actual numbers (seconds).
- Loaded into Pandas: Once I had a somewhat clean list of lists, I dumped it into a Pandas DataFrame. Finally felt like I was making progress!
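For the curious, here’s roughly what the cleaning step can look like. Treat it as a minimal sketch, not the exact code from that afternoon: the helper names `clean_cell` and `time_to_seconds` are made up for this post, the `M:SS.ss` time format is an assumption about how the site wrote its times, and the sample row is invented, not real results.

```python
import re
import unicodedata


def clean_cell(text):
    """Normalize a scraped table cell: fold odd Unicode forms, collapse whitespace."""
    # NFKC turns things like non-breaking spaces (\xa0) into plain spaces.
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


# Assumed format: optional "minutes:" prefix, then seconds with optional decimals.
TIME_RE = re.compile(r"^(?:(\d+):)?(\d{1,2}(?:\.\d+)?)$")


def time_to_seconds(text):
    """Convert 'M:SS.ss' (or plain 'SS.ss') to float seconds; None if it doesn't parse."""
    match = TIME_RE.match(clean_cell(text))
    if match is None:
        return None
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# Invented sample row, standing in for one scraped table row.
raw_row = ["  3\xa0", " Some  Horse\n", " 2:03.50 "]
row = [clean_cell(cell) for cell in raw_row]
row[2] = time_to_seconds(row[2])
print(row)
```

Returning `None` for anything that doesn’t parse means the bad cells show up as missing values once the rows go into a DataFrame, instead of crashing the loop halfway through.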
Then came the “fun” part: trying to analyze it. I wanted to see if there were any interesting correlations between things like post position, odds, and finishing position. Turns out, the data was too limited for anything groundbreaking. I did manage to plot a histogram of the finishing times, which was kinda neat. It looked roughly bell-shaped, which I guess is what you’d expect, though with a single race (a Derby field tops out at 20 horses) that’s not much to go on.
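The histogram itself is only a couple of lines with pandas or matplotlib, but the binning idea is simple enough to sketch in plain Python. The function name `bin_counts`, the one-second bin width, and the times below are all placeholders made up for illustration.

```python
import math
from collections import Counter


def bin_counts(values, width):
    """Count how many values fall in each [start, start + width) bin."""
    counts = Counter(math.floor(v / width) * width for v in values)
    return dict(sorted(counts.items()))


# Made-up finishing times in seconds, not the real 2013 results.
times = [122.9, 123.4, 123.6, 124.1, 125.8]

# Print a quick text histogram, one row per non-empty bin.
for start, n in bin_counts(times, width=1.0).items():
    print(f"{start:6.1f}s | {'#' * n}")
```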
What I tried to do, and failed (mostly):
- Correlation analysis: Wanted to see if there was a strong correlation between odds and finishing position. There wasn’t really anything significant. Probably need way more data to see a real trend.
- Predictive model: Thought I could build a simple model to predict the finishing position based on the odds and post position. The accuracy was terrible, basically no better than random guessing. Turns out horse racing is more complicated than just numbers!
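Since finishing position is a rank, a rank correlation (Spearman) is a reasonable way to check odds against finish. Here’s a small sketch of that in plain Python; pandas can do the same thing with `DataFrame.corr(method="spearman")`. The helper names and the odds/finish numbers are invented for illustration, not taken from the real race.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up odds (to 1) and finishing positions for five hypothetical horses.
odds = [4.0, 9.0, 30.0, 7.0, 15.0]
finish = [1, 2, 3, 4, 5]
print(f"Spearman rho: {spearman(odds, finish):.2f}")
```

Keep in mind that one race gives you one data point per horse, so even a decent-looking rho here would be shaky.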
Ultimately, I didn’t discover any hidden secrets of the 2013 Derby. But I did spend a solid afternoon wrestling with data, which is always a good learning experience. Learned a few new tricks with BeautifulSoup and Pandas. And I was reminded that even seemingly simple data analysis can quickly turn into a messy project. So, that’s the story of my dive into the 2013 Derby results. Nothing earth-shattering, but it kept me busy!

Lessons Learned:
- Data cleaning is almost ALWAYS the most time-consuming part. Prepare for it.
- Don’t expect to find groundbreaking insights in small datasets.
- Horse racing is hard to predict!