This is another one of those notes-to-self for later, and perhaps to inspire others to try. Putting the log back into blog. While I’d love to learn enough Python or what-have-you to scrape the data from a website, the following tools got the job done.
- Import.io to do the heavy lifting of scraping. The best option I found in an exhaustive half hour of searching and testing.
- OpenRefine to split columns where I wanted, though that’s only a part of its power.
- A spreadsheet as a crowbar, to make sure the data was in the right columns. OpenRefine is probably the right tool for that too, but good ol’ LibreOffice Calc got the job done.
And pen and index cards, to note what I did, so that the next time I scrape data from another site I’ll do a better job.
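
For anyone who does want to go the Python route, here is a minimal sketch of the column-splitting step using only the standard library. It is not how OpenRefine works internally, just one way to get a similar result. The `split_column` helper, the sample rows, and the `"; "` separator are all made up for illustration.

```python
import csv
import io


def split_column(rows, index, separator):
    """Split the cell at `index` in every row on `separator`.

    Rows that produce fewer pieces are padded with empty strings so that
    every row ends up with the same number of columns.
    """
    pieces = [(row[:index], row[index].split(separator), row[index + 1:])
              for row in rows]
    width = max((len(parts) for _, parts, _ in pieces), default=0)
    return [before + parts + [""] * (width - len(parts)) + after
            for before, parts, after in pieces]


# Hypothetical scraped data; a real export would be read from a file instead.
SAMPLE = "name,location\nAda,London; UK\nGrace,New York; USA\nLinus,Helsinki\n"

rows = list(csv.reader(io.StringIO(SAMPLE)))
header, data = rows[0], rows[1:]

cleaned = split_column(data, 1, "; ")
for row in cleaned:
    print(row)
```

The header row is left alone here, so the new columns would still need names before saving the result back out.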