Updated, 9 May 2015: See bottom of the post for the changes.
I finished the last blogpost having scraped the Ofsted ratings with Python, using Scraperwiki to turn the data into a JSON API. So, what to do with the output of my scraper?
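To recap what the scraper hands over: a Scraperwiki JSON API returns query results as an array of objects, one per row. A minimal sketch of what that looks like – with invented field names, purely for illustration:

```javascript
// A Scraperwiki-style JSON API returns rows as an array of objects.
// The field names below are invented for illustration only.
var payload = '[{"school": "Example Free School", "rating": "Good"},' +
              ' {"school": "Another Free School", "rating": "Requires improvement"}]';

var rows = JSON.parse(payload);
console.log(rows.length + " schools, first rating: " + rows[0].rating);
```

Everything that follows in this post is really just about getting an array like that onto a web page.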
Well, my ambitions were fairly modest as far as presenting or visualising the data went. For now at least, all I wanted to produce was a basic table of all inspection ratings. Importantly though, it had to update automatically as new schools were inspected and additional ratings came through.
ImportJSON, and a dead-end
My first idea was to try to get the data into a Google Sheet. Google Sheets are easy to embed into websites and, I thought, would open up options for more sophisticated presentations of the data later on.
Google Sheets also had the advantage of being the most straightforward option, at least in theory.
Sheets offers a number of options for importing data, but none of them handles JSON – not out of the box, anyway.
But helpfully someone has written a Google Apps Script that adds an ImportJSON function to Sheets with minimal effort.
After loading this script, the function pulled in all of my scraped data – the Sheet can be found here, in fact.
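Once the script is loaded, ImportJSON works like any other spreadsheet formula. From memory, usage is a single cell formula along these lines – the URL here is a placeholder, and the query and parse-option arguments are optional:

```
=ImportJSON("https://example.com/ofsted-ratings.json", "/school,/rating", "noInherit,noTruncate")
```

The second argument picks out which fields to import, and the third tweaks how headers and values are parsed.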
At this point, I was maybe a little hasty.
While the function appeared to have worked correctly, I wanted to check that it would update properly as and when the scraper pulled in new data.
Playing around with the scraper to hide certain rows of data, I noticed that the change wasn't reflected when I refreshed my Google Sheet. So I ditched Google Sheets and ImportJSON.
(A subsequent conversation with Paul Bradshaw has suggested that Google Sheets only pulls in data from external sources roughly every half hour by default. I probably never waited that long, hence why I say I may have been a bit hasty to move on to a new option.)
How else to build a table of results that would update automatically? The answer turned out to be D3, the JavaScript library, which I used to build an HTML table directly from my scraper's JSON output, adapting example code I found online.
Specifically, the sources I used for my code were:
Then, in order to run D3 code on my WordPress site, I installed a small plugin – WP-D3.
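This isn't my actual code, but the core job the D3 script does – turning an array of JSON records into table rows – can be sketched in plain JavaScript; in the real page, D3's selectAll/data/enter pattern does the same thing declaratively. The field names here are invented for illustration:

```javascript
// Plain-JS sketch of what the D3 code does: turn JSON records into table rows.
// In the browser, D3's selectAll("tr").data(records).enter() pattern achieves
// this declaratively; field names below are placeholders, not the real schema.
function buildTableRows(records) {
  return records.map(function (r) {
    return "<tr><td>" + r.school + "</td><td>" + r.rating + "</td></tr>";
  }).join("\n");
}

var sample = [
  { school: "Example Free School", rating: "Good" },
  { school: "Another Free School", rating: "Outstanding" }
];

console.log(buildTableRows(sample));
```

Because the page rebuilds the table from the JSON API on each load, new inspection results appear without any manual updating.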
And… voilà: I had my table, which will update as and when new free schools are inspected for the first time or a school receives a second visit.
The resource looks basic, and could definitely be added to. There are other pieces of information it would be useful to include, such as whether a free school is a primary or a secondary school, and it'd be great to add features such as filters and a search box as the number of inspected free schools grows.
But for now, the tool does what I’d hoped it would, and definitely makes this important dataset more accessible to people.
Lessons I learnt
One of the main things I learnt in doing this – an obvious one, perhaps – is that even a relatively simple web tool requires much more effort than is apparent from the outside.
Even after I’d completed the scraper – probably the easy part, actually – it took me a good amount of time to work out how best to bring the results into the EverythingFreeSchools site.
My experience of ImportJSON – and the fact that the Google Sheet I set up does in fact appear to have stayed up-to-date since I set it up – should probably also teach me to check every possibility for why something doesn’t appear to be working, before moving on to another option.
That said, I’m glad that I ditched Google Sheets as a method.
The output of my efforts – back-end plus front-end – can be found here [NB: the front-end as it stands now was largely the result of work which I describe in this third blogpost about scraping the Ofsted website].
If anyone wants the scraper code then do get in touch. I intend to make the code available to all – probably by moving to hosting it on Morph.io – but in the meantime I’d be happy to share it by email.
And if you’ve got any thoughts on what I’ve done, then please either leave a comment or get in touch.
Links to my free school Ofsted ratings resource have also been updated – EverythingFreeSchools, the site on which it originally sat, is now deprecated.