Updated, 9 May 2015: See bottom of the post for the changes.
One of the projects I’ve spent quite a bit of time working on over the past year has been the specialist free schools news site I co-founded, EverythingFreeSchools [NB: as of September 2014 this site is no longer live. The free school Ofsted ratings resource I’ve developed can now be found here.]
The site has performed well, building a regular readership of people with an interest in the policy. And besides the news stories the site has broken, one thing that has proved popular is data resources – such as this look at how many teachers without Qualified Teacher Status each free school used.
So for a number of months (yes, months), when I’ve had a spare half hour I’ve been working on something – a scraper of all free school Ofsted ratings.
The first question to answer is ‘why would you want to scrape the Ofsted ratings?’
Well, Ofsted ratings matter for free schools – possibly more so than for maintained schools and other types of academy.
As I explain in a little more detail in a post for EverythingFreeSchools, most free schools have started with an intake of only one year, and will grow each year. So it will be a while yet before there’s a track record of exam results by which to judge how the policy is working.
Ofsted inspection ratings are the best indication we have so far of how free schools are performing.
The Ofsted site offers some options for filtering searches, but ‘free schools’ is not one of the options offered.
Ofsted also publishes monthly stats that give the latest inspection results in a way that can be filtered. But they’re precisely that – monthly stats. Call me impatient, but I want something a little more real-time.
The resource was inspired in part by Watchsted, a great site that opens up Ofsted rating data. John Winstanley from the Watchsted team has also scraped free school Ofsted ratings, but that data is only updated every few weeks.
The scraper itself is built in Python using Scraperwiki.
Scraperwiki provides somewhere to host the scraper, and allows scrapers to be set to run at regular intervals.
In the case of the free schools Ofsted ratings scraper, I’ve set it to check the Ofsted website every night, so the data is always (almost) up-to-date.
The Ofsted website
Thankfully, the Ofsted website is fairly well structured.
Every school that has been inspected to date has its own page, identified by a unique reference number (URN) – this is the page for Discovery New School, the first free school that was forced to close.
Given the Ofsted website doesn’t allow searches by ‘free school’ status, I took a list of URNs for all open free schools, formatted it slightly in Word and fed it into the Python code.
(This is one area of possible improvement for the scraper – not hard-coding the URNs to hit. But for now it isn’t a problem – new free schools generally open at a single, fixed time of the year – and I’m not aware of an API that gives ready access to all open free schools URNs.)
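To give a flavour of that step, here’s a minimal sketch of hard-coding a URN list and turning it into report-page URLs. The URNs and the URL pattern below are illustrative assumptions, not the real values from my scraper:

```python
# Hypothetical URNs for open free schools - in the real scraper this
# list was prepared in Word and pasted into the code.
FREE_SCHOOL_URNS = [138840, 138841, 138842]

# Assumed Ofsted report-page URL pattern, keyed on the school's URN.
BASE_URL = (
    "http://reports.ofsted.gov.uk/inspection-reports/"
    "find-inspection-report/provider/ELS/{urn}"
)

def build_urls(urns):
    """Build one report-page URL per school URN."""
    return [BASE_URL.format(urn=urn) for urn in urns]

urls = build_urls(FREE_SCHOOL_URNS)
print(urls[0])
```

The scraper then simply loops over these URLs, skipping any that return an error.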
So the scraper tries to go to the Ofsted website of every free school, and where it finds a site, starts grabbing information.
Specifically the scraper grabs the name of the school, the inspection and publication dates, and – the important one – the inspection rating.
It then saves each set of scraped data into a database that can later be queried.
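The scrape-and-save loop looks roughly like this. In the real scraper, Scraperwiki’s own HTTP and database helpers do this work; here stdlib `re` and `sqlite3` stand in as a sketch, and the exact HTML structure (an `h1` for the name, an `ins-judgement` container for the rating) is an assumption based on what I describe in this post:

```python
import re
import sqlite3

def parse_school_page(html):
    """Extract the school name and inspection rating from a
    (simplified) Ofsted report page."""
    name = re.search(r"<h1[^>]*>(.*?)</h1>", html)
    rating = re.search(r'class="ins-judgement[^"]*">(.*?)<', html)
    return {
        "name": name.group(1) if name else None,
        "rating": rating.group(1) if rating else None,
    }

def save_school(conn, urn, record):
    """Upsert one school's scraped fields, keyed on its URN, so
    re-running the scraper refreshes rather than duplicates rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings "
        "(urn INTEGER PRIMARY KEY, name TEXT, rating TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO ratings VALUES (?, ?, ?)",
        (urn, record["name"], record["rating"]),
    )
    conn.commit()

# A made-up page fragment, just to exercise the functions.
sample = (
    '<h1>Example Free School</h1>'
    '<div class="ins-judgement ins-judgement-1">Outstanding</div>'
)
conn = sqlite3.connect(":memory:")
save_school(conn, 999999, parse_school_page(sample))
print(conn.execute("SELECT rating FROM ratings WHERE urn = 999999").fetchone()[0])
```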
There’s more data I might try and grab, or pull in, later – the local authority in which the school is located, for one – but for now I wanted to keep the scraper simple.
Errors and issues
I hit a couple of major issues when testing the scraper.
Firstly, and most significantly, the HTML of the containers that hold the actual inspection ratings is slightly…well, unexpected.
Why the rating of ‘inadequate’ schools is held in a container with a class of ins-judgement ins-judgement-4, while the rating of ‘requires improvement’ schools – the rating above inadequate, so logically the third rating – is held in a class of ins-judgement ins-judgement-5, I’m not sure.
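One way to cope with the odd numbering is to map each `ins-judgement-N` class straight to its label, rather than assuming the numbers run 1 to 4. The 4/5 entries below reflect what I found; the 1/2 entries are an assumption about how the better ratings are marked up:

```python
# Mapping of container class to rating label. Note the quirk:
# 'requires improvement' is -5, not -3 as you'd logically expect.
JUDGEMENT_CLASSES = {
    "ins-judgement-1": "Outstanding",            # assumed
    "ins-judgement-2": "Good",                   # assumed
    "ins-judgement-5": "Requires improvement",   # observed quirk
    "ins-judgement-4": "Inadequate",             # observed
}

def rating_from_classes(class_attr):
    """Return the rating label for a container's class attribute,
    or None if no judgement class is present."""
    for cls in class_attr.split():
        if cls in JUDGEMENT_CLASSES:
            return JUDGEMENT_CLASSES[cls]
    return None

print(rating_from_classes("ins-judgement ins-judgement-5"))
```

An explicit lookup table also fails loudly (returns `None`) if Ofsted ever adds a class the scraper hasn’t seen before.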
That took me longer than I’d like to admit to spot.
Secondly, I noticed that the scraper would not handle the small number of 16-19 free schools well. Ofsted inspections of colleges differ from those for primary and secondary schools, so the ratings pages on the Ofsted site are structured differently. This might be something to handle in a later iteration of the scraper.
Back to the scraper. Another advantage of Scraperwiki is that the database of outputted data can easily be queried with SQL.
A couple of simple queries – to strip out free schools that have not been inspected yet, and sort the data by Ofsted rating – got me to what I wanted.
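Those two queries can be sketched as follows, run here against a local sqlite database with an assumed `ratings(name, rating)` table; Scraperwiki’s query box accepts the same SQL:

```python
import sqlite3

# Build a small stand-in for the scraper's output table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (name TEXT, rating TEXT)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [
        ("School A", "Good"),
        ("School B", None),  # not yet inspected
        ("School C", "Outstanding"),
    ],
)

rows = conn.execute(
    """
    SELECT name, rating
    FROM ratings
    WHERE rating IS NOT NULL      -- strip out uninspected schools
    ORDER BY CASE rating          -- sort best rating first
        WHEN 'Outstanding' THEN 1
        WHEN 'Good' THEN 2
        WHEN 'Requires improvement' THEN 3
        WHEN 'Inadequate' THEN 4
    END
    """
).fetchall()
print(rows)  # School C first; School B excluded
```

The `CASE` expression is needed because the ratings sort sensibly by rank, not alphabetically.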
Scraperwiki then gives the option of producing a JSON API from your data – effectively meaning your data is available at a fixed URL, and can be pulled into another application, or used in a website.
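Consuming that feed from another application is then a few lines of standard JSON handling. The URL here is a placeholder for the fixed address Scraperwiki generates, and the field names are assumptions; the parsing function is separated out so it can be tested without a network call:

```python
import json
from urllib.request import urlopen

# Placeholder for the fixed URL Scraperwiki generates for the API.
API_URL = "https://example.org/ofsted-free-schools.json"

def fetch_raw(url=API_URL):
    """Download the raw JSON feed as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")

def parse_feed(raw):
    """Decode the feed and keep just the name and rating fields."""
    return [
        {"name": row["name"], "rating": row["rating"]}
        for row in json.loads(raw)
    ]

# Exercise the parser on a made-up feed snippet.
sample = '[{"name": "School A", "rating": "Good", "urn": 1}]'
print(parse_feed(sample))
```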
Once I’d got the data, the next step was to produce a basic front-end for it – but that’ll have to wait for now.
The output of my efforts – back-end plus front-end – can be found here [NB: the front-end as it stands now was largely the result of work which I describe in this third blogpost about scraping the Ofsted website].
If anyone wants the scraper code then do get in touch. I intend to make the code available to all – probably by moving to hosting it on Morph.io – but in the meantime I’d be happy to share it by email.
The code probably isn’t as efficient as it might be, but it does the job.
Check back in a couple of days to find out how I turned the JSON feed into a basic data resource on EverythingFreeSchools. And do leave any questions, comments or thoughts below.
Links to my free school Ofsted ratings resource have also been updated – EverythingFreeSchools, the site on which it originally sat, is now deprecated.