For all practical purposes, the free legal database run by the British and Irish Legal Information Institute (BAILII) is an official source of judgments from senior courts that any member of the public or any journalist can use. But while anyone can read individual judgments and quote bits of them elsewhere, what are the rules about downloading and re-using the content in bulk? Is it public open data or are there restrictions on its re-use? There seems to be some confusion about this, which this article aims to unpick.
What is BAILII?
BAILII is a legal information database which was founded as a charity 20 years ago by a group of lawyers and academics who wanted to provide free online access to legal information, particularly judgments of the higher courts which can set precedents that change or clarify the law. Such judgments therefore form, alongside legislation, one of the primary sources of new law.
It is widely used by the judiciary themselves, both to find judgments from other courts and to publish their own judgments. You’d think the judiciary would have their own website to do this, but they don’t. The Judicial Office website (www.judiciary.uk) posts announcements and information about judges and their work, but it only publishes a limited selection of actual judgments, in cases it deems to be of public (ie newsworthy) interest. Nor do the courts, for the most part, publish their own judgments. The Supreme Court does, and many of the tribunals do, but most courts do not. For them the default location, used by lawyers and non-lawyers alike, is BAILII. Indeed, there is a link to BAILII on the Judiciary website (and also to ICLR, as publishers of the official Law Reports), basically pointing people to where they can get better information.
BAILII is also the preferred destination of hyperlinks from citations of judgments in blogs and from articles in the media (when they bother to link at all), since there is no paywall or registration barrier to the reader accessing the content. (We at the Transparency Project encourage such linking and generally frown on publications which could provide such links but don’t.) By contrast, commercial legal databases, while they may contain far more in terms of content and commentary, are outside the financial reach of casual readers and less well resourced lawyers, such as those engaged in publicly funded (legal aid) work or working pro bono (unpaid).
Yet BAILII occupies a somewhat anomalous position in legal publishing, the nature of which was highlighted by a recent letter sent by the Ministry of Justice in response to an inquiry by a legal publisher. The publisher had asked if it would be permissible to download and re-use judgments published on BAILII. The MOJ’s response was that, although the text of the judgments provided by the judiciary was covered by Crown copyright,
“The MOJ agrees that these judgments can be reused under the terms and conditions of the Open Government Licence”.
It went on:
“Once the judgments have been sent to BAILII by the various teams at the RCJ, the BAILII website becomes the source of those judgments.”
Finally, the MOJ clarified that
“we are content for you to reuse the judgments you want under the terms of the Open Government Licence. They have been made available through BAILII for the purposes of making them easily downloadable.”
The word “download” can, of course, refer simply to the opening of a document or page on the internet, which requires the browser to download the contents of the page in order to display it. But when people talk about downloading, they often mean something more than just opening a piece of content like a document or an image: they mean saving a copy of it, either on the device itself or in a cloud storage facility, with a view to republishing or reusing that content. Many apps automatically save a copy anyway, or can do so if the settings permit it.
Even allowing for this distinction, however, the clear implication of the MOJ’s letter is that anyone can go and help themselves to the data on BAILII and that this includes bulk data downloading, or what is called “scraping”, of ALL the data from BAILII.
Crawling spiders, robots, caching and snippeting
BAILII’s first and most important restriction is against search engines such as Google seeking to index its contents by “crawling” over its site. Crawling is an automated process whereby an indexing “spider” or “robot” script basically opens and reads and indexes in turn the contents of each and every page of each and every website. That’s how you can find things when searching on the internet. Your search terms are matched by something already in the index and the relevant web page is identified in the results. Many of those pages have in fact been “cached” (ie copied and stored temporarily) by the search engine provider, so they can load more quickly than by locating and reopening the original page.
Why is this a problem for BAILII? Their prohibition on crawling is designed to protect them against a situation where a judgment has been accidentally published containing details that should have been redacted, for reasons of privacy, confidentiality or national security, and yet can still be accessed via an earlier cached version of the page, or details preserved in the short sentence or two, or “snippet” of content, displayed on the results page after a search, when the original judgment has already been removed from BAILII’s database.
Google respects this restriction but other search engines, such as Bing (the Microsoft search engine), apparently do not. Since Google is the most widely used search engine in those jurisdictions likely to access content on BAILII, its compliance is perhaps the most important.
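For readers curious about the mechanics, restrictions of this kind are conventionally expressed in a “robots.txt” file, which a well-behaved crawler reads before indexing a site. We have not confirmed exactly how BAILII implements its own restriction, so what follows is only a minimal sketch in Python: the robots.txt content, the crawler name and the URL are all invented for illustration, and the helper function may_crawl is our own hypothetical name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
# It is NOT a copy of any file actually published by BAILII.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URL are placeholders, not a real crawler or page.
    allowed = may_crawl(
        EXAMPLE_ROBOTS_TXT, "ExampleBot", "https://example.org/cases/1.html"
    )
    print("Crawling permitted" if allowed else "Crawling not permitted")
```

The point to notice is that the check is carried out by the crawler itself. Nothing in a file like this physically prevents a script from ignoring it, which is consistent with one search engine honouring the restriction while another apparently does not.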
Scraping and bulk downloading
“The copyright in the text of legislation and judgments displayed on BAILII’s website may belong to courts, other government bodies, judges, and/or to commercial publishers.”
“Copyright in the hypertext markup of judgments and all other presentational / value added aspects of judgments belongs to BAILII and the authors of the software tools used, whose rights are reserved.”
We understand that to mean that the original approved text of the judgments, which is usually available from BAILII in the form of a downloadable PDF or an RTF document, is not covered by BAILII’s copyright because BAILII has basically not done anything to edit or reformat the contents, let alone add value. However, the HTML (dynamic web page display via the browser of your device) is covered by BAILII’s copyright because they have enriched and formatted the text, often adding links and tagging to improve navigation. For example, they might add dynamic links from a table of contents in a long judgment to enable readers to jump to a particular section, or add law report citations to the name of a case or a link to another document on BAILII.
“users may copy, print and distribute legal materials published on BAILII’s website free of charge and without any other authorization from BAILII, provided that BAILII is identified as the source of the document.”
But that is expressly made subject to the “prohibited uses” listed by BAILII on the same page. These include, most relevantly:
“(a) incorporating … HTML versions of judgments into another website or into the output of a computer program not provided by BAILII itself (including apps or other programs used on a hand-held device or tablet computer);
(b) storing … HTML versions of judgments
(d) abusive use of the BAILII website’s resources and services via automated mechanisms or otherwise, in particular for bulk downloading of documents.”
This clearly prohibits any process involving scraping, whether as an automated process or by manually opening and copying, one by one, all the HTML judgments in one or more of BAILII’s collections. What is less clear is whether the systematic copying and re-use of the PDF / RTF copies originating from the courts, in which BAILII has no separate copyright, is also prohibited. We think not (unless it can be described as an abuse of BAILII’s resources under point (d) above) and that the MOJ communication was really intended to convey that meaning, rather than anything to do with BAILII’s own copyrighted value-added material (as contained in the HTML display).
Even allowing for that meaning, there remains a question about whether BAILII’s status, and the limited way it is supported by the MOJ, really caters for such a function. But that’s not to say it couldn’t.
What could BAILII do?
BAILII does not currently provide an API (Application Programming Interface), which is a facility enabling other software to request and retrieve content from a website directly (ie without a person doing individual searches). For example, ICLR on its website www.iclr.co.uk uses an API on www.legislation.gov.uk (the official statute law database managed by the National Archives), enabling users of the ICLR platform to search and retrieve updated legislation directly from source without going to the other website.
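To give a rough idea of what retrieving content “directly” means in practice, here is a minimal sketch in Python. It assumes, as we understand the publicly documented URL scheme, that legislation.gov.uk will serve an XML version of an item of legislation when “/data.xml” is added to its address. The function names are our own, and this is not a description of how ICLR’s integration actually works.

```python
from urllib.request import urlopen

BASE_URL = "https://www.legislation.gov.uk"


def legislation_data_url(doc_type: str, year: int, number: int) -> str:
    """Build the address of the XML version of an item of legislation.

    The "/data.xml" suffix is an assumption based on our understanding of
    the site's published URL scheme.
    """
    return f"{BASE_URL}/{doc_type}/{year}/{number}/data.xml"


def fetch_legislation_xml(
    doc_type: str, year: int, number: int, timeout: float = 10.0
) -> bytes:
    """Request the XML over the network and return the raw bytes."""
    with urlopen(legislation_data_url(doc_type, year, number), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # "ukpga" is the site's code for a UK Public General Act;
    # 1998 chapter 42 is the Human Rights Act 1998.
    print(legislation_data_url("ukpga", 1998, 42))
```

A program calling fetch_legislation_xml receives structured data it can search or display within its own pages, with no human visiting the source website. That is the convenience an API offers, and also the reason it makes large-scale re-use so much easier.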
If BAILII were to provide something similar, it would enable another information provider to “fetch” content directly from BAILII without having to go to the BAILII website and perform a fresh search. It would also, of course, enable commercial providers to reap all the benefits of BAILII’s collection. If they did so, they could pay BAILII a fee that would help support its operations. However, BAILII has so far resisted calls for it to provide such a service, whether monetised or not.
One problem with such an approach – and indeed with any form of scraping or bulk downloading of content, authorised or not – is the risk that any anonymisation failures which may have occurred prior to delivery of the content to BAILII will be magnified. Such failures do occur, and we have frequently been the ones to spot them and alert BAILII and the Judicial Office. If the same content is also going to be copied over to third party sites, particularly unauthorised ones, the ability to take down or amend the content swiftly (and the protection against the wrongly released content being crawled and revealed in a cached page or search result snippet) will be severely curtailed. This could have a negative effect on family justice transparency.
A commitment to open justice
These questions about BAILII need to be understood in the context of a wider problem with the systematic provision of public legal information – or legal information that SHOULD be public but often isn’t. That’s not because it’s secret or confidential or subject to a reporting restriction. It’s because it simply hasn’t been published by the official bodies that created it. For example, a comparison of the judgments available on BAILII with those available on commercial legal information services revealed a staggering shortfall, with many more judgments being available behind the paywalls than on BAILII’s free service. (See Daniel Hoadley, Open access to case law – how do we get there? Internet Newsletter for Lawyers, November 2018.)
There is a lack of system and standardisation in the process of transcription and distribution of judgments, which under the open justice principle ought to be available for public scrutiny and research by anyone. (This was one of the themes of our panel discussion at the Byline Festival in August this year: see A Byline Festival conversation about Truth, Trust and Transparency in the Courts.)
Some courts are very good at this – the UK Supreme Court being the supreme example, where its judgments, videos of the hearings and often the supporting documents are all freely accessible on its own website (something triumphantly demonstrated in the recent Miller 2 / Cherry “Prorogation case” appeals). Other courts are not so good. For example, research by members of TP found decidedly “patchy” adherence to guidelines about publication of family cases issued by Sir James Munby as President of the Family Division in 2014. (See Doughty, Twaite, and Magrath, Transparency through publication of family court judgments, Cardiff University, 2017.)
None of that is BAILII’s fault. The direction needs to come from the MOJ, the courts’ service (HMCTS) and the judiciary.
The Government has committed to working with civil society groups to develop a commitment to open justice in the next Open Government National Action Plan (2021-23). The drafting of such a commitment provides an ideal opportunity for the Ministry of Justice, HMCTS and Judiciary to tackle the issues set out above, as well as other inconsistencies and uncertainties about the external publication of public court data. Scoping work by Spotlight on Corruption (formerly Corruption Watch) in collaboration with the Open Government Network is expected to be published in the coming weeks; we will report more on this in due course.