Working with the British Library
Findmypast have been working with the British Library since 2010 to collate, curate and digitise the vast collection of historical newspapers and records held by the institution.
The digitised archive is published in the high-profile British Newspaper Archive, a joint venture between the partners, and at findmypast.co.uk, a long-established family history and genealogy platform.
For the first time, with the launch of the Social History Archive, most of these titles are now available by institutional subscription to academic researchers and lecturers. At the time of writing, the Social History Archive contains around 80% of the content available on the British Newspaper Archive (BNA), and we expect this to rise to 95% by the end of 2024.
To celebrate this milestone, we wanted to share some highlights of our journey with the British Library.
Digitising the archive
The British Library has a mandate to preserve and provide easy access to its archive of historical newspapers and records. One of our core values is to honour and preserve history, so we were delighted when, in 2010, the library selected us as the preferred partner for the digitisation of its newspaper collection.
Our initial 10-year partnership began with a commitment to digitise and preserve 40 million historic newspaper pages held only in hard copy and on microfilm at the British Library's newspaper archive in Colindale, London. Soon after our partnership began, the British Library established a new centre for newspapers, dedicated entirely to the archive, at its home in Boston Spa. We relocated our team and scanning units and have been working together in Yorkshire ever since.
Because the scanning and publishing operation demands significant technology and staffing, the team at Boston Spa has been supported from the beginning by our operations centre in Scotland, which provides technology infrastructure, data engineering, quality control and customer support.
The project, which was renewed in 2020, has been a great success. As of mid-2024, we have digitised more than 80 million pages from the archive. We produce up to 350,000 images per week, creating the largest digital collection of British and Irish newspapers in the world. The collection includes national and regional publications, many of which are no longer in print, dating back to 1699.
Alongside newspaper digitisation, we have also safeguarded a wealth of historical records, including baptisms, marriages and burials from the British Library's extensive East India Company and India Office archives, which document the British presence in India.
What do we digitise?
Because our scanning team is based in the British Library's own storage facilities, we have extensive access to the British Library's National Newspaper Building. One benefit of being able to access the original bound volumes or 'books' of newspapers and periodicals is that we have been able to scan some of the rarest and most fragile newspapers in the collection. We have even scanned single pages more than two feet wide! As well as scanning from the original paper volumes, we can also access the library's extensive microfilm collection, allowing us to increase our page throughput.
Investment in processes and equipment = high quality images
We've made significant investments in scanning equipment over the years, including A1 book scanners, high-speed microfilm scanners and A0 colour scanners that allow us to scan pages up to a square metre in size. The quality of this equipment means that whether we are working from original paper or microfilm, our digital images are of very high quality. We've also spent a lot of time in the past few years making our scanning process as efficient as possible, with frequent quality checkpoints throughout. This not only makes us faster, but also means that we don't have to go back and rescan pages after the fact.
Creating a searchable index of millions of newspaper pages
Once we have scanned the newspaper pages in Boston Spa in Yorkshire, the images are moved to our operations centre up the road in Dundee, Scotland.
First, individual digitised pages are re-assembled into whole newspapers, and then into entire year runs of newspapers, with the information about where and when titles were published being added to the searchable data.
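The re-assembly step described above can be illustrated with a minimal sketch. It assumes each scan already carries a title, place, issue date and page number; the `PageScan` record and all of its field names are hypothetical and are not taken from our production systems.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PageScan:
    # All field names are hypothetical, chosen for this illustration.
    title: str          # newspaper title
    place: str          # place of publication
    issue_date: date    # date the issue was published
    page_number: int    # position of the page within the issue
    image_path: str     # where the scanned image is stored


def assemble_year_runs(scans):
    """Group page scans into issues, then group issues into year runs.

    Returns {(title, year): {issue_date: [PageScan, ...]}}, with issues
    in date order and pages sorted by page number within each issue.
    """
    runs = defaultdict(lambda: defaultdict(list))
    for scan in scans:
        runs[(scan.title, scan.issue_date.year)][scan.issue_date].append(scan)

    result = {}
    for key, issues in runs.items():
        result[key] = {
            issue_date: sorted(pages, key=lambda p: p.page_number)
            for issue_date, pages in sorted(issues.items())
        }
    return result
```

Keying each run on title and year keeps the publication details attached to every page, which is what later allows a search to be narrowed by where and when a title was published.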
Next, we use Optical Character Recognition (OCR) technology to "read" the pages and create a text version of what is on the page. This technology is developing all the time, and although it is some way from perfect, we are constantly refining tools and processes to make sure that we can build as accurate an index of the content within the pages as we can.
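To show how OCR output can become a searchable index, here is a minimal sketch of an inverted index, a common structure for full-text search. It is an illustration only, not a description of our search system: the function names are hypothetical, and the tokeniser is deliberately simple (lower-cased runs of letters and digits).

```python
import re
from collections import defaultdict

# Assumption: a token is any run of ASCII letters or digits.
WORD = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the OCR text and split it into alphanumeric tokens."""
    return WORD.findall(text.lower())


def build_index(pages):
    """Map each token to the set of page ids whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search(index, query):
    """Return the page ids that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```

Because each lookup touches only the tokens in the query rather than every page, this kind of structure stays fast as the number of pages grows. It also shows why OCR accuracy matters: a misread word is indexed under the wrong token and cannot be found by searching for the correct one.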
Once we have built a text index for each page, this information is fed into our search system, allowing Social History Archive users to search billions of words across millions of pages within seconds. We then break the image for each page down into a series of smaller images so that the website can offer a very fast "zooming" function, letting users zoom in and out of pages almost instantly. And finally, after a last round of quality checks, we publish new pages online.
We index a wide range of metadata, but wide-ranging indices are of little use unless the search tools that interrogate them are equally sophisticated. Our search has been built over years and works seamlessly, allowing researchers to move with little friction from a mass of records to very specific outputs, filtering and amending results for a highly efficient learning process.
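The idea of breaking a page image into smaller images for zooming can be sketched as a tile pyramid: the image is repeatedly halved, and each level is cut into fixed-size tiles so that a viewer only loads the tiles currently on screen. The sketch below only computes the layout. The 256-pixel tile size and the halving scheme are assumptions made for illustration, not details of our platform.

```python
import math


def tile_pyramid(width, height, tile_size=256):
    """Describe a zoom pyramid for an image of the given pixel size.

    Returns a list of (level_width, level_height, columns, rows) tuples,
    starting at full resolution and halving until one tile covers the image.
    """
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        # Halve each dimension for the next, lower-resolution level.
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    return levels


if __name__ == "__main__":
    for level in tile_pyramid(1000, 600):
        print(level)
```

For a 1000 by 600 pixel image this gives three levels with 12, 4 and 1 tiles respectively. A zoomed-out view needs only the single small tile, while a zoomed-in view fetches just the few full-resolution tiles that cover the visible area.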
Accessing the British Library's newspaper collection
Alongside the comprehensive British Newspaper Archive, the British Library newspaper collection is also available to family historians who subscribe to findmypast.co.uk, our genealogy and family history platform. Now, for the first time, much of the British Library collection is also available via the Social History Archive to academics and researchers who work in higher education institutions around the world.
It's still possible to visit the reading rooms at the British Library in St Pancras or Boston Spa to view the newspaper archive, but most access is now digital, without the physical object being retrieved. Around 80% of the physical material collated for digitisation is classed as “unfit for use” unless handled by qualified experts, so digitisation supports the British Library's imperative to preserve these artefacts in their original form whilst making the rich social history they contain accessible to all.
Conclusion
At the time of writing, the Social History Archive includes almost 80% of the newspaper content of the British Newspaper Archive, and we anticipate this share growing to more than 95% by the end of 2024. Newspapers are also available from the Social History Archive via subscription to Collections, smaller subsets of newspaper titles grouped by region.
Our partnership with the British Library has comprehensively met our joint objectives of preserving the archive and making it more accessible. We continue to scan, digitise, preserve and publish documents, with new pages constantly being added to expand the resources available for research and teaching. The partnership has significantly improved access to historical documents and newspapers, making valuable research materials available for the first time to a worldwide audience.