Working with the British Library

Findmypast have been working with the British Library since 2010 to collate, curate and digitise the vast collection of historical newspapers and records held by the institution.

The digitised archive is published in the high-profile British Newspaper Archive, a joint venture between the partners, and at findmypast.co.uk, a long-established family history and genealogy platform.

For the first time, with the launch of the Social History Archive, most of these titles are now available for institutional subscription to academic researchers and lecturers. At the time of writing, the Social History Archive contains around 80% of the content available on the British Newspaper Archive, and this is expected to rise to 95% by the end of 2024.

To celebrate this milestone, we wanted to share some highlights of our journey with the British Library.

Black and white illustration of 10 people in a large glass house wearing Victorian-era clothes reading newspapers at lecterns or desks.
People reading newspapers in the Crystal Palace Reading Room, London, from The Lady's Newspaper, 24 March 1855

Digitising the archive

The British Library has a mandate to preserve and provide easy access to its archive of historical newspapers and records. One of our core values is to honour and preserve history, so we were delighted when, in 2010, the library selected us as the preferred partner for the digitisation of its newspaper collection.

Our initial 10-year partnership began with a commitment to digitise and preserve 40 million historic newspaper pages stored entirely in hard copy and microfilm at the British Library's newspaper archive in Colindale, London. Soon after our partnership began, the British Library established a new centre for newspapers at their home in Boston Spa, dedicated entirely to the archive. We relocated our dedicated team and scanning units and have been working together in Yorkshire ever since.

13 historic newspaper mastheads and pages combined in a single image.
A small selection of the newspaper titles digitised and published for the British Library

Given the significant technology and manpower needed for the scanning and publishing operation, the team at Boston Spa has been supported from the beginning by our operations centre in Scotland, which provides technology infrastructure, data engineering, quality control and customer support.

The project, which was renewed in 2020, has been a great success. As of mid-2024, we have digitised more than 80 million pages from the archive. We produce up to 350,000 images per week, creating the largest digital collection of British and Irish newspapers in the world. The collection includes national and regional publications, many of which are no longer in print, dating back to 1699.

The front page of a newspaper shaded a light gold colour, with the words The Sun appearing at the top of the page alongside a heraldic crest, and a large image of Queen Victoria in profile at the centre of the page surrounded by printed text.
Special 'golden' issue of The Sun, 28 June 1838, marking the coronation of Queen Victoria

Alongside newspaper digitisation, we have also safeguarded a wealth of historical records. These include baptisms, marriages and burials from the British Library's extensive East India Company and India Office archives, which document the British presence in India.

What do we digitise?

Because our scanning team is based in the British Library's own storage facilities, we have extensive access to the British Library's National Newspaper Building. One benefit of being able to access the original bound volumes or 'books' of newspapers and periodicals is that we have been able to scan some of the rarest and most fragile newspapers in the collection. We have even scanned single pages more than two feet wide! As well as scanning from the original paper volumes, we can also access the library's extensive microfilm collection, allowing us to increase our page throughput.

7 similar volumes from a single set of books on a shelf, bound in brown leather with gold text and each with a canvas loop attached to allow each book to be removed from the shelf.
British Library newspaper books

Investment in processes and equipment = high quality images

We've made significant investments in scanning equipment over the years, including A0 colour scanners, A1 book scanners and high-speed microfilm scanners. With this equipment we can scan pages up to a square metre in size, and whether we are working from original paper or microfilm, our digital images are of very high quality. We've also spent a lot of time in the past few years making our scanning process as efficient as possible, with frequent quality checkpoints throughout. This not only makes us faster, but also means that we don't have to go back and rescan pages after the fact.

A man standing between two sets of shelves, each containing numerous volumes of newspapers bound into large books; the man is holding and examining one of the volumes. A man leaning over a scanner on a waist-height desk, holding a document in place before closing the scanner cover; a substantial number of large books are visible on shelves in the background.
Selecting and scanning newspapers at the Boston Spa centre

Creating a searchable index of millions of newspaper pages

Once we have scanned the newspaper pages in Boston Spa in Yorkshire, the images are moved to our operations centre up the road in Dundee, Scotland.

First, individual digitised pages are re-assembled into whole newspapers, and then into entire year runs of newspapers, with the information about where and when titles were published being added to the searchable data.
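As a rough illustration of this re-assembly step, the sketch below groups page scans into issues and then into year runs. It is not our production pipeline: the `PageScan` record, its field names, the `assemble_year_runs` function and the sample titles and file names are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PageScan:
    # Hypothetical record for one scanned page; all field names are assumptions.
    title: str          # newspaper title
    place: str          # place of publication
    issue_date: date    # date printed on the issue
    page_number: int    # position of the page within the issue
    image_file: str     # file name of the scanned image


def assemble_year_runs(pages):
    """Group pages into issues, then group issues into year runs per title."""
    # Step 1: collect the pages that belong to the same issue.
    issues = defaultdict(list)
    for page in pages:
        issues[(page.title, page.issue_date)].append(page)

    # Step 2: put each issue's pages in order and file the issue under its year.
    year_runs = defaultdict(dict)
    for (title, issue_date), issue_pages in issues.items():
        ordered = sorted(issue_pages, key=lambda p: p.page_number)
        year_runs[(title, issue_date.year)][issue_date] = ordered
    return year_runs


# Made-up sample data: three pages from two issues of one invented title.
pages = [
    PageScan("Example Gazette", "Exampletown", date(1838, 6, 28), 2, "eg_0002.tif"),
    PageScan("Example Gazette", "Exampletown", date(1838, 6, 28), 1, "eg_0001.tif"),
    PageScan("Example Gazette", "Exampletown", date(1838, 6, 29), 1, "eg_0003.tif"),
]
runs = assemble_year_runs(pages)
run_1838 = runs[("Example Gazette", 1838)]
print(len(run_1838), "issues in the 1838 run")
```

Keying the year run on title and year keeps the publication details (title, place, date) attached to every page, which is what later makes them searchable.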

Next, we use Optical Character Recognition (OCR) technology to "read" the pages and create a text version of what is on the page. This technology is developing all the time, and although it is some way from perfect, we are constantly refining tools and processes to make sure that we can build as accurate an index of the content within the pages as we can.

Once we have built a text index for each page, this information is fed into our search system, allowing Social History Archive users to search billions of words across millions of pages within seconds. We then break the image of each page down into a series of smaller images so that the website can offer a very fast "zooming" function, letting users zoom in and out of pages quickly. Finally, after a last round of quality checks, we publish the new pages online.
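To give a feel for how breaking a page into smaller images supports fast zooming, here is a minimal sketch of a tile pyramid calculation. The 256-pixel tile size, the halving of resolution between zoom levels and the function names `tile_grid` and `zoom_levels` are assumptions made for illustration, not a description of our actual viewer.

```python
import math


def tile_grid(width, height, tile_size=256):
    """Return (columns, rows) of tiles needed to cover an image of this size."""
    return math.ceil(width / tile_size), math.ceil(height / tile_size)


def zoom_levels(width, height, tile_size=256):
    """List (level, width, height, columns, rows) for each zoom level.

    Level 0 is the full-resolution scan. Each later level halves the
    image dimensions, stopping once the whole page fits in one tile.
    """
    levels = []
    level = 0
    while True:
        cols, rows = tile_grid(width, height, tile_size)
        levels.append((level, width, height, cols, rows))
        if cols == 1 and rows == 1:
            break
        width = max(1, math.ceil(width / 2))
        height = max(1, math.ceil(height / 2))
        level += 1
    return levels


# A hypothetical 4000 x 6000 pixel page scan.
for level, w, h, cols, rows in zoom_levels(4000, 6000):
    print(f"level {level}: {w}x{h} px, {cols}x{rows} tiles")
```

With a layout like this, a viewer only needs to fetch the handful of tiles that cover the visible part of the page at the current zoom level, rather than the whole full-resolution image.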

We index a wide range of metadata, but wide-ranging indices are only useful if the search tools that interrogate them are equally sophisticated. Our search has been refined over many years, allowing researchers to move with little friction from a mass of records to very specific results, filtering and amending their searches for a highly efficient research process.
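For readers curious about the general technique, the sketch below shows a toy inverted index with metadata filters. It is a simplified illustration under our own assumptions, not the search system behind the Social History Archive: the `PageIndex` class, its methods, the metadata keys and the sample page text are all invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class PageIndex:
    """Toy inverted index: maps each word to the pages that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)   # word -> set of page ids
        self.metadata = {}                 # page id -> metadata dict

    def add_page(self, page_id, text, **metadata):
        """Index the (OCR) text of one page and store its metadata."""
        self.metadata[page_id] = metadata
        for word in tokenize(text):
            self.postings[word].add(page_id)

    def search(self, query, **filters):
        """Return sorted ids of pages containing every query word
        whose metadata matches every filter exactly."""
        words = tokenize(query)
        if not words:
            return []
        matches = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(
            pid for pid in matches
            if all(self.metadata[pid].get(k) == v for k, v in filters.items())
        )


# Made-up sample pages and titles, for illustration only.
index = PageIndex()
index.add_page("p1", "Coronation of the Queen at Westminster",
               title="Example Gazette", year=1838)
index.add_page("p2", "The Queen visits the Crystal Palace",
               title="Example Gazette", year=1855)
index.add_page("p3", "Crystal Palace reading room opens",
               title="Sample Herald", year=1855)

print(index.search("queen"))
print(index.search("crystal palace", year=1855))
print(index.search("crystal palace", title="Sample Herald"))
```

The word lookup narrows a large collection to a small candidate set, and the metadata filters then trim that set further, which mirrors the move from a mass of records to very specific results.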

Accessing the British Library's newspaper collection

Alongside the comprehensive British Newspaper Archive, the British Library newspaper collection is also available to family historians who subscribe to findmypast.co.uk, our genealogy and family history platform. Now, for the first time, much of the British Library collection is also available via the Social History Archive to academics and researchers who work in higher education institutions around the world.

A website page from the Social History Archive titled "Search for stories in our extensive newspaper collections", with a simple search form and search button.

It's still possible to visit the reading rooms at the British Library in St Pancras or Boston Spa to view the newspaper archive, but most access is now digital, with no need to retrieve the physical object. Around 80% of the physical material collated for digitisation is classed as “unfit for use” unless handled by qualified experts, so digitisation supports the British Library's imperative to preserve these artefacts in their original form whilst making the rich social history they contain accessible to all.

Conclusion

At the time of writing, the Social History Archive includes almost 80% of the newspaper content of the British Newspaper Archive, and we anticipate this share growing to more than 95% by the end of 2024. Newspapers are also available from the Social History Archive via subscription to Collections, smaller subsets of newspaper titles grouped by region.

Our partnership with the British Library has comprehensively met our joint objectives of preserving their archive and making it more accessible. We continue to scan, digitise, preserve and publish documents, with new pages constantly being added to expand the resources available for research and teaching. The partnership has significantly improved access to historical documents and newspapers, making valuable research materials available for the first time to a worldwide audience.