The Library has launched Amplify, an innovative platform that for the first time delivers our digitised audio collections online.
Each sound file is paired with a computer-generated transcript. These transcripts can contain errors or inconsistencies, and Amplify allows users to correct any mistakes they find as they listen along.
Launching Amplify to the public has been months in the making, and the process has been a wonderful learning experience. This has been a highly collaborative project, as well as a fairly experimental one, and it has opened up some fascinating and exciting opportunities for further progress and innovation in this space.
Over the last few years, the State Library has dabbled in a few different crowdsourcing projects, our most notable being the development of our Transcript Tool – built for the transcription of handwritten manuscript collections.
Armed with the successes and lessons from this project, we set out to revisit and rejuvenate our organisational approach to crowdsourced activities. Our aim was to review what we had already done, figure out what worked and what didn’t and come up with a plan to build a sustainable and scalable model moving forward.
We established a handful of overarching project principles to guide every decision that we made throughout our work. We were aiming to find a platform that we could launch as a pilot – a new way of thinking about digital volunteering at the Library.
At every stage, we asked ourselves whether the choices we were making aligned with one or all of the following criteria:
- Equitable access
- Shared cultural heritage
- Open source
Despite the amazing work of the Library’s Digital Excellence Program over the last few years, digitising an item does not immediately make it accessible. Video files, for example, require a time-coded transcript as well as a descriptive narration file in order to be accessible under the guidelines set out in WCAG.
The Library is frequently faced with the challenge of finding the resources to make all our digitised collections available while meeting accessibility requirements. This is where crowdsourcing platforms come in. Giving access to our collections through digital platforms where members of the public can then assist us in the task of making them better – by filling in the blanks in metadata or image tags, for example – is an invaluable way to enrich our collections and make them more accessible. It also gives people the opportunity to engage with and explore our collections in ways they haven’t been able to before.
We wanted to capitalise on this expertise if and where we could. Rather than building, we wanted to borrow. We were interested in finding an open-source platform that we could reuse and customise if required. As this was a pilot project, it was important to find ways to limit our outlay on development and administration.
It became apparent very quickly that we were keen to work with an audio collection. Delivering a vast audio archive online was new territory for the Library, and the requirements related to web accessibility had frequently slowed progress in this space, but we were ready to take the challenge head on!
Serendipitously, right as we decided to tackle an audio project, our colleagues and mentors at the New York Public Library Labs released a significant update to their own Transcript Editor. This release was very close to what we were looking for in terms of functionality and, best of all, offered a tangible answer to our ongoing challenge: how can we release our extensive audio content online in a way that meets accessibility requirements?
Thankfully, NYPL Labs has always been committed to contributing to the global open-source community, and we became one of the immediate beneficiaries of this dedication: we were able to freely access the codebase developed for the Transcript Editor and then repurpose it for our own use.
We reached out to the NYPL Labs team, who graciously answered our many questions and provided feedback throughout the development process.
The Transcript Editor was built in Ruby on Rails, a framework the State Library had never worked with. We decided to outsource the configuration and required customisations so as not to add to the existing workload of internal development teams, and also to expedite the project. We engaged the Sydney-based Ruby development agency Reinteractive. Development, customisation and testing were completed in just four weeks.
We also worked with VoiceBase, who provide us with the computer-generated transcripts. In researching this project we tested both human- and machine-generated transcripts. While it is impossible to beat the quality of human transcription services, machine services have the benefit of being more cost-effective.
The trade-off for using an affordable machine transcription service is that accuracy levels are variable. The quality of a transcript depends directly on the quality of the audio recording you put in. Luckily for us, with this collection, the unedited transcripts were still useable and meaningful, though they could do with a little bit of polish.
While Amplify is very much a 'version' of the NYPL Transcript Editor, there are a few notable differences in our implementation. Customisations allowed us to meet our business requirements more closely, including:
- a ‘play all’ button – giving users the option to listen to a full audio file rather than line by line;
- the inclusion of full-width feature images on individual pages – to showcase some of the wonderful images from associated archives;
- a batch importer for transcript files – in order to override the built-in API integration that NYPL had designed; and
- social sharing – so that users can share individual audio files to their personal social networks.
With other crowdsourcing platforms, we found that administrative and content review processes were often laborious. Even though corrections and contributions are made by volunteers, a lot of staff time is required to review samples of the transcripts and “accept” them as part of a Library record – effectively double-handling the work.
Amplify’s review process works via a sophisticated consensus algorithm in which the status of each line of text depends on how many people have worked on it. When you visit Amplify for the first time, if a transcript has not been completed, you will see the original, unedited transcript file. As soon as three individual users have transcribed a line in exactly the same way, that line becomes ‘complete’ and is locked against further editing by other users. If the three people who have transcribed a particular line differ in their suggested edits, the line remains open for further changes until a consensus is reached. This process takes the pressure off Library staff to review every line of text before it can be considered complete. Completed lines can still be flagged for review should a user believe they contain errors.
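As a rough illustration of the consensus rule described above, here is a minimal Python sketch. It is not the Transcript Editor's actual code (which is written in Ruby on Rails). The function name `line_status`, the status labels, the one-vote-per-user rule, and the choice to keep showing the original text while a line is still open are all assumptions made for illustration.

```python
from collections import Counter

# From the article: three identical transcriptions complete a line.
CONSENSUS_THRESHOLD = 3


def line_status(original_text, edits, threshold=CONSENSUS_THRESHOLD):
    """Return (status, text_to_display) for one transcript line.

    `edits` maps a (hypothetical) user id to that user's latest text for
    the line, so each user counts at most once. Texts are compared
    exactly, with no normalisation of case or punctuation.
    """
    if not edits:
        # Nobody has touched the line yet: show the machine transcript.
        return ("unedited", original_text)

    # Count how many users submitted each distinct version of the line.
    counts = Counter(edits.values())
    best_text, votes = counts.most_common(1)[0]

    if votes >= threshold:
        # Enough users agree exactly: the line is complete and locked.
        return ("complete", best_text)

    # Assumption: while no version has enough votes, the line stays open
    # and the original text is still what gets displayed.
    return ("open", original_text)
```

For example, if two users submit "hello world" and a third submits "hello, world", the line stays open; once another user also submits "hello world", that version has three matching votes and the line is marked complete.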
Another favourite feature in Amplify is that a user is not required to create an account to start transcribing. This approach is great because it allows a more spontaneous experience. Someone might only be interested in listening to a piece of audio rather than transcribing it at first, but if they notice an obvious error in the transcript as they listen along, fixing it takes only a few strokes of the keyboard. There is no sign-up or verification step first, which might otherwise deter some users.
The benefit of creating an account, of course, is being able to track any changes you have made to transcriptions across the site. This option will definitely speak to users who intend to return multiple times to use Amplify as a research tool, or even those who are just committed to transcribing a whole audio file. Those who are taking a more cursory look (and we) can still reap the benefits.
Amplify was launched with a small collection of 75 hours of audio from our extensive Rainbow Archive. The audio is a colourful collection of interviews with residents of Nimbin, discussing the impact of the 1973 Aquarius Festival and the counter-culture and alternative lifestyles movement that transformed the Northern Rivers region. Our intention is to eventually add our full sound archive to Amplify so that all of our audio collections can be accessed in the same way.
This project has initiated many thoughtful and interesting discussions about the potential for doing more with our audiovisual material. Amplify has created refreshed enthusiasm and momentum for tackling, in both creative and collaborative ways, the web accessibility issues we have so long faced. Beyond using Amplify for archival collections, we are already exploring the possibility of using the platform to transcribe and give access to contemporary materials – such as recordings of the many events and talks held at the Library.
Amplify is just the first stage of a larger program of work – the one we set out to explore in the very beginning. A program that expands our experimentation and experience with crowdsourcing platforms, but also one that fosters opportunity and purpose for a broad community of digital volunteers. A program that commits to connecting with a multitude of users, who may have differing subject or task-based interests, but who have one key objective in common – to engage deeply and meaningfully with New South Wales’ cultural heritage.
Written by Jenna Bain
Digital Projects Leader, State Library of NSW