Q&A Dr Stephen Wan

Research scientist Dr Stephen Wan, of CSIRO’s Data61, worked with the State Library on its innovative social media archive.

What’s your role at the CSIRO?

I’m very lucky to lead the Language and Social Computing team at CSIRO’s Data61, made up of talented researchers and engineers in the fields of natural language processing, information retrieval and social media analytics. We tackle a variety of research topics, connected by the idea that large text collections — such as public social media — might be a treasure trove of insights that can help us better understand society, help with business decisions, or help answer research questions in fields like health. As with the Library’s social media archive, the result is often a new prototype that we design and build for the wider community.

What has been the greatest challenge of creating a social media archive?

One of the many challenges was to design a tool that would help the Library’s staff manage how data was collected and curated. Collecting data with a keyword often results in junk data, because the word may have many different meanings. This is one of the fascinating aspects of language that makes it such a compelling topic of research. Our archival tool analyses the diversity of topics collected around a keyword to help the Library collect relevant posts.

Why is it important for the Library?

The Library has a significant role in collecting information about what life is like in New South Wales, through materials such as newspapers, books and photographs. Public social media is an extension of this, as it captures some of our social discourse about life in the state.

Have you worked on similar projects?

At CSIRO’s Data61 we have worked on a number of social media analytics projects like this. Another rewarding project was with the Black Dog Institute, looking at the role of social media data in furthering mental health research, particularly on the topics of suicide and depression. We have also looked at using social media to provide information to help manage natural disasters such as fires and earthquakes.

What interests led to your current career?

I’ve always been interested in language and thought, and so I initially studied psychology and linguistics, followed by postgraduate studies in computational linguistics and natural language processing. Merging linguistics with computer science seemed like the perfect blend of interests. As a researcher at CSIRO’s Data61, I’m attracted to the opportunity to convert intuitions about language into software. We learn something about language, and end up with useful tools that help us manage and make sense of large archives.


This article first appeared in SL magazine autumn 2018.