Why Secondary Data Should Come First

The argument put forward in this post has been brewing in my mind – and being put into practice in my research work – since some time before COVID-19 appeared in our midst. The pandemic has accentuated the point I want to make.

Essentially, my argument is this: researchers should make as much use of secondary data as possible before we even think about gathering any primary data.

Most novice researchers are taught that new research requires primary data; that original research requires data gathered for the purpose by the researcher or the research team. Most research ethics committees focus most of their efforts on protecting participants. We need to change this. I believe we should be teaching novice researchers that new/original research requires existing data to be used in new ways, and primary data should be gathered only if absolutely necessary. I would like to see research ethics committees not only asking what researchers are doing to ensure the safety and wellbeing of participants, but also requiring a statement of the work that has been done using secondary data to try to answer the research question(s), and a clear rationale for the need to go and bother people for more information.

I believe working in this way would benefit researchers, participants, and research itself. For researchers, gathering primary data can be lots of fun but is also fraught with difficulty. Carefully planned recruitment methods may not work; response rates can be low; interviewees often say what they want to say rather than answering researchers’ questions directly. For participants, research fatigue is real. Research itself would receive more respect if we made better and fuller use of data, and shouted about that, rather than gathering data we never use (or worse, reclassifying stolen sacred artefacts and human remains as ‘data’ and refusing to return them to their communities of origin because of their ‘scientific importance’ – but that’s another story).

Some people think of secondary data as quantitative: government statistics, health prevalence data, census findings, and so on. But there is lots of qualitative secondary data too, such as historical data, archival data, and web pages current and past. Mainstream and social media provide huge quantities of secondary data (though with social media there are a number of important ethical considerations which are beyond the scope of this post).

Of course secondary data isn’t a panacea. There is so much data available these days that it can be hard to find what you need, particularly as it will have been gathered by people with different priorities from yours. Also, it’s frustrating when you find what you need but you can’t access it because it’s behind a paywall or it has an obstructive gatekeeper. Comparison can be difficult when different researchers, organisations, and countries gather similar information in different ways. It can be hard to understand, or detect any mistakes in, data you didn’t gather yourself, particularly if it is in large, complicated datasets. Information about how or why data was gathered or analysed is not always available, which can leave you unsure of the quality of that data.

On the plus side, the internet allows quick, easy, free access to innumerable quantitative and qualitative datasets, containing humongous amounts of data. Much of this has been collected and presented by professional research teams with considerable expertise. There is scope for historical, longitudinal, and cross-cultural perspectives, way beyond anything you could possibly achieve through primary data gathering. Working with secondary data can save researchers a great deal of time at the data gathering stage, which means more time available for analysis and reporting. And, ethically, using secondary data reduces the burden on potential participants, and re-use of data honours the contribution of previous participants.

There are lots of resources available on using quantitative secondary data. I’m also happy to report that there is now an excellent resource on using qualitative secondary data: Qualitative Secondary Analysis, a recent collection of really good chapters by forward-thinking researchers edited by Kahryn Hughes and Anna Tarrant. The book includes some innovative methods, interesting theoretical approaches, and lots of guidance on the ethics of working with secondary data.

Some people think that working with secondary data has no ethical implications. This is so wrong it couldn’t be wronger. For a start, it is essential to ensure that informed consent for re-use has been obtained. If it hasn’t, either obtain such consent or don’t use the data. Then there are debates about how ethical it is to do research using secondary data about groups of people, or communities, without the involvement of representatives from those groups or communities. Also, working with secondary data can be stressful and upsetting for researchers – imagine if you were working with historical data about the Holocaust, or (as Kylie Smith does) archival data about racism in psychiatric practice in mid-20th century America. Reading about distressing topics day after day can be harmful to our emotional and mental health, and so to our physical health as well.

These are just a few of the ethical issues we need to consider in working with secondary data. Again, it is beyond the scope of this post to cover them all. So working with secondary data isn’t an easy option; although it is different from working with primary data, it can be just as complex. I believe novice researchers should learn how to find and use secondary data, in ethical ways, before they learn anything about primary data gathering and analysis.

This blog, and the monthly #CRMethodsChat on Twitter, is funded by my beloved Patrons. It takes me at least one working day per month to post here each week and run the Twitterchat. At the time of writing I’m receiving funding from Patrons of $70 per month. If you think a day of my time is worth more than $70 – you can help! Ongoing support would be fantastic but you can also make a one-time donation through the PayPal button on this blog if that works better for you. Support from Patrons and donors also enables me to keep this blog ad-free. If you are not able to support me financially, please consider reviewing any of my books you have read – even a single-line review on Amazon or Goodreads is a huge help – or sharing a link to my work on social media. Thank you!
