What Is Data?

Last week, in the context of some work I’m doing for a client, I was trying to find something someone had written in answer to the question: what is data? I looked around online, and in my library of methods books, and I couldn’t find anything except some definitions.

The definitions included:

  • Factual information used as a basis for reasoning or calculation (Merriam-Webster)
  • Information, especially facts or numbers, collected to be used to help with making decisions (Cambridge English Dictionary)
  • Individual facts, statistics, or items of information, often numeric (Wikipedia)

Data is also, demonstrably, a word, and a character in Star Trek. So far, so inconclusive. Yet people talk and write about data all the time: in the media, in books and journals, in conversations and meetings. And they use the word to refer to many things other than facts or numbers. Data may be anything from a piece of human tissue to the movement of the stars.

Euro-Western researchers conventionally speak and write of ‘collecting’ data. And indeed some data can be collected. If you want to research beach littering, you can go and collect all the litter from one or more beaches, and then use that litter as data for analysis. If you want to know what differences there may be in how print media describe people of different genders, you can collect relevant extracts from a bunch of articles and then use those extracts as data for analysis. So ‘collecting’ is an accurate description in some cases. However, if you plan to research lived experience by collecting data, you are effectively viewing people as repositories of data which can be transferred to researchers on request, and viewing researchers as people who possess no data themselves and so need to take it from others. Clearly, neither of these positions is accurate.

Some Euro-Western researchers speak and write of ‘constructing’ data. This refers to the generation of data as a creative act, such as through keeping a diary for a specified length of time, taking photographs during a walking interview, or making a collective collage in a focus group. Even conventional interview or focus group data can be viewed as being constructed by researcher and participant(s) together.

Autoethnographers and embodiment researchers privilege data from their own lived experience, though often they also use data collected from, or constructed with, others. But for these researchers, their own sensory experiences, thoughts, emotions, memories and desires are all potential data.

For Indigenous researchers, all of these and more can be used as data, which is often co-constructed by the researcher and all participants working together in a group. This is done in whatever way is appropriate for the researcher’s and participants’ culture. Māori research data is co-constructed through reflective, self-aware seminars. In the Mmogo method from southern Africa, objects with symbolic and socially constructed meanings are co-constructed during the research process from familiar cultural items such as clay, grass stalks, cloth and colourful buttons, and these objects then serve as data (Chilisa 2020: 223-4, 243). Indigenous researchers in America, Canada and Australia use oral history, stories and artworks as data (Lambert 2014: 29-35).

All of this tells us that data is not purely facts and numbers, as the definitions would have us believe. Indeed, we could conclude from the examples above that pretty much anything can be data. This does not mean anything can be data for any research project. You’re not likely to find a cure for disease by collecting bus timetables, or identify the best way to plan a new town by making inukshuk. But bus timetables could be very useful for research into public transport systems, and making inukshuk could be integral to Indigenous research into the knowledge and belief systems of Arctic peoples.

Data can be documents or tattoos, poems or maps, artefacts or photographs – the list is very, very long. And of course a research project may use different kinds of data, which could be collected, or constructed, or some of each. The question we need to ask ourselves, at the start of any research project, is: what kind(s) of data are most likely to help us answer our research question, within its unique context, including any constraints of budget and/or timescale? In the end, for some projects, the answer will be facts, or numbers, or both. But if we assume this from the start, we close off all sorts of potentially interesting and useful options.

This blog, the monthly #CRMethodsChat on Twitter, and the videos on my YouTube channel are funded by my beloved Patrons. Patrons receive exclusive content and various rewards, depending on their level of support, such as access to my special private Patreon-only blog posts, bi-monthly Q&A sessions on Zoom, free e-book downloads and signed copies of my books. Patrons can also suggest topics for my blogs and videos. If you want to support me by becoming a Patron, click here. Whilst ongoing support would be fantastic, you can make a one-time donation instead, through the PayPal button on this blog, if that works better for you. If you are not able to support me financially, please consider reviewing any of my books you have read – even a single-line review on Amazon or Goodreads is a huge help – or sharing a link to my work on social media. Thank you!