Can bad data be good data? Reflections upon the Consumer Data Research Council Partner Forum

Image from https://www.cdrc.ac.uk/wp-content/uploads/2015/04/sustainability121714.jpg

by Ed Dargan*

The Consumer Data Research Council (CDRC), established by the ESRC, held its Data Partner Forum on 6 May at the Saïd Business School, University of Oxford. The key aim of the CDRC is to help organisations maximise the potential of innovation by opening up their data to trusted researchers, so that those researchers can provide solutions that drive economic growth and improve our society. The day's presentations were organised around three themes: missing data, data sources and research design.

For the retail demand modellers, three things were seen as important: including seasonal demand, especially for seaside locations; accounting for natural barriers; and basing travel times on real journey times. It was also useful to see how different data values were being clustered to form classifications, as this is something that needs to be done with the footfall data available to IPM in the big data project we are just about to start with Springboard.

Data representation was an important theme. Missing data, both spatial and temporal, was identified as a challenge, and a number of techniques were presented for ‘filling in’ the gaps. A recurring problem was the use of census data, which is fixed to a single point in time, when analysing contemporary data that is updated more frequently. Also identified was the inaccuracy of outcome codes supplied by end users, in this case the reasons given for failed deliveries.
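To make the idea of ‘filling in’ temporal gaps concrete, here is a minimal sketch of one very simple approach, linear interpolation between the nearest known values. This is my own illustration and not a technique attributed to any of the presenters; the function name `fill_gaps_linear` and the sample footfall counts are hypothetical.

```python
def fill_gaps_linear(series):
    """Fill interior gaps (None) by linear interpolation.

    Assumption: observations are evenly spaced in time. Gaps at the very
    start or end of the series are left as None, because there is no
    known value on both sides to interpolate between.
    """
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    # Walk over each pair of neighbouring known observations.
    for left, right in zip(known, known[1:]):
        span = right - left
        step = series[right] - series[left]
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left) / span
    return filled


# Hypothetical hourly footfall counts with two missing readings.
hourly_counts = [None, 10.0, None, None, 40.0, None]
print(fill_gaps_linear(hourly_counts))
```

Interpolation like this only suits short gaps in smoothly varying series; seasonal or spatial gaps would need something more considered.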

With any spatial and temporal data, there is the challenge of providing a digestible visual display. With so much data available, this was acknowledged as a challenge that most of the presenters using geographical mappings faced.

As a data source, supermarket loyalty cards were discussed. Interestingly, it was found that loyalty card usage was least likely to occur for small and frequent purchases, no matter what type of store was visited or the socio-demographic classification of the customer. The map of users of a store showed a more dispersed geographical spread around the UK than expected. This highlighted the problem of customers failing to update their home address details when moving home and the subsequent difficulties in interpreting loyalty card spatial data.

However, when problems in the data were identified, this fed into a recurring observation: so-called bad data, that is, data identified statistically as problematic, should not always be removed or cleansed using missing data techniques. Instead, this so-called bad data could be the most interesting data of all for a researcher or a commercial organisation. For example, studying people who don’t update their loyalty card details could lead to some very useful insights into such customers. Perhaps they are a very profitable segment?
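As a rough sketch of what keeping bad data might look like in practice, the snippet below flags loyalty card records with a stale address instead of deleting them, then compares average spend between the two groups. Everything here is an assumption made for illustration: the `CardRecord` fields, the five-year threshold and the sample figures are hypothetical and do not come from the forum.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CardRecord:
    card_id: str
    address_last_updated: date
    annual_spend: float


def flag_stale_addresses(records, as_of, max_age_years=5):
    """Pair every record with a stale-address flag; nothing is dropped."""
    cutoff_days = max_age_years * 365  # approximate, ignores leap days
    return [
        (record, (as_of - record.address_last_updated).days > cutoff_days)
        for record in records
    ]


def mean_spend_by_flag(flagged):
    """Average annual spend for stale (True) and current (False) records."""
    totals = {True: [0.0, 0], False: [0.0, 0]}
    for record, is_stale in flagged:
        totals[is_stale][0] += record.annual_spend
        totals[is_stale][1] += 1
    return {
        flag: (total / count if count else None)
        for flag, (total, count) in totals.items()
    }


# Hypothetical records, invented purely to exercise the functions.
records = [
    CardRecord("a", date(2010, 1, 1), 1200.0),
    CardRecord("b", date(2016, 1, 1), 800.0),
    CardRecord("c", date(2009, 6, 1), 1000.0),
]
flagged = flag_stale_addresses(records, as_of=date(2017, 5, 6))
print(mean_spend_by_flag(flagged))
```

Because the flag travels with each record, the stale-address customers remain available as a segment in their own right, which is the point the presenters were making.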

Useful resources identified during the presentations included http://maps.cdrc.ac.uk, which includes views of geodemographic, retail and general metrics for the larger towns and cities. Various views are provided. One that seemed a useful barometer of high street health was the retail view, which for some towns (presumably only a few have the data available) shows changes in retailer types and vacancy rates over a set period of time.

Overall, it was a very good day. The presentations were very interesting and there was also the opportunity to meet and mix with other academics and business representatives.

Below is a list of the sessions and presentations:

Session 1: Missing Data and Missing People

• Thomas Waddington – Modelling the temporal variation in supermarket revenue estimates

• Eusebio Odiari – Infilling missing values in consumer Big Data

• Michail Pavlis – The geography of non-delivery

• Emily Sheard – Enumerating the ambient population in the context of crime

• Guy Lansley, Chrysanthi Kollia – The spatio-temporal geodemographics of youth

Session 2: Novel Data Sources and their Geographic Integration

• Nik Lomax and Martin Clarke – Home owner mobility: assessing distance and geodemographic consistency using consumer data

• Hai Nguyen, Oliver O’Brien – Naming conventions and ethnicity

• Guy Lansley, Wen Li – Areas and activities: integrating consumer registers

• Alyson Lloyd, James Cheshire, Roberto Murcio – How representative are high street retailer data?

• Anastasia Ushakova – Temporal patterns of energy consumption and vulnerable consumers

• Tim Rains – Data linkage of store loyalty cards

Session 3: Big Data and Research Design

• Alex Singleton, Bala Soundararaj – Dynamic high streets – SmartStreetSensor

• Mark Birkin – Spatial microsimulation, big data and policy analysis: an example from the UK travel market consumer data

• Phani Chintakayala – Do green attitudes and demographics drive sustainable product consumption?

*Ed Dargan is a PhD Student at the Institute of Place Management, Manchester Metropolitan University.

This article was first published on Prof Cathy Parker’s blog.

Institute of Place Management (IPM) Blog

Supporting people who develop, manage and make places better
