Finally, what is citizen data science capable of when it comes to disrupting others? Let us investigate by looking at specific sectors and extrapolating from what we have learned so far. I will look at three (because three is a magical number) diverse sectors to give a taste of how broad the reach of citizen data science could be.
Quick note on disruptive innovation: this article in Harvard Business Review outlines very well what is and is not a disruptive innovation. Wherever I refer to disruptive innovation in the following paragraphs, that article is my source.
Data Analysis Consultancy Service
Doing work in-house is usually much cheaper than outsourcing the same workload to an external team of analysts. As the situation stands right now, a business that does not have a dedicated data scientist or team, but still wishes to base its decisions on data-driven insights, is forced to pay a specialist company for the service. The consultants would then come, collect the data (if you did not have it already), do the analysis and give you the report. This would take a few weeks, and afterwards you would still have to digest the report and decide on the best course of action.
With citizen data science, in the best case, the process would be much simpler. With the framework implemented, you would already be collecting the data. The analysts would be your own employees (ideally all of them), with a great grasp of the business and the market you are in. The insights would not come at the end of a long investigation but would flow in naturally as employees discovered them, so you could discuss them as they arrived and implement changes incrementally. After a few iterations, this would become second nature to your business. All this would come at almost zero cost, improving your internal processes bit by bit.
But would it truly disrupt the Data Analysis Consultancy Service (or, more broadly, any consultancy offering such a service)? Established consultancies might currently be overserving the lower segment of the market, i.e. SMEs, which would give citizen data science a potential point of entry. The problem, however, is that citizen data science is not just a single technology: it requires broader procedural changes within companies. As long as that is taken care of, I would argue that it represents a disruptive innovation, entering at the lower market segment with the potential to move up the ladder to reach the higher-margin customers (big projects at big companies).
Statistical Software Packages
Moving again into the realm of the hypothetical, let us imagine for a while that we could substitute programs such as R, Python, SPSS or Stata. All of them provide robust tools for researchers to explore their datasets. Their selling point is flexibility: R and Python are programming languages (i.e. they offer virtually limitless possibilities for working with data), while SPSS and Stata have features of programming languages but are more readily geared towards statistics. In effect, Advanced Analytics is trying to do just that: work with data in as many ways as possible. Some would perhaps say that the focus on business gives the data a dimension that is more tangible than in a usual scientific dataset, but does it?
If we take IBM Watson, for example, its workings are well explained in this video below:
In summary, Watson’s biggest strength is its natural language processing, which means that it understands data in the form in which people normally produce it. For example, with data from Twitter, an easy way to look for a trend is to count the tweets that include a certain word. Remember the dispute over the colour of “the dress”? If we wanted to know which side was more prevalent, we could look for the hashtags #whiteandgold and #blackandblue and count the occurrences of each. However, what if the tweet was more like:
“tl;dr dress is NOT #whiteandgold”
Okay, with some thinking, I believe it would be feasible to account for those negative mentions of colour, but what would we do with:
“srsly #whiteandgold?” or “wtf #blackandblue no way, it’s obvs #whiteandgold”
Now that’s a difficult one to crack. Even setting technical limits aside, it would not be feasible to cover all such cases in our search. As a result, our data would include false positives and false negatives.
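To make the problem concrete, here is a minimal Python sketch of the two counting approaches discussed above. It is only an illustration under stated assumptions: the function names, the tiny list of negation words and the rule "ignore a hashtag if the word right before it is a negation" are all invented for this example, and no real Twitter API is involved.

```python
import re
from collections import Counter

HASHTAGS = ("#whiteandgold", "#blackandblue")
# Hypothetical, deliberately tiny list of negation cues (an assumption).
NEGATIONS = {"not", "no", "never"}

def naive_count(tweets):
    """Treat every mention of a hashtag as a vote for that colour."""
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        for tag in HASHTAGS:
            if tag in text:
                counts[tag] += 1
    return counts

def negation_aware_count(tweets):
    """Skip a hashtag when the word directly before it is a negation."""
    counts = Counter()
    for tweet in tweets:
        # Split into words and hashtags, keeping apostrophes inside words.
        tokens = re.findall(r"#?[\w']+", tweet.lower())
        for i, token in enumerate(tokens):
            if token in HASHTAGS:
                previous = tokens[i - 1] if i > 0 else ""
                if previous not in NEGATIONS:
                    counts[token] += 1
    return counts

# The three example tweets from the text.
tweets = [
    "tl;dr dress is NOT #whiteandgold",
    "srsly #whiteandgold?",
    "wtf #blackandblue no way, it's obvs #whiteandgold",
]

print(naive_count(tweets))
print(negation_aware_count(tweets))
```

On these three tweets the naive count gives #whiteandgold three votes and #blackandblue one. The negation rule handles the first tweet, but it still counts the sceptical “srsly” tweet for #whiteandgold and the rejected #blackandblue as a vote, so the false positives remain.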
Watson would not have this problem though. Because it can actually understand the tweets, it would know that “srsly” has a negative connotation and that for “wtf #…. no way, it’s obvs #…” it is the second hashtag that should count.
In research, cleaning the data is the first step and takes a lot of time, as I had the chance to experience in my Quantitative Methods module last term. To have the programme remove the need to meticulously clean the data is an incredible time-saver which I am sure would be welcomed by all.
Watson also suggests trends in the data, which is the second step after the data has been cleaned. Again, to have these suggestions available immediately is a time saver, and it might also spot trends which would not be spotted otherwise.
Obviously, the freedom to play with the data in any way is fundamental for a researcher. If the first generation of citizen data science tools does not offer this, I am sure that by the second generation someone will have spotted the opportunity to modify those tools to offer as much freedom as R or Python, and the technology will find its way into the hands of researchers. My prediction is that powerful yet simple analytical tools will not be left to gather dust, and citizen data science is the best example of this.
As above, could we describe citizen data science as a disruptive technology in relation to statistical packages? Possibly. Given the complexity of the current solutions (R, Python, Stata, SPSS), there might be a segment of researchers looking for something easier to use, yet with the same analytical capabilities. Since no procedural change would be needed (in contrast to a company), it is very possible that researchers would start adopting these tools. And as I mentioned above, with the second or third generation of tools offering greater freedom specifically for researchers, the technology would have the potential to move into the mainstream and disrupt the traditional solutions.
Competitive Advantage/Internal Processes
Although this is not an industry or a company, Keeley’s Ten Types of Innovation lists “process innovation” as one of its types, which can be translated as process/predictive analytics standardisation. As citizen data science would influence exactly this standardisation, I thought it well worth including internal processes (translated into competitive advantage) as a “disrupted place”.
Since we agreed at the beginning that leveraging data in one’s business is quickly becoming an important competitive advantage, citizen data science could help companies obtain that advantage.
In contrast to companies reliant on centralised IT teams, many more people would suddenly be able to develop insights. In an ideal world, this would be like growing your IT team several times over, but at very little cost. In addition, the insights, developed by the business people themselves, would possibly come with an already structured business case. Lastly, implementing such insights would be much easier, as the very people executing them would be the ones who had thought of them: they would know how to do it and would feel ownership of the idea as well.
Because this is not a company, I don’t think I can talk about a true disruptive innovation here. However, if we took two companies of similar size and value proposition, the one that adopted citizen data science would certainly have a competitive advantage over the other, and could disrupt its business by being much more efficient.