Quick Summary of Citizen Data Science blogs

Citizen Data Scientist – why not stick with the Data Scientist only?

  • A data scientist alone is no longer sufficient to keep up with the increasing amount and variety of data
  • Citizen Data Science to empower all employees to develop their own insights
  • A business is not a homogeneous thing – a data scientist has limited expertise in each part of the business, whereas the many employees working there do not

How will “the Citizen” change the world?

  • Data increasingly necessary for businesses to keep an edge over the competition
  • Insights from data can significantly improve business; Tesco case study
  • Data Scientist cannot have the hyper-localised knowledge that managers have
  • Citizen Data Scientist equipped with the right tools to develop good data insights informed by his localised business knowledge (i.e. highly relevant)

What are the barriers to adoption of Citizen Data Science?

  • Two main barriers – citizens and technology – plus a third, legacy processes
  • Citizens – not knowledgeable, difficult to train
  • Technology – difficult to simplify for ease of use while retaining analytical power
  • Legacy Processes – mainly for big companies; legacy software, legacy decision making, legacy workforce

What types of companies will develop Citizen Data Science into commercial products?

  • Mainly two – analytics first and data first
  • Analytics first, then data – traditional approach, many companies, successful and going strong
  • Data first, then analytics – Salesforce leveraging its CRM platform

What competences will companies need in order to develop and commercialise Citizen Data Science?

  • Three main – Security, User Experience, Advanced Analytics
  • Security – because of ever-increasing hacking attempts; customers are conscious of their personal data
  • User Experience – ease of use, enjoyable working experience
  • Advanced Analytics – necessary backbone, important to increase analytical power and ease of use at the same time

Where will the competition come from?

  • 5 leaders of Advanced Analytics
  • Competition can emerge from anywhere – community-driven open-source projects are proof of that

What existing companies or industries might “the Citizen” disrupt?

  • Three examples – Data Analysis Consultancy Services, Statistical Software Packages, Internal Processes
  • Consultancy Services – expensive, slow, disconnected from the everyday workings of the company
  • Statistical Software Packages – might lose out to newer solutions on Advanced Analytics insights
  • Internal Processes – clumsy, IT-centric, missing out on business expertise of other employees

Citizen Data Scientist – why not stick with the Data Scientist only?

Science. Scientific. Scientist. These almost magic-like words carry, for most people, a heavy connotation of respect. Politicians use those terms to substantiate their claims, while the media like to use them to lend credibility to their articles. It has been roughly 150 years since the term ‘scientist’ replaced the term ‘man of science’. It is about time for another shift in the paradigm of what constitutes a “scientist”, and citizen science might just be the push in the right direction.

In 2015, the Gartner Hype Cycle – a reputable source on emerging technologies – included Citizen Data Science among the technologies currently rising in popularity. The reason was simple – recent years have seen a massive emergence of data science, and many firms have now created positions such as Chief Data Scientist, with which they aim to create unique insights for their business from the data that the company collects through day-to-day operations. There is, however, a huge bottleneck here – our ability to process data is not keeping up with our ability to create and collect it, not least because the technology is still insufficient. Apache Spark and Hadoop are well established in the field of Big Data, but because Big Data means not only a huge amount of data but also a huge variety of data, we are still only coming to terms with how to process it. Individual companies need individual solutions, but even if we imagine that we have those, we still need someone to make sense of the data – i.e. derive the actual insights. The reason companies are employing their own data scientists is pragmatic – being within the firm positions these people best for understanding the business itself, its specificities and niches, which in turn enables the scientist to look at the data through a lens of business expertise.

Here comes the catch, however. A business is usually not a homogeneous thing. Sure, in an ideal world, every single employee in the firm is aligned with the firm’s single value proposition. But even then, the way a marketing department sees the business is different to the way the engineers see it, which is different again to how the executives think about it. And with these different perspectives, you can expect as many types of data as well. A Chief Data Scientist, then, although knowledgeable about the data and the business, still has a limited way of operating. What’s the solution then? More Chief Data Scientists? Hardly, as these are extremely scarce and quite pricey. Better technology, possibly AI? Maybe, but we are still nowhere near substituting actual human professionals with AI. Take business professionals and teach them how to work with data? Yes, partially. While it is clear that supplementary data training can never substitute for a proper education, it is surprisingly easy to acquire useful data-handling skills. And this is exactly where citizen data science is heading.

Actually, a bit further, to be honest. What Gartner claims is that it is not data-handling education that will drive citizen data science – it is improved technology that is user-friendly, requires minimal training and provides enough power to enable even non-data-scientists to derive valuable insights from the data that they have. Further still, once these people recognise the value of data, implementing these insights into business practices should also become easier, helping to drive everything from engineering to sales. That is “citizen data science”, and that is what we will look at in the next few posts.

How will “the Citizen” change the world?

Needless to say, we live in the age of data. It’s not because, miraculously, we have somehow started to create more data. No, we still walk, attend lectures, and go to grocery shops. The real reason is our improved ability to collect the data. Daily, heck, hourly, we use our smartphones, leaving behind a trail of data through countless apps – Gmail, YouTube, Spotify, Facebook, WhatsApp, etc. According to Bernard Marr, 83% of professionals say data is making existing services and products more profitable, and 60% feel that data is generating revenue within their organisation. Talk about mainstream adoption potential. Data is, simply, making the world spin right now, and although we probably won’t see a single-event breakthrough in how we use data, it will gradually pervade all aspects of life and business and establish itself as a normal thing.

It’s clear then that the potential to change the world is there (and I would even argue that the change is already bound to happen). To explain how this will change the world, we should start with the already established position of Chief Data Officer (i.e. a data scientist), a hot new job which companies are incorporating into their structures.

According to IBM, a data scientist is someone who “is inquisitive, can stare at the data and spot trends”. The difference from a traditional statistician is a strong business acumen and the ability to deal with “multiple disparate sources [of data]” (there are other, more fun definitions that you can read at The Guardian). Everybody has long understood that data is important, but only hiring and nurturing a data-business professional moves data from the background to the foreground, as something that needs to be seriously considered a part of your business. This means understanding that if you are a grocery store, your business is no longer only fast-moving consumer goods, but also data on your customers. Your business’ value proposition stays the same, but the process of how you achieve it can significantly change (improve) with the usage of data.

There are numerous case studies of how companies leverage (big) data to their advantage. One example to stand for all – Tesco collects 70 million data points related to its refrigerators and uses them to monitor performance, estimate when machines need to be serviced and carry out more proactive maintenance, cutting down on energy costs. As I said above, the usage of data did not change the fact that Tesco is primarily a grocery store, but it helped reduce costs and, I assume, improved the quality of refrigerated goods because of better machine management. The idea behind a data scientist is to create more of these insights, in-house and more cheaply, because she knows both the business and the data.

Now imagine a step further – similar insights would not need to rely on a single data scientist or a single team of data scientists. Everyone in the business (well, not the grocers, but those in manager-level positions) would have the ability to look at the data and develop some insights within their positions. As I mentioned in the introduction, a business is not a homogeneous thing. What you do and how you do it changes from the engineering to the marketing department, or from a store in Kensington to a store in Ealing (see London’s poverty profile). Obviously, there are data and insights which are company-wide, but many of the useful insights would be hyper-localised, and a single team without that hyper-localised knowledge (which a manager who has worked there for a few years has) would find it impossible to access them.

In my eyes, then, citizen data science is about accessing an additional level of data. The official definition by Gartner (see Note 2) talks about a person who creates models that leverage predictive or prescriptive analytics, someone who is able to perform simple and moderately sophisticated analytic tasks. At the moment, we are starting to access big data and leverage it to great benefit, for both businesses and customers. Other technologies on the Gartner Hype Cycle, like advanced analytics, will provide yet another way of accessing knowledge/data, feeding into the tools for citizen data scientists as well. In line with this movement, I don’t think that citizen data science (and I am repeating myself now) will be a major breakthrough, but it will enable us to drive businesses to ever-increasing productivity, enabling better products and better services.

So how will citizen data science really change the world? Subtly. But once gained, the new-found freedom of leveraging complex data by ourselves, without rigorous training in statistics or data skills, will not be easy to give up. And when the Internet of Things hits with full force, and we have big data in our homes (and basically everywhere) as well, we might even see the rise of personal data science; a natural, almost mundane, extension by then.

What are the barriers to adoption of Citizen Data Science?

There are two main aspects of citizen data science – the “citizens” and the data technology. Both are needed to unlock the true potential of citizen data science, and by extension, both can act as significant barriers.


Let’s take the “citizens” first. Although I was talking about “almost everyone”, in the end, the employees will need some kind of inner motivation/drive to want to work with the data. They will also require a basic education in understanding data, which they could develop and apply to the business. This might either be provided by the business itself (increasing the cost of adopting citizen data science) or, by a stretch of the imagination, future education systems will build basic data skills into national curricula (as has happened with programming). Clearly, those two solutions are not mutually exclusive, and since the latter is a long-term one, for at least the next 10 to 20 years businesses wishing to adopt citizen data science will have to think of a way to impart essential data skills to their employees. A third solution is to push for hiring employees who are better equipped to handle the data (as was already happening in 2014). In theory, this leads universities (possibly the most responsive to curriculum change) to adjust their courses to reflect the needs of employers (also happening, check the previous source).

The most sceptical account of citizen data science that I have read, and by analogy a barrier to its adoption, comes from this piece, in which the author essentially says that whatever the technology, as long as the person operating it does not have a proper skillset for data, the results/insights will never be trustworthy. Personally, however, I think that he might be missing the bigger picture about the evolving technology.


The reason I hold this belief is that, of the citizen-technology pair, Gartner believes it is the technology that will help citizen data science, by enabling people with minimal data skills to create valuable business insights. Data tools are already becoming more user-friendly and more geared towards non-expert users. It remains to be seen whether the technology will evolve to a stage that prompts mainstream adoption of citizen data science. IBM, however, already claims to offer a solution that scales this barrier – IBM Watson Analytics (as do SAS, Dell, KNIME and RapidMiner, as you will see [HERE – reference to another blogpost]). Still, it is unclear how much of an average Joe one can be and still make good use of Watson. In this video (dated November 2015), IBM claims to have half a million users which they categorise as “citizen analysts”. Unfortunately, because there are no specific definitions of who a “citizen data scientist” or “citizen analyst” really is, it’s difficult to say that Gartner and IBM mean the same thing. For the time being, then, I think that it is fair to say that we are not there yet, but that the future looks optimistic.

Legacy Processes

A third barrier to adoption is legacy processes – the legacy software, the legacy decision making and the legacy workforce. Obviously, this is mostly relevant only to the really big companies, but those companies are also important in advancing a technology to mainstream adoption. For them, every change in how they operate represents a huge initial cost. Just imagine. With hundreds of managers, every single one would need the relevant software, and that software would need to be serviced. All of them would also need to know how to use it effectively, so training workshops would likely have to be set up. Those without quantitative skills would need to be trained. Then internal protocols would need to be updated to include the feedback loop through which the company could utilise the insights created by these citizen data scientists. I could continue, but the bottom line is that it would be incredibly costly, and it therefore constitutes a barrier to adoption by the big companies. Unless there is a significant improvement (also expressed in terms of money saved or profit) to be gained from such a move, large companies have a long way to go before fully embracing the framework of citizen data science, if it ever goes mainstream, that is.

What types of companies will develop Citizen Data Science into commercial products?

In the environment of advanced analytics, there are a number of well-established players as well as a plethora of smaller companies doing the rounds with their solutions. The sector itself has been projected to grow 14% in 2016 to reach a value of $1.5bn. Quite understandably, most of them are trying to position themselves in the “advanced insights, simple operation” space, because every single business has a need for data, but not every single one necessarily has the resources to employ a skilled data scientist to handle it. But without actually trying all of them, it is incredibly difficult to really see how ‘simple’ the operation of the software is, and how “an average Joe” would describe the experience of working with it.

Analytics first, then data

Typologically, the companies that are best equipped to offer solutions for citizen data science are those which have their primary strength in analytics. Although companies like SAS and SAP are at the forefront of business intelligence, there is a different set of leaders for Advanced Analytics (under which citizen data science belongs). According to the Gartner Magic Quadrant, these are SAS, IBM, KNIME, RapidMiner and Dell. These companies are fully geared towards providing analytics solutions and would be the “traditional” ones in this space. What is interesting here is that KNIME and RapidMiner operate under open-source licences and support community-created extensions to their core software, which supposedly greatly enhance the range of functions the software provides.

Data first, then analytics

Other companies that might be able to provide citizen data science solutions are those that already have a well-developed bank of data on which they could offer tailored solutions directly to their customers. For example, Salesforce is best known for its cloud-based Customer Relationship Management solution. In 2014, Salesforce introduced Wave, a Business Intelligence product which leverages data already hosted on Salesforce and markets itself as “the only analytics solution designed for Salesforce users”. Here, Salesforce identified the potential to extend its business from, essentially, hosting the data to analysing it, and because it already had the data and the platform, its solution feels only natural to its customers. From 2012 to 2013, Salesforce had the highest increase in market share and the highest increase in growth rate. Similar companies could follow, although obviously the requirement of a strong analytics component might not suit everyone.

What competences will companies need in order to develop and commercialise Citizen Data Science?

There are possibly a great many competences that any company needs in order to develop and commercialise a new product. Here, I list the three key competences that I identified, indicating in brackets whether they relate to commercialisation or development.

Security (commercialisation)

With the trend of moving towards the cloud, the ability to present not only working, but also secure solutions is an essential quality for any company hoping to offer data analytics. First, security concerns are limiting companies’ adoption of cloud and mobility. Second, consumers, also owing to major data breaches at Ashley Madison or TalkTalk, are increasingly aware of the security of their data. The source states that three quarters are concerned about the amount of personal data shared with brands online. Third, stringent official regulations are planned, and compliance will have to become more of a priority in order to persuade businesses that adopting a particular solution will not carry hidden costs of potential bureaucratic investigations.

User experience (UX) (commercialisation)

If it is important to provide a great UX to those who understand the data, then it is even more important to offer a seamless UX to those who might feel overwhelmed by the content. Take-up of the software depends on ease and pleasure of use (a function of the time spent on analysis relative to the usefulness of the result). It’s not enough then to just have a powerful tool – it also needs to be enjoyable to work with, otherwise it will be difficult to persuade employees to use it, posing a barrier to company-wide adoption. The importance of this criterion can be substantiated by the fact that the Gartner Magic Quadrant specifies User Interface (UI) when talking about a company’s profile. For example, UI is listed as Dell’s strength, but as KNIME’s weakness (kind of).

Advanced analytics (development)

To keep citizen data science relevant, the tools will need to offer an ever-improving ability to crunch complex data. From big data to the Internet of Things to predictive analytics, enabling the same level of user to access more and more complicated analysis might reach its limits one day, but the later that happens, the more business value and profit there is to be found in citizen data science solutions.

Where will the competition come from?

With citizen data science being an emerging technology and “the next big thing”, it is difficult to say what will be “the next next big thing” that supersedes it. It might be Artificial Intelligence that will do analytics for us and make redundant any kind of human input required now. Or it might be that the Internet of Things will convey a similar level of localised knowledge, and personalised input won’t be needed anymore. We could go on brainstorming for a bit longer, but I would rather use this space to look at the real competition in the next few years – competition within the industry. There are a number of players aspiring to have a share of this market. I don’t believe that their solutions will ever merge into a one-size-fits-all product, and as such it is definitely not only price competition that we will see.

IBM Watson has the most visibility among the currently available tools, and as such could be described as being at the top of the food chain. To my surprise, a look at the Gartner Magic Quadrant for Advanced Analytics reveals a different picture.

Since Advanced Analytics is part of the Business Intelligence (BI) sector, we could assume that it, and specifically Citizen Data Science, will emerge from the same companies. According to Gartner, the top 5 vendors of Business Intelligence and Analytics Software were SAP, Oracle, IBM, SAS Institute and Microsoft. The story is a bit different though. The Gartner Magic Quadrant made specifically for Advanced Analytics agrees with the top 5 BI vendors on only two names – SAS and IBM.

[Figure: Gartner Magic Quadrant for Advanced Analytics]
Source: Gartner Magic Quadrant – Advanced Analytics

The other three leaders, as you can see, are KNIME, RapidMiner and Dell. All three are commended by Gartner for improving the user experience of tools aimed at citizen data scientists. I won’t be going over individual companies as you can do that yourself on the link above (Magic Quadrant), but I do want to note a few things:

  1. KNIME is open-source, which means that it is free for anyone to download; it has commercial extensions but its analytics platform core is free – that such a program is successfully competing against giants like IBM and SAP says a lot about where to look for competition in the field – almost anywhere
  2. RapidMiner is also open-source but with a bit more commercialisation going on; like KNIME, it proves that to be relevant competition to those giant software companies, one does not have to be a giant oneself
  3. Both RapidMiner and KNIME support community-developed extensions to their core software, which also drives their value up by offering an additional variety of tools on top of the powerful core – a curious (but not solitary) case of crowdsourced abilities which, on a strong and well-maintained platform, work in synergy and pack a strong analytics punch
  4. On the other hand, community support cannot be taken for granted, and if KNIME or RapidMiner were to fall out of favour with the community, the most important question is – what would be left of them?

So, where will the competition come from? Well, it appears that it might come from anywhere (e.g. from a community-built solution), but wherever it comes from, it will still need at least the three basic competences discussed earlier.