Visualisation makes big data usable. People are inherently visual, and effective visualisation engages the pattern-matching capabilities of our brains so we take in information more quickly. A large number of organisations offer tools to help, but the growth areas are in open source and start-ups. People with ‘R’ skills are in demand for this work. Visualisation leads to insight, but action is then required from the insight; otherwise it is trivia.
This article is a follow-on to my previous article on big data; reading that will give you some of the background to this one, though you don’t have to. This is a brief introduction to visualisation to give a quick sense of what it is about and why it is important (including biological factors!). It will cover the importance and business of visualisation, typical uses and, finally, some of the players in the space.
Visualisation and Big Data
Data visualisation is hot: there is a never-ending list of companies that have software or want to help you with visualisation, and there are online courses available to help you gain the skills and knowledge. So why is visualisation hot? Two words: Big Data. These two words strike fear and/or excitement into people depending on their perspective — excitement from the people who want to use big data, and fear from those who may be tasked with implementing and managing it!
Big Data may be the underlying cause, but the real impact of visualisation is that it makes sense of the data that you have. Being able to process and refine large amounts of information through Big Data techniques is all well and good but, unless you have tools and techniques to take that data and present the results in a form that is intelligible to your audience and beyond, it is just columns and rows of data (and pretty uninteresting). Businesses are relying more on big data for decision making, but the visualisation is what provides the insight.
Inherently, most of us have some sort of response to visual input; one only has to look at the effort film makers, TV and marketers put into the visual image to understand its importance. Images significantly aid understanding and retention of information, as well as providing a sense of comprehension that either takes many words or cannot be expressed in full through words. Our visual system is built and tuned for visual analysis: we take in a lot of information via our eyes, and our brains are then very good at pattern matching, edge detection and shape recognition. Pattern matching is important because much information is carried in the pattern or in breaks in the pattern, e.g. outliers; this is what gives us meaning.
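As a tiny illustration of what a ‘pattern break’ is, the sketch below flags an outlier in a series of readings the same way an eye would pick it off a chart. The data and the two-standard-deviation threshold are illustrative assumptions, not from this article:

```python
from statistics import mean, stdev

# Ten illustrative sensor readings; one clearly breaks the pattern.
readings = [12.1, 11.8, 12.4, 12.0, 11.9, 25.3, 12.2, 12.1, 11.7, 12.3]

# Flag anything more than two standard deviations from the mean --
# a crude stand-in for what our visual system does instinctively.
mu, sigma = mean(readings), stdev(readings)
outliers = [r for r in readings if abs(r - mu) > 2 * sigma]
print(outliers)  # -> [25.3]
```

A chart makes the 25.3 jump out immediately; the point of the sketch is that even the numeric equivalent needs an explicit rule, whereas the eye needs none.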
To illustrate the importance of visualising the data, I will use the classic Anscombe’s quartet. The data behind each graph below has the same mean, variance, correlation and linear regression line but, as you can see, produces a remarkably different graph!
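The quartet’s numbers can be checked directly. Below is a minimal Python sketch (the language choice is mine, not the article’s) using Anscombe’s published values, showing that the summary statistics agree even though the plots do not:

```python
from statistics import mean, variance

# Anscombe's quartet: four x/y datasets with near-identical summary
# statistics but radically different shapes when plotted.
quartet = {
    "I":   ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
            [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
            [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def correlation(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

for name, (xs, ys) in quartet.items():
    print(f"{name}: mean_x={mean(xs):.2f} mean_y={mean(ys):.2f} "
          f"var_x={variance(xs):.2f} r={correlation(xs, ys):.3f}")
```

All four datasets print a mean x of 9.00, mean y of 7.50, x variance of 11.00 and a correlation of about 0.816 — yet only plotting them reveals the line, the curve, the outlier and the vertical cluster.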
So delivering information quickly and succinctly requires the use of visuals, which in turn leads to actionable information or insight: something can be done about the pattern or outlier. In business this can be a competitive advantage, through being able to take action more quickly if the information can be presented in the ‘right’ way. Of course, the ‘right’ way requires experience, skills and knowledge rather than just a ‘tool’.
What is Visualisation being used for?
One could say everything, and this is partly true, but I am going to concentrate on the main uses and those relevant to the audience for this piece. Some of the main uses of visualisation have been in healthcare and allied areas such as pharmaceutical research, health science and genomics. Other key areas have been in sales, especially clickstream analysis for large (and not so large) ecommerce sites, and customer satisfaction, usually collecting data from online channels and contact centres. For information technology specifically: log analysis, event correlation, security incidents and security events over time.
One of the issues impacting organisations with monitoring-type data is the ever-larger firehose of data being thrust towards the analyst. This is akin to the issue that air forces (and aircraft manufacturers and airlines) around the world are having to deal with in pilot overload: they are trying to manage the information flow so that pilots are not overwhelmed with information and unable to act on it. In combat situations, this is the difference between living and dying! Visualisation is currently the only approach that scales to deal with the increased flow of data. Our pattern-recognition capabilities help us make sense of the data, with hints and tips from the software.
Using the visualisation can be the hard part. Taking the information contained within the visualisation, making sense of what is seen, identifying the insight and then taking the ‘right’ action is part of the human process that needs training. However, taking no action is worse than taking some action – insights from analytics on which you take no action are just trivia! Once action has been taken, collect new data and see if change has occurred in the way you expected – then repeat the cycle.
So who are the players in this field?
First let’s look at the tools and the companies behind them. As you might expect, with visualisation coming from an academic background, a number of the tools in use are open source, but there are companies now providing support for these tools as well as software companies who will sell you (at times very expensive) software. The biggest development in this area has been the proliferation of web-based tools, often with a free tier available for limited or ad-supported use.
Probably the most talked-about tool is ‘R’. This is an open source statistical language with excellent graphing capabilities. It has a large ecosystem of additional libraries that significantly enhance the base tool. As it is open source and a collaborative development effort, some usability is compromised, and it has a steep learning curve to get the best out of the tool. R has its own IDE, integrates with others via plugins and, via further plugins, integrates with code management tools. This can be important when dashboards and data sources become confused and extract after extract is created, obscuring (and changing) the source data.
SAS is the venerable closed source tool with a long history in this area. The company has been scrambling to find more relevance in this particular market (notwithstanding the many other areas in which it does very successful business), and has released libraries to support the R language. SAS, like R, has a steep learning curve to get the best out of the tool. SAS can use its own data stores as well as connect to a multitude of other sources.
Tableau Software has made huge strides in this space with its general ease of use and accessibility for non-data-scientists. Its growth has exploded over rivals like Oracle BI Suite and IBM Cognos due to pricing, value delivery and better visualisation capability, and it excels at delivering interactive dashboards. It has, however, become an issue for IT departments: many business units buy their own copies, effectively paying more than managed purchasing would cost, and leaving licences under-used.
Splunk is both a visualisation and a data-processing engine. It is probably better at the processing, correlating and indexing of data than at visualisation, but is worth including here. The tool is focussed on processing machine-generated data, such as logs of many types, and can do so in ‘real time’. Many security teams have implemented Splunk for security log and incident analysis, and the tool has been very successful in this area. More recently Splunk, the company, has been trying to broaden the tool’s appeal outside this fairly narrow area into more diverse big data and visualisation functions. I am not sure how successful this will be, due to the fairly ‘techie’ nature of the tool.
Tibco Spotfire is another complex tool, but is seen as easier to learn than SAS, with very good visualisation capabilities. It has a hybrid in-memory and in-database data store, with very powerful processing on top for analysis and visualisation. It is similar in power to SAS and easier to understand, but not necessarily less complex.
Pentaho is open-core software – having both an open source edition and a paid enterprise version with extra features and support. The software provides a layer to interface with a very wide range of Big Data and traditional data sources. Pentaho provides in-memory analytics and visualisation capabilities as well as sourcing data from a number of backends. The overall platform provides a workflow-like processing view from data to analytics.
Just as a word of caution, these tools are not the only ones available!
So we have talked about the tools; now, who do I think is doing this well in a business context:
• GE – especially their power engineering business. They receive telemetry data from their turbine generators installed in power stations around the world every 15 seconds
• GE and Rolls Royce – aircraft engineering. Data from their engines is sent back partly during and partly after every flight. The information is visualised to provide engineers with information on the health of the engines
• CSIRO – have a wealth of information in genomics, spatial data and climate data, with the brains trust to pull together some great visualisations
• Hans Rosling – world health expert, created a visualisation of health over time (see link)
• National Geographic has some interesting visualisation on their web site including here
• http://www.informationisbeautiful.net/ is an interactive site with some excellent examples – not all business though
• American Express
Some visualisation start-ups are trying to change the way we think about and look at data, including on mobile devices. Often these hide complex calculations under a relatively easy-to-use interface. Here are a few:
• http://www.ayasdi.com/ – network graphs
• http://www.clearstorydata.com/ – data platform and analytics capability, with a focus on telling the business story
• http://www.platfora.com/ – another integrated data store and analytics capability very much targeted at big data
• https://www.graphiq.com/ – a little different, ‘pre’ built visualisation based on a search for a subject, try this – https://www.graphiq.com/search/search?cid=1&query=galaxy%20s5
• http://www.sisense.com/ – accessing data from multiple sources and bringing it together
• http://sqrrl.com/ – cybersecurity focus – threat detection, hunt and isolate
• http://www.datameer.com/ – another platform
• https://www.palantir.com/palantir-metropolis/ – statistical modelling and visualisation
How Diaxion can help – our expertise in this area is in the architecture, design and setup of the infrastructure to enable the delivery of these tools and platforms. If you are interested, call us to find out more.
Below are some examples of visualisations – click the image for the original story
Simulation of measles infection rates