Believe it or not, the majority of our decisions are based on graphs, ratios, tables, and other statistical visualizations. But what if those resources are wrong? There are many examples of bad decisions based on misleading charts, on topics ranging from the number of abortions and cancer screenings to student graduation rates and environmental change. It is sometimes hard to tell whether a chart represents its data well, and it is even harder when it comes to big data.
The first and most basic rule for any accurate visualization is correct scaling and labeling of the axes. Beyond that, you should check whether the data is plotted correctly, especially when more than one scale is shown on the Y-axis. When time is shown on the X-axis, you should consider which period of time to include in the graph. Are a couple of months a good indicator of a company’s stock price? Take a look at the picture above: why are these graphs misleading? You can check Stephanie Glen’s video for more examples of misleading graphs of these types.
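One common scaling problem is a Y-axis that does not start at zero. As a rough sketch, the snippet below compares the true ratio between two values with the ratio a reader sees when bars are drawn from a raised baseline. The function name `apparent_ratio` and the numbers are made up for illustration; they do not come from any of the charts mentioned here.

```python
def apparent_ratio(small, large, baseline=0.0):
    """Ratio of the drawn bar heights when the Y-axis starts at `baseline`."""
    if baseline >= small:
        raise ValueError("baseline must be below the smaller value")
    return (large - baseline) / (small - baseline)


# Hypothetical values: two bars that differ by about 13%.
before, after = 35.0, 39.6

true_ratio = apparent_ratio(before, after)                    # axis starts at 0
truncated_ratio = apparent_ratio(before, after, baseline=34)  # axis starts at 34

print(f"True ratio of the values:        {true_ratio:.2f}x")
print(f"Ratio seen with truncated axis:  {truncated_ratio:.2f}x")
```

With these made-up numbers, the second bar looks several times taller than the first, even though the underlying values differ only slightly.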
Choosing the right trend line for the data is necessary. This picture shows the misunderstanding that comes from choosing an inappropriate trend line: the linear trend line shows a decrease in the quantity on the Y-axis, while the polynomial trend line shows an increase. Function approximation can be a solution for finding the best trend in highly fluctuating data.
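To see how two trend lines can disagree on the same data, here is a minimal sketch using NumPy's `polyfit`. The series is synthetic (it falls and then recovers) and is not the data behind the picture; the variable names are only for illustration.

```python
import numpy as np

# Synthetic, made-up series: it drops at first and then starts to recover.
x = np.arange(0, 11, dtype=float)
y = (x - 6.0) ** 2

# Least-squares fits; polyfit returns coefficients from highest degree down.
linear = np.polyfit(x, y, deg=1)
quadratic = np.polyfit(x, y, deg=2)

# Slope of the straight line over the whole period.
linear_slope = linear[0]

# Slope of the quadratic fit at the most recent point.
recent_slope = np.polyval(np.polyder(quadratic), x[-1])

print(f"Linear trend slope:            {linear_slope:+.2f}")
print(f"Quadratic trend slope at end:  {recent_slope:+.2f}")
```

On this series the linear fit has a negative slope, while the quadratic fit is rising at the last point, so the two lines would lead to opposite conclusions about where the quantity is heading.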
Ratios and dimensionless numbers often give better insight in data visualization than raw totals. This is well explained by David McCandless in his TED talk. Picture (a) shows the top 5 countries with the highest total number of soldiers, while picture (b) shows the top 5 countries with the highest ratio of soldiers to population (number of soldiers / population of the country). Which country has the biggest army? China or North Korea?
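The same idea can be sketched in a few lines of code. The countries and figures below are placeholders invented for this example, not real statistics; the point is only that ranking by total and ranking by ratio can put different countries on top.

```python
# Hypothetical data: name -> (number of soldiers, population).
# These figures are made up for illustration and are not real statistics.
countries = {
    "Country A": (2_000_000, 1_400_000_000),
    "Country B": (1_200_000, 25_000_000),
    "Country C": (1_400_000, 330_000_000),
}

# Ranking by the raw total number of soldiers.
by_total = sorted(countries, key=lambda name: countries[name][0], reverse=True)

# Ranking by soldiers per person (a dimensionless ratio).
by_ratio = sorted(
    countries,
    key=lambda name: countries[name][0] / countries[name][1],
    reverse=True,
)

print("Largest army by total:", by_total)
print("Largest army by ratio:", by_ratio)
```

Here Country A leads on total soldiers, but Country B leads once the size of the population is taken into account.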
As I mentioned above, it gets even harder to be sure of a visualization’s accuracy when it comes to big data. The source of information in big data visualization is critical. In other words, a big data visualization needs to cover all 4 V’s of big data (volume, velocity, variety, and veracity) so that we can be sure the whole data set has been considered. Unlike data sampling, which uses a subset of the data within a population, the beauty of big data comes from using the whole data source. What are other misleading factors in big data visualization?