35 – 4/8 Learn Data Manipulation and Visualisation

  1. Learn the fundamentals of Statistics
  2. Learn SQL
  3. Learn Python for Data Analysis
  4. Learn Data Manipulation and Visualisation
  5. Learn Statistical Analysis
  6. Learn Data Visualisation Tools
  7. Work on Projects
  8. Learn Data Storytelling

What a broad term this is: data manipulation and visualisation. Data manipulation could be as simple as converting a field from text to number, or a number field from integer to float. It could be as complex as building a pipeline to understand the choices rideshare users make at different times of the day and the impact those choices have on their wallets. Depending on your context and end goal, you will need to follow a certain number of steps before you can visualise what you have found in a dataset (or collection of datasets).

Why are we doing it?

Source: https://www.polymersearch.com/blog/10-good-and-bad-examples-of-data-visualization

What do you think of when you look at the left bar chart? What about the right? To me, the right bar chart makes it clear that the first thing I need to consider is the blue bar. It signals that this bar is important compared to the rest. The left bar chart, on the other hand, requires more work from my brain: why are the bars different colours? Are they related to one another? How are they related? Does the size of the difference matter?

What about this visualisation? Can you think of what’s good about it? What are its strengths? And where could it be better? For me, I don’t know what’s important, what’s not, and where to start piecing things together. I can tell that this is personal information about a group of people, but I don’t understand the link between them, apart from the fact that they all seem to be between 59 and 60 (in descending order), and potentially in the United States with myoffice email accounts. So, this could be a national company’s list of employees who might be entering retirement? Yeah, it’s tricky.

As you can see, we’re doing this so that we can easily tell a story to someone who doesn’t understand the subject area or isn’t as close to the dataset as you’ve been. This audience can range from stakeholders external to the company (with or without subject area knowledge) to internal stakeholders (with varying degrees of subject-specific knowledge).

Main Concepts

Here is a list of the concepts that are important for effective visualisation:

  1. Know Your Audience – Tailor your data visualisations to your audience’s preferences and comprehension levels, ensuring that your content is both inspiring and relevant to their needs,
  2. Set Your Goals – Establish clear goals and objectives for your data visualisation efforts, creating a logical narrative and focusing on key insights to drive strategic decision-making,
  3. Choose The Right Chart Type – Select the most effective chart types, such as line graphs, number charts, maps, pie charts, gauge charts, bar/column charts, area charts, spider charts, and treemap charts, based on your specific project, audience, and purpose,
  4. Be Careful Not To Mislead – Avoid common misleading practices in data visualisation, such as truncating axes, omitting data, and confusing correlation with causation, to present an accurate and honest representation of the information,
  5. Take Advantage Of Colour Theory – Use colour strategically to highlight key points, evoke emotions, and create engaging visuals, considering preconceived colour associations and maintaining consistency throughout your visualisations,
  6. Prioritise Simplicity – Keep your visual designs simple, avoiding clutter and unnecessary elements, using classic fonts, and sticking to a light colour palette to enhance clarity and understanding,
  7. Handle Your Big Data – Effectively manage and present large volumes of data by understanding its value, ensuring clear labelling, promoting data literacy across your organisation, and using business dashboards for streamlined visualisation,
  8. Use Ordering, Layout, And Hierarchy To Prioritise – Organise your data into a clear hierarchy, order it logically, and use layout techniques to prioritise information, making your visualisations more efficient and successful,
  9. Utilise Word Clouds And Network Diagrams – Employ network diagrams for structured data and word clouds for unstructured data, providing a visually digestible representation of complex information,
  10. Use Text Carefully – Incorporate text elements such as captions, labels, legends, and tooltips thoughtfully, ensuring they add value without overcrowding the visualisations,
  11. Include Comparisons – Present tangible comparisons to highlight contrasts, strengths, weaknesses, trends, and other insights, making your data more relatable and actionable,
  12. Tell Your Tale – Craft a narrative around your data, considering a clear structure with a beginning, middle, and end, to engage your audience and communicate your most critical messages effectively,
  13. Merge It All Together – Combine various visualisations into cohesive dashboards, leveraging modern dashboard technology to provide a comprehensive view of key performance indicators,
  14. Make It Interactive – Enhance engagement by incorporating interactive elements in your visualisations, allowing users to explore and extract insights in real-time,
  15. Consider The End Device – Optimise your visualisations for various devices, particularly mobile, by focusing on essential graphs, charts, and readable text to ensure a seamless user experience,
  16. Apply Visualisation Tools For The Digital Age – Leverage digital tools and interactive online dashboards to efficiently collect, collate, and present data in a comprehensive and impactful manner, and
  17. Never Stop Learning – Continuously improve your data visualisation skills by gathering feedback, learning from experience, and staying updated on best practices, ensuring ongoing success in creating effective visuals.

Let’s see how I can integrate these into the following case.

Fuel-Cost Breakdown

In one of my jobs, I was tasked with breaking down the kilometres travelled by a van, based on the events recorded in its usage log, in the hope of distributing fuel costs across various projects.

https://docs.google.com/spreadsheets/d/1BGt8cPXIF_EO_4Uok1tbV50cznB6l9Nb1IPM9RkAMUs/edit#gid=1974045550


First, I logged all of the information I had available. Specifically, this was: start and end dates, start and end distances, and purpose.

Next, I calculated the KM travelled with a simple subtraction formula: column I minus column H. Then, I used a combination of IF, ISNUMBER, and SEARCH to extract the main purpose of the trip: Location A, Location B, Mountain Biking, or a Once-off trip.

=IF(ISBLANK(J3),,IF(ISNUMBER(SEARCH("Loc A",J3)),"Loc A",IF(ISNUMBER(SEARCH("TTC",J3)),"Loc B",IF(ISNUMBER(SEARCH("MTB",J3)),"Mountain Biking","Once-off"))))
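For anyone who prefers to read this kind of logic as code, here is a minimal Python sketch of the same idea: check the purpose text for each keyword in order and fall back to "Once-off". The names `KEYWORD_LABELS` and `classify_purpose` are my own for illustration and are not part of the spreadsheet.

```python
from typing import Optional

# Keyword -> label pairs, checked in order like the nested IFs in the formula.
# Keywords are lower-case because the text is lower-cased before matching.
KEYWORD_LABELS = [
    ("loc a", "Loc A"),
    ("ttc", "Loc B"),
    ("mtb", "Mountain Biking"),
]


def classify_purpose(purpose: Optional[str]) -> Optional[str]:
    """Return the main purpose label for a free-text trip description."""
    if not purpose:
        # An empty description stays empty, similar to the ISBLANK branch.
        return None
    text = purpose.lower()  # SEARCH ignores case, so we do too
    for keyword, label in KEYWORD_LABELS:
        if keyword in text:
            return label
    # Nothing matched: treat it as a once-off trip.
    return "Once-off"
```

Calling `classify_purpose("MTB trail day")` returns "Mountain Biking", while a description with none of the keywords returns "Once-off". Keeping the keywords in a list means a new location is one extra line rather than one more nested IF.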

I then used a combination of IF, ISBETWEEN, and HLOOKUP to determine the full cost of petrol for each trip:

=IF(F3<$P$2,0,
 IF(ISBETWEEN(F3,$P$2,$P$3),HLOOKUP(L3,$A$2:$D$4,3,0)*$R$2,
 IF(ISBETWEEN(F3,$P$3,$P$4),HLOOKUP(L3,$A$2:$D$4,3,0)*$R$3,
 IF(ISBETWEEN(F3,$P$4,$P$5),HLOOKUP(L3,$A$2:$D$4,3,0)*$R$4,
 IF(ISBETWEEN(F3,$P$5,$P$6),HLOOKUP(L3,$A$2:$D$4,3,0)*$R$5,
 IF(ISBETWEEN(F3,$P$6,$P$7),HLOOKUP(L3,$A$2:$D$4,3,0)*$R$6,
 IF(ISBETWEEN(F3,$P$7,$P$8),HLOOKUP(L3,$A$2:$D$4,3,0)*$R$7,
 IF(ISBETWEEN(F3,$P$8,$P$9),HLOOKUP(L3,$A$2:$D$4,3,0)*$R$8))))))))
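The nested IFs boil down to two steps: find which date band the trip falls into, then multiply a value looked up by purpose by that band's rate. Below is a rough Python sketch of that pattern. It assumes the boundary dates (like P2:P9) are sorted and that there is one rate per band (like R2:R8). The names `band_index`, `trip_cost`, and `value_by_purpose` are hypothetical, and I am not making any claim about what the looked-up value represents in the actual sheet.

```python
from bisect import bisect_left
from datetime import date
from typing import Mapping, Optional, Sequence


def band_index(day: date, boundaries: Sequence[date]) -> Optional[int]:
    """Return the index of the band containing `day`, or None if outside.

    Both ends of a band are inclusive, and a date that sits exactly on a
    shared boundary goes to the earlier band, because the first matching
    ISBETWEEN test wins in the nested formula.
    """
    if day < boundaries[0] or day > boundaries[-1]:
        return None
    # bisect_left finds the first boundary >= day; the band is one before it.
    return max(bisect_left(boundaries, day) - 1, 0)


def trip_cost(
    day: date,
    purpose: str,
    value_by_purpose: Mapping[str, float],
    boundaries: Sequence[date],
    rates: Sequence[float],
) -> Optional[float]:
    """Cost for one trip: per-purpose value times the rate of its date band."""
    if len(rates) != len(boundaries) - 1:
        raise ValueError("need exactly one rate per band")
    if day < boundaries[0]:
        return 0.0  # before the first boundary the formula returns 0
    index = band_index(day, boundaries)
    if index is None:
        return None  # after the last boundary no branch of the formula matches
    return value_by_purpose[purpose] * rates[index]
```

The main design difference from the spreadsheet is that adding a new period means appending one boundary and one rate, rather than nesting another IF.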

For the dashboard values, I used the SEARCH result (column L) and the KM travelled (column K) to calculate the total for each location, and then each location’s percentage of the overall total:

=SUMIF($L$3:$L$63,A$8,$K$3:$K$63)

=A10/SUM($A$10:$C$10)
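These two formulas are a grouped sum followed by a share of the total. As a small Python sketch of the same calculation, assuming each trip is a (purpose, km) pair (the function names `km_by_purpose` and `shares` are mine, and the sketch computes shares over whatever categories it is given):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def km_by_purpose(trips: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum kilometres per purpose, like SUMIF over the purpose column."""
    totals: Dict[str, float] = defaultdict(float)
    for purpose, km in trips:
        totals[purpose] += km
    return dict(totals)


def shares(totals: Dict[str, float]) -> Dict[str, float]:
    """Each purpose's fraction of the overall total, like A10/SUM(A10:C10)."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Avoid dividing by zero when no kilometres have been logged.
        return {purpose: 0.0 for purpose in totals}
    return {purpose: km / grand_total for purpose, km in totals.items()}
```

For example, two "Loc A" trips of 30 km and 20 km plus one "Loc B" trip of 50 km give each location a share of 0.5.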

Summary

There are many more context-specific advantages and disadvantages of data manipulation and visualisation, but here is what I’ve been able to come up with:

Advantages

  1. Easily share information,
  2. Interactively explore opportunities, and
  3. Visualise patterns and relationships.

Disadvantages

  1. Biased or inaccurate information,
  2. Correlation doesn’t always mean causation, and
  3. Core messages can get lost in translation.
