45 – 6/8 Learn Data Visualisation Tools

  • Learn the fundamentals of Statistics
  • Learn SQL
  • Learn Python for Data Analysis
  • Learn Data Manipulation and Visualisation
  • Learn Learn Statistical Analysis
  • Learn Data Visualisation Tools
  • Work on Projects
  • Learn Data Storytelling

In the ongoing Data Analyst Roadmap series, we’ve navigated through pivotal terrains like statistics, SQL, Python for data analysis, data manipulation, and statistical analysis. Now, entering the sixth phase, we’re delving into the area of data visualisation tools. Within the data analyst’s toolkit, the art of conveying insights is just as critical as the analytical process itself. Imagine possessing a wealth of valuable data but grappling with the challenge of effectively communicating its significance to stakeholders or team members. Herein lies the importance of mastering data visualisation tools.

Our focus, Learn Data Visualisation Tools, centres on how distinct tools shed light on the same dataset – a qualitative and quantitative survey encapsulating the sentiments of individuals within the data industry. Today, I will be looking at evaluating this data visually using Power BI. The rationale behind selecting Power BI is rooted in its ability to offer a diverse perspective on data visualisation, catering to a spectrum of user preferences and technical proficiency. Power BI, a Microsoft product, seamlessly integrates with other Microsoft tools for dynamic reporting, such as Excel, a ubiquitous spreadsheet tool, and boasts robust visualisation capabilities familiar to many analysts.

In a previous post, I outlined many of the charts available in Google Sheets. Throughout this comparative exploration, we’ll unravel the strengths, limitations, and optimal context of this tool, using those charts outlined in Google Sheets post. By the end, you’ll not only discern the strengths and limitations of the tool within visualisation, but also gain valuable insights for selecting this tool for the right analytical scenario. Although I feel visualisation is a weakness that I want to improve on, this visual journey through the landscape of data-driven insights will allow me to uncover the transformative potential of Power BI to convert raw data into a compelling narrative.

Data Retrieved

The data that was used in this analysis was retrieved from the github repository of Alex the Analyst. Disclosure: Before building this Visualisation, I edited the 3rd column ‘Date Taken’ to allow for the dates to fit within either DD/MM/YYYY or MM/DD/YYYY format and trigger no errors.

Viewing the Raw Data

Data collected:

  • Unique ID
  • Email
  • Date Taken (America/New_York)
  • Time Taken (America/New_York)
  • Browser
  • OS
  • City
  • Country
  • Referrer
  • Time Spent

Information asked:

  • Q1 – Which Title Best Fits your Current Role?
  • Q2 – Did you switch careers into Data?
  • Q3 – Current Yearly Salary (in USD)
  • Q4 – What Industry do you work in?
  • Q5 – Favorite Programming Language
  • Q6 – How Happy are you in your Current Position with the following?
    • (Salary)
    • (Work/Life Balance)
    • (Coworkers)
    • (Management)
    • (Upward Mobility)
    • (Learning New Things)
  • Q7 – How difficult was it for you to break into Data?
  • Q8 – If you were to look for a new job today, what would be the most important thing to you?
  • Q9 – Male/Female?
  • Q10 – Current Age
  • Q11 – Which Country do you live in?
  • Q12 – Highest Level of Education
  • Q13 – Ethnicity

Power BI

Load Data

Import data from Excel:

Select the dataset then Transform Data:

Now we have the data loaded:

Clean Data

First thing I’ll do is remove some unnecessary columns:

  • Browser, OS, City, County, Referrer.

You can see this action being recorded on the right-hand side:

I want to start cleaning up the questions. For instance, Q1 – Which Title Best Fits your Current Role?. This is a fantastic piece of information as we’re able to start breaking down these different roles, but it’s not very clean:

This column includes many various options under Other. To make this visualisation a lot easier, as opposed to cleaning it in Excel or SQL, I will collate all of these different Other options into one Other. To do this, I will select the column then click Split Column > By Delimiter above it:

I’m then going to select Custom > ( > Left-most delimiter:

This will separate all Others by the left parenthesis that proceeds it:

Now I can remove that newly created column. I’ll repeat this process for ‘Q4 – What Industry do you work in?’, ‘Q5 – Favorite Programming Language’, ‘Q11 – Which Country do you live in?’, and ‘Q13 – Ethnicity’:

Next I want to break up the numbers in the salary brackets:

I want to break up the numbers and take the average of the numbers. For instance, from the first entry, break 106 and 125 into two columns then take the average in the third. To be safe, I’m going to duplicate this column and edit that duplicated column then Split [the] column By Digit to Non-Digit:

Now you can see how the columns have been split:

I can remove the last column then I will do a Replace Values command on ‘[xxx] – Copy.2’ twice times; once for the ‘k’ and another for the ‘-’:

There will also be a ‘+’ symbol for those who selected the salary amount ‘225k+’. I’m going to replace the ‘+’ symbol with 225, so that the average will end up amounting to 225.

For this average salary calculation, I first need to transform these text columns into number:

Next, I’ll make a custom column:

I’ll then enter the formula: 

=([#’Q3 – Current Yearly Salary (in USD) – Copy.1”]+[#’Q3 – Current Yearly Salary (in USD) – Copy.2”])/2

Now I can remove the other two columns and move Salary Average next to the previous ‘Q3 – Current Yearly Salary (in USD)’.

Building Visualisation

Clicking on Report, I will start to integrate this data using some visual tools.

Title

Using a Text Box, I’m going to name the report ‘Data Professional Survey Breakdown’

Survey Demographics

Using the Cards feature, we’re going to create an overview of the participants involved with this survey.

This first card will display the total number of survey participants. I’m going to use the ‘Unique ID’ field, and count the unique IDs that have responded:

Next will be the average age:

Job Titles

Given there are a number of different jobs, and it is helpful to compare them at a glance, I’m going to use a stacked bar chart. Along with these job title, I’ll add the average salary (a field we have just spent time transforming):

I have added:

  • Q1 to the y-axis,
  • Average Salary, with the attribute set to Average, on the x-axis, and
  • Q1 to the Legend, to create the colour bars, helping to differentiate the individual bars.
Programming Languages

Within data, there are a number of different languages that programmers can use, such as R, Python, C++, etc. I’m interested in uncovering which range of job titles within the industry and are using which language. So, I am going to use a clustered column chart, as many different roles can use a mixture of the languages, with ‘Favourite Programming Language’, ‘Count of Unique ID’, and ‘Job Type’ as the necessary attributes.

Location

Even though I could use a world map as an obvious option for filtering subsets of the data, implementing a tree map lowers the user’s time barrier of having to search through a map for the information that they are searching for. ‘Which country to you live in’ and the distinct count of respondents of those countries will be used:

Current Work-Life Balance Satisfaction

Since I want to gauge satisfaction from a scale of 0 to 10, I will use the gear chart so that a user can grasp the sentiment in a moment’s notice. ‘How Happy are you in your Current Position with the following?’ and the Count of those responses will be used in this chart.

If I place the last (work-life balance) Q6 into the Value slot, it will give me a min/max range between 0 and 11.48:

So, I need to add the same (last) Q6 into the Minimum value field (set to Min) and Maximum value field (set to Max) and then I will have the 0 to 10 range:

Current Salary Satisfaction

I’m now going to do the same for Salary satisfaction (third last Q6):

Difficulty to Enter the Data Field

I believe it is interesting to gauge the different approaches that people have had into the industry. Everyone has a different story and a unique way of arriving to it. I, myself, am still on the track, but I’m loving the journey and various routes that I am meandering my way through.

Salary by Gender

The last element I will add is salary by gender. To do this, I will take the average of the salary earner, and split the data by gender in the pie chart:

Final Dashboard

After a bit of manoeuvring, etching, and colouring, I have finalised this dashboard:

One response to “45 – 6/8 Learn Data Visualisation Tools”

Leave a comment