Factors Impacting Hollywood Film Profit

Figure: Python function displaying correlation statistics

Overview

Tools used: Python, including the Pandas library, for data cleaning and visualisation.

GitHub Project.


Entire Process

The following outlines the major steps taken to clean the dataset and convert its columns into data types suitable for correlation analysis. The complete code is available via the GitHub link.

Results:

  • Votes and budget have the highest correlation with gross earnings.
  • Company has a low correlation with gross earnings.

Clerical

Import Libraries

Inspect First Few Lines of Data
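
A minimal sketch of loading and previewing the data, assuming a CSV source. The rows are made up for illustration, and `name` is a hypothetical extra column; only `budget`, `gross`, `votes`, `company` and a year column are named in this write-up.

```python
import io
import pandas as pd

# Toy stand-in for the real dataset; the rows are invented for illustration.
csv_text = """name,company,year,budget,gross,votes
Film A,Studio X,1999,1000000,5000000,12000
Film B,Studio Y,2005,20000000,15000000,3000
Film C,Studio X,2010,,90000000,45000
"""

# With the real file this would be pd.read_csv("movies.csv"),
# where "movies.csv" is a hypothetical path.
df = pd.read_csv(io.StringIO(csv_text))

# head() shows the first five rows by default.
print(df.head())
```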

Data Cleaning

Check for Missing Data
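
One common way to check for missing data is to compute the share of null values per column. The sketch below uses a made-up frame; the actual code in the GitHub project may differ.

```python
import pandas as pd

# Toy data with one missing value in 'company' and one in 'budget'.
df = pd.DataFrame({
    "company": ["Studio X", "Studio Y", None, "Studio X"],
    "budget": [1_000_000, None, 30_000_000, 5_000_000],
    "gross": [5_000_000, 15_000_000, 90_000_000, 2_000_000],
})

# isnull() marks missing cells; the column mean is the fraction missing.
missing_share = df.isnull().mean()
for col, share in missing_share.items():
    print(f"{col}: {share:.0%} missing")
```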

Check Data Types for Columns

Change Data Types for ‘budget’ and ‘gross’
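
A sketch of inspecting and then changing the column types, assuming `budget` and `gross` were loaded as floats and should become integers. Dropping rows with missing values first is an assumption made here, not necessarily what the project does.

```python
import pandas as pd

# Toy data; the None forces 'budget' to be stored as float64.
df = pd.DataFrame({
    "budget": [1_000_000.0, 20_000_000.0, None],
    "gross": [5_000_000.0, 15_000_000.0, 90_000_000.0],
})
print(df.dtypes)

# Remove rows with missing values, then cast both columns to 64-bit integers.
df = df.dropna(subset=["budget", "gross"]).astype(
    {"budget": "int64", "gross": "int64"}
)
print(df.dtypes)
```

Casting to an integer type fails when missing values are present, so they have to be dropped or filled beforehand.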

Create Correct Year column
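
The write-up does not say how the corrected year is derived. The sketch below assumes a hypothetical text column `released` that holds the full release date, and extracts the four-digit year from it into a hypothetical `year_correct` column.

```python
import pandas as pd

# Hypothetical layout: 'released' is free text, 'year' may disagree with it.
df = pd.DataFrame({
    "released": ["June 13, 1980 (United States)", "July 2, 1982 (United States)"],
    "year": [1980, 1981],
})

# Take the first run of four digits in the release string as the year.
df["year_correct"] = (
    df["released"].str.extract(r"(\d{4})", expand=False).astype("int64")
)
print(df)
```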

Order by gross revenue
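
Ordering by gross revenue can be sketched with `sort_values`, shown here on made-up rows with the highest-grossing film first.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Film A", "Film B", "Film C"],
    "gross": [5_000_000, 90_000_000, 15_000_000],
})

# ascending=False puts the largest gross at the top.
df_sorted = df.sort_values(by="gross", ascending=False)
print(df_sorted)
```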

Remove duplicates
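
A minimal sketch of removing duplicates, assuming that a duplicate means a row that matches another row in every column. The project may instead deduplicate on selected columns.

```python
import pandas as pd

# The first and third rows are identical.
df = pd.DataFrame({
    "name": ["Film A", "Film B", "Film A"],
    "company": ["Studio X", "Studio Y", "Studio X"],
    "gross": [5_000_000, 15_000_000, 5_000_000],
})

# drop_duplicates() keeps the first occurrence of each repeated row.
deduped = df.drop_duplicates()
print(deduped)
```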

Correlations

Which attributes are most correlated with gross revenue?

Numerical Perspective of Correlation Matrix
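
A sketch of the numerical correlation matrix on made-up data. `DataFrame.corr` uses the Pearson method by default; whether the project uses Pearson or another method is not stated here.

```python
import pandas as pd

# Toy values, invented for illustration only.
df = pd.DataFrame({
    "company": ["Studio X", "Studio Y", "Studio X", "Studio Z", "Studio Y"],
    "budget": [1, 2, 3, 4, 5],
    "gross": [2, 4, 5, 8, 11],
    "votes": [10, 20, 25, 50, 40],
})

# Restrict to numeric columns, then compute pairwise correlations.
corr = df.select_dtypes(include="number").corr()
print(corr)
```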

Colourised Correlation Matrix

Include All Columns in the Correlation

Convert Text Columns into Numerical Values
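
One way to turn text columns into numbers is to convert them to the categorical type and use the integer category codes. This is a sketch under that assumption; `df_numeric` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    "company": ["Studio X", "Studio Y", "Studio X"],
    "gross": [5_000_000, 15_000_000, 90_000_000],
})

# Work on a copy so the original text values stay available.
df_numeric = df.copy()
for col in df_numeric.select_dtypes(exclude="number").columns:
    # Each distinct value gets an integer code (categories are sorted).
    df_numeric[col] = df_numeric[col].astype("category").cat.codes

print(df_numeric)
```

The codes are arbitrary labels rather than measurements, so a correlation computed on them should be read with caution.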

Organise Correlation Output by Attribute
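
A sketch of flattening the correlation matrix into a sorted list of attribute pairs, on made-up data in which `company` is assumed to be already encoded as integers.

```python
import pandas as pd

df = pd.DataFrame({
    "budget": [1, 2, 3, 4, 5],
    "gross": [2, 4, 5, 8, 11],
    "company": [3, 1, 4, 1, 5],
})

corr = df.corr()

# unstack() turns the matrix into a Series indexed by (attribute, attribute).
pairs = corr.unstack()

# Sort ascending so the strongest positive correlations come last.
sorted_pairs = pairs.sort_values()
print(sorted_pairs)
```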

Output Only Highly Correlated Pairs (>0.5)
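
A sketch of keeping only pairs with a correlation above 0.5, again on made-up data. Dropping each attribute's pairing with itself is an extra step added here for readability and may not be part of the project code.

```python
import pandas as pd

df = pd.DataFrame({
    "budget": [1, 2, 3, 4, 5],
    "gross": [2, 4, 5, 8, 11],
    "company": [3, 1, 4, 1, 5],
})

# Flatten the correlation matrix into sorted (attribute, attribute) pairs.
sorted_pairs = df.corr().unstack().sort_values()

# Keep pairs above the 0.5 threshold.
high = sorted_pairs[sorted_pairs > 0.5]

# Drop self-pairs such as (gross, gross), which are always 1.0.
first = high.index.get_level_values(0)
second = high.index.get_level_values(1)
high = high[first != second]
print(high)
```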
