Canada Open Data: Exploration of Wages


  • [ ] Decide which dataset(s) to analyse.
  • [ ] Establish a way to access the same version of the data so that the analysis can be reproduced.
  • [ ] Explore the data visually to understand the available features.
  • [ ] Formulate questions for Exploratory Data Analysis (EDA)
  • [ ] Evaluate possible directions for applying machine learning (ML).
  • [ ] Plan a Shiny app for exploring the answers to the above questions interactively.
    • [ ] A self-hosted Shiny app requires Shiny Server to be set up on the VPS. This is the preferred setup.
    • [ ] Alternatively, the app can be hosted for free on RStudio's shinyapps.io service.
      • This is a common starting point, but the free tier has several limitations, such as slower loading and limits on the resources an app can use.
  • [ ] Perform EDA
  • [ ] Perform ML
    • At least two approaches appear to make sense: linear regression (plus regularised extensions such as GLMNet) to predict the wage trend, and k-means clustering.
  • [ ] Review
  • [ ] Publish results / report.
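The two ML approaches listed above can be illustrated with a minimal, dependency-free Python sketch: an ordinary least squares trend line and k-means via Lloyd's algorithm. The plan itself targets R, where lm(), kmeans() and the glmnet package would be the usual tools, so this only illustrates the techniques and is not the planned implementation; GLMNet-style regularisation is not shown. The function names are hypothetical, as is the idea of clustering occupations on features such as mean wage and wage growth.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def fit_trend(years: Sequence[float], wages: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of wage = intercept + slope * year."""
    n = len(years)
    if n < 2 or n != len(wages):
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(wages) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, wages))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def _nearest(p: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
    )


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's algorithm for k-means; returns (centroids, labels)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initial centroids: k distinct points chosen at random.
    centroids = [tuple(c) for c in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments no longer change, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

With toy (made-up) figures, fit_trend([2015, 2016, 2017, 2018], [20, 22, 24, 26]) returns an intercept of -4010.0 and a slope of 2.0, i.e. a rise of 2 per year. Because k-means uses Euclidean distance, features on very different scales (a wage in dollars next to a growth rate in percent) would normally be standardised before clustering.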

Rough notes

Datasets that will be analysed

Notes on visual exploration




Links to dataset webpages

Wages, salaries, employers' social contributions, by province and territory link

Employee wages by occupation, annual link

  • Contains over 4 million rows (observations).
  • The CSV file itself is 1 GB.
  • There are about 23 features, of which at least 3 (possibly more) are of no use to the analysis.
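Given the size of this file, it may help to stream the CSV row by row instead of loading it whole, using only the columns the analysis needs. A minimal standard-library sketch follows; the column names REF_DATE, GEO and VALUE, the grouping and the function name are hypothetical and should be checked against the actual header.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical column names: check them against the real CSV header.
GROUP_COLS = ("REF_DATE", "GEO")
VALUE_COL = "VALUE"


def mean_value_by_group(lines: Iterable[str]) -> Dict[Tuple[str, ...], float]:
    """Stream CSV lines and return the mean VALUE for each (REF_DATE, GEO) group.

    Rows are processed one at a time and never stored, so memory use grows with
    the number of groups rather than the number of rows. Columns outside
    GROUP_COLS and VALUE_COL are ignored, and rows with an empty VALUE
    (for example suppressed figures) are skipped.
    """
    reader = csv.DictReader(lines)
    needed = (*GROUP_COLS, VALUE_COL)
    missing = [c for c in needed if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")

    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    counts: Dict[Tuple[str, ...], int] = defaultdict(int)
    for row in reader:
        raw = (row[VALUE_COL] or "").strip()  # short rows yield None
        if not raw:
            continue
        key = tuple(row[c] for c in GROUP_COLS)
        totals[key] += float(raw)
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

An open file handle can be passed directly, e.g. `with open("employee_wages_by_occupation.csv", newline="", encoding="utf-8-sig") as fh: means = mean_value_by_group(fh)` (hypothetical file name; "utf-8-sig" strips a byte-order mark if one is present).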

Employee wages by industry, annual link

Wages – 2017, 2018 link
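For the checklist item on accessing the same version of the data, one option is to record a checksum of each downloaded file together with its source link, and to compare it before rerunning the analysis. A minimal Python sketch, assuming a local JSON manifest; the manifest format and the function names are hypothetical, and the source URLs would come from the dataset links above.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so a 1 GB CSV never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_version(manifest: Path, data_file: Path, source_url: str) -> None:
    """Add or update the checksum and source URL of data_file in a JSON manifest."""
    entries = json.loads(manifest.read_text()) if manifest.exists() else {}
    entries[data_file.name] = {"sha256": sha256_of(data_file), "source": source_url}
    manifest.write_text(json.dumps(entries, indent=2, sort_keys=True))


def matches_recorded_version(manifest: Path, data_file: Path) -> bool:
    """True if data_file's current checksum equals the one recorded in the manifest."""
    entries = json.loads(manifest.read_text())
    expected = entries.get(data_file.name, {}).get("sha256")
    return expected is not None and expected == sha256_of(data_file)
```

A checksum only detects that a published file has changed; reproducing an earlier analysis still requires keeping (or archiving) the original download.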