Some notes on research-compendium

These are my notes while studying the research-compendium concept, which is essentially a bunch of guidelines to produce research that is ‘easily’ reproducible. I’m sure I will have better methods in hand when I comprehend v.py and make progress in my tasks. The notes are mostly based on https://peerj.com/preprints/3192/, which is recommended as a canonical…

Canada Open Data : Exploration of Wages

Plan
- [ ] Decide on which data set is to be analysed
- [ ] Ensure a method to access the same version of the data to enable reproducing the analysis.
- [ ] Visual exploration to understand the features available.
- [ ] Formulate questions for Exploratory Data Analysis (EDA)
- [ ] Evaluate the possible directions in terms of applying ML
- [ ]…

Using ESS for Data Science

RStudio is a formidable IDE to work with and offers an environment to seamlessly work with multiple languages beyond R. It is especially convenient for tasks involving frequent visualisation of data frames and plots, and for Shiny app development. However, the text (i.e. code) editing capabilities are still significantly lacking compared to the…

Init: custom docker container for Data Science

I have enabled TCP on port 8787 via UFW, and am running a Docker container of the image ‘rocker/tidyverse’, which provides an RStudio IDE online to play with. This was relatively straightforward, starting with installing Docker on my Debian machine, and then pulling the rocker/tidyverse image, as well as the rocker/shiny-verse image.…

Easy to resize the swap partition in Linode

In the ‘new’ interface for Linode manager, the advanced tab contains the list of disks. When the Linode is fully powered down, it is possible to resize the partitions as desired. My earlier notes indicate that at least 2.5GB of swap space was required to install RStudio (without Docker) in the past, on a machine…

Installing UMAP took up nearly 1-1.5 GB of swap

At least ~500MB of RAM and around 1-1.5GB of swap was used while installing the UMAP package into rocker/tidyverse. It also took at least ~10 minutes, and started with installing the reticulate, RSpectra and RcppEigen (?) packages first, after which UMAP was installed. It would certainly save time and head-banging to have umap already installed into a…

ggplot2 --> plotly, transferring subtitles and captions

Let’s presume that a ggplot object (g) is available, and the idea is to convert this into a plotly object (p), which offers enhanced interactivity of plots. However, the subtitle and caption defined in the said ggplot object do not get translated into plotly. This is a feature enhancement that was raised in November 2016…
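
A minimal sketch of one commonly used workaround follows. It is not necessarily the approach the full post settles on. It assumes the ggplot2 and plotly packages are installed, uses the built-in mtcars data purely for illustration, and keeps the label strings in hypothetical variables (title_text, subtitle_text, caption_text) so they can be re-applied on the plotly side. The subtitle is folded into the plotly title using the HTML-like tags plotly accepts, and the caption is added as an annotation positioned in paper coordinates.

```r
library(ggplot2)
library(plotly)

# Hypothetical label strings, kept in variables so they can be reused below
title_text    <- "Fuel efficiency vs weight"
subtitle_text <- "mtcars data set (illustrative only)"
caption_text  <- "Source: datasets::mtcars"

# The ggplot object (g) with title, subtitle and caption
g <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = title_text, subtitle = subtitle_text, caption = caption_text)

# Convert to a plotly object (p); the subtitle and caption are not carried over
p <- ggplotly(g)

# Re-apply them by hand: subtitle inside the title, caption as an annotation
p <- plotly::layout(
  p,
  title = list(
    text = paste0(title_text, "<br><sup>", subtitle_text, "</sup>")
  ),
  annotations = list(
    list(
      text = caption_text,
      x = 1, y = -0.15,                # bottom-right, just below the plot area
      xref = "paper", yref = "paper",  # coordinates relative to the plot area
      xanchor = "right", yanchor = "top",
      showarrow = FALSE
    )
  ),
  margin = list(t = 80, b = 80)        # leave room for the two-line title and caption
)

p
```

The y offset and margins are guesses that will likely need adjusting per plot, since the caption sits outside the plotting area.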

Rapidly accessing cheatsheets to learn data science with Emacs

Matt Dancho’s course DSB-101-R is an awesome course to step into ROI-driven business analytics fueled by Data Science. In this course, among many other things, he teaches methods to understand and use cheatsheets to gain rapid level-ups, especially to find information connecting various packages, functions and workflows. I have been hooked to…