Some notes on research-compendium

These are my notes from studying the research-compendium concept, which is essentially a set of guidelines for producing research that is ‘easily’ reproducible. I’m sure I will have better methods in hand once I comprehend v.py and make progress in my tasks. The notes are mostly based on https://peerj.com/preprints/3192/, which is recommended as a canonical…
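Here is a minimal Python sketch of scaffolding a compendium skeleton. It assumes a layout loosely modelled on the R-package-style compendium the preprint discusses: an `R/` folder for functions, an `analysis/` folder for data, figures and the write-up, plus a README and a DESCRIPTION file. The exact folder names and the stub DESCRIPTION fields are my own assumptions, not the preprint's prescription. A real DESCRIPTION needs more fields, and in R the rrtools package automates this properly.

```python
from pathlib import Path

# Hypothetical layout, loosely following an R-package-style compendium.
COMPENDIUM_DIRS = [
    "R",                           # reusable functions
    "analysis",                    # write-up / notebooks
    "analysis/data/raw_data",      # untouched source data
    "analysis/data/derived_data",  # outputs of cleaning steps
    "analysis/figures",
]

# Placeholder file contents; a valid R DESCRIPTION needs more fields.
COMPENDIUM_FILES = {
    "README.md": "# {name}\n\nResearch compendium for {name}.\n",
    "DESCRIPTION": "Package: {name}\nTitle: Research compendium\nVersion: 0.0.1\n",
}


def scaffold(root: Path, name: str) -> list[Path]:
    """Create the skeleton under `root`; existing files are never overwritten."""
    created = []
    for rel in COMPENDIUM_DIRS:
        directory = root / rel
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    for rel, template in COMPENDIUM_FILES.items():
        path = root / rel
        if not path.exists():  # keep any work already in place
            path.write_text(template.format(name=name), encoding="utf-8")
        created.append(path)
    return created
```

Because the function is idempotent, it can be re-run on an existing project to fill in whatever folders are missing.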

Canada Open Data: Exploration of Wages

Plan
[ ] Decide which data set to analyse.
[ ] Ensure a method to access the same version of the data, so that the analysis can be reproduced.
[ ] Explore the data visually to understand the features available.
[ ] Formulate questions for Exploratory Data Analysis (EDA).
[ ] Evaluate possible directions for applying ML.
[ ]…
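One way to tick off the second item is to record a checksum of the downloaded data file and verify it before every run. This is a minimal sketch using only Python's standard `hashlib`, assuming the data set is saved as a local file. The file name `wages.csv` is hypothetical. It confirms you are analysing the same bytes; it does not by itself tell you where to re-fetch them.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until EOF so large files don't fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_snapshot(path: Path, expected: str) -> bool:
    """True if the local data file matches the recorded checksum."""
    return sha256_of(path) == expected
```

Usage idea: run `sha256_of(Path("wages.csv"))` once after downloading, commit the digest alongside the analysis, and call `verify_snapshot` at the start of the analysis so that a silently updated upstream file is caught early.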

Python notes and snippets

The following resources were utilised to develop the snippets and notes below. Other links are also available inline with the text.
The Mouse v/s The Python – Mike Driscoll’s website
Real Python: email newsletters, books, courses
Howard Abrams’ video on literate dev-ops using Emacs, as well as his blog posts in general
Python cookbook:…

Using ESS for Data Science

RStudio is a formidable IDE to work with and offers an environment to seamlessly work with multiple languages beyond R. It is especially convenient for tasks involving frequent visualisation of data frames and plots, and for Shiny app development. However, its text (i.e. code) editing capabilities are still significantly lacking compared to the…

"If you look over all these Makefiles you’ll see that there are probably only five or six elements which are repeated over and over. It doesn’t take many lines in a Makefile to get powerful results, yet I run the command make literally dozens of times per day in widely varying projects. GNU Make is…

MathJax basic tutorial and quick reference (Mathematics Meta Stack Exchange)

(German: MathJax: LaTeX basic tutorial and reference)
To see how any formula was written in any question or answer, including this one, right-click on the expression and choose “Show Math As > …

This is an excellent quick reference for MathJax and LaTeX. It is easy to look things up in, and the subsequent posts also contain useful references.

Init: custom docker container for Data Science

I have allowed TCP traffic on port 8787 via UFW, and am running a Docker container of the image ‘rocker/tidyverse’, which provides an online RStudio IDE to play with. This was relatively straightforward: I started by installing Docker on my Debian machine, and then pulled the rocker/tidyverse image as well as the rocker/shiny-verse image.…

Installing UMAP took up nearly 1-1.5 GB of swap

At least ~500 MB of RAM and around 1–1.5 GB of swap were used while installing the UMAP package into rocker/tidyverse. It also took at least ~10 minutes, and began by installing the reticulate, RSpectra and RcppEigen (?) packages, after which UMAP was installed. It would certainly save time and head-banging to have umap already installed into a…

ggplot2 –> plotly, transferring subtitles and captions

Let’s presume that a ggplot object (g) is available, and the idea is to convert it into a plotly object (p), which offers enhanced interactivity. However, the subtitle and caption defined in the ggplot object do not get carried over into plotly. This is a feature enhancement that was raised in November 2016…

Rapidly accessing cheatsheets to learn data science with Emacs

Matt Dancho’s course DSB-101-R is an excellent way to step into ROI-driven business analytics fueled by data science. In this course, among many other things, he teaches methods to understand and use cheatsheets to level up rapidly, especially to find information connecting various packages, functions and workflows. I have been hooked on…

Jupyter notebooks to Org source + Tower of Babel

This post provides a simple example demonstrating how a shell script can be called with appropriate variables from any Org file in Emacs. The script essentially converts a Jupyter notebook to Org source, and Babel is leveraged to call the script with appropriate variables from any Org file. This Reddit thread and blog post elucidate…