Data-Science

Docker driven datascience environment and workflow

Developed a workflow built on Docker (and other tools) to create a reproducible, standard, consistent environment for running a variety of data science projects in both development and production modes. The images make it quick and easy to deploy dashboards such as Shiny or Streamlit, and to address problems with a standard toolset.

Federal R&D Spending on climate change

An exploratory data analysis (EDA) in `R` of federal government data on the R&D budget for climate change.

Using ESS for Data Science

RStudio is a formidable IDE that offers an environment for working seamlessly with multiple languages beyond R. It is especially convenient for tasks involving frequent visualisation of data frames and plots, and for Shiny app development. However, its text (i.e. code) editing capabilities still lag significantly behind those of Emacs and Vim. It also does not offer a seamless interface integrating task management, time management and multi-language programming environments to the extent available within Org-mode via Emacs.

Docker container and image management within Emacs

As my Emacs configuration will indicate, I have installed the package docker.el, along with the dockerfile and docker-compose mode packages. The main docker package lets me list, view, launch and generally manage containers from within Emacs instead of using verbose shell commands and possibly constructing aliases for common commands. The latter two packages are more useful for developing and editing Docker files (including within Org source blocks) with syntax highlighting.

Setting up Continuous Integration (CI) for docker containers

This blog post takes you through the process of setting up Continuous Integration for building Docker images via Docker Hub and GitHub, and via GitHub Actions. It also contains a condensed summary of important notes from the documentation. Goal: gain an overview of CI and actually use it to get automated builds of the Docker images that I built for my data science toolbox. Essentially, I want to be able to status check the Docker containers that I am maintaining.

Notes - What they forgot to teach you about R

The book ‘What they forgot to teach you about R’, which is being co-authored by @JennyBryan (https://twitter.com/JennyBryan), is not yet complete; however, I was still compelled to go through the existing material as it was an engaging read. These are some notes captured from the book. Verbatim quotes from the book are encapsulated. My notes and observations are added in plain text. I recommend you cultivate a workflow in which you treat R processes (a.

Some notes on research-compendium

These are my notes from studying the research-compendium concept, which is essentially a set of guidelines for producing research that is ‘easily’ reproducible. The notes are mostly based on marwick-2018-packag-r, which is one canonical reading on the concept. Other references are mentioned throughout the text, and also collected separately. These notes were prepared a few weeks ago during a foray into Docker. They are neither complete nor comprehensive, but will serve as a good refresher of the principal concepts.

Notes on Docker

Docker is a fascinating concept that could potentially be useful in many ways, especially in data science and in making reproducible workflows and environments. There are several articles with great introductions and examples of using Docker in data science. This is an evolving summary of my exploration of Docker. It should prove to be a handy refresher of commands and concepts. TODO What is Docker: A brief summary of what Docker is all about.

R notes and snippets

Lubridate - introductory technical paper: This paper (Grolemund and Wickham) offers a good introduction to lubridate and a comparison between using it and not using it, as well as several examples of using the library. It also offers some case studies which can serve as useful drill exercises. Importing multiple Excel sheets from multiple Excel files: This is one approach to importing multiple sheets from multiple Excel files into a list of tibbles. The goal is that each sheet is imported as a separate tibble.

MongoDB and NoSQL Databases

Introduction: These are my notes on NoSQL databases and the prime differences between them and SQL databases. The notes are mostly based on the Udemy course Introduction to MongoDB, and are therefore primarily focused on using MongoDB at the moment. Methodology and Tools: Installing MongoDB: The instructions are available in the MongoDB manual. This is for the Community edition, on a Mac as well as a Linux machine (Antergos). Mac: If never installed before, tap the resource first.

Rapidly accessing cheatsheets to learn data science with Emacs

Matt Dancho’s course DSB-101-R is an awesome course for stepping into ROI-driven business analytics fueled by data science. In this course, among many other things, he teaches methods to understand and use cheatsheets to gain rapid level-ups, especially to find information connecting various packages, functions and workflows. I have been hooked on this approach and needed a way to quickly refer to the different cheatsheets as needed.

Nteract : An interactive computing environment

A slide deck from Netflix mentions using Nteract as their programming notebook, which prompted a mini exploration. This blog post by Safia Abdalla (a maintainer and developer of Nteract) introduces Nteract as an open source, desktop-based, interactive computing application designed to overcome a number of limitations in Jupyter Notebook’s design philosophy. One key difference (among many others) is the ability to execute code in a variety of languages within a single notebook, and it also appears that the Electron-based desktop app should make it easier for beginners to start coding.