Python notes and snippets

The following resources were utilised to develop the snippets and notes below. Other links are also available inline with the text.

General notes on python

  • high level language (i.e. compact syntax, and easy to learn)
  • Technically, python code is compiled to bytecode, but not to machine code.
  • memory management and other aspects are handled internally, unlike low level languages such as C, where the programmer manages memory manually.
  • The original and reference implementation of python is CPython (written in C). Python was first released in 1991.
  • Dynamically typed language (a name can be bound to an object of a different type after its first assignment)
  • considered an interpreted language, as the code is compiled to bytecode at runtime and then executed by CPython's virtual machine.

Syntax, type and operator notes

  • // : floor division
  • ** : exponentiation
  • % : modulus operator
  • unary and binary operators
    • Unary : -5 or +10
    • binary : 5 -10
    • the exponentiation operator has higher precedence than the unary operator, so -5 ** 2 is evaluated as -(5 ** 2).
        print(-5 ** 2)  # -25
        import sys
  • The `not` operator takes precedence over `and` which takes precedence over `or`.
  • Augmented assignment
    • `+=`, `-=`, `*=`, `/=`, `//=`, `%=`, `**=`
    • the variable to which this is being assigned must already be created.
  • id : every object created is stored in a specific location in memory. An integer identifying the object (in CPython, its memory address) can be found using id. However, it is important to note that ‘a’ in the example below is simply a name that refers to the actual object. ‘a’ by itself is not an object.
a = 6
print(id(a))
  • Common Built-in object types are:
    • int,
    • bool,
    • float,
    • complex (by appending ‘j’).
    • list
    • dict
    • tuple
    • set
  • strings:
    • sequence of characters
    • Character: smallest possible component of text that can be printed with single keyboard press.
    • a single character is a string of length 1
  • Encoding – UTF8/ASCII
    • ASCII: represents 128 unique characters using 7 bits. 7 bits can encode 2^7 = 128 characters.
    • bit : smallest unit of information for a computer.
    • unicode: assigns each character an integer code point. A fixed-width encoding such as UTF-32 stores each code point in 4 bytes (8 bits per byte, so 2^32, about 4 billion, possible values), whereas UTF-8 uses 1 to 4 bytes per character.
    • internally, each character is represented as an integer in python.
    • More details can be found at link.

    … a Unicode string is a sequence of code points, which are numbers from 0 through 0x10FFFF (1,114,111 decimal). This sequence of code points needs to be represented in memory as a set of code units, and code units are then mapped to 8-bit bytes. The rules for translating a Unicode string into a sequence of bytes are called a character encoding, or just an encoding.
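A minimal sketch of the relationship between characters, code points and encoded bytes described above, using only the built-ins ord, chr and str.encode. The sample characters are arbitrary choices for illustration and are written as escapes so that the snippet does not depend on the terminal's encoding.

```python
# Each character maps to an integer code point; an encoding maps code points to bytes.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, an emoji

for ch in samples:
    code_point = ord(ch)                   # character -> integer code point
    utf8_bytes = ch.encode("utf-8")        # variable width: 1 to 4 bytes
    utf32_bytes = ch.encode("utf-32-be")   # fixed width: always 4 bytes
    print(f"U+{code_point:04X}", len(utf8_bytes), len(utf32_bytes))

# chr() reverses ord()
assert chr(ord("A")) == "A"

# ASCII covers only code points 0 to 127, so anything else fails to encode.
try:
    "\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)
```

The loop shows that the number of bytes depends on the encoding, not on the character alone.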

Using conda

Activating and deactivating environments

Creating a new environment for data science, with nothing installed.

conda create -n min_ds

Activating said environment:

conda activate min_ds

Deactivating currently active environment:

conda deactivate


A channel is a repository of python packages. By default, packages are installed from the defaults channel.

It is possible to create your own channel on Anaconda Cloud containing your custom list of packages. conda-forge is a community-maintained channel of this kind, and all of its packages are published under that single channel.

Showing all the channels:

conda config --show channels

Adding the conda-forge channel

conda config --env --add channels conda-forge

Removing a channel

conda config --env --remove channels conda-forge

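This will create a .condarc file. With the --env flag the file is written inside the active environment's directory; without it, the file is written to ~/.condarc. This file can also be directly edited.

Adding a channel with --add will make it the first channel that conda looks in when installing packages.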

Default behavior: flexible. If the same package exists in both default and conda-forge, then the latest version of the package (in any channel) will be installed. To remedy this, the channel priority can be set:

Listing the current priority setting of channels:

conda config --show channel_priority

Note that the default setting is flexible. This can be changed to strict. The default may differ between versions of conda.

conda config --env --set channel_priority strict

Any channels provided to the -c option will take precedence over the channels in the .condarc file. However, if the package is not found in the specified channel, conda will look in other channels as well.

To force conda to look only in a specific channel, use the -c as well as the --override-channels argument.

conda install -c conda-forge --override-channels numpy

Verifying that the specified environment is being used

This can be done in jupyter notebooks, or for example in org mode, by printing the path of the interpreter that is running.

import sys
print(sys.executable)
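As a slightly fuller sketch of the same check, the snippet below collects a few values that identify the running interpreter. The helper name describe_interpreter is hypothetical. CONDA_DEFAULT_ENV is the environment variable that conda activate sets, so it is simply absent outside an activated conda environment.

```python
import os
import sys


def describe_interpreter():
    """Return details that identify the running Python interpreter."""
    return {
        "executable": sys.executable,  # full path of the python binary in use
        "prefix": sys.prefix,          # root directory of the environment
        # set by `conda activate`; absent outside an activated conda environment
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "<not set>"),
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If the executable path does not point inside the intended environment's directory, the notebook kernel or org block is using a different interpreter.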

Renaming a conda environment

The environment cannot be renamed. Therefore it needs to be cloned and then named as desired.


conda create --name new_name --clone old_name
conda remove --name old_name --all

Miniconda + basic packages

  • Note taken on [2019-02-11 Mon 10:14]
    Install miniconda to a user directory rather than /opt

Using Miniconda is more efficient than installing the complete version of Anaconda which takes a huge amount of space and is oriented towards graphical use of the included tools like the Anaconda navigator.

Miniconda is available in the AUR and can be installed using yaourt, after finding the appropriate version available. The default installation is in the /opt directory. Therefore any package installations will require sudo rights for writing.

It is better to install miniconda from the default installation script to a user folder

For miniconda v3:

cd ~/temp
wget ""

To ensure compatibility with Emacs and ob-ipython and ob-ipython-upstream in the case of scimax, make sure to install the jupyter and ipython packages.

conda install pandas scikit-learn seaborn notebook matplotlib

It seems by default conda saves the tar files as well as the decompressed files of the libraries which are installed. To save some space: these can be deleted safely using

conda clean -t

Removing miniconda:

It appears that the only way to remove miniconda is by deleting the miniconda installation folder. This means all the packages installed via conda will be deleted. This makes it all the more important to use virtual environments for test purposes.

Reasons for not using Python

  • No single programming language is ‘always’ the right choice.
  • Example: “..unlikely you’re going to write a real-time operating system kernel in Python.”
  • Example: unlikely that Python will be used to implement the next generation rendering engine.

Parentheses to chain methods

Enclosing the entire expression within parentheses allows a chain of methods to be split over several lines. Without the parentheses, python treats the end of each line as the end of the statement.

# using parentheses to put methods on different lines
  .replace('t', 'a')
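Since the snippet above is incomplete, here is a small self-contained sketch of the same idea using plain string methods. The sample text and the choice of methods are assumptions made only for illustration.

```python
text = "  data science with python  "

# Without the surrounding parentheses, a line starting with ".strip()"
# would be a syntax error.
cleaned = (
    text
    .strip()             # remove surrounding whitespace
    .replace("t", "a")   # same replacement as in the snippet above
    .title()             # capitalise each word
)
print(cleaned)
```

The same layout works for chains of pandas methods, where each step then sits on its own line with its own comment.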


Frequency of occurrence: .value_counts()

groupby : same as group_by in R

agg() : similar to summarize in R

ins.groupby('sex').agg({'charges': ['mean', 'max', 'count']}).round(0)

pivot_table() : comparison across groups made easier

pt = ins.pivot_table(index='sex', columns='region',
                     values='charges', aggfunc='mean').round(0)

Test example

ins = read_csv("~/my_projects/

DataFrame and Series

DataFrame: 2D, rectangular data structure.

Series: a single dimension of data. Similar to a single column of data, or a 1D array.

Notes on the pep 8 style guide

  • pep 8 :- python enhancement proposal 8.
  • the guide is not gospel. There may be situations where not following the style is more important. Therefore one should know when to be inconsistent.
  • Consistency within a module or function is most important. This is followed by consistency within the project, and then consistency with the style guide.
  • The above is especially applicable when dealing with older code.
  • 4 spaces per indentation level
  • [ ] Find out what are hanging indents
  • [ ] Note the methods of closing brackets for multi-line components.
  • Limit all lines to a max of 79 characters
  • Code in the core Python distribution should always use UTF-8 (or ASCII in Python 2).
  • Files using ASCII (in Python 2) or UTF-8 (in Python 3) should not have an encoding declaration.
  • imports should be on separate lines.
  • However this is okay : from subprocess import Popen, PIPE
  • Imports are always put on top of a file
  • Imports should be grouped in the following order:
    • Standard library imports.
    • Related third party imports.
    • Local application/library specific imports.
    • You should put a blank line between each group of imports.
  • Module level “dunders” (i.e. names with two leading and two trailing underscores) such as __all__, __author__, __version__, etc. should be placed after the module docstring but before any import statements except from __future__ imports. Python mandates that future-imports must appear in the module before any other code except docstrings.
  • use inline comments sparingly. Inline comments are comments that are on the same line as the statement
  • Functions and classes should be separated by 2 blank lines
  • continuations of long expressions onto additional lines should be indented by 4 extra spaces from their normal indentation level.

Naming conventions

Package and Module Names: Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

When an extension module written in C or C++ has an accompanying Python module that provides a higher level (e.g. more object oriented) interface, the C/C++ module has a leading underscore (e.g. _socket).

Class Names: Class names should normally use the CapWords convention.

The naming convention for functions may be used instead in cases where the interface is documented and used primarily as a callable.

Note that there is a separate convention for builtin names: most builtin names are single words (or two words run together), with the CapWords convention used only for exception names and builtin constants.

Function and Variable Names: Function names should be lowercase, with words separated by underscores as necessary to improve readability.

Variable names follow the same convention as function names.

Method Names and Instance Variables: Use the function naming rules: lowercase with words separated by underscores as necessary to improve readability.

Use one leading underscore only for non-public methods and instance variables.

Constants: Constants are usually defined on a module level and written in all capital letters with underscores separating words. Examples include MAX_OVERFLOW and TOTAL.
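The conventions above can be seen together in a short sketch. All names here (MAX_RETRIES, RetryPolicy, run_with_policy) are hypothetical and exist only to illustrate the naming styles.

```python
MAX_RETRIES = 3  # constant: capital letters with underscores


class RetryPolicy:  # class: CapWords
    def __init__(self, max_retries=MAX_RETRIES):
        self.max_retries = max_retries  # public instance variable
        self._attempts = 0              # non-public: one leading underscore

    def record_attempt(self):  # method: lowercase with underscores
        self._attempts += 1
        return self._attempts <= self.max_retries


def run_with_policy(policy):  # function: lowercase with underscores
    """Count how many attempts the policy allows."""
    allowed = 0
    while policy.record_attempt():
        allowed += 1
    return allowed


print(run_with_policy(RetryPolicy()))
```

The leading underscore on _attempts is only a convention; python does not prevent access to it from outside the class.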

Check out pep 257 for docstring conventions <2018-06-20 Wed>

  • Note that most importantly, the """ that ends a multiline docstring should be on a line by itself, e.g.:

"""Return a foobang

Optional plotz says to frobnicate the bizbaz first.
"""

For one liner docstrings, please keep the closing """ on the same line.

Using try and except

It is possible to try out a set of instructions and catch the exceptions that crop up. Exceptions in python have specific names (classes). An except clause can name the exception to catch and, with as, store the exception object in a variable.

try:
    schedule_file = open('schedule.txt', 'r')
except FileNotFoundError as err:
    print(err)

[Errno 2] No such file or directory: 'schedule.txt'
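Building on the example above, the following is a sketch of the full try / except / else / finally structure. The function name read_schedule and the file name are made up for illustration.

```python
from pathlib import Path


def read_schedule(path):
    """Return the file's text, or None if the file does not exist."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except FileNotFoundError as err:
        # the exception object carries the parts of the error message
        print(f"Could not open {err.filename}: {err.strerror}")
        return None
    else:
        # runs only when no exception was raised
        return text
    finally:
        # runs in every case, e.g. for clean-up or logging
        print("finished attempt to read", path)


result = read_schedule("schedule_that_does_not_exist.txt")
print(result)
```

Catching the specific FileNotFoundError, rather than a bare except, lets unrelated errors surface normally.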

Why encapsulate in a main function

The concept of ‘main’ – best practice.

  • once a function is defined, it is called with the set of values that it needs in order to compute its result.
  • This initial set of values and statements, the starting point of the program, is referred to as ‘main’.
  • Therefore, this initial set is often placed in a function called main().
  • Then only main() needs to be called, and the values do not have to be defined at the module level.
  • This is a best practice in python.
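A minimal sketch of the pattern described above. The function area_of_rectangle and its values are hypothetical; the point is the main() function and the guard at the bottom.

```python
def area_of_rectangle(width, height):
    return width * height


def main():
    # the initial set of values lives here instead of at module level
    width, height = 3, 4
    area = area_of_rectangle(width, height)
    print(f"Area: {area}")
    return area


# Run main() only when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main()
```

With the guard in place, another module can import area_of_rectangle without triggering the print statement.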

Notes on Functions

A list of functions / dictionaries

Source: Dan Bader @ Real Python

Functions are objects in python, so they can be stored in a list (or a dictionary) like any other value. This is essentially creating a list of functions that can be easily called or updated as required.

def addition(a, b):
    return a + b

def subtraction(a, b):
    return a - b

def multiplication(a, b):
    return a * b

func_list = [addition, subtraction, multiplication]
print(func_list[0](3, 2))
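The same idea works with a dictionary, which allows the functions to be looked up by name rather than by position. This is a sketch: the dictionary keys and the calculate helper are assumptions, and the standard-library operator functions stand in for the hand-written ones above.

```python
import operator

# operator.add, operator.sub and operator.mul behave like the functions above
operations = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
}


def calculate(name, a, b):
    """Look up the function by name and call it."""
    try:
        func = operations[name]
    except KeyError:
        raise ValueError(f"unknown operation: {name!r}") from None
    return func(a, b)


print(calculate("add", 3, 2))
print(calculate("multiply", 3, 2))
```

Adding a new operation then only requires a new dictionary entry.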


Jupyter notebooks to org

jupyter nbconvert --to markdown $path_ipynb.ipynb --output $cwd/$
pandoc $cwd/$ -o $cwd/$
cp $path_ipynb.ipynb $cwd

Testing shell script from different org block:

This application is used to convert notebook files (*.ipynb) to various other



Arguments that take values are actually convenience aliases to full
Configurables, whose aliases are listed on the help line. For more information
on full configurables, see '--help-all'.

    set log level to logging.DEBUG (maximize logging output)
    generate default config file
    Answer yes to any questions instead of prompting.
    Execute the notebook prior to export.
    Continue notebook execution even if one of the cells throws an error and include the error message in the cell output (the default behaviour is to abort conversion). This flag is only relevant if '--execute' was specified, too.
    read a single notebook file from stdin. Write the resulting notebook with default basename 'notebook.*'
    Write notebook output to stdout instead of files.
    Run nbconvert in place, overwriting the existing notebook (only 
    relevant when converting to notebook format)
    Clear output of current file and save in place, 
    overwriting the existing notebook.
    Exclude input and output prompts from converted document.
--log-level=<Enum> (Application.log_level)
    Default: 30
    Choices: (0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL')
    Set the log level by value or name.
--config=<Unicode> (JupyterApp.config_file)
    Default: ''
    Full path of a config file.
--to=<Unicode> (NbConvertApp.export_format)
    Default: 'html'
    The export format to be used, either one of the built-in formats, or a
    dotted object name that represents the import path for an `Exporter` class
--template=<Unicode> (TemplateExporter.template_file)
    Default: ''
    Name of the template file to use
--writer=<DottedObjectName> (NbConvertApp.writer_class)
    Default: 'FilesWriter'
    Writer class used to write the  results of the conversion
--post=<DottedOrNone> (NbConvertApp.postprocessor_class)
    Default: ''
    PostProcessor class used to write the results of the conversion
--output=<Unicode> (NbConvertApp.output_base)
    Default: ''
    overwrite base name use for output files. can only be used when converting
    one notebook at a time.
--output-dir=<Unicode> (FilesWriter.build_directory)
    Default: ''
    Directory to write output(s) to. Defaults to output to the directory of each
    notebook. To recover previous default behaviour (outputting to the current
    working directory) use . as the flag value.
--reveal-prefix=<Unicode> (SlidesExporter.reveal_url_prefix)
    Default: ''
    The URL prefix for reveal.js. This can be a a relative URL for a local copy
    of reveal.js, or point to a CDN.
    For speaker notes to work, a local reveal.js prefix must be used.
--nbformat=<Enum> (NotebookExporter.nbformat_version)
    Default: 4
    Choices: [1, 2, 3, 4]
    The nbformat version to write. Use this to downgrade notebooks.

To see all available configurables, use `--help-all`


    The simplest way to use nbconvert is
    > jupyter nbconvert mynotebook.ipynb
    which will convert mynotebook.ipynb to the default format (probably HTML).
    You can specify the export format with `--to`.
    Options include ['asciidoc', 'custom', 'html', 'latex', 'markdown', 'notebook', 'pdf', 'python', 'rst', 'script', 'slides']
    > jupyter nbconvert --to latex mynotebook.ipynb
    Both HTML and LaTeX support multiple output templates. LaTeX includes
    'base', 'article' and 'report'.  HTML includes 'basic' and 'full'. You
    can specify the flavor of the format used.
    > jupyter nbconvert --to html --template basic mynotebook.ipynb
    You can also pipe the output to stdout, rather than a file
    > jupyter nbconvert mynotebook.ipynb --stdout
    PDF is generated via latex
    > jupyter nbconvert mynotebook.ipynb --to pdf
    You can get (and serve) a Reveal.js-powered slideshow
    > jupyter nbconvert myslides.ipynb --to slides --post serve
    Multiple notebooks can be given at the command line in a couple of 
    different ways:
    > jupyter nbconvert notebook*.ipynb
    > jupyter nbconvert notebook1.ipynb notebook2.ipynb
    or you can specify the notebooks list in a config file, containing::
        c.NbConvertApp.notebooks = ["my_notebook.ipynb"]
    > jupyter nbconvert --config
Azure new
machine learning - general

Expand shortened pathnames expanduser

from os.path import expanduser
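A short sketch of what expanduser does. The journal path is the one used later in these notes; /tmp/notes is an arbitrary example.

```python
from os.path import expanduser

home = expanduser("~")                        # the current user's home directory
journal_dir = expanduser("~/my_org/journal")  # "~" is replaced, the rest is kept
print(home)
print(journal_dir)

# Paths without a leading "~" are returned unchanged
print(expanduser("/tmp/notes"))
```

Functions such as os.listdir do not expand "~" themselves, which is why the path is expanded first.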

Changing filenames in a directory from %Y%m%d to %Y-%m-%d

import os
from os.path import expanduser, join

base_path = expanduser('~/my_org/journal')
file_names = os.listdir(base_path)


for item in file_names:
    # item[6:] keeps the day and anything that follows it, such as a file extension
    alt_name = item[0:4] + '-' + item[4:6] + '-' + item[6:]
    old_path = join(base_path, item)
    new_path = join(base_path, alt_name)
    os.rename(old_path, new_path)
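The loop above renames every entry in the directory, including files that do not start with a date. A more defensive sketch is shown below: it parses the first eight characters with datetime.strptime and skips anything that is not a valid date. The helper names and the dry_run flag are assumptions, and the demonstration runs in a temporary directory rather than on real files.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def dated_name(file_name):
    """Return the new name for a YYYYMMDD-prefixed name, or None if it does not match."""
    stem, suffix = file_name[:8], file_name[8:]
    try:
        parsed = datetime.strptime(stem, "%Y%m%d")
    except ValueError:
        return None
    return parsed.strftime("%Y-%m-%d") + suffix


def rename_dated_files(directory, dry_run=True):
    """Rename matching files; with dry_run=True only report the planned changes."""
    planned = []
    for path in sorted(Path(directory).iterdir()):
        new_name = dated_name(path.name)
        if new_name is None:
            continue  # not a date-prefixed name, leave it alone
        planned.append((path.name, new_name))
        if not dry_run:
            path.rename(path.with_name(new_name))
    return planned


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ["20190211", "20190212.org", "notes.txt"]:
        (Path(tmp) / name).touch()
    print(rename_dated_files(tmp, dry_run=False))
    print(sorted(p.name for p in Path(tmp).iterdir()))
```

Running with dry_run=True first gives a list of planned renames to check before anything on disk changes.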

Simple HTTP server

Reference: Real Python newsletter

It is possible to start a python server based on any location. This can be used to view websites, and even as a file browser. Starting the server is simplicity itself.

# Navigate to the desired directory
python3 -m http.server

Use case for python server

  • [ ] Could a custom HTML page be designed to quickly access particular items? However Alfred or spotlight could be quicker.
  • [ ] What can such a server do?
    • Since Hugo has inbuilt commands to deploy websites, and any generated html file can be viewed in the browser, the need for a python server is not yet clear.


List comprehension

Source: bonus content videos of Real Python

  • defining a list directly using a for loop.
  • shorthand for a regular for loop
  • shorthand for a regular loop and adding filtering

# Example of directly defining a list using a shorthand for loop
squares = [x * x for x in range(10)]
print(squares)

# Basically the above is the same as:

squares = []
for x in range(10):
    squares.append(x * x)


# modifying the original code, here by adding a filter (the condition is only an illustration)

squares2 = [x * x for x in range(10) if x % 2 == 0]
print(squares2)

Using the assert statement

  • Source: python tricks book
  • assert is used to check that a condition holds. The program will proceed if the condition is true. If not, an AssertionError is raised. The error message can be customised.
  • assertions are internal self-checks and not meant as a communication to the user.
  • meant to inform the developer about unrecoverable errors in the program. This differentiates it from usual if-else conditionals
  • Aids in debugging.
  • Using an additional argument it is possible to provide a custom message.
  • Don’t use asserts for data validation
    • assertions can be disabled globally using the -O and -OO switches (and other techniques)
    • When disabled, none of the conditional expressions will be evaluated.
    • Therefore, never use an assert to check for admin privileges or such conditions. Remember the example of deleting a product from a product catalog.
  • Asserts that never fail
    • passing a tuple as the condition, e.g. assert (1 == 2, 'message'), never fails, because a non-empty tuple is always true.

Simple example

# Perform calculation only if a >= 100 and b <= 200

a = 20
b = 200

def test_func(a, b):
    assert (a >= 100) and (b <= 200)
    return a + b


# Building on the same example as in the book

def apply_discount(product, discount, threshold):
    # Note: 1.0 * discount yields the discount amount, not the discounted price
    # (the book uses 1.0 - discount). With the values below the result is 1490,
    # which is under the threshold, so the assert fails.
    price = int(product['price'] * (1.0 * discount))
    assert threshold <= price <= product['price']
    return price

# Defining the product dictionary. One of the keys in the dictionary has to be price

shoes = {'name': 'adidas', 'price': 14900}

# Applying a 10% discount on shoes, and defining the threshold to be 1500
apply_discount(shoes, 10/100, threshold=1500)

AssertionErrorTraceback (most recent call last)
<ipython-input-17-a8990c61cc1b> in <module>()
     11 shoes = {'name': 'adidas', 'price': 14900}
---> 13 apply_discount(shoes, 10/100, threshold = 1500)

<ipython-input-17-a8990c61cc1b> in apply_discount(product, discount, threshold)
      4 def apply_discount(product, discount, threshold):
      5     price = int(product['price'] * (1.0 * discount))
----> 6     assert threshold <= price <= product['price']
      7     return price
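The split between data validation and internal self-checks can be sketched as follows. This variant is an illustration rather than the book's code: the input check uses an exception, which still runs when assertions are disabled, and the assert carries a custom message as its second argument.

```python
def apply_discount(product, discount):
    """Return the discounted price (prices are integers, e.g. in cents)."""
    # Data validation: use an exception, which still runs under `python -O`
    if not 0 <= discount <= 1:
        raise ValueError(f"discount must be between 0 and 1, got {discount}")

    price = int(product["price"] * (1.0 - discount))

    # Internal self-check, with a custom message as the second argument
    assert 0 <= price <= product["price"], f"computed price {price} is out of range"
    return price


shoes = {"name": "adidas", "price": 14900}
print(apply_discount(shoes, 0.10))

try:
    apply_discount(shoes, 2.0)
except ValueError as err:
    print("rejected:", err)
```

If the assert ever fires here, it points to a bug in the function itself and not to bad input from the caller.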



General notes

Some methods associated with manipulating strings:

  • upper()
  • count()
cap_me = "hello this is a normal string"
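A small sketch applying the methods listed above to the same string. The capitalize call is an extra method added for comparison.

```python
cap_me = "hello this is a normal string"

print(cap_me.upper())       # every character in upper case
print(cap_me.count("is"))   # non-overlapping occurrences of the substring "is"
print(cap_me.capitalize())  # only the first character in upper case
```

String methods return new strings; cap_me itself is left unchanged.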

Notes on String formatting

  • Strings can be enclosed within a single or a double quote. Better to be consistent through the program.
  • Strings can be concatenated, and spaces can be included. Hash is used to include comments. The print command can also be used to join strings, as it inserts a space between its arguments.
# This is a comment that is not interpreted.
first = 'monty'
second = 'python'
total = first + " " + second
print(total)
print(first, "", second)
# Note how an extra space is added in the 2nd line. This is because print adds the space between arguments automatically.
print(first, second, '. This is corrected.')
monty python
monty  python
monty python . This is corrected.
  • Double quote and single quote combination can be used for words that have an apostrophe


print("This is Mac's notebook")
# If I had used a single quote to enclose the string, then the string would have terminated at the apostrophe in Mac's. Knowing this is useful with respect to string manipulation.
This is Mac's notebook
  • To convert a number into a string, use the function str. Remember to convert numbers into strings before concatenating
number = 1
string = str(number)
print (type (string))
print (type (number))
<class 'str'>
<class 'int'>
  • Test program to explore formatting strings
movie1 = "Clear and present danger"
movie2 = "tom dick and harry"
print ("My favorite movies \n \t", movie1, "\n \t", movie2)
My favorite movies
       Clear and present danger
       tom dick and harry

Notes on String manipulation

  • Strings are sequences of characters and can be indexed like a list or an array. Therefore, string[0] gives the first character
string1 = 'Ragavan'
print(string1[3:])  # this is to print a range of the characters
  • len() : for length of the string. Space is included as a count. This can be used to figure out the middle of the string.

string1 = 'Shreyas Ragavan'
print(type(len(string1)))
# Note that the space is included as a count

  • String Slice formula: variable[start:end+1]
string1= 'Shreyas Ragavan'
s2= string1[2:]
print (s2)
print (s3)
print (s4)
reyas Ragavan
  • Integer (floor) division can be used to round down the result of a division to a whole number. In Python 2, single division (/) of two integers and integer division (//) are the same thing.
s1_5= len(s1)//2
s2_5= len(s2)//2
print (s1[s1_5:], s2[s2_5:])
<class 'int'>
avan nthi

Test program

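word = 'Python'
first = word[0]   # assumed definition: the first letter
rest = word[1:]   # assumed definition: the remaining letters
result = rest + "-" + first + "y"
print(result)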
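The indexing and slicing rules from this section can be summarised in one runnable sketch, using the same sample string as earlier in these notes.

```python
name = "Shreyas Ragavan"

print(name[0])     # first character
print(name[:7])    # characters 0 to 6; the end index is excluded
print(name[8:])    # from index 8 to the end
print(name[-3:])   # last three characters

middle = len(name) // 2  # floor division gives an integer index
print(name[:middle], "|", name[middle:])
```

Because the end index is excluded, name[:middle] and name[middle:] always split the string without overlap or gaps.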

Find your python version

  • Note taken on [2018-07-16 Mon 11:17]
    The following is already added to .bash_profile when Anaconda is installed (Mac OS)
  • This is added by Anaconda 2.2.0 installer

    export PATH="/Users/shreyas/Applications/anaconda/bin:$PATH"

  • This is added by Anaconda3 5.1.0 installer

    export PATH="/Users/shreyas/anaconda_install/anaconda3/bin:$PATH"

  • To reload the profile:

    source ~/.bash_profile

It is important to verify that the intended python interpreter version is being used.

Using the command line (shell):

python --version

Using the sys module in a python program.

import sys
print(sys.version_info, sys.version, sep='\n')

sys.version_info(major=3, minor=6, micro=5, releaselevel='final', serial=0)
3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]

Finding the versions of scipy, numpy, pandas, scikit-learn, matplotlib

import scipy
import numpy
import matplotlib
import pandas
import sklearn

print('scipy: %s' % scipy.__version__)
print('Numpy: %s' % numpy.__version__)
print('Matplotlib: %s' % matplotlib.__version__)
print('Pandas: %s' % pandas.__version__)
print('scikit-learn: %s' % sklearn.__version__)

scipy: 1.2.0
Numpy: 1.15.4
Matplotlib: 3.0.2
Pandas: 0.24.1
scikit-learn: 0.20.2
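An alternative sketch that does not import the packages and does not fail when one of them is missing. It relies on importlib.metadata, which is part of the standard library from Python 3.8 onwards; the helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


# Distribution names as published on PyPI (note "scikit-learn", not "sklearn")
names = ["scipy", "numpy", "matplotlib", "pandas", "scikit-learn"]
for dist, version in installed_versions(names).items():
    print(f"{dist}: {version or 'not installed'}")
```

This reads the installed package metadata, so it reports what is installed in the environment of the interpreter that runs it.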

Dataframe peek function

This is a function to peek into the head, tail and structure of a pandas dataframe and print the output in a structured format.

import io

import pandas as pd

# function to quickly take a peek into the head, tail and structure of a dataframe
def df_peek(dataframe):
    df_head = dataframe.head()
    df_tail = dataframe.tail()
    df_description = dataframe.describe()
    df_any_null = pd.isnull(dataframe).any()
    df_shape = dataframe.shape
    # dataframe.info() prints its report and returns None, so capture it in a buffer
    info_buffer = io.StringIO()
    dataframe.info(buf=info_buffer)
    df_info = info_buffer.getvalue()
    print("Data contains {} rows and {} columns.\n".format(df_shape[0], df_shape[1]))
    print("Data info : \n {}\n".format(df_info))
    print("Head : Dataframe \n{}\n".format(df_head))
    print("Tail : Dataframe \n{}\n".format(df_tail))
    print("Description of the numerical quantities \n{}\n".format(df_description))
    print("Describing columns with Null values \n {} \n".format(df_any_null))

