# V – Exploration

Table of Contents

## Updates

• Note taken on [2019-08-25 Sun 21:36]
• Started exploring the SHA-512 cipher and recording notes.
• Switched to reading the Serious Cryptography book and digging into the fundamentals of encryption. This has made a large difference even though I’ve barely covered the first chapter. Added a section for notes.
• Note taken on [2019-08-22 Thu]
• Specified python 2.7 in the init section per Diana’s comment. I need to explore how virtual environments work with different versions of python and set this up.
• Made progress with the vdiff portion, and added notes expanding on CH’s algorithm outline. This is my first exposure to awk.
• Updated the section on using diff and patch. The example using diff is complete. Patch should probably be an additional, but is pending.

## init

This post will contain my notes, exploration and final implementation of V. The current plan is to implement V in Python (2.7). There is a Q&A section that I will use to accumulate questions as they pop up, and the answers. Incomplete / pending areas will be marked TODO or with – [ ].

Note: Though I have always wanted to learn Lisp beyond my Emacs configuration, I will start with python.

## Notes

Source code ‘patches’ are submitted by all kinds of people that you don’t (or can’t) know, and therefore can’t trust. Using a WoT – it is possible to build a network of people you do know, and whose work (code/intentions) you trust. To ensure that the patch submitted was indeed done by Mr. X, he will need to sign the patch file with his GPG signature. When that signature is provided as a standalone file – it is called a detached signature. To verify that the signature indeed belongs to Mr X, the public key of said Mr. X has to be used. Deedbot (an IRC bot) enables the registration of public keys in a public key server, linking these keys to IRC nicknames. Therefore the public keys are retrieved from this server, hosting the keys for a particular WoT and they are used for the purpose of verification of signatures.

The principal components are the source code patches, signatures of said patches, and the public keys for these signatures.

V extracts the principal components above (each stored in it’s own directory), and determines the ‘correct’ order to apply the patches, finally producing a complete source tree by ‘pressing’ all the filtered patches together.

The order of patches is important because I may submit a patch today improving the work you did last week. My patch was built on your patch and therefore it would not work if I applied my patch before your patch.

### vpatches and vdiff [0/4]

Grokking vdiff, which creates vpatches.

• [ ] Presume that a vpatch is different from a patch – due to the interim ‘filters’ applied (patch signatures from WoT + verification with public key)
• [ ] Fuzzy-merge concept to be understood (Refer log – asciilifeform’s comment)
diff -uNr $1$2 | awk 'm = /^(---|\+\+\+)/{s="sha512sum \"" $2 "\" 2>/dev/null " | getline x; if (s) { split(x, a, " "); o = a[1]; } else {o = "false";} print$0 " " o} !m { print $0 }'  • Diff Flags (from man diff for ready reference) • -u -U NUM –unified[=NUM] : Output NUM (default 3) lines of unified context. • -r –recursive : Recursively compare any subdirectories found. • -N –new-file : Treat absent files as empty. •$1 and \$2 refer to 2 files that are being diffed.
• [ ] If a folder of files is the input, then are the files compared 2 at a time? This may be connected to toposort, or the subsequent portion of the algo.
• [ ] Awk portion
• ^(---|\+\+\+) : the portion between / / is the pattern. The subsequent portion between { } is the command to execute on matching patterns. Here the pattern is any line beginning with a ‘-‘ or a ‘+’.
• [ ] then why three + / – in the pattern? The first line of a diff output has 3 + and —. However the subsequent unified output has the individual lines with a single + or -.
• [ ] why ‘\+’ and not just ‘+’ ?
• [X] is this related to the default ‘3 lines of unified context’? – No. I have tried specifying 4 lines and the output still looks the same.
• the pattern is being assigned to the variable ‘m’.
• [ ] getline is to read a record and in this case store it into the variable x. (Gawk docs, but should be applicable to awk.)
• [ ] sha512sum (Refer initial notes below)

### Encryption

• Encryption uses an algorithm called a cipher (Eg SHA-512, SHA-384 and so on) and a secret value called the key.
• The encrypted message is called ciphertext.
• Classical ciphers predate computers. Eg Caesar cipher (shift forward by 3 letters, wrapping around letter A. ie. ZOO becomes CRR).
• Most classical ciphers used substitution. The substitution has to follow a permutation. Secure permutation needs to satisfy:
• Permutation to be determined by a key
• Eg Vigenere cipher – 16th century : Key: DUH (3, 20, 7 : distance from letter A, which is applied to each letter in sequence) –> CRYPTO becomes FLFSNV
• Different keys should result in different permutations.
• If different keys result in the same permutation, then it becomes easier to ‘attack’ or decrypt without the key.
• Permutation should look random : it should not exhibit any apparent pattern.
• [ ] Important to be aware of frequency analysis being used to attack a cipher,
• Example:
• Symmetric encryption : the same key is used to encrypt and decrypt
• Asymmetric or public-key encryption : where a private and public key pair is used and different keys are used to encrypt and decrypt.

#### References

• Serious Cryptography (O’reilly) – Good Reads link

### Hashing and SHA-512

• Note taken on [2019-08-25 Sun 20:37]
The explanations from the medium article reference, and other results from a brief google search was insufficient / not lucid. This prompted me to switch to the Serious cryptography book and start studying encryption from the fundamentals in a structured manner from reliable sources. The book was chosen only after a brief look at reviews on Goodreads and no consultation with Diana, but so far – I’ve learnt more from reading it than any resource. There are other books available in the o’reilly library and I will discuss this choice with Diana.
• sha512 is a hashing algorithm that belongs to the Secure Hash Standard (SHS)
• this standard covers several hashing algorithms.
• Hashing functions take in specified data as input and produce an output called a hash digest.
• The hash digest has to satisfy some conditions [0/2]
• [ ] Uniform distribution
• I’m not clear what exactly this means.
• Fixed length
• the length depends on the algorithm being used. SHA-512 produces digests that are 512 bits long. SHA-384 produces 384 bits.
• irrespective of the size of the input, the output’s length will be the same.
• This clarifies the part of the explanation from the channel (Q&A section) on signatures – about the entire message not being available in the hash.
• [ ] Collision resistance
• No two inputs should produce the same hash. This is extremely interesting to consider as an idea and needs further digging for clarity.

## Q&A

### The signature [0/3]

The following is a raw paste of a conversation in IRC, and permission was taken to use the excerpt as a part of my notes. The conversation explains how signatures work.

• [ ] Sort through the raw text,
• [ ] summarise
• [ ] retain relevant portions

Salient points

1. Signatures are generated using the private key and verified using the public key. This is different from sending someone a message encrypted with their public key (which can be decrypted only with the corresponding private key).
<shrysr> what exactly does 'signing' a file do?
<mancha> ok, conceptually, it's like signing a document with a pen. you authenticate it's from you. like signing a check or an application for something
<shrysr> okay - so somebody will use my public key to verify it was signed by me?  That part doesnt connect for me... how is the sign verified by a public key. Does that mean anybody can sign a document using my public key?
<mancha> technically, what gnupg does is: 1) generates a digest (or hash) of the message, generated a digital signature of this hash using the sender's *private* key and sends this in some way to the recipient alongside the message
<mancha> the receiver can then use the sender's *public* key to verify the digital signature
<mancha> signatures can only be made with private keys. they can be verified by the public key. so anyone can verify and only you can sign.
<shrysr> ah so the signature is made with private keys.
<mancha> yes
<shrysr> how does then a detached signature work. As in, you release  a patch, say and you also attach a detached signature file.
<mancha> attach a detached? heh.
<mancha> ok detached is just a mode of transport
<mancha> let's put this in algo form
<mancha> i make a message "m" and then i hash it hash(m). i then sign it sign(hash(m))
<mancha> i send you the message m and separately i send you sign(hash(m))
<mancha> the receiver can then use the sender's *public* key to verify the digital signature
<mancha> signatures can only be made with private keys. they can be verified by the public key. so anyone can verify and only you can sign.
<shrysr> ah so the signature is made with private keys.
<mancha> yes
<shrysr> how does then a detached signature work. As in, you release  a patch, say and you also attach a detached signature file.
<mancha> attach a detached? heh.
<mancha> ok detached is just a mode of transport
<mancha> let's put this in algo form
<mancha> i make a message "m" and then i hash it hash(m). i then sign it sign(hash(m))
<mancha> i send you the message m and separately i send you sign(hash(m))
<mancha> shrysr: i am explaining what is happening in steps
<shrysr> yes. I see it now... the sign includes hash(m) - and so the relationship is confirmed.
<Riastradh> shrysr: Imagine you have a signet ring that you can stamp wax seals with.  Everyone knows what your wax seals look like, but only you have the signet ring to make them.  Digital signatures are even better: you stamp the text of the document into the wax seal too, so nobody can transplant a wax seal from one document to another.
<mancha> shrysr: so whether the signature is attached to the message or not, is not relevant
<mancha> i.e. it can be message and message.sig
<mancha> that's just packaging
<shrysr> right
<mancha> shrysr: the point is not hard to understand really. we're hashing a message an mapping it to something in signature space, mapping the hash.
<shrysr> yea i just had a glimpse of it. Thats very interesting to think how such a map could be generated.

<mancha> now, on the receiver end, the receiver gets the message, also hashes it, then uses a verifier function on the hash and the signature)
<mancha> if the verification yields: non-forgery, you've authenticated it
<mancha> shrysr: these are so-called signatures with appendix
<mancha> so you need the pair (m,s) message and signature
<mancha> and you can verify
<mancha> now, m & s can come bundled or can come separately. that's up to you.
<mancha> that's just a transport issue.

<shrysr> but s does contain m itself anyway right ? and hmmm > the "receiver also hashes it.." Why does hashing happen here? shouldnt it be unhashing or verifying?

<Riastradh> No, s doesn't actually contain m; it just has enough information, combined with the public key, to verify that m was the original message.

<mancha> shrysr: say the message is m. so i sign it by generating s=S(h(m))
<mancha> i send you m and s
<mancha> you then compute h(m) and do V(h(m),s) and see if it responds: non-forgery
<mancha> maybe we should subscript these things. A is sender and B is recipient. so A sends m and s=S_{A,k}(h(m))
<mancha> and B verifies with V_A(h(m),s)
<mancha> V_A means using A's public key and S_{A,k} means signing with A's private keyt
<shrysr> ohhhhhhhhh i think i see it
<shrysr> when B computes h(m) : B uses ...some kindda key ?
<mancha> the public key comes into to play but not when hashing, at least not in my example
<mancha> you can look at DSA if you want to see one concrete example of this algorithmically
<mancha> wikipedia does an ok job of explaining the algo: https://en.wikipedia.org/wiki/Digital_Signature_Algorithm


Separate topic interleaved in the discussion

<Riastradh> mancha: (hashing is not a separate step; it is an essential part of the sign' operation)

<Riastradh> mancha: (it is orthogonal, yes; I'm saying the sign(hash(m))' decomposition is inaccurate: either the hash step you wrote is unnecessary, or sign' is not a secure signature scheme)

<Riastradh> mancha: If sign' has the standard EUF-CMA security expected of anything we call a signature scheme', then there is no need to hash a message first.  (Certainly _inside_ sign' there's going to be some kind of hashing, but it's not necessarily with a fixed hash function -- it may depend on the secret signing key.)


### WTF is a patch and how is it different from a commit? [0/0]

• Refer StackX discussion and howtogeek article
• Patch: the result of a diff operation. This can be interpreted as the list of differences between 2 commits. More than 2 commits / files can also be compared
• Commit: ~ changeset ~ . ‘Performing a commit’ creates a changeset, which is also called a commit.
• Side note: [2019-08-15 Thu] I’ve actually used diff several times in different softwares, including Emacs, and read man diff for the first time today.
• diff / patch
• a diff operation is solving the ‘longest common subsequence’ problem. i.e finding the longest sequence of common terms.
• The output format of diff can be customised in many ways.
• the patch command can use the output of diff as a set of instructions and apply the patch over the target file.

#### Example with plain text to grok patch and diff.

• Note taken on [2019-08-22 Thu]
New lines or individual lines were made to clearly view the changes line by line. Otherwise the entire paragraph is considered (unless I am missing a flag to consider a period as the end of a line and not a newline character)
/bin/bash
cat > ~/temp/a.txt << EOF
What is Lorem Ipsum?
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.
It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
EOF

cat ~/temp/a.txt
&#91;/sourcecode&#93;

<p> Creating b.txt with random changes (-deletions + additions)  in the text. </p>

/bin/bash
cat > ~/temp/b.txt << EOF
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
This is an entirely new line number 1.
Entirely new line number 2
EOF

cat ~/temp/b.txt
&#91;/sourcecode&#93;

<p> Creating third file with more random deletions and additions. </p>

/bin/bash
cat > ~/temp/c.txt << EOF
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
This is an entirely new line number 1.
Entirely new line number 2, with some modifications from b.
New line number 3.
EOF

cat ~/temp/c.txt
&#91;/sourcecode&#93;

&#91;sourcecode language="shell" title="" &#93;
cd ~/temp/
diff -uNr a.txt b.txt > d1.txt
cat d1.txt


Comparing c.txt to a.txt. Note that the first 2 lines always signify --- (first file) and +++ (second file). This is irrespective of additions or deletions. This only signifies the first and second file. This explains the diff pattern in vdiff.

cd ~/temp/
diff -uNr c.txt a.txt > d2.txt
cat d2.txt


### What then is V?

…text was deliberately structured with due consideration given not only to its meaning, but also to its source, and to its context. Prior attempts at structuring software, at first consisting of a naive approach focused on meaning only, over time added a half-hearted consideration of context, very unequally and rather haphazardly.

Used together with specialised scripts, V-genesis allows an agent to reconstruct a complete Bitcoin tree, verify its correctness, and manage his investment of trust at all junctures so that he is never required to implicitly trust either an unknown code author, or a code snippet of unknown provenance.

## 3 responses on “V – Exploration”

1. “The current plan is to implement V in Python (3?)” – no, certainly not Python 3. Python is frozen at version 2.7 (there is more in the logs but for a quick pointer: http://logs.minigame.biz/2016-11-28.log.html#t14:27:47).

It can help a lot with understanding to do your own implementation and if Python is your choice for that (i.e. it’s what you are most comfortable with currently) then go for it, sure. In the longer term, once you get V, the next step can easily be to use it as a spring-board to learn a bit of Ada too for instance. But this is not set in stone right now, it’s simply *one* possible direction.

2. The reference book for crypto is Handbook of Applied Cryptography by Menezes, van Oorschot and Vanstone, CRC Press. The book is quite compact for what it contains but very readable in my opinion and a good investment anyway because you will get back to it over and over again to work your way through the Maths for algorithms, if nothing else. And you’ll need it anyway when you get to study FFA, which is the next mandatory step after V (for a high-level summary re FFA: http://ossasepia.com/2019/07/29/overview-of-ffa-ch1-ch19/).

The trouble is that there aren’t in fact any serious books i.e. with actual proofs for all the various claims (more secure! better! whatever-else, difficult!) thrown left, right and centre. And this is especially true for hashing algorithms. For this reason, I would be especially weary of just “picking a book from the shelf and/or based on goodbooks recommendations” or similar because what passes for “serious cryptographer” is more about believing the stuff than being serious and rigorous about having a proof for it. For reference, here’s a #trilema thread: http://logs.nosuchlabs.com/log/trilema/2017-11-22#1742339