# CS101/3 - Cognitive Science and Artificial Intelligence
## Info
- **Lecture**
- Mondays 14:00-15:00 RC426
- **Labs:**
- LT1105, LT1201, LT1221, LT1320 :: Mondays 15:00-17:00
- **Demonstrators:**
- Adel Dadaa <[adel.dadaa@strath.ac.uk](mailto:adel.dadaa@strath.ac.uk)>
- Pat Prochacki <[pat.prochacki@strath.ac.uk](mailto:pat.prochacki@strath.ac.uk)>
- Tochukwu Umeasiegbu <[tochukwu.umeasiegbu@strath.ac.uk](mailto:tochukwu.umeasiegbu@strath.ac.uk)>
- **Links**
- [Course Mattermost Channel](https://mattermost.cis.strath.ac.uk/learning/channels/cs101-22-24)
## Marking Scheme
*Final due date for all assignments: 26 February*
- 10% :: Participation
- 30% :: Assignment 1 - [Yak Shaving](./cs101-csai-lec1.pdf)
- 20% :: Create a git repository with a file, and share it
- 10% :: Put a transcript of a session with the Emacs doctor in that file
- 30% :: Assignment 2 - [Probability and Text](./cs101-csai-lec1.pdf)
- 10% :: Write a program to output random characters
- 10% :: Write a program that, given a character, predicts the next character
- 10% :: Write a program to output a sequence of characters
- 10% (bonus) :: Write a program that outputs a sequence of characters conditional on the previous two characters
- 30% :: Assignment 3 - [Stochastic Parrot](./cs101-csai-assignment3.md)
- 10% :: Write a program to output random words
- 10% :: Write a program that, given a word, predicts the next word
- 10% :: Write a program to output a sequence of words
## Topics
- Yak Shaving - Software Engineering Tooling [(PDF)](./cs101-csai-lec1.pdf)
- Some Philosophical Experiments
- Symbol Manipulation and Logic
- Probability and Text prediction
- Vector Spaces and Word embedding
## Python Quick-Start Guide
[![Python Quick-Start Guide (video)](https://img.youtube.com/vi/r54u4z_qay0/0.jpg)](https://www.youtube.com/watch?v=r54u4z_qay0)
[Setting up Python - Write-Up](https://gitlab.cis.strath.ac.uk/xgb21195/cs101-csai/-/blob/main/setup.md?ref_type=heads)
#+title: CS101/3 - Cognitive Science and Artificial Intelligence
* Info
** Lecture
Mondays 14:00-15:00 UC201
** Labs:
- LT1105, LT1201, LT1221, LT1320 :: Mondays 15:00-17:00
*** Demonstrators:
- Pat Prochacki <[[mailto:pat.prochacki@strath.ac.uk][pat.prochacki@strath.ac.uk]]>
- Tochukwu Umeasiegbu <[[mailto:tochukwu.umeasiegbu@strath.ac.uk][tochukwu.umeasiegbu@strath.ac.uk]]>
- TBD
** Links
- [[https://mattermost.cis.strath.ac.uk/learning/channels/cs101-22-24][Course Mattermost Channel]]
* Marking Scheme
- 10% :: Participation
- 30% :: Assignment 1 - [[./cs101-csai-lec1.pdf][Yak Shaving]]
- 20% :: Create a git repository with a file, and share it
- 10% :: Put a transcript of a session with the Emacs doctor in that file
- 20% :: Assignment 2
- 20% :: Assignment 3
- 20% :: Assignment 4
* Topics
1. Yak Shaving - Software Engineering Tooling [[./cs101-csai-lec1.pdf][(PDF)]]
2. Some Philosophical Experiments
3. Symbol Manipulation and Logic
4. Probability and Text prediction
5. Vector Spaces and Word embedding
# CS101/3 - CS/AI Assignment 3 - The Stochastic Parrot
Assignment 3 is just like Assignment 2 but with words instead of letters.
You may wish to use the Python Natural Language Toolkit, NLTK
(https://www.nltk.org), to help break the text up into tokens (words).
You may adapt the helper functions from Assignment 2 to work with words
rather than letters, or you may write your own from scratch.
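A minimal tokenisation sketch, assuming NLTK is installed and its `punkt` tokeniser model has been downloaded (both assumptions; the file name `example.txt` is illustrative):

```python
import nltk

# One-time download of the tokeniser model (assumption: network access).
nltk.download("punkt")

with open("example.txt", encoding="utf-8") as fp:
    text = fp.read().lower()

# Keep only alphabetic tokens, mirroring the light pre-processing
# used for letters in Assignment 2.
words = [w for w in nltk.word_tokenize(text) if w.isalpha()]
print(words[:10])
```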
Optionally, instead of doing this programming exercise, you may write a
short essay, no longer than 2000 words on one of the topics that we have
touched on in the philosophy of mind:
- Turing's Imitation Game
- Dneprov's Game (or, equivalently, Searle's Chinese Room)
- Logical Behaviorism or Functionalism
- The Engineering End-Run
Assignment 3 is worth 30% in total.
## 3a word probabilities - 10%
Given an input text, compute the word probabilities and generate a word
according to that distribution, e.g.

```
xgb21195@cafe:~$ ./assignment-3a example.txt
the
```
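One possible shape for 3a, as a hedged sketch (the `file2words` helper is illustrative, not part of the course materials):

```python
#!/usr/bin/env python
import random
import sys

import nltk

def file2words(filename):
    """Return the lower-case word tokens of a file."""
    with open(filename, encoding="utf-8") as fp:
        text = fp.read().lower()
    return [w for w in nltk.word_tokenize(text) if w.isalpha()]

words = file2words(sys.argv[1])
# Choosing uniformly from the token list samples each word with
# probability proportional to how often it occurs in the text.
print(random.choice(words))
```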
## 3b conditional word probabilities - 10%
Given an input text and a word, generate the next word according to the
conditional probabilities in the text, e.g.

```
xgb21195@cafe:~$ ./assignment-3b example.txt the
cat
```
## 3c a stochastic parrot - 10%
Given an input text, generate a sentence of a particular length according
to the conditional word distributions, e.g.

```
xgb21195@cafe:~$ ./assignment-3c example.txt 10
the cat ate a burrito that is not a butterfly
```
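For 3b and 3c, the pair-counting idea from Assignment 2 carries over directly to words; a sketch of the conditional step, with illustrative function names:

```python
import random

def words2pairs(words):
    """Conditional distribution of the next word given the current word."""
    counts = {}
    for w, nxt in zip(words, words[1:]):
        d = counts.setdefault(w, {})
        d[nxt] = d.get(nxt, 0) + 1
    # Normalise each row of counts into probabilities.
    return {w: {n: c / sum(d.values()) for n, c in d.items()}
            for w, d in counts.items()}

def next_word(conds, word):
    """Sample the next word from the conditional distribution."""
    dist = conds[word]
    choice, = random.choices(list(dist.keys()), weights=list(dist.values()))
    return choice
```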
@@ -35,18 +35,24 @@ Room 12.20 Livingstone Tower
* Who are your tutors
- Adel Dadaa <[[mailto:adel.dadaa@strath.ac.uk][adel.dadaa@strath.ac.uk]]>
- Pat Prochacki <[[mailto:pat.prochacki@strath.ac.uk][pat.prochacki@strath.ac.uk]]>
- Tochukwu Umeasiegbu <[[mailto:tochukwu.umeasiegbu@strath.ac.uk][tochukwu.umeasiegbu@strath.ac.uk]]>
- TBD
* Where is this course?
*** Lecture:
Mondays 14:00-15:00 UC201
- Lecture :: Mondays 14:00-15:00 RC426
- Zoom :: https://strath.zoom.us/j/88271723449
- Meeting ID :: 882 7172 3449
- Password :: 964919
*** Labs:
Mondays 15:00-17:00 in LT1105, LT1201, LT1221, LT1320
* Illness - COVID, Flu, etc
*If you are unwell, do not come to lectures or labs*
* Where is this course?
** Topic 3 repository
@@ -57,11 +63,22 @@ Mondays 15:00-17:00 in LT1105, LT1201, LT1221, LT1320
** Mattermost
- https://mattermost.cis.strath.ac.uk/learning/channels/cs101-22-24
- Discussion, asynchronous Q&A, mutual assistance
* Marking Scheme
* Subject matter
1. Software Engineering Tooling
2. Some Philosophical Experiments: Turing's Imitation Game, Searle's
Chinese Room, Weizenbaum's ELIZA, the "Engineering End-Run"
3. Symbol Manipulation and Logic: Before the AI Winter, Expert Systems
4. Probability and Text prediction: Statistical methods for text
processing and machine translation
5. Optimisation, Vector Spaces and Word embedding
6. Large Language Models, GPT etc
* Marking scheme
- 10% :: Participation
- 30% :: Assignment 1
- 30% :: Assignment 1 - Yak Shaving
- 20% :: Assignment 2
- 20% :: Assignment 3
- 20% :: Assignment 4
@@ -70,6 +87,14 @@ Mondays 15:00-17:00 in LT1105, LT1201, LT1221, LT1320
- Each student is responsible for their own assignments
- Working in groups is allowed but not required
* Lab work on Linux
- The CIS-managed computer labs on the 11th, 12th, and 13th floor of
Livingstone Tower
- Remote graphical connection via https://guacamole.cis.strath.ac.uk/
- Remote terminal connection =ssh cafe.cis.strath.ac.uk=
- Documentation: https://docs.cis.strath.ac.uk/remote-compute/
* Yak Shaving
\begin{center}
@@ -90,6 +115,9 @@ Submit coursework in a way that:
1. Has something to do with software development practice
2. Makes it possible to automate marking
We will not be using MyPlace; we will use the department Gitlab server
instead.
* Git...
#+title: CS101 - Cognitive Science and Artificial Intelligence - Lecture 2
#+startup: beamer
#+latex_class: beamer
#+latex_class_options: [14pt,aspectratio=169,hyperref={pdfpagelabels=false},svgnames]
#+latex_header_extra: \usetheme{strath}
#+latex_header_extra: \usepackage{tikz}
#+latex_header_extra: \usepackage{pgfplots}
#+latex_header_extra: \usepackage{filecontents}
#+options: toc:nil
* CS101 - Cog. Sci. and AI
** Today
- Recap of assignment 1 - git
- Probability and text
* Assignment 1
- 74 students have attempted to submit the assignment
- of these, 72 have actually submitted the assignment
*This is only slightly more than half the class!*
Marking will begin this week.
If you are having trouble, *ask a demonstrator for help*.
If you do not do this assignment, you will not be able to submit any
of the other assignments and you will get *0* for this module.
* Probability Distributions: Definition
- Support :: a set $S$,
$$
S = \left\{ x_1, x_2, \ldots, x_n \right\}
$$
- Probability map :: a mapping
$$
\Pr: S \rightarrow \left[0,1\right]
$$
- Condition ::
$$
\sum_{x \in S} \Pr(x) = 1
$$
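In the code for this course a distribution is represented as a Python dict from support to probability (see =letters.py= and =entropy.py= in the course materials); a minimal sketch of the condition:

#+begin_src python
# A distribution as a dict: keys are the support, values the probabilities.
fair_coin = {"heads": 0.5, "tails": 0.5}

# The probabilities must sum to 1 (up to floating-point error).
assert abs(sum(fair_coin.values()) - 1.0) < 1e-9
#+end_src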
* Probability Distributions: a Fair Coin
Consider a fair coin$\ldots$
$$
S_1 = \left\{ \mathtt{heads}, \mathtt{tails} \right\}
$$
$$
\Pr_1(x) = \begin{cases}
\frac{1}{2} & x\,is\,\mathtt{heads}\\
\frac{1}{2} & x\,is\,\mathtt{tails}
\end{cases}
$$
$$
\Pr_1(\mathtt{heads}) + \Pr_1(\mathtt{tails}) = \frac{1}{2} + \frac{1}{2} = 1
$$
* Probability Distributions: Two Fair Coins
\begin{align*}
S_2 = S_1 \times S_1 = \left\{&
(\mathtt{heads}, \mathtt{heads}), (\mathtt{heads}, \mathtt{tails}),\right.\\
&\left.(\mathtt{tails}, \mathtt{heads}), (\mathtt{tails}, \mathtt{tails})
\right\}
\end{align*}
\begin{align*}
\Pr_2(x,y) &= \Pr_1(x)\Pr_1(y),\qquad\forall (x,y) \in S_2\\
&= \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}
\end{align*}
*Assumption:* these two coin tosses are /independent/
* Probability Distributions: Fair Dice
Consider a (fair) six-sided die,
$$
S_{d6} = \left\{1,2,3,4,5,6\right\}
$$
$$
\Pr_\text{fair}(x) = \frac{1}{6}, \qquad\forall x \in S_{d6}
$$
* Probability Distributions: Unfair Dice
Now consider an /unfair/ six-sided die,
$$
\Pr_\text{unfair}(x) = \begin{cases}
\frac{1}{5} & x = 1\\
\frac{1}{10} & x \in \left\{2,3,4,5\right\}\\
\frac{2}{5} & x = 6
\end{cases}
$$
* Questions
- Can we quantify /fairness/?
- Can we distinguish /information/ from random noise?
* Entropy
\begin{align*}
H(\Pr) &= -\sum_{x\in S} \Pr(x) \log_2 \Pr(x)\\
\Pr &: S \rightarrow \left[0,1\right]
\end{align*}
Define: $ 0 \log_2 0 \equiv 0 $
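A direct transcription of this definition (a sketch; the course =entropy.py= in the materials does the same):

#+begin_src python
from math import log

def entropy(dist):
    # dist maps each element of the support to its probability;
    # skipping p == 0 implements the convention 0 log2 0 = 0.
    return -sum(p * log(p, 2) for p in dist.values() if p > 0)

print(entropy({"heads": 0.5, "tails": 0.5}))  # 1.0
#+end_src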
* Entropy
\begin{center}
\vspace{-2\baselineskip}
\hspace{2ex}\includegraphics[height=\textheight]{./img/neglog.png}
\end{center}
* Entropy -- Fair Coin
\begin{align*}
H(\Pr_1)
&= -\left[ \Pr_1(\mathtt{heads})\log_2 \Pr_1(\mathtt{heads}) +
\Pr_1(\mathtt{tails})\log_2 \Pr_1(\mathtt{tails}) \right]\\
&= -\left[ \frac{1}{2}\log_2 \frac{1}{2} + \frac{1}{2}\log_2 \frac{1}{2}\right]\\
&= -2 \frac{1}{2} \log_2 \frac{1}{2}\\
&= -\log_2 \frac{1}{2}\\
&= 1
\end{align*}
* Entropy -- Two Fair Coins
\begin{align*}
S_2 = \left\{&
(\mathtt{heads}, \mathtt{heads}), (\mathtt{heads}, \mathtt{tails}),\right.\\
&\left. (\mathtt{tails}, \mathtt{heads}), (\mathtt{tails}, \mathtt{tails})
\right\}
\end{align*}
\begin{align*}
\Pr_2(x) &= \frac{1}{4}, \qquad\forall x \in S_2
\end{align*}
* Entropy -- Two Fair Coins
\begin{align*}
H(\Pr_2)
&= -\left[ \Pr_2(\mathtt{HH})\log_2 \Pr_2(\mathtt{HH}) +
\Pr_2(\mathtt{HT})\log_2 \Pr_2(\mathtt{HT}) +\right.\\
&\qquad\,\left.
\Pr_2(\mathtt{TH})\log_2 \Pr_2(\mathtt{TH}) +
\Pr_2(\mathtt{TT})\log_2 \Pr_2(\mathtt{TT})\right]\\
&= -4 \frac{1}{4} \log_2 \frac{1}{4}\\
&= -\log_2 \frac{1}{4}\\
&= 2
\end{align*}
* Entropy -- Fair Dice
$$
H(\Pr_\text{fair}) = -\log_2 \frac{1}{6} \approx 2.58\ldots
$$
* Entropy -- Unfair Dice
\begin{align*}
\Pr_\text{unfair}(S) &= \left\{\Pr_\text{unfair}(1), \Pr_\text{unfair}(2), \Pr_\text{unfair}(3), \Pr_\text{unfair}(4), \Pr_\text{unfair}(5), \Pr_\text{unfair}(6) \right\}\\
&= \left\{\frac{1}{5}, \frac{1}{10}, \frac{1}{10}, \frac{1}{10}, \frac{1}{10}, \frac{2}{5} \right\}
\end{align*}
* Entropy -- Unfair Dice
\begin{align*}
\Pr_\text{unfair}(S) &= \left\{\Pr_\text{unfair}(1), \Pr_\text{unfair}(2), \Pr_\text{unfair}(3), \Pr_\text{unfair}(4), \Pr_\text{unfair}(5), \Pr_\text{unfair}(6) \right\}\\
&= \left\{\frac{1}{5}, \frac{1}{10}, \frac{1}{10}, \frac{1}{10}, \frac{1}{10}, \frac{2}{5} \right\}
\end{align*}
\begin{align*}
H(\Pr_\text{unfair}) &= -\left[
\frac{1}{5}\log_2 \frac{1}{5} + 4\frac{1}{10}\log_2 \frac{1}{10} + \frac{2}{5}\log_2 \frac{2}{5}
\right]\\
&\approx 2.32\ldots
\end{align*}
* Entropy -- Really Unfair Dice
What if...
$$
\Pr(x) = \begin{cases}
1 & x = 1\\
0 & \text{otherwise}
\end{cases}
$$
\begin{align*}
H(\Pr) &= -1 \log_2 1 - 5\cdot0\log_2 0\\
&= 0
\end{align*}
* Normalised Entropy
$$
\bar{H} = \frac{\text{entropy of a distribution}}{\text{entropy of the uniform distribution}}
$$
* Normalised Entropy
$$
\bar{H} = \frac{\text{entropy of a distribution}}{\text{entropy of the uniform distribution}}
$$
Recall that the entropy of the uniform distribution is just $-\log_2 \frac{1}{|S|}$
Notation -- $|S|$ is the number of elements in the support
* Normalised Entropy
For our unfair vs fair (uniform) dice,
$$
\bar{H}(\Pr_\text{unfair}) = \frac{H(\Pr_\text{unfair})}{-\log_2\frac{1}{|S_{d6}|}} = \frac{2.32\ldots}{2.58\ldots} = 0.898\ldots
$$
* Normalised Entropy
There is a better notion of the relative entropy of two distributions,
the *Kullback-Leibler divergence*. You would learn about this in a
course on Information Theory.
For our purposes, the normalised entropy will do.
* So what about text? Choose an alphabet
$$
S_\alpha = \mathtt{'abcdefghijklmnopqrstuvwxyz\_'}
$$
(by '_' we mean a space)
Some light pre-processing:
- make all letters lower case
- ignore punctuation etc
* So what about text? Probability distribution
Choose a letter at random from a text.
What is the chance you pick =e= or =q= or =' '= (space)?
- Support :: all (ascii) letters + space
- Mapping ::
$$
\Pr_\alpha(x) = \frac{\mathrm{count}(x)}{\mathrm{count}(\text{all letters})}
$$
* Letter probabilities
#+begin_src
a 0.0654 b 0.0124 c 0.0214 d 0.0311 e 0.1061
f 0.0195 g 0.0144 h 0.0547 i 0.0604 j 0.0014
k 0.0043 l 0.0316 m 0.0196 n 0.0586 o 0.0633
p 0.0145 q 0.0009 r 0.0483 s 0.0537 t 0.0783
u 0.0236 v 0.0078 w 0.0177 x 0.0013 y 0.0164
z 0.0004 _ 0.1727
#+end_src
* Normalised entropy of letter probabilities
$$
H(\Pr_\alpha) = 4.095\ldots
$$
Maximum entropy, $-\log_2 \frac{1}{27} \approx 4.754\ldots$
$$
\bar{H}(\Pr_\alpha) = \frac{4.095\ldots}{4.754\ldots} = 0.861\ldots
$$
* Pair and conditional probability
\begin{align*}
\Pr(y | x) &\qquad\text{probability of }y\text{ given }x\\
\Pr(x,y) &\qquad\text{probability of seeing the pair }(x,y)
\end{align*}
\begin{align*}
\Pr(x,y) &= \Pr(y|x) \Pr(x)\\
\Pr(y) &= \sum_x \Pr(y|x) \Pr(x)
\end{align*}
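A tiny numeric check of the two identities, using a made-up conditional distribution over a two-symbol alphabet (all numbers illustrative):

#+begin_src python
# Hypothetical marginal Pr(x) and conditional Pr(y|x); numbers made up.
p_x = {"a": 0.6, "b": 0.4}
p_y_given_x = {"a": {"a": 0.3, "b": 0.7},
               "b": {"a": 0.8, "b": 0.2}}

# Pr(x, y) = Pr(y|x) Pr(x)
p_xy = {(x, y): p_y_given_x[x][y] * p_x[x]
        for x in p_x for y in p_y_given_x[x]}

# Pr(y) = sum over x of Pr(y|x) Pr(x)
p_y = {y: sum(p_y_given_x[x][y] * p_x[x] for x in p_x) for y in "ab"}

print(sum(p_xy.values()))  # 1.0: the joint is a distribution
print(p_y)                 # {'a': 0.5, 'b': 0.5}
#+end_src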
* Assignment 2a - Probability and Text (10%)
Write a program named =assignment-2a= that takes a file and
prints out a letter according to the distribution of letters in
that file, e.g.
#+begin_src
xgb21195@cafe:~/cs101$ assignment-2a filename
e
xgb21195@cafe:~/cs101$
#+end_src
* Assignment 2a - Probability and Text (10%)
/hint: use the Python 3 built in/ =random.choices= /function/
Note:
#+begin_src
random.choices(population, weights=None, ...)
#+end_src
- =population= $\rightarrow$ support
- =weights= $\rightarrow$ probabilities
https://docs.python.org/3/library/random.html
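Putting the hint together with =file2prob= from the course materials (a sketch; error handling omitted):

#+begin_src python
#!/usr/bin/env python
import random
import sys

from letters import file2prob  # helper from the lec2 materials

probs = file2prob(sys.argv[1])
# random.choices returns a list of k samples (k=1 by default);
# unpack the single letter drawn according to the weights.
letter, = random.choices(list(probs.keys()), weights=list(probs.values()))
print(letter)
#+end_src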
* Assignment 2b - Probability and Text (10%)
Write a program named =assignment-2b= that takes a file and a
letter and prints out a following letter according to the
/conditional/ distribution of letters given the previous letter in
that file, e.g.
#+begin_src
xgb21195@cafe:~/cs101$ ./assignment-2b filename e
r
xgb21195@cafe:~/cs101$
#+end_src
* Assignment 2c - Probability and Text (10%)
Write a program named =assignment-2c= that takes a filename and a
number and prints out a sequence of characters according to the
conditional distribution from 2b
#+begin_src
xgb21195@cafe:~/cs101$ ./assignment-2c filename 25
end usanve n imemas hely
xgb21195@cafe:~/cs101$
#+end_src
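One way to structure 2c, sketched on top of =file2pairs= from the course materials (the choice of starting letter is an assumption; the spec leaves it open):

#+begin_src python
#!/usr/bin/env python
import random
import sys

from letters import file2pairs  # helper from the lec2 materials

conds = file2pairs(sys.argv[1])
n = int(sys.argv[2])

# Start from an arbitrary letter, then repeatedly sample the next
# letter from the conditional distribution given the current one.
current = random.choice(list(conds.keys()))
out = [current]
for _ in range(n - 1):
    dist = conds[current]
    current, = random.choices(list(dist.keys()), weights=list(dist.values()))
    out.append(current)
print("".join(out))
#+end_src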
* Assignment 2
- Your programs *must* be named =assignment-2a=, =assignment-2b=, and
  =assignment-2c= in your git repository
- You can write your programs in whatever language you wish. If you
  use a compiled language, you must include a binary that runs on the
  CIS linux machines, together with the source code and a Makefile to
  build it.
- You *must* make sure the programs run on the CIS linux machines; if
  they do not, you will receive no marks, no exceptions.
* Assignment 2 bonus (10%)
Write a program like 2c that, instead of using probabilities
conditional on /one/ previous letter, conditions on the previous /two/
letters.
Call this program =assignment-2d=
* Materials
https://gitlab.cis.strath.ac.uk/xgb21195/cs101-csai
look in the =lec2= subdirectory.
- letters.py :: definition of "letters" and functions to get
  distributions of letters from a file
- entropy.py :: implementation of the entropy function and an example
  of using it on a file (for letters)
- example.py :: examples of using these functions
- republic.txt :: Plato's Republic from Project Gutenberg
* Reading for next week
The Game, Anatoly Dneprov, 1961
http://q-bits.org/images/Dneprov.pdf
Files added:
- img/idlewin.png (34.3 KiB)
- img/neglog.png (15.6 KiB)
- img/pycharmwin.png (123 KiB)
- img/pythoncmd.png (63.5 KiB)
- img/vscodewin.png (46.2 KiB)

#!/usr/bin/env python
from letters import file2prob, file2pairs
from math import log

def entropy(dist):
    """
    Calculate the entropy of a probability distribution. The
    probability distribution is given as a dictionary where
    the keys are the support and the values are the probabilities
    """
    return -sum(p*log(p, 2) for p in dist.values() if p > 0)

if __name__ == '__main__':
    import sys
    if len(sys.argv) != 2:
        print("Usage: %s filename" % sys.argv[0])
        sys.exit(-1)
    filename = sys.argv[1]
    probs = file2prob(filename)
    conds = file2pairs(filename)
    # Joint probabilities Pr(x, y) = Pr(y|x) Pr(x), flattened to a list
    cc = sum([[probs[c]*d for d in conds[c].values()]
              for c in conds], [])
    centropy = -sum(p*log(p, 2) for p in cc if p > 0)
    # Normalised entropy of single letters and of letter pairs
    print(entropy(probs)/-log(1/len(probs), 2),
          centropy/-log(1/len(cc), 2))
#!/usr/bin/env python
from entropy import entropy

fair_coin = { "heads": 0.5, "tails": 0.5 }
two_coins = { "HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25 }
fair_die = { 1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6 }
unfair_die = { 1: 1/5, 2: 1/10, 3: 1/10, 4: 1/10, 5: 1/10, 6: 2/5 }

print("Entropy of a fair coin:", entropy(fair_coin))
print("Entropy of two fair coins:", entropy(two_coins))
print("Entropy of a fair six-sided die:", entropy(fair_die))
print("Entropy of an unfair six-sided die:", entropy(unfair_die))

from letters import file2prob, letters

print("Entropy of the uniform alphabet:",
      entropy({ c: 1/len(letters) for c in letters }))

probs = file2prob("republic.txt")

# Format the letter probabilities as a grid for the lecture slides
lines = []
for i in range(6):
    line = []
    for j in range(5):
        try:
            letter = letters[i*5+j]
        except IndexError:
            continue
        line.append("%s %.04f" % (letter, probs[letter]))
    lines.append(" ".join(line))
print("#+begin_src")
print("\n".join(lines))
print("#+end_src")

print("Entropy of letters in Plato's Republic:", entropy(probs))
#!/usr/bin/env python
letters = 'abcdefghijklmnopqrstuvwxyz '

def file2prob(filename):
    """
    Read a file and return a dictionary of letters and
    their probabilities
    """
    letter_dict = {}
    letter_total = 0
    with open(filename, encoding="utf-8") as fp:
        for c in fp.read():
            if c.lower() not in letters:
                continue
            c = c.lower()
            letter_dict[c] = letter_dict.get(c, 0) + 1
            letter_total += 1
    probs = { c: letter_dict[c]/letter_total for c in letter_dict }
    return probs

def file2pairs(filename):
    """
    Read a file and return a dictionary of letters and
    the probabilities of following letters. That is, the
    conditional probability of a letter given its
    predecessor.
    """
    letter_dict = {}
    previous = None
    with open(filename, encoding="utf-8") as fp:
        for c in fp.read():
            c = c.lower()
            if c not in letters:
                continue
            if previous is None:
                previous = c
                continue
            d = letter_dict.setdefault(previous, {})
            d[c] = d.get(c, 0) + 1
            previous = c
    probs = { c: { d: letter_dict[c][d]/sum(letter_dict[c].values())
                   for d in letter_dict[c] } for c in letter_dict }
    return probs

if __name__ == '__main__':
    import sys
    if len(sys.argv) != 3:
        print("Usage: %s filename letter" % sys.argv[0])
        sys.exit(-1)
    filename = sys.argv[1]
    letter = sys.argv[2]
    probs = file2prob(filename)
    print(probs[letter])
aabbcc
abcd
abcda