#+title: CS101 - Cognitive Science and Artificial Intelligence - Lecture 2
#+startup: beamer
#+latex_class: beamer
#+latex_class_options: [14pt,aspectratio=169,hyperref={pdfpagelabels=false},svgnames]
#+latex_header_extra: \usetheme{strath}
#+latex_header_extra: \usepackage{tikz}
#+latex_header_extra: \usepackage{pgfplots}
#+latex_header_extra: \usepackage{filecontents}
#+options: toc:nil

* CS101 - Cog. Sci. and AI
** Today
- Recap of assignment 1
- git
- Probability and text

* Assignment 1
- 74 students have attempted to submit the assignment
- of these, 72 have actually submitted the assignment

*This is only slightly more than half the class!*

Marking will happen this week.

If you do not do this assignment, you will not be able to submit any of the other assignments and you will get *0* for this module.

* Probability Distributions: Definition

* Probability Distributions: Definition
- Support :: a set $S$,
  $$ S = \left\{ x_1, x_2, \ldots, x_n \right\} $$
- Probability map :: a mapping
  $$ \Pr: S \rightarrow \left[0,1\right] $$
- Condition ::
  $$ \sum_{x \in S} \Pr(x) = 1 $$

* Probability Distributions: a Fair Coin
Consider a fair coin$\ldots$
$$ S_1 = \left\{ \mathtt{heads}, \mathtt{tails} \right\} $$
$$ \Pr_1(x) = \begin{cases} \frac{1}{2} & x \text{ is } \mathtt{heads}\\ \frac{1}{2} & x \text{ is } \mathtt{tails} \end{cases} $$
$$ \Pr_1(\mathtt{heads}) + \Pr_1(\mathtt{tails}) = \frac{1}{2} + \frac{1}{2} = 1 $$

* Probability Distributions: Two Fair Coins
\begin{align*}
S_2 = S_1 \times S_1 = \left\{& (\mathtt{heads}, \mathtt{heads}), (\mathtt{heads}, \mathtt{tails}),\right.\\
&\left.(\mathtt{tails}, \mathtt{heads}), (\mathtt{tails}, \mathtt{tails}) \right\}
\end{align*}
\begin{align*}
\Pr_2(x,y) &= \Pr_1(x)\Pr_1(y),\qquad\forall (x,y) \in S_2\\
&= \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}
\end{align*}

*Assumption:* these two coin tosses are /independent/

* Probability Distributions: Fair Dice
Consider a (fair) six-sided die,
$$ S_{d6} = \left\{1,2,3,4,5,6\right\} $$
$$ \Pr_\text{fair}(x) = \frac{1}{6}, \qquad\forall x \in S_{d6} $$

* Probability Distributions: Unfair Dice
Now consider an /unfair/ six-sided die,
$$ \Pr_\text{unfair}(x) = \begin{cases} \frac{1}{5} & x = 1\\ \frac{1}{10} & x \in \left\{2,3,4,5\right\}\\ \frac{2}{5} & x = 6 \end{cases} $$

* Questions
- Can we quantify /fairness/?
- Can we distinguish /information/ from random noise?

* Entropy
\begin{align*}
H(\Pr) &= -\sum_{x\in S} \Pr(x) \log_2 \Pr(x)\\
\Pr &: S \rightarrow \left[0,1\right]
\end{align*}

Define: $0 \log_2 0 \equiv 0$

* Entropy
\begin{center}
\vspace{-2\baselineskip}
\hspace{2ex}\includegraphics[height=\textheight]{./img/neglog.png}
\end{center}

* Entropy -- Fair Coin
\begin{align*}
H(\Pr_1) &= -\left[ \Pr_1(\mathtt{heads})\log_2 \Pr_1(\mathtt{heads}) + \Pr_1(\mathtt{tails})\log_2 \Pr_1(\mathtt{tails}) \right]\\
&= -\left[ \frac{1}{2}\log_2 \frac{1}{2} + \frac{1}{2}\log_2 \frac{1}{2}\right]\\
&= -2\cdot\frac{1}{2} \log_2 \frac{1}{2}\\
&= -\log_2 \frac{1}{2}\\
&= 1
\end{align*}

* Entropy -- Two Fair Coins
\begin{align*}
S_2 = \left\{& (\mathtt{heads}, \mathtt{heads}), (\mathtt{heads}, \mathtt{tails}),\right.\\
&\left. (\mathtt{tails}, \mathtt{heads}), (\mathtt{tails}, \mathtt{tails}) \right\}
\end{align*}
\begin{align*}
\Pr_2(x) &= \frac{1}{4}, \qquad\forall x \in S_2
\end{align*}

* Entropy -- Two Fair Coins
\begin{align*}
H(\Pr_2) &= -\left[ \Pr_2(\mathtt{HH})\log_2 \Pr_2(\mathtt{HH}) + \Pr_2(\mathtt{HT})\log_2 \Pr_2(\mathtt{HT}) +\right.\\
&\qquad\,\left. \Pr_2(\mathtt{TH})\log_2 \Pr_2(\mathtt{TH}) + \Pr_2(\mathtt{TT})\log_2 \Pr_2(\mathtt{TT})\right]\\
&= -4\cdot\frac{1}{4} \log_2 \frac{1}{4}\\
&= -\log_2 \frac{1}{4}\\
&= 2
\end{align*}

* Entropy -- Fair Dice
$$ H(\Pr_\text{fair}) = -\log_2 \frac{1}{6} \approx 2.58\ldots $$

* Entropy -- Unfair Dice
\begin{align*}
\Pr_\text{unfair}(S) &= \left\{\Pr_\text{unfair}(1), \Pr_\text{unfair}(2), \Pr_\text{unfair}(3), \Pr_\text{unfair}(4), \Pr_\text{unfair}(5), \Pr_\text{unfair}(6) \right\}\\
&= \left\{\frac{1}{5}, \frac{1}{10}, \frac{1}{10}, \frac{1}{10}, \frac{1}{10}, \frac{2}{5} \right\}
\end{align*}

* Entropy -- Unfair Dice
\begin{align*}
\Pr_\text{unfair}(S) &= \left\{\Pr_\text{unfair}(1), \Pr_\text{unfair}(2), \Pr_\text{unfair}(3), \Pr_\text{unfair}(4), \Pr_\text{unfair}(5), \Pr_\text{unfair}(6) \right\}\\
&= \left\{\frac{1}{5}, \frac{1}{10}, \frac{1}{10}, \frac{1}{10}, \frac{1}{10}, \frac{2}{5} \right\}
\end{align*}
\begin{align*}
H(\Pr_\text{unfair}) &= -\left[ \frac{1}{5}\log_2 \frac{1}{5} + 4\cdot\frac{1}{10}\log_2 \frac{1}{10} + \frac{2}{5}\log_2 \frac{2}{5} \right]\\
&\approx 2.32\ldots
\end{align*}

* Entropy -- Really Unfair Dice
What if...
$$ \Pr(x) = \begin{cases} 1 & x = 1\\ 0 & \text{otherwise} \end{cases} $$
\begin{align*}
H(\Pr) &= -1 \log_2 1 - 5\cdot 0\log_2 0\\
&= 0
\end{align*}

* Normalised Entropy
$$ \bar{H} = \frac{\text{entropy of a distribution}}{\text{entropy of the uniform distribution}} $$

* Normalised Entropy
$$ \bar{H} = \frac{\text{entropy of a distribution}}{\text{entropy of the uniform distribution}} $$

Recall that the entropy of the uniform distribution is just $-\log_2 \frac{1}{|S|}$

Notation -- $|S|$ is the number of elements in the support

* Normalised Entropy
For our unfair vs fair (uniform) dice,
$$ \bar{H}(\Pr_\text{unfair}) = \frac{H(\Pr_\text{unfair})}{-\log_2\frac{1}{|S_{d6}|}} = \frac{2.32\ldots}{2.58\ldots} = 0.898\ldots $$

* Normalised Entropy
There is a better concept for the relative entropy of two distributions, the *Kullback-Leibler divergence*; you would learn about it in a course on Information Theory. For our purposes, the normalised entropy will do.
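
* Entropy in Python
A minimal sketch of both formulas (illustrative only -- the course's own version is =entropy.py= in the repo, and the function names here are made up):
#+begin_src python
import math

def entropy(probs):
    # Shannon entropy in bits; we defined 0 log2 0 = 0,
    # so zero-probability outcomes are simply skipped
    return -sum(p * math.log2(p) for p in probs if p > 0)

def normalised_entropy(probs):
    # divide by the entropy of the uniform distribution, -log2(1/|S|)
    return entropy(probs) / math.log2(len(probs))

unfair = [1/5, 1/10, 1/10, 1/10, 1/10, 2/5]
print(entropy(unfair))             # ~2.32
print(normalised_entropy(unfair))  # ~0.898
#+end_src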
* So what about text?
Choose an alphabet
$$ S_\alpha = \mathtt{'abcdefghijklmnopqrstuvwxyz\_'} $$
(by '_' we mean a space)

Some light pre-processing:
- make all letters lower case
- ignore punctuation etc

* So what about text? Probability distribution
Choose a letter at random from a text. What is the chance you pick =e= or =q= or =' '= (space)?
- Support :: all (ASCII) letters + space
- Mapping ::
  $$ \Pr_\alpha(x) = \frac{\mathrm{count}(x)}{\mathrm{count}(\text{all letters})} $$

* Letter probabilities
#+begin_src
a 0.0654
b 0.0124
c 0.0214
d 0.0311
e 0.1061
f 0.0195
g 0.0144
h 0.0547
i 0.0604
j 0.0014
k 0.0043
l 0.0316
m 0.0196
n 0.0586
o 0.0633
p 0.0145
q 0.0009
r 0.0483
s 0.0537
t 0.0783
u 0.0236
v 0.0078
w 0.0177
x 0.0013
y 0.0164
z 0.0004
_ 0.1727
#+end_src

* Normalised entropy of letter probabilities
$$ H(\Pr_\alpha) = 4.095\ldots $$

Maximum entropy, $-\log_2 \frac{1}{27} \approx 4.754\ldots$

$$ \bar{H}(\Pr_\alpha) = \frac{4.095\ldots}{4.754\ldots} = 0.861\ldots $$

* Pair and conditional probability
\begin{align*}
\Pr(y | x) &\qquad\text{probability of }y\text{ given }x\\
\Pr(x,y) &\qquad\text{probability of seeing the pair }(x,y)
\end{align*}
\begin{align*}
\Pr(x,y) &= \Pr(y|x) \Pr(x)\\
\Pr(y) &= \sum_x \Pr(y|x) \Pr(x)
\end{align*}
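* Estimating these distributions in Python
A sketch of how $\Pr_\alpha(x)$ and $\Pr(y|x)$ can be estimated from a file (illustrative only -- =letters.py= in the repo is the course version, and all names below are made up for this example):
#+begin_src python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def clean(text):
    # light pre-processing: lower-case, keep only letters and spaces
    return [c for c in text.lower() if c in ALPHABET]

def letter_probs(chars):
    # Pr(x) = count(x) / count(all letters)
    counts = Counter(chars)
    total = sum(counts.values())
    return {x: n / total for x, n in counts.items()}

def conditional_probs(chars, prev):
    # Pr(y | x = prev): distribution over letters that follow prev
    followers = Counter(y for x, y in zip(chars, chars[1:]) if x == prev)
    total = sum(followers.values())
    return {y: n / total for y, n in followers.items()}

with open("republic.txt") as f:
    chars = clean(f.read())
print(letter_probs(chars)["e"])       # around 0.1 for English text
print(conditional_probs(chars, "q"))  # almost all mass on 'u'
#+end_src
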
* Assignment 2a - Probability and Text (10%)
Write a program named =assignment-2a= that takes a file and prints out a letter according to the distribution of letters in that file, e.g.
#+begin_src
xgb21195@cafe:~/cs101$ assignment-2a filename
e
xgb21195@cafe:~/cs101$
#+end_src

* Assignment 2a - Probability and Text (10%)
/hint: use the Python 3 built-in/ =random.choices= /function/

Note:
#+begin_src
random.choices(population, weights=None, ...)
#+end_src
- =population= $\rightarrow$ support
- =weights= $\rightarrow$ probabilities

https://docs.python.org/3/library/random.html

* Assignment 2b - Probability and Text (10%)
Write a program named =assignment-2b= that takes a file and a letter, and prints out a following letter according to the /conditional/ distribution of letters given the previous letter in that file, e.g.
#+begin_src
xgb21195@cafe:~/cs101$ ./assignment-2b filename e
r
xgb21195@cafe:~/cs101$
#+end_src

* Assignment 2c - Probability and Text (10%)
Write a program named =assignment-2c= that takes a filename and a number, and prints out a sequence of characters of that length according to the conditional distribution from 2b
#+begin_src
xgb21195@cafe:~/cs101$ ./assignment-2c filename 25
end usanve n imemas hely
xgb21195@cafe:~/cs101$
#+end_src

* Assignment 2
- Your programs *must* be named =assignment-2a=, =assignment-2b=, and =assignment-2c= in your git repository
- You can write your programs in whatever language you wish. If you use a compiled language, you must include a binary that runs on the CIS Linux machines, along with the source code and a Makefile to build it.
- You *must* make sure the programs run on the CIS Linux machines; if they do not, you will receive no marks, no exceptions.

* Assignment 2 bonus (10%)
Write a program like 2c that, instead of using a probability distribution conditioned on /one/ previous letter, conditions on the previous /two/ letters. Call this program =assignment-2d=

* Materials
https://gitlab.cis.strath.ac.uk/xgb21195/cs101-csai

look in the =lec2= subdirectory.

- letters.py :: definition of "letters" and functions to get distributions of letters from a file
- entropy.py :: implementation of the entropy function and an example of using it on a file (for letters)
- example.py :: examples of using these functions
- republic.txt :: Plato's Republic from Project Gutenberg

* Reading for next week
The Game, Anatoly Dneprov, 1961

http://q-bits.org/images/Dneprov.pdf
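
* Appendix: =random.choices= in one line
A tiny illustration of the hint from Assignment 2a -- drawing one element of a support according to given probabilities (the support and weights here are made up, not taken from any real text):
#+begin_src python
import random

support = ["a", "b", "c"]
probs = [0.2, 0.3, 0.5]

# random.choices returns a list of k samples (k=1 by default),
# so we unpack the single element
letter, = random.choices(support, weights=probs)
print(letter)
#+end_src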