This is an introductory course in computational linguistics for undergraduates. There are no prerequisites and there is no textbook. Students will learn to program using Python (3.x) and to use basic computational tools such as NLTK for language analysis. New for 2024, students will also study Large Language Models (LLMs) using Transformer tools. Both classroom lectures and computer laboratory exercises will be used. A term project illustrating the skills learnt is required. Students will need their own computer, preferably a laptop.
Software: We will use Python and nltk.
Instructor: Sandiway Fong (sandiway AT arizona.edu)
Office: 311 Douglass
See lecture 1 slides and syllabus.pdf.
Lecture notes are available in both Adobe PDF and Microsoft PowerPoint formats.
Date | Lecture Notes (PDF) | Lecture Notes (PowerPoint) | Number of Slides | Panopto | Topic |
---|---|---|---|---|---|
1/11 | lecture1.pdf | lecture1.pptx | 34 | Viewer | Administrivia and Introduction. Examples of things we'll be able to do with Python and nltk. Homework 1: Install Python 3 on your computer. |
1/16 | lecture2.pdf | lecture2.pptx | 39 | Viewer | The Google N-grams lecture. Quick Homework 2. |
1/18 | lecture3.pdf | lecture3.pptx | 36 | No recording available | Homework 2 review. Computers and supercomputers. The human brain. The Turing Machine. Binary and Hexadecimal. Numbers on a computer. Python and numbers. Computer MAC addresses. Homework 3. 6pm: corrected slide for due date. |
1/23 | lecture4.pdf | lecture4.pptx | 26 | Viewer | Homework 3 review. CPU datatypes: integers and floating point numbers. Character representation on a computer. ASCII. Unicode. UTF-8. BOM. |
1/25 | lecture5.pdf | lecture5.pptx | 17 | Viewer | Python: numbers and strings. Type coercion. Homework 4. |
1/30 | lecture6.pdf | lecture6.pptx | 22 | Viewer | Homework 4 review. Python: strings vs. lists. max(). sorted(). range(). sys.argv: command line arguments. Terminal log: terminal6.txt |
2/1 | lecture7.pdf | lecture7.pptx | 20 | Viewer | Recap: sorting on the command line. Modifying lists: append, extend, insert, and remove. Homework 5: install nltk. |
2/6 | lecture8.pdf | lecture8.pptx | 19 | Viewer | A note on python -m pip install. File: alice.txt Exercises: |
2/8 | lecture9.pdf | lecture9.pptx | 21 | Viewer | A bit more on eval(). Sorting: sorted() and list.sort() with the key= sort parameter. More on lists: stacks and queues, reversed() vs. .reverse(). Homework 5. Terminal log: terminal9.txt |
2/13 | lecture10.pdf | lecture10.pptx | | Viewer | Lecture canceled due to sickness. |
2/15 | lecture11.pdf | lecture11.pptx | 21 | Viewer | A recent trend: pricey vs. pricy. Homework 5 Review. Some extras: (1) Plotting x-y graphs directly using matplotlib.pyplot. (2) nltk.FreqDist() .freq(). Lists for queues: deque ("deck") from collections. An obscure way to reverse a list arange() from numpy Slides corrected: 2:05pm |
2/20 | lecture12.pdf | lecture12.pptx | 18 | Viewer | List comprehensions: conditional. A class exercise: taking out the punctuation. String methods: word.startswith(string), word.endswith(string) and word.istitle(). |
2/22 | lecture13.pdf | lecture13.pptx | 24 | Viewer | Announcement about next week. More complex patterns: regular expressions (regex). A class exercise: words that end in -ly. Homework 6. Terminal log: terminal13.txt |
2/27 | lecture14.pdf | lecture14.pptx | 23 | Viewer | Pre-recorded Lecture. Homework 6 review. More on regex: re.IGNORECASE, groups: a recap, \n and repeated groups, re.finditer() and words in context. Regex Exercises: extra credit only. |
2/29 | | | | Viewer | No Lecture. See above. |
3/5 | | | | Viewer | Spring Break: no class |
3/7 | | | | Viewer | Spring Break: no class |
3/12 | lecture15.pdf | lecture15.pptx | | Viewer | No lecture. Class canceled. |
3/14 | lecture16.pdf | lecture16.pptx | 28 | Viewer | Regex exercises: review. Stylometry: modern and old. Who wrote Wuthering Heights? Mendenhall: Mendenhall1887.pdf Terminal log: terminal16.txt |
3/19 | lecture17.pdf | lecture17.pptx | 15 | Viewer | Worked example for Mendenhall. Live programming, see terminal log. Oliver Twist: oliver_twist.txt Terminal log: terminal17.txt |
3/21 | lecture18.pdf | lecture18.pptx | 15 | Viewer | Homework 7. Parts 1, 2 and 3. Let's look at Mendenhall's claims about six-letter words and 100,000 words. Use Oliver Twist, Nicholas Nickleby and David Copperfield in this homework. |
3/26 | lecture19.pdf | lecture19.pptx | 35 | Viewer | Homework 7 Review. nltk.bigrams(). Generating random text using nltk.ConditionalFreqDist(). Slides updated: 2:30pm |
3/28 | lecture20.pdf | lecture20.pptx | 20 | Viewer | Homework 8 on bigrams and nltk.ConditionalFreqDist(). File: randomtext.py Other applications of nltk.ConditionalFreqDist(). Example: word length as the condition. Terminal log: terminal20.txt |
4/2 | lecture21.pdf | lecture21.pptx | 24 | Viewer | Homework 8 Review. Claude Shannon and his claims about n-gram language models. Bibliomancy. Random text generation using a trigram model. Terminal log: terminal21.txt |
4/4 | lecture22.pdf | lecture22.pptx | 21 | Viewer | A note on term projects. nltk.pos_tag(words). A worked example: compare the top-20 adjectives from Oliver Twist and Nicholas Nickleby. Terminal log: terminal22.txt Slides corrected: 1:55pm |
4/9 | lecture23.pdf | lecture23.pptx | 29 | Viewer | Artificial Intelligence: Large Language Models (LLMs), e.g. ChatGPT. Transformer Neural Net Architecture. Language Modeling Task: predict what's to come next. Masked Language Modeling Task: predict something surrounded by context. Masked Language Modeling Task: gender bias? Sentiment Analysis: positive/negative sentiment. |
4/11 | lecture24.pdf | lecture24.pptx | 37 | Viewer | Artificial Intelligence: Transformers contd. Summarization Task. A look at Word Embeddings: encoder Transformer. Homework 9. |
4/16 | lecture25.pdf | lecture25.pptx | 16 | Viewer | Homework 9 Review. Homework 10: Term project proposals please! Some final words about Masked Language Models. Syntax: let's try some parsers available online. Slides corrected: 4/17 6pm |
4/18 | lecture26.pdf | lecture26.pptx | 13 | Viewer | Syntax and nltk: writing grammar rules based on a parse. Senses for the preposition "with". Some hands-on programming. Terminal log: terminal26.txt Grammars built in class: g.txt / g2.txt |
4/23 | lecture27.pdf | lecture27.pptx | 25 | Viewer | ChatGPT and structural ambiguity. More grammar work with nltk: empty categories and ambiguity ("the chicken is ready to eat"). Grammars built in class: g3.txt Terminal log: terminal27.txt |
4/25 | lecture28.pdf | lecture28.pptx | 27 | Viewer | WordNet and nltk. File: fullhyponyms.py Unicode decoding: foreign characters with Python: some problems on macOS. Terminal log: terminal28.txt |
5/1 | lecture29.pdf | lecture29.pptx | 31 | Viewer | Continuing with WordNet: similarity measures. |
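
As a rough illustration of the bigram-based random text generation covered in the lectures of 3/26 through 4/2, here is a minimal sketch. It is not the course's randomtext.py and it does not use nltk: it assumes a plain dictionary of counters as a stand-in for what nltk.ConditionalFreqDist() provides. The function names (tokenize, bigram_table, generate) and the sample sentence are hypothetical, chosen only for this example.

```python
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_table(words):
    """Map each word to a Counter of the words that immediately follow it."""
    table = defaultdict(Counter)
    for first, second in zip(words, words[1:]):
        table[first][second] += 1
    return table


def generate(table, start, length, rng):
    """Return up to `length` words, picking each next word in proportion
    to how often it followed the current word in the training text."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = table.get(word)
        if not followers:  # dead end: nothing ever followed this word
            break
        choices = list(followers)
        weights = [followers[w] for w in choices]
        word = rng.choices(choices, weights=weights)[0]
        output.append(word)
    return output


# Hypothetical sample text; in class a full novel would be used instead.
sample = "the cat sat on the mat and the cat ate the fish"
words = tokenize(sample)
table = bigram_table(words)
generated = generate(table, "the", 8, random.Random(0))

print(table["the"].most_common())  # followers of "the", most frequent first
print(" ".join(generated))
```

With a text as short as this sample the output mostly recombines the original phrases; a longer text such as oliver_twist.txt gives the model more followers to choose from at each step.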