To my linguistics homepage

LING 388
Computers and Language
Spring 2024

This is an introductory course in computational linguistics for undergraduates. There are no prerequisites and no textbook. Students will learn to program in Python (3.x) and to use basic computational tools such as NLTK for language analysis. New for 2024, students will also study Large Language Models (LLMs) using Transformer tools.

Both classroom lectures and computer laboratory exercises will be used. A term project illustrating the skills learnt is required.

Students will need their own computer, preferably a laptop.

Software

We will use Python and nltk.

Examples

10K word chunks, word length frequency distributions.
Universal Declaration of Human Rights: How many words?
English names ending in?
Stream of consciousness and sentence length.
The Buffalo sentence diagrammed.
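
As a rough illustration of the first example above, a word length frequency distribution can be sketched in plain Python. This is only a sketch under stated assumptions: the course itself uses nltk, whereas this version uses collections.Counter from the standard library as a stand-in, and the sample sentence and whitespace tokenization are invented for illustration.

```python
from collections import Counter


def word_length_distribution(text):
    """Count how many words of each length occur in text."""
    # Naive whitespace tokenization (an assumption; punctuation is not stripped).
    words = text.split()
    return Counter(len(word) for word in words)


# Hypothetical sample text, not one of the course files.
sample = "the quick brown fox jumps over the lazy dog"
dist = word_length_distribution(sample)

# Print each word length with its count, shortest first.
for length in sorted(dist):
    print(length, dist[length])
```

On a real text, the same function could be applied to each 10K-word chunk in turn to compare distributions.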

Instructor: Sandiway Fong sandiway AT arizona.edu
Office: 311 Douglass

Administrivia

Location: Emil W. Haury Anthropology Building, Rm 219
Time: Tuesday-Thursday 12:30-1:45 pm

Syllabus

See lecture 1 slides and syllabus.pdf.

Lecture Notes

Available in both Adobe PDF and Microsoft PowerPoint formats.

January

Date   PDF   PowerPoint   Number of Slides   Panopto   Topic
1/11 lecture1.pdf lecture1.pptx 34 Viewer Administrivia and Introduction.
Examples of things we'll be able to do with Python and nltk.
Homework 1: Install Python 3 on your computer.
1/16 lecture2.pdf lecture2.pptx 39 Viewer The Google N-grams lecture.
Quick Homework 2.
1/18 lecture3.pdf lecture3.pptx 36 No recording available Homework 2 review.
Computers and supercomputers. The human brain. The Turing Machine.
Binary and Hexadecimal. Numbers on a computer. Python and numbers. Computer MAC addresses.
Homework 3.
6pm: corrected slide for due date.
1/23 lecture4.pdf lecture4.pptx 26 Viewer Homework 3 review.
CPU datatypes: integers and floating point numbers.
Character representation on a computer. ASCII. Unicode. UTF-8. BOM.
1/25 lecture5.pdf lecture5.pptx 17 Viewer Python: numbers and strings. Type coercion.
Homework 4.
1/30 lecture6.pdf lecture6.pptx 22 Viewer Homework 4 review.
Python: strings vs. lists. max(). sorted(). range(). sys.argv: command line arguments.
Terminal log: terminal6.txt
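
Several of the January topics (binary and hexadecimal, type coercion, Unicode and UTF-8) can be tried directly at the Python prompt. The following is a small sketch using only built-in functions; the particular values are chosen for illustration and are not taken from the lecture slides.

```python
# Binary and hexadecimal representations of an integer.
n = 255
binary = format(n, "b")        # base 2, no "0b" prefix
hexadecimal = format(n, "x")   # base 16, no "0x" prefix
back = int(hexadecimal, 16)    # parse a base-16 string back into an int

# Type coercion between strings and numbers must be explicit in Python.
total = int("42") + 1          # string to integer, then arithmetic
joined = str(42) + "1"         # integer to string, then concatenation

# Character representation: code point vs. UTF-8 bytes.
char = "\u00e9"                # e with acute accent, written as an escape
code_point = ord(char)         # the Unicode code point as an integer
utf8_bytes = char.encode("utf-8")  # two bytes in UTF-8

print(binary, hexadecimal, back)
print(total, joined)
print(code_point, utf8_bytes, len(utf8_bytes))
```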

February

Date   PDF   PowerPoint   Number of Slides   Panopto   Topic
2/1 lecture7.pdf lecture7.pptx 20 Viewer Recap: sorting on the command line
Modifying lists: append, extend, insert, and remove.
Homework 5: install nltk
2/6 lecture8.pdf lecture8.pptx 19 Viewer A note on python -m pip install.
File: alice.txt
Exercises:
  1. alice.txt: string to words, counting, averages.
  2. Looking at the full text from nltk.corpus.gutenberg.
  3. Using nltk.FreqDist(corpus)
  4. set = vocabulary
  5. open() and .read() for text files.
Terminal log: terminal8.txt
2/8 lecture9.pdf lecture9.pptx 21 Viewer A bit more on eval().
Sorting: sorted() and list.sort() with the key= parameter.
More on lists: stacks and queues, reversed() vs. .reverse().
Homework 5
Terminal log: terminal9.txt
2/13 lecture10.pdf lecture10.pptx Viewer Lecture canceled due to sickness.
2/15 lecture11.pdf lecture11.pptx 21 Viewer A recent trend: pricey vs. pricy.
Homework 5 Review. Some extras: (1) Plotting x-y graphs directly using matplotlib.pyplot. (2) nltk.FreqDist() .freq().
Lists for queues: deque ("deck") from collections.
An obscure way to reverse a list
arange() from numpy
Slides corrected: 2:05pm
2/20 lecture12.pdf lecture12.pptx 18 Viewer List comprehensions: conditional.
A class exercise: taking out the punctuation.
String methods: word.startswith(string), word.endswith(string) and word.istitle().
2/22 lecture13.pdf lecture13.pptx 24 Viewer Announcement about next week.
More complex patterns: regular expressions (regex)
A class exercise: words that end in -ly.
Homework 6
Terminal log: terminal13.txt
2/27 lecture14.pdf lecture14.pptx 23 Viewer Pre-recorded Lecture.
Homework 6 review.
More on regex: re.IGNORECASE, groups: a recap, \n and repeated groups, re.finditer() and words in context.
Regex Exercises: extra credit only.
2/29     Viewer No Lecture. See above.
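
The class exercise on words that end in -ly can be approximated with a short regular expression. This is a sketch only: the pattern and the sample sentence are assumptions made for illustration, not the solution given in lecture.

```python
import re

# Hypothetical sample sentence, not from a course file.
text = "She spoke softly, he left early, and FINALLY it ended."

# \b marks a word boundary; \w+ly matches one or more word characters
# followed by "ly" at the end of the word.
pattern = r"\b\w+ly\b"

# re.IGNORECASE lets the pattern match "FINALLY" as well.
ly_words = re.findall(pattern, text, flags=re.IGNORECASE)
case_sensitive = re.findall(pattern, text)

print(ly_words)
print(case_sensitive)
```

Note that a pattern like this also matches words such as "only" or "family", where -ly is not an adverbial suffix, so the results usually need a second look.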

March

Date   PDF   PowerPoint   Number of Slides   Panopto   Topic
3/5 Viewer Spring Break: no class
3/7 Viewer Spring Break: no class
3/12 lecture15.pdf lecture15.pptx Viewer No lecture. Class canceled.
3/14 lecture16.pdf lecture16.pptx 28 Viewer Regex exercises: review.
Stylometry: modern and old.
Who wrote Wuthering Heights?
Mendenhall: Mendenhall1887.pdf
terminal16.txt
3/19 lecture17.pdf lecture17.pptx 15 Viewer Worked example for Mendenhall. Live programming, see terminal log.
Oliver Twist: oliver_twist.txt
terminal17.txt
3/21 lecture18.pdf lecture18.pptx 15 Viewer Homework 7. Parts 1, 2 and 3.
Let's look at Mendenhall's claims about six-letter words and 100,000 words.
Use Oliver Twist, Nicholas Nickleby and David Copperfield in this homework.
3/26 lecture19.pdf lecture19.pptx 35 Viewer Homework 7 Review.
nltk.bigrams(). Generating random text using nltk.ConditionalFreqDist().
Slides updated: 2:30pm
3/28 lecture20.pdf lecture20.pptx 20 Viewer Homework 8 on bigrams and nltk.ConditionalFreqDist().
File: randomtext.py
Other applications of nltk.ConditionalFreqDist(). Example: word length as the condition.
Terminal log: terminal20.txt
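
The idea behind generating random text from bigrams can be sketched without nltk. The version below is an assumption-laden stand-in, not the course's randomtext.py: it uses a defaultdict of Counter objects in place of nltk.ConditionalFreqDist(), zip() in place of nltk.bigrams(), and a made-up toy sentence as the corpus. The helper names build_model and generate are hypothetical.

```python
import random
from collections import Counter, defaultdict


def bigrams(words):
    """Return the list of adjacent word pairs."""
    return list(zip(words, words[1:]))


def build_model(words):
    """Map each word (the condition) to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for first, second in bigrams(words):
        model[first][second] += 1
    return model


def generate(model, start, length, rng):
    """Generate up to `length` words, sampling each next word by bigram frequency."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = model.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        choices = list(followers)
        weights = list(followers.values())
        word = rng.choices(choices, weights=weights)[0]
        output.append(word)
    return output


# Hypothetical toy corpus.
words = "the cat sat on the mat and the cat ran".split()
model = build_model(words)
rng = random.Random(0)  # seeded so that runs are repeatable
print(" ".join(generate(model, "the", 8, rng)))
```

Changing the condition from the previous word to something else, such as word length, only requires changing the key used in build_model.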

April

Date   PDF   PowerPoint   Number of Slides   Panopto   Topic
4/2 lecture21.pdf lecture21.pptx 24 Viewer Homework 8 Review. Claude Shannon and his claims about n-gram language models.
Bibliomancy
Random text generation using a trigram model.
Terminal log: terminal21.txt
4/4 lecture22.pdf lecture22.pptx 21 Viewer A note on term projects
nltk.pos_tag(words). A worked example: compare the top-20 adjectives from Oliver Twist and Nicholas Nickleby.
Terminal log: terminal22.txt
Slides corrected: 1:55pm
4/9 lecture23.pdf lecture23.pptx 29 Viewer Artificial Intelligence: Large Language Models (LLMs), e.g. ChatGPT.
Transformer Neural Net Architecture.
Language Modeling Task: predict what's to come next.
Masked Language Modeling Task: predict something surrounded by context.
Masked Language Modeling Task: gender bias?
Sentiment Analysis: positive/negative sentiment.
4/11 lecture24.pdf lecture24.pptx 37 Viewer Artificial Intelligence: Transformers contd.
Summarization Task
A look at Word Embeddings: encoder Transformer.
Homework 9
4/16 lecture25.pdf lecture25.pptx 16 Viewer Homework 9 Review
Homework 10: Term project proposals please!
Some final words about Masked Language Models.
Syntax: let's try some parsers available online
Slides corrected: 4/17 6pm
4/18 lecture26.pdf lecture26.pptx 13 Viewer Syntax and nltk: writing grammar rules based on a parse.
Senses of the preposition "with".
Some hands-on programming.
Terminal log: terminal26.txt
Grammars built in class: g.txt / g2.txt
4/23 lecture27.pdf lecture27.pptx 25 Viewer ChatGPT and structural ambiguity.
More on grammars with nltk:
Empty categories and ambiguity: "the chicken is ready to eat".
Grammars built in class: g3.txt
Terminal log: terminal27.txt
4/25 lecture28.pdf lecture28.pptx 27 Viewer WordNet and nltk.
File: fullhyponyms.py
Unicode decoding: foreign characters with Python: some problems on macOS.
Terminal log: terminal28.txt

May

Date   PDF   PowerPoint   Number of Slides   Panopto   Topic
5/1 lecture29.pdf lecture29.pptx 31 Viewer Continuing with WordNet: similarity measures.
  • based on synonymy + semantic (and other) relations.
  • natural to do some comparisons with Word Embeddings with respect to similarity: we'll use word2vec and GloVe
File: hypernyms.py
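
For the comparison with Word Embeddings, one commonly used similarity measure is the cosine of the angle between two word vectors. The page does not say which measure the lecture uses, so cosine similarity is an assumption here, and the three-dimensional vectors below are invented toy values, not real word2vec or GloVe embeddings (which typically have hundreds of dimensions).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors for illustration only.
vectors = {
    "cat": [0.9, 0.1, 0.3],
    "dog": [0.8, 0.2, 0.35],
    "car": [0.1, 0.9, 0.2],
}

cat_dog = cosine_similarity(vectors["cat"], vectors["dog"])
cat_car = cosine_similarity(vectors["cat"], vectors["car"])
print(round(cat_dog, 3), round(cat_car, 3))
```

With these toy values, "cat" comes out closer to "dog" than to "car", which is the kind of pattern one would then compare against WordNet-based similarity scores.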

