This on-campus course continues the introductory LING/C SC/PSYC 538
Computational Linguistics [1]. It is
designed to give students more in-depth knowledge and hands-on
experience with techniques and software than is possible in 538.
Part of the course involves a more advanced and in-depth
exploration of fundamental topics covered in 538, e.g. writing
grammars.
The larger part of the course involves projects using software packages.
As part of the course, students will be expected to develop the skills to install and run the software, and to carry out project work, on their own machines.
Projects:
- Parsing algorithms.
- Treebanks (phrase-structure/dependency-based): e.g. Penn Treebank, lookup software.
- Part-of-speech taggers.
- The use and modification of statistical parsers trained on Treebanks.
- Advanced linguistic theories: the Minimalist Program.
- Ontologies and Semantic Networks: WordNet etc.
- Question-Answering (QA).
- more...
[1] Note: 538 is a prerequisite for this class. For
on-campus students, 538 is offered in Fall semesters only.
Software
We will use the programming languages Python (3.x),
Perl and Prolog, corpora from the LDC and other
sources, and other software packages, e.g. (Java-based)
parsers. All software used will be freely available.
Students are expected to have their own laptop and sufficient
privileges to install packages on it. Linux and macOS will
be supported. Windows 10 is only partially supported, so students on
a Windows PC should install Linux as well.
Grading
Students will be given a series of tasks to
accomplish. Satisfactory completion of
all tasks will result in a superior grade.
Readings
Required reading will be from the draft version of the 538 course textbook, Speech and Language
Processing, 3rd edition (Jurafsky & Martin), together with project documentation
(manuals) and papers and/or dissertations to be made available
online.
Instructor: Sandiway Fong sandiway AT arizona.edu
Office: 311 Douglass
Administrivia
| Location | Psychology Rm 305 |
| Time | Monday and Wednesday 11:00 am - 12:15 pm |
Syllabus
See lecture 1 slides and syllabus.pdf.
Lecture Notes
Available in both Adobe PDF and Microsoft PowerPoint formats.
January
| Date | Lecture Notes (PDF) | Lecture Notes (PowerPoint) | Number of Slides | Panopto | Topic |
| 1/14 | lecture1.pdf | lecture1.pptx | 43 | Viewer | Course details and syllabus. Syntax Review: constituent and dependency parses. Homework 1 Q1-Q3. HW 1 Part 2: Copy PTB. HW 1 Part 3: install tregex. Note: possible Windows 11, Ubuntu and macOS issues detailed at the end. |
| 1/19 | | | | | No Lecture. MLK Jr Holiday. |
| 1/21 | lecture2.pdf | lecture2.pptx | 36 | Viewer | Homework 1 Review. Install PTB into nltk (Homework 2). tregex and PTB. Appendix: macOS TimesRoman FontBook problem. |
| 1/26 | lecture3.pdf | lecture3.pptx | 20 | Viewer | tregex contd. Homework 3: British vs. American English and tregex. Slides updated (12:40pm): whiteboard diagrams added. |
| 1/28 | lecture4.pdf | lecture4.pptx | 16 | Viewer | nltk and ptb. Class exercise. Homework 4. Class code: class4.py. Terminal: terminal4.txt. |
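The January lectures work with constituent parses in the bracketed notation used by the PTB. As a rough illustration of that notation only, the sketch below reads a bracketed string with the Python standard library and lists its phrase-structure productions. It is not the nltk or tregex interface used in class: the function names (tokenize, parse, productions, read_tree) and the example sentence are made up for this sketch, and it assumes every bracket carries a label.

```python
from collections import Counter


def tokenize(s):
    # Split a bracketed parse into "(", ")" and symbol tokens.
    return s.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens, i=0):
    # Parse one "(LABEL child ...)" constituent starting at tokens[i].
    # Returns ((label, children), next_index); leaves are plain strings.
    # Assumption: every opening bracket is followed by a label.
    if tokens[i] != "(":
        raise ValueError("expected '(' at token %d" % i)
    label = tokens[i + 1]
    i += 2
    children = []
    while tokens[i] != ")":
        if tokens[i] == "(":
            child, i = parse(tokens, i)
        else:
            child, i = tokens[i], i + 1
        children.append(child)
    return (label, children), i + 1


def read_tree(s):
    # Convenience wrapper: bracketed string in, nested tuple tree out.
    tree, _ = parse(tokenize(s))
    return tree


def productions(tree):
    # Yield (lhs, rhs) pairs, one per labelled node, parent before children.
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for c in children:
        if not isinstance(c, str):
            yield from productions(c)


# Made-up example sentence, not taken from the PTB.
EXAMPLE = "(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT the) (NN cat))))"

if __name__ == "__main__":
    counts = Counter(productions(read_tree(EXAMPLE)))
    for (lhs, rhs), n in counts.most_common():
        print(n, lhs, "->", " ".join(rhs))
```

For real project work, nltk's Tree.fromstring() and Tree.productions() cover the same ground.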
February
March
April
| Date | Lecture Notes (PDF) | Lecture Notes (PowerPoint) | Number of Slides | Panopto | Topic |
| 4/1 | lecture19.pdf | lecture19.pptx | 32 | Viewer | Some lemma methods: .derivationally_related_forms() and .pertainyms(). WordNet size. Example use of WordNet: Semantic Opposition. Compare with ChatGPT. Terminal log: terminal19.txt. |
| 4/6 | lecture20.pdf | lecture20.pptx | 36 | Viewer Part 1 / Viewer Part 2 | Case study: the senses of break as a verb. Discussion and live programming. Save and restore results using pickle. File: break.py. Terminal log: terminal20.txt. Slides updated: 1pm. |
| 4/8 | lecture21.pdf | lecture21.pptx | 21 | Viewer Part 1 / Viewer Part 2 | Case study: the senses of break as a verb contd. Homework 10: continue the Case Study. 4 tasks. 5th task: extra credit. A note on Stanza trees vs. nltk Tree. File: task5.py. Terminal log: terminal21.txt. |
| 4/13 | lecture22.pdf | lecture22.pptx | 33 | Viewer | Homework 10 Review. Language Acquisition: CHILDES database vs. PTB. Using benepar in nltk/spaCy. Slides updated: 12:10pm. |
| 4/15 | lecture23.pdf | lecture23.pptx | 33 | Viewer | Zipf's Law and the PTB, aka any large dataset looks Zipfian! POS tag ambiguity: the PTB tagging guide vs. actual tags in the PTB. PTB Productions. File: zipf.py. Slides updated: 12:40pm. |
| 4/20 | lecture24.pdf | lecture24.pptx | 23 | Viewer | PTB Productions: counting and distribution. Case Study: How many rules do we need to acquire to parse a sentence? Slides updated: 12:30pm. |
| 4/22 | lecture25.pdf | lecture25.pptx | 25 | Viewer | find_scfg() code walkthrough time! File: find_scfg.py. Homework 11: experiments on finding the smallest CFG. |
| 4/27 | lecture26.pdf Download this instead! | Removed: .pptx: too large! | 29 | Viewer | Homework 11 Review. And now for something completely different! Advertising: LING696A: Seminar on Syntax and Computation. Slides and Chomsky video clips on Inquiry into Language: the Strong Minimalist Thesis (SMT). |
| 4/29 | lecture27.pdf | Removed: .pptx: too large! | 37 | Viewer | Introduction to the SMT continued. Basic Property (BP) of Language. Communication vs. Thought. Evolution and Biology: examples - the eye and the brain (CNS). |
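Two of the April topics, Zipf's Law and saving results with pickle, lend themselves to a small illustration. The sketch below builds a rank/frequency table for a toy token list and round-trips it through pickle. It is not the course's zipf.py or break.py: the function names (rank_frequency, save_results, load_results) and the toy sentence are assumptions made for this example. Under Zipf's Law, frequency is roughly proportional to 1/rank, so on a large corpus the product rank x count stays roughly constant. A toy list this small will not show that.

```python
import pickle
from collections import Counter


def rank_frequency(tokens):
    # Return (rank, word, count) triples, most frequent word first.
    # Ties keep first-seen order, as Counter.most_common() does.
    ranked = Counter(tokens).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def save_results(obj, path):
    # Serialize any picklable Python object to a binary file.
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_results(path):
    # Restore an object previously written by save_results().
    # Only unpickle files you created yourself: pickle can run code on load.
    with open(path, "rb") as f:
        return pickle.load(f)


# Toy data, made up for this sketch; a real experiment would use PTB tokens.
TOKENS = "the cat saw the dog and the dog saw a cat".split()

if __name__ == "__main__":
    for rank, word, count in rank_frequency(TOKENS):
        # rank * count is the quantity Zipf's Law predicts to be near-constant.
        print(rank, word, count, rank * count)
```

Caching an expensive result with save_results() means a later session can call load_results() instead of recomputing it.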
May
| Date | Lecture Notes (PDF) | Lecture Notes (PowerPoint) | Number of Slides | Panopto | Topic |
| 5/4 | lecture28.pdf | lecture28.pptx | 20 | Viewer | On corpus linguistics: not the way to do science of language. Three factors in language design: the Third Factor. The slow brain bottleneck. Sensor/brain mismatch. Contrasting models of language: deep neural nets vs. SMT. Cognitive basis for Merge: Natural numbers and language. Agreement: requires a non-linear model. Two types of Merge: Internal and External. |
| 5/6 | lecture29.pdf | lecture29.pptx | 33 | Viewer | Two types of Merge: Internal and External. An example of how we can construct parses. Multiple Workspaces: reflexively construct multiple parses. Duality of Semantics. Simplicity and the Markovian Assumption. Minimal Search as the only possible Search operation. |
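The last lectures distinguish External and Internal Merge. One common way to picture the distinction, offered here only as a hedged sketch and not as the implementation used in the lectures, treats Merge as forming the unordered set {X, Y}: Merge is external when the two objects are separate, and internal when one is already contained in the other. Representing lexical items as plain strings, the helper names (merge, terms, is_internal) and the heavily simplified example are all assumptions of this sketch.

```python
def merge(x, y):
    # Merge forms the unordered set {x, y}.
    # frozenset is used so that results can themselves be merged again.
    return frozenset({x, y})


def terms(so):
    # Yield a syntactic object and, recursively, everything inside it.
    yield so
    if isinstance(so, frozenset):
        for member in so:
            yield from terms(member)


def is_internal(x, y):
    # Internal Merge: one argument is already a proper term of the other.
    # Otherwise the two objects are separate and the Merge is external.
    return x != y and (y in terms(x) or x in terms(y))


# Simplified illustration (assumption: strings stand in for lexical items,
# so two occurrences of the same word are not distinguished).
vp = merge("saw", "what")          # External Merge of two separate items
clause = merge("John", vp)         # External Merge again
question = merge("what", clause)   # Internal Merge: "what" is inside clause

if __name__ == "__main__":
    print(is_internal("saw", "what"))
    print(is_internal("what", clause))
```

Because sets are unordered, this sketch says nothing about linear order, which is a separate question from the structure Merge builds.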
To my linguistics homepage