To my linguistics homepage

LING 581
Advanced Computational Linguistics
Spring 2026

This on-campus course continues the introductory LING/C SC/PSYC 538 Computational Linguistics [1]. It is designed to give students more in-depth knowledge of, and hands-on experience with, techniques and software than is possible in 538.

Part of this course will involve more advanced and in-depth exploration of fundamental topics covered in 538, e.g. with respect to writing grammars.

The larger part of the course involves projects using software packages.

As part of the course, students will be expected to develop the skills to install, run and perform project work on their own machines.

Projects:

  1. Parsing algorithms.
  2. Treebanks (phrase-structure/dependency-based): e.g. Penn Treebank, lookup software.
  3. Part-of-speech taggers.
  4. The use and modification of statistical parsers trained on Treebanks.
  5. Advanced linguistic theories: the Minimalist Program.
  6. Ontologies and Semantic Networks: WordNet etc.
  7. Question-Answering (QA).
  8. more...

[1] Note: 538 is a prerequisite for this class. For on-campus students, 538 is offered in Fall semesters only.

Software

We will use the programming languages Python (3.x), Perl and Prolog, corpora from the LDC and other sources, and other software packages, e.g. (Java-based) parsers. All software used will be freely available.
Students are expected to have their own laptop and sufficient privileges to install packages on it. Linux and macOS will be supported. Windows 10 is only partially supported, so students with a Windows PC should install Linux as well.

Grading

Students will be given a series of tasks to accomplish. Satisfactory completion of all tasks will result in a superior grade.

Readings

Required reading will be from the draft version of the 538 course textbook, Speech and Language Processing, 3rd edition (Jurafsky & Martin), and from project documentation (manuals), papers and/or dissertations to be made available on-line.

Instructor: Sandiway Fong sandiway AT arizona.edu
Office: 311 Douglass

Administrivia

Location: Psychology Rm 305
Time: Monday-Wednesday 11:00 am - 12:15 pm

Syllabus

See lecture 1 slides and syllabus.pdf.

Lecture Notes

Available in both Adobe PDF and Microsoft PowerPoint formats.

January

Date  Lecture Notes (PDF / PowerPoint)  Number of Slides  Panopto  Topic
1/14 lecture1.pdf lecture1.pptx 43 Viewer Course details and syllabus.
Syntax Review: constituent and dependency parses. Homework 1 Q1-Q3.
HW 1 Part 2: Copy PTB.
HW 1 Part 3: install tregex.
Note: possible Windows 11, Ubuntu and macOS issues detailed at the end.
1/19         No Lecture. MLK Jr Holiday.
1/21 lecture2.pdf lecture2.pptx 36 Viewer Homework 1 Review.
Install PTB into nltk. (Homework 2)
tregex and PTB.
Appendix: macOS TimesRoman FontBook problem.
1/26 lecture3.pdf lecture3.pptx 20 Viewer tregex contd.
Homework 3: British vs. American English and tregex.
Slides updated (12:40pm): whiteboard diagrams added
1/28 lecture4.pdf lecture4.pptx 16 Viewer nltk and ptb.
Class exercise.
Homework 4
Class code: class4.py
Terminal: terminal4.txt
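
For readers new to the bracketed tree format of the Penn Treebank (PTB) used with tregex and nltk in lectures 1 to 4, the following is a minimal sketch of reading one bracketed tree into nested Python lists. It uses only the standard library and is not the class code: the function names `read_tree` and `leaves` and the example tree are illustrative assumptions, and the sketch assumes well-formed input in which every bracket carries a label. In the course itself, nltk and tregex do this work.

```python
import re


def read_tree(text):
    """Parse one PTB-style bracketed tree into nested lists: [label, child, ...].

    Assumption: the input is well formed and every "(" is followed by a label.
    """
    # Tokens are "(", ")", or any run of characters without spaces or brackets.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            # A bare token is a word (leaf).
            word = tokens[pos]
            pos += 1
            return word
        pos += 1                      # skip "("
        label = tokens[pos]           # node label, e.g. S, NP, VBP
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(parse())
        pos += 1                      # skip ")"
        return [label] + children

    tree = parse()
    if pos != len(tokens):
        raise ValueError("unexpected material after the tree")
    return tree


def leaves(tree):
    """Return the words of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


# Hypothetical example tree, hand-written for illustration.
example = ("(S (NP (JJ colorless) (JJ green) (NNS ideas)) "
           "(VP (VBP sleep) (ADVP (RB furiously))))")
tree = read_tree(example)
print(tree[0])
print(" ".join(leaves(tree)))
```

The nested-list representation keeps the sketch short; a real project would more likely use a dedicated tree class such as the one nltk provides.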

February

Date  Lecture Notes (PDF / PowerPoint)  Number of Slides  Panopto  Topic
2/2 lecture5.pdf lecture5.pptx 25 Viewer Homework 3 Review
SWI-Prolog revisited: cheat sheet.
Definite clause grammar (DCG) rules.
Language membership question and enumeration.
Extra argument for nonterminals: recovering a parse tree.
Grammars:
apbp.prolog
apbp2.prolog
sheeptalk.prolog
sheeptalk2.prolog
Terminal: terminal5.txt
2/4 lecture6.pdf lecture6.pptx 6 Viewer: Part 1 / Viewer: Part 2 DCG Class Exercise:
  1. colorless green ideas sleep furiously
  2. *colorless green ideas sleeps furiously
  3. revolutionary new ideas appear infrequently

  4. Subject-verb number agreement.
    Class code: class6.prolog
    Terminal log: terminal6.txt
2/9 lecture7.pdf lecture7.pptx 38 Viewer Class Exercise (from last time) remarks.
Homework 4 Review
Context-sensitive grammars in Prolog: three methods.
The case of the context-sensitive language {a^n b^n c^n | n > 0}: parts 1 and 2.
abc_parse.prolog
abc_count.prolog
Class code: class7.py
Terminal log: terminal7.txt
2/11 lecture8.pdf lecture8.pptx 23 Viewer The case of the context-sensitive language {a^n b^n c^n | n > 0}: part 3.
Writing a context-sensitive grammar.
abc_cs.prolog
Homework 5
Slides updated: 4:25pm, see whiteboard pictures.
2/16 lecture9.pdf lecture9.pptx 28 Viewer: Part 1 / Viewer: Part 2 Homework 5 Review
The Cross Serial Dependencies lecture.
Developing the Prolog context-sensitive grammar in 4 stages.
g1.prolog
g2.prolog
g3.prolog
Terminal: terminal9.txt
Update: 12:40pm slides updated with whiteboard photo
2/18 lecture10.pdf lecture10.pptx 31 Viewer: part 1 / Viewer: part 2 Turn to writing our own CFGs for natural language:
  1. A note on SWISH.
  2. Agreement.
  3. The problem with Prolog left recursion.
  4. A grammar transformation: left recursive to right recursive BUT structure preserving.
Homework 6.
Files:
nl1.prolog / nl2.prolog / nl3.prolog
left.prolog / left2.prolog
Class Exercise: class10.prolog
Terminal log: terminal10.txt
2/23         No Lecture. (I'm out of town.)
2/25 lecture11.pdf lecture11.pptx 34 Viewer Homework 6 Review.
A note on PP-stacking.
Language judgements when humans have a choice: a case study of anaphors and control verbs. Compare with ChatGPT then and now. Is ChatGPT becoming more human-like?
Two factors: model size increase, and built-in Web search.
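
As a small illustration of the language membership question and enumeration for the context-sensitive language {a^n b^n c^n | n > 0} from lectures 7 and 8, the following is a minimal Python sketch. It is not the course's Prolog code: it simply checks the overall shape of the string and then compares the three counts, and the function names `in_anbncn` and `enumerate_anbncn` are assumptions made for this example.

```python
import re


def in_anbncn(s):
    """Return True if s is in {a^n b^n c^n | n > 0}."""
    # Step 1: the string must be a block of a's, then b's, then c's.
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    if m is None:
        return False
    # Step 2: the three blocks must have the same length.
    a_block, b_block, c_block = m.groups()
    return len(a_block) == len(b_block) == len(c_block)


def enumerate_anbncn(max_n):
    """List the members of the language for n = 1 .. max_n, shortest first."""
    return ["a" * n + "b" * n + "c" * n for n in range(1, max_n + 1)]


for candidate in ["abc", "aabbcc", "aabbc", "abcabc", ""]:
    print(repr(candidate), in_anbncn(candidate))
print(enumerate_anbncn(3))
```

The regular expression alone only accepts the regular superset a+b+c+; it is the length comparison that enforces the equal counts.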

