API

We document here the major lexical interfaces to the implemented system, which currently runs in SWI-Prolog, and generally follows its syntax.

Summary: named heads are given as a list to run/1.
Named heads must be defined in grammar.pl
Merge fires and assembles these heads (in order) into a single syntactic object (SO). That SO can be read/interpreted at the C-I interface and also Spelled out.

Example:
Named heads: [john, d, he, self, intelligent, v_be, 'Tinf', consider, 'v*', 'T', c]
Spell out: John considers himself to be intelligent
SO: [c[c][T[d[d][john]][T[T][v*[d[d][john]][v*[v*][consider[consider][Tinf[self[self][he[he][d[d][john]]]][Tinf[Tinf][vbe[self[self][he[he][d[d][john]]]][vbe[vbe][intelligent[intelligent][self[self][he[he][d[d][john]]]]]]]]]]]]]]]
SO (tree) (strikethrough indicates unspelled heads):

See Spellout section below for the frontier processing.

A separate glossary of relevant terms, highlighted in blue is provided here.

run/1



At the browser user interface, the run/1 command is sent via the Websocket to the Prolog server for processing. (Notation: run is the predicate name and 1 is the arity.) Generally, the command is run(list of heads). The Prolog list list of heads contains comma-separated names of heads defined in the grammar.

Example:

[mary, d, see, 'v*', [john, d], 'Tpast', c]
is the list of names of heads necessary to assemble the syntactic object (SO) for the declarative sentence John saw Mary. Each name is a Prolog atom, may be unquoted, e.g. mary or d, or quoted, e.g. 'v*', ' 花子' (Hanako) or 'Tpast'.

Equivalently, in the Terminal user interface, the run/1 command can be typed directly at the Prolog prompt (?-).

(base) Machine$ swipl
Welcome to SWI-Prolog (threaded, 64 bits, version 9.0.4)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).

?- [mm].
true.

?-   run([mary, d, see, 'v*', [john, d], 'Tpast', c]). 

mkSO/2

mkSO(Name, Head) takes the name of a head defined in the grammar as a bunch of uninterpretable and interpretable features, and creates a syntactic object (SO). This SO will enter Merge.

Example:

?- mkSO('Tpast',SO).
SO = [Tpast].
'Tpast' is the quoted atom representing the Tpast head. ['Tpast'] is the pretty-printed syntactic object created by mkSO/2.

To obtain the features of the head, mkSO/2 will call the predicate features/4 defined in grammar.pl.

(Note: pretty printing of syntactic objects is defined in portray.pl.)

Deep dive: in this implementation, SOs are represented by a structure so(LBL,uFs,iFs,Fs,SOs) of ariy 5. LBL = label. uFs, iFs, Fs = Prolog lists of uninterpretable, interpretable and implementation-specific features, respectively. SOs is a list of sub-constituents. Do not rely on this encoding, things have changed substantially and will change even more substantially in the future.

features/4

features(C,UFs,IFs,Fs) defines for head name C three lists of features.

  1. UFs is the list of uninterpretable features, possibly empty ([]).
  2. IFs is the list of interpretable features, also possibly empty ([]).
  3. Finally, Fs is the list of Machine internal features, or features that do not seem to belong to either list above. Unfortunately, this list is not empty ([]).

Example: In the current implementation, 'Tpast' is the name of the past tense variant of the T head.

?- features('Tpast',UFs,IFs,Fs).
UFs = [f(phi, _, _)],
IFs = [],
Fs = [ef(_, _), occ(_), id(_)].
The past tense head has uninterpretable ɸ-features, no interpretable features, and several bookkeeping features.

Generally, featural properties are listed in grammar.pl. Here are the definitions for matrix C, declarative (named c) and interrogative (named c_Q).

% matrix declarative
features(c,[],[],[id(_),phase(_)]).
		      
% matrix interrogative
features(c_Q,[f(wh,_),f('T',_)],[f(scope)],[ef(_,_),id(_),phase(_)]).  
Both c and c_Q are Phases, and have feature phase(_) that mark the local limit for Merge. We assume the Phase feature is not read at the C-I interface, nor is it valued by probe-goal, so it's neither interpretable nor uninterpretable.

(For pretty-printing, Phase heads are marked in blue in the browser user interface trees.)

Interrogative C, named c_Q, is a probe with an uninterpretable Wh feature that must be valued, and it also possesses an interpretable scope feature. The probe will find goal wh-DPs with an interpretable wh feature. Interrogative C also has an edge feature that permits a wh-NP to raise here (and possibly spell-out here). (In later theories such as Box theory, no Internal Merge operation is necessary.) Finally, interrogative C has an uninterpretable T feature (forced by the adoption of Pesetsky & Torrego). In Who saw John?, interrogative C has both uninterpretable Wh and T simultaneously valued by subject DP who. By economy, this prevents T separately raising to interrogative C, which can value T (on C), from the wh-DP raising to value Wh on C. Otherwise, T in interrogative C will result in *Who that saw John? as an option, as T in C isobligatorily pronounced as that.

Deep dive: the bookkeeping features are all imperfections in the current implementation, and should be eliminated in future versions.

  1. occ(_) is used to control spell-out. In the implemented theory, 'Tpast' may raise to C and be pronounced as that, following Pesetsky & Torrego. The lower occurrence of 'Tpast' is not pronounced. occ/1 keeps track of which occurrences may be pronounced.

  2. id(_) is used an an unique identifier. In the probe-goal implementation, we sometimes need to uniquely identify which head is doing the valuing.

  3. ef(_,_) is the Edge feature. It is not a feature to be read at the C-I interface, so it's not interpretable. It is not targeted by probe-goal, so it's not an uninterpretable feature either. It is placed here as it doesn't fit in either category. The fact that T has a surface subject position is really irreducible in the current theory.

nRoot/1

Common nouns are divided into nRoot1/1 or nRoot2/1 or nRoot3/1 in grammar.pl. The difference between the three types is in terms of ɸ-features.

nRoot1/1 nouns are 3rd person singular nouns. Features are determined by the following line:

  features(N,[f('D',_)],[f(phi,[3,sg,n],_)],[occ(_)]) :- nRoot1(N).
  
See also documentation of features/4 above.

Similarly, nRoot2/1 are 3rd person plural nouns.

  features(N,[f('D',_)],[f(phi,[3,pl,n],_)],[occ(_)]) :- nRoot2(N).
  

nRoot3/1 nouns are unspecified for Number.

Examples of the three types are given below:

nRoot1/1 nRoot2/1 nRoot3/1
nRoot1(apple) nRoot2(boxes) nRoot3(fish)
nRoot1(man) nRoot2(men)
nRoot1(violin) nRoot2(shelves)

nRoot2/1 entries for nRoot1/1 nouns with regular plurals are not explicitly listed. A spelling rule that adds -s will be used to form the corresponding nRoot2/1 entry.

phase/1

Phase heads are defined in grammar.pl to contain a feature phase that limits search. As it is not valued in probe-goal operations, the feature is not an uninterpretable feature. It is also not read at the C-I interface, so it is not an interpretable feature either. Hence, it is placed in the third feature list.

Examples:

% complementizers
% matrix declarative
features(c,[],[],[id(_),phase(_)]). 
 % matrix interrogative
features(c_Q,[f(wh,_),f('T',_)],[f(scope)],[ef(_,_),id(_),phase(_)]). 
 % embedded declarative
features(c_e,[f('T',_)],[f(theta,_)],[ef(_,_),id(_),phase(_)]). 
 % wh-movement host, no scope feature
features(c_eQ,[f(wh,_),f('T',_)],[f(theta,_)],[ef(_,_),id(_),phase(_)]). 
 % embeddded interrogative
features(c_Qe,[f(wh,_),f('T',_)],[f(theta,_),f(scope)],[ef(iF(wh),_),id(_),phase(_)]).
 % relative C
features(c_rel,[f(rel,_),f('T',_),f(phi,_,_)],[],[ef(_,_),id(_),phase(_)]).
 % complementizer for
features(for,[f(phi,_,_)],['T'],[occ(_),phase(_)]). 

% -self and possessive 's 
features(self,[f(case,_,_),f('N',_)],[f(theta,_)], [phase(_),release(_),occ(_)]). 
features('\'s',[f(case,_,_),f('N',_)],[f(phi,[3,_,_],_),f(theta,_)], [ef(unvalued_iF(theta),_),phase(_),occ(_)]). 
  

vRoot/1

Verbs are listed using vRoot/1, the single argument should be the base nonfinite form.

features(V,[],[],[occ(_)]) :- vRoot(V).

Deep dive: occ(_) is a bookkeeping feature used to control whether the occurrence of the verb is spelled out or not.

Spellout

Spellout is part of Externalization (EXT), based on linear order, and not part of Narrow Syntax.

Basic algorithm: Look at the frontier of the SO as a list of heads. Ignore all heads marked by occ as not the highest occurrence (of that head). Let remaining heads be [h1, h2, .., hn].

Then proceed linearly through the list:

  1. If no_spellout(hi), defined below, is true for hi. Highest occurrence, but skip the head. Continue with hi+1
  2. If block_heads(hi, hi+1), defined below, is true for bigram hi hi+1 crash the current derivation. No Spell out is possible.
  3. If realize_heads2/3 or realize_heads/3, defined below, maps bigram hi hi+1 to new list of heads [Hi,.., Hi+k], k usually 0, then continue with Hi,.., Hi+k in place of hi hi+1.
  4. If hi is T or v, apply realize_head/2, defined below, to compute a morpheme X replacing hi. Continue with hi+1.

    [This implies two passes for TNS to verb inflection, one to convert TNS to a morpheme, e.g. -ed, and one to convert the morpheme + head, e.g. -ed fall to fell. Maybe worth converting this to a single pass algorithm.]

  5. If none of the above cases apply to hi, it is passed through unchanged.

Example: consider the frontier shown for the sentence John considers himself to be intelligent (at the top of this document).

[c, d, john, 'T', 'v*, consider, self, he, 'Tinf', vbe, intelligent]
C, v* and D are not spelled out, conversion of T and Tinf into -s and to is applied, names are spelled with inital caps, and Accusative Case is turned into affix -acc.
[John, '-s' , consider, '-self', '-acc', he, to, be, intelligent]
Affix hop:
[John,  consider, '-s', he, '-acc', -self', to be, intelligent]
2nd pass: consider + -s -> considers and he + -acc + -self -> himself etc.:
[John,  considers, himself, to be, intelligent]

no_spellout/1

Certain heads never spell out.

no_spellout(d). 	% empty determiner	
no_spellout(d_rel). 	% empty determiner (for relativization) 
no_spellout(q).		% empty wh-determiner 
no_spellout(pro). 	% small pro and big PRO  
no_spellout('PRO'). 
no_spellout('G'). 	% dyadic preposition G 

Other spellout rules are language-particular and explicitly learned.

Verbal inflection

Inflection for spellout is controled by affix rules or explicitly listing exceptions (for irregular verbs).

Example: write has irregular forms when combined with morpheme -en or -ed(ɸ).

vRoot(write).

spell_out(en,write,written).
spell_out(ed(ɸ),write,wrote).

Note: ɸ in -ed(ɸ) is not used in the English implementation. Verbal suffixes are computed by realize_head/2.

realize_head/2

realize_head/2 maps a single head onto a verbal inflectional morpheme. In the current implementation, T with tense features and (computed) ɸ-features generates the following ɸ-specific inflectional morphemes, -m, -re, -s, and -ed, in English.

Examples:

realize_head(t(pres,[1,sg],_),-m).	% am
realize_head(t(pres,[2,sg],_),-re). 	% are
realize_head(t(pres,[3,sg],_),-s).
realize_head(t(pres,[_,pl],_),-re).	% are
realize_head(t(past,[1,sg],_),-ed(sg)).
realize_head(t(past,[1,pl],_),-ed(pl)).
realize_head(t(past,[2,_],_),-ed(pl)).
realize_head(t(past,[3,sg],_),-ed(sg)).
realize_head(t(past,[3,pl],_),-ed(pl)).
Note: NUM in -ed(NUM) is currently unused.

realize_head/2 is also used for auxiliary heads in English verbal system to impose a linear constraint on the output form of the next verb (after affix hopping).

Examples:

realize_head(prog,-ing).  % be -ing eat -> be eating
realize_head(perf,-en).   % have -en eat -> have eaten
realize_head(prt,-en).    % be -en eat -> be eaten

spell_out/3

In English, pronouns do inflect with Case. At Spell-out, non-nominative Case such as acc (Accusative) and obq (Oblique) are separate morphemes in this implementation.

Examples:

spell_out(acc,who,whom).
spell_out(acc,he,him).
spell_out(acc,she,her).
spell_out(acc,they,them).
spell_out(acc,i,me).
spell_out(acc,'I',me).
spell_out(acc,we,us).
spell_out(obq,he,him).
spell_out(obq,she,her).
spell_out(obq,they,them).
spell_out(obq,i,me).
spell_out(obq,we,us).

In English, -self, a separate lexical head in this implementation, will combine with pronominals.

Examples:

spell_out(self,him,himself).
spell_out(self,her,herself).
spell_out(self,my,myself).
spell_out(self,you,yourself).
spell_out(self,you,yourselves).	% plural
spell_out(self,our,ourselves).
spell_out(self,they,themselves).
spell_out(self,he,nospellout).	% lexical gap
The last case indicates -self + he -> *heself.

Regular verbal spellout rules for English are also described using spell_out/3, summarized below.

  1. -s + basebases
  2. -ed(ɸ) + base (ending in e) ➝ based
  3. -ed(ɸ) + base (otherwise) ➝ baseed
  4. -en + base (ending in e) ➝ based
  5. -en + base (otherwise) ➝ baseed
  6. -ing + base (ending in e) ➝ base(minus e)ing
  7. -ing + base (otherwise) ➝ baseing

block_heads/2

Certain head bigrams cannot spell out at all. If generated, derivation will fail.

Examples:

block_heads(of,who_rel).	% *of who
block_heads(than,who_rel).	% *than who

realize_heads2/3 and realize_heads/3

Certain head bigrams, e.g. personal pronouns in English, spell out specially.

[Need to explain why we have both spell_out/3 and realize_heads2/3 or collapse them.]

Examples:

realize_heads2(he,'\'s',[his]).		% he + 's -> his
realize_heads2(she,'\'s',[her]).	% she + 's -> her
realize_heads2(they,'\'s',[their]).
realize_heads2(you,'\'s',[your]).
realize_heads2(i,'\'s',[my]).
realize_heads2(it,'\'s',[its]).
Prolog notation: '\'s' is an atom. The escaped single quote is a single quote character inside '...'. So '\'s' is the unquoted atom 's. See atom in the glossary.

T in C spellout as that also uses realize_heads/3.

Example:

realize_heads(T,C,[that]) :- morpheme_withFs(T,'T'), type(C,c).

Examples:


Last modified: Thu Aug 17 20:35:26 MST 2023