Obligatory: | Optional: |
(*) Not actually described here. Reference is to predicate
agreeAGR/4
from constituent feature access.
lexicon(W,C,L)
|
Word W , category label C , and a simple list
of features L .
|
This is the sole interface to the lexicon for word lookup. The implementation of lexicon/3 is left up to the user. This allows for much flexibility in how the lexicon may be organized, for example, with respect to inflected forms.
However, note that there are no mode restrictions. This means that it should not be assumed that the word itself or any other parameter will necessarily be supplied upon lookup. In other words, the implemented predicate must be able to function as an enumerator (and, of course, terminate) when called with one or more uninstantiated parameters. For example, here are some of the many possible modes of usage:
Call | Comment |
lexicon(+W,-C,-L) | (WordMatch) |
lexicon(+W,+C,-L) | (evalExpand for X=word) |
In the case of ambiguity, the predicate is also assumed to supply the possible matches one at a time.
Sample Implementation:
Here is a simple implementation of lexicon/3 as used in a small English lexicon supplied with PAPPI:
lexicon(Word,C,Fs) :- lex(Word,C,Fs).     % directly available
lexicon(Form,v,Fs) :-                     % non-base verb forms
    lex(Form,v,Base,F1),
    verbFeatures(Base,F2),
    append1(F1,F2,Fs).
verbFeatures(Base,F) :-
    lex(Base,v,F1),
    pick1(morph(Base,_),F1,F).
Here, words are either stored directly as lex/3 facts, or, in the case of inflected verbs, as pairs with the inflected forms in lex/4 and the base (or infinitival) form in lex/3. This pairing scheme is used to avoid unnecessary replication of information. For example, here is the entry for the verb eat:
%% lex/3
lex(eat,v,[morph(eat,[]),grid([agent],[[patient]])]).
%% lex/4
lex(eating,v,eat,[morph(eat,ing)]).
lex(eats,v,eat,[morph(eat,s)]).
lex(ate,v,eat,[morph(eat,ed(1))]).
lex(eaten,v,eat,[morph(eat,ed(2))]).
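The sample implementation and the entry for eat can be combined into a small self-contained sketch for trying out the different call modes. The helpers append1/3 and pick1/3 are not defined in this section, so the versions below are assumptions: append1/3 is taken to be plain list concatenation, and pick1/3 is taken to remove the first list element that unifies with the given pattern.

```prolog
%% lex/3: base forms
lex(eat,v,[morph(eat,[]),grid([agent],[[patient]])]).

%% lex/4: inflected forms paired with their base form
lex(eating,v,eat,[morph(eat,ing)]).
lex(eats,v,eat,[morph(eat,s)]).
lex(ate,v,eat,[morph(eat,ed(1))]).
lex(eaten,v,eat,[morph(eat,ed(2))]).

lexicon(Word,C,Fs) :- lex(Word,C,Fs).     % directly available
lexicon(Form,v,Fs) :-                     % non-base verb forms
    lex(Form,v,Base,F1),
    verbFeatures(Base,F2),
    append1(F1,F2,Fs).

% Features of the base form, minus its own morph/2 feature.
verbFeatures(Base,F) :-
    lex(Base,v,F1),
    pick1(morph(Base,_),F1,F).

% ASSUMPTION: append1/3 is ordinary list concatenation.
append1([],Ys,Ys).
append1([X|Xs],Ys,[X|Zs]) :- append1(Xs,Ys,Zs).

% ASSUMPTION: pick1(X,L,Rest) removes the first element of L
% that unifies with X, leaving Rest.
pick1(X,[X|T],T) :- !.
pick1(X,[H|T],[H|R]) :- pick1(X,T,R).
```

With these definitions, the call lexicon(eats,C,Fs) binds C to v and Fs to [morph(eat,s),grid([agent],[[patient]])], while lexicon(W,v,_) enumerates eat, eating, eats, ate and eaten on backtracking, which is the enumerator behaviour required when the word is not supplied.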
Referenced by: X-bar rules compiler (compile-time) / GLR machine builder (compile-time) / Expand Contractions (run-time)
References:
probeLexicon(+Atom)
|
Holds if atom Atom is present in the lexicon.
|
This is used by the contraction mechanism for output items of the form
Atom=word
to check that Atom
can be found in
the lexicon. For efficiency, the predicate should be defined to be
a deterministic lookup procedure.
Example:
In the English lexicon, words are encoded using lex/3 (base forms) or lex/4 (fully inflected forms). Hence the following definition:
% deterministic
probeLexicon(Word) :- lex(Word,_,_) -> true ; lex(Word,_,_,_), !.
That is, as far as probeLexicon is concerned, it doesn't matter whether Word occurs once or many times in the lexicon.
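As a minimal sketch of the determinism requirement, the following block pairs a probeLexicon/1 definition with a few entries in which one word appears several times. The entries for saw and sees are illustrative placeholders (their feature lists are not taken from the PAPPI lexicon), and the body is written with explicit parentheses, which is assumed to behave the same as the definition above.

```prolog
% Illustrative entries only; feature lists are placeholders.
lex(saw,n,[]).
lex(saw,v,[]).
lex(saw,v,see,[morph(see,ed(1))]).
lex(sees,v,see,[morph(see,s)]).

% Deterministic lookup: commit to the first match in lex/3,
% otherwise to the first match in lex/4.
probeLexicon(Word) :-
    (   lex(Word,_,_)
    ->  true
    ;   lex(Word,_,_,_)
    ),
    !.
```

Here probeLexicon(saw) succeeds exactly once even though saw has three entries, probeLexicon(sees) succeeds via lex/4, and a word with no entry simply fails.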
Reference: contraction
term(C)
|
Holds if C is a terminal symbol (or category label)
present in the lexicon.
|
Here is a sample definition:
term(n).
term(v).
term(a).
term(p).
term(c).
term(adv).
term(det).
term(neg).
term(mrkr).
term('$').
$ is a dummy terminal symbol (used by the GLR machine) that must be present in every lexicon. It is purely a dummy since there will be no lexical entries of category $.
Note that C is not necessarily restricted to heads and the usual non-projecting terminal symbols. For example, in a theory of noun phrases based on DPs (Determiner Phrases), we might choose to exercise the option of lexical insertion at a phrasal level as well:
term(dp).
term(np).
term(n).
term(d).
...
lex(one,np,[count(+),agr([3,sg,[]]),a(-),p(-)]).
...
%% Common nouns
lex(actor,n,[a(-),p(-),count(+),agr([3,sg,m]),vow]).
%% Proper nouns
lex(bill,dp,[a(-),p(-),agr([3,sg,m])]).
Referenced by:
X-bar rules compiler (compile-time)
/ GLR machine builder (compile-time)
/ ParsePF
(run-time)
References:
contraction(+K,+W,+L)
|
Contractions expand an input word W , or two adjacent
words W1 W2 , into a list of output words L .
Contractions are grouped into classes, denoted by K . Classes may be used to define a context in which
a contraction may fire. Any contraction belonging to the null class,
[] , may fire at any time.
|
Every lexicon must contain declarations for contraction/3 and contraction/4. If there are no entries, the lexicon should contain the following two lines:
no contraction(_,_,_).
no contraction(_,_,_,_).
For more details on how the contraction mechanism operates, see the description of ParsePF.
We now proceed to describe the pattern matching options for
the word to be expanded, i.e. W
for
contraction/3
, and W1
and W2
for contraction/4
.
Word Match Patterns
The following patterns are available for both
contraction/3
and contraction/4
.
Given a word W
in the input:
-V |
V is a variable. Succeeds with
V bound to W . |
For example, in the French lexicon, W = l'homme successfully matches the following rule with X set to l and Y to homme.
contraction([],X,''''+Y,[X+[e,a]=word,Y=word]).
+Atom |
Atom is a simple atom. Succeeds if
Atom is identical to W . |
For example, W
= 'd successfully matches
'd
and triggers the rule:
contraction([],'''d',[would]).
-V+suffix |
V is a variable.
suffix is an atom. Succeeds if suffix is a
non-empty suffix of W . V is bound to the
remainder of W .
|
For example, in the Turkish lexicon, W = elmayı successfully matches X+yı with X bound to elma using the rule:
contraction(case,X+yı,[X=pf([block(cop),block(case)]),acc]).
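The suffix pattern can be pictured as a single atom_concat/3 call. The sketch below is only an illustration of the matching condition described above; match_suffix/3 is a hypothetical helper name, not a PAPPI predicate, and the sketch does not cover how PAPPI itself compiles or applies the pattern.

```prolog
% match_suffix(+W, +Suffix, -V): hypothetical helper.
% Succeeds if Suffix is a non-empty suffix of the atom W,
% binding V to the remainder of W.
match_suffix(W, Suffix, V) :-
    Suffix \== '',               % the suffix must be non-empty
    atom_concat(V, Suffix, W).   % V is whatever precedes Suffix in W
```

For instance, match_suffix(kawanai,nai,V) binds V to kawa, mirroring an input pattern of the form X+nai, whereas match_suffix(kawanai,ta,V) fails.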
-V+double(-C,+L) |
V and C are variables.
L is a list of atoms representing single
characters. Succeeds if W ends with a doubled character
C chosen from L . |
For example, in the Hungarian lexicon, we have a rule for a doubled final consonant:
contraction(doubledFC,X+double(C,[m,l,t,r]), [X+C=pf([allowOnly(poss)]),lengthenedFC]).
-V+single(-C,+L) |
V and C are variables.
L is a list of atoms representing single
characters. Succeeds if W ends in a character C
chosen from L .
|
Example:
contraction(past,X+single(C,[t,d])+ett,[X+C=pf([allowOnly(caus)]),pst]).
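The double and single patterns differ only in whether the final character must be repeated. The following sketch, written for SWI-Prolog, illustrates the two matching conditions in isolation. The names match_double/4 and match_single/4 are hypothetical, the test atoms are arbitrary strings rather than entries from any lexicon, and it is an assumption that V is bound to the part of W preceding the matched character(s).

```prolog
:- use_module(library(lists)).   % append/3, memberchk/2

% match_double(+W, +L, -V, -C): hypothetical helper.
% W ends in a doubled character C drawn from L; V is what precedes it.
match_double(W, L, V, C) :-
    atom_chars(W, Cs),
    append(Prefix, [C,C], Cs),   % last two characters are identical
    memberchk(C, L),
    atom_chars(V, Prefix).

% match_single(+W, +L, -V, -C): hypothetical helper.
% W ends in a single character C drawn from L; V is what precedes it.
match_single(W, L, V, C) :-
    atom_chars(W, Cs),
    append(Prefix, [C], Cs),     % C is the final character
    memberchk(C, L),
    atom_chars(V, Prefix).
```

With these definitions, match_double(fooll,[m,l,t,r],V,C) binds V to foo and C to l, and match_single(food,[t,d],V,C) binds V to foo and C to d. A rule such as the doubled final consonant rule above can then rebuild the undoubled stem as X+C.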
-V$+L |
V is a variable. L contains a list of
lexical features. Succeeds if W contains all the features
in L . Note W must be a word in the lexicon.
|
For example, here is a rule in the Hungarian lexicon for forcing az to be analyzed as a demonstrative, as opposed to being a determiner, when it precedes a non-vowel-initial word.
contraction([],az,Y$[not(vow(+))],[az$[dem],Y]).
We now move on to describe the possible output patterns for the contraction rules. In general, the output pattern is a (possibly empty) list. Each list item must conform to one of the following forms:
Output Patterns Items
+Atom |
Atom is a simple atom.
|
For example, here the output of the rule for can+'t
is
a pair of atoms, can
followed by neg
:
contraction([],can,'''t',[can,neg]).
+Atom$[F1,..,Fn] |
Atom must have features
[F1,..,Fn] . Atom is a
simple atom. Fi are
features.
|
Feature values can be used to select particular lexicon items or instantiate feature values in generic elements. Here is an example of the latter from the Japanese lexicon:
contraction(u,X+tta,[X=morph(_,u),past$[suffix(tta,a4c3a4bf)]]).
The second element in the output, namely past, is a generic past tense suffix. The morphological form of this suffix will vary according to the verb stem. In this case, it will take the form -tta when preceded by a verb of the -u-class. The morphological form is encoded using the feature suffix. Hence, this will force the suffix feature for [past] to unify with suffix(tta,a4c3a4bf).
+Atom$$[F1,..,Fn] |
Feature Hopping: features
[F1,..,Fn] will attach
and apply to the item immediately following
Atom . Atom is a
simple atom. Fi are
features.
|
Note: the item immediately following Atom, call it Next, must arise from the same original input word. That is, Atom and Next must share the same derivation sequence of contractions. If Atom happens to be the last or rightmost output word, the rule containing the feature hop will fail.
This output pattern is designed for use when Atom
affects the appearance of or applies some constraint to the next
morpheme.
For instance, here are two verb stemming rules from the Japanese lexicon for verbs that end in -u and -ku, respectively:
contraction(vNStem,X+wa,[X=word(base(u))$$[prefix(wa,a4ef)]]).
contraction(vNStem,X+ka,[X=word(base(ku(_)))$$[prefix(ka,a4ab)]]).
The first rule is used, for example, in conjunction with the base entry for the verb kau (buy):
lexicon(ka,v,base(u),[index(_),morph(kau,base(u)),
    grid([agent],[theme]),k(c7e3),eng(buy)]).
and:
contraction(vEnd,X+nai,[X=pf([require(vNStem)]),negnpast]).
to correctly decompose kawanai as the negative non-past form of kau. Basically, the vNStem rule says that the negative stem for kau is kawa. The feature prefix(wa,a4ef) hops onto the following item, namely negnpast, derived earlier (via vEnd) from the negative non-past ending -nai. In other words, the prefix -wa effectively serves as a bridge or linking mora for combining -u verbs with negative endings.
Note that feature hopping is local in the sense that
prefix/2
hops onto an element that is local to
ka+wa+nai. In particular, -wa cannot be used to
bridge a word that follows this original input word.
-X=word |
Restricts X to be a word in the lexicon. Note,
X is typically a variable that (must) occur
in the input pattern.
|
(Note: the lexicon will be accessed twice for X: first using the predicate probeLexicon during contraction processing, and then a second time using lexicon/3 to pick up the lexical features.)
For example, here is an (over-productive) rule from the French lexicon for transforming vowel contractions like l'homme into le homme:
contraction([],X,''''+Y,[X+[e,a]=word,Y=word]).
Here, X and Y must refer to variables in the input pattern. Note also, unlike in the case of Y=word, the left side of the output pattern X+[e,a]=word is not a simple variable. In general, the left side of an equative output item can be one of the following:
|
-X=word(+Form) |
Restricts X to be a word with Form in the
lexicon. Note, X is typically a variable that (must)
occur in the input pattern.
|
This output form behaves identically to the standard output form
X=word
except with respect to word lookup.
Lookup is performed via probeLexicon(Word,Form)
and
lexicon(Word,C,Form,Fs)
. These two predicates must be
defined in the lexicon only if this output form is used.
For example, this is used in the Japanese implementation to separate verb base forms from other lexical entries. For instance:
contraction(vStem4,X+i,[X=word(base(ku(1)))$$[prefix(i,a4a4)]]).
contraction(vStem4,X+t,[X=word(base(ku(2)))$$[prefix(t,a4c3)]]).
Here, X will be looked up with Form as base(ku(1)) or base(ku(2)) depending on whether X is immediately followed by -i or -t. These rules are used in conjunction with:
contraction(vEnd,X+ta,[X=pf([require(vStem4)]),past]).
to decompose forms such as kaita (write+past) and itta (go+past) into ka+i+ta and i+t+ta, respectively.
The base forms for verbs kaku (write) and iku (go) should be defined using lexicon/4 as follows:
lexicon(ka,v,base(ku(1)),[index(_),morph(kaku,base(ku(1))),
    grid([agent],[theme]),k(bdf1),eng(write)]).
lexicon(i,v,base(ku(2)),[index(_),morph(iku,base(ku(2))),grid([agent],[]),
    allowExt(goal),k(b9d4),eng(go)]).
We can then define probeLexicon/2 (deterministic) as:
probeLexicon(Word,X) :- lexicon(Word,v,X,_), !.
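Putting the two lexicon/4 entries and the probeLexicon/2 definition together gives a small loadable sketch that shows how the Form argument separates the two base classes. Everything in the block is taken from the definitions above; only the sample queries in the note that follows are added for illustration.

```prolog
% Base forms for kaku (write) and iku (go), stored via lexicon/4.
lexicon(ka,v,base(ku(1)),[index(_),morph(kaku,base(ku(1))),
    grid([agent],[theme]),k(bdf1),eng(write)]).
lexicon(i,v,base(ku(2)),[index(_),morph(iku,base(ku(2))),grid([agent],[]),
    allowExt(goal),k(b9d4),eng(go)]).

% Deterministic probe: is there a verb entry for Word with this Form?
probeLexicon(Word,X) :- lexicon(Word,v,X,_), !.
```

With this loaded, probeLexicon(ka,base(ku(1))) succeeds and probeLexicon(ka,base(ku(2))) fails, so a rule requiring word(base(ku(2))) cannot select the stem ka.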
References:
probeLexicon/2
/ lexicon/4
X=pf(R) |
X must be either a lexical item or contain further
contractions. X , typically an input pattern variable, is
one of the possible forms described above for the left side of the
equative expression. pf(R) indicates that either:
|
For example, the following rule in the Japanese lexicon strips off the past tense verb ending -ta and looks for either a verb stem, as in the case of mi+ta, or further contraction processing, as in the case of the passive form mi-rare-ta:
contraction(vEnding,X+ta,[X=pf([block(vEnding)]),past]).
Here, we have a restriction [block(vEnding)]
which means
that if X
is subject to further contraction processing,
rules of the class vEnding
are blocked from
applying. In particular, this encodes the fact that a single verb
cannot be doubly marked with respect to tense.
In general, R
in pf(R)
is a (possibly empty)
list of class restrictions, each of which may be of the following
form:
|
X=+F |
X must be a lexical item with feature F .
X , typically an input pattern variable, is one
of the possible forms described above for the left side of the
equative expression.
|
For example, here is a rule for the "progressive" or habitual form of iru in Japanese:
contraction([],X+iru,[X=morph(_,te),iru]).
Here, X must be a verb with the te-form ending.
See also the section on Expand
Contractions
for details on debugging the contraction rule
mechanism.
Finally, we summarize the class mechanism for contraction
.
Classes and the Application of Contraction Rules
In defining rules of the form:
contraction(+K,+W,+L)
contraction(+K,+W1,+W2,+L)
the grammar writer is free to group contraction rules into classes by naming the class of each rule (K).
The class restriction mechanism, encoded by R
in the
output pattern item X=pf(R)
described above, allows named
classes to be blocked or permitted to apply as needed. For example, if
number agreement follows case endings, the class mechanism can be used
to properly sequence firing of the various suffix rules by having
number agreement rules explicitly block case rules.
More generally, one can write default restriction rules for
contractions. This is provided as a notational convenience. The contraction_default
declaration may be used to specify default rules either for
unrestricted classes or for some particular class. For example, the following
two declarations prevent rules of any named class from firing again
once that class has applied to the input word:
define_contraction_defaults. contraction_default(X,[block(X)]).
Classes may also be grouped into superclasses. This is also used to restrict the scope of contraction rule application. For example, one might have a set of classes that apply only to nouns and another set that apply only to verbs. If a superclass has been declared for a given class, subsequent rounds of contraction processing for a given word will be restricted to rules that belong to the same superclass.
Finally, note:
1. Rules of the null ([]) class are not subject to restriction by any of the mechanisms discussed above. In other words, null class contraction rules are always applicable, except for case (2).
2. No contraction rule will be used to expand a word that has been declared using blockContraction.
References:
ParsePF
/ blockContraction
/ define_contraction_defaults
/ contraction_default
/ superClass
/ Expand Contractions
blockContraction(W)
|
Declares that no contraction rule should be used to expand word
W .
|
In general, contraction rules optionally apply. That is, PAPPI will
pursue both lines of inference if there is an input item for which
both a lexical entry exists and to which a contraction rule may apply.
The blockContraction
declaration allows the user to
override this default behaviour on a word-by-word basis, for example,
in the case of irregular expansions.
Example:
In the Turkish lexicon, we prevent any further morphological decomposition of the following genitive-marked personal pronouns:
blockContraction(benim).   % my
blockContraction(senin).   % your
blockContraction(onun).    % his/her/its
That is, there exist lexical entries for benim, senin and onun.
Every lexicon must contain declarations for blockContraction
.
If there are no entries, the lexicon should contain the following line:
no blockContraction(_).
Example:
References:
contraction
superClass(SK,K)
|
Declares contraction class SK to be the superclass of
class K .
|
Every lexicon must contain declarations for superClass
.
If there are no entries, the lexicon should contain the following line:
no superClass(_,_).
Example:
In the Hungarian implementation, we have two superclasses
n
and v
:
superClass(n,case).
superClass(n,num).
superClass(n,doubledFC).
superClass(n,poss).
superClass(n,poss1).
superClass(n,poss2).
superClass(v,infin).
superClass(v,infl).
superClass(v,past).
superClass(v,agr).
superClass(v,mood).
superClass(v,subj).
superClass(v,caus).
The basic idea behind the superclass declaration is to restrict contraction expansion sequences to particular groups of classes. For example, if a contraction that belongs to superclass n has been used to expand a word, any subsequent application of a contraction rule to that word should come from a compatible class, i.e. some class with superclass n.
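The compatibility condition can be sketched as a simple check over superClass/2 facts. This is not PAPPI's implementation: compatibleClass/2 is a hypothetical name, and the sketch assumes that each class has at most one superclass, that a class without a declared superclass imposes no restriction, and that the null class is always compatible.

```prolog
% A subset of the Hungarian declarations above.
superClass(n,case).
superClass(n,num).
superClass(n,poss).
superClass(v,infl).
superClass(v,past).
superClass(v,caus).

% compatibleClass(+Used, +Next): hypothetical helper.
% Next may apply to a word already expanded by a rule of class Used.
compatibleClass(_, Next) :-
    Next == [], !.                 % ASSUMPTION: null class always applies
compatibleClass(Used, Next) :-
    (   superClass(SK, Used)
    ->  superClass(SK, Next)       % must share the same superclass
    ;   true                       % ASSUMPTION: no superclass, no restriction
    ).
```

Under these assumptions, compatibleClass(case,num) succeeds because both classes fall under n, while compatibleClass(case,past) fails because past belongs to v.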
References:
contraction
define_contraction_defaults
|
Precedes all contraction_default declarations.
|
If the lexicon uses the contraction_default
meta-rule
mechanism, a define_contraction_defaults
header
line must precede all such declarations.
Example:
%% Defaults for contraction class restrictions
define_contraction_defaults.
contraction_default(X,[block(X),block(doubledFC)]).
...
References:
contraction_default
contraction_default(C,R)
|
Declares a contraction restriction list R for class
C .
|
The possible contraction restrictions are described above in the
section for X=pf(R)
. Note that there
are no mode restrictions on C
. In particular,
C
can be a variable - in which case it applies to all
named classes.
Example from the Hungarian lexicon:
contraction_default(X,[block(X),block(doubledFC)]).
This states that, for output items of the form X=pf(R), rules of all named classes are blocked from firing again. Also, by default, all named classes block the class doubledFC. For instance, this implies the rule:
contraction(case,X+át,[X+a=pf([]),acc]).
is actually equivalent to writing:
contraction(case,X+át,[X+a=pf([block(case),block(doubledFC)]),acc]).
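One way to picture the effect of a default is as a step that extends the restriction list R of a pf(R) item with the defaults declared for the rule's class. The sketch below, for SWI-Prolog, reproduces the equivalence shown above for the class case. The name effectiveRestrictions/3 is hypothetical, and two points are assumptions rather than documented behaviour: that defaults are appended after any explicitly written restrictions, and that null class rules receive no defaults.

```prolog
:- use_module(library(lists)).   % append/3, member/2

contraction_default(X,[block(X),block(doubledFC)]).

% effectiveRestrictions(+Class, +Explicit, -Effective): hypothetical helper.
effectiveRestrictions(K, R, R) :-
    K == [], !.                  % ASSUMPTION: no defaults for the null class
effectiveRestrictions(K, R0, R) :-
    findall(D,
            ( contraction_default(K, Ds), member(D, Ds) ),
            Defaults),
    append(R0, Defaults, R).     % ASSUMPTION: defaults follow explicit items
```

Calling effectiveRestrictions(case,[],R) yields R = [block(case),block(doubledFC)], matching the expanded form of the rule for X+át.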
Finally, note that a define_contraction_defaults
declaration must precede the contraction_default
rules.
References:
contraction
/ define_contraction_defaults
define_lex_forms
(PAPPI 3.x only) |
Precedes all ending , rootEnding ,
lexTemplate and lexForms declarations.
|
If the lexicon uses the lexForms
off-line compilation
mechanism, a define_lex_forms
header must precede all
such entries.
Example:
define_lex_forms.
ending(ing,ing).
ending(s,s).
ending(ed(_),ed).
...
lexTemplate(Root,C,Form,Infl,lex(Infl,C,Root,[morph(Root,Form)])).
lexForms(appear,v,[ing,s,ed(_)]).
lexForms(appreciate,v,[ing,s,ed(_)]).
References:
ending
/
rootEnding
/
lexTemplate
/
lexForms
ending(Ending,Form)
(PAPPI 3.x only) |
Defines the morphological form Form associated with ending
code Ending . Ending is used in
lexForms macro declarations.
|
[Note: the define_lex_forms
declaration must precede all
ending
entries.]
Example:
In the English implementation we have:
ending(ing,ing).
ending(s,s).
ending(ed(_),ed).
ending(ed(1),ed).
ending(ed(2),en).
The first parameter will be referenced in lexForms declarations. For instance:
lexForms(appear,v,[ing,s,ed(_)]).
The verb appear has ending forms ing, s and ed(_). These are associated with the strings -ing, -s and -ed, respectively. The stem appear will combine with the strings to produce the inflected forms appearing, appears and appeared, respectively.
See rootEnding
for information on defining concatenation
rules for endings.
References:
define_lex_forms
/
rootEnding
/
lexTemplate
/
lexForms
rootEnding(Root,Ending,Form)
(PAPPI 3.x only) |
Defines a rule for producing Form from Root
combined with Ending . Ending is defined
separately using ending declarations. Root
is matched against word stems in lexForms macro declarations.
|
[Note: the define_lex_forms
declaration must precede all
rootEnding
entries.]
Example:
In the English implementation, we have:
ending(ing,ing).
ending(s,s).
ending(ed(_),ed).
lexForms(appear,v,[ing,s,ed(_)]).
lexForms(appreciate,v,[ing,s,ed(_)]).
Under the default concatenation rule, the lexForms macro will generate the correct inflected forms for appear but not appreciate, since simple concatenation keeps the stem-final e (giving, for example, appreciateing).
We can either list the lex/4 entries manually as defined in the section on lexicon/3, or define rootEnding rules to cope with the vowel stem ending as follows:
rootEnding(X+e,ing,X+ing).
rootEnding(X+e,ed,X+ed).
rootEnding(X+e,en,X+en).
These rules will override the default concatenation rule when the stem ends in the vowel e.
References:
define_lex_forms
/
ending
/
lexicon/3
/
lexForms
lexTemplate(Root,Category,Ending,Form,Clause)
(PAPPI 3.x only) |
Defines a template for macro expansion of lexForms
declarations. Will be used to generate a lexical entry
Clause for a given word Form with category
label Category derived from stem Root +
ending code Ending .
|
Example:
In the English implementation, we have:
ending(ing,ing).
ending(s,s).
ending(ed(_),ed).
lexTemplate(Root,C,Form,Infl,lex(Infl,C,Root,[morph(Root,Form)])).
lexForms(appear,v,[ing,s,ed(_)]).
The lexForms declaration for appear in conjunction with the above template will generate the following lex/4 clauses:
lex(appearing,v,appear,[morph(appear,ing)]).
lex(appears,v,appear,[morph(appear,s)]).
lex(appeared,v,appear,[morph(appear,ed(_))]).
References:
ending
/
lexForms
lexForms(Root,Category,Endings)
(PAPPI 3.x only) |
Declares stem Root should be macro expanded using the
list of ending codes Endings to form a series of lexical
entries with category label Category .
|
The lexForms
mechanism is a compilation scheme. That is,
macro expansion of lexForms
will be carried out
(off-line) at lexicon compilation time. (By contrast, stemming done
using the contraction mechanism will be performed at run-time.)
For each lexForms declaration, the following sequence of steps will be carried out:
1. Each ending code in Endings will be looked up via ending declarations to retrieve the corresponding ending.
2. The ending will be combined with Root to produce a final morphological form. Note: if a rootEnding rule applies to the combination, it will override the default simple concatenation rule.
3. A lexical entry for the resulting form will be generated using lexTemplate.
Finally, the whole series of definitions must be headed by a
define_lex_forms
declaration. Within the various
declarations, there is the further restriction that clauses for
lexForms
must go at the end. Definitions for
ending
and rootEnding
may appear in any
order.
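The expansion of a single lexForms declaration can be approximated by the following SWI-Prolog sketch. It is not PAPPI's compiler: expandLexForms/4, lookupEnding/2 and combineRoot/3 are hypothetical names, the expansion is run as an ordinary query rather than at lexicon compilation time, and ending codes are assumed to be matched up to variable renaming (so that ed(_), ed(1) and ed(2) each select their own ending declaration).

```prolog
:- use_module(library(lists)).   % member/2

ending(ing,ing).
ending(s,s).
ending(ed(_),ed).
ending(ed(1),ed).
ending(ed(2),en).

rootEnding(X+e,ing,X+ing).
rootEnding(X+e,ed,X+ed).
rootEnding(X+e,en,X+en).

lexTemplate(Root,C,Form,Infl,lex(Infl,C,Root,[morph(Root,Form)])).

% ASSUMPTION: codes are matched as variants, not by plain unification.
lookupEnding(Code, Str) :-
    ending(E, Str),
    E =@= Code,
    !.

% A rootEnding rule overrides simple concatenation when it applies.
combineRoot(Root, Str, Infl) :-
    rootEnding(X+Tail, Str, X+New),
    atom_concat(X, Tail, Root),  % Root ends in Tail; X is the rest
    !,
    atom_concat(X, New, Infl).
combineRoot(Root, Str, Infl) :-
    atom_concat(Root, Str, Infl).

% One clause per ending code, built via lexTemplate/5.
expandLexForms(Root, C, Codes, Clauses) :-
    findall(Clause,
            ( member(Code, Codes),
              lookupEnding(Code, Str),
              combineRoot(Root, Str, Infl),
              lexTemplate(Root, C, Code, Infl, Clause) ),
            Clauses).
```

For example, expandLexForms(arrive,v,[ing,s,ed(1),ed(2)],Cs) yields clauses for arriving, arrives, arrived and arriven, with the rootEnding rules dropping the stem-final e wherever they apply and plain concatenation used for -s.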
Examples:
In the English implementation we have:
define_lex_forms.
ending(ing,ing).
ending(s,s).
ending(ed(_),ed).
ending(ed(1),ed).
ending(ed(2),en).
rootEnding(X+e,ing,X+ing).
rootEnding(X+e,ed,X+ed).
rootEnding(X+e,en,X+en).
lexTemplate(Root,C,Form,Infl,lex(Infl,C,Root,[morph(Root,Form)])).
Given this, the following macro definition:
lexForms(appear,v,[ing,s,ed(_)]).
will produce the inflected entries for appear shown below:
lex(appearing,v,appear,[morph(appear,ing)]).
lex(appears,v,appear,[morph(appear,s)]).
lex(appeared,v,appear,[morph(appear,ed(_))]).
Similarly, we can define a macro for arrive as follows:
lexForms(arrive,v,[ing,s,ed(1),ed(2)]).
This produces the following block of entries:
lex(arriving,v,arrive,[morph(arrive,ing)]).
lex(arrives,v,arrive,[morph(arrive,s)]).
lex(arrived,v,arrive,[morph(arrive,ed(1))]).
lex(arriven,v,arrive,[morph(arrive,ed(2))]).
Note, for instance, that the entry for arriving has been derived via the concatenation rule:
rootEnding(X+e,ing,X+ing).
Inflected forms for verbs irregular to these rules can be simply spelled out as per the examples in lexicon/3. For example:
lex(eating,v,eat,[morph(eat,ing)]).
lex(eats,v,eat,[morph(eat,s)]).
lex(ate,v,eat,[morph(eat,ed(1))]).
lex(eaten,v,eat,[morph(eat,ed(2))]).
References:
define_lex_forms
/
ending
/
rootEnding
/
lexicon/3
/
lexTemplate