This is software for browsing and searching treebanks using
logic expressions. It is capable of handling large treebanks,
e.g. the Penn TreeBank (PTB). It renders bracketed expressions
as nicely-formatted trees.
Note 1: This is not free software. Download here. See the Licensing section here.
Note 2: The Penn Treebank is not included here.
I cannot legally redistribute it.
The example screenshot on the right shows the program operating on the supplied Lasnik & Uriagereka (L&U) treebank (317 sentences).
On the left display panel, all 104 sentences matching the pattern "TO" are selected. [Match Sentence = TO; the search parameters are literal match (regexp not checked) and All (entire treebank). Show "Selected Only" restricts the display to matching instances only.] On the right display panel, the Bikel-Collins parse for sentence #43 is displayed. [The parse associated with a sentence can be displayed by clicking on the sentence. The background of the selected sentence is highlighted in blue on the left panel. The sentence number is given above the parse tree (here: 43/317).]
(This viewer employs the same underlying tree renderer as used in the next release of PAPPI.)
tgrep2: notes on how to use the program in conjunction with the
treebank search program tgrep2. (Not yet available.)
The treebanksearch program is capable of operating in
synchronized multi-window mode for browsing purposes.
The operation of this mode is described in a separate webpage. (See here.)
[On the right, the upper window contains one instance of the
treebank viewer. The treebanksearch program is the lower window.
The treebank viewer has the Collins parser model 1 parses loaded
for the Penn Treebank (PTB). The treebanksearch program has the
"gold-standard" PTB trees loaded.
(Michael Collins's parser is available here.)
The two programs are operating in synchronized mode. This means
the treebankviewer is slaved to the treebanksearch program for
tree display purposes.
In this particular snapshot, the upper and lower windows are
displaying (different) trees for the sentence: Neither
Lorillard nor the researchers who studied the workers were aware
of any research on smokers of the Kent cigarettes .]
Use the appropriate button to bring up a file dialog box or type directly into the entry field.
Sentence file lu.lisp and Prolog tree file lu.pl for the
Lasnik & Uriagereka (L&U) treebank are supplied with the distribution.
File Formats
The format is one line per sentence and one line per tree.
The number of lines for the sentence and tree files should be the same.
Anything can be present in the sentence file: each line is treated as a simple string for search.
The tree file must be parse-able by the tree renderer.
Each tree should occupy one line and be acceptable to Prolog.
Format is:
tree(Tree).
where tree node Tree should be of form:
n(NodeName,Child1,..,Childn)
NodeName should be an acceptable Prolog atom.
Atoms starting with an upper case letter should be quoted as follows, e.g. VP should be 'VP'
Each child node Childi should either be an atom or (recursively) a tree node.
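The two requirements above (equal line counts, one tree(Tree). fact per line) can be checked before pressing "Load Files". The following is a minimal sketch and not part of the distribution: the function name check_parallel_files is hypothetical, and it tests only the line counts and the outer tree( ... ). wrapper, not full Prolog syntax. How the program expects entries for sentences without a parse to be written is not documented here, so such lines may be reported even if the program accepts them.

```python
def check_parallel_files(sentence_lines, tree_lines):
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    # The sentence file and the tree file must have the same number of lines.
    if len(sentence_lines) != len(tree_lines):
        problems.append(
            f"line count mismatch: {len(sentence_lines)} sentences, "
            f"{len(tree_lines)} trees"
        )
    for number, line in enumerate(tree_lines, start=1):
        text = line.strip()
        # Each tree line is expected to be a single Prolog fact: tree(Tree).
        if not (text.startswith("tree(") and text.endswith(").")):
            problems.append(f"tree line {number} is not of the form tree(Tree).")
    return problems


if __name__ == "__main__":
    sentences = ["John slept"]
    trees = ["tree(n('S',n('NP',n('NNP','John')),n('VP',n('VBD','slept')),n('.','.')))."]
    print(check_parallel_files(sentences, trees))
```

To check real files, read each one with open(path).read().splitlines() and pass the two lists to the function.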
Example:
Prolog tree input for the sentence John slept:
tree(n('S',n('NP',n('NNP','John')),n('VP',n('VBD','slept')),n('.','.'))).
Bikel parser output is in Lisp sexp format:
(S (NP (NNP John)) (VP (VBD slept)) (. .))
Code for converting Bikel parser output is detailed in the appendix.
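For orientation, the mapping between the two formats can be sketched in a few lines of Python. This is not the appendix code; it is an illustrative sketch, and the function names (parse_sexp, quote_atom, to_prolog, convert_line) are hypothetical. It quotes every atom, as in the example above, and assumes the Prolog reader accepts a doubled single quote and a doubled backslash inside quoted atoms.

```python
import re

# Tokens are "(", ")" or a run of characters containing no whitespace or parentheses.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_sexp(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = TOKEN.findall(text)
    position = 0

    def read():
        nonlocal position
        if position >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[position]
        position += 1
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token  # a leaf word
        node = []
        while position < len(tokens) and tokens[position] != ")":
            node.append(read())
        if position >= len(tokens):
            raise ValueError("missing ')'")
        position += 1  # consume the closing ")"
        if not node or isinstance(node[0], list):
            raise ValueError("node without a label")
        return node

    tree = read()
    if position != len(tokens):
        raise ValueError("trailing input after tree")
    return tree


def quote_atom(text):
    """Quote every atom (assumption: '' and \\\\ escapes are accepted by the reader)."""
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def to_prolog(tree):
    """Render a parsed tree as n(Label,Child1,..,Childn)."""
    if isinstance(tree, str):
        return quote_atom(tree)
    label, *children = tree
    parts = [quote_atom(label)] + [to_prolog(child) for child in children]
    return "n(" + ",".join(parts) + ")"


def convert_line(sexp_line):
    """Convert one Lisp-style line into one tree(Tree). line."""
    return "tree(" + to_prolog(parse_sexp(sexp_line)) + ")."


print(convert_line("(S (NP (NNP John)) (VP (VBD slept)) (. .))"))
```

Run on the sexp line above, the sketch reproduces the Prolog line given for John slept. Parser output that spans several lines per tree would first have to be joined into one line per tree.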
Press "Load Files" to load the files.
The background of the sentence currently being displayed is highlighted in blue. The sentence number is given above the tree.
In addition to directly clicking on a sentence, the Up and Down arrows on the keyboard can be used to display the tree for the preceding and following sentence.
To go directly to a sentence, enter the sentence number in the sentence number box and press Return.
Screen and window sizing
Scrollbars are available when appropriate in both display windows.
If a scrollbar is not visible, expand the window.
The entire program window can be expanded or re-sized by dragging the handle at the bottom-right.
The divider separating the two display windows can be moved using the (small square) drag handle.
[Note: the right display window below has been resized to accommodate the large parse tree. The vertical scrollbar for the left display window has been occluded.]
Regular expression searches can also be specified by checking the regexp flag.
Example: Down
Click to focus the keyboard input in the entry box next to "Match Sentence" and type in TO. [The box currently receiving focus is always marked with a black border.] Then press Return or select "Down" from the pull-down menu. The first match of TO in the file, found in sentence 10, will be displayed in red.
Press Return again (or select "Down") to move to the next match, shown in red in sentence 12. (See second picture on the right.)
[By default, the search proceeds downwards from the current insertion point in the left panel. For example, you can scroll around and click to set the insertion point for the left display panel. Selecting "Down" (or clicking in the Match Sentence entry box and pressing Return) will find the next matching string from that point.]
(Note: Sentence search looks only in the sentence file. You must click on a sentence to display the associated tree.)
Example: Up
Search may also proceed upwards from the current insertion point.
Select "Up" from the pull-down menu. Pressing Return will now default to moving up to the next match. |
Example: All
Instead of examining a single match at a time, selecting
"All" instead of "Up" or "Down" will highlight (in red) all
possible matches simultaneously in the sentence file.
Below, we see the effect of option "All" on the matching string TO.
3 matches (TO) are visible here. There are actually 104 matches in 317 sentences.
We can eliminate non-matching sentences and narrow the display down to the 104 matching sentences only by pressing the "Selected Only" button next to the "Show:" label.
The display toggles to show only sentences with highlighted matches. [Note: The "Show Selected Only" button is now greyed out.] To toggle back to the full display, press the (now-activated) "All" button to the right of the "Show" label. The "Show Selected Only"/"Show All" toggle is also used in Tree Search. (See below.)
Regular Expression Search
An example of a simple regular expression "Show All"
sentence search. Matching string is:
which matches either (Standard Unix regular expression syntax is assumed.) |
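The sentence search modes described above can be summarised in a short Python sketch. It is an illustration of the behaviour, not the program's code: the function names and the sample sentences are hypothetical, matching is assumed to be case-sensitive, the search is simplified to whole sentences rather than insertion points within a sentence, and Python's re syntax stands in for the Unix regular expression syntax the program assumes.

```python
import re


def compile_pattern(pattern, regexp=False):
    """Literal match by default; a regular expression only if regexp is set."""
    return re.compile(pattern if regexp else re.escape(pattern))


def search_all(sentences, pattern, regexp=False):
    """'All': the 1-based numbers of every sentence containing a match."""
    matcher = compile_pattern(pattern, regexp)
    return [n for n, s in enumerate(sentences, start=1) if matcher.search(s)]


def search_step(sentences, pattern, current, direction="Down", regexp=False):
    """'Down'/'Up': the next matching sentence after/before `current`, or None."""
    hits = search_all(sentences, pattern, regexp)
    if direction == "Down":
        later = [n for n in hits if n > current]
        return later[0] if later else None
    earlier = [n for n in hits if n < current]
    return earlier[-1] if earlier else None


# Hypothetical sentence file contents, one sentence per line.
SENTENCES = ["John left", "John tried TO leave", "Mary slept", "Mary wants TO win"]

print(search_all(SENTENCES, "TO"))                        # "All"
print(search_step(SENTENCES, "TO", 0))                    # "Down" from the top
print(search_step(SENTENCES, "TO", 4, direction="Up"))    # "Up" from sentence 4
print(search_all(SENTENCES, "TO|slept", regexp=True))     # regexp checked
```

The list returned by search_all corresponds to what "Show Selected Only" leaves on display. With regexp unchecked, a pattern such as TO|slept is matched literally and finds nothing in these sample sentences.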
To enable tree search, the "Enable" button must be pressed.
This loads all the trees into the Prolog database.
The "Enable" button is now greyed out and replaced by the label "Enabled".
The "Match Tree (Prolog)" entry box is no longer greyed out and now accepts Prolog queries.
Prolog queries are (currently) evaluated in one of two modes, One or All, both described below.
Queries must obey Prolog syntax (but exclude the line-ending period).
Currently, Prolog syntax and run-time errors result in a Tcl error message being generated (to be fixed).
Match Tree (Prolog): One
Example: node(X,'VP'), branching(X,3)
This query states: "Find a tree such that.."
[Following Prolog convention, logic variables start with an
uppercase letter. In the example, X is a variable denoting a tree node.
Constants that begin with an uppercase letter must be
enclosed in single quotes (to avoid being interpreted as logic
variables). In the example, 'VP' names a constant beginning with
an uppercase letter.
Conjunctive queries must be separated by a comma, disjunctive ones
by a semicolon. \+ is the Prolog negation operator.]
In other words: "Find the first matching tree that has a VP with 3
children".
The result is given below:
The first matching tree is associated with sentence 18
*I believe sincerely John to be here
The program displays the matching tree (VP has three children headed by VBP, ADVP and S) and highlights the associated sentence.
Match Tree (Prolog): All
Example: node(X,'VP'), branching(X,3)
This query states: "Find all matching trees with a VP with 3 children".
The result is given below:
The left display panel is restricted to the sentences associated with the 31 matching trees.
The tree associated with sentence 22
I persuaded John to leave
is currently displayed.
Note: The left display is currently in "Show Selected Only" mode (see the Sentence Search documentation above), and may be toggled back to showing all sentences in the treebank by pressing the "All" button.
Prolog tree primitives
Currently, the following primitive logic formulas are pre-defined:
Primitive | Arguments | Example | Semantics
node(V,Label) | V: variable that names the node. Label: label associated with the node. | node(X,'NNP') | There exists a node X with label NNP in the matching tree.
branching(V,N) | V: variable standing for a node (must be previously introduced via node/2). N: positive integer indicating the branching factor of V. | branching(Z,3) | Node Z has branching factor 3.
V1 dom V2 or dom(V1,V2) | V1: variable standing for a node (must be previously introduced via node/2). V2: variable standing for a node (need not be previously introduced via node/2). | X dom Y | Node X properly dominates node Y. Note: dom is likely to be renamed to pdom (for properly dominates) soon.
V1 idom V2 or idom(V1,V2) | V1: variable standing for a node (must be previously introduced via node/2). V2: variable standing for a node (need not be previously introduced via node/2). | X idom Y | Node X immediately dominates node Y.
notree | No variables present. | notree | Matches cases where no parse tree was recovered. Usage: e.g. the Collins parser models.
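To make the intended meaning of these primitives concrete, here is a small Python model of them. It is a sketch for illustration only and says nothing about how the program implements them in Prolog. Trees are nested lists [label, child1, ...], the tree itself is hypothetical (written for this sketch, not taken from the L&U treebank), and two assumptions are made: bare words are not nodes, and words count as children when computing the branching factor.

```python
def nodes(tree):
    """Yield every node [label, child1, ...]; bare strings are words, not nodes."""
    if isinstance(tree, list):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)


def branching(node):
    """Branching factor: the number of children of a node."""
    return len(node) - 1


def idom(parent, child):
    """True if parent immediately dominates child (child is one of its children)."""
    return any(c is child for c in parent[1:])


def dom(ancestor, descendant):
    """True if ancestor properly dominates descendant (one or more idom steps)."""
    return any(
        c is descendant or (isinstance(c, list) and dom(c, descendant))
        for c in ancestor[1:]
    )


# Hypothetical tree used only for this sketch.
TREE = [
    "S",
    ["NP", ["PRP", "I"]],
    ["VP",
     ["VBP", "believe"],
     ["ADVP", ["RB", "sincerely"]],
     ["S",
      ["NP", ["NNP", "John"]],
      ["VP", ["TO", "to"],
       ["VP", ["VB", "be"], ["ADVP", ["RB", "here"]]]]]],
]

# node(X,'VP'), branching(X,3)
ternary_vps = [x for x in nodes(TREE) if x[0] == "VP" and branching(x) == 3]

# node(X,'VP'), X dom Y, node(Y,'NNP')
dom_pairs = [(x, y) for x in nodes(TREE) for y in nodes(TREE)
             if x[0] == "VP" and y[0] == "NNP" and dom(x, y)]

# node(X,'VP'), X idom Y, node(Y,'NNP')
idom_pairs = [(x, y) for x in nodes(TREE) for y in nodes(TREE)
              if x[0] == "VP" and y[0] == "NNP" and idom(x, y)]

print([child[0] for child in ternary_vps[0][1:]])
print(len(dom_pairs), len(idom_pairs))
```

In this tree exactly one VP has three children, and the same VP dominates the NNP node without immediately dominating it, so the dom query finds a pair while the idom query finds none.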
The following connectives are currently supported:
Logic connective | Arguments | Example | Semantics
A , B (comma: conjunction) | A and B are logic formulas. | node(X,'NP'), X idom Y, node(Y,'NNP') | In the matching tree, there exist nodes X and Y with labels NP and NNP (respectively) such that X immediately dominates Y.
A ; B (semicolon: disjunction) | A and B are logic formulas. | node(X,'NP'), X idom Y, (node(Y,'NNP') ; node(Y,'PRP')) | In the matching tree, there exist nodes X and Y. X has label NP and Y has either label NNP or PRP. Furthermore, X immediately dominates Y. Note: parentheses can be used to delimit the scope of the connectives. By default, conjunction (comma) binds more tightly than disjunction (semicolon) in Prolog.
\+ A (negation) | A is a logic formula. | \+ node(X,'PRP') | There does not exist a node X with label PRP (personal pronoun) in the matching tree.
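The connectives can be read as ordinary boolean operators over an existential search for node bindings. The Python sketch below illustrates that reading on a three-tree toy treebank; it is not the program's evaluation strategy, and the trees, the function names and the nested-list representation [label, child1, ...] are all hypothetical.

```python
def nodes(tree):
    """Yield every node [label, child1, ...]; bare strings are words, not nodes."""
    if isinstance(tree, list):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)


def idom(parent, child):
    """True if parent immediately dominates child."""
    return any(c is child for c in parent[1:])


def np_over_name_or_pronoun(tree):
    """node(X,'NP'), X idom Y, (node(Y,'NNP') ; node(Y,'PRP'))"""
    return any(
        x[0] == "NP" and idom(x, y) and (y[0] == "NNP" or y[0] == "PRP")
        for x in nodes(tree)
        for y in nodes(tree)
    )


def no_pronoun(tree):
    """\\+ node(X,'PRP')"""
    return not any(x[0] == "PRP" for x in nodes(tree))


# Hypothetical toy treebank; list position + 1 plays the role of the sentence number.
TREEBANK = [
    ["S", ["NP", ["NNP", "John"]], ["VP", ["VBD", "slept"]], [".", "."]],
    ["S", ["NP", ["PRP", "I"]], ["VP", ["VBD", "slept"]], [".", "."]],
    ["S", ["NP", ["DT", "the"], ["NN", "dog"]], ["VP", ["VBD", "slept"]], [".", "."]],
]


def match_all(treebank, query):
    """'All' mode: sentence numbers of every tree satisfying the query."""
    return [n for n, tree in enumerate(treebank, start=1) if query(tree)]


print(match_all(TREEBANK, np_over_name_or_pronoun))
print(match_all(TREEBANK, no_pronoun))
```

Python's and binds more tightly than or, just as the comma binds more tightly than the semicolon in Prolog, so the parentheses around the disjunction are needed in both notations.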
It is possible to add additional Prolog definitions, e.g. c-command etc.
(Documentation forthcoming.)
Example:
node(Y,'VP'), Y dom X, node(X,'SBAR')
node(Y,'VP'), Y idom X, node(X,'SBAR')
We illustrate the difference between finding a VP that dominates some SBAR node and finding a VP that immediately dominates it.
node(Y,'VP'), Y dom X, node(X,'SBAR')
172 matches are found. The tree for the first match (sentence 9) is shown above in the right display panel. Note that VP dominates SBAR via the intermediate node NP.
node(Y,'VP'), Y idom X, node(X,'SBAR')
109 matches are found. The tree for the first match (sentence 17) is shown. Only instances where VP immediately dominates the SBAR node are returned.
Platform | File | Install/Run | ||
MacOS X (PowerPC)
(10.3, 10.4) |
treebanksearch-powerpc.zip
(Updated: 2/15/07)
Note: Requires Aqua Tcl/Tk (10.4: already installed by default, 10.3: download from http://tcltkaqua.sourceforge.net/) |
(Unzip if necessary.)
Drag application to your Application folder.
Double-click application.
Double-click application again. |
||
MacOS X (Intel)
(10.4) |
Application executable references /Library/Frameworks/Tk.framework/Versions/8.4/Tk
The file command should report:
Tk: Mach-O universal binary with 2 architectures
Tk (for architecture ppc): Mach-O dynamically linked shared library ppc
Tk (for architecture i386): Mach-O dynamically linked shared library i386 |
(Unzip if necessary.)
Drag application to your Application folder.
Double-click application.
Double-click application again. |
||
Windows XP |
treebanksearch-winxp.zip
(Not available yet.)
[Version compiled sans SP1 on Visual C++ 2005 Express Edition. SP1 breaks the code.] Note: This relies on ActiveTcl and Microsoft Visual C++ DLLs. You need to install ActiveTcl for Windows XP. Download from http://downloads.activestate.com/ActiveTcl/Windows/.
|
(Unzip the treebankviewer folder.)
Place folder in C:\Program Files
To run, double-click the executable
[Do not double-click |
See the treebankviewer homepage downloads here for the example
Lasnik & Uriagereka treebank and the Penn Treebank.
Collins Parser Output
This is the output of running the Collins parser (Models 1 through 3) on the Wall Street Journal (WSJ) section of the Penn Treebank (PTB), in treebankviewer format.
Temporarily restricted download (requires password):
Download | Sentence File | Prolog TreeBank |
wsj-collins.zip (34.8MB, .zip archive) | wsj.txt
Temporarily included. |
Source: wsj-collins-m1.pl (Model 1)
Compiled: wsj-collins-m1.po
Source: wsj-collins-m2.pl (Model 2)
Compiled: wsj-collins-m2.po
Source: wsj-collins-m3.pl (Model 3)
Compiled: wsj-collins-m3.po
|
The following pop-up will appear if you do not have a keycode
for the machine on which treebanksearch is installed.
You should note the displayed hostname. The keycode supplied to you will be an (exact) function of this name.
If the keycode is corrupt or the application has been moved to
another machine or the machine hostname has changed, the
following pop-up will appear when starting treebanksearch.
A new keycode will need to be generated.
A second pop-up will appear inviting you to enter a valid keycode.
Copy and paste the supplied keycode exactly, and press "OK".
Re-start treebanksearch. You will not see the keycode pop-ups again.