This is freely available software for displaying and browsing treebanks.
It renders bracketed expressions as nicely formatted trees. It is fast and capable of handling large treebanks, e.g. the Penn TreeBank (PTB). It is now available for MacOS X (PPC and Intel), Windows XP and Linux (Debian-based and RedHat-based) platforms. (See download section here.)
[Doug Rohde's tgrep2 program is available here.]
Only the Lasnik and Uriagereka sample treebanks for Dan Bikel's Collins parser and PAPPI are made generally available.
[Dan Bikel's multilingual statistical parsing engine: here.]
The Penn Treebank (PTB) is not included here, so you have to roll your own. Obviously, I cannot legally re-distribute it.
[You need to have a license from the Linguistic Data Consortium (LDC).]
[The PTB link on this webpage is password protected.]
[University of Arizona students: the TREEBANK_3 CD is freely available for loan from the library. See the library catalog.]
(See the treebank section here.)
(May 2007) There is also a new version of the viewer that can be used with CCG derivations output by the C&C CCG parser. See here.
Search terms specify categories (possibly with wildcards) and relations between nodes in terms of standard concepts such as dominance and precedence. You can also load additional linguistically motivated definitions such as c-command and government. In other words, you can construct queries using the full power of logic programming. Matching trees are then displayed.
(cf. conventional software for treebank search, e.g. tgrep or tgrep2.)
(The viewer employs the same underlying tree renderer as used in the next release of PAPPI.)
tgrep2: notes on how to use the treebank viewer in conjunction with the treebank search program tgrep2.
Use the appropriate button to bring up a file dialog box or type directly into the entry field.
[Sentence File dialog box, i.e. the "Sentence File" button has been pressed. File lu.lisp selected for loading.]
Sentence file lu.lisp and Prolog tree file lu.pl for the Lasnik & Uriagereka (L&U) treebank are supplied with the distribution.
[Prolog Tree File dialog box. File lu.pl selected for loading.]
File Formats
The format is one line per sentence and one line per tree.
The number of lines for the sentence and tree files should be the same.
Anything can be present in the sentence file. Each line is treated as a simple string for display.
However, the tree file must be parse-able by the tree renderer.
Each tree should occupy one line and be acceptable to Prolog.
Format is:
tree(Tree).
where tree node Tree should be of form:
n(NodeName,Child1,..,Childn)
NodeName should be an acceptable Prolog atom.
Names starting with an upper case letter must be quoted, since Prolog would otherwise read them as variables, e.g. VP should be written 'VP'
Each child node Childi should either be an atom or (recursively) a tree node.
Example:
Prolog tree input for the sentence John slept
tree(n('S',n('NP',n('NNP','John')),n('VP',n('VBD','slept')),n('.','.'))).
Bikel parser output is in Lisp sexp format:
(S (NP (NNP John)) (VP (VBD slept)) (. .))
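The correspondence between the two formats above can be sketched in a few lines of Python. This is only an illustration, not the conversion actually used to produce lu.pl: the function names are made up for this sketch, every label is quoted (as in the example above), and it assumes each input line holds one complete expression of the form (Label child ...), with no empty outer label.

```python
import re

def tokenize(sexp):
    """Split a bracketed expression into '(', ')' and symbol tokens."""
    return re.findall(r"[()]|[^\s()]+", sexp)

def parse(tokens, pos=0):
    """Parse the expression starting at tokens[pos].

    Returns (node, next_pos), where a node is either a string (leaf)
    or a (label, children) tuple.
    """
    if tokens[pos] != "(":
        return tokens[pos], pos + 1
    label = tokens[pos + 1]
    pos += 2
    children = []
    while tokens[pos] != ")":
        child, pos = parse(tokens, pos)
        children.append(child)
    return (label, children), pos + 1

def quote(atom):
    """Write a label as a quoted Prolog atom.

    Assumption: backslashes are escaped and single quotes are doubled,
    which is the usual convention for quoted atoms.
    """
    return "'" + atom.replace("\\", "\\\\").replace("'", "''") + "'"

def to_prolog(node):
    """Render a parsed node as n(NodeName,Child1,..,Childn)."""
    if isinstance(node, str):
        return quote(node)
    label, children = node
    return "n(" + ",".join([quote(label)] + [to_prolog(c) for c in children]) + ")"

def sexp_to_tree_line(sexp):
    """Convert one Lisp sexp line into one tree(Tree). line."""
    node, _ = parse(tokenize(sexp))
    return "tree(" + to_prolog(node) + ")."

print(sexp_to_tree_line("(S (NP (NNP John)) (VP (VBD slept)) (. .))"))
```

Run on the John slept example, this prints the same tree(...) line shown above. Quoting every atom is a simplification: it is always safe, and it avoids having to decide which labels Prolog would accept unquoted.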
Press "Load" to load the files into the viewer.
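Before loading, it may be worth checking that a sentence file and a tree file actually line up as described under File Formats. The following Python sketch is a hypothetical helper, not part of the viewer: check_pair is an invented name, and the test that each tree line starts with tree( and ends with ). is only a rough stand-in for "acceptable to Prolog".

```python
def check_pair(sentence_path, tree_path):
    """Return a list of problems; an empty list means the pair looks loadable.

    Checks only what the File Formats notes state: equal line counts,
    and one tree(...). term per line of the tree file.
    """
    with open(sentence_path, encoding="utf-8") as f:
        sentences = f.read().splitlines()
    with open(tree_path, encoding="utf-8") as f:
        trees = f.read().splitlines()

    problems = []
    if len(sentences) != len(trees):
        problems.append(
            f"line counts differ: {len(sentences)} sentences, {len(trees)} trees"
        )
    for i, line in enumerate(trees, start=1):
        text = line.strip()
        # Rough shape check only; this does not parse the Prolog term.
        if not (text.startswith("tree(") and text.endswith(").")):
            problems.append(f"tree line {i} is not of the form tree(...).")
    return problems
```

For instance, check_pair("lu.lisp", "lu.pl") would be expected to return an empty list for the supplied L&U files, assuming they follow the format described above.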
The background of the sentence currently being displayed is highlighted in blue. The sentence number is given above the tree.
In addition to directly clicking on a sentence, when the window focus is on the left display panel, the Up and Down arrows on the keyboard can be used to display the tree for the preceding and following sentence.
To go directly to a sentence, enter the sentence number in the sentence number box and press Return. Example:
Screen and window sizing
Scrollbars are available when appropriate in both display windows.
If a scrollbar is not visible, expand the window.
The entire program window can be expanded or re-sized by dragging the handle at the bottom-right.
The divider separating the two display windows can be moved using the (small square) drag handle.
[Note: the right display window below has been resized to accommodate the large parse tree. The vertical scrollbar for the left display window has been occluded.]
Sentence File: wsj.txt, Prolog Tree File: wsj.pl
tgrep2
I assume tgrep2 has been installed.
[Doug Rohde's tgrep2 program is available here.]
I also assume the WSJ section of the PTB has been loaded into the viewer.
Example query (taken from http://www.ldc.upenn.edu/ldc/online/treebank/):
tgrep2 -c wsj2.t2c 'VP << /^believe/ < (S < (/^NP/ !<< /[*]/ !< (-NONE- < T)) < (VP|AUX << to))'
[Here, wsj2.t2c is the pre-processed index file for the WSJ produced by tgrep2 -p.]
Normal tgrep2 output for this query is fairly difficult to read:
(VP (VBD said) (, ,) (`` ``) (S (NP-SBJ (PRP We)) (VP (VBP believe) (SBAR (-NONE- 0) (S (NP-SBJ (PRP$ our) (NN decision) (S (NP-SBJ (-NONE- *)) (VP (TO to) (VP (VB plead) (-LRB- -LRB-) (NP (JJ guilty)) (-RRB- -RRB-) (PP-CLR (TO to) (NP (DT these) (NNS charges))))))) (VP (VBZ is) (ADJP-PRD (JJ responsible) (CC and) (JJ proper))))))))
(VP (VBP lead) (S (NP-SBJ (NNS readers)) (VP (TO to) (VP (VB believe) (SBAR (IN that) (S (NP-SBJ (DT the) (NNP House)) (VP (VBD reduced) (NP (DT the) (NNS capital-gains) (NN tax)) (PP-TMP (IN for) (NP (CD two) (NNS years) (RB only))))))))))
(VP (VBD said) (, ,) (`` ``) (S (PP (VBN Given) (NP (NP (DT the) (NN state) (POS 's)) (JJ strong) (NN bargaining) (NN position))) (: ...) (NP-SBJ (PRP we)) (VP (VBP believe) (SBAR (-NONE- 0) (S (NP-SBJ (DT the) (NNP NU) (NN plan)) (VP (VBZ provides) (NP (NP (DT the) (JJS best) (NN recovery)) (ADJP (JJ available) ('' '') (PP (TO to) (NP (NP (NAC (NNP PS) (PP (IN of) (NP (NNP New) (NNP Hampshire)))) (POS 's)) (NN equity) (NNS holders)))))))))))
(VP (VBG making) (S (NP-SBJ (NNS traders)) (VP (VB believe) (SBAR (-NONE- 0) (S (NP-SBJ (DT the) (NN market)) (VP (VBD was) (ADVP-PRD (RB back) (PP (TO to) (NP (JJ normal))))))))))
(VP (VBZ expects) (S (NP-SBJ (DT the) (NN deflator)) (VP (TO to) (VP (VB rise) (NP-EXT (NP (CD 3.7) (NN %)) (, ,) (PP (ADVP (RB well)) (IN below) (NP (NP (DT the) (JJ second) (NN quarter) (POS 's)) (CD 4.6) (NN %))) (, ,))))) (PP-PRP (ADVP (RB partly)) (IN because) (IN of) (SBAR-NOM (WHNP-1 (WP what)) (S (NP-SBJ (PRP he)) (VP (VBZ believes) (SBAR (-NONE- 0) (S (NP-SBJ (-NONE- *T*-1)) (VP (MD will) (VP (VB be) (NP-PRD (ADJP (RB temporarily) (JJR better)) (NN price) (NN behavior)))))))))))
(VP (VBD said) (, ,) (`` ``) (S (NP-SBJ (PRP We)) (VP (VBP believe) (SBAR (-NONE- 0) (S (NP-SBJ-1 (NP (DT the) (NN partnership)) (PP (IN of) (NP (NP (NNP Fox)) (, ,) (NP (PRP$ its) (NNS affiliates)) (CC and) (NP (NNS advertisers))))) (VP (VP (VBZ is) (VP (VBG succeeding))) (CC and) (VP (MD will) (VP (VB continue) (S (NP-SBJ (-NONE- *-1)) (VP (TO to) (VP (VB grow))))))))))))
(VP (VBZ believes) (S (NP-SBJ (NP (DT the) (JJ legal) (NN action)) (PP (IN by) (NP (DT the) (JJ British) (NN firm)))) (`` ``) (VP (TO to) (VP (VB be) (PP-PRD (IN without) (NP (NN merit)))))))
(VP (VBD said) (, ,) (`` ``) (S (NP-SBJ (PRP It)) (VP (VBZ indicates) (ADVP (RB perhaps)) (SBAR (IN that) (S (NP-SBJ (NP (DT the) (NN balance)) (PP-LOC (IN in) (NP (DT the) (NNP U.S.) (NN economy)))) (VP (VBZ is) (RB not) (ADJP-PRD (ADJP (RB as) (JJ good)) (SBAR (IN as) (S (NP-SBJ-2 (PRP we)) (VP (VBP 've) (VP (VBN been) (VP (VBN led) (S (NP-SBJ (-NONE- *-2)) (VP (TO to) (VP (VB believe))))))))))))))))
(VP (VBD said) (, ,) (S (`` ``) (NP-SBJ-1 (PRP We)) (VP (VBP continue) (S (NP-SBJ (-NONE- *-1)) (VP (TO to) (VP (VB believe) (SBAR (SBAR (-NONE- 0) (S (NP-SBJ (PRP$ our) (NN approach)) (VP (VBZ is) (ADJP-PRD (JJ sound))))) (, ,) (CC and) (SBAR (IN that) (S (NP-SBJ (PRP it)) (VP (VBZ is) (ADJP-PRD (ADJP (RB far) (JJR better)) (PP (IN for) (NP (DT all) (NNS employees))) (PP (IN than) (NP (NP (DT the) (NN alternative)) (PP (IN of) (S-NOM (NP-SBJ (-NONE- *)) (VP (VBG having) (S (NP-SBJ (DT an) (NN outsider)) (VP (VB own) (NP (DT the) (NN company)) (PP (IN with) (S-NOM (NP-SBJ (NNS employees)) (VP (VBG paying) (PP-CLR (IN for) (NP (PRP it))) (ADVP (RB just) (DT the) (JJ same)))))))))))))))))))))))
(VP (VBP believe) (S (NP-SBJ (PRP themselves)) (VP (TO to) (VP (VB be) (VP (VBG serving))))))
Invoking the query with the -x flag gives the sentence number (and VP node number) of each match.
tgrep2 -x -c wsj2.t2c 'VP << /^believe/ < (S < (/^NP/ !<< /[*]/ !< (-NONE- < T)) < (VP|AUX << to))'
5175:26 14103:6 21204:27 29432:68 29570:39 33275:25 33836:45 39564:61 42224:18 48195:9
These tree numbers can be copied and pasted into the treebankviewer. For example:
Tree number 33836 is displayed:
Some platforms accept drag-and-drop of the tree numbers.
(No cut-and-paste or drag-and-drop is necessary for treebanksearch. That engine has a direct interface to tgrep2 and will narrow the sentence display on the left panel automatically.)
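If many matches come back, the tree numbers can be pulled out of the -x output mechanically rather than by hand. The Python sketch below assumes the output consists of whitespace-separated sentence:node pairs exactly as in the sample shown earlier; tree_numbers is a name invented for this illustration.

```python
def tree_numbers(tgrep_output):
    """Return the sentence numbers from 'sentence:node' pairs.

    Order is preserved, and a sentence that matches at several nodes
    is listed only once.
    """
    numbers = []
    for pair in tgrep_output.split():
        sentence, _, _node = pair.partition(":")
        n = int(sentence)
        if n not in numbers:
            numbers.append(n)
    return numbers

output = ("5175:26 14103:6 21204:27 29432:68 29570:39 "
          "33275:25 33836:45 39564:61 42224:18 48195:9")
print(tree_numbers(output))
# prints [5175, 14103, 21204, 29432, 29570, 33275, 33836, 39564, 42224, 48195]
```

Each number in the resulting list can then be typed into the sentence number box of the viewer.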
Platform: MacOS X (PowerPC) (10.3, 10.4)
File: treebankviewer-powerpc.zip (1.2MB)
Updated: 1/24/07
Note: Requires Aqua Tcl/Tk (10.4: already installed by default; 10.3: download from http://tcltkaqua.sourceforge.net/)
Install/Run: (Unzip if necessary.) Drag the application to your Applications folder. Double-click the application.

Platform: MacOS X (Intel) (10.4)
Note: The application executable references /Library/Frameworks/Tk.framework/Versions/8.4/Tk
File should report:
Tk: Mach-O universal binary with 2 architectures
Tk (for architecture ppc): Mach-O dynamically linked shared library ppc
Tk (for architecture i386): Mach-O dynamically linked shared library i386
Install/Run: (Unzip if necessary.) Drag the application to your Applications folder. Double-click the application.

Platform: Linux (Intel) (Debian-based and RedHat-based)
File: treebankviewer-linux.tar.gz (1161KB)
Updated: 1/26/07
Note: The viewer was compiled on an Ubuntu 6.06 system. Major library dependencies: Tcl/Tk. Further information:
linux-gate.so.1 => (0xffffe000)
libtk8.4.so.0 => /usr/lib/libtk8.4.so.0 (0xb7efb000)
libtcl8.4.so.0 => /usr/lib/libtcl8.4.so.0 (0xb7e4d000)
libSM.so.6 => /usr/lib/libSM.so.6 (0xb7e45000)
libICE.so.6 => /usr/lib/libICE.so.6 (0xb7e2d000)
libX11.so.6 => /usr/lib/libX11.so.6 (0xb7d47000)
libdl.so.2 => /lib/tls/i686/cmov/libdl.so.2 (0xb7d44000)
libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0xb7d22000)
libpthread.so.0 => /lib/tls/i686/cmov/libpthread.so.0 (0xb7d0f000)
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7be0000)
libXau.so.6 => /usr/lib/libXau.so.6 (0xb7bdd000)
/lib/ld-linux.so.2 (0xb7fe1000)
For those on a RedHat-based system, substitute the following executable: viewer.gz (1100KB, .gz file)
Compiled on Red Hat Enterprise Linux AS release 4 (Nahant Update 4):
libtk8.4.so => /usr/lib/libtk8.4.so (0x001ec000)
libtcl8.4.so => /usr/lib/libtcl8.4.so (0x00142000)
libSM.so.6 => /usr/X11R6/lib/libSM.so.6 (0x00101000)
libICE.so.6 => /usr/X11R6/lib/libICE.so.6 (0x0010c000)
libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x00ce7000)
libdl.so.2 => /lib/libdl.so.2 (0x00ce1000)
libm.so.6 => /lib/tls/libm.so.6 (0x00cbc000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x00dea000)
libc.so.6 => /lib/tls/libc.so.6 (0x00b8f000)
/lib/ld-linux.so.2 (0x00b71000)
Install/Run: (Gunzip and untar.) Go into the extracted treebankviewer directory. Run the program using ./viewer

Platform: Windows XP
File: treebankviewer-winxp.zip (968KB)
Updated: 1/21/07
[Version compiled sans SP1 on Visual C++ 2005 Express Edition. SP1 breaks the code.]
Release note: bug fix.
Note: This relies on ActiveTCL and Microsoft Visual C++ DLLs. You need to install ActiveTCL for Windows XP. Download from http://downloads.activestate.com/ActiveTcl/Windows/.
Install/Run: (Unzip the treebankviewer folder.) Place the folder in C:\Program Files. To run, double-click the executable.
[Do not double-click

This example treebank is a free download.
Download | Sentence File | Prolog TreeBank
lu.zip (.zip archive) | lu.lisp (POS-tagged sexps for Bikel-Collins) | lu.pl (Generated by Bikel-Collins)
Penn Treebank
This example treebank is a restricted download.
Download | Sentence File | Prolog TreeBank
wsj.zip (14.5MB, .zip archive) | wsj.txt (WSJ sentences, not tagged) | wsj.pl (WSJ PTB trees in Prolog format)