A number of institutions have kindly agreed that excerpts from their text corpora may be distributed with TIGERSearch. The current version of TIGERSearch includes the following corpus samplers (in alphabetical order by language):
Chinese
Chinese Treebank sampler
105 corpus graphs, University of Pennsylvania, distributed by LDC
English
Penn Treebank: Brown Corpus and Switchboard Corpus samplers
200 sentences each, University of Pennsylvania, distributed by LDC
Penn-Helsinki Parsed Corpus of Middle English (PPCME2 Corpus) sampler
200 sentences, University of Pennsylvania / PPCME2 Project
Susanne and Christine Corpus samplers
200 sentences each, Sussex University / Susanne and Christine projects
VerbMobil Corpus sampler
250 sentences, see German VerbMobil sampler
German
DEREKO Corpus sampler
250 sentences, SfS, University of Tübingen and IMS, University of Stuttgart / DEREKO project
IMS chunking and parsing tools
The tools LoPar, TreeTagger, and YAC processed the same technical text (about 250 sentences).
IMS, University of Stuttgart
Negra Corpus sampler
250 sentences, Department of Computational Linguistics, Universität des Saarlandes / Negra project
TIGER Corpus sampler
200 sentences, Institut für Germanistik, University of Potsdam / Department of Computational Linguistics, Universität des Saarlandes / IMS, University of Stuttgart / TIGER project
VerbMobil Corpus sampler
250 sentences, SfS, University of Tübingen / VerbMobil Project, distributed by IPSK, Ludwig-Maximilians-Universität München
Japanese
VerbMobil Corpus sampler
250 sentences, see German VerbMobil sampler
Korean
Korean Treebank sampler
125 corpus graphs, University of Pennsylvania, distributed by LDC