Excellent HLT

Turkle

Turkle is a Django web application that provides a clone of Amazon's Mechanical Turk service in your local environment, allowing you to collect annotations from local experts with the same templates and data files you use for crowd annotation. Our pip-installable ProtoTurk server complements it by letting you rapidly prototype new templates and data files.

Getting Started | ProtoTurk
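
To make the idea of "templates and data files" concrete, here is a minimal sketch of how a Mechanical Turk-style HTML template and a CSV batch file combine into individual tasks. It assumes the common convention that template variables are written as ${name} and matched to CSV column headers. The template text, the CSV contents, and the render_tasks helper are all hypothetical illustrations and are not taken from Turkle or ProtoTurk.

```python
import csv
import io
from string import Template

# Hypothetical task template: each ${...} placeholder names a CSV column.
TEMPLATE_HTML = """<p>Sentence: ${sentence}</p>
<p>Is the word "${word}" a named entity?</p>
<input type="radio" name="answer" value="yes"> Yes
<input type="radio" name="answer" value="no"> No
"""

# Hypothetical batch data file: a header row, then one row per task.
BATCH_CSV = """sentence,word
Patapsco is a river in Maryland.,Patapsco
The framework is written in Python.,framework
"""


def render_tasks(template_html, batch_csv):
    """Return one rendered HTML task per CSV row (illustrative helper)."""
    template = Template(template_html)
    reader = csv.DictReader(io.StringIO(batch_csv))
    # substitute() raises KeyError if the template names a missing column,
    # which surfaces template/data mismatches early.
    return [template.substitute(row) for row in reader]


tasks = render_tasks(TEMPLATE_HTML, BATCH_CSV)
print(tasks[0])
```

Because the same template and CSV pair drives every task, a template checked this way against a few rows can be reused unchanged for a larger batch.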

Patapsco

Patapsco is a scalable Python framework for reproducible cross-language information retrieval (CLIR) experiments.

Repository | Colab Demo

Costello, Yang, Lawrie, & Mayfield. Patapsco: A Python Framework for Cross-Language Information Retrieval Experiments. In Proceedings of the 44th European Conference on Information Retrieval (ECIR), 2022.

Concrete

Concrete is a cross-platform data serialization format and communication protocol for language annotations. It replaces ad-hoc TSV, XML, JSON, and other formats for storing document- and sentence-level language annotations. We developed Concrete to record and share annotations on structured human language data, including both text and speech.

Getting Started | Python | JavaScript | Java
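
As a rough illustration of why a typed schema is easier to share than ad-hoc TSV or JSON, the sketch below models a document with token offsets and per-tool annotations, then round-trips it through a serialized form. This is a simplified stand-in under stated assumptions: the class names, fields, tagger name, and the use of JSON are all hypothetical and do not reproduce Concrete's actual schema or wire format.

```python
from dataclasses import asdict, dataclass, field
import json


@dataclass
class Token:
    text: str
    start: int  # character offset into Document.text (inclusive)
    end: int    # character offset into Document.text (exclusive)


@dataclass
class Sentence:
    tokens: list = field(default_factory=list)
    # Maps a tool name to one tag per token, so several tools can
    # annotate the same tokens without overwriting each other.
    annotations: dict = field(default_factory=dict)


@dataclass
class Document:
    doc_id: str
    text: str
    sentences: list = field(default_factory=list)


def tokenize(text):
    """Whitespace tokenizer that records character offsets."""
    tokens, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append(Token(word, start, end))
        pos = end
    return tokens


doc = Document(doc_id="example-1", text="Concrete stores annotations")
sentence = Sentence(tokens=tokenize(doc.text))
sentence.annotations["toy-pos-tagger"] = ["NOUN", "VERB", "NOUN"]
doc.sentences.append(sentence)

# JSON is used here only to show a round trip of the toy structure.
serialized = json.dumps(asdict(doc))
restored = json.loads(serialized)
print(restored["sentences"][0]["annotations"])
```

Keeping offsets into the original text, rather than copying text into each annotation, is what lets independent tools layer annotations over the same document.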

Concretely Annotated Corpora

Under the name Concretely Annotated, we have processed a variety of standard corpora with multiple popular NLP toolchains, storing the resulting annotations in the Concrete data schema.

Wikipedia | English Gigaword | The New York Times

Ferraro, Thomas, Gormley, Wolfe, Harman, & Van Durme. Concretely Annotated Corpora. In 4th Workshop on Automated Knowledge Base Construction (AKBC), 2014.