This page provides links to a few of the projects started at the HLTCOE. Additional resources can be found on the HLTCOE GitHub page.
Concrete is a data serialization format for NLP. It replaces ad-hoc XML, CSV, or programming language-specific serialization as a way of storing document- and sentence- level annotations.
Under the heading Concretely Annotated, we processed a variety of standard corpora with multiple popular NLP tool-chains, collected together under a single data schema we have created that we refer to as Concrete. We envision a multimodal workflow, where, e.g., knowledge can be extracted from both text and audio. We developed Concrete to record and share annotations on structured human language data — both text and speech.
For the full description, see Ferraro et al. (2014).