Thrift module: metadata

Module: metadata
Services: (none)
Data types: AnnotationMetadata, CommunicationMetadata, Digest, TheoryDependencies
Constants: (none)

Data structures

Struct: TheoryDependencies

Key  Field                          Type               Description  Requiredness  Default value
1    sectionTheoryList              list< uuid.UUID >               optional
2    sentenceTheoryList             list< uuid.UUID >               optional
3    tokenizationTheoryList         list< uuid.UUID >               optional
4    posTagTheoryList               list< uuid.UUID >               optional
5    nerTagTheoryList               list< uuid.UUID >               optional
6    lemmaTheoryList                list< uuid.UUID >               optional
7    langIdTheoryList               list< uuid.UUID >               optional
8    parseTheoryList                list< uuid.UUID >               optional
9    dependencyParseTheoryList      list< uuid.UUID >               optional
10   tokenAnnotationTheoryList      list< uuid.UUID >               optional
11   entityMentionSetTheoryList     list< uuid.UUID >               optional
12   entitySetTheoryList            list< uuid.UUID >               optional
13   situationMentionSetTheoryList  list< uuid.UUID >               optional
14   situationSetTheoryList         list< uuid.UUID >               optional
15   communicationsList             list< uuid.UUID >               optional

A struct that holds the UUIDs of all theories that a particular
annotation was based upon (and therefore presumably requires).

Producers of TheoryDependencies should list every stage that they
used directly in constructing their annotation. They do not,
however, need to list *each* upstream stage explicitly; it is
enough to list the immediate stage or stages before them, because
each of those stages records its own dependencies.

Examples:

If you are producing a Tokenization, and only used the
SentenceSegmentation in order to produce that Tokenization, list
only the single SentenceSegmentation UUID in sentenceTheoryList.

In this example, even though the SentenceSegmentation will have
a dependency on some SectionSegmentation, it is not necessary
for the Tokenization to list the SectionSegmentation UUID as a
dependency.

If you are a producer of EntityMentions, and you use two
POSTokenTagging objects and one NERTokenTagging object, add the
UUIDs of the POSTokenTagging objects to posTagTheoryList, and the
UUID of the NERTokenTagging object to nerTagTheoryList.

In this example, because multiple annotations influenced the
new annotation, they should all be listed as dependencies.
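
The two examples above can be sketched in Python. This is only an
illustration under stated assumptions: the TheoryDependencies class
below is a hand-written stand-in for the Thrift struct (not the
generated code), it shows only three of the fifteen optional lists,
and UUIDs are represented as plain strings because the uuid.UUID
type is defined in another module. The helper new_uuid and all
variable names are hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TheoryDependencies:
    """Stand-in for the Thrift struct; every list is optional (unset = None)."""
    sentenceTheoryList: Optional[List[str]] = None
    posTagTheoryList: Optional[List[str]] = None
    nerTagTheoryList: Optional[List[str]] = None


def new_uuid() -> str:
    """Hypothetical helper: a random UUID rendered as a string."""
    return str(uuid.uuid4())


# Example 1: a Tokenization built from a single SentenceSegmentation.
# Only the immediate upstream stage is listed; the SectionSegmentation
# that the SentenceSegmentation depends on is deliberately left out.
sentence_segmentation_id = new_uuid()
tokenization_deps = TheoryDependencies(
    sentenceTheoryList=[sentence_segmentation_id],
)

# Example 2: EntityMentions built from two POSTokenTagging objects and
# one NERTokenTagging object. Every annotation that was used is listed.
pos_tagging_ids = [new_uuid(), new_uuid()]
ner_tagging_id = new_uuid()
entity_mention_deps = TheoryDependencies(
    posTagTheoryList=pos_tagging_ids,
    nerTagTheoryList=[ner_tagging_id],
)
```

Lists that a producer did not use stay unset rather than empty,
which matches the fields being optional.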

Struct: Digest

Key Field Type Description Requiredness Default value
1 bytesValue binary Fields 1 through 7 offer alternative ways to store the digest data, for convenience. If none of the typed fields meets your needs, serialize the digest to a byte sequence and store it in bytesValue. optional
2 int64Value i64 optional
3 doubleValue double optional
4 stringValue string optional
5 int64List list< i64 > optional
6 doubleList list< double > optional
7 stringList list< string > optional

Analytic-specific information about an attribute or edge. Digests
are used to combine information from multiple sources to generate a
unified value. The digests generated by an analytic will only ever
be used by that same analytic, so analytics can feel free to encode
information in whatever way is convenient.
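
As a rough sketch of how an analytic might use these fields: a
simple value can go straight into a typed field, while anything
more structured can be serialized into bytesValue. The Digest class
below is a hand-written stand-in for the Thrift struct, and the
packed layout (PAIR_FORMAT) and the helpers pack_pair and
unpack_pair are hypothetical, chosen only for illustration.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Digest:
    """Stand-in for the Thrift struct; all fields are optional (unset = None)."""
    bytesValue: Optional[bytes] = None
    int64Value: Optional[int] = None
    doubleValue: Optional[float] = None
    stringValue: Optional[str] = None
    int64List: Optional[List[int]] = None
    doubleList: Optional[List[float]] = None
    stringList: Optional[List[str]] = None


# Simple case: a single counter fits a typed field directly.
count_digest = Digest(int64Value=42)

# Fallback case: a (count, mean) pair has no dedicated field, so it is
# serialized to bytes. Hypothetical layout: little-endian int64 followed
# by a little-endian float64, 16 bytes in total.
PAIR_FORMAT = "<qd"


def pack_pair(count: int, mean: float) -> Digest:
    """Serialize a (count, mean) pair into bytesValue."""
    return Digest(bytesValue=struct.pack(PAIR_FORMAT, count, mean))


def unpack_pair(digest: Digest) -> Tuple[int, float]:
    """Recover the (count, mean) pair written by pack_pair."""
    count, mean = struct.unpack(PAIR_FORMAT, digest.bytesValue)
    return count, mean
```

Because a digest is only ever read back by the analytic that wrote
it, the byte layout is a private convention and does not need to be
documented in the schema.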

Struct: AnnotationMetadata

Key Field Type Description Requiredness Default value
1 tool string The name of the tool that generated this annotation. required
2 timestamp i64 The time at which this annotation was generated, in Unix time (seconds since January 1, 1970, UTC). required
4 digest Digest A Digest carrying any analytic-specific information that should accompany this annotation's metadata. optional
5 dependencies TheoryDependencies The theories that supported this annotation. An unset field indicates that the theory has no dependencies (e.g., one produced by an ingester). optional
6 kBest i32 An integer that represents a ranking for systems that output k-best lists. For systems that do not output k-best lists, the default value (1) should suffice. required 1

Metadata associated with an annotation or a set of annotations,
that identifies where those annotations came from.
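
A minimal sketch of filling in the required fields, assuming a
hand-written stand-in for the Thrift struct rather than the
generated class. The helper make_metadata and the tool name
"example-tokenizer" are hypothetical. The timestamp is truncated to
whole seconds to match the i64 field, and kBest keeps its default
of 1.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotationMetadata:
    """Stand-in for the Thrift struct; keys 1, 2 and 6 are required."""
    tool: str
    timestamp: int                       # seconds since January 1, 1970 (UTC)
    digest: Optional[object] = None      # a Digest, if the analytic needs one
    dependencies: Optional[object] = None  # a TheoryDependencies; unset if none
    kBest: int = 1                       # rank within a k-best list


def make_metadata(tool: str, now: Optional[float] = None) -> AnnotationMetadata:
    """Build metadata stamped with the current time (or a supplied one)."""
    seconds = int(time.time()) if now is None else int(now)
    return AnnotationMetadata(tool=tool, timestamp=seconds)


# An ingester has no upstream theories, so dependencies stays unset.
ingest_metadata = make_metadata("example-tokenizer", now=1400000000.7)
```

A system that does emit k-best lists would set kBest explicitly on
each alternative instead of relying on the default.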

Struct: CommunicationMetadata

Key Field Type Description Requiredness Default value
1 tweetInfo twitter.TweetInfo Extra information for communications where kind==TWEET: Information about this tweet that is provided by the Twitter API. For information about the Twitter API, see: https://dev.twitter.com/docs/platform-objects optional
2 emailInfo email.EmailCommunicationInfo Extra information for communications where kind==EMAIL optional
3 nitfInfo nitf.NITFInfo Extra information that may come from the NITF (News Industry Text Format) schema. See 'nitf.thrift'. optional

Metadata specific to a particular Communication object.
This might include corpus-specific metadata (e.g., from the
Twitter API), attributes associated with the Communication
(e.g., the author), or other information about the Communication.