Thrift module: metadata

Module	Services	Data types	Constants
metadata		AnnotationMetadata CommunicationMetadata Digest TheoryDependencies

Struct: TheoryDependencies

Key	Field	Type	Requiredness
1	sectionTheoryList	`list< uuid.UUID >`	optional
2	sentenceTheoryList	`list< uuid.UUID >`	optional
3	tokenizationTheoryList	`list< uuid.UUID >`	optional
4	posTagTheoryList	`list< uuid.UUID >`	optional
5	nerTagTheoryList	`list< uuid.UUID >`	optional
6	lemmaTheoryList	`list< uuid.UUID >`	optional
7	langIdTheoryList	`list< uuid.UUID >`	optional
8	parseTheoryList	`list< uuid.UUID >`	optional
9	dependencyParseTheoryList	`list< uuid.UUID >`	optional
10	tokenAnnotationTheoryList	`list< uuid.UUID >`	optional
11	entityMentionSetTheoryList	`list< uuid.UUID >`	optional
12	entitySetTheoryList	`list< uuid.UUID >`	optional
13	situationMentionSetTheoryList	`list< uuid.UUID >`	optional
14	situationSetTheoryList	`list< uuid.UUID >`	optional
15	communicationsList	`list< uuid.UUID >`	optional

A struct that holds UUIDs for all theories that a particular
annotation was based upon (and presumably requires).

Producers of TheoryDependencies should list every theory that they
used directly in constructing their particular annotation. They do
not, however, need to list transitive dependencies; it is enough to
label the immediate stage (or stages) before them.

Examples:

If you are producing a Tokenization, and only used the
SentenceSegmentation in order to produce that Tokenization, list
only the single SentenceSegmentation UUID in sentenceTheoryList.

In this example, even though the SentenceSegmentation will have
a dependency on some SectionSegmentation, it is not necessary
for the Tokenization to list the SectionSegmentation UUID as a
dependency.

If you are a producer of EntityMentions, and you use two
POSTokenTagging objects and one NERTokenTagging object, add the
UUIDs of the POSTokenTagging objects to posTagTheoryList, and the
UUID of the NERTokenTagging object to nerTagTheoryList.

In this example, because multiple annotations influenced the
new annotation, they should all be listed as dependencies.
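
The two examples above can be sketched in Python. This is only an illustration: the dataclass below is a hypothetical stand-in for the Thrift-generated TheoryDependencies class, it shows just three of the fifteen fields, and plain strings stand in for `uuid.UUID` values.

```python
from dataclasses import dataclass
from typing import List, Optional
import uuid


@dataclass
class TheoryDependencies:
    """Hypothetical stand-in for the Thrift struct (3 of 15 fields shown)."""
    sentenceTheoryList: Optional[List[str]] = None
    posTagTheoryList: Optional[List[str]] = None
    nerTagTheoryList: Optional[List[str]] = None


def new_uuid() -> str:
    # Assumption: a plain string stands in for a uuid.UUID value.
    return str(uuid.uuid4())


# Example 1: a Tokenization built from a single SentenceSegmentation.
# Only the immediate dependency is listed; the SectionSegmentation that
# the SentenceSegmentation itself depends on is left out.
sentence_seg_id = new_uuid()
tokenization_deps = TheoryDependencies(sentenceTheoryList=[sentence_seg_id])

# Example 2: EntityMentions built from two POSTokenTagging objects and
# one NERTokenTagging object. All three direct inputs are listed.
pos_ids = [new_uuid(), new_uuid()]
ner_id = new_uuid()
entity_mention_deps = TheoryDependencies(
    posTagTheoryList=pos_ids,
    nerTagTheoryList=[ner_id],
)
```

Fields that are not relevant to a producer are simply left unset, which matches their optional requiredness.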

Struct: Digest

Key	Field	Type	Description	Requiredness
1	bytesValue	`binary`	The following fields define various ways you can store the digest data (for convenience). If none of these meets your needs, then serialize the digest to a byte sequence and store it in bytesValue.	optional
2	int64Value	`i64`		optional
3	doubleValue	`double`		optional
4	stringValue	`string`		optional
5	int64List	`list< i64 >`		optional
6	doubleList	`list< double >`		optional
7	stringList	`list< string >`		optional

Analytic-specific information about an attribute or edge. Digests
are used to combine information from multiple sources to generate a
unified value. The digests generated by an analytic will only ever
be used by that same analytic, so analytics can feel free to encode
information in whatever way is convenient.
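
As an illustration of combining information from multiple sources, the sketch below has a hypothetical analytic keep a running sum and count in doubleList so that per-source digests can be merged into one mean. The dataclass is a stand-in for the Thrift-generated Digest class, and the helper names, the [sum, count] layout, and the packed record format are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional
import struct


@dataclass
class Digest:
    """Hypothetical stand-in for the Thrift struct; all fields optional."""
    bytesValue: Optional[bytes] = None
    int64Value: Optional[int] = None
    doubleValue: Optional[float] = None
    stringValue: Optional[str] = None
    int64List: Optional[List[int]] = None
    doubleList: Optional[List[float]] = None
    stringList: Optional[List[str]] = None


def make_mean_digest(values: List[float]) -> Digest:
    # Assumed private layout for this analytic: doubleList = [sum, count].
    return Digest(doubleList=[float(sum(values)), float(len(values))])


def merge_mean_digests(digests: List[Digest]) -> float:
    # Combine the per-source digests into a single unified mean.
    total = sum(d.doubleList[0] for d in digests)
    count = sum(d.doubleList[1] for d in digests)
    return total / count


unified = merge_mean_digests([
    make_mean_digest([1.0, 2.0, 3.0]),
    make_mean_digest([5.0]),
])

# Fallback: when no typed field fits, serialize to bytes and use
# bytesValue. Here a (count, score) record is packed as a little-endian
# int64 followed by a double.
packed = Digest(bytesValue=struct.pack("<qd", 3, 0.75))
count, score = struct.unpack("<qd", packed.bytesValue)
```

Because only the producing analytic ever reads its own digests, the layout does not need to be documented for other consumers, but it does need to stay stable across versions of that analytic.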

Struct: AnnotationMetadata

Key	Field	Type	Description	Requiredness	Default value
1	tool	`string`	The name of the tool that generated this annotation.	required
2	timestamp	`i64`	The time at which this annotation was generated, in Unix time (UTC), i.e., seconds since January 1, 1970.	required
4	digest	`Digest`	A Digest carrying any analytic-specific information that should travel with this annotation's metadata.	optional
5	dependencies	`TheoryDependencies`	The theories that supported this annotation. An empty field indicates that the theory has no dependencies (e.g., an ingester).	optional
6	kBest	`i32`	An integer that represents a ranking for systems that output k-best lists. For systems that do not output k-best lists, the default value (1) should suffice.	required	`1`

Metadata associated with an annotation or a set of annotations,
that identifies where those annotations came from.
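
A minimal sketch of filling in this struct follows. The dataclass is a hypothetical stand-in for the Thrift-generated AnnotationMetadata class, the tool names are invented, and the assumption that k-best ranks start at 1 for the best hypothesis is inferred from the default value rather than stated by the schema.

```python
from dataclasses import dataclass
from typing import Optional
import time


@dataclass
class AnnotationMetadata:
    """Hypothetical stand-in for the Thrift struct."""
    tool: str                             # required
    timestamp: int                        # required, seconds since the epoch (UTC)
    digest: Optional[object] = None       # optional
    dependencies: Optional[object] = None # optional
    kBest: int = 1                        # required, defaults to 1


def make_metadata(tool: str, k_best: int = 1) -> AnnotationMetadata:
    # time.time() returns seconds since January 1, 1970 (UTC) as a float;
    # the schema's i64 timestamp takes whole seconds.
    return AnnotationMetadata(tool=tool, timestamp=int(time.time()), kBest=k_best)


# An ingester has no upstream theories, so dependencies stays unset and
# the default kBest of 1 is kept.
ingester_md = make_metadata("example-ingester")

# A system that emits a 3-best list tags each hypothesis with its rank
# (assumed here to start at 1).
kbest_mds = [make_metadata("example-parser", k_best=rank) for rank in range(1, 4)]
```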

Struct: CommunicationMetadata

Key	Field	Type	Description	Requiredness
1	tweetInfo	`twitter.TweetInfo`	Extra information for communications where kind==TWEET: Information about this tweet that is provided by the Twitter API. For information about the Twitter API, see: https://dev.twitter.com/docs/platform-objects	optional
2	emailInfo	`email.EmailCommunicationInfo`	Extra information for communications where kind==EMAIL	optional
3	nitfInfo	`nitf.NITFInfo`	Extra information that may come from the NITF (News Industry Text Format) schema. See 'nitf.thrift'.	optional

Metadata specific to a particular Communication object.
This might include corpus-specific metadata (such as data from the
Twitter API), attributes associated with the Communication (such as
the author), or other information about the Communication.
