Data structures
Struct: TheoryDependencies
| Key | Field | Type | Description | Requiredness | Default value |
| 1 | sectionTheoryList | list<uuid.UUID> | | optional | |
| 2 | sentenceTheoryList | list<uuid.UUID> | | optional | |
| 3 | tokenizationTheoryList | list<uuid.UUID> | | optional | |
| 4 | posTagTheoryList | list<uuid.UUID> | | optional | |
| 5 | nerTagTheoryList | list<uuid.UUID> | | optional | |
| 6 | lemmaTheoryList | list<uuid.UUID> | | optional | |
| 7 | langIdTheoryList | list<uuid.UUID> | | optional | |
| 8 | parseTheoryList | list<uuid.UUID> | | optional | |
| 9 | dependencyParseTheoryList | list<uuid.UUID> | | optional | |
| 10 | tokenAnnotationTheoryList | list<uuid.UUID> | | optional | |
| 11 | entityMentionSetTheoryList | list<uuid.UUID> | | optional | |
| 12 | entitySetTheoryList | list<uuid.UUID> | | optional | |
| 13 | situationMentionSetTheoryList | list<uuid.UUID> | | optional | |
| 14 | situationSetTheoryList | list<uuid.UUID> | | optional | |
| 15 | communicationsList | list<uuid.UUID> | | optional | |
A struct that holds the UUIDs of all theories that a particular
annotation was based upon (and presumably requires).
Producers of TheoryDependencies should list every stage that they
used directly in constructing their annotation. They do not,
however, need to explicitly list *each* upstream stage; listing
only the immediate stages before them is sufficient.
Examples:
If you are producing a Tokenization, and used only the
SentenceSegmentation to produce that Tokenization, list
only that single SentenceSegmentation UUID in sentenceTheoryList.
In this example, even though the SentenceSegmentation itself has
a dependency on some SectionSegmentation, it is not necessary
for the Tokenization to list the SectionSegmentation UUID as a
dependency.
If you are a producer of EntityMentions, and you use two
POSTokenTagging objects and one NERTokenTagging object, add the
UUIDs of the POSTokenTagging objects to posTagTheoryList, and the
UUID of the NERTokenTagging object to nerTagTheoryList.
In this example, because multiple annotations influenced the
new annotation, they should all be listed as dependencies.
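The two examples above can be sketched in Python. This is only an illustration: the `TheoryDependencies` dataclass below is a hypothetical stand-in for the generated struct (it shows four of the fifteen lists), `new_id` is a made-up helper, and plain strings stand in for `uuid.UUID` values.

```python
from dataclasses import dataclass, field
from typing import List
import uuid


@dataclass
class TheoryDependencies:
    """Hypothetical stand-in for the generated struct (four of fifteen lists)."""
    sectionTheoryList: List[str] = field(default_factory=list)
    sentenceTheoryList: List[str] = field(default_factory=list)
    posTagTheoryList: List[str] = field(default_factory=list)
    nerTagTheoryList: List[str] = field(default_factory=list)


def new_id() -> str:
    # Hypothetical helper: a string standing in for a uuid.UUID value.
    return str(uuid.uuid4())


# Example 1: a Tokenization built from a single SentenceSegmentation.
section_seg_id = new_id()   # upstream of the sentence segmentation; not listed
sentence_seg_id = new_id()
tokenization_deps = TheoryDependencies(sentenceTheoryList=[sentence_seg_id])

# Example 2: EntityMentions built from two POS taggings and one NER tagging.
pos_ids = [new_id(), new_id()]
ner_id = new_id()
mention_deps = TheoryDependencies(
    posTagTheoryList=pos_ids,
    nerTagTheoryList=[ner_id],
)
```

In the first case `sectionTheoryList` stays empty, because the SectionSegmentation is only a dependency of the SentenceSegmentation, not of the Tokenization itself.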
Struct: Digest
| Key | Field | Type | Description | Requiredness | Default value |
| 1 | bytesValue | binary | The following fields define various ways you can store the digest data (for convenience). If none of these meets your needs, then serialize the digest to a byte sequence and store it in bytesValue. | optional | |
| 2 | int64Value | i64 | | optional | |
| 3 | doubleValue | double | | optional | |
| 4 | stringValue | string | | optional | |
| 5 | int64List | list<i64> | | optional | |
| 6 | doubleList | list<double> | | optional | |
| 7 | stringList | list<string> | | optional | |
Analytic-specific information about an attribute or edge. Digests
are used to combine information from multiple sources to generate a
unified value. The digests generated by an analytic will only ever
be used by that same analytic, so analytics can feel free to encode
information in whatever way is convenient.
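As a rough illustration of combining information from multiple sources, the sketch below has an analytic store a running sum and count in doubleList and later merge several digests into one mean. The `Digest` dataclass is a hypothetical stand-in that mirrors the field names in the table above; `make_digest`, `combine`, and the `[sum, count]` encoding are assumptions made up for this example, not part of the schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Digest:
    """Hypothetical stand-in mirroring the documented field names."""
    bytesValue: Optional[bytes] = None
    int64Value: Optional[int] = None
    doubleValue: Optional[float] = None
    stringValue: Optional[str] = None
    int64List: Optional[List[int]] = None
    doubleList: Optional[List[float]] = None
    stringList: Optional[List[str]] = None


def make_digest(total: float, count: int) -> Digest:
    # Private encoding chosen by this illustrative analytic: [sum, count].
    return Digest(doubleList=[total, float(count)])


def combine(digests: List[Digest]) -> float:
    """Merge per-source digests into one unified mean."""
    total = sum(d.doubleList[0] for d in digests)
    count = sum(d.doubleList[1] for d in digests)
    return total / count


source_a = make_digest(total=6.0, count=3)   # three observations summing to 6.0
source_b = make_digest(total=4.0, count=1)   # one observation of 4.0
unified = combine([source_a, source_b])      # (6.0 + 4.0) / (3 + 1) = 2.5
```

Because only the producing analytic ever reads its own digests, the encoding needs no documentation beyond that analytic's code; the unused fields simply stay unset.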
Struct: AnnotationMetadata
| Key | Field | Type | Description | Requiredness | Default value |
| 1 | tool | string | The name of the tool that generated this annotation. | required | |
| 2 | timestamp | i64 | The time at which this annotation was generated (in Unix time, UTC; i.e., seconds since January 1, 1970). | required | |
| 4 | digest | Digest | A Digest, carrying over any information the annotation metadata wishes to carry over. | optional | |
| 5 | dependencies | TheoryDependencies | The theories that supported this annotation. An empty field indicates that the theory has no dependencies (e.g., an ingester). | optional | |
| 6 | kBest | i32 | An integer that represents a ranking for systems that output k-best lists. For systems that do not output k-best lists, the default value (1) should suffice. | required | 1 |
Metadata associated with an annotation or a set of annotations
that identifies where those annotations came from.
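A minimal sketch of filling in the required fields, assuming a hand-written dataclass in place of the generated struct. The `stamp` helper and the tool names are hypothetical, the optional digest and dependencies fields are left out, and reading rank 1 as the best hypothesis is an assumption of this sketch rather than something the schema states.

```python
import time
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotationMetadata:
    """Hypothetical stand-in; optional digest and dependencies are omitted."""
    tool: str
    timestamp: int
    kBest: int = 1  # documented default value


def stamp(tool: str, k_best: int = 1) -> AnnotationMetadata:
    # Hypothetical helper: timestamp is Unix time in whole seconds.
    return AnnotationMetadata(tool=tool, timestamp=int(time.time()), kBest=k_best)


# A system with a single output relies on the default kBest.
single = stamp("example-tokenizer")

# A system emitting a 3-best list assigns one rank per hypothesis
# (assumed here: 1 is the top-ranked output).
ranked: List[AnnotationMetadata] = [
    stamp("example-parser", k_best=k) for k in (1, 2, 3)
]
```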
Struct: CommunicationMetadata
| Key | Field | Type | Description | Requiredness | Default value |
| 1 | tweetInfo | twitter.TweetInfo | Extra information for communications where kind==TWEET: Information about this tweet that is provided by the Twitter API. For information about the Twitter API, see: https://dev.twitter.com/docs/platform-objects | optional | |
| 2 | emailInfo | email.EmailCommunicationInfo | Extra information for communications where kind==EMAIL | optional | |
| 3 | nitfInfo | nitf.NITFInfo | Extra information that may come from the NITF (News Industry Text Format) schema. See 'nitf.thrift'. | optional | |
Metadata specific to a particular Communication object.
This might include corpus-specific metadata (e.g., from the Twitter API),
attributes associated with the Communication (e.g., the author),
or other information about the Communication.