Welcome to CoreNLP XML Library’s documentation!
This library is designed to add a data model over Stanford CoreNLP’s basic XML output.
The Document class is designed to provide lazy-loaded access to information from syntax, coreference, and dependency parse structures within the XML.
What You Can Do With This Library
Some code examples:
from corenlp_xml import Document
doc = Document(xml_string)
# The first sentence
s1 = doc.sentences[0]
# Noun phrases for the first sentence
s1_nps = s1.phrase_strings("np")
# Text of semantic head of first sentence
s1_head = s1.semantic_head.text
# Find all representative coreferences matching noun phrases in sentence 1
s1_corefs = [coref for coref in doc.coreferences
             if coref.representative and coref.sentence == s1]
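The examples above assume that xml_string holds the XML that Stanford CoreNLP writes for a document. The following sketch uses only the Python standard library, not this library, to show the rough shape of such input. The element names are an assumption based on typical CoreNLP XML output and may differ between CoreNLP versions.

```python
import xml.etree.ElementTree as ET

# Assumed shape of CoreNLP XML output, trimmed to a single two-token sentence.
xml_string = """<root>
  <document>
    <sentences>
      <sentence id="1">
        <tokens>
          <token id="1">
            <word>Dogs</word>
            <lemma>dog</lemma>
            <CharacterOffsetBegin>0</CharacterOffsetBegin>
            <CharacterOffsetEnd>4</CharacterOffsetEnd>
            <POS>NNS</POS>
            <NER>O</NER>
          </token>
          <token id="2">
            <word>bark</word>
            <lemma>bark</lemma>
            <CharacterOffsetBegin>5</CharacterOffsetBegin>
            <CharacterOffsetEnd>9</CharacterOffsetEnd>
            <POS>VBP</POS>
            <NER>O</NER>
          </token>
        </tokens>
        <parse>(ROOT (S (NP (NNS Dogs)) (VP (VBP bark))))</parse>
      </sentence>
    </sentences>
  </document>
</root>"""

root = ET.fromstring(xml_string)
sentence = root.find("./document/sentences/sentence")

# Collect the surface form of every token in the sentence.
words = [tok.findtext("word") for tok in sentence.iter("token")]

# The constituency parse is stored as an S-expression string.
parse_string = sentence.findtext("parse")

print(sentence.get("id"), words, parse_string)
```

A string of this kind is what the Document constructor in the examples above would receive.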
The Document Model
Submodule for handling document-level structures: documents, sentences, and tokens.
- class corenlp_xml.document.Document(xml_string)
Abstracts a Stanford CoreNLP document.
- coreferences
Returns a list of Coreference instances.
Getter: Returns a list of coreferences
Type: list of corenlp_xml.coreference.Coreference
- get_sentence_by_id(id)
Gets a sentence by its ID.
Parameters: id (int) – the ID of the sentence, as defined in the XML
Returns: a sentence
Return type: corenlp_xml.document.Sentence
- class corenlp_xml.document.Sentence(element)
Abstracts a sentence.
- basic_dependencies
Accesses basic dependencies from the XML output.
Getter: Returns the dependency graph for basic dependencies
Type: corenlp_xml.dependencies.DependencyGraph
- collapsed_ccprocessed_dependencies
Accesses collapsed, CC-processed dependencies.
Getter: Returns the dependency graph for collapsed and CC-processed dependencies
Type: corenlp_xml.dependencies.DependencyGraph
- collapsed_dependencies
Accesses collapsed dependencies for this sentence.
Getter: Returns the dependency graph for collapsed dependencies
Type: corenlp_xml.dependencies.DependencyGraph
- get_token_by_id(id)
Accesses a token by its XML ID.
Parameters: id (int) – The XML ID of the token
Returns: The token
Return type: corenlp_xml.document.Token
- parse
Accesses the parse tree built from the S-expression parse string in the XML.
Getter: Returns the NLTK parse tree
Type: nltk.Tree
- parse_string
Accesses the S-expression parse string stored in the XML document.
Getter: Returns the parse string
Type: str
- phrase_strings(phrase_type)
Returns the strings corresponding to all phrases matching a given phrase type.
Parameters: phrase_type (str) – the label to match, such as “NP”, “VP”, “det”, etc.
Returns: a list of strings representing those phrases
- semantic_head
Returns the semantic head of the sentence, that is, the dependent of the root node of the dependency parse.
Returns: the mention related to the semantic head
Return type: corenlp_xml.coreference.Mention
- sentiment
The sentiment of this sentence.
Getter: Returns the sentiment value of this sentence
Type: int
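To make the relationship between parse_string and phrase_strings concrete, here is a minimal standard-library sketch of how phrase text could be pulled out of an S-expression parse string. It is not the library's implementation, which is documented as returning an nltk.Tree. The helper names parse_sexpr and leaves are hypothetical, and the case-insensitive label match is an assumption.

```python
def parse_sexpr(text):
    """Parse a bracketed parse string into nested [label, child, ...] lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a label or a leaf word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume the closing ")"
        return node

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:  # node[0] is the label
        words.extend(leaves(child))
    return words


def phrase_strings(node, phrase_type):
    """Collect the text of every subtree whose label matches phrase_type."""
    if isinstance(node, str):
        return []
    found = []
    # Assumption: labels are compared case-insensitively.
    if node[0].lower() == phrase_type.lower():
        found.append(" ".join(leaves(node)))
    for child in node[1:]:
        found.extend(phrase_strings(child, phrase_type))
    return found


tree = parse_sexpr(
    "(ROOT (S (NP (DT The) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat)))))"
)
print(phrase_strings(tree, "np"))
```

Nested phrases are reported separately, so the noun phrase inside the verb phrase appears in the result alongside the subject noun phrase.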
- class corenlp_xml.document.Token(element)
Wraps the token XML element.
- character_offset_begin
Lazy-loads the character offset begin node.
Getter: Returns the integer value of the beginning offset
Type: int
- character_offset_end
Lazy-loads the character offset end node.
Getter: Returns the integer value of the ending offset
Type: int
- lemma
Lazy-loads the lemma for this word.
Getter: Returns the plain string value of the word lemma
Type: str
- ner
Lazy-loads the NER tag for this word.
Getter: Returns the plain string value of the NER tag for the word
Type: str
- pos
Lazy-loads the part-of-speech tag for this word.
Getter: Returns the plain string value of the POS tag for the word
Type: str
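The lazy-loading behaviour described for the token properties can be illustrated with a small sketch. This is one possible way to defer XML lookups until first access, assuming Python 3.8+ for functools.cached_property. LazyToken is a hypothetical stand-in, not the library's Token class, and the child element names are assumed from typical CoreNLP output.

```python
import xml.etree.ElementTree as ET
from functools import cached_property


class LazyToken:
    """Hypothetical token wrapper that reads each field on first access only."""

    def __init__(self, element):
        self._element = element

    @cached_property
    def lemma(self):
        return self._element.findtext("lemma")

    @cached_property
    def pos(self):
        return self._element.findtext("POS")

    @cached_property
    def ner(self):
        return self._element.findtext("NER")

    @cached_property
    def character_offset_begin(self):
        # Offsets are stored as text in the XML, so convert to int.
        return int(self._element.findtext("CharacterOffsetBegin"))


element = ET.fromstring(
    "<token id='1'><word>Dogs</word><lemma>dog</lemma>"
    "<CharacterOffsetBegin>0</CharacterOffsetBegin>"
    "<CharacterOffsetEnd>4</CharacterOffsetEnd>"
    "<POS>NNS</POS><NER>O</NER></token>"
)
token = LazyToken(element)
print(token.lemma, token.pos, token.character_offset_begin)
```

Each value is looked up in the XML once and then cached on the instance, so repeated access does not search the element again.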
Interacting with Dependencies
This component is responsible for managing dependency parses.
- class corenlp_xml.dependencies.DependencyGraph(element)
Dependency graph; models a dependency parse.
- get_node_by_idx(idx)
Looks up a node by its index. Each distinct node is stored in a dict.
Parameters: idx (int) – the “idx” value of the node
Returns: the node instance for that index
Return type: corenlp_xml.dependencies.DependencyNode
- class corenlp_xml.dependencies.DependencyLink(graph, element)
Represents a relationship between two nodes in a dependency graph.
- class corenlp_xml.dependencies.DependencyNode(graph, element)
Represents a node in a dependency graph.
- dependent(dep_type, node)
Registers a node as dependent on this node.
Parameters: - dep_type (str) – The dependency type
- node (corenlp_xml.dependencies.DependencyNode) – The node to be registered as a dependent
Returns: self, provides fluent interface
Return type: corenlp_xml.dependencies.DependencyNode
- dependents
Gets dependent nodes.
Getter: returns a flat list of all dependent nodes
Type: list of corenlp_xml.dependencies.DependencyNode
- dependents_by_type(dep_type)
Gets the dependents of this node filtered by a dependency type.
Parameters: dep_type (str) – The dependency type
Returns: dependents matching the provided type
- governor(dep_type, node)
Registers a node as governing this node.
Parameters: - dep_type (str) – The dependency type
- node (corenlp_xml.dependencies.DependencyNode) – The node to be registered as a governor
Returns: self, provides fluent interface
Return type: corenlp_xml.dependencies.DependencyNode
- governors
Gets governing nodes.
Getter: returns a flat list of all governing nodes
Type: list of corenlp_xml.dependencies.DependencyNode
- governors_by_type(dep_type)
Gets the governors of this node filtered by a dependency type.
Parameters: dep_type (str) – The dependency type
Returns: governors matching the provided type
- classmethod load(graph, element)
Instantiates the node in the graph if it’s not already stored in the graph.
Parameters: - graph (corenlp_xml.dependencies.DependencyGraph) – The dependency graph this node is a member of
- element (lxml.ElementBase) – The lxml element wrapping the node
- text = None
These properties are dicts of link type to node
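The interplay between the graph, its node registry, and the governor and dependent links can be sketched with the standard library alone. The classes below (Node, Graph) are hypothetical and only mirror the method names documented above; they are not the library's code, and the XML layout of the dependencies element is an assumption that varies between CoreNLP versions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class Node:
    """Hypothetical dependency node holding links keyed by dependency type."""

    def __init__(self, idx, text):
        self.idx = idx
        self.text = text
        self._governors = defaultdict(list)   # dep_type -> governing nodes
        self._dependents = defaultdict(list)  # dep_type -> dependent nodes

    def governor(self, dep_type, node):
        self._governors[dep_type].append(node)
        return self  # fluent interface

    def dependent(self, dep_type, node):
        self._dependents[dep_type].append(node)
        return self  # fluent interface

    @property
    def dependents(self):
        return [n for nodes in self._dependents.values() for n in nodes]

    def dependents_by_type(self, dep_type):
        return list(self._dependents.get(dep_type, []))


class Graph:
    """Hypothetical graph that creates each node once and links it both ways."""

    def __init__(self, element):
        self._nodes = {}
        for dep in element.iter("dep"):
            gov = self._load(dep.find("governor"))
            child = self._load(dep.find("dependent"))
            gov.dependent(dep.get("type"), child)
            child.governor(dep.get("type"), gov)

    def _load(self, element):
        # Reuse the stored node if this index has been seen before.
        idx = int(element.get("idx"))
        if idx not in self._nodes:
            self._nodes[idx] = Node(idx, element.text)
        return self._nodes[idx]

    def get_node_by_idx(self, idx):
        return self._nodes.get(idx)


graph = Graph(ET.fromstring(
    "<dependencies type='basic-dependencies'>"
    "<dep type='root'><governor idx='0'>ROOT</governor>"
    "<dependent idx='2'>bark</dependent></dep>"
    "<dep type='nsubj'><governor idx='2'>bark</governor>"
    "<dependent idx='1'>Dogs</dependent></dep>"
    "</dependencies>"
))

# The dependent of the root node plays the role of the semantic head.
head = graph.get_node_by_idx(0).dependents_by_type("root")[0]
print(head.text, [n.text for n in head.dependents])
```

Keeping one node instance per index means a word that appears in several relations accumulates all of its links on the same object.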
Coreference Resolution
This module is responsible for parsing coreference resolution data from the XML output.
- class corenlp_xml.coreference.Coreference(document, element)
Reflects a grouping of mentions.
- class corenlp_xml.coreference.Mention(coref, element)
Reflects a given mention.
- head
The token serving as the “head” of the mention.
Getter: the token corresponding to the head
Type: corenlp_xml.document.Token
- representative
Interprets and normalizes the “representative” attribute.
Getter: determines whether the mention is representative
Type: bool
- sentence
The sentence related to this mention.
Getter: returns the sentence this mention relates to
Type: corenlp_xml.document.Sentence
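As an illustration of what normalizing the “representative” attribute can involve, the sketch below reads mentions from a coreference fragment with the standard library and converts the attribute to a bool. The XML layout and the is_representative helper are assumptions for illustration; the library's own handling may differ.

```python
import xml.etree.ElementTree as ET

# Assumed layout: one coreference chain with two mentions, where only the
# representative mention carries the attribute.
coref_xml = """<coreference>
  <coreference>
    <mention representative="true">
      <sentence>1</sentence><start>1</start><end>3</end><head>2</head>
    </mention>
    <mention>
      <sentence>2</sentence><start>1</start><end>2</end><head>1</head>
    </mention>
  </coreference>
</coreference>"""


def is_representative(mention):
    """Treat a missing attribute as False and compare case-insensitively."""
    return mention.get("representative", "false").lower() == "true"


mentions = [
    {
        "sentence_id": int(m.findtext("sentence")),
        "head_id": int(m.findtext("head")),
        "representative": is_representative(m),
    }
    for m in ET.fromstring(coref_xml).iter("mention")
]
print(mentions)
```

The sentence and head values are IDs, which is where lookups such as get_sentence_by_id and get_token_by_id would come in to resolve them to objects.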