Class Corpus
java.lang.Object
org.episteme.social.linguistics.loaders.tigerxml.Corpus
- All Implemented Interfaces:
Serializable
Represents a linguistic corpus in TIGER-XML format.
A corpus contains a sequence of annotated sentences, each with its own
syntactic tree structure (composed of terminals and non-terminals).
It also stores metadata and annotation specifications from the <head> section.
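To make the corpus, sentence, terminal and non-terminal layering concrete, here is a minimal sketch that reads a tiny TIGER-XML document with the JDK's DOM parser. It does not use this class. The sample document, the class name `TigerXmlLayoutSketch` and its helper methods are illustrative assumptions; only the element names (`corpus`, `head`, `body`, `s`, `graph`, `terminals`, `t`, `nonterminals`, `nt`, `edge`) follow the TIGER-XML format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative only: a stand-alone look at the TIGER-XML layout, not the library's loader.
class TigerXmlLayoutSketch {

    // Hypothetical one-sentence corpus: a <head> with metadata and a <body> with one <s>.
    static final String SAMPLE =
          "<corpus id=\"demo\">"
        + "<head><meta><name>demo</name></meta></head>"
        + "<body><s id=\"s1\"><graph root=\"s1_500\">"
        + "<terminals>"
        + "<t id=\"s1_1\" word=\"Dogs\" pos=\"NNS\"/>"
        + "<t id=\"s1_2\" word=\"bark\" pos=\"VBP\"/>"
        + "</terminals>"
        + "<nonterminals>"
        + "<nt id=\"s1_500\" cat=\"S\">"
        + "<edge label=\"SB\" idref=\"s1_1\"/>"
        + "<edge label=\"HD\" idref=\"s1_2\"/>"
        + "</nt>"
        + "</nonterminals>"
        + "</graph></s></body></corpus>";

    // Parses the XML string and returns its root element (<corpus> for the sample).
    static Element parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    // Counts all descendant elements with the given tag name.
    static int count(Element scope, String tag) {
        return scope.getElementsByTagName(tag).getLength();
    }

    // Joins the word attributes of a sentence's terminals in document order.
    static String sentenceText(Element sentence) {
        NodeList terminals = sentence.getElementsByTagName("t");
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < terminals.getLength(); i++) {
            if (i > 0) {
                text.append(' ');
            }
            text.append(((Element) terminals.item(i)).getAttribute("word"));
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot(SAMPLE);
        Element firstSentence = (Element) root.getElementsByTagName("s").item(0);
        System.out.println(root.getAttribute("id") + ": " + count(root, "s") + " sentence(s)");
        System.out.println(sentenceText(firstSentence));
    }
}
```

The root element obtained this way is the kind of `<corpus>` element that the DOM-based constructor below is documented to accept. How this class walks the tree internally is not shown on this page.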
- Since:
- 1.0
- Author:
- Silvere Martin-Michiellot, Gemini AI (Google DeepMind)
Constructor Summary
Constructors -
Method Summary
Modifier and Type   Method   Description
void addAttribute(String name, String value)
void addSentence(Sentence sentence)
boolean equals
getAllTs()
getAttribute(String name)
getGraphNode(String id)
getGraphNodeBySpan(String span)   Finds a node by its MMAX span string.
getId()
getSentence(int index)
getSentence(String id)
int getSentenceCount()
getTerminal(String id)
getText()
int getVerbosity()
int hashCode()
static Corpus loadSerialized(String fileName)
void serializeToDisk(String fileName)
void setAnnotationMetadata(AnnotationMetadata metadata)
void setId
void setVerbosity(int verbosity)
toString()
-
Constructor Details
-
Corpus
public Corpus()
Creates an empty Corpus.
-
Corpus
Loads a corpus from a TIGER-XML file.
- Parameters:
fileName - the XML file path.
-
Corpus
Loads a corpus from a TIGER-XML file with specified verbosity.
- Parameters:
fileName - the XML file path.
verbosity - verbosity level (0-5).
-
Corpus
Creates a corpus from a DOM root element.
- Parameters:
root - the <corpus> element.
-
-
Method Details
-
getId
-
setId
-
getVerbosity
public int getVerbosity()
-
setVerbosity
public void setVerbosity(int verbosity)
-
getSentenceCount
public int getSentenceCount()
-
getSentences
-
getSentence
-
getSentence
-
addSentence
-
getTerminal
-
getNT
-
getGraphNode
-
getGraphNodeBySpan
-
getAllNTs
-
getAllTs
-
getAllGraphNodes
-
addAttribute
-
getAttribute
-
getAnnotationMetadata
-
setAnnotationMetadata
-
getText
-
toString
-
hashCode
-
equals
-
serializeToDisk
- Throws:
IOException
-
loadSerialized
- Throws:
IOException
ClassNotFoundException
-
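The exceptions declared by serializeToDisk and loadSerialized match what standard Java object serialization throws, which fits the class implementing Serializable. The sketch below shows that round trip on a hypothetical stand-in class, `MiniCorpus`. It is an assumption about the general technique, not this library's implementation, and the stand-in's fields and the `roundTrip` helper are invented for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a save/load round trip using standard Java serialization.
class SerializationRoundTripSketch {

    // Hypothetical stand-in for a serializable corpus; not the library's Corpus class.
    static class MiniCorpus implements Serializable {
        private static final long serialVersionUID = 1L;

        final String id;
        final List<String> sentences = new ArrayList<>();

        MiniCorpus(String id) {
            this.id = id;
        }

        // Writes this object graph to the given file.
        void serializeToDisk(String fileName) throws IOException {
            try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream(fileName))) {
                out.writeObject(this);
            }
        }

        // Reads an object graph back; readObject is the source of ClassNotFoundException.
        static MiniCorpus loadSerialized(String fileName)
                throws IOException, ClassNotFoundException {
            try (ObjectInputStream in =
                     new ObjectInputStream(new FileInputStream(fileName))) {
                return (MiniCorpus) in.readObject();
            }
        }
    }

    // Saves the corpus to a temporary file, loads it again, and removes the file.
    static MiniCorpus roundTrip(MiniCorpus corpus)
            throws IOException, ClassNotFoundException {
        File tmp = File.createTempFile("corpus", ".ser");
        try {
            corpus.serializeToDisk(tmp.getPath());
            return MiniCorpus.loadSerialized(tmp.getPath());
        } finally {
            tmp.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        MiniCorpus original = new MiniCorpus("demo");
        original.sentences.add("Dogs bark");
        MiniCorpus copy = roundTrip(original);
        System.out.println(copy.id + " " + copy.sentences);
    }
}
```

A caller of the real methods would handle the same two checked exceptions. A serialized file is typically only readable by a compatible version of the class that wrote it.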