kindred.Corpus

class kindred.Corpus(text=None, loadFromSimpleTag=False)

Collection of text documents.

Variables:	documents – List of `kindred.Document`
	parsed – Boolean indicating whether the corpus has been parsed yet. A `kindred.parser` can parse it.

Methods

__init__(text=None, loadFromSimpleTag=False)

Create an empty corpus with no documents, or quickly create one containing a single document from the optional text, which may be in SimpleTag format.

Parameters:	text (String (with SimpleTag format XML)) – Optional SimpleTag text to initialize a single document
	loadFromSimpleTag (bool) – If text is provided, whether the text parameter is in the SimpleTag format, in which case entities and relations are extracted from it accordingly

addDocument(doc)

Add a single document to the corpus

Parameters:	doc (kindred.Document) – Document to add

clone()

Clone the corpus

Returns:	Clone of the corpus
Return type:	kindred.Corpus

getRelations()

Get all relations in this corpus

Returns:	List of relations
Return type:	list

nfold_split(folds)

Split the corpus repeatedly for n-fold cross-validation. This method is a generator: each iteration yields the training and test sets for one fold.

Parameters:	folds (int) – Number of folds to create
Returns:	Tuple of training and test corpus, yielded once per fold (folds iterations in total)
Return type:	(kindred.Corpus,kindred.Corpus)
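
The generator behaviour described above can be sketched in plain Python. This is a minimal illustration only, not kindred's implementation: `nfold_split_sketch`, its `seed` argument, and the shuffle-then-stride assignment of documents to folds are all assumptions, since this page does not say how documents are assigned to folds. Plain lists of strings stand in for `kindred.Corpus` objects.

```python
import random


def nfold_split_sketch(documents, folds, seed=0):
    """Yield (train, test) lists of documents, once per fold.

    Hypothetical stand-in for illustration; not kindred's implementation.
    """
    if folds < 2:
        raise ValueError("folds must be at least 2")

    # Shuffle document indices once so every fold draws from the same ordering
    indices = list(range(len(documents)))
    random.Random(seed).shuffle(indices)

    for fold in range(folds):
        # Every folds-th shuffled index, starting at `fold`, forms the test set
        test_idx = set(indices[fold::folds])
        train = [documents[i] for i in indices if i not in test_idx]
        test = [documents[i] for i in indices if i in test_idx]
        yield train, test


# Ten placeholder documents split into five folds
docs = ["doc%d" % i for i in range(10)]
splits = list(nfold_split_sketch(docs, folds=5))
```

In this sketch every document lands in exactly one test set across the folds, which is the property cross-validation relies on. With the real class, the equivalent loop would iterate over `corpus.nfold_split(folds)` and receive a pair of `kindred.Corpus` objects on each iteration.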

removeEntities()

Remove all entities in this corpus

removeRelations()

Remove all relations in this corpus

split(trainFraction)

Randomly split the corpus into two corpora for use as training and test sets

Parameters:	trainFraction (float) – Fraction of documents to use in training set
Returns:	Tuple of training and test corpus
Return type:	(kindred.Corpus,kindred.Corpus)
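
A random train/test split of this kind can be sketched as follows. This is an illustration under stated assumptions, not kindred's code: `split_sketch` and its `seed` argument are hypothetical, and the choice to shuffle and cut at a rounded index is one possible strategy. This page does not say whether the library cuts at a fixed index or assigns each document independently at random.

```python
import random


def split_sketch(documents, train_fraction, seed=0):
    """Return (train, test) lists after a random shuffle.

    Hypothetical stand-in for illustration; not kindred's implementation.
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")

    # Copy before shuffling so the caller's list is left untouched
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)

    # Number of documents that go to the training set
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder documents, 80% of them used for training
docs = ["doc%d" % i for i in range(10)]
train, test = split_sketch(docs, 0.8)
```

Here `train` and `test` are disjoint and together cover every input document. With the real class, `corpus.split(trainFraction)` returns a pair of `kindred.Corpus` objects instead of lists.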

splitIntoSentences()

Create a new corpus with one document for each sentence in this corpus.

Returns:	Corpus with one document per sentence
Return type:	kindred.Corpus