kindred.Parser

class kindred.Parser(model='en_core_web_sm')

Runs spaCy on a corpus to split it into sentences and their associated tokens.

Variables:
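  • model – Name of the spaCy language model used for parsing (e.g. en_core_web_sm; older spaCy releases also accepted shortcuts such as en/de/es/pt/fr/it/nl)
  • nlp – The underlying spaCy language model object used for parsing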

Methods

__init__(model='en_core_web_sm')

Create a Parser object that uses spaCy for parsing. It supports every language for which spaCy provides a model; see https://spacy.io/usage/models for the list. Note that the language model must be downloaded before the Parser is created (e.g. python -m spacy download en_core_web_sm).

Parameters: model (str) – Name of an installed spaCy language model to use for parsing (e.g. en_core_web_sm)
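
Because the model has to be installed before the Parser is created, it can help to check for it first. The following is a minimal sketch, not part of kindred: the helper name package_is_installed is hypothetical, and it assumes the model was installed as a regular Python package under its full name (as python -m spacy download en_core_web_sm does).

```python
import importlib.util


def package_is_installed(package_name: str) -> bool:
    """Return True if a top-level Python package with this name can be found.

    Assumption: spaCy models downloaded under their full name
    (e.g. en_core_web_sm) are installed as ordinary Python packages.
    """
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    model_name = "en_core_web_sm"
    if package_is_installed("kindred") and package_is_installed(model_name):
        import kindred  # imported lazily so the check works without it

        parser = kindred.Parser(model=model_name)
        print("Parser ready, using model:", parser.model)
    else:
        print(f"Missing kindred or {model_name!r}; "
              f"try: python -m spacy download {model_name}")
```

The check only confirms that a package with that name exists; it does not verify that the model is compatible with the installed spaCy version.
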

parse(corpus)

Parse the corpus. Each document is split into sentences, which are then tokenized and parsed to obtain their dependency graphs. All parsed information is stored within the corpus object.

Parameters: corpus (kindred.Corpus) – Corpus to parse
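
As an illustration of the workflow described above, the sketch below builds a corpus from plain text, parses it, and reads the tokens back from the corpus. The helper name tokens_by_sentence is hypothetical. The attribute names used to walk the result (documents, sentences, tokens, word) are assumptions about kindred's data classes and are not stated on this page, so check them against the installed version.

```python
def tokens_by_sentence(text, model="en_core_web_sm"):
    """Parse plain text with kindred and return one list of token strings
    per sentence."""
    import kindred  # third-party; also needs the spaCy model named by `model`

    corpus = kindred.Corpus(text)         # a corpus built from a single text
    parser = kindred.Parser(model=model)
    parser.parse(corpus)                  # results are stored on the corpus

    # Assumed attribute names: documents, sentences, tokens, word.
    return [
        [token.word for token in sentence.tokens]
        for document in corpus.documents
        for sentence in document.sentences
    ]
```

Calling tokens_by_sentence("Erlotinib is a common treatment for lung cancer. It targets EGFR.") should give one inner list per detected sentence. The exact split depends on the spaCy model in use.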