Top-level Functions · spaCy API Documentation

spacy.load functionNeeds model

Load a model via its shortcut link, the name of an installed model package, a unicode path or a Path-like object. spaCy will try resolving the load argument in this order. If a model is loaded from a shortcut link or package name, spaCy will assume it’s a Python package and import it and call the model’s own load() method. If a model is loaded from a path, spaCy will assume it’s a data directory, read the language and pipeline settings off the meta.json and initialize the Language class. The data will be loaded in via Language.from_disk.

Example

nlp = spacy.load("en") # shortcut link
nlp = spacy.load("en_core_web_sm") # package
nlp = spacy.load("/path/to/en") # unicode path
nlp = spacy.load(Path("/path/to/en")) # pathlib Path

nlp = spacy.load("en_core_web_sm", disable=["parser", "tagger"])

Name	Type	Description
`name`	unicode / `Path`	Model to load, i.e. shortcut link, package name or path.
`disable`	list	Names of pipeline components to disable.
RETURNS	`Language`	A `Language` object with the loaded model.

Essentially, spacy.load() is a convenience wrapper that reads the language ID and pipeline components from a model’s meta.json, initializes the Language class, loads in the model data and returns it.

Abstract example
cls = util.get_lang_class(lang)         #  get language for ID, e.g. 'en'
nlp = cls()                             #  initialise the language
for name in pipeline:
    component = nlp.create_pipe(name)   #  create each pipeline component
    nlp.add_pipe(component)             #  add component to pipeline
nlp.from_disk(model_data_path)          #  load in model data

Changed in v2.0

As of spaCy 2.0, the path keyword argument is deprecated. spaCy will also raise an error if no model could be loaded and never just return an empty Language object. If you need a blank language, you can use the new function spacy.blank() or import the class explicitly, e.g. from spacy.lang.en import English.

- nlp = spacy.load("en", path="/model")
+ nlp = spacy.load("/model")

spacy.blank functionv2.0

Create a blank model of a given language class. This function is the twin of spacy.load().

Name	Type	Description
`name`	unicode	ISO code of the language class to load.
`disable`	list	Names of pipeline components to disable.
RETURNS	`Language`	An empty `Language` object of the appropriate subclass.

spacy.info function

The same as the info command. Pretty-print information about your installation, models and local setup from within spaCy. To get the model meta data as a dictionary instead, you can use the meta attribute on your nlp object with a loaded model, e.g. nlp.meta.

Name	Type	Description
`model`	unicode	A model, i.e. shortcut link, package name or path (optional).
`markdown`	bool	Print information as Markdown.

spacy.explain function

Get a description for a given POS tag, dependency label or entity type. For a list of available terms, see glossary.py.

Example

spacy.explain("NORP")
# Nationalities or religious or political groups

doc = nlp("Hello world")
for word in doc:
   print(word.text, word.tag_, spacy.explain(word.tag_))
# Hello UH interjection
# world NN noun, singular or mass

Name	Type	Description
`term`	unicode	Term to explain.
RETURNS	unicode	The explanation, or `None` if not found in the glossary.

spacy.prefer_gpu functionv2.0.14

Allocate data and perform operations on GPU, if available. If data has already been allocated on CPU, it will not be moved. Ideally, this function should be called right after importing spaCy and before loading any models.

Name	Type	Description
RETURNS	bool	Whether the GPU was activated.

spacy.require_gpu functionv2.0.14

Allocate data and perform operations on GPU. Will raise an error if no GPU is available. If data has already been allocated on CPU, it will not be moved. Ideally, this function should be called right after importing spaCy and before loading any models.

Name	Type	Description
RETURNS	bool	`True`

displaCy

As of v2.0, spaCy comes with a built-in visualization suite. For more info and examples, see the usage guide on visualizing spaCy.

displacy.serve methodv2.0

Serve a dependency parse tree or named entity visualization to view it in your browser. Will run a simple web server.

Name	Type	Description	Default
`docs`	list, `Doc`, `Span`	Document(s) to visualize.
`style`	unicode	Visualization style, `'dep'` or `'ent'`.	`'dep'`
`page`	bool	Render markup as full HTML page.	`True`
`minify`	bool	Minify HTML markup.	`False`
`options`	dict	Visualizer-specific options, e.g. colors.	`{}`
`manual`	bool	Don’t parse `Doc` and instead, expect a dict or list of dicts. See the usage guide on visualizing spaCy for formats and examples.	`False`
`port`	int	Port to serve visualization.	`5000`
`host`	unicode	Host to serve visualization.	`'0.0.0.0'`

displacy.render methodv2.0

Render a dependency parse tree or named entity visualization.

Name	Type	Description	Default
`docs`	list, `Doc`, `Span`	Document(s) to visualize.
`style`	unicode	Visualization style, `'dep'` or `'ent'`.	`'dep'`
`page`	bool	Render markup as full HTML page.	`False`
`minify`	bool	Minify HTML markup.	`False`
`jupyter`	bool	Explicitly enable or disable “Jupyter mode” to return markup ready to be rendered in a notebook. Detected automatically if `None`.	`None`
`options`	dict	Visualizer-specific options, e.g. colors.	`{}`
`manual`	bool	Don’t parse `Doc` and instead, expect a dict or list of dicts. See the usage guide on visualizing spaCy for formats and examples.	`False`
RETURNS	unicode	Rendered HTML markup.

Visualizer options

The options argument lets you specify additional settings for each visualizer. If a setting is not present in the options, the default value will be used.

Dependency Visualizer options

Name	Type	Description	Default
`fine_grained`	bool	Use fine-grained part-of-speech tags (`Token.tag_`) instead of coarse-grained tags (`Token.pos_`).	`False`
`add_lemma` v2.2.4	bool	Print the lemmas in a separate row below the token texts.	`False`
`collapse_punct`	bool	Attach punctuation to tokens. Can make the parse more readable, as it avoids long arcs that only attach punctuation.	`True`
`collapse_phrases`	bool	Merge noun phrases into one token.	`False`
`compact`	bool	“Compact mode” with square arrows that takes up less space.	`False`
`color`	unicode	Text color (HEX, RGB or color names).	`'#000000'`
`bg`	unicode	Background color (HEX, RGB or color names).	`'#ffffff'`
`font`	unicode	Font name or font family for all text.	`'Arial'`
`offset_x`	int	Spacing on left side of the SVG in px.	`50`
`arrow_stroke`	int	Width of arrow path in px.	`2`
`arrow_width`	int	Width of arrow head in px.	`10` / `8` (compact)
`arrow_spacing`	int	Spacing between arrows in px to avoid overlaps.	`20` / `12` (compact)
`word_spacing`	int	Vertical spacing between words and arcs in px.	`45`
`distance`	int	Distance between words in px.	`175` / `150` (compact)

Named Entity Visualizer options

Name	Type	Description	Default
`ents`	list	Entity types to highlight (`None` for all types).	`None`
`colors`	dict	Color overrides. Entity types in uppercase should be mapped to color names or values.	`{}`
`template` v2.2	unicode	Optional template to overwrite the HTML used to render entity spans. Should be a format string and can use `{bg}`, `{text}` and `{label}`.	see `templates.py`

By default, displaCy comes with colors for all entity types supported by spaCy. If you’re using custom entity types, you can use the colors setting to add your own colors for them. Your application or model package can also expose a spacy_displacy_colors entry point to add custom labels and their colors automatically.
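
As a rough illustration of how the entity visualizer settings above fit together, the sketch below builds an options dict and fills the template with plain `str.format`. The entity label `TECH`, the color value and the markup in the template are made-up examples, and the formatting step only imitates what the visualizer is assumed to do with `{bg}`, `{text}` and `{label}`.

```python
# Hypothetical options for the entity visualizer; "TECH" is a made-up custom label.
options = {
    "ents": ["ORG", "TECH"],          # only highlight these entity types
    "colors": {"TECH": "#7aecec"},    # color override for the custom label
    # Format string using the three documented placeholders.
    "template": '<mark style="background: {bg}">{text} <small>{label}</small></mark>',
}

# Illustration only: fill the template the way a format string is normally used.
html = options["template"].format(
    bg=options["colors"]["TECH"], text="spaCy", label="TECH"
)
print(html)
```

A dict like this would be passed as the `options` argument of `displacy.render` or `displacy.serve` together with `style='ent'`.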

Utility functions

spaCy comes with a small collection of utility functions located in spacy/util.py. Because utility functions are mostly intended for internal use within spaCy, their behavior may change with future releases. The functions documented on this page should be safe to use and we’ll try to ensure backwards compatibility. However, we recommend having additional tests in place if your application depends on any of spaCy’s utilities.

util.get_data_path function

Get path to the data directory where spaCy looks for models. Defaults to spacy/data.

Name	Type	Description
`require_exists`	bool	Only return path if it exists, otherwise return `None`.
RETURNS	`Path` / `None`	Data path or `None`.

util.set_data_path function

Set custom path to the data directory where spaCy looks for models.

Name	Type	Description
`path`	unicode / `Path`	Path to new data directory.

util.get_lang_class function

Import and load a Language class. Allows lazy-loading language data and importing languages using the two-letter language code. To add a language code for a custom language class, you can use the set_lang_class helper.

Name	Type	Description
`lang`	unicode	Two-letter language code, e.g. `'en'`.
RETURNS	`Language`	Language class.

util.set_lang_class function

Set a custom Language class name that can be loaded via get_lang_class. If your model uses a custom language, this is required so that spaCy can load the correct class from the two-letter language code.

Name	Type	Description
`name`	unicode	Two-letter language code, e.g. `'en'`.
`cls`	`Language`	The language class, e.g. `English`.

util.lang_class_is_loaded functionv2.1

Check whether a Language class is already loaded. Language classes are loaded lazily, to avoid expensive setup code associated with the language data.

Name	Type	Description
`name`	unicode	Two-letter language code, e.g. `'en'`.
RETURNS	bool	Whether the class has been loaded.

util.load_model functionv2.0

Load a model from a shortcut link, package or data path. If called with a shortcut link or package name, spaCy will assume the model is a Python package and import and call its load() method. If called with a path, spaCy will assume it’s a data directory, read the language and pipeline settings from the meta.json and initialize a Language class. The model data will then be loaded in via Language.from_disk().

Name	Type	Description
`name`	unicode	Package name, shortcut link or model path.
`**overrides`	-	Specific overrides, like pipeline components to disable.
RETURNS	`Language`	`Language` class with the loaded model.

util.load_model_from_path functionv2.0

Load a model from a data directory path. Creates the Language class and pipeline based on the directory’s meta.json and then calls from_disk() with the path. This function also makes it easy to test a new model that you haven’t packaged yet.

Name	Type	Description
`model_path`	unicode	Path to model data directory.
`meta`	dict	Model meta data. If `False`, spaCy will try to load the meta from a meta.json in the same directory.
`**overrides`	-	Specific overrides, like pipeline components to disable.
RETURNS	`Language`	`Language` class with the loaded model.

util.load_model_from_init_py functionv2.0

A helper function to use in the load() method of a model package’s __init__.py.

Name	Type	Description
`init_file`	unicode	Path to model’s `__init__.py`, i.e. `__file__`.
`**overrides`	-	Specific overrides, like pipeline components to disable.
RETURNS	`Language`	`Language` class with the loaded model.

util.get_model_meta functionv2.0

Get a model’s meta.json from a directory path and validate its contents.

Name	Type	Description
`path`	unicode / `Path`	Path to model directory.
RETURNS	dict	The model’s meta data.

util.is_package function

Check if string maps to a package installed via pip. Mainly used to validate model packages.

Name	Type	Description
`name`	unicode	Name of package.
RETURNS	`bool`	`True` if installed package, `False` if not.

util.get_package_path functionv2.0

Get path to an installed package. Mainly used to resolve the location of model packages. Currently imports the package to find its path.

Name	Type	Description
`package_name`	unicode	Name of installed package.
RETURNS	`Path`	Path to model package directory.

util.is_in_jupyter functionv2.0

Check if user is running spaCy from a Jupyter notebook by detecting the IPython kernel. Mainly used for the displacy visualizer.

Name	Type	Description
RETURNS	bool	`True` if in Jupyter, `False` if not.
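
A minimal sketch of this kind of check, assuming it relies on the `get_ipython` name that IPython injects into the namespace. The function name `is_in_jupyter_sketch` is hypothetical, and the shell class name is an assumption about how Jupyter kernels identify themselves; this is not necessarily how spaCy implements it.

```python
def is_in_jupyter_sketch():
    """Return True if an IPython kernel (as used by Jupyter) appears to be running."""
    try:
        # get_ipython() only exists when IPython has set up the session;
        # in a plain Python interpreter the lookup raises NameError.
        shell_name = get_ipython().__class__.__name__  # noqa: F821
    except NameError:
        return False
    # Assumption: Jupyter notebooks and consoles use the ZMQ-based shell.
    return shell_name == "ZMQInteractiveShell"


print(is_in_jupyter_sketch())
```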

util.update_exc function

Update, validate and overwrite tokenizer exceptions. Used to combine global exceptions with custom, language-specific exceptions. Will raise an error if key doesn’t match ORTH values.

Example

BASE =  {"a.": [{ORTH: "a."}], ":)": [{ORTH: ":)"}]}
NEW = {"a.": [{ORTH: "a.", NORM: "all"}]}
exceptions = util.update_exc(BASE, NEW)
# {"a.": [{ORTH: "a.", NORM: "all"}], ":)": [{ORTH: ":)"}]}

Name	Type	Description
`base_exceptions`	dict	Base tokenizer exceptions.
`*addition_dicts`	dicts	Exception dictionaries to add to the base exceptions, in order.
RETURNS	dict	Combined tokenizer exceptions.
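
To make the merge and validation behavior concrete, here is a small pure-Python sketch. It is not spaCy's implementation: `update_exc_sketch` is a hypothetical name, and plain strings stand in for the `ORTH` and `NORM` symbols so that the snippet runs without spaCy.

```python
# String keys stand in for spaCy's ORTH and NORM symbols (assumption for this sketch).
ORTH, NORM = "ORTH", "NORM"


def update_exc_sketch(base_exceptions, *addition_dicts):
    """Merge exception dicts in order, validating keys against ORTH values."""
    exc = dict(base_exceptions)  # copy so the base dict is left untouched
    for additions in addition_dicts:
        for key, token_attrs in additions.items():
            # The concatenated ORTH values must reproduce the exception string.
            described = "".join(attrs[ORTH] for attrs in token_attrs)
            if key != described:
                raise ValueError(
                    "Key {!r} does not match ORTH values {!r}".format(key, described)
                )
        exc.update(additions)  # later dicts overwrite earlier entries
    return exc


BASE = {"a.": [{ORTH: "a."}], ":)": [{ORTH: ":)"}]}
NEW = {"a.": [{ORTH: "a.", NORM: "all"}]}
merged = update_exc_sketch(BASE, NEW)
print(merged)
```

In this sketch, an entry such as `{"don't": [{ORTH: "do"}, {ORTH: "nt"}]}` is rejected with a `ValueError`, because the ORTH values join to `"dont"` rather than `"don't"`.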

util.compile_prefix_regex function

Compile a sequence of prefix rules into a regex object.

Name	Type	Description
`entries`	tuple	The prefix rules, e.g. `lang.punctuation.TOKENIZER_PREFIXES`.
RETURNS	regex	The regex object, to be used for `Tokenizer.prefix_search`.

util.compile_suffix_regex function

Compile a sequence of suffix rules into a regex object.

Name	Type	Description
`entries`	tuple	The suffix rules, e.g. `lang.punctuation.TOKENIZER_SUFFIXES`.
RETURNS	regex	The regex object, to be used for `Tokenizer.suffix_search`.

util.compile_infix_regex function

Compile a sequence of infix rules into a regex object.

Name	Type	Description
`entries`	tuple	The infix rules, e.g. `lang.punctuation.TOKENIZER_INFIXES`.
RETURNS	regex	The regex object, to be used for `Tokenizer.infix_finditer`.
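
The three compile helpers can be pictured with the standard `re` module alone. The sketch below assumes that prefix rules are anchored at the start of the string, suffix rules at the end, and infix rules are left unanchored. The rule tuples are hypothetical stand-ins for a language's punctuation data, and the snippet does not call spaCy.

```python
import re

# Hypothetical rule tuples; real rules would come from a language's punctuation data.
prefixes = (r"\(", r"\[", r'"')
suffixes = (r"\)", r"\]", r'"', r"\.")
infixes = (r"-", r"\.\.\.")

# Join the rules into one alternation each, anchoring prefixes and suffixes.
prefix_re = re.compile("|".join("^" + rule for rule in prefixes))
suffix_re = re.compile("|".join(rule + "$" for rule in suffixes))
infix_re = re.compile("|".join(infixes))

print(prefix_re.search('("hello').group())                     # (
print(suffix_re.search("world.").group())                      # .
print([m.group() for m in infix_re.finditer("well-known")])    # ['-']
```

The bound methods `prefix_re.search`, `suffix_re.search` and `infix_re.finditer` are the kind of callables the table entries above refer to.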

util.minibatch functionv2.0

Iterate over batches of items. size may be an iterator, so that batch-size can vary on each step.

Name	Type	Description
`items`	iterable	The items to batch up.
`size`	int / iterable	The batch size(s). Use `util.compounding` or `util.decaying` for an infinite series of compounding or decaying values.
YIELDS	list	The batches.
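
The batching behavior can be sketched in a few lines of standard-library Python. `minibatch_sketch` is a hypothetical name used for illustration, not spaCy's own function. It assumes that a size iterator never runs out before the items do.

```python
from itertools import islice, repeat


def minibatch_sketch(items, size=8):
    """Yield lists of items; `size` may be an int or an iterator of sizes."""
    # A fixed int becomes an endless iterator that repeats the same size.
    sizes = repeat(size) if isinstance(size, int) else iter(size)
    items = iter(items)
    while True:
        batch = list(islice(items, int(next(sizes))))
        if not batch:  # the items are exhausted
            break
        yield batch


fixed = list(minibatch_sketch(range(10), size=4))
varying = list(minibatch_sketch(range(10), size=[1, 2, 3, 4, 5]))
print(fixed)    # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(varying)  # [[0], [1, 2], [3, 4, 5], [6, 7, 8, 9]]
```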

util.compounding functionv2.0

Yield an infinite series of compounding values. Each time the generator is advanced, a value is produced by multiplying the previous value by the compound rate.

Name	Type	Description
`start`	int / float	The first value.
`stop`	int / float	The maximum value.
`compound`	int / float	The compounding factor.
YIELDS	int	Compounding values.
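
A minimal sketch of such a generator, assuming `start <= stop` and a compounding factor greater than 1, with `stop` acting as a cap. `compounding_sketch` is a hypothetical name, and the snippet does not import spaCy.

```python
from itertools import islice


def compounding_sketch(start, stop, compound):
    """Yield start, start * compound, ... and cap every value at `stop`."""
    curr = float(start)
    while True:
        yield min(curr, stop)  # never exceed the maximum value
        curr *= compound


# Take the first six values of the otherwise infinite series.
sizes = list(islice(compounding_sketch(1.0, 8.0, 2.0), 6))
print(sizes)  # [1.0, 2.0, 4.0, 8.0, 8.0, 8.0]
```

A series like this is a typical argument for the `size` parameter of a batching helper, so that batches grow over time and then level off.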

util.decaying functionv2.0

Yield an infinite series of linearly decaying values.

Name	Type	Description
`start`	int / float	The first value.
`end`	int / float	The minimum value, at which the series stops decreasing.
`decay`	int / float	The decaying factor.
YIELDS	int	The decaying values.
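
One way to picture a linearly decaying series is the sketch below. It assumes that `decay` is subtracted on every step and that `end` acts as the floor of the series. `decaying_sketch` is a hypothetical name, not spaCy's implementation.

```python
from itertools import islice


def decaying_sketch(start, end, decay):
    """Yield start, start - decay, ... and never drop below `end`."""
    curr = float(start)
    while True:
        yield max(curr, end)  # clip at the floor
        curr -= decay


# Take the first six values of the otherwise infinite series.
rates = list(islice(decaying_sketch(1.0, 0.25, 0.25), 6))
print(rates)  # [1.0, 0.75, 0.5, 0.25, 0.25, 0.25]
```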

util.itershuffle functionv2.0

Shuffle an iterator. This works by holding bufsize items back and yielding them sometime later. Obviously, this is not unbiased – but should be good enough for batching. Larger bufsize means less bias.

Name	Type	Description
`iterable`	iterable	Iterator to shuffle.
`bufsize`	int	Items to hold back (default: 1000).
YIELDS	iterable	The shuffled iterator.
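
The buffering idea can be sketched as follows. This is a simplified illustration under the assumption that one random buffered item is released once the buffer is full. `itershuffle_sketch` and its `rng` parameter are hypothetical, and spaCy's own strategy may differ.

```python
import random


def itershuffle_sketch(iterable, bufsize=1000, rng=random):
    """Approximately shuffle an iterator while holding back `bufsize` items."""
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) >= bufsize:
            # Swap a random buffered item to the end and release it.
            idx = rng.randrange(len(buf))
            buf[idx], buf[-1] = buf[-1], buf[idx]
            yield buf.pop()
    # The input is exhausted: shuffle and flush whatever is still buffered.
    rng.shuffle(buf)
    while buf:
        yield buf.pop()


shuffled = list(itershuffle_sketch(range(100), bufsize=10))
print(len(shuffled), sorted(shuffled) == list(range(100)))
```

Because an item can only be delayed, never moved earlier than its arrival, early items tend to stay near the front. A larger buffer weakens that bias at the cost of memory.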

util.filter_spans functionv2.1.4

Filter a sequence of Span objects and remove duplicates or overlaps. Useful for creating named entities (where one token can only be part of one entity) or when merging spans with Retokenizer.merge. When spans overlap, the (first) longest span is preferred over shorter spans.

Name	Type	Description
`spans`	iterable	The spans to filter.
RETURNS	list	The filtered spans.
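
The selection rule described above (the longest span wins, and the first one wins among equally long spans) can be sketched on plain `(start, end)` token offsets. `filter_spans_sketch` is a hypothetical stand-in that works on tuples with an exclusive `end` rather than on `Span` objects, and it assumes every span covers at least one token.

```python
def filter_spans_sketch(spans):
    """Keep the longest non-overlapping (start, end) spans; `end` is exclusive."""
    # Longest spans first; among equal lengths, the one that starts earlier wins.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept, seen_tokens = [], set()
    for start, end in ordered:
        # Earlier candidates are at least as long, so checking both boundary
        # tokens is enough to detect any overlap with an accepted span.
        if start not in seen_tokens and end - 1 not in seen_tokens:
            kept.append((start, end))
            seen_tokens.update(range(start, end))
    return sorted(kept)  # restore document order


spans = [(0, 3), (1, 2), (2, 5), (5, 6), (0, 3)]
print(filter_spans_sketch(spans))  # [(0, 3), (5, 6)]
```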

Compatibility functions

All Python code is written in an intersection of Python 2 and Python 3. This is easy in Cython, but somewhat ugly in Python. Logic that deals with Python or platform compatibility lives only in spacy.compat. To distinguish them from the builtin functions, replacement functions are suffixed with an underscore, e.g. unicode_.

Name	Python 2	Python 3
`compat.bytes_`	`str`	`bytes`
`compat.unicode_`	`unicode`	`str`
`compat.basestring_`	`basestring`	`str`
`compat.input_`	`raw_input`	`input`
`compat.path2str`	`str(path)` with `.decode('utf8')`	`str(path)`

compat.is_config function

Check if a specific configuration of Python version and operating system matches the user’s setup. Mostly used to display targeted error messages.

Name	Type	Description
`python2`	bool	spaCy is executed with Python 2.x.
`python3`	bool	spaCy is executed with Python 3.x.
`windows`	bool	spaCy is executed on Windows.
`linux`	bool	spaCy is executed on Linux.
`osx`	bool	spaCy is executed on OS X or macOS.
RETURNS	bool	Whether the specified configuration matches the user’s platform.
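
A check of this shape can be sketched with the standard library alone. The version below assumes that arguments left as `None` are ignored and that all other arguments must match the running interpreter and platform. `is_config_sketch` and the module-level flags are hypothetical names.

```python
import sys

# Flags describing the running interpreter and platform.
IS_PYTHON2 = sys.version_info[0] == 2
IS_PYTHON3 = sys.version_info[0] == 3
IS_WINDOWS = sys.platform.startswith("win")
IS_LINUX = sys.platform.startswith("linux")
IS_OSX = sys.platform == "darwin"


def is_config_sketch(python2=None, python3=None, windows=None, linux=None, osx=None):
    """Return True if every flag that was passed matches the current setup."""
    requested = [
        (python2, IS_PYTHON2),
        (python3, IS_PYTHON3),
        (windows, IS_WINDOWS),
        (linux, IS_LINUX),
        (osx, IS_OSX),
    ]
    # Arguments left as None place no constraint on the result.
    return all(want is None or want == actual for want, actual in requested)


print(is_config_sketch(python3=True, windows=True))
```

A call like the one above would be used to decide whether to show, for example, a Windows-specific hint in an error message.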
