Top-level Functions
spacy.load function (needs model)
Load a model via its shortcut link, the name of an
installed model package, a unicode path or
a Path-like object. spaCy will try resolving the load argument in this order.
If a model is loaded from a shortcut link or package name, spaCy will assume
it’s a Python package and import it and call the model’s own load() method. If
a model is loaded from a path, spaCy will assume it’s a data directory, read the
language and pipeline settings off the meta.json and initialize the Language
class. The data will be loaded in via
Language.from_disk.
| Name | Type | Description |
|---|---|---|
| name | unicode / Path | Model to load, i.e. shortcut link, package name or path. |
| disable | list | Names of pipeline components to disable. |
| RETURNS | Language | A Language object with the loaded model. |
Essentially, spacy.load() is a convenience wrapper that reads the language ID
and pipeline components from a model’s meta.json, initializes the Language
class, loads in the model data and returns it.
Abstract example
    cls = util.get_lang_class(lang)  # get language for ID, e.g. 'en'
    nlp = cls()  # initialise the language
    for name in pipeline:
        component = nlp.create_pipe(name)  # create each pipeline component
        nlp.add_pipe(component)  # add component to pipeline
    nlp.from_disk(model_data_path)  # load in model data
spacy.blank function (v2.0)
Create a blank model of a given language class. This function is the twin of
spacy.load().
| Name | Type | Description |
|---|---|---|
| name | unicode | ISO code of the language class to load. |
| disable | list | Names of pipeline components to disable. |
| RETURNS | Language | An empty Language object of the appropriate subclass. |
spacy.info function
The same as the info command. Pretty-print information about
your installation, models and local setup from within spaCy. To get the model
meta data as a dictionary instead, you can use the meta attribute on your
nlp object with a loaded model, e.g. nlp.meta.
| Name | Type | Description |
|---|---|---|
| model | unicode | A model, i.e. shortcut link, package name or path (optional). |
| markdown | bool | Print information as Markdown. |
spacy.explain function
Get a description for a given POS tag, dependency label or entity type. For a
list of available terms, see
glossary.py.
| Name | Type | Description |
|---|---|---|
| term | unicode | Term to explain. |
| RETURNS | unicode | The explanation, or None if not found in the glossary. |
spacy.prefer_gpu function (v2.0.14)
Allocate data and perform operations on GPU, if available. If data has already been allocated on CPU, it will not be moved. Ideally, this function should be called right after importing spaCy and before loading any models.
| Name | Type | Description |
|---|---|---|
| RETURNS | bool | Whether the GPU was activated. |
spacy.require_gpu function (v2.0.14)
Allocate data and perform operations on GPU. Will raise an error if no GPU is available. If data has already been allocated on CPU, it will not be moved. Ideally, this function should be called right after importing spaCy and before loading any models.
| Name | Type | Description |
|---|---|---|
| RETURNS | bool | True |
displaCy
As of v2.0, spaCy comes with a built-in visualization suite. For more info and examples, see the usage guide on visualizing spaCy.
displacy.serve method (v2.0)
Serve a dependency parse tree or named entity visualization to view it in your browser. Will run a simple web server.
| Name | Type | Description | Default |
|---|---|---|---|
| docs | list, Doc, Span | Document(s) to visualize. | |
| style | unicode | Visualization style, 'dep' or 'ent'. | 'dep' |
| page | bool | Render markup as full HTML page. | True |
| minify | bool | Minify HTML markup. | False |
| options | dict | Visualizer-specific options, e.g. colors. | {} |
| manual | bool | Don’t parse Doc and instead, expect a dict or list of dicts. See here for formats and examples. | False |
| port | int | Port to serve visualization. | 5000 |
| host | unicode | Host to serve visualization. | '0.0.0.0' |
displacy.render method (v2.0)
Render a dependency parse tree or named entity visualization.
| Name | Type | Description | Default |
|---|---|---|---|
| docs | list, Doc, Span | Document(s) to visualize. | |
| style | unicode | Visualization style, 'dep' or 'ent'. | 'dep' |
| page | bool | Render markup as full HTML page. | False |
| minify | bool | Minify HTML markup. | False |
| jupyter | bool | Explicitly enable or disable “Jupyter mode” to return markup ready to be rendered in a notebook. Detected automatically if None. | None |
| options | dict | Visualizer-specific options, e.g. colors. | {} |
| manual | bool | Don’t parse Doc and instead, expect a dict or list of dicts. See here for formats and examples. | False |
| RETURNS | unicode | Rendered HTML markup. | |
Visualizer options
The options argument lets you specify additional settings for each visualizer.
If a setting is not present in the options, the default value will be used.
Dependency Visualizer options
| Name | Type | Description | Default |
|---|---|---|---|
| fine_grained | bool | Use fine-grained part-of-speech tags (Token.tag_) instead of coarse-grained tags (Token.pos_). | False |
| add_lemma v2.2.4 | bool | Print the lemmas in a separate row below the token texts. | False |
| collapse_punct | bool | Attach punctuation to tokens. Can make the parse more readable, as it avoids long arcs that only attach punctuation. | True |
| collapse_phrases | bool | Merge noun phrases into one token. | False |
| compact | bool | “Compact mode” with square arrows that takes up less space. | False |
| color | unicode | Text color (HEX, RGB or color names). | '#000000' |
| bg | unicode | Background color (HEX, RGB or color names). | '#ffffff' |
| font | unicode | Font name or font family for all text. | 'Arial' |
| offset_x | int | Spacing on left side of the SVG in px. | 50 |
| arrow_stroke | int | Width of arrow path in px. | 2 |
| arrow_width | int | Width of arrow head in px. | 10 / 8 (compact) |
| arrow_spacing | int | Spacing between arrows in px to avoid overlaps. | 20 / 12 (compact) |
| word_spacing | int | Vertical spacing between words and arcs in px. | 45 |
| distance | int | Distance between words in px. | 175 / 150 (compact) |
Named Entity Visualizer options
| Name | Type | Description | Default |
|---|---|---|---|
| ents | list | Entity types to highlight (None for all types). | None |
| colors | dict | Color overrides. Entity types in uppercase should be mapped to color names or values. | {} |
| template v2.2 | unicode | Optional template to overwrite the HTML used to render entity spans. Should be a format string and can use {bg}, {text} and {label}. | see templates.py |
By default, displaCy comes with colors for all
entity types supported by spaCy. If you’re
using custom entity types, you can use the colors setting to add your own
colors for them. Your application or model package can also expose a
spacy_displacy_colors entry point
to add custom labels and their colors automatically.
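As a rough illustration of the colors and template settings, the sketch below builds an options dict and fills a custom template by hand with str.format. The NEWS_ORG label, the color value and the HTML markup are made up for the example, and displaCy performs the actual substitution itself when rendering.

```python
# Hypothetical custom entity type and color, purely for illustration.
colors = {"NEWS_ORG": "#7aecec"}

# A format string using the three placeholders named in the table above.
template = (
    '<mark style="background: {bg}">{text} '
    '<span class="label">{label}</span></mark>'
)

options = {"ents": ["NEWS_ORG"], "colors": colors, "template": template}

# Fill the template manually to preview the markup for one entity span.
preview = options["template"].format(
    bg=options["colors"]["NEWS_ORG"], text="The Daily Example", label="NEWS_ORG"
)
print(preview)
```

A dict like this would then be passed as the options argument of displacy.render or displacy.serve together with style='ent'.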
Utility functions
spaCy comes with a small collection of utility functions located in
spacy/util.py.
Because utility functions are mostly intended for internal use within spaCy,
their behavior may change with future releases. The functions documented on this
page should be safe to use and we’ll try to ensure backwards compatibility.
However, we recommend having additional tests in place if your application
depends on any of spaCy’s utilities.
util.get_data_path function
Get path to the data directory where spaCy looks for models. Defaults to
spacy/data.
| Name | Type | Description |
|---|---|---|
| require_exists | bool | Only return path if it exists, otherwise return None. |
| RETURNS | Path / None | Data path or None. |
util.set_data_path function
Set custom path to the data directory where spaCy looks for models.
| Name | Type | Description |
|---|---|---|
| path | unicode / Path | Path to new data directory. |
util.get_lang_class function
Import and load a Language class. Allows lazy-loading
language data and importing languages using the
two-letter language code. To add a language code for a custom language class,
you can use the set_lang_class helper.
| Name | Type | Description |
|---|---|---|
| lang | unicode | Two-letter language code, e.g. 'en'. |
| RETURNS | Language | Language class. |
util.set_lang_class function
Set a custom Language class name that can be loaded via
get_lang_class. If your model uses a
custom language, this is required so that spaCy can load the correct class from
the two-letter language code.
| Name | Type | Description |
|---|---|---|
| name | unicode | Two-letter language code, e.g. 'en'. |
| cls | Language | The language class, e.g. English. |
util.lang_class_is_loaded function (v2.1)
Check whether a Language class is already loaded. Language classes are
loaded lazily, to avoid expensive setup code associated with the language data.
| Name | Type | Description |
|---|---|---|
| name | unicode | Two-letter language code, e.g. 'en'. |
| RETURNS | bool | Whether the class has been loaded. |
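The lazy-loading behavior described for get_lang_class, set_lang_class and lang_class_is_loaded can be pictured as a small registry that loads a class only on first request. The following is a simplified sketch of that pattern, not spaCy's implementation: the _registry dict, the _loaders table and the DummyEnglish stand-in class are all hypothetical.

```python
_registry = {}  # language code -> class, filled on demand (hypothetical)


class DummyEnglish:
    """Stand-in for a Language subclass; not a real spaCy class."""

    lang = "en"


def _load_english():
    # In a real library this would be an expensive import of language data.
    return DummyEnglish


_loaders = {"en": _load_english}  # hypothetical table of lazy loaders


def lang_class_is_loaded(name):
    return name in _registry


def set_lang_class(name, cls):
    _registry[name] = cls


def get_lang_class(lang):
    if lang not in _registry:
        if lang not in _loaders:
            raise ImportError("Can't find language class for '%s'" % lang)
        _registry[lang] = _loaders[lang]()  # load once, then cache
    return _registry[lang]


before = lang_class_is_loaded("en")
cls = get_lang_class("en")
after = lang_class_is_loaded("en")
print(before, cls.lang, after)
```

Registering a class with set_lang_class first means a later get_lang_class call finds it in the cache and never needs a loader, which is why custom languages have to be registered before a model that uses them is loaded.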
util.load_model function (v2.0)
Load a model from a shortcut link, package or data path. If called with a
shortcut link or package name, spaCy will assume the model is a Python package
and import and call its load() method. If called with a path, spaCy will
assume it’s a data directory, read the language and pipeline settings from the
meta.json and initialize a Language class. The model data will then be loaded
in via Language.from_disk().
| Name | Type | Description |
|---|---|---|
| name | unicode | Package name, shortcut link or model path. |
| **overrides | - | Specific overrides, like pipeline components to disable. |
| RETURNS | Language | Language class with the loaded model. |
util.load_model_from_path function (v2.0)
Load a model from a data directory path. Creates the Language
class and pipeline based on the directory’s meta.json and then calls
from_disk() with the path. This function also makes
it easy to test a new model that you haven’t packaged yet.
| Name | Type | Description |
|---|---|---|
| model_path | unicode | Path to model data directory. |
| meta | dict | Model meta data. If False, spaCy will try to load the meta from a meta.json in the same directory. |
| **overrides | - | Specific overrides, like pipeline components to disable. |
| RETURNS | Language | Language class with the loaded model. |
util.load_model_from_init_py function (v2.0)
A helper function to use in the load() method of a model package’s
__init__.py.
| Name | Type | Description |
|---|---|---|
| init_file | unicode | Path to model’s __init__.py, i.e. __file__. |
| **overrides | - | Specific overrides, like pipeline components to disable. |
| RETURNS | Language | Language class with the loaded model. |
util.get_model_meta function (v2.0)
Get a model’s meta.json from a directory path and validate its contents.
| Name | Type | Description |
|---|---|---|
| path | unicode / Path | Path to model directory. |
| RETURNS | dict | The model’s meta data. |
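To make "read and validate" more concrete, here is a minimal standalone sketch of the idea using only the standard library. The read_model_meta name and the list of required keys are assumptions for illustration; consult spaCy's source for the checks it actually performs.

```python
import json
import tempfile
from pathlib import Path

# Assumed set of required keys; spaCy's actual list may differ.
REQUIRED_KEYS = ("lang", "name", "version")


def read_model_meta(path):
    """Read meta.json from a model directory and check required keys."""
    meta_path = Path(path) / "meta.json"
    if not meta_path.is_file():
        raise IOError("Could not read meta.json from %s" % meta_path)
    with meta_path.open("r", encoding="utf8") as f:
        meta = json.load(f)
    for key in REQUIRED_KEYS:
        if not meta.get(key):
            raise ValueError("No valid '%s' setting found in meta.json" % key)
    return meta


# Demonstrate with a throwaway directory instead of a real model.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "meta.json").write_text(
        json.dumps({"lang": "en", "name": "example", "version": "0.0.0"}),
        encoding="utf8",
    )
    meta = read_model_meta(tmp)
print(meta["lang"], meta["name"])
```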
util.is_package function
Check if string maps to a package installed via pip. Mainly used to validate model packages.
| Name | Type | Description |
|---|---|---|
| name | unicode | Name of package. |
| RETURNS | bool | True if installed package, False if not. |
util.get_package_path function (v2.0)
Get path to an installed package. Mainly used to resolve the location of model packages. Currently imports the package to find its path.
| Name | Type | Description |
|---|---|---|
| package_name | unicode | Name of installed package. |
| RETURNS | Path | Path to model package directory. |
util.is_in_jupyter function (v2.0)
Check if user is running spaCy from a Jupyter notebook by
detecting the IPython kernel. Mainly used for the
displacy visualizer.
| Name | Type | Description |
|---|---|---|
| RETURNS | bool | True if in Jupyter, False if not. |
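One common way to detect an IPython kernel is to look for the get_ipython function that IPython injects into the session. The sketch below shows that technique under the assumption that Jupyter kernels use a shell class named ZMQInteractiveShell; in_jupyter is a hypothetical name, and spaCy's own check may differ in detail.

```python
def in_jupyter():
    """Return True if running under an IPython kernel (e.g. a notebook)."""
    try:
        # get_ipython is injected into the namespace by IPython; in a plain
        # interpreter the name is undefined and raises NameError.
        shell = get_ipython().__class__.__name__  # noqa: F821
    except NameError:
        return False
    # ZMQInteractiveShell is the shell class used by Jupyter kernels.
    return shell == "ZMQInteractiveShell"


print(in_jupyter())
```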
util.update_exc function
Update, validate and overwrite
tokenizer exceptions. Used to
combine global exceptions with custom, language-specific exceptions. Will raise
an error if key doesn’t match ORTH values.
| Name | Type | Description |
|---|---|---|
| base_exceptions | dict | Base tokenizer exceptions. |
| *addition_dicts | dicts | Exception dictionaries to add to the base exceptions, in order. |
| RETURNS | dict | Combined tokenizer exceptions. |
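The validation rule ("key must match the ORTH values") is easier to see in code. The sketch below is a simplified stand-in, not spaCy's function: merge_exceptions is a hypothetical name, and the plain string "ORTH" replaces the real attribute ID so the example runs without spaCy installed.

```python
ORTH = "ORTH"  # stand-in for the ORTH attribute ID (assumption for this sketch)


def merge_exceptions(base_exceptions, *addition_dicts):
    """Combine exception dicts in order; later dicts overwrite earlier ones."""
    merged = dict(base_exceptions)
    for additions in addition_dicts:
        for key, token_attrs in additions.items():
            # The ORTH values of the sub-tokens must spell out the key exactly.
            described = "".join(attrs[ORTH] for attrs in token_attrs)
            if key != described:
                raise ValueError(
                    "Key %r doesn't match ORTH values %r" % (key, described)
                )
        merged.update(additions)
    return merged


base = {":)": [{ORTH: ":)"}]}
custom = {"don't": [{ORTH: "do"}, {ORTH: "n't"}]}
combined = merge_exceptions(base, custom)
print(sorted(combined))
```

An entry such as {"don't": [{ORTH: "do"}, {ORTH: "not"}]} would be rejected here, because the sub-tokens spell "donot" rather than the key.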
util.compile_prefix_regex function
Compile a sequence of prefix rules into a regex object.
| Name | Type | Description |
|---|---|---|
| entries | tuple | The prefix rules, e.g. lang.punctuation.TOKENIZER_PREFIXES. |
| RETURNS | regex | The regex object, to be used for Tokenizer.prefix_search. |
util.compile_suffix_regex function
Compile a sequence of suffix rules into a regex object.
| Name | Type | Description |
|---|---|---|
| entries | tuple | The suffix rules, e.g. lang.punctuation.TOKENIZER_SUFFIXES. |
| RETURNS | regex | The regex object, to be used for Tokenizer.suffix_search. |
util.compile_infix_regex function
Compile a sequence of infix rules into a regex object.
| Name | Type | Description |
|---|---|---|
| entries | tuple | The infix rules, e.g. lang.punctuation.TOKENIZER_INFIXES. |
| RETURNS | regex | The regex object, to be used for Tokenizer.infix_finditer. |
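The three compile helpers differ mainly in how the individual rules are anchored. The following sketch approximates that idea with the standard library's re module: the function names and the tiny rule tuples are made up for the example, and spaCy's real rule sets and compilation details may differ.

```python
import re


def compile_prefix(entries):
    # Anchor every rule at the start of the string.
    return re.compile("|".join("^" + piece for piece in entries if piece.strip()))


def compile_suffix(entries):
    # Anchor every rule at the end of the string.
    return re.compile("|".join(piece + "$" for piece in entries if piece.strip()))


def compile_infix(entries):
    # Infixes may occur anywhere, so no anchors are added.
    return re.compile("|".join(piece for piece in entries if piece.strip()))


# Made-up rules; real rule sets are much larger.
prefix_re = compile_prefix((r"\(", r"\$"))
suffix_re = compile_suffix((r"\)", r"%"))
infix_re = compile_infix((r"-", r"/"))

print(prefix_re.search("$100").group())
print(suffix_re.search("50%").group())
print([m.group() for m in infix_re.finditer("mother-in-law")])
```

The bound methods of such objects (search and finditer) are what a tokenizer would receive as its prefix_search, suffix_search and infix_finditer callbacks.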
util.minibatch function (v2.0)
Iterate over batches of items. size may be an iterator, so that batch-size can
vary on each step.
| Name | Type | Description |
|---|---|---|
| items | iterable | The items to batch up. |
| size | int / iterable | The batch size(s). Use util.compounding or util.decaying for an infinite series of compounding or decaying values. |
| YIELDS | list | The batches. |
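A minimal sketch of this batching behavior, written against the standard library only, might look as follows. minibatch_sketch is a hypothetical name and the code is an approximation of the described behavior, not a copy of spaCy's function.

```python
import itertools


def minibatch_sketch(items, size=8):
    """Yield lists of items; size is an int or an iterable of ints."""
    sizes = itertools.repeat(size) if isinstance(size, int) else iter(size)
    items = iter(items)
    while True:
        batch = list(itertools.islice(items, int(next(sizes))))
        if not batch:
            break
        yield batch


fixed = list(minibatch_sketch(range(7), size=3))
varying = list(minibatch_sketch(range(7), size=[1, 2, 4, 8]))
print(fixed)    # [[0, 1, 2], [3, 4, 5], [6]]
print(varying)  # [[0], [1, 2], [3, 4, 5, 6]]
```

In this sketch a finite size sequence has to supply at least one more value than there are batches, which is one reason an infinite series of sizes is the more convenient choice.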
util.compounding function (v2.0)
Yield an infinite series of compounding values. Each time the generator is advanced, it produces the previous value multiplied by the compound rate.
| Name | Type | Description |
|---|---|---|
| start | int / float | The first value. |
| stop | int / float | The maximum value. |
| compound | int / float | The compounding factor. |
| YIELDS | int | Compounding values. |
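The series can be sketched as a short generator that multiplies a running value and clips it at stop. compounding_sketch is a hypothetical name, and the clipping logic is an assumption based on the parameter descriptions above.

```python
import itertools


def compounding_sketch(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ..., clipped at stop."""

    def clip(value):
        # Clip towards stop regardless of whether the series grows or shrinks.
        return max(value, stop) if start > stop else min(value, stop)

    curr = float(start)
    while True:
        yield clip(curr)
        curr *= compound


sizes = list(itertools.islice(compounding_sketch(1.0, 4.0, 2.0), 5))
print(sizes)  # [1.0, 2.0, 4.0, 4.0, 4.0]
```

A series like this is typically used as a batch size schedule that starts small and grows until it reaches the cap.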
util.decaying function (v2.0)
Yield an infinite series of linearly decaying values.
| Name | Type | Description |
|---|---|---|
| start | int / float | The first value. |
| end | int / float | The minimum value. |
| decay | int / float | The decaying factor. |
| YIELDS | int | The decaying values. |
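Read literally, "linearly decaying" means subtracting a fixed amount at each step until a floor is reached. The sketch below follows that reading: decaying_sketch is a hypothetical name, and treating end as a lower bound is an assumption, so the exact formula spaCy uses may differ between versions.

```python
import itertools


def decaying_sketch(start, end, decay):
    """Yield start, start - decay, start - 2*decay, ..., never below end."""
    curr = float(start)
    while True:
        yield max(curr, end)
        curr -= decay


rates = list(itertools.islice(decaying_sketch(1.0, 0.5, 0.25), 4))
print(rates)  # [1.0, 0.75, 0.5, 0.5]
```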
util.itershuffle function (v2.0)
Shuffle an iterator. This works by holding bufsize items back and yielding
them sometime later. Obviously, this is not unbiased – but should be good enough
for batching. Larger bufsize means less bias.
| Name | Type | Description |
|---|---|---|
| iterable | iterable | Iterator to shuffle. |
| bufsize | int | Items to hold back (default: 1000). |
| YIELDS | iterable | The shuffled iterator. |
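The "hold items back and yield them later" idea can be sketched with a bounded buffer. This is one possible variant of buffered shuffling, not spaCy's exact algorithm; buffered_shuffle and its rng parameter are hypothetical.

```python
import random


def buffered_shuffle(iterable, bufsize=1000, rng=None):
    """Approximately shuffle a stream while holding back up to bufsize items."""
    rng = rng or random.Random()
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) >= bufsize:
            # Move a randomly chosen buffered item to the end and emit it.
            idx = rng.randrange(len(buf))
            buf[idx], buf[-1] = buf[-1], buf[idx]
            yield buf.pop()
    # Input exhausted: shuffle and flush whatever is still buffered.
    rng.shuffle(buf)
    while buf:
        yield buf.pop()


shuffled = list(buffered_shuffle(range(100), bufsize=10, rng=random.Random(0)))
print(len(shuffled), sorted(shuffled) == list(range(100)))
```

With a small buffer, an item can only move a limited distance from its original position, which is the bias mentioned above. A larger bufsize lets items travel further at the cost of more memory.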
util.filter_spans function (v2.1.4)
Filter a sequence of Span objects and remove duplicates or
overlaps. Useful for creating named entities (where one token can only be part
of one entity) or when merging spans with
Retokenizer.merge. When spans overlap, the
(first) longest span is preferred over shorter spans.
| Name | Type | Description |
|---|---|---|
| spans | iterable | The spans to filter. |
| RETURNS | list | The filtered spans. |
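The "(first) longest span wins" rule can be illustrated on plain (start, end) token offsets, with end exclusive. Using tuples instead of Span objects is a simplification, and filter_offsets is a hypothetical name, not part of spaCy.

```python
def filter_offsets(spans):
    """Keep the longest non-overlapping (start, end) spans; end is exclusive."""
    # Longest spans first; among equally long spans, the earlier one first.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    result = []
    seen_tokens = set()
    for start, end in ordered:
        # A span is kept only if neither of its boundary tokens is taken yet.
        if start not in seen_tokens and end - 1 not in seen_tokens:
            result.append((start, end))
            seen_tokens.update(range(start, end))
    return sorted(result)


spans = [(0, 2), (1, 4), (5, 6), (5, 6)]
print(filter_offsets(spans))  # [(1, 4), (5, 6)]
```

Here (1, 4) beats the overlapping but shorter (0, 2), and the duplicate (5, 6) is dropped.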
Compatibility functions
All Python code is written in an intersection of Python 2 and Python 3. This
is easy in Cython, but somewhat ugly in Python. Logic that deals with Python or
platform compatibility lives only in spacy.compat. To distinguish them from
the builtin functions, replacement functions are suffixed with an underscore,
e.g. unicode_.
| Name | Python 2 | Python 3 |
|---|---|---|
| compat.bytes_ | str | bytes |
| compat.unicode_ | unicode | str |
| compat.basestring_ | basestring | str |
| compat.input_ | raw_input | input |
| compat.path2str | str(path) with .decode('utf8') | str(path) |
compat.is_config function
Check if a specific configuration of Python version and operating system matches the user’s setup. Mostly used to display targeted error messages.
| Name | Type | Description |
|---|---|---|
| python2 | bool | spaCy is executed with Python 2.x. |
| python3 | bool | spaCy is executed with Python 3.x. |
| windows | bool | spaCy is executed on Windows. |
| linux | bool | spaCy is executed on Linux. |
| osx | bool | spaCy is executed on OS X or macOS. |
| RETURNS | bool | Whether the specified configuration matches the user’s platform. |
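A configuration check of this kind can be sketched with the standard library's sys module. The flag names and is_config_sketch are hypothetical, and the sketch assumes that arguments left as None are simply ignored.

```python
import sys

# Facts about the running interpreter and platform.
is_python2 = sys.version_info[0] == 2
is_python3 = sys.version_info[0] == 3
is_windows = sys.platform.startswith("win")
is_linux = sys.platform.startswith("linux")
is_osx = sys.platform == "darwin"


def is_config_sketch(python2=None, python3=None, windows=None, linux=None, osx=None):
    """Arguments left as None are ignored; all others must match the setup."""
    checks = [
        (python2, is_python2),
        (python3, is_python3),
        (windows, is_windows),
        (linux, is_linux),
        (osx, is_osx),
    ]
    return all(wanted is None or wanted == actual for wanted, actual in checks)


print(is_config_sketch(python3=True, windows=True))
```

A call like the last one could guard an error message that only applies to Python 3 on Windows.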

