explosion/spaCy

User group for the spaCy Natural Language Processing tools

John Anderson

@sontek

I have a token that is a left bracket, that was parsed from the sentence: [carlota] Chicas, ponedla aquí.

(Pdb) pp token
[

If I check whether it is punctuation, it says yes:

(Pdb) token.is_punct
True

But then I get the part of speech and it says PROPN not PUNCT:

(Pdb) token.pos_
'PROPN'
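
The two attributes come from different places, which may explain the mismatch: `is_punct` is a lexical flag derived from the characters of the token, while `pos_` is predicted from context by the statistical tagger. Below is a minimal sketch of a character-based check, assuming it amounts to testing Unicode categories. The helper name `looks_like_punct` is made up for illustration and is not part of spaCy.

```python
import unicodedata


def looks_like_punct(text):
    """Return True if every character is in a Unicode punctuation category (P*)."""
    return bool(text) and all(
        unicodedata.category(ch).startswith("P") for ch in text
    )


# "[" is in category "Ps" (open punctuation), so a purely lexical check
# flags it as punctuation no matter what a tagger predicts in context.
bracket_is_punct = looks_like_punct("[")
word_is_punct = looks_like_punct("carlota")
print(bracket_is_punct, word_is_punct)
```

A tagger that has rarely seen a sentence-initial bracket in its training data can still assign it an unexpected tag such as PROPN, because it never consults this flag directly.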

Sam Petulla

@spetulla_twitter

@alepiscopo Did the model finish? What is your machine setup?

Carsten Schnober

@carschno

is it possible to add vectors to an existing model?

I would like to use FastText vectors in nl_core_news_sm

so I can create a new model with python3 -m spacy init-model nl ..., but then I won't have the other pipeline components like sentencizer, NER etc. in that new model

jai priyadarshi

@jaipriyadarshicode

I re-trained my custom spaCy model. What's the method for evaluating its accuracy?
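
One common approach is to score the model's predictions against a held-out annotated set using precision, recall and F1. Below is a minimal sketch for entity spans, assuming gold and predicted entities are available as `(start, end, label)` tuples. The function name `prf` and the sample spans are hypothetical.

```python
def prf(gold, predicted):
    """Exact-match precision, recall and F1 over two collections of spans."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up annotations for a single held-out text.
gold_spans = [(0, 7, "PERSON"), (20, 26, "ORG")]
predicted_spans = [(0, 7, "PERSON"), (30, 35, "GPE")]

precision, recall, f1 = prf(gold_spans, predicted_spans)
print(precision, recall, f1)
```

The held-out texts should not overlap with the training data, otherwise the scores will look better than they are.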

Gustavo Gonçalves

@gsgoncalves

@alepiscopo When you finished building the KB, didn't you get a "The nlp object should have a pretrained ner component." error from the linker training script? If not, what were your parameters to build the KB? Thanks!

Sam Horton

@SavePointSam

I'm in a position where my company is maintaining a fork of spaCy. I'm trying to determine how the build artifacts that are posted to PyPI are generated so that we can build them ourselves. The README explains how to do a local custom build. However, I am in need of posting to a private pip registry. The best I can determine is that it has something to do with the fabfile.py file and the builds are generated and posted through the buildkite service. Can someone help me?

Sam Horton

@SavePointSam

Upon closer look, it appears spaCy builds come from this project https://github.com/explosion/wheelwright

Alessandro Piscopo

@alepiscopo

Hi @spetulla_twitter, the training never finished and always ends with an error. I'm using a 4-core VM on GCP, with 256 GB. I get the error while loading the gold_entities.json file.

@gsgoncalves I never got the error you mentioned. I used the default parameters.

asif-khan17

@asif-khan17

Hi, I am new to spaCy. I want to develop a model that gives me text similarity based on intent. For example, "I like cats" and "I hate cats" should be very dissimilar, but when I use "similarity" it gives me a very high score.

HendricButz

@HendricButz

Hi,
if I train a model with spaCy's cli.train method, a bunch of models are created. Can anyone please tell me what the difference between the best and the final model is?
Couldn't find any documentation about it. ty
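
As far as I can tell (this is an assumption, not something confirmed in this thread), the final model is the one saved after the last training iteration, while the best model is the iteration that scored highest on the development data. Below is a sketch of that selection logic with made-up scores.

```python
# Made-up development-set scores, one per training iteration.
dev_scores = [0.71, 0.78, 0.83, 0.81, 0.80]

# "final": whatever the last iteration produced.
final_iteration = len(dev_scores) - 1

# "best": the iteration with the highest development score.
best_iteration = max(range(len(dev_scores)), key=dev_scores.__getitem__)

print(best_iteration, final_iteration)
```

When the scores peak early and then drop, as in this example, the two differ and the best one is usually the safer choice.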

Matt Maybeno

@mmaybeno

Looking to create a PR, but it requires cupy. Does anyone have suggestions on ways to mock it?
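
One option is to place a stand-in module in `sys.modules` for the duration of a test, so that `import cupy` succeeds without the real library. Below is a minimal sketch using only the standard library. The call to `asarray` only exercises the fake and is not a claim about what the PR needs.

```python
import sys
from unittest import mock

# A stand-in object that accepts any attribute access or call.
fake_cupy = mock.MagicMock(name="cupy")

# patch.dict restores sys.modules when the block exits.
with mock.patch.dict(sys.modules, {"cupy": fake_cupy}):
    import cupy  # resolves to the fake, no GPU library required

    cupy.asarray([1, 2, 3])
    fake_cupy.asarray.assert_called_once_with([1, 2, 3])
    imported_is_fake = cupy is fake_cupy

print(imported_is_fake)
```

Code under test that imports cupy at module level has to be imported inside the patched block, otherwise the real import is attempted first.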

ioli

@Bipinoli

How can I split a sentence on a conjunction like 'but' using spaCy?

Jack Park

@KnowledgeGarden

@Bipinoli I did not split on conjunctions inside spacy but did so in an iterator outside after creating a masterTokens list for each sentence. In my case, it was important to locate the predicate (single-predicate sentence) in order to spot triple structures around that predicate.
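
The approach described above (splitting outside spaCy, over a per-sentence token list) could look roughly like this. It is a sketch under assumptions: tokens are plain strings, for example collected from each token's `text`, and the function name and conjunction list are made up.

```python
def split_on_conjunctions(tokens, conjunctions=("but",)):
    """Split a list of token strings into clauses at the given conjunctions."""
    clauses, current = [], []
    for token in tokens:
        if token.lower() in conjunctions:
            # Close the current clause; the conjunction itself is dropped.
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(token)
    if current:
        clauses.append(current)
    return clauses


master_tokens = ["I", "like", "tea", "but", "she", "prefers", "coffee"]
clauses = split_on_conjunctions(master_tokens)
print(clauses)
```

Matching on the surface string is crude: "but" is not always a coordinating conjunction, so filtering on a part-of-speech or dependency label before splitting would be more robust.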

Sam Petulla

@spetulla_twitter

@alepiscopo I wasn't able to train, either. Has anyone been able to train with the linking script? Curious how much RAM is needed.

Jonathan Bastnagel

@inkadnb

Hmm, I can't seem to figure out how to deal with compound words that aren't in the model. For example bucketlist vs bucket list.

In theory the similarity for these two should be basically identical.

Is this something the tokenizer should be handling?
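
One workaround is to normalize such compounds before the text reaches the pipeline. Below is a naive sketch, assuming a set of known words is available. The `vocab` set and the function name are hypothetical, and this is not how the tokenizer itself behaves.

```python
def split_compound(word, vocab):
    """Split an unknown word into two known words, if such a split exists."""
    if word in vocab:
        return [word]
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in vocab and right in vocab:
            return [left, right]
    return [word]  # no split found, leave the word unchanged


vocab = {"bucket", "list", "time"}
parts = split_compound("bucketlist", vocab)
print(parts)
```

The first valid split wins, so words with several plausible splits would need a ranking step, for example by word frequency.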

Jonathan Bastnagel

@inkadnb

@asif-khan17 sentiment is what you're looking for not similarity
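
The high score is expected when a sentence vector is the average of its word vectors and similarity is the cosine between those averages: two of the three words are identical, and antonyms tend to have similar vectors because they occur in similar contexts. Below is a toy sketch with made-up 3-dimensional vectors (not taken from any real model).

```python
import math

# Hypothetical word vectors. "like" and "hate" are deliberately close.
VECTORS = {
    "i": [1.0, 0.0, 0.0],
    "cats": [0.0, 1.0, 0.0],
    "like": [0.2, 0.1, 0.9],
    "hate": [0.1, 0.2, 0.8],
}


def sentence_vector(words):
    """Average the word vectors dimension by dimension."""
    vecs = [VECTORS[w] for w in words]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


score = cosine(
    sentence_vector(["i", "like", "cats"]),
    sentence_vector(["i", "hate", "cats"]),
)
print(round(score, 3))
```

The averaged vectors say the sentences are about the same topic, not that they express the same attitude, which is why a sentiment or intent classifier is the better fit here.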

Haris Jabbar

@MaveriQ

I am trying to download/access the vocabulary used by BERT models in spaCy. Just the list of 30k tokens. The 'Vocab.to_disk()' method just gives 1100 tokens. What am I doing wrong?

sim-kon

@sim-kon

Hey spaCy enthusiasts, is there an OR operator for the matcher (other than the IN operator)? In other words: how can I include a two-word phrase in an IN operator? Example: I also want to match "two rabbits" in pattern = [{'LEMMA': {'IN': ["dog", "cat", "rat"]}}] without creating a second pattern. Thanks

agombert

@agombert

Hello everyone, I'm looking for a way to customize the loss function in the text classification model: I'm doing BERT distillation and would like to add the regression part to the loss function. Any idea which part I should rewrite, or should I use a custom component instead?

Zain Muhammad

@Zainpann_twitter

Does anyone have a prebuilt model for entity linking? I don't have enough processing resources to train an EL model from the training file + wiki KB. If yes, please share it with me.

Alessandro Piscopo

@alepiscopo

@spetulla_twitter I've tried with 312GB, limiting the training set to 1.5M entities, but after 4 days of training and not much progress I stopped because it was costly

It would be good to have an estimate of the time (for example, time per number of items in the training set) required to train an entity linker. Does anybody have anything like that?

Shivankar

@Sh20092660_gitlab

Hello, How can we append our custom NER model into the standard NER Spacy Model? When I try to append it it actually gets overwritten.

Sam Petulla

@spetulla_twitter

@alepiscopo The issue with limiting the training set is.. there might be really obvious and important entities left out.

I may try it on a large cluster, soon, will let you know. But at least SOME details would be nice..

Shivankar

@Sh20092660_gitlab

Hello, has anyone tried the method I posted in the conversation above?

Zain Muhammad

@Zainpann_twitter

@Sh20092660_gitlab are you using prodigy for annotating, in order to get custom NER?

Attenton

@attenton

Hello, I submitted issue #4802 about customizing the tokenizer with token_match. Can anyone help me with this?

Zain Muhammad

@Zainpann_twitter

https://stackoverflow.com/questions/59316859/displaying-the-description-of-entity-from-kb-id-in-spacy-entity-linking

Zain Muhammad

@Zainpann_twitter

How do I get the description from the entity linking model? Is there anything to add to this line?

ents = [(e.text, e.label_, e.kb_id_) for e in doc.ents]
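
As far as I know, the knowledge base does not return the description text itself, so one workaround is to keep a separate mapping from KB ID to description and join it onto the entity tuples. The sketch below uses plain Python data. The tuples stand in for the output of a list comprehension over `doc.ents` using the span's `kb_id_`, and the `descriptions` dict is a hypothetical lookup saved while the KB was being built.

```python
# Stand-in for the (text, label, KB ID) tuples collected from doc.ents.
ents = [
    ("Douglas Adams", "PERSON", "Q42"),
    ("London", "GPE", "Q84"),
]

# Hypothetical ID -> description lookup; Q84 is deliberately missing.
descriptions = {"Q42": "English writer and humorist"}

# Attach the description, or None when the ID is not in the lookup.
ents_with_desc = [
    (text, label, kb_id, descriptions.get(kb_id))
    for text, label, kb_id in ents
]

for row in ents_with_desc:
    print(row)
```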

Philip McGrath

@LiberalArtist

Can anyone explain why spaCy tags the first word in this sentence as 'NNP' (proper noun) and lemmatizes it as 'Time'? I expected 'NN' (common noun) and 'time'. Sentence:

'Time is therefore that mediating order, homogeneous both with the sensible whose very style of dispersion and distention it is, and with the intelligible for which it is the condition of intuition since it lends itself to that intelligible determination that we call "series."'

Romulo Curty Cerqueira

@curtyc

Hi Folks!
Could somebody tell me the best way to call spaCy from inside Java code?

jyek

@jyek

Hi, has anyone used Spacy to analyze financial documents or news?

Shivankar

@Sh20092660_gitlab

@Zainpann_twitter No, I'm not using prodigy for annotating

Elijah Rippeth

@erip

Hi all. Has anyone here had success adding language support for a morphologically rich language? I have plenty of UD data, but there are so many language-specific POS tags:

$ cat *.conllu | grep -v "^#" | awk '{ print $5 }' | sort | uniq | wc -l
    3383

:point_up: I could throw all of these in TAG_MAP but there's information loss in this.

Elijah Rippeth

@erip

Some of the complication comes from the way tokenization is handled in the language. I'm dealing with Korean which is agglutinative; each new suffix compounds POS info, so the UD is rife with POS tags like "X+Y+Z". I don't know if this is better to capture in tokenization exceptions... but I'm at a loss. :smile:
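
To see how much of that tag count is combinatorial, one could count the distinct compound tags against the distinct components after splitting on "+". The sketch below reads the fifth CoNLL-U column (XPOS), the same column the shell pipeline above prints. The sample rows and their tags are made up for illustration and are not taken from any treebank.

```python
def xpos_tags(lines):
    """Yield the XPOS column (5th field) of each CoNLL-U token line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) == 10:
            yield cols[4]


def row(idx, xpos):
    """Build a placeholder 10-column CoNLL-U line with the given XPOS tag."""
    return "\t".join([str(idx), f"w{idx}", "_", "_", xpos, "_", "_", "_", "_", "_"])


sample = ["# sent_id = 1"] + [
    row(i, tag)
    for i, tag in enumerate(
        ["NNG+JKS", "NNG+JKO", "NNG+JX", "NNP+JKS", "NNP+JKO", "NNP+JX"], start=1
    )
]

compound_tags = set(xpos_tags(sample))
component_tags = {part for tag in compound_tags for part in tag.split("+")}
print(len(compound_tags), len(component_tags))
```

If the component inventory turns out to be small, it may be less lossy to model the components (for example as morphological features) than to map thousands of compound tags one by one.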

Eli Selkin

@eselkin

Every time I build a container with spacy, no matter the version of python 3.7.3-3.8.0 it starts the build process of spaCy from scratch without using the wheels. Are the wheels not in the normal pypi index or something? I'm using the python-alpine docker base images.

Eli Selkin

@eselkin

Never mind, turns out it's Alpine: the wheels don't exist for alpine, but do for slim

Mike Beijen

@mikebeijen

Dear fellow spaCy users, I forked the spaCy repo in an effort to create an additional language. However, when I try to run one of the examples in the examples folder, I keep on getting the following error: ModuleNotFoundError: No module named 'spacy.pipeline.pipes'. Searching the web has not resulted in solving the error. Can anybody help me out? Thanks in advance!

Elijah Rippeth

@erip

@mikebeijen you need to build spacy with python setup.py build_ext --inplace

might be good to python setup.py clean first just in case.

Mike Beijen

@mikebeijen

Awesome, worked. Thx!
