The Wayback Machine - https://web.archive.org/web/20200107004023/https://gitter.im/explosion/spaCy

John Anderson
@sontek

I have a token that is a left bracket, that was parsed from the sentence: [carlota] Chicas, ponedla aquí.

(Pdb) pp token
[

If I check if it is a punctuation it says yes:

(Pdb) token.is_punct
True

But then I get the part of speech and it says PROPN not PUNCT:

(Pdb) token.pos_
'PROPN'
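
One plausible reading of the mismatch above: `is_punct` is a lexical flag computed from the token text alone, while `pos_` is predicted in context by the statistical tagger, so the two can disagree. The sketch below imitates a text-only check with Unicode categories from the standard library. `looks_like_punct` is a hypothetical helper, and the assumption that spaCy's flag works in a similar way is not confirmed anywhere in this thread.

```python
import unicodedata


def looks_like_punct(text: str) -> bool:
    """Return True if every character is in a Unicode punctuation category (P*)."""
    # Hypothetical stand-in for a lexical punctuation flag: it only inspects
    # the characters of the token and never looks at the sentence context.
    return bool(text) and all(
        unicodedata.category(ch).startswith("P") for ch in text
    )


# Tokens taken from the example sentence "[carlota] Chicas, ponedla aquí."
for tok in ["[", "carlota", "]", ","]:
    print(repr(tok), looks_like_punct(tok))
```

A check like this says nothing about the tagger's output, so a token can pass it and still receive a non-PUNCT tag from the model.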
Sam Petulla
@spetulla_twitter
@alepiscopo Did the model finish? What is your machine setup?
Carsten Schnober
@carschno
is it possible to add vectors to an existing model?
I would like to use FastText vectors in nl_core_news_sm
so I can create a new model with python3 -m spacy init-model nl ..., but then I won't have the other pipeline components like sentencizer, NER etc. in that new model
jai priyadarshi
@jaipriyadarshicode
I re-trained my custom spaCy model. What's the method, or how should I evaluate its accuracy?
Gustavo Gonçalves
@gsgoncalves
@alepiscopo When you finished building the KB, you didn't get a "The nlp object should have a pretrained ner component." error from the linker training script? If not, what were your parameters to build the KB? Thanks!
Sam Horton
@SavePointSam
I'm in a position where my company is maintaining a fork of spaCy. I'm trying to determine how the build artifacts that are posted to PyPI are generated so that we can build them ourselves. The README explains how to do a local custom build. However, I am in need of posting to a private pip registry. The best I can determine is that it has something to do with the fabfile.py file and the builds are generated and posted through the buildkite service. Can someone help me?
Sam Horton
@SavePointSam
Upon closer look, it appears spaCy builds come from this project https://github.com/explosion/wheelwright
Alessandro Piscopo
@alepiscopo
Hi @spetulla_twitter the training never finished and always ends with an error. I'm using a 4 cores VM on GCP, with 256 GB. I get the error while loading the gold_entities.json file.
@gsgoncalves I never got the error you mentioned. I used the default parameters.
asif-khan17
@asif-khan17
Hi, I am new to spaCy. I want to develop a model which gives me the text similarity based on the intent. For example, "I like cats" and "I hate cats" should be very dissimilar, but when I use "similarity" it gives me a very high similarity.
HendricButz
@HendricButz
Hi,
if I train a model with spaCy's cli.train method, a bunch of models are created. Can anyone please tell me what the difference between the best and final model is?
Couldn't find any documentation about it. ty
Matt Maybeno
@mmaybeno
looking to create a PR but it requires cupy, anyone have suggestions on ways to mock it?
ioli
@Bipinoli
How can I split a sentence based on conjunction like 'but' using Spacy?
Jack Park
@KnowledgeGarden
@Bipinoli I did not split on conjunctions inside spacy but did so in an iterator outside after creating a masterTokens list for each sentence. In my case, it was important to locate the predicate (single-predicate sentence) in order to spot triple structures around that predicate.
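
The approach described above (splitting outside spaCy, in an iterator over an already tokenized sentence) could look roughly like the following sketch. The function name `split_on_conjunctions`, the default conjunction set, and the use of plain strings as tokens are assumptions for illustration, not the original code.

```python
from typing import FrozenSet, Iterable, Iterator, List


def split_on_conjunctions(
    tokens: Iterable[str],
    conjunctions: FrozenSet[str] = frozenset({"but"}),
) -> Iterator[List[str]]:
    """Yield runs of tokens separated by any of the given conjunctions."""
    clause: List[str] = []
    for tok in tokens:
        if tok.lower() in conjunctions:
            # The conjunction closes the current clause and is itself dropped.
            if clause:
                yield clause
            clause = []
        else:
            clause.append(tok)
    if clause:
        yield clause


# Stand-in for a per-sentence "masterTokens" list of token texts.
master_tokens = "I like cats but I hate dogs".split()
print(list(split_on_conjunctions(master_tokens)))
```

With spaCy tokens instead of strings, the same loop would presumably compare `token.text` or `token.lower_` against the conjunction set.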
Sam Petulla
@spetulla_twitter
@alepiscopo I wasn't able to train, either. Has anyone been able to train with the linking script? Curious how much RAM is needed.
Jonathan Bastnagel
@inkadnb
Hmm, I can't seem to figure out how to deal with compound words that aren't in the model. For example, bucketlist vs bucket list.
In theory the similarity for these two should be basically identical.
Is this something the tokenizer should be handling?
Jonathan Bastnagel
@inkadnb
@asif-khan17 sentiment is what you're looking for not similarity
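
To see why a similarity score stays high for "I like cats" and "I hate cats", here is a minimal sketch using a bag-of-words cosine. This is a deliberately simple stand-in, not spaCy's vector-based `similarity`; the helper name `bow_cosine` is hypothetical. The two sentences share two of their three words, so any measure driven by shared content tends to rate them as close even though the sentiment is opposite.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between whitespace-tokenized word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Counter returns 0 for missing words, so unshared words add nothing.
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(bow_cosine("I like cats", "I hate cats"))
```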
Haris Jabbar
@MaveriQ
i am trying to download/access the vocabulary used by BERT models in spacy. Just the list of 30k tokens. The 'Vocab.to_disk()' method just gives 1100 tokens. What am I doing wrong?
sim-kon
@sim-kon
Hey spaCy enthusiasts, is there an OR-operator for the matcher (other than the IN-operator)? Or in other words: how can I include a two-word phrase in an IN-operator? Example: I also want to match "two rabbits" in pattern = {'LEMMA': {'IN': ["dog", "cat", "rat"]}} without creating a second pattern. Thanks
agombert
@agombert
Hello everyone, I'm just looking for a way to custom the loss function in the text classification model: I'm doing BERT distillation, and would like to add the regression part in the loss function. Any idea what part I should rewrite or maybe use a custom component instead?
Zain Muhammad
@Zainpann_twitter
Does anyone have a prebuilt model for entity linking? I don't have enough processing resources to train an EL model from the training file + wikiKB. If yes, please share it with me.
Alessandro Piscopo
@alepiscopo
@spetulla_twitter I've tried with 312GB, limiting the training set to 1.5M entities, but after 4 days training and not much progress I stopped that because it was costly
It would be good to have an estimate of the time (like time by number of items in the training set) required to train an entity linker. Has anybody got anything like that?
Shivankar
@Sh20092660_gitlab
Hello, How can we append our custom NER model into the standard NER Spacy Model? When I try to append it it actually gets overwritten.
Sam Petulla
@spetulla_twitter
@alepiscopo The issue with limiting the training set is.. there might be really obvious and important entities left out.
I may try it on a large cluster, soon, will let you know. But at least SOME details would be nice..
Shivankar
@Sh20092660_gitlab
Hello, has anyone tried before the method which I posted in the conversation above?
Zain Muhammad
@Zainpann_twitter
@Sh20092660_gitlab are you using prodigy for annotating, in order to get custom NER?
Attenton
@attenton
Hello, I submitted issue #4802 about customizing the tokenizer with token_match. Can anyone help me with this?
Zain Muhammad
@Zainpann_twitter
How do I get the description from the entity linking model? Is there anything to add in this line?
ents = [(e.text, e.label_, e.kb_id_) for e in doc.ents]
Philip McGrath
@LiberalArtist
Can anyone explain why Spacy tags the first word in this sentence as 'NNP' (proper noun) and lemmatizes it as 'Time'? I expected 'NN' (common noun) and 'time'. Sentence: 'Time is therefore that mediating order, homogeneous both with the sensible whose very style very style of dispersion and distention it is, and with the intelligible for which it is the condition of intuition since it lends itself to that intelligible determination that we call "series."'
Romulo Curty Cerqueira
@curtyc
Hi Folks!
Could somebody tell me the best way to call spaCy from inside Java code?
jyek
@jyek
Hi, has anyone used Spacy to analyze financial documents or news?
Shivankar
@Sh20092660_gitlab
@Zainpann_twitter No, I'm not using prodigy for annotating
Elijah Rippeth
@erip
Hi all. Has anyone here had success adding language support for a morphologically rich language? I have plenty of UD data, but there are so many language-specific POS tags:
$ cat *.conllu | grep -v "^#" | awk '{ print $5 }' | sort | uniq | wc -l
    3383
:point_up: I could throw all of these in TAG_MAP but there's information loss in this.
Elijah Rippeth
@erip
Some of the complication comes from the way tokenization is handled in the language. I'm dealing with Korean which is agglutinative; each new suffix compounds POS info, so the UD is rife with POS tags like "X+Y+Z". I don't know if this is better to capture in tokenization exceptions... but I'm at a loss. :smile:
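
For inspecting a tag inventory like the one counted above, a Python version of the shell pipeline might look like the sketch below. It assumes tab-separated CoNLL-U lines with the language-specific tag in the fifth column (XPOS) and composite tags joined by "+". The function name, the `split_composites` option, and the sample rows are all made up for illustration.

```python
from typing import Iterable, Set


def distinct_xpos(lines: Iterable[str], split_composites: bool = False) -> Set[str]:
    """Collect distinct values of the fifth CoNLL-U column (XPOS)."""
    tags: Set[str] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) < 5:
            continue  # not a token row
        xpos = cols[4]
        if split_composites:
            # Treat "X+Y+Z" as three component tags instead of one label.
            tags.update(xpos.split("+"))
        else:
            tags.add(xpos)
    return tags


# Made-up rows; X, Y and Z are placeholder tags, not real Korean tags.
sample = [
    "# text = made-up example",
    "1\tw1\tl1\tNOUN\tX+Y\t_\t0\troot\t_\t_",
    "2\tw2\tl2\tVERB\tX+Z\t_\t1\tdep\t_\t_",
    "3\tw3\tl3\tVERB\tY+Z\t_\t1\tdep\t_\t_",
    "4\tw4\tl4\tVERB\tX+Y+Z\t_\t1\tdep\t_\t_",
]
print(len(distinct_xpos(sample)), len(distinct_xpos(sample, split_composites=True)))
```

Comparing the two counts gives a rough idea of how much of the inventory comes from combinations rather than from distinct component tags. It does not by itself settle whether the tags belong in TAG_MAP or in tokenizer exceptions.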
Eli Selkin
@eselkin
Every time I build a container with spacy, no matter the version of python 3.7.3-3.8.0 it starts the build process of spaCy from scratch without using the wheels. Are the wheels not in the normal pypi index or something? I'm using the python-alpine docker base images.
Eli Selkin
@eselkin
Nevermind, turns out it's alpine, the wheels don't exist for alpine, but do for slim
Mike Beijen
@mikebeijen
Dear fellow spaCy users, I forked the spaCy repo in an effort to create an additional language. However, when I try to run one of the examples in the examples folder, I keep on getting the following error: ModuleNotFoundError: No module named 'spacy.pipeline.pipes'. Searching the web has not resulted in solving the error. Can anybody help me out? Thanks in advance!
Elijah Rippeth
@erip
@mikebeijen you need to build spacy with python setup.py build_ext --inplace
might be good to python setup.py clean first just in case.
Mike Beijen
@mikebeijen
Awesome, worked. Thx!