The Wayback Machine - https://web.archive.org/web/20201023185055/https://gitter.im/explosion/spaCy

HumanG33k
@HumanG33k_gitlab
Hello, I'm just starting with spaCy. To begin, I want to build a simple classifier that takes a word and tells me which custom class it belongs to (a word can be class 1, class 2, or both, whichever you prefer).
Is there a good tutorial for this kind of thing?
Maybe this will help.
There are more examples here
Rahul Gupta
@rahul1990gupta
IMO, the easiest way would be to train a sklearn classifier (logistic regression/SVM/perceptron) on spaCy word embeddings.
HumanG33k
@HumanG33k_gitlab
Thanks, I will check that.
Ario K
@itsario_twitter
Hello, I'm wondering if there is any pretrained library for spaCy that would link GPE entities and distinguish between countries, cities and states?
7 replies
Felipe Adachi
@felipeadachi_gitlab

Hi!
Has anyone tried using spacy w/ GPU on Google AI Platform Notebook?

The GPU is not recognized for me, even though it is the same code I used on Google Colab without any problems.
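
A quick first check, as a sketch only: `spacy.prefer_gpu()` returns True only when spaCy managed to activate a GPU, which requires a GPU-enabled install (for example with CuPy) in the notebook's environment. It does not diagnose anything specific to AI Platform; it just shows whether spaCy itself can see the device.

```python
import spacy

# Call this before loading any pipeline.
# The return value says whether spaCy could activate a GPU.
gpu_active = spacy.prefer_gpu()
print("GPU active:", gpu_active)
```

If this prints False on a machine that has a GPU, the spaCy install in that environment is the first thing to compare against the Colab setup.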

Natalia Pattarone
@npattarone

Hi! I have a question regarding using my own trained model (by using the sense2vec.train recipe) with prodigy for sense2vec (similarly to what is done here in the DEMO https://explosion.ai/demos/sense2vec). I've seen (and used) the example like the one below:

import spacy
from sense2vec import Sense2VecComponent

nlp = spacy.load('my_own_model')
s2v = Sense2VecComponent('/path/to/reddit_vectors-1.1.0')
nlp.add_pipe(s2v)

doc = nlp("A sentence about natural language processing.")
assert doc[3].text == 'natural language processing'
freq = doc[3]._.s2v_freq
vector = doc[3]._.s2v_vec
most_similar = doc[3]._.s2v_most_similar(3)

But I'm not quite sure how to use it by just inputting a single word (i.e. software) and use the doc = nlp("A sentence about natural language processing.") line (because I have no idea what to put there 😓), just want something similar to the standalone implementation of sense2vec but with my own model.

Any ideas?
Thanks a lot 😊

Jonathan Besomi
@jonathanbesomi_twitter

Dear all,

I'm trying to use the Lemmatizer from spaCy and I believe there is a mistake or something I'm missing.

Given this code:

import spacy
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
doc = nlp("My name is Adrian")
print(" ".join([word.lemma_ for word in doc]))

It returns "-PRON- name be adrian". Is it expected that "My" is replaced with "-PRON-"?

spaCy version is "2.3.0"

Thanks!

1 reply
martijnvanbeers
@martijnvanbeers
jonathanbesomi_twitter: yes, that is expected
Jonathan Besomi
@jonathanbesomi_twitter
That's great; thanks @martijnvanbeers !
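
For context: in spaCy v2 the English lemmatizer assigns "-PRON-" as the lemma of every pronoun. If the surface form is preferred, it can be swapped back in afterwards. A minimal sketch; `readable_lemma` is a hypothetical helper, not part of spaCy, and the pairs below are copied from the output reported above.

```python
def readable_lemma(text, lemma):
    """Return the lowercased token text when the lemma is the v2 pronoun placeholder."""
    return text.lower() if lemma == "-PRON-" else lemma


# (token text, lemma) pairs as reported for "My name is Adrian"
pairs = [("My", "-PRON-"), ("name", "name"), ("is", "be"), ("Adrian", "adrian")]
result = " ".join(readable_lemma(text, lemma) for text, lemma in pairs)
print(result)
```

With a real Doc, the same helper would be called as `readable_lemma(token.text, token.lemma_)` for each token.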
Francesco Bartoli
@francbartoli
👋 just to say hello
SlimakSlimak
@SlimakSlimak

Hi, I'm trying to use the Japanese model from spaCy. This

import spacy
import ja_core_news_sm
nlp = spacy.load("ja_core_news_sm")

is giving me ModuleNotFoundError: No module named 'sudachidict' and OSError: symbolic link privilege not held

I reinstalled spacy and sudachipy==0.4.5 (as suggested in the spaCy docs) from a cmd prompt in administrator mode, but it didn't help. How can I use this Japanese model? Thanks

1 reply
eh = random ()
@vbppl_twitter
Hi everyone,
I am trying to extract, from a list of sentences, the ones that indicate an issue.
Something along the lines of "iam experiencing problems with my phone" or "I shattered my phone screen".
eh = random ()
@vbppl_twitter
I need high precision. Recall doesn't really matter.
Can someone please help?
wobweger
@wobweger

Is there an easy way to tokenize operators (=, <, >, ...) in spaCy?
The input looks like

set IF_FRONT 
    CSF=1
    Val = 1

usually without a space before or after the operator;
\n characters are also part of the input

harnit-bakshi
@harnit-bakshi
Hi, can I check what would be the best way to mock the output of spaCy when doing unit tests? I need some guidance on that, as I could not find many resources online.
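
One option that needs no model download is to replace the `nlp` callable with a `unittest.mock.MagicMock` that returns a fake document. A minimal sketch; `count_people` stands in for whatever function is under test and `make_fake_nlp` is a hypothetical helper, neither is part of spaCy.

```python
from unittest.mock import MagicMock


def count_people(nlp, text):
    """Example function under test: counts PERSON entities in the text."""
    doc = nlp(text)
    return sum(1 for ent in doc.ents if ent.label_ == "PERSON")


def make_fake_nlp(labels):
    """Build a stand-in for nlp whose returned doc has entities with these labels."""
    fake_doc = MagicMock()
    fake_doc.ents = [MagicMock(label_=label) for label in labels]
    return MagicMock(return_value=fake_doc)


fake_nlp = make_fake_nlp(["PERSON", "GPE", "PERSON"])
result = count_people(fake_nlp, "Alice met Bob in Paris")
print(result)
```

An alternative that stays closer to real behaviour is to construct a `Doc` by hand from a blank pipeline's vocab, for example `Doc(nlp.vocab, words=[...])`, and pass that to the code under test.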
Prasad Varade
@prasad-varade
What is an acceptable loss when training a custom NER model?
h4pZ
@h4pZ
Hi, I was wondering if there is a way to get the string corresponding to a hash value if the string itself is not stored in the StringStore?
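
As far as the StringStore itself goes, the hash is one-way: a lookup by hash only works in a store the string has been added to. A small sketch of that behaviour, using two fresh stores:

```python
from spacy.strings import StringStore

store = StringStore()
coffee_hash = store.add("coffee")  # adding the string returns its hash
print(store[coffee_hash])          # this store can map the hash back

other_store = StringStore()        # has never seen "coffee"
try:
    other_store[coffee_hash]
    found = True
except KeyError:
    found = False
print("found in other store:", found)
```

So the string has to be kept somewhere alongside the hash, for example by serializing the vocab or store that produced it.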
铸剑非攻
@Hao-666
Hi, I am trying to use NLP techniques to extract the calculation logic/rules from the texts of specifications/regulations. For example, given descriptions of measurement rules, I want to derive the calculation logic in a form that computers can understand. Are there good ways to achieve this in spaCy?
Ario K
@itsario_twitter
is it possible to have spacy load a blank model in the spacy.load() function?
instead of using spacy.blank?
also is there a default model in spacy that i don't need to download?
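
`spacy.load()` also accepts a path to a directory, so one way to get a blank pipeline through it is to save a blank pipeline to disk once and load that directory. A sketch using a temporary directory; in practice the path would be wherever the pipeline is kept. Trained pipelines are separate packages that have to be installed, while blank ones need no download.

```python
import tempfile

import spacy

with tempfile.TemporaryDirectory() as tmp_dir:
    # Save a blank English pipeline, then load it back by path.
    spacy.blank("en").to_disk(tmp_dir)
    nlp = spacy.load(tmp_dir)

print(nlp.lang, nlp.pipe_names)
print([token.text for token in nlp("Hello world")])
```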
Jakub Richtarik
@richtarik.jakub_gitlab
Hi, is there any way to iterate through noun_chunks in Czech? I know that Czech isn't supported in the current version, so I tried to install spacy_udpipe, which supports UDPipe models (including Czech) with spaCy's functionality, but this iteration isn't working. Is it because there is no lang/cs/syntax_iterators.py? Should I create one? Thanks for any advice.
Aoi
@aanifh
Hi, is it mandatory that we use the UD tagset when POS tagging with Spacy? Or can we use an altered or custom tagset?
martijnvanbeers
@martijnvanbeers
aanifh: you mean for a language model you want to become an official one? you can do whatever you like with models you develop yourself of course
Aoi
@aanifh
@martijnvanbeers yes, in theory. The (unaltered) UD tagset isn't suitable for the language I'm working on; it's too Eurocentric.
martijnvanbeers
@martijnvanbeers
aanifh: I'm not part of the spacy team in any way, but my guess is that for official models you're going to need a really good reason why. it's easier if all the models behave the same as much as possible
I think your best bet is to open an issue asking, and explaining really well why UD isn't sufficient for your language, what things are missing, and how the tagset you propose instead is different from UD, and how much overlap there is
Sam Hoffman
@sam-hoffman
hello! Is there a language-agnostic way to do type checking for spacy models? Ideally, I'd like to be able to do something like call isinstance(nlp, spacy.lang). Unfortunately, when I do that now, I get an error like TypeError: isinstance() arg 2 must be a type or tuple of types
on the other hand, isinstance(mod, spacy.lang.en.English) returns True, but ideally I could do this test without reference to a specific language
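
Language classes such as `English` subclass `spacy.language.Language`, so checking against that base class works without naming a language. `spacy.lang` is a module rather than a class, which is why `isinstance` rejects it. A short sketch with two blank pipelines:

```python
import spacy
from spacy.language import Language

nlp_en = spacy.blank("en")
nlp_de = spacy.blank("de")

# Both language-specific pipelines are instances of the shared base class.
checks = [isinstance(nlp, Language) for nlp in (nlp_en, nlp_de)]
print(checks)
```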
gullibleretorted
@gullibleretorted

Hello everyone,

I have built a spaCy pipeline for binary text classification. The pipeline works fine for models that are available through the spaCy library. In order to compare my existing results to other models (https://github.com/google-research/bert#pre-trained-models) I used the convert_bert_original_tf_checkpoint_to_pytorch.py script (https://github.com/huggingface/transformers/blob/master/src/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py) to convert existing checkpoints to PyTorch models. After that I wanted to "load those PyTorch models from a path" (https://github.com/explosion/spacy-transformers) into my pipeline.

I am able to successfully load those PyTorch models into my pipeline, but when I start the training with the same training data, I get the error message:
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1812 # remove once script supports set_grad_enabled
1813 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1814 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1815
1816

IndexError: index out of range in self

I honestly do not understand this error or how to solve it. After researching the problem, I tried adjusting my config file in different ways, without success. My only lead at the moment is that if I reduce my input size below 200 words, it works fine. I would like to compare those models with the same input data (within BERT's 512-token limit) without truncation.

Does anyone have an idea or a clue how I could fix this problem? Any idea would be highly appreciated!

Thanks a lot in advance!

vikasmech
@vikasmech
Hi all, 1) Does anyone know the number of parameters used in the NER model in version 2.1.8? 2) What is the maximum number of documents beyond which adding more training data won't add value to the (NER) model?
C Swart
@swartchris8
Screenshot 2020-10-15 at 14.09.01.png
Hello, does spaCy NER use word vectors? So is the base spaCy NER input a concatenation of GloVe vectors and other hash embeddings? Img src: https://github.com/explosion/talks/blob/master/2017-11-02_Practical-and-Effective-Neural-NER.pdf
Jack Rory Staunton
@jack-rory-staunton
Anyone using spaCy 3? Is there a way to use spacy-transformers with actual Hugging Face transformer models other than en_core_web_trf? If so, the documentation is not clear on how...
1 reply
AlexSchmidke
@AlexSchmidke
Hi, in a nutshell: how can we create the proper training format for our own corpus, with custom entities added, to train our own custom (German) NER model? We are trying to train a custom NER model on our own data. We found and ran the example Link. The training data seems to be in the following JSON format: Link. We have our own corpus with custom entities and want to bring it into the necessary format. The GoldParse class is gone, if we understood correctly. In spacy.training we found docs_to_json. It seems promising, but we can't manage to add our custom entities to it. Can anyone help us out here? We are quite desperate :D
2 replies
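
One route that avoids the JSON format, sketched under the assumption that spaCy v3 is in use: set the custom entities on `Doc` objects via character offsets and collect them in a `DocBin`, which is the binary format `spacy train` reads in v3. The sentence and the labels PERSON and ORT below are made up for illustration.

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("de")
text = "Angela Merkel besuchte Berlin."
# Hypothetical annotations: (entity text, custom label)
annotations = [("Angela Merkel", "PERSON"), ("Berlin", "ORT")]

doc = nlp.make_doc(text)
spans = []
for phrase, label in annotations:
    start = text.index(phrase)
    span = doc.char_span(start, start + len(phrase), label=label)
    if span is not None:  # None means the offsets do not align with token boundaries
        spans.append(span)
doc.ents = spans

doc_bin = DocBin()
doc_bin.add(doc)
data = doc_bin.to_bytes()  # doc_bin.to_disk(path) writes a .spacy file instead

# Read the docs back to confirm the entities survived serialization.
restored = list(DocBin().from_bytes(data).get_docs(nlp.vocab))
entities = [(ent.text, ent.label_) for ent in restored[0].ents]
print(entities)
```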
Nicolas
@ngarneau
Hi there! I am currently using the spacy train command to train a custom NER model. For my use case, entity-based evaluation is not relevant; I'd prefer to do token-based evaluation. Is there an easy way to use a custom Scorer with the command line, or do I need to write a script for it? Many thanks!
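
Whether the command line can take a custom scorer depends on the spaCy version. Independent of that, token-level scores are easy to compute in a short script. A minimal sketch, assuming gold and predicted labels are already aligned one per token and that "O" marks non-entity tokens; `token_prf` is a hypothetical helper, not part of spaCy.

```python
def token_prf(gold, pred, outside="O"):
    """Token-level precision, recall and F1 over all non-outside labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    true_pos = sum(1 for g, p in zip(gold, pred) if p != outside and g == p)
    pred_pos = sum(1 for p in pred if p != outside)
    gold_pos = sum(1 for g in gold if g != outside)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold_labels = ["PER", "PER", "O", "LOC"]
pred_labels = ["PER", "O", "O", "ORG"]
precision, recall, f1 = token_prf(gold_labels, pred_labels)
print(precision, recall, f1)
```

With spaCy docs, the per-token labels could be read from `token.ent_type_`, with the empty string playing the role of "O".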
erduode
@erduode
Who has data sets for FA Cup Final 2012, Super Tuesday 2012, and the US Elections 2012?
gayetr
@gayetr
I am new to NLP and am using spaCy for the first time for languages other than English. Can someone please help me with some examples of how to go about building applications for supported languages that have no available models? I am trying to use the basic code available for Hindi for tokenization and lemmatization. Any answer will be appreciated!
铸剑非攻
@Hao-666
Hi! May I know how to define the matcher pattern of "NNP/NN NNP/NN", which means it can be "NNP NNP", "NNP NN", "NN NNP", or "NN NN". Thank you!
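
One way to express "NNP or NN" for each position is the `IN` operator on the `TAG` attribute. A sketch, assuming spaCy v3 (the `tags=` argument of `Doc` and passing a list of patterns to `Matcher.add`); the sentence and tags are set by hand so that no trained model is needed.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")
words = ["New", "York", "pizza", "tastes", "great"]
tags = ["NNP", "NNP", "NN", "VBZ", "JJ"]  # hand-assigned for the example
doc = Doc(nlp.vocab, words=words, tags=tags)

# Each token may be NNP or NN, which covers all four combinations.
noun_tag = {"TAG": {"IN": ["NNP", "NN"]}}
matcher = Matcher(nlp.vocab)
matcher.add("NOUN_PAIR", [[noun_tag, noun_tag]])

pairs = [doc[start:end].text for _, start, end in matcher(doc)]
print(pairs)
```

With a trained pipeline the tags come from the tagger, so `doc = nlp(text)` would replace the hand-built `Doc`.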
vkorotchenko-homex
@vkorotchenko-homex
Hi there! Does anyone know if it is possible to generate all possible parse trees for a given sentence in spaCy?