explosion/spaCy

User group for the spaCy Natural Language Processing tools

HumanG33k

@HumanG33k_gitlab

Hello, I've just started with spaCy. To begin with, I just want to build a simple classifier that takes a word and tells me which custom class it belongs to. (A word can be class 1, class 2, or both.)

Is there a good tutorial for this kind of thing?

Rahul Gupta

@rahul1990gupta

https://github.com/explosion/projects/tree/master/textcat-docs-issues

Maybe this will help.

There are more examples here

https://github.com/explosion/spaCy/tree/master/examples

Rahul Gupta

@rahul1990gupta

IMO, the easiest way would be to train a sklearn classifier (logistic regression/SVM/perceptron) on spaCy word embeddings.
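
A minimal sketch of that suggestion, assuming scikit-learn and NumPy are installed and that the pipeline passed in has word vectors. The helper names `word_features` and `train_word_classifier` are made up for illustration, and logistic regression is just one of the classifiers mentioned above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def word_features(nlp, words):
    """Stack the spaCy vector of each word into a 2D feature matrix."""
    return np.vstack([nlp.vocab[word].vector for word in words])


def train_word_classifier(nlp, words, labels):
    """Fit a logistic regression on word vectors (hypothetical helper)."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(word_features(nlp, words), labels)
    return clf
```

With a pipeline that ships vectors (for example `en_core_web_md`) loaded as `nlp`, this would be called as `clf = train_word_classifier(nlp, words, labels)` and then `clf.predict(word_features(nlp, ["banana"]))`. For the "both" case, one simple option is to treat it as a third label.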

HumanG33k

@HumanG33k_gitlab

Thanks, I will check that.

Ario K

@itsario_twitter

Hello, I'm wondering if there is any pretrained library for spaCy that would link GPE entities and distinguish between countries, cities, and states?

7 replies

Felipe Adachi

@felipeadachi_gitlab

Hi!
Has anyone tried using spaCy with a GPU on a Google AI Platform Notebook?

The GPU is not recognized for me, even though it's the same code I used on Google Colab without any problems.
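
A quick way to check whether spaCy itself can see the GPU, as a sketch rather than a diagnosis of this particular notebook. `spacy.prefer_gpu()` returns a boolean and falls back to CPU when no usable GPU setup is found.

```python
import spacy

# prefer_gpu() returns True if spaCy could activate a GPU, otherwise False.
# It should be called before loading any pipeline.
gpu_active = spacy.prefer_gpu()
print("GPU active:", gpu_active)
```

If this prints `False` on a machine that has a GPU, one thing worth checking is whether the CuPy build installed in the notebook matches its CUDA version, though that is only a guess about this environment.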

Natalia Pattarone

@npattarone

Hi! I have a question regarding using my own trained model (by using the sense2vec.train recipe) with prodigy for sense2vec (similarly to what is done here in the DEMO https://explosion.ai/demos/sense2vec). I've seen (and used) the example like the one below:

import spacy
from sense2vec import Sense2VecComponent

nlp = spacy.load('my_own_model')
s2v = Sense2VecComponent('/path/to/reddit_vectors-1.1.0')
nlp.add_pipe(s2v)

doc = nlp("A sentence about natural language processing.")
assert doc[3].text == 'natural language processing'
freq = doc[3]._.s2v_freq
vector = doc[3]._.s2v_vec
most_similar = doc[3]._.s2v_most_similar(3)

But I'm not quite sure how to use it by just inputting a single word (e.g. software), or what to do with the doc = nlp("A sentence about natural language processing.") line (I have no idea what to put there 😓). I just want something similar to the standalone implementation of sense2vec, but with my own model.

Any ideas?
Thanks a lot 😊

Jonathan Besomi

@jonathanbesomi_twitter

Dear all,

I'm trying to use the Lemmatizer from spaCy and I believe there is a mistake or something I'm missing.

Given this code:

import spacy
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
doc = nlp("My name is Adrian")
print(" ".join([word.lemma_ for word in doc]))

It returns "-PRON- name be adrian". Is it expected that "My" is replaced with "-PRON-"?

spaCy version is "2.3.0"

Thanks!

1 reply

martijnvanbeers

@martijnvanbeers

jonathanbesomi_twitter: yes, that is expected

see https://spacy.io/api/annotation/
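
If the placeholder gets in the way, one possible workaround is to fall back to the lowercased token text whenever the lemma is "-PRON-". This is a sketch for spaCy v2 behaviour; `readable_lemmas` is a hypothetical helper name, not part of spaCy.

```python
PRON_LEMMA = "-PRON-"  # placeholder lemma spaCy v2 assigns to pronouns


def readable_lemmas(doc):
    """Return one lemma per token, using the lowercased text for pronouns."""
    return [tok.lower_ if tok.lemma_ == PRON_LEMMA else tok.lemma_ for tok in doc]
```

It would be called as `readable_lemmas(nlp("My name is Adrian"))` in place of the list comprehension in the snippet above.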

Jonathan Besomi

@jonathanbesomi_twitter

That's great; thanks @martijnvanbeers !

Francesco Bartoli

@francbartoli

👋 just to say hello

SlimakSlimak

@SlimakSlimak

Hi, I'm trying to use the Japanese model from spaCy. This

import spacy
import ja_core_news_sm
nlp = spacy.load("ja_core_news_sm")

is giving me ModuleNotFoundError: No module named 'sudachidict' and OSError: symbolic link privilege not held

I reinstalled spaCy and sudachipy==0.4.5 from an administrator cmd prompt (as suggested in the spaCy docs), but it didn't help. How can I use this Japanese model? Thanks

1 reply

eh = random ()

@vbppl_twitter

Hi everyone,

I'm trying to extract, from a list of sentences, the ones that indicate an issue.

Something like: "I am experiencing problems with my phone", "I shattered my phone screen"

eh = random ()

@vbppl_twitter

I need high precision. Recall doesn't really matter.

Can someone please help?

wobweger

@wobweger

Is there an easy way to tokenize operators (=, <, >, ...) in spaCy?
The input looks like

set IF_FRONT 
    CSF=1
    Val = 1

usually without a space before or after the operator;
\n characters are also part of the input
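
One way to approach this is to extend the tokenizer's infix patterns, sketched below on a blank English pipeline. The operator list is an assumption about what this input needs, and whitespace tokens (including the newlines) are simply filtered out afterwards.

```python
import spacy
from spacy.util import compile_infix_regex

nlp = spacy.blank("en")

# Extend the default infix patterns so operators split even without spaces.
# Multi-character operators are listed first so they stay in one piece.
operator_infixes = [r"<=|>=|==|!=", r"[=<>]"]
infix_re = compile_infix_regex(operator_infixes + list(nlp.Defaults.infixes))
nlp.tokenizer.infix_finditer = infix_re.finditer

text = "set IF_FRONT \n    CSF=1\n    Val = 1"
doc = nlp(text)
# Newlines and indentation come out as whitespace tokens; skip them here.
tokens = [tok.text for tok in doc if not tok.is_space]
print(tokens)
```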

harnit-bakshi

@harnit-bakshi

Hi, can I check what would be the best way to mock the output of spaCy when doing unit tests? I need some guidance on that, as I could not find many resources online.
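
One common approach is to skip mocking and build a `Doc` by hand, so the test gets a real spaCy object without loading a trained pipeline. A sketch, with made-up words and entities and a hypothetical helper name:

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab


def make_doc_with_entity():
    """Build a Doc by hand so a test doesn't need a trained pipeline."""
    doc = Doc(Vocab(), words=["Apple", "opened", "a", "store", "in", "Berlin"])
    # Attach the entities the code under test expects to see.
    doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 5, 6, label="GPE")]
    return doc
```

The resulting `Doc` can then be passed to the function under test in place of `nlp(text)`.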

Prasad Varade

@prasad-varade

What would be an acceptable loss when training a custom NER model?

h4pZ

@h4pZ

Hi, I was wondering if there is a way to get the string corresponding to a hash value if the string itself is not stored in the StringStore?
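
As far as I know the hash is one-way, so the lookup only works when some `StringStore` already holds the string. A small sketch of that behaviour, using an arbitrary example string:

```python
from spacy.strings import StringStore

store = StringStore(["coffee"])
coffee_hash = store["coffee"]          # string -> hash
assert store[coffee_hash] == "coffee"  # hash -> string works: the store holds it

empty_store = StringStore()
try:
    empty_store[coffee_hash]
    found = True
except KeyError:
    found = False  # the hash alone can't be turned back into text
print(found)
```

In practice this means keeping the original `Vocab` (or its `StringStore`) around, or re-adding the candidate strings to a store before looking the hash up.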

铸剑非攻

@Hao-666

Hi, I am trying to use NLP techniques to extract the calculation logic/rules from the text of specifications and regulations. For example, given descriptions of measurement rules, I would like to extract the calculation logic in a form that computers can understand. Are there good ways to achieve this in spaCy?

Ario K

@itsario_twitter

is it possible to have spacy load a blank model in the spacy.load() function?

instead of using spacy.blank?

also is there a default model in spacy that i don't need to download?
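
A sketch of both options, assuming spaCy v3 for the `"blank:<lang>"` shortcut in `spacy.load`. No pipeline ships with spaCy by default, but a blank one needs no download.

```python
import spacy

# Equivalent ways to get an empty English pipeline (tokenizer only).
nlp_a = spacy.blank("en")
nlp_b = spacy.load("blank:en")  # "blank:<lang>" shortcut, assumed spaCy v3

print(nlp_a.pipe_names, nlp_b.pipe_names)
```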

Jakub Richtarik

@richtarik.jakub_gitlab

Hi, is there any way to iterate through noun_chunks in Czech? I know that Czech isn't supported in the current version, so I tried to install spacy_udpipe, which makes UDPipe models (including Czech) usable through spaCy's API, but the iteration isn't working. Is that because there is no lang/cs/syntax_iterators.py? Should I create one? Thanks for any advice.

Aoi

@aanifh

Hi, is it mandatory that we use the UD tagset when POS tagging with Spacy? Or can we use an altered or custom tagset?

martijnvanbeers

@martijnvanbeers

aanifh: you mean for a language model you want to become an official one? you can do whatever you like with models you develop yourself of course

Aoi

@aanifh

@martijnvanbeers yes, in theory. The (unaltered) UD tagset isn't suitable for the language I'm working on; it's too Eurocentric.

martijnvanbeers

@martijnvanbeers

aanifh: I'm not part of the spacy team in any way, but my guess is that for official models you're going to need a really good reason why. it's easier if all the models behave the same as much as possible

I think your best bet is to open an issue asking, and explaining really well why UD isn't sufficient for your language, what things are missing, and how the tagset you propose instead is different from UD, and how much overlap there is

Sam Hoffman

@sam-hoffman

hello! Is there a language-agnostic way to do type checking for spacy models? Ideally, I'd like to be able to do something like call isinstance(nlp, spacy.lang). Unfortunately, when I do that now, I get an error like TypeError: isinstance() arg 2 must be a type or tuple of types

on the other hand, isinstance(mod, spacy.lang.en.English) returns True, but ideally I could do this test without reference to a specific language
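
`spacy.lang` is a package rather than a class, which is why `isinstance` rejects it. The shared base class is `spacy.language.Language`, so a language-agnostic check could look like this:

```python
import spacy
from spacy.language import Language

nlp_en = spacy.blank("en")
nlp_de = spacy.blank("de")

# Every language-specific class (English, German, ...) subclasses Language,
# so this check works without naming a language.
print(isinstance(nlp_en, Language), isinstance(nlp_de, Language))
```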

gullibleretorted

@gullibleretorted

Hello everyone,

I have built a spaCy pipeline for binary text classification. The pipeline works fine for models that are available through the spaCy library. In order to compare my existing results to other models (https://github.com/google-research/bert#pre-trained-models) I used the convert_bert_original_tf_checkpoint_to_pytorch.py script (https://github.com/huggingface/transformers/blob/master/src/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py) to convert existing checkpoints to PyTorch models. After that I wanted to "load those PyTorch models from a path" (https://github.com/explosion/spacy-transformers) into my pipeline.

I am able to load those PyTorch models into my pipeline successfully, but when I start training with the same training data, I get this error message:
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1812 # remove once script supports set_grad_enabled
1813 _no_grad_embeddingrenorm(weight, input, max_norm, norm_type)
-> 1814 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1815
1816

IndexError: index out of range in self

I honestly do not understand this error or how to solve it. After researching the problem, I tried adjusting my config file in different ways, without success. My only lead at the moment is that if I reduce my input size below 200 words, it works fine. I would like to compare those models on the same input data (within BERT's 512-token limit) without truncation.

Does anyone have an idea or a clue how I could fix this problem? Any ideas would be highly appreciated!

Thanks a lot in advance!

vikasmech

@vikasmech

Hi all, 1) Do we know the number of parameters used by the NER model in version 2.1.8? 2) What is the maximum number of documents after which adding more training data won't add value to the (NER) model?

C Swart

@swartchris8

Hello, does spaCy's NER use word vectors? So is the base spaCy NER input a concatenation of GloVe vectors and other hash embeddings? Img src: https://github.com/explosion/talks/blob/master/2017-11-02_Practical-and-Effective-Neural-NER.pdf

Jack Rory Staunton

@jack-rory-staunton

Anyone using spaCy 3? Is there a way to use spacy-transformers with actual Hugging Face transformer models other than en_core_web_trf? If so, the documentation is not clear on how...

1 reply

AlexSchmidke

@AlexSchmidke

Hi, in a nutshell: how can we create the proper training format for our own corpus, and how can we add custom entities to train our own custom (German) NER model? We are trying to train a custom NER model on our own data. We found and ran the example Link. The training data seems to be in the following JSON format: Link. We have our own corpus with custom entities and want to bring it into the necessary format. The GoldParse class is gone, if we understood correctly. In spacy.training we found docs_to_json. It seems promising, but we can't manage to add our custom entities to it. Can anyone help us out here? We are quite desperate :D
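
One route that avoids the JSON format, assuming spaCy v3, is to build `Doc` objects with the custom entity spans and collect them in a `DocBin`. A sketch: the helper name `build_docbin`, the example sentence, and the labels FIRMA and ORT are all made up for illustration.

```python
import spacy
from spacy.tokens import DocBin


def build_docbin(nlp, examples):
    """Convert (text, [(start, end, label), ...]) pairs into a DocBin."""
    doc_bin = DocBin()
    for text, annotations in examples:
        doc = nlp.make_doc(text)
        spans = []
        for start, end, label in annotations:
            span = doc.char_span(start, end, label=label)
            if span is None:
                # Offsets don't line up with token boundaries; skip this one.
                continue
            spans.append(span)
        doc.ents = spans
        doc_bin.add(doc)
    return doc_bin


nlp = spacy.blank("de")
# Hypothetical example with made-up custom labels FIRMA and ORT.
examples = [("Siemens sitzt in München.", [(0, 7, "FIRMA"), (17, 24, "ORT")])]
doc_bin = build_docbin(nlp, examples)
```

Calling `doc_bin.to_disk("./train.spacy")` afterwards writes a file in the binary format that `spacy train` reads in v3.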

2 replies

Nicolas

@ngarneau

Hi there! I am currently using the spacy train command to train a custom NER model. For my use case, entity-based evaluation is not relevant; I'd prefer to do token-based evaluation. Is there an easy way to use a custom Scorer with the command line, or do I need to write a script for it? Many thanks!
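
If it does come down to a script, token-level scoring itself is small. A sketch in plain Python, assuming one label per token with "O" for tokens outside any entity; `token_level_prf` is a hypothetical helper, not a spaCy API.

```python
def token_level_prf(gold_labels, pred_labels, outside="O"):
    """Micro-averaged precision/recall/F1 over tokens, ignoring the outside label."""
    if len(gold_labels) != len(pred_labels):
        raise ValueError("gold and predicted sequences must have the same length")
    tp = fp = fn = 0
    for gold, pred in zip(gold_labels, pred_labels):
        if pred != outside and pred == gold:
            tp += 1
        else:
            if pred != outside:
                fp += 1  # predicted an entity label that is wrong
            if gold != outside:
                fn += 1  # missed (or mislabeled) a gold entity token
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

The per-token labels can be read from a `Doc` with `[tok.ent_type_ or "O" for tok in doc]`, provided the gold and predicted docs share the same tokenization.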

erduode

@erduode

Who has data sets for FA Cup Final 2012, Super Tuesday 2012, and the US Elections 2012?

gayetr

@gayetr

I am new to NLP and am using spaCy for the first time for languages other than English. Can someone please help me with some examples of how to go about building applications for supported languages that have no available models? I am trying to use the basic code available for Hindi for tokenization and lemmatization. Any answer will be appreciated!
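
For tokenization, a blank pipeline is enough and needs no downloaded model. A minimal sketch for Hindi with an arbitrary example sentence; lemmatization is left out because it depends on lookup data or a trained component.

```python
import spacy

# A blank Hindi pipeline needs no downloaded model; it only tokenizes.
nlp = spacy.blank("hi")
text = "मैं घर जा रहा हूँ"
doc = nlp(text)
tokens = [tok.text for tok in doc]
print(tokens)
```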

铸剑非攻

@Hao-666

Hi! May I know how to define the matcher pattern of "NNP/NN NNP/NN", which means it can be "NNP NNP", "NNP NN", "NN NNP", or "NN NN". Thank you!
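
One way to express this is the `IN` operator on the `TAG` attribute, repeated for two consecutive tokens. A sketch using the list-style `matcher.add` call; the tags are set by hand here only so the example runs without a trained tagger.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
# Two consecutive tokens, each tagged either NNP or NN.
pattern = [{"TAG": {"IN": ["NNP", "NN"]}}, {"TAG": {"IN": ["NNP", "NN"]}}]
matcher.add("NOUN_PAIR", [pattern])

# Tags are set by hand here; normally a trained tagger would assign them.
doc = nlp("New York pizza is great")
for token, tag in zip(doc, ["NNP", "NNP", "NN", "VBZ", "JJ"]):
    token.tag_ = tag

pairs = [doc[start:end].text for _, start, end in matcher(doc)]
print(pairs)
```

Overlapping matches are all returned, so a run of three nouns yields two pairs.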

vkorotchenko-homex

@vkorotchenko-homex

Hi there! Does anyone know if it is possible to generate all possible parse trees for a given sentence in spaCy?
