
Hi! I have a question about using my own trained model (trained with the sense2vec.train recipe) with Prodigy for sense2vec (similar to what is done in the demo at https://explosion.ai/demos/sense2vec). I've seen (and used) an example like the one below:
import spacy
from sense2vec import Sense2VecComponent
nlp = spacy.load('my_own_model')
s2v = Sense2VecComponent('/path/to/reddit_vectors-1.1.0')
nlp.add_pipe(s2v)
doc = nlp("A sentence about natural language processing.")
assert doc[3].text == 'natural language processing'
freq = doc[3]._.s2v_freq
vector = doc[3]._.s2v_vec
most_similar = doc[3]._.s2v_most_similar(3)
But I'm not quite sure how to use it with just a single word as input (e.g. software), or what to do about the doc = nlp("A sentence about natural language processing.") line (I have no idea what to put there 😓). I just want something similar to the standalone implementation of sense2vec, but with my own model.
Any ideas?
Thanks a lot 😊
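One possible approach, sketched under the assumption that the trained vectors load with sense2vec's standalone Sense2Vec class, is to skip the spaCy pipeline and look up a key of the form word|SENSE directly. The helper name similar_terms, the default sense "NOUN", and the path in the usage comment are placeholders for illustration, not part of sense2vec.

```python
def similar_terms(s2v, word, sense="NOUN", n=3):
    """Return up to n most similar entries for a single word, or [] if unknown.

    `s2v` is assumed to be a loaded sense2vec.Sense2Vec object. Keys are
    assumed to look like "natural_language_processing|NOUN".
    """
    key = word.replace(" ", "_") + "|" + sense
    if key not in s2v:
        # The word/sense pair is not in the vector table.
        return []
    return s2v.most_similar(key, n=n)


# Usage (the path is a placeholder for your own trained model):
# from sense2vec import Sense2Vec
# s2v = Sense2Vec().from_disk("/path/to/my_own_model")
# print(similar_terms(s2v, "software"))
```

No sentence or Doc is needed this way. If the word is stored under a different sense in the model, pass that tag as the sense argument.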
Dear all,
I'm trying to use the Lemmatizer from spaCy and I believe there is a mistake or something I'm missing.
Given this code:
import spacy
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
doc = nlp("My name is Adrian")
print(" ".join([word.lemma_ for word in doc]))
It returns "-PRON- name be adrian". Is it expected that "My" is replaced with "-PRON-"?
spaCy version is "2.3.0"
Thanks!
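In spaCy 2.x, "-PRON-" is the lemma assigned to pronouns, so the output above is expected for that version. If the placeholder is unwanted, one workaround is to fall back to the lower-cased token text. The sketch below works on plain (text, lemma) pairs so it runs without a model. The helper name restore_pronouns is made up for illustration.

```python
def restore_pronouns(pairs, placeholder="-PRON-"):
    """Replace the pronoun placeholder lemma with the lower-cased token text.

    `pairs` is an iterable of (token_text, lemma) tuples, e.g. built with
    [(t.text, t.lemma_) for t in doc].
    """
    return [text.lower() if lemma == placeholder else lemma
            for text, lemma in pairs]


# Pairs matching the output reported above for "My name is Adrian".
pairs = [("My", "-PRON-"), ("name", "name"), ("is", "be"), ("Adrian", "adrian")]
result = " ".join(restore_pronouns(pairs))
print(result)  # my name be adrian
```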
Hi, I'm trying to use the Japanese model from spaCy. This
import spacy
import ja_core_news_sm
nlp = spacy.load("ja_core_news_sm")
is giving me ModuleNotFoundError: No module named 'sudachidict' and OSError: symbolic link privilege not held
I reinstalled spacy and sudachipy==0.4.5 from cmd in administrator mode (as suggested in the spaCy docs), but it didn't help. How can I use this Japanese model? Thanks
Is there an easy way to tokenize operators (=, <, >, ...) in spaCy?
The input looks like: set IF_FRONT CSF=1 Val = 1
Usually there is no space before or after the operator.
Also, \n characters are part of the input.
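One way this could be approached is to extend the tokenizer's infix patterns so that operator characters always split a token. The sketch below assumes a blank English pipeline (spacy.blank("en")) and the pattern [=<>]+, which is only an example and may need more operators for real input.

```python
import spacy
from spacy.util import compile_infix_regex

# A blank pipeline gives us the tokenizer without needing a trained model.
nlp = spacy.blank("en")

# Add a pattern for runs of =, < and > to the default infix rules.
infixes = list(nlp.Defaults.infixes) + [r"[=<>]+"]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

tokens = [t.text for t in nlp("set IF_FRONT CSF=1 Val = 1")]
print(tokens)
```

Infix rules only apply inside a whitespace-delimited chunk, so operators that are already surrounded by spaces are unaffected.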
isinstance(nlp, spacy.lang). Unfortunately, when I do that now, I get an error like TypeError: isinstance() arg 2 must be a type or tuple of types
isinstance(mod, spacy.lang.en.English) returns True, but ideally I could do this test without reference to a specific language
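The TypeError comes from spacy.lang being a package (a module object), not a class. The language-specific classes such as English subclass spacy.language.Language, so a check against that base class should work without naming a language. A minimal sketch, assuming a blank English pipeline as the object under test:

```python
import spacy
from spacy.language import Language

# Any pipeline object works here; a blank one avoids needing a trained model.
nlp = spacy.blank("en")

# Language is the shared base class of English, German, Japanese, etc.
is_pipeline = isinstance(nlp, Language)
print(is_pipeline)
```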
Hello everyone,
I have built a spaCy pipeline for binary text classification. The pipeline works fine for models that are available through the spaCy library. To compare my existing results with other models (https://github.com/google-research/bert#pre-trained-models), I used the convert_bert_original_tf_checkpoint_to_pytorch.py script (https://github.com/huggingface/transformers/blob/master/src/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py) to convert existing checkpoints to PyTorch models. After that, I wanted to "load those PyTorch models from a path" (https://github.com/explosion/spacy-transformers) into my pipeline.
I am able to load those PyTorch models into my pipeline successfully, but when I start training with the same training data, I get this error message:
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1812 # remove once script supports set_grad_enabled
1813 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1814 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1815
1816
IndexError: index out of range in self
I honestly do not understand this error or how to solve it. After researching the problem, I tried adjusting my config file in different ways, without success. My only lead at the moment is that if I reduce my input size below 200 words, it works fine. I would like to compare those models on the same input data (within BERT's 512-token limit) without truncation.
Does anyone have an idea or a clue how I could fix this problem? Any ideas would be highly appreciated!
Thanks a lot in advance!
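For context, torch.embedding raises this IndexError when some index in its input does not fit the embedding table it is looking up. A framework-free sketch of a check that might help narrow it down is shown below. Which table is involved here (wordpiece vocabulary or position embeddings) is an assumption to verify, and all numbers are hypothetical.

```python
def out_of_range_positions(ids, table_size):
    """Return positions in `ids` whose value cannot index a table with `table_size` rows."""
    return [i for i, idx in enumerate(ids) if not 0 <= idx < table_size]


# Hypothetical wordpiece IDs checked against a hypothetical vocabulary size.
wordpiece_ids = [101, 2023, 30522, 102]
vocab_size = 30522
bad = out_of_range_positions(wordpiece_ids, vocab_size)
print(bad)  # [2]

# Hypothetical sequence of 600 wordpieces checked against 512 position slots.
position_ids = list(range(600))
max_positions = 512
too_long = out_of_range_positions(position_ids, max_positions)
print(len(too_long))  # 88
```

Running the same kind of check on the real wordpiece IDs and on the sequence lengths after wordpiece tokenization would show whether the converted vocabulary or the input length is the part that does not fit.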
spacy train command to train a custom NER model. For my use case, entity-based evaluation is not relevant; I'd prefer token-based evaluation. Is there an easy way to use a custom Scorer with the command line, or do I need to write a script for it? Many thanks!
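If a standalone script turns out to be the way to go, token-based scoring itself is short. The sketch below compares one entity label per token (for example token.ent_type_ or "O" for no entity). The function name token_prf, the "O" convention, and the example labels are assumptions for illustration, and it does not hook into spacy train.

```python
def token_prf(gold_tags, pred_tags, outside="O"):
    """Compute token-level precision, recall and F1 over entity labels.

    Both inputs are equal-length sequences with one label per token.
    """
    if len(gold_tags) != len(pred_tags):
        raise ValueError("gold and predicted sequences differ in length")
    tp = fp = fn = 0
    for gold, pred in zip(gold_tags, pred_tags):
        if gold != outside and pred == gold:
            tp += 1  # entity token labelled correctly
        else:
            if pred != outside:
                fp += 1  # predicted an entity label that is wrong
            if gold != outside:
                fn += 1  # missed or mislabelled a gold entity token
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels for five tokens.
gold = ["O", "PERSON", "PERSON", "O", "ORG"]
pred = ["O", "PERSON", "O", "ORG", "ORG"]
p, r, f = token_prf(gold, pred)
print(p, r, f)
```

A token with the wrong entity label counts as both a false positive and a false negative here, which is one common convention but not the only one.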