
All Questions

Tagged with
2 votes
0 answers
39 views

How to apply semantic tokenization to a sentence in Java using NLP?

Can an NLP model be used to tokenize a sentence based on its semantic meaning? For example, for the sentence: If the driver's age is more than 20, the tokens would be: Token1: if Token2: driver age ...
Mortova
  • 88
0 votes
0 answers
26 views

Is there a method to load caseless models to Stanford's NLP sentiment analysis?

In the Stanford documentation, the authors mention using caseless models to process case-insensitive text. Namely the ability to load the GATE Twitter POS annotator. It is a POS annotator, but it ...
Abdul M. Diaz
0 votes
1 answer
95 views

GloVe embedding for empty string

It looks like the embedding for the empty string in the glove.twitter.27B.200d.txt file that's part of this zip file: https://nlp.stanford.edu/data/glove.twitter.27B.zip is provided on line 38523, but ...
Michael Szczepaniak
0 votes
1 answer
219 views

Stanford Stanza sometimes splits a sentence into two sentences

I am using Stanza 1.6.1. I have been experimenting with Stanza's constituency parser. In certain cases it splits a sentence into two Sentence objects. For example, take this sentence: Pull up Field ...
zaki41
  • 81
0 votes
1 answer
434 views

How to make the Stanza lemmatizer return just the lemma instead of a dictionary?

I'm implementing Stanza's lemmatizer because it works well with Spanish texts, but the lemmatizer returns a whole dictionary with ID and other characteristics I don't care about for the time being. I ...
trashparticle
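
A minimal sketch of one way to keep only the lemmas, assuming the output in question is the nested structure Stanza produces when a document is converted to dictionaries: a list of sentences, each a list of per-word dicts with a `lemma` key. The helper name `extract_lemmas` and the `sample` data are hypothetical and hand-written for illustration; they are not real Stanza output, and the question's actual code is not shown in the excerpt.

```python
def extract_lemmas(sentences):
    """Return a flat list of lemmas from a list of sentences,
    where each sentence is a list of per-word dicts."""
    return [
        word["lemma"]
        for sentence in sentences
        for word in sentence
        # Some entries (for example multi-word token spans) may carry no
        # lemma, so skip any dict that lacks the key instead of failing.
        if "lemma" in word
    ]


# Hand-written sample in the assumed shape (hypothetical data).
sample = [
    [
        {"id": 1, "text": "Los", "lemma": "el", "upos": "DET"},
        {"id": 2, "text": "gatos", "lemma": "gato", "upos": "NOUN"},
        {"id": 3, "text": "duermen", "lemma": "dormir", "upos": "VERB"},
    ]
]

print(extract_lemmas(sample))
```

If the pipeline returns a Stanza document object instead of dicts, the same idea should apply by reading the `lemma` attribute of each word in each sentence.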
0 votes
1 answer
326 views

Word2Vec - to be trained on train data or whole data

I wish to create a word2vec model and want to train it on my local data. So, the question is, should I train the word2vec model on my whole data or should I split the data into train and test and then ...
Mayur Pol
-2 votes
1 answer
107 views

Is there a method to extract quotes and their related speakers in the French language?

Is there a method to extract quotes and their related speakers, with coreference handling? As output, I want to get a dict with [{"speaker" : , "quotes": }] and if we don’t find ...
Olivier Ringold
0 votes
1 answer
85 views

How to get Enhanced++ dependency labels with a Java command line in the terminal?

I don't really know Java, but I was just trying to use the documentation of the Stanford NLP parser to get the Enhanced++ dependency labels. This is the line I ran: java -cp "*" -Xmx2g edu....
Galit
  • 67
1 vote
0 answers
344 views

Is there a way to load Word2Vec embeddings to ChromaDB?

I want to query for similar words using ChromaDB. For example, 'great' should return all the words that are similar to 'great', in most cases, it would be synonyms. For this, I would like to upload ...
smishra
  • 3,428
0 votes
0 answers
257 views

How to speed up Stanza lemmatizer by excluding redundant words

Given: I have a small sample document with a limited number of words as follows: d =''' I go to school by the school bus everyday with all of my best friends. There are several students who also take ...
doplano
  • 1,601
1 vote
0 answers
234 views

Stanford CoreNLP Help -- Cannot import edu.stanford.nlp.pipeline

I am trying to build an application in the Eclipse IDE for my resume (my first) and have run into a problem in my main file, where I am trying to import edu.stanford.nlp.pipeline.*; and have been playing ...
Charper
  • 11
0 votes
1 answer
441 views

Calculating similarity score in contexto.me clone

I am currently trying to clone the popular browser game contexto.me and I am having trouble working out how to calculate the similarity score between two words (the target word and the user inputted ...
FarajSiddique
-1 votes
1 answer
224 views

Best libraries to classify misclassified categories?

I have a dataset of over 50k rows and around 40% of the categories are misclassified, and I want to use natural language processing to re-classify them using variables that are mostly binary ...
wageeh
  • 84
1 vote
1 answer
657 views

My gpt2 code generates a few correct words and then goes into a loop of generating the same sequence again and again

The following gpt2 code for sentence completion generates a few good sentences and then ends in a loop of repetitive sentences. from transformers import GPT2LMHeadModel, GPT2Tokenizer ...
steve landiss
3 votes
1 answer
713 views

How can I find the cosine similarity between two song lyrics represented as strings?

My friends and I are doing an NLP project on song recommendation. Context: We originally planned on giving the model a recommended song playlist that has the most similar lyrics based on the random ...
yyy818
  • 33
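
As a rough illustration of what the question title asks for, here is a minimal sketch of cosine similarity between two strings using plain word counts (bag of words). This is only one possible representation and is an assumption: the excerpt does not say how the lyrics are vectorized, and TF-IDF weights or embeddings could be substituted. The function names `tokenize` and `cosine_similarity` are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two strings."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))

    # Dot product over the words both texts share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)

    # Euclidean norms of the two count vectors.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has no direction, so treat its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With raw counts the score lies between 0 (no shared words) and 1 (same word proportions), and common words such as "the" dominate unless they are down-weighted or removed.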
