All Questions

0 votes
1 answer
220 views

Stanford Stanza sometimes splits a sentence into two sentences

I am using Stanza 1.6.1. I have been experimenting with Stanza's constituency parser. In certain cases it splits a sentence into 2 Sentence objects. For example, take this sentence: Pull up Field ...
zaki41 • 81
0 votes
1 answer
435 views

How to make the Stanza lemmatizer return just the lemma instead of a dictionary?

I'm implementing Stanza's lemmatizer because it works well with Spanish texts, but the lemmatizer returns a whole dictionary with ID and other characteristics I don't care about for the time being. I ...
trashparticle
0 votes
0 answers
257 views

How to speed up the Stanza lemmatizer by excluding redundant words

Given: I have a small sample document with a limited number of words as follows: d =''' I go to school by the school bus everyday with all of my best friends. There are several students who also take ...
doplano • 1,601
0 votes
1 answer
441 views

Calculating similarity score in contexto.me clone

I am currently trying to clone the popular browser game contexto.me and I am having trouble working out how to calculate the similarity score between two words (the target word and the user inputted ...
FarajSiddique
-1 votes
1 answer
226 views

Best libraries to classify misclassified categories?

I have a dataset of over 50k rows, and around 40% of the categories are misclassified. I want to use natural language processing to re-classify them using variables that are mostly binary ...
wageeh • 84
0 votes
2 answers
266 views

What is Stanford CoreNLP's recipe for tokenization?

Whether you're using the Stanza or CoreNLP (now deprecated) Python wrappers, or the original Java implementation, the tokenization rules that Stanford CoreNLP follows are super hard for me to figure out ...
lrthistlethwaite
1 vote
1 answer
355 views

How to get original token position in string from Stanza constituency parse tree?

I am using Stanza to extract noun phrases from texts. I am using this code to extract the NPs and store them according to their depth. nlp = stanza.Pipeline('en', tokenize_pretokenized=True) ...
kachap • 11
1 vote
1 answer
167 views

NLP task of arranging words in the correct order?

Is there any state-of-the-art deep learning model that can accomplish the task of arranging a bunch of words in the correct order? For example, Input: boy that killed have must they Expected output: ...
LoUso DeBasura
0 votes
1 answer
256 views

Stanford's Stanza NLP: find all word IDs for a given span

I am using a Stanza pipeline that extracts both words and named entities. sentence.entities gives me a list of recognized named entities with their start and end characters. Here is an example: { ...
Robert Alexander
2 votes
0 answers
225 views

Stanford Stanza NLP to networkx: superimpose NER entities onto graph of words

Here is a sample program that takes a text (the example is in Italian, but Stanza supports many languages) and builds and displays a graph of the words (only certain parts of speech) and their ...
Robert Alexander
1 vote
1 answer
493 views

Obtaining data from both token and word objects in a Stanza Document / Sentence

I am using a Stanford Stanza pipeline on some (Italian) text. The problem I'm grappling with is that I need data from both the Token and Word objects. While I'm able to access one or the other separately ...
Robert Alexander
1 vote
1 answer
72 views

Retain original document element index of argument passed through sklearn's CountVectorizer() in order to access corresponding part of speech tag

I have a data frame with sentences and the respective part-of-speech tag for each word. Below is an extract of the data I'm working with (taken from the SNLI corpus). For each sentence in my ...
OLGJ • 452
1 vote
0 answers
1k views

Break Complex/Compound Sentences into Simple Sentences using NLP

I want to break up complex/compound sentences, i.e. longer sentences that consist of more than one simple sentence. For example: I like to eat apples but I hate apple juice. Here the ...
DevPy • 497
1 vote
0 answers
90 views

NLP / ML Python: variation of topic modeling + summarization? Can someone point me in the right direction?

I'm new to NLP and machine learning and wondering if someone can point me in the right direction. I'm looking to create a function that takes 2 inputs: an array of strings (English sentences of varying ...
dv151 • 115
1 vote
1 answer
2k views

Extracting country name from an address

I have a large dataset with an address column. I would like to extract the countries from the address. In many cases, the address column contains states, cities, and zip codes, but not the country names. ...
kaloon • 177
