All Questions
255 questions
0
votes
1
answer
220
views
Stanford Stanza sometimes splits a sentence into two sentences
I am using Stanza 1.6.1 and have been experimenting with its constituency parser.
In certain cases it splits a sentence into two Sentence objects. For example, take this sentence: Pull up Field ...
0
votes
1
answer
435
views
How to make the Stanza lemmatizer return just the lemma instead of a dictionary?
I'm implementing Stanza's lemmatizer because it works well with Spanish texts, but the lemmatizer returns a whole dictionary with ID and other characteristics I don't care about for the time being. I ...
0
votes
0
answers
257
views
How to speed up the Stanza lemmatizer by excluding redundant words
Given:
I have a small sample document with a limited number of words, as follows:
d ='''
I go to school by the school bus everyday with all of my best friends.
There are several students who also take ...
0
votes
1
answer
441
views
Calculating similarity score in contexto.me clone
I am currently trying to clone the popular browser game contexto.me, and I am having trouble working out how to calculate the similarity score between two words (the target word and the user-inputted ...
-1
votes
1
answer
226
views
Best libraries to classify misclassified categories?
I have a dataset of over 50k rows in which around 40% of the categories are misclassified, and I want to use natural language processing to re-classify them using variables that are mostly binary ...
0
votes
2
answers
266
views
What is Stanford CoreNLP's recipe for tokenization?
Whether you're using the Stanza or CoreNLP (now deprecated) Python wrappers, or the original Java implementation, the tokenization rules that Stanford CoreNLP follows are super hard for me to figure out ...
1
vote
1
answer
355
views
How to get original token position in string from Stanza constituency parse tree?
I am using Stanza to extract noun phrases from texts. I am using this code to extract the NPs and store them according to their depth.
nlp = stanza.Pipeline('en', tokenize_pretokenized=True)
...
1
vote
1
answer
167
views
NLP task of arranging words in the correct order?
Is there any state-of-the-art deep learning model that can accomplish the task of arranging a bunch of words in the correct order?
For example,
Input: boy that killed have must they
Expected output: ...
0
votes
1
answer
256
views
Stanford's Stanza NLP: find all word ids for a given span
I am using a Stanza pipeline that extracts both words and named entities.
The sentence.entities gives me a list of recognized named entities with their start and end characters. Here is an example:
{
...
2
votes
0
answers
225
views
Stanford Stanza NLP to networkx: superimpose NER entities onto graph of words
Here is a sample program which takes a text (the example is in Italian, but Stanza supports many languages) and builds and displays a graph of the words (only certain parts of speech) and their ...
1
vote
1
answer
493
views
Obtaining data from both token and word objects in a Stanza Document / Sentence
I am using a Stanford Stanza pipeline on some (Italian) text.
The problem I'm grappling with is that I need data from BOTH the Token and Word objects.
While I'm able to access one or the other separately ...
1
vote
1
answer
72
views
Retain original document element index of argument passed through sklearn's CountVectorizer() in order to access corresponding part of speech tag
I have a data frame with sentences and the respective part of speech tag for each word. Below is an extract of the data I'm working with (taken from the SNLI corpus). For each sentence in my ...
1
vote
0
answers
1k
views
Break Complex/Compound Sentences into Simple Sentences using NLP
I want to break up complex/compound sentences, i.e. sentences that are larger in size and consist of more than two sentences.
For example: I like to eat apples but I hate apple juice.
Here the ...
1
vote
0
answers
90
views
NLP / ML Python: variation of topic modeling + summarization? Can someone point me in the right direction?
I'm new to NLP and machine learning, and I'm wondering if someone can point me in the right direction:
I'm looking to create a function that takes 2 inputs.
- an array of strings (English sentences of varying ...
1
vote
1
answer
2k
views
Extracting country name from an address
I have a large dataset with an address column. I would like to extract the countries from the address. In many cases, the address column contains states, cities, and zip code, but not the country names. ...