I am using Stanza 1.6.1 and have been experimenting with its constituency parser.

In certain cases it splits a single sentence into 2 Sentence objects. For example, take this sentence: "Pull up Field with low precision."

Stanza splits it into 2 sentences internally ("Pull up" and "Field with low precision"), so the constituency parser outputs 2 trees, one for each sentence.

Changing "Field" to lowercase in the above sentence makes Stanza treat it as one sentence, and I get a single tree as the constituency output, as expected.

Is there some way to make Stanza treat this as one sentence without resorting to string manipulation such as converting to lowercase? Or is there a case-insensitive model that I could use?

1 Answer

The issue seems to be specific to older versions, including 1.6.1, as other users have reported [1], [2]. I can reproduce it with:

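import stanza
nlp = stanza.Pipeline('en')  # assumes the English models are already downloaded

doc = nlp("Pull up Field with low precision")
# Print the tokens of every sentence the tokenizer produced
for i, sentence in enumerate(doc.sentences):
    print(f'====== Sentence {i+1} tokens =======')
    print(*[f'id: {token.id}\ttext: {token.text}' for token in sentence.tokens], sep='\n')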

which prints:

====== Sentence 1 tokens =======
id: (1,)    text: Pull
id: (2,)    text: up
====== Sentence 2 tokens =======
id: (1,)    text: Field
id: (2,)    text: with
id: (3,)    text: low
id: (4,)    text: precision

Solution: the newer release of the library, 1.7.0, does not seem to have this problem. Install it with:

pip install --upgrade stanza  # --upgrade replaces an existing 1.6.1 install; this gave v1.7.0 at the time of writing

and then test it via:

import stanza
stanza.download('en') # to download the default English language package
nlp = stanza.Pipeline('en') # to initialize the pipeline

doc = nlp("Pull up Field with low precision.")
for i, sentence in enumerate(doc.sentences):
    print(f'====== Sentence {i+1} tokens =======')
    print(*[f'id: {token.id}\ttext: {token.text}' for token in sentence.tokens], sep='\n')

which prints out:

====== Sentence 1 tokens =======
id: (1,)    text: Pull
id: (2,)    text: up
id: (3,)    text: Field
id: (4,)    text: with
id: (5,)    text: low
id: (6,)    text: precision
id: (7,)    text: .

You can also visualize the constituency parse at stanza.run.
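If upgrading does not help for a particular input, one option worth trying is to turn off Stanza's sentence splitting and supply the sentence boundaries yourself. The sketch below assumes the `tokenize_no_ssplit=True` pipeline option, under which the tokenizer treats blank-line-separated chunks as sentences and does not split them further. The helper names `join_presplit` and `parse_presplit` are hypothetical, and I have not checked the output against the sentences in this question.

```python
def join_presplit(sentences):
    """Join caller-provided sentences with blank lines, the separator
    Stanza uses as a sentence boundary when tokenize_no_ssplit=True."""
    # Collapse internal whitespace so a stray newline cannot create a boundary
    cleaned = [" ".join(s.split()) for s in sentences if s.strip()]
    return "\n\n".join(cleaned)


def parse_presplit(sentences, lang="en"):
    """Return one constituency tree per input sentence (hypothetical helper)."""
    import stanza  # imported here so join_presplit works without Stanza installed

    # Assumption: the default package for `lang` includes a constituency parser
    nlp = stanza.Pipeline(lang, tokenize_no_ssplit=True)
    doc = nlp(join_presplit(sentences))
    return [sentence.constituency for sentence in doc.sentences]
```

Calling `parse_presplit(["Pull up Field with low precision."])` should then give back a single tree, since the tokenizer is no longer allowed to introduce a boundary before "Field". The trade-off is that you become responsible for splitting multi-sentence text before passing it in.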

  • I tried Stanza 1.7.0 and the above sentence works, but some sentences are still being split. For example: 1. Get tables with rating > 3 in snowflake 2. datasets categorized under oracle Resource type that have been created in the last two months
    – zaki41
    Commented Jan 19, 2024 at 16:16
