I have a Word/text file containing:

1. 10 Liter sample of an ideal gas is expanded reversibly and isothermally at 300k from initial pressure of 10atm to final pressure of 1atm. The heat absorbed by gas during the process is approximately.
(A)15kJ
(B)23kJ
(C)32kJ
(D)50kJ

[Answer]:(B)

[QuestionType]:single_correct

2. Which of the following statement is correct

(A)Li is hander than the other alkali metals.
(B)In solvay process NH3 is recovered when the solution containing NH4Cl is treated with H2O.
(C)Na2CO3 is pearl ash.
(D)Berylium and Aluminium ions do not have strong tendency to form complexes like 

[Answer]:(C)

[QuestionType]:single_correct

I need each question in its own list, running from the question-number line through the [QuestionType] line (that is, from 1. to [QuestionType]).

Expected output:

[[1. 10 Liter sample of an ideal gas is expanded reversibly and isothermally at 300k from initial pressure of 10atm to final pressure of 1atm. The heat absorbed by gas during the process is approximately.,(A)15kJ,(B)23kJ,(C)32kJ,(D)50kJ,[Answer]:(B),[QuestionType]:single_correct],
[2. Which of the following statement is correct,(A)Li is hander than the other alkali metals.,(B)In solvay process NH3 is recovered when the solution containing NH4Cl is treated with H2O.,(C)Na2CO3 is pearl ash.,(D)Berylium and Aluminium ions do not have strong tendency to form complexes like ,[Answer]:(C),[QuestionType]:single_correct]]

I tried a for loop, but I can't get the contents in between:

import docx
import re
doc = docx.Document("QnA.docx")
for i in doc.paragraphs:
    if re.match(r"^[0-9]+[.]+",i.text):
        print(i.text) # matched number condition
    if re.match(r"(^\[QuestionType\])",i.text):
        print(i.text) # matched QuestionType condition

2 Answers

3

You might use a single pattern, starting the match with 1 or more digits and a dot.

Then continue matching all the lines that do not start with [QuestionType] and finally match that line.

^\d+\..*(?:\r?\n(?!\[QuestionType]).*)*\r?\n\[QuestionType]:.*
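To make the pieces of that pattern easier to follow, here is the same regex spelled out with re.VERBOSE and a comment per part. This is only an annotated sketch: the name QUESTION_RE and the short sample string are made up for illustration.

```python
import re

# Same pattern as above, split into commented parts (QUESTION_RE is a hypothetical name).
QUESTION_RE = re.compile(
    r"""
    ^\d+\..*                 # first line: digits, a dot, then the rest of the line
    (?:                      # then zero or more further lines:
        \r?\n                #   a line break
        (?!\[QuestionType])  #   the next line must not start with [QuestionType]
        .*                   #   consume that line
    )*
    \r?\n\[QuestionType]:.*  # finally the [QuestionType] line itself
    """,
    re.MULTILINE | re.VERBOSE,
)

# Made-up sample: text before and after the questions is ignored.
sample = (
    "intro\n"
    "1. Q one\n(A)x\n\n[Answer]:(A)\n\n[QuestionType]:single_correct\n\n"
    "2. Q two\n[QuestionType]:multi\n"
    "trailer"
)
blocks = QUESTION_RE.findall(sample)
print(blocks)
```

Because the pattern has no capturing groups, findall returns the full text of each match.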

See a regex demo and a Python demo

For example:

import re

regex = r"^\d+\..*(?:\r?\n(?!\[QuestionType]).*)*\r?\n\[QuestionType]:.*"

s = ("1. 10 Liter sample of an ideal gas is expanded reversibly and isothermally at 300k from initial pressure of 10atm to final pressure of 1atm. The heat absorbed by gas during the process is approximately.\n"
    "(A)15kJ\n"
    "(B)23kJ\n"
    "(C)32kJ\n"
    "(D)50kJ\n\n"
    "[Answer]:(B)\n\n"
    "[QuestionType]:single_correct\n\n"
    "2. Which of the following statement is correct\n\n"
    "(A)Li is hander than the other alkali metals.\n"
    "(B)In solvay process NH3 is recovered when the solution containing NH4Cl is treated with H2O.\n"
    "(C)Na2CO3 is pearl ash.\n"
    "(D)Berylium and Aluminium ions do not have strong tendency to form complexes like \n\n"
    "[Answer]:(C)\n\n"
    "[QuestionType]:single_correct")
    
print(re.findall(regex, s, re.M))

Output

['1. 10 Liter sample of an ideal gas is expanded reversibly and isothermally at 300k from initial pressure of 10atm to final pressure of 1atm. The heat absorbed by gas during the process is approximately.\n(A)15kJ\n(B)23kJ\n(C)32kJ\n(D)50kJ\n\n[Answer]:(B)\n\n[QuestionType]:single_correct', '2. Which of the following statement is correct\n\n(A)Li is hander than the other alkali metals.\n(B)In solvay process NH3 is recovered when the solution containing NH4Cl is treated with H2O.\n(C)Na2CO3 is pearl ash.\n(D)Berylium and Aluminium ions do not have strong tendency to form complexes like \n\n[Answer]:(C)\n\n[QuestionType]:single_correct']
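Each match above is still one string, while the question asks for a list per question. A possible follow-up step, sketched under the assumption that blank lines should be dropped (the helper name to_question_lists is made up here), is to split each match into its non-blank lines:

```python
import re

regex = r"^\d+\..*(?:\r?\n(?!\[QuestionType]).*)*\r?\n\[QuestionType]:.*"

def to_question_lists(text):
    """Return one list of non-blank lines per matched question block."""
    return [
        # splitlines() handles both \n and \r\n; whitespace-only lines are skipped
        [line for line in block.splitlines() if line.strip()]
        for block in re.findall(regex, text, re.M)
    ]

# Short made-up sample in the same layout as the question
demo = (
    "1. Q\n(A)a\n\n[Answer]:(A)\n\n[QuestionType]:single_correct\n\n"
    "2. R\n\n(A)b\n[QuestionType]:x"
)
print(to_question_lists(demo))
```

Calling to_question_lists(s) on the string from the example should then give one list per question without the empty entries.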
0

First, extract the content of each question with a regex. Then split each question's content on \n.

You could try the following regex:

\d+\.[\s\S]+?QuestionType.*

I also tested it in Python:

import re
content = '''1. 10 Liter sample of an ideal gas is expanded reversibly and isothermally at 300k from initial pressure of 10atm to final pressure of 1atm. The heat absorbed by gas during the process is approximately.
(A)15kJ
(B)23kJ
(C)32kJ
(D)50kJ

[Answer]:(B)

[QuestionType]:single_correct

2. Which of the following statement is correct

(A)Li is hander than the other alkali metals.
(B)In solvay process NH3 is recovered when the solution containing NH4Cl is treated with H2O.
(C)Na2CO3 is pearl ash.
(D)Berylium and Aluminium ions do not have strong tendency to form complexes like 

[Answer]:(C)

[QuestionType]:single_correct
'''

splitQuestion = re.findall(r"\d+\.[\s\S]+?QuestionType.*", content)

result = []
for eachQuestion in splitQuestion:
    result.append(eachQuestion.split("\n"))

print(result)

Result:

[['1. 10 Liter sample of an ideal gas is expanded reversibly and isothermally at 300k from initial pressure of 10atm to final pressure of 1atm. The heat absorbed by gas during the process is approximately.', '(A)15kJ', '(B)23kJ', '(C)32kJ', '(D)50kJ', '', '[Answer]:(B)', '', '[QuestionType]:single_correct'], ['2. Which of the following statement is correct', '', '(A)Li is hander than the other alkali metals.', '(B)In solvay process NH3 is recovered when the solution containing NH4Cl is treated with H2O.', '(C)Na2CO3 is pearl ash.', '(D)Berylium and Aluminium ions do not have strong tendency to form complexes like ', '', '[Answer]:(C)', '', '[QuestionType]:single_correct']]
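Note that the result keeps the blank lines as '' entries, and the approach needs the whole document as one string. If the input arrives paragraph by paragraph, as with doc.paragraphs in the question, a line-by-line grouping may be simpler. This is a minimal sketch, not a tested solution for every document; group_questions is a made-up helper and it assumes every question ends with a line starting with [QuestionType].

```python
import re

def group_questions(lines):
    """Group lines into questions: from an 'N.' line through the [QuestionType] line."""
    questions, current = [], None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines / empty paragraphs
        if current is None:
            if re.match(r"\d+\.", line):
                current = [line]  # a numbered line opens a new question
        else:
            current.append(line)
            if line.startswith("[QuestionType]"):
                questions.append(current)  # the marker line closes the question
                current = None
    return questions

# Made-up sample lines in the same layout as the question
demo_lines = [
    "1. Q", "(A)a", "", "[Answer]:(A)", "", "[QuestionType]:single_correct",
    "", "2. R", "[QuestionType]:x",
]
print(group_questions(demo_lines))
```

With python-docx this could be called as group_questions(p.text for p in doc.paragraphs), and with the string above as group_questions(content.split("\n")).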
