Scripting for text processing: Delete a set of lines only if entire pattern matches

Question

I want to delete a set of lines (globally) only if the entire pattern matches.

Pattern Description:

Line1:^[#]+ .*

Line2:^[[:space:]]*$

Line3:^-[[:space:]]*$

Line4:^[[:space:]]*$

Line5:^[#]+ .*$|^[-]+[[:space:]]*$

Note:

Line3 may have trailing space(s) after the -
Line2 and Line4 are either empty or contain only whitespace
Line5 matches either ^[#]+ .*$ or ^[-]+[[:space:]]*$
I don't want to delete the last line of the pattern, i.e. Line5 in the pattern description.

Example:

# Body

- Inside Body

# Summary

-

# Bibliography

- Read this book

Expected output:

# Body

- Inside Body

# Bibliography

- Read this book

Note: The provided solution works. Is it possible to write the expression more clearly, as follows?

e = '(^|\n)[#]+ .*\
    \n[\t ]*\
    \n-[\t ]*\
    \n[\t ]*\
    \n([#]+ .*|[-]+[\t ]*)\n'

Also, how can the provided solution be applied to multiple occurrences of the multiline pattern?

Do you know which line terminator will be present? Also, would an answer using awk (or any other text processing tool) be acceptable?
– goodguy5
Commented Dec 18, 2018 at 13:31
I would be happy if it's portable to both Windows and Unix; if that isn't possible, Unix is preferable. Other scripting languages such as awk, Python, or JavaScript are also fine.
– Porcupine
Commented Dec 18, 2018 at 13:33
@Kusalananda Yes, I use a custom format to take notes in markdown files (data files). I have created a script file that removes unused elements of the format from a temporary file (a copy of the data files) and then renders it with Pandoc.
– Porcupine
Commented Dec 19, 2018 at 14:23

icarus · Accepted Answer · 2018-12-18 14:20:46Z

2

A Python solution that should work with Python 2 or 3. It reads from stdin and writes to stdout. About the only thing I changed was replacing [[:space:]] with [\t ].

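#!/usr/bin/python3

import re
import sys

# Line1 to Line4 of the pattern are dropped. Group 1 (the preceding newline,
# if any) and group 2 (Line5) are put back by the replacement string.
e = '(^|\n)[#]+ .*\n[\t ]*\n-[\t ]*\n[\t ]*\n([#]+ .*|[-]+[\t ]*)\n'
print(re.sub(e, '\\1\\2\n', sys.stdin.read()))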
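On the layout asked about in the question's note: inside a single string literal, a backslash at the end of a line removes only the line break, so the indentation of the following line becomes part of the pattern and the expression stops matching. One way around this, shown here as a sketch rather than the only option, is to rely on Python joining adjacent string literals. The names ONE_LINE, SPLIT and BROKEN are made up for this illustration.

```python
import re

# The single-line expression from the answer, for comparison.
ONE_LINE = '(^|\n)[#]+ .*\n[\t ]*\n-[\t ]*\n[\t ]*\n([#]+ .*|[-]+[\t ]*)\n'

# Adjacent string literals are joined by the parser, so the indentation
# between them never ends up inside the pattern.
SPLIT = (
    '(^|\n)[#]+ .*'               # Line1: heading
    '\n[\t ]*'                    # Line2: empty or whitespace only
    '\n-[\t ]*'                   # Line3: lone dash, optional trailing blanks
    '\n[\t ]*'                    # Line4: empty or whitespace only
    '\n([#]+ .*|[-]+[\t ]*)\n'    # Line5: kept via group 2
)

# Backslash continuation inside one literal, as in the question's note:
# the four spaces of indentation on each continued line are now required
# by the pattern.
BROKEN = '(^|\n)[#]+ .*\
    \n[\t ]*\
    \n-[\t ]*\
    \n[\t ]*\
    \n([#]+ .*|[-]+[\t ]*)\n'

print(SPLIT == ONE_LINE)    # True
print(BROKEN == ONE_LINE)   # False
```

SPLIT can be passed to re.sub exactly like e in the script above.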

Clarification: can we do an in-place substitution when reading from a file?
– Porcupine
Commented Dec 18, 2018 at 14:34
Linux doesn't have any primitives to do this, as the file changes length, so you need to write to a temporary file and rename it. See stackoverflow.com/questions/42429320/… for examples of modules that do this.
– icarus
Commented Dec 18, 2018 at 14:44
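The write-then-rename approach described in this comment could look roughly like the following. This is a minimal Python 3 sketch using only the standard library, not one of the modules from the linked question; rewrite_in_place is a hypothetical helper name, and the files are assumed to be UTF-8.

```python
import os
import re
import tempfile

# Same expression as in the answer.
PATTERN = re.compile(
    '(^|\n)[#]+ .*\n[\t ]*\n-[\t ]*\n[\t ]*\n([#]+ .*|[-]+[\t ]*)\n'
)


def rewrite_in_place(path):
    """Apply the answer's substitution to `path` via a temporary file."""
    with open(path, encoding='utf-8') as src:
        new_text = PATTERN.sub('\\1\\2\n', src.read())

    # Create the temporary file next to the target so that the final
    # rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as dst:
            dst.write(new_text)
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.unlink(tmp_path)         # do not leave the temporary file behind
        raise
```

Calling rewrite_in_place('Input.md') would then update that file directly. Note that mkstemp creates the file with restrictive permissions, so the original file's mode is not preserved by this sketch.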
Thanks, I did this and it works: python Code.py < ./Input.md > ./Output.md
– Porcupine
Commented Dec 18, 2018 at 15:04
Additionally, please see the Note at the end of the question.
– Porcupine
Commented Dec 19, 2018 at 13:05
I tried to do a global replacement using print(re.sub(e, '\\1\\2\n', sys.stdin.read(), flags=re.MULTILINE)), but it replaces only the first occurrence. Could you please have a look? I am using Python 3.6.
– Porcupine
Commented Dec 19, 2018 at 13:29
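A possible explanation for the behaviour in this comment, offered as a guess since the input that failed is not shown: re.sub already replaces every non-overlapping match, and re.MULTILINE only changes where ^ and $ may match. The expression consumes Line5, though, so when that line is itself the heading of the next empty section, the scan resumes after it and the next section is missed. The sketch below checks Line5 with a lookahead instead of consuming it; BLOCK and drop_empty_sections are names invented for this example.

```python
import re

BLOCK = re.compile(
    r'^#+ .*\n'               # Line1: heading
    r'[\t ]*\n'               # Line2: empty or whitespace only
    r'-[\t ]*\n'              # Line3: lone dash, optional trailing blanks
    r'[\t ]*\n'               # Line4: empty or whitespace only
    r'(?=#+ .*$|-+[\t ]*$)',  # Line5: must follow, but is not consumed
    re.MULTILINE,
)


def drop_empty_sections(text):
    """Remove every Line1..Line4 block that is followed by a Line5."""
    return BLOCK.sub('', text)


sample = "# A\n\n-\n\n# B\n\n-\n\n# C\n\n- item\n"
print(drop_empty_sections(sample), end='')
# Prints:
# # C
#
# - item
```

Because Line5 stays in the input, it can serve as Line1 of the next match, so back-to-back empty sections are all removed. Unlike the original expression, this version also accepts a Line5 that ends the input without a trailing newline.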
