I want to delete a set of consecutive lines, everywhere it occurs in the file, but only if the entire multi-line pattern matches.

Pattern Description:

Line1:^[#]+ .*

Line2:^[[:space:]]*$

Line3:^-[[:space:]]*$

Line4:^[[:space:]]*$

Line5:^[#]+ .*$|^[-]+[[:space:]]*$

Note:

  1. Line3 may have space(s) after the -
  2. Line2 and Line4 are either empty or contain only whitespace
  3. Line5 matches either ^[#]+ .*$ or ^[-]+[[:space:]]*$
  4. I don't want to delete the last line of the pattern, i.e. Line5 in the pattern description.

Example:

# Body

- Inside Body

# Summary

-

# Bibliography

- Read this book

Expected output:

# Body

- Inside Body

# Bibliography

- Read this book

Note: The provided solution works. Is it possible to write the pattern more readably, as follows?

e = '(^|\n)[#]+ .*\
    \n[\t ]*\
    \n-[\t ]*\
    \n[\t ]*\
    \n([#]+ .*|[-]+[\t ]*)\n'

Also, how can the provided solution be made to handle multiple occurrences of the multi-line pattern?

  • Do you know the line terminator that will be present? Also, would an answer using awk (or any other text processing tool) be acceptable?
    – goodguy5
    Commented Dec 18, 2018 at 13:31
  • I would be happy if it is portable to both Windows and Unix; if that is not possible, Unix is preferable. Other scripting languages such as awk, Python, or JavaScript are also fine.
    – Porcupine
    Commented Dec 18, 2018 at 13:33
  • Related: sed multiple lines
    – goodguy5
    Commented Dec 18, 2018 at 13:42
  • Is this document in a known format? Does it have a parser?
    – Kusalananda
    Commented Dec 19, 2018 at 13:37
  • @Kusalananda Yes, I use a custom format to take notes in Markdown files (the data files). I have written a script that removes unused elements of the format from a temporary copy of the data files, which is then rendered with Pandoc.
    – Porcupine
    Commented Dec 19, 2018 at 14:23

1 Answer

A Python solution that should work with Python 2 or 3. It reads from stdin and writes to stdout. About the only thing I changed was replacing [[:space:]] with [\t ].

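#!/usr/bin/python3

import sys
import re

# Line1..Line4 are dropped; group 2 (Line5) is put back by the replacement.
# [\t ] stands in for [[:space:]], which Python's re module does not support.
e = '(^|\n)[#]+ .*\n[\t ]*\n-[\t ]*\n[\t ]*\n([#]+ .*|[-]+[\t ]*)\n'
# Write instead of print so no extra newline is appended to the output.
sys.stdout.write(re.sub(e, '\\1\\2\n', sys.stdin.read()))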
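
On the note at the end of the question about laying the expression out over several lines: a backslash continuation inside the quotes keeps the indentation of each continued line as part of the string, so the four leading spaces end up inside the pattern and it stops matching. A minimal sketch of one alternative, relying on Python joining adjacent string literals inside parentheses, is below. The names `one_line`, `sample` and `result` are only illustrative, and the sample text is the example from the question with Unix line endings assumed.

```python
import re

# Same expression as in the answer, split across adjacent string literals.
# Python joins adjacent literals, so no stray indentation ends up inside
# the pattern (unlike a backslash continuation inside the quotes).
e = (
    '(^|\n)[#]+ .*'              # Line1: heading
    '\n[\t ]*'                   # Line2: empty or whitespace only
    '\n-[\t ]*'                  # Line3: lone dash, optional trailing blanks
    '\n[\t ]*'                   # Line4: empty or whitespace only
    '\n([#]+ .*|[-]+[\t ]*)\n'   # Line5: next heading or dash line (kept)
)

# The split form is character-for-character the one-line expression.
one_line = '(^|\n)[#]+ .*\n[\t ]*\n-[\t ]*\n[\t ]*\n([#]+ .*|[-]+[\t ]*)\n'
assert e == one_line

# Example input from the question (hypothetical variable name).
sample = (
    "# Body\n\n- Inside Body\n\n"
    "# Summary\n\n-\n\n"
    "# Bibliography\n\n- Read this book\n"
)
result = re.sub(e, '\\1\\2\n', sample)
print(result, end="")
```

Putting the comments outside the string literals keeps them out of the pattern; `re.VERBOSE` would be another option, but it changes how literal spaces in the pattern are treated, so it is not used in this sketch.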
  • Clarification: can we do in-place substitution when reading from a file?
    – Porcupine
    Commented Dec 18, 2018 at 14:34
  • Linux doesn't have any primitives to do this, as the file changes length. So you need to write to a temporary file and rename it. See stackoverflow.com/questions/42429320/… for examples of modules that do it.
    – icarus
    Commented Dec 18, 2018 at 14:44
  • Thanks, I did this and it works: python Code.py < ./Input.md > ./Output.md
    – Porcupine
    Commented Dec 18, 2018 at 15:04
  • Additionally, please see the Note at the end of the question.
    – Porcupine
    Commented Dec 19, 2018 at 13:05
  • I tried to do a global replacement using print(re.sub(e, '\\1\\2\n', sys.stdin.read(), flags=re.MULTILINE)), but it replaces only once. Could you please have a look? I am using Python 3.6.
    – Porcupine
    Commented Dec 19, 2018 at 13:29
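
On the "only once" behaviour in the last comment: `re.sub` already replaces every non-overlapping match, so a likely cause (an assumption, since the failing input is not shown) is that two empty sections sit back to back. The expression consumes Line5, so the heading that would start the next match has already been used up. A minimal sketch of one workaround is to make Line5 a lookahead, so it is checked but not consumed. The names `PATTERN` and `strip_empty_sections` are hypothetical, and Unix line endings are assumed.

```python
import re

# Line1..Line4 are matched and removed; Line5 is only checked through a
# lookahead, so it stays in the text and can start the next match.
PATTERN = re.compile(
    r"^#+ .*\n"                 # Line1: heading
    r"[ \t]*\n"                 # Line2: empty or whitespace only
    r"-[ \t]*\n"                # Line3: lone dash, optional trailing blanks
    r"[ \t]*\n"                 # Line4: empty or whitespace only
    r"(?=#+ .*$|-+[ \t]*$)",    # Line5: must follow, but is not consumed
    re.MULTILINE,
)


def strip_empty_sections(text):
    """Remove every heading whose only list item is an empty dash."""
    return PATTERN.sub("", text)


# Example input from the question.
sample = (
    "# Body\n\n- Inside Body\n\n"
    "# Summary\n\n-\n\n"
    "# Bibliography\n\n- Read this book\n"
)
print(strip_empty_sections(sample), end="")

# Two empty sections in a row: both are removed in a single pass.
back_to_back = "# A\n\n-\n\n# B\n\n- \n\n# C\n\n- keep\n"
print(strip_empty_sections(back_to_back), end="")
```

To use it as a filter in the same way as the answer, something like `sys.stdout.write(strip_empty_sections(sys.stdin.read()))` should work. One difference from the original expression: with `$` under `re.MULTILINE`, Line5 is also accepted when it is the last line of the input and has no trailing newline.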
