I have the file below, test.dat:

        <category>Games</category>
</game>

        <category>Applications</category>
</game>

        <category>Demos</category>
</game>

        <category>Games</category>
        <description>MLB 2002 (USA)</description>
</game>

        <category>Bonus Discs</category>
</game>

        <category>Multimedia</category>
</game>

        <category>Add-Ons</category>
</game>

        <category>Educational</category>
</game>

        <category>Coverdiscs</category>
</game>

        <category>Video</category>
</game>

        <category>Audio</category>
</game>

        <category>Games</category>
</game>

How do I use Get-Content and Select-String to output the following to the terminal, given the file above as input? Using that input, I need to receive this output:

        <category>Games</category>
</game>
        <category>Games</category>
</game>

This is the command I'm currently using, but it isn't working:

Get-Content '.\test.dat' | Select-String -pattern '(^\s+<category>Games<\/category>\n^\s+<\/game>$)'

  • It looks like XML, so why not process it as XML? What are you going to filter out with -notmatch? What are you actually trying to achieve? Please provide the expected result. Commented Apr 18, 2021 at 16:45
  • It is XML, but this is the only way I'm familiar with. Basically, I just need to strip out <category>Games</category> </game>
    – volitank
    Commented Apr 18, 2021 at 16:48
  • Strip it out and leave the rest of the file? Strip it out and store it exactly like you show there? What exactly do you want to end up with? Please edit your post to provide those important details. Commented Apr 18, 2021 at 17:44
  • I've simplified it for you.
    – volitank
    Commented Apr 18, 2021 at 17:58

1 Answer

First, you need to read the whole file in as one string so the pattern can match across lines:

Get-Content '.\test.dat' -Raw
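To see why -Raw matters, here is a small sketch. It writes a two-line stand-in for test.dat to a temporary file (the file name test-raw-demo.dat is made up) and runs the same multi-line pattern once over the per-line array that plain Get-Content returns and once over the single string that -Raw returns.

```powershell
# Hypothetical temp file standing in for test.dat
$path  = Join-Path ([System.IO.Path]::GetTempPath()) 'test-raw-demo.dat'
$lines = '        <category>Games</category>', '</game>'

# Write with CRLF line breaks and no trailing newline
Set-Content -Path $path -Value ($lines -join "`r`n") -NoNewline

$perLine = Get-Content $path        # array with one string per line, line breaks stripped
$whole   = Get-Content $path -Raw   # one string with the line breaks kept

$pattern = '<category>Games</category>\r?\n</game>'

# Select-String tests each array element on its own, so \n can never match there
$lineHits = @($perLine | Select-String -Pattern $pattern).Count
$rawHits  = @($whole | Select-String -Pattern $pattern).Count

"Per-line input: $lineHits match(es); -Raw input: $rawHits match(es)"
Remove-Item $path
```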

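Since it seems you want to exclude the Games entry that has a <description> line, you can use this pattern. It grabs only the Games categories whose </category> is followed directly by a line break and an unindented </game>, so any entry with other lines in between is skipped: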

'(?s)\s+<category>Games\S+\r?\n</game>'
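Here is a sketch of what that pattern picks out, run against a made-up excerpt of test.dat. Joining the lines with CRLF is an assumption about the real file's line endings.

```powershell
# Made-up excerpt of test.dat (assumption: CRLF line breaks)
$text = @(
    '        <category>Games</category>'
    '</game>'
    ''
    '        <category>Games</category>'
    '        <description>MLB 2002 (USA)</description>'
    '</game>'
    ''
    '        <category>Demos</category>'
    '</game>'
    ''
    '        <category>Games</category>'
    '</game>'
) -join "`r`n"

$pattern = '(?s)\s+<category>Games\S+\r?\n</game>'
$found   = [regex]::Matches($text, $pattern)

# \S+ consumes the rest of "</category>", so </game> must start the very next line.
# The MLB entry fails because its next line is <description>;
# the Demos entry fails because it is not a Games category.
$found.Count
```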

Select-String returns a MatchInfo object, so you need to extract the Value property of its Matches property. You can do that in a few different ways.

Get-Content '.\test.dat' -Raw |
    Select-String '(?s)\s+<category>Games\S+\r?\n</game>' -AllMatches |
        ForEach-Object Matches | ForEach-Object Value

or

$output = Get-Content '.\test.dat' -Raw |
    Select-String '(?s)\s+<category>Games\S+\r?\n</game>' -AllMatches

$output.Matches.Value

or

(Get-Content '.\test.dat' -Raw |
    Select-String '(?s)\s+<category>Games\S+\r?\n</game>' -AllMatches).Matches.Value

Output

        <category>Games</category>
</game>


        <category>Games</category>
</game>
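The blank lines between the two results come from the leading \s+, which also consumes the line breaks in front of each <category> line. If that matters, one option (my own variation, not part of the answer above) is to anchor at the start of a line with (?m)^ and allow only spaces and tabs as indentation. The sample text below is made up.

```powershell
# Made-up sample: two empty Games entries separated by a blank line
$text = @(
    '        <category>Games</category>'
    '</game>'
    ''
    '        <category>Games</category>'
    '</game>'
) -join "`r`n"

# Original pattern: \s+ also swallows the "`r`n`r`n" before the second entry
$loose = [regex]::Matches($text, '(?s)\s+<category>Games\S+\r?\n</game>')

# Variation: (?m) makes ^ match at every line start; [ \t]* matches indentation only
$anchored = [regex]::Matches($text, '(?m)^[ \t]*<category>Games</category>\r?\n</game>')

$loose    | ForEach-Object { $_.Value.StartsWith("`r`n") }   # False, then True
$anchored | ForEach-Object { $_.Value.StartsWith("`r`n") }   # False, False
```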

You could also use the [regex] type accelerator:

$str = Get-Content '.\test.dat' -Raw

[regex]::Matches($str,'(?s)\s+<category>Games\S+\r?\n</game>').value

EDIT

Based on your additional info, as I understand it, you want to remove any Games entries that are otherwise empty. We can simplify this greatly by using a here-string.

$pattern = @'
        <category>Games</category>
    </game>

'@

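The additional blank line is intentional: it makes the here-string end with a newline, so the line break after </game> is removed as well. Since -replace treats the pattern as a regular expression, you could also write it like this: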

$pattern = @'
        <category>Games</category>
    </game>\r?\n
'@
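One plausible reason a here-string pattern behaves differently between hosts: its line breaks are whatever the script text itself contains (typed in a console or saved by an editor), and those may be LF while the data file uses CRLF. A sketch of one workaround, assuming test.dat's indentation (8 spaces before <category>, none before </game>), is to turn every literal line break in the here-string into \r?\n before using it. The strings "A" and "B" stand in for the surrounding lines.

```powershell
# Assumption: indentation copied from test.dat, not from the real file
$pattern = @'
        <category>Games</category>
</game>

'@

# The blank line before '@ leaves exactly one line break after </game>
$pattern.EndsWith("`n")

# Replace each literal CRLF or LF with the regex text \r?\n.
# Backslashes are literal in a .NET replacement string, so '\r?\n' is inserted as-is.
$portable = $pattern -replace '\r?\n', '\r?\n'

$crlfText = "A`r`n        <category>Games</category>`r`n</game>`r`nB"
$lfText   = "A`n        <category>Games</category>`n</game>`nB"

$crlfResult = $crlfText -replace $portable   # "A`r`nB"
$lfResult   = $lfText -replace $portable     # "A`nB"
```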

Now, if you run a replace with that pattern, you'll see what I believe is your expected final result:

(Get-Content $inputfile -Raw) -replace $pattern

To finish it off, you can put the above command inside a Set-Content command. Because the Get-Content command is enclosed in parentheses, the file is read completely into memory before it is written to.

Set-Content -Path $inputfile -Value ((Get-Content $inputfile -Raw) -replace $pattern)
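Below is a sketch of the full round trip on a temporary copy. The path and sample lines are made up, and the line-ending-agnostic regex is my own stand-in for the here-string. It also adds -NoNewline, because Set-Content otherwise appends a line break to text that -Raw already returned with its final line break intact, which would leave an extra blank line at the end of the file.

```powershell
# Hypothetical stand-in for $inputfile
$inputfile = Join-Path ([System.IO.Path]::GetTempPath()) 'test-replace-demo.dat'
$original = (@(
    '        <category>Games</category>'
    '</game>'
    '        <category>Games</category>'
    '        <description>MLB 2002 (USA)</description>'
    '</game>'
    '        <category>Demos</category>'
    '</game>'
) -join "`r`n") + "`r`n"
Set-Content -Path $inputfile -Value $original -NoNewline

# An entry holding only the Games category; the final \r?\n removes the break after </game>
$pattern = '[ \t]*<category>Games</category>\r?\n[ \t]*</game>\r?\n'

# Parentheses: the whole file is read before Set-Content opens it for writing
Set-Content -Path $inputfile -Value ((Get-Content $inputfile -Raw) -replace $pattern) -NoNewline

$result = Get-Content $inputfile -Raw
$result
Remove-Item $inputfile
```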

EDIT 2

Well, it seems to work in the ISE but not in the PowerShell console. In case you encounter the same thing, try this:

$pattern = '(?s)\s+<category>Games</category>\r?\n\s*</game>'

Set-Content -Path $inputfile -Value ((Get-Content $inputfile -Raw) -replace $pattern)
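A quick check, on made-up snippets, that a pattern of this shape (with \s* in front of </game>, so an unindented </game> as in test.dat is accepted too) works whether the line breaks are CRLF or LF. "A" and "B" stand in for the neighboring lines of the real file.

```powershell
$pattern = '(?s)\s+<category>Games</category>\r?\n\s*</game>'

# Made-up snippets covering both line endings and both </game> indentations
$samples = @{
    'CRLF, indented'   = "A`r`n        <category>Games</category>`r`n    </game>`r`nB"
    'LF, indented'     = "A`n        <category>Games</category>`n    </game>`nB"
    'CRLF, unindented' = "A`r`n        <category>Games</category>`r`n</game>`r`nB"
    'LF, unindented'   = "A`n        <category>Games</category>`n</game>`nB"
}

$results = @{}
foreach ($name in $samples.Keys) {
    # The leading \s+ also consumes the line break before <category>,
    # so only the break after </game> remains between A and B
    $results[$name] = $samples[$name] -replace $pattern
}
$results
```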
  • So this is ending up way more complicated than I thought it would be. This works fine on the test file, but not on the real file I want to use it on. I've created a pastebin since the file is very large, but this is the data I'm working with: pastebin.com/q33wFdAX. What is the best way to automate removing just those two lines from the file?
    – volitank
    Commented Apr 18, 2021 at 18:58
  • Basically, I wanted to use -notmatch to get everything except <category>Games</category></game>, so I could just write a new file without those lines. This is really easy in Notepad++, but I was looking for a way to automate it.
    – volitank
    Commented Apr 18, 2021 at 19:18
  • But you do want to leave the game section with MLB? Commented Apr 18, 2021 at 19:22
  • Yes, I would want to keep the game sections that have data. If a section is empty, then no.
    – volitank
    Commented Apr 18, 2021 at 19:28
  • Just add an extra \r?\n at the end. Commented Apr 18, 2021 at 22:07
