I'm looking for a way to strip all comments from a file. There are various ways to do comments, but I'm only interested in the simple # form comments. Reason is that I only use <# #> for in-function .SYNOPSIS which is functional code as opposed to just a comment, so I want to keep those.
EDIT: I have updated this question using the helpful answers below.
So there are only a couple of scenarios that I need:
a) whole line comments with # at start of line (or possibly with white-space before. i.e. regex of ^\s*# seems to work.
b) with some code at the start of a line, then a comment at the end of the line.
I want to avoid stripping lines that have e.g. Write-Host "#####" but I think this is covered in the code that I have.
I was able to remove end-of-line comments with a split as I couldn't work out how to do it with regex, does anyone know a way to achieve that with regex?
The split was not ideal as a <# on a line would be removed by the -split but I've fixed that by splitting on " #". This is not perfect but might be good enough - maybe a more reliable way with regex might exist?
When I do the below against my 7,000 line long script, it works(!) and strips a huge amount of comments, BUT, the output file is almost doubled in size(!?) from 400kb to about 700kb. Does anyone understand why that happens and how to prevent that (is it something to do with BOM's or Unicode or things like that? Out-File seems to really balloon the file-size!)
$x = Get-Content ".\myscript.ps1" # $x is an array, not a string
$out = ".\myscript.ps1"
$x = $x -split "[\r\n]+" # Remove all consecutive line-breaks, in any format '-split "\r?\n|\r"' would just do line by line
$x = $x | ? { $_ -notmatch "^\s*$" } # Remove empty lines
$x = $x | ? { $_ -notmatch "^\s*#" } # Remove all lines starting with # including with whitespace before
$x = $x | % { ($_ -split " #")[0] } # Remove end of line comments
$x = ($x -replace $regex).Trim() # Remove whitespace only at start and end of line
$x | Out-File $out
# $x | more
out-filedefaults to encoding UTF16-LE, probably with BOM. You could trySet-Contentinstead, which defaults to ANSI encoding. For either command, you can use-Encodingparameter. The UTF8 encoding will have BOM with those commands.Set-Contentthe file size went down to 220k instead of 700k. I've never gotten my head around the use of these various encodings and why some are so bloated ... Thanks.$regexdefined?