Given an unknown string of unknown size, e.g. a ScriptBlock expression or something like:
$Text = @'
LOREM IPSUM
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.
It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
'@
I would like to summarize the string on a single line (replace every run of consecutive white space with a single space) and truncate it to a specific $Length:
$Length = 32
$Text = $Text -Replace '\s+', ' '
if ($Text.Length -gt $Length) { $Text = $Text.SubString(0, $Length) }
$Text
LOREM IPSUM Lorem Ipsum is simpl
The issue is that for a large string this is inefficient: it replaces the white space in the whole $Text string, whereas I only need to replace the first few runs of white space until I have a string of the required size ($Length = 32).
Swapping the -replace and SubString operations isn't an option either, as that would return a string shorter than required, or even a single space for any $Text string that starts with something like 32 white spaces.
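To illustrate why swapping the two operations falls short, here is a minimal sketch with a made-up input (32 leading spaces followed by some text, which is an assumption for demonstration only):

```powershell
# Hypothetical input: 32 leading spaces followed by text.
$Length = 32
$Text   = (' ' * 32) + 'LOREM IPSUM'

# Truncate first, then collapse white space (the swapped order).
$Truncated = $Text.Substring(0, [Math]::Min($Length, $Text.Length))
$Swapped   = $Truncated -replace '\s+', ' '

# The first 32 characters are all spaces, so they collapse to one space.
$Swapped.Length   # 1, far short of the 32 characters asked for
```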
Question:
How can I efficiently merge the two operations (-replace and SubString) so that I don't replace more white space than necessary and still get a string of the required length (in case the $Text string is longer than the required length)?
Update
I think I am close by using a MatchEvaluator Delegate:
$Length = 8
$TotalSpaces = 0
$Delegate = {
if ($Args[0].Index - $TotalSpaces -gt $Length) {
'{break}'
([Ref]$TotalSpaces).Value = [int]::MaxValue
}
else { ([Ref]$TotalSpaces).Value += $Args[0].Value.Length }
}
[regex]::Replace('test 0 1 2 3 4 5 6 7 8 9', '\s+', $Delegate)
test01234{break}56789
Now the question is: how can I break out of the regex processing at the {break}?
Note that for performance reasons I really want to break out, not substitute each remaining <regular-expression> match with the found match itself (which would only make it look like it stopped).
Match(String, Int32) overload of the Regex class - see learn.microsoft.com/en-us/dotnet/api/… - and just pull everything up to the first whitespace match off the front of the string, then call it again starting from after the current match until you've got the desired length of output...
Match(String, Int32, Int32) overload - learn.microsoft.com/en-us/dotnet/api/… - which does the same thing but stops after a given number of input characters have been read, so you don't process the entire string if there's no more whitespace...
ForEach-Object, if ($imDoneReplacingCondition) { return }
Match(String, Int32, Int32) and the performance is strangely bad if the input string is long - it's like it's doing something to scan the entire string regardless of how short the search window is. Try $regex = [regex] "\s+"; $text = "LOREM IPSUM" * 2000; $regex.Match($text, 0, 100) for example and then change * 2000 to * 20000000. @SantiagoSquarzon's manual parsing performs much better on very long strings...
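The incremental approach suggested in the comments (calling Match(String, Int32) repeatedly and stopping once enough output has been collected) could be sketched roughly as follows. The function name Get-TruncatedSummary and its parameters are hypothetical, chosen only for this illustration, and this is one possible reading of the suggestion rather than a tested, tuned solution:

```powershell
# Hypothetical helper: collapse white space and truncate in a single pass,
# stopping as soon as $Length characters of output have been collected.
function Get-TruncatedSummary {
    param([string]$Text, [int]$Length)

    $regex = [regex]'\s+'
    $sb    = [System.Text.StringBuilder]::new()
    $pos   = 0

    while ($sb.Length -lt $Length -and $pos -lt $Text.Length) {
        # Find the next run of white space at or after the current position.
        $m = $regex.Match($Text, $pos)
        if (-not $m.Success) {
            # No more white space: copy what still fits from the remainder.
            $take = [Math]::Min($Length - $sb.Length, $Text.Length - $pos)
            [void]$sb.Append($Text, $pos, $take)
            break
        }
        # Copy the text in front of the match, capped at the remaining room.
        $take = [Math]::Min($Length - $sb.Length, $m.Index - $pos)
        [void]$sb.Append($Text, $pos, $take)
        # Emit one space for the whole white-space run if there is room left.
        if ($sb.Length -lt $Length) { [void]$sb.Append(' ') }
        $pos = $m.Index + $m.Length
    }
    $sb.ToString()
}

Get-TruncatedSummary -Text "LOREM  IPSUM`n Lorem Ipsum is simply dummy text" -Length 32
# LOREM IPSUM Lorem Ipsum is simpl
```

Note that each Match call still scans forward until it finds the next white space, so an input with a very long run of non-white-space characters is read past the point that is actually needed. The last comment above also reports poor performance of the Match(String, Int32, Int32) overload on very long inputs, so measuring on realistic data before relying on either overload seems wise.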