2

I'm using PowerShell, but I'm open to other solutions.

I have a long string. I need to replace several sequences of characters, identified by their position in the string, with a mask character (a period or a space). I don't know in advance what the original characters will be, only that they have to be replaced. I have written code that iterates through the string using Mid and position numbers, but that is cumbersome, and I'm wondering if there is a faster or more elegant method.

Example: Given the 2 strings:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
12345678901234567890123456

I want to replace characters 2-4, 9-10, 17-21, and 25 with a period (.), yielding:

A...EFGH..KLMNOP.....VWX.Z
1...5678..123456.....234.6

I can do that with a series of Mid calls, but I'd like to know whether there is some sort of faster masking function for this. I have to do this across millions of rows, and seconds count.

3 Answers

3

Try this:

$regex = [regex]'(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)'
$replace = '$1...$2..$3.....$4.$5'

('ABCDEFGHIJKLMNOPQRSTUVWXYZ',
 '12345678901234567890123456') -Replace $regex,$replace

A...EFGH..KLMNOP.....VWX.Z
1...5678..123456.....234.6

The -replace operator is slower than String.Replace() for a single operation, but it can operate on a whole array of strings at once, which is faster than calling the string method inside a foreach loop.
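To check that trade-off on your own data, a rough timing sketch along these lines may help. It is only an illustration: the 100,000-row test array is made up, and the loop variant calls the regex object's Replace() method once per row (String.Replace() cannot mask by position), so it is not the exact comparison described above. Timings will vary by machine and PowerShell version.

```powershell
$regex   = [regex]'(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)'
$replace = '$1...$2..$3.....$4.$5'

# Hypothetical test data: the sample row repeated 100,000 times.
$rows = @('ABCDEFGHIJKLMNOPQRSTUVWXYZ') * 100000

# One -replace call applied to the whole array.
$operatorTime = Measure-Command {
    $null = $rows -replace $regex, $replace
}

# One Regex.Replace() call per row inside a foreach loop.
$loopTime = Measure-Command {
    foreach ($row in $rows) { $null = $regex.Replace($row, $replace) }
}

'{0:N0} ms using -replace on the array' -f $operatorTime.TotalMilliseconds
'{0:N0} ms using a foreach loop' -f $loopTime.TotalMilliseconds
```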

Here's a sample implementation (requires V4):

$regex =  [regex]'(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)'
$replace = '$1...$2..$3.....$4.$5'

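# $file is set by -PipelineVariable on Get-ChildItem below
filter fix-file {
    $_ -replace $regex,$replace |
        Add-Content "c:\mynewfiles\$($file.name)"
}

# Read each file in batches of 1000 lines so -replace works on arrays
Get-ChildItem c:\myfiles\*.txt -PipelineVariable file |
    Get-Content -ReadCount 1000 | fix-file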

If you want to use the mask method, you can generate $regex and $replace from that:

$mask = '-...----..------.....---.-'

$regex = [regex]($mask -replace '(-+)','($1)').replace('-','.')

$replace =
 ([char[]]($mask -replace '-+','-') |
  foreach {$i=1}{if ($_ -eq '.'){$_} else {'$'+$i++}} {}) -join ''

$regex.ToString()
$replace

(.)...(....)..(......).....(...).(.)
$1...$2..$3.....$4.$5

2

Here's another approach:

C:\PS> $mask ="-...----..------.....---.-"
C:\PS> ([char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ' | % {$i=0}{if ($mask[$i++] -eq '-') {$_} else {'.'}}) -join ''

A...EFGH..KLMNOP.....VWX.Z

And if we are going to take advantage of V4 features :-), try this:

C:\PS> $i=0;([char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ').Foreach({if ($mask[$i++] -eq '-') {$_} else {'.'}}) -join ''
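Since the mask is the same for every row, the masked positions can be worked out once and then written straight into a character array for each row. The following is a minimal sketch of that idea, not a benchmarked solution; the function name Hide-Position and the variable $maskedIndexes are made up for illustration.

```powershell
$mask = '-...----..------.....---.-'

# Zero-based indexes of the positions to overwrite, computed once.
$maskedIndexes = 0..($mask.Length - 1) | Where-Object { $mask[$_] -eq '.' }

# Hypothetical helper: overwrite the masked positions of one row.
function Hide-Position {
    param([string]$Text)

    $chars = $Text.ToCharArray()
    foreach ($i in $maskedIndexes) {
        # Skip mask positions that fall past the end of a short row.
        if ($i -lt $chars.Length) { $chars[$i] = '.' }
    }
    -join $chars
}

Hide-Position 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
Hide-Position '12345678901234567890123456'
```

With the sample rows this should print A...EFGH..KLMNOP.....VWX.Z and 1...5678..123456.....234.6.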

3 Comments

Good lord that seems overly complicated. By the way, you can also create a char array containing A to Z with the following: [char[]]([char]'A'..[char]'Z')
Really liked the explicit mask. Made a variation on this using string formatting.
Yeah I didn't think it was that complicated. A simple loop over each character and test for mask or not seems more like CS 101. :-)
2

Here's yet another approach:

C:\PS> $mask = "{0}...{4}{5}{6}{7}..{10}{11}{12}{13}{14}{15}.....{21}{22}{23}.{25}"
C:\PS> $singlecharstrings = [string[]][char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
C:\PS> $mask -f $singlecharstrings

A...EFGH..KLMNOP.....VWX.Z
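Typing the format string by hand gets error-prone for long rows. As a sketch, it could instead be generated from the same dash-and-period mask used in the other answers; the variable names here are illustrative only.

```powershell
$mask = '-...----..------.....---.-'

# Emit a {n} placeholder for each kept position and a literal period otherwise.
$format = -join $(for ($i = 0; $i -lt $mask.Length; $i++) {
    if ($mask[$i] -eq '-') { "{$i}" } else { '.' }
})

$singlecharstrings = [string[]][char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
$result = $format -f $singlecharstrings

$format
$result
```

For the sample mask this builds the same format string as above, so the masked row comes out as A...EFGH..KLMNOP.....VWX.Z.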
