Revisions to Extract variables from text file into array with Bash, Perl and Regex

Provided a breakdown of the code.

edited Aug 26, 2017 at 1:45

My Perl knowledge is thin but since no one else has provided a Perl answer I'll give it a go.

Pass your data in as a file and it will print tab-separated lines with three values per line:

perl -e 'while (<>) { $s .= $_; } chomp $s; @arr = split(/\n{2,}/, $s); foreach my $a(@arr) { $a =~ s/Filename: ([^\n]*)\nType: ([^\n]*)\nSize: ([^\n]*)\n.*/$1\t$2\t$3\n/ || next; print "$a"; } ' infile

Result:

XXXXX   XXX     XXXX
YYYYY   YYY     YYYY

It's a bit brute-force, but it works by splitting the input up into paragraphs/blocks and then applying your multi-line regex to each one.

Details...

while (<>) { $s .= $_; } - Slurp the input into a single string.

chomp $s - Remove trailing newline from the string.

@arr = split(/\n{2,}/, $s) - Split the string on runs of two or more newlines (i.e. on one or more empty lines). This breaks it up into paragraphs/blocks. Store the blocks in an array.

foreach my $a(@arr) - Loop over each array element (block). The next two lines of code are applied to each block.

$a =~ s/Filename: ([^\n]*)\nType: ([^\n]*)\nSize: ([^\n]*)\n.*/$1\t$2\t$3\n/ || next - Extract values from the three fields of interest. If no substitution occurs (meaning the regex doesn't match because, for example, a value is missing) then skip this block and move to the next one.

print "$a" - Print the result of the substitution: the three values separated by tabs.

Again, I don't use much Perl, so there are probably more elegant solutions than this.

Trimmed off single trailing newline.

edited Aug 25, 2017 at 11:39

B Layer

My Perl knowledge is thin but since no one else has provided a Perl answer I'll give it a go.

Pass your data in as a file and it will print tab-separated lines with three values per line:

perl -e 'while (<>) { $s .= $_; } chomp $s; @arr = split(/\n{2,}/, $s); foreach my $a(@arr) { $a =~ s/Filename: ([^\n]*)\nType: ([^\n]*)\nSize: ([^\n]*)\n.*/$1\t$2\t$3\n/ || next; print "$a"; } ' infile

Result:

XXXXX   XXX     XXXX
YYYYY   YYY     YYYY

It's a brute-force beast that splits the input up into "paragraphs" or blocks and then applies your multi-line regex to each. (Each block must be separated by an empty line.) If that regex doesn't match then the block is skipped (i.e. if a value is missing after Filename, Size, or Type).

Let me re-emphasize that I don't use much Perl so there are surely more elegant solutions than this.

Post Undeleted by B Layer

occurred Aug 25, 2017 at 8:43

deleted 121 characters in body

edited Aug 25, 2017 at 8:43

B Layer

My Perl knowledge is thin but since no one else has provided a Perl answer I'll give it a go.

Pass your data in as a file and it will print tab-separated lines with three values per line:

perl  -ne 'if (/^(Type|Size|Filename):/) { chomp; s/.*: *(.*)/$1\t/; print; } elsif (/^$/) { print "\n"; }' infile

Result:

XXXXX   XXX     XXXX
YYYYY   YYY     YYYY

Some notes...

Rather than using a multi-line regex I'm going with single line processing. Much easier to work out, IMO.

Use -n rather than -p. These are similar in that a per-line loop is put around the input but the latter prints every line while the former only prints what you tell it to.

Each output line will have a trailing tab. It shouldn't be too hard to fix this...I'll leave it to you.

This depends on an empty line separating each block.

Let me re-emphasize that I don't use much Perl so there may very well be more elegant solutions than this.

My Perl knowledge is thin but since no one else has provided a Perl answer I'll give it a go.

Pass your data in as a file and it will print tab-separated lines with three values per line:

perl -e 'while (<>) { $s .= $_; } @arr = split(/\n{2,}/, $s); foreach my $a(@arr) { $a =~ s/Filename: ([^\n]*)\nType: ([^\n]*)\nSize: ([^\n]*)\n.*/$1\t$2\t$3\n/ || next; print "$a"; } ' infile

Result:

XXXXX   XXX     XXXX
YYYYY   YYY     YYYY

It's a brute-force beast that splits the input up into "paragraphs" or blocks and then applies your multi-line regex to each. (Each block must be separated by an empty line.) If that regex doesn't match then the block is skipped (i.e. if a value is missing after Filename, Size, or Type).

Let me re-emphasize that I don't use much Perl so there are surely more elegant solutions than this.

Post Deleted by B Layer

occurred Aug 25, 2017 at 8:05

Added explanatory notes.

edited Aug 25, 2017 at 7:41

B Layer

answered Aug 25, 2017 at 7:22

B Layer
