
I have a pipe-delimited file:

d1000|1000
d1001|100
d1002|10
d1003|1
d1004|
d1005|

I want to replace $2 with a sequential number (1101, 1102, ...) when it has fewer than 4 digits, and leave empty fields as they are.

so I'm trying to do it with an awk script:

BEGIN { FS="|"; OFS="\t" }

{
n=1100
{ if (length($2)!=4 && length($2)>0) {$2=++n}};

print $1, $2
}

but it prints the same number over and over:

d1000   1000
d1001   1101
d1002   1101
d1003   1101
d1004
d1005

whereas the desired output is:

d1000   1000
d1001   1101
d1002   1102
d1003   1103
d1004
d1005

EDIT: here is the above code formatted legibly by gawk -o-:

BEGIN {
        FS = "|"
        OFS = "\t"
}

{
        n = 1100
        if (length($2) != 4 && length($2) > 0) {
                $2 = ++n
        }
        print $1, $2
}
    It's important to format your code properly in any programming language for your own sake and for the sake of anyone else trying to understand it. I edited your question to show one way you could format awk code legibly.
    – Ed Morton
    Commented May 18, 2022 at 13:12

2 Answers


The error becomes obvious with more consistent indentation:

BEGIN { FS="|"; OFS="\t" }
{
  n=1100
  {
    if (length($2)!=4 && length($2)>0) {
      $2=++n
    }
  };
  print $1, $2
}

Everything inside the outer braces runs unconditionally for each record, so n is reset to 1100 on every line and ++n always yields 1101.

You should move the initialization of n to the BEGIN block:

BEGIN { FS="|"; OFS="\t"; n=1100 }
{
  {
    if (length($2)!=4 && length($2)>0) {
      $2=++n
    }
  };
  print $1, $2
}
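To see the difference in isolation, here is a minimal sketch (assuming a POSIX shell and awk; `per_record_init` and `begin_init` are hypothetical helper names) that contrasts assigning n in the main block with assigning it once in BEGIN:

```shell
# Hypothetical helper names, for illustration only.
# n is assigned in the main block: it goes back to 1100 on every record.
per_record_init() { awk '{ n = 1100; print ++n }'; }

# n is assigned once in BEGIN: the increments carry over between records.
begin_init() { awk 'BEGIN { n = 1100 } { print ++n }'; }

printf 'x\ny\nz\n' | per_record_init   # prints 1101 three times
printf 'x\ny\nz\n' | begin_init        # prints 1101, 1102, 1103
```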

or, more idiomatically:

BEGIN { FS="|"; OFS="\t"; n=1100 }
(length($2)!=4 && length($2)>0) {
  $2=++n
}
{
  print $1, $2
}
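As a quick check, the idiomatic version can be run against the sample data from the question. This is a sketch assuming a POSIX shell and awk; `renumber_short` is a hypothetical wrapper name, and the input is fed with printf instead of read from a file:

```shell
# renumber_short: hypothetical wrapper around the idiomatic script above
renumber_short() {
  awk 'BEGIN { FS = "|"; OFS = "\t"; n = 1100 }
       length($2) != 4 && length($2) > 0 { $2 = ++n }
       { print $1, $2 }'
}

printf 'd1000|1000\nd1001|100\nd1002|10\nd1003|1\nd1004|\nd1005|\n' | renumber_short
```

Note that print $1, $2 still emits OFS when $2 is empty, so the d1004 and d1005 lines actually end in a trailing tab.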

I propose this solution:

$ awk -F'|' -v OFS='\t' '$2 ~ /^[0-9]{1,3}$/ { $2 = 1100 +(++c) } { print $1,$2 }' file 
d1000   1000
d1001   1101
d1002   1102
d1003   1103
d1004
d1005
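The `{1,3}` interval expression is POSIX ERE syntax, but some older awks do not recognise it (for example mawk 1.3.3, or gawk before 4.0 unless run with `--re-interval`). A hedged sketch of the same idea with the interval spelled out, assuming a POSIX shell; `renumber_portable` is a hypothetical helper name:

```shell
# renumber_portable: hypothetical helper; matches a second field of
# one to three digits without relying on {1,3} interval support
renumber_portable() {
  awk -F'|' -v OFS='\t' '
    $2 ~ /^[0-9][0-9]?[0-9]?$/ { $2 = 1100 + (++c) }
    { print $1, $2 }'
}

printf 'd1000|1000\nd1001|100\nd1002|10\nd1003|1\nd1004|\nd1005|\n' | renumber_portable
```

One behavioral difference from the length-based answer: length($2) != 4 && length($2) > 0 also renumbers values such as 12345 or abc, whereas the regex only touches fields of one to three digits.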