I have a fastq file. I will explain what it is. It is something like this
@SRR1024120.7 DBRHHJN1:259:D0PM7ACXX:1:1101:1386:1189 length=100
GATACAGGATGCCTGGGTCTAGGCTGTGTGACCTTGGGCCAGTTCCTCTC
+SRR1024120.7 DBRHHJN1:259:D0PM7ACXX:1:1101:1386:1189 length=100
DDDFFDDBGFEHEHGIGC9F>HG9EH8?DF4?:DF<?3:D?DHIGGDDFH
@SRR1024120.25 DBRHHJN1:259:D0PM7ACXX:1:1101:1752:1149 length=100
CTGCTGCTCATGCTCAT
+SRR1024120.25 DBRHHJN1:259:D0PM7ACXX:1:1101:1752:1149 length=100
BDDDDD<<CC:C+AFFE
@SRR1024120.42 DBRHHJN1:259:D0PM7ACXX:1:1101:2482:1096 length=100
AGCGTGTGCCACCCTACGCCGGC
+SRR1024120.42 DBRHHJN1:259:D0PM7ACXX:1:1101:2482:1096 length=100
DD>DAA@AA@@?2C8AB)?@:DD
@SRR1024120.1 DBRHHJN1:259:D0PM7ACXX:1:1101:1200:1120 length=100
AGACAGAAGGGGAGTACAGCTCTCTGGAACATGAGAGTGCAAGGGGTTGAGTGTTT
+SRR1024120.1 DBRHHJN1:259:D0PM7ACXX:1:1101:1200:1120 length=100
DDDFFFCFGEHI@CGFADFGCCFFGHFGCFFFHGGDGHIFHDFGGI<BF=DHIHHH
Now 4 lines correspond to 1 read so
@SRR1024120.7 DBRHHJN1:259:D0PM7ACXX:1:1101:1386:1189 length=100
GATACAGGATGCCTGGGTCTAGGCTGTGTGACCTTGGGCCAGTTCCTCTC
+SRR1024120.7 DBRHHJN1:259:D0PM7ACXX:1:1101:1386:1189 length=100
DDDFFDDBGFEHEHGIGC9F>HG9EH8?DF4?:DF<?3:D?DHIGGDDFH
corresponds to 1 read which is GATACAGGATGCCTGGGTCTAGGCTGTGTGACCTTGGGCCAGTTCCTCTC
I showed you the fastq file above. What i want to do is I want to extract only those reads in which length of the read seq is <= 25, So my output should be
@SRR1024120.25 DBRHHJN1:259:D0PM7ACXX:1:1101:1752:1149 length=100
CTGCTGCTCATGCTCAT
+SRR1024120.25 DBRHHJN1:259:D0PM7ACXX:1:1101:1752:1149 length=100
BDDDDD<<CC:C+AFFE
@SRR1024120.42 DBRHHJN1:259:D0PM7ACXX:1:1101:2482:1096 length=100
AGCGTGTGCCACCCTACGCCGGC
+SRR1024120.42 DBRHHJN1:259:D0PM7ACXX:1:1101:2482:1096 length=100
DD>DAA@AA@@?2C8AB)?@:DD
I want to use awk for this purpose.
I tried something like this
awk 'NR % 2 == 0 {if(length($1) <= 25) print $0}; NR % 2 == 1' test.fastq
BUT this prints something like this
@SRR1024120.7 DBRHHJN1:259:D0PM7ACXX:1:1101:1386:1189 length=100
+SRR1024120.7 DBRHHJN1:259:D0PM7ACXX:1:1101:1386:1189 length=100
@SRR1024120.25 DBRHHJN1:259:D0PM7ACXX:1:1101:1752:1149 length=100
CTGCTGCTCATGCTCAT
+SRR1024120.25 DBRHHJN1:259:D0PM7ACXX:1:1101:1752:1149 length=100
BDDDDD<<CC:C+AFFE
@SRR1024120.42 DBRHHJN1:259:D0PM7ACXX:1:1101:2482:1096 length=100
AGCGTGTGCCACCCTACGCCGGC
+SRR1024120.42 DBRHHJN1:259:D0PM7ACXX:1:1101:2482:1096 length=100
DD>DAA@AA@@?2C8AB)?@:DD
@SRR1024120.1 DBRHHJN1:259:D0PM7ACXX:1:1101:1200:1120 length=100
+SRR1024120.1 DBRHHJN1:259:D0PM7ACXX:1:1101:1200:1120 length=100
Clearly I don't want
@SRR1024120.7 DBRHHJN1:259:D0PM7ACXX:1:1101:1386:1189 length=100
+SRR1024120.7 DBRHHJN1:259:D0PM7ACXX:1:1101:1386:1189 length=100
@SRR1024120.1 DBRHHJN1:259:D0PM7ACXX:1:1101:1200:1120 length=100
+SRR1024120.1 DBRHHJN1:259:D0PM7ACXX:1:1101:1200:1120 length=100
in my output.
Any help would be appreciated
Thanks