I need to use grep and awk in order to match two types of patterns but I cannot figure out the syntax.
My file has values such as:
sample1,gicode1,123,4541,221,3661,Sodalis sp.1
sample2,gicode1,123,0322,12,112342,Sodalis sp.2
sample3,gicode1,112,4541,00,2342,Candidatus sp.
sample4,gicode1,2342,4541,00,9606,Homo sapiens
I need to grab the count of lines that have Sodalis. This can be in the name (so the 7th column) or based on the taxid, since sometimes the naming that comes up is not accurate. The ID is the 6th column.
My issue is that sometimes the IDs in the 6th column can match values in other columns which are not IDs. If I want the Sodalis species with the ID 2342, it shows up properly in sample 3, but 2342 is also the scoring value in sample 4 (3rd column).
I can grab the ID in the proper column using awk -F, '$6==2342' or simply the name using grep 'Sodalis', but I am having an issue combining both, as in the following:
cat myfile.txt | grep "Sodalis" OR awk -F, '$6==2342' | wc -l
The return should be 3, but I get either 2 (for grep) or just 1 (for awk). I've tried many variations of this with || or &, even:
cat myfile.txt | grep "Sodalis" || cat myfile.txt | awk -F, '$6==2342'
But it gives the answer 1.
I know with grep I can also use grep -E 'Sodalis|2342', but this unfortunately returns 4 because the second pattern also matches sample 4, where the scoring value happens to be 2342. Is there a way to grep a value based on a certain column? I need the full line to also appear because I want to save those results as a separate file called Sodalis.txt.
"I need to use grep and awk" - no, you never need grep when you're using awk.
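
A minimal sketch of doing both tests in a single awk call, assuming the layout described in the question (comma-separated, taxid in the 6th field, name in the 7th). The sample file is recreated here only so the example is self-contained, with sample 4's third column set to 2342 as the question's prose describes; the file names myfile.txt and Sodalis.txt are the ones the question uses.

```shell
# Recreate the sample data from the question (illustrative copy only).
cat > myfile.txt <<'EOF'
sample1,gicode1,123,4541,221,3661,Sodalis sp.1
sample2,gicode1,123,0322,12,112342,Sodalis sp.2
sample3,gicode1,112,4541,00,2342,Candidatus sp.
sample4,gicode1,2342,4541,00,9606,Homo sapiens
EOF

# Keep a line if field 6 is exactly 2342 OR field 7 contains "Sodalis".
# Both tests are tied to a specific column, so the 2342 in sample4's
# 3rd field and the 112342 in sample2's 6th field are not matched by the ID test.
awk -F, '$6 == 2342 || $7 ~ /Sodalis/' myfile.txt > Sodalis.txt

# Count the saved lines (awk's NR at END is the number of lines read).
count=$(awk 'END { print NR }' Sodalis.txt)
echo "$count"
```

With this sample data, Sodalis.txt should hold the full lines for sample1, sample2 and sample3. The `||` here is awk's own logical OR inside one program, which is different from the shell's `||` between two pipelines: the shell operator only runs the second command if the first one fails, so it cannot merge the two sets of matches.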