I want to check that each .bam file is accompanied by a .bai file. So if clean_xyz_1.sorted.bam is present, clean_xyz_1.sorted.bam.bai should also be present. Each file name has a variable string in the middle (xyz). I want to check multiple folders to make sure both files are present. If the .bai file is missing, I want to run a command. However, I have not been able to check for both files in multiple directories. Here is what I have tried:

dirs=(*/)
clean="clean_"
sorted="_1.sorted.bam"

for i in "$dirs"/"$clean"*"$sorted"*; do
  if [[ ! -e "$i".bai ]]; then
  samtools index "$i"
  fi
done

The command works fine and creates a '.bai' file. However, it only opens the first directory. Is there a way to expand all directories?

  • "if 1.bam is present 1.bam.bai should also be present" - your example code appears only to consider .bam files matching the pattern */clean_*_1.bam but the description only refers to *.bam files. Which is correct? Commented Jul 6, 2023 at 13:27

3 Answers

dirs=(*/)

This creates an array of the directories in the current working directory. To iterate over an array, use "${dirs[@]}"; a plain $dirs expands to only the first element.
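The difference between the two expansions can be seen in a scratch directory. This is a minimal bash sketch; the directory names a, b, c and the variable names first_only and all are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Hypothetical demo tree: three empty directories in a temporary location.
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b" "$demo/c"
cd "$demo" || exit 1

dirs=(*/)

# $dirs is shorthand for ${dirs[0]}, so this captures one element only.
first_only=("$dirs")

# "${dirs[@]}" expands to every element, each as a separate word.
all=("${dirs[@]}")

printf 'first only: %s\n' "${first_only[@]}"
printf 'all: %s\n' "${all[@]}"
```

Here first_only ends up holding just a/, while all holds a/, b/ and c/.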

"$dirs"/"$clean"*"$sorted"*

Because of the trailing *, this would also match the .bai files, which is probably unwanted. Hence I suggest using */"$clean"*"$sorted" as the glob of the for loop.

Thus I propose this change:

shopt -s nullglob
clean="clean_"
sorted="_1.sorted.bam"

for i in */"$clean"*"$sorted"; do
  if [[ ! -e "$i".bai ]]; then
    samtools index "$i"
  fi
done
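To preview what the loop above would index without invoking samtools, a small helper can print only the BAM files that lack an index. This is a sketch, not part of the original answer: the function name list_unindexed is hypothetical, and the file pattern is the one from the question.

```bash
#!/usr/bin/env bash
shopt -s nullglob   # an unmatched glob expands to nothing instead of itself

# list_unindexed DIR...
# Print each clean_*_1.sorted.bam under the given directories that has
# no matching .bam.bai file next to it.
list_unindexed() {
  local dir bam
  for dir in "$@"; do
    # ${dir%/} drops a trailing slash so paths do not contain "//".
    for bam in "${dir%/}"/clean_*_1.sorted.bam; do
      [[ -e "$bam.bai" ]] || printf '%s\n' "$bam"
    done
  done
}

# Example call from the parent directory:
#   list_unindexed */
```

Once the printed list looks right, the same paths can be passed to samtools index.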

With zsh, you'd do:

dirs=( *(N/) )
prefix=clean_
suffix=_1.sorted.bam

for file ( $^dirs/$prefix*$suffix(N) )
  [[ -e $file.bai ]] || samtools index $file

Search through dir1, dir2, ... for all .bam files, printing the names of .bai files that should exist but are missing:

find dir1 dir2 ... -type f -name '*.bam' -print |
while IFS= read -r name; do
  # The index sits next to the BAM file as NAME.bam.bai, as in the question.
  bai=$name.bai
  [ -f "$bai" ] || printf 'missing %s\n' "$bai"
done

This assumes you don't have paths with newlines in them, so find outputs one complete .bam pathname per line.
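If newlines in path names are a concern, one alternative is to let find hand the pathnames directly to a small inline sh script instead of piping them through read. This is a sketch under the assumption that the index is named NAME.bam.bai, as in the question; the function name report_missing_bai is hypothetical.

```sh
# report_missing_bai DIR...
# For every *.bam file under the given directories, print the name of the
# .bam.bai file that should exist next to it but does not.
report_missing_bai() {
  find "$@" -type f -name '*.bam' -exec sh -c '
    for bam do
      [ -f "$bam.bai" ] || printf "missing %s\n" "$bam.bai"
    done' sh {} +
}

# Example call:
#   report_missing_bai dir1 dir2
```

Because the pathnames are passed as arguments rather than parsed from a text stream, unusual characters in file names are not split or altered.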
