
Bash and shell programming are new to me. I have a few files with the extension .v.gz; in a bash command I perform some operations on them and store the result in a file with the same name but a .txt extension.

For the example .txt file data shown below, I am considering 4 files with different file names and the same extension (there may also be 30+ files).

file_one.txt

statement_modeule_name_1 
statement_modeule_name_2
statement_modeule_name_3
statement_modeule_name_4
statement_modeule_name_5

Fetch_Data.txt

statement_modeule_name_6
statement_modeule_name_7
statement_modeule_name_2
statement_modeule_name_8
statement_modeule_name_9

onefile.txt

statement_modeule_name_10
statement_modeule_name_11
statement_modeule_name_6
statement_modeule_name_4
statement_modeule_name_14

Data_New.txt

statement_modeule_name_15
statement_modeule_name_16
statement_modeule_name_11
statement_modeule_name_5
statement_modeule_name_17

The expected output at the command prompt:

file_one and Fetch_Data   statement_modeule_name_2

Fetch_Data and onefile   statement_modeule_name_6

file_one and onefile   statement_modeule_name_4

file_one and Data_New   statement_modeule_name_5

onefile and Data_New   statement_modeule_name_11

The code I have so far is:

for file in *.v.gz; do
  zgrep -A1 "^module" "$file" | sed -n -e 's/^\(module \)*\(.*(.*)\).*$/\2/p' | cut -f1 -d"(" > "$(basename "$file" .v.gz).txt"
done   # this produces the example .txt files with the data shown above

Can anyone help me complete this? I am okay with a Python or bash script (for bash, the python extension would need to be removed).

  • In the first phase, I am generating multiple output files in .txt format, as shown above.
  • Now I want to compare those .txt files line by line and report the lines that appear in more than one file, together with the file names, as shown in the expected output.

3 Answers

$ FILES=( $(find -maxdepth 1 -type f -printf "%P\n") )

$ cat ${FILES[@]} | 
sort |
uniq -d |
xargs -r -d '\n' -I{} bash -c '
  echo $(sed "s/ / and /g" <<<$(grep -xl "{}" '"${FILES[*]}"')), {}'

The result:

file3.txt and file4.txt, modeule_name_11
file1.txt and file2.txt, modeule_name_2
file1.txt and file3.txt, modeule_name_4
file1.txt and file3.txt and file4.txt, modeule_name_5
file2.txt and file3.txt, modeule_name_6

Explanation:

  • FILES=( $(find -maxdepth 1 -type f -printf "%P\n") ) - $FILES would be an array holding the list of files.
  • cat ${FILES[@]} - print the content of the files.
  • sort | uniq -d - only show repeated lines (i.e., lines that appear in more than one file), since there's no point in checking lines that we know don't appear in other files.
  • xargs -r -d '\n' -I{} bash -c ' - for each line, perform the following script. The delimiter is a newline, so lines with special characters are supported. {} is replaced with the line we're looking for in the files.
  • grep -xl "{}" '"${FILES[*]}"' - for each line print the files (-l) that match the entire line (-x).
  • sed "s/ / and /g" <<<$(grep ... )) - replace the spaces between the matched files with " and ".
  • echo $(...), {} - print the list of matching files followed by the matching line ({}).
  • The folder contains not only .txt files but also files with other extensions. How can I specify the file types? It is currently comparing all the files in the folder. Commented Nov 17, 2022 at 12:40
  • After the above-mentioned code, I pasted your code. Is it correct? => for file in *.v.gz; do zgrep -A1 "^module" "$file" | sed -n -e 's/^\(module \)*\(.*(.*)\).*$/\2/p' | cut -f1 -d"(" > $(basename "$file" .v.gz).txt; done; FILES=( $(find -maxdepth 1 -type f -printf "%P\n") ); cat ${FILES[@]} | sort | uniq -d | xargs -r -d '\n' -I{} bash -c 'echo $(sed "s/ / and /g" <<<$(grep -xl "{}" '"${FILES[*]}"')), {}' Commented Nov 17, 2022 at 12:42
  • @SanthoshNayakD., if you want to check only .txt files, add -name "*.txt" after the -type f in the find command (sketched below). Regarding your second comment, you'll have to tell me if it works; you didn't change anything in my command, so I don't know whether it worked for you. By the way, next time you reply to a comment, you need to mention the person you're replying to, for instance: @aviro. That way the person you're replying to will get a notification.
    – aviro
    Commented Nov 17, 2022 at 12:57
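
Applied to the first command in this answer, that suggestion would look like this (only the find invocation changes):

FILES=( $(find -maxdepth 1 -type f -name "*.txt" -printf "%P\n") )   # compare only the generated .txt files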

Concatenate all data into one stream, but prefix each line by the filename. Assuming you have no tab characters in the data, we may use a tab character as the delimiter between the filename and the original data. Then group the data by the second tab-delimited field and collapse the filenames into a comma-delimited list for each group.

awk -v OFS='\t' '{ print FILENAME, $0 }' *.txt |
datamash --sort groupby 2 collapse 1

Output given the data in the question (the order of the fields may be reversed by passing it through e.g. datamash cut 2,1):

statement_modeule_name_1        file_one.txt
statement_modeule_name_10       onefile.txt
statement_modeule_name_11       Data_New.txt,onefile.txt
statement_modeule_name_14       onefile.txt
statement_modeule_name_15       Data_New.txt
statement_modeule_name_16       Data_New.txt
statement_modeule_name_17       Data_New.txt
statement_modeule_name_2        Fetch_Data.txt,file_one.txt
statement_modeule_name_3        file_one.txt
statement_modeule_name_4        file_one.txt,onefile.txt
statement_modeule_name_5        Data_New.txt,file_one.txt
statement_modeule_name_6        Fetch_Data.txt,onefile.txt
statement_modeule_name_7        Fetch_Data.txt
statement_modeule_name_8        Fetch_Data.txt
statement_modeule_name_9        Fetch_Data.txt
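
Since the expected output in the question lists only the lines that occur in more than one file, the grouped result can be filtered down to those rows by keeping the entries whose collapsed file list contains a comma. A minimal sketch on top of the command above:

awk -v OFS='\t' '{ print FILENAME, $0 }' *.txt |
datamash --sort groupby 2 collapse 1 |
awk -F'\t' 'index($2, ",")'   # keep only lines found in more than one file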

Alternatively, use Miller (mlr) in place of GNU datamash:

awk -v OFS='\t' '{ print FILENAME, $0 }' *.txt | 
mlr --tsv -N nest --ivar , -f 1

Output given the data in the question:

Data_New.txt    statement_modeule_name_15
Data_New.txt    statement_modeule_name_16
Data_New.txt,onefile.txt        statement_modeule_name_11
Data_New.txt,file_one.txt       statement_modeule_name_5
Data_New.txt    statement_modeule_name_17
Fetch_Data.txt,onefile.txt      statement_modeule_name_6
Fetch_Data.txt  statement_modeule_name_7
Fetch_Data.txt,file_one.txt     statement_modeule_name_2
Fetch_Data.txt  statement_modeule_name_8
Fetch_Data.txt  statement_modeule_name_9
file_one.txt    statement_modeule_name_1
file_one.txt    statement_modeule_name_3
file_one.txt,onefile.txt        statement_modeule_name_4
onefile.txt     statement_modeule_name_10
onefile.txt     statement_modeule_name_14
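
If neither datamash nor mlr is available, the same grouping can be done in plain awk. The following is only a sketch, not part of the original answer; it assumes each line occurs at most once per file, and its output order is unspecified (pipe it through sort if a stable order is needed):

awk '
  # for every distinct line, collect a comma-separated list of the files it occurs in
  { files[$0] = ($0 in files) ? files[$0] "," FILENAME : FILENAME }
  # print "file list <TAB> line" for each distinct line
  END { for (line in files) print files[line] "\t" line }
' *.txt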

comm is your friend. For two files:

$ comm -12 <(sort file_one.txt) <(sort Fetch_Data.txt)
statement_modeule_name_2

For all the txt files in the current directory:

for FILE1 in *.txt; do
  for FILE2 in *.txt; do
    [ "$FILE1" == "$FILE2" ] && continue
    echo "$FILE1  $FILE2  $(comm -12 <(sort "$FILE1") <(sort "$FILE2"))"
  done
done

PS: this solution is a bit redundant, because it will compare file1 with file2 and later file2 with file1 again...

output with your data:

Data_New.txt  Fetch_Data.txt  
Data_New.txt  file_one.txt  statement_modeule_name_5
Data_New.txt  onefile.txt  statement_modeule_name_11
Fetch_Data.txt  Data_New.txt  
Fetch_Data.txt  file_one.txt  statement_modeule_name_2
Fetch_Data.txt  onefile.txt  statement_modeule_name_6
file_one.txt  Data_New.txt  statement_modeule_name_5
file_one.txt  Fetch_Data.txt  statement_modeule_name_2
file_one.txt  onefile.txt  statement_modeule_name_4
onefile.txt  Data_New.txt  statement_modeule_name_11
onefile.txt  Fetch_Data.txt  statement_modeule_name_6
onefile.txt  file_one.txt  statement_modeule_name_4

More about comm in another topic here: Common lines between two files

  • set -- *.txt; for file1; do shift; for file2; do ...dostuff...; done; done (to get rid of duplicated comparisons; see the expanded sketch below). But note that there's nothing stopping a statement_modeule_name thing from being in more than two files.
    – Kusalananda
    Commented Nov 17, 2022 at 13:46
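
Expanded, the approach from that comment could look like this sketch (it relies on bash for the process substitutions and, as an extra assumption, prints only pairs that actually share lines):

set -- *.txt
for file1; do
  shift                     # drop the current file so each pair is compared only once
  for file2; do
    common=$(comm -12 <(sort "$file1") <(sort "$file2"))
    [ -n "$common" ] && echo "$file1  $file2  $common"
  done
done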
