
Bash and shell programming are new to me. I have a few files with the extension .v.gz; in a bash command I perform some operations on them and store the result in a file with the same name but a .txt extension.

For the example .txt file data shown below, I am considering 4 files with different file names and the same extension (there may also be 30+ files).

file_one.txt

statement_modeule_name_1 
statement_modeule_name_2
statement_modeule_name_3
statement_modeule_name_4
statement_modeule_name_5

Fetch_Data.txt

statement_modeule_name_6
statement_modeule_name_7
statement_modeule_name_2
statement_modeule_name_8
statement_modeule_name_9

onefile.txt

statement_modeule_name_10
statement_modeule_name_11
statement_modeule_name_6
statement_modeule_name_4
statement_modeule_name_14

Data_New.txt

statement_modeule_name_15
statement_modeule_name_16
statement_modeule_name_11
statement_modeule_name_5
statement_modeule_name_17

The expected output at the command prompt:

file_one and Fetch_Data   statement_modeule_name_2

Fetch_Data and onefile   statement_modeule_name_6

file_one and onefile   statement_modeule_name_4

file_one and Data_New   statement_modeule_name_5

onefile and Data_New   statement_modeule_name_11

The code I have so far is:

for file in *.v.gz; do
  zgrep -A1 "^module" "$file" | sed -n -e 's/^\(module \)*\(.*(.*)\).*$/\2/p' | cut -f1 -d"(" > "$(basename "$file" .v.gz).txt"
done   # this produces the example .txt files with the data shown above

Can anyone help me complete this? I am okay with a Python or bash script (for bash, the python extension would need to be removed).

  • In the first phase, I am generating multiple output files in .txt format, as shown above.
  • Now I want to compare those .txt files line by line and report the lines that appear in more than one file, together with the file names, as shown in the expected output.

3 Answers

$ FILES=( $(find -maxdepth 1 -type f -printf "%P\n") )

$ cat ${FILES[@]} | 
sort |
uniq -d |
xargs -r -d '\n' -I{} bash -c '
  echo $(sed "s/ / and /g" <<<$(grep -xl "{}" '"${FILES[*]}"')), {}'

The result:

file3.txt and file4.txt, modeule_name_11
file1.txt and file2.txt, modeule_name_2
file1.txt and file3.txt, modeule_name_4
file1.txt and file3.txt and file4.txt, modeule_name_5
file2.txt and file3.txt, modeule_name_6

Explanation:

  • FILES=( $(find -maxdepth 1 -type f -printf "%P\n") ) - $FILES would be an array holding the list of files.
  • cat ${FILES[@]} - print the content of the files.
  • sort | uniq -d - only show repeated lines (i.e., lines that appear in more than one file), since there's no point in checking lines that we know don't appear in other files.
  • xargs -r -d '\n' -I{} bash -c ' - for each line, perform the following script. The delimiter is a newline, so lines with special characters are supported. {} is replaced with the line we're looking for in the files.
  • grep -xl "{}" '"${FILES[*]}"' - for each line print the files (-l) that match the entire line (-x).
  • sed "s/ / and /g" <<<$(grep ... )) - replace the spaces between the matched files with " and ".
  • echo $(...), {} - print the list of matching files followed by the matching line ({}).
  • The folder contains not only .txt files but also files with other extensions. How can I specify the file types? It is currently comparing all the files in the folder. Commented Nov 17, 2022 at 12:40
  • After the above-mentioned code, I pasted your code. Is it correct? => for file in *.v.gz; do zgrep -A1 "^module" "$file" | sed -n -e 's/^\(module \)*\(.*(.*)\).*$/\2/p' | cut -f1 -d"(" > $(basename "$file" .v.gz).txt; done; FILES=( $(find -maxdepth 1 -type f -printf "%P\n") ); cat ${FILES[@]} | sort | uniq -d | xargs -r -d '\n' -I{} bash -c 'echo $(sed "s/ / and /g" <<<$(grep -xl "{}" '"${FILES[*]}"')), {}' Commented Nov 17, 2022 at 12:42
  • @SanthoshNayakD., if you want to check only .txt files, add -name "*.txt" after the -type f in the find command (sketched below). Regarding your second comment, you'll have to tell me if it works; you didn't change anything in my command, so I don't know whether it worked for you. By the way, next time you reply to a comment, you need to mention the person you're replying to, for instance: @aviro. That way the person you're replying to will get a notification.
    – aviro
    Commented Nov 17, 2022 at 12:57
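
Applied to the first command in this answer, that suggestion would look like this (only the find invocation changes):

FILES=( $(find -maxdepth 1 -type f -name "*.txt" -printf "%P\n") )   # compare only the generated .txt files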

Concatenate all data into one stream, but prefix each line by the filename. Assuming you have no tab characters in the data, we may use a tab character as the delimiter between the filename and the original data. Then group the data by the second tab-delimited field and collapse the filenames into a comma-delimited list for each group.

awk -v OFS='\t' '{ print FILENAME, $0 }' *.txt |
datamash --sort groupby 2 collapse 1

Output given the data in the question (the order of the fields may be reversed by passing it through e.g. datamash cut 2,1):

statement_modeule_name_1        file_one.txt
statement_modeule_name_10       onefile.txt
statement_modeule_name_11       Data_New.txt,onefile.txt
statement_modeule_name_14       onefile.txt
statement_modeule_name_15       Data_New.txt
statement_modeule_name_16       Data_New.txt
statement_modeule_name_17       Data_New.txt
statement_modeule_name_2        Fetch_Data.txt,file_one.txt
statement_modeule_name_3        file_one.txt
statement_modeule_name_4        file_one.txt,onefile.txt
statement_modeule_name_5        Data_New.txt,file_one.txt
statement_modeule_name_6        Fetch_Data.txt,onefile.txt
statement_modeule_name_7        Fetch_Data.txt
statement_modeule_name_8        Fetch_Data.txt
statement_modeule_name_9        Fetch_Data.txt
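
Since the expected output in the question lists only the lines that occur in more than one file, the grouped result can be filtered down to those rows by keeping the entries whose collapsed file list contains a comma. A minimal sketch on top of the command above:

awk -v OFS='\t' '{ print FILENAME, $0 }' *.txt |
datamash --sort groupby 2 collapse 1 |
awk -F'\t' 'index($2, ",")'   # keep only lines found in more than one file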

Alternatively, use Miller (mlr) in place of GNU datamash:

awk -v OFS='\t' '{ print FILENAME, $0 }' *.txt | 
mlr --tsv -N nest --ivar , -f 1

Output given the data in the question:

Data_New.txt    statement_modeule_name_15
Data_New.txt    statement_modeule_name_16
Data_New.txt,onefile.txt        statement_modeule_name_11
Data_New.txt,file_one.txt       statement_modeule_name_5
Data_New.txt    statement_modeule_name_17
Fetch_Data.txt,onefile.txt      statement_modeule_name_6
Fetch_Data.txt  statement_modeule_name_7
Fetch_Data.txt,file_one.txt     statement_modeule_name_2
Fetch_Data.txt  statement_modeule_name_8
Fetch_Data.txt  statement_modeule_name_9
file_one.txt    statement_modeule_name_1
file_one.txt    statement_modeule_name_3
file_one.txt,onefile.txt        statement_modeule_name_4
onefile.txt     statement_modeule_name_10
onefile.txt     statement_modeule_name_14
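
If neither datamash nor mlr is available, the same grouping can be done in plain awk. The following is only a sketch, not part of the original answer; it assumes each line occurs at most once per file, and its output order is unspecified (pipe it through sort if a stable order is needed):

awk '
  # for every distinct line, collect a comma-separated list of the files it occurs in
  { files[$0] = ($0 in files) ? files[$0] "," FILENAME : FILENAME }
  # print "file list <TAB> line" for each distinct line
  END { for (line in files) print files[line] "\t" line }
' *.txt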

comm is your friend. For two files:

$ comm -12 <(sort file_one.txt) <(sort Fetch_Data.txt)
statement_modeule_name_2

For all the txt files in the current directory:

for FILE1 in *.txt; do
  for FILE2 in *.txt; do
    [ "$FILE1" == "$FILE2" ] && continue
    echo "$FILE1  $FILE2  $(comm -12 <(sort "$FILE1") <(sort "$FILE2"))"
  done
done

PS: this solution is a bit redundant, because it will compare file1 with file2 and later file2 with file1 again...

output with your data:

Data_New.txt  Fetch_Data.txt  
Data_New.txt  file_one.txt  statement_modeule_name_5
Data_New.txt  onefile.txt  statement_modeule_name_11
Fetch_Data.txt  Data_New.txt  
Fetch_Data.txt  file_one.txt  statement_modeule_name_2
Fetch_Data.txt  onefile.txt  statement_modeule_name_6
file_one.txt  Data_New.txt  statement_modeule_name_5
file_one.txt  Fetch_Data.txt  statement_modeule_name_2
file_one.txt  onefile.txt  statement_modeule_name_4
onefile.txt  Data_New.txt  statement_modeule_name_11
onefile.txt  Fetch_Data.txt  statement_modeule_name_6
onefile.txt  file_one.txt  statement_modeule_name_4

More about comm in another topic here: Common lines between two files

  • set -- *.txt; for file1; do shift; for file2; do ...dostuff...; done; done (to get rid of duplicated comparisons; see the expanded sketch below). But note that there's nothing stopping a statement_modeule_name thing from being in more than two files.
    – Kusalananda
    Commented Nov 17, 2022 at 13:46
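
Expanded, the approach from that comment could look like this sketch (it relies on bash for the process substitutions and, as an extra assumption, prints only pairs that actually share lines):

set -- *.txt
for file1; do
  shift                     # drop the current file so each pair is compared only once
  for file2; do
    common=$(comm -12 <(sort "$file1") <(sort "$file2"))
    [ -n "$common" ] && echo "$file1  $file2  $common"
  done
done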
