I'm writing a script that automatically extracts bib records from scientific papers.

In an old version of the script, I passed in the name of the folder where all the PDFs were stored; now I want to pass a regex instead. E.g., before:

./Autobib.sh Papers/

Now:

./Autobib.sh Papers/*.pdf

The folder contains, for example, three PDF files: Shrek.pdf, Fiona.pdf, and Donkey.pdf. The script should retrieve the DOI from every file and create a file listing all of them, but when I run it, it returns the DOI of the first file and nothing more.

Here is my code:

for i in $1; do
    doi $i
done

doi is a function that extracts the DOI from a PDF and writes it to a text file. When I run the script, I get only the DOI of the first file.

How can I pass a regex to my script and iterate through all the files that match it?

3 Answers

3

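It's important to understand that Papers/*.pdf is not a regular expression; it's a wildcard pattern that causes bash to perform filename expansion, or globbing. The expansion happens before your script starts, so the script receives each matching file as a separate argument.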

$1 represents the first argument to your script, so your for loop is only ever iterating over that one argument.
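A quick way to see this for yourself is the small experiment below. It is only a sketch: show_args is a hypothetical helper, and the files are empty placeholders created in a temporary directory with the names from the question.

```shell
# Hypothetical helper: reports how many arguments it received and the
# base name of the first one.
show_args() {
    printf 'count=%s first=%s\n' "$#" "${1##*/}"
}

# Throwaway directory holding the three files from the question.
tmp=$(mktemp -d)
touch "$tmp/Shrek.pdf" "$tmp/Fiona.pdf" "$tmp/Donkey.pdf"

# Unquoted pattern: the shell expands it before show_args is called,
# so the helper receives one argument per matching file.
unquoted=$(show_args "$tmp"/*.pdf)

# Quoted pattern: no expansion, the helper receives the literal string.
quoted=$(show_args "$tmp/*.pdf")

printf '%s\n%s\n' "$unquoted" "$quoted"
rm -rf "$tmp"
```

The unquoted call reports three arguments, and the first one is whichever name sorts first. That is the only file a loop over $1 will ever see.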

Use "$@" to represent all arguments; the double quotes keep file names that contain spaces intact:

for i in "$@"; do
    doi "$i"
done
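As a slightly fuller sketch of the same loop: the real doi function is not shown in the question, so the version here is a hypothetical stand-in that only records the file name. The process_all wrapper, the $out variable and the -f check are likewise my own additions for illustration.

```shell
# Hypothetical stand-in for the real doi function, which the question does
# not show: it appends the file's base name to the file named by $out.
doi() {
    printf '%s\n' "${1##*/}" >> "$out"
}

# Call doi once for every argument that is an existing regular file.
process_all() {
    for f in "$@"; do
        [ -f "$f" ] || continue   # e.g. a pattern that matched nothing
        doi "$f"
    done
}

# Demo with throwaway files; in the real script the call would be:
#   process_all "$@"
out=$(mktemp)
dir=$(mktemp -d)
touch "$dir/Shrek.pdf" "$dir/Fiona.pdf" "$dir/Donkey Kong.pdf"
process_all "$dir"/*.pdf

processed=$(wc -l < "$out" | tr -d ' ')
listing=$(cat "$out")
echo "files processed: $processed"
rm -rf "$dir" "$out"
```

The file name with a space in it is there on purpose: with an unquoted $@ or $i it would be split into two words.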

1

If you want to filter the files within a directory by pattern, you can pass the pattern as the second script parameter and search for matching files using find.

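Here is the code. It also copes with filenames containing spaces. Note that -exec runs an external command, so this assumes doi is an executable on your PATH rather than a shell function: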

find "$1" -maxdepth 1 -name "$2" -exec doi {} \;

Usage example (quote the pattern so that the shell passes it to find unexpanded): ./Autobib.sh Papers/ '*.pdf'
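Whether the pattern is quoted on the command line can change the result. The sketch below uses the same find options as above but only counts matches instead of running doi. find_matches and the directory layout (including a stray.pdf in the current directory) are made up for the demonstration.

```shell
# Mirrors the find call above, but only lists matches instead of running
# doi. $1 is the directory, $2 the name pattern.
find_matches() {
    find "$1" -maxdepth 1 -type f -name "$2"
}

# Throwaway layout: two PDFs inside Papers/, one stray PDF next to it.
work=$(mktemp -d)
mkdir "$work/Papers"
touch "$work/Papers/Shrek.pdf" "$work/Papers/Fiona.pdf" "$work/Papers/notes.txt"
touch "$work/stray.pdf"

# Quoted: find receives *.pdf and does the matching itself.
quoted_count=$(cd "$work" && find_matches Papers/ '*.pdf' | wc -l | tr -d ' ')

# Unquoted: the shell first expands *.pdf to stray.pdf, so find looks for
# a file with that exact name inside Papers/.
unquoted_count=$(cd "$work" && find_matches Papers/ *.pdf | wc -l | tr -d ' ')

echo "quoted: $quoted_count, unquoted: $unquoted_count"
rm -rf "$work"
```

If the current directory contains no PDF files, the unquoted form happens to work, because bash leaves an unmatched pattern as it is by default. That makes the bug easy to miss.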

4 Comments

This has multiple problems. You don't want to use ls in scripts and *.pdf is not a valid regex. Even if it were, you'd need to quote it in order to prevent the shell from expanding it into a list of arguments.
Thanks. I edited the answer to use find instead of ls and to make the description more relevant.
read is still not robust; you want something like while IFS="" read -r filename, but a much better solution is find ... -exec doi {} \;
Yeah, looks great to me! Thanks!
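For reference, the read loop mentioned in the comments could look roughly like the sketch below. The doi function here is a hypothetical stand-in that only prints the name it was given, and the loop assumes that no file name contains a newline character.

```shell
# Hypothetical stand-in for doi; it only reports which file it was given.
doi() {
    printf 'doi called for: %s\n' "${1##*/}"
}

dir=$(mktemp -d)
touch "$dir/Shrek.pdf" "$dir/Donkey Kong.pdf" "$dir/notes.txt"

# IFS= keeps leading and trailing whitespace in each line;
# -r keeps backslashes literal.
result=$(
    find "$dir" -maxdepth 1 -type f -name '*.pdf' |
    while IFS= read -r filename; do
        doi "$filename"
    done | sort
)

printf '%s\n' "$result"
rm -rf "$dir"
```

One difference from -exec is that the loop body runs inside the shell, so doi can be an ordinary shell function here.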
0

You can just run the ls command in a loop, and it will solve your problem.

for x in $(ls "$1"/*.pdf)
do
    echo "$x"  ## if you want only the file name, change this line to: echo "$(basename "$x")"
done

I recreated the scenario you described above; the original answer included a screenshot of the result.
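One caveat worth checking before relying on this: the output of ls is split on whitespace, so a file name containing a space is seen as two items. The sketch below compares the ls loop with a plain glob loop on placeholder files. The file names and counters are made up for the demonstration, and the default IFS is assumed.

```shell
dir=$(mktemp -d)
touch "$dir/Shrek.pdf" "$dir/Donkey Kong.pdf"

# Loop over the output of ls: the shell splits it into words, so
# "Donkey Kong.pdf" is counted as two separate items.
ls_count=0
for x in $(ls "$dir"/*.pdf); do
    ls_count=$((ls_count + 1))
done

# Loop over the glob directly: one iteration per file, spaces included.
glob_count=0
for x in "$dir"/*.pdf; do
    glob_count=$((glob_count + 1))
done

echo "ls loop: $ls_count items, glob loop: $glob_count items"
rm -rf "$dir"
```

For file names without spaces both loops behave the same, which is probably why the problem did not show up in the test above.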
