folks-
I'm a bit stumped, on this one. I'm trying to write a bash script that will use csplit to take multiple input files and split them according to the same pattern. (For context: I have multiple TeX files with questions in them, separated by the \question command. I want to extract each question into their own file.)
The code I have so far:
#!/bin/bash
# This script uses csplit to run through an input TeX file (or list of TeX files) to separate out all the questions into their own files.
# This line is for the user to input the name of the file they need questions split from.
read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files
read -ep "Type the directory where you would like to save the split files: " save
read -ep "What unit do these questions belong to?" unit
# This is a check for the user to confirm the file list, and proceed if true:
echo "The file(s) being split is/are $files. Please confirm that you wish to split this file, or cancel."
select ynf in "Yes" "No"; do
case $ynf in
No ) exit;;
Yes ) echo "The split files will be saved to $save. Please confirm that you wish to save the files here."
select ynd in "Yes" "No"; do
case $ynd in
Yes )
# This line will create a loop to conduct the script over all the files in the list.
for i in ${files[@]}
do
# Mass re-naming is formatted to give "guestion###.tex' to enable processing a large number of questions quickly.
# csplit is the utility used here; run "man csplit" to learn more of its functionality.
# the structure is "csplit [name of file] [output options] [search filter] [separator(s)].
# this script calls csplit, will accept the name of the file in the argument, searches the files for calls of "question", splits the file everywhere it finds a line with "question", and renames it according to the scheme [prefix]#[suffix] (the %03d in the suffix-format is what increments the numbering automatically).
# the '\\question' allows searching for \question, which eliminates the split for \end{questions}; eliminating the \begin{questions} split has not yet been understood.
csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\\question'/ '{*}'
done; exit;;
No ) exit;;
esac
done
esac
done
return
I can confirm it does do the loop as I intended for the input files I have. However, the behavior I'm noticing is that it'll split the first file into "q1.tex q2.tex q3.tex" as expected, and when it moves on to the next file in the list, it'll split the questions and overwrite the old files, and the third file it will overwrite the second file's splits, etc. What I would like to happen is that, say, if File1 has 3 questions, it will output:
q1.tex
q2.tex
q3.tex
And then if File2 has 4 questions, it will then continue incrementing to:
q4.tex
q5.tex
q6.tex
q7.tex
Is there a way for csplit to detect the numbering that has already been done in this loop, and increment appropriately?
Thanks for any help you folks can offer!