
I have a file that includes a relative path in angle brackets, such as the following (example.txt):

Some content containing <../another.txt> file

Then in the parent directory, the file another.txt:

another

What Linux command line can I use to generate example_processed.txt, replacing each <path> token with the contents of the file at the specified path? E.g., I want a command that ingests example.txt and generates example_processed.txt with the following contents:

Some content containing another file

Note that I don't care if there are extraneous newlines in the generated file, so the following output would also be acceptable (this is just an example, any extraneous whitespace is acceptable):

Some content containing
another
file

I have a bash loop that enables reading in the contents of the file into variables, but again, don't know if this helps me to perform the substitution:

cp example.txt example_processed.txt
grep -oP '<\K.*(?=>)' example.txt | while read -r REPL_PATH ; do
    local CONTENTS=$(<"$REPL_PATH")
    # TODO: How do I use this?  The following is what I want to work:
    # sed "s/<$REPL_PATH>/$CONTENTS/g"
    echo "$REPL_PATH: $CONTENTS"
done

This is what has produced the closest result, but requires another.txt to be in the same directory:

sed -e '/<\(.*\)>/{' -e 's/<.*>//' -e 'r another.txt' -e '}' -i example.txt

The above outputs:

Some content containing file
another

Questions:

  • How can I specify the replacement path as ../another.txt?
  • How can I replace the literal another.txt in the above command with the result of capture group #1? E.g., sed -e '/<\(.*\)>/{' -e 's/<.*>//' -e 'r \1' -e '}' -i example.txt
  • How do I move the replacement string between the words "containing" and "file", rather than after the word "file"?
  • I updated with a sed command that gets close. I didn't include it originally because I don't even know that that is the correct tool, and I didn't want to elicit answers that make assumptions about the tooling. If someone can give me a hint at the correct tool(s) to use, I can investigate myself. It's hard to form a question when I know nothing about where the answer will take me. I also have a grep/loop combination that reads the contents of the file into a shell variable, but then I don't know how to use the variable to perform the substitution. Again, not sure if that helps. Commented Jan 27, 2024 at 2:48
  • What you're trying to do requires some kind of "processor" software (e.g. LaTeX) to handle the file(s), or a self-written script in something more capable than pure "bash". Tip forward, pinpointing a detail in (La)TeX -> tex.stackexchange.com/questions/246/… Commented Jan 27, 2024 at 8:50

2 Answers

My idea is to convert the input file to a shell script of the form:

cat <<EOF$
…
EOF$

where … is the original content of the input file, but with <pathname> replaced by $(cat pathname), so when a shell interprets the script, each command substitution is replaced by the output of cat pathname.

This is the command:

<example.txt sed '
   s/[$\\`]/\\&/g
   s/<\([^<>]*\)>/$(\n$$\1\n)/g
   1 i cat <<EOF$
   $ a EOF$
' | sed '
   /^\$\$/ {
      s/[^$\\`]/\\&/g
      s/^\$\$/cat -- /
      }
' | sh >example_processed.txt

Step by step:

  • <example.txt sed – sed reads example.txt and does the following:

    • s/[$\\`]/\\&/g – escapes every $, \ and `, otherwise they would be special in our here document;
    • s/<\([^<>]*\)>/$(\n$$\1\n)/g – converts every string between and including < and > (where the inside does not contain < or >, so non-greedily) to
      $(
      $$string
      )
      
    • 1 i cat <<EOF$ – inserts cat <<EOF$ before the first line;
    • $ a EOF$ – appends EOF$ after the last line.
  • | sed – the second sed reads from the first and

    • /^\$\$/ – identifies lines starting with $$ (note they must come from the first sed because every $ from the original file is now preceded by a backslash) and there:
      • s/[^$\\`]/\\&/g – every character but $, \ or ` gets escaped by a backslash (the excluded characters are already escaped where appropriate)
      • s/^\$\$/cat -- / – and the leading $$ gets replaced by cat -- .
  • | sh >example_processed.txt – POSIX shell interprets the resulting script and writes to example_processed.txt

Your example file will get to sh as the following script:

cat <<EOF$
Some content containing $(
cat -- \.\.\/\a\n\o\t\h\e\r\.\t\x\t
) file
EOF$
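
To inspect what sh will receive before actually running it, the last stage of the pipeline can simply be left off. The sketch below wraps the first two stages in a function; the name to_script is made up for illustration, and the code assumes GNU sed, since \n in the replacement and the one-line forms of i and a are GNU extensions.

```bash
#!/bin/bash
# to_script (hypothetical name): read a template on stdin and print the
# generated here-document script instead of piping it to sh.
# Assumes GNU sed.
to_script() {
    sed '
        s/[$\\`]/\\&/g
        s/<\([^<>]*\)>/$(\n$$\1\n)/g
        1 i cat <<EOF$
        $ a EOF$
    ' | sed '
        /^\$\$/ {
            s/[^$\\`]/\\&/g
            s/^\$\$/cat -- /
        }
    '
}

# Usage: to_script <example.txt
```

Once the printed script looks right, appending | sh >example_processed.txt gives the full command again.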

Notes:

  • EOF$ is used instead of traditional EOF, so nothing in your original file can interfere. Even if there was EOF$ in the original file, in the script it would be EOF\$.
  • Newlines in pathnames are not supported; < and the corresponding > must be in the same line of input for our code to work.
  • Other characters are supported. The pathname (../another.txt in the example) in the script is fully escaped (character by character), so even if you used a pathname with spaces, asterisks or whatever, it would be safe.
  • $(…) strips trailing newline characters; this is usually fine.
  • -- is explained here: What does -- (double-dash) mean?
  • Relative paths in <…> will be resolved with respect to the working directory of sh, not to the directory containing the input file. In our example it's the same directory, but in general the directories may differ. If you want to resolve relative paths with respect to the directory of the input file then you must run sh in this exact directory, just like we did.
  • The output goes to example_processed.txt which is deliberately a different name than example.txt. Do not redirect output to the file you're reading.
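
If the template lives somewhere other than the current directory, one way to honor the working-directory note above is to run the whole pipeline in a subshell that first changes into the template's directory. This is only a sketch: the function name expand_includes is hypothetical, and GNU sed is assumed, as in the command above.

```bash
#!/bin/bash
# expand_includes TEMPLATE (hypothetical helper): print TEMPLATE with each
# <path> token replaced by that file's contents. The body is a subshell,
# so the cd does not affect the caller and relative paths resolve against
# the template's own directory. Assumes GNU sed.
expand_includes() (
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    CDPATH= cd -- "$dir" || exit 1
    <"$base" sed '
        s/[$\\`]/\\&/g
        s/<\([^<>]*\)>/$(\n$$\1\n)/g
        1 i cat <<EOF$
        $ a EOF$
    ' | sed '
        /^\$\$/ {
            s/[^$\\`]/\\&/g
            s/^\$\$/cat -- /
        }
    ' | sh
)

# Usage: expand_includes path/to/example.txt >example_processed.txt
```

Because the output redirection is evaluated by the calling shell, example_processed.txt in the usage line is created in the caller's directory, not next to the template.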

The final result in example_processed.txt:

Some content containing another file
  • This answer is incredibly clever, but also obfuscates the intent. I opted to just read the file line-by-line, replacing the token(s) with the appropriate file's contents. Ended up being much more readable/maintainable. Commented Jan 29, 2024 at 23:58

The following is what I ended up coding in a Bash script, simply because it is easier to understand and maintain:

#!/bin/bash

# Note that the following assumes the script is running in the
# same directory as the input file, so it can handle relative paths

# (no "local" here: it is only valid inside a function)
TEMPLATE="example.txt"
GENERATED="${TEMPLATE%.txt}_processed.txt"
rm -f "$GENERATED"

# Read the template file line-by-line
while IFS='' read -r LINE || [[ -n $LINE ]]; do
    # Determine whether a line includes a link to another file
    if [[ $LINE =~ ^(.*)\<(.+)\>(.*)$ ]]; then
        # If the other file doesn't exist, error out
        if [ ! -f "${BASH_REMATCH[2]}" ]; then
            echo "Unable to include '${BASH_REMATCH[2]}' in '$TEMPLATE'" >&2
            exit 1
        fi

        # Replace the file path with the contents of the file
        echo -n "${BASH_REMATCH[1]}" >> "$GENERATED"
        cat     "${BASH_REMATCH[2]}" >> "$GENERATED"
        echo    "${BASH_REMATCH[3]}" >> "$GENERATED"
    else
        # Copy the line as-is
        echo "$LINE" >> "$GENERATED"
    fi
done < "$TEMPLATE"
  • I should also mention that my template file is known not to include any '<' or '>' characters, which is why those delimiters were selected. If your content may include those characters, then the regular expression may need to be more complex. Commented Jan 31, 2024 at 12:54
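
As a rough illustration of what "more complex" could look like, the sketch below expands several tokens on one line by matching [^<>]+ between the delimiters and working from the rightmost token backwards. The function name expand_line is hypothetical, and the behavior of erroring out on a missing file is carried over from the script above. Inserted file contents are not rescanned for further tokens.

```bash
#!/bin/bash
# expand_line LINE (hypothetical helper): print LINE with every <path>
# token replaced by the contents of that file. Returns 1 if a referenced
# file does not exist.
expand_line() {
    local head=$1 tail='' path
    # The greedy first group makes each match pick the rightmost token.
    local re='^(.*)<([^<>]+)>(.*)$'
    while [[ $head =~ $re ]]; do
        path=${BASH_REMATCH[2]}
        if [[ ! -f $path ]]; then
            printf "Unable to include '%s'\n" "$path" >&2
            return 1
        fi
        # $(<file) drops the file's trailing newlines.
        tail=$(<"$path")${BASH_REMATCH[3]}$tail
        head=${BASH_REMATCH[1]}
    done
    printf '%s\n' "$head$tail"
}

# Usage inside the read loop:
#   expand_line "$LINE" >> "$GENERATED" || exit 1
```

A lone < or > with no partner is left untouched, but any <text> pair is still treated as a path, so content that legitimately contains such pairs would need a different delimiter.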
