I write lots of small scripts to manipulate files on a Bash-based server. I would like to have a mechanism by which to log which commands created which files in a given directory. However, I don't just want to capture every input command, all the time.
Approach 1: a wrapper script that uses a Bash builtin (a la history or fc -ln -1) to grab the last command and write it to a log file. I have not been able to figure out any way to do this, as the shell builtin commands do not appear to be recognized outside of the interactive shell.
Approach 2: a wrapper script that pulls from ~/.bash_history to get the last command. This, however, requires setting up the Bash shell to flush every command to history immediately (as per this comment) and seems also to require that the history be allowed to grow inexorably. If this is the only way, so be it, but it would be great to avoid having to edit the ~/.bashrc file on every system where this might be implemented.
Approach 3: use script. My problem with this is that it requires multiple commands to start and stop the logging, and because it launches its own shell it is not callable from within another script (or at least, doing so complicates things significantly).
I am trying to figure out an implementation that's of the form log_this.script other_script other_arg1 other_arg2 > file, where everything after the first argument is logged. The emphasis here is on efficiency and minimizing syntax overhead.
EDIT: iLoveTux and I both came up with similar solutions. For those interested, my own implementation follows. It is somewhat more constrained in its functionality than the accepted answer, but it also auto-updates any existing logfile entries with changes (though not deletions).
Sample usage:
$ cmdlog.py "python3 test_script.py > test_file.txt"
creates a log file in the parent directory of the output file with the following:
2015-10-12@10:47:09 test_file.txt "python3 test_script.py > test_file.txt"
Additional file changes are added to the log;
$ cmdlog.py "python3 test_script.py > test_file_2.txt"
the log now contains
2015-10-12@10:47:09 test_file.txt "python3 test_script.py > test_file.txt"
2015-10-12@10:47:44 test_file_2.txt "python3 test_script.py > test_file_2.txt"
Running on the original file name again changes the file order in the log, based on modification time of the files:
$ cmdlog.py "python3 test_script.py > test_file.txt"
produces
2015-10-12@10:47:44 test_file_2.txt "python3 test_script.py > test_file_2.txt"
2015-10-12@10:48:01 test_file.txt "python3 test_script.py > test_file.txt"
Full script:
#!/usr/bin/env python3
'''
A wrapper script that will write the command-line
args associated with any files generated to a log
file in the directory where the files were made.
'''
import sys
import os
from os import listdir
from os.path import isfile, join
import subprocess
import time
from datetime import datetime
def listFiles(mypath):
"""
Return relative paths of all files in mypath
"""
return [join(mypath, f) for f in listdir(mypath) if
isfile(join(mypath, f))]
def read_log(log_file):
"""
Reads a file history log and returns a dictionary
of {filename: command} entries.
Expects tab-separated lines of [time, filename, command]
"""
entries = {}
with open(log_file) as log:
for l in log:
l = l.strip()
mod, name, cmd = l.split("\t")
# cmd = cmd.lstrip("\"").rstrip("\"")
entries[name] = [cmd, mod]
return entries
def time_sort(t, fmt):
"""
Turn a strftime-formatted string into a tuple
of time info
"""
parsed = datetime.strptime(t, fmt)
return parsed
ARGS = sys.argv[1]
ARG_LIST = ARGS.split()
# Guess where logfile should be put
if (">" or ">>") in ARG_LIST:
# Get position after redirect in arg list
redirect_index = max(ARG_LIST.index(e) for e in ARG_LIST if e in ">>")
output = ARG_LIST[redirect_index + 1]
output = os.path.abspath(output)
out_dir = os.path.dirname(output)
elif ("cp" or "mv") in ARG_LIST:
output = ARG_LIST[-1]
out_dir = os.path.dirname(output)
else:
out_dir = os.getcwd()
# Set logfile location within the inferred output directory
LOGFILE = out_dir + "/cmdlog_history.log"
# Get file list state prior to running
all_files = listFiles(out_dir)
pre_stats = [os.path.getmtime(f) for f in all_files]
# Run the desired external commands
subprocess.call(ARGS, shell=True)
# Get done time of external commands
TIME_FMT = "%Y-%m-%d@%H:%M:%S"
log_time = time.strftime(TIME_FMT)
# Get existing entries from logfile, if present
if LOGFILE in all_files:
logged = read_log(LOGFILE)
else:
logged = {}
# Get file list state after run is complete
post_stats = [os.path.getmtime(f) for f in all_files]
post_files = listFiles(out_dir)
# Find files whose states have changed since the external command
changed = [e[0] for e in zip(all_files, pre_stats, post_stats) if e[1] != e[2]]
new = [e for e in post_files if e not in all_files]
all_modded = list(set(changed + new))
if not all_modded: # exit early, no need to log
sys.exit(0)
# Replace files that have changed, add those that are new
for f in all_modded:
name = os.path.basename(f)
logged[name] = [ARGS, log_time]
# Write changed files to logfile
with open(LOGFILE, 'w') as log:
for name, info in sorted(logged.items(), key=lambda x: time_sort(x[1][1], TIME_FMT)):
cmd, mod_time = info
if not cmd.startswith("\""):
cmd = "\"{}\"".format(cmd)
log.write("\t".join([mod_time, name, cmd]) + "\n")
sys.exit(0)
nohup?nohupwill log stdout and stderr to a file, but I want to capture the entirety of the command line input, including redirects, pipes, etc. Basically, I need exactly what the Bash builtinfc -ln -1produces, but automated.