I am trying to speed up the script below, which reports disk usage. The timed find command towards the end is the problematic line. The script is run on directories holding 6-7 TB of data and currently takes 16-18 hours; I want it to finish in under 8 hours. Can someone suggest alternative ways to write that part of the script?
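I have not yet worked out how much of that time is the directory walk itself versus the awk/sort stage that follows it. A rough way to separate the two (a sketch only; it assumes GNU find, which the script already relies on for -printf, and /tmp/usage.txt is just an illustrative temp file) would be:

# Time the traversal/stat work alone, with the post-processing removed:
time find <dir-name> -type f -printf "%u %s\n" > /dev/null
# Save the listing once, then time only the aggregation and sort:
find <dir-name> -type f -printf "%u %s\n" > /tmp/usage.txt
time awk '{user[$1]+=$2; count[$1]++} END{for (i in user) print i, user[i]/1024**3, count[i]}' /tmp/usage.txt | sort -nk2 -r

If nearly all of the 16-18 hours is in the first command, then rewriting the awk/sort part will not help much and the traversal itself is what has to change.

The full script: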
# disk_check.csh takes a directory name as a mandatory argument and an optional second argument, -<num> or -verbose.
# Ex1: disk_check <dir_name>          - Reports the disk usage per user and the total disk consumption
# Ex2: disk_check <dir_name> -verbose - Along with the above, lists all files in the given directory, sorted by size
# Ex3: disk_check <dir_name> -<num>   - Like Ex2, but reports only the top <num> files by size
if ($#argv == 0) then
echo " Error : Dir path missing"
echo " Syntax : disk_check <dir-name> <-verbose>"
echo " -verbose gives a list of all files per user, sorted by size"
exit 1
endif
set cwd = $argv[1]
if ($cwd =~ "-help") then
echo " Syntax : disk_check <dir-name> <-verbose>"
echo " -verbose gives a list of all files per user, sorted by size"
exit 0
endif
if ($#argv > 1) then
set opt = $argv[2]
#echo "opt : $opt"
endif
if ( -d $cwd ) then
set ava = `df -h $cwd | tail -1 | awk '{print $1}'`
set tot = `df -h $cwd | tail -1 | awk '{print $2}'`
set ad = `df -h $cwd | tail -1 | awk '{print $3}'`
set pcu = `df -h $cwd | tail -1 | awk '{print $4}'`
echo ""
echo "Summary for dir ${cwd}: $tot Used (${pcu})"
echo "-----------------------------------------------------------------------------"
echo " Total Volume $ava"
echo " Available on disk $ad "
echo " Percentage used $pcu"
echo ""
echo "Summary by User:"
printf "%sUser%15sSize%10sCount\n" ""
echo "---------------------------------------------"
# This is the command that takes a long time:
time find $cwd -type f -printf "%u %s\n" | awk '{user[$1]+=$2;count[$1]++}; END{ for( i in user) printf "%s%-13s%5s%-0.2f%s%5s%7s\n","", i, "", user[i]/1024**3,"GB", "", count[i]}'| sort -nk2 -r
if ($#argv > 1) then
if ($opt =~ "-verbose") then
echo "\nDetail, Sorted by size"
printf " User%15sFile%15sSize\n" ""
echo "---------------------------------------------------"
find $cwd -type f -not -path '*/\.*' -printf "%-13u | %-50p | %-10s \n" | sort -nk5 -r
endif
endif
endif
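One idea I have been considering, but have not tested, is to split the walk across the top-level subdirectories and run several find processes at once, then merge the per-user totals in a single awk pass. A rough sketch (it assumes GNU find and xargs, uses a hypothetical /tmp/usage_parts directory, and would need more care with unusual directory names):

# Untested sketch: one find process per top-level subdirectory, up to 8 at a time,
# each writing its own partial listing so parallel output cannot interleave.
mkdir -p /tmp/usage_parts
find <dir-name> -mindepth 1 -maxdepth 1 -type d -print | \
    xargs -P 8 -I {} sh -c 'find "$1" -type f -printf "%u %s\n" > "/tmp/usage_parts/$(basename "$1").txt"' sh {}
# Files sitting directly in <dir-name> itself:
find <dir-name> -maxdepth 1 -type f -printf "%u %s\n" > /tmp/usage_parts/_toplevel.txt
# Merge the partial listings with the same aggregation the script already does:
cat /tmp/usage_parts/*.txt | awk '{user[$1]+=$2; count[$1]++} END{for (i in user) printf "%-13s %0.2f GB %7d\n", i, user[i]/1024**3, count[i]}' | sort -nk2 -r

I do not know whether this would actually help here: if the limit is metadata latency on NFS or a striped array, overlapping the stat calls might; on a single local disk it probably will not. Is this a reasonable direction, or is there a better way to get per-user usage on this much data?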