I have two files containing lines with a similar pattern:
cmd1 [cmd2 {xx/xx[7] x/x[0] ...}] cmd3 [cmd4 {xx/x[12] ...}]
cmd5 [cmd6 {x/x[1] xx ...}]
I don't need to consider all the cmds in the two files; I only need to sort the order of the string lists inside the braces. Then I sort the two files separately and use comm
to output the similarities and differences. I use the same flow to sort the string lists in both files. The flow is below:
matchedBraces=$(grep -o '\{[^}]*\}' "$fileA")   # grab every brace group and the strings inside it
while read -r perMatch
do
    sort_now "$perMatch" "$fileA"
done <<< "$matchedBraces"

function sort_now {
    beforeSort=$(echo "$1" | sed 's?\[?\\[?g' | sed 's?\]?\\]?g')  # escape [...] as \[...\] so the brackets stay literal in the sed pattern below
    afterSort=$(echo "$1" | grep -o '[^{} ]*' | sort | tr '\n' ' ')  # split the brace group into individual strings, sort them, rejoin with spaces (leaves a trailing space)
    afterSort={$(echo $afterSort)}  # drop the trailing space and put the braces back
    afterSort=$(echo "$afterSort" | sed 's?\[?\\[?g' | sed 's?\]?\\]?g')  # escape [...] in the replacement text as well
    # the variables may be very large, so feed sed its script on stdin rather than on the command line
    sed -i -f - "$2" << eof
s?${beforeSort}?${afterSort}?g
eof
    sort "$2" -o "$2"
}
(I changed the code's order here for clarity; in the actual script sort_now is defined before it is called.)
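To make the desired output concrete, here is a hypothetical line before and after the flow (the ... placeholders dropped):

before: cmd1 [cmd2 {xx/xx[7] x/x[0]}] cmd3 [cmd4 {xx/x[12]}]
after:  cmd1 [cmd2 {x/x[0] xx/xx[7]}] cmd3 [cmd4 {xx/x[12]}]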
It works flawlessly, but if a file contains only the pattern cmd1 [cmd2 {xx/xx}]
(a single string per brace) repeated over 50k similar lines, it is very time-consuming: even running with 8 CPUs and 200G of memory it keeps going after hours. I know that in Tcl, appending to a string with append
is much more efficient than rebuilding it with set
. I'm wondering if bash has a similar command or feature, or whether someone can optimise my code to save time.
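For what it's worth, bash does have an append operator: var+=string (bash 3.1 and later), the closest analogue to Tcl's append. Whether it beats rebuilding the string is worth measuring; a tiny hypothetical micro-benchmark:

s=
time for ((i = 0; i < 100000; i++)); do s="$s x"; done   # rebuild-style, like Tcl's set
s=
time for ((i = 0; i < 100000; i++)); do s+=" x"; done    # append operator, like Tcl's append

That said, as the comments below point out, the likely bottleneck here is the per-match sed pass over the whole file and the row-by-row bash loop, not string handling.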
One answer: a single perl pass can do all of the in-brace sorting, replacing the contents of every {...} with its space-separated strings in sorted order:

perl -pe 's/(?<=\{).*?(?=\})/join " ", sort split " ", $&/ge' file
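Plugged into the flow described in the question (a sketch; fileA, fileB and the .sorted names are placeholders), each file gets its braces sorted, then the whole file sorted, then comm compares the two:

for f in fileA fileB; do
    perl -pe 's/(?<=\{).*?(?=\})/join " ", sort split " ", $&/ge' "$f" | sort > "$f.sorted"
done
comm fileA.sorted fileB.sorted   # columns: unique to fileA, unique to fileB, common to both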
Should afterSort=$(echo "$1" ... be afterSort=$(echo "$beforeSort" ...?

Please show the grep/sed/sort results for both files (as opposed to making us reverse engineer the code to figure out the desired sorted output); update the question with the final results.

If multiple sed calls are actually required, it's probably possible to consolidate many of them together; then again, eliminating the row-by-row processing of a bash loop and replacing it with a more appropriate tool (eg, awk, perl, python, etc) is going to be another big time saver. I'm guessing most (if not all) of this code could be replaced with a single awk/perl/python script ... but we'll need a robust set of sample data and a better description of your sorting algorithm.
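To make the consolidation idea concrete, a rough bash sketch (not the asker's exact code; it keeps the same escaping assumptions, i.e. GNU sed and no ? or & inside the braced strings): emit one substitution command per unique brace group into a single sed script, then run sed once per file instead of once per match:

build_sed_script() {   # hypothetical helper
    grep -o '{[^}]*}' "$1" | sort -u |
    while read -r match; do
        esc=$(printf '%s\n' "$match" | sed 's?\[?\\[?g; s?\]?\\]?g')        # escape [ ] on the pattern side
        sorted=$(printf '%s\n' "$match" | grep -o '[^{} ]*' | sort | tr '\n' ' ')
        sorted="{${sorted% }}"                                              # trim the trailing space, restore the braces
        printf 's?%s?%s?g\n' "$esc" "$sorted"                               # one s-command per unique group
    done
}

build_sed_script fileA > script.sed
sed -i -f script.sed fileA   # a single sed pass over the whole file
sort -o fileA fileA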