I've got an array that contains duplicate items, e.g.

THE_LIST=(
"'item1' 'data1 data2'"
"'item1' 'data2 data3'"
"'item2' 'data4'"
)

Based on the above, I want to create an associative array that would assign itemN as key and dataN as value.

My code iterates over the list, and assigns key => value like this (the additional function is shortened, as it performs some additional jobs on the list):

function get_items(){
    KEY=$1
    VALUES=()
    shift $2
    for VALUE in "$@"; do
        VALUES[${#VALUES[@]}]="$VALUE"
    done
}

declare -A THE_LIST
for ((LISTID=0; LISTID<${#THE_LIST[@]}; LISTID++)); do
    eval "LISTED_ITEM=(${THE_LIST[$LISTID]})"
    get_items "${LISTED_ITEM[@]}"
    THE_LIST=([$KEY]="${VALUES[@]}")
done

When I print the array, I get something like:

item1: data1 data2
item1: data2 data3
item2: data4

but instead, I want to get:

item1: data1 data2 data3
item2: data4

I cannot find a way to merge the duplicate keys while also removing the duplicate values for each key.

What would be the approach here?

UPDATE

The actual code is:

THE_LIST=(
"'item1' 'data1 data2'"
"'item1' 'data2 data3'"
"'item2' 'data4'"
)

function get_backup_locations () {
  B_HOST="$2"
  B_DIRS=()
  B_DIR=()
  shift 2

  for B_ITEM in "$@"; do
    case "$B_ITEM" in
      -*) B_FLAGS[${#B_FLAGS[@]}]="$B_ITEM" ;;
      *) B_DIRS[${#B_DIRS[@]}]="$B_ITEM" ;;
    esac
  done

  for ((B_IDX=0; B_IDX<${#B_DIRS[@]}; B_IDX++)); do
    B_DIR=${B_DIRS[$B_IDX]}

    ...do stuff here...

  done
}

function get_items () {
  for ((LOCIDY=0; LOCIDY<${#LOCATIONS[@]}; LOCIDY++)); do
    eval "LOCATION=(${LOCATIONS[$LOCIDY]})"
    get_backup_locations "${LOCATION[@]}"
    THE_LIST=([$B_HOST]="${B_DIR[@]}")
  done | sort | uniq
}

When printing the array with:

for i in "${!THE_LIST[@]}"; do
    echo "$i : ${THE_LIST[$i]}"
done

I get:

item1: data1 data2
item1: data2 data3
item2: data4
  • Your code as given won't work at all - THE_LIST is already a normal array, so you can't redeclare it as an associative array, and even if you could, you're overwriting it each time in the loop with THE_LIST=([$KEY]="${VALUES[@]}").
    – muru
    Commented Jun 13, 2019 at 7:34
  • @muru, so, by what you're saying, I cannot convert an array into associative array, or just not this way?
    – Bart
    Commented Jun 13, 2019 at 7:41
  • I'm saying that the code has no relation to the output that you say you're getting.
    – muru
    Commented Jun 13, 2019 at 7:42
  • This is not helping your question, but have you taken a look at python? Complex stuff like this is often easy as hell in python.
    – Panki
    Commented Jun 13, 2019 at 7:52
  • @Panki, yes, Python or Perl might be a better approach here. However, I'm adding a feature to an existing bash script, thus the pain: it's simply too large to rewrite the whole thing in time. If I don't find a way, I may just as well use another language for the task.
    – Bart
    Commented Jun 13, 2019 at 7:54

2 Answers

1

If the keys and values are guaranteed to be purely alphanumerical, something like this might work:

declare -A output

make_list() {
  local IFS=" "
  declare -A keys                  # variables declared in a function are local by default
  for i in "${THE_LIST[@]}"
  do 
    i=${i//\'/}                    # since everything is alphanumeric, the quotes are useless
    declare -a keyvals=($i)        # split the entry, filename expansion isn't a problem
    key="${keyvals[0]}"            # get the first value as the key
    keys["$key"]=1                 # and save it in keys
    for val in "${keyvals[@]:1}"
    do                             # for each value
      declare -A "$key[$val]=1"    # use it as the index to an array. 
    done                           # Duplicates just get reset.
  done

  for key in "${!keys[@]}"
  do                               # for each key
    declare -n arr="$key"          # get the corresponding array
    output["$key"]="${!arr[*]}"    # and the keys from that array, deduplicated
  done
}

make_list
declare -p output  # print the output to check

With the example input, I get this output:

declare -A output=([item1]="data3 data2 data1" [item2]="data4" )

The data items are out of order, but deduplicated.

Might be best to use Python with the csv module instead.
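
Namerefs (declare -n) need Bash 4.3 or later. On an older Bash 4.x, the same idea can be sketched without them by recording each key/value pair in a second associative array. This is only a sketch under the same assumption as above (purely alphanumeric keys and values); the names entry and seen are illustrative and not taken from the code above. It also happens to keep the values in first-seen order.

```bash
#!/bin/bash
# Sketch only: assumes keys and values are purely alphanumeric.
THE_LIST=(
"'item1' 'data1 data2'"
"'item1' 'data2 data3'"
"'item2' 'data4'"
)

declare -A output   # key -> space-separated, deduplicated values
declare -A seen     # "key value" -> 1 for every pair already stored

for entry in "${THE_LIST[@]}"; do
  entry=${entry//\'/}               # drop the quotes (safe for alphanumeric data only)
  read -r -a keyvals <<< "$entry"   # split on whitespace, no filename expansion
  key=${keyvals[0]}                 # first word is the key
  for val in "${keyvals[@]:1}"; do  # remaining words are the values
    if [[ -z ${seen["$key $val"]} ]]; then
      seen["$key $val"]=1
      # append, adding a separating space only if something is already stored
      output[$key]+="${output[$key]:+ }$val"
    fi
  done
done

for key in "${!output[@]}"; do
  echo "$key: ${output[$key]}"
done
```

The order in which the keys are printed is not guaranteed, since Bash does not define an iteration order for associative arrays.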

  • That does the job after some adjustments, as the bash version on the target machine doesn't support nameref declarations. Thanks for the pointers!
    – Bart
    Commented Jun 13, 2019 at 9:36
  • @Bart I'm curious: how did you fix that?
    – muru
    Commented Jun 13, 2019 at 10:28
  • I cheered too soon. This does work on a newer bash, but what I thought was a workaround didn't work out well in the end. I'll most probably rewrite the script or, worst case, put in an RFC to update bash on the server ;)
    – Bart
    Commented Jun 13, 2019 at 10:46
  • @Bart are you on Bash 4.2, or something older?
    – muru
    Commented Jun 13, 2019 at 10:54
  • that's bash 4.2.46
    – Bart
    Commented Jun 13, 2019 at 11:17
1

If there is no whitespace in any of the values, this solution might work. Use awk associative arrays to build up declare -A commands.

#!/bin/bash

THE_LIST=(
"'item1' 'data1 data2'"
"'item1' 'data2 data3'"
"'item2' 'data4'"
)

eval "$(\
  for i in "${THE_LIST[@]}"; do
    row=($(eval echo $i))
    echo "${row[@]}"
  done | awk '{ for (i=2; i<=NF; i++) if (seen[$1] !~ $i) { seen[$1]=seen[$1]$i" " } }
    END { for (s in seen) print "declare -A new_list["s"]=\""seen[s] }' | sed 's/[[:space:]]*$/"/'
)"

for i in "${!new_list[@]}"; do
  echo "$i: ${new_list[$i]}"
done

This prints:

item2: data4
item1: data1 data2 data3

The order of the values is preserved, but the keys are reordered. I couldn't figure out how to trim the trailing whitespace of an array entry in awk so I just used sed to replace it with a quote, but it's already a total hack to begin with.