-1

I have a CSV file with 50 lines. One column holds state names and the other holds the states' capitals. How do I write a loop that counts the number of tokens (2, 3, 4) in the two columns together and groups the results into an array? Is it also possible to keep track of how many states fall into each group while doing this?

Kusalananda
  • 1
    An example of the input and the expected output would be good to see. What is a token? If you just want to count words per line, then `awk '{ print NF }'` would output the number of whitespace-delimited words on each line of input. What type of array do you need? An array in some shell script language, or a list in XML or JSON? – Kusalananda Jul 26 '22 at 10:47

2 Answers

1

This solution uses awk instead. I understood from the question that the output should contain only the names of the states. However, the previous answer's output was more useful and the OP accepted it, so this script follows the same format and uses the same dataset.

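{
    x = $0                 # remember the original "State,Capital" line
    gsub(/,/, " ", $0)     # replace the comma with a space; awk re-splits $0 on whitespace
    a[x] = NF              # token count for this line
}

END {
    # Count how many lines share each token count.
    for (key in a) {
        counter[a[key]] += 1
    }

    # Print each group; note that awk does not guarantee the order of "for (c in counter)".
    for (c in counter) {
        print counter[c] " values with " c " tokens:"
        for (key in a) {
            if (c == a[key]) {
                print "\t"key
            }
        }
    }
}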

32 values with 2 tokens:
        Oregon,Salem
        Virginia,Richmond
        Montana,Helena
        Florida,Tallahassee
        Ohio,Columbus
        Delaware,Dover
        Nebraska,Lincoln
        California,Sacramento
        Wisconsin,Madison
        Alaska,Juneau
        Texas,Austin
        Tennessee,Nashville
        Hawaii,Honolulu
        Maryland,Annapolis
        Idaho,Boise
        Illinois,Springfield
        Wyoming,Cheyenne
        Georgia,Atlanta
        Connecticut,Hartford
        Arizona,Phoenix
        Indiana,Indianapolis
        Colorado,Denver
        Mississippi,Jackson
        Washington,Olympia
        Kentucky,Frankfort
        Vermont,Montpelier
        Maine,Augusta
        Michigan,Lansing
        Kansas,Topeka
        Alabama,Montgomery
        Massachusetts,Boston
        Pennsylvania,Harrisburg
16 values with 3 tokens:
        South Dakota,Pierre
        New Hampshire,Concord
        Arkansas,Little Rock
        North Carolina,Raleigh
        North Dakota,Bismarck
        Louisiana,Baton Rouge
        Oklahoma,Oklahoma City
        New York,Albany
        Nevada,Carson City
        Iowa,Des Moines
        South Carolina,Columbia
        Rhode Island,Providence
        New Jersey,Trenton
        Minnesota,St. Paul
        Missouri,Jefferson City
        West Virginia,Charleston
2 values with 4 tokens:
        Utah,Salt Lake City
        New Mexico,Santa Fe
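
To check the counting step on its own, here is a minimal sketch that prints the token count next to each input line. The function name `count_tokens` and the three sample lines are my own choices for illustration, not part of the script above.

```bash
# Hypothetical helper: print "<token count> <original line>" for each CSV line on stdin.
count_tokens() {
    awk '{
        line = $0            # keep the original line for printing
        gsub(/,/, " ")       # turn the comma into a space; awk re-splits $0
        print NF, line       # NF is now the number of whitespace-separated tokens
    }'
}

printf '%s\n' 'Oregon,Salem' 'New York,Albany' 'Utah,Salt Lake City' | count_tokens
# prints:
# 2 Oregon,Salem
# 3 New York,Albany
# 4 Utah,Salt Lake City
```

This relies on awk re-splitting the record after `gsub` modifies `$0`, which is the same trick the full script uses.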
r_31415
0

With State Capitals.csv along the lines of:

Alabama,Montgomery
Alaska,Juneau
Arizona,Phoenix
...
West Virginia,Charleston
Wisconsin,Madison
Wyoming,Cheyenne

The following Bash script (version 4+) does what you're asking (assuming I understand what you're asking):

#!/bin/bash -e

export PATH=/bin:/sbin:/usr/bin:/usr/sbin

declare -A a
declare -i i j
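# Read "State,Capital" pairs; -r keeps any backslashes literal.
while IFS=, read -r state capital; do
    # Token count = number of spaces in "State Capital" + 1.
    i=$(( $( echo "$state $capital" | tr -cd ' ' | wc -c ) + 1 ))
    if [[ -z ${a[$i]} ]]; then
        declare -a b=()
    else
        eval "${a[$i]}"        # restore the array saved for this token count
    fi
    b+=("$state|$capital")
    a[$i]=$( declare -p b )    # store the array back as a string
done < <( sort 'State Capitals.csv' )    # process substitution keeps one record per line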

for i in $( IFS=$'\n'; echo "${!a[*]}" | sort -n ); do
    echo "The following \"state capital\" strings have $i tokens:"
    eval "${a[$i]}"
    for (( j = 0; j < ${#b[@]}; ++j )); do
        echo "${b[$j]}"
    done \
        | column -ts '|' \
        | sed -re 's/^/  /'
done

The first loop populates an associative array (a) whose keys are the number of words in "State Capital" and whose values are string representations of arrays containing "State|Capital" entries (stringified using declare -p).
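
As a rough illustration of how that key is computed, the sketch below isolates the counting expression in a function. The name `token_count` is hypothetical, and the sketch assumes tokens are separated by exactly one space, as in the sample data.

```bash
# Hypothetical helper: count tokens by deleting everything except spaces,
# counting the spaces that remain, and adding one.
token_count() {
    local spaces
    spaces=$( printf '%s\n' "$1 $2" | tr -cd ' ' | wc -c )
    echo $(( spaces + 1 ))
}

token_count 'Oregon' 'Salem'          # prints 2
token_count 'Utah' 'Salt Lake City'   # prints 4
```

A line with doubled spaces would be over-counted by this approach, which is one reason the awk answer's use of NF may be more forgiving.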

The second loop iterates through the sorted keys of a, uses eval to load each of a's values (stringified with declare -p) into array b, and then iterates through b.
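
The declare -p / eval round trip is the least obvious part, so here is a minimal sketch of it in isolation. The variable name `saved` and the two sample entries are assumptions made for the example; it needs Bash, not plain sh.

```bash
# Serialize an indexed array to a string, then restore it with eval.
declare -a b=("New York|Albany" "Iowa|Des Moines")
saved=$( declare -p b )   # a string of the form: declare -a b=([0]="..." [1]="...")

b=()                      # empty the array to show that the restore really works
eval "$saved"             # re-runs the declare statement, recreating b

printf '%s\n' "${#b[@]}" "${b[1]}"
# prints:
# 2
# Iowa|Des Moines
```

Because the string comes from declare -p, it is already quoted for reuse as shell input; running eval on text from any other source would not be safe in the same way.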

ahi324
  • If it's appropriate to ask here, why were this question and this answer down-voted? Hoping to become a helpful contributor here. – ahi324 Jul 26 '22 at 23:29
  • I didn't down vote your answer, but I'm pretty sure the reason was due to the use of shell loops, which some people frown upon reflexively. In fact, I just wrote an answer to provide a more nuanced view of this issue here: https://unix.stackexchange.com/a/711524/29793. In my opinion, your script is a bit complex for my taste, but gives a very nice output and solves all the problems requested by OP, so I upvoted this answer anyway. – r_31415 Jul 27 '22 at 20:13
  • 1
    @r_31415 Thank you for the feedback, and I very much like your criteria for shell loops. – ahi324 Jul 27 '22 at 20:24
  • 1
    @ahi324 I tried to upvote & then I chose your answer as the best one despite not understanding your code since it was too advanced for me but it met all the checkmarks like r_31415 said – usuallystuck Jul 30 '22 at 11:41