I have a tab-delimited file where one of the columns is in the format "LastName, FirstName". What I want to do is split that field into two separate columns, last and first, run cut or some other verb(s) on the result, and output it as JSON.
I should add that I'm not married to JSON, and I know how to use other tools like jq, but it would be nice to get it in that format in one step.
The syntax for the nest verb looks like it requires memorizing a lot of frankly non-memorable options, so I figured that there would be a simple DSL operation to do this job. Maybe that's not the case?
Here's what I've tried. (Let's just forget about the extra space that's attached to Firstname right now, OK? I would use strip or ssub or something to get rid of that later.)
echo -e "last_first\nLastName, Firstname" \
| mlr --t2j put '$o=splitnv($last_first,",")'
# result:
# { "last_first": "LastName, Firstname", "o": "(error)" }
# expected something like:
# { "last_first": "LastName, Firstname", "o": { 1: "LastName", 2: "Firstname" } }
#
# or:
# { "last_first": "LastName, Firstname", "o": [ "LastName", "Firstname" ] }
Why (error)? Is it not reasonable to expect that assigning to $o as above would create a new column o holding the result of splitnv?
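One workaround that seems to sidestep the (error), assuming the failure comes from storing a map-valued result directly in a field: keep the split result in a local variable and assign its pieces to ordinary fields. This is only a sketch. The names parts, last, and first are my own, and I've used splitax rather than splitnv so the pieces stay strings.

```shell
# Split last_first on the comma, then copy each piece into its own field.
# 'parts', 'last', and 'first' are illustrative names, not anything Miller requires.
printf 'last_first\nLastName, Firstname\n' \
| mlr --t2j put '
    var parts = splitax($last_first, ",");  # pieces are indexed from 1
    $last  = parts[1];
    $first = strip(parts[2]);               # drop the space after the comma
  ' then cut -x -f last_first
```

The trailing cut -x -f last_first just drops the original combined column; leave it off to keep that column in the output.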
Here's something else I tried that didn't work as I expected either:
echo -e "last_first\nLastName, Firstname" \
| mlr -T nest --explode --values --across-fields --nested-fs , -f last_first
# result (no delimiter here, just one field, confirmed w/ 'cat -A')
# last_first
# LastName, Firstname
# expected:
# last_first_1<tab>last_first_2
# LastName,<tab> Firstname
Edit: The problem with the command above is that I should have used --tsv rather than -T; -T is a synonym for --nidx --fs tab (numerically indexed columns). Miller doesn't produce an error message when you ask for a named column in that mode, even though the request can't be satisfied, which might be a misfeature; see issue #233.
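For reference, here is the same nest invocation with --tsv swapped in for -T, followed by a --t2j variant that goes straight to JSON. Treat it as a sketch: the rename to last and first is my own addition, and the leading space on the first name is left alone.

```shell
# Same nest invocation as above, but reading and writing TSV with a header line.
printf 'last_first\nLastName, Firstname\n' \
| mlr --tsv nest --explode --values --across-fields --nested-fs , -f last_first

# TSV in, JSON out, renaming the generated last_first_1/last_first_2 columns.
# The target names 'last' and 'first' are illustrative.
printf 'last_first\nLastName, Firstname\n' \
| mlr --t2j nest --explode --values --across-fields --nested-fs , -f last_first \
    then rename last_first_1,last,last_first_2,first
```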
Any insight would be appreciated.