Tab-delimited values to YAML conversion

Question

I have a file with tab-delimited values in this format:

your-email  your-order-id   PayPal-transaction-id   your-first-name your-second-name
[email protected]   12345   54321   sooky   spooky
[email protected]   23456   23456   kiki    dee
[email protected] 34567   76543   cheeky  chappy

and I'd like to use awk to convert this to YAML:

---

your-email: [email protected]
your-order-id: 12345
PayPal-transaction-id: 54321
your-first-name: sooky
your-second-name: spooky

your-email: [email protected]
your-order-id: 23456
PayPal-transaction-id: 23456
your-first-name: kiki
your-second-name: dee

your-email: [email protected]
your-order-id: 34567
PayPal-transaction-id: 76543
your-first-name: cheeky
your-second-name: chappy

So far, my awk script looks like this:

#!/usr/bin/awk
FS=="\t"
BEGIN {print "---"} 
NR==1 {for (i=1;i<=NF;i++) print $i ": "}

But I can't figure out how to get each field from line 1 onwards to print after its header and recreate the YAML key values from the first line of the input file. In the real file, there are 38 fields and 34 records (so not huge).

Note that the YAML document you depict is probably not what you actually want: you repeatedly overwrite the value of the 5 keys, so you would get only the info for the last order when you loaded the document. You probably want either a series of subdocuments -- in which case you should change the blank lines to `---` -- or you want a list of dictionaries (that would be my preferred choice.), in which case you should prefix the `your-email` lines with `- ` and indent the other non-blank ones two spaces. See the [YAML reference-card](http://www.yaml.org/refcard.html). — kampu, May 28 '13 at 07:35
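
A minimal awk sketch of the list-of-dictionaries layout described in the comment above, assuming the input is tab-delimited with the field names on the first line. The function name `tsv_to_yaml_list` is hypothetical, and values are printed unquoted, so a value containing YAML-special characters (such as `: ` or a leading `#`) would need quoting that this sketch does not attempt:

```shell
# Hypothetical helper: convert tab-delimited text (header on line 1) into a
# YAML list of mappings, one list item per record.
tsv_to_yaml_list() {
  awk -F '\t' '
    # Header line: remember the column names and move on
    NR == 1 { nc = NF; for (c = 1; c <= nc; c++) h[c] = $c; next }
    # Data line: first key gets the "- " list marker, the rest are indented
    {
      for (c = 1; c <= nc; c++)
        printf "%s%s: %s\n", (c == 1 ? "- " : "  "), h[c], $c
    }
  ' "$@"
}
```

It could be called as `tsv_to_yaml_list file`; each record then becomes its own list item, so no key is overwritten when the document is loaded.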

score 3 · Accepted Answer · answered May 27 '13 at 23:19

Here's one way:

$ cat inf
your-email  your-order-id   PayPal-transaction-id   your-first-name your-second-name
[email protected]   12345   54321   sooky   spooky
[email protected]   23456   23456   kiki    dee
[email protected] 34567   76543   cheeky  chappy
$ cat mkf.sh
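awk '
BEGIN {
  FS = "\t"      # the input is tab-delimited
  print "---\n"
}
NR == 1 {
  # Header line: remember the column names
  nc = NF
  for(c = 1; c <= NF; c++) {
    h[c] = $c
  }
}
NR > 1 {
  # Data line: print each value after its column name.
  # A fixed format string keeps any "%" in the data from being interpreted.
  for(c = 1; c <= nc; c++) {
    printf "%s: %s\n", h[c], $c
  }
  print ""
}' inf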
$ ./mkf.sh inf
---

your-email: [email protected]
your-order-id: 12345
PayPal-transaction-id: 54321
your-first-name: sooky
your-second-name: spooky

your-email: [email protected]
your-order-id: 23456
PayPal-transaction-id: 23456
your-first-name: kiki
your-second-name: dee

your-email: [email protected]
your-order-id: 34567
PayPal-transaction-id: 76543
your-first-name: cheeky
your-second-name: chappy

That's great. Thanks. I would upvote but my poor reputation precludes that :-( — duff, May 28 '13 at 08:47
@duff Good! If this solves your issue, please consider [accepting the answer](https://unix.stackexchange.com/help/someone-answers). Accepting an answer marks the issue as resolved. — Kusalananda, Aug 28 '22 at 06:59

Kusalananda · Answer 2 · 2022-08-28T06:51:20.343

csvjson -t file | yq -y .

Assuming the fields of the original data are delimited by tabs, this uses csvjson (from the csvkit toolkit) to convert the data to JSON format. The yq parser (from https://kislyuk.github.io/yq/) is then used to transcode the JSON into YAML.

Given the data in the question, the final output will be the YAML document

- your-email: [email protected]
  your-order-id: 12345
  PayPal-transaction-id: 54321
  your-first-name: sooky
  your-second-name: spooky
- your-email: [email protected]
  your-order-id: 23456
  PayPal-transaction-id: 23456
  your-first-name: kiki
  your-second-name: dee
- your-email: [email protected]
  your-order-id: 34567
  PayPal-transaction-id: 76543
  your-first-name: cheeky
  your-second-name: chappy

I'm noting that the expected output in the question makes little sense as it's a single section with multiple duplicated keys (a key's value is overwritten by a later instance of that same key). I've therefore chosen to ignore that in favour of a document without duplicated keys (the above document contains a list of three objects).

In place of csvjson -t file you may instead use

mlr --itsv --ojson --jlistwrap cat file

... which uses Miller (mlr) from https://miller.readthedocs.io/en/latest/ to convert the tab-delimited input into JSON.

In place of yq -y . you may use

yj -jy

... which uses yj from https://github.com/sclevine/yj to translate JSON to YAML.

Any combination of the four tools mentioned for TSV-->JSON and JSON-->YAML transcoding will give you the same (or equivalent) result in the end.

score 0 · Answer 3 · answered May 27 '13 at 23:17

Have you tried defining an integer counter, set to zero in BEGIN, and then using an if/else statement? If "iter==0", save the field names to the elements of an array and increment the counter; otherwise, do the record print you've written, except print each field next to its saved name using your i loop variable.

I haven't tested this code at all (and I suck something awful at awk in general), but it should serve as a concrete illustration of the general programming/scripting concept:

#!/usr/bin/awk
FS=="\t"
BEGIN {
   print "---"
   iter=0
} 
NR==1 
{

   if (iter == 0)
      for (i=1;i<=NF;i++) 
         newArr[i]=$i
      iter++
   else
      for (i=1;i<=NF;i++) 
         print newArr[i] ": " $i

}

answered May 27 '13 at 23:17

Bratchley


I get a syntax error when trying to run this: `gawk: ./a.awk:15: else`. The arrow (`^`) points at the `e` of `else`. Could you explain the `NR==1`? Won't that make your code execute _only_ on the first line? – terdon May 27 '13 at 23:46
Yeah, like I said, it was just an illustration, not final code. It looks like @icyrock.com has essentially re-created the same basic script except using `NR` instead of an if/else statement. I would try to use their code. – Bratchley May 27 '13 at 23:50
OK, fair enough, I thought it was some dark trickery I was not aware of :). Sorry, next time I'll read the text of your answer as well as the code. – terdon May 27 '13 at 23:51
I didn't mean to catch the `NR`, that's from your code that you provided, I just worked off that as a template. Near as I can tell from the other person's code, it matches line numbers so I'm assuming `NR` is "number of record" and the statement before the curly brace is a conditional statement. – Bratchley May 27 '13 at 23:52
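
For reference, here is a sketch of how the if/else idea from this answer might look once the two problems raised in the comments are addressed: the branches get braces (the likely cause of the syntax error at `else`), and the `NR==1` pattern is dropped so that the block runs for every record. Wrapping it in a shell function named `tsv_to_yaml_iter` is an assumption made for illustration, as is setting `FS` inside `BEGIN`; the output follows the layout requested in the question:

```shell
# Hypothetical wrapper around the corrected if/else sketch.
tsv_to_yaml_iter() {
  awk '
    BEGIN {
      FS = "\t"      # assumption: fields are separated by single tabs
      print "---"
      iter = 0
    }
    {
      if (iter == 0) {
        # First record: save the field names
        for (i = 1; i <= NF; i++)
          newArr[i] = $i
        iter++
      } else {
        # Later records: blank separator, then "name: value" per field
        print ""
        for (i = 1; i <= NF; i++)
          print newArr[i] ": " $i
      }
    }
  ' "$@"
}
```

Called as `tsv_to_yaml_iter file`, this behaves much like the accepted answer; the counter simply plays the role that the `NR == 1` and `NR > 1` patterns play there.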

score 0 · Answer 4 · edited Mar 18 '16 at 22:04

I am sure this can be done in awk but if a Perl answer is acceptable, this should do what you need:

#!/usr/bin/env perl
print "---\n";
while (<>) {
    chomp;
    ## This splits the line at each run of one or more tab characters
    ## into the array @fields.
    @fields=split(/\t+/);
    ## Get the column names if this is the 1st line
    if ($.==1){@cols=@fields}
    ## Print the data if it is not the first line
    else {
      print "\n";
      for ($i=0;$i<=$#fields;$i++){
        print "$cols[$i]: $fields[$i]\n";
      }
    }
}

For example:

$./foo.pl input_text.txt
---

your-email: [email protected]
your-order-id: 12345
PayPal-transaction-id: 54321
your-first-name: sooky
your-second-name: spooky

your-email: [email protected]
your-order-id: 23456
PayPal-transaction-id: 23456
your-first-name: kiki
your-second-name: dee

your-email: [email protected]
your-order-id: 34567
PayPal-transaction-id: 76543
your-first-name: cheeky
your-second-name: chappy

This can be condensed into a one-liner using Perl's -a option which splits each line into the array @F:

echo "---"; perl -aF"\t" -ne 'chomp($F[$#F]); if ($.==1){@c=@F}else {
 print "\n";for ($i=0;$i<=$#F;$i++){print "$c[$i]: $F[$i]\n";}}' input_text.txt
