5

Given file.csv:

a,b,c
1,2,3

How can mlr be made to output:

a,b,c
1,2,c

using the label (field name) of $c, without knowing in advance that the label is the letter "c"?


Note: correct answer must use mlr only.

agc
  • do you want the last header column to be the last column value or all other records? – RomanPerekhrest Mar 14 '18 at 15:35
  • @RomanPerekhrest, In this simplest possible case either way would be fine. – agc Mar 14 '18 at 15:37
  • 1
    I hadn't come across miller - `mlr` - before - looks great, will have to spend some time playing with it and reading the docs. Thanks for the link. – cas Mar 15 '18 at 06:14
  • 1
    Questions: 1. Is this supposed to always refer to field 3 or the last field? 2. If there is more than 1 line of data following the headers, should all lines of the data be changed, or only the first line? – Mr. Lance E Sloan Jul 01 '20 at 22:45
  • @LS, Sensible corner-case questions, but the disappointing answer is *"doesn't matter (to me)"* -- I asked the Q. because extracting **a** field name from a header, (not any particular field), seemed like a simple thing, (and should be given the scope of `mlr`), but I couldn't find a simple method. The `a,b,c\n1,2,3\n` was my attempt to give a minimal data instance. – agc Jul 03 '20 at 00:34
  • If you don’t care about how it handles the general case, then ``printf '%s\n' a,b,c 1,2,c`` will do what you want. – G-Man Says 'Reinstate Monica' Jul 08 '20 at 05:13
  • @G-ManSays'ReinstateMonica', This Q. ***is*** about the general case. – agc Jul 09 '20 at 18:07
  • Well, that would be the standard presumption.   My point is that, even when presented with specific clarifying questions, you have refused to identify the general case. – G-Man Says 'Reinstate Monica' Jul 09 '20 at 20:06

3 Answers

5

Edited answer

Hi, you could use this script:

mlr --csv put 'if (NR == 1) {
  counter = 1;
  for (key in $*) {
    if (counter == 3) {
      $[key] = key;
    }
    counter += 1;
  }
}' input.csv

And as output you will have:

a,b,c
1,2,c

NR == 1 selects the first data record (the row after the header), and counter == 3 selects the third field.

aborruso
  • Gives an error if the last line of the file ends with a line break, which is not an uncommon thing. (`Header/data length mismatch (3 != 1) at file "input.csv" line 3.`) I think this is a miller feature, though, not a problem of this script. – Mr. Lance E Sloan Jun 30 '20 at 15:42
  • 1
    You say "`NR == 1` to have the heading row", but that's not actually what it does. When `NR` equals `1`, that's the first row of data, which is the line *after* the heading row. – Mr. Lance E Sloan Jun 30 '20 at 23:12
  • @LS I have no error if the last line of the file ends with a line break. I'm using mlr 5.7 – aborruso Jul 01 '20 at 05:57
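Several comments ask whether the target is the third field or the last one. As a rough sketch only, the same loop could remember the most recent key instead of counting, which would pick the last field whatever its name is. The `last` local and the temporary `sample` file are names made up for this illustration, and the snippet assumes a Miller version whose DSL accepts the same untyped locals and `for (key in $*)` loop used in the answer above. It has not been checked against every Miller release.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf 'a,b,c\n1,2,3\n' > "$sample"

# Only run the example when Miller is actually installed.
if command -v mlr >/dev/null 2>&1; then
  # On the first data record, walk the keys in order, keep the last one seen,
  # then overwrite that field's value with its own name.
  result=$(mlr --csv put '
    if (NR == 1) {
      last = "";
      for (key in $*) {
        last = key;
      }
      $[last] = last;
    }' "$sample")
  printf '%s\n' "$result"
else
  echo "mlr is not installed; skipping the example" >&2
fi

rm -f "$sample"
```

Declaring `last` before the loop matters: assigning it for the first time inside the loop body would scope it to that body, so it would not be visible afterwards.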
2

Simply with awk:

awk 'BEGIN{ FS=OFS="," }{ (NR == 1)? c=$NF : $NF=c }1' file.csv

Sample output:

a,b,c
1,2,c
RomanPerekhrest
  • 1
    Thanks & sorry -- I'm at fault for not specifying that a `mlr`-only answer is required. (I'm trying to learn `mlr`, but it can be puzzling.) *Q.* revised to reflect language limitation. +1 because it works tho'... – agc Mar 14 '18 at 15:47
  • The awk script doesn't work without `1` at the end, but why? – Mr. Lance E Sloan Jun 30 '20 at 16:02
  • I see why `1` is needed. It's a pattern that's always true. A pattern without an action prints the current line. Clever, but not intuitive. It'd be better to explicitly print, i.e., `awk 'BEGIN{ FS=OFS="," }{ (NR == 1)? c=$NF : $NF=c; print }' file.csv`. – Mr. Lance E Sloan Jun 30 '20 at 22:43
  • 1
    @LS The `1` trick is a common idiom in AWK; I agree it’s not intuitive, and you can avoid it in your own programs, but you’d end up having to learn it quickly if you read a lot of other people’s AWK programs ;-). – Stephen Kitt Jul 01 '20 at 07:20
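To make the point about the trailing `1` concrete, here is a small side-by-side sketch. It is not the answer's exact one-liner: the ternary is split into two plain pattern-action rules so that it does not depend on how a particular awk parses an assignment inside `?:`. The temporary `sample` file and the variable names are made up for this illustration.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf 'a,b,c\n1,2,3\n' > "$sample"

# Variant 1: the trailing `1` is an always-true pattern with no action,
# so awk applies its default action and prints the (possibly modified) record.
with_idiom=$(awk 'BEGIN { FS = OFS = "," }
  NR == 1 { c = $NF }
  NR > 1  { $NF = c }
  1' "$sample")

# Variant 2: the same program with an explicit print rule instead of `1`.
with_print=$(awk 'BEGIN { FS = OFS = "," }
  NR == 1 { c = $NF }
  NR > 1  { $NF = c }
  { print }' "$sample")

printf '%s\n' "$with_idiom"
printf '%s\n' "$with_print"

rm -f "$sample"
```

Both variants print `a,b,c` followed by `1,2,c` for the sample input. Unlike the mlr script above, they rewrite the last field on every data line, not just the first.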
2

Miller v5.6.0 allows the use of $[[fieldno]] to refer to the name of field number "fieldno", so in your case field 3's name is $[[3]].

    mlr --csv put '$c = $[[3]]' file.csv
pjfarley3
  • That sounds ideal, but on my system with `mlr` *v5.3.0-1*, the above code fails with two errors: `mlr DSL: syntax error at "["` and `mlr put: syntax error on DSL parse of '$3 = $[[3]]'`. – agc Oct 05 '19 at 03:19
  • 1
    Apologies for that. I only just started using miller and am using the latest stable level of miller from Homebrew,version 5.6.0, where that syntax is accepted and supported. May I suggest you upgrade? Homebrew makes that so very easy to do. – pjfarley3 Oct 06 '19 at 05:29
  • Homebrew may be easy, but SFAIK it's an *OSX* installer util rather than a *Linux* one. I'm running *Ubuntu 18.04*, so the easiest available upgrade is `mlr` *v5.4*, (from [pkgs.org](http://archive.ubuntu.com/ubuntu/pool/universe/m/miller/miller_5.4.0-1_amd64.deb)), but *v5.4* returns the same error. Using `alien` on [miller-5.6.0-alt1_1.x86_64.rpm from pkgs.org](http://archive.ubuntu.com/ubuntu/pool/universe/m/miller/miller_5.4.0-1_amd64.deb) does successfully install however. – agc Oct 07 '19 at 13:29
  • 1
    ...but *v5.6* outputs: `a,b,c,3` (linefeed) `1,2,3,c`, which is incorrect. – agc Oct 07 '19 at 13:31
  • There's a typo in the solution. At least for Miller 5.7.0. The `$3` should be `$c`, then it produces the correct answer. – Mr. Lance E Sloan Jun 30 '20 at 22:46
  • 1
    @LS the original answer didn’t work in Miller 5.6 either; but it’s not clear from the question that the answer is allowed to use the third column’s label... – Stephen Kitt Jul 01 '20 at 07:22
  • @StephenKitt, you're right. I glossed over that part of the question. I have an update. I also wonder whether the change is to always apply to the *_third_* field or the *_last_* field, and whether it's to apply to the *_first_* line of data or *_all_* lines. – Mr. Lance E Sloan Jul 01 '20 at 14:51