
My /etc/passwd has a list of users in a format that looks like this:

username:password:uid:gid:firstname.lastname, somenumber:/...

Goal: I want to see only the first names, and then sort them so that the most common name appears first, the second most common appears second, and so on.

I saw some solutions for the second part, although they apply to a text file rather than to reading from a map.

As for the first part, I really don't know how to approach it. I know there are some solutions, but I don't really know how to apply them.

asaf92

2 Answers


One way to do it:

cut -d: -f5 /etc/passwd | \
    sed 's/\..*//' | \
    sort -f | \
    uniq -ci | \
    sort -rn
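For readers who want to see each stage, here is a minimal sketch of the same pipeline wrapped in a shell function with one comment per stage, run over a few hypothetical passwd lines. The sample data and the function name `first_name_counts` are made up for illustration. It uses `sort -f`, which folds case; POSIX `sort -i` instead ignores non-printing characters, so `-f` is what places John and john next to each other for `uniq -ci`. Note that `uniq -i` is a GNU/BSD extension rather than POSIX, and `LC_ALL=C` is pinned here only so the sketch behaves the same everywhere.

```shell
#!/bin/sh
# Assumption: pin the locale so case folding and tie-breaking are reproducible.
LC_ALL=C
export LC_ALL

# Hypothetical sample entries in the question's format:
# username:password:uid:gid:firstname.lastname, somenumber:/...
sample='jdoe:x:1001:100:John.Doe, 123:/home/jdoe:/bin/sh
jsmith:x:1002:100:john.smith, 456:/home/jsmith:/bin/sh
abrown:x:1003:100:Alice.Brown, 789:/home/abrown:/bin/sh
jroe:x:1004:100:John.Roe, 654:/home/jroe:/bin/sh
ajones:x:1005:100:alice.jones, 321:/home/ajones:/bin/sh
bwhite:x:1006:100:Bob.White, 987:/home/bwhite:/bin/sh'

# Hypothetical helper wrapping the answer's pipeline; reads passwd lines on stdin.
first_name_counts() {
    cut -d: -f5 |       # keep field 5: firstname.lastname, somenumber
    sed 's/\..*//' |    # delete the first period and everything after it
    sort -f |           # fold case so John and john sort next to each other
    uniq -ci |          # count adjacent duplicates, ignoring case
    sort -rn            # highest count first
}

result=$(printf '%s\n' "$sample" | first_name_counts)
printf '%s\n' "$result"
```

With this sample the output should list John 3 times, Alice 2 times and Bob once, in that order; the count column is padded by `uniq -c`. On a real system, pipe `/etc/passwd` (or `ypcat passwd`) into `first_name_counts` instead of the sample.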
Satō Katsura
  • Great answer, but I think he'll need to use `uniq` without `-i`, since there should be a difference between X and x in a name; we only need the --ignore-case option for sort, as you've used. In addition, the sed command in your answer seems irrelevant; if there is a reason for it, please explain. –  Aug 09 '16 at 08:07
  • @FarazX Re: `-i`: `John.Doe` should be the same as `john.doe`. Re: `sed`: from the OP: _I want to see only the first names_. – Satō Katsura Aug 09 '16 at 08:13
  • Oh you're right, sorry I didn't notice. So voila! Thanks for your explanation, and your great way of using cut ;) –  Aug 09 '16 at 08:14
  • **cut** + **sed** is too much `sed '/\n/{P;d};s/:/\n/4;s/\./\n/;D'` or `sed 's/[^.]*:\(\w\+\).*/\1/'` – Costas Aug 09 '16 at 08:23
  • @Costas Too much compared to what? For me, total time spent thinking about getting the 5th field portably with `sed` >> the time gained by not using `cut`. BTW, your second recipe assumes GNU `sed` (`\w`). – Satō Katsura Aug 09 '16 at 08:31
  • @SatoKatsura The above is an example. If you'd like, you can do the same as in your script: `sed 's/[^.]*://;s/\..*//'`. But my first example is a little bit quicker. And if you don't like `\w`, you are free to use `[:alnum:]`. – Costas Aug 09 '16 at 08:38
  • @Costas `sed 's/[^.]*://;s/\..*//'` misses any names without dot. The point of using `cut` is precisely to avoid going into this kind of details, you know. – Satō Katsura Aug 09 '16 at 08:44
  • @SatoKatsura If you insist: `s/\([^:]*:\)\{4\}//;s/[:.].*//`. In any case, if you involve *sed*, you can easily avoid *cut*. – Costas Aug 09 '16 at 09:46
  • Can you briefly explain how this command works? (specifically the `sed` and `cut` part) – asaf92 Aug 09 '16 at 13:29
  • And by the way, on my system I don't have access to passwd. I have to type `ypcat passwd` to read it. – asaf92 Aug 09 '16 at 13:31
  • @PanthersFan92 `cut` extracts the 5th field, `sed` kills the `.lastname, somenumber` part out of it. You can, of course, do it like this: `ypcat passwd | cut -d: -f5 | ...`. – Satō Katsura Aug 09 '16 at 13:48
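The comments debate doing the extraction with `sed` alone. A small check, assuming the hypothetical sample lines below, that Costas's portable recipe `s/\([^:]*:\)\{4\}//;s/[:.].*//` yields the same first names as `cut -d: -f5 | sed 's/\..*//'`, including for an entry whose field 5 has no period:

```shell
#!/bin/sh
# Hypothetical sample entries; the last one has no period in field 5.
sample='jdoe:x:1001:100:John.Doe, 123:/home/jdoe:/bin/sh
abrown:x:1003:100:Alice.Brown, 789:/home/abrown:/bin/sh
bwhite:x:1006:100:Bob White, 987:/home/bwhite:/bin/sh'

# cut + sed, as in the answer: take field 5, then cut at the first period.
with_cut=$(printf '%s\n' "$sample" | cut -d: -f5 | sed 's/\..*//')

# sed only (Costas's recipe): delete the first four colon-terminated
# fields, then delete from the first colon or period onwards.
sed_only=$(printf '%s\n' "$sample" | sed 's/\([^:]*:\)\{4\}//;s/[:.].*//')

printf '%s\n' "$with_cut"
if [ "$with_cut" = "$sed_only" ]; then
    echo "both pipelines agree"
else
    echo "pipelines differ"
fi
```

Both should print John, Alice, and `Bob White, 987`: neither strips anything from a name without a period. On an NIS system like the asker's, either pipeline can read from `ypcat passwd` in place of the sample, as the last comment suggests.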

Using awk and sorting to have the most common name first:

awk -F: '{sub(/[.].*/, "", $5); a[$5]++} END{for (n in a)print a[n],n}' /etc/passwd | sort -nr

For a case-insensitive version:

awk -F: '{sub(/[. ,].*/, "", $5); a[tolower($5)]++} END{for (n in a)print a[n],n}' /etc/passwd | sort -nr
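To see what the case-insensitive version does, here it is run over a few hypothetical entries (made up for illustration) in which the same first name appears with different capitalization, plus one name with no period:

```shell
#!/bin/sh
# Hypothetical sample entries: mixed case, and a dot-less "Bob White" at the end.
sample='jdoe:x:1001:100:John.Doe, 123:/home/jdoe:/bin/sh
jsmith:x:1002:100:john.smith, 456:/home/jsmith:/bin/sh
jroe:x:1003:100:John.Roe, 654:/home/jroe:/bin/sh
abrown:x:1004:100:Alice.Brown, 789:/home/abrown:/bin/sh
ajones:x:1005:100:alice.jones, 321:/home/ajones:/bin/sh
bwhite:x:1006:100:Bob White, 987:/home/bwhite:/bin/sh'

# The case-insensitive one-liner from the answer, fed from the sample.
result=$(printf '%s\n' "$sample" |
  awk -F: '{sub(/[. ,].*/, "", $5); a[tolower($5)]++} END{for (n in a)print a[n],n}' |
  sort -nr)
printf '%s\n' "$result"
```

With this sample it should print `3 john`, `2 alice`, `1 bob`. Names come out lowercased because the array key is `tolower($5)`, and `Bob White, 987` is cut down to `Bob` because the regex also stops at a space or comma.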

For those who prefer their commands spread over multiple lines:

awk -F: '
  {
    sub(/[.].*/, "", $5)
    a[$5]++
  }

  END{
    for (n in a)
      print a[n],n
  }
  ' /etc/passwd | sort -nr

How it works

  • -F:

    This makes : the field separator.

  • sub(/[.].*/, "", $5)

    This removes the first period and everything after it from field 5, leaving only the first name.

  • a[$5]++

    Associative array a holds a count of how many times each name has appeared; this statement increments the counter for the current name. For the case-insensitive version, it is replaced with a[tolower($5)]++.

  • END{for (n in a)print a[n],n}

    This prints the count and name for all the results that we have in array a.

  • sort -nr

    This sorts the output numerically in descending order.
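The steps above can be traced with a sketch that runs the multi-line version, with each step written as an awk comment, against a temporary file of hypothetical entries. Because this version does not fold case, John and john are counted separately. It assumes `mktemp` is available, as it is on typical Linux and BSD systems.

```shell
#!/bin/sh
# Hypothetical sample file in the question's format.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
jdoe:x:1001:100:John.Doe, 123:/home/jdoe:/bin/sh
jroe:x:1002:100:John.Roe, 456:/home/jroe:/bin/sh
jpoe:x:1003:100:John.Poe, 789:/home/jpoe:/bin/sh
abrown:x:1004:100:Alice.Brown, 111:/home/abrown:/bin/sh
ajones:x:1005:100:Alice.Jones, 222:/home/ajones:/bin/sh
jsmith:x:1006:100:john.smith, 333:/home/jsmith:/bin/sh
EOF

result=$(awk -F: '
  {
    sub(/[.].*/, "", $5)   # field 5 becomes just the first name
    a[$5]++                # count it; the key is case-sensitive here
  }
  END {
    for (n in a)           # iteration order is unspecified...
      print a[n], n        # ...so the output is sorted afterwards
  }
' "$tmp" | sort -nr)
printf '%s\n' "$result"
```

With this sample it should print `3 John`, `2 Alice`, `1 john`. Since `for (n in a)` visits keys in no particular order, piping through `sort -nr` is what puts the most common name first.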

John1024