
I was trying to figure out a solution for this question. I wanted to use awk for the solution.

My input file looks like this:

-bash-3.2$ cat file
ramesh
ramesh_venkat
ramesh3_venkat3
ramesh4
ramesh5
venkat
venkat3
venkat4

I used the following awk command to extract the value after the _:

awk -F "_" '{print $2}' file

The command prints the correct values, but I also get blank lines in the output. I have two questions.

Question 1

How can I remove the blank lines in output so that I get only venkat and venkat3 in the output?

If I use printf instead of print in my awk command, I get venkatvenkat3 as output, which is not what I want. I want output like this:

venkat
venkat3
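The run-together output happens because printf, unlike print, does not append a newline on its own. A minimal sketch of a printf variant that keeps one value per line, assuming the sample file is recreated in a scratch directory (the `tmp` path and the `out` variable are illustrative only, and the `NF > 1` guard is just one possible way to skip lines that have no `_`):

```shell
# Recreate the sample input in a scratch directory (illustrative path).
tmp=$(mktemp -d)
cat > "$tmp/file" <<'EOF'
ramesh
ramesh_venkat
ramesh3_venkat3
ramesh4
ramesh5
venkat
venkat3
venkat4
EOF

# printf needs an explicit "\n"; NF > 1 skips lines without a "_" separator.
out=$(awk -F "_" 'NF > 1 { printf "%s\n", $2 }' "$tmp/file")
printf '%s\n' "$out"

rm -rf "$tmp"
```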

Question 2

Using those values as keys of an associative array or something similar, how can I check whether they actually occur in the $1 column?

I want to achieve something like:

awk -F "_" '$2==1{print $1}' file

EDIT

I did not notice the awk solution of Stephane. Is it doing the same thing that I had mentioned?

Ramesh
  • Stephane's `awk` is not doing the same thing. Your approach assumes that a word can only be contained in another if it is separated by `_`. While that is true for the OP's example, all of the posted answers also deal with cases like `doglion` and not only `dog_lion`. – terdon May 07 '14 at 16:52
  • For non-awk, see: [How to remove blank lines from a file in shell?](http://unix.stackexchange.com/q/101440/21471) – kenorb May 05 '15 at 16:05

4 Answers


Another approach:

Question 1

awk -F_ '$2{print $2}' file

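This prints only when $2 evaluates as true, that is, when it is non-empty and not numerically zero (a second field of 0 would also be skipped). It is a shorter way of writing: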

awk -F_ '{if($2){print $2}}' file
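The difference between "true" and "non-empty" only matters when a second field can be a number equal to zero. A small sketch of that edge case, using made-up input lines (`a_0`, `b_x`, `c`) that are not part of the original file; the variable names are illustrative:

```shell
# Hypothetical input: one line whose second field is the number 0.
input=$(printf 'a_0\nb_x\nc\n')

# Truthiness test: a field that looks like the number 0 counts as false,
# so "a_0" is skipped along with "c".
truthy=$(printf '%s\n' "$input" | awk -F_ '$2 { print $2 }')

# Explicit string comparison: only a missing or empty second field is skipped.
nonempty=$(printf '%s\n' "$input" | awk -F_ '$2 != "" { print $2 }')

printf 'truthy:\n%s\n' "$truthy"
printf 'nonempty:\n%s\n' "$nonempty"
```

For the file in the question both forms give the same result, since none of its second fields is numeric.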

Question 2

Don't have anything to add that has not already been addressed.

terdon

Question 1

$ awk -F _ 'NF > 1 {print $2}' file
venkat
venkat3

Question 2

$ awk -F _ '
    NR == FNR {a[$1];next}
    ($2 in a) {print $2}
' file file
venkat
venkat3
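For readers unfamiliar with the `NR == FNR` idiom, here is an annotated sketch of the same two-pass idea. The array name `seen`, the scratch directory, and the extra `NF > 1` guard are additions for illustration, not part of the command above:

```shell
# Recreate the sample input in a scratch directory (illustrative path).
tmp=$(mktemp -d)
cat > "$tmp/file" <<'EOF'
ramesh
ramesh_venkat
ramesh3_venkat3
ramesh4
ramesh5
venkat
venkat3
venkat4
EOF

# The file is named twice, so awk reads it twice.
out=$(awk -F _ '
    # NR (records read overall) equals FNR (records read in the current file)
    # only during the first pass: remember each first field as an array key.
    NR == FNR { seen[$1]; next }
    # Second pass: print a second field that was recorded as a first field.
    NF > 1 && ($2 in seen) { print $2 }
' "$tmp/file" "$tmp/file")
printf '%s\n' "$out"

rm -rf "$tmp"
```

Merely referencing `seen[$1]` creates the key, which is why no value needs to be assigned.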
cuonglm

For Question 1, you could use the --only-delimited (-s) option of cut:

cut -s -f2 -d'_' file
venkat
venkat3
iruvar

Question 1

awk -F "_" '/_/ {print $2}' file

Question 2

awk -F "_" '{values[$1]=1;}; END {for (val in values) print val;}' file
Hauke Laging
  • For question 2, I intend to get only `venkat` and `venkat3` as output, since they are present in `$1`. However, your command prints all the `$1` values. – Ramesh May 07 '14 at 15:37
  • @Ramesh: From your description, I think you want to print `$2` for each entry whose `$2` also occurs in the first column. Is that right? – cuonglm May 07 '14 at 15:43
  • @Gnouc, yes you are right. – Ramesh May 07 '14 at 15:44
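The behaviour settled on in these comments can also be sketched in a single pass, which may help when the input comes from a pipe and cannot be read twice. This is only one possible approach; the array names `first` and `second`, the counter `n`, and the scratch directory are illustrative:

```shell
# Recreate the sample input in a scratch directory (illustrative path).
tmp=$(mktemp -d)
cat > "$tmp/file" <<'EOF'
ramesh
ramesh_venkat
ramesh3_venkat3
ramesh4
ramesh5
venkat
venkat3
venkat4
EOF

out=$(awk -F _ '
    { first[$1] }                  # record every first field as a key
    NF > 1 { second[++n] = $2 }    # keep second fields in input order
    END {
        # Once all input is read, print each second field that also
        # appeared somewhere as a first field.
        for (i = 1; i <= n; i++)
            if (second[i] in first) print second[i]
    }
' "$tmp/file")
printf '%s\n' "$out"

rm -rf "$tmp"
```

The check has to wait for the END block because a matching first field (here `venkat`) can appear later in the input than the line that carries it as a second field.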