0

How to capture the string that comes after a specific word in a CSV line

For example, from this CSV line we want to extract each string that comes immediately after /data/:

status=true /data/sdb/hadoop/hdfs/log,/data/sdc/hadoop/hdfs/log,/data/sdd/hadoop/hdfs/log,/data/sde/hadoop/hdfs/log,/data/sdf/hadoop/hdfs/log

Expected results:

sdb
sdc
sdd
sde
sdf
Jeff Schaller
yael

6 Answers

4

Use grep:

with PCRE:

grep -Po '/data/\K[^/]*'

if that is not available:

grep -o '/data/[^/]*' | cut -d'/' -f3
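As a quick check, the portable variant can be run against the sample line from the question. This is only a sketch: the variable `line` and the helper name `extract_disks` are made up for illustration, and the input is assumed to arrive on standard input.

```shell
# Sample line copied from the question.
line='status=true /data/sdb/hadoop/hdfs/log,/data/sdc/hadoop/hdfs/log,/data/sdd/hadoop/hdfs/log,/data/sde/hadoop/hdfs/log,/data/sdf/hadoop/hdfs/log'

# extract_disks is a hypothetical helper name, not part of grep or cut.
extract_disks() {
    # grep -o prints each "/data/<name>" match on its own line;
    # cut then keeps the third "/"-separated field, i.e. <name>.
    grep -o '/data/[^/]*' | cut -d'/' -f3
}

printf '%s\n' "$line" | extract_disks
```

With the sample line this should print sdb through sdf, one per line. The `-P` form does the same job in a single process, but it needs a grep built with PCRE support.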
pLumo
1

@pLumo absolutely has the right answer. If, for whatever reason, you wanted to use awk and bash's builtin parameter expansion, all the while being slightly convoluted...

while IFS= read -r line; do
    # Strip everything except commas, so the length is the number of separators.
    COUNT_SEP="${line//[^,]}"
    # Start every input line with an empty temp file.
    : > /tmp/splitCSV
    # Columns are numbered from 1; the first one also holds a /data/ path.
    for col in $(seq 1 $((${#COUNT_SEP}+1))); do
        echo "${line}" | awk -v variable="${col}" -F, '{ print $variable }' >> /tmp/splitCSV
    done
    while IFS= read -r splitCol; do
        # Keep what follows /data/, then cut at the next slash.
        echo "${splitCol}" | awk -F'/data/' '{ print $2 }' | awk -F'/' '{ print $1 }'
    done < /tmp/splitCSV
done < test.csv
Jake Ireland
  • 1
    You should never do that. See [why-is-using-a-shell-loop-to-process-text-considered-bad-practice](https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice) for some of the reasons. – Ed Morton Mar 03 '20 at 15:59
  • 1
    Thanks! I didn't know that was best practice. Very interesting. – Jake Ireland Mar 03 '20 at 18:51
  • 1
    Yeah, the guys who invented shell to manipulate files and processes also invented tools like awk for shell to call to manipulate text. So, horses for courses... never write a shell loop just to manipulate text and you can't go wrong. – Ed Morton Mar 04 '20 at 14:41
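To illustrate the point made in these comments, here is a sketch of how the same comma-by-comma logic could be handed to a single awk process instead of a shell loop. The function name `disks_from_csv` is hypothetical, and the input is assumed to be the CSV on standard input.

```shell
# disks_from_csv is a hypothetical helper name used only for this sketch.
disks_from_csv() {
    awk -F, '{
        # Visit every comma-separated column of the current line.
        for (i = 1; i <= NF; i++)
            # match() sets RSTART and RLENGTH for the first "/data/<name>".
            if (match($i, "/data/[^/]+"))
                # Skip the 6 characters of "/data/" and print only <name>.
                print substr($i, RSTART + 6, RLENGTH - 6)
    }'
}

printf '%s\n' 'status=true /data/sdb/hadoop/hdfs/log,/data/sdc/hadoop/hdfs/log' | disks_from_csv
```

awk reads the file once and needs no temporary file, which is the main reason it is usually preferred over a `while read` loop for this kind of text processing.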
1

Just to add an option: given that only one pattern in the input matches three characters between slashes, this works with grep and sed:

grep -o "/.../" file | sed 's;/;;g'

Output:

sdb
sdc
sdd
sde
sdf
schrodingerscatcuriosity
1

For the input above, the following command works:

perl -pne "s/,/\n/g"  filename|awk -F '/data/' '{gsub("/.*","",$2);print $2}'

Output:

sdb
sdc
sdd
sde
sdf
Praveen Kumar BS
1

This works for me with awk:

awk -F'/' '{for(i=1;i<=NF;i++) if($i=="data") print $(i+1)}' file

1: -F sets the field separator to /

2: loop over every field of each line

3: if a field equals "data", print the next field
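The three steps can be tried on a small input. The sketch below is a slight variation, not the command above verbatim: it stops the loop at `i < NF`, on the assumption that a line ending in "data" should not produce an empty output line. The function name `after_data` and the second test line are invented for illustration.

```shell
# after_data is a hypothetical helper name used only for this sketch.
after_data() {
    # Split on "/", walk the fields, and print the field that follows "data".
    # i < NF guarantees that field i+1 exists.
    awk -F'/' '{ for (i = 1; i < NF; i++) if ($i == "data") print $(i + 1) }'
}

printf '%s\n' '/data/sdb/x,/data/sdc/y' '/tmp/data' | after_data
```

The first line yields sdb and sdc; the second line ends in "data" and prints nothing because of the `i < NF` bound.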

Clement
1

We can choose any of the following:

awk -F/ '
    BEGIN { OFS = RS }
    {
        N = split($0, a, /\//)
        $0 = ""
        for ( i=j=1; i<N; i++ )
            if ( a[i] == "data" )
                $(j++) = a[++i]
    }
    N>1' file.csv


perl -F/ -lane '
   shift(@F) eq q(data) and print(shift(@F)) 
      while(@F && m{/data/});
' file.csv


perl -lne 'print for m{/data/([^/,]+)}g' file.csv


sed -re '
    /\n/{P;D;}
    s:/data/([^/,]+):\n\1\n:
    D
' file.csv
Rakesh Sharma