I have a data file, the content is as follows:

department: customer service  section: A
department: marketing         section: A
department: finance           section: A

When I read each line, I want to extract the department name with the cut command. Unfortunately, each run of spaces gets collapsed into a single space, so cut extracts the wrong columns.

cat dept.dat | while read line
do
    echo $line
    echo $line | cut -c 12-29
done

e.g. the original line is:

department: marketing         section: A

While the program treats this line as:

department: marketing section: A

How can I read the line without trimming all the redundant space?

slm
Newbiee
  • Because I need to implement some more business logic, I have to read the file line by line – Newbiee Aug 18 '13 at 01:55
  • See [this wiki](http://mywiki.wooledge.org/BashPitfalls). It has all the common bash pitfalls that you should avoid. The one you mentioned is just a minor example of a very large set of problems you may encounter while scripting in bash. – Bichoy Aug 18 '13 at 03:04

2 Answers

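You are losing the spaces when you expand $line unquoted: the shell splits the value into words on whitespace, and echo prints those words separated by single spaces. Put double quotes around the variable expansion and you'll preserve the spaces: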

$ cat dept.dat | while read line
> do
>     echo "$line"
>     echo "$line" | cut -c 12-29
> done
department: customer service  section: A
 customer service 
department: marketing         section: A
 marketing        
department: finance           section: A
 finance          
camh
  • +1. I'd also recommend setting `IFS` to the null string, so that leading whitespace is not discarded, and specifying the `-r` flag to `read`, so that backslashes are not processed/discarded. That is: instead of `while read line`, I'd recommend writing `while IFS= read -r line`. It's more verbose, but it guarantees that *no* mangling will happen. (Unless the line contains null bytes . . .) – ruakh Aug 18 '13 at 07:27
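
A minimal sketch of the loop that comment suggests, using the sample data from the question. The function name `extract_departments` and the here-document standing in for dept.dat are illustrative choices, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines from stdin without mangling them and
# print the fixed-width field (columns 12-29) of each line in brackets.
extract_departments() {
    local line
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
    while IFS= read -r line; do
        # Quoting "$line" prevents word splitting, so inner spaces survive.
        printf '[%s]\n' "$(printf '%s\n' "$line" | cut -c 12-29)"
    done
}

# Sample data from the question, fed on stdin instead of via cat dept.dat.
extract_departments <<'EOF'
department: customer service  section: A
department: marketing         section: A
department: finance           section: A
EOF
```

The brackets only make the preserved padding visible. Redirecting the file into the loop (`done < dept.dat`) would work the same way as piping it from cat.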

regex matching

You can also achieve something similar using sed:

$ cat dept.dat | while read line
do
  echo "$line"
  echo "$line" | sed -e 's/.*: \(.*\)  .*/\1/'
done

You could also use awk:

$ cat dept.dat | while read line
do
  echo "$line"
  echo "$line" | awk '{sub(/.*nt: /,""); sub(/  .*/,""); print }'
done

You could also use grep:

$ cat dept.dat | while read line
do
  echo "$line"
  echo "$line" | grep -oP '(?<=: ).*(?=  )'
done

NOTE: The grep solution requires a grep that supports the -P option (Perl-compatible regular expressions, PCRE), such as a recent GNU grep.

The main difference between these solutions and cut is that they match on patterns, whereas the cut solution assumes a more rigid structure (specific character positions) in the input data.
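
To illustrate that difference, here is a small sketch that runs both approaches on the dept.dat layout and on a made-up record with a shorter label. The helpers `record`, `by_position` and `by_pattern`, and the `dept` label, are hypothetical names introduced only for this example; the sed expression is the one shown earlier in this answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a record "<label>: <name padded to 18>section: A".
record() { printf '%s: %-18ssection: A\n' "$1" "$2"; }

# Fixed columns: only correct when the label is exactly "department".
by_position() { cut -c 13-28; }

# Pattern: text between the first ": " and the last run of two spaces.
by_pattern() { sed -e 's/.*: \(.*\)  .*/\1/'; }

# Same layout as dept.dat: both approaches find the name.
record department finance | by_position
record department finance | by_pattern

# Shorter label shifts the columns: the positional cut now returns the
# wrong characters, while the pattern still isolates the name.
record dept finance | by_position
record dept finance | by_pattern
```

Both helpers may leave trailing padding in their output, so strip it before comparing values.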

static positional matching

An alternative to using cut is to use awk's substr function:

$ cat dept.dat | while read line
do
  echo "$line"
  echo "$line" | awk '{print substr($0,13,16)}'
done
slm