
I used the following syntax to read in a file name and modify its extension to generate the name of a matched data file. As an example, I have ABC_1.fastq.Block12 and XYZ_1.fastq.Block34, and I want to generate ABC_2.fastq.Block12 and XYZ_2.fastq.Block34 as my new file names:

for infile in *_1.fastq.Block*
do
    #base=$(basename ${infile} _1.fastq.**)
    IFS='_'
    read -ra ADDR <<< $infile
    base=${ADDR[0]}
    IFS='.'
    read -ra ADDR <<< ${ADDR[1]}
    second_file="${base}_2.fastq.${ADDR[2]}"
    echo $second_file
done

When executed, this script prints e.g.

ABC_2 fastq Block12
XYZ_2 fastq Block34

i.e. spaces between fastq and Block12. Why am I getting spaces rather than a period between these three strings when I concatenate? I thought that using braces for variable names should have eliminated this problem.

Max
    Wouldn't `second_file="${infile/_1/_2}"` be a lot easier? – frabjous Aug 31 '22 at 22:46
  • Yes, that works in this case. However, for more complicated examples I do want to be able to use pieces of the split string to create new strings. I still don't understand why I'm getting spaces and no periods between the string elements with second_file. – Max Aug 31 '22 at 22:51
    I take it it's because IFS is still set to ".", which is making `echo` treat the variable as multiple arguments rather than one. You can `unset IFS` prior to the echo or put in quotes, `echo "$second_file"`. – frabjous Aug 31 '22 at 23:02

1 Answer


In

echo $second_file

since you forgot to quote the parameter expansion, it is subject to split+glob. With IFS=., ABC_2.fastq.Block12 is first split into ABC_2, fastq and Block12, and each word is then subject to globbing, which has no effect here since none of the words contain glob operators.

So three arguments are passed to echo, which prints them separated by spaces.

To print the contents of a variable followed by a newline character, you need:

printf '%s\n' "$var"
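To see the split+glob behaviour described above in action, here is a minimal reproduction (the sample value is taken from the question; this harness is mine, not part of the original answer):

```shell
#!/bin/bash
# With IFS=. still set, an unquoted expansion is split into three
# words, which echo joins with single spaces; quoting prevents the split.
second_file='ABC_2.fastq.Block12'
IFS='.'
unquoted=$(echo $second_file)    # word-split on "." -> three arguments
quoted=$(echo "$second_file")    # quoted -> one argument, dots preserved
printf '%s\n' "$unquoted" "$quoted"
```

The first line printed shows the spaces the asker observed; the second keeps the periods because the expansion was quoted.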

Now, a few more comments on your code:

  • since bash doesn't have the equivalent of zsh's (N) glob qualifier or ksh93's ~(N) glob operator, before using a glob in a for loop (at least), you need to set the nullglob option:

    shopt -s nullglob
    for infile in *_1.fastq.Block*; do...
    

    If you don't and no file matches, the loop runs once over the literal *_1.fastq.Block* pattern itself.
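The effect of nullglob can be checked directly; the following sketch (mine, using an empty temporary directory) counts the loop iterations with and without the option:

```shell
#!/bin/bash
# Without nullglob, a non-matching glob is passed through literally,
# so the loop still runs once; with nullglob, it expands to nothing.
cd "$(mktemp -d)" || exit       # empty directory: the glob matches nothing
without=()
for f in *_1.fastq.Block*; do without+=( "$f" ); done
shopt -s nullglob
with=()
for f in *_1.fastq.Block*; do with+=( "$f" ); done
printf 'without nullglob: %d iteration(s)\n' "${#without[@]}"
printf 'with nullglob: %d iteration(s)\n' "${#with[@]}"
```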

  • You can set IFS for read only with: IFS=_ read -ra ADDR <<< "$infile" (see also the quotes around $infile which are needed in older versions of bash). That way, $IFS is only changed while read runs¹ and it's restored to its previous value after read returns.
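A quick check of that scoping (my own test, with the question's sample name): after the per-command assignment, $IFS still holds its default value of space, tab, newline.

```shell
#!/bin/bash
# IFS set only for the read command: the global IFS is untouched,
# so later unquoted expansions are not affected by the "_" split.
infile='ABC_1.fastq.Block12'
IFS=_ read -ra ADDR <<< "$infile"
printf '%s\n' "${ADDR[0]}" "${ADDR[1]}"   # ABC and 1.fastq.Block12
# $IFS still has its default value (space, tab, newline):
[ "$IFS" = $' \t\n' ] && echo 'IFS unchanged'
```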

  • IFS=. read -ra ADDR <<< "$var" is a poor method for splitting. First, it only works for single-line $vars, which is not guaranteed for file names, and it is also quite inefficient: depending on the version of bash and/or the size of $var, it involves either storing the contents of $var in a temporary file or feeding it via a pipe, and then reading it one byte at a time until a newline is found.

    Here, you could use the split+glob operator instead:

    IFS=.; set -o noglob
    addr=( $infile )
    

    (or addr=( $infile'' ) so that a trailing . is not ignored.)

    Or switch to better shells with proper splitting operators.
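As a minimal sketch of that split+glob approach, applied to one of the question's file names and splitting on the . characters (the restore step afterwards is my addition):

```shell
#!/bin/bash
# Split an unquoted expansion using IFS while globbing is disabled.
infile='ABC_1.fastq.Block12'
IFS=.
set -o noglob
addr=( $infile'' )          # trailing '' keeps a trailing "." significant
set +o noglob; unset IFS    # restore normal splitting and globbing
printf '%s\n' "${addr[@]}"  # ABC_1, fastq, Block12
```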

    Another approach here would be to do:

    regex='^(.*)_1\.fastq\.(Block.*)$'
    if [[ $infile =~ $regex ]]; then
      outfile=${BASH_REMATCH[1]}_2.fastq.${BASH_REMATCH[2]}
      ...
    fi
    

    With the caveat that regex matching only works with valid text, which again is not a guarantee for file names.
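Running that regex approach over both of the question's sample names (the loop harness is mine) produces the desired output:

```shell
#!/bin/bash
# Capture the prefix and the Block suffix, then rebuild the name with _2.
regex='^(.*)_1\.fastq\.(Block.*)$'
for infile in ABC_1.fastq.Block12 XYZ_1.fastq.Block34; do
  if [[ $infile =~ $regex ]]; then
    outfile=${BASH_REMATCH[1]}_2.fastq.${BASH_REMATCH[2]}
    printf '%s\n' "$outfile"
  fi
done
```

This prints ABC_2.fastq.Block12 and XYZ_2.fastq.Block34.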

    Here, you could also use standard sh parameter expansion operators:

    new_file=${infile%_1.fastq.Block*}_2.fastq.Block${infile##*_1.fastq.Block}
    

    Or the ksh93-style:

    new_file=${infile/_1.fastq.Block/_2.fastq.Block}
    

    (note the variations in behaviour among all those approaches if _1.fastq.Block occurs more than once in the file name).
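For a name where the pattern occurs only once, the two spellings agree, as this quick comparison (mine) shows:

```shell
#!/bin/bash
# Standard-sh suffix/prefix stripping vs ksh93-style substitution:
# both rewrite _1 to _2 on the question's sample name.
infile='ABC_1.fastq.Block12'
a=${infile%_1.fastq.Block*}_2.fastq.Block${infile##*_1.fastq.Block}
b=${infile/_1.fastq.Block/_2.fastq.Block}
printf '%s\n%s\n' "$a" "$b"   # both: ABC_2.fastq.Block12
```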


¹ Though beware that if a trap is handled whilst read is running, the code in that trap will see the modified $IFS.

Stéphane Chazelas