1

I know this question has been asked and answered before. I have tried the code, but I do not get the correct output.

I have 2 folders: vanila1 and vanila2, each has 400 files with similar names

ls vanila1
MB.2613.007_0021.ED4_KS1A29-7_338_all
MB.2613.007_0022.ED9_SD2A27-1_180_all
MB.2613.007_14.ED14_IA2A35-2_310_all

ls vanila2
MB.2613.007_0021.ED4_KS1A29-7_338_all
MB.2613.007_0022.ED9_SD2A27-1_180_all
MB.2613.007_14.ED14_IA2A35-2_310_all

I want to combine files with identical names and I am using this:

ls vanila1 | while read FILE; do
  cat vanila1/"$FILE" vanila2/"$FILE" >> all_combined/"$FILE"
done

I do not get the correct output: the number of lines in a combined file is more than the sum of the lines in file 1 and file 2. Am I doing something wrong?

peterh
Anna1364
  • @Theophrastus, yes. seems your possible clue did the job for me, thanks for that. – Anna1364 Feb 14 '18 at 19:32
  • It is generally a [really bad idea](http://mywiki.wooledge.org/ParsingLs) to parse the output of `ls`. You should probably look into either using `find` or simple shell globbing to get your list of files to process. Extensive further reading on the subject can be found [here](https://unix.stackexchange.com/questions/128985/why-not-parse-ls). – DopeGhoti Feb 14 '18 at 19:46
  • I do prefer [ParsingLs](http://mywiki.wooledge.org/ParsingLs) in BashGuide. – ilkkachu Feb 14 '18 at 20:41
  • You might observe that I do in fact link to that exact article. – DopeGhoti Feb 14 '18 at 21:01
  • I got the same result with either `>` or `>>`. The only difference between the two is that `>` overwrites the contents of the file if it already exists. – Nasir Riley Feb 15 '18 at 03:16

2 Answers

1

I have a hunch that you may have run your loop more than once. Since you use the >> redirection operator, which appends data, your result files grow every time the loop runs.
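The effect of running an appending loop twice can be reproduced on a small scale. The following is only a sketch: the scratch directory and the file name `sample` are made up for the demonstration and are not part of the question.

```shell
#!/bin/sh
# Work in a scratch directory so no real data is touched (hypothetical layout).
tmp=$(mktemp -d)
mkdir "$tmp/vanila1" "$tmp/vanila2" "$tmp/all_combined"
printf 'a\nb\n' >"$tmp/vanila1/sample"   # 2 lines
printf 'c\n'    >"$tmp/vanila2/sample"   # 1 line

# First run with >>: the combined file gets 2 + 1 = 3 lines.
cat "$tmp/vanila1/sample" "$tmp/vanila2/sample" >>"$tmp/all_combined/sample"
first_run=$(( $(wc -l <"$tmp/all_combined/sample") ))

# Second run with >>: the same 3 lines are appended again.
cat "$tmp/vanila1/sample" "$tmp/vanila2/sample" >>"$tmp/all_combined/sample"
second_run=$(( $(wc -l <"$tmp/all_combined/sample") ))

echo "after first run:  $first_run lines"    # 3
echo "after second run: $second_run lines"   # 6

rm -rf "$tmp"
```

If the combined files in the question are an exact multiple of the expected size, that would be consistent with this explanation.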

Instead (and here I'm avoiding using ls too, see the discussion in "Why *not* parse `ls`?" for reasons):

for name in vanila1/*; do
    base_name=${name##*/}

    if [ -f "vanila2/$base_name" ]; then
        cat "$name" "vanila2/$base_name" >"all_combined/$base_name"
    else
        printf 'No file in vanila2 corresponds to "%s"\n' "$name" >&2
    fi
done

The variable substitution ${name##*/} transforms a pathname like vanila1/MB.2613.007_0021.ED4_KS1A29-7_338_all into just MB.2613.007_0021.ED4_KS1A29-7_338_all, i.e. it removes everything up to and including the last / (what remains is the filename component of the pathname, or "the basename"). This may be replaced by $(basename "$name").
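As a quick check, the expansion and the `basename` utility can be compared side by side. This is a sketch only; the variable names `from_expansion`, `from_basename`, `deep`, `longest` and `shortest` are invented for the illustration.

```shell
#!/bin/sh
name='vanila1/MB.2613.007_0021.ED4_KS1A29-7_338_all'

# Strip the longest prefix matching "*/" using the shell itself.
from_expansion=${name##*/}

# The same result from the external basename utility.
from_basename=$(basename "$name")

printf '%s\n' "$from_expansion" "$from_basename"

# "##" removes the longest matching prefix, a single "#" the shortest.
deep='a/b/c'
longest=${deep##*/}    # c
shortest=${deep#*/}    # b/c
printf '%s %s\n' "$longest" "$shortest"
```

The expansion is done by the shell without starting another process, which may matter slightly in a loop over several hundred files.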

If there is a file in vanila2 corresponding to the name picked up from vanila1, the two are concatenated and put into the all_combined directory. If not, there is a diagnostic message about this fact.

By using > rather than >>, any existing file in all_combined with the same name will be replaced rather than appended to.


If you have other files or directories in vanila1, then you may want to modify the pattern vanila1/* in the loop to something that matches only the files that you are interested in, for example vanila1/*_all or similar.
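A narrower pattern can be tried out on a scratch directory first. The sketch below assumes the `*_all` suffix seen in the question's listing; the entries `sample_all`, `notes.txt` and `subdir_all` are hypothetical. The extra `-f` test skips directories that happen to match, and also skips the unexpanded pattern that the shell leaves in place when nothing matches.

```shell
#!/bin/sh
# Scratch directory with one wanted file and two unwanted entries.
tmp=$(mktemp -d)
mkdir "$tmp/vanila1"
: >"$tmp/vanila1/sample_all"     # regular file that should be picked up
: >"$tmp/vanila1/notes.txt"      # does not match the pattern
mkdir "$tmp/vanila1/subdir_all"  # matches the pattern but is a directory

matched=''
for name in "$tmp"/vanila1/*_all; do
    # Skip anything that is not a regular file.
    [ -f "$name" ] || continue
    matched="$matched ${name##*/}"
done

echo "matched:$matched"   # matched: sample_all

rm -rf "$tmp"
```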

Kusalananda
  • Thanks so much for the code and the very helpful explanations. There is one part of your code that I could not understand, as I am still learning programming: `if [ -f "vanila2/$base_name" ]`. What does this part do? I mean the `-f`. – Anna1364 Feb 15 '18 at 20:15
  • @Anna1364 The test `[ -f "filename" ]` will be true if there exists a file whose name is `filename`. In the code I am using a similar test to check whether `vanila2/$base_name` corresponds to an existing file. The `-f` test checks specifically for an existing _regular file_, while other tests like `-d` check whether the given name is that of an existing _directory_. See `man test`. – Kusalananda Feb 15 '18 at 20:55
-1

So you have files with identical names in two directories, and where both files are present you wish to concatenate them?

for file in dir1/*; do
   otherfile="$(basename "$file")"
   if [[ -r dir2/"${otherfile}" ]]; then
       cat "$file" dir2/"$otherfile" >> combined/"$otherfile"
   fi
done
DopeGhoti
  • Your answer doesn't address the questioner's main point, which was to figure out why his/her result is seemingly longer than the sum of its parts. – user1404316 Feb 14 '18 at 19:53