
My data inside a folder data looks like data1.txt, data2.txt, … data120.txt.

Inside each .txt file I have four columns (1000 data lines in each column) example:

data1.txt

1 2 3 4 
4 0 1 3 
3 1 1 2 
2 2 2 1 
........

data2.txt

0 1 3 4 
4 2 1 3 
3 1 3 2 
2 3 2 1 
........

data3.txt

1 0 3 4 
4 0 0 3 
0 1 1 2 
2 0 2 1 
........

data120.txt

1 2 3 1 
4 1 1 3 
3 1 1 1 
2 1 2 1 
........

I want to get an average .txt file, which looks like the sums below, with each entry divided by 4 because I used four data samples in this example:

1+0+1+1    2+1+0+2    3+3+3+3    4+4+4+1
4+4+4+4    0+2+0+1    1+1+0+1    3+3+3+3
3+3+0+3    1+1+1+1    1+3+1+1    2+2+2+1
2+2+2+2    2+3+0+1    2+2+2+2    1+1+1+1

(I show my data this way just to make it clear.)

Jeff Schaller
saya

  • Formatted the text a little bit; modify if it is wrong. (I assumed there were no blank lines between data rows in the files.) Also, the output is a bit vague. Is it the way you have written it? Or do you want the average in addition, or only the average, or ...? I assume you want `0.75 1.25 3 3.25` in row 1, etc. Is this correct? – ibuprofen May 29 '21 at 00:15
  • Not sure I get what you want here. You want to sum file1:`Col1.Row1`+file2:`Col1.Row1` ... +file120:`Col1.Row1`? 120 values. And file1:`Col2.Row1`+file2:`Col2.Row1` ... +file120:`Col2.Row1` ... Then file1:`Col1.Row2`+file2:`Col1.Row2` ... +file120:`Col1.Row2`? – ibuprofen May 29 '21 at 00:34
  • @ibuprofen Thank you so much for your prompt response, and yes, I want exactly what you understood: 0.75, 1.25, 3, 3.25 in row 1 and so forth. Somehow I find it difficult to present the data in a clear way. Appreciate it. – saya May 29 '21 at 04:26
  • If any of the answers here does what you want, then see https://unix.stackexchange.com/help/someone-answers for what to do next. – Ed Morton Jun 01 '21 at 13:04

3 Answers

$ paste data*.txt |
    awk -v numOutFlds=4 '{
        numFiles = NF / numOutFlds
        for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
            sum = 0
            for (fileNr=1; fileNr<=numFiles; fileNr++) {
                inFldNr = outFldNr + ((fileNr - 1) * numOutFlds)
                sum += $inFldNr
            }
            printf "%g%s", sum/numFiles, (outFldNr<numOutFlds ? OFS : ORS)
        }
    }' |
    column -t
0.75  1.25  3     3.25
4     0.75  0.75  3
2.25  1     1.5   1.75
2     1.5   2     1
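The `paste` step concatenates corresponding lines of all the files side by side, which is what makes the field arithmetic above work. A toy run with two invented 4-column files (`demo1.txt`/`demo2.txt` are made-up names, not the OP's data):

```shell
printf '1 2 3 4\n5 6 7 8\n' > demo1.txt
printf '3 4 5 6\n7 8 9 0\n' > demo2.txt

# Row 1 of the pasted stream is "1 2 3 4 <TAB> 3 4 5 6"; field
# number outFldNr + (fileNr-1)*numOutFlds is column outFldNr of file fileNr.
paste demo1.txt demo2.txt |
awk -v numOutFlds=4 '{
    numFiles = NF / numOutFlds
    for (o = 1; o <= numOutFlds; o++) {
        sum = 0
        for (f = 1; f <= numFiles; f++)
            sum += $(o + (f - 1) * numOutFlds)
        printf "%g%s", sum / numFiles, (o < numOutFlds ? OFS : ORS)
    }
}'
```

With these two files the output is `2 3 4 5` on row 1 and `6 7 8 4` on row 2, i.e. the element-wise averages.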
Ed Morton

You can make do with the RPN desk calculator dc:

paste ./*.txt |
dc -e "2k # 2 decimal digits of output
[q]sq
[0dddsasbscsd]si # initialize the four registers
[la+sa lb+sb lc+sc ld+sd z3<u]su # update the four registers
[ld4/n9an lc4/n9an
 lb4/n9an la4/n10an lix]sp # print results
[?z0=q lux lpx z0=?]s?
lix l?x
"
.75 1.25    3.00    3.25
4.00    .75 .75 3.00
2.25    1.00    1.50    1.75
2.00    1.50    2.00    1.00

Perl can be used as follows:

paste *.txt |
perl -lane '
  my @avgs;
  while (@F >= 4) {
    my @tmp = splice(@F, 0, 4);
    $avgs[$#tmp] += pop(@tmp) while @tmp;
  }
  print join "\t", map { sprintf "%.2f", $_/4.0 } @avgs;
' - 

We can use GNU sed together with bc to produce the output:

n='(\S+)'
paste ./*.txt |
sed -Ee "
  s/\s+/ /g;s/^ | \$//g
  s/\$/ /;s/ /\n/4;ta
  :a
    s/^$n $n $n $n\n$n $n $n $n (.*)/printf '%d %d %d %d\n%s' \$((\1 + \5)) \$((\2 + \6)) \$((\3 + \7)) \$((\4 + \8)) '\9'/e
    s/\n/&/
  ta
  s/.*/printf '%d\/4\n' &|bc -l|paste -s/e
  s/(\...)\S*/\1/g
  s/(^|\t)\./\10./g
" -
guest_7

Assuming I've interpreted it correctly, you could also do something like:

awk (POSIX):

awk -v n_col=4 '
NF != n_col { next }
FILENAME != file {
    file = FILENAME
    k = 0
}
{
    for (i = 1; i <= n_col; ++i)
        A[k++] += $i
}
END {
    n_files = ARGC - 1
    for (i = 0; i < k; ) {
        printf "%2.3f%s", A[i] / n_files,
            ++i % n_col == 0 ? "\n" : " "
    }
}
' data*.txt
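Unlike the paste-based approaches, this accumulates per file, so it can be sanity-checked directly on a couple of toy files (`d1.txt`/`d2.txt` are invented for illustration):

```shell
printf '1 2 3 4\n5 6 7 8\n' > d1.txt
printf '3 4 5 6\n7 8 9 0\n' > d2.txt
awk -v n_col=4 '
NF != n_col { next }
FILENAME != file { file = FILENAME; k = 0 }   # restart the flat index per file
{ for (i = 1; i <= n_col; ++i) A[k++] += $i } # A[] accumulates cell sums
END {
    n_files = ARGC - 1
    for (i = 0; i < k; )
        printf "%2.3f%s", A[i] / n_files,
            (++i % n_col == 0 ? "\n" : " ")
}' d1.txt d2.txt
```

Expected output:

2.000 3.000 4.000 5.000
6.000 7.000 8.000 4.000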

perl:

I am sure this can be done better, but here is a stab at it.

./script.pl <COLUMNS> data*.txt
#!/usr/bin/env perl

use strict;
use warnings;

my @data;
my $cols = $ARGV[0];
my $ac = $#ARGV;
shift;
for (@ARGV) {
    my $k = 0;
    open my $fh, '<', $_
        or die "Cannot open '$_' - $!";
    local $/;
    my $fdata = <$fh>;
    close $fh;
    for ($fdata) {
        $data[$k++] += $_ for split;
    }
}

my $i = 0;
for (@data) {
    printf "%.3f%s", $_ / $ac, ++$i % $cols ? "\t" : "\n";
}


bash: (Slow)

As you tagged the question with bash as well, I'll add a sample of the same, just for fun. It is rather slow compared to perl, awk, etc.

Note that while it is doable, it is not the best tool for the job.

It uses bashisms in the form of mapfile, read -a, etc.

./script <COLUMNS> data*.txt

#!/bin/bash

declare -i res=1000
declare -i dec=$(( ${#res} - 1 ))
declare -i n_files
declare -i n_columns
declare -a A

process()
{
    local m a
    mapfile -t m< "$1"
    read -ra a<<< "${m[@]}"
    for (( i = 0; i < ${#a[*]}; ++i )); do
        (( A[i] += a[i] ))
    done
    (( ++n_files ))
}

n_columns=$1
shift
for f in "$@"; do
    process "$f"
done

for (( i = 0; i < ${#A[@]}; ++i )); do
    (( (i + 1) % n_columns == 0 )) && sep=$'\n' || sep=' '
    printf "%3.${dec}f%s" "$(( res * A[i] / n_files ))e-$dec" "$sep"
done

alternative method for printing "float":

    (( d = A[i] * res / n_files ))
    printf "%3d.%0${dec}d%s" "$(( d / res ))" "$(( d % res ))" "$sep"

Next sed ... nah, I believe I'll only link this: Addition with 'sed' ;)

ibuprofen