
My data inside a folder data looks like data1.txt, data2.txt, … data120.txt.

Inside each .txt file I have four columns (1000 data lines in each column) example:

data1.txt

1 2 3 4 
4 0 1 3 
3 1 1 2 
2 2 2 1 
........

data2.txt

0 1 3 4 
4 2 1 3 
3 1 3 2 
2 3 2 1 
........

data3.txt

1 0 3 4 
4 0 0 3 
0 1 1 2 
2 0 2 1 
........

data120.txt

1 2 3 1 
4 1 1 3 
3 1 1 1 
2 1 2 1 
........

I want to get an average .txt file, which looks like the sums below, with each entry divided by 4 because I used four data samples in this example:

1+0+1+1    2+1+0+2    3+3+3+3    4+4+4+1
4+4+4+4    0+2+0+1    1+1+0+1    3+3+3+3
3+3+0+3    1+1+1+1    1+3+1+1    2+2+2+1
2+2+2+2    2+3+0+1    2+2+2+2    1+1+1+1

(I show my data this way just to make it clear.)

Jeff Schaller
saya

  • Formatted the text a little bit; modify if it is wrong. (I assumed there were no blank lines between data rows in the files.) Also, the output is a bit vague. Is it the way you have written it? Or do you want the average in addition, or only the average, or ...? I assume you want `0.75 1.25 3 3.25` in row 1, etc. Is this correct? – ibuprofen May 29 '21 at 00:15
  • Not sure I get what you want here. You want to sum file1:`Col1.Row1`+file2:`Col1.Row1` ... +file120:`Col1.Row1`? 120 values. And file1:`Col2.Row1`+file2:`Col2.Row1` ... +file120:`Col2.Row1` ... Then file1:`Col1.Row2`+file2:`Col1.Row2` ... +file120:`Col1.Row2`? – ibuprofen May 29 '21 at 00:34
  • @ibuprofen Thank you so much for your prompt response, and yes, I want exactly what you understood: 0.75, 1.25, 3, 3.25 in row 1 and so forth. Somehow I find it difficult to present the data in a clear way. Appreciate it. – saya May 29 '21 at 04:26
  • If any of the answers here does what you want, then see https://unix.stackexchange.com/help/someone-answers for what to do next. – Ed Morton Jun 01 '21 at 13:04

3 Answers

$ paste data*.txt |
    awk -v numOutFlds=4 '{
        numFiles = NF / numOutFlds
        for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
            sum = 0
            for (fileNr=1; fileNr<=numFiles; fileNr++) {
                inFldNr = outFldNr + ((fileNr - 1) * numOutFlds)
                sum += $inFldNr
            }
            printf "%g%s", sum/numFiles, (outFldNr<numOutFlds ? OFS : ORS)
        }
    }' |
    column -t
0.75  1.25  3     3.25
4     0.75  0.75  3
2.25  1     1.5   1.75
2     1.5   2     1
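The `paste` step concatenates corresponding lines of all the files side by side, which is what makes the field arithmetic above work. A toy run with two invented 4-column files (`demo1.txt`/`demo2.txt` are made-up names, not the OP's data):

```shell
printf '1 2 3 4\n5 6 7 8\n' > demo1.txt
printf '3 4 5 6\n7 8 9 0\n' > demo2.txt

# Row 1 of the pasted stream is "1 2 3 4 <TAB> 3 4 5 6"; field
# number outFldNr + (fileNr-1)*numOutFlds is column outFldNr of file fileNr.
paste demo1.txt demo2.txt |
awk -v numOutFlds=4 '{
    numFiles = NF / numOutFlds
    for (o = 1; o <= numOutFlds; o++) {
        sum = 0
        for (f = 1; f <= numFiles; f++)
            sum += $(o + (f - 1) * numOutFlds)
        printf "%g%s", sum / numFiles, (o < numOutFlds ? OFS : ORS)
    }
}'
```

With these two files the output is `2 3 4 5` on row 1 and `6 7 8 4` on row 2, i.e. the element-wise averages.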
Ed Morton

You can make do with the RPN desk calculator dc:

paste ./*.txt |
dc -e "2k # 2 decimal digits of output
[q]sq
[0dddsasbscsd]si # initialize the four registers
[la+sa lb+sb lc+sc ld+sd z3<u]su # update the four registers
[ld4/n9an lc4/n9an
 lb4/n9an la4/n10an lix]sp # print results
[?z0=q lux lpx z0=?]s?
lix l?x
"
.75 1.25    3.00    3.25
4.00    .75 .75 3.00
2.25    1.00    1.50    1.75
2.00    1.50    2.00    1.00

Perl can be used as follows:

paste *.txt |
perl -lane '
  my @avgs;
  while (@F >= 4) {
    my @tmp = splice(@F, 0, 4);
    $avgs[$#tmp] += pop(@tmp) while @tmp;
  }
  print join "\t", map { sprintf "%.2f", $_/4.0 } @avgs;
' - 

We can use GNU sed together with bc to produce the output:

n='(\S+)'
paste ./*.txt |
sed -Ee "
  s/\s+/ /g;s/^ | \$//g
  s/\$/ /;s/ /\n/4;ta
  :a
    s/^$n $n $n $n\n$n $n $n $n (.*)/printf '%d %d %d %d\n%s' \$((\1 + \5)) \$((\2 + \6)) \$((\3 + \7)) \$((\4 + \8)) '\9'/e
    s/\n/&/
  ta
  s/.*/printf '%d\/4\n' &|bc -l|paste -s/e
  s/(\...)\S*/\1/g
  s/(^|\t)\./\10./g
" -
guest_7

Assuming I've interpreted it correctly, you could also do something like:

awk (POSIX):

awk -v n_col=4 '
NF != n_col { next }
FILENAME != file {
    file = FILENAME
    k = 0
}
{
    for (i = 1; i <= n_col; ++i)
        A[k++] += $i
}
END {
    n_files = ARGC - 1
    for (i = 0; i < k; ) {
        printf "%2.3f%s", A[i] / n_files,
            ++i % n_col == 0 ? "\n" : " "
    }
}
' data*.txt
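Unlike the paste-based approaches, this accumulates per file, so it can be sanity-checked directly on a couple of toy files (`d1.txt`/`d2.txt` are invented for illustration):

```shell
printf '1 2 3 4\n5 6 7 8\n' > d1.txt
printf '3 4 5 6\n7 8 9 0\n' > d2.txt
awk -v n_col=4 '
NF != n_col { next }
FILENAME != file { file = FILENAME; k = 0 }   # restart the flat index per file
{ for (i = 1; i <= n_col; ++i) A[k++] += $i } # A[] accumulates cell sums
END {
    n_files = ARGC - 1
    for (i = 0; i < k; )
        printf "%2.3f%s", A[i] / n_files,
            (++i % n_col == 0 ? "\n" : " ")
}' d1.txt d2.txt
```

Expected output:

2.000 3.000 4.000 5.000
6.000 7.000 8.000 4.000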

perl:

I am sure this can be done better, but here is a stab at it.

./script.pl <COLUMNS> data*.txt
#!/usr/bin/env perl

use strict;
use warnings;

my @data;
my $cols = $ARGV[0];
my $ac = $#ARGV;
shift;
for (@ARGV) {
    my $k = 0;
    open my $fh, '<', $_
        or die "Cannot open '$_' - $!";
    local $/;
    my $fdata = <$fh>;
    close $fh;
    for ($fdata) {
        $data[$k++] += $_ for split;
    }
}

my $i = 0;
for (@data) {
    printf "%.3f%s", $_ / $ac, ++$i % $cols ? "\t" : "\n";
}


bash: (Slow)

As you tagged the question with bash as well, I'll add a sample of the same, just for fun. It is rather slow compared to perl, awk, etc.

Note that while it is doable, it is not the best tool for the job.

It uses bashisms in the form of mapfile, read -a, etc.

./script <COLUMNS> data*.txt

#!/bin/bash

declare -i res=1000
declare -i dec=$(( ${#res} - 1 ))
declare -i n_files
declare -i n_columns
declare -a A

process()
{
    local m a
    mapfile -t m< "$1"
    read -ra a<<< "${m[@]}"
    for (( i = 0; i < ${#a[*]}; ++i )); do
        (( A[i] += a[i] ))
    done
    (( ++n_files ))
}

n_columns=$1
shift
for f in "$@"; do
    process "$f"
done

for (( i = 0; i < ${#A[@]}; ++i )); do
    (( (i + 1) % n_columns == 0 )) && sep=$'\n' || sep=' '
    printf "%3.${dec}f%s" "$(( res * A[i] / n_files ))e-$dec" "$sep"
done

alternative method for printing "float":

    (( d = A[i] * res / n_files ))
    printf "%3d.%0${dec}d%s" "$(( d / res ))" "$(( d % res ))" "$sep"

Next sed ... nah, I believe I'll only link this: Addition with 'sed' ;)

ibuprofen