I have some tables (table.txt) as follows:

YEAR MONTH DAY RES
1971 1     1   1345
1971 1     2   1265
1971 1     3   1167

Each time series runs from 1.1.1971 to 31.12.2099. Unfortunately, some time series are missing the leap days and their values (e.g. 1972 is a leap year, so February should have 29 days, but my time series has only 28 days in February 1972). For example, in my current tables the end of February 1972 looks like this:

YEAR MONTH DAY RES
1972 2     27  100
1972 2     28  101
1972 3     1   102

This is wrong because it does not account for leap years. Instead, I would like to insert each missing day (the 29th of February) of every leap year into my time series, interpolating its value from the previous and the next day, as follows:

YEAR MONTH DAY RES
1972 2     27  100
1972 2     28  101
1972 2     29  101.5
1972 3     1   102

Is there a way to do that using shell/bash?

steve
  • This sounds like the kind of thing that might be better done in a real language, e.g. Perl or Python. Both of those have date-handling modules, and they're much nicer for dealing with complex logic. – Tom Hunt Oct 20 '15 at 16:28
  • @ Stéphane Chazelas - No, in fact I am looking for a dot as the separator (I updated my question, sorry!) – steve Oct 20 '15 at 16:51
  • @steve if you add a space between the `@` and the username, it doesn't work. You need `@user`, not `@ user`. – terdon Oct 21 '15 at 12:47

3 Answers

Maybe something like:

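awk '
  # Gregorian leap-year rule.
  function isleap(y) {
    return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0)
  }
  # On 1 March of a leap year, if the previous record was not 29 February,
  # insert that day with the mean of the two neighbouring values.
  $2 == 3 && $3 == 1 && isleap($1) && last_day != 29 {
    print $1, 2, 29, (last_data + $4) / 2
  }
  # Print every input line and remember its day and value.
  {print; last_day = $3; last_data = $4}' file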
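
If the inserted row should line up with the padded columns of the original table, the same idea could be wrapped like this. This is only a sketch: the function name add_feb29 is made up for illustration, and the column widths (4, 5, 3) are an assumption taken from the sample in the question.

```shell
#!/bin/sh
# Hypothetical helper: reads a table on stdin, writes the fixed table to stdout.
add_feb29() {
  awk '
    function isleap(y) {
      return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0)
    }
    $2 == 3 && $3 == 1 && isleap($1) && last_day != 29 {
      # Column widths (4, 5, 3) are assumed from the sample in the question.
      printf "%-4d %-5d %-3d %s\n", $1, 2, 29, (last_data + $4) / 2
    }
    { print; last_day = $3; last_data = $4 }
  '
}

# Usage on the sample from the question:
add_feb29 <<'EOF'
YEAR MONTH DAY RES
1972 2     27  100
1972 2     28  101
1972 3     1   102
EOF
```

On that sample the inserted row should read `1972 2     29  101.5`. To process several tables, something like `add_feb29 < "$f" > "$f.new"` inside a `for f in *.txt` loop would do.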
Stéphane Chazelas

I was just thinking about this and, because leap years fall on every other even-numbered year, the following holds for the last two digits of any year from 1901 to 2099 (which covers your range):

([13579][26]|[02468][048]) == leap year

Basically, leap years fall on years ending in 2 and 6 in odd-numbered decades, and on years ending in 0, 4 and 8 in even-numbered decades.
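
A quick way to convince yourself of this is to compare the two-digit pattern against the full Gregorian rule over the range in question. The sketch below does that; the function names are made up for illustration, and the pattern is anchored with `$` so that it only looks at the end of the year.

```shell
#!/bin/sh
# Gregorian rule, used as the reference.
is_leap_rule() {
  [ $(( $1 % 4 )) -eq 0 ] && { [ $(( $1 % 100 )) -ne 0 ] || [ $(( $1 % 400 )) -eq 0 ]; }
}

# The two-digit pattern from above, anchored to the end of the year.
is_leap_pattern() {
  printf '%s\n' "$1" | grep -Eq '([13579][26]|[02468][048])$'
}

# Print every year in [first, last] where the two tests disagree.
compare_range() {
  year=$1
  while [ "$year" -le "$2" ]; do
    if is_leap_rule "$year"; then rule=1; else rule=0; fi
    if is_leap_pattern "$year"; then pat=1; else pat=0; fi
    [ "$rule" -eq "$pat" ] || echo "$year"
    year=$((year + 1))
  done
}

compare_range 1971 2099   # prints nothing if the pattern holds over this range
```

The pattern only breaks on century years that are not divisible by 400, such as 1900 and 2100, which both lie outside 1971 to 2099.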

And so you can do:

sed -e'  /[02468] * 2 * 28 /!b'\
    -e'h;/[13579][26] * 2 / G' \
    -e'  /[02468][048] * 2 /G' \
    -e'  /\n/s/ 28 / 29 /2'    \
    -eP\;D <in >out

...which finds every Feb 28 line in the input, duplicates it, and changes the copy to Feb 29, but only for leap years, regardless of which year the series starts on. The copy keeps the Feb 28 value; it is not interpolated.


This was my first instinct:

sed -e'/\([02648] * 2 * 2\)8 /!b' \
    -e:n -e'n;//!bn' -e'p;s//\19 /' <in

...which was just a slight adaptation of my answer to your other question, but which only works for a series in which the first even year encountered is not a leap year, because it works by alternation.

I tested both sed commands against the test file from your other question. That infile already had leap days, of course (the code I used to generate it is in that answer as well), so Feb 29 appears twice in the output below. Both worked for a series beginning with 1970, though the first does not depend on the starting year anyway:


1970  2   27  58
1970  2   28  59
1970  3   1   60
1972  2   27  58
1972  2   28  59
1972  2   29  59
1972  2   29  60
1972  3   1   61
1974  2   27  58
1974  2   28  59
1974  3   1   60
1976  2   27  58
1976  2   28  59
1976  2   29  59
1976  2   29  60
1976  3   1   61
1978  2   27  58
1978  2   28  59
1978  3   1   60
1980  2   27  58
1980  2   28  59
1980  2   29  59
1980  2   29  60
1980  3   1   61
mikeserv

Perl solution:

#!/usr/bin/perl
use warnings;
use strict;

use Time::Piece;

print scalar <>; # Copy the header line through unchanged.

while (<>) {
    my ($year, $month, $day, $res) = split;
    my $t = 'Time::Piece'->strptime("$year $month $day", '%Y %m %d');
    if ($t->is_leap_year && 2 == $month && 28 == $day) {
        print;
        $_ = <>;
        my ($year2, $month2, $day2, $res2) = split;
        die "Expected March the 1st: $_"
            unless $year == $year2 && 3 == $month2 && 1 == $day2;
        print join("\t", $year, 2, 29, ($res + $res2) / 2), "\n";
    }
    print;
}

Save as fix_feb29.pl. Then run it with perl's -i~ switch, which edits each file in place and keeps a backup with a ~ suffix:

for file in *.txt ; do
    perl -i~ fix_feb29.pl "$file"
done
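
Whichever approach you use, it may be worth checking the result afterwards. A minimal sketch, assuming every year in the file is complete (as in a series running from 1.1.1971 to 31.12.2099) and that the first line is a header; the function name check_days is made up for illustration:

```shell
#!/bin/sh
# Hypothetical checker: reads a table on stdin and prints one line per year
# whose number of rows differs from the number of days in that year.
check_days() {
  awk '
    function isleap(y) {
      return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0)
    }
    NR > 1 { rows[$1]++ }   # skip the header, count rows per year
    END {
      for (y in rows) {
        want = isleap(y) ? 366 : 365
        if (rows[y] != want)
          printf "%s: %d rows, expected %d\n", y, rows[y], want
      }
    }
  ' | sort -n
}
```

Running `check_days < table.txt` on a fixed file should print nothing; on an unfixed one it should list each leap year with 365 rows.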
choroba