I am trying to write a Perl (or similar) script that converts my timestamp format (ddMMyyyy-HHmm+0300) into the timestamp format (yyyy-MM-dd'T'HH:mm:00) used by the WEKA data analysis system.
I initially build the WEKA data file with the paste command and then remove the first column with AWK.
Nothing in the data should make the problem harder than it actually is, except possibly the quotes around the first field.
I think approach (3) is the most feasible, i.e. using the POSIX::strftime function directly (Deathgrip).
- Hard problem in Section 1
- Easier approach without quotes in the data in Section 2
- POSIX::strftime approach, and the similar thread "Perl strptime format differs from strftime"
Example of the input
23072017-2200+0300
Expected output
2017-07-23'T'22:00:00
Full example of CSV lines, without quotes in the header but with underscores, so it may be harder
Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
Expected output
Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
1. Attempt: a script that you can call as script.pl filename
I think the Text::CSV parser is more than I need, because my data set is simpler than its intended use case.
So I think a simple regex approach should be possible.
#!/usr/bin/env perl
# https://stackoverflow.com/a/33995620/54964
## Data prepared like this for the script
# paste -d" " log.csv data.csv | awk '{$1=""; print $0}' > weka.data.csv
# cp $HOME/Data/weka.data.csv $HOME/Workspace/
#
# Maybe, this all could be integrated into Perl script
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
while ( my $row = $csv->getline( \*ARGV ) ) {
    s/\n/ /g for @$row;
    $csv->print( \*STDOUT, $row );
    # TODO regex
    # convert ddMMyyyy-HHmm+0300 to yyyy-MM-dd'T'HH:mm:00
}
2. Perl Regex approach
Pseudocode. My attempt does not work, because I have not managed to carry captured parts such as dd over into the result.
# TODO s/ddMMyyyy-HHmm+0300/$3-$2-$1'T'$4:$5:00/;
perl -pe s/([0-3][0-9])(([0-1][0-9]))(20[0-9]{2})([0-2][0-9])([0-5][0-9])+0300/$3-$2-$1'T'$4:$5:00/;
where
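- dd is matched by ([0-3][0-9]) and referenced as $1
- MM is matched by ([0-1][0-9]) and referenced as $2
- yyyy is matched by (20[0-9]{2}) and referenced as $3, followed by a literal -
- HH (24-hour time) is matched by ([0-2][0-9]) and referenced as $4
- mm is matched by ([0-5][0-9]) and referenced as $5
- +0300 is simply removed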
It would be great to have the regex in some more readable format.
Testing Sundeep's proposal from the comments
Code
#!/bin/bash
# https://stackoverflow.com/a/33995620/54964
s='"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40'
echo "$s" | perl -pe 's/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g'
Output is as expected for one line
"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40
Applying it to the complete line, by just replacing the content of the variable s, also gives the expected output
"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
TODO: complete the approach so that it handles multiple lines and can skip the header
Testing Deathgrip's motivated proposal
Code
#!/usr/bin/env perl
# https://stackoverflow.com/a/33995620/54964
use strict;
use warnings;
# https://stackoverflow.com/a/20007784/54964
# http://perldoc.perl.org/POSIX.html
use Time::Piece;
use POSIX;
# TODO breaks because of the nested single quotes
#my $input = '"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010'
my $str = '23072017-2200+0300';
my $f = '%d%m%dY-%H%M+0300';
#my $t = POSIX::strftime($str, $f); # fails!
my $t = strftime($str, $f); # fails!
print "$t\n";
Output
Usage: POSIX::strftime(fmt, sec, min, hour, mday, mon, year, wday = -1, yday = -1, isdst = -1) at prepare.data3.pl line 22.
OS: Debian 9