How to convert this timestamp format to another format in Perl?

Question

I am trying to design a Perl (or similar) approach that converts my timestamp format (ddMMyyyy-HHmm+0300) into the timestamp format (yyyy-MM-dd'T'HH:mm:00) used by the WEKA data analysis system. I initially build the WEKA data file with the paste command and remove the first column with AWK. There should be no constraints that make the problem harder than it actually is, except possibly the quotes in the first field. I think approach (3) is the most feasible, i.e. using the POSIX::strftime function directly (suggested by Deathgrip).

1. Hard problem in Section 1
2. Easier approach without quotes in the data in Section 2
3. POSIX::strftime approach; see the similar thread "Perl strptime format differs from strftime"

Example of the input

23072017-2200+0300

Expected output
```
2017-07-23'T'22:00:00
```

Full example of CSV lines (the header fields are unquoted but contain underscores, which can make matching harder)

 Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
 "Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
 "Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

Expected output

 Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
 "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
 "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

1. Attempt script which you can call by `script.pl filename`

I think the Text::CSV parser is more than I need because my data set is simpler than its intended use case, so a simple regex approach should be possible.

#!/usr/bin/env perl
# https://stackoverflow.com/a/33995620/54964

## Data prepared like this for the script
# paste -d" " log.csv data.csv | awk '{$1=""; print $0}' > weka.data.csv
# cp $HOME/Data/weka.data.csv $HOME/Workspace/
#
# Maybe, this all could be integrated into Perl script

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );

while ( my $row = $csv->getline( \*ARGV ) ) {
    s/\n/ /g for @$row;
    $csv->print( \*STDOUT, $row );

    # TODO regex
    #convert ddMMyyyy-HHmm+0300 to yyyy-MM-dd'T'HH:mm:00    
}

2. Perl Regex approach

Draft substitution. The open point was how to carry captured parts such as dd into the result; capture groups ($1, $2, ...) do that.

# ddMMyyyy-HHmm+0300 -> yyyy-MM-dd'T'HH:mm:00
perl -pe 's/([0-3][0-9])([0-1][0-9])(20[0-9]{2})-([0-2][0-9])([0-5][0-9])\+0300/$3-$2-$1\x27T\x27$4:$5:00/'

where

dd by ([0-3][0-9]) / $1
MM by ([0-1][0-9]) / $2
yyyy by (20[0-9]{2}) / $3
- literally
HH (24-hour time) by ([0-2][0-9]) / $4
mm by ([0-5][0-9]) / $5
+0300 matched as \+0300 and simply dropped

It would be great to have the regex in some more readable format.
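One way to make the regex more readable is the /x modifier together with named captures (Perl 5.10 or later). The sketch below is only an illustration: the sub name convert_stamps is made up here, and it assumes every timestamp has the shape ddMMyyyy-HHmm followed by a four-digit offset.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): rewrites every
# ddMMyyyy-HHmm+ZZZZ stamp found in a line to yyyy-MM-dd'T'HH:mm:00.
sub convert_stamps {
    my ($line) = @_;
    $line =~ s{
        \b
        (?<dd>   \d{2} )    # day of month
        (?<mm>   \d{2} )    # month
        (?<yyyy> \d{4} )    # year
        -
        (?<HH>   \d{2} )    # hour, 24-hour clock
        (?<MM>   \d{2} )    # minute
        [+-] \d{4}          # UTC offset, matched and then dropped
        \b
    }{$+{yyyy}-$+{mm}-$+{dd}'T'$+{HH}:$+{MM}:00}gx;
    return $line;
}

print convert_stamps(q{"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40}), "\n";
# prints: "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40
```

The /x flag only affects the pattern side, so the whitespace and comments there are ignored while the replacement is taken literally.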

Testing Sundeep's proposal in comment

Code

#!/bin/bash
# https://stackoverflow.com/a/33995620/54964

s='"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40'

echo "$s" | perl -pe 's/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g'

Output is as expected for one line

"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40

Applying it to the complete line by just replacing the content of the variable s, the output is as expected

"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

TODO: complete the approach so that it handles multiple lines and can skip the header

Testing Deathgrip's motivated proposal

Code

#!/usr/bin/env perl
# https://stackoverflow.com/a/33995620/54964

use strict;
use warnings;
# https://stackoverflow.com/a/20007784/54964
# http://perldoc.perl.org/POSIX.html
use Time::Piece;
use POSIX;

# TODO: a single-quoted string cannot hold unescaped single quotes, so q{...} is used here
#my $input = q{"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010};

my $str = '23072017-2200+0300';
my $f = '%d%m%Y-%H%M+0300';
#my $t = POSIX::strftime($str, $f); # fails!
my $t = strftime($str, $f); # fails! strftime formats broken-down time values; it does not parse strings

print "$t\n";

Output

Usage: POSIX::strftime(fmt, sec, min, hour, mday, mon, year, wday = -1, yday = -1, isdst = -1) at prepare.data3.pl line 22.
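The usage message appears because strftime only formats time values and never parses a string; parsing is the job of strptime. A minimal sketch with the core module Time::Piece follows. The sub name to_weka_stamp is made up for illustration, and stripping the offset before parsing is my assumption: it keeps the wall-clock time exactly as written instead of shifting it to UTC.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper (name is an assumption); expects ddMMyyyy-HHmm+ZZZZ.
sub to_weka_stamp {
    my ($stamp) = @_;

    # Drop the trailing UTC offset so the time of day is kept as written.
    ( my $local = $stamp ) =~ s/[+-]\d{4}\z//;

    # Parse with strptime, then format with the object's strftime method.
    my $t = Time::Piece->strptime( $local, '%d%m%Y-%H%M' );
    return $t->strftime(q{%Y-%m-%d'T'%H:%M:%S});
}

print to_weka_stamp('23072017-2200+0300'), "\n";
# prints: 2017-07-23'T'22:00:00
```

Since the input has no seconds field, %S is rendered as 00.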

OS: Debian 9

assuming sample string, `s='"Masi", 23072010-2200+0300, 24072010-0600+0300 70, 7h40'` ... is this what you are looking for? `echo "$s" | perl -pe 's/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g'` ... suggest to show 2-3 lines each from log.csv data.csv and post complete expected output for that... — Sundeep, Jul 24 '17 at 15:57
Have you considered using `Time::Local` (http://perldoc.perl.org/Time/Local.html) and the `strftime` function in the `POSIX` module (http://perldoc.perl.org/POSIX.html) to do the conversion? — Deathgrip, Jul 24 '17 at 16:06
@LéoLéopoldHertz준영 I'd rather prefer to understand complete question and answer it... the one in comment is only date fmt conversion... hence why I suggest you to add few lines from both input files and add expected output for that — Sundeep, Jul 24 '17 at 16:44

score 2 · Accepted Answer · answered Jul 25 '17 at 15:25

$ perl -pe 's/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g' ip.csv
 Ni, Aika, Aika_l Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
 "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
 "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

\b is word boundary
(\d\d) captures two consecutive digits, (\d{4}) captures four of them and so on
\x27 is for single quotes. If there can be unrelated digits after this, perhaps better to use octal representation \047
as the search and replacement is only for specific ddMMyyyy-HHmm+0300 format, it won't affect the header. Still if needed, just add if $.>1 after the substitute command
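The same substitution with the header explicitly skipped could be sketched as a small script. The sub name convert_fh and the in-memory sample are assumptions for illustration; the substitution itself is the one from the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): copies $in to $out and
# converts the timestamps on every line except the first (the header).
sub convert_fh {
    my ( $in, $out ) = @_;
    while ( my $line = <$in> ) {
        # $. is the current line number of the filehandle read last
        $line =~ s/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g
            if $. > 1;
        print {$out} $line;
    }
    return;
}

# Demo on an in-memory sample instead of a real file.
my $sample = <<'CSV';
 Ni, Aika, Aika_l
 "Masi", 23072010-2200+0300, 24072010-0600+0300
CSV
open my $in, '<', \$sample or die "cannot open in-memory input: $!";
convert_fh( $in, \*STDOUT );
close $in;
```

For real data, a filehandle opened on the CSV file can be passed as $in in place of the in-memory one.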

Probably the paste+awk commands used to create the input can be incorporated easily to this command, but would need that info added to question

I corrected the typo `Aika_l Un` to `Aika_l, Un`. I hope it does not affect your proposal. What do you think? Maybe, one more field in your approach. — Léo Léopold Hertz 준영, Jul 26 '17 at 06:42
nope it won't affect... just another data which won't match the regex — Sundeep, Jul 26 '17 at 06:50
Actually, not because the change is only in the header so your code works still, although I fixed the typo in the header. — Léo Léopold Hertz 준영, Aug 09 '17 at 14:47

score 1 · Answer 2 · edited Aug 09 '17 at 16:02

Here is what I would have done:

#!/usr/bin/env perl
# https://stackoverflow.com/a/33995620/54964

use strict;
use warnings;
# https://stackoverflow.com/a/20007784/54964
# http://perldoc.perl.org/POSIX.html
use POSIX qw(strftime);
use DateTime;
use DateTime::Format::Strptime qw(strptime);

my $str = '23072017-2200+0300';
my $dtime = strptime( '%d%m%Y-%H%M%z', $str );
my $f = '%Y-%m-%d\'T\'%H:%M:%S';
# wday, yday and isdst are left at -1 (unknown); strftime expects integers there, not a time zone object
my $t = strftime( $f, 0, $dtime->minute, $dtime->hour, $dtime->day, $dtime->month-1, $dtime->year-1900, -1, -1, -1 );

print "$t\n";

Output as expected on the time field

2017-07-23'T'22:00:00
