How to convert a .txt subtitle file to .srt format?

Question

I have a subtitle file that looks like this:

00:00:44:" Myślę, więc jestem".|Kartezjusz, 1596-1650
00:01:01:Trzynaste Pietro
00:01:06:Podobno niewiedza uszczęśliwia.
00:01:10:Po raz pierwszy w życiu|zgadzam się z tym.
00:01:13:Wolałbym...
00:01:15:nigdy nie odkryć|tej straszliwej prawdy.
00:01:19:Teraz już wiem...

I'm not sure what format this is, but I want to convert the subtitles to .srt. Unfortunately, gnome-subtitles and subtitleeditor don't recognize this format.

gnome-subtitles says:

Unable to detect the subtitle format. Please check that the file type is supported.

subtitleeditor says:

Please check that the file contains subtitles in a supported format.

file output:

UTF-8 Unicode text

Is there a way to convert this file to .srt format?

You can find the SRT format described at http://en.wikipedia.org/wiki/SubRip; it should be obvious how to convert. — Thorsten Staerk, Jan 26 '14 at 18:17
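For reference, an SRT file is a sequence of cues: a counter starting at 1, a timing line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, one or more lines of text, and a blank line. Here is a minimal sketch of one cue, written in Python rather than the Perl used in the answers. The names `to_srt_time` and `format_cue` are hypothetical, and the sketch assumes times are given as whole seconds.

```python
def to_srt_time(seconds, millis=0):
    """Format whole seconds plus milliseconds as HH:MM:SS,mmm."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def format_cue(index, start, end, text):
    """Build one SRT cue: counter, timing line, text, blank line."""
    return f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n\n"


if __name__ == "__main__":
    # Second title of the sample: shown from 00:01:01 to 00:01:06.
    print(format_cue(2, 61, 66, "Trzynaste Pietro"), end="")
```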

terdon · Accepted Answer · 2014-01-26T20:41:58.033

This is very similar to @goldilocks' approach but, IMO, simpler. It can also deal with empty lines in the file, and it replaces | with a line break:

#!/usr/bin/env perl
my ($time, $text, $next_time, $next_text);
## $c flags whether the first valid line has been seen; $i is the subtitle index.
my ($c, $i) = (0, 0);
while (<>) {
    ## skip bad lines
    next unless /^\s*([:\d]+)\s*:(.+)/;
    ## If this is the first line. I could have used $. but this is
    ## safer in case the file contains an empty line at the beginning.
    if ($c == 0) {
      $time=$1; 
      $text=$2;
      $c++;
    }
    else {
      ## This is the counter for the subtitle index
      $i++;
      ## Save the current values
      $next_time=$1; 
      $next_text=$2;     
      ## I am assuming that the | should be interpreted
      ## as a newline, remove this if I'm wrong.
      $text=~s/\|/\n/g;     
      ## Print the previous subtitle
      print "$i\n$time,100 --> $next_time,000\n$text\n\n";        
      ## Save the current one for the next line
      $time=$next_time; $text=$next_text;
    }
}     
## Print the last subtitle. It will be displayed for a minute
## 'cause I'm lazy.
$i++;
$time=~/(\d+:)(\d+)(:\d+)/;
my $newtime=$1 . (sprintf "%02d", $2+1) . $3;
print "$i\n$time,100 --> $newtime,000\n$text\n\n";

Save the script as a file and make it executable, then run:

./script.pl subfile > good_subs.srt

The output I get on your sample is:

1
00:00:44,100 --> 00:01:01,000
" Myślę, więc jestem".
Kartezjusz, 1596-1650

2
00:01:01,100 --> 00:01:06,000
Trzynaste Pietro

3
00:01:06,100 --> 00:01:10,000
Podobno niewiedza uszczęśliwia.

4
00:01:10,100 --> 00:01:13,000
Po raz pierwszy w życiu
zgadzam się z tym.

5
00:01:13,100 --> 00:01:15,000
Wolałbym...

6
00:01:15,100 --> 00:01:19,000
nigdy nie odkryć
tej straszliwej prawdy.

7
00:01:19,100 --> 00:02:19,000
Teraz już wiem...

The last subtitle in your output ends 0.1 seconds before it starts! — goldilocks, Jan 26 '14 at 20:03
This works pretty well. I just need to adjust some entries because they're displayed a little too long. Maybe there's a way to put that in the script, say 5-8 seconds max. If you want to experiment more with the subtitles, I uploaded the file to pastebin: http://pastebin.com/vZP419eG — Mikhail Morfikov, Jan 26 '14 at 20:08
@goldilocks damn, sorry, forgot `use Time::Machine` :). Thanks, fixed. — terdon, Jan 26 '14 at 20:36
@MikhailMorfikov it's possible but increases the complexity because that means that we need to manipulate times, so that `1:59 + 20 = 2:19`. This means either complex code or using external modules and seemed beyond the scope of the question. — terdon, Jan 26 '14 at 20:38
+1 Nice job. For the time you could use an algorithm based on the length of the text string, say 1/2 second per character but not exceeding the start of the next title. — goldilocks, Jan 27 '14 at 09:55
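Both ideas in these comments (capping how long a title stays up, and scaling the duration with the text length) get easier once timestamps are converted to seconds. The following is a minimal sketch in Python rather than Perl and is not part of either answer. The function names are hypothetical, and the 0.5 seconds per character and 8 second cap are assumptions taken from the comments above.

```python
def to_seconds(stamp):
    """Convert 'HH:MM:SS' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_stamp(total):
    """Convert whole seconds back to 'HH:MM:SS'."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def end_time(start, text, next_start=None, per_char=0.5, max_secs=8):
    """Pick an end time in whole seconds for a title starting at `start`.

    per_char and max_secs are assumed values, not part of the SRT format.
    """
    duration = min(max_secs, max(1, int(len(text) * per_char)))
    end = start + duration
    if next_start is not None:
        end = min(end, next_start)  # never run into the next title
    return end


if __name__ == "__main__":
    print(to_stamp(to_seconds("00:01:59") + 20))  # 00:02:19
```

With this, the last title of the sample would end after 8 seconds instead of a minute, and a title followed closely by another is cut off at the start of the next one.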

goldilocks · Answer 2 · 2014-01-26T20:57:52.027

What Thorsten meant is something like this:

#!/usr/bin/perl
use strict;
use warnings FATAL => qw(all);

my $END = '!!ZZ_END';
my $LastTitleDuration = 5;

my $count = 1;
my $line = <STDIN>;
chomp $line;
my $next = <STDIN>;
while ($line) {
    $next = lastSubtitle($line) if !$next;
    last if !$next;
    chomp $next;
    if (!($next =~ m/^\d\d:\d\d:\d\d:.+/)) { 
        # $. is the number of the input line just read, i.e. the offending one.
        print STDERR "Skipping bad data at line $.:\n$next\n";
        $next = <STDIN>;
        next;
    }
    printf STDOUT
        "%d\r\n%s,100 --> %s,000\r\n%s\r\n\r\n",
        $count++,
        substr($line, 0, 8),
        substr($next, 0, 8),
        substr($line, 9)
    ;
} continue {
    $line = $next;
    $next = <STDIN>;
}

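sub lastSubtitle {
    my $line = shift;
    $line =~ /^(\d\d):(\d\d):(\d\d):(.+)/ or return 0;
    return 0 if $4 eq $END;
    # Work in total seconds so that e.g. 00:01:58 + 5 rolls over to 00:02:03.
    my $t = $1 * 3600 + $2 * 60 + $3 + $LastTitleDuration;
    return sprintf("%02d:%02d:%02d:$END", int($t / 3600), int($t / 60) % 60, $t % 60);
}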

When I feed your sample data into this, I get:

1
00:00:44,100 --> 00:01:01,000
" Myślę, więc jestem".|Kartezjusz, 1596-1650

2
00:01:01,100 --> 00:01:06,000
Trzynaste Pietro

3
00:01:06,100 --> 00:01:10,000
Podobno niewiedza uszczęśliwia.

4
00:01:10,100 --> 00:01:13,000
Po raz pierwszy w życiu|zgadzam się z tym.

5
00:01:13,100 --> 00:01:15,000
Wolałbym...

6
00:01:15,100 --> 00:01:19,000
nigdy nie odkryć|tej straszliwej prawdy.

7
00:01:19,100 --> 00:01:24,000
Teraz już wiem...

Couple of points:

Each subtitle starts 1/10 of a second late so that consecutive titles do not overlap (and because I was too lazy to add any math involving the second timestamp). It then stays on until 1/10 of a second before the next title appears.
The last title stays up for $LastTitleDuration (5 seconds).
I used CRLF line endings as per the SubRip Wikipedia article, although that may not be necessary.
The script presumes the first line of input is not malformed. Every later line is checked, and errors are reported to stderr, so:
```
readAlongToSRT.pl < readAlong.txt > whatever.srt
```
should create the file but still print any errors to the screen.
Processing will stop at a blank line.
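See terdon's comment below re: the possible significance of | in the subtitle content. You may want to insert $line =~ s/\|/\r\n/g; before the printf STDOUT line. Note that the | must be escaped: an unescaped | is regex alternation and matches the empty string at every position.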

This took me 20 minutes and the only test data I had was those 7 lines, so don't count on it being perfect. If a subtitle's text is ever split across several lines, that will cause a problem. I presumed that never happens; if it does, I suggest you remove the line breaks from the input first rather than trying to deal with them here.
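One way to do that clean-up is sketched below in Python rather than Perl. It assumes that any line not starting with an `HH:MM:SS:` timestamp is a continuation of the previous subtitle. The function name `fold_continuations` is hypothetical. It joins continuation lines with `|` and drops blank lines, so the converter sees one subtitle per line.

```python
import re

# A valid entry starts with a timestamp such as "00:01:10:".
TIMESTAMP = re.compile(r"^\d\d:\d\d:\d\d:")


def fold_continuations(lines):
    """Return one entry per subtitle, joining stray lines with '|'."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        if TIMESTAMP.match(line):
            entries.append(line)
        elif entries:
            entries[-1] += "|" + line  # assumed continuation of the previous title
        # text before the first timestamp is ignored
    return entries


if __name__ == "__main__":
    sample = [
        "00:01:10:Po raz pierwszy w życiu\n",
        "zgadzam się z tym.\n",
        "\n",
        "00:01:13:Wolałbym...\n",
    ]
    print("\n".join(fold_continuations(sample)))
```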

Damn, beat me to it and using the same approach! Nice one, +1. I _think_ that the `|` in the original format should be changed to `\n` but that's just a guess. — terdon, Jan 26 '14 at 19:08
