
I have a 250 MB text file, all in one line.

In this file I want to replace a characters with b characters:

sed -e "s/a/b/g" < one-line-250-mb.txt

It fails with:

sed: couldn't re-allocate memory

It seems to me that this kind of task could be performed inline without allocating much memory.
Is there a better tool for the job, or a better way to use sed?


GNU sed version 4.2.1
Ubuntu 12.04.2 LTS
1 GB RAM

Nicolas Raoul
  • possible duplicate of [Out of memory while using sed with multiline expressions on giant file](http://unix.stackexchange.com/questions/63354/out-of-memory-while-using-sed-with-multiline-expressions-on-giant-file) – Ruban Savvy Dec 19 '13 at 03:36
  • That question is about a very complex multiline expression. My question is about the most basic expression you could imagine. – Nicolas Raoul Dec 19 '13 at 03:41
  • @RubanSavvy plus, neither of the answers on the other Q take into account the long line and in fact, both would probably have the same issue. – terdon Dec 19 '13 at 03:44
  • Can you include your sed version in this Q and also your hardware info (RAM specifically) and distro version? – slm Dec 19 '13 at 12:25
  • A partial `ltrace` would be interesting. – U. Windl Feb 03 '22 at 23:21

3 Answers

11

Yes, use tr instead:

tr 'a' 'b' < file.txt > output.txt

sed deals in lines, so a single huge line will cause it problems. It has to read the whole line into a memory buffer before it can apply the expression, and I expect your 250 MB input exceeds what that buffer can be (re)allocated to hold.

tr on the other hand deals with characters and should be able to handle arbitrarily long lines correctly.
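The behaviour is easy to check on a small stand-in file (the file names here are just illustrative):

```shell
# A single "line" with no newline at all, standing in for the 250 MB file:
printf 'abcabcabc' > one-line.txt
# tr streams through fixed-size buffers, so line length never matters:
tr 'a' 'b' < one-line.txt > output.txt
cat output.txt   # -> bbcbbcbbc
```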

terdon
  • Curiously I just created a 250MB file filled w/ "abcabc..." and was able to do `sed -e "s/a/z/g" b.txt > c.txt` without any issues. Using sed (GNU sed) 4.2.2. – slm Dec 19 '13 at 04:24
  • @slm same here on a 496M file and same `sed` version, guess it depends on implementation or hardware. – terdon Dec 19 '13 at 11:12
  • Yeah, if I had to hazard a guess, we're dealing with an older version of `sed`. – slm Dec 19 '13 at 12:24
8

Historical versions of sed and awk had memory problems. These have mostly been fixed in more recent versions, but one of the classic occurrences of this problem hit Larry Wall pretty hard: his answer was to write a new programming language with no memory limits other than hardware. He called it Perl. Your specific problem can be solved more simply, but the general rule of thumb I use is: when sed won't work, use perl.

Edit: by request an example:

perl -pe "s/a/b/g" < one-line-250-mb.txt

or for less memory usage:

perl -e 'BEGIN{$/=\32768}' -pe "s/a/b/g" < one-line-250-mb.txt
hildred
  • This whole paragraph boils down to "Perl.". Some details would be nice, or at least an example or something – Michael Mrozek Dec 19 '13 at 23:11
  • @MichaelMrozek I realize that hat collection does tend to lead to robo-editing, but I figured with your reputation you would pay a little closer attention. Specifically, the specific problem had already been solved, in a very narrow way that would not help the majority of people searching, so I added an answer for the general case. The expanded answer I provided would have helped Nicolas Raoul if there hadn't already been a workable solution, but I doubt it would help very many others, whereas my original answer would help everyone who reaches the limits of sed. If you disagree I'll delete – hildred Dec 19 '13 at 23:50
  • @hildred I don't think it's too much to ask that you could assume good faith of the moderators when they are making valid comments on your answer, without resorting immediately to accusations of ulterior motives (hats, really?!). – Chris Down Dec 20 '13 at 03:13
  • @ChrisDown On the contrary -- I'm in it entirely for the hats. Also this was flagged as not an answer by multiple people, but that's a distant second priority to the hats – Michael Mrozek Dec 20 '13 at 03:16
  • The second one with the memory limitation did the trick (for my 2.5GB 1-line file): thanks! Bit disappointed by `sed`, though. :\ – Tomislav Nakic-Alfirevic Jul 10 '19 at 14:14
  • @hildred Where can I learn more about the `perl` command that uses less memory? The number 32,768 – is that bytes? Is it specifying how much memory is being allotted to `perl`? – Harold Fischer Mar 23 '20 at 23:57
  • @HaroldFischer, I think that was in the man page. What it does is a fixed-size block read: instead of loading all 250 MB into RAM and then doing the substitution, it does multiple 32 KB reads with a substitution after each. The main drawback of this approach is that matches spanning two blocks are missed, although that's not a problem for single-character matches. – hildred Mar 24 '20 at 03:03
1

It is not the "proper way", but in some scenarios you can split the file, do the replacement on each piece, and join the pieces again. Example:

split -b 50M -d big_file big_file_part
sed -i 's/a/b/g' big_file_part*
cat big_file_part* >file

I successfully did a replacement in a ~100 GB file this way.

But you need extra disk space (enough for a copy of the file).
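A minimal round trip on a stand-in file (names illustrative) shows the mechanics. Note that `cat big_file_part*` relies on the shell's glob ordering matching `split`'s numeric suffix order, which holds as long as the suffixes keep the same width:

```shell
printf 'aaaa' > big_file                 # stand-in for the huge file
split -b 2 -d big_file big_file_part     # -> big_file_part00, big_file_part01
sed -i 's/a/b/g' big_file_part*          # each piece is small enough for sed
cat big_file_part* > big_file            # reassemble in suffix order
cat big_file                             # -> bbbb
```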