
To view, and hopefully edit, a very large text file of more than 10 GB (a backup dump of a whole database), I tried using vim... it didn't behave very well.

I also tried cat, at least to view the file, and cat did not behave properly either.

Are there tools for editing huge files? Something that would view/edit a limited group of lines without trying to load the rest, unless told to load a new (really limited) group of lines, and perhaps with the ability to jump to certain places using a search utility.

Stephane Rolland
    The problem is, if you edit, and that edit changes the length of the file, it needs to be rewritten entirely. And that's never a graceful thing to do with such huge files. – frostschutz Mar 24 '13 at 01:44
  • Also, the best tool to edit a database is the database itself. – frostschutz Mar 24 '13 at 01:46
    It doesn't help for the editing part, but to visualise `less` might be an option. Also see http://unix.stackexchange.com/a/66298/12779 – Marco Mar 24 '13 at 01:54
  • @Marco, `less` is a terrible idea, it will load the whole file into memory to be able to move around. – vonbrand Mar 24 '13 at 02:04
    @vonbrand No. Less opens the file instantly and without loading it into RAM. – Marco Mar 24 '13 at 02:26
  • @Marco I have just tried less; it's perfect for viewing, instantly! – Stephane Rolland Mar 24 '13 at 02:30
  • Related: [How can I edit a large file in place?](http://unix.stackexchange.com/q/1279/12779) – Marco Mar 24 '13 at 03:46
  • You might be able to do the first part (up to the editing) more quickly with `hadoop` by dividing the work up, but you might have to roll your own software. – Emre Mar 19 '14 at 20:45
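frostschutz's point implies a corollary: an edit that replaces some bytes with exactly the same number of bytes does not move anything after it, so it can be done in place without rewriting the whole file. The sketch below illustrates that idea in Python. The function name `patch_in_place` and the byte offset are hypothetical; in practice you would first have to locate the offset (for example with a search tool), and the edit must not change the length.

```python
def patch_in_place(path: str, offset: int, old: bytes, new: bytes) -> None:
    """Overwrite `old` with `new` at byte `offset` without rewriting the file.

    Only valid when len(new) == len(old): the file length stays the same,
    so none of the bytes after the patch have to move.
    """
    if len(new) != len(old):
        raise ValueError("in-place patch must keep the same length")
    # "r+b" opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        f.seek(offset)
        current = f.read(len(old))
        # Refuse to write if the bytes at this offset are not what we expect.
        if current != old:
            raise ValueError(f"expected {old!r} at offset {offset}, found {current!r}")
        f.seek(offset)  # reading moved the position; go back before writing
        f.write(new)
```

For example, `patch_in_place("dump.sql", 23, b"old", b"new")` (hypothetical file and offset) touches only three bytes, regardless of the file size. Any edit that changes the length still means rewriting everything after it.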

2 Answers


It doesn't help with the editing part, but for viewing the file, `less` might be an option. The advantage is that `less` can open large files quickly because it does not require the file to fit into RAM. This makes it a much better choice than vim, for instance.

Marco

Tools like sed(1) were designed for this kind of task. If you need more control over the operations done line by line, something like Perl or Python may be a better match for the job.
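As a rough illustration of the line-at-a-time approach in Python, the sketch below does roughly what `sed 's/pattern/repl/g' src > dst` does: it reads one line, transforms it, writes it out, and moves on, so memory use is bounded by the longest line rather than by the file size. The function name `stream_substitute` is hypothetical, and binary mode is an assumption chosen to avoid decoding errors in dumps with mixed encodings.

```python
import re


def stream_substitute(src: str, dst: str, pattern: bytes, repl: bytes) -> int:
    """Copy src to dst, applying a regex substitution to every line.

    Returns the number of lines that were changed.
    """
    regex = re.compile(pattern)
    changed = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Iterating over a file object yields one line at a time;
        # the whole file is never held in memory.
        for line in fin:
            new_line, n = regex.subn(repl, line)  # replace all matches on this line
            if n:
                changed += 1
            fout.write(new_line)
    return changed
```

The result goes to a new file, which is unavoidable when an edit changes the length of a line.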

In any case, rummaging inside a 10GiB file will take a long time. Isn't it easier just to slurp it into the database manager, and massage it there?

vonbrand
  • Well, I admit that, given all the comments, editing really seems like a bad idea. However, I do need to view the file. I like your Python idea, if no ready-made utility does the trick. – Stephane Rolland Mar 24 '13 at 02:02
  • I'm looking at sed syntax, but my howto doesn't mention your sed(1). What do you mean by (1)? My command line doesn't accept it. – Stephane Rolland Mar 24 '13 at 02:09
  • @StephaneRolland, that's Unix notation: it means `sed` in section 1 (user commands) of the manual. – vonbrand Mar 24 '13 at 02:10
  • Hmm... my first attempts with `sed` have not been conclusive, so I'm going to go with Python. I'll feel much more at ease. – Stephane Rolland Mar 24 '13 at 02:13
  • Well, my first try with Python wasn't convincing either :-) with `readlines()` it brought my system to the brink of collapse :-) – Stephane Rolland Mar 24 '13 at 02:28
    @StephaneRolland, the whole point of sed, perl or python is to be able to work with a few lines at a time, _not_ the complete file. You'd need some 32GiB RAM just to load that... – vonbrand Mar 24 '13 at 02:30
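The `readlines()` problem comes from building a list of every line in the file. Iterating over the file object instead reads lazily, which is enough for a crude "view a window of lines" and "jump to a search hit" tool like the one asked for in the question. The sketch below uses `itertools.islice` for the window; the function names `read_window` and `find_first` are hypothetical. Note that reaching line N still means reading through the N-1 lines before it, so jumping deep into a 10 GB file takes time, even though memory use stays small.

```python
from itertools import islice
from typing import List, Optional


def read_window(path: str, start: int, count: int) -> List[bytes]:
    """Return up to `count` lines starting at 1-based line number `start`.

    Only the requested lines are kept in memory; the lines before them
    are read and discarded one at a time.
    """
    with open(path, "rb") as f:
        return list(islice(f, start - 1, start - 1 + count))


def find_first(path: str, needle: bytes, start: int = 1) -> Optional[int]:
    """Return the 1-based number of the first line at or after `start`
    that contains `needle`, or None if there is no match."""
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            if lineno >= start and needle in line:
                return lineno
    return None
```

For example, `find_first("dump.sql", b"CREATE TABLE")` (hypothetical file name) gives a line number that can then be passed to `read_window` to print the following 20 lines.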