Questions tagged [large-files]

89 questions

217 votes · 10 answers

How to remove duplicate lines inside a text file?

A huge (up to 2 GiB) text file of mine contains about 100 exact duplicates of every line in it (useless in my case, as the file is a CSV-like data table). What I need is to remove all the repetitions while (preferably, but this can be sacrificed for…
Ivan (17,368 reputation; badges: 35 gold, 93 silver, 118 bronze)
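
One way to deduplicate while keeping the original order is to stream the file once and remember every line already written in a set. This is a minimal Python sketch, not taken from any of the answers. Because each line repeats about 100 times, the set only has to hold the distinct lines, roughly a hundredth of the file here. `unique_lines` and `dedupe_file` are hypothetical names, and the UTF-8 encoding is an assumption.

```python
from typing import Iterable, Iterator, Set


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in order of first appearance."""
    seen: Set[str] = set()
    for line in lines:
        if line not in seen:  # first time we meet this exact line
            seen.add(line)
            yield line


def dedupe_file(src: str, dst: str) -> None:
    """Copy src to dst, dropping every repeated line (hypothetical helper)."""
    # newline="" keeps the original line endings byte-for-byte;
    # the UTF-8 encoding is an assumption about the input.
    with open(src, encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.writelines(unique_lines(fin))
```

Usage would look like `dedupe_file("table.csv", "table.dedup.csv")` (placeholder file names). A final line without a trailing newline counts as different from the same text with one.
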

189 votes · 8 answers

cat line X to line Y on a huge file

Say I have a huge text file (>2GB) and I just want to cat the lines X to Y (e.g. 57890000 to 57890010). From what I understand I can do this by piping head into tail or vice versa, i.e. head -A /path/to/file | tail -B or alternatively tail -C…
Amelio Vazquez-Reina (40,169 reputation; badges: 77 gold, 197 silver, 294 bronze)
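
The head/tail idea can be expressed as a single sequential pass that skips the first X - 1 lines and stops right after line Y. The sketch below is an illustration in Python, not one of the listed answers. Like head | tail, it still reads through every line before X, since variable-length lines cannot be located by number without an index. `lines_between` is a hypothetical name.

```python
from itertools import islice
from typing import Iterable, List


def lines_between(lines: Iterable[str], x: int, y: int) -> List[str]:
    """Return lines x..y inclusive (1-based) and stop reading after line y."""
    if x < 1 or y < x:
        raise ValueError("expected 1 <= x <= y")
    # islice skips the first x - 1 lines and stops after yielding line y,
    # so nothing past line y is read from the underlying iterator.
    return list(islice(lines, x - 1, y))
```

For the example in the question: `with open("/path/to/file") as f: print("".join(lines_between(f, 57890000, 57890010)), end="")`.
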

133 votes · 14 answers

Replace string in a huge (70GB), one line, text file

I have a huge (70GB), one line, text file and I want to replace a string (token) in it. I want to replace the token , with another dummy token (glove issue). I tried sed: sed 's///g' < corpus.txt > corpus.txt.new but the output…
Christos Baziotis (1,457 reputation; badges: 3 gold, 13 silver, 10 bronze)
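
Since the 70 GB file is one single line, any line-oriented tool has to buffer the whole thing. A sketch of a workaround in Python processes fixed-size byte chunks instead. It carries the last len(old) - 1 bytes of each chunk into the next read, so a token split across two reads is still found. Like sed's `s///g`, it replaces non-overlapping matches from left to right. `stream_replace` is a hypothetical helper, and `b"OLD"` / `b"NEW"` below are placeholder tokens.

```python
import io
from typing import BinaryIO


def stream_replace(src: BinaryIO, dst: BinaryIO, old: bytes, new: bytes,
                   chunk_size: int = 1 << 20) -> int:
    """Copy src to dst replacing every `old` with `new`; return the match count.

    Memory use stays around chunk_size regardless of line length.
    """
    if not old:
        raise ValueError("old must be non-empty")
    carry = b""
    count = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            dst.write(carry)  # shorter than `old`, so it cannot hold a match
            return count
        buf = carry + chunk
        pos = 0
        while True:  # replace every complete match inside buf
            i = buf.find(old, pos)
            if i == -1:
                break
            dst.write(buf[pos:i])
            dst.write(new)
            pos = i + len(old)
            count += 1
        # A match still to come must start in the last len(old) - 1 bytes;
        # hold those back for the next chunk and flush the rest.
        keep = max(pos, len(buf) - (len(old) - 1))
        dst.write(buf[pos:keep])
        carry = buf[keep:]
```

Used on the question's files it would look like `with open("corpus.txt", "rb") as src, open("corpus.txt.new", "wb") as dst: stream_replace(src, dst, b"OLD", b"NEW")`. Output still goes to a second file, so it needs the same free disk space as the sed attempt.
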

64 votes · 11 answers

Is there a way to modify a file in-place?

I have a fairly large file (35 GB), and I would like to filter this file in situ (i.e. I don't have enough disk space for another file). Specifically, I want to grep and ignore some patterns. Is there a way to do this without using another…
Nim (953 reputation; badges: 1 gold, 8 silver, 12 bronze)
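
A grep -v style filter can run in place because its output is never longer than its input. The technique keeps two offsets into the same file. Kept lines are written at the write offset, which never passes the read offset, so unread data is never overwritten, and the leftover tail is truncated at the end. This Python sketch is an illustration under those assumptions, not an answer from the thread. `filter_lines_in_place` is a hypothetical name, and the pattern uses Python `re` syntax, not grep's. There is no backup: an interruption midway leaves the file half-rewritten.

```python
import io
import re


def filter_lines_in_place(f: io.BufferedIOBase, pattern: bytes,
                          batch_bytes: int = 1 << 20) -> int:
    """Drop every line matching `pattern` from a file opened in "r+b" mode.

    Returns the new size. Kept lines are copied towards the start of the file.
    """
    regex = re.compile(pattern)
    read_pos = write_pos = 0
    while True:
        f.seek(read_pos)
        batch = f.readlines(batch_bytes)  # whole lines, about batch_bytes total
        if not batch:
            break
        read_pos = f.tell()
        kept = b"".join(line for line in batch if not regex.search(line))
        f.seek(write_pos)                 # write_pos <= read_pos at all times
        f.write(kept)
        write_pos += len(kept)
    f.truncate(write_pos)                 # discard the now-unused tail
    return write_pos
```

On a real file this would be `with open("big.txt", "r+b") as f: filter_lines_in_place(f, rb"some pattern")` (placeholder name and pattern).
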

49 votes · 4 answers

Diffing two big text files

I have two big files (6GB each). They are unsorted, with linefeeds (\n) as separators. How can I diff them? It should take under 24h.
Jonas Lejon (679 reputation; badges: 1 gold, 6 silver, 10 bronze)

36 votes · 3 answers

Replace text quickly in very large file

I have a 25 GB text file that needs a string replaced on only a few lines. I can use sed successfully but it takes a really long time to run. sed -i 's|old text|new text|g' gigantic_file.sql Is there a quicker way to do this?
eisaacson (461 reputation; badges: 1 gold, 4 silver, 3 bronze)

33 votes · 3 answers

Transferring large (8 GB) files over ssh

I tried it with SCP, but it says "Negative file size". >scp matlab.iso xxx@xxx:/matlab.iso matlab.iso: Negative file size Also tried using SFTP, worked fine until 2 GB of the file had transferred, then stopped: sftp> put matlab.iso Uploading…
eimrek (543 reputation; badges: 1 gold, 5 silver, 9 bronze)

21 votes · 2 answers

largefile feature when creating a file system

Is it useful to use the -T largefile flag when creating a file system for a partition with big files such as video and FLAC audio? I tested the same partition with that flag and without it, and using tune2fs -l [partition], I checked in "Filesystem…
Marc (1,651 reputation; badges: 3 gold, 14 silver, 16 bronze)

18 votes · 2 answers

Why are these files in an ext4 volume fragmented?

I have a 900GB ext4 partition on a (magnetic) hard drive that has no defects and no bad sectors. The partition is completely empty except for an empty lost+found directory. The partition was formatted using default parameters except that I set the…
EmmaV (3,985 reputation; badges: 4 gold, 30 silver, 61 bronze)

18 votes · 5 answers

How can I edit a large file in place?

I have a few files sized > 1 GB each. I need to remove the last few bytes from the files. How can I do it? I prefer to edit the files in place to save disk space. I am on HP-UX.
Hemant (6,834 reputation; badges: 5 gold, 38 silver, 42 bronze)
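
Cutting bytes off the end of a file does not require rewriting it. Truncating it to a smaller size is enough, and the remaining data stays where it is on disk. A minimal sketch in Python, assuming an interpreter is available on the HP-UX machine (not guaranteed). `drop_trailing_bytes` is a hypothetical helper that works on any file opened in "r+b" mode.

```python
import io


def drop_trailing_bytes(f: io.IOBase, n: int) -> int:
    """Cut the last n bytes off an open, writable binary file; return the new size."""
    if n < 0:
        raise ValueError("n must be non-negative")
    size = f.seek(0, io.SEEK_END)  # seek returns the new absolute position
    new_size = max(size - n, 0)
    f.truncate(new_size)           # shrinks the file without copying its contents
    return new_size
```

For example, `with open("big.dat", "r+b") as f: drop_trailing_bytes(f, 3)` would remove the last three bytes of a placeholder file `big.dat`.
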

17 votes · 1 answer

Number of files per directory

I have a directory with about 100000 small files (each file is from 1-3 lines, each file is a text file). In size the directory isn't very big (< 2GB). This data lives in a professionally administered NFS server. The server runs Linux. I think the…
carlosdc (275 reputation; badges: 1 gold, 2 silver, 8 bronze)

13 votes · 5 answers

How to find duplicate lines in many large files?

I have ~30k files. Each file contains ~100k lines. A line contains no spaces. The lines within an individual file are sorted and duplicate free. My goal: I want to find all duplicate lines across two or more files and also the names of the files…
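
Because every file is already sorted and duplicate-free, a k-way merge finds cross-file duplicates in one streaming pass: equal lines from different files come out of the merge next to each other. This is a Python sketch built on `heapq.merge`, offered as one possible approach. It assumes the files were sorted in plain byte order (for example with `LC_ALL=C sort`), so trailing newlines do not take part in the comparison. `cross_file_duplicates` and `_tagged` are hypothetical names. Opening ~30k files at once may exceed the per-process file descriptor limit, in which case the merge would have to run in batches.

```python
import heapq
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple


def _tagged(name: str, lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Pair each line (newline stripped) with the name of the file it came from."""
    for line in lines:
        yield line.rstrip("\n"), name


def cross_file_duplicates(
        sources: Dict[str, Iterable[str]]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (line, file names) for each line present in two or more sources.

    Every source must be sorted and free of internal duplicates.
    """
    merged = heapq.merge(*(_tagged(name, lines) for name, lines in sources.items()))
    # Equal lines are adjacent after the merge, so groupby collects them.
    for line, group in groupby(merged, key=lambda pair: pair[0]):
        names = [name for _, name in group]
        if len(names) >= 2:
            yield line, names
```

With real files the sources would be open file objects, e.g. `{path: open(path) for path in paths}`, closed after the loop finishes.
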

11 votes · 3 answers

Basic sed command on large one-line file: couldn't re-allocate memory

I have a 250 MB text file, all in one line. In this file I want to replace a characters with b characters: sed -e "s/a/b/g" < one-line-250-mb.txt It fails with: sed: couldn't re-allocate memory It seems to me that this kind of task could be…
Nicolas Raoul (7,945 reputation; badges: 14 gold, 43 silver, 55 bronze)

10 votes · 3 answers

Emacs: Open a buffer with all lines between lines X to Y from a huge file

In the same spirit as this other question: cat line X to line Y on a huge file: Is there a way to open from within Emacs (and show on a buffer) a given set of lines (e.g. all lines between line X and Y) from a huge text file? E.g. Open and show in…
Amelio Vazquez-Reina (40,169 reputation; badges: 77 gold, 197 silver, 294 bronze)

9 votes · 1 answer

Is there bdiff(1) in Linux?

There is a bdiff(1) command in Solaris which allows you to diff(1) files larger than your RAM (documentation). Is there something like that in Linux? I tried googling, but I couldn't find which package provides bdiff in Ubuntu.
AntonioK (1,151 reputation; badges: 2 gold, 15 silver, 28 bronze)