Questions tagged [large-files]
89 questions
217
votes
10 answers
How to remove duplicate lines inside a text file?
A huge (up to 2 GiB) text file of mine contains about 100 exact duplicates of every line in it (useless in my case, as the file is a CSV-like data table).
What I need is to remove all the repetitions while (preferably, but this can be sacrificed for…
Ivan
- 17,368
- 35
- 93
- 118
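For the duplicate-lines question above, one possible approach is sketched here. It assumes that keeping the first occurrence of each line is acceptable and that the set of distinct lines fits in memory (with roughly 100 copies of every line in a 2 GiB file, that would be on the order of tens of MiB). The helper name `dedupe_lines` is made up for illustration.

```shell
# dedupe_lines INPUT OUTPUT   (hypothetical helper name)
# Writes each distinct line of INPUT to OUTPUT once, in first-seen order.
dedupe_lines() {
    # seen[$0]++ evaluates to 0 the first time a line is met, so the
    # negated expression is true only then and awk prints the line.
    awk '!seen[$0]++' "$1" > "$2"
}
```

If order does not matter and memory is tight, `sort -u` is the usual alternative, since it spills to temporary files instead of holding every distinct line in RAM.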
189
votes
8 answers
cat line X to line Y on a huge file
Say I have a huge text file (>2GB) and I just want to cat the lines X to Y (e.g. 57890000 to 57890010).
From what I understand I can do this by piping head into tail or vice versa, i.e.
head -A /path/to/file | tail -B
or alternatively
tail -C…
Amelio Vazquez-Reina
- 40,169
- 77
- 197
- 294
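As an alternative to chaining head and tail for the question above, here is a minimal sketch using sed. The function name `print_lines` and its argument order are assumptions made for this example, not something taken from the question.

```shell
# print_lines FIRST LAST FILE   (hypothetical helper name)
# Prints lines FIRST..LAST of FILE, then stops reading.
print_lines() {
    # -n suppresses default output; "FIRST,LASTp" prints the range;
    # "LASTq" quits at the last wanted line so the rest of a huge
    # file is never scanned.
    sed -n "${1},${2}p;${2}q" "$3"
}
```

For example, `print_lines 57890000 57890010 /path/to/file` would print the eleven lines mentioned in the question. sed still has to read everything before line X, because line offsets are not known in advance.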
133
votes
14 answers
Replace string in a huge (70GB), one line, text file
I have a huge (70GB), one line, text file and I want to replace a string (token) in it.
I want to replace the token , with another dummy token (glove issue).
I tried sed:
sed 's///g' < corpus.txt > corpus.txt.new
but the output…
Christos Baziotis
- 1,457
- 3
- 13
- 10
64
votes
11 answers
Is there a way to modify a file in-place?
I have a fairly large file (35Gb), and I would like to filter this file in situ (i.e. I don't have enough disk space for another file), specifically I want to grep and ignore some patterns — is there a way to do this without using another…
Nim
- 953
- 1
- 8
- 12
49
votes
4 answers
Diffing two big text files
I have two big files (6GB each). They are unsorted, with linefeeds (\n) as separators. How can I diff them? It should take under 24h.
Jonas Lejon
- 679
- 1
- 6
- 10
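For the question above about two unsorted files, one possible reading of "diff" is "which lines occur in only one of the two files", ignoring line order. Under that assumption, a rough sketch is to sort both files and compare them with comm. The helper name `unordered_diff` is made up, and the sketch needs enough temporary disk space for two sorted copies.

```shell
# unordered_diff FILE_A FILE_B   (hypothetical helper name)
# Prints lines found in only one of the two files, ignoring order.
unordered_diff() {
    a_sorted=$(mktemp) && b_sorted=$(mktemp) || return 1
    # LC_ALL=C gives plain byte ordering, which is usually faster and
    # guarantees sort and comm agree on the collation.
    LC_ALL=C sort "$1" > "$a_sorted"
    LC_ALL=C sort "$2" > "$b_sorted"
    # comm -3 drops lines common to both files. Lines only in FILE_A
    # start at column 1; lines only in FILE_B are prefixed with a tab.
    LC_ALL=C comm -3 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}
```

sort uses an external merge sort, so the files do not have to fit in RAM.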
36
votes
3 answers
Replace text quickly in very large file
I have a 25GB text file that needs a string replaced on only a few lines. I can use sed successfully but it takes a really long time to run.
sed -i 's|old text|new text|g' gigantic_file.sql
Is there a quicker way to do this?
eisaacson
- 461
- 1
- 4
- 3
33
votes
3 answers
Transferring large (8 GB) files over ssh
I tried it with SCP, but it says "Negative file size".
>scp matlab.iso xxx@xxx:/matlab.iso
matlab.iso: Negative file size
Also tried using SFTP, worked fine until 2 GB of the file had transferred, then stopped:
sftp> put matlab.iso
Uploading…
eimrek
- 543
- 1
- 5
- 9
21
votes
2 answers
largefile feature when creating a file system
Is it useful to use the -T largefile flag when creating a file system on a partition that holds big files, such as video and audio in FLAC format?
I tested the same partition with that flag and without it, and using tune2fs -l [partition], I checked in "Filesystem…
Marc
- 1,651
- 3
- 14
- 16
18
votes
2 answers
Why are these files in an ext4 volume fragmented?
I have a 900GB ext4 partition on a (magnetic) hard drive that has no defects and no bad sectors. The partition is completely empty except for an empty lost+found directory. The partition was formatted using default parameters except that I set the…
EmmaV
- 3,985
- 4
- 30
- 61
18
votes
5 answers
How can I edit a large file in place?
I have a few files sized > 1 GB each. I need to remove the last few bytes from the files. How can I do it? I prefer to edit the files in place to save disk space.
I am on HP-UX.
Hemant
- 6,834
- 5
- 38
- 42
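For the question above about dropping trailing bytes in place, here is a sketch that relies only on `wc` and `dd`. It assumes a POSIX-conforming dd, where `seek=` without `conv=notrunc` keeps the bytes that are skipped over and discards the rest. I have not verified this on HP-UX, so try it on a copy first. The helper name `trim_tail_bytes` is made up.

```shell
# trim_tail_bytes FILE COUNT   (hypothetical helper name)
# Shortens FILE by COUNT bytes without writing a second copy.
trim_tail_bytes() {
    size=$(wc -c < "$1")
    new_size=$((size - $2))
    # Refuse to trim more bytes than the file contains.
    [ "$new_size" -ge 0 ] || return 1
    # Copying nothing from /dev/null while seeking to new_size makes dd
    # cut the output file at that offset.
    dd if=/dev/null of="$1" bs=1 seek="$new_size" 2>/dev/null
}
```

On systems with GNU coreutils, `truncate -s -COUNT FILE` does the same job more directly.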
17
votes
1 answer
Number of files per directory
I have a directory with about 100000 small text files (each file is 1-3 lines long). In size the directory isn't very big (< 2GB). This data lives in a professionally administered NFS server. The server runs Linux. I think the…
carlosdc
- 275
- 1
- 2
- 8
13
votes
5 answers
How to find duplicate lines in many large files?
I have ~30k files. Each file contains ~100k lines. A line contains no spaces. The lines within an individual file are sorted and duplicate free.
My goal: I want to find all duplicate lines across two or more files and also the names of the files…
Lars Schneider
- 242
- 1
- 2
- 9
11
votes
3 answers
Basic sed command on large one-line file: couldn't re-allocate memory
I have a 250 MB text file, all in one line.
In this file I want to replace a characters with b characters:
sed -e "s/a/b/g" < one-line-250-mb.txt
It fails with:
sed: couldn't re-allocate memory
It seems to me that this kind of task could be…
Nicolas Raoul
- 7,945
- 14
- 43
- 55
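For the single-character case in the question above, a sketch that sidesteps sed's line buffer is to use `tr`, which processes a byte stream and never needs to hold a whole line. This only covers one-character replacements; the wrapper name `replace_char` is made up for illustration.

```shell
# replace_char FROM TO   (hypothetical helper name)
# Reads stdin, replaces every FROM character with TO, writes stdout.
replace_char() {
    # tr works on a stream, so memory use stays flat no matter how
    # long the (single) input line is.
    tr "$1" "$2"
}
```

For example, `replace_char a b < one-line-250-mb.txt > replaced.txt`, where the output file name is an arbitrary choice.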
10
votes
3 answers
Emacs: Open a buffer with all lines between lines X to Y from a huge file
In the same spirit as this other question: cat line X to line Y on a huge file:
Is there a way to open from within Emacs (and show on a buffer) a given set of lines (e.g. all lines between line X and Y) from a huge text file?
E.g. Open and show in…
Amelio Vazquez-Reina
- 40,169
- 77
- 197
- 294
9
votes
1 answer
Is there bdiff (1) in Linux?
There is a bdiff(1) command in Solaris, which allows you to diff(1) files bigger than your RAM (documentation).
Is there something like that in Linux? I tried googling but couldn't find which package has bdiff in Ubuntu.
AntonioK
- 1,151
- 2
- 15
- 28