All Questions

0 votes
4 answers
275 views

Remove subdomains or existing domains

I have a list of domains; a sample is: account.google.com drive.google.com google.com bgoogle.com yahoo.co.uk stats.wikipedia.org media.wikipedia.org files.media.wikipedia.org bible.com I would like ...
ellat • 137
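A hedged sketch of one way to tackle this, assuming the goal (truncated above) is to drop every domain whose parent domain is also in the list, so that e.g. drive.google.com goes but bgoogle.com stays; the file name domains.txt is made up:

    awk '
        { seen[$0] = 1; order[NR] = $0 }        # remember every domain and its position
        END {
            for (i = 1; i <= NR; i++) {
                d = order[i]; keep = 1
                # peel off leading labels; drop the domain if any remaining suffix is itself listed
                while (sub(/^[^.]+\./, "", d))
                    if (d in seen) { keep = 0; break }
                if (keep) print order[i]
            }
        }' domains.txt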
0 votes
2 answers
98 views

Can some command or script subtract lines in one file from another faster than grep?

I have a shell script that runs regularly and the following part of it causes a slowdown. grep -v -f RemoveTheseGoodIPs.txt FromTheseShadyIPs.txt > RemainingBadIPs.txt It works. It just takes 156 ...
Darius Dauer
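A hedged sketch of the usual speed-up, assuming whole-line exact matches between the two files: loading the removal list into an awk hash turns the job into one linear pass instead of grep re-testing every pattern against every line (grep -F -x -v -f is another common fix for the same reason):

    awk 'NR == FNR { good[$0]; next } !($0 in good)' \
        RemoveTheseGoodIPs.txt FromTheseShadyIPs.txt > RemainingBadIPs.txt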
1 vote
0 answers
141 views

host hangs after cat /dev/null > bigfile.log

I found a big log file (2.7 TB) on my disk, so I decided to empty it with the following command: cat /dev/null > bigfile.log After I executed this command, I lost my SSH connection. When I logged in ...
Cans Fong
10 votes
4 answers
2k views

Why is tail file | tr (pipeline) faster than sed or perl with many lines?

I have a file with about one million lines, like this: "ID" "1" "2" "00000687" 0 1 "00000421" 1 0 "00000421" 1 0 "00000421" 1 0 with the last line repeated more than one million times. Taking ...
Francesco • 856
24 votes
5 answers
6k views

What is the quickest way of replacing 0 by 1 and vice-versa in a stream?

Given a string composed of 0s and 1s, my goal is to replace 0 by 1 and vice-versa. Example: Input 111111100000000000000 Intended output 000000011111111111111 I tried, unsuccessfully, the following ...
Paulo Tomé • 3,792
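A minimal sketch of the classic answer, assuming the stream really is just 0s, 1s and newlines: tr translates both characters in a single pass, so the swap happens simultaneously rather than 0→1 being undone by a later 1→0:

    printf '%s\n' 111111100000000000000 | tr '01' '10'
    # prints 000000011111111111111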
1 vote
0 answers
388 views

Large file processing performance

I did a simple performance test of CSV processing a while ago and want to share the results with the community; maybe you can point out what test could be more precise and fair. Firstly I took out 42 MB ...
nonForgivingJesus
2 votes
2 answers
2k views

How can I make my sed script run faster?

I got this script from a related question of mine - How do I insert the filename and header to the beginning of a csv find . -name '*.csv' -printf "%f\n" | sed 's/.csv$//' | xargs -I{} sed -i '1s/^/...
scientific_explorer
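A hedged sketch of one way to trim that pipeline, assuming GNU sed, basenames free of sed replacement metacharacters, and that the goal is still to prepend each file's own name (minus .csv) as a first line: the name handling moves into the shell, so the extra sed and the per-file xargs process disappear, though the per-file sed -i rewrite remains the dominant cost:

    find . -name '*.csv' -exec sh -c '
        for f in "$@"; do
            name=${f##*/}          # basename
            name=${name%.csv}      # drop the extension
            sed -i "1s/^/$name\n/" "$f"    # GNU sed: insert the name as a new first line
        done
    ' sh {} +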
1 vote
1 answer
332 views

Improve performance when using the "system" call (shell escape) to process large files in awk

I have an awk script that processes very large files that look something like this: K1353 SF3987.7PD833391.4 KARE K1353 SF3987.2KD832231.4 MEAKE K1332 IF4987.7RP832231.2 LEAOS K1329 SF2787.7KD362619....
Rendalf • 39
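A hedged sketch of the usual remedy, since the excerpt is cut off: every system() call forks a shell plus the command, so over millions of records it is far cheaper either to do the work inside awk or to stream the values into one external process through a pipe that awk opens once. The command name mytool is purely illustrative:

    # instead of system("mytool " $2) once per record, feed all values to one process:
    awk '{ print $2 | "mytool" } END { close("mytool") }' bigfile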
3 votes
1 answer
2k views

filtering a large file with a large filter

I want to extract all lines of $file1 that start with a string stored in $file2. $file1 is 4 GB with about 20 million lines; $file2 has 2 million lines, is about 140 MB, and contains two ...
katosh • 354
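A hedged sketch, assuming the prefixes to match are whole first fields rather than arbitrary substrings (the excerpt is cut off): hashing $file2's keys lets a single awk pass over the 4 GB file do the filtering, with no per-pattern rescans:

    awk 'NR == FNR { keys[$1]; next } $1 in keys' "$file2" "$file1"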
7 votes
1 answer
2k views

efficiently grep an interval of a sorted file

My file has millions of lines, resides in memory /dev/shm/tmp.file, is accessed by multiple threads, looks like this 831092,25a1bd66f2eec71aa2f0a8bb3d,/path/to/a/file 4324,8d83c29e4d8c71bd66f1bd66fs,/...
katosh • 354
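A hedged sketch of why sortedness matters here: look(1) binary-searches a sorted file for a leading string instead of scanning it, so a prefix lookup touches only a handful of pages. This assumes the file is sorted on the leading field in C collation (LC_ALL=C sort) so that look's comparisons agree with the sort order:

    look '831092,' /dev/shm/tmp.file    # prints every line whose prefix is "831092,"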
3 votes
2 answers
688 views

A more efficient way to process a large number of files (300k+) in order to collect results?

I have a file named fields.txt containing L=300k+ lines which looks like: field1 field2 field3 field1 field2 field3 field1 field2 field3 ... field1 field2 field3 In the same folder, I have N ...
woland • 55
1 vote
1 answer
283 views

How to print only 1 filename together with the matching pattern?

I want to print the filename(s) together with the matching pattern, but only once even if the pattern has multiple occurrences in the file. E.g. I have a list of patterns; list_of_patterns.txt and ...
WashichawbachaW
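A hedged sketch, assuming fixed-string patterns (one per line) and that the desired output is one "file: pattern" line per matching pair; the ./*.txt glob is made up. grep -l reports each file at most once per pattern, so repeated matches inside a file cost nothing extra:

    while IFS= read -r pat; do
        grep -lF -- "$pat" ./*.txt |
        while IFS= read -r file; do
            printf '%s: %s\n' "$file" "$pat"
        done
    done < list_of_patterns.txt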
13 votes
5 answers
34k views

How to find duplicate lines in many large files?

I have ~30k files. Each file contains ~100k lines. A line contains no spaces. The lines within an individual file are sorted and duplicate free. My goal: I want to find all duplicate lines across ...
Lars Schneider
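A hedged sketch that leans on the two stated properties, sorted and internally duplicate-free: merging the already-sorted files with sort -m and piping to uniq -d prints exactly the lines that occur in more than one file. The *.txt glob is an assumption, and with 30k files you may need to feed the names via find/xargs instead of a glob:

    LC_ALL=C sort -m -- *.txt | uniq -d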
1 vote
2 answers
5k views

fast ways of removing beginning lines from a large text file

I have a big text file (>500GB); all the ways I can find (sed/tail and others) require writing the 500GB of content to disk. Is there any way to quickly remove the first few lines in place without ...
1a1a11a • 121
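A hedged sketch of the in-place "copy down, then truncate" trick with GNU dd and truncate: it still rewrites the surviving data once, but it never needs a second 500 GB of free space and it modifies the file in place. Dropping the first 1000 lines of a hypothetical bigfile.txt:

    skip=$(head -n 1000 bigfile.txt | wc -c)    # bytes occupied by the lines to drop
    dd if=bigfile.txt of=bigfile.txt bs=8M skip="$skip" iflag=skip_bytes conv=notrunc
    truncate -s "-$skip" bigfile.txt            # chop the now-duplicated tail off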
6 votes
1 answer
10k views

How well does grep/sed/awk perform on very large files? [closed]

I was wondering if grep, sed, and awk were viable tools for finding data in very large files. Let's say I have a 1TB file. If I wanted to process the text in that file, what would the time frame look ...
Luke Pafford
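A hedged back-of-the-envelope way to answer the time-frame part on a specific machine: grep, sed and awk stream the file in one pass, so the run time is roughly file size divided by sequential read speed as long as the per-line work is cheap. Timing a fixed-size sample and scaling linearly gives a usable estimate; the file name is made up:

    # time a pattern search over a 10 GB sample, then scale by 100 for 1 TB of the same data
    time grep -c 'needle' sample_10G.log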
