All Questions
Tagged with performance and text-processing · 24 questions
0 votes · 4 answers · 275 views
Remove subdomains or existing domains
I have a list of domains, a sample is:
account.google.com
drive.google.com
google.com
bgoogle.com
yahoo.co.uk
stats.wikipedia.org
media.wikipedia.org
files.media.wikipedia.org
bible.com
I would like ...
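The excerpt is cut off, but assuming the goal is to drop every entry whose parent domain also appears somewhere in the list (so drive.google.com goes because google.com is present, while bgoogle.com stays), a two-pass awk hash lookup is one sketch of it, with the list in a hypothetical domains.txt:
awk '
NR == FNR { seen[$0]; next }            # first pass: remember every domain
{
    d = $0
    while ((i = index(d, ".")) > 0) {   # strip one leading label at a time
        d = substr(d, i + 1)
        if (d in seen) next             # a parent domain is listed: drop this entry
    }
    print                               # no listed parent found: keep it
}' domains.txt domains.txt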
0 votes · 2 answers · 98 views
Can some command or script subtract lines in one file from another faster than grep?
I have a shell script that runs regularly and the following part of it causes a slowdown.
grep -v -f RemoveTheseGoodIPs.txt FromTheseShadyIPs.txt > RemainingBadIPs.txt
It works. It just takes 156 ...
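The usual first step, assuming RemoveTheseGoodIPs.txt holds literal IP addresses rather than regular expressions, is to tell grep they are fixed whole-line strings, or to switch to a hash lookup in awk; both avoid testing every pattern as a regex against every line:
grep -v -F -x -f RemoveTheseGoodIPs.txt FromTheseShadyIPs.txt > RemainingBadIPs.txt
# -F: fixed strings, -x: whole-line matches

awk 'NR == FNR { good[$0]; next } !($0 in good)' \
    RemoveTheseGoodIPs.txt FromTheseShadyIPs.txt > RemainingBadIPs.txt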
1 vote · 0 answers · 141 views
host hangs after cat /dev/null > bigfile.log
I found a big log file (2.7 TB) on my disk, so I decided to empty it with the following command:
cat /dev/null > bigfile.log
After I executed this command, I lost my SSH connection. When I logged in ...
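For what it's worth, the truncation itself can be done without spawning a cat process; whether that avoids the hang depends on what actually caused it, which the excerpt leaves open:
: > bigfile.log             # shell redirection alone truncates the file in place
truncate -s 0 bigfile.log   # coreutils equivalent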
10 votes · 4 answers · 2k views
Why is tail file | tr (pipeline) faster than sed or perl with many lines?
I have a file with about one million lines, like this:
"ID" "1" "2"
"00000687" 0 1
"00000421" 1 0
"00000421" 1 0
"00000421" 1 0
with the last line repeated more than one million times. Taking ...
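The commands being compared are cut off in the excerpt, but the shape of the comparison is roughly this: tr works on raw bytes with a fixed translation table, while sed and perl run a regular-expression engine over every line. These are illustrative, not the exact commands from the question:
time tr -d '"' < file > /dev/null         # byte-level deletion, no regex engine
time sed 's/"//g' file > /dev/null        # regex substitution on every line
time perl -pe 's/"//g' file > /dev/null   # same, via perl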
24 votes · 5 answers · 6k views
What is the quickest way of replacing 0 by 1 and vice-versa in a stream?
Given a string composed of 0s and 1s, my goal is to replace 0 by 1 and vice-versa. Example:
Input
111111100000000000000
Intended output
000000011111111111111
I tried, unsuccessfully, the following ...
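For this particular transformation, tr does it in a single pass with a two-character translation table:
printf '111111100000000000000\n' | tr '01' '10'
# prints 000000011111111111111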
1 vote · 0 answers · 388 views
Large file processing performance
I did a simple performance test of CSV processing a while ago and want to share the results with the community; maybe you can point out what would make the test more precise and fair.
Firstly, I took out 42 MB ...
2 votes · 2 answers · 2k views
How can I make my sed script run faster?
I got this script from a related question of mine - How do I insert the filename and header to the beginning of a csv
find . -name '*.csv' -printf "%f\n" |
sed 's/.csv$//' |
xargs -I{} sed -i '1s/^/...
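The script is cut off above, so the full edit it performs isn't visible, but one structural change that often helps is to stop re-deriving the name in a separate sed and spawning three processes per file. A hedged sketch (GNU sed assumed for -i and \n in the replacement, and it only prepends the bare filename, not whatever header the linked question adds):
find . -name '*.csv' -exec sh -c '
    for f; do
        name=${f##*/}                  # basename
        name=${name%.csv}              # drop the extension
        sed -i "1s/^/$name\n/" "$f"    # prepend it as the first line
    done
' sh {} +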
1 vote · 1 answer · 332 views
Improve performance when using "system" call (shell escape) processing large files in awk
I have an awk script that processes very large files that look something like this:
K1353 SF3987.7PD833391.4 KARE
K1353 SF3987.2KD832231.4 MEAKE
K1332 IF4987.7RP832231.2 LEAOS
K1329 SF2787.7KD362619....
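The awk script itself isn't shown, but the pattern behind questions like this is usually the same: every system() call forks a shell, so doing the work with awk's own redirection (or one long-lived pipe) removes a fork and exec per record. A contrived illustration with hypothetical file names:
# slow: one shell per input line
awk '{ system("echo " $2 " >> matches.txt") }' bigfile

# faster: awk keeps matches.txt open and writes to it directly
awk '{ print $2 >> "matches.txt" }' bigfile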
3 votes · 1 answer · 2k views
filtering a large file with a large filter
I want to extract all lines of $file1 that start with a string stored in $file2.
$file1 is 4 GB with about 20 million lines; $file2 has 2 million lines, is about 140 MB, and contains two ...
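The excerpt stops before the exact formats, but assuming the strings sit in $file2's first column and the part of $file1 to test is its first whitespace-delimited field, an awk hash join reads each file once instead of running millions of anchored regexes:
awk 'NR == FNR { want[$1]; next } $1 in want' "$file2" "$file1" > filtered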
7 votes · 1 answer · 2k views
efficiently grep an interval of a sorted file
My file has millions of lines, resides in memory at /dev/shm/tmp.file, is accessed by multiple threads, and looks like this:
831092,25a1bd66f2eec71aa2f0a8bb3d,/path/to/a/file
4324,8d83c29e4d8c71bd66f1bd66fs,/...
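If the file really is sorted numerically on the first comma-separated field, then at minimum the scan can stop once it passes the upper bound; skipping the part before the lower bound as well needs a real binary search, e.g. with look(1) or a small seek loop. The bounds here are hypothetical:
awk -F, -v lo=100000 -v hi=900000 '
$1 + 0 > hi { exit }     # sorted input: nothing after this point can match
$1 + 0 >= lo' /dev/shm/tmp.file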
3 votes · 2 answers · 688 views
A more efficient way to process a large amount of files (300k+) in order to collect results?
I have a file named fields.txt containing L=300k+ lines, which looks like:
field1 field2 field3
field1 field2 field3
field1 field2 field3
...
field1 field2 field3
In the same folder, I have N ...
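The rest of the setup is cut off, but for collecting results over 300k files the usual lever is a fixed worker pool instead of one process per file; here process_one stands in for whatever per-file command is actually being run:
find . -maxdepth 1 -type f ! -name fields.txt -print0 |
    xargs -0 -n 100 -P 4 ./process_one > results.txt
# -P 4: four workers in parallel, -n 100: a hundred files per invocation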
1 vote · 1 answer · 283 views
How to print only 1 filename together with the matching pattern?
I want to print the filename(s) together with the matching pattern, but only once even if the pattern matches multiple times in the file.
E.g. I have a list of patterns; list_of_patterns.txt and ...
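Depending on whether "only once" means once per file or the matching line shown once, GNU grep already covers both:
grep -l -f list_of_patterns.txt ./*          # just the names of files that match
grep -H -m 1 -f list_of_patterns.txt ./*     # each matching file's name plus its first matching line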
13 votes · 5 answers · 34k views
How to find duplicate lines in many large files?
I have ~30k files. Each file contains ~100k lines. A line contains no spaces. The lines within an individual file are sorted and duplicate-free.
My goal: I want to find all duplicate lines across ...
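Because every file is already sorted and internally duplicate-free, a merge (not a full re-sort) followed by uniq -d reports every line that occurs in more than one file; GNU sort can also take the file list on stdin to dodge the argument-length limit with ~30k names:
sort -m -- * | uniq -d > duplicates.txt
# or, avoiding the shell's argument limit (GNU sort):
find . -maxdepth 1 -type f -print0 | sort -m --files0-from=- | uniq -d > duplicates.txt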
1 vote · 2 answers · 5k views
fast ways of removing beginning lines from large text file
I have a big text file (>500 GB); all the ways I can find (sed/tail and others) require writing the 500 GB of content back to disk. Is there any way to quickly remove the first few lines in place without ...
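On ext4 or XFS, fallocate can actually cut blocks off the front of a file in place, but only in multiples of the filesystem block size, so a ragged first line still needs ordinary trimming afterwards. A rough sketch, with bigfile.txt standing in for the real file:
blksz=$(stat -fc %S bigfile.txt)     # filesystem block size
fallocate --collapse-range --offset 0 --length "$((blksz * 1024))" bigfile.txt
# drops the first 1024 blocks without rewriting the rest; adjust the length
# to cover (most of) the lines you want gone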
6 votes · 1 answer · 10k views
How well does grep/sed/awk perform on very large files? [closed]
I was wondering if grep, sed, and awk were viable tools for finding data in very large files.
Let's say I have a 1 TB file. If I wanted to process the text in that file, what would the time frame look ...
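As a rough rule, these tools stream their input, so wall-clock time is dominated by how fast the disk can deliver 1 TB sequentially; on ASCII data, pinning the locale to C is one of the few cheap wins worth knowing (names below are placeholders):
time LC_ALL=C grep -c 'pattern' hugefile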