All Questions
Tagged with performance and text-processing · 24 questions
0 votes · 4 answers · 275 views
Remove subdomains or existing domains
I have a list of domains, a sample is:
account.google.com
drive.google.com
google.com
bgoogle.com
yahoo.co.uk
stats.wikipedia.org
media.wikipedia.org
files.media.wikipedia.org
bible.com
I would like ...
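The excerpt is cut off, but assuming the goal is to drop every entry whose parent domain also appears somewhere in the list (so drive.google.com goes because google.com is present, while bgoogle.com stays), a two-pass awk hash lookup is one sketch of it, with the list in a hypothetical domains.txt:
awk '
NR == FNR { seen[$0]; next }            # first pass: remember every domain
{
    d = $0
    while ((i = index(d, ".")) > 0) {   # strip one leading label at a time
        d = substr(d, i + 1)
        if (d in seen) next             # a parent domain is listed: drop this entry
    }
    print                               # no listed parent found: keep it
}' domains.txt domains.txt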
0 votes · 2 answers · 98 views
Can some command or script subtract lines in one file from another faster than grep?
I have a shell script that runs regularly and the following part of it causes a slowdown.
grep -v -f RemoveTheseGoodIPs.txt FromTheseShadyIPs.txt > RemainingBadIPs.txt
It works. It just takes 156 ...
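The usual first step, assuming RemoveTheseGoodIPs.txt holds literal IP addresses rather than regular expressions, is to tell grep they are fixed whole-line strings, or to switch to a hash lookup in awk; both avoid testing every pattern as a regex against every line:
grep -v -F -x -f RemoveTheseGoodIPs.txt FromTheseShadyIPs.txt > RemainingBadIPs.txt
# -F: fixed strings, -x: whole-line matches

awk 'NR == FNR { good[$0]; next } !($0 in good)' \
    RemoveTheseGoodIPs.txt FromTheseShadyIPs.txt > RemainingBadIPs.txt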
1 vote · 0 answers · 141 views
host hangs after cat /dev/null > bigfile.log
I found a big log file (2.7 TB) on my disk, so I decided to empty it with the following command:
cat /dev/null > bigfile.log
After I executed this command, I lost my SSH connection. When I logged in ...
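For what it's worth, the truncation itself can be done without spawning a cat process; whether that avoids the hang depends on what actually caused it, which the excerpt leaves open:
: > bigfile.log             # shell redirection alone truncates the file in place
truncate -s 0 bigfile.log   # coreutils equivalent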
10 votes · 4 answers · 2k views
Why is tail file | tr (pipeline) faster than sed or perl with many lines?
I have a file with about one million lines, like this:
"ID" "1" "2"
"00000687" 0 1
"00000421" 1 0
"00000421" 1 0
"00000421" 1 0
with the last line repeated more than one million times. Taking ...
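The commands being compared are cut off in the excerpt, but the shape of the comparison is roughly this: tr works on raw bytes with a fixed translation table, while sed and perl run a regular-expression engine over every line. These are illustrative, not the exact commands from the question:
time tr -d '"' < file > /dev/null         # byte-level deletion, no regex engine
time sed 's/"//g' file > /dev/null        # regex substitution on every line
time perl -pe 's/"//g' file > /dev/null   # same, via perl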
24 votes · 5 answers · 6k views
What is the quickest way of replacing 0 by 1 and vice-versa in a stream?
Given a string composed of 0s and 1s, my goal is to replace 0 by 1 and vice-versa. Example:
Input
111111100000000000000
Intended output
000000011111111111111
I tried, unsuccessfully, the following ...
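For this particular transformation, tr does it in a single pass with a two-character translation table:
printf '111111100000000000000\n' | tr '01' '10'
# prints 000000011111111111111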
1 vote · 0 answers · 388 views
Large file processing performance
I did a simple performance test of CSV processing a while ago and want to share the results with the community; maybe you can point out what would make the test more precise and fair.
Firstly, I took out 42 MB ...
2 votes · 2 answers · 2k views
How can I make my sed script run faster?
I got this script from a related question of mine - How do I insert the filename and header to the beginning of a csv
find . -name '*.csv' -printf "%f\n" |
sed 's/.csv$//' |
xargs -I{} sed -i '1s/^/...
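The script is cut off above, so the full edit it performs isn't visible, but one structural change that often helps is to stop re-deriving the name in a separate sed and spawning three processes per file. A hedged sketch (GNU sed assumed for -i and \n in the replacement, and it only prepends the bare filename, not whatever header the linked question adds):
find . -name '*.csv' -exec sh -c '
    for f; do
        name=${f##*/}                  # basename
        name=${name%.csv}              # drop the extension
        sed -i "1s/^/$name\n/" "$f"    # prepend it as the first line
    done
' sh {} +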
1 vote · 1 answer · 332 views
Improve performance when using "system" call (shell escape) processing large files in awk
I have an awk script that processes very large files that look something like this:
K1353 SF3987.7PD833391.4 KARE
K1353 SF3987.2KD832231.4 MEAKE
K1332 IF4987.7RP832231.2 LEAOS
K1329 SF2787.7KD362619....
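The awk script itself isn't shown, but the pattern behind questions like this is usually the same: every system() call forks a shell, so doing the work with awk's own redirection (or one long-lived pipe) removes a fork and exec per record. A contrived illustration with hypothetical file names:
# slow: one shell per input line
awk '{ system("echo " $2 " >> matches.txt") }' bigfile

# faster: awk keeps matches.txt open and writes to it directly
awk '{ print $2 >> "matches.txt" }' bigfile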
3 votes · 1 answer · 2k views
filtering a large file with a large filter
I want to extract all lines of $file1 that start with a string stored in $file2.
$file1 is 4 GB with about 20 million lines; $file2 has 2 million lines, is about 140 MB, and contains two ...
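The excerpt stops before the exact formats, but assuming the strings sit in $file2's first column and the part of $file1 to test is its first whitespace-delimited field, an awk hash join reads each file once instead of running millions of anchored regexes:
awk 'NR == FNR { want[$1]; next } $1 in want' "$file2" "$file1" > filtered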
7 votes · 1 answer · 2k views
efficiently grep an interval of a sorted file
My file has millions of lines, resides in memory at /dev/shm/tmp.file, is accessed by multiple threads, and looks like this:
831092,25a1bd66f2eec71aa2f0a8bb3d,/path/to/a/file
4324,8d83c29e4d8c71bd66f1bd66fs,/...
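If the file really is sorted numerically on the first comma-separated field, then at minimum the scan can stop once it passes the upper bound; skipping the part before the lower bound as well needs a real binary search, e.g. with look(1) or a small seek loop. The bounds here are hypothetical:
awk -F, -v lo=100000 -v hi=900000 '
$1 + 0 > hi { exit }     # sorted input: nothing after this point can match
$1 + 0 >= lo' /dev/shm/tmp.file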
3 votes · 2 answers · 688 views
A more efficient way to process a large amount of files (300k+) in order to collect results?
I have a file named fields.txt containing L=300k+ lines, which looks like:
field1 field2 field3
field1 field2 field3
field1 field2 field3
...
field1 field2 field3
In the same folder, I have N ...
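The rest of the setup is cut off, but for collecting results over 300k files the usual lever is a fixed worker pool instead of one process per file; here process_one stands in for whatever per-file command is actually being run:
find . -maxdepth 1 -type f ! -name fields.txt -print0 |
    xargs -0 -n 100 -P 4 ./process_one > results.txt
# -P 4: four workers in parallel, -n 100: a hundred files per invocation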
1 vote · 1 answer · 283 views
How to print only 1 filename together with the matching pattern?
I want to print the filename(s) together with the matching pattern, but only once even if the pattern matches multiple times in the file.
E.g. I have a list of patterns; list_of_patterns.txt and ...
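Depending on whether "only once" means once per file or the matching line shown once, GNU grep already covers both:
grep -l -f list_of_patterns.txt ./*          # just the names of files that match
grep -H -m 1 -f list_of_patterns.txt ./*     # each matching file's name plus its first matching line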
13 votes · 5 answers · 34k views
How to find duplicate lines in many large files?
I have ~30k files. Each file contains ~100k lines. A line contains no spaces. The lines within an individual file are sorted and duplicate-free.
My goal: I want to find all duplicate lines across ...
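Because every file is already sorted and internally duplicate-free, a merge (not a full re-sort) followed by uniq -d reports every line that occurs in more than one file; GNU sort can also take the file list on stdin to dodge the argument-length limit with ~30k names:
sort -m -- * | uniq -d > duplicates.txt
# or, avoiding the shell's argument limit (GNU sort):
find . -maxdepth 1 -type f -print0 | sort -m --files0-from=- | uniq -d > duplicates.txt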
1 vote · 2 answers · 5k views
fast ways of removing beginning lines from large text file
I have a big text file (>500 GB); all the ways I can find (sed/tail and others) require writing the 500 GB of content back to disk. Is there any way to quickly remove the first few lines in place without ...
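On ext4 or XFS, fallocate can actually cut blocks off the front of a file in place, but only in multiples of the filesystem block size, so a ragged first line still needs ordinary trimming afterwards. A rough sketch, with bigfile.txt standing in for the real file:
blksz=$(stat -fc %S bigfile.txt)     # filesystem block size
fallocate --collapse-range --offset 0 --length "$((blksz * 1024))" bigfile.txt
# drops the first 1024 blocks without rewriting the rest; adjust the length
# to cover (most of) the lines you want gone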
6 votes · 1 answer · 10k views
How well does grep/sed/awk perform on very large files? [closed]
I was wondering if grep, sed, and awk were viable tools for finding data in very large files.
Let's say I have a 1 TB file. If I wanted to process the text in that file, what would the time frame look ...
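As a rough rule, these tools stream their input, so wall-clock time is dominated by how fast the disk can deliver 1 TB sequentially; on ASCII data, pinning the locale to C is one of the few cheap wins worth knowing (names below are placeholders):
time LC_ALL=C grep -c 'pattern' hugefile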