Update3

So I got so deep into the rabbit hole that I totally lost sight of my objective. I started searching for faster sorting, maybe writing some C or Rust, but then realized the data I came for had already been processed. So I'm here to show the dmesg output and one final tip about the Python script. The tip: it may be better to just count with a dict or Counter than to sort the output with the GNU sort tool, even though sort probably sorts faster than Python's sorted built-in.
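
Here is a minimal sketch of that tip, assuming the same one-value-per-line stdin interface as the script below: collections.Counter does the counting, and its most_common() already returns the entries sorted by count in descending order, so no external sort is needed.

#!/usr/bin/env python3
# Sketch of the tip above: count distinct stdin lines with collections.Counter
# and print them ordered by count, without piping through GNU sort.
import sys
from collections import Counter

counter = Counter(sys.stdin)                # one bucket per distinct line

for line, count in counter.most_common():   # already sorted, largest count first
    print(f'{count}\t{line}', end='')       # each line still carries its own newline

Counter is just a dict subclass, so memory behaviour should be roughly the same as with the dict version of the script.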

About dmesg: it was pretty simple to find the out-of-memory kills. I just ran sudo dmesg | less, pressed G to jump to the bottom, then ? to search backwards for the string Out. I found two of them, one for my Python script and one for my sort, the one that started this question. Here are those messages:

[1306799.058724] Out of memory: Killed process 1611241 (sort) total-vm:1131024kB, anon-rss:1049016kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:2120kB oom_score_adj:0
[1306799.126218] oom_reaper: reaped process 1611241 (sort), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[1365682.908896] Out of memory: Killed process 1611945 (python3) total-vm:1965788kB, anon-rss:1859264kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:3748kB oom_score_adj:0
[1365683.113366] oom_reaper: reaped process 1611945 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

That's it. Thank you so much for the help so far, and I hope it helps others too.

Who killed my sort? (or how to efficiently count distinct values from a CSV column)

#!/usr/bin/env python3
import sys

counter = dict()

# Create a key for each distinct line and increment it every time that line shows up.
for l in sys.stdin:
    counter[l] = counter.setdefault(l, 0) + 1  # After the Update2 note: don't do this, just use `counter[l] = counter.get(l, 0) + 1`

# Print the entries sorted by the tuple's second item (the count), in descending order.
for e in sorted(counter.items(), key=lambda i: i[1], reverse=True):
    k, v = e
    print(f'{v}\t{k}')
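
On the Update2 note in the comment above: counter.get(l, 0) + 1 is clearer and skips the extra store that setdefault does, since setdefault writes the default 0 into the dict just before the assignment overwrites it anyway.

And since the title is really about counting distinct values of one CSV column, here is a sketch that also does the column extraction in Python instead of expecting the values on stdin. The file name data.csv and the COLUMN index are placeholders, and csv.reader is used so that quoted fields containing commas are split correctly.

#!/usr/bin/env python3
# Sketch only: count distinct values of a single CSV column with the standard
# csv module. The file name 'data.csv' and the COLUMN index are placeholders.
import csv
from collections import Counter

COLUMN = 0  # index of the column to count (placeholder)

counter = Counter()
with open('data.csv', newline='') as f:
    for row in csv.reader(f):
        if len(row) > COLUMN:         # skip rows too short to contain the column
            counter[row[COLUMN]] += 1

for value, count in counter.most_common():
    print(f'{count}\t{value}')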