Update3

So I got so deep into the rabbit hole that I totally lost sight of my objective. I started searching for faster sorting, maybe writing some C or Rust, but then realized the data I came for had already been processed. So I'm here to show the dmesg output and one final tip about the Python script. The tip: it may be better to just count with a dict or Counter than to sort the output with the GNU sort tool, even though sort probably sorts faster than Python's sorted built-in.
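
Here is a minimal sketch of that tip, assuming the same one-value-per-line stdin interface as the script below: collections.Counter does the counting, and its most_common() already returns the entries sorted by count in descending order, so no external sort is needed.

#!/usr/bin/env python3
# Sketch of the tip above: count distinct stdin lines with collections.Counter
# and print them ordered by count, without piping through GNU sort.
import sys
from collections import Counter

counter = Counter(sys.stdin)                # one bucket per distinct line

for line, count in counter.most_common():   # already sorted, largest count first
    print(f'{count}\t{line}', end='')       # each line still carries its own newline

Counter is just a dict subclass, so memory behaviour should be roughly the same as with the dict version of the script.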

About dmesg: it was pretty simple to find the out-of-memory kills. I just ran sudo dmesg | less, pressed G to jump to the bottom, then ? to search backwards for the string Out. I found two of them, one for my Python script and one for my sort, the one that started this question. Here are those messages:

[1306799.058724] Out of memory: Killed process 1611241 (sort) total-vm:1131024kB, anon-rss:1049016kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:2120kB oom_score_adj:0
[1306799.126218] oom_reaper: reaped process 1611241 (sort), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[1365682.908896] Out of memory: Killed process 1611945 (python3) total-vm:1965788kB, anon-rss:1859264kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:3748kB oom_score_adj:0
[1365683.113366] oom_reaper: reaped process 1611945 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

That's it. Thank you so much for the help so far, and I hope it helps others too.

Who killed my sort? (or how to efficiently count distinct values from a CSV column)

#!/usr/bin/env python3
import sys

counter = dict()

# Create a key for each distinct line and increment it every time that line shows up.
for l in sys.stdin:
    counter[l] = counter.setdefault(l, 0) + 1  # After the Update2 note: don't do this, just use `counter[l] = counter.get(l, 0) + 1`

# Print the entries sorted by the tuple's second item (the count), in descending order.
for e in sorted(counter.items(), key=lambda i: i[1], reverse=True):
    k, v = e
    print(f'{v}\t{k}')
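
On the Update2 note in the comment above: counter.get(l, 0) + 1 is clearer and skips the extra store that setdefault does, since setdefault writes the default 0 into the dict just before the assignment overwrites it anyway.

And since the title is really about counting distinct values of one CSV column, here is a sketch that also does the column extraction in Python instead of expecting the values on stdin. The file name data.csv and the COLUMN index are placeholders, and csv.reader is used so that quoted fields containing commas are split correctly.

#!/usr/bin/env python3
# Sketch only: count distinct values of a single CSV column with the standard
# csv module. The file name 'data.csv' and the COLUMN index are placeholders.
import csv
from collections import Counter

COLUMN = 0  # index of the column to count (placeholder)

counter = Counter()
with open('data.csv', newline='') as f:
    for row in csv.reader(f):
        if len(row) > COLUMN:         # skip rows too short to contain the column
            counter[row[COLUMN]] += 1

for value, count in counter.most_common():
    print(f'{count}\t{value}')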