119
votes
Accepted
How can lossless compression ever exist?
There can be no algorithm that losslessly compresses all inputs. But there are many algorithms that losslessly compress many inputs. And it turns out that most of the strings we like to operate on ...
59
votes
Accepted
What is the most efficient way to store a numeric range?
Just count the number of possible ranges. There are 256 ranges with lower bound 0 (0-0, 0-1, ... 0-254, 0-255), 255 ranges with lower bound 1, ... and finally 1 range with lower bound 255 (255-255). ...
35
votes
How can lossless compression ever exist?
Compression works by making more likely patterns shorter at the cost of less likely patterns
Suppose we have a data format which is a string of 4 possible values, we could represent it with 2 bits ...
19
votes
How can lossless compression ever exist?
I recommend learning about Shannon entropy, but the simple answer is: lossless compression can't compress all inputs, but it can compress some inputs, and in practice that is enough.
To demonstrate I'...
17
votes
What is the most efficient way to store a numeric range?
For such small number of bits, it is infeasible to save many bits as Glorfindel has pointed out.
However, if the domain you are using has a few more bits, you can achieve significant savings for the ...
15
votes
Accepted
compression algorithm for non-repeating integers
You have to consider that compression - by which I assume you mean lossless compression - equates to the removal of redundant information. If you write 12,12,12,12,12 there's redundance and you can ...
15
votes
Accepted
Is it possible to store N bits of unique combinations, in N-1 bits? If not; why does MD5 get reprimanded for collissions?
Of course, the pigeonhole principle states that colisions are inevitable for hashing algorithms.
The point of hashing algorithms is not to prevent colisions. But to make intentional collisions ...
12
votes
How can lossless compression ever exist?
For example, if you had a file with 16 bit audio, you can obviously represent it with 16 bit per sample. However, a simple lossless compressor will have an algorithm that predicts the next sample, and ...
11
votes
How to review sql changes more effectively?
You may not like this, but:
This is not a problem to be solved easily by additional technologies or tools.
An SQL which contains "long nested queries, sometimes calling other procedures" and cannot ...
9
votes
Load and process (compressed) data from filesystem in the blink of an eye
profiling measurements
It sounds like you do not yet know where the CPU cycles go,
or, more importantly, the sources of I/O + network delay.
I didn't see any numeric figures,
let alone 98th percentile ...
8
votes
What is the most efficient way to store a numeric range?
This kind of problem is the subject of Claude Shannon’s seminal paper, A Mathematical Theory of Communication, which introduced the word “bit” and more or less invented data compression.
The general ...
8
votes
Accepted
How is it possible to GZIP a stream before the entire contents are known?
Your understanding concerns static compression, where the whole dataset is available when you begin compression. In a very simplified way, yes it replaces common sequences with smaller tokens to ...
7
votes
Accepted
How can I reduce the amount of storage needed for a gravitational n-body simulation?
Float Warning
Be aware of floating point precision. The mantessa is only x bits long, which means that for values close to zero they can express very small distances, but at galactic distances the ...
7
votes
How can lossless compression ever exist?
If our data was sequences of random values, and longer sequences were less likely than shorter ones, then lossless compression would on-average be worse than useless.
But our data is far from random ...
7
votes
How does data store compression speed up data warehouses?
Therefore, data compression must significantly slow down a database.
Which concludes a masterstroke in logical fallacy!
One of the essential reasons why compression can make things faster, is that ...
6
votes
Accepted
some misunderstanding in concept of Huffman algorithm
What is difference between Average length of codes and Average length
of codewords
By definition, an alphabet is a set of things {A1, . . . , AN} that we might wish to encode.
A code C is a mapping ...
6
votes
Accepted
How to deal with large data in Websocket message?
I'll propose an answer since it's been nearly a year.
As @Walfrat has pointed out:
Note that if you're echanging too much data, it might also be a design
problem that would lead to poor ...
6
votes
Is it possible to store N bits of unique combinations, in N-1 bits? If not; why does MD5 get reprimanded for collissions?
A hash function maps a larger (potentially infinite) input space into a smaller (usually finite) output space.
As a result, every hash function must have collisions. This is a consequence of the ...
4
votes
compression algorithm for non-repeating integers
Edit:
According to you, you are actually doing this:
I have 1024 datasets of size 512 where where each value is a distinct integer between 1 and N (or 0 and N-1), i.e., a permutation of a set of N ...
4
votes
A collision-free hash-like function for use in hash tables and other data structures?
You write "Idea #1: use the positional number of a string in the table as its unique ID." but this is not how sequences work. If you want to avoid contention around the sequences, you use sequence ...
4
votes
Is it possible to store N bits of unique combinations, in N-1 bits? If not; why does MD5 get reprimanded for collissions?
Cryptographic hashing will for example produce a 256 bit hashcode. If the hashed data is more than 256 bits, then collisions are unavoidable. Now try to find a collision, and you run into a problem. ...
4
votes
Short and compact barcode
Since the size and information density seems to be your issue, why not switch from a traditional 1D bar code to a 2D data matrix ?
It may allow you to encode the data in less space, and to read it ...
4
votes
How can lossless compression ever exist?
Other answers are right: not all data can be compressed. I'm not aiming to replace the accepted answer, but to provide some other descriptive text that might also be useful.
I'd like to expand on the ...
4
votes
Load and process (compressed) data from filesystem in the blink of an eye
I assume you are a junior programmer because there are a lot of red flags in your question that show you don't know what you are doing. You need to profile first.
Especially you need to find out
how ...
3
votes
Compressing EBCDIC file vs UTF8
First thoughts
EBCDIC is a format with an 8 bit encoding. The basic EBCDIC character set has plenty of unused space, so not all the potential of the 8 bits is used. But it has codepages for handling ...
3
votes
How does conditional compilation impact product quality, security and code complexity?
Generally speaking, the use of conditional compilation increases complexity which makes it harder to achieve security and other desirable qualities. A part of this complexity is essential if you want ...
3
votes
How to review sql changes more effectively?
SQL queries are complex and brittle code that I experienced myself to be quite difficult to review. We do have a good portion of our codebase written as SQL templates (for good reasons I won't ...
3
votes
Accepted
Are Flate compression in PDF and Deflate different algorithms?
If you just start the .net DeflateStream at index 2 in the PDF FlateDecode stream it will read perfectly! (Writing Flate streams is harder, you need to compute the Adler32 checksum for Adobe reader ...
3
votes
Accepted
How to remove unused code from a jar file?
This really is an engineering problem, a common way this is handles is keeping the libraries and your own application separate, and giving out updates to your program as partial updates, allowing ...
3
votes
Load and process (compressed) data from filesystem in the blink of an eye
First:
Set goals
Profile
Identify bottlenecks
Decide what to do
Is it possible to alleviate the memory bus from being the bottleneck by using (lightly) compressed data on reads?
If by "...
Only top scored, non community-wiki answers of a minimum length are eligible
Related Tags
compression × 68algorithms × 19
performance × 4
data × 4
storage × 4
java × 3
c# × 3
database × 3
strings × 3
hashing × 3
huffman-encoding × 3
c++ × 2
javascript × 2
php × 2
data-structures × 2
sql × 2
optimization × 2
memory × 2
math × 2
efficiency × 2
random × 2
coding × 2
binary × 2
files × 2
arithmetic × 2