Skip to content

Conversation

@vyavdoshenko
Copy link

Fixes a buffer overflow bug in HUF_WriteCTableWksp structure that has existed since 2016. The huffWeight array was incorrectly sized at 255 elements, but the code attempts to access index 255, causing an out-of-bounds write.

Bug Details:
Location: lib/compress/huf_compress.c:245

Current (buggy) code:

typedef struct {
    HUF_CompressWeightsWksp wksp;
    BYTE bitsToWeight[HUF_TABLELOG_MAX + 1];
    BYTE huffWeight[HUF_SYMBOLVALUE_MAX];      // 255 elements (indices 0-254)
} HUF_WriteCTableWksp;

Problem: At line 285, the code writes:
wksp->huffWeight[maxSymbolValue] = 0;

When maxSymbolValue = 255 (valid for 256 unique symbols), this writes to huffWeight[255], which is out of bounds for a 255-element array.

Impact:
Current Limitation:
The bug is currently masked by a runtime check at line 282:
if (maxSymbolValue > (256-128)) return ERROR(GENERIC);

This artificially limits the raw write path to 128 symbols, preventing the overflow but also preventing valid use of 256 symbols when FSE compression is not effective.

Real-World Impact:
This bug affects applications that:

  • Use data with all 256 possible byte values
  • Have uniform or near-uniform symbol distributions (where FSE compression fails)

Root Cause:
Introduced in commit 38b75dde (July 2016) during code simplification:
Intent: Remove special-case RLE handling for all-1 Huffman distributions
Side effect: Array size accidentally reduced from [HUF_SYMBOLVALUE_MAX + 1] (256) to [HUF_SYMBOLVALUE_MAX] (255)
Protection: Runtime check added simultaneously to prevent overflow, but it limits functionality

Fix:
Structure Fix:

typedef struct {
    HUF_CompressWeightsWksp wksp;
    BYTE bitsToWeight[HUF_TABLELOG_MAX + 1];
    BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1];  // 256 elements (indices 0-255)
} HUF_WriteCTableWksp;

Runtime Check Removal:
With the structure corrected, the artificial 128-symbol limit can be safely removed:
// REMOVED: if (maxSymbolValue > (256-128)) return ERROR(GENERIC);

Testing:
Added regression test tests/huf_max_symbol_test.c that:

  • Creates a histogram with 256 symbols (all with equal frequency)
  • Forces raw write path (equal frequency makes FSE compression ineffective)
  • Calls HUF_writeCTable_wksp with maxSymbolValue=255

Test Results:
Before fix:

$ ./huf_max_symbol_test
FAIL
Exit: 1

Returns ERROR(GENERIC) due to runtime check blocking 256 symbols. After fix:

$ ./huf_max_symbol_test
OK
Exit: 0

Successfully handles all 256 symbols.

Consistency:
Note that HUF_readCTable (line 294) already uses the correct size:
BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1]; // Already correct!
This fix brings HUF_WriteCTableWksp in line with the existing correct implementation.

Backward Compatibility

  • Binary compatible: Structure size increases by 1 byte (stack-allocated, no ABI impact)
  • API compatible: No function signature changes
  • Behavior: Existing code continues to work; new capability for 256-symbol edge cases

Checklist:

  • Bug fix (non-breaking change fixing an issue)
  • Regression test added
  • Integrated into Makefile, CMake, and Meson build systems
  • No external API changes
@meta-cla
Copy link

meta-cla bot commented Nov 22, 2025

Hi @vyavdoshenko!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla meta-cla bot added the CLA Signed label Nov 22, 2025
@meta-cla
Copy link

meta-cla bot commented Nov 22, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@vyavdoshenko
Copy link
Author

Invalid assumption

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

1 participant