Make reference tests simpler to use #2802

Open · RunDevelopment wants to merge 11 commits into image-rs:main from RunDevelopment:new-ref-tests

Conversation

@RunDevelopment (Member) commented Feb 27, 2026

Fixes #2796

I changed the reference image test runner to a bless-based workflow. I also got rid of CRC and the custom test harness.

Changes:

  1. Removed CRC from reference images, because it makes blessing more complex.
  2. All test images in tests/images are now decoded and compared to their references in tests/references.
    1. A test image failing to decode or not having a reference is now an error.
    2. Animated test images also compare all their frames to reference.
    3. Users can use the BLESS=1 env var to automatically create and update references.
  3. Added missing reference images.
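
The bless-based workflow described above can be sketched as follows. This is a hypothetical, std-only model of the idea (`check_reference` and its signature are illustrative, not the PR's actual code): compare decoded bytes against a reference file, and with blessing enabled, (re)write the reference instead of failing.

```rust
use std::env;
use std::fs;
use std::path::Path;

/// Hypothetical helper sketching the bless workflow: a missing or
/// mismatched reference is an error, unless `bless` is set, in which
/// case the reference is (re)written from the decoded output.
fn check_reference(decoded: &[u8], reference_path: &Path, bless: bool) -> Result<(), String> {
    match fs::read(reference_path) {
        Ok(reference) if reference == decoded => Ok(()),
        _ if bless => {
            if let Some(parent) = reference_path.parent() {
                fs::create_dir_all(parent).ok();
            }
            fs::write(reference_path, decoded).map_err(|e| format!("failed to bless: {e}"))
        }
        Ok(_) => Err("reference image differs from decoded output".into()),
        Err(_) => Err("missing reference image".into()),
    }
}

fn main() {
    // In the real test runner, blessing would be driven by the BLESS=1 env var.
    let _bless_from_env = env::var("BLESS").map(|v| v == "1").unwrap_or(false);

    let reference = env::temp_dir().join("bless-demo").join("a.gif.png");
    fs::remove_file(&reference).ok();
    let decoded = b"fake decoded pixels";

    // Without blessing, a missing reference is an error.
    assert!(check_reference(decoded, &reference, false).is_err());
    // Blessing creates the reference; subsequent runs pass.
    assert!(check_reference(decoded, &reference, true).is_ok());
    assert!(check_reference(decoded, &reference, false).is_ok());
    println!("bless workflow ok");
}
```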

This directly led me to discovering 2 issues:

  1. The TIFF decoder is unable to decode hpredict_cmyk.tiff (which temporarily lives in tests/bad/tiff/TODO hpredict_cmyk.tiff), even though other programs can open it.
  2. WebP animation frames are decoded incorrectly when transparency is involved. See tests/reference/webp/extended_images/anim.webp.04.png. This is an animation of a ball bouncing around, but we decode each frame with all previous frames stacked below it. My guess is that some buffer isn't cleared when it should be, and we just draw opaque pixels on top of the previous frame.
@fintelia (Contributor)

Could we break this up into smaller pieces?

There are parts that I like, but tons of different things are changing at once. For instance, this completely rewrites the reference images test and regresses the output to just be:

     Running `/home/runner/work/image/image/target/debug/deps/reference_images-522906eb2bfa8780`

running 2 tests
test bad_images ... ok
test check_references ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 10.53s
@RunDevelopment (Member, Author)

> Could we break this up into smaller pieces?

The only parts to break off that I can think of are the bad images and non-image files. Did you mean that?

> this completely rewrites the reference images test

Well, yes. The reference image test works very differently now, and the previous implementation didn't have much that was reusable (because it was one huge function).

> regresses the output to just be

Please explain how this is a regression. While it's true that all images passing produces just a single line of output, that's hardly a problem, no? What matters most is that errors are explained well enough for the developer to fix them. If any subset of images fails the test, a simple overview is generated. For example, say the reference for a.gif is different and the reference for b.qoi is missing. Then the output will be:

     Running tests\reference_images.rs (target\debug\deps\reference_images-f2796d1ae4fc4b02.exe)

running 1 test

thread 'check_references' (12332) panicked at tests\reference_images.rs:140:9:
Errors in references:
❌ images\gif\a.gif
     Reference dimension mismatch: image is (256, 256), reference is (12, 12) 
❌ images\qoi\b.qoi
     Missing reference image
stack backtrace:
   0: std::panicking::panic_handler
             at /rustc/254b59607d4417e9dffbc307138ae5c86280fe4c/library\std\src\panicking.rs:689
   1: core::panicking::panic_fmt
             at /rustc/254b59607d4417e9dffbc307138ae5c86280fe4c/library\core\src\panicking.rs:80
   2: reference_images::check_references
             at .\tests\reference_images.rs:140
   3: reference_images::check_references::closure$0
             at .\tests\reference_images.rs:46
   4: core::ops::function::FnOnce::call_once<reference_images::check_references::closure_env$0,tuple$<> >
             at C:\Users\micha\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\ops\function.rs:250
   5: core::ops::function::FnOnce::call_once
             at /rustc/254b59607d4417e9dffbc307138ae5c86280fe4c/library\core\src\ops\function.rs:250
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
test check_references ... FAILED

failures:
    check_references

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1 filtered out; finished in 9.82s
@197g (Member) commented Feb 27, 2026

Each image having its own test failure status was the main motivation for changing this from the version-0.25 variant in the first place. It's valuable to know which test is failing, if any; that makes it easy to identify any particular feature the failing images might have, especially for regression tests and suites constructed to cover them.
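
The per-file granularity being argued for here amounts to enumerating one named case per image file and handing those to a harness (such as libtest-mimic) that reports each one individually. A std-only sketch of the enumeration step, with an illustrative directory layout:

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Walk a test-image tree and collect one case per file: the shape a
/// harness like libtest-mimic turns into individually reported tests.
/// Paths and layout here are illustrative, not the crate's actual tree.
fn collect_cases(dir: &Path, cases: &mut Vec<PathBuf>) {
    let Ok(entries) = fs::read_dir(dir) else { return };
    for entry in entries.flatten() {
        let path = entry.path();
        if path.is_dir() {
            collect_cases(&path, cases);
        } else {
            cases.push(path);
        }
    }
}

fn main() {
    let root = std::env::temp_dir().join("per-file-demo");
    fs::remove_dir_all(&root).ok(); // start from a clean sandbox
    fs::create_dir_all(root.join("gif")).unwrap();
    fs::write(root.join("gif").join("a.gif"), b"x").unwrap();
    fs::write(root.join("gif").join("b.gif"), b"y").unwrap();

    let mut cases = Vec::new();
    collect_cases(&root, &mut cases);
    cases.sort();
    assert_eq!(cases.len(), 2); // one reported test per image file
    println!("collected {} cases", cases.len());
}
```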

@RunDevelopment (Member, Author)

> It's valuable to know which test is failing if any is failing, [...]

Don't you get that same information in the output that is generated by this PR? I don't understand what advantage making each file one test has.

@fintelia (Contributor)

If the test passes, then you don't get any detailed output. That includes the cases where the test passes because your image was skipped, or because you placed it in the wrong directory and it was never detected in the first place.

And there's also the aspect that you're reverting a feature that we just added a few weeks ago.

@RunDevelopment (Member, Author) commented Feb 28, 2026

Sure. Then let's go back to libtest-mimic.

Could you please also answer how you want the PR to be split up?
Edit: I made 2 PRs that split off some parts of this. This PR is blocked until they are merged.

@RunDevelopment (Member, Author)

Regarding QOI: the BE bug that causes CI to fail was fixed around a year ago, but they haven't made a new release in over 3 years. So that's fun. Suggestions on how to deal with this? Removing the one (and only) QOI test image seems like the easiest solution.

@RunDevelopment (Member, Author)

CI finally passes 🎉 I fixed it by ignoring QOI specifically on BE arches in reference tests. Hacky, but it works.

With that, this PR is technically ready to merge (after review of course), but I'd like to talk about 2 topics before:

  1. tests/output - do we still need it? Right now, reference tests will still write every decoded image to tests/output. This was useful for when you had to manually copy files around, but now you can just bless and references will be updated automatically.
  2. Unused references. I haven't implemented that yet with libtest-mimic. I'd like to do it in a follow-up PR to not block this one.
@mstoeckl (Contributor) commented Mar 1, 2026

I think including the hash of the uncompressed data in the filename is useful at minimum when testing PNG and TIFF (float) formats, because they are the reference formats for integer and float pixel types. It ensures that decoder bugs (like incorrectly byte-swapping 16 bit data on big endian systems) that affect both the image and its reference will be caught. I can't say if it's the best way to do it, but it works.

(Arguably reference images aren't needed at all to detect issues, and the hash suffices; but they do help with debugging. Edit: I should also note the current hash only applies to the raw decoded bytes and can miss dimension or format interpretation issues.)
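
A hash that also covers dimensions and the pixel format, as the edit above suggests, is a small change over hashing only raw bytes. A sketch under stated assumptions (FNV-1a is used purely because it is easy to inline; the crate's actual hash algorithm and filename scheme may differ, and `reference_hash` is a hypothetical name):

```rust
/// FNV-1a over a byte slice, folding into an existing hash state.
fn fnv1a(data: &[u8], mut hash: u64) -> u64 {
    for &b in data {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Hash the decoded output *including* width, height, and a format tag,
/// so misinterpreting dimensions or pixel format changes the hash too.
fn reference_hash(width: u32, height: u32, format: &str, raw: &[u8]) -> u64 {
    let mut h = 0xcbf2_9ce4_8422_2325; // FNV-1a offset basis
    h = fnv1a(&width.to_le_bytes(), h);
    h = fnv1a(&height.to_le_bytes(), h);
    h = fnv1a(format.as_bytes(), h);
    fnv1a(raw, h)
}

fn main() {
    let raw = [1u8, 2, 3, 4, 5, 6, 7, 8];
    // Same raw bytes, different claimed dimensions: different hash.
    assert_ne!(
        reference_hash(4, 2, "rgba8", &raw),
        reference_hash(2, 4, "rgba8", &raw)
    );
    // Same raw bytes, different claimed pixel format: different hash.
    assert_ne!(
        reference_hash(4, 2, "rgba8", &raw),
        reference_hash(4, 2, "l16", &raw)
    );
    println!("hash covers dimensions and format");
}
```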

Looking over the test images, I see:

  • In general the reference folder has gotten larger. Part of this is driven by the large (1 MB) png/iptc.png, which did not previously have a reference. Some of the new reference images (like tests/reference/webp/extended_images/advertises_rgba_but_frames_are_rgb.webp.05.png) could be compressed a bit tighter (using e.g. oxipng).
  • Now the tests produce two files like output/exr/cropping - uncropped original.exr.tiff, which is 1920x1920 pixels and uses 60MB of disk space.
@RunDevelopment (Member, Author)

> It ensures that decoder bugs (like incorrectly byte-swapping 16 bit data on big endian systems) that affect both the image and its reference will be caught

IMO, it's fine for the reference test to assume that reference images are decoded correctly.

Plus, say there was a byte-swapping (or similar) bug in PNG/TIFF. For that bug to not be caught by the reference test, it would also need to be present in all other image formats that use PNG/TIFF for reference images.

> In general the reference folder has gotten larger. Part of this is driven by the large (1 MB) png/iptc.png, which did not previously have a reference. Some of the new reference images (like tests/reference/webp/extended_images/advertises_rgba_but_frames_are_rgb.webp.05.png) could be compressed a bit tighter (using e.g. oxipng).

Frankly, png/iptc.png seems like a mistake to start with. Idk why this was added.

As for better compression: I'm open to it if it's a simple change. But I'm not going to integrate an external command line tool. The whole point of this PR is that it should be easy to add test images and update references.

> Now the tests produce two files like output/exr/cropping - uncropped original.exr.tiff, which is 1920x1920 pixels and uses 60MB of disk space.

This will change when #2785 enables compression for TIFF by default. Case in point, I actually copied the compression settings from #2785 to generate the TIFF reference images for this PR. See that reference/exr/cropping - uncropped original.exr.tiff is 100KB and not 60 MB.

@fintelia (Contributor) commented Mar 1, 2026

We shouldn't be generating the reference images using the same decoders we're testing. They should be produced using some other independent decoder, and then compressed via oxipng or similar to avoid taking up more space than necessary in the repository.

The only exceptions are JPEG and AVIF(?), where the bit-exact output isn't fully defined. In those cases we should still compare against an independent reference implementation, but the checked-in contents should be produced by our decoders (after running them through oxipng).
