
Fix for non-ascii characters in text annotations of open-ephys files #1827

Open
MarinManuel wants to merge 6 commits into NeuralEnsemble:master from MarinManuel:master


@MarinManuel

Open Ephys files (binary format; I am not sure about the other formats) can contain text information with non-ASCII characters. Trying to open such a file fails with the error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 38: ordinal not in range(128)

This fix detects the UnicodeDecodeError and falls back to decoding byte by byte using UTF-8 encoding.
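The fallback described above can be illustrated with a minimal sketch on a single byte string. This is not the code from the PR: the helper name `decode_text` and the sample bytes are hypothetical, chosen only to show the try/except pattern.

```python
def decode_text(raw: bytes) -> str:
    """Decode as ASCII first; fall back to UTF-8 if that fails.

    Hypothetical helper for illustration, not the PR's implementation.
    """
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # Non-ASCII bytes (e.g. the UTF-8 encoding of µ) end up here.
        return raw.decode("utf-8")


# b"\xc2\xb5" is the UTF-8 encoding of µ (U+00B5).
print(decode_text(b"stim 10 \xc2\xb5A"))
print(decode_text(b"plain ascii"))
```

For a plain `bytes` object, ASCII is a subset of UTF-8, so decoding with UTF-8 directly gives the same result on pure-ASCII input.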

As far as I can tell, this change does not affect any of the existing tests. Let me know if we want to add a test with a file with non-ascii characters.

@alejoe91
Contributor

Thanks @MarinManuel !

Would it make sense to always decode to be sure? (also to avoid the try-except?)

What characters were causing issues?

@MarinManuel
Author

My initial thought was also to always decode with UTF-8, but I did not want to risk breaking anything, so I chose the safer approach. I also assume that .astype("U") is faster than looping, but I don't know if that really matters.

I had some text strings containing the characters µ and Δ that were causing problems. I initially raised the issue with Open Ephys (see the thread on Discord if you are interested) before I pinpointed the problem here.

@alejoe91 alejoe91 added this to the 0.14.5 milestone Mar 25, 2026
@zm711
Contributor

zm711 commented Mar 31, 2026

I also prefer to avoid a bunch of try-except blocks if possible. I'm not sure how always decoding would break anything, but to be fair, I don't do much with unicode vs utf8 stuff.

@MarinManuel
Author

That's fair, but unfortunately, removing the try blocks causes issues because in some instances the array contains a bunch of numbers which should not be decode()ed.

As an alternative to the try blocks, I propose using conditional checks: call decode() only if the array contains strings, and otherwise use astype("U") as before.
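A rough sketch of what such a conditional check could look like with NumPy is given below. It is an assumption about the shape of the change, not the code in this PR: the function name `to_unicode` is hypothetical, and "contains strings" is interpreted here as a byte-string dtype (`dtype.kind == "S"`).

```python
import numpy as np


def to_unicode(arr: np.ndarray) -> np.ndarray:
    """Return a unicode ('U') array, decoding byte strings as UTF-8.

    Hypothetical helper for illustration only.
    """
    if arr.dtype.kind == "S":
        # Byte strings: decode explicitly as UTF-8 instead of relying on
        # astype("U"), which raised UnicodeDecodeError in this thread.
        return np.char.decode(arr, "utf-8")
    # Numeric (or other) arrays: keep the original conversion.
    return arr.astype("U")


labels = np.array([b"10 \xc2\xb5V", b"\xce\x94t"])  # UTF-8 bytes for "10 µV", "Δt"
numbers = np.array([1, 2, 3])

print(to_unicode(labels))
print(to_unicode(numbers))
```

Checking the dtype up front avoids the try-except while leaving the numeric path untouched.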



3 participants