Context
I am writing a Power Query function to read data from a binary stream in which some fields are preceded by their length, encoded as a 32-bit Little Endian integer.
A systematic error occurs when using the BinaryFormat.* functions that take a length or count parameter, when this parameter is passed "as a binary format of the length that precedes the binary data" (as opposed to "as a number") and that binary format has been modified with BinaryFormat.ByteOrder.
Details
This error does NOT occur if the length in the data is Big Endian, because no byte order modification is needed in that case.
Note that Big Endian is the default, so when working with Little Endian data one HAS TO use the BinaryFormat.ByteOrder function to change the byte order.
This results in convoluted workarounds in order to first read the length, and then feed it to the field reader function, "as a number".
Reference: https://learn.microsoft.com/en-us/powerquery-m/binaryformat-byteorder
For much better documentation: https://powerquery.how/binaryformat-byteorder/
All other functions mentioned here are also documented on these 2 sites, see their left menu.
Test case
Sample standalone Power Query M query to document the expected and observed behavior:
let
    // sample data bits
    BigEndian3i32 = #binary({0,0,0,3}),
    LittleEndian3i32 = #binary({3,0,0,0}),
    BinaryText3Chars = Text.ToBinary("abc"),

    // TEST DATA: a text field preceded by its length as a 32-bit integer
    BEbinaryData = Binary.Combine({BigEndian3i32, BinaryText3Chars}),
    LEbinaryData = Binary.Combine({LittleEndian3i32, BinaryText3Chars}),

    // Readers for integer:
    // default and modified using the 2 options for BinaryFormat.ByteOrder
    IntReader = BinaryFormat.UnsignedInteger32, // NO ByteOrder modification
    IntReaderBE = BinaryFormat.ByteOrder(
        BinaryFormat.UnsignedInteger32, ByteOrder.BigEndian),
    IntReaderLE = BinaryFormat.ByteOrder(
        BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),

    // Text field readers, using the 3 cases for reading the length
    TextReader = BinaryFormat.Text(IntReader),
    TextReaderBE = BinaryFormat.Text(IntReaderBE),
    TextReaderLE = BinaryFormat.Text(IntReaderLE),

    // display function for binary data
    Binary_HexPrint = (binary as binary) as text => let
        length = Binary.Length(binary),
        asText = Binary.ToText(binary, BinaryEncoding.Hex),
        splitter = Splitter.SplitTextByLengths(List.Repeat({2}, length)),
        pretty = Text.Combine(splitter(asText), " ")
    in pretty,

    // 5 first fields: check upstream data
    // 3 last fields : apply the Text readers on the test data
    result = [
        BE Sample = Binary_HexPrint(BEbinaryData),   // ok, no problem upstream
        LE Sample = Binary_HexPrint(LEbinaryData),   // ok, no problem upstream
        Int Value = IntReader(BEbinaryData),         // ok, no problem upstream
        Int Value BE = IntReaderBE(BEbinaryData),    // ok, no problem upstream
        Int Value LE = IntReaderLE(LEbinaryData),    // ok, no problem upstream
        Text Value = TextReader(BEbinaryData),       // ok, EXPECTED BEHAVIOR
        Text Value BE = TextReaderBE(BEbinaryData),  // ERROR
        Text Value LE = TextReaderLE(LEbinaryData)   // ERROR
    ]
in
    result
Test result
Screen capture of the result of this test query, with the error message:
In English, the error message says "The value of the specified Binary Format can't be used to read a length".
- When the BinaryFormat.UnsignedInteger32 function is used directly, the expected behavior is obtained: the field value "abc" is read correctly.
- But if it is modified using BinaryFormat.ByteOrder (whatever ByteOrder is used), it doesn't work anymore.
Expected behavior
I would have expected this function modification to be transparent since, modified or not, the length value read is the same (an unsigned 32-bit integer, as can be checked using Value.Type).
But it seems that BinaryFormat.ByteOrder is not "registered" as a function that "can be used to read a length".
Workaround
// Define short hands replacing functions having a 'length' or 'count' parameter
// but not working with BinaryFormat.ByteOrder
BinaryFormat_Text = (lengthReader as function) as function => BinaryFormat.Choice(
    lengthReader,
    (length) => BinaryFormat.Text(length)
),
BinaryFormat_Binary = (lengthReader as function) as function => BinaryFormat.Choice(
    lengthReader,
    (length) => BinaryFormat.Binary(length)
),
BinaryFormat_List =
    (binaryFormat as function, lengthReader as function) as function => BinaryFormat.Choice(
        lengthReader,
        (count) => BinaryFormat.List(binaryFormat, count)
    ),

// NOT working with BinaryFormat.ByteOrder
// myFieldReader = BinaryFormat.Text(IntReaderLE)

// Shorthand use, WORKING with BinaryFormat.ByteOrder
myFieldReader = BinaryFormat_Text(IntReaderLE)
Of course, when working with a Big Endian length, one should avoid the problem completely by NOT using BinaryFormat.ByteOrder at all!
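For anyone who wants to try the workaround in isolation, here is a minimal standalone sketch that applies the BinaryFormat_Text shorthand to the Little Endian test data from the test case. It only reuses names already defined in this post; if the workaround behaves as described, the query should return the text "abc" instead of the error.

```powerquery
let
    // TEST DATA: "abc" preceded by its length (3) as a 32-bit Little Endian integer
    LEbinaryData = Binary.Combine({#binary({3,0,0,0}), Text.ToBinary("abc")}),

    // Length reader modified for Little Endian data
    IntReaderLE = BinaryFormat.ByteOrder(
        BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),

    // Shorthand from the workaround: read the length first,
    // then pass it to BinaryFormat.Text "as a number"
    BinaryFormat_Text = (lengthReader as function) as function => BinaryFormat.Choice(
        lengthReader,
        (length) => BinaryFormat.Text(length)
    ),

    // Apply the resulting reader to the test data
    result = BinaryFormat_Text(IntReaderLE)(LEbinaryData)
in
    result
```

The same check can be repeated with BinaryFormat_Binary or BinaryFormat_List by swapping the shorthand and adjusting the test data accordingly.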
Questions
Am I just misunderstanding (and misusing) these functions, or does this look like a bug/limitation in the Power Query M language?
Would you recommend better or simpler workarounds?