Context

I am writing a Power Query function that reads binary data from a binary stream in which some fields are preceded by their length, stored as a 32-bit Little Endian integer.

An error occurs systematically when a BinaryFormat.* function that takes a length or count parameter is given that parameter "as a binary format that reads the length preceding the binary data" (as opposed to "as a number").

Details

This error does NOT occur if the length in the data is Big Endian, because the integer reader can then be used unmodified.
Note that Big Endian is the default, so when working with Little Endian data one HAS TO use the BinaryFormat.ByteOrder function to change the byte order.
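
To make the default byte order concrete, here is a minimal sketch that reads the same four bytes with and without BinaryFormat.ByteOrder. The step names (Bytes, ReadDefault, ReadLE) are only illustrative and are not part of the test case further down.

```powerquery
let
    // Four bytes holding the value 3 in Little Endian layout
    Bytes       = #binary({3, 0, 0, 0}),

    // Default reader: interprets the bytes as Big Endian
    ReadDefault = BinaryFormat.UnsignedInteger32,
    // Same reader, switched to Little Endian
    ReadLE      = BinaryFormat.ByteOrder(
        BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),

    Result = [
        Default      = ReadDefault(Bytes),  // 50331648, i.e. 0x03000000
        LittleEndian = ReadLE(Bytes)        // 3
    ]
in
    Result
```

Read with the default order, the Little Endian length 3 becomes 50331648, which is why the byte order has to be changed for this kind of data.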

This results in convoluted workarounds in order to first read the length, and then feed it to the field reader function, "as a number".
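
As an illustration of what "first read the length, then feed it as a number" can look like, here is a minimal sketch done by hand. It assumes the length prefix sits at offset 0 and is exactly 4 bytes long; the step names (Length, Payload, FieldText) are hypothetical.

```powerquery
let
    // A 3-character text field preceded by its length as a Little Endian UInt32
    LEbinaryData = Binary.Combine({#binary({3, 0, 0, 0}), Text.ToBinary("abc")}),

    IntReaderLE  = BinaryFormat.ByteOrder(
        BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),

    // Step 1: read the length prefix on its own, which yields a plain number
    Length       = IntReaderLE(LEbinaryData),

    // Step 2: skip the 4-byte prefix, then read the field with the length "as a number"
    Payload      = Binary.Range(LEbinaryData, 4),
    FieldText    = BinaryFormat.Text(Length)(Payload)
in
    FieldText
```

This works for a single field, but the offsets have to be tracked manually as soon as several length-prefixed fields follow each other, which is what makes this approach convoluted.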

Reference: https://learn.microsoft.com/en-us/powerquery-m/binaryformat-byteorder
For much better documentation: https://powerquery.how/binaryformat-byteorder/
All other functions mentioned here are also documented on these 2 sites, see their left menu.

Test case

Sample standalone Power Query M query to document the expected and observed behavior:

let
    // sample data bits
    BigEndian3i32    = #binary({0,0,0,3}),
    LittleEndian3i32 = #binary({3,0,0,0}),
    BinaryText3Chars = Text.ToBinary("abc"),

    // TEST DATA: a text field preceded by its length as a 32-bit integer
    BEbinaryData = Binary.Combine({BigEndian3i32    , BinaryText3Chars}),
    LEbinaryData = Binary.Combine({LittleEndian3i32 , BinaryText3Chars}),

    // Readers for integer: 
    // default and modified using the 2 options for BinaryFormat.ByteOrder
    IntReader    = BinaryFormat.UnsignedInteger32,  // NO ByteOrder modification
    IntReaderBE  = BinaryFormat.ByteOrder(
        BinaryFormat.UnsignedInteger32, ByteOrder.BigEndian),
    IntReaderLE  = BinaryFormat.ByteOrder(
        BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),

    // Text field readers, using the 3 cases for reading the length
    TextReader    = BinaryFormat.Text(IntReader),
    TextReaderBE  = BinaryFormat.Text(IntReaderBE),
    TextReaderLE  = BinaryFormat.Text(IntReaderLE),

    // display function for binary data
    Binary_HexPrint = (binary as binary) as text => let
        length  = Binary.Length(binary),
        asText  = Binary.ToText(binary, BinaryEncoding.Hex),
        splitter= Splitter.SplitTextByLengths(List.Repeat({2}, length)),
        pretty  = Text.Combine(splitter(asText), " ")
    in pretty,

    // first 5 fields: check upstream data
    // last 3 fields : apply the Text readers on the test data
    result = [
        BE Sample     = Binary_HexPrint(BEbinaryData), // ok, no problem upstream
        LE Sample     = Binary_HexPrint(LEbinaryData), // ok, no problem upstream
        Int Value     = IntReader(BEbinaryData),       // ok, no problem upstream
        Int Value BE  = IntReaderBE(BEbinaryData),     // ok, no problem upstream
        Int Value LE  = IntReaderLE(LEbinaryData),     // ok, no problem upstream
        Text Value    = TextReader(BEbinaryData),      // ok, EXPECTED BEHAVIOR
        Text Value BE = TextReaderBE(BEbinaryData),    // ERROR
        Text Value LE = TextReaderLE(LEbinaryData)     // ERROR
    ]
in
    result

Test result

[Screen capture: result of the test query, along with the error message]

In English, the error message says "The value of the specified Binary Format can't be used to read a length".

  • When the BinaryFormat.UnsignedInteger32 function is used directly, the expected behavior is obtained: the field value "abc" is read correctly.
  • But once it is wrapped in BinaryFormat.ByteOrder (whichever ByteOrder is used), it no longer works.

Expected behavior

I would have expected this modification to be transparent since, modified or not, the value read for the length is the same (an unsigned 32-bit integer, as can be checked using Value.Type).
But it seems that a format returned by BinaryFormat.ByteOrder is not "registered" as one that "can be used to read a length".

Workaround

    // Define shorthands replacing the functions that have a 'length' or 'count' parameter
    // but do not work with BinaryFormat.ByteOrder
    BinaryFormat_Text   = (lengthReader as function) as function => BinaryFormat.Choice( 
            lengthReader,
            (length) => BinaryFormat.Text(length)
    ),
    BinaryFormat_Binary = (lengthReader as function) as function => BinaryFormat.Choice( 
            lengthReader,
            (length) => BinaryFormat.Binary(length)
    ),
    BinaryFormat_List   = 
        (binaryFormat as function, lengthReader as function) as function => BinaryFormat.Choice( 
            lengthReader,
            (count) => BinaryFormat.List(binaryFormat, count)
    ),

    // NOT working with BinaryFormat.ByteOrder
    // myFieldReader = BinaryFormat.Text(IntReaderLE)
    // Shorthand use, WORKING with BinaryFormat.ByteOrder
    myFieldReader = BinaryFormat_Text(IntReaderLE)

Of course, when working with a Big Endian length, one can avoid the problem completely by NOT using BinaryFormat.ByteOrder at all!
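
For completeness, here is the BinaryFormat_Text shorthand from the workaround as a minimal standalone query that can be pasted into the Advanced Editor. It only reuses names from the test case and the workaround; the final Result step is added for illustration.

```powerquery
let
    // Test data: "abc" preceded by its length as a Little Endian UInt32
    LEbinaryData = Binary.Combine({#binary({3, 0, 0, 0}), Text.ToBinary("abc")}),

    IntReaderLE  = BinaryFormat.ByteOrder(
        BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),

    // Shorthand: BinaryFormat.Choice reads the length first, then hands it
    // to BinaryFormat.Text as a number
    BinaryFormat_Text = (lengthReader as function) as function =>
        BinaryFormat.Choice(
            lengthReader,
            (length) => BinaryFormat.Text(length)
        ),

    myFieldReader = BinaryFormat_Text(IntReaderLE),
    Result        = myFieldReader(LEbinaryData)
in
    Result
```

With the workaround in place, Result should be the text "abc" rather than the error shown in the test result.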

Questions

Am I just misunderstanding (and misusing) how to use these functions, or does it look like a bug/limitation in the M Power Query language?
Would you recommend better or simpler workarounds?
