- Subject: Re: Parsing binary data from an RS232 connection
- From: sur-behoffski <sur_behoffski@...>
- Date: Sun, 8 Dec 2019 04:06:09 +1030
G'day,
Apologies in advance for such a long message, but the protocol format,
as described, is buggy, and could cause heartache. These protocols are
quite tricky to get right; I have had great fortune to peek over the
shoulders of more than one experienced practitioner, as a part of
taking over incomplete, demonstration, or under-performing code, and
working to bring the code up to commercial standards.
[The under-performing code turned out to be due to a misaligned stack
pointer (0xffff) on a Z80 system, so every push and pop operation took
over twice as long as documented in the manual!]
-- sur-behoffski (Brenton Hoff)
programmer, Grouse Software
On 2019-12-07 08:39, Russell Haley wrote:
> I am writing a little Lua program to parse binary data received via serial
> communications. I'm wondering if someone on the mailing list has a novel
> approach to parsing the data? The communications protocol is something like
> this:
> <HEADER> - 4 bytes
> <MSG_TYPE> - 1 byte
> <SEQUENCE> - 2 bytes
> <PAYLOAD_LENGTH> - 2 bytes
> <PAYLOAD> - X bytes (variable based on message type)
> <CHECKSUM> - 4 bytes
>
> [...]
You are making contradictory assumptions here: Is the channel noisy --
is there any chance of message corruption, perhaps by:
- A single-bit error in one byte;
- A burst error, smearing a series of bits, possibly crossing byte
boundaries;
- Any chance of dropping a character (are you polling the UART? If so,
can you guarantee that the polling task, perhaps competing with other
tasks for CPU/scheduler priority, will inspect the UART sufficiently
often that the UART's hardware queue will never overflow?);
- Maybe you've got the UART working with a DMA controller -- this is
more common, and mostly gets rid of the task-scheduling hazards, plus
the CPU-overhead nuisance of polling -- but exactly how do you set up the
DMA controller to issue an interrupt at the right time for the client
receiver/queue/stream/whatever? Do you have to have a "header-DMA"
transfer, followed by a "packet body plus checksum" transfer?
- On the flip side of dropping a character, is there any chance that
some noise or glitch may be seen as a start-of-character sequence,
leading to an extra character being inserted during an idle time?
As you can see, things can get fairly hairy fairly quickly. However, your
packet format has a contradiction in it:
1. Either the channel is completely correct at every single hardware
and software operation level (unlikely, but...), in which case, why
bother with a checksum? OR
2. The channel is noisy/fallible, so the variable-length specifier can
be POISONED by noise, and yet it is trusted/worshipped by the protocol
code as if it's perfect! What if a frame came in which was intended
to have a <PAYLOAD_LENGTH> of "06", but noise corrupted it to "32"?
The receiver would need to have some way of detecting this
corruption... for a two-way protocol, one end might hang, waiting for
the end of the frame, and the other end might hang, waiting for an
ACK.
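To make the hang concrete, here is a naive receiver sketch in Lua 5.3+
(string.unpack, big-endian fields assumed).  The "port" object and its
blocking port:read(n) method are hypothetical placeholders for whatever
serial library is actually in use:

    local header   = port:read(4)
    local msg_type = string.unpack(">I1", port:read(1))
    local sequence = string.unpack(">I2", port:read(2))
    local len      = string.unpack(">I2", port:read(2))
    local payload  = port:read(len)   -- a corrupted "len" makes this call
                                      -- swallow bytes belonging to the next
                                      -- frame, or block indefinitely
    local checksum = string.unpack(">I4", port:read(4))

Nothing here can notice that "len" was damaged in transit until the
checksum fails -- and by then the stream is already out of step.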
-------
One way of possibly patching this (sketched in Lua after this list) is by:
1. Adding a checksum after the end of the fixed-length header -- covering
the specification of the variable-length body that is to follow;
2. Demanding that the start-of-packet <HEADER> *never* appear anywhere else
in the frame -- nowhere in the variable-length data, nowhere in the
fixed-length header, except as the first four bytes; and
3. Adding limits to frame sizes, timers to limit frame transmission
period, and a mechanism for a receiver to respond with a *sequenced*
NAK. Without an increasing sequence number on messages from both ends,
frames and/or ACK/NAK messages may end up getting duplicated, leading
to confusion that could result in a frame being dropped or duplicated.
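Here is a minimal sketch of points 1 and 2 in Lua 5.3+.  The field
widths, the 0xDEADBEEF sentinel, the extra 16-bit header CRC and the
crc16()/crc32() helpers are all illustrative assumptions, not part of
the protocol as posted (a bit-at-a-time CRC-32 appears further down):

    -- Assumed layout: <SENTINEL:4> <MSG_TYPE:1> <SEQUENCE:2>
    --                 <PAYLOAD_LENGTH:2> <HDR_CRC:2> <PAYLOAD> <CRC32:4>
    local SENTINEL    = "\xDE\xAD\xBE\xEF"
    local MAX_PAYLOAD = 1024                 -- point 3: refuse absurd lengths

    local function try_parse(buf)
      local start = buf:find(SENTINEL, 1, true)
      if not start then return nil, buf:sub(-3) end   -- keep a possibly split sentinel
      buf = buf:sub(start)                            -- resynchronise on the sentinel
      if #buf < 11 then return nil, buf end           -- fixed header not complete yet
      local msg_type, seq, len, hdr_crc, pos = string.unpack(">I1 I2 I2 I2", buf, 5)
      if hdr_crc ~= crc16(buf:sub(1, 9)) or len > MAX_PAYLOAD then
        return nil, buf:sub(2)                        -- header lies: drop a byte, hunt again
      end
      local frame_end = pos + len + 4 - 1
      if #buf < frame_end then return nil, buf end    -- wait for the rest of the frame
      local payload = buf:sub(pos, pos + len - 1)
      local crc     = string.unpack(">I4", buf, pos + len)
      if crc ~= crc32(buf:sub(1, pos + len - 1)) then
        return nil, buf:sub(2)                        -- corrupted body: resynchronise
      end
      return { type = msg_type, seq = seq, payload = payload }, buf:sub(frame_end + 1)
    end

The read loop just appends whatever bytes arrive to a string, calls
try_parse() on it repeatedly, keeps the returned remainder as the new
buffer, and stops when no complete frame comes back.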
--
Okay, some more on techniques at the bit level (bit stuffing), and at the
byte level (byte stuffing) for ensuring that some "sentinel" header
sequence never appears in an arbitrary data stream, except as a by-product
of corruption, e.g. noise. (If corruption does occur, the protocol depends on
the strength of the checksum; for large frames, 32-bit CRCs are favoured;
the CCITT CRC32 has been the traditional favourite, but in the last 15 years or
so, the Castagnoli CRC32 has been gaining traction as a worthy successor).
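A minimal bit-at-a-time CRC-32 in Lua 5.3+ (which has native integer
bitwise operators), purely to show the shape of it -- production code
would use a table-driven or hardware CRC.  This is the reflected form of
the traditional polynomial; swapping the constant 0xEDB88320 for
0x82F63B78 gives the Castagnoli (CRC-32C) variant:

    local function crc32(data)
      local crc = 0xFFFFFFFF
      for i = 1, #data do
        crc = crc ~ data:byte(i)
        for _ = 1, 8 do
          local mask = -(crc & 1) & 0xFFFFFFFF     -- all ones if the low bit is set
          crc = (crc >> 1) ~ (0xEDB88320 & mask)
        end
      end
      return crc ~ 0xFFFFFFFF
    end

    -- Sanity check for the traditional polynomial:
    assert(crc32("123456789") == 0xCBF43926)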
--
I've written a number of low-level microcontroller networking systems, in the
late 1980s/early 1990s, before TCP/IP became widely established. These
networks were based on SDLC/HDLC:
https://en.m.wikipedia.org/wiki/High-Level_Data_Link_Control
Usually, the UART would be programmed for NRZI (non-return-to-zero-inverted)
line coding. The key thing to notice in the article is under the heading
sequence:
Protocol Structure -> Frame Format -> Flag
The technique of *bit stuffing* is used to ensure that the flag does not appear
in the data, so, when a flag arrives at end-of-frame, the receiver knows that
the checksum is the previous 16 bits.
The flag is 0x7e: A zero bit, six 1 bits, and a final zero bit. Bit stuffing,
at the transmitter end, watches the in-frame data as it passes, and, whenever
it sees five consecutive 1 bits, unconditionally inserts a zero bit, before
resuming the frame data. Likewise, the receiver watches the data stream, and if
it sees five 1-bits followed by a zero bit, it discards the zero bit.
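Purely as an illustration of that rule (in practice the SDLC/HDLC
controller does the stuffing in hardware), here it is in Lua, operating
on a string of "0"/"1" characters:

    -- Transmit side: insert a 0 after every run of five consecutive 1 bits.
    local function bit_stuff(bits)
      local out, run = {}, 0
      for b in bits:gmatch(".") do
        out[#out + 1] = b
        run = (b == "1") and run + 1 or 0
        if run == 5 then                 -- five 1s seen: stuff a zero
          out[#out + 1] = "0"
          run = 0
        end
      end
      return table.concat(out)
    end

    -- Receive side: drop the 0 that follows any run of five 1s.
    local function bit_unstuff(bits)
      return (bits:gsub("111110", "11111"))
    end

    assert(bit_unstuff(bit_stuff("0111111101111110")) == "0111111101111110")

The test string deliberately contains long runs of 1s; after stuffing,
no six consecutive 1s remain, so the 0x7e flag pattern cannot appear in
the frame body.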
--
If you are working purely in software, then byte-stuffing can be used instead of
bit stuffing, for the variable-length data in the frame. This is done by:
1. Designating one value as the Flag: 0x7e;
2. Designating another byte as an Escape, typically: 0x7d; and
3. Using a simple, reversible sequence to do the stuffing:
3a. If a byte is not a Flag or an Escape, send it as-is;
3b. For either of a Flag byte or an Escape byte:
- Send the Escape byte; and
- Send the original byte, XORed with 0x20.
So, a Flag (0x7e) in the data gets sent as "0x7d 0x5e"; and
an Escape (0x7d) in the data gets sent as "0x7d 0x5d".
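The same rule in Lua (5.3 or later, for the bitwise ~ operator); the
constants are the conventional HDLC/PPP ones listed above:

    local ESC = 0x7d

    -- Transmit side: escape every Flag (0x7e) or Escape (0x7d) byte.
    local function byte_stuff(s)
      return (s:gsub("[\x7d\x7e]", function(c)
        return string.char(ESC, c:byte() ~ 0x20)
      end))
    end

    -- Receive side: an Escape means "drop me, XOR the next byte with 0x20".
    local function byte_unstuff(s)
      return (s:gsub("\x7d(.)", function(c)
        return string.char(c:byte() ~ 0x20)
      end))
    end

    assert(byte_stuff("\x01\x7e\x02\x7d\x03") == "\x01\x7d\x5e\x02\x7d\x5d\x03")
    assert(byte_unstuff(byte_stuff("\x01\x7e\x02\x7d\x03")) == "\x01\x7e\x02\x7d\x03")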
-----------
Okay,