- Subject: Re: Parsing binary data from an RS232 connection
- From: sur-behoffski <sur_behoffski@...>
- Date: Sun, 8 Dec 2019 04:06:09 +1030
G'day,
Apologies in advance for such a long message, but the protocol format,
as described, is buggy, and could cause heartache. These protocols are
quite tricky to get right; I have had great fortune to peek over the
shoulders of more than one experienced practitioner, as a part of
taking over incomplete, demonstration, or under-performing code, and
working to bring the code up to commercial standards.
[The under-performing code turned out to be due to a misaligned stack
pointer (0xffff) on a Z80 system, so every push and pop operation took
over twice as long as documented in the manual!]
-- sur-behoffski (Brenton Hoff)
programmer, Grouse Software
On 2019-12-07 08:39, Russell Haley wrote:
> I am writing a little Lua program to parse binary data received via serial
> communications. I'm wondering if someone on the mailing list has a novel
> approach to parsing the data? The communications protocol is something like
> this:
> <HEADER> - 4 bytes
> <MSG_TYPE> - 1 byte
> <SEQUENCE> - 2 bytes
> <PAYLOAD_LENGTH> - 2 bytes
> <PAYLOAD> - X bytes (variable based on message type)
> <CHECKSUM> - 4 bytes
>
> [...]
You are making contradictory assumptions here: Is the channel noisy --
is there any chance of message corruption, perhaps by:
- A single-bit error in one byte;
- A burst error, smearing a series of bits, possibly crossing byte
boundaries;
- Any chance of dropping a character (are you polling the UART? If so,
can you guarantee that the polling task, perhaps competing with other
tasks for CPU/scheduler priority, will inspect the UART sufficiently
often that the UART's hardware queue will never overflow?);
- Maybe you've got the UART working with a DMA controller -- this is
more common, and mostly gets rid of the task-scheduling hazards, plus
the CPU-overhead nuisance of polling -- but exactly how do you set up the
DMA controller to issue an interrupt at the right time for the client
receiver/queue/stream/whatever? Do you have to have a "header-DMA"
transfer, followed by a "packet body plus checksum" transfer?
- On the flip side of dropping a character, is there any chance that
some noise or glitch may be seen as a start-of-character sequence,
leading to an extra character being inserted during an idle time?
As you can see, things can get fairly hairy fairly quickly. However, your
packet format has a contradiction in it:
1. Either the channel is completely correct at every single hardware
and software operation level (unlikely, but...), in which case, why
bother with a checksum? OR
2. The channel is noisy/fallible, so the variable-length specifier can
be POISONED by noise, and yet it is trusted/worshipped by the protocol
code as if it's perfect! What if a frame came in which was intended
to have a <PAYLOAD_LENGTH> of "06", but noise corrupted it to "32"?
The receiver would need to have some way of detecting this
corruption... for a two-way protocol, one end might hang, waiting for
the end of the frame, and the other end might hang, waiting for an
ACK.
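To make the hang concrete, here is a naive receiver sketch in Lua 5.3+
(string.unpack, big-endian fields assumed).  The "port" object and its
blocking port:read(n) method are hypothetical placeholders for whatever
serial library is actually in use:

    local header   = port:read(4)
    local msg_type = string.unpack(">I1", port:read(1))
    local sequence = string.unpack(">I2", port:read(2))
    local len      = string.unpack(">I2", port:read(2))
    local payload  = port:read(len)   -- a corrupted "len" makes this call
                                      -- swallow bytes belonging to the next
                                      -- frame, or block indefinitely
    local checksum = string.unpack(">I4", port:read(4))

Nothing here can notice that "len" was damaged in transit until the
checksum fails -- and by then the stream is already out of step.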
-------
One way of possibly patching this (sketched in Lua after this list) is by:
1. Adding a checksum after the end of the fixed-length header -- covering
the specification of the variable-length body that is to follow;
2. Demanding that the start-of-packet <HEADER> *never* appear anywhere else
in the frame -- nowhere in the variable-length data, nowhere in the
fixed-length header, except as the first four bytes; and
3. Adding limits to frame sizes, timers to limit frame transmission
period, and a mechanism for a receiver to respond with a *sequenced*
NAK. Without an increasing sequence number on messages from both ends,
frames and/or ACK/NAK messages may end up getting duplicated, leading
to confusion that could result in a frame being dropped or duplicated.
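Here is a minimal sketch of points 1 and 2 in Lua 5.3+.  The field
widths, the 0xDEADBEEF sentinel, the extra 16-bit header CRC and the
crc16()/crc32() helpers are all illustrative assumptions, not part of
the protocol as posted (a bit-at-a-time CRC-32 appears further down):

    -- Assumed layout: <SENTINEL:4> <MSG_TYPE:1> <SEQUENCE:2>
    --                 <PAYLOAD_LENGTH:2> <HDR_CRC:2> <PAYLOAD> <CRC32:4>
    local SENTINEL    = "\xDE\xAD\xBE\xEF"
    local MAX_PAYLOAD = 1024                 -- point 3: refuse absurd lengths

    local function try_parse(buf)
      local start = buf:find(SENTINEL, 1, true)
      if not start then return nil, buf:sub(-3) end   -- keep a possibly split sentinel
      buf = buf:sub(start)                            -- resynchronise on the sentinel
      if #buf < 11 then return nil, buf end           -- fixed header not complete yet
      local msg_type, seq, len, hdr_crc, pos = string.unpack(">I1 I2 I2 I2", buf, 5)
      if hdr_crc ~= crc16(buf:sub(1, 9)) or len > MAX_PAYLOAD then
        return nil, buf:sub(2)                        -- header lies: drop a byte, hunt again
      end
      local frame_end = pos + len + 4 - 1
      if #buf < frame_end then return nil, buf end    -- wait for the rest of the frame
      local payload = buf:sub(pos, pos + len - 1)
      local crc     = string.unpack(">I4", buf, pos + len)
      if crc ~= crc32(buf:sub(1, pos + len - 1)) then
        return nil, buf:sub(2)                        -- corrupted body: resynchronise
      end
      return { type = msg_type, seq = seq, payload = payload }, buf:sub(frame_end + 1)
    end

The read loop just appends whatever bytes arrive to a string, calls
try_parse() on it repeatedly, keeps the returned remainder as the new
buffer, and stops when no complete frame comes back.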
--
Okay, some more on techniques at the bit level (bit stuffing), and at the
byte level (byte stuffing) for ensuring that some "sentinel" header
sequence never appears in an arbitrary data stream, except as a by-product
of corruption, e.g. noise. (If corruption does occur, the protocol depends on
the strength of the checksum; for large frames, 32-bit CRCs are favoured;
the CCITT CRC32 has been the traditional favourite, but in the last 15 years or
so, the Castagnoli CRC32 has been gaining traction as a worthy successor).
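A minimal bit-at-a-time CRC-32 in Lua 5.3+ (which has native integer
bitwise operators), purely to show the shape of it -- production code
would use a table-driven or hardware CRC.  This is the reflected form of
the traditional polynomial; swapping the constant 0xEDB88320 for
0x82F63B78 gives the Castagnoli (CRC-32C) variant:

    local function crc32(data)
      local crc = 0xFFFFFFFF
      for i = 1, #data do
        crc = crc ~ data:byte(i)
        for _ = 1, 8 do
          local mask = -(crc & 1) & 0xFFFFFFFF     -- all ones if the low bit is set
          crc = (crc >> 1) ~ (0xEDB88320 & mask)
        end
      end
      return crc ~ 0xFFFFFFFF
    end

    -- Sanity check for the traditional polynomial:
    assert(crc32("123456789") == 0xCBF43926)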
--
I've written a number of low-level microcontroller networking systems, in the
late 1980s/early 1990s, before TCP/IP became widely established. These
networks were based on SDLC/HDLC:
https://en.m.wikipedia.org/wiki/High-Level_Data_Link_Control
Usually, the UART would be programmed for NRZI (non-return-to-zero-inverted)
line coding. The key thing to notice in the article is under the heading
sequence:
Protocol Structure -> Frame Format -> Flag
The technique of *bit stuffing* is used to ensure that the flag does not appear
in the data, so, when a flag arrives at end-of-frame, the receiver knows that
the checksum is the previous 16 bits.
The flag is 0x7e: A zero bit, six 1 bits, and a final zero bit. Bit stuffing,
at the transmitter end, watches the in-frame data as it passes, and, whenever
it sees five consecutive 1 bits, unconditionally inserts a zero bit, before
resuming the frame data. Likewise, the receiver watches the data stream, and if
it sees five 1-bits followed by a zero bit, it discards the zero bit.
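Purely as an illustration of that rule (in practice the SDLC/HDLC
controller does the stuffing in hardware), here it is in Lua, operating
on a string of "0"/"1" characters:

    -- Transmit side: insert a 0 after every run of five consecutive 1 bits.
    local function bit_stuff(bits)
      local out, run = {}, 0
      for b in bits:gmatch(".") do
        out[#out + 1] = b
        run = (b == "1") and run + 1 or 0
        if run == 5 then                 -- five 1s seen: stuff a zero
          out[#out + 1] = "0"
          run = 0
        end
      end
      return table.concat(out)
    end

    -- Receive side: drop the 0 that follows any run of five 1s.
    local function bit_unstuff(bits)
      return (bits:gsub("111110", "11111"))
    end

    assert(bit_unstuff(bit_stuff("0111111101111110")) == "0111111101111110")

The test string deliberately contains long runs of 1s; after stuffing,
no six consecutive 1s remain, so the 0x7e flag pattern cannot appear in
the frame body.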
--
If you are working purely in software, then byte-stuffing can be used instead of
bit stuffing, for the variable-length data in the frame. This is done by:
1. Designating one value as the Flag: 0x7e;
2. Designating another byte as an Escape, typically: 0x7d; and
3. Using a simple, reversible sequence to do the stuffing:
3a. If a byte is not a Flag or an Escape, send it as-is;
3b. For either of a Flag byte or an Escape byte:
- Send the Escape byte; and
- Send the original byte, XORed with 0x20.
So, a Flag (0x7e) in the data gets sent as "0x7d 0x5e"; and
an Escape (0x7d) in the data gets sent as "0x7d 0x5d".
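The same rule in Lua (5.3 or later, for the bitwise ~ operator); the
constants are the conventional HDLC/PPP ones listed above:

    local ESC = 0x7d

    -- Transmit side: escape every Flag (0x7e) or Escape (0x7d) byte.
    local function byte_stuff(s)
      return (s:gsub("[\x7d\x7e]", function(c)
        return string.char(ESC, c:byte() ~ 0x20)
      end))
    end

    -- Receive side: an Escape means "drop me, XOR the next byte with 0x20".
    local function byte_unstuff(s)
      return (s:gsub("\x7d(.)", function(c)
        return string.char(c:byte() ~ 0x20)
      end))
    end

    assert(byte_stuff("\x01\x7e\x02\x7d\x03") == "\x01\x7d\x5e\x02\x7d\x5d\x03")
    assert(byte_unstuff(byte_stuff("\x01\x7e\x02\x7d\x03")) == "\x01\x7e\x02\x7d\x03")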
-----------
Okay,