- Subject: Re: Patterns
- From: Mouse <mouse@...>
- Date: Wed, 14 Dec 2022 16:31:34 -0500 (EST)
> Suppose the topic filter "/sport/#" and these [...]
> Only the first three are allowed to match, so "/sport.*" is not an
> option as that would match the fourth one as well. On the other hand
> "/sport/.*" would not match the first one.
Could this maybe be a use case for importing a full-powered regex
package and then matching "^/sport(/|$)" or some such?
Or perhaps matching against /sport and /sport/.* both?
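A rough Python sketch of both suggestions, using the standard `re` module. The topic names below are hypothetical, since the original list was elided in the quote. One caveat for Python specifically: `$` also matches just before a trailing newline, so `\Z` would be stricter if that matters.

```python
import re

# Suggestion 1: a single regex anchored at the start. "/sport" must be
# followed by either "/" or the end of the string.
SPORT_RE = re.compile(r"^/sport(/|$)")

def matches_regex(topic: str) -> bool:
    return SPORT_RE.search(topic) is not None

# Suggestion 2: test the bare "/sport" and the "/sport/.*" form separately.
# DOTALL lets ".*" cover any characters after "/sport/".
SPORT_CHILD_RE = re.compile(r"/sport/.*", re.DOTALL)

def matches_two_tests(topic: str) -> bool:
    return topic == "/sport" or SPORT_CHILD_RE.fullmatch(topic) is not None

# Hypothetical topics: three that should match, one that should not.
for topic in ["/sport", "/sport/tennis", "/sport/tennis/player1", "/sportsnews"]:
    print(topic, matches_regex(topic), matches_two_tests(topic))
```

Both approaches accept "/sport" and anything under "/sport/", and reject "/sportsnews".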
As for other comments, well, it's not clear to me whether you're just
coding to a spec you have nothing else to do with or whether you're
involved in creating the spec you cite. But the spec appears to confuse
characters with Unicode codepoints:
When it performs subscription matching the Server MUST NOT perform any
normalization of Topic Names or Topic Filters, or any modification or
substitution of unrecognized characters [MQTT-4.7.3-4]. Each
non-wildcarded level in the Topic Filter has to match the
corresponding level in the Topic Name character for character for the
match to succeed.
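For reference, a simplified Python sketch of the level-by-level comparison described in the quoted text. It is not a conforming MQTT implementation: the function name `topic_matches` is hypothetical, it skips filter validation and the spec's special rule for topics starting with "$", and it reads "character for character" as Python's string `==`, which compares codepoint by codepoint.

```python
def topic_matches(topic_filter: str, topic_name: str) -> bool:
    """Simplified MQTT-style matching (hypothetical helper).

    '+' matches exactly one level; '#' (assumed to be the last level)
    matches the parent level and any number of child levels. Other
    levels are compared with ==, i.e. codepoint by codepoint, with
    no Unicode normalization.
    """
    f_levels = topic_filter.split("/")
    t_levels = topic_name.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            # Multi-level wildcard: matches the rest, including the case
            # where the topic name stops at the parent level.
            return True
        if i >= len(t_levels):
            return False  # topic name ran out of levels
        if f != "+" and f != t_levels[i]:
            return False  # literal level differs
    # Without '#', both must have the same number of levels.
    return len(f_levels) == len(t_levels)

print(topic_matches("/sport/#", "/sport"))          # True
print(topic_matches("/sport/#", "/sportsnews"))     # False
print(topic_matches("/sport/+", "/sport"))          # False
```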
It may be just sloppy language, or it may be disambiguated elsewhere (I
didn't read all 8000+ lines), but "character" in a Unicode environment
can be an ambiguous term, at least sometimes including things that look
like single characters to a user but are formed from multiple codepoints
using combining codepoints. For example, A-grave can be represented by
U+0041 followed by the combining U+0300, or by U+00C0 by itself.
Forbidding normalization leads to counterintuitive results, such as a
pattern containing A-grave (potentially) matching only some, or none,
of the Topic Names containing A-grave.
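A quick Python illustration of the A-grave case, using the standard-library `unicodedata` module. Choosing NFC as the normalization form is just an assumption for illustration; the quoted spec forbids normalization, so this only shows what a codepoint-by-codepoint comparison misses.

```python
import unicodedata

precomposed = "\u00C0"        # LATIN CAPITAL LETTER A WITH GRAVE
decomposed = "\u0041\u0300"   # 'A' followed by COMBINING GRAVE ACCENT

def nfc(s: str) -> str:
    """Normalize to Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", s)

# Same user-perceived character, different codepoint sequences, so a
# plain comparison (what "no normalization" amounts to) says they differ.
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 1 2

# Normalized to the same form, they compare equal.
print(nfc(precomposed) == nfc(decomposed))  # True
```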
/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML email@example.com
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B