At last week's Lua Workshop 2016 in San Francisco, I introduced Rosie Pattern Language (RPL) [1]. Those who were at the workshop may be interested in the following points, which were not covered in my talk [3]. For everyone else, see below for a description of Rosie [2].

- Although RPL takes a parsing combinator approach (with the specific combinator set being that from Parsing Expression Grammars), there is much familiar syntax from regex, including for repetition (*, +, ?, and {min, max}) and for POSIX character sets (e.g. [[:alpha:]], [:^digit:]).
- Rosie's C API is generated automatically from a Lua version (in src/api.lua). It's done in a rudimentary way, but it has proven to be a useful approach during development.
- Rosie automatically tokenizes text on the fly, with a customizable "token boundary" pattern that is similar in spirit to `\b` from regex, except that the boundary is inserted automagically. Rosie uses curly braces {...} as an operator that disables tokenization for the expression inside the braces.
- The RPL compiler generates a domain-specific version of closures. Suppose pattern A is defined in terms of other patterns B and C. The pattern bound to A will use the definitions of B and C that are in effect at the time A is defined. B and C may be redefined later without affecting A.
- Finally, in case it was not clear: RPL is not Turing complete. There is, in fact, no way to do arbitrary computation in RPL itself. I plan to introduce RPL customizations/extensions written in Lua, but RPL itself is designed only for pattern matching.

Comments and questions are welcome! And I want to thank everyone from the workshop who has already given suggestions, insights, and feedback.
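The tokenization and closure points above can be modeled in a few lines. The following is a minimal Python sketch under simplified assumptions; it is not Rosie's implementation, and `literal`, `chars`, `boundary`, `seq`, `raw`, and `Env` are hypothetical names. A plain ("cooked") sequence matches a boundary automatically between its parts, `raw` plays the role of `{...}`, and `Env.ref` looks a name up when a pattern is defined rather than when it is matched, so redefining B afterwards leaves A alone. The boundary rule here (skip whitespace, never split a run of alphanumerics) is an assumed stand-in for Rosie's customizable one.

```python
import re
from typing import Callable, Dict, Optional

# A compiled pattern maps (text, pos) to the position just after a match, or None.
Pattern = Callable[[str, int], Optional[int]]


def literal(s: str) -> Pattern:
    """Match the exact string s at pos."""
    def match(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return match


def chars(expr: str) -> Pattern:
    """Match a Python regex at pos (used here only for simple character classes)."""
    compiled = re.compile(expr)

    def match(text: str, pos: int) -> Optional[int]:
        m = compiled.match(text, pos)
        return m.end() if m else None
    return match


def boundary(text: str, pos: int) -> Optional[int]:
    """Hypothetical token boundary: skip whitespace, but never split a word.

    Rosie's real boundary pattern is customizable; this is a simplified stand-in.
    """
    end = pos
    while end < len(text) and text[end].isspace():
        end += 1
    if end > pos:
        return end
    # No whitespace consumed: fail if pos sits between two alphanumeric characters.
    if 0 < pos < len(text) and text[pos - 1].isalnum() and text[pos].isalnum():
        return None
    return pos


def seq(*parts: Pattern, cooked: bool = True) -> Pattern:
    """Sequence of parts; when cooked, a boundary is matched between parts."""
    def match(text: str, pos: int) -> Optional[int]:
        cur: Optional[int] = pos
        for i, part in enumerate(parts):
            if cooked and i > 0:
                cur = boundary(text, cur)
                if cur is None:
                    return None
            cur = part(text, cur)
            if cur is None:
                return None
        return cur
    return match


def raw(*parts: Pattern) -> Pattern:
    """Model of RPL's {...}: a sequence with no automatic boundaries."""
    return seq(*parts, cooked=False)


class Env:
    """Named patterns, bound at definition time (the 'closure' behavior)."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Pattern] = {}

    def ref(self, name: str) -> Pattern:
        # Resolve the name now, so the new pattern captures the current definition.
        return self._bindings[name]

    def define(self, name: str, pattern: Pattern) -> None:
        self._bindings[name] = pattern

    def match(self, name: str, text: str) -> Optional[int]:
        """Return the end position of a match at the start of text, or None."""
        return self._bindings[name](text, 0)


env = Env()
env.define("B", chars(r"[0-9]+"))
env.define("C", chars(r"[a-z]+"))
env.define("A", seq(env.ref("B"), env.ref("C")))  # roughly RPL's  A = B C
env.define("R", raw(env.ref("B"), env.ref("C")))  # roughly RPL's  R = {B C}

print(env.match("A", "42 abc"))  # 6: the boundary consumed the space
print(env.match("A", "42abc"))   # None: the boundary refuses to split "42abc"
print(env.match("R", "42abc"))   # 5: raw sequence, no boundary needed

env.define("B", literal("x"))    # redefine B after A was defined...
print(env.match("A", "x abc"))   # None: A still uses the old (digit) B
env.define("A2", seq(env.ref("B"), env.ref("C")))
print(env.match("A2", "x abc"))  # 5: a new definition sees the new B
```

Resolving names eagerly in `ref` is what gives the closure-like behavior in this sketch; a design that looked names up at match time would instead make A follow every later redefinition of B.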
Jamie

[1] Rosie is open source and free (MIT license): https://github.com/jamiejennings/rosie-pattern-language

[2] About the Rosie Pattern Language (RPL): RPL is an efficient and scalable way to match patterns in unstructured and semi-structured text, such as log files, blog posts, emails, etc. RPL is designed to replace the use of regex for such tasks, because (1) regex are hard to read and maintain, and (2) regex engines can require exponential time, which will stall a data pipeline. The RPL compiler is implemented in Lua and compiles a pattern matching language down to lpeg, which essentially serves as an intermediate representation. The lpeg patterns are, of course, then optimized and executed by Roberto's lpeg virtual machine. This gives RPL high pattern-matching performance. There is a foreign function interface that hides the entire implementation (including Lua, lpeg, and cjson) inside a C library. Sample programs that call `librosie` are given in C, Go, Python, Perl, JavaScript, and Ruby.

[3] My presentation from the Lua Workshop 2016 is posted at https://github.com/jamiejennings/rosie-pattern-language/blob/master/doc/Lua%20Workshop%202016%20Jennings%20v4.pdf
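Footnote [2] contrasts regex engines, which can require exponential time, with RPL's compilation to lpeg. One relevant property of PEG semantics is that repetition is greedy and possessive: once `p+` has consumed input, it never gives characters back. On a backtracking regex engine such as Python's `re`, the regex `(a+)+b` against a long run of `a`s followed by `c` is a textbook case of catastrophic backtracking. The minimal Python sketch below models only the PEG semantics for that same pattern; it is not how lpeg is implemented (lpeg compiles patterns to a virtual machine), and `char`, `one_or_more`, `seq`, and `run` are hypothetical helpers. It counts single-character tests as a proxy for work.

```python
from typing import Callable, Optional, Tuple

# A pattern maps (text, pos) to the position just after a match, or None.
Pattern = Callable[[str, int], Optional[int]]

steps = 0  # number of single-character tests performed, as a proxy for work


def char(c: str) -> Pattern:
    """Match the single character c at pos, counting each attempt."""
    def match(text: str, pos: int) -> Optional[int]:
        global steps
        steps += 1
        return pos + 1 if pos < len(text) and text[pos] == c else None
    return match


def one_or_more(p: Pattern) -> Pattern:
    """PEG p+: greedy and possessive, so it never gives back what it consumed."""
    def match(text: str, pos: int) -> Optional[int]:
        end = p(text, pos)
        if end is None:
            return None
        while True:
            nxt = p(text, end)
            if nxt is None or nxt == end:  # stop on failure or on no progress
                return end
            end = nxt
    return match


def seq(*parts: Pattern) -> Pattern:
    """PEG sequence: each part must match where the previous one ended."""
    def match(text: str, pos: int) -> Optional[int]:
        cur: Optional[int] = pos
        for part in parts:
            cur = part(text, cur)
            if cur is None:
                return None
        return cur
    return match


def run(pattern: Pattern, text: str) -> Tuple[Optional[int], int]:
    """Match at position 0; return (end position or None, steps taken)."""
    global steps
    steps = 0
    result = pattern(text, 0)
    return result, steps


# PEG analogue of the regex (a+)+b
nested = seq(one_or_more(one_or_more(char("a"))), char("b"))
for n in (10, 100, 1000):
    print(n, run(nested, "a" * n + "c"))  # (None, n + 3): work grows linearly here

print(run(nested, "aaab"))  # (4, 6)

# Possessive repetition also changes meaning: PEG  a+ a  cannot match "aaa",
# whereas the regex a+a does.
print(run(seq(one_or_more(char("a")), char("a")), "aaa"))  # (None, 5)
```

The linear step count is a property of this example, not a blanket guarantee for every PEG: ordered choice can still backtrack. The last line shows the flip side of borrowing regex syntax for repetition while keeping PEG semantics.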