- Subject: Re: LPeg can do what Lua patterns can't (was Re: lpeg as a part of lua)
- From: Sean Conner <sean@...>
- Date: Sun, 29 Oct 2017 02:22:01 -0400
It was thus said that the Great Daurnimator once stated:
> On 29 October 2017 at 15:38, Sean Conner <sean@conman.org> wrote:
> > It was thus said that the Great Dirk Laurie once stated:
> >>
> >> Put another way: the issue is not what can you do with LPeg, it is
> >> what can you do that you can't do with Lua patterns, and whether
> >> that extra is sufficiently common to justify adding LPeg to Lua.
> >
> > I think I've come across something else that LPeg can easily do (for
> > various values of "easily") that would be difficult to do with Lua patterns.
> > The background for the project is to output text to fit the width of a
> > terminal, and said output contains a mixture of UTF-8 and terminal escape
> > codes, for example:
> >
> > "Стоял он, дум\27[31;41m великих полн"
> >
> > That string is 56 bytes long, and contains 26 printable glyphs. If I wanted
> > to print out only 20 glyphs (because that's the width of our terminal, or
> > all that's left on the current line of our terminal), how do I calculate how
> > many bytes to write?
>
> Not all glyphs take up a single terminal cell. Some take no columns,
> some take multiple columns.
> To solve this, you need a library with knowledge of these widths.
> One such library that is available on most computers is libunistring.
> See https://www.gnu.org/software/libunistring/manual/libunistring.html#uniwidth_002eh
> One level higher is the unistring function u8_width_linebreaks that
> tells you where to insert line breaks for a given piece of text to fit
> in a terminal.
> https://www.gnu.org/software/libunistring/manual/libunistring.html#unilbrk_002eh
>
> It's one of the reasons I wrote https://github.com/daurnimator/lua-unistring

It's close to what I want, but not fully there. The documentation for
u8_width() states, "[t]his function ignores control characters in the
string," which to me says it will treat this:

	Стоял он, дум\27[31;41m великих полн

as

	Стоял он, дум[31;41m великих полн

and not

	Стоял он, дум великих полн

which is what I'm trying to do. I'm also having to treat HT (the tab
character) specially because it can be anywhere from 0 to 8 terminal
positions, depending upon where it occurs in the string (I don't want it
ignored). I'm pretty sure I can work around the control codes and escape
sequences and still use u8_width().
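
For illustration, here is a minimal sketch of one non-LPeg way to answer the
"how many bytes do I write" question. It is written in Python rather than
Lua, and it rests on several assumptions that are not from the thread: the
only escape sequences present are CSI sequences (ESC [ ... final byte),
every other code point occupies exactly one terminal cell (which, as noted
above, is not true in general, so a real version would need width data such
as libunistring's), and tab stops sit every 8 columns. The names
bytes_for_columns and CSI are hypothetical.

```python
import re

# Assumed escape-sequence shape: ESC [ followed by parameter bytes
# (0x30-0x3F), intermediate bytes (0x20-0x2F) and one final byte (0x40-0x7E).
CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

def bytes_for_columns(text, columns, start_col=0, tab=8):
    """Return how many UTF-8 bytes of text fit in `columns` terminal cells."""
    col = start_col
    limit = start_col + columns
    i = 0        # index into text, counted in code points
    nbytes = 0   # running UTF-8 byte count of what has been accepted
    while i < len(text):
        m = CSI.match(text, i)
        if m:
            # An escape sequence takes bytes but no columns.
            nbytes += len(m.group().encode("utf-8"))
            i = m.end()
            continue
        ch = text[i]
        if ch == "\t":
            # A tab advances to the next tab stop.
            width = tab - (col % tab)
        else:
            # Simplifying assumption: one cell per code point.
            width = 1
        if col + width > limit:
            break
        col += width
        nbytes += len(ch.encode("utf-8"))
        i += 1
    return nbytes

line = "Стоял он, дум\x1b[31;41m великих полн"
n = bytes_for_columns(line, 20)
visible_prefix = line.encode("utf-8")[:n]  # safe to write; ends on a code point boundary
```

Escape sequences that follow the last glyph that fits are still included in
the count, so a trailing colour reset is not cut off.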
-spc (So I'm still interested in non-LPeg solutions to this)