On Wed, Jun 12, 2013 at 5:02 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
> If `pos` comes before `char`, one can write an iterator on the model
> of `ipairs`:
>
>     for pos,char in utf8(str) do ...


Almost... but you end up with the position of the next character... So
you need some trickery. Assuming a valid UTF-8 string:

Usage:

    for finish, start, cpt in utf8_next_char, "˙†ƒ˙©√" do
        print(cpt)
    end

`start` and `finish` are the byte positions of the first and last bytes
of the character, and `cpt` is the code point as a UTF-8 encoded string.
It produces:
    ˙
    †
    ƒ
    ˙
    ©
    √
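
Implementation:

    -- Presumed aliases; they are not defined in this message.
    local s_byte, s_sub = string.byte, string.sub

    -- `utf8_offset(byte)` is not shown here. It must return the number of
    -- continuation bytes that follow the given lead byte (0 to 3).
    local function utf8_next_char (subject, i)
        i = i and i + 1 or 1               -- step past the previous character
        if i > #subject then return end    -- past the end: stop the loop
        local offset = utf8_offset(s_byte(subject, i))
        return i + offset, i, s_sub(subject, i, i + offset)
    end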
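
The iterator relies on `utf8_offset`, `s_byte` and `s_sub`, none of which
are defined in this message. A minimal sketch of what they might look
like, assuming `s_byte` and `s_sub` alias `string.byte` and `string.sub`
and that `utf8_offset` (a hypothetical implementation) maps a lead byte
to its number of continuation bytes. Like the post, it assumes valid
UTF-8. The iterator is repeated so the block runs on its own:

```lua
local s_byte, s_sub = string.byte, string.sub

-- Hypothetical helper: number of continuation bytes after a lead byte.
-- Assumes valid UTF-8, so a continuation byte is never passed in.
local function utf8_offset (byte)
    if byte < 0x80 then return 0       -- 0xxxxxxx: ASCII
    elseif byte < 0xE0 then return 1   -- 110xxxxx
    elseif byte < 0xF0 then return 2   -- 1110xxxx
    else return 3 end                  -- 11110xxx
end

-- The iterator from the post, repeated so the block runs on its own.
local function utf8_next_char (subject, i)
    i = i and i + 1 or 1
    if i > #subject then return end
    local offset = utf8_offset(s_byte(subject, i))
    return i + offset, i, s_sub(subject, i, i + offset)
end

for finish, start, cpt in utf8_next_char, "a©√" do
    print(start, finish, cpt)
end
```

With the source file saved as UTF-8, the three characters occupy bytes
1 to 1, 2 to 3 and 4 to 6 respectively.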

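`utf8_next_char` has the annoying property of returning the end position
before the start position: the generic `for` passes the first returned
value back to the iterator on the next call, and the end position is
what the iterator needs to find the next character. In exchange, it is
stateless.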
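
For comparison, here is a sketch of the other side of that trade-off: a
closure-based iterator that yields `pos, char` in the order of the quoted
proposal, at the cost of allocating a closure per loop. The names
`utf8_chars` and `utf8_offset` are hypothetical, and valid UTF-8 input is
assumed as before:

```lua
local s_byte, s_sub = string.byte, string.sub

-- Hypothetical helper: number of continuation bytes after a lead byte.
local function utf8_offset (byte)
    if byte < 0x80 then return 0
    elseif byte < 0xE0 then return 1
    elseif byte < 0xF0 then return 2
    else return 3 end
end

-- Hypothetical stateful variant: `next_pos` lives in the closure, so the
-- loop variables can be ordered `pos, char` as with `ipairs`.
local function utf8_chars (subject)
    local next_pos = 1
    return function ()
        if next_pos > #subject then return end
        local pos = next_pos
        local offset = utf8_offset(s_byte(subject, pos))
        next_pos = pos + offset + 1
        return pos, s_sub(subject, pos, pos + offset)
    end
end

for pos, char in utf8_chars("a©√") do
    print(pos, char)
end
```

Here `pos` is the byte position of the first byte of each character; the
end position stays hidden inside the closure.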

-- Pierre-Yves