The manual says:

---
utf8.offset (s, n [, i])

Returns the byte index where the encoding of the n-th character of s starts,
counting from position i. A negative n gets characters before position i.
The default for i is 1. Returns nil if the subject does not have such character.

As a special case, when n is 0 the function returns the start of the encoding
of the character that contains the i-th byte of s.

This function assumes that s is a valid UTF-8 string.
---
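
For reference, here is a small sketch of the cases described in that excerpt, assuming Lua 5.3 or later. The variable names are only illustrative, and the à is written as explicit bytes so the example does not depend on the source file's encoding. The position i is passed explicitly for the negative n case.

```lua
-- Assumes Lua 5.3 or later, where the utf8 library is available.
local s = "voil\xC3\xA0"   -- "voilà", with à spelled out as its two bytes

local fifth   = utf8.offset(s, 5)           -- start of the 5th character
local owner   = utf8.offset(s, 0, 6)        -- start of the character containing byte 6
local last    = utf8.offset(s, -1, #s + 1)  -- one character back from the end
local missing = utf8.offset(s, 7)           -- the string has no 7th character

print(#s, fifth, owner, last, missing)
```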

In practice, the routine always seems to return something, even when s is not
valid UTF-8. For n>0 the result seems to be correct as long as the n-1
characters preceding it are valid UTF-8.

> s='voilà'
> #s
6
> utf8.offset(s,6)
7
> s=s:sub(1,-2).."\xFC"
> s
voil�
> utf8.offset(s,5)
5
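
If a caller needs invalid input to be rejected rather than silently accepted, one possible approach is to validate with utf8.len first. According to the manual, it returns a false value plus the position of the first invalid byte. This is only a sketch: checked_offset is a hypothetical helper, not part of the utf8 library.

```lua
-- checked_offset is a hypothetical wrapper, not part of the standard library.
-- It validates the whole string with utf8.len before calling utf8.offset.
local function checked_offset(s, n, i)
  local len, badpos = utf8.len(s)
  if not len then
    -- s is not valid UTF-8; report where the first invalid byte is
    return nil, badpos
  end
  return utf8.offset(s, n, i)
end

print(checked_offset("voil\xC3\xA0", 5))   -- valid "voilà"
print(checked_offset("voil\xC3\xFC", 5))   -- the invalid string from the session above
```

Note that this scans the whole string on every call, so it trades speed for safety.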