The Lua documentation is clear on this: the length operator (#) applied to a string returns its size in bytes, not in characters.
The Lua manual also states: "The string library assumes one-byte character encodings."
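
A quick way to see the byte-based behaviour for yourself. This is only a sketch: it assumes the string is UTF-8 encoded, and it spells out the two bytes of "é" with decimal escapes so the result does not depend on how your source file is saved.

```lua
-- "héllo" in UTF-8: 'é' is the two bytes 0xC3 0xA9 (195 169 in decimal).
local s = "h\195\169llo"

print(#s)               -- 6: the length operator counts bytes, not the 5 characters
print(#string.sub(s, 1, 2))  -- 2: string.sub also indexes by byte

-- Slicing at byte 2 cuts 'é' in half: the result is 'h' plus a lone lead byte.
local cut = string.sub(s, 1, 2)
print(cut == "h\195")   -- true

-- To get 'é' whole with the standard library, you have to know its byte range.
print(string.sub(s, 2, 3) == "\195\169")  -- true
```

Any existing code that relies on these byte offsets (for example, using the positions returned by string.find as arguments to string.sub) expects exactly this behaviour.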
So, by redefining string.sub() you have broken compatibility with a lot of Lua code.
Nevertheless, a consistent change of both the string library and the string operators from bytes to Unicode code points might be a good idea. At the very least, such a Lua dialect would be interesting to try.
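
For comparison, here is roughly what a code-point-based substring could look like without touching string.sub itself. It is a minimal sketch under some assumptions: it needs Lua 5.3 or later (for the standard utf8 library), it assumes valid UTF-8 input, and the name usub is made up for illustration, not part of any library.

```lua
-- Hypothetical helper: like string.sub, but i and j count code points.
-- Requires Lua 5.3+ (utf8.len, utf8.offset). Assumes s is valid UTF-8.
local function usub(s, i, j)
  local len = assert(utf8.len(s), "invalid UTF-8")
  j = j or -1

  -- Negative indices count from the end, as in string.sub.
  if i < 0 then i = len + i + 1 end
  if j < 0 then j = len + j + 1 end

  -- Clamp to the valid range.
  if i < 1 then i = 1 end
  if j > len then j = len end
  if i > j then return "" end

  -- utf8.offset gives the byte position where a code point starts;
  -- for code point len + 1 it returns #s + 1.
  local first = utf8.offset(s, i)
  local last = utf8.offset(s, j + 1) - 1
  return string.sub(s, first, last)
end

local s = "h\195\169llo"          -- "héllo" as UTF-8 bytes
print(utf8.len(s))                -- 5 code points (versus #s == 6 bytes)
print(usub(s, 2, 3) == "\195\169l")  -- true: 'é' and 'l', nothing cut in half
print(usub(s, -1) == "o")         -- true
```

Keeping it as a separate function leaves string.sub byte-based, so existing code keeps working. Note that utf8.offset has to scan the string, so this is O(n) per call rather than O(1).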