Hi Sean,

On 1/6/11 10:59 PM, Sean Conner wrote:
> it assumes a valid UTF-8 string to begin with

I think that this is the main problem with the tests not giving
consistent results, the 'broken' chars.

> -Been doing a lot of UTF-8 wrangling recently

Can you tell me an official count of

https://gist.github.com/768309 ('tamed' version of the last)
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
http://www.columbia.edu/kermit/utf8.html

Thanks,
Henning
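
A minimal sketch of why 'broken' chars can make counts disagree. It is written in Python purely for illustration; the function names and the sample byte strings are hypothetical and are not taken from the gist or the test files above. It compares a counter that assumes valid UTF-8 (it just skips continuation bytes) with two decoders that handle malformed input differently.

```python
# Three ways to "count characters" in a byte string that claims to be UTF-8.
# All names and sample inputs here are illustrative assumptions.

def count_lead_bytes(data: bytes) -> int:
    """Count every byte that is not a continuation byte (10xxxxxx).

    This is only a code point count if the input is valid UTF-8.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def count_strict(data: bytes) -> int:
    """Decode strictly; raises UnicodeDecodeError on malformed input."""
    return len(data.decode("utf-8"))


def count_replacing(data: bytes) -> int:
    """Decode, substituting U+FFFD for each malformed sequence."""
    return len(data.decode("utf-8", errors="replace"))


valid = "Grüße".encode("utf-8")      # well-formed: all counters agree
broken = b"ab\x80\x80c\xc3"          # stray continuation bytes, truncated lead

for label, sample in (("valid", valid), ("broken", broken)):
    try:
        strict = count_strict(sample)
    except UnicodeDecodeError:
        strict = None                # strict decoding refuses to count
    print(label, count_lead_bytes(sample), strict, count_replacing(sample))
```

On well-formed input the three agree; on the malformed sample each gives a different answer (or none), so a single "official" count only exists once the handling of invalid sequences is fixed.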