Roberto Ierusalimschy wrote:
> In fact, UTF-8 also uses a maximum of 4 bytes to represent any code point, but requires 3 bytes to represent code points in asian languages, so in general terms it is less compact than UTF-16 for such text, but in some applications ("mostly ascii") it will turn out to be better.
>
> If I understand correctly, even asian languages use ascii punctuation (dots, spaces, newlines, commas, etc.), which uses 1 byte in utf-8 but 2 in utf-16. So, even for these languages, utf-8 is not as much less compact as it seems.

asian languages hardly use spaces, but i get the impression that they need fewer characters to express ideas, so in the end (translated books are not per se thicker) it's still relatively compact. if chinese puts 30 chars on a line, that means some 100 bytes; a language using the latin script with accents (french, vietnamese, etc.) has some 70 chars per line, and quite a few of them are multibyte, which then also adds up to 100+; arabic is a different story. i think that compactness is no real issue here (no more than german needing more characters to express an idea than e.g. french).

Hans
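The byte counts discussed in this thread are easy to check directly. The following is a rough Python sketch, not anything from the original posters: it encodes a few made-up sample strings as UTF-8 and as UTF-16 (little-endian, so Python does not prepend a byte-order mark) and compares sizes. The "cjk with ascii punctuation" sample simply assumes ASCII punctuation, as the quoted message does; real CJK text may use other punctuation characters.

```python
# Sketch: compare UTF-8 and UTF-16 sizes for some illustrative strings.
# The samples are hypothetical examples, not measurements of real documents.

def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (utf8_bytes, utf16_bytes) for text; UTF-16 is counted without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Per code point: ASCII, Latin accent, CJK ideograph, astral-plane emoji.
for ch in ["a", "é", "中", "😀"]:
    u8, u16 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}  utf-8: {u8} bytes  utf-16: {u16} bytes")

samples = {
    "cjk ideographs only": "中文字符",
    "cjk with ascii punctuation": "中文, 字符.\n",
    "30 cjk chars (one line)": "中" * 30,
    "french": "déjà vu, très élégant",
}

for name, text in samples.items():
    u8, u16 = encoded_sizes(text)
    print(f"{name:28} {len(text):3} code points  utf-8: {u8:3}  utf-16: {u16:3}")
```

With these samples, the four ideographs alone take 12 bytes in UTF-8 versus 8 in UTF-16, but adding a comma, space, period and newline brings both encodings to 16 bytes. The emoji shows the 4-byte maximum, which UTF-16 also needs (as a surrogate pair).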
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74
www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------