2009/10/19 David Given <dg@cowlark.com>:
> Tony Finch wrote:
> [...]
>>
>> The solution is generally for TLDs to implement a character set policy.
>> For example, .at only allows these non-ascii characters in domain names:
>> ä ü ö ë à á â è é ê ì í î ï ò ó ô ù ú û ý ÿ ã å æ ç ð ñ õ ø œ š þ ž
>
> Interestingly, after posting my message I tried resolving the addresses; and
> it turns out that www.google.com and www.𝐠𝐨𝐨𝐠𝐥𝐞.com both appear to
> resolve to the same address... as does www.google.𝐜𝐨𝐦. (Using the bold
> versions of the characters here to make them visible --- these are all from
> U+1D400 MATHEMATICAL BOLD.) So it certainly looks like something is
> remapping at least some letter-like glyphs to actual ASCII. I don't know
> whether that's my Linux system or part of DNS.
>

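Standard DNS only supports ASCII. Unicode domain names are mapped to
ASCII by IDNA, and that mapping is done by the application or resolver
library before the query is sent, not by DNS itself. Each label is
normalised with nameprep (which includes NFKC, so the MATHEMATICAL BOLD
letters fold to plain ASCII), and any label that is still non-ASCII is
then Punycode-encoded with an "xn--" prefix. See
http://en.wikipedia.org/wiki/IDNA .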
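
For anyone who wants to poke at the mapping without libidn, here is a
minimal sketch in Python, chosen only because its standard library
ships both NFKC normalisation and an IDNA (2003) codec. The helper
name to_ascii is mine, not part of any library, and this is not the
libidn binding, just an independent way to see the same two steps.

```python
import unicodedata

def to_ascii(hostname: str) -> str:
    """Hypothetical helper: run a hostname through Python's built-in
    IDNA (2003) codec and return the ASCII form as a str."""
    return hostname.encode("idna").decode("ascii")

# "com" spelled with U+1D41C, U+1D428, U+1D426 (MATHEMATICAL BOLD SMALL C, O, M)
bold = "www.google.\U0001D41C\U0001D428\U0001D426"

# Step 1 in isolation: NFKC compatibility normalisation folds the bold
# letters to plain ASCII, so nothing is left for Punycode to encode.
print(unicodedata.normalize("NFKC", bold))

# The full conversion gives the same plain-ASCII name.
print(to_ascii(bold))

# CJK characters have no ASCII compatibility form, so the label is
# Punycode-encoded and gets the "xn--" prefix instead.
encoded = to_ascii("写真管理.com")
print(encoded)

# Decoding with the same codec recovers the original Unicode name.
print(encoded.encode("ascii").decode("idna"))
```

The first two prints should both show a plain ASCII name, while the
third shows an "xn--" label, which is the same split the libidn
session shows for the two example names.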

To be at least a little on-topic: we have Lua bindings to libidn, which
provides IDNA conversion among other Unicode profiles. See
http://prosody.im/tip/util-src/encodings.c .

Examples:

matthew@silver:~/lxmppd-hg-depot$ lua -lutil.encodings
Lua 5.1.2  Copyright (C) 1994-2007 Lua.org, PUC-Rio
> =encodings.idna.to_ascii("www.google.𝐜𝐨𝐦.") -- Your unicode www.google.com
www.google.com. -- The letters you chose map directly to ASCII
> return encodings.idna.to_ascii("写真管理.com") -- unlike these
xn--g7q225f79b5rg.com

Matthew