This queue is for tickets about the Net-IDN-Encode CPAN distribution.

Report information
The Basics
Id: 103368
Status: open
Priority: 0/
Queue: Net-IDN-Encode

People
Owner: CFAERBER [...] cpan.org
Requestors: matthew.unwin [...] returnpath.com
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



Subject: Issue converting a unicode domain to ascii and that same domain from ascii to unicode
Date: Tue, 7 Apr 2015 13:35:38 -0600
To: bug-Net-IDN-Encode [...] rt.cpan.org
From: Matthew Unwin <matthew.unwin [...] returnpath.com>
We have a client who has registered the following two domains (these are in the Tamil language):

  ெசாசியதெ-ெஜனரால.com (xn----oweaj2b6a1bms6ihf1ggb.com)
  ெசாசியதெ-ெஜனரால.net (xn----oweaj2b6a1bms6ihf1ggb.net)

These domains fail to convert when using Net::IDN::Encode version 2.201 and perl 5.18 on Centos 6.5. When I try to convert the two domains above using domain_to_ascii(), I get the following error:

  begins with General_Category=Mark [V5] at .../lib/perl5/x86_64-linux/Net/IDN/Encode.pm line 46.

The reverse, domain_to_unicode() also fails when testing with the converted values noted above.

I have tried all combinations of the optional parameters: AllowUnassigned, UseSTD3ASCIIRules, TransitionalProcessing without success.

I have also tried:

  uts46_to_ascii() / uts46_to_unicode -- fails
  idna2003_to_ascii() -- succeeds, results in: xn----oweaj2b6a1bms6ihf1ggb
  encode_punycode() [tested without the .com and .net] -- succeeds, results in: --oweaj2b6a1bms6ihf1ggb

I have tried a variety of on-line tools to try and validate that the domain names are valid:

  http://mct.verisign-grs.com/ -- fails
  http://㯙㯜㯙㯟.net/ <http://xn--domain.net/> -- succeeds (works in both idna2003 and idna2008 modes and prints out code points)
  http://punycode.phlymail.de/ -- succeeds (works in both idna2003 and idna2008 modes)
  http://www.motobit.com/util/punycode-decoder-encoder.asp -- succeeds (used "To IDN")
  https://iwantmyname.com/domain-tools/idns/idn-punycode-converter -- succeeds
  http://www.punycoder.com/ -- succeeds
  https://mothereff.in/punycode -- succeeds
  http://idn-encoding.online-domain-tools.com/ -- succeeds
  http://www.idnconverter.se/ -- succeeds

So, other than Verisign's online tool, I haven't found another unicode to IDN/punycode converter that has problems converting the two domains above. This leads me to believe there is a bug somewhere in Net::IDN::Encode.

Thanks!
I think that Net::IDN::Encode is correct here, as the label starts with U+0BC6 (TAMIL VOWEL SIGN E), which is a combining mark (Mark, Spacing Combining [Mc]). In IDNA 2008, labels must not start with a combining mark. IDNA 2008 and UTS #46 are in agreement about this:

RFC 5891, section 4.2.3.2:

  The Unicode string MUST NOT begin with a combining mark or combining character (see The Unicode Standard, Section 2.11 [Unicode] for an exact definition).

UTS #46, section 4.1:

  5. The label must not begin with a combining mark, that is: General_Category=Mark.

It also does not make sense to START a label with a character that - being a combining mark - has to FOLLOW another character.
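
For anyone who wants to reproduce the V5 condition outside of Perl, here is a minimal Python sketch using only the standard library's unicodedata module. It is not the code in Net::IDN::Encode, and the function name begins_with_combining_mark is made up for this illustration. It only looks at the General_Category of the first code point of a single label, which is all the quoted rule requires.

```python
import unicodedata


def begins_with_combining_mark(label: str) -> bool:
    """Hypothetical helper: True if the first code point of the label
    has General_Category=Mark (Mn, Mc or Me)."""
    if not label:
        return False
    return unicodedata.category(label[0]).startswith("M")


# The label from this ticket, without the .com / .net suffix.
label = "ெசாசியதெ-ெஜனரால"

# Show the first code point, its Unicode name and its category.
first = label[0]
print(f"U+{ord(first):04X}", unicodedata.name(first), unicodedata.category(first))

# The label starts with a combining mark; a plain ASCII label does not.
print(begins_with_combining_mark(label))
print(begins_with_combining_mark("example"))
```

A check like this says nothing about whether the Punycode step itself can succeed: Punycode (RFC 3492) is only an encoding, so a tool that applies it without the IDNA 2008 / UTS #46 validity rules has no reason to reject the label.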
It is strange, however, that IDNA 2003 did allow the registration of the string as a domain name. The intention behind Net::IDN::Encode is that strings allowed in any IDNA version (IDNA 2003, UTS #46, IDNA 2008) are also allowed by Net::IDN::Encode. So far, my impression was that UTS #46 would serve that purpose. So I'm considering adding an option (or opt-out) to ignore rule V5 for strings that were valid IDNA 2003 strings.

