This queue is for tickets about the Unicode-CaseFold CPAN distribution.

Report information
The Basics
Id: 77122
Status: rejected
Priority: 0/
Queue: Unicode-CaseFold

People
Owner: Nobody in particular
Requestors: RSAVAGE [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 0.02
Fixed in: (no value)



Subject: Output of fc kills Encode::decode
Hi

I'm processing subcountry names in Estonia, from:

http://en.wikipedia.org/wiki/ISO_3166-2:EE

I got to that page from the list of all countries:

http://en.wikipedia.org/wiki/ISO_3166-2

Code:

    for my $element (@$table)
    {
        $i++;

        $self -> log(debug => "code: $$element{code}");
        $self -> log(debug => "name: $$element{name}");
        $self -> log(debug => "decode: " . decode('utf8', $$element{name}));
        $self -> log(debug => "decode fc: " . decode('utf8', fc $$element{name}));

        $sth -> execute($country_id, $$element{code}, decode('utf8', fc $$element{name}), decode('utf8', $$element{name}), $i);
    }

Output:

    debug: code: EE-37.
    debug: name: Harjumaa.
    debug: decode: Harjumaa.
    debug: decode fc: harjumaa.
    debug: code: EE-39.
    debug: name: Hiiumaa.
    debug: decode: Hiiumaa.
    debug: decode fc: hiiumaa.
    debug: code: EE-44.
    debug: name: Ida-Virumaa.
    debug: decode: Ida-Virumaa.
    debug: decode fc: ida-virumaa.
    debug: code: EE-49.
    debug: name: Jõgevamaa.
    debug: decode: Jõgevamaa.
    Cannot decode string with wide characters at /home/ron/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.2/x86_64-linux-thread-multi/Encode.pm line 176.

So, the call to fc returns something unacceptable to decode, when the name is Jõgevamaa.

I rigged the code to skip Estonia, and the code works in all other countries and their subcountries.

I then rigged the code to skip Jõgevamaa, and the next place it dies is:

    debug: code: EE-65.
    debug: name: Põlvamaa.
    debug: decode: Põlvamaa.
    Cannot decode string with wide characters at /home/ron/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.2/x86_64-linux-thread-multi/Encode.pm line 176.

I.e., the names corresponding to the codes EE-51, EE-57 and EE-59 are all handled OK.

I rigged it to skip Põlvamaa, and the next place it dies is:

    debug: code: EE-86.
    debug: name: Võrumaa.
    debug: decode: Võrumaa.
    Cannot decode string with wide characters at /home/ron/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.2/x86_64-linux-thread-multi/Encode.pm line 176.

So, each problem is 'o' with a tilde above it.

When I rigged the code to skip these 3 cases, everything worked.

This is Debian 6, 64 bit.
Perl V 5.14.2.
Encode V 2.44.
Unicode::CaseFold V 0.02.
Unicode::Normalize V 1.14.

Installing Perl V 5.15.9...

Versions of Encode, Unicode::CaseFold, Unicode::Normalize are the same.

Same problem :-(.

Cheers
Ron
On Thu May 10 23:58:19 2012, RSAVAGE wrote:
> $self -> log(debug => "decode: " . decode('utf8', $$element{name}));
> $self -> log(debug => "decode fc: " . decode('utf8', fc $$element{name}));

This isn't a bug in Unicode::CaseFold, except possibly the lack of a better error message (I will see what perl 5.16 does, and try to imitate it). In any case, decode('utf8', fc $bytes) is invalid. You should be writing fc decode('utf8', $bytes) instead, as fc works on character-strings, not byte-strings.
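
The ordering described in the reply can be sketched as follows. This is a minimal illustration, not the requestor's actual code: it assumes Perl 5.16 or later and uses the built-in fc as a stand-in for the fc exported by Unicode::CaseFold, and the variable names ($bytes, $chars, $folded, $wrong, $error) are made up for the example. It also suggests why only names containing õ failed: in UTF-8, õ is the byte pair C3 B5, and when those bytes are treated as characters, B5 is MICRO SIGN, whose case fold is GREEK SMALL LETTER MU (U+03BC), a character above 0xFF that decode cannot accept as a byte.

```perl
use v5.16;            # enables the built-in fc and the unicode_strings feature
use warnings;
use Encode qw(decode);

binmode STDOUT, ':encoding(UTF-8)';

# UTF-8 encoded bytes for "Jõgevamaa" (õ is the byte pair C3 B5).
my $bytes = "J\xC3\xB5gevamaa";

# Right order: decode the bytes into a character string, then fold case.
my $chars  = decode('UTF-8', $bytes);
my $folded = fc $chars;
say "fc decode: $folded";

# Wrong order: fc treats each byte as a character, so byte B5 (MICRO SIGN)
# folds to U+03BC, and the result is no longer a byte string.
my $wrong = fc $bytes;
my $error;
eval { decode('UTF-8', $wrong); 1 } or $error = $@;
say "decode fc: ", defined $error ? "died: $error" : "no error";
```

With Unicode::CaseFold on an older perl, the same ordering applies: call decode on the raw bytes first and pass the result to fc.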
Subject: Re: [rt.cpan.org #77122] Output of fc kills Encode::decode
Date: Mon, 14 May 2012 09:50:35 +1000
To: bug-Unicode-CaseFold [...] rt.cpan.org
From: Ron Savage <ron [...] savage.net.au>
Hi Andrew

On 14/05/12 01:48, Andrew Rodland via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=77122>
>
> On Thu May 10 23:58:19 2012, RSAVAGE wrote:
>> $self -> log(debug => "decode: " . decode('utf8', $$element{name}));
>> $self -> log(debug => "decode fc: " . decode('utf8', fc $$element{name}));
>
> This isn't a bug in Unicode::CaseFold, except possibly the lack of a
> better error message (I will see what perl 5.16 does, and try to imitate
> it). In any case, decode('utf8', fc $bytes) is invalid. You should be
> writing fc decode('utf8', $bytes) instead, as fc works on
> character-strings, not byte-strings.

OK. Thanx for the reply.

-- 
Ron Savage
http://savage.net.au/
Ph: 0421 920 622

