This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 16698
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: sthoenna [...] efn.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Date: Fri, 23 Dec 2005 02:56:58 -0800
From: Yitzchak Scott-Thoennes <sthoenna [...] efn.org>
To: bug-Encode [...] rt.cpan.org
Subject: Re: [perl #37757] decode_utf8 broken in perl 5.8.7
Filing this in CPAN's RT in case that gets Dan's attention.

On Fri, Dec 02, 2005 at 12:17:05AM -0800, Yitzchak Scott-Thoennes wrote:
> On Tue, Nov 29, 2005 at 12:34:11PM +0100, Michael Schroeder wrote:
> >
> > On Mon, 28 Nov 2005 sthoenna@efn.org wrote:
> > > On Thu, Nov 24, 2005 at 11:42:08AM -0800, debianbugs@j3e.de wrote:
> > > > decode_utf8() doesn't return "false" if run with non-UTF-8 string. It just
> > > > returns the non-UTF-8 string. To see this bug in action use convmv from
> > > > http://j3e.de/linux/convmv/ and convert a filename from latin1 to utf8. It will
> > > > tell you that the file is already UTF-8 encoded. convmv evaluates decode_utf8()
> > > > to see if a file is already utf-8-encoded.
> > >
> > > I don't see any indication in the Encode doc that decode_utf8 would
> > > ever return false on error. To use it to check for valid utf8, I
> > > think you'd need to specify the CHECK parameter as FB_CROAK and wrap
> > > the call in an eval {}; see:
> > > http://perldoc.perl.org/Encode.html#Handling-Malformed-Data
> > >
> > > Perhaps you should use utf8::decode() instead?
> >
> > Well, the perluniintro manpage says:
> >
> > - How Do I Detect Data That's Not Valid In a Particular Encoding?
> >
> > Use the "Encode" package to try converting it. For example,
> >
> >     use Encode 'decode_utf8';
> >     if (decode_utf8($string_of_bytes_that_I_think_is_utf8)) {
> >         # valid
> >     } else {
> >         # invalid
> >     }
>
> Ah, I hadn't noticed that; that doesn't agree with the doc in Encode
> itself, but up through Encode 2.09 (2.08 was included with perl5.8.6),
> decode_utf8 did actually just call utf8::decode when no check
> parameter was passed. Encode 2.10 (in perl5.8.7) now works as
> described in the Encode doc, but doesn't work as described in
> perluniintro.
>
> Dan, perhaps it would be a good idea to put back the old behavior
> (reversing the change you made for
> http://rt.cpan.org/NoAuth/Bug.html?id=8872 and changing the doc
> instead) when no check parameter is passed?
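
For reference, the FB_CROAK-plus-eval check suggested in the quoted thread might look roughly like the sketch below. This is only an illustration, not code from convmv or from Encode itself: the helper name is_valid_utf8 and the sample byte strings are made up here. LEAVE_SRC is added on the assumption that the caller wants its input left untouched, since decode() may otherwise modify the source buffer when a CHECK value is given.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of Encode): returns true if $bytes
# is a well-formed UTF-8 byte string, false otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input instead of
    # substituting characters; eval traps the die and yields undef.
    # LEAVE_SRC asks decode() not to modify $bytes.
    return defined eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
}

# Sample data (assumed for illustration): the same word in two encodings.
my $utf8_name   = "\xC3\xA9t\xC3\xA9";   # UTF-8 bytes
my $latin1_name = "\xE9t\xE9";           # latin1 bytes, malformed as UTF-8

for my $name ($utf8_name, $latin1_name) {
    print is_valid_utf8($name) ? "looks like UTF-8\n" : "not valid UTF-8\n";
}
```

Unlike the perluniintro idiom, this does not rely on the truth value of the decoded string, so an empty string or "0" is still reported as valid.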
On Fri Dec 23 05:57:49 2005, sthoenna@efn.org wrote:
> Filing this in CPAN's RT in case that gets Dan's attention.
>
> On Fri, Dec 02, 2005 at 12:17:05AM -0800, Yitzchak Scott-Thoennes wrote:
> > On Tue, Nov 29, 2005 at 12:34:11PM +0100, Michael Schroeder wrote:
> > >
> > > On Mon, 28 Nov 2005 sthoenna@efn.org wrote:
> > > > On Thu, Nov 24, 2005 at 11:42:08AM -0800, debianbugs@j3e.de wrote:
> > > > > decode_utf8() doesn't return "false" if run with non-UTF-8 string. It just
> > > > > returns the non-UTF-8 string. To see this bug in action use convmv from
> > > > > http://j3e.de/linux/convmv/ and convert a filename from latin1 to utf8. It will
> > > > > tell you that the file is already UTF-8 encoded. convmv evaluates decode_utf8()
> > > > > to see if a file is already utf-8-encoded.
> > > >
> > > > I don't see any indication in the Encode doc that decode_utf8 would
> > > > ever return false on error. To use it to check for valid utf8, I
> > > > think you'd need to specify the CHECK parameter as FB_CROAK and wrap
> > > > the call in an eval {}; see:
> > > > http://perldoc.perl.org/Encode.html#Handling-Malformed-Data
> > > >
> > > > Perhaps you should use utf8::decode() instead?
> > >
> > > Well, the perluniintro manpage says:
> > >
> > > - How Do I Detect Data That's Not Valid In a Particular Encoding?
> > >
> > > Use the "Encode" package to try converting it. For example,
> > >
> > >     use Encode 'decode_utf8';
> > >     if (decode_utf8($string_of_bytes_that_I_think_is_utf8)) {
> > >         # valid
> > >     } else {
> > >         # invalid
> > >     }
> >
> > Ah, I hadn't noticed that; that doesn't agree with the doc in Encode
> > itself, but up through Encode 2.09 (2.08 was included with perl5.8.6),
> > decode_utf8 did actually just call utf8::decode when no check
> > parameter was passed. Encode 2.10 (in perl5.8.7) now works as
> > described in the Encode doc, but doesn't work as described in
> > perluniintro.
> >
> > Dan, perhaps it would be a good idea to put back the old behavior
> > (reversing the change you made for
> > http://rt.cpan.org/NoAuth/Bug.html?id=8872 and changing the doc
> > instead) when no check parameter is passed?

RT #14559 reports the same bug, which is fixed in 2.13.

Dan the Encode Maintainer

