
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 16698
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: sthoenna [...] efn.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Return-Path: <sthoenna [...] efn.org>
X-Original-To: bug-Encode [...] rt.cpan.org
Delivered-To: cpan-bug+encode [...] diesel.bestpractical.com
Received: from la.mx.develooper.com (x1.develooper.com [63.251.223.170]) by diesel.bestpractical.com (Postfix) with SMTP id AF1754D80F2 for <bug-Encode [...] rt.cpan.org>; Fri, 23 Dec 2005 05:57:44 -0500 (EST)
Received: (qmail 4417 invoked by alias); 23 Dec 2005 10:57:18 -0000
X-Spam-Check-BY: la.mx.develooper.com
Received-SPF: fail (x1.develooper.com: domain of sthoenna [...] efn.org does not designate 209.221.136.5 as permitted sender)
Received: from zipcon.net (HELO zipcon.net) (209.221.136.5) by la.mx.develooper.com (qpsmtpd/0.28) with SMTP; Fri, 23 Dec 2005 02:57:13 -0800
Received: (qmail 32678 invoked from network); 23 Dec 2005 03:02:24 -0800
Received: from unknown (HELO efn.org) (209.221.136.20) by mail.zipcon.net with SMTP; 23 Dec 2005 03:02:24 -0800
Received: by efn.org (sSMTP sendmail emulation); Fri, 23 Dec 2005 02:56:58 -0800
Date: Fri, 23 Dec 2005 02:56:58 -0800
From: Yitzchak Scott-Thoennes <sthoenna [...] efn.org>
To: bug-Encode [...] rt.cpan.org
Subject: Re: [perl #37757] decode_utf8 broken in perl 5.8.7
Message-ID: <20051223105657.GA2420 [...] efn.org>
References: <20051128120232.GB3568 [...] efn.org> <20051129113411.GA27959 [...] immd4.informatik.uni-erlangen.de> <20051202081705.GA4592 [...] efn.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20051202081705.GA4592 [...] efn.org>
User-Agent: Mutt/1.4.2.1i
Organization: bs"d
X-RT-Original-Encoding: us-ascii
Content-Length: 2058
Filing this in CPAN's RT in case that gets Dan's attention.

On Fri, Dec 02, 2005 at 12:17:05AM -0800, Yitzchak Scott-Thoennes wrote:
> On Tue, Nov 29, 2005 at 12:34:11PM +0100, Michael Schroeder wrote:
> > On Mon, 28 Nov 2005 sthoenna@efn.org wrote:
> > > On Thu, Nov 24, 2005 at 11:42:08AM -0800, debianbugs@j3e. de wrote:
> > > > decode_utf8() doesn't return "false" if run with non-UTF-8 string. It just
> > > > returns the non-UTF-8 string. To see this bug in action use convmv from
> > > > http://j3e.de/linux/convmv/ and convert a filename from latin1 to utf8. It will
> > > > tell you that the file is already UTF-8 encoded. convmv evaluates decode_utf8()
> > > > to see if a file is already utf-8-encoded.
> > >
> > > I don't see any indication in the Encode doc that decode_utf8 would
> > > ever return false on error. To use it to check for valid utf8, I
> > > think you'd need to specify the CHECK parameter as FB_CROAK and wrap
> > > the call in an eval {}; see:
> > > http://perldoc.perl.org/Encode.html#Handling-Malformed-Data
> > >
> > > Perhaps you should use utf8::decode() instead?
> >
> > Well, the perluniintro manpage says:
> >
> > - How Do I Detect Data That's Not Valid In a Particular Encoding?
> >
> >   Use the "Encode" package to try converting it. For example,
> >
> >       use Encode 'decode_utf8';
> >       if (decode_utf8($string_of_bytes_that_I_think_is_utf8)) {
> >           # valid
> >       } else {
> >           # invalid
> >       }
>
> Ah, I hadn't noticed that; that doesn't agree with the doc in Encode
> itself, but up through Encode 2.09 (2.08 was included with perl5.8.6),
> decode_utf8 did actually just call utf8::decode when no check
> parameter was passed. Encode 2.10 (in perl5.8.7) now works as
> described in the Encode doc, but doesn't work as described in
> perluniintro.
>
> Dan, perhaps it would be a good idea to put back the old behavior
> (reversing the change you made for
> http://rt.cpan.org/NoAuth/Bug.html?id=8872 and changing the doc
> instead) when no check parameter is passed?
MIME-Version: 1.0
In-Reply-To: <20051223105657.GA2420 [...] efn.org>
X-Mailer: MIME-tools 5.418 (Entity 5.418)
Content-Disposition: inline
Message-Id: <rt-3.5.HEAD-22141-1137337068-863.16698-0-0 [...] rt.cpan.org>
References: <20051128120232.GB3568 [...] efn.org> <20051129113411.GA27959 [...] immd4.informatik.uni-erlangen.de> <20051202081705.GA4592 [...] efn.org> <20051223105657.GA2420 [...] efn.org>
Content-Type: text/plain; charset="utf8"
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
Content-Length: 2296
On Fri Dec 23 05:57:49 2005, sthoenna@efn.org wrote:
> Filing this in CPAN's RT in case that gets Dan's attention.
>
> On Fri, Dec 02, 2005 at 12:17:05AM -0800, Yitzchak Scott-Thoennes wrote:
> > On Tue, Nov 29, 2005 at 12:34:11PM +0100, Michael Schroeder wrote:
> > > On Mon, 28 Nov 2005 sthoenna@efn.org wrote:
> > > > On Thu, Nov 24, 2005 at 11:42:08AM -0800, debianbugs@j3e. de wrote:
> > > > > decode_utf8() doesn't return "false" if run with non-UTF-8 string. It just
> > > > > returns the non-UTF-8 string. To see this bug in action use convmv from
> > > > > http://j3e.de/linux/convmv/ and convert a filename from latin1 to utf8. It will
> > > > > tell you that the file is already UTF-8 encoded. convmv evaluates decode_utf8()
> > > > > to see if a file is already utf-8-encoded.
> > > >
> > > > I don't see any indication in the Encode doc that decode_utf8 would
> > > > ever return false on error. To use it to check for valid utf8, I
> > > > think you'd need to specify the CHECK parameter as FB_CROAK and wrap
> > > > the call in an eval {}; see:
> > > > http://perldoc.perl.org/Encode.html#Handling-Malformed-Data
> > > >
> > > > Perhaps you should use utf8::decode() instead?
> > >
> > > Well, the perluniintro manpage says:
> > >
> > > - How Do I Detect Data That's Not Valid In a Particular Encoding?
> > >
> > >   Use the "Encode" package to try converting it. For example,
> > >
> > >       use Encode 'decode_utf8';
> > >       if (decode_utf8($string_of_bytes_that_I_think_is_utf8)) {
> > >           # valid
> > >       } else {
> > >           # invalid
> > >       }
> >
> > Ah, I hadn't noticed that; that doesn't agree with the doc in Encode
> > itself, but up through Encode 2.09 (2.08 was included with perl5.8.6),
> > decode_utf8 did actually just call utf8::decode when no check
> > parameter was passed. Encode 2.10 (in perl5.8.7) now works as
> > described in the Encode doc, but doesn't work as described in
> > perluniintro.
> >
> > Dan, perhaps it would be a good idea to put back the old behavior
> > (reversing the change you made for
> > http://rt.cpan.org/NoAuth/Bug.html?id=8872 and changing the doc
> > instead) when no check parameter is passed?
RT #14559 reports the same bug which is fixed in 2.13.

Dan the Encode Maintainer
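
For readers who want a validity check that does not depend on the return value of decode_utf8(), the two alternatives suggested earlier in the thread (a strict decode with FB_CROAK wrapped in eval {}, and utf8::decode()) might be sketched roughly as follows. This is only an illustration under assumptions: the helper names is_valid_utf8 and is_valid_utf8_core are hypothetical and are not part of Encode or of convmv, and the two checks are not guaranteed to agree on every edge case, since utf8::decode() follows perl's own, somewhat laxer, notion of UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper (not part of Encode): true if $bytes decodes as
# strict UTF-8. The result comes from whether decode() died, not from
# the decoded string, so "" and "0" are still reported as valid.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its argument
    my $ok = eval { decode('UTF-8', $copy, FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

# Hypothetical helper using the core utf8::decode(), which converts its
# argument in place and returns false if the bytes are not well formed.
sub is_valid_utf8_core {
    my ($bytes) = @_;
    my $copy = $bytes;    # work on a copy so the caller's bytes are untouched
    return utf8::decode($copy) ? 1 : 0;
}
```

A caller would pass raw bytes, for example is_valid_utf8("caf\xC3\xA9") for a UTF-8 encoded name versus is_valid_utf8("caf\xE9") for the same name in latin1. Working on a copy is a deliberate choice here, since both decode() with a CHECK value and utf8::decode() can alter the string they are given.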

