This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 8089
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: DANKOGAI [...] cpan.org
Requestors: derhoermi [...] gmx.net
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Return-Path: <derhoermi [...] gmx.net>
X-Original-To: bug-Encode [...] rt.cpan.org
Delivered-To: cpan-bug+encode [...] pallas.eruditorum.org
Received: from mail.gmx.net (pop.gmx.de [213.165.64.20]) by pallas.eruditorum.org (Postfix) with SMTP id E051E84C00E for <bug-Encode [...] rt.cpan.org>; Thu, 21 Oct 2004 15:35:17 -0400 (EDT)
Received: (qmail 8985 invoked by uid 65534); 21 Oct 2004 19:36:23 -0000
Received: from dsl-082-082-072-126.arcor-ip.net (EHLO voyager) (82.82.72.126) by mail.gmx.net (mp012) with SMTP; 21 Oct 2004 21:36:23 +0200
X-Authenticated: #723575
From: Bjoern Hoehrmann <derhoermi [...] gmx.net>
To: bug-Encode [...] rt.cpan.org
Subject: Encode::utf8::decode_xs does not check partial chars
Date: Thu, 21 Oct 2004 21:36:14 +0200
Message-Id: <418d0f0b.569835059 [...] smtp.bjoern.hoehrmann.de>
X-Mailer: Forte Agent 1.92/32.572
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-RT-Original-Encoding: us-ascii
Content-Length: 454
Hi,

  % perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"

does not work as expected (it should print "Bj\x{FFFD}rn") which is
apparently due to Encode::utf8::decode_xs(), the code

  ...
  if ((s + skip) > e) {
      /* Partial character - done */
      break;
  }
  ...

causes the routine to assume that the octets following that "partial"
character are well-formed UTF-8, but this should not be assumed as it
causes the unexpected behavior above.
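
For illustration only, here is a minimal sketch in C of the behavior the report asks for. It is not the Encode source: decode_utf8_lossy, utf8_seq_len and REPLACEMENT are hypothetical names, and the validity rules are assumed to be those of strict UTF-8 (RFC 3629), which may differ from Perl's internal notion of utf8. The point is that a lead byte is treated as a "partial character" only when every remaining octet up to the end of the input is a continuation byte; otherwise the offending octet becomes U+FFFD and decoding resumes at the next octet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu  /* U+FFFD REPLACEMENT CHARACTER */

/* Sequence length implied by a lead byte, or 0 if the byte cannot
 * start a strict UTF-8 sequence (continuation bytes, C0, C1, F5..FF). */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Decode src[0..len) into code points and return how many were written.
 * The caller must provide at least len output slots. */
size_t decode_utf8_lossy(const unsigned char *src, size_t len,
                         uint32_t *out, size_t out_cap)
{
    const unsigned char *s = src;
    const unsigned char *e = src + len;
    size_t n = 0;

    while (s < e) {
        size_t skip = utf8_seq_len(*s);
        size_t avail = (size_t)(e - s);
        size_t check = skip < avail ? skip : avail;
        int ok = (skip != 0);
        uint32_t cp;
        size_t i;

        assert(n < out_cap);

        /* Validate the continuation bytes that are actually present,
         * instead of only comparing s + skip against e. */
        for (i = 1; ok && i < check; i++) {
            if ((s[i] & 0xC0) != 0x80) ok = 0;
        }

        if (!ok) {
            /* Malformed: replace this one octet and resynchronize. */
            out[n++] = REPLACEMENT;
            s += 1;
            continue;
        }
        if (skip > avail) {
            /* Genuinely truncated at the end of the input. */
            out[n++] = REPLACEMENT;
            break;
        }

        if (skip == 1) {
            cp = s[0];
        } else {
            cp = s[0] & (0xFFu >> (skip + 1));
            for (i = 1; i < skip; i++) cp = (cp << 6) | (s[i] & 0x3Fu);
        }

        /* Reject overlong forms, surrogates and values above U+10FFFF. */
        if ((skip == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) ||
            (skip == 4 && (cp < 0x10000 || cp > 0x10FFFF))) {
            cp = REPLACEMENT;
        }
        out[n++] = cp;
        s += skip;
    }
    return n;
}
```

Fed the five octets 'B', 'j', 0xF6, 'r', 'n', this sketch yields the code points B, j, U+FFFD, r, n. A streaming decoder might prefer to stop at a genuinely truncated tail and wait for more input rather than emit U+FFFD there; that choice is orthogonal to the check shown.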
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: binary
MIME-Version: 1.0
X-Mailer: MIME-tools 5.411 (Entity 5.404)
X-RT-Original-Encoding: iso-8859-1
Content-Length: 573
[derhoermi@gmx.net - Thu Oct 21 15:35:26 2004]:

> % perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
>
> does not work as expected (it should print "Bj\x{FFFD}rn") which is
> apparently due to Encode::utf8::decode_xs(), the code

In this particular case, your expectation is wrong. Try

  perl -MEncode -le 'print decode(q(iso-latin1), qq(Bj\xF6rn))'

and it works as expected. You expect perl to treat "Bj\xF6rn" as UTF-8,
but it does not. Perl treats \xHH as iso-latin1. See the "Perl's Unicode
Model" section of perldoc perluniintro.

Dan the Encode Maintainer
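
To make the Latin-1 reading in this reply concrete, the following C sketch shows what treating \xF6 as iso-latin1 amounts to: each octet is taken as the code point of the same value, so 0xF6 is U+00F6, whose UTF-8 encoding is the two octets C3 B6. The function name latin1_to_utf8 is hypothetical and is not part of Encode or perl.

```c
#include <assert.h>
#include <stddef.h>

/* Re-encode ISO-8859-1 octets as UTF-8 and return the number of octets
 * written. The caller must provide room for up to 2 * len octets. */
size_t latin1_to_utf8(const unsigned char *src, size_t len,
                      unsigned char *dst, size_t cap)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char b = src[i];
        if (b < 0x80) {
            /* ASCII range: identical in Latin-1 and UTF-8. */
            assert(n + 1 <= cap);
            dst[n++] = b;
        } else {
            /* U+0080..U+00FF: two-octet UTF-8 sequence 110xxxxx 10xxxxxx. */
            assert(n + 2 <= cap);
            dst[n++] = (unsigned char)(0xC0 | (b >> 6));
            dst[n++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    return n;
}
```

Under this reading the five input octets 'B', 'j', 0xF6, 'r', 'n' become six UTF-8 octets, with 0xF6 expanded to C3 B6.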

