
This queue is for tickets about the Text-Unidecode CPAN distribution.

Report information
The Basics
Id: 30501
Status: rejected
Priority: 0/
Queue: Text-Unidecode

People
Owner: Nobody in particular
Requestors: labassistant [...] nese.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



From: "Gavin Bisesi" <labassistant [...] nese.com>
To: bug-Text-Unidecode [...] rt.cpan.org
Date: Mon, 5 Nov 2007 14:47:02 -0500
Subject: Incorrect transliteration of \x{8e}
Attachment: perlv (application/octet-stream, 2.7k)

In Text::Unidecode v0.04, \x{8e} (é) is transliterated as an empty string rather than as "e". Output of "perl -V" is in the attachment.

From: SBURKE [...] cpan.org

\x{8e} is correctly transliterated as empty-string, because \x{8e} is not "é" in Unicode; it is nothing, thence, nothing. Text::Unidecode requires that the input be in Unicode.

You're apparently forgetting to apply the Encoding filter that would translate your non-Unicode encoding, into Unicode.

(My spidey sense is tingling and telling me your encoding is the old-timey encoding MacAscii, but that's just a guess.)
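
The decoding step described above can be sketched roughly as follows. This is only an illustration, not something posted in the ticket: it assumes the input bytes are in the encoding that Perl's core Encode module calls MacRoman (where byte 0x8E is "é"), and the variable names are made up. Text::Unidecode is loaded only if it is installed, so the sketch still runs without it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte 0x8E as it might arrive from a file in a legacy Mac encoding.
my $bytes = "\x8e";

# Assumption: the data is MacRoman. decode() turns raw bytes into
# Perl (Unicode) characters.
my $chars = decode('MacRoman', $bytes);

# Show which code point the byte became.
printf "U+%04X\n", ord $chars;

# Optional: transliterate the decoded character if Text::Unidecode is available.
if (eval { require Text::Unidecode; 1 }) {
    print Text::Unidecode::unidecode($chars), "\n";
}
```

If the real input turns out to be in some other encoding, only the name passed to decode() would need to change.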

From: labassistant <labassistant [...] nese.com>
To: bug-Text-Unidecode [...] rt.cpan.org
Date: Wed, 7 Nov 2007 12:37:07 -0500
Subject: Re: [rt.cpan.org #30501] Incorrect transliteration of \x{8e}

Thanks very much, and you're right, I am on a mac (macperl 5.8.6; OS X 10.4, primarily).

How do I change that string to unicode? Would I use Encode::encode_utf8()?

Thanks for the timely reply.

On Nov 6, 2007, at 7:06 PM, via RT wrote:
> <URL: http://rt.cpan.org/Ticket/Display.html?id=30501 >
>
> \x{8e} is correctly transliterated as empty-string, because \x{8e} is
> not "é" in Unicode; it is nothing, thence, nothing.
> Text::Unidecode requires that the input be in Unicode.
>
> You're apparently forgetting to apply the Encoding filter that would
> translate your non-Unicode encoding, into Unicode.
>
> (My spidey sense is tingling and telling me your encoding is the
> old-timey encoding MacAscii, but that's just a guess.)
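
On the encode_utf8() question, a small comparison may help. It is a sketch under the same MacRoman assumption as the guess quoted above, and is not part of the original exchange. Encode::encode_utf8() converts characters into UTF-8 bytes, which is the opposite direction from what is needed here; Encode::decode() converts legacy bytes into characters.

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

my $bytes = "\x8e";

# encode_utf8() treats its argument as characters and returns UTF-8 bytes.
# The character U+008E simply becomes the two bytes C2 8E; it is never
# reinterpreted as an accented letter.
my $encoded = encode_utf8($bytes);
print join(' ', map { sprintf '%02X', ord } split //, $encoded), "\n";

# decode() goes the other way: legacy bytes in, Unicode characters out.
# Assumption: the source encoding is MacRoman.
my $decoded = decode('MacRoman', $bytes);
printf "U+%04X\n", ord $decoded;
```

So under that assumption, decode() with the right source encoding is the likelier fit, and the result can then be passed to unidecode().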

