This queue is for tickets about the PPI CPAN distribution.

Report information
The Basics
Id: 12722
Status: resolved
Priority: 0/
Queue: PPI

People
Owner: Nobody in particular
Requestors: cpan [...] perlmeister.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.903
Fixed in: (no value)



Subject: PPI 0.906 chokes on embedded POD with umlauts
The PPI tokenizer chokes on Perl modules containing umlauts in their embedded POD documentation. Example:

    wget http://search.cpan.org/src/MSCHILLI/Log-Log4perl-0.51/lib/Log/Log4perl.pm

    #!/usr/bin/perl
    use PPI::Document;
    my $d = PPI::Document->load("Log4perl.pm");
    $d or print PPI::Tokenizer::errstr(), "\n";

results in

    "Source code contains unsupported characters (first one encountered was 'ü')"

because of the line

    Ceki Gülcü, "Short introduction to log4j",

somewhere in the POD part. It would be great if Latin-1 characters were accepted as well; Perl allows them in strings, regexes and POD.

Anyway, thanks for this great module!
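To see which byte triggers the "unsupported characters" error in a given file, something like the following sketch may help. It is not part of PPI: the `first_non_ascii` helper is a hypothetical name, and the script only assumes that the offending character is a single byte above 0x7F, as it would be in a Latin-1 encoded file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not a PPI function.
# Returns [line_number, byte_column, byte_value] for the first byte above
# 0x7F in the file, or nothing (undef in scalar context) if it is pure ASCII.
sub first_non_ascii {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        if ($line =~ /([\x80-\xFF])/) {
            # $-[1] is the 0-based offset of the captured byte within $line
            my $hit = [ $., $-[1] + 1, ord $1 ];
            close $fh;
            return $hit;
        }
    }
    close $fh;
    return;
}

# Command-line use: report the first non-ASCII byte of the named file.
if (@ARGV) {
    my $hit = first_non_ascii($ARGV[0]);
    if ($hit) {
        printf "%s: line %d, column %d, byte 0x%02X\n", $ARGV[0], @$hit;
    }
    else {
        print "$ARGV[0]: pure ASCII\n";
    }
}
```

Run against the downloaded Log4perl.pm, this should point at the first line containing such a byte; in Latin-1, byte 0xFC is "ü".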
Date: Mon, 09 May 2005 13:21:26 +1000
From: Adam Kennedy <adam [...] phase-n.com>
Subject: Re: [cpan #12722] PPI 0.906 chokes on embedded POD with umlauts
PPI is about 80-90% capable of handling all of latin1. But in a few places it isn't capable.

The errors from those places were drowning out all other legitimate errors, so I've disabled any support for full latin-1 manually at this time.

If you would like to help, I would really appreciate some unit tests specifically testing where latin-1 both _is_ and _isn't_ allowed, so that I can clean up the various corners where there are problems and be sure that they are working sufficiently well.

Regards
Adam K

Michael_Schilli via RT wrote:
> This message about PPI was sent to you by MSCHILLI <MSCHILLI@cpan.org> via rt.cpan.org
>
> Full context and any attached attachments can be found at:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=12722 >
>
> The PPI tokenizer chokes on Perl modules containing umlauts in their embedded POD documentation. Example:
>
> wget
> http://search.cpan.org/src/MSCHILLI/Log-Log4perl-0.51/lib/Log/Log4perl.pm
>
> #!/usr/bin/perl
> use PPI::Document;
> my $d = PPI::Document->load("Log4perl.pm");
> $d or print PPI::Tokenizer::errstr(), "\n";
>
> results "Source code contains unsupported characters (first one encountered was 'ü')" because of the line
>
> Ceki Gülcü, "Short introduction to log4j",
>
> somewhere in the POD part. Would be great if Latin-1 chars would be acceptable as well, perl allows them in strings, regexes and POD.
>
> Anyway, thanks for this great module!
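The unit tests requested above might start out along the following lines. This is only a sketch under stated assumptions: it assumes a PPI version in which `PPI::Document->new` accepts a reference to a source string, the list of positions is illustrative rather than taken from the ticket, and the places where Latin-1 should be rejected are not named in the ticket, so none are included. Against a PPI release with Latin-1 support disabled, these tests would be expected to fail.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

# Skip cleanly when PPI is not installed.
eval { require PPI::Document; 1 }
    or plan skip_all => 'PPI is required for these tests';

# Each case is a small source fragment carrying the Latin-1 byte \xFC
# (u-umlaut) in a different lexical position. Illustrative, not exhaustive.
my @cases = (
    [ 'POD'                  => "=pod\n\nCeki G\xFClc\xFC\n\n=cut\n" ],
    [ 'comment'              => "# G\xFClc\xFC\nmy \$x = 1;\n"       ],
    [ 'single-quoted string' => "my \$s = 'G\xFClc\xFC';\n"          ],
    [ 'double-quoted string' => "my \$s = \"G\xFClc\xFC\";\n"        ],
    [ 'regex'                => "\$s =~ /G\xFClc\xFC/;\n"            ],
);

for my $case (@cases) {
    my ($where, $source) = @$case;
    # Assumption: new() takes a scalar reference and returns undef on failure.
    my $doc = PPI::Document->new(\$source);
    ok( defined $doc, "Latin-1 byte parses inside $where" );
}

done_testing();
```

Keeping one fragment per lexical position means a failure names the exact corner that still rejects Latin-1.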
This is a duplicate of bug 11682

