This queue is for tickets about the PPI CPAN distribution.

Report information
The Basics
Id: 12722
Status: resolved
Priority: 0
Queue: PPI

People
Owner: Nobody in particular
Requestors: cpan [...] perlmeister.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.903
Fixed in: (no value)



Subject: PPI 0.906 chokes on embedded POD with umlauts
The PPI tokenizer chokes on Perl modules containing umlauts in their embedded POD documentation. Example:

    wget http://search.cpan.org/src/MSCHILLI/Log-Log4perl-0.51/lib/Log/Log4perl.pm

    #!/usr/bin/perl
    use PPI::Document;
    my $d = PPI::Document->load("Log4perl.pm");
    $d or print PPI::Tokenizer::errstr(), "\n";

results

    "Source code contains unsupported characters (first one encountered was '�')"

because of the line

    Ceki Gülcü, "Short introduction to log4j",

somewhere in the POD part. Would be great if Latin-1 chars would be acceptable as well, perl allows them in strings, regexes and POD.

Anyway, thanks for this great module!
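To find which lines of a file would trigger this kind of error, one option is to scan the source for bytes outside 7-bit ASCII. The following is a minimal sketch using core Perl only; `find_high_bit_lines` is a hypothetical helper name and is not part of PPI, and the sample text is a made-up stand-in for the POD in Log4perl.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of PPI): return [line_number, line_text]
# pairs for every line of $source that contains at least one byte
# outside the 7-bit ASCII range.
sub find_high_bit_lines {
    my ($source) = @_;
    my @hits;
    my $line_no = 0;
    for my $line ( split /\n/, $source, -1 ) {
        $line_no++;
        push @hits, [ $line_no, $line ] if $line =~ /[^\x00-\x7F]/;
    }
    return @hits;
}

# Small in-memory sample; "\xFC" is u-umlaut in ISO-8859-1.
my $sample = "package Foo;\n\n=head1 SEE ALSO\n\nCeki G\xFClc\xFC\n\n=cut\n\n1;\n";

for my $hit ( find_high_bit_lines($sample) ) {
    printf "line %d: %s\n", @$hit;
}
```

For a real file, the contents could be read in raw mode and passed to the same helper, so the bytes are inspected exactly as they appear on disk.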
Date: Mon, 09 May 2005 13:21:26 +1000
From: Adam Kennedy <adam [...] phase-n.com>
Subject: Re: [cpan #12722] PPI 0.906 chokes on embedded POD with umlauts
PPI is about 80-90% capable of handling all of latin1. But in a few places it isn't capable.

The errors from those places were drowning out all other legitimate errors, so I've disabled any support for full latin-1 manually at this time.

If you would like to help, I would really appreciate some unit tests specifically testing where latin-1 both _is_ and _isn't_ allowed, so that I can clean up the various corners where there are problems and be sure that they are working sufficiently well.

Regards

Adam K

Michael_Schilli via RT wrote:
> This message about PPI was sent to you by MSCHILLI <MSCHILLI@cpan.org> via rt.cpan.org
>
> Full context and any attached attachments can be found at:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=12722 >
>
> The PPI tokenizer chokes on Perl modules containing umlauts in their embedded POD documentation. Example:
>
> wget
> http://search.cpan.org/src/MSCHILLI/Log-Log4perl-0.51/lib/Log/Log4perl.pm
>
> #!/usr/bin/perl
> use PPI::Document;
> my $d = PPI::Document->load("Log4perl.pm");
> $d or print PPI::Tokenizer::errstr(), "\n";
>
> results "Source code contains unsupported characters (first one encountered was '�')" because of the line
>
> Ceki Gülcü, "Short introduction to log4j",
>
> somewhere in the POD part. Would be great if Latin-1 chars would be acceptable as well, perl allows them in strings, regexes and POD.
>
> Anyway, thanks for this great module!
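The unit tests requested in the reply above could start from something like the sketch below. It is not taken from the PPI test suite. It assumes a PPI release in which `PPI::Document->new` accepts a reference to a source string and `PPI::Document->errstr` reports the last error; the 0.9xx API discussed in this ticket may differ. The list of syntactic positions is illustrative, and it covers only places where the report says perl accepts Latin-1. Cases where Latin-1 should be rejected are left out, because the ticket does not say what the expected behaviour is there.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use PPI::Document;

# Each case is a label plus a snippet that carries "\xFC" (u-umlaut in
# ISO-8859-1) in exactly one kind of syntactic position.
# The choice of positions is an assumption made for illustration.
my @cases = (
    [ 'POD'                  => "=pod\n\nCeki G\xFClc\xFC\n\n=cut\n" ],
    [ 'comment'              => "# G\xFClc\xFC\n1;\n" ],
    [ 'double-quoted string' => "my \$s = \"G\xFClc\xFC\";\n" ],
    [ 'single-quoted string' => "my \$s = 'G\xFClc\xFC';\n" ],
    [ 'regex'                => "\$_ =~ /G\xFClc\xFC/;\n" ],
);

for my $case (@cases) {
    my ( $where, $source ) = @$case;

    # Parse from an in-memory string rather than a file on disk.
    my $doc = PPI::Document->new( \$source );

    # On failure, show whatever error message PPI recorded.
    ok( $doc, "Latin-1 byte accepted in $where" )
        or diag( PPI::Document->errstr );
}

done_testing();
```

Keeping one position per case means a failing test names the exact corner that still rejects Latin-1, which is the kind of isolation the reply asks for.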
This is a duplicate of bug 11682

