PPI can handle about 80-90% of Latin-1, but there are a few places
where it can't. The errors from those places were drowning out all the
other, legitimate errors, so for now I've manually disabled full
Latin-1 support.

If you would like to help, I would really appreciate some unit tests
that specifically check where Latin-1 both _is_ and _isn't_ allowed, so
that I can clean up the various corners where there are problems and be
sure they are working well enough.
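To make that request concrete, here is a minimal Perl sketch of how such cases might be laid out, with one sample per context. The helper `first_unsupported_char` is hypothetical and is not PPI's code. It assumes a context-blind scan for anything outside 7-bit ASCII, which is what the "unsupported characters" error in the report suggests. Real tests would feed each sample to PPI instead and assert whether it loads.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a blanket "unsupported characters" check.
# This is NOT PPI's code: it returns the first character outside
# 7-bit ASCII, or undef when the source is pure ASCII.
sub first_unsupported_char {
    my ($source) = @_;
    return $source =~ /([^\x00-\x7F])/ ? $1 : undef;
}

# One sample per context. \x{FC} is u-umlaut and \x{E9} is e-acute,
# written as escapes so this file itself stays pure ASCII.
my %samples = (
    ascii   => qq{my \$x = 1;\n},
    pod     => qq{=pod\n\nCeki G\x{FC}lc\x{FC}\n\n=cut\n},
    string  => qq{my \$s = "caf\x{E9}";\n},
    regex   => qq{\$s =~ /caf\x{E9}/;\n},
    comment => qq{# G\x{FC}lc\x{FC}\n},
);

# A context-blind check rejects Latin-1 everywhere, even in POD,
# strings, regexes and comments, all of which perl itself accepts.
for my $context (sort keys %samples) {
    my $bad = first_unsupported_char($samples{$context});
    printf "%-8s %s\n", $context,
        defined $bad ? sprintf('rejected (U+%04X)', ord $bad) : 'accepted';
}
```

Run as-is, this reports `accepted` only for the ASCII sample and names the first rejected character for every other context, which mirrors the behaviour described in the report. Turning each row into a Test::More `ok(...)` with an expected result per context would give the kind of is/isn't-allowed coverage asked for here.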
Regards
Adam K
Michael_Schilli via RT wrote:
> This message about PPI was sent to you by MSCHILLI <MSCHILLI@cpan.org> via rt.cpan.org
>
> Full context and any attached attachments can be found at:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=12722 >
>
> The PPI tokenizer chokes on Perl modules containing umlauts in their embedded POD documentation. Example:
>
> wget http://search.cpan.org/src/MSCHILLI/Log-Log4perl-0.51/lib/Log/Log4perl.pm
>
> #!/usr/bin/perl
> use PPI::Document;
> my $d = PPI::Document->load("Log4perl.pm");
> $d or print PPI::Tokenizer::errstr(), "\n";
>
> results in "Source code contains unsupported characters (first one encountered was 'ü')" because of the line
>
> Ceki Gülcü, "Short introduction to log4j",
>
> somewhere in the POD section. It would be great if Latin-1 characters were accepted as well; perl allows them in strings, regexes and POD.
>
> Anyway, thanks for this great module!