 

This queue is for tickets about the Pod-Perldoc CPAN distribution.

Report information
The Basics
Id: 80527
Status: resolved
Priority: 0/
Queue: Pod-Perldoc

People
Owner: Nobody in particular
Requestors:
Cc: explorer [...] joaquinferrero.com
AdminCc:

Bug Information
Severity: Normal
Broken in: 3.17
Fixed in: 3.20



CC: explorer [...] joaquinferrero.com
Subject: perldoc cannot find functions sections when is called with -L switch
When perldoc is called with the '-L XX' and '-f' switches, it cannot find the section for the requested function if the string returned by POD2::XX::search_perlfunc_re() (i.e. the string that marks the beginning of the section of perlfunc.pod that describes the available functions) is not encoded as ISO-8859-1.

    perldoc -L ES perlfunc

works, but

    perldoc -L ES -f chr

returns the following error message:

    No documentation for perl function 'chr' found

Interim solution: search_perlfunc_re() should return an ISO-8859-1-encoded string that fully (or partially) matches the corresponding string in perlfunc.pod.

Proposed solution: the search for the string returned by search_perlfunc_re() should take into account the encoding used by perlfunc.pod.

In POD2::ES all the docs are UTF-8-encoded. As a temporary workaround, we have fixed this issue by removing the word with diacritical marks:

    sub search_perlfunc_re {
        return 'Lista de funciones de Perl en orden';
    }

(removed ‘alfabético’, i.e. "alphabetical")
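A minimal Perl sketch of why the accented heading fails while the ASCII-only workaround succeeds. It assumes perldoc reads perlfunc.pod without a decoding layer and matches the heading with a regex roughly like `/^=head2 $re/`; the exact regex inside Pod::Perldoc may differ, and the file read is simulated with Encode::encode rather than real file I/O.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The heading POD2::ES wants perldoc to find, as a character string.
# "\x{E9}" is "é"; POD2::ES writes it literally under "use utf8".
my $re = "Lista de funciones de Perl en orden alfab\x{E9}tico";

# A perlfunc.pod line as it arrives when a UTF-8 file is read WITHOUT a
# decoding layer (assumption); simulated by encoding characters to bytes.
my $raw_line = encode('UTF-8', "=head2 $re\n");

# Characters vs. bytes: "é" is one character in $re but the two bytes
# 0xC3 0xA9 in $raw_line, so this match fails.
my $raw_match = $raw_line =~ /^=head2 \Q$re\E/ ? 1 : 0;

# The workaround from the report: an ASCII-only prefix is identical in
# bytes and characters, so it matches the undecoded line.
my $ascii_re    = 'Lista de funciones de Perl en orden';
my $ascii_match = $raw_line =~ /^=head2 \Q$ascii_re\E/ ? 1 : 0;

# Decoding the line first (as an :encoding(UTF-8) read layer would) puts
# both sides in characters, so the full accented heading matches.
my $decoded_match = decode('UTF-8', $raw_line) =~ /^=head2 \Q$re\E/ ? 1 : 0;

print "raw: $raw_match, ascii-only: $ascii_match, decoded: $decoded_match\n";
```

Running it prints `raw: 0, ascii-only: 1, decoded: 1`: the problem is the mismatch between a character-string pattern and undecoded bytes, not the accented word itself.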
Thanks for the report. We definitely need to fix this.
On Wed Oct 31 11:44:36 2012, explorer@joaquinferrero.com wrote:
> When perldoc is called using the '-L XX' and '-f' switches, it
> cannot find the corresponding section of the requested function
> if the string returned by POD2::XX::search_perlfunc_re() (i.e.
> the string that marks the beginning of the section in
> perlfunc.pod that contains the descriptions of the functions
> available) is not encoded as iso-8859-1.
There's no easy way to reliably tell what encoding a given file is in, so I am wondering if we should have a callback function in POD2::XX, such as search_perlfunc_re_encoding(), which returns a string scalar like "latin1" or "utf8" or whatever is appropriate.

What do you think about that?

Thanks.

Mark
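A rough sketch of how the proposed callback might be consumed. POD2::XX stands for any translation module; search_perlfunc_re_encoding() is only the name floated here (not an existing API), perlfunc_re_bytes() is a hypothetical perldoc-side helper, and the ISO-8859-1 fallback for modules without the callback is an assumption.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical translation module implementing the proposed callback.
package POD2::XX;
sub search_perlfunc_re          { return "Lista de funciones de Perl en orden alfab\x{E9}tico" }
sub search_perlfunc_re_encoding { return 'UTF-8' }    # proposed name, not an existing API

package main;

# Hypothetical perldoc-side helper: encode the heading into the byte
# encoding the translation declares, so it can be matched against
# perlfunc.pod lines that were read without decoding.
sub perlfunc_re_bytes {
    my ($module) = @_;
    my $re  = $module->search_perlfunc_re;
    my $enc = $module->can('search_perlfunc_re_encoding')
        ? $module->search_perlfunc_re_encoding
        : 'iso-8859-1';    # assumed default for modules without the callback
    return encode($enc, $re);
}

# A perlfunc.pod line as raw UTF-8 bytes.
my $line   = encode('UTF-8', "=head2 Lista de funciones de Perl en orden alfab\x{E9}tico\n");
my $needle = perlfunc_re_bytes('POD2::XX');
my $found  = $line =~ /^=head2 \Q$needle\E/ ? 1 : 0;
print "found: $found\n";
```

This keeps perldoc working on bytes, which matches how it historically treated files; the trade-off is that every translation must declare its encoding correctly.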
On 2013-01-29 04:49:33, mallen wrote:
> There's no easy way to tell what encoding a given file is in reliably,

This is irrelevant to the issue. Perl source encoding is specified with "use utf8" or "use encoding ...". POD source encoding is specified with "=encoding ...".

> so I am wondering if we should have a callback function in POD2::XX like
> search_perlfunc_re_encoding() which returns a string scalar like
> "latin1" or "utf8" or whatever is appropriate.

POD2::ES has "use utf8" at the beginning, so search_perlfunc_re() returns a Unicode string (what Perl internals call "utf8"; see the utf8 module). So POD2::ES seems fine: it is Perldoc that must be fixed.

Mark, are you sure that Perldoc correctly processes POD sections after decoding them from bytes using the encoding specified by "=encoding"?

--
Olivier Mengué - http://perlresume.org/DOLMEN
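To illustrate Olivier's point about "use utf8", a small sketch comparing the same word as bytes and as characters. The byte string stands in for what a literal in a UTF-8 source file holds without "use utf8"; decoding it yields what the literal holds under "use utf8". This is a simulation, not POD2::ES code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Without "use utf8", the literal 'alfabético' in a UTF-8 source file is a
# byte string in which "é" occupies two bytes; simulated with escapes.
my $bytes = "alfab\xC3\xA9tico";

# Under "use utf8", the same literal is a character string; decoding the
# bytes produces the equivalent value.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, chars: %d\n", length $bytes, length $chars;
```

It prints `bytes: 11, chars: 10`, which is why a character-string pattern from POD2::ES only matches text that has also been decoded.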
From explorer [...] joaquinferrero.com Wed Jan 30 16:40:15 2013
Date: Wed, 30 Jan 2013 22:39:56 +0100
From: Joaquin Ferrero <explorer [...] joaquinferrero.com>
On 29/01/13 04:49, Mark Allen via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=80527 >
>
> There's no easy way to tell what encoding a given file is in reliably,
> so I am wondering if we should have a callback function in POD2::XX like
> search_perlfunc_re_encoding() which returns a string scalar like
> "latin1" or "utf8" or whatever is appropriate.
>
> What do you think about that?
Another solution:

1) Edit perlfunc.pod, find the line

       =head2 Alphabetical Listing of Perl Functions

   and add one line below it:

       X<Alphabetical Listing of Perl Functions>

   or

       =for Pod::Functions Alphabetical Listing of Perl Functions

2) Modify perldoc to search for this line instead of the =head2 line.

perldoc will display the =head2 line, but not the =for line.

With this solution, neither the translation teams nor perldoc need the search_perlfunc_re() function anymore :)

Best Regards,
JF^D
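A minimal sketch of JF's marker idea: search for a fixed ASCII line instead of the translated heading, so the encoding of the heading no longer matters. The =for marker is only the one proposed here (perlfunc.pod may not contain it), lines_after_marker() is a hypothetical helper, and the POD stand-in omits the blank lines real POD paragraphs need.

```perl
use strict;
use warnings;

# Proposed (not current) marker: an ASCII line identical in every
# translation, placed right after the translated =head2.
my $MARKER = qr/^=for Pod::Functions Alphabetical Listing of Perl Functions\s*$/;

# Hypothetical helper: return every line after the marker, or an empty
# list if the marker is missing.
sub lines_after_marker {
    my ($fh) = @_;
    my ($seen, @out) = (0);
    while (my $line = <$fh>) {
        if ($seen) { push @out, $line; next }
        $seen = 1 if $line =~ $MARKER;
    }
    return @out;
}

# Simplified stand-in for a translated perlfunc.pod, kept as raw UTF-8
# bytes; no decoding is needed because the marker is pure ASCII.
my $pod = "=head2 Lista de funciones de Perl en orden alfab\xC3\xA9tico\n"
        . "=for Pod::Functions Alphabetical Listing of Perl Functions\n"
        . "=item chr NUMBER\n";
open my $fh, '<', \$pod or die "cannot open in-memory POD: $!";
my @section = lines_after_marker($fh);
close $fh;
print scalar(@section), " line(s) after marker: $section[0]";
```

The design choice is to move the language-dependent text out of the search entirely; the cost is a change to every perlfunc.pod, original and translated.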
On Wed Jan 30 10:48:30 2013, DOLMEN wrote:
> So POD2::ES seems fine. This is Perldoc that must be fixed.
>
> Mark, are you sure that Perldoc correctly process POD sections after
> having decoded it from bytes to the encoding specified by "=encoding" ?
Perldoc *doesn't* interpret POD at all. Its jobs are:

1) Locate the appropriate file (or section of perlfunc, etc.)
2) Feed the file to the appropriate formatter
3) Dump the formatted output from step #2 to a pager

So perldoc has no way of correctly interpreting =encoding directives without parsing through the file and looking for them. It's arguable that it *ought* to do that, but historically it hasn't.

That's why I suggested making the encoding of the POD2::XX regex a callback. We could instead *assume* that the POD2::XX string is Latin-1 unless Encode::is_utf8 returns true. Or vice versa (assume it's UTF-8 unless is_utf8 returns false).

Other thoughts?

Thanks for your help on this.

Mark
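A sketch of the Encode::is_utf8 heuristic Mark floats, mainly to show its limit. guess_re_encoding() is a hypothetical name; the UTF8 flag reflects Perl's internal representation rather than the text's origin, so this is a guess, not encoding detection.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper for the heuristic above: Latin-1 unless Perl has
# flagged the string as internally UTF-8.
sub guess_re_encoding {
    my ($re) = @_;
    return Encode::is_utf8($re) ? 'utf8' : 'latin1';
}

# Like a literal under "use utf8": a decoded character string.
my $from_use_utf8 = Encode::decode('UTF-8', "alfab\xC3\xA9tico");

# Like the same literal without "use utf8": undecoded UTF-8 bytes.
my $plain_bytes = "alfab\xC3\xA9tico";

printf "%s / %s\n", guess_re_encoding($from_use_utf8), guess_re_encoding($plain_bytes);
```

It prints `utf8 / latin1`. The second result is the weak spot: UTF-8 bytes that were never decoded are classified as Latin-1, which is the case this ticket is about.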
On Wed Oct 31 11:44:36 2012, explorer@joaquinferrero.com wrote:
> In POD2::ES all the docs are UTF-8-encoded. As a temporary
> solution, we have fixed this issue by removing characters with
> diacritic marks:
>
> sub search_perlfunc_re {
>     return 'Lista de funciones de Perl en orden';
> }
>
> (removed ‘alfabético’)
OK, I found the problem. When perldoc opens filehandles for "dynamic" POD files, such as extracts from perlfunc.pod, it doesn't open them as UTF-8, so we now make sure to do so and add an '=encoding utf8' line at the top. This has the happy side effect of making the full regex with diacritical marks work properly (at least on my local Pod::Perldoc).

Somewhere in the tool chain, you need the latest Pod::Simple and Pod::Text distributions from CPAN, as they have much better UTF-8 support now.

This is fixed in Pod::Perldoc 3.20, which is headed to CPAN shortly.

Thanks.
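A sketch of the fix as described: write an extracted section through a UTF-8 output layer and put an =encoding line first so the formatter decodes it correctly. This illustrates the idea only; it is not Pod::Perldoc 3.20's actual code, and write_dynamic_pod() and the temporary-file approach are assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write extracted POD lines (character strings) to a
# temporary file that formatters can decode reliably.
sub write_dynamic_pod {
    my (@lines) = @_;
    my ($fh, $filename) = tempfile(SUFFIX => '.pod', UNLINK => 1);
    binmode $fh, ':encoding(UTF-8)';    # encode characters as UTF-8 bytes on output
    print {$fh} "=encoding utf8\n\n";   # declare the byte encoding to the formatter
    print {$fh} @lines;
    close $fh or die "close failed: $!";
    return $filename;
}

# "\x{DA}" is "Ú", as in the Spanish "chr NÚMERO" entry.
my $file = write_dynamic_pod("=item chr N\x{DA}MERO\n");

# Read the file back as raw bytes to inspect what was written.
open my $in, '<:raw', $file or die "cannot reopen $file: $!";
my $bytes = do { local $/; <$in> };
close $in;
print $bytes =~ /\A=encoding utf8\n\n/ ? "header ok\n" : "header missing\n";
```

Declaring the encoding inside the file lets Pod::Simple-based formatters decode it without guessing, which is why newer Pod::Simple and Pod::Text releases matter for the fix.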
From: Joaquin Ferrero <explorer [...] joaquinferrero.com>
On Sat Apr 27 01:31:05 2013, mallen wrote:
> This is fixed in Pod::Perldoc 3.20 which headed to CPAN shortly.
Confirmed. POD2/ES.pm, lines 55-58:

    # String for perldoc with -L switch
    sub search_perlfunc_re {
        return 'Lista de funciones de Perl en orden alfabético';
    }

(I added the word "alfabético", with the "é" UTF-8 character.)

Now, perldoc -f <function> works perfectly:

    $ perldoc -f chr
        chr NÚMERO
        chr Devuelve el carácter representado por NÚMERO en el conjunto de caracteres. Por ejemplo, "chr(65)" es "A" tanto en ASCII como en Unicode, y chr(0x263a) es una cara sonriente en Unicode.

(In English: "Returns the character represented by NUMBER in the character set. For example, "chr(65)" is "A" in both ASCII and Unicode, and chr(0x263a) is a smiley face in Unicode.")

Thanks!
From explorer [...] joaquinferrero.com Thu Feb 6 08:05:21 2014
From: Joaquín Ferrero <explorer [...] joaquinferrero.com>
On 29/01/13 04:49, Mark Allen via RT wrote:
> There's no easy way to tell what encoding a given file is in reliably,
> so I am wondering if we should have a callback function in POD2::XX like
> search_perlfunc_re_encoding() which returns a string scalar like
> "latin1" or "utf8" or whatever is appropriate.
Yes, it's true. The Spanish PerlDoc team suggested converting all the original English POD documentation to UTF-8, but this proposal was not approved. The Spanish versions are all UTF-8-encoded, and other language translations will be too. At the moment, 31 of the 169 English PODs have the encoding line :)

The good part is that knowing the encoding of a POD document is easy: every POD is ISO-8859-1 unless it has an =encoding tag stating its encoding. The remaining problem is to build a regex compatible with that encoding, so that perldoc can find the start of the list of functions in perlfunc.

A search_perlfunc_re_encoding() function could read the first lines of perlfunc.pod and report the encoding, but perldoc can do that itself, too.

Other solution:

1) edit perlfunc.pod, and at line
2) remove all the code about

I will talk with the Spanish PerlDoc team, and we will send you another email.

Best Regards,
JF^D

--
Sent from my phone with K-9 Mail.
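A sketch of JF's suggestion that perldoc detect the encoding itself: look for an =encoding line, fall back to ISO-8859-1 as his rule states, decode, then match the character-string heading. pod_encoding() and find_heading() are hypothetical names, and the default follows JF's rule as stated in the thread; POD formatters such as Pod::Simple may apply their own guessing instead.

```perl
use strict;
use warnings;
use Encode qw(decode);

# JF's rule as stated above: a POD file is ISO-8859-1 unless it carries
# an =encoding line. Takes the raw bytes of the file.
sub pod_encoding {
    my ($raw) = @_;
    return $1 if $raw =~ /^=encoding\s+(\S+)/m;
    return 'iso-8859-1';
}

# Decode with the detected encoding, then search with a character-string
# heading, so both sides of the match are characters.
sub find_heading {
    my ($raw, $heading) = @_;
    my $text = decode(pod_encoding($raw), $raw);
    return $text =~ /^=head2 \Q$heading\E/m ? 1 : 0;
}

my $es = "=encoding utf8\n\n=head2 Lista de funciones de Perl en orden alfab\xC3\xA9tico\n";
my $en = "=head2 Alphabetical Listing of Perl Functions\n";

print find_heading($es, "Lista de funciones de Perl en orden alfab\x{E9}tico"), "\n";
print find_heading($en, 'Alphabetical Listing of Perl Functions'), "\n";
```

Both calls print `1`: once the file is decoded, the accented heading from POD2::ES and the plain English heading are found the same way.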

