
This queue is for tickets about the podlators CPAN distribution.

Report information
The Basics
Id: 68741
Status: open
Priority: 0/
Queue: podlators

People
Owner: Nobody in particular
Requestors: matt.lawrence [...] virgin.net
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.00
Fixed in: (no value)



Subject: Replacement of some characters with X
I noticed that things like E<copy> and E<pound> are rendered as X by Pod::Man, though these are rendered as expected in HTML etc. The correct behaviour seems to occur with "pod2man --utf8", but I couldn't find an easy way of making perldoc pass that option on. The attached patch worked for me, it adds mappings to roff escapes for latin-1 characters 0xa0 to 0xbf, 0xd7 and 0xf7. Based on the groff_chars man page.
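For context, the X appears to come from a table lookup with a fallback: each non-ASCII character is mapped through an escape table, and X is substituted when the entry is undefined. A minimal sketch of that idea follows; `escape_non_ascii` and the two one-entry tables are hypothetical illustrations, not Pod::Man's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper, not Pod::Man's real code: replace each non-ASCII
# character with its entry in $table, or with "X" when no entry is defined.
sub escape_non_ascii {
    my ($text, $table) = @_;
    $text =~ s{([^\x00-\x7F])}{ $table->{ ord $1 } // 'X' }eg;
    return $text;
}

# Illustrative one-entry tables keyed by code point (0xA5 is the yen sign).
my %unpatched = (0xA5 => undef);       # no escape known: falls back to X
my %patched   = (0xA5 => '\\[Ye]');    # groff-style escape for the yen sign

print escape_non_ascii("It cost me \x{A5}12345!", \%unpatched), "\n";
print escape_non_ascii("It cost me \x{A5}12345!", \%patched), "\n";
```

The first call yields `It cost me X12345!` and the second `It cost me \[Ye]12345!`, which mirrors the before and after lines in the t/man.t change.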
Subject: pod_man_escapes.patch
--- lib/Pod/Man.pm
+++ lib/Pod/Man.pm
@@ -1315,22 +1315,56 @@
 # This only works in an ASCII world.  What to do in a non-ASCII world is very
 # unclear -- hopefully we can assume UTF-8 and just leave well enough alone.
 @ESCAPES{0xA0 .. 0xFF} = (
-    "\\ ", undef, undef, undef, undef, undef, undef, undef,
-    undef, undef, undef, undef, undef, "\\%", undef, undef,
-
-    undef, undef, undef, undef, undef, undef, undef, undef,
-    undef, undef, undef, undef, undef, undef, undef, undef,
-
+    # 0xa0
+    "\\ ", # non-breaking space
+    "\\[r!]", # inverted exclamation mark
+    "\\[ct]", # cent
+    "\\[Po]", # pound sterling
+    "\\[Cs]", # currency symbol
+    "\\[Ye]", # yen
+    "\\[bb]", # broken bar
+    "\\[sc]", # section
+    "\\[ad]", # diaresis
+    "\\[co]", # copyright
+    "\\[Of]", # feminine ordinal indicator
+    "\\[Fo]", # left guillemot
+    "\\[no]", # logical not
+    "\\%", # roff special
+    "\\[rg]", # registered
+    "\\[a-]", # macron
+
+    # 0xb0
+    "\\[de]", # degree
+    "\\[+-]", # plusminus
+    "\\[S2]", # superscript 2
+    "\\[S3]", # superscript 3
+    "\\[aa]", # acute accent
+    "\\[mc]", # micro sign
+    "\\[ps]", # paragraph
+    "\\[pc]", # centered period
+    "\\[ac]", # cedilla accent
+    "\\[S1]", # superscript 1
+    "\\[Om]", # masculine ordinal indicator
+    "\\[Fc]", # right guillemot
+    "\\[14]", # one quarter
+    "\\[12]", # one half
+    "\\[34]", # three quarters
+    "\\[r?]", # inverted question mark
+
+    # 0xc0
     "A\\*`", "A\\*'", "A\\*^", "A\\*~", "A\\*:", "A\\*o", "\\*(AE", "C\\*,",
     "E\\*`", "E\\*'", "E\\*^", "E\\*:", "I\\*`", "I\\*'", "I\\*^", "I\\*:",
 
-    "\\*(D-", "N\\*~", "O\\*`", "O\\*'", "O\\*^", "O\\*~", "O\\*:", undef,
+    # 0xd0
+    "\\*(D-", "N\\*~", "O\\*`", "O\\*'", "O\\*^", "O\\*~", "O\\*:", "\\[mu]",
     "O\\*/", "U\\*`", "U\\*'", "U\\*^", "U\\*:", "Y\\*'", "\\*(Th", "\\*8",
 
+    # 0xe0
     "a\\*`", "a\\*'", "a\\*^", "a\\*~", "a\\*:", "a\\*o", "\\*(ae", "c\\*,",
     "e\\*`", "e\\*'", "e\\*^", "e\\*:", "i\\*`", "i\\*'", "i\\*^", "i\\*:",
 
-    "\\*(d-", "n\\*~", "o\\*`", "o\\*'", "o\\*^", "o\\*~", "o\\*:", undef,
+    # 0xf0
+    "\\*(d-", "n\\*~", "o\\*`", "o\\*'", "o\\*^", "o\\*~", "o\\*:", "\\[di]",
     "o\\*/" , "u\\*`", "u\\*'", "u\\*^", "u\\*:", "y\\*'", "\\*(th", "y\\*:",
 ) if ASCII;
 
--- t/man.t
+++ t/man.t
@@ -226,11 +226,11 @@
 ###
 =head1 YEN
 
-It cost me E<165>12345! That should be an X.
+It cost me E<165>12345! That should not be an X.
 ###
 .SH "YEN"
 .IX Header "YEN"
-It cost me X12345! That should be an X.
+It cost me \[Ye]12345! That should not be an X.
 ###
 
 ###
From rra [...] stanford.edu Fri Jun 10 13:47:25 2011
Subject: Re: [rt.cpan.org #68741] Replacement of some characters with X
Date: Fri, 10 Jun 2011 10:47:16 -0700
To: bug-podlators [...] rt.cpan.org
From: Russ Allbery <rra [...] stanford.edu>
"Matthew Lawrence via RT" <bug-podlators@rt.cpan.org> writes: Show quoted text
> I noticed that things like E<copy> and E<pound> are rendered as X by > Pod::Man, though these are rendered as expected in HTML etc. The correct > behaviour seems to occur with "pod2man --utf8", but I couldn't find an > easy way of making perldoc pass that option on.

There's some discussion about changing the default to assume UTF-8 output under at least some circumstances. The current behavior is not a bug -- it's intentional, because old versions of *roff on some platforms will segfault and core dump when given 8-bit characters. pod2man has always produced maximally conservative output by default because the generated output is intended for distribution. However, it looks like those platforms have mostly died out, and it's probably time to start doing something else.

The question is: what else to do? The problem with character sets is that you don't know which one to choose. We can blindly output UTF-8, but that means that if someone views the page in a locale that isn't UTF-8, they're going to get mangled garbage. (Of course, the X's are already mangled garbage, so this is probably not that much of a drawback.)

I'm currently leaning towards outputting UTF-8 by default, but I'm kicking around the idea of trying to use the user's locale.
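
The locale-based option mentioned above could be approached by asking the C library for the current locale's codeset. This is a minimal sketch, assuming a POSIX-like platform where `I18N::Langinfo` is available; `codeset_is_utf8` and `formatter_options` are hypothetical helpers and are not part of podlators.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: true if a codeset name denotes UTF-8.
sub codeset_is_utf8 {
    my ($codeset) = @_;
    return (defined $codeset && $codeset =~ /\Autf-?8\z/i) ? 1 : 0;
}

# Hypothetical helper: choose Pod::Man options from the user's locale.
sub formatter_options {
    setlocale(LC_CTYPE, '');              # adopt the locale from the environment
    my $codeset = langinfo(CODESET);      # e.g. "UTF-8" or "ISO-8859-1"
    return codeset_is_utf8($codeset) ? (utf8 => 1) : ();
}

my %options = formatter_options();
my $mode = %options ? 'UTF-8 output' : 'conservative ASCII output';
print "Locale suggests: $mode\n";
```

Note that this only reflects the locale of whoever runs the formatter, which may differ from the reader's locale when pages are generated for distribution.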

> The attached patch worked for me, it adds mappings to roff escapes for
> latin-1 characters 0xa0 to 0xbf, 0xd7 and 0xf7. Based on the groff_chars
> man page.

This we definitely cannot do, since those escapes are groff-specific and Perl supports platforms other than Linux.

-- 
Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>

