
This queue is for tickets about the Pod-XML CPAN distribution.

Report information
The Basics
Id: 21304
Status: resolved
Priority: 0/
Queue: Pod-XML

People
Owner: mattw [...] mattsscripts.co.uk
Requestors: SREZIC [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.96
Fixed in: (no value)



Subject: Wrong entities in Pod::XML output
It seems that Pod::XML uses standard HTML entities for non-ASCII characters. This is not correct for XML output, as there are no entities predefined in XML (except lt, gt, amp and a few others). There are some solutions for this problem:

* Do not use entities at all, but use characters in the encoding specified in the XML preamble;
* Define all entities you are using; or
* Use numeric entities like &#196; or &#xff;

I find the last solution the best and easiest.

You can check XML validity with xmllint from libxml2. For example, this is the output from a Pod containing a lot of German umlauts (the tail +2 is necessary because of a bug reported before):

    $ pod2xml < /home/e/eserte/src/bbbike/doc/tests.pod | tail +2 | xmllint -
    -:18: parser error : Entity 'auml' not defined
    Progressbar w&auml;chst in vern&uuml;nftigen Schritten.
    ^
    -:18: parser error : Entity 'uuml' not defined
    Progressbar w&auml;chst in vern&uuml;nftigen Schritten.
    ^
    -:21: parser error : Entity 'ouml' not defined
    Hauptfenster &ouml;ffnet sich. Das Fenster sollte m&ouml;glichst gro&szlig; sein
    ^
    -:21: parser error : Entity 'ouml' not defined
    ...

Regards,
    Slaven
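The last option in the report, numeric character references, could be sketched in Perl roughly as follows. This is only an illustration and not Pod::XML's actual code: the helper name `numeric_entities` is hypothetical, and the sketch assumes the input is already a decoded Perl character string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Pod::XML): replace every non-ASCII
# character with a decimal numeric character reference, which needs no
# entity declaration in the XML document.
sub numeric_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# "\x{E4}" is a-umlaut (U+00E4, decimal 228).
print numeric_entities("Progressbar w\x{E4}chst"), "\n";
# prints: Progressbar w&#228;chst
```

ASCII text passes through unchanged, so the helper can be applied to all text content before it is written out.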
All done.
From: Slaven Rezic <slaven [...] rezic.de>
To: bug-Pod-XML [...] rt.cpan.org
Date: 08 Sep 2006 01:11:34 +0200
Subject: Re: [rt.cpan.org #21304] Resolved: Wrong entities in Pod::XML output
"Matt Wilson via RT" <bug-Pod-XML@rt.cpan.org> writes:

> <URL: http://rt.cpan.org/Ticket/Display.html?id=21304 >
>
> According to our records, your request has been resolved. If you have any
> further questions or concerns, please respond to this message.
Unfortunately escaping of < and > does not work anymore. Consider the following Pod:

    =head1 AUTHOR

    Slaven Rezic <srezic@cpan.org>

    =cut

Using pod2xml and validation with xmllint:

    $ pod2xml < /tmp/bla.pod | xmllint -
    -:9: parser error : error parsing attribute name
    Slaven Rezic <srezic@cpan.org>
    ^
    -:9: parser error : attributes construct error
    Slaven Rezic <srezic@cpan.org>
    ^
    -:9: parser error : Couldn't find end of Start Tag srezic line 9
    Slaven Rezic <srezic@cpan.org>
    ^
    Exit 1

Regards,
    Slaven

-- 
Slaven Rezic - slaven <at> rezic <dot> de

    babybike - routeplanner for cyclists in Berlin
    handheld (e.g. Compaq iPAQ with Linux) version of bbbike
    http://bbbike.sourceforge.net
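The xmllint errors above arise because the literal < and > reach the output unescaped. A minimal sketch of escaping markup characters in text content is shown here. It is not the fix that was applied to Pod::XML, and the names `%XML_ESCAPE` and `escape_xml_text` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical lookup table (not from Pod::XML): the characters that
# must not appear literally in XML text content.
my %XML_ESCAPE = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');

# Escape all three characters in one pass, so that the '&' introduced
# by a replacement is never escaped a second time.
sub escape_xml_text {
    my ($text) = @_;
    $text =~ s/([&<>])/$XML_ESCAPE{$1}/g;
    return $text;
}

print escape_xml_text('Slaven Rezic <srezic@cpan.org>'), "\n";
# prints: Slaven Rezic &lt;srezic@cpan.org&gt;
```

Doing the substitution in a single regex pass, rather than one `s///` per character, avoids depending on the order in which the replacements run.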
Should be all fixed now, including several potential bugs I discovered while investigating this one.

