This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 64569
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: dwheeler [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



From dwheeler [...] cpan.org Fri Jan 7 03:13:52 2011
Message-ID: <C78C9DBB-9704-46E4-8AC0-6E0220F0ECCF [...] cpan.org>
Subject: Unfortunate Recovery Moves Elements Around
Date: Fri, 7 Jan 2011 00:13:40 -0800
To: bug-xml-libxml [...] rt.cpan.org
From: "David E. Wheeler" <dwheeler [...] cpan.org>
Given this XML:

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <channel>
        <title>No Closing, Man</title>
        <link>http://blog.noclosing.com</link>
        <language>en-us</language>
        <ttl>40</ttl>
        <description>This is horked.</description>
        <item>
          <dc:creator>Jo Mama</dc:creator>
          <title>Welcome to the Jungle</title>
          <description><p><span>hi</p></description>
          <pubDate>Fri, 17 Dec 2010 16:35:00 +0000</pubDate>
          <guid>http://blog.noclosing.com/2710</guid>
          <link>http://blog.noclosing/2710.html</link>
        </item>
        <item>
          <dc:creator>Jamie</dc:creator>
          <title>Whatever</title>
          <description>This is the description</description>
          <pubDate>Fri, 31 Dec 2010 15:12:00 +0000</pubDate>
          <guid>http://blog.noclosing.com/2722</guid>
          <link>http://blog.noclosing/2722.html</link>
        </item>
      </channel>
    </rss>

Where the closing </span> is missing on line 12, I run

    use 5.12.0;
    use XML::LibXML;

    my $parser = XML::LibXML->new({
        recover    => 2,
        no_network => 1,
        no_blanks  => 1,
        no_cdata   => 1,
    });
    $parser->recover(2);
    say $parser->load_xml(string => $xml)->toString;

And XML::LibXML emits (I've run it through tidy here):

    <?xml version="1.0" encoding="UTF-8"?>
    <rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
      <channel>
        <title>No Closing, Man</title>
        <link>http://blog.noclosing.com</link>
        <language>en-us</language>
        <ttl>40</ttl>
        <description>This is horked.</description>
        <item>
          <dc:creator>Jo Mama</dc:creator>
          <title>Welcome to the Jungle</title>
          <description>
            <p>
              <span>hi</span>
            </p>
            <pubDate>Fri, 17 Dec 2010 16:35:00 +0000</pubDate>
            <guid>http://blog.noclosing.com/2710</guid>
            <link>http://blog.noclosing/2710.html</link>
          </description>
          <item>
            <dc:creator>Jamie</dc:creator>
            <title>Whatever</title>
            <description>This is the description</description>
            <pubDate>Fri, 31 Dec 2010 15:12:00 +0000</pubDate>
            <guid>http://blog.noclosing.com/2722</guid>
            <link>http://blog.noclosing/2722.html</link>
          </item>
        </item>
      </channel>
    </rss>

Note that the closing </span> is nicely included, so it recovered that. However, the second <item> element has been moved inside the first! You can see this clearly by the nested closing </item> tags four lines from the bottom.

I don't know if this is an XML::LibXML bug or libxml2 bug, but I've attached a test case using this example.

Best,

David
Attachment: 46recover.t (text/x-perl, 2k)

Message body is not shown because sender requested not to inline it.
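
Since the attachment is not shown inline, the kind of check the report describes can be approximated with a short sketch. This is not the attached 46recover.t: the helper name `nested_item_count` and the trimmed-down feed are assumptions made purely for illustration. It parses in recovery mode with the same `recover => 2` setting used in the report and counts `<item>` elements that ended up inside another `<item>`. The count it prints may depend on the libxml2 version in use, so no expected value is given.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML or of 46recover.t):
# parse $xml in recovery mode and count <item> elements that have
# another <item> as an ancestor.
sub nested_item_count {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(
        string     => $xml,
        recover    => 2,    # recover without printing warnings, as in the report
        no_network => 1,
        no_blanks  => 1,
    );
    # //item//item matches any <item> nested anywhere below another <item>.
    return $doc->findnodes('//item//item')->size;
}

# Trimmed-down version of the feed from the report: the first
# <description> is missing its closing </span>.
my $feed = <<'XML';
<rss version="2.0">
  <channel>
    <item><description><p><span>hi</p></description></item>
    <item><description>This is the description</description></item>
  </channel>
</rss>
XML

printf "items nested inside another item: %d\n", nested_item_count($feed);
```

A result greater than zero would indicate the rearrangement described above. For well-formed input with sibling items the helper returns 0.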

From dwheeler [...] cpan.org Fri Jan 7 03:22:16 2011
In-Reply-To: <rt-3.8.HEAD-19311-1294388033-165.64569-3-0 [...] rt.cpan.org>
Message-ID: <0FC22153-761B-46A4-93D1-734E71BCA398 [...] cpan.org>
Subject: Re: [rt.cpan.org #64569] AutoReply: Unfortunate Recovery Moves Elements Around
Date: Fri, 7 Jan 2011 00:22:08 -0800
To: bug-XML-LibXML [...] rt.cpan.org
From: "David E. Wheeler" <dwheeler [...] cpan.org>
Ah, I get the same output from xmllint, so it's an issue with libxml2, not XML::LibXML. Apologies for the noise.

Best,

David
In-Reply-To: <C78C9DBB-9704-46E4-8AC0-6E0220F0ECCF [...] cpan.org>
Message-ID: <rt-3.8.HEAD-9059-1310329219-1698.64569-0-0 [...] rt.cpan.org>
Closing as UPSTREAM because it's a problem with libxml2. Thanks.

Regards,

-- Shlomi Fish
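
Because the rearrangement comes from libxml2's recovery mode, a caller who would rather reject a malformed feed than accept a reshaped tree could parse strictly first and only fall back to `recover` deliberately. The following is a minimal sketch of that idea, not something proposed in this ticket; the helper name `strict_parse_error` is hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse $xml without recovery and return the
# parser's error message as a string, or undef if the input is well formed.
sub strict_parse_error {
    my ($xml) = @_;
    my $err;
    # load_xml dies on a well-formedness error when recover is not set.
    eval { XML::LibXML->load_xml(string => $xml, no_network => 1); 1 }
        or $err = "$@";
    return $err;
}

my $fragment = '<description><p><span>hi</p></description>';
if (defined(my $err = strict_parse_error($fragment))) {
    print "not well formed, recovery would have to guess:\n$err";
}
else {
    print "well formed\n";
}
```

The error text comes from libxml2 and its exact wording is not assumed here; the sketch only relies on the strict parse failing for mismatched tags.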

