
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 64569
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: dwheeler [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Unfortunate Recovery Moves Elements Around
Date: Fri, 7 Jan 2011 00:13:40 -0800
To: bug-xml-libxml [...] rt.cpan.org
From: "David E. Wheeler" <dwheeler [...] cpan.org>
Given this XML:

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <channel>
        <title>No Closing, Man</title>
        <link>http://blog.noclosing.com</link>
        <language>en-us</language>
        <ttl>40</ttl>
        <description>This is horked.</description>
        <item>
          <dc:creator>Jo Mama</dc:creator>
          <title>Welcome to the Jungle</title>
          <description><p><span>hi</p></description>
          <pubDate>Fri, 17 Dec 2010 16:35:00 +0000</pubDate>
          <guid>http://blog.noclosing.com/2710</guid>
          <link>http://blog.noclosing/2710.html</link>
        </item>
        <item>
          <dc:creator>Jamie</dc:creator>
          <title>Whatever</title>
          <description>This is the description</description>
          <pubDate>Fri, 31 Dec 2010 15:12:00 +0000</pubDate>
          <guid>http://blog.noclosing.com/2722</guid>
          <link>http://blog.noclosing/2722.html</link>
        </item>
      </channel>
    </rss>

where the closing </span> is missing on line 12, I run:

    use 5.12.0;
    use XML::LibXML;

    # $xml holds the document above (assumed here to be read from a file or STDIN)
    my $xml = do { local $/; <> };

    my $parser = XML::LibXML->new({
        recover    => 2,
        no_network => 1,
        no_blanks  => 1,
        no_cdata   => 1,
    });
    $parser->recover(2);    # redundant with the constructor option above
    say $parser->load_xml(string => $xml)->toString;

and XML::LibXML emits the following (I've run it through tidy here):

    <?xml version="1.0" encoding="UTF-8"?>
    <rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
      <channel>
        <title>No Closing, Man</title>
        <link>http://blog.noclosing.com</link>
        <language>en-us</language>
        <ttl>40</ttl>
        <description>This is horked.</description>
        <item>
          <dc:creator>Jo Mama</dc:creator>
          <title>Welcome to the Jungle</title>
          <description>
            <p>
              <span>hi</span>
            </p>
            <pubDate>Fri, 17 Dec 2010 16:35:00 +0000</pubDate>
            <guid>http://blog.noclosing.com/2710</guid>
            <link>http://blog.noclosing/2710.html</link>
          </description>
          <item>
            <dc:creator>Jamie</dc:creator>
            <title>Whatever</title>
            <description>This is the description</description>
            <pubDate>Fri, 31 Dec 2010 15:12:00 +0000</pubDate>
            <guid>http://blog.noclosing.com/2722</guid>
            <link>http://blog.noclosing/2722.html</link>
          </item>
        </item>
      </channel>
    </rss>

Note that the closing </span> is nicely included, so it recovered that. However, the second <item> element has been moved inside the first! You can see this clearly in the back-to-back closing </item> tags three and four lines from the bottom.

I don't know whether this is an XML::LibXML bug or a libxml2 bug, but I've attached a test case using this example.

Best,
David
Attachment: 46recover.t (text/x-perl, 2k; not shown inline at the sender's request)
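The attached 46recover.t is not shown here. As a hypothetical illustration only (not its actual contents), the mis-nesting described above can be detected after a recovering parse by asking XPath for any <item> element that has another <item> as an ancestor. The helper name nested_item_count is an assumption made up for this sketch; load_xml, findnodes and the recover/no_network options are the parser calls already used in the report. What the real feed above produces depends on libxml2's recovery behavior, so the usage lines only use small inline documents.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): parse $xml in recovery
# mode and count the <item> elements that ended up nested inside another
# <item>. A correctly structured RSS channel should yield 0.
sub nested_item_count {
    my ($xml) = @_;
    my $parser = XML::LibXML->new(
        recover    => 2,    # recover from errors and suppress the warnings
        no_network => 1,    # never fetch external resources
    );
    my $doc = $parser->load_xml(string => $xml);

    # '//item//item' selects every item element that has an item ancestor.
    my @nested = $doc->findnodes('//item//item');
    return scalar @nested;
}

# Usage: two sibling items versus the shape of the recovered output above,
# where the second item sits inside the first.
my $flat   = '<rss><channel><item/><item/></channel></rss>';
my $nested = '<rss><channel><item><item/></item></channel></rss>';
print nested_item_count($flat), "\n";      # prints 0
print nested_item_count($nested), "\n";    # prints 1
```

A check like this asserts on the tree structure rather than on serialized text, so it does not depend on how toString formats or orders attributes.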

Subject: Re: [rt.cpan.org #64569] AutoReply: Unfortunate Recovery Moves Elements Around
Date: Fri, 7 Jan 2011 00:22:08 -0800
To: bug-XML-LibXML [...] rt.cpan.org
From: "David E. Wheeler" <dwheeler [...] cpan.org>
Ah, I get the same output from xmllint, so it's an issue with libxml2, not XML::LibXML. Apologies for the noise.

Best,
David

Closing as UPSTREAM because it's a problem with libxml2. Thanks.

Regards,
-- 
Shlomi Fish

