This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 23439
Status: resolved
Priority: Low/Low
Queue:

People
Owner: Nobody in particular
Requestors: nick [...] aevum.de
Cc:
AdminCc:

BugTracker
Severity: Normal
Broken in: 3.23
Fixed in: (no value)



Subject: as_XML can output invalid attribute names
I use HTML::Tree successfully to convert tag soup to XHTML. There's only one little problem with the as_XML method: it doesn't check whether attribute names conform to the XML spec, so it can output invalid XML. As a result, I have to walk through the tree manually and remove all invalid attributes. It would be nice if HTML::Tree could do this itself. See here for valid XML attribute names: http://www.w3.org/TR/xml/#NT-Name
From: PETEK@cpan.org
On Fri Nov 17 11:21:05 2006, nick.aevum.de wrote:
> It doesn't check whether attribute names conform to the XML spec, so it can output invalid XML.
Could you please provide a sample? I see the spec for attribute names, but seeing an example of the broken behavior would help more. HTML::Tree also makes no guarantee at the current time that the XML is valid, but I'll be glad to fix what I can. I am also loath to implement functionality that causes people to lose data, so any fix will have to incorporate warnings or workarounds to make sure that doesn't happen.
From: nick@aevum.de
On Fri Nov 17 13:26:06 2006, PETEK wrote:
> On Fri Nov 17 11:21:05 2006, nick.aevum.de wrote:
> > It doesn't check whether attributes names conform to the XML spec, so it can output invalid XML.
>
> Could you please provide a sample? I see the spec for attribute names, but seeing an example of broken behavior would help more.
For example, if you have invalid characters in an attribute name:

use HTML::Tree;
my $tree = HTML::TreeBuilder->new_from_content('<img inval!d="asd">');
print $tree->as_XML();

This produces invalid XML by simply copying the attribute name:

<img inval!d="asd" />
> HTML::Tree also makes no guarantee at the current time that the XML is valid, but I'll be glad to fix what I can.
>
> I am also loathe to implement functionality that causes people to lose data, so any fix will have to incorporate warnings or workarounds to make sure that doesn't happen.
Ideally there should be an option whether to croak if an invalid attribute name occurs, or to simply remove the attribute. I think the least desirable thing is to produce invalid XML, because most likely this will cause problems later on.
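A minimal sketch of the check and the two behaviours proposed above (croak, or remove the attribute), written in Perl to match the snippet in this ticket. It uses only core Perl and does not touch HTML::Tree itself: the helper names is_valid_xml_name and check_attributes, the 'croak'/'drop' mode values, and the plain-hash representation of an element's attributes are all hypothetical, not part of the HTML::Tree API. The character ranges are taken from the NameStartChar and NameChar productions of XML 1.0 (Fifth Edition) at the spec URL cited earlier in the ticket.

```perl
use strict;
use warnings;
use Carp qw(croak carp);

# NameStartChar ranges from XML 1.0 (Fifth Edition), http://www.w3.org/TR/xml/#NT-Name
my $NAME_START = ':A-Z_a-z\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{2FF}'
    . '\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}'
    . '\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}'
    . '\x{10000}-\x{EFFFF}';
# NameChar adds "-", ".", digits, U+00B7 and two combining-mark ranges
my $NAME_CHAR = $NAME_START . '\-.0-9\x{B7}\x{300}-\x{36F}\x{203F}-\x{2040}';
# Name ::= NameStartChar (NameChar)*
my $XML_NAME = qr/\A[$NAME_START][$NAME_CHAR]*\z/;

sub is_valid_xml_name {
    my ($name) = @_;
    return defined $name && $name =~ $XML_NAME;
}

# Hypothetical helper: $attrs is a hash ref of attribute name => value.
# 'croak' (the default) dies on the first invalid name; 'drop' deletes each
# invalid attribute but warns about it, so nothing disappears silently.
sub check_attributes {
    my ($attrs, $mode) = @_;
    $mode //= 'croak';
    croak "unknown mode '$mode'" unless $mode eq 'croak' || $mode eq 'drop';
    for my $name (sort keys %$attrs) {    # sort returns a copy, so deleting is safe
        next if is_valid_xml_name($name);
        croak "invalid XML attribute name '$name'" if $mode eq 'croak';
        carp "dropping invalid XML attribute '$name'";
        delete $attrs->{$name};
    }
    return $attrs;
}
```

With the attribute from the example above, check_attributes({ 'inval!d' => 'asd' }) dies, while passing 'drop' as the second argument removes 'inval!d' with a warning and keeps any valid attributes. Emitting a warning in 'drop' mode is one way to address the concern about losing data without notice.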
A patch to fix this has been applied to the new repo at http://github.com/jfearn/HTML-Tree. The patch causes the parser to die with an appropriate message when it hits an invalid attribute name.
Subject: 4.0 released
Hi,

HTML::Tree version 4.0 has been released, and it includes a fix for this issue.

Cheers,
Jeff.

