This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 23439
Status: resolved
Priority: 0/
Queue: HTML-Tree

People
Owner: Nobody in particular
Requestors: nick [...] aevum.de
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 3.23
Fixed in: (no value)



Subject: as_XML can output invalid attribute names
I use HTML::Tree successfully to convert tag soup to XHTML. There's only one little problem with the as_XML method: it doesn't check whether attribute names conform to the XML spec, so it can output invalid XML. As a result, I have to walk through the tree manually and remove all invalid attributes. It would be nice if this could be done by HTML::Tree.

See here for valid XML attribute names: http://www.w3.org/TR/xml/#NT-Name
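To make the check concrete, here is a minimal sketch of testing a candidate attribute name against the Name production from the linked spec (as written in the fifth edition of XML 1.0). The helper names `is_xml_name` and `invalid_attr_names` are hypothetical and are not part of HTML::Tree; the sketch works on a plain hash of attributes and does not touch the tree itself.

```perl
use strict;
use warnings;

# Character classes transcribed from the XML 1.0 (fifth edition) productions
# NameStartChar and NameChar. Kept as strings so they can be interpolated
# into one bracketed class each.
my $start = ':A-Z_a-z\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{2FF}'
          . '\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}'
          . '\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}'
          . '\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}';
my $rest  = $start . '\-.0-9\x{B7}\x{300}-\x{36F}\x{203F}-\x{2040}';

# Name ::= NameStartChar (NameChar)*
my $XML_NAME = qr/\A[$start][$rest]*\z/;

# Hypothetical helper: true if $name is a well-formed XML Name.
sub is_xml_name {
    my ($name) = @_;
    return (defined($name) && $name =~ $XML_NAME) ? 1 : 0;
}

# Hypothetical helper: the attribute names in a hash that are not XML Names.
sub invalid_attr_names {
    my (%attr) = @_;
    return sort grep { !is_xml_name($_) } keys %attr;
}

my @bad = invalid_attr_names(src => 'a.png', 'inval!d' => 'asd');
print "invalid: @bad\n";
```

When walking a real tree, the names to test would presumably come from each element's `all_external_attr_names`, and an offending attribute could be dropped with `$element->attr($name, undef)`; treat both as assumptions to verify against the HTML::Element documentation for the installed version.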
From: PETEK [...] cpan.org
On Fri Nov 17 11:21:05 2006, nick.aevum.de wrote:
> It doesn't check whether attribute names conform to the XML spec,
> so it can output invalid XML.

Could you please provide a sample? I see the spec for attribute names, but seeing an example of broken behavior would help more.

HTML::Tree also makes no guarantee at the current time that the XML is valid, but I'll be glad to fix what I can.

I am also loath to implement functionality that causes people to lose data, so any fix will have to incorporate warnings or workarounds to make sure that doesn't happen.
From: nick [...] aevum.de
On Fri Nov 17 13:26:06 2006, PETEK wrote:
> On Fri Nov 17 11:21:05 2006, nick.aevum.de wrote:
> > It doesn't check whether attribute names conform to the XML spec,
> > so it can output invalid XML.
>
> Could you please provide a sample? I see the spec for attribute names,
> but seeing an example of broken behavior would help more.

For example, if you have invalid characters in an attribute name:

    use HTML::Tree;
    my $tree = HTML::TreeBuilder->new_from_content('<img inval!d="asd">');
    print $tree->as_XML();

This produces invalid XML by simply copying the attribute name:

    <img inval!d="asd" />

> HTML::Tree also makes no guarantee at the current time that the XML is
> valid, but I'll be glad to fix what I can.
>
> I am also loath to implement functionality that causes people to lose
> data, so any fix will have to incorporate warnings or workarounds to
> make sure that doesn't happen.

Ideally there should be an option to choose whether to croak when an invalid attribute name occurs or to simply remove the attribute. I think the least desirable outcome is to produce invalid XML, because this will most likely cause problems later on.
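The croak-or-remove option suggested above could look roughly like the following sketch. The function `clean_attrs` and its `on_invalid` option are hypothetical names invented for illustration, not an HTML::Tree interface, and the name check is deliberately simplified to ASCII (XML also permits many non-ASCII name characters, which this sketch would reject). In 'strip' mode each dropped attribute is reported with a warning, so the data loss is never silent.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Simplified, ASCII-only approximation of the XML Name production.
# Assumption: non-ASCII names, which XML allows, are treated as invalid here.
my $ASCII_XML_NAME = qr/\A[:A-Z_a-z][:A-Z_a-z0-9.\-]*\z/;

# Hypothetical helper: returns a cleaned copy of an attribute hash.
#   on_invalid => 'croak'   die on the first invalid name (default)
#   on_invalid => 'strip'   drop invalid attributes, warning about each one
sub clean_attrs {
    my ($attrs, %opt) = @_;
    my $mode = defined $opt{on_invalid} ? $opt{on_invalid} : 'croak';
    croak "unknown on_invalid mode '$mode'"
        unless $mode eq 'croak' || $mode eq 'strip';

    my %clean;
    for my $name (sort keys %$attrs) {
        if ($name =~ $ASCII_XML_NAME) {
            $clean{$name} = $attrs->{$name};
        }
        elsif ($mode eq 'croak') {
            croak "invalid XML attribute name '$name'";
        }
        else {
            warn "dropping invalid XML attribute '$name'\n";
        }
    }
    return \%clean;
}

# Usage: keep only the well-formed attribute and warn about the other one.
my $kept = clean_attrs({ src => 'a.png', 'inval!d' => 'asd' },
                       on_invalid => 'strip');
print join(',', sort keys %$kept), "\n";
```

Defaulting to 'croak' keeps the caller from losing data without noticing; 'strip' is opt-in for pipelines that prefer well-formed output over completeness.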
A patch to fix this has been applied to the new repo at http://github.com/jfearn/HTML-Tree

The patch will cause the parser to die with an appropriate message when it hits an invalid attribute name.
Subject: 4.0 released
Hi,

HTML::Tree version 4.0 has been released, which includes a fix for this issue.

Cheers, Jeff.

