This queue is for tickets about the HTML-Tiny CPAN distribution.

Report information
The Basics
Id: 34378
Status: resolved
Worked: 1.3 hours (80 min)
Priority: 0/
Queue: HTML-Tiny

People
Owner: andy [...] hexten.net
Requestors: spamcollector_cpan [...] juerd.nl
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: (no value)



Subject: Invalid HTML syntax
HTML::Tiny returns XHTML/XML syntax, which is *not* always valid HTML.

For example, "<br />" is invalid HTML. It should be "<br>".

My page using Captcha::reCAPTCHA does not validate as HTML because of this.

-- Juerd
On Sun Mar 23 18:30:23 2008, JUERD wrote:
> HTML::Tiny returns XHTML/XML syntax, which is *not* always valid HTML.
>
> For example, "<br />" is invalid HTML. It should be "<br>".
>
> My page using Captcha::reCAPTCHA does not validate as HTML because of this.
Just a suggestion to Andy: the constructor could accept "HTML" or "XHTML" as an argument to ->new, and the module would then emit "<br>" or "<br />" respectively. But it may not be his intent to produce "valid" HTML.
Indeed the generated HTML is broken. More details on the differences between XHTML and HTML:
http://www.w3.org/TR/xhtml1/diffs.html and http://www.w3.org/TR/xhtml1/guidelines.html
And more in depth: http://www.cs.tut.fi/~jkorpela/html/empty.html

In particular, the problem is generating "minimized" elements, such as <br />. In HTML (which is an application of SGML and has little to do with XHTML or XML), the slash in this context is a null end tag, which can be used as a shorthand for closing tags. Thus, <br /> translated to the more ordinary form is <br>>, i.e. a line break followed by a greater-than sign. Naturally this breaks validators.

In HTML::Tiny, these elements are called "closed" elements. Apparently Andy Armstrong has already anticipated this, because there are two places in the source code marked with a comment indicating that a special "xml mode" flag is needed.

A patch is attached, which lets the user give a parameter to the constructor. By default, the module generates XHTML. Use the following to make it generate valid[1] HTML:

    my $h = HTML::Tiny->new(mode => 'html');

The patch is otherwise trivial, except that all n+1 tests needed to be updated as well.

As a bonus, the patched version now generates correct empty attributes, i.e. <input checked="checked" /> instead of <input checked />.

[1] Valid as defined by "do what I intend, not what I ask for".
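To make the mode switch described above concrete, here is a minimal Perl sketch. It is not the attached patch and does not use HTML::Tiny at all: render_empty_element is a hypothetical helper written only for illustration, and it assumes the mode names 'xml' and 'html'. It also skips attribute escaping, which real code would need.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT HTML::Tiny's actual code: render an element
# that has no content (br, hr, input, ...) in either output mode.
# Attribute values are not escaped here; this is only a sketch.
sub render_empty_element {
    my ( $mode, $name, %attr ) = @_;
    die "unknown mode '$mode'" unless $mode eq 'xml' || $mode eq 'html';

    # Sort the attribute names so the output is deterministic.
    my $attrs = join '', map { qq{ $_="$attr{$_}"} } sort keys %attr;

    # XHTML needs the self-closing " />"; HTML wants a bare ">".
    return $mode eq 'xml' ? "<$name$attrs />" : "<$name$attrs>";
}

print render_empty_element( 'xml',  'br' ), "\n";    # <br />
print render_empty_element( 'html', 'br' ), "\n";    # <br>
print render_empty_element( 'xml', 'input', checked => 'checked' ), "\n";
```

The point of the sketch is that only the closing of the tag depends on the mode, which is why a single constructor flag is enough to cover every "closed" element.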
Attachment: HTML-Tiny-1.01.tar.gz (application/x-gzip, 16.1k)
Naturally, I attached the wrong file. The correct file is attached to this reply.
Attachment: HTML-Tiny-1.01-htmlfix.diff.gz (application/x-gzip, 4.4k)
Thanks everyone. I've just applied VRK's excellent patch and released 1.03 to the CPAN.

