
This queue is for tickets about the HTML-Clean CPAN distribution.

Report information
The Basics
Id: 6772
Status: new
Priority: 0/
Queue: HTML-Clean

Owner: Nobody in particular
Requestors: cpanbughtmlclean [...]

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: (no value)


Subject: Issue with <pre> tags.
When using HTML::Clean, I found that when I had put some code inside preformatted tags (<pre></pre>), the module removed some of the line breaks. Since this content is preformatted, that corrupts the way the page is supposed to be shown, and so it is not a valid optimisation.

In my example, I have:

    <code><span class="linecomment"># Perl code here</span><br> <span class="category2">print</span> "Hello world!";</code>

and this then gets converted to:

    <code><span class="linecomment"># Perl code here</span><span class="category2">print</span> "Hello world!";</code>

There's a line return missing between the comment and the next line. (NB: I added a break tag (br) to ensure that the line return is shown.)

Clearly, when showing code inside pre tags and then optimising the entire page, there's a big problem.

Suggested fix: turn off optimisations between pre tags.

Perl version: This is perl, v5.8.2 built for i386-freebsd
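
The suggested fix could be sketched roughly as follows. This is only an illustration, not code from HTML::Clean: the helper name strip_whitespace_outside_pre is hypothetical, the three substitutions are the ones visible in the patch attached to this ticket, and a regex split is assumed to be good enough to find <pre> regions (it will not cope with nested or malformed markup).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Clean): apply the whitespace
# substitutions only to text that lies outside <pre>...</pre> regions.
sub strip_whitespace_outside_pre {
    my ($html) = @_;

    # split with a capturing group keeps the matched <pre> blocks in the
    # result list; they end up at the odd indexes.
    my @parts = split m{(<pre\b[^>]*>.*?</pre\s*>)}is, $html;

    for my $i (0 .. $#parts) {
        next if $i % 2;               # preformatted block: leave untouched
        for ($parts[$i]) {
            s,[\r\n]+,\n,sg;          # runs of CR/LF -> single LF
            s,\s+\n,\n,sg;            # whitespace before a newline
            s,\n\s+<,\n<,sg;          # whitespace before a tag
        }
    }
    return join '', @parts;
}

my $page = "<p>a</p>\n\n\n   <p>b</p>\n<pre>x\n\n  y</pre>\n\n<p>c</p>";
print strip_whitespace_outside_pre($page), "\n";
```

Splitting first and rejoining afterwards keeps the preformatted text byte-for-byte identical, at the cost of not optimising anything inside it.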
From: Allard Hoeve
Subject: Documenting it helps :)
From: Gunnar Wolf
Many people get bitten by this bug as it is right now. I didn't fix the bug, but this patch at least makes HTML::Clean print a prominent warning that is hard to ignore, and mentions the problem in the documentation.
Attachment: HTML-Clean.patch (text/x-diff, 1.5k)
Index: lib/HTML/
===================================================================
--- lib/HTML/ (revision 1171)
+++ lib/HTML/ (revision 1172)
@@ -375,6 +375,16 @@
 
 =back
 
+Please note that if your HTML includes preformatted regions (this means, if
+it includes <pre>...</pre>, we do not suggest removing whitespace, as it will
+alter the rendered defaults.
+
+HTML::Clean will print out a warning if it finds a preformatted region and is
+requested to strip whitespace. In order to prevent this, specify that you don't
+want to strip whitespace - i.e.
+
+  $h->strip( {whitespace => 0} );
+
 =cut
 
 use vars qw/
@@ -435,6 +445,17 @@
   }
 
   if ($do_whitespace) {
+    if ($$h =~ /<pre/i) {
+      warn << 'EOF'
+Warning: Stripping whitespace will affect preformatted region\'s layout
+You have a <pre> region in your HTML, which depends on the whitespace not
+being modified. You requested to strip the whitespace - The rendered results
+will be affected.
+
+Hint: Use $h->strip({whitespace => 0}); instead.
+EOF
+    }
+
     $$h =~ s,[\r\n]+,\n,sg; # Carriage/LF -> LF
     $$h =~ s,\s+\n,\n,sg; # empty line
     $$h =~ s,\n\s+<,\n<,sg; # space before tag
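
To see why the patch warns, here is a small standalone sketch that applies the three substitutions from the second hunk to a string containing a <pre> region. It does not load HTML::Clean; the function name apply_whitespace_rules and the sample input are made up for illustration, and the real module may run further steps that are not visible in the diff.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the whitespace step shown in the diff.
# Only the three substitutions from the hunk are reproduced here.
sub apply_whitespace_rules {
    my ($html) = @_;
    $html =~ s,[\r\n]+,\n,sg;   # runs of CR/LF -> single LF
    $html =~ s,\s+\n,\n,sg;     # whitespace before a newline
    $html =~ s,\n\s+<,\n<,sg;   # whitespace before a tag
    return $html;
}

my $before = "<pre>\nsub hello {\n\n    <b>print</b> 1;\n}\n</pre>\n";
my $after  = apply_whitespace_rules($before);

# Same detection the patch uses before emitting its warning.
warn "input contains a <pre> region\n" if $before =~ /<pre/i;

print $after;
```

In this sample the blank line inside the block and the indentation in front of <b> are both lost, which changes how the preformatted text renders.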
