
This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 93642
Status: rejected
Priority: 0/
Queue: HTML-Tree

People
Owner: Nobody in particular
Requestors: alexander.danel [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 5.03
Fixed in: (no value)



Subject: Method "as_HTML()" does not format new() elements.
Whereas the method "as_HTML()" formats a tree that was originally created via the method "parse_content()", it does not format elements created via the method "new()". The attached test script demonstrates that the origin of the element, "parse_content()" versus "new()", is the deciding factor in whether the element gets formatted. I am on Cygwin, Perl 5.14, with HTML::Tree 5.03.
Subject: test-as-html.pl
#!/usr/bin/perl
# Script: test-as-html.pl
# Started by Alexander Danel on 2014-March-7
# E-Mail: alexander.danel@gmail.com
# BitCard user name: AL_X
# This script demonstrates a peculiarity with "as_HTML()".
# The first "print" statement shows that a tree created with "parse_content()"
# then formatted with "as_HTML()" is nicely formatted.
# The second "print" statement shows that elements created with "new()"
# and inserted into the hierarchy do not get formatted.
#
# Note that in the middle of the clump of elements created with "new()",
# I have pushed the original "Second paragraph" and "Third paragraph".
# While most of the clump is not formatted, "Second" and "Third" are formatted;
# this is compelling evidence that origin, not proximity, is the deciding
# factor.
# Elements originating from "parse_content()" get formatted,
# elements originating from "new()" do not get formatted.
#
# Clumps of unformatted HTML become a problem when the clump becomes so large
# that it hampers editing and even viewing in a small terminal window,
# for example, a 24x80 terminal.
# $ perl --version
#
# This is perl 5, version 14, subversion 2 (v5.14.2) built for cygwin-thread-multi-64int
# (with 7 registered patches, see perl -V for more detail)
# $ ls -d -1 .cpan/build/HTML-Tree*
# .cpan/build/HTML-Tree-5.03-LAT5Ju
# .cpan/build/HTML-Tree-5.03-LAT5Ju.yml
# Here are results; can you reproduce them on your system?
my $resultsOfPrintStatementsLookLikeThis = <<"END_HERE";
<html>
 <head>
 </head>
 <body>
  <p>First paragraph here</p>
  <div>
   <p>Second paragraph here</p>
   <p>Third paragraph here</p>
  </div>
  <p>Fourth paragraph here</p>
 </body>
</html>
#==========================
<html>
 <head>
 </head>
 <body>
  <p>First paragraph here</p>
  <div><span>Contents of span001</span><span>Contents of span002</span><div>
    <p>Second paragraph here</p>
    <p>Third paragraph here</p>
   </div><span>Contents of span003</span><span>Contents of span004</span></div>
  <p>Fourth paragraph here</p>
 </body>
</html>
#==========================
END_HERE

use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new; # empty tree
$tree->ignore_text(0);
$tree->parse_content(<main::DATA>);
$tree->elementify();
print $tree->as_HTML(undef,' ',{}),"\n#==========================\n";

my $originalDiv = $tree->find_by_tag_name('div');
my @originalContent = $originalDiv->detach_content();
my $exteriorDiv = HTML::Element->new('div');
$originalDiv->replace_with($exteriorDiv);
$originalDiv->destroy();

my $span001 = HTML::Element->new('span');
$span001->push_content('Contents of span001');
my $span002 = HTML::Element->new('span');
$span002->push_content('Contents of span002');
my $interiorDiv = HTML::Element->new('div');
$interiorDiv->push_content(@originalContent); # paragraphs "Second", "Third"
my $span003 = HTML::Element->new('span');
$span003->push_content('Contents of span003');
my $span004 = HTML::Element->new('span');
$span004->push_content('Contents of span004');
$exteriorDiv->push_content($span001,$span002,$interiorDiv,$span003,$span004);
print $tree->as_HTML(undef,' ',{}),"\n#==========================\n";
exit 0;

__DATA__
<html>
 <head>
 </head>
 <body>
  <p>First paragraph here</p>
  <div>
   <p>Second paragraph here</p>
   <p>Third paragraph here</p>
  </div>
  <p>Fourth paragraph here</p>
 </body>
</html>
This has nothing to do with the difference between parse_content and new. Rather, it's the difference between <span> and <p>. Those aren't formatted the same way, because they aren't the same thing. <p> is a block-level tag, and <span> is inline. For proof, just change new('span') to new('p') in your example and see what you get. Anyway, as_HTML is not much of a pretty-printer, and never will be. If you want nicely formatted HTML, there are other modules that will dump an HTML::Element tree with better human-readable formatting. I use HTML::PrettyPrinter myself, although I'm not entirely happy with it. I've been thinking about writing a new one.
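A minimal way to run the check suggested in the reply, without the rest of the test script, is to build the same kind of container twice with new(). One copy gets <p> children and the other gets <span> children, and both are printed with as_HTML(undef, ' ', {}). The helper name container_of is made up for this sketch. Only HTML::Element->new, push_content and as_HTML come from HTML::Tree. The expectation, which is not verified here against 5.03 specifically, is that the <p> version is broken onto indented lines and the <span> version stays on one line, even though every element was created with new().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Element;

# Hypothetical helper: a <div> holding two children of the given tag,
# every element created with new() (nothing comes from parse_content()).
sub container_of {
    my ($child_tag) = @_;
    my $div = HTML::Element->new('div');
    for my $n (1, 2) {
        my $child = HTML::Element->new($child_tag);
        $child->push_content("Contents of $child_tag $n");
        $div->push_content($child);
    }
    return $div;
}

# Same arguments as the test script: default entity encoding,
# one space per indent level, and no end tags omitted.
my $with_p    = container_of('p')->as_HTML(undef, ' ', {});
my $with_span = container_of('span')->as_HTML(undef, ' ', {});

print $with_p,    "\n#==========================\n";
print $with_span, "\n#==========================\n";
```

If the reply is right, only the tag changes between the two runs, so any difference in layout comes from <p> versus <span> and not from how the elements were created.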
Subject: Re: [rt.cpan.org #93642] Method "as_HTML()" does not format new() elements.
Date: Fri, 7 Mar 2014 22:57:35 -0600
To: bug-HTML-Tree [...] rt.cpan.org
From: Alexander Danel <alexander.danel [...] gmail.com>
Christopher,

When I got your e-mail, I was just putting together an e-mail saying exactly what you said. In fact, I performed that very test, replacing 'span' with 'p'. Why is it that I only figure these things out after posting? (Probably because I posted as soon as I had the simplified code example, but analyzed my simplified code only after posting.)

There are still some peculiarities with <div> being treated as inline sometimes and as block at other times. Of course, that too is valid; <div> can be either. I'll look at the issue some more, and/or use PrettyPrinter.

So, is there some way of withdrawing my bug report, so that I reduce my humiliation?

Alexander

On Fri, Mar 7, 2014 at 10:49 PM, Christopher J. Madsen via RT <bug-HTML-Tree@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=93642 >
>
> This has nothing to do with the difference between parse_content and new.
> Rather, it's the difference between <span> and <p>. Those aren't
> formatted the same way, because they aren't the same thing. <p> is a
> block-level tag, and <span> is inline. For proof, just change new('span')
> to new('p') in your example and see what you get.
>
> Anyway, as_HTML is not much of a pretty-printer, and never will be. If
> you want nicely formatted HTML, there are other modules that will dump an
> HTML::Element tree with better human-readable formatting. I use
> HTML::PrettyPrinter myself, although I'm not entirely happy with it. I've
> been thinking about writing a new one.
>
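A possible explanation for the <div> peculiarity comes from the second block of output in the test script. There the inner <div> is written straight after </span>, while the outer <div> starts its own line after </p>. Judging only from that output, as_HTML appears to start a new indented line before a block-level tag only when the tag written just before it was also block-level. That would be a layout heuristic in the printer, not a change in what <div> is. The sketch below tests this reading by placing the same new() <div> after a <span> and after a <p>. The helper name div_after is hypothetical, and the expected layout noted in the comments is an assumption to check, not documented behaviour.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Element;

# Hypothetical helper: an outer <div> whose children are a leading element
# of the given tag followed by an inner <div>, all built with new().
sub div_after {
    my ($lead_tag) = @_;
    my $outer = HTML::Element->new('div');
    my $lead  = HTML::Element->new($lead_tag);
    $lead->push_content("Contents of $lead_tag");
    my $inner = HTML::Element->new('div');
    $inner->push_content('Contents of inner div');
    $outer->push_content($lead, $inner);
    return $outer;
}

my $after_span = div_after('span')->as_HTML(undef, ' ', {});
my $after_p    = div_after('p')->as_HTML(undef, ' ', {});

# Assumed result if the reading above is right: the inner <div> follows
# </span> on the same line, but starts a new indented line after </p>.
print $after_span, "\n#==========================\n";
print $after_p,    "\n#==========================\n";
```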


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.