This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 60474
Status: resolved
Priority: Low/Low
Queue:

People
Owner: Nobody in particular
Requestors: SREZIC [...] cpan.org
Cc:
AdminCc:

BugTracker
Severity: Unimportant
Broken in: 3.23_3
Fixed in: (no value)



Subject: Last word may be eaten when parsing
The output of the following script:

    #!/usr/bin/perl
    use HTML::TreeBuilder;
    $tree = HTML::TreeBuilder->new->parse("ABC DEF GHI.");
    warn $tree->dump;
    __END__

looks like this:

    <html> @0 (IMPLICIT)
      <head> @0.0 (IMPLICIT)
      <body> @0.1 (IMPLICIT)
        "ABC DEF"

This means that the last word ("GHI.") is missing from the parsed tree. This can be worked around either by appending a newline to the string or by wrapping the text in a tag.

Regards,
Slaven
HTML::Parser (of which HTML::TreeBuilder is a subclass) needs you to call $parser->eof to flush any remaining text once you have finished calling $parser->parse().
Here are a few examples to show how this works:

    $ perl -MHTML::TreeBuilder -e '$tree = HTML::TreeBuilder->new->parse("ABC DEF GHI."); print $tree->dump;'
    <html> @0 (IMPLICIT)
      <head> @0.0 (IMPLICIT)
      <body> @0.1 (IMPLICIT)
        "ABC DEF"

    $ perl -MHTML::TreeBuilder -e '$tree = HTML::TreeBuilder->new->parse("ABC DEF GHI."); $tree->eof(); print $tree->dump;'
    <html> @0 (IMPLICIT)
      <head> @0.0 (IMPLICIT)
      <body> @0.1 (IMPLICIT)
        "ABC DEF GHI."

    $ perl -MHTML::TreeBuilder -e '$tree = HTML::TreeBuilder->new->parse("ABC DEF GHI."); $tree->parse(" JKL."); $tree->eof(); print $tree->dump;'
    <html> @0 (IMPLICIT)
      <head> @0.0 (IMPLICIT)
      <body> @0.1 (IMPLICIT)
        "ABC DEF GHI. JKL."
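For use in a script rather than a one-liner, the same pattern can be wrapped in a small helper so that eof is never forgotten. This is only a minimal sketch: parse_complete is a hypothetical name chosen for illustration and is not part of HTML::TreeBuilder. It relies only on the new, parse, eof, as_text, and delete methods.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML::TreeBuilder): feed one or more
# chunks to the parser and always signal end of input before returning.
sub parse_complete {
    my @chunks = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($_) for @chunks;
    $tree->eof;    # flush any text the parser is still holding back
    return $tree;
}

my $tree = parse_complete("ABC DEF GHI.");
print $tree->as_text, "\n";
$tree->delete;     # release the tree when it is no longer needed
```

If the installed version of HTML::TreeBuilder provides the new_from_content constructor, it should achieve the same result for a complete in-memory string, since it is documented as combining new, parse, and eof.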

