This queue is for tickets about the HTML-Tree CPAN distribution.

Report information

The Basics
Id: 29805
Status: rejected
Priority: Low/Low
Queue:

People
Owner: Nobody in particular
Requestors: jmason [...] cpan.org
Cc:
AdminCc:

BugTracker
Severity: Normal
Broken in: 3.23
Fixed in: (no value)



Subject: as_text replaces /<br>/ with "", but a whitespace char would be better
Hi. An as_text() call against 'Sat 06/10/07<br>20:00' should produce something like 'Sat 06/10/07 20:00' (or maybe with a \n). Instead it produces 'Sat 06/10/0720:00'. See http://rt.cpan.org/Ticket/Display.html?id=29799 for a bug report against Web::Scraper that provides a demo. (That module's maintainer indicated that this output was generated by the as_text method of HTML::Element.)
as_text won't and can't do that at the moment; this is a design decision. The question came up at the 2006 Chicago Hackathon, and what I asked then was: which elements would you do this for, and when would you do it? If I have a block of HTML 3, for example, that reads <xmp><br></xmp>, that <br> should not be converted, but a blind regexp engine would convert it.

Beyond that, <br> is not the only element that would need this treatment. People expect the same of <hr>, as well as <p>, <div>, <blockquote> and other block-level elements.

as_text was never intended to be used as a sanitization method or a display method. The man page specifically states that it is the concatenation of text elements as the tree is descended. Changing that is a design decision and won't be considered until the major version is bumped to 4.0, which is quite a way down the road.
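
To make the trade-off concrete, here is a minimal, language-neutral sketch of the two behaviours discussed in this ticket. It uses Python's standard html.parser rather than HTML::Element, so it illustrates the idea and is not the module's implementation. The names TextCollector, plain_text and BREAK_TAGS are hypothetical, and the tag set is only one possible answer to the "which elements?" question raised above.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags to treat as whitespace; the ticket leaves
# open which elements actually belong in this set.
BREAK_TAGS = frozenset({"br", "hr", "p", "div", "blockquote"})


class TextCollector(HTMLParser):
    """Collect text nodes in document order, optionally spacing chosen tags."""

    def __init__(self, break_tags=frozenset()):
        super().__init__(convert_charrefs=True)
        self.break_tags = break_tags
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Emit a space where a chosen tag opens (also covers <br/>).
        if tag in self.break_tags:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        # Emit a space where a chosen tag closes, e.g. </p>.
        if tag in self.break_tags:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(html, break_tags=frozenset()):
    """Return the text of `html`.

    With no break_tags this is plain concatenation of the text nodes.
    With break_tags, the chosen tags become spaces and runs of
    whitespace are collapsed to a single space.
    """
    parser = TextCollector(break_tags)
    parser.feed(html)
    parser.close()  # flush any text still buffered at end of input
    text = "".join(parser.parts)
    if break_tags:
        text = " ".join(text.split())
    return text


sample = "Sat 06/10/07<br>20:00"
print(plain_text(sample))              # Sat 06/10/0720:00
print(plain_text(sample, BREAK_TAGS))  # Sat 06/10/07 20:00
```

The first call reproduces the reported output; the second gives what the requestor expected. The sketch makes no attempt to handle the <xmp> case mentioned above, which is part of why the choice of elements is a design decision rather than a one-line fix.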

