This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 46099
Status: open
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: bdfoy [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Make iframe parsing configurable
Since the latest versions of HTML::Parser do not parse the content of iframes, some of my applications using HTML::SimpleLinkExtor have broken. The text between the iframe tags is what the browser displays and is usually more HTML, and I need to be able to extract any links in that text. I'd like to at least be able to turn on parsing for iframes, even if it is off by default.
On Fri May 15 02:15:45 2009, BDFOY wrote:
> Since the latest versions of HTML::Parser do not parse the content of
> iframes, some of my applications using HTML::SimpleLinkExtor have
> broken. The text between the iframe tags is what the browser displays
> and is usually more HTML, and I need to be able to extract any links in
> that text.

Browsers that support iframes are supposed to ignore everything inside the iframe. They are supposed to render the HTML found at the 'src' location.

> I'd like to at least be able to turn on parsing for iframes, even if it
> is off by default.

I see the point if you need to emulate the behaviour of very old browsers. A workaround is to invoke a subparser on the iframe content text. I'll see if I find an easier way to do this.
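
The subparser workaround mentioned above might look roughly like the following. This is only a minimal sketch, not code from the distribution: the helper name `extract_links` is hypothetical, it collects only `href` and `src` attributes, and it assumes the installed HTML::Parser reports iframe content as a single literal text event.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: collect href/src values from a chunk of HTML.
# Text seen between <iframe> and </iframe> is handed to a fresh
# sub-parser by calling this function recursively.
sub extract_links {
    my ($html) = @_;
    my @links;
    my $in_iframe = 0;

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            for my $name (qw(href src)) {
                push @links, $attr->{$name} if defined $attr->{$name};
            }
            $in_iframe = 1 if $tag eq 'iframe';
        }, 'tagname, attr' ],
        end_h => [ sub {
            my ($tag) = @_;
            $in_iframe = 0 if $tag eq 'iframe';
        }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            # Only re-parse literal iframe text that looks like markup.
            push @links, extract_links($text) if $in_iframe && $text =~ /</;
        }, 'text' ],
    );
    $parser->unbroken_text(1);    # avoid splitting the iframe body into chunks
    $parser->parse($html);
    $parser->eof;
    return @links;
}

my $html = '<a href="/top">top</a>'
         . '<iframe src="/frame"><a href="/fallback">x</a></iframe>';
print "$_\n" for extract_links($html);
```

On a parser version that still parses iframe content as ordinary HTML, the nested link is reported by the outer parser directly and the recursion is never triggered, so the sketch should behave the same either way.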
The TODO file has this entry:

  - make literal tags configurable. The current list is hardcoded to be
    "script", "style", "title", "iframe", "textarea", "xmp", and "plaintext".

which would be my preferred way to fix this.
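
To see which of these tags a given HTML::Parser installation actually treats as literal, a small probe along these lines may help. It is a sketch under assumptions: the helper name `is_literal_tag` is hypothetical, and "literal" here simply means that a nested `<b>` element produces no start event. The `div` entry is included only as a control that should be parsed normally.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical probe: true if a <b> nested inside <$tag> is NOT reported
# as a start event, i.e. the content of <$tag> is treated as literal text.
sub is_literal_tag {
    my ($tag) = @_;
    my $saw_b = 0;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { $saw_b = 1 if $_[0] eq 'b' }, 'tagname' ],
    );
    $p->parse("<$tag><b>x</b></$tag>");
    $p->eof;
    return !$saw_b;
}

# The seven tags from the TODO entry, plus "div" as a control.
my @tags = qw(script style title iframe textarea xmp plaintext div);
printf "%-10s %s\n", $_, is_literal_tag($_) ? 'literal' : 'parsed' for @tags;
```

The output depends on the installed version, which is the point of the probe: it shows whether iframe is in the hardcoded list for that release.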
Making literal tags configurable would also be useful for those doing JavaScript templates with <script type="text/html"> tags.
From: andrew [...] pimlott.net
On Sat Jun 20 05:17:40 2009, GAAS wrote:
> > I'd like to at least be able to turn on parsing for iframes, even if it
> > is off by default.
>
> I see the point if you need to emulate the behaviour of very old
> browsers.

What is the point of not parsing the content of iframes? I can't find any justification, and it seems at odds both with the spec and user expectations. Removing this special case would make HTML::Parser simpler and more uniform.

Andrew
I explained the point just above the text you quoted. What's "the spec" you're referring to?

