This queue is for tickets about the HTML-Parser CPAN distribution.

Report information

The Basics
Id: 8763
Status: resolved
Priority: Low/Low
Queue:

People
Owner: Nobody in particular
Requestors: stefan.maus [...] smartit.de
Cc:
AdminCc:

BugTracker
Severity: Important
Broken in: (no value)
Fixed in: (no value)

Attachments
Mechanize_1.05_04.patch



Subject: 'A'-tags not recognized without closing 'A'-tag
On a page like this:

    <html>
    <head>
    <title>page title</title>
    </head>
    <body class="normal">
    <a name="top" />
    <a href="link1">Test1</a>
    <a href="link2">Test2</a>
    <A href="link3">Test3</a>
    </body>
    </html>

WWW::Mechanize will not recognize the link to "link1". The problem seems to be the 'A'-tag without the closing 'A'-tag above it. I tried to locate the bug, but I couldn't figure it out. Anyway, attached is a workaround patch that changes the '<a />'-tag to '<a ></a>'. Not very pretty, but it works.

This occurred with version 1.05_04 as well as with an older version (sorry, I didn't write it down) on SuSE 9.0 with perl v5.8.1.

I hope I did everything correctly, since this is my first patch submission and my English is not very good.
--- Mechanize.pm 2004-11-06 06:33:07.000000000 +0100
+++ patched_Mechanize.pm 2004-12-07 11:11:12.000000000 +0100
@@ -1761,6 +1761,8 @@
 sub _extract_links_and_images {
     my $self = shift;
 
+    $self->{content} =~ s/<([aA])(\s+[^>]+)*\s*\/>/<$1$2><\/$1>/g;
+
     my $parser = HTML::TokeParser->new(\$self->{content});
 
     $self->{links} = [];
I moved this to the HTML-Parser queue, which handles our parsing at this level.

Note that this is invalid HTML. The end tag for <a> tags is required, as documented here:
http://www.blooberry.com/indexdot/html/tagpages/a/a-bookmark.htm

Mark
Is there anything you want to change in HTML::Parser with regard to this?

The default behaviour is to treat the "/" as a boolean attribute, but if you enable XML-mode then it will generate both a start_tag and an end_tag event. In XML-mode, however, the case of the start and end tags must match, and this example is inconsistent there: <A href="link3"> is closed by </a>.
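
To see the difference described in the reply above, the following is a minimal sketch that feeds a reduced form of the reporter's markup to HTML::Parser twice, once with default settings and once with xml_mode enabled, and prints the events it receives. The helper name collect_events and the event labels are made up for this illustration; they are not part of HTML::Parser or WWW::Mechanize.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the start/end events seen for $html,
# optionally with the parser's xml_mode switched on.
sub collect_events {
    my ($html, $xml_mode) = @_;
    my @events;

    my $p = HTML::Parser->new(
        api_version => 3,
        # Record the tag name and the (sorted) attribute names of each start tag.
        start_h => [
            sub {
                my ($tag, $attr) = @_;
                push @events, "start $tag [" . join(",", sort keys %$attr) . "]";
            },
            "tagname, attr",
        ],
        # Record the tag name of each end tag.
        end_h => [ sub { push @events, "end $_[0]" }, "tagname" ],
    );

    $p->xml_mode(1) if $xml_mode;
    $p->parse($html);
    $p->eof;

    return @events;
}

# Reduced form of the markup from the report.
my $html = '<a name="top" /><A href="link3">Test3</a>';

print "default:  $_\n" for collect_events($html, 0);
print "xml_mode: $_\n" for collect_events($html, 1);
```

If the behaviour is as described in the reply, the default run should report only a start event for <a name="top" />, with "/" listed among its attributes, while the xml_mode run should report a start event followed by an end event for it. In the xml_mode run the tag names keep their case, so the mismatch between <A ...> and </a> becomes visible in the output.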

