This queue is for tickets about the WWW-Mechanize CPAN distribution.

Report information
The Basics
Id: 22891
Status: resolved
Priority: 0/
Queue: WWW-Mechanize

People
Owner: Nobody in particular
Requestors: henrywong [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Mechanize seemed to discard the first URL inside first <p> tag in a html page
If you have a page with:

<p><a href="http://www.url1.com/gi1?a=1">test1</a><p>
<p><a href="http://www.url2.com/gi2?a=2">test2</a><p>

mech-dump -links

will return http://www.url2.com/gi2?a=2

or

<p><a href="http://www.first.com/gi1?a=1">first</a><a href="http://www.url1.com/gi1?a=1">test1</a><p>
<p><a href="http://www.url2.com/gi2?a=2">test2</a><p>

mech-dump -links

will return:
http://www.url1.com/gi1?a=1
http://www.url2.com/gi2?a=2

The first link in <p> for a page is always discarded.

Subject: Mechanize seemed to discard the first URL after an <h1> tag
From: henrywong [...] yahoo.com
After a little more debugging, the issue turns out to be that the link following an <h1> tag is discarded.

Example:
<h1> hello world</h1>
<p><a href="http://www.url1.com/gi1?a=1">test1</a><p>
<p><a href="http://www.url2.com/gi2?a=2">test2</a><p>

mech-dump -links will return http://www.url2.com/gi2?a=2

and if the h1 tag is removed:

<p><a href="http://www.url1.com/gi1?a=1">test1</a><p>
<p><a href="http://www.url2.com/gi2?a=2">test2</a><p>

mech-dump -links will return both links:

http://www.url1.com/gi1?a=1
http://www.url2.com/gi2?a=2

FYI, I have not tried <h2>, <h3>, ....

Thanks for your help.

On Tue Nov 07 19:11:50 2006, henrywong wrote:
> If you have a page with:
>
> <p><a href="http://www.url1.com/gi1?a=1">test1</a><p>
> <p><a href="http://www.url2.com/gi2?a=2">test2</a><p>
>
> mech-dump -links
>
> will return http://www.url2.com/gi2?a=2
>
> or
>
> <p><a href="http://www.first.com/gi1?a=1">first</a><a href="http://www.url1.com/gi1?a=1">test1</a><p>
> <p><a href="http://www.url2.com/gi2?a=2">test2</a><p>
>
> mech-dump -links
>
> will return:
> http://www.url1.com/gi1?a=1
> http://www.url2.com/gi2?a=2
>
> The first link in <p> for a page is always discarded.

Subject: Mechanize seemed to discard the first URL after this <a name="anchor"/> tag in a html page
From: henrywong [...] yahoo.com
OK, the real problem turned out to be that Mechanize seems to discard the first URL after an <a name="anchor"/> tag in an HTML page.

<h1> hello world</h1>
<a name="anchor"/>
<p><a href="http://www.url1.com/gi1?a=1">test1</a><p>
<p><a href="http://www.url2.com/gi2?a=2">test2</a><p>

mech-dump -links

for the above HTML code returns http://www.url2.com/gi2?a=2

On Tue Nov 07 19:38:28 2006, henrywong wrote:
> After a little more debugging, the issue turns out to be that the link
> following an <h1> tag is discarded.
>
> Example:
> <h1> hello world</h1>
> <p><a href="http://www.url1.com/gi1?a=1">test1</a><p>
> <p><a href="http://www.url2.com/gi2?a=2">test2</a><p>
>
> mech-dump -links will return http://www.url2.com/gi2?a=2
>
> and if the h1 tag is removed:
>
> <p><a href="http://www.url1.com/gi1?a=1">test1</a><p>
> <p><a href="http://www.url2.com/gi2?a=2">test2</a><p>
>
> mech-dump -links will return both links:
>
> http://www.url1.com/gi1?a=1
> http://www.url2.com/gi2?a=2
>
> FYI, I have not tried <h2>, <h3>, ....
>
> Thanks for your help.
>
> On Tue Nov 07 19:11:50 2006, henrywong wrote:
> > If you have a page with:
> >
> > <p><a href="http://www.url1.com/gi1?a=1">test1</a><p>
> > <p><a href="http://www.url2.com/gi2?a=2">test2</a><p>
> >
> > mech-dump -links
> >
> > will return http://www.url2.com/gi2?a=2
> >
> > or
> >
> > <p><a href="http://www.first.com/gi1?a=1">first</a><a href="http://www.url1.com/gi1?a=1">test1</a><p>
> > <p><a href="http://www.url2.com/gi2?a=2">test2</a><p>
> >
> > mech-dump -links
> >
> > will return:
> > http://www.url1.com/gi1?a=1
> > http://www.url2.com/gi2?a=2
> >
> > The first link in <p> for a page is always discarded.
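
For anyone triaging this ticket, the following is a minimal, independent sketch of the expected result and of one possible way the reported symptom could arise. It does not use or reproduce WWW::Mechanize code. It runs the markup from the final report through Python's standard html.parser, and the names LinkCollector, NaiveLinkCollector and collect are made up for this illustration. The "naive" collector is only an assumed failure model: it ignores the self-closing slash in <a name="anchor"/>, so it believes an anchor is still open and skips the next <a> tag. Whether Mechanize actually fails this way is not established anywhere in this ticket.

```python
from html.parser import HTMLParser

# Markup from the final report in this ticket, unchanged
# (including the <p> tags that are never closed).
REPORTED_HTML = """<h1> hello world</h1>
<a name="anchor"/>
<p><a href="http://www.url1.com/gi1?a=1">test1</a><p>
<p><a href="http://www.url2.com/gi2?a=2">test2</a><p>
"""


class LinkCollector(HTMLParser):
    """Record the href of every <a> start tag that carries one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


class NaiveLinkCollector(HTMLParser):
    """Hypothetical failure model, NOT WWW::Mechanize's implementation.

    Treats <a .../> as an opening tag only and ignores any <a> seen
    while it believes an earlier anchor is still open.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        self.inside_anchor = False

    def handle_startendtag(self, tag, attrs):
        # Drop the self-closing slash: behave as if only <a ...> was seen.
        self.handle_starttag(tag, attrs)

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.inside_anchor:
            return
        self.inside_anchor = True
        href = dict(attrs).get("href")
        if href is not None:
            self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "a":
            self.inside_anchor = False


def collect(parser_cls, html):
    """Feed html to a fresh parser of the given class and return its links."""
    parser = parser_cls()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Expected result: both links.
    print(collect(LinkCollector, REPORTED_HTML))
    # Assumed failure model: only the second link, as in the report.
    print(collect(NaiveLinkCollector, REPORTED_HTML))
```

With the <a name="anchor"/> line removed from the input, both collectors return both links, which matches the observation that the problem only appears after that tag.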

