

This queue is for tickets about the WWW-Mechanize CPAN distribution.

Report information
The Basics
Id: 47510
Status: rejected
Priority: 0/
Queue: WWW-Mechanize

Owner: Nobody in particular
Requestors: FANY [...]

Bug Information
Severity: (no value)
Broken in: 1.54
Fixed in: (no value)

Subject: ->find_link() behaviour depends on internal encoding of strings
Whether ->find_link( text => $text ) works correctly for links that contain non-breaking spaces depends on the internal encoding of $mech->content on the one hand and that of $text on the other.

If the utf8 flag is not set for the content, non-breaking spaces will not get removed by the get_trimmed_text method within HTML::TokeParser, because /\s/ does _not_ match non-breaking spaces for latin1 strings, and so they have to be specified in $text in order to find the matching link. If $text has the utf8 bit set, however, find_link() will then complain that "'...' is space-padded and cannot succeed" and discard this filter argument, because /\s/ _does_ match non-breaking spaces in this case.

Please find a test script attached, which should produce the following output:

    25: (
      "\xA0Vermietungen",
      "/anzeigen/antz2/index.html?xv[order]=pdat+desc%2Cfirst_mod+desc%2Csort2+desc%2C+sort3+desc&xv[start]=0&xv[vwnum]=10&xv[cart_query]=&qv[categories]=92",
    )
    27: ' Vermietungen' is space-padded and cannot succeed at /tmp/test_find_link line 21
    (undef, "/mp_styles/mainpost_global.css")

(Disclaimer: For some reason I have not yet been able to track down, it behaved differently when I tried it on another computer, because there the content of the page in question was returned with the utf8 bit set.)

I suggest calling utf8::upgrade($content) before feeding it to HTML::TokeParser, and also calling utf8::upgrade() on the values in WWW::Mechanize->_clean_keys(), to avoid this problem.

BTW, I also think that it's not a good idea to simply discard arguments which seem invalid, because this causes _any_ link to be found, which is usually not what you would expect. IMHO no link should be returned in this case.

Regards,
fany
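
For readers who want to see the flag-dependent behaviour of /\s/ described in this report in isolation, here is a minimal sketch. It is not the attached test script, and it does not use WWW::Mechanize or HTML::TokeParser at all. The helper name starts_with_space is hypothetical, and the sketch assumes a perl running with default regex semantics (no "use feature 'unicode_strings'", no /u or /a modifier, no "use locale"), which is the situation the report describes.

```perl
use strict;
use warnings;

# Two strings holding the same characters: U+00A0 (no-break space)
# followed by "Vermietungen", as in the link text from the report.
my $bytes    = "\xA0Vermietungen";   # stored as native bytes, utf8 flag off
my $upgraded = "\xA0Vermietungen";
utf8::upgrade($upgraded);            # same characters, utf8 flag on

# Hypothetical helper: does the string start with something /\s/ matches?
sub starts_with_space {
    my ($string) = @_;
    return $string =~ /^\s/ ? 1 : 0;
}

# The strings compare equal, yet /\s/ treats them differently.
printf "equal as strings:  %s\n", $bytes eq $upgraded ? 'yes' : 'no';
printf "flag off, /^\\s/:   %d\n", starts_with_space($bytes);
printf "flag on,  /^\\s/:   %d\n", starts_with_space($upgraded);

# The workaround suggested in the report: upgrade before matching,
# so both sides are handled with the same (Unicode) rules.
my $fixed = $bytes;
utf8::upgrade($fixed);
printf "after upgrade:     %d\n", starts_with_space($fixed);
```

Under the assumptions above, the second line should report 0 and the last two should report 1, even though the first line confirms that the strings are equal. This is why the same link text can be both required (to match the untrimmed content) and rejected as space-padded (when passed as the filter argument).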
Subject: test_find_link

Message body not shown because it is not plain text.
