This queue is for tickets about the Perlanet CPAN distribution.

Report information
The Basics
Id: 102221
Status: open
Priority: 0/
Queue: Perlanet

People
Owner: DAVECROSS [...] cpan.org
Requestors: grtodd [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.56
Fixed in: (no value)



Subject: URLEncoding issue with metacpan.org "News" feed
MetaCPAN's "Recent" feed works well and Perlanet creates links like the following for aggregation:

http://metacpan.org/release/DWHEELER/Pod-Simple-3.29_6

The "News" feed at MetaCPAN (https://metacpan.org/feed/news), however, uses URLs and links like this (with anchors):

https://metacpan.org/news#sslimprovements

which are lowercase with white space removed (note the "#"). When Perlanet tries to create an aggregation from this feed it URL-encodes "#" as %23, so the resulting links look like:

http://metacpan.org/news%23SSL%20improvements

and thus break: once "#" is URL-encoded it is no longer seen as an anchor delimiter, but as a literal character in the path.

I have no idea whether this is a Perlanet bug, nor how or where to fix it. There may be some sort of discrepancy between the RDF/Atom feed describing the page and the actual source of the page.

A workaround might be to add "/" to the end of the URL, which causes "%23" to be seen as an anchor. For example:

http://metacpan.org/news/%23SSL%20improvements

does find the page - if not the actual anchor location. Or perhaps adjusting settings when the HTML::Scrubber object is created - but I haven't investigated further.
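To illustrate why the encoded link breaks, here is a minimal sketch in Python (not Perl, and not Perlanet's own code) that asks a standard URL parser how it splits the two forms. The helper name `describe` is hypothetical, and the `intended` URL is an assumed "correct" form with only the space encoded.

```python
from urllib.parse import urlsplit

def describe(url):
    """Return (path, fragment) as a generic URL parser sees them."""
    parts = urlsplit(url)
    return parts.path, parts.fragment

# Assumed intended form: "#" left literal, only the space percent-encoded.
intended = "http://metacpan.org/news#SSL%20improvements"
# The form reported in this ticket: "#" has become "%23".
generated = "http://metacpan.org/news%23SSL%20improvements"

print(describe(intended))   # path is "/news", fragment is present
print(describe(generated))  # everything ends up in the path, no fragment
```

With a literal "#", the parser requests `/news` and keeps the fragment for in-page navigation. With "%23", the whole string becomes part of the path, so the server is asked for a page that does not exist.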
On Thu Feb 19 15:00:19 2015, grtodd@gmail.com wrote:
> MetaCPAN's "Recent" feed works well and Perlanet creates links like
> the following for aggregation:
>
> http://metacpan.org/release/DWHEELER/Pod-Simple-3.29_6
>
> The "News" feed at MetaCPAN (https://metacpan.org/feed/news) however
> uses URLs and links like this (with anchors):
>
> https://metacpan.org/news#sslimprovements
>
> which are lowercase with white space removed (note the "#"). When
> Perlanet tries to create an aggregation from this feed it URL encodes
> "#" as %23, so the resulting links look like:
>
> http://metacpan.org/news%23SSL%20improvements
>
> and thus break since if # is urlencoded it is not seen as an anchor,
> but as a literal character in the path.
>
> I have no idea if this is a Perlanet bug or not nor how or where to
> fix it. There may be some sort of discrepancy between the RDF/Atom
> feed describing the page and the actual source of the actual page.
>
> A workaround might be to add "/" to the end of the URL which causes
> "%23" to be seen as an anchor. For example:
>
> http://metacpan.org/news/%23SSL%20improvements
>
> does find the page - if not the actual anchor location. Or perhaps
> adjusting settings when the HTML::Scrubber object is created - but I
> haven't investigated further.
Hi,

It looks like there are a few things going on here.

Firstly, there's no problem with the feed handling. If you're generating a feed file and you look at the URLs that are in that, then you'll see that they are correct.

Secondly, MetaCPAN are creating invalid URLs. They all have spaces in them - and spaces shouldn't exist in URLs. They should all be encoded to %20 or +. The URL you give as an example (https://metacpan.org/news#sslimprovements) doesn't exist in their feed. It's actually "http://metacpan.org/news#SSL improvements".

Thirdly, MetaCPAN are creating URLs that contain fragments which link to <a> elements that don't exist. If they publish a URL like https://metacpan.org/news#sslimprovements then you'd expect to find an <a> element like <a name="sslimprovements">. That doesn't exist in the HTML source.

So, even if Perlanet worked as expected, your links wouldn't work because the MetaCPAN site is broken. I'll see if I can submit a patch to them to fix those issues.

But there is still a problem with the page that Perlanet is generating for you. I don't think that it should change '#' to '%23'. That's happening because in the sample TT file which I provide (and which, I assume, you copied) I use the 'uri' filter to clean up URLs for display.

A quick fix would be to remove the 'uri' filter. But I need to think about what other effects that might have. I think it's good practice to have it there (in most cases).

It might be a bug in TT's 'uri' filter. It might need to add '#' to the list of characters that it doesn't touch.

Thanks for the report.

Cheers,

Dave...
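The difference between a filter that escapes "#" and one that leaves it alone can be sketched as follows. This is an analogue in Python, not Template Toolkit's actual 'uri' filter; both function names are hypothetical, and the input is the URL quoted above from the MetaCPAN feed.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_everything(url):
    """Escape the URL as one opaque string; "#" becomes "%23"."""
    return quote(url, safe=":/")

def encode_keeping_fragment(url):
    """Split first, escape path and fragment separately, then rejoin.

    The "#" delimiter is never passed to quote(), so it survives.
    The query string is left untouched in this sketch, and an input
    that is already percent-encoded would be double-encoded.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path),      # "/" is kept by quote()'s default safe set
        parts.query,
        quote(parts.fragment),  # the space becomes "%20"
    ))

feed_url = "http://metacpan.org/news#SSL improvements"
print(encode_everything(feed_url))
print(encode_keeping_fragment(feed_url))
```

The first function reproduces the broken link from this ticket. The second yields a link whose fragment is still a fragment, although it would only scroll to the right place if the target page actually defined a matching anchor.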

