
This queue is for tickets about the URI CPAN distribution.

Report information
The Basics
Id: 75026
Status: open
Priority: 0/
Queue: URI

Owner: Nobody in particular
Requestors: bits [...]

Bug Information
Severity: (no value)
Broken in: 1.59
Fixed in: (no value)

Subject: Percent character not escaped
Hi Gisle,

I was expecting a "%" character not followed by /[0-9a-fA-F]{2}/ to be percent-encoded, per RFC 2396 2.4.2:

    Because the percent "%" character always has the reserved purpose of
    being the escape indicator, it must be escaped as "%25" in order to
    be used as data within a URI.

However, the percent "%" character passes through URI unescaped:

    use URI;
    use URI::Escape;

    my $unescaped = '';
    my $expected_uri = '';
    my @uris = (
        URI->new($unescaped),
        URI->new_abs('10%_of_nothing', ''),
        URI->new_abs(uri_escape('10%_of_nothing'), '')
    );

    for (@uris) {
        my $canonical_uri = $_->canonical;
        print $canonical_uri,
            ($canonical_uri eq $expected_uri ? ' is ' : " isn't "),
            $expected_uri, "\n";
    }

    # isn't
    # isn't
    # is

Curiously,

    use Regexp::Common qw( URI ); # gives $RE{URI}{HTTP}

    print $unescaped,
        ($RE{URI}{HTTP}->matches($unescaped) ? ' matches ' : " doesn't match "),
        "Regexp::Common::URI::http\n";

    # matches Regexp::Common::URI::http

Are these both bugs, or have I misinterpreted the spec?
From: bits [...]
The web RT interface seems to mangle the report, download looks ok.
From: bits [...]
Gisle,

Could you please weigh in on whether URI->canonical() should process strings containing percent characters not followed by /[0-9a-fA-F]{2}/ by escaping them to %25, so as to form a valid RFC 2396 URI?

If canonical() isn't meant to accept unescaped strings and produce a conformant URI, perhaps the docs could indicate that strings should be parsed into their components, appropriately uri_escaped, and reassembled into a URI string before being passed to canonical()?

Thanks for shedding some light on this.
It's deliberate that URI does not modify % followed by something that isn't a hex number.  As I remember it this was based on some wording somewhere (in the old days) that said that the sequence % not followed by a hex number was reserved for future extensions.  By passing these sequences through unchanged URI would be compatible with this potential future.

I'm not able to locate this wording anywhere now.

This means that I'm fine with changing URI's behaviour in this regard.
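
For illustration, the behaviour the requestor asks for (a "%" not followed by two hex digits becomes "%25") can be sketched in plain Perl. This is only a sketch of the idea under discussion, not code from the URI distribution; the helper name `escape_stray_percent` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of URI): escape every "%" that is not
# followed by two hex digits, leaving valid escapes such as "%20" alone.
sub escape_stray_percent {
    my ($str) = @_;
    # Negative lookahead: match "%" only when two hex digits do NOT follow.
    $str =~ s/%(?![0-9A-Fa-f]{2})/%25/g;
    return $str;
}

print escape_stray_percent('10%_of_nothing'), "\n";     # 10%25_of_nothing
print escape_stray_percent('already%20escaped'), "\n";  # already%20escaped
print escape_stray_percent('100%'), "\n";               # 100%25
```

Note that a substitution like this cannot tell a literal "%" that happens to be followed by two hex digits (for example "50%af") from an intended escape, so such input would still pass through unchanged.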
