This queue is for tickets about the URI CPAN distribution.

Report information
The Basics
Id: 75026
Status: open
Priority: 0/
Queue: URI

People
Owner: Nobody in particular
Requestors: bits [...] itools.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.59
Fixed in: (no value)



Subject: Percent character not escaped
Hi Gisle,

I was expecting a "%" character not followed by /[0-9a-fA-F]{2}/ to be percent-encoded, per RFC 2396 2.4.2:

    Because the percent "%" character always has the reserved purpose of
    being the escape indicator, it must be escaped as "%25" in order to be
    used as data within a URI.

However, the percent "%" character passes through URI unescaped:

    use URI;
    use URI::Escape;

    my $unescaped    = 'http://example.org/10%_of_nothing';
    my $expected_uri = 'http://example.org/10%25_of_nothing';

    my @uris = (
        URI->new($unescaped),
        URI->new_abs('10%_of_nothing', 'http://example.org/'),
        URI->new_abs(uri_escape('10%_of_nothing'), 'http://example.org/')
    );

    for (@uris) {
        my $canonical_uri = $_->canonical;
        print $canonical_uri,
            ($canonical_uri eq $expected_uri ? ' is ' : " isn't "),
            $expected_uri, "\n";
    }

    # http://example.org/10%_of_nothing isn't http://example.org/10%25_of_nothing
    # http://example.org/10%_of_nothing isn't http://example.org/10%25_of_nothing
    # http://example.org/10%25_of_nothing is http://example.org/10%25_of_nothing

Curiously,

    use Regexp::Common qw( URI ); # gives $RE{URI}{HTTP}

    print $unescaped,
        ($RE{URI}{HTTP}->matches($unescaped) ? ' matches ' : " doesn't match "),
        "Regexp::Common::URI::http\n";

    # http://example.org/10%_of_nothing matches Regexp::Common::URI::http

are these both bugs or have I misinterpreted the spec?
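For illustration, the behaviour the report expects could be sketched in plain Perl roughly as follows. The helper name `escape_stray_percent` is hypothetical and is not part of URI or URI::Escape; it only rewrites a "%" that is not followed by two hex digits and leaves existing escapes alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the URI distribution): escape any "%"
# that does not start a percent-encoded triplet; existing %XX sequences
# are left untouched.
sub escape_stray_percent {
    my ($str) = @_;
    $str =~ s/%(?![0-9A-Fa-f]{2})/%25/g;
    return $str;
}

print escape_stray_percent('http://example.org/10%_of_nothing'), "\n";
# http://example.org/10%25_of_nothing
print escape_stray_percent('http://example.org/a%20b'), "\n";
# http://example.org/a%20b
```

A rule like this cannot tell a literal "%" that happens to be followed by two hex digits (as in "100%ab") from a real escape, so it is at best a heuristic for already-assembled strings.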
From: bits [...] itools.com
The web RT interface seems to mangle the report; the downloaded version looks OK.
From: bits [...] itools.com
Gisle,

Could you please weigh in on whether URI->canonical() should process strings containing percent characters not followed by /[0-9a-fA-F]{2}/ by escaping them to %25, so that the result is a valid RFC 2396 URI?

If canonical() isn't meant to accept unescaped strings and produce a conformant URI, perhaps the docs could say that strings should be parsed into their components, appropriately uri_escape()d, and reassembled into a URI string before being passed to canonical()?

Thanks for shedding some light on this.
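The escape-then-assemble approach suggested above might look roughly like the sketch below. It assumes the caller has the raw, unescaped path segments in hand; the helper name `uri_from_raw_segments` is hypothetical, while `uri_escape`, `URI->new_abs`, and `canonical` are the same calls used in the original report.

```perl
use strict;
use warnings;
use URI;
use URI::Escape qw(uri_escape);

# Hypothetical helper: escape each raw path segment first, then resolve
# the joined path against the base URI.
sub uri_from_raw_segments {
    my ($base, @raw_segments) = @_;
    my $path = join '/', map { uri_escape($_) } @raw_segments;
    return URI->new_abs($path, $base);
}

my $uri = uri_from_raw_segments('http://example.org/', '10%_of_nothing');
print $uri->canonical, "\n";
# http://example.org/10%25_of_nothing
```

This only works on data that is not yet escaped: running uri_escape over a segment that already contains %XX sequences would escape their "%" a second time.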
In-Reply-To: <rt-3.8.HEAD-30922-1335040567-785.75026-0-0 [...] rt.cpan.org>
It's deliberate that URI does not modify a % followed by something that isn't a hex number.  As I remember it, this was based on some wording somewhere (in the old days) that said that the sequence % not followed by a hex number was reserved for future extensions.  By passing these sequences through unchanged, URI would be compatible with this potential future.

I'm not able to locate this wording anywhere now.

This means that I'm fine with changing URI's behaviour in this regard.


