This queue is for tickets about the libapreq CPAN distribution.

Report information

The Basics
Id: 24724
Status: new
Priority: Low/Low
Queue:

People
Owner: Nobody in particular
Requestors: SREZIC [...] cpan.org
Cc:
AdminCc:

BugTracker
Severity: Normal
Broken in: 1.33
Fixed in: (no value)



Subject: Apache::Request: support for %uXXXX escape sequence not usable
It seems that Apache::Request tries to understand URL-escaped sequences of the form %uXXXX (which you get, for example, from CGI::Ajax when dealing with Unicode characters with codepoints > 255). The problem is that nothing in the parsed values indicates whether the bytes should be interpreted as iso-8859-1 or utf-8. For instance, if the QUERY_STRING looks like something=%fc%u00fc, then Apache::Request parses it into a mixture of iso-8859-1 and utf-8 bytes: "\374\303\274", when it should be "\374\374" or "\x{fc}\x{fc}".

I don't know of a good way to solve this. The unescaping is done in ap_unescape_url_u in apache_request.c, a part of the code which only deals with C strings and not Perl SVs (with SVs we could set the SvUTF8 flag if needed).

A possible solution might be a global flag which indicates which encoding to use in this function. If the flag is set to "iso-8859-1", then a warning would be issued whenever %uXXXX with XXXX > 0xFF occurs (since such a character cannot be represented in iso-8859-1). If the flag is set to "utf-8", then the escape sequences %80 .. %ff would be converted into utf-8 sequences, and when the values are converted to Perl SVs, the data should be flagged as utf-8. To be backward compatible, the flag should probably default to "iso-8859-1".

Regards, Slaven
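The mixing described above, and the proposed flag, can be illustrated with a small model. This is only a sketch in Python, not the C code in apache_request.c: the function names unescape_mixed and unescape_with_flag are hypothetical, and leaving a %uXXXX escape untouched after warning in "iso-8859-1" mode is an assumption, since the ticket does not say what the result should be in that case.

```python
import re
import warnings

# Matches either %uXXXX (four hex digits) or %XX (two hex digits).
_ESCAPE = re.compile(rb"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def unescape_mixed(query: bytes) -> bytes:
    """Model of the reported behaviour: %XX yields one raw byte,
    while %uXXXX yields the utf-8 encoding of the codepoint."""
    def repl(m):
        if m.group(1) is not None:
            # %uXXXX: emitted as utf-8 (lone surrogates would raise here)
            return chr(int(m.group(1), 16)).encode("utf-8")
        # %XX: emitted as a single byte, i.e. effectively iso-8859-1
        return bytes([int(m.group(2), 16)])
    return _ESCAPE.sub(repl, query)


def unescape_with_flag(query: bytes, encoding: str = "iso-8859-1") -> bytes:
    """Hypothetical flag-driven variant following the proposal in the ticket."""
    if encoding not in ("iso-8859-1", "utf-8"):
        raise ValueError("encoding must be 'iso-8859-1' or 'utf-8'")

    def repl(m):
        code = int(m.group(1) or m.group(2), 16)
        if encoding == "utf-8":
            # Both %XX and %uXXXX become utf-8 sequences.
            return chr(code).encode("utf-8")
        if code > 0xFF:
            # Not representable in iso-8859-1: warn and (assumption)
            # leave the escape sequence as it was.
            warnings.warn("%%u%04X cannot be represented in iso-8859-1" % code)
            return m.group(0)
        return bytes([code])

    return _ESCAPE.sub(repl, query)


if __name__ == "__main__":
    q = b"something=%fc%u00fc"
    print(unescape_mixed(q))               # b'something=\xfc\xc3\xbc'
    print(unescape_with_flag(q))           # b'something=\xfc\xfc'
    print(unescape_with_flag(q, "utf-8"))  # b'something=\xc3\xbc\xc3\xbc'
```

With the flag in place the output is consistently in one encoding, so the caller knows whether the resulting value needs to be marked as utf-8.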

