This queue is for tickets about the MediaWiki-API CPAN distribution.

Report information
The Basics
Id: 59673
Status: resolved
Priority: 0/
Queue: MediaWiki-API

Owner: Nobody in particular
Requestors: n [...]

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)

Subject: Problem with unicode in article names
Date: Sat, 24 Jul 2010 18:12:41 +0400
To: bug-MediaWiki-API [...]
From: Nikolay Shaplov <n [...]>
I am trying to parse the French Wiktionary using MediaWiki::API and have run into a problem with Unicode. Here is an example:

    use strict;
    use MediaWiki::API;

    my $mw = MediaWiki::API->new();
    $mw->{config}->{api_url} = '';
    $mw->{config}->{use_http_get} = 1;
    $mw->{config}->{skip_encoding} = 1;

    my $articles = $mw->list(
        {
            action     => 'query',
            list       => 'categorymembers',
            cmtitle    => 'Catégorie:français',
            cmcontinue => 'campisoliennes !Campisoliennes|',
            cmlimit    => 'max',
        },
        { skip_encoding => 1 }
    ) || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

When it gets to cmcontinue = "canaux darrosage ! !canaux d’arrosage|", the script fails:

    Can't escape \x{2019}, try uri_escape_utf8() instead at MediaWiki/ line 754

Right now this example reproduces the error in one step, but if the wiki maintainers add more words to the category, the behavior might change...

To solve this problem, I've forced clearing of the UTF-8 flag before URL escaping in _make_querystring:

    sub _make_querystring {
        my ($ref) = @_;
        print $ref->{cmcontinue}, "\n";
        my @qs = ();
        for my $key ( keys %{$ref} ) {
            my $val = $ref->{$key};
            Encode::_utf8_off($val);
            my $keyval = uri_escape($key) . '=' . uri_escape($val);
            push(@qs, $keyval);
        }
        return '?' . join('&', @qs);
    }

With this patch everything works well.
Thanks for the reply. You are right that the combination of skip_encoding and use_http_get does not currently work. The fix should be to call uri_escape_utf8 when the values contain UTF-8 and uri_escape when they don't. (Turning off the UTF-8 flag also works, as you have suggested.) I will release a fix soon. In the meantime, please continue to use your fix, or switch off use_http_get and use the POST method instead, which will work.
fixed in 0.34
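
The approach described in the reply (uri_escape_utf8 for values that carry the UTF-8 flag, uri_escape otherwise) could look roughly like the sketch below. This is only an illustration, not the code that shipped in 0.34: escape_value and make_querystring are hypothetical names, and the check assumes that unflagged values are already UTF-8 encoded bytes, which is what skip_encoding implies.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_escape_utf8);

# Hypothetical helper (not part of MediaWiki::API): pick the escaping
# function based on whether Perl holds the value as a character string.
sub escape_value {
    my ($val) = @_;
    # Flagged strings may contain code points above 255, which uri_escape
    # refuses to handle; uri_escape_utf8 encodes them to UTF-8 first.
    return utf8::is_utf8($val) ? uri_escape_utf8($val) : uri_escape($val);
}

# Hypothetical stand-in for _make_querystring, with keys sorted so the
# output is deterministic.
sub make_querystring {
    my ($ref) = @_;
    my @qs = map { escape_value($_) . '=' . escape_value( $ref->{$_} ) }
             sort keys %{$ref};
    return '?' . join( '&', @qs );
}

# A continuation value containing U+2019, like the one from the report.
print make_querystring( { cmcontinue => "canaux d\x{2019}arrosage|" } ), "\n";
# prints ?cmcontinue=canaux%20d%E2%80%99arrosage%7C
```

Unlike clearing the flag with Encode::_utf8_off, this leaves the caller's string untouched and does not depend on Perl's internal representation beyond the flag check itself.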
