
This queue is for tickets about the AnyEvent-Twitter CPAN distribution.

Report information
The Basics
Id: 53566
Status: resolved
Priority: 0/
Queue: AnyEvent-Twitter

People
Owner: Nobody in particular
Requestors: hideki.yamamura [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: update_status does not support non-ASCII characters
Date: Mon, 11 Jan 2010 03:15:04 +0900
To: bug-anyevent-twitter [...] rt.cpan.org
From: 山村 英貴 <hideki.yamamura [...] gmail.com>
Thanks for this very useful module. I found a bug in its handling of UTF-8 strings: update_status does not support non-ASCII characters.

First, this module uses common::sense, which implies "use utf8". In update_status(), $url->query_form is called with two arguments, (status, $status_e). The key, status, is utf8-flagged, but the value, $status_e, is octets.

So in URI::_query::query_form, $status_e is converted to UTF-8 when the key and value are joined. But $status_e is already UTF-8 encoded, so the posted update becomes an unreadable string when $status contains non-ASCII characters (CJK, etc.).

--- /usr/lib/perl5/site_perl/5.8.8/AnyEvent/Twitter.pm 2009-11-05 08:09:54.000000000 +0900
+++ ./lib/AnyEvent/Twitter.pm 2010-01-11 03:03:21.000000000 +0900
@@ -477,11 +477,9 @@
 sub update_status {
    my ($self, $status, $done_cb) = @_;
 
-   my $status_e = _encode_status $status;
-
    my $url = URI::URL->new ($self->{base_url});
    $url->path_segments ('statuses', "update.json");
-   $url->query_form (status => $status_e);
+   $url->query_form (status => decode_utf8($status));
 
    my $hdrs = { $self->_get_basic_auth };
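The double encoding described in this report can be reproduced in plain Perl, independent of URI. The sketch below only illustrates the general mechanism and is not URI's actual code. It assumes the key is utf8-flagged (simulated here with utf8::upgrade) and that the joined string is encoded to UTF-8 once more before it goes on the wire. The sample character and the variable names $key, $joined, $wire and $hex are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# A character string holding one CJK character (U+65E5).
my $status = "\x{65e5}";

# Encoding yields three octets: E6 97 A5.
my $status_e = encode_utf8($status);

# Assumption: simulate a utf8-flagged key, as the report says 'status' is.
my $key = "status";
utf8::upgrade($key);

# Joining a flagged string with octets upgrades each octet as if it
# were a Latin-1 character, so the three octets become three characters.
my $joined = "$key=$status_e";

# Encoding the joined string again encodes those three characters,
# producing six octets instead of three.
my $wire = encode_utf8($joined);

# Hex dump of everything after "status=".
my $hex = join ' ',
    map { sprintf('%02X', ord($_)) }
    split //, substr($wire, length("status="));
print "$hex\n";
```

This should print C3 A6 C2 97 C2 A5, the UTF-8 encoding of the Latin-1 characters U+00E6, U+0097 and U+00A5, rather than the intended E6 97 A5.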
Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Mon, 11 Jan 2010 16:53:18 +0100
To: 山村 英貴 via RT <bug-AnyEvent-Twitter [...] rt.cpan.org>
From: Robin Redeker <elmex [...] ta-sa.org>
Hi!
This patch will break the module. The problem is that C<$status> should be a plain string containing un-encoded Unicode characters. Calling decode_utf8 on it does not make sense at all, as the update_status function should already receive un-encoded strings. As URIs can't represent Unicode characters, the string must be encoded, and the encoding should be UTF-8 (I think I read that in the Twitter API). So the code as it is, is correct if $status is a Unicode string.

The UTF8 flag is just an internal flag, which should not be exposed or handled specially at the Perl language level.

So, to come back to your problem: what is the problem? Can you put together a small test case that exposes it? Which Perl version do you use?

Greetings,
Robin
--
Robin Redeker | Deliantra, the free code+content MORPG
elmex@ta-sa.org / r.redeker@gmail.com | http://www.deliantra.net
http://www.ta-sa.org/ |
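The handling this reply describes (take a character string, encode it as UTF-8, then percent-escape the octets for use in a URI) can be sketched with URI::Escape. This is a minimal illustration, not the module's own _encode_status, whose body does not appear in this ticket. The sample text and variable names are assumptions.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use URI::Escape qw(uri_escape uri_escape_utf8);

# Character string (not octets): the two characters U+65E5 U+672C.
my $status = "\x{65e5}\x{672c}";

# Explicit two-step form: characters -> UTF-8 octets -> percent-escapes.
my $two_step = uri_escape(encode_utf8($status));

# uri_escape_utf8 performs the UTF-8 encoding itself before escaping.
my $one_step = uri_escape_utf8($status);

print "$two_step\n";
print "$one_step\n";
```

Both lines should print %E6%97%A5%E6%9C%AC. The caller works only with character strings, and the encoding happens once, at the point where the URI is built.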
Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Tue, 12 Jan 2010 01:37:40 +0900
To: bug-AnyEvent-Twitter [...] rt.cpan.org
From: Hideki Yamamura <hideki.yamamura [...] gmail.com>
Thanks for replying.

I'm using Perl 5.8.8 and 5.10.0 (on CentOS 5.3 and Debian lenny).

I wrote a small script to show URI::URL's strange behavior. Executing it gives you four encoded URLs.

AnyEvent::Twitter 0.27 produces the first result, which comes from these two factors:
  get_url_utf8() (because "use common::sense" turns 'status' into a utf8-flagged string)
  $jp_octets (because $status_e was encoded by encode_utf8($status))
But this encoded URL string is completely broken.

Apparently URI::URL's query_form can handle utf8-flagged strings when all parameters are utf8-flagged, so I made a patch that makes $status utf8-flagged with Encode::decode_utf8.

The patch's intention is:
  When $status is a plain string (UTF-8 encoded), decode_utf8 decodes it.
  When $status is a utf8-flagged string, decode_utf8 does nothing to it.
Download uri-utf8.pl
text/x-perl 1.8k

Message body is not shown because sender requested not to inline it.

Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Tue, 12 Jan 2010 11:20:05 +0100
To: 山村 英貴 via RT <bug-AnyEvent-Twitter [...] rt.cpan.org>
From: Robin Redeker <elmex [...] ta-sa.org>
The best thing is to report this as a bug against URI::URL. The bug seems to be that URI::URL double-encodes the string $jp_octets in the first case, as far as I can see.

I don't know why you are messing around with the utf8 flag. At the Perl language level I should not have to think about that INTERNAL flag. If URI::URL produces broken URLs when I pass in octets, then the bug is in URI::URL. Even if URI::URL doesn't correctly handle strings that are internally flagged somehow, it should at least document that.

Greetings,
Robin
Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Wed, 13 Jan 2010 00:34:10 +0900
To: bug-AnyEvent-Twitter [...] rt.cpan.org
From: Hideki Yamamura <hideki.yamamura [...] gmail.com>
More simply, please refer to this script. You are doing the same thing in AnyEvent::Twitter at line 484.

This is a very common problem around the utf8 flag in Japan, because we use multibyte languages. I think you should not use common::sense.

I have also found a much better solution. Please use this patch (and drop the first patch).

--- /usr/lib/perl5/site_perl/5.8.8/AnyEvent/Twitter.pm 2009-11-05 08:09:54.000000000 +0900
+++ ./lib/AnyEvent/Twitter.pm 2010-01-13 00:20:12.000000000 +0900
@@ -1,5 +1,6 @@
 package AnyEvent::Twitter;
-use common::sense;
+use strict;
+use warnings;
 use Carp qw/croak/;
 use AnyEvent;
 use AnyEvent::HTTP;
Download utf8-problem.pl
text/x-perl 136b

Message body is not shown because sender requested not to inline it.

Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Thu, 14 Jan 2010 21:10:59 +0900
To: bug-AnyEvent-Twitter [...] rt.cpan.org
From: Hideki Yamamura <hideki.yamamura [...] gmail.com>
I found another way to avoid the utf8-related problem. It works because status is utf8-on but 'status' is utf8-off.

Following your advice, I have submitted a bug report for the URI module. Thanks.

--- /usr/lib/perl5/site_perl/5.8.8/AnyEvent/Twitter.pm 2009-11-05 08:09:54.000000000 +0900
+++ ./lib/AnyEvent/Twitter.pm 2010-01-14 21:01:23.000000000 +0900
@@ -481,7 +481,7 @@
 
    my $url = URI::URL->new ($self->{base_url});
    $url->path_segments ('statuses', "update.json");
-   $url->query_form (status => $status_e);
+   $url->query_form ('status' => $status_e);
 
    my $hdrs = { $self->_get_basic_auth };
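Whether a bareword before => ends up utf8-flagged under "use utf8" depends on the Perl version, as discussed in this thread. A small diagnostic like the following prints what the running perl does. It is only a sketch with made-up variable names, and no particular output is assumed.

```perl
use strict;
use warnings;
use utf8;

# The fat comma auto-quotes the bareword on its left.
my ($bare) = (status => 1);

# The same key written as an ordinary quoted string.
my ($quoted) = ('status');

# Report the internal utf8 flag of each form.
for my $pair ([bareword => $bare], [quoted => $quoted]) {
    my ($label, $value) = @$pair;
    printf "%-8s %s\n", $label,
        utf8::is_utf8($value) ? 'utf8-flagged' : 'not flagged';
}
```

If the two lines differ on a given perl, that perl shows the behavior the workaround above relies on.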
Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Thu, 14 Jan 2010 13:20:06 +0100
To: 山村 英貴 via RT <bug-AnyEvent-Twitter [...] rt.cpan.org>
From: Robin Redeker <elmex [...] ta-sa.org>
That's great. Thanks!

By the way, the bug where the bareword in "status => ..." gets the utf8 flag is known in 5.10. I think it is fixed in the development version and will be fixed in the next release of Perl 5. I stumbled across it myself once.
Greetings,
Robin
Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Sat, 30 Jan 2010 15:31:43 +0900
To: bug-AnyEvent-Twitter [...] rt.cpan.org
From: Hideki Yamamura <hideki.yamamura [...] gmail.com>
Hi.

I have a problem: URI.pm's maintainer has not responded to my ticket. We truly have no way to post Japanese text to Twitter with AnyEvent::Twitter (0.27) on Perl 5.8 and 5.10.

Would you apply this patch? Thanks.

>> --- /usr/lib/perl5/site_perl/5.8.8/AnyEvent/Twitter.pm  2009-11-05 08:09:54.000000000 +0900
>> +++ ./lib/AnyEvent/Twitter.pm      2010-01-14 21:01:23.000000000 +0900
>> @@ -481,7 +481,7 @@
>>
>>     my $url = URI::URL->new ($self->{base_url});
>>     $url->path_segments ('statuses', "update.json");
>> -   $url->query_form (status => $status_e);
>> +   $url->query_form ('status' => $status_e);
>>
>>     my $hdrs = { $self->_get_basic_auth };
-- Hideki YAMAMURA <hideki.yamamura@gmail.com>
On Sat Jan 30 01:32:43 2010, hideki.yamamura@gmail.com wrote:
> Hi.
> I have difficulty that URI.pm's maintainer does not respond my ticket.
> We have truly no way to post Japanese text to Twitter with
> AnyEvent::Twitter(0.27)
> on Perl 5.8 and 5.10.
> Would you apply this patch? Thanks.
Oh, that's sad about the URI maintainer. I've applied your workaround. You can fetch the updated version from my git repository:

http://git.ta-sa.org/AnyEvent-Twitter.git

I will probably release it this week.

Greetings,
Robin

