This queue is for tickets about the DBIx-Class-Schema-Loader CPAN distribution.

Report information
The Basics
Id: 123698
Status: open
Priority: 0/
Queue: DBIx-Class-Schema-Loader

People
Owner: Nobody in particular
Requestors: felix.ostmann [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.07047
Fixed in: (no value)



Subject: Enums types are not properly create when unicode character is used
The {extra}{list} enum values are not encoded correctly. I use the same connection settings for the app itself, and all data from the database is correctly encoded except this enum.

    > \dT+
    ...
     steinhaus_main | enum_tasks_status | enum_tasks_status | 4 | offen          +|
                    |                   |                   |   | erledigt       +|
                    |                   |                   |   | zurückgestellt  |
    ...

    $ grep status -C5 Tasks.pm
    ...
      "status",
      {
        data_type => "enum",
        default_value => "offen",
        extra => {
          custom_type_name => "enum_tasks_status",
          list => ["offen", "erledigt", "zur\xFCckgestellt"],
        },
        is_nullable => 0,
      },
    ...

The file is in UTF-8 with "use utf8;" at the beginning, so I expected:

    list => ["offen", "erledigt", "zurückgestellt"],
On 2017-11-21 09:54:01, felix.ostmann@gmail.com wrote:
> list => ["offen", "erledigt", "zur\xFCckgestellt"],
> [...]
> the file is in utf8 with use utf8; in the beginning so i expected:
>
> list => ["offen", "erledigt", "zurückgestellt"],

These representations of the string are equivalent:

    $ perl -Mutf8 -E 'say "zur\xFCckgestellt" eq "zurückgestellt"'
    1

Schema::Loader uses Data::Dump to serialise method call arguments in the generated files, and it encodes all non-ASCII (and non-printable) characters using \x notation.

For aesthetic reasons it might be desirable to output Unicode word characters literally too, but the current output is not incorrect.

- ilmari
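
The claim that \x notation loses nothing can be checked with a round trip. The following is only a sketch: escape_non_ascii is a hypothetical helper written for this illustration, not Data::Dump's actual code, and its \x{..} spelling differs slightly from the \xFC form shown in the generated file.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (NOT Data::Dump's implementation): render a string as a
# double-quoted Perl literal in which every character outside printable ASCII
# is written in \x{..} notation.
sub escape_non_ascii {
    my ($str) = @_;
    my $out = $str;
    $out =~ s/([\\"\$\@])/\\$1/g;                            # protect quote syntax
    $out =~ s/([^\x20-\x7E])/sprintf('\\x{%X}', ord($1))/ge; # escape the rest
    return qq{"$out"};
}

my $literal    = escape_non_ascii("zurückgestellt");
my $round_trip = eval $literal;   # parse the escaped literal back into a string
die $@ if $@;

print "$literal\n";
print "round trip is lossless\n" if $round_trip eq "zurückgestellt";
```

Evaluating the escaped literal yields a string that compares equal to the original, which is the sense in which the generated output is "not incorrect".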
From: felix.ostmann [...] gmail.com
On Tue 21 Nov 2017, 06:08:27, ilmari wrote:
> These representations of the string are equivalent:
>
> $ perl -Mutf8 -E 'say "zur\xFCckgestellt" eq "zurückgestellt"'
> 1
> [...]

It is not really the same ... In the real code I have to call Encode::decode('ISO-8859-15', $enum) as a quick fix.

    $ cat ticket123698.pl
    use utf8;
    use 5.20.0;
    use Data::Dumper;
    say "zur\xFCckgestellt" eq "zurückgestellt";
    print Dumper("zur\xFCckgestellt","zurückgestellt");
    $ perl ticket123698.pl
    1
    $VAR1 = 'zur�ckgestellt';
    $VAR2 = "zur\x{fc}ckgestellt";
Date: Tue, 21 Nov 2017 12:07:41 +0000
From: ilmari [...] ilmari.org (Dagfinn Ilmari Mannsåker)
"Felix Antonius Wilhelm Ostmann via RT" <bug-DBIx-Class-Schema-Loader@rt.cpan.org> writes:
> It is not really the same ...
The _internal_ representation is not the same; the \x form will be represented internally as one byte per code point ("downgraded"), while the literal form will be UTF-8-encoded ("upgraded"). Semantically they are the same, as evidenced by "eq" returning true.
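
A minimal sketch of that distinction, using only core functions: utf8::is_utf8 reports the internal representation, utf8::upgrade switches a string to the UTF-8-encoded form, and eq compares code points. The variable names are illustrative.

```perl
use strict;
use warnings;

# The \x form as written in the generated file: one byte per code point.
my $downgraded = "zur\xFCckgestellt";

# A copy forced into the UTF-8-encoded internal form, which is what the
# literal spelling gets under "use utf8".
my $upgraded = $downgraded;
utf8::upgrade($upgraded);

printf "downgraded flag: %d\n", utf8::is_utf8($downgraded) ? 1 : 0;
printf "upgraded flag:   %d\n", utf8::is_utf8($upgraded)   ? 1 : 0;
printf "equal:           %d\n", ($downgraded eq $upgraded) ? 1 : 0;
printf "lengths:         %d and %d\n", length($downgraded), length($upgraded);
```

The flags differ, while equality and length (counted in characters) are the same for both strings.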
> In the real code i have to make a Encode::decode('ISO-8859-15', $enum) as a quickfix.
Please show where in the real code you have to do this. It smells like something you're passing it to suffering from the Unicode Bug, i.e. treating the characters in the 128..255 range differently depending on the internal representation (see https://metacpan.org/pod/perlunicode#The-%22Unicode-Bug%22 for details).
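
For readers unfamiliar with the Unicode Bug, here is a small sketch of the kind of behaviour meant. It deliberately omits "use feature 'unicode_strings'" and the /u regex flag, since those are what make the difference go away; the variable names are illustrative.

```perl
use strict;
use warnings;

# Same character (u-umlaut, code point 0xFC) in both internal representations.
my $downgraded = "\xFC";
my $upgraded   = "\xFC";
utf8::upgrade($upgraded);

my $same      = $downgraded eq $upgraded;           # compares code points
my $word_down = $downgraded =~ /\w/;                # byte semantics: ASCII rules only
my $word_up   = $upgraded   =~ /\w/;                # character semantics: Unicode rules
my $uc_same   = uc($downgraded) eq uc($upgraded);   # case mapping differs as well

print "eq:                 ", ($same      ? "yes" : "no"), "\n";
print "\\w on downgraded:   ", ($word_down ? "yes" : "no"), "\n";
print "\\w on upgraded:     ", ($word_up   ? "yes" : "no"), "\n";
print "uc() results equal: ", ($uc_same   ? "yes" : "no"), "\n";
```

Code that behaves like this on enum values would see the downgraded list entries differently from upgraded strings, even though the two compare equal.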
> $ cat ticket123698.pl > use utf8; > use 5.20.0; > use Data::Dumper; > say "zur\xFCckgestellt" eq "zurückgestellt"; > print Dumper("zur\xFCckgestellt","zurückgestellt"); > $ perl ticket123698.pl > 1 > $VAR1 = 'zur�ckgestellt'; > $VAR2 = "zur\x{fc}ckgestellt";
The different outputs here are a quirk of how Data::Dumper deals with downgraded vs. upgraded strings (which could be viewed as an instance of the Unicode Bug, but doesn't actually affect semantics). The first one is showing as � because you haven't told perl that your terminal expects UTF-8-encoded strings. Adding

    use open qw(:std :utf8);

to the script will make it apply a UTF-8 encoding layer to the standard input/output/error filehandles, so non-ASCII characters show correctly.

- ilmari

-- 
"I use RMS as a guide in the same way that a boat captain would use a lighthouse. It's good to know where it is, but you generally don't want to find yourself in the same spot." - Tollef Fog Heen
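
The effect of an encoding layer can be checked without a terminal by writing to in-memory filehandles. This is a sketch only: bytes_written is a hypothetical helper, and it uses an explicit :encoding(UTF-8) layer on its own handle rather than the "use open" pragma on the standard handles.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text through a filehandle opened with the given
# PerlIO layer and return the bytes that arrived, as space-separated hex.
sub bytes_written {
    my ($layer, $text) = @_;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer
        or die "cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "cannot close in-memory handle: $!";
    return join ' ', map { sprintf '%02X', ord($_) } split //, $buffer;
}

# Without an encoding layer the code point 0xFC goes out as the single byte FC,
# which a UTF-8 terminal cannot decode and shows as a replacement character.
print "raw:   ", bytes_written(':raw', "\xFC"), "\n";

# With a UTF-8 layer the same character goes out as the two-byte sequence C3 BC.
print "utf-8: ", bytes_written(':encoding(UTF-8)', "\xFC"), "\n";
```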
From: felix.ostmann [...] gmail.com
On Tue 21 Nov 2017, 07:07:59, ilmari@ilmari.org wrote:
> Please show where in the real code you have to do this. It smells like
> something you're passing it to suffering from the Unicode Bug,
> i.e. treating the characters in the 128..255 range differently depending
> on the internal representation (see
> https://metacpan.org/pod/perlunicode#The-%22Unicode-Bug%22 for details).
> [...]

OK, here is the real-world scenario with pseudo code. I am using DBIx::Class + Catalyst + Template Toolkit.

ResultSet:

    sub enum_status {
        my ($self) = @_;

        # FIXME see https://rt.cpan.org/Public/Bug/Update.html?id=123698
        return map { Encode::decode("ISO-8859-15", $_) } @{ $self->result_source->column_info('status')->{extra}->{list} };

        # Intended code without the workaround (currently unreachable):
        return @{ $self->result_source->column_info('status')->{extra}->{list} };
    }

Catalyst controller:

    $c->stash->{status_order} = [ $rs->enum_status ];

Template:

    [% FOREACH status IN status_order %]
    <a href="[% c.request.uri_with({status => status}) %]">
    [% END %]

Without the FIXME the links are ISO-8859-15.

After reading your reply and the docs about the Unicode Bug, I changed the code to the following:

    __PACKAGE__->add_columns(
      ...
      {
        data_type => "enum",
        default_value => "offen",
        extra => {
          custom_type_name => "enum_tasks_status",
          list => ["offen", "erledigt", "zur\xFCckgestellt"],
        },
        is_nullable => 0,
      },
      ...
    );
    ...
    # DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:W4KhHAXiEW35h5XWiZwhFg

    utf8::upgrade($_) for @{ __PACKAGE__->column_info('status')->{extra}->{list} };

But in my opinion this is kind of a bug. Why are all other strings coming from the database already upgraded, but not this one?
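
One way such a difference in links can arise is sketched below. This is an assumption for illustration, not a diagnosis: flag_sensitive_escape is a hypothetical function, not Catalyst's or URI's actual code. It UTF-8-encodes only strings whose internal flag is set, which is the Unicode Bug pattern described earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical escaper (NOT taken from Catalyst or URI): percent-encodes a
# value, but converts it to UTF-8 bytes only when it is internally upgraded.
sub flag_sensitive_escape {
    my ($value) = @_;
    utf8::encode($value) if utf8::is_utf8($value);
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

my $downgraded = "zur\xFCckgestellt";   # as stored in the generated file
my $upgraded   = $downgraded;
utf8::upgrade($upgraded);               # the workaround from the message above

print flag_sensitive_escape($downgraded), "\n";   # Latin-1 style escape
print flag_sensitive_escape($upgraded),   "\n";   # UTF-8 style escape
```

Under this assumption the downgraded value is escaped as a single Latin-1 byte and the upgraded value as a UTF-8 sequence, which would match both the ISO-8859-15 links and the effect of the utf8::upgrade workaround.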

