Skip Menu |
 

This queue is for tickets about the DBD-mysql CPAN distribution.

Report information
The Basics
Id: 25590
Status: resolved
Worked: 30 min
Priority: 0/
Queue: DBD-mysql

People
Owner: CAPTTOFU [...] cpan.org
Requestors: JRED [...] cpan.org
Cc: pali [...] cpan.org
AdminCc:

Bug Information
Severity: Important
Broken in: 4.003
Fixed in: 4.041_01

Attachments
DBD-mysql-4.003-missing-utf8-upgrade-test.patch.txt



Subject: mysql_enable_utf8 misses utf8 upgrading of non utf8 strings
MIME-Version: 1.0
X-Mailer: MIME-tools 5.418 (Entity 5.418)
X-RT-Original-Encoding: utf-8
Content-Type: multipart/mixed; boundary="----------=_1174475869-17981-9"
Content-Length: 0
Content-Type: text/plain; charset="utf8"
Content-Disposition: inline
Content-Transfer-Encoding: binary
Content-Length: 566
Download (untitled) / with headers
text/plain 566b
When using mysql_enable_utf8 DBD::mysql misses upgrading non utf8 strings. Instead it assumes that all data passed to DBD::mysql is valid utf8 already. Non ASCII characters are simply removed from strings passed to DBD::mysql. Attached you find a patch against t/utf8.t (DBD-mysql-4.003 release) which adds a correspondent regression test. With this patch test #19 fails with the following message: not ok 19 at line 185 got back 'umlauts: ' instead of 'umlauts: äüö'. I'm using DBI 1.54 and DBD::mysql 4.003 compiled with MySQL 5.0.27 client libraries.
Subject: DBD-mysql-4.003-missing-utf8-upgrade-test.patch.txt
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_1174475869-17981-8"
X-Mailer: MIME-tools 5.418 (Entity 5.418)
Content-Length: 0
Content-Type: text/plain; charset="utf8"
Content-Disposition: inline
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
Content-Length: 0
Content-Type: text/plain; charset="utf-8"; name="DBD-mysql-4.003-missing-utf8-upgrade-test.patch.txt"
Content-Disposition: inline; filename="DBD-mysql-4.003-missing-utf8-upgrade-test.patch.txt"
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: iso-8859-1
Content-Length: 3334
--- t/utf8.t 2007-03-21 11:59:04.954355876 +0100 +++ t/utf8.t.patched 2007-03-21 11:58:42.653080899 +0100 @@ -102,7 +102,7 @@ # Create a new table; In an ideal world, it'd be more sensible to # make the whole database UTF8... # - $query = "CREATE TABLE $table (name VARCHAR(64) CHARACTER SET utf8, bincol BLOB, shape GEOMETRY)"; + $query = "CREATE TABLE $table (name VARCHAR(64) CHARACTER SET utf8, bincol BLOB, shape GEOMETRY, test INTEGER)"; Test($state or $dbh->do($query)) or ErrMsgF("Cannot create table: Error %s.\n", $dbh->errstr); @@ -113,24 +113,34 @@ my $utf8_str = "\x{0100}dam"; # "Adam" with a macron. my $quoted_utf8_str = "'\x{0100}dam'"; - + my $blob = "\x{c4}\x{80}dam"; # same as utf8_str but not utf8 encoded my $quoted_blob = "'\x{c4}\x{80}dam'"; + my $umlaut_str = "umlauts: äüö"; + my $quoted_umlaut_str = "'umlauts: äüö'"; + Test( $state or ( $dbh->quote( $utf8_str ) eq $quoted_utf8_str ) ) or ErrMsg( "Failed to retain UTF-8 flag when quoting.\n" ); Test( $state or ( $dbh->quote( $blob ) eq $quoted_blob ) ) or ErrMsg( "UTF-8 flag was set when quoting.\n" ); + Test( $state or ( $dbh->quote( $umlaut_str ) eq $quoted_umlaut_str ) ) + or ErrMsg( "Failed to retain UTF-8 flag when quoting non-utf8 string.\n" ); + Test( $state or ( $dbh->{ mysql_enable_utf8 } ) ) or ErrMsg( "mysql_enable_utf8 didn't survive connect()\n" ); - $query = qq{INSERT INTO $table (name, bincol, shape) VALUES (?,?, GeomFromText('Point(132865 501937)'))}; + $query = qq{INSERT INTO $table (name, bincol, shape, test) VALUES (?,?, GeomFromText('Point(132865 501937)'),1)}; Test( $state or $dbh->do( $query, {}, $utf8_str,$blob ) ) or ErrMsgF( "INSERT failed: query $query, error %s.\n", $dbh->errstr ); - $query = "SELECT name,bincol,asbinary(shape) FROM $table LIMIT 1"; + $query = qq{INSERT INTO $table (name,test) values (?,2)}; + Test( $state or $dbh->do( $query, {}, $umlaut_str ) ) + or ErrMsgF( "INSERT failed: query $query, error %s.\n", $dbh->errstr ); + + $query = "SELECT name,bincol,asbinary(shape) FROM $table WHERE test=1 LIMIT 1"; Test( $state or ($sth = $dbh->prepare( $query ) ) ) or ErrMsgF( "prepare failed: query $query, error %s.\n", $dbh->errstr ); @@ -158,6 +168,22 @@ Test( $state or ($ref->[1] eq $blob) ) or ErrMsgF( "got back '$ref->[1]' instead of '$blob'.\n" ); + Test( $state or $sth->finish ) + or ErrMsgF( "Cannot finish: %s.\n", $sth->errstr ); + + # test if non-utf8 tagged data was inserted correctly + $query = "SELECT name FROM $table WHERE test=2 LIMIT 1"; + Test( $state or ($sth = $dbh->prepare( $query ) ) ) + or ErrMsgF( "prepare failed: query $query, error %s.\n", $dbh->errstr ); + + Test($state or $sth->execute) + or ErrMsgF( "execute failed: query $query, error %s.\n", $dbh->errstr ); + + Test( $state or defined( $ref = $sth->fetchrow_arrayref ) ) + or ErrMsgF( "fetch failed: query $query, error %s.\n", $sth->errstr ); + + Test( $state or ($ref->[0] eq $umlaut_str) ) + or ErrMsgF( "got back '$ref->[0]' instead of '$umlaut_str'.\n" ); Test( $state or $sth->finish ) or ErrMsgF( "Cannot finish: %s.\n", $sth->errstr );
MIME-Version: 1.0
X-Mailer: MIME-tools 5.418 (Entity 5.418)
Content-Disposition: inline
Message-Id: <rt-3.6.HEAD-18036-1174480934-229.25590-0-0 [...] rt.cpan.org>
Content-Type: text/plain; charset="utf8"
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
X-RT-Original-Encoding: utf-8
Content-Length: 1138
Download (untitled) / with headers
text/plain 1.1k
Joern, Thank you very much for this patch. One issue though, I had applied another patch to this test from Joost Diepenmaat that might address what your patch intends to address. Could you please check out the latest DBD::mysql and verify if you still need to make a patch against that test? If you would like, I can even give you commit access to the repository. I'm hoping to do a release either later today or tomorrow, but will wait upon your response before I do. Kind regards, Patrick On Wed Mar 21 07:17:54 2007, JRED wrote: Show quoted text
> > When using mysql_enable_utf8 DBD::mysql misses upgrading non utf8 > strings. Instead it assumes that all data passed to DBD::mysql is valid > utf8 already. Non ASCII characters are simply removed from strings > passed to DBD::mysql. > > Attached you find a patch against t/utf8.t (DBD-mysql-4.003 release) > which adds a correspondent regression test. > > With this patch test #19 fails with the following message: > > not ok 19 at line 185 > got back 'umlauts: ' instead of 'umlauts: äüö'. > > I'm using DBI 1.54 and DBD::mysql 4.003 compiled with MySQL 5.0.27 > client libraries.
MIME-Version: 1.0
In-Reply-To: <rt-3.6.HEAD-18036-1174480934-229.25590-0-0 [...] rt.cpan.org>
X-Mailer: MIME-tools 5.418 (Entity 5.418)
Content-Disposition: inline
Message-Id: <rt-3.6.HEAD-17969-1174509784-654.25590-0-0 [...] rt.cpan.org>
References: <rt-3.6.HEAD-18036-1174480934-229.25590-0-0 [...] rt.cpan.org>
Content-Type: text/plain; charset="utf8"
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
X-RT-Original-Encoding: utf-8
Content-Length: 1179
Download (untitled) / with headers
text/plain 1.1k
On Mi. 21. Mär. 2007, 08:42:14, CAPTTOFU wrote: Show quoted text
> Thank you very much for this patch. One issue though, I had applied > another patch to this test from Joost Diepenmaat that might address what > your patch intends to address. Could you please check out the latest > DBD::mysql and verify if you still need to make a patch against that > test? If you would like, I can even give you commit access to the > repository.
Thanks for the quick reply. I just checked out svn trunk and my patched utf8.t still fails. As far as I understand the Changes file correctly, Joost's patch addresses a missing utf8 flag when selecting utf8_bin columns from the database. My issue is that DBD::mysql passes all data as-is to the database even when the connection is in utf8 mode. This way all non ASCII characters of non-utf8-tagged strings gets lost in the database. But passing non-utf8-tagged strings to DBD::mysql should be absolutely valid, since they're valid for Perl they should be valid for DBD::mysql as well ;) A fix would be to utf8::upgrade all binded parameters and the SQL statement itself before sending them to the server when mysql_enable_utf8 is active. Regards, Jörn
MIME-Version: 1.0
In-Reply-To: <rt-3.6.HEAD-17969-1174509784-654.25590-0-0 [...] rt.cpan.org>
X-Mailer: MIME-tools 5.418 (Entity 5.418)
Content-Disposition: inline
References: <rt-3.6.HEAD-18036-1174480934-229.25590-0-0 [...] rt.cpan.org> <rt-3.6.HEAD-17969-1174509784-654.25590-0-0 [...] rt.cpan.org>
Message-Id: <rt-3.6.HEAD-17981-1174587110-315.25590-0-0 [...] rt.cpan.org>
Content-Type: text/plain; charset="utf8"
Content-Transfer-Encoding: binary
From: JRED [...] cpan.org
X-RT-Original-Encoding: utf-8
X-RT-Original-Encoding: utf-8
Content-Length: 453
Download (untitled) / with headers
text/plain 453b
On Mi. 21. Mär. 2007, 16:43:05, JRED wrote: Show quoted text
> A fix would be to utf8::upgrade all binded parameters and the SQL > statement itself before sending them to the server when > mysql_enable_utf8 is active.
More exactly: of course not all strings need to be upgraded, just those targeted to character columns. In particular blobs wouldn't survive an utf8::upgrade ;) If you're interested I can provide more test scripts for these issues. Regards, Jörn
MIME-Version: 1.0
In-Reply-To: <rt-3.6.HEAD-17981-1174587110-315.25590-0-0 [...] rt.cpan.org>
X-Mailer: MIME-tools 5.418 (Entity 5.418)
Content-Disposition: inline
References: <rt-3.6.HEAD-18036-1174480934-229.25590-0-0 [...] rt.cpan.org> <rt-3.6.HEAD-17969-1174509784-654.25590-0-0 [...] rt.cpan.org> <rt-3.6.HEAD-17981-1174587110-315.25590-0-0 [...] rt.cpan.org>
Message-Id: <rt-3.6.HEAD-14807-1180230993-921.25590-0-0 [...] rt.cpan.org>
Content-Type: text/plain; charset="utf8"
Content-Transfer-Encoding: binary
From: CAPTTOFU [...] cpan.org
X-RT-Original-Encoding: utf-8
X-RT-Original-Encoding: utf-8
Content-Length: 836
Download (untitled) / with headers
text/plain 836b
Hi, This is on my radar, but I admit, I'm not a guru on UTF8 - probably an American issue. I have recently started to have to deal with UTF8 a bit on grazr, particularly with Chinese feeds, and now understand some of the issues with UTF8. What does utf8::upgrade do exactly? I would like to fix this issue! On Thu Mar 22 14:11:50 2007, JRED wrote: Show quoted text
> On Mi. 21. Mär. 2007, 16:43:05, JRED wrote: >
> > A fix would be to utf8::upgrade all binded parameters and the SQL > > statement itself before sending them to the server when > > mysql_enable_utf8 is active.
> > More exactly: of course not all strings need to be upgraded, just those > targeted to character columns. In particular blobs wouldn't survive an > utf8::upgrade ;) > > If you're interested I can provide more test scripts for these issues. > > Regards, > > Jörn
MIME-Version: 1.0
In-Reply-To: <rt-3.6.HEAD-14807-1180230993-921.25590-0-0 [...] rt.cpan.org>
X-Mailer: MIME-tools 5.418 (Entity 5.418)
Content-Disposition: inline
References: <rt-3.6.HEAD-18036-1174480934-229.25590-0-0 [...] rt.cpan.org> <rt-3.6.HEAD-17969-1174509784-654.25590-0-0 [...] rt.cpan.org> <rt-3.6.HEAD-17981-1174587110-315.25590-0-0 [...] rt.cpan.org> <rt-3.6.HEAD-14807-1180230993-921.25590-0-0 [...] rt.cpan.org>
Message-Id: <rt-3.6.HEAD-14796-1180257848-250.25590-0-0 [...] rt.cpan.org>
Content-Type: text/plain; charset="utf8"
Content-Transfer-Encoding: binary
From: JRED [...] cpan.org
X-RT-Original-Encoding: utf-8
X-RT-Original-Encoding: utf-8
Content-Length: 1092
On Sat May 26 21:56:33 2007, CAPTTOFU wrote: Show quoted text
> What does utf8::upgrade do exactly?
When the variable has the utf-8 flag set, it does nothing, because it assumes that this variables carries valid utf-8 already. If the utf-8 flag isn't set, the variable is _converted_ to utf-8 and the utf-8 flag is set as well. The latter case is interesting in conjunction with DBD::mysql when the database connection is utf-8, but the application passes non utf-8 data. Then the data has to be "upgraded" to utf-8 and must not be passed as-is, because MySQL then receives illegal data which does not conform to utf-8. So all parameters binded to SQL statements (and the SQL statements themself) need to go through a utf8::upgrade() and everything will work as expected. We do this in a database layer we wrote for our applications, but we want to get rid of this layer ;) Of course it would be better if this is handled on DBI/DBD level, not on application level. I dunno the DBI/DBD architecture well, but probably this could be done even at DBI level, so all drivers would benefit. Regards, Jörn
MIME-Version: 1.0
In-Reply-To: <rt-3.6.HEAD-14796-1180257848-250.25590-0-0 [...] rt.cpan.org>
X-Mailer: MIME-tools 5.418 (Entity 5.418)
Content-Disposition: inline
References: <rt-3.6.HEAD-18036-1174480934-229.25590-0-0 [...] rt.cpan.org> <rt-3.6.HEAD-17969-1174509784-654.25590-0-0 [...] rt.cpan.org> <rt-3.6.HEAD-17981-1174587110-315.25590-0-0 [...] rt.cpan.org> <rt-3.6.HEAD-14807-1180230993-921.25590-0-0 [...] rt.cpan.org> <rt-3.6.HEAD-14796-1180257848-250.25590-0-0 [...] rt.cpan.org>
Message-Id: <rt-3.6.HEAD-5485-1180533685-1006.25590-0-0 [...] rt.cpan.org>
Content-Type: text/plain; charset="utf8"
Content-Transfer-Encoding: binary
From: JDIEPEN [...] cpan.org
X-RT-Original-Encoding: utf-8
X-RT-Original-Encoding: utf-8
Content-Length: 667
Download (untitled) / with headers
text/plain 667b
Sorry for jumping in, but... On Sun May 27 05:24:09 2007, JRED wrote: Show quoted text
> So all parameters binded to SQL statements (and the SQL statements > themself) need to go through a utf8::upgrade() and everything will work > as expected.
Note that any bounded binary columns/values must NOT be upgraded. This means you need to know for each bounded parameter if it's a text value or binary. I've looked at this issue myself, but I couldn't figure out how to do it. My experience with dbd::mysql & libmysqlclient is pretty limited though. Also note that it's NOT enough to check the sql type of the column. See also RT #24738 (fixed that for selecting binary & utf-8 values)
MIME-Version: 1.0
X-Mailer: MIME-tools 5.504 (Entity 5.504)
Content-Disposition: inline
X-RT-Interface: Web
Content-Type: text/plain; charset="utf-8"
Message-ID: <rt-4.0.18-28296-1477148873-1455.25590-0-0 [...] rt.cpan.org>
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
X-RT-Encrypt: 0
X-RT-Sign: 0
Content-Length: 195
Download (untitled) / with headers
text/plain 195b
Fix for UTF-8 support in DBD::mysql is in my pull request: https://github.com/perl5-dbi/DBD-mysql/pull/67 I would like if more people affected by UTF-8 bugs in DBD::mysql could test my changes...
MIME-Version: 1.0
In-Reply-To: <rt-4.0.18-28296-1477148873-1455.25590-0-0 [...] rt.cpan.org>
X-Mailer: MIME-tools 5.504 (Entity 5.504)
Content-Disposition: inline
X-RT-Interface: Web
References: <rt-4.0.18-28296-1477148873-1455.25590-0-0 [...] rt.cpan.org>
Content-Type: text/plain; charset="utf-8"
Message-ID: <rt-4.0.18-7011-1498900552-928.25590-0-0 [...] rt.cpan.org>
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
X-RT-Encrypt: 0
X-RT-Sign: 0
Content-Length: 37
Reopening, fix was reverted in 4.043.
X-RT-Interface: REST
MIME-Version: 1.0
X-Mailer: MIME-tools 5.504 (Entity 5.504)
RT-Message-ID: <rt-4.0.18-28699-1510688849-1243.25590-0-0 [...] rt.cpan.org>
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: binary
Content-Length: 78


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.