Skip Menu |
 

This queue is for tickets about the DBD-mysql CPAN distribution.

Report information
The Basics
Id: 25590
Status: resolved
Worked: 30 min
Priority: 0/
Queue: DBD-mysql

People
Owner: CAPTTOFU [...] cpan.org
Requestors: JRED [...] cpan.org
Cc: pali [...] cpan.org
AdminCc:

Bug Information
Severity: Important
Broken in: 4.003
Fixed in: 4.041_01

Attachments
DBD-mysql-4.003-missing-utf8-upgrade-test.patch.txt



Subject: mysql_enable_utf8 misses utf8 upgrading of non utf8 strings
Download (untitled) / with headers
text/plain 566b
When using mysql_enable_utf8 DBD::mysql misses upgrading non utf8 strings. Instead it assumes that all data passed to DBD::mysql is valid utf8 already. Non ASCII characters are simply removed from strings passed to DBD::mysql. Attached you find a patch against t/utf8.t (DBD-mysql-4.003 release) which adds a correspondent regression test. With this patch test #19 fails with the following message: not ok 19 at line 185 got back 'umlauts: ' instead of 'umlauts: äüö'. I'm using DBI 1.54 and DBD::mysql 4.003 compiled with MySQL 5.0.27 client libraries.
Subject: DBD-mysql-4.003-missing-utf8-upgrade-test.patch.txt
--- t/utf8.t 2007-03-21 11:59:04.954355876 +0100 +++ t/utf8.t.patched 2007-03-21 11:58:42.653080899 +0100 @@ -102,7 +102,7 @@ # Create a new table; In an ideal world, it'd be more sensible to # make the whole database UTF8... # - $query = "CREATE TABLE $table (name VARCHAR(64) CHARACTER SET utf8, bincol BLOB, shape GEOMETRY)"; + $query = "CREATE TABLE $table (name VARCHAR(64) CHARACTER SET utf8, bincol BLOB, shape GEOMETRY, test INTEGER)"; Test($state or $dbh->do($query)) or ErrMsgF("Cannot create table: Error %s.\n", $dbh->errstr); @@ -113,24 +113,34 @@ my $utf8_str = "\x{0100}dam"; # "Adam" with a macron. my $quoted_utf8_str = "'\x{0100}dam'"; - + my $blob = "\x{c4}\x{80}dam"; # same as utf8_str but not utf8 encoded my $quoted_blob = "'\x{c4}\x{80}dam'"; + my $umlaut_str = "umlauts: äüö"; + my $quoted_umlaut_str = "'umlauts: äüö'"; + Test( $state or ( $dbh->quote( $utf8_str ) eq $quoted_utf8_str ) ) or ErrMsg( "Failed to retain UTF-8 flag when quoting.\n" ); Test( $state or ( $dbh->quote( $blob ) eq $quoted_blob ) ) or ErrMsg( "UTF-8 flag was set when quoting.\n" ); + Test( $state or ( $dbh->quote( $umlaut_str ) eq $quoted_umlaut_str ) ) + or ErrMsg( "Failed to retain UTF-8 flag when quoting non-utf8 string.\n" ); + Test( $state or ( $dbh->{ mysql_enable_utf8 } ) ) or ErrMsg( "mysql_enable_utf8 didn't survive connect()\n" ); - $query = qq{INSERT INTO $table (name, bincol, shape) VALUES (?,?, GeomFromText('Point(132865 501937)'))}; + $query = qq{INSERT INTO $table (name, bincol, shape, test) VALUES (?,?, GeomFromText('Point(132865 501937)'),1)}; Test( $state or $dbh->do( $query, {}, $utf8_str,$blob ) ) or ErrMsgF( "INSERT failed: query $query, error %s.\n", $dbh->errstr ); - $query = "SELECT name,bincol,asbinary(shape) FROM $table LIMIT 1"; + $query = qq{INSERT INTO $table (name,test) values (?,2)}; + Test( $state or $dbh->do( $query, {}, $umlaut_str ) ) + or ErrMsgF( "INSERT failed: query $query, error %s.\n", $dbh->errstr ); + + $query = "SELECT name,bincol,asbinary(shape) FROM $table WHERE test=1 LIMIT 1"; Test( $state or ($sth = $dbh->prepare( $query ) ) ) or ErrMsgF( "prepare failed: query $query, error %s.\n", $dbh->errstr ); @@ -158,6 +168,22 @@ Test( $state or ($ref->[1] eq $blob) ) or ErrMsgF( "got back '$ref->[1]' instead of '$blob'.\n" ); + Test( $state or $sth->finish ) + or ErrMsgF( "Cannot finish: %s.\n", $sth->errstr ); + + # test if non-utf8 tagged data was inserted correctly + $query = "SELECT name FROM $table WHERE test=2 LIMIT 1"; + Test( $state or ($sth = $dbh->prepare( $query ) ) ) + or ErrMsgF( "prepare failed: query $query, error %s.\n", $dbh->errstr ); + + Test($state or $sth->execute) + or ErrMsgF( "execute failed: query $query, error %s.\n", $dbh->errstr ); + + Test( $state or defined( $ref = $sth->fetchrow_arrayref ) ) + or ErrMsgF( "fetch failed: query $query, error %s.\n", $sth->errstr ); + + Test( $state or ($ref->[0] eq $umlaut_str) ) + or ErrMsgF( "got back '$ref->[0]' instead of '$umlaut_str'.\n" ); Test( $state or $sth->finish ) or ErrMsgF( "Cannot finish: %s.\n", $sth->errstr );
Download (untitled) / with headers
text/plain 1.1k
Joern, Thank you very much for this patch. One issue though, I had applied another patch to this test from Joost Diepenmaat that might address what your patch intends to address. Could you please check out the latest DBD::mysql and verify if you still need to make a patch against that test? If you would like, I can even give you commit access to the repository. I'm hoping to do a release either later today or tomorrow, but will wait upon your response before I do. Kind regards, Patrick On Wed Mar 21 07:17:54 2007, JRED wrote: Show quoted text
> > When using mysql_enable_utf8 DBD::mysql misses upgrading non utf8 > strings. Instead it assumes that all data passed to DBD::mysql is valid > utf8 already. Non ASCII characters are simply removed from strings > passed to DBD::mysql. > > Attached you find a patch against t/utf8.t (DBD-mysql-4.003 release) > which adds a correspondent regression test. > > With this patch test #19 fails with the following message: > > not ok 19 at line 185 > got back 'umlauts: ' instead of 'umlauts: äüö'. > > I'm using DBI 1.54 and DBD::mysql 4.003 compiled with MySQL 5.0.27 > client libraries.
Download (untitled) / with headers
text/plain 1.1k
On Mi. 21. Mär. 2007, 08:42:14, CAPTTOFU wrote: Show quoted text
> Thank you very much for this patch. One issue though, I had applied > another patch to this test from Joost Diepenmaat that might address what > your patch intends to address. Could you please check out the latest > DBD::mysql and verify if you still need to make a patch against that > test? If you would like, I can even give you commit access to the > repository.
Thanks for the quick reply. I just checked out svn trunk and my patched utf8.t still fails. As far as I understand the Changes file correctly, Joost's patch addresses a missing utf8 flag when selecting utf8_bin columns from the database. My issue is that DBD::mysql passes all data as-is to the database even when the connection is in utf8 mode. This way all non ASCII characters of non-utf8-tagged strings gets lost in the database. But passing non-utf8-tagged strings to DBD::mysql should be absolutely valid, since they're valid for Perl they should be valid for DBD::mysql as well ;) A fix would be to utf8::upgrade all binded parameters and the SQL statement itself before sending them to the server when mysql_enable_utf8 is active. Regards, Jörn
From: JRED [...] cpan.org
Download (untitled) / with headers
text/plain 453b
On Mi. 21. Mär. 2007, 16:43:05, JRED wrote: Show quoted text
> A fix would be to utf8::upgrade all binded parameters and the SQL > statement itself before sending them to the server when > mysql_enable_utf8 is active.
More exactly: of course not all strings need to be upgraded, just those targeted to character columns. In particular blobs wouldn't survive an utf8::upgrade ;) If you're interested I can provide more test scripts for these issues. Regards, Jörn
From: CAPTTOFU [...] cpan.org
Download (untitled) / with headers
text/plain 836b
Hi, This is on my radar, but I admit, I'm not a guru on UTF8 - probably an American issue. I have recently started to have to deal with UTF8 a bit on grazr, particularly with Chinese feeds, and now understand some of the issues with UTF8. What does utf8::upgrade do exactly? I would like to fix this issue! On Thu Mar 22 14:11:50 2007, JRED wrote: Show quoted text
> On Mi. 21. Mär. 2007, 16:43:05, JRED wrote: >
> > A fix would be to utf8::upgrade all binded parameters and the SQL > > statement itself before sending them to the server when > > mysql_enable_utf8 is active.
> > More exactly: of course not all strings need to be upgraded, just those > targeted to character columns. In particular blobs wouldn't survive an > utf8::upgrade ;) > > If you're interested I can provide more test scripts for these issues. > > Regards, > > Jörn
From: JRED [...] cpan.org
On Sat May 26 21:56:33 2007, CAPTTOFU wrote: Show quoted text
> What does utf8::upgrade do exactly?
When the variable has the utf-8 flag set, it does nothing, because it assumes that this variables carries valid utf-8 already. If the utf-8 flag isn't set, the variable is _converted_ to utf-8 and the utf-8 flag is set as well. The latter case is interesting in conjunction with DBD::mysql when the database connection is utf-8, but the application passes non utf-8 data. Then the data has to be "upgraded" to utf-8 and must not be passed as-is, because MySQL then receives illegal data which does not conform to utf-8. So all parameters binded to SQL statements (and the SQL statements themself) need to go through a utf8::upgrade() and everything will work as expected. We do this in a database layer we wrote for our applications, but we want to get rid of this layer ;) Of course it would be better if this is handled on DBI/DBD level, not on application level. I dunno the DBI/DBD architecture well, but probably this could be done even at DBI level, so all drivers would benefit. Regards, Jörn
From: JDIEPEN [...] cpan.org
Download (untitled) / with headers
text/plain 667b
Sorry for jumping in, but... On Sun May 27 05:24:09 2007, JRED wrote: Show quoted text
> So all parameters binded to SQL statements (and the SQL statements > themself) need to go through a utf8::upgrade() and everything will work > as expected.
Note that any bounded binary columns/values must NOT be upgraded. This means you need to know for each bounded parameter if it's a text value or binary. I've looked at this issue myself, but I couldn't figure out how to do it. My experience with dbd::mysql & libmysqlclient is pretty limited though. Also note that it's NOT enough to check the sql type of the column. See also RT #24738 (fixed that for selecting binary & utf-8 values)
Download (untitled) / with headers
text/plain 195b
Fix for UTF-8 support in DBD::mysql is in my pull request: https://github.com/perl5-dbi/DBD-mysql/pull/67 I would like if more people affected by UTF-8 bugs in DBD::mysql could test my changes...
Reopening, fix was reverted in 4.043.


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.