| # | Tue Aug 17 06:33:12 2004 | SREZIC - Ticket created | |||
[text/plain 419b]
On occasion I get the following warning when parsing excel sheets:
Character in "C" format wrapped at /usr/local/lib/perl5/site_perl/5.8.0/Spreadsheet/ParseExcel/FmtDefault.pm line 68.

The parsed data looks OK though. I guess it has something to do with Unicode characters in the excel sheet. Replacing the line 68 with

return pack('U*', unpack('n*', $sTxt));

seems to remove the warnings.

Regards,
Slaven
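To illustrate the suggested change, here is a minimal sketch, assuming the input is a UTF-16BE byte string (which is what the `n*` template reads as 16-bit big-endian values). The sample string is made up for this example and does not come from the module.

```perl
use strict;
use warnings;

# Hypothetical sample: the Euro sign followed by "5", as UTF-16BE bytes.
my $sTxt = "\x20\xAC\x00\x35";

# 'n*' reads unsigned 16-bit big-endian integers: (0x20AC, 0x35).
my @code_points = unpack('n*', $sTxt);

# pack('C*', ...) would try to squeeze 0x20AC into one byte, which is what
# triggers the "wrapped" warning. pack('U*', ...) builds a character string.
my $text = pack('U*', @code_points);

printf "%d characters, first is U+%04X\n", length($text), ord($text);
```

With `U*` the result is a character string, so callers that expect raw bytes may need to encode it explicitly before writing it out.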
|||||
| # | Fri Jul 15 14:58:51 2005 | guest - Correspondence added | |||
[text/plain 498b]
[SREZIC - Tue Aug 17 06:33:12 2004]:
> On occasion I get the following warning when parsing excel sheets:
> (...)
> Replacing the line 68 with
> (...)
> seems to remove the warnings.

Nice. I wonder what side-effects this has.

I get a couple of other similar warnings:

Character in 'c' format wrapped in pack at /usr/lib/perl5/vendor_perl/5.8.6/Spreadsheet/ParseExcel.pm line 1790.
Character in 'c' format wrapped in pack at /usr/lib/perl5/vendor_perl/5.8.6/Spreadsheet/ParseExcel.pm line 1789.
|||||
| # | Fri Aug 12 11:00:04 2005 | guest - Correspondence added | |||
[text/plain 1.4k]
[guest - Fri Jul 15 14:58:51 2005]:
> > [ Change unpack("C*",... to unpack("U*",...
> Nice. I wonder what side-effects this has.

It seems to me that Excel encodes extended characters in UTF-8, so this will correctly parse extended characters into a Unicode string. (I noticed this with several characters, surprisingly including the Euro symbol.) This could potentially mess up calling code which is expecting a string containing ASCII characters, but it seems fine in the context of the module itself.

> I get a couple of other similar warnings:
>
> Character in 'c' format wrapped in pack at
> /usr/lib/perl5/vendor_perl/5.8.6/Spreadsheet/ParseExcel.pm line 1790.
> Character in 'c' format wrapped in pack at
> /usr/lib/perl5/vendor_perl/5.8.6/Spreadsheet/ParseExcel.pm line 1789.

These warnings are because the module is trying to mask off the last two bits of a character by unpacking it to a *signed* number, masking it with 0xFC, and then repacking it as a signed number. But the & 0xFC operation seems to remove the sign. So, I made this change to those two lines:

< substr($sWk, 3, 1) &= pack('c', unpack("c",substr($sWk, 3, 1)) & 0xFC);
> substr($sWk, 3, 1) &= pack('C', unpack("C",substr($sWk, 3, 1)) & 0xFC);

< substr($lWk, 0, 1) &= pack('c', unpack("c",substr($lWk, 0, 1)) & 0xFC);
> substr($lWk, 0, 1) &= pack('C', unpack("C",substr($lWk, 0, 1)) & 0xFC);

which does the same operation, but uses an unsigned integer.
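The signed versus unsigned behaviour described above can be reproduced in isolation. This is only a sketch with a made-up input byte, not code from the module; it assumes the default integer semantics (no `use integer`), under which Perl's `&` treats a negative operand as an unsigned machine word.

```perl
use strict;
use warnings;

# Hypothetical byte with the high bit set.
my $byte = "\xFF";

my $signed   = unpack('c', $byte);   # signed view of the byte: -1
my $unsigned = unpack('C', $byte);   # unsigned view of the byte: 255

# Masking the negative value yields 252, which does not fit the signed
# range of 'c' (-128..127), so pack('c', ...) would warn about wrapping.
my $masked_signed   = $signed & 0xFC;
my $masked_unsigned = $unsigned & 0xFC;

# Repacking as unsigned 'C' stays in range (0..255) and does not warn.
my $repacked = pack('C', $masked_unsigned);

printf "%d %d %02X\n", $masked_signed, $masked_unsigned, ord($repacked);
```

Both paths end up with the same masked value; only the `c` template objects to it, which is consistent with the parsed data looking fine despite the warnings.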
|||||
| # | Sat Jan 28 11:12:10 2006 | guest - Correspondence added | |||||
[text/plain 2.3k]
So, is the correct patch something like changing these lines? Can anyone validate this and, if it's OK, include it in the module for a future release?

Line 68 of FmtDefault.pm:
------------------------
#return pack('C*', unpack('n*', $sTxt));
return pack('U*', unpack('n*', $sTxt));

Lines 1789,1790 of ParseExcel.pm:
-----------------------------------
#substr($sWk, 3, 1) &= pack('c', unpack("c",substr($sWk, 3, 1)) & 0xFC);
#substr($lWk, 0, 1) &= pack('c', unpack("c",substr($lWk, 0, 1)) & 0xFC);
# changed accordingly with http://rt.cpan.org/Public/Bug/Display.html?id=7376
substr($sWk, 3, 1) &= pack('C', unpack("C",substr($sWk, 3, 1)) & 0xFC);
substr($lWk, 0, 1) &= pack('C', unpack("C",substr($lWk, 0, 1)) & 0xFC);

Regards,
Sergio Freire

On Fri Aug 12 11:02:31 2005, guest wrote:
> [guest - Fri Jul 15 14:58:51 2005]:
> > > [ Change unpack("C*",... to unpack("U*",...
> > Nice. I wonder what side-effects this has.
>
> It seems to me that Excel encodes extended characters in UTF-8, so this
> will correctly parse extended characters into a Unicode string. (I
> noticed this with several characters, surprisingly including the Euro
> symbol.) This could potentially mess up calling code which is
> expecting a string containing ASCII characters, but it seems fine in
> the context of the module itself.
>
> > I get a couple of other similar warnings:
> >
> > Character in 'c' format wrapped in pack at
> > /usr/lib/perl5/vendor_perl/5.8.6/Spreadsheet/ParseExcel.pm line 1790.
> > Character in 'c' format wrapped in pack at
> > /usr/lib/perl5/vendor_perl/5.8.6/Spreadsheet/ParseExcel.pm line 1789.
>
> These warnings are because the module is trying to mask off the last
> two bits of a character by unpacking it to a *signed* number and then
> masking 0xFC and then repacking it as a signed number. But, the & 0xFC
> operation seems to remove the sign. So, I made this change to those
> two lines:
>
> < substr($sWk, 3, 1) &= pack('c', unpack("c",substr($sWk, 3, 1)) & 0xFC);
> > substr($sWk, 3, 1) &= pack('C', unpack("C",substr($sWk, 3, 1)) & 0xFC);
>
> < substr($lWk, 0, 1) &= pack('c', unpack("c",substr($lWk, 0, 1)) & 0xFC);
> > substr($lWk, 0, 1) &= pack('C', unpack("C",substr($lWk, 0, 1)) & 0xFC);
>
> which does the same operation, but uses an unsigned integer.
|||||||
| # | Sat Jan 28 11:12:11 2006 | RT_System - Status changed from 'new' to 'open' | ||
| # | Thu Mar 02 10:43:17 2006 | guest - Correspondence added | |||||
[text/plain 486b]
I got a different warning:
substr outside of string at /usr/local/lib/perl5/site_perl/5.8.0/Spreadsheet/ParseExcel.pm line 1558.

It means: "You tried to reference a substr that pointed outside of a string. That is, the absolute value of the offset was larger than the length of the string".

A check on the string length before the substr is attempted would resolve that.

That was with 0.2602, but the Changes file doesn't indicate this changed in the most recent release.

Mark
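A length check of the kind suggested above could look like the following sketch. The helper name `safe_substr` and the sample buffer are hypothetical and not part of Spreadsheet::ParseExcel; the check is also stricter than Perl's own rule, since it rejects any range that is not fully inside the string.

```perl
use strict;
use warnings;

# Hypothetical helper: return the requested slice, or undef when the
# requested range is not fully contained in the buffer.
sub safe_substr {
    my ($buffer, $offset, $length) = @_;
    return if $offset < 0 || $offset + $length > length $buffer;
    return substr($buffer, $offset, $length);
}

my $record = "\x01\x02\x03";              # made-up 3-byte record

my $ok  = safe_substr($record, 1, 2);     # in range: "\x02\x03"
my $bad = safe_substr($record, 5, 2);     # out of range: undef, no warning

print defined $bad ? "got data\n" : "record too short\n";
```

The caller then has to decide what a too-short record means, for example skipping it or reporting a malformed file.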
|||||||
| # | Thu Mar 02 10:53:40 2006 | guest - Correspondence added | |||
[text/plain 638b]
On Thu Mar 02 10:43:17 2006, guest wrote:
> I got a different warning:
>
> substr outside of string at
> /usr/local/lib/perl5/site_perl/5.8.0/Spreadsheet/ParseExcel.pm line 1558.
>
> It means: "You tried to reference a substr that pointed outside of a
> string. That is, the absolute value of the offset was larger than the
> length of the string".

BTW, I got this error when trying to parse a file that my system identified as a "Microsoft Excel Worksheet". When the same data was provided in a format identified as a "Microsoft Office Document", it worked. I didn't create these files, so I don't know exactly how they were created.
|||||
| # | Mon Sep 11 07:42:25 2006 | SZABGAB - Correspondence added | |||
[text/plain 333b]
On Tue Aug 17 06:33:12 2004, SREZIC wrote:
> On occasion I get the following warning when parsing excel sheets:
>
> Character in "C" format wrapped at
> /usr/local/lib/perl5/site_perl/5.8.0/Spreadsheet/ParseExcel/FmtDefault.pm
> line 68.
>
> return pack('U*', unpack('n*', $sTxt));

This fix has been applied to version 0.27_01.
|||||
| # | Wed Sep 13 04:04:32 2006 | marschap - Correspondence added | |||
[text/plain 841b]
Hi,
On Mon Sep 11 07:42:25 2006, SZABGAB wrote:
> On Tue Aug 17 06:33:12 2004, SREZIC wrote:
> > On occasion I get the following warning when parsing excel sheets:
> >
> > Character in "C" format wrapped at
> > /usr/local/lib/perl5/site_perl/5.8.0/Spreadsheet/ParseExcel/FmtDefault.pm
> > line 68.
> >
> > return pack('U*', unpack('n*', $sTxt));
>
> this fix has been applied to version 0.27_01

This change might increase the Perl version required for Spreadsheet::ParseExcel. I don't know if the 'U' option for pack() was there before Perl 5.6 (or even later).

Maybe wrapping it in an alternative à la

return pack((($] >= 5.006) ? 'U*' : 'C*'), unpack('n*', $sTxt));

might do the trick.

Alternatively make Spreadsheet::ParseExcel require Perl 5.7 (with all the unicode stuff in place). I prefer the latter.

Regards
Peter
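The version-dependent alternative can be written so that the template is chosen once rather than on every call. This is a sketch only: the function name `utf16be_to_text` is hypothetical, and the 5.006 threshold is taken from the suggestion above rather than verified against the module's requirements.

```perl
use strict;
use warnings;

# Choose the pack template once, based on the running Perl version ($]).
my $template = ($] >= 5.006) ? 'U*' : 'C*';

# Hypothetical wrapper around the one-line conversion from FmtDefault.pm.
sub utf16be_to_text {
    my ($sTxt) = @_;
    return pack($template, unpack('n*', $sTxt));
}

# "AB" as UTF-16BE bytes.
my $text = utf16be_to_text("\x00\x41\x00\x42");
print "$text\n";
```

Declaring a minimum Perl version in the module instead would remove the branch entirely, at the cost of dropping support for older interpreters.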
|||||
| # | Wed Jan 14 04:26:35 2009 | JMCNAMARA - Subject changed from 'Warnings during parsing of excel sheet' to 'Warnings during parsing of excel sheet (Unicode issue)' | ||
