
Report information
Id: 7376
Status: open
Left: 0 min
Priority: 0/0
Queue: Spreadsheet-ParseExcel

Owner: Nobody
Requestors: SREZIC <SREZIC [...] cpan.org>
Cc:
AdminCc:

Severity: Unimportant
Broken in: 0.2603
Fixed in: (no value)



History
#   Tue Aug 17 06:33:12 2004 SREZIC - Ticket created  
Subject: Warnings during parsing of excel sheet
[text/plain 419b]
On occasion I get the following warning when parsing excel sheets:

Character in "C" format wrapped at /usr/local/lib/perl5/site_perl/5.8.0/Spreadsheet/ParseExcel/FmtDefault.pm line 68.

The parsed data looks OK though. I guess it has something to do with Unicode characters in the excel sheet. Replacing line 68 with

return pack('U*', unpack('n*', $sTxt));

seems to remove the warnings.
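To illustrate why line 68 warns, here is a minimal standalone sketch. It assumes, as the unpack('n*', ...) call implies, that $sTxt holds 16-bit big-endian values; the sample string (a Euro sign followed by "1") is invented for the demonstration.

```perl
use strict;
use warnings;

# Two 16-bit big-endian values: U+20AC (Euro sign) and U+0031 ("1").
my $sTxt  = "\x20\xAC\x00\x31";
my @units = unpack('n*', $sTxt);    # (0x20AC, 0x0031)

# Old line 68: 'C' packs each value into a single byte. 0x20AC does not
# fit, so perl warns that the character was wrapped and keeps only the
# low byte (0xAC).
my $old = do {
    no warnings 'pack';             # silence the wrap warning for this demo
    pack('C*', @units);
};

# Patched line 68: 'U' packs each value as one Unicode character.
my $new = pack('U*', @units);

printf "old: %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $old;
printf "new: %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $new;
```

Only code points up to 0xFF survive the 'C' template unchanged, which fits the observation that the warning appears "on occasion": only when a cell contains a character outside Latin-1.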

Regards,
Slaven

#   Fri Jul 15 14:58:51 2005 guest - Correspondence added  
From: ernst[...]cron-it.de
[text/plain 498b]
[SREZIC - Tue Aug 17 06:33:12 2004]:

> On occasion I get the following warning when parsing excel sheets:
> (...)
> Replacing the line 68 with
> (...)
> seems to remove the warnings.

Nice. I wonder what side-effects this has.

I get a couple of other similar warnings:

Character in 'c' format wrapped in pack at
/usr/lib/perl5/vendor_perl/5.8.6/Spreadsheet/ParseExcel.pm line 1790.
Character in 'c' format wrapped in pack at
/usr/lib/perl5/vendor_perl/5.8.6/Spreadsheet/ParseExcel.pm line 1789.

#   Fri Aug 12 11:00:04 2005 guest - Correspondence added  
From: afwest
[text/plain 1.4k]
[guest - Fri Jul 15 14:58:51 2005]:

> > [ Change pack('C*', ... to pack('U*', ... ]
> Nice. I wonder what side-effects this has.

It seems to me that Excel stores extended characters as 16-bit Unicode
values (that is what unpack('n*', ...) reads), so this change correctly
turns them into a Perl Unicode string. (I noticed this with several
characters, surprisingly including the Euro symbol.) This could
potentially break calling code that expects a string of single-byte
characters, but it seems fine in the context of the module itself.
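If calling code needs bytes rather than Perl characters (for example, to print to a filehandle without an encoding layer), one option is to encode the result explicitly with the core Encode module. A minimal sketch, assuming UTF-8 output is what the caller wants; the sample input is invented:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same conversion as the patched FmtDefault.pm line 68.
my $text = pack('U*', unpack('n*', "\x20\xAC\x00\x31"));   # Euro sign, "1"

# $text holds 2 characters, one of them above 0xFF, so printing it to a
# raw filehandle would trigger a "Wide character" warning. Encoding it
# produces a plain byte string instead.
my $bytes = encode('UTF-8', $text);

printf "%d characters -> %d bytes\n", length $text, length $bytes;
```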

> I get a couple of other similar warnings:
>
> Character in 'c' format wrapped in pack at
> /usr/lib/perl5/vendor_perl/5.8.6/Spreadsheet/ParseExcel.pm line 1790.
> Character in 'c' format wrapped in pack at
> /usr/lib/perl5/vendor_perl/5.8.6/Spreadsheet/ParseExcel.pm line 1789.

These warnings occur because the module tries to mask off the low two
bits of a byte by unpacking it as a *signed* number, masking it with
0xFC, and then repacking it as a signed number. But the & 0xFC
operation yields an unsigned value (0..252), which is outside the
-128..127 range of 'c' whenever the byte's high bit is set. So I made
this change to those two lines:
< substr($sWk, 3, 1) &= pack('c', unpack("c",substr($sWk, 3, 1)) & 0xFC);
> substr($sWk, 3, 1) &= pack('C', unpack("C",substr($sWk, 3, 1)) & 0xFC);

< substr($lWk, 0, 1) &= pack('c', unpack("c",substr($lWk, 0, 1)) & 0xFC);
> substr($lWk, 0, 1) &= pack('C', unpack("C",substr($lWk, 0, 1)) & 0xFC);


This performs the same masking, but on an unsigned byte value (0..255), so the result always fits the 'C' format.
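To see why the signed round trip warns and the unsigned one does not, here is a small standalone sketch. Only the patched line comes from the ticket; the 4-byte contents of $sWk are invented, chosen so that byte 3 has its high bit and its two low bits set.

```perl
use strict;
use warnings;

# Invented sample; byte 3 is 0xFF (high bit and both low bits set).
my $sWk = "\x12\x34\x56\xFF";

# What the old code computed before packing with 'c':
my $signed = unpack('c', substr($sWk, 3, 1));   # -1: 'c' reads the byte as signed
my $masked = $signed & 0xFC;                     # 252: & gives an unsigned result,
                                                 # outside the -128..127 range of 'c'

# Patched line: unsigned in, unsigned out, so nothing wraps.
substr($sWk, 3, 1) &= pack('C', unpack('C', substr($sWk, 3, 1)) & 0xFC);

# A shorter equivalent (an alternative, not the module's code): a
# string-wise AND with "\xFC" clears the same two low bits.
my $alt = "\x12\x34\x56\xFF";
substr($alt, 3, 1) &= "\xFC";

printf "signed=%d masked=%d byte3=0x%02X\n",
    $signed, $masked, ord substr($sWk, 3, 1);
```

Because both operands of the &= are strings, Perl performs a bitwise string AND, which is why the pack/unpack round trip can be replaced by a one-byte mask.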




#   Sat Jan 28 11:12:10 2006 guest - Correspondence added  
Subject: Warnings during parsing of excel sheet - correct patch
From: Sergio Freire
[text/plain 2.3k]
So, is the correct patch to change these lines as follows?
Can anyone validate this and, if it's OK, include it in the module in a
future release?

Line 68 of FmtDefault.pm:
------------------------

#return pack('C*', unpack('n*', $sTxt));
return pack('U*', unpack('n*', $sTxt));


Lines 1789,1790 of ParseExcel.pm:
-----------------------------------

#substr($sWk, 3, 1) &= pack('c', unpack("c",substr($sWk, 3, 1)) & 0xFC);
#substr($lWk, 0, 1) &= pack('c', unpack("c",substr($lWk, 0, 1)) & 0xFC);
# changed according to http://rt.cpan.org/Public/Bug/Display.html?id=7376
substr($sWk, 3, 1) &= pack('C', unpack("C",substr($sWk, 3, 1)) & 0xFC);
substr($lWk, 0, 1) &= pack('C', unpack("C",substr($lWk, 0, 1)) & 0xFC);



Regards,
Sergio Freire







#   Sat Jan 28 11:12:11 2006 RT_System - Status changed from 'new' to 'open'  
#   Thu Mar 02 10:43:17 2006 guest - Correspondence added  
Subject: Warnings during parsing of excel sheet (substr outside of string)
From: mark[...]summersault.com
[text/plain 486b]
I got a different warning:

substr outside of string at
/usr/local/lib/perl5/site_perl/5.8.0/Spreadsheet/ParseExcel.pm line 1558.

It means: "You tried to reference a substr that pointed outside of a
string. That is, the absolute value of the offset was larger than the
length of the string".

A check on the string length before the substr is attempted would
resolve that.
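A minimal sketch of the length check suggested above. The code at ParseExcel.pm line 1558 is not shown in this ticket, so the helper name checked_substr and the sample record are hypothetical; callers would still have to decide what to do with an undef result (for example, skip a truncated record).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Spreadsheet::ParseExcel: like substr(),
# but returns undef (without a warning) when the start offset lies outside
# the string. Negative offsets are simply rejected in this sketch.
sub checked_substr {
    my ($str, $offset, $len) = @_;

    # substr() warns "substr outside of string" when a non-negative offset
    # is greater than length($str); an offset equal to the length is fine
    # and yields an empty string.
    return undef if $offset < 0 || $offset > length $str;

    return defined $len ? substr($str, $offset, $len)
                        : substr($str, $offset);
}

my $record = "ABCD";    # invented 4-byte record
for my $off (0, 2, 4, 6) {
    my $part = checked_substr($record, $off, 2);
    printf "offset %d: %s\n", $off,
        defined $part ? "'$part'" : 'out of range';
}
```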

That was with 0.2602, but the Changes file doesn't indicate that this was
fixed in the most recent release.

Mark
#   Thu Mar 02 10:53:40 2006 guest - Correspondence added  
From: mark[...]summersault.com
[text/plain 638b]
On Thu Mar 02 10:43:17 2006, guest wrote:
> I got a different warning:
>
> substr outside of string at
> /usr/local/lib/perl5/site_perl/5.8.0/Spreadsheet/ParseExcel.pm line 1558.
>
> It means: "You tried to reference a substr that pointed outside of a
> string. That is, the absolute value of the offset was larger than the
> length of the string".

BTW, I got this error when trying to parse a file that my system identified
as a "Microsoft Excel Worksheet". When the same data was provided in a
format identified as a "Microsoft Office Document", it worked.

I didn't create these files, so I don't know exactly how they were created.
#   Mon Sep 11 07:42:25 2006 SZABGAB - Correspondence added  
From: SZABGAB[...]cpan.org
[text/plain 333b]
On Tue Aug 17 06:33:12 2004, SREZIC wrote:
> On occasion I get the following warning when parsing excel sheets:
>
> Character in "C" format wrapped at
> /usr/local/lib/perl5/site_perl/5.8.0/Spreadsheet/ParseExcel/FmtDefault.pm
> line 68.
>
> return pack('U*', unpack('n*', $sTxt));

this fix has been applied to version 0.27_01
#   Wed Sep 13 04:04:32 2006 marschap - Correspondence added  
From: bitcard[...]adpm.de
[text/plain 841b]
Hi,

On Mon Sep 11 07:42:25 2006, SZABGAB wrote:
> On Tue Aug 17 06:33:12 2004, SREZIC wrote:
> > On occasion I get the following warning when parsing excel sheets:
> >
> > Character in "C" format wrapped at
> > /usr/local/lib/perl5/site_perl/5.8.0/Spreadsheet/ParseExcel/FmtDefault.pm
> > line 68.
> >
> > return pack('U*', unpack('n*', $sTxt));
>
> this fix has been applied to version 0.27_01

This change might increase the Perl version required for
Spreadsheet::ParseExcel.

I don't know whether the 'U' template for pack() existed before Perl 5.6
(or was added even later).

Maybe wrapping it in an alternative à la

return pack((($] >= 5.006) ? 'U*' : 'C*'), unpack('n*', $sTxt));

might do the trick.

Alternatively make Spreadsheet::ParseExcel require Perl 5.7 (with all
the unicode stuff in place).

I prefer the latter.

Regards
Peter
#   Wed Jan 14 04:26:35 2009 JMCNAMARA - Subject changed from 'Warnings during parsing of excel sheet' to 'Warnings during parsing of excel sheet (Unicode issue)'