This queue is for tickets about the Spreadsheet-XLSX CPAN distribution.

Report information
The Basics
Id: 66516
Status: open
Priority: 0/
Queue: Spreadsheet-XLSX

People
Owner: Nobody in particular
Requestors: giulioo [...] pobox.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



From: Giulio <giulioo [...] gmail.com>
Reply-To: giulioo [...] pobox.com
Date: Thu, 10 Mar 2011 11:51:02 +0100
To: bug-Spreadsheet-XLSX [...] rt.cpan.org
Subject: Using much more RAM than Spreadsheet-ParseExcel
Spreadsheet-XLSX-0.13 - perl-5.8.8 - Linux RHEL5.x

Test file has 65000 rows with 4 columns each.

1) Test file in Excel 2007 .xlsx format:
   Parsing with Spreadsheet-XLSX-0.13: process RAM usage stabilizes at 250 MB

2) Test file in BIFF .xls format:
   Spreadsheet-ParseExcel-0.32 using Parse(): process RAM usage stabilizes at 160 MB
   Spreadsheet-ParseExcel-0.32 using new()/CellHandler: process RAM usage stabilizes at 60 MB

So right now it's 60 MB for .xls and 250 MB for .xlsx.

I understand .xlsx is more verbose and so much more data has to be kept in memory in order to be analyzed; however, would it be technically feasible to add something like CellHandler to Spreadsheet-XLSX?

Thanks
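
For readers unfamiliar with the new()/CellHandler mode mentioned above, here is a minimal sketch of how it is typically used with Spreadsheet::ParseExcel. It assumes a reasonably recent Spreadsheet::ParseExcel release (one that provides parse() and error()); the names count_cell, scan_xls and %cells_per_sheet are made up for this illustration, and the file name is whatever you pass on the command line.

```perl
use strict;
use warnings;

# Running tally per sheet; the only state kept while parsing.
my %cells_per_sheet;

# Callback invoked once per parsed cell with
# ($workbook, $sheet_index, $row, $col, $cell).
sub count_cell {
    my ($workbook, $sheet_index, $row, $col, $cell) = @_;
    $cells_per_sheet{$sheet_index}++;
    return;
}

# Hypothetical wrapper: parse an .xls file in callback mode.
sub scan_xls {
    my ($path) = @_;

    # Loaded lazily so the callback can be exercised without the module.
    require Spreadsheet::ParseExcel;

    my $parser = Spreadsheet::ParseExcel->new(
        CellHandler => \&count_cell,
        NotSetCell  => 1,    # do not retain cells in the workbook object
    );

    my $workbook = $parser->parse($path);
    die $parser->error(), "\n" unless defined $workbook;

    return \%cells_per_sheet;
}

# Usage: perl scan.pl some_file.xls
if (@ARGV) {
    my $tally = scan_xls($ARGV[0]);
    printf "sheet %d: %d cells\n", $_, $tally->{$_} for sort { $a <=> $b } keys %$tally;
}
```

With NotSetCell set, each cell is handed to the callback and then dropped instead of being stored in the workbook, which is presumably why this mode stays near 60 MB in the measurements above.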
On Thu Mar 10 05:54:28 2011, giulioo@pobox.com wrote:
(snip)
> So right now it's 60 MB for .xls and 250 MB for .xlsx.
>
> I understand .xlsx is more verbose and so much more data has to be kept in
> memory in order to be analyzed; however, would it be technically feasible to
> add something like CellHandler to Spreadsheet-XLSX?

Patch attached to implement CellHandler and NotSetCell (and to optionally accept the converter through the new options hash).

It looks like this library has been unmaintained for a while. Note to author: are you still around and maintaining it?
Attachment: XLSX.pm.diff (text/x-diff, 1.9k)
--- XLSX.pm Sun May 16 02:07:33 2010
+++ XLSX.pm Thu Jun 7 15:41:03 2012
@@ -17,7 +17,15 @@
 sub new {
-    my ($class, $filename, $converter) = @_;
+    my ($class, $filename, @options) = @_;
+
+    my $converter;
+
+    $converter = shift @options if @options % 2;
+    my %opts = @options;
+    $converter ||= $opts{Converter};
+    my $cell_handler = $opts{CellHandler};
+    my $not_set_cell = $opts{NotSetCell};
     my $self = {};
@@ -110,9 +118,9 @@
         my ($tag, $other) = ($1, $');
-        my @pairs = split /\" /, $other;
-
         $tag eq 'sheet' or next;
+
+        my @pairs = split /\" /, $other;
         my $sheet = {
             MaxRow => 0,
@@ -145,6 +153,7 @@
     $self -> {Worksheet} = \@Worksheet;
+    my $sheet_count = 0;
     foreach my $sheet (@Worksheet) {
         my $member_sheet = $self -> {zip} -> memberNamed ("xl/$sheet->{path}") or next;
@@ -211,7 +220,8 @@
                     $cell->{Type}="Text";
                     $cell->{Val}=$cell->{_Value};
                 }
-                $sheet -> {Cells} [$row] [$col] = $cell;
+                $cell_handler->($oBook, $sheet_count, $row, $col, $cell) if $cell_handler;
+                $sheet -> {Cells} [$row] [$col] = $cell unless $not_set_cell;
             }
         }
@@ -219,7 +229,9 @@
         $sheet -> {MinRow} = 0 if $sheet -> {MinRow} > $sheet -> {MaxRow};
         $sheet -> {MinCol} = 0 if $sheet -> {MinCol} > $sheet -> {MaxCol};
-    }
+    } continue {
+        $sheet_count++;
+    }
     foreach my $stys (keys %style_info){
     }
     bless ($self, $class);
@@ -278,6 +290,25 @@
 but not all. It populates the classes from Spreadsheet::ParseExcel for interoperability;
 including Workbook, Worksheet, and Cell.
+The CellHandler and NotSetCell options of Spreadsheet::ParseExcel are now implemented:
+
+ Spreadsheet::XLSX->new ('test.xlsx', %options);
+
+Where %options may have the following keys;
+
+=over
+
+=item Converter
+
+=item CellHandler
+
+=item NotSetCell
+
+=back
+
+Converter may be optionally specified in the hash or as the third argument to new().
+CellHandler and NotSetCell work as they do in Spreadsheet::ParseExcel.
+
 =head1 SEE ALSO
 =over 2
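
The first hunk of the patch keeps the old positional converter argument working while adding a key/value options hash. As a rough illustration of that argument handling in isolation, here is a small stand-alone sketch; parse_new_options is a hypothetical helper written for this note and is not part of Spreadsheet::XLSX.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the argument handling the patch adds to new().
# It receives everything that follows ($class, $filename) in the call.
sub parse_new_options {
    my (@options) = @_;

    my $converter;

    # An odd number of arguments means a legacy positional converter
    # comes first, followed by key/value pairs.
    $converter = shift @options if @options % 2;

    my %opts = @options;

    # Fall back to the Converter key when no positional converter was given.
    $converter ||= $opts{Converter};

    return {
        converter    => $converter,
        cell_handler => $opts{CellHandler},
        not_set_cell => $opts{NotSetCell},
    };
}

# Old style: converter only.
my $old = parse_new_options('legacy-converter');

# New style: everything in the options hash.
my $new = parse_new_options(Converter => 'hash-converter', NotSetCell => 1);

# Mixed: positional converter followed by options.
my $mixed = parse_new_options('legacy-converter', CellHandler => sub { });
```

One consequence of this scheme is that a positional converter takes precedence over a Converter key when both are supplied.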
On Thu Jun 07 18:55:45 2012, DOUGW wrote:
> Patch attached to implement CellHandler and NotSetCell (and optionally
> implement converter in the optional options hash).

Forgot ParseAbort().

Add after "if $cell_handler;":
    return $oBook if defined $oBook->{_ParseAbort};
On Thu Jun 07 19:36:18 2012, DOUGW wrote:
> On Thu Jun 07 18:55:45 2012, DOUGW wrote:
> > Patch attached to implement CellHandler and NotSetCell (and optionally
> > implement converter in the optional options hash).
>
> Forgot ParseAbort().
>
> Add after "if $cell_handler;":
>     return $oBook if defined $oBook->{_ParseAbort};

No, add it after "unless $not_set_cell;"
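
To make the corrected placement concrete, here is a stand-alone sketch of the per-cell step with the abort check placed after the store, as the follow-up above says. It is a simulation only: process_cells is a hypothetical function, $book is a plain hash standing in for the workbook object, and the handler sets the _ParseAbort key directly (in Spreadsheet::ParseExcel a handler would normally request this through the workbook's ParseAbort method, as far as I know).

```perl
use strict;
use warnings;

# Simulated per-cell section of the patched loop (single sheet only).
# $cells is a list of [row, col, value] triples.
sub process_cells {
    my ($book, $cells, %opts) = @_;
    my $sheet_index = 0;

    for my $entry (@$cells) {
        my ($row, $col, $cell) = @$entry;

        # 1. Let the caller look at the cell.
        $opts{CellHandler}->($book, $sheet_index, $row, $col, $cell)
            if $opts{CellHandler};

        # 2. Store it unless the caller asked not to.
        $book->{Cells}[$row][$col] = $cell unless $opts{NotSetCell};

        # 3. Stop early if the handler requested an abort.
        return $book if defined $book->{_ParseAbort};
    }

    return $book;
}

# Handler that aborts as soon as it sees a cell in row 1 or later.
my $stop_at_row_1 = sub {
    my ($book, $sheet_index, $row, $col, $cell) = @_;
    $book->{_ParseAbort} = 1 if $row >= 1;
};

my $book = process_cells(
    {},
    [ [0, 0, 'a'], [0, 1, 'b'], [1, 0, 'c'], [1, 1, 'd'] ],
    CellHandler => $stop_at_row_1,
);
```

Because the check runs after the store, the cell that triggered the abort is still recorded (unless NotSetCell is set) and nothing after it is processed.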
From: giulioo [...] pobox.com
On Thu Jun 07 19:40:41 2012, DOUGW wrote:
> On Thu Jun 07 19:36:18 2012, DOUGW wrote:
> > On Thu Jun 07 18:55:45 2012, DOUGW wrote:
> > > Patch attached to implement CellHandler and NotSetCell (and optionally
> > > implement converter in the optional options hash).

With a new 65,000-row .xlsx test file (I don't have the old one), memory stabilizes at 182 MB instead of 264 MB, a 31% reduction.

Thanks!

