This queue is for tickets about the Spreadsheet-ParseExcel CPAN distribution.

Maintainer(s)' notes

If you are reporting a bug in Spreadsheet::ParseExcel, here are some pointers:

1) State the issues as clearly and as concisely as possible. A simple program or Excel test file (see below) will often explain the issue better than a lot of text.

2) Provide information on your system, version of perl and module versions. The following program will generate everything that is required. Put this information in your bug report.

    #!/usr/bin/perl -w

    print "\n    Perl version   : $]";
    print "\n    OS name        : $^O";
    print "\n    Module versions: (not all are required)\n";

    my @modules = qw(
                      Spreadsheet::ParseExcel
                      Scalar::Util
                      Unicode::Map
                      Spreadsheet::WriteExcel
                      Parse::RecDescent
                      File::Temp
                      OLE::Storage_Lite
                      IO::Stringy
                    );

    for my $module (@modules) {
        my $version;
        eval "require $module";

        if (not $@) {
            $version = $module->VERSION;
            $version = '(unknown)' if not defined $version;
        }
        else {
            $version = '(not installed)';
        }

        printf "%21s%-24s\t%s\n", "", $module, $version;
    }

    __END__

3) Upgrade to the latest version of Spreadsheet::ParseExcel (or at least test on a system with an upgraded version). The issue you are reporting may already have been fixed.

4) Create a small example program that demonstrates your problem. The program should be as small as possible. A few lines of code are worth tens of lines of text when trying to describe a bug.

5) Supply an Excel file that demonstrates the problem. This is very important. If the file is big, or contains confidential information, try to reduce it down to the smallest Excel file that represents the issue. If you don't wish to post a file here then send it to me directly: jmcnamara@cpan.org

6) Say if the test file was created by Excel, OpenOffice, Gnumeric or something else. Say which version of that application you used.

7) If you are submitting a patch you should check with the maintainer whether the issue has already been patched or if a fix is in the works. Patches should be accompanied by test cases.

Asking a question

If you would like to ask a more general question there is the Spreadsheet::ParseExcel Google Group.

Report information

The Basics
Id: 39203
Status: open
Priority: Low/Low

People
Owner: Nobody in particular
Requestors: Franz.Fasching [...] gmail.com

Subject: Patch against v0.33 enabling Excel formula parsing and evaluation
(This is *not* a bug report)

I put together a small patch against Spreadsheet::ParseExcel v0.33 enabling formula parsing and evaluation.

Instructions apply for an installed v0.33:

- Make a backup of ParseExcel.pm in case something goes wrong
    $ cp ParseExcel.pm ParseExcel.pm.orig
- Patch Spreadsheet/ParseExcel.pm with the attached diff:
    $ patch ParseExcel.pm <ParseExcel.pm.diff
- Put the attached Formula.pm into the Spreadsheet directory
- Test with the attached "spetest formulatest.xls"

See Formula.html generated from the module's POD for more information.

Use at your own risk, no warranties express or implied.

Have fun!
-Franz

Subject: spetest

Message body not shown because it is not plain text.

Subject: Formula.pm

Message body is not shown because it is too large.

Subject: formulatest.xls

Message body not shown because it is not plain text.

Subject: ParseExcel.pm.diff
    555a556,560
    > #FFbegin - additional code for formula BIFF parsing
    > my ($formula_length) = unpack("v", substr($sWk, 20, 2));
    > my $formula_hexstring = substr($sWk, 22, $formula_length);
    > #FFend
    >
    565c570
    < Kind => 'Formulra Bool',
    ---
    > Kind => 'Formula Bool', # FFcorr 'Formulra'
    569a575
    > Formula => $formula_hexstring, #FFadd
    574c580,596
    < else { # Result (Reserve Only)
    ---
    > #FFbeg - string formula
    > elsif ( $iKind==0 ) {
    > _NewCell (
    > $oBook, $iR, $iC,
    > Kind => 'Formula String',
    > Val => '', # FFrem - string follows formula tokens!
    > FormatNo=> $iF,
    > Format => $oBook->{Format}[$iF],
    > Numeric => 0,
    > Formula => $formula_hexstring, #FFadd
    > Code => undef, #FFrem - set later by _subString()
    > Book => $oBook,
    > );
    > $oBook->{_PrevPos} = [$iR, $iC, $iF];
    > }
    > #FFend
    > else { # Result (Reserve Only)
    586a609
    > Formula => $formula_hexstring, #FFadd
    621a645,654
    > #FFbeg
    > if ( defined(my $cell =
    > $oBook->{Worksheet}[$oBook->{_CurSheet}]->{Cells}[$iR][$iC])
    > ) {
    > # cell already set (probably by a previous string formula)
    > # just augment with string value
    > $cell->{Val} = $cell->{_Value} = $sTxt;
    > $cell->{Code} = $sCode;
    > } else {
    > #FFend
    631a665
    > } #FFadd
    1740a1775
    > Formula => $rhKey{Formula} || '', #FFadd
    1754a1790,1796
    > #FFbeg - add formula parsing code
    > if ( $oCell->{Formula} ) {
    > push @{$oBook->{Formulae}}, [$oBook->{_CurSheet}, $iR, $iC];
    > $oCell->parse_formula($oBook, $oBook->{_CurSheet}, $iR, $iC);
    > }
    > #FFend
    >

Subject: Formula.html
Spreadsheet::ParseExcel::Formula


NAME

Spreadsheet::ParseExcel::Formula - extension of Spreadsheet::ParseExcel to handle parsing and evaluation of Excel formulas


SYNOPSIS

NOTE: Please read the section LIMITATIONS before using this module to make sure it suits your purpose!

    # use with or without SaveParser extension
    use Spreadsheet::ParseExcel;
    use Spreadsheet::ParseExcel::SaveParser::Workbook;
    use Spreadsheet::ParseExcel::Formula;
    # load and parse Excel file including formulas
    my $xls = Spreadsheet::ParseExcel::SaveParser::Workbook->Parse('test.xls');
    # set formula evaluation iteration limit and/or epsilon
    # (optional; only needed for self-referential formula structures)
    $xls->set_iteration_limit(10);      # default: 10
    $xls->set_epsilon(1e-6);            # default: 1e-6
    # set and change cell values as you like (optional)
    # this sets cell A1 of the first worksheet to the numerical value 17
    $xls->{Worksheet}->[0]->{Cells}->[0]->[0]->{Val} = 17;
    # evaluate the formulas in the Excel workbook
    $xls->evaluate();
    # retrieve and print formula cell results by accessing the "Val" member of
    # a Cell object (it is assumed that cell A2 contains a formula referencing
    # cell A1); Cells is indexed as [row][column], so A2 is [1][0]
    print 'Cell A2 value: ',
          $xls->{Worksheet}->[0]->{Cells}->[1]->[0]->{Val}, "\n";
    # save the workbook to a new excel file using SaveParser's SaveAs method
    # Note: currently formulas are not saved.
    $xls->SaveAs('test1.xls');


DESCRIPTION

You have already read the section LIMITATIONS, haven't you?

Spreadsheet::ParseExcel::Formula can be used to enable formula parsing and evaluation in Excel 2003/97/XP files. The internal binary representation of Excel formulas (see INTERNALS) is parsed while the Excel file is read by the Parse methods of either Spreadsheet::ParseExcel::Workbook or Spreadsheet::ParseExcel::SaveParser::Workbook.

This is achieved solely by extending the Spreadsheet::ParseExcel::Workbook and Spreadsheet::ParseExcel::Cell classes. This piece of code may therefore be considered a pseudo-module, as it neither implements a class, nor implements or uses the namespace of Spreadsheet::ParseExcel::Formula.

There are currently strict limitations on the number and use of Excel functions and formula syntax implemented (see LIMITATIONS), but you are encouraged to extend and improve the functions and syntax recognized.

Additional Spreadsheet::ParseExcel::Workbook methods

(In the following, $xls denotes a valid Spreadsheet::ParseExcel::Workbook object).

$xls->evaluate()

Evaluates all formulas within the workbook object until either the current iteration limit is exceeded, or the global error of all formula cells is less than the current epsilon limit.

Returns false (undef) if the iteration limit has been exceeded, or true (1) if the workbook evaluated successfully.

NOTE: A true return value does not necessarily indicate that your workbook/worksheet is free of cell errors. Cell errors are handled as strings (see LIMITATIONS) and compared as such, meaning that if these string error values compare equal on successive iterations, the cell is considered stable and evaluation counts as successful. Conversely, a false return value does not necessarily mean that your workbook/worksheet evaluated erroneously, since this depends on the functions and self-referential formula structures used within the workbook/worksheet.

In general, evaluate() is the only workbook method you need for formula evaluation. You do not need any of the methods described below, unless you have self-referential formula structures within your Excel file, and want fine-grained control over formula evaluation.

$xls->get_iteration_limit()

Retrieves the current iteration limit (default: 10) for formula evaluation. Returns the current iteration limit (scalar, number).

$xls->set_iteration_limit($num)

Sets the current iteration limit for formula evaluation to $num. Returns the new iteration limit (scalar, number).

$xls->get_epsilon()

Retrieves the current epsilon limit (default: 1e-6) for formula evaluation. Returns the current epsilon limit (scalar, number).

$xls->set_epsilon($num)

Sets the current epsilon limit for formula evaluation to $num. Returns the new epsilon limit (scalar, number).

Additional Spreadsheet::ParseExcel::Cell methods

(In the following, $cell denotes a valid Spreadsheet::ParseExcel::Cell object).

$cell->evaluate()

Evaluates a single cell containing a formula, sets the cell value to the evaluation result, and returns that result.

Note that this method does not normally need to be invoked directly, as this is done by the evaluate() method of the Spreadsheet::ParseExcel::Workbook class. The only meaningful use is when evaluation of a whole workbook is too time-consuming and evaluation of a single formula cell is sufficient for a particular type of application.


INTERNALS

This is in addition to the section LIMITATIONS, which you should have definitely read by now!

This module hooks itself into the parsing process of Spreadsheet::ParseExcel, and parses the binary formula string of Excel into an RPN (Reverse Polish Notation, see http://en.wikipedia.org/wiki/Reverse_Polish_Notation) parse sequence (basically a Perl array of formula tokens).

During evaluation, the RPN parse sequence of each formula cell is interpreted by a stack machine (see http://en.wikipedia.org/wiki/Stack_machine), where each token or formula function consumes a number of arguments from the stack, and pushes its result back onto the stack. The final result of a cell is then the top of the stack, which at that point should contain only this one last entry.
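
The stack-machine idea described above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions, not the code in Formula.pm: the token format (plain numbers as operands, operator names as strings), the %ops table and the eval_rpn function are all hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical operator table (NOT the module's internal representation):
# each entry says how many arguments it consumes and how to compute the result.
my %ops = (
    '+'   => { argc => 2, code => sub { $_[0] + $_[1] } },
    '-'   => { argc => 2, code => sub { $_[0] - $_[1] } },
    '*'   => { argc => 2, code => sub { $_[0] * $_[1] } },
    '/'   => { argc => 2, code => sub { $_[0] / $_[1] } },
    'ABS' => { argc => 1, code => sub { abs $_[0] } },
);

# Interpret an RPN token list with a stack: operands are pushed, operators
# pop their arguments and push their result back.
sub eval_rpn {
    my @tokens = @_;
    my @stack;
    for my $tok (@tokens) {
        if ( my $op = $ops{$tok} ) {
            die "stack underflow at '$tok'\n" if @stack < $op->{argc};
            my @args = splice @stack, -$op->{argc};    # pop argc arguments
            push @stack, $op->{code}->(@args);
        }
        else {
            push @stack, $tok;                         # operand
        }
    }
    # a well-formed sequence leaves exactly one entry: the result
    die "malformed RPN sequence\n" unless @stack == 1;
    return $stack[0];
}

# The formula 1+2*3 corresponds to the RPN sequence: 1 2 3 * +
my $result = eval_rpn(qw(1 2 3 * +));
print "$result\n";    # 7
```

The final check that exactly one entry remains mirrors the statement that the stack should contain only the result once a cell has been evaluated.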

Since in Excel self-referential (see http://en.wikipedia.org/wiki/Self-referential) formulas are allowed, a worksheet/workbook needs to be iteratively (see http://en.wikipedia.org/wiki/Iteration) evaluated, until all values (hopefully) stabilize onto a final formula result.

The question of what "stabilize" means is answered by an epsilon range (see http://en.wikipedia.org/wiki/Limit_(mathematics)), against which the difference between the current and previous values of a cell is compared. If the absolute value of this difference is smaller than epsilon, the cell is considered "stable"; otherwise the evaluation process needs another iteration.

Since there are cases where a self-referential formula complex may not stabilize onto a final value (e.g. when a RAND() function is involved), a limit needs to be placed on the maximum number of iterations.

Both epsilon and the iteration limit may be queried and set using corresponding accessors (see DESCRIPTION).
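
The interplay of epsilon and the iteration limit can be illustrated with a small stand-alone Perl sketch. It is not taken from Formula.pm: the evaluate_until_stable function, the hash-of-closures model of formula cells, and the choice to compare the largest per-cell change against epsilon are assumptions made for this example, and it handles numeric values only (no string error values).

```perl
use strict;
use warnings;

# Hypothetical model: %$cells maps cell names to current values, and
# %$formulas maps cell names to closures computing a new value from $cells.
# Returns the number of iterations needed, or undef if the limit was exceeded.
sub evaluate_until_stable {
    my ( $cells, $formulas, $limit, $epsilon ) = @_;
    for my $iteration ( 1 .. $limit ) {
        my $max_delta = 0;
        for my $name ( sort keys %$formulas ) {
            my $new   = $formulas->{$name}->($cells);
            my $delta = abs( $new - $cells->{$name} );
            $max_delta = $delta if $delta > $max_delta;
            $cells->{$name} = $new;
        }
        # every cell changed by less than epsilon: considered stable
        return $iteration if $max_delta < $epsilon;
    }
    return;    # iteration limit exceeded
}

# Self-referential example: A1 = (A1 + 2/A1) / 2, which converges to sqrt(2).
my %cells    = ( A1 => 1 );
my %formulas = ( A1 => sub { ( $_[0]{A1} + 2 / $_[0]{A1} ) / 2 } );

my $iterations = evaluate_until_stable( \%cells, \%formulas, 10, 1e-6 );
if ( defined $iterations ) {
    print "stable after $iterations iterations: $cells{A1}\n";
}
else {
    print "iteration limit exceeded\n";
}
```

A formula such as A1 = A1 + 1 never stabilizes, so with the same arguments the function returns undef once the limit is reached, which corresponds to the false return value of evaluate() described in DESCRIPTION.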


LIMITATIONS

  • Only Excel 2003/97/XP formulas are parsed correctly (this is the so-called BIFF8 format). Trying to parse files produced with other versions will at best produce erroneous and unpredictable results.

  • Only a small but useful subset of possible formula syntax is implemented. Currently unimplemented formula features and constructs include:

    • Array constants such as {1, 2}.

    • Cell range intersections (the space operator).

    • Cell range lists/unions (the comma operator).

    • Defined names (variables), i.e. named cells or cell ranges.

    • Cell ranges using defined names (the colon operator with defined names), e.g. namedcell:B2. NOTE: Not to be confused with regular cell ranges like A1:B2; these are implemented and should work as expected.

    • All types of reference subexpressions (constant, reference, deleted, incomplete, etc.) used for encapsulation of the cell range and list operators.

    • 3D cell references and 3D cell range references, i.e. cross-worksheet references of the form "OtherWorksheet"!A1. This means the formulas may only reference cells within the same worksheet.

    • All types of deleted cell references (2D, 3D, relative, etc.), as these indicate an erroneous formula. It is assumed that the worksheet to be evaluated is debugged and works correctly within Excel itself.

    • Matrix formulas.

    • Multiple operation tables.

    • Natural language references.

    • The CHOOSE function control.

    • Assignment in macro sheets.

  • Only a small but useful subset (about one third) of the functions usable in formulas is implemented. Currently implemented functions are:

    COUNT, IF, ISNA, ISERROR, SUM, AVERAGE, MIN, MAX, NA, DOLLAR, FIXED, SIN, COS, TAN, ATAN, PI, SQRT, EXP, LN, LOG10, ABS, INT, SIGN, ROUND, REPT, MID, LEN, VALUE, TRUE, FALSE, AND, OR, NOT, MOD, VAR, RAND, ATAN2, ASIN, ACOS, LOG, CHAR, LOWER, UPPER, PROPER, LEFT, RIGHT, EXACT, TRIM, REPLACE, SUBSTITUTE, CODE, FIND, ISERR, ISTEXT, ISNUMBER, ISBLANK, T, N, CLEAN, TRUNC, USDOLLAR, ROUNDUP, ROUNDDOWN, MEDIAN, SUMPRODUCT, SINH, COSH, TANH, ASINH, ACOSH, ATANH, EVEN, FLOOR, CEILING, ODD, CONCATENATE, POWER, RADIANS, DEGREES, SUMIF, COUNTIF

  • Boolean values are encoded as integers 0 and 1 as in Perl.

  • There is no such thing as an error type or object. Errors are implemented as simple strings beginning with # and ending with !, e.g. '#N/A!'.

  • All this means that even though these formulas and functions are implemented, you might get different, if not completely erroneous, results when evaluating your particular Excel files, especially if calculations on edge cases of a particular function are involved, or the evaluation of a particular nested function results in an error. YMMV, you have been warned!


TODO

  • A lot! You are encouraged to help improve and extend formula evaluation within Spreadsheet::ParseExcel!

  • Syntactical improvements: Parsing and evaluating currently unrecognized tokens such as constant arrays or 3D cell references.

  • Functional improvements: Extend the number of implemented functions, especially with date and time functions and statistical functions.

  • Extensive testing: Write comprehensive test cases for testing all edge cases and boundary conditions of the implemented functions, and improve error handling on formula evaluation errors.

  • Wishlist 1: Enable formula modification within Perl. This involves parsing the ASCII representation of Excel formulas, possibly in all languages supported by Excel, and storing it back in the internal RPN parse sequence.

  • Wishlist 2: Enable Spreadsheet::WriteExcel to write back the internally stored RPN parse sequence of formulas into the resulting Excel file. This involves reversing the process of binary token parsing, i.e. converting the RPN parse sequence into its corresponding binary representation.


AUTHOR

Franz Fasching (franz dot fasching at gmail dot com).


COPYRIGHT

Copyright (c) 2008 Franz Fasching.

All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself as specified in the Perl README file, i.e. the "Artistic License" or the "GNU General Public License (GPL)".


SEE ALSO

The Spreadsheet::ParseExcel, Spreadsheet::ParseExcel::SaveParser, and Spreadsheet::WriteExcel modules.

OpenOffice.org has made the specification of the Excel file format publicly available (see http://sc.openoffice.org/excelfileformat.pdf), which has recently been made available also by Microsoft.


ACKNOWLEDGEMENTS

  • Kawai Takanori and Gabor Szabo for their impressive Spreadsheet::ParseExcel module.

  • John McNamara for his excellent Spreadsheet::WriteExcel module.

  • Dr. Claus Fischer (TXware GmbH), who enabled me to write this module as part of a client project, and make it publicly available under the PERL Artistic License and the GPL.

On Fri Sep 12 04:38:55 2008, ffasching wrote:
> (This is *not* a bug report)
>
> I put together a small patch against Spreadsheet::ParseExcel v0.33
> enabling formula parsing and evaluation.

Hi,

That is certainly interesting. I'm guessing that this was necessitated by the fact that Spreadsheet::WriteExcel doesn't write the formula value with the formula. Is that correct? Or perhaps you had another requirement.

A related project that I would be interested in initiating would be a formula deparser, i.e., something to take a packed ptg expression and convert it to its textual expression (1E 01 00 1E 02 00 1E 03 00 05 03 converted to 1+2*3). This would allow SaveParser to effectively rewrite formulas to WriteExcel.

John.
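
For illustration, the deparsing idea mentioned in the reply above might look roughly like the following Perl sketch. It is not part of the patch or of any released module: the deparse_ptg function and the %binop table are hypothetical names, and only the integer token (0x1E, followed by a 16-bit little-endian value) and the four basic arithmetic operator tokens (0x03 to 0x06) are handled. Any other token makes the sketch die.

```perl
use strict;
use warnings;

# Binary operator tokens handled by this sketch: opcode => [symbol, precedence]
my %binop = (
    0x03 => [ '+', 1 ],
    0x04 => [ '-', 1 ],
    0x05 => [ '*', 2 ],
    0x06 => [ '/', 2 ],
);

# Convert a packed RPN token string into infix text.
# Stack entries are [text, precedence]; plain integers get precedence 9.
sub deparse_ptg {
    my ($bytes) = @_;
    my @stack;
    my $pos = 0;
    while ( $pos < length $bytes ) {
        my $ptg = ord( substr( $bytes, $pos++, 1 ) );
        if ( $ptg == 0x1E ) {    # integer: 16-bit little-endian unsigned
            die "truncated integer token\n" if $pos + 2 > length $bytes;
            push @stack, [ unpack( 'v', substr( $bytes, $pos, 2 ) ), 9 ];
            $pos += 2;
        }
        elsif ( my $op = $binop{$ptg} ) {
            die "stack underflow\n" if @stack < 2;
            my ( $lhs, $rhs ) = splice @stack, -2;
            my ( $sym, $prec ) = @$op;
            # parenthesize operands that bind less tightly than this operator
            my $l = $lhs->[1] <  $prec ? "($lhs->[0])" : $lhs->[0];
            my $r = $rhs->[1] <= $prec ? "($rhs->[0])" : $rhs->[0];
            push @stack, [ "$l$sym$r", $prec ];
        }
        else {
            die sprintf "unsupported token 0x%02X\n", $ptg;
        }
    }
    die "malformed expression\n" unless @stack == 1;
    return $stack[0][0];
}

# The byte sequence quoted in the reply: 1E 01 00 1E 02 00 1E 03 00 05 03
my $packed = pack 'H*', '1E01001E02001E03000503';
my $text   = deparse_ptg($packed);
print "$text\n";    # 1+2*3
```

The sketch always parenthesizes a right-hand operand of equal precedence, so 1-(2-3) stays correct at the cost of some redundant parentheses, e.g. 1+(2+3).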

