Subject:  Patch against v0.33 enabling Excel formula parsing and evaluation 
Subject:  spetest 
Subject:  Formula.pm 
Subject:  formulatest.xls 
Subject:  ParseExcel.pm.diff 
Subject:  Formula.html 
 NAME
 SYNOPSIS
 DESCRIPTION
 Additional Spreadsheet::ParseExcel::Workbook methods
 Additional Spreadsheet::ParseExcel::Cell methods
 INTERNALS
 LIMITATIONS
 TODO
 AUTHOR
 COPYRIGHT
 SEE ALSO
 ACKNOWLEDGEMENTS
NAME
Spreadsheet::ParseExcel::Formula - extension of Spreadsheet::ParseExcel to handle parsing and evaluation of Excel formulas
SYNOPSIS
NOTE: Please read the section LIMITATIONS before using this module to make sure it suits your purpose!
    # use with or without SaveParser extension
    use Spreadsheet::ParseExcel;
    use Spreadsheet::ParseExcel::SaveParser::Workbook;
    use Spreadsheet::ParseExcel::Formula;

    # load and parse Excel file including formulas
    my $xls = Spreadsheet::ParseExcel::SaveParser::Workbook->Parse('test.xls');

    # set formula evaluation iteration limit and/or epsilon
    # (optional; only needed for self-referential formula structures)
    $xls->set_iteration_limit(10);    # default: 10
    $xls->set_epsilon(1e-6);          # default: 1e-6

    # set and change cell values as you like (optional)
    # this sets cell A1 of the first worksheet to the numerical value 17
    $xls->{Worksheet}->[0]->{Cells}->[0]->[0]->{Val} = 17;

    # evaluate the formulas in the Excel workbook
    $xls->evaluate();
    # retrieve and print formula cell results by accessing the "Val" member of
    # a Cell object (it is assumed that cell A2 contains a formula referencing
    # cell A1; Cells is indexed as [row][column], so A2 is [1][0])
    print 'Cell A2 value: ', $xls->{Worksheet}->[0]->{Cells}->[1]->[0]->{Val}, "\n";

    # save the workbook to a new Excel file using SaveParser's SaveAs method
    # Note: currently formulas are not saved.
    $xls->SaveAs('test1.xls');
DESCRIPTION
You have already read the section LIMITATIONS, haven't you?
Spreadsheet::ParseExcel::Formula can be used to enable formula parsing and evaluation in Excel 2003/97/XP files. The internal binary representation of Excel formulas (see INTERNALS) is parsed when the Excel file itself is parsed with the Parse method of either Spreadsheet::ParseExcel::Workbook or Spreadsheet::ParseExcel::SaveParser::Workbook.
This is achieved solely by extending the Spreadsheet::ParseExcel::Workbook and Spreadsheet::ParseExcel::Cell classes. This piece of code may therefore be considered a pseudo-module, as it neither implements a class nor implements or uses the namespace of Spreadsheet::ParseExcel::Formula.
There are currently strict limitations on the number and use of Excel functions and formula syntax implemented (see LIMITATIONS), but you are encouraged to extend and improve the functions and syntax recognized.
Additional Spreadsheet::ParseExcel::Workbook methods
(In the following, $xls denotes a valid Spreadsheet::ParseExcel::Workbook object).
 $xls->evaluate()

Evaluates all formulas within the workbook object until either the current iteration limit is exceeded, or the global error of all formula cells is less than the current epsilon limit.
Returns false (undef) if the iteration limit has been exceeded, or true (1) if the workbook evaluated successfully.
NOTE: A true return value does not necessarily indicate that your workbook/worksheet is free of cell errors. As explained in LIMITATIONS, cell errors are handled as strings and compared as such, meaning that if these error strings compare equal on successive iterations, the cell is considered stable and evaluation counts as successful. On the other hand, a false return value does not necessarily mean that your workbook/worksheet evaluated erroneously, since this depends on the functions and self-referential formula structures used within the workbook/worksheet.
In general, evaluate() is the only workbook method you need for formula evaluation. You do not need any of the methods described below unless you have self-referential formula structures within your Excel file and want fine-grained control over formula evaluation.
 $xls->get_iteration_limit()

Retrieves the current iteration limit (default: 10) for formula evaluation. Returns the current iteration limit (scalar, number).
 $xls->set_iteration_limit($num)

Sets the current iteration limit for formula evaluation to $num. Returns the new iteration limit (scalar, number).
 $xls->get_epsilon()

Retrieves the current epsilon limit (default: 1e-6) for formula evaluation. Returns the current epsilon limit (scalar, number).
 $xls->set_epsilon($num)

Sets the current epsilon limit for formula evaluation to $num. Returns the new epsilon limit (scalar, number).
Additional Spreadsheet::ParseExcel::Cell methods
(In the following, $cell denotes a valid Spreadsheet::ParseExcel::Cell object).
 $cell->evaluate()

Evaluates a single cell containing a formula, sets the cell value to the evaluation result, and returns that result.
Note that this method should normally not be invoked directly, as this is done by the evaluate() method of the Spreadsheet::ParseExcel::Workbook class. The only meaningful use is when evaluating the whole workbook is too time-consuming and evaluating a single formula cell is sufficient for a particular type of application.
INTERNALS
This is in addition to the section LIMITATIONS, which you should have definitely read by now!
This module hooks itself into the parsing process of Spreadsheet::ParseExcel, and parses the binary formula string of Excel into an RPN (Reverse Polish Notation, see http://en.wikipedia.org/wiki/Reverse_Polish_Notation) parse sequence (basically a Perl array of formula tokens).
During evaluation, the RPN parse sequence of each formula cell is interpreted by a stack machine (see http://en.wikipedia.org/wiki/Stack_machine), where each token or formula function consumes a number of arguments from the stack and pushes its result back onto the stack. The final result of a cell is then the top of the stack, which at that point should contain only this one last entry.
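The stack-machine idea can be illustrated with a minimal, self-contained sketch. It is not the module's actual code: the token format, the %OPS table, and the eval_rpn function are hypothetical names chosen for illustration only, and only a handful of operators are shown.

```perl
use strict;
use warnings;

# Hypothetical operator table: token => [ argument count, implementation ].
# The real module's token representation is not shown in this document.
my %OPS = (
    '+'   => [ 2, sub { $_[0] + $_[1] } ],
    '-'   => [ 2, sub { $_[0] - $_[1] } ],
    '*'   => [ 2, sub { $_[0] * $_[1] } ],
    'ABS' => [ 1, sub { abs $_[0] } ],
);

# Evaluate an RPN token sequence (array reference) with a stack.
sub eval_rpn {
    my ($tokens) = @_;
    my @stack;
    for my $tok (@$tokens) {
        if (my $op = $OPS{$tok}) {
            my ($argc, $code) = @$op;
            die "stack underflow at '$tok'\n" if @stack < $argc;
            # splice removes the last $argc entries, keeping their order
            my @args = splice @stack, -$argc;
            push @stack, $code->(@args);
        }
        else {
            push @stack, $tok;    # operand: push unchanged
        }
    }
    # a well-formed formula leaves exactly one entry: the cell result
    die "malformed formula: " . scalar(@stack) . " stack entries left\n"
        unless @stack == 1;
    return $stack[0];
}

# (3 + 4) * 2 written in RPN is: 3 4 + 2 *
print eval_rpn([ 3, 4, '+', 2, '*' ]), "\n";
```

Each operator pops exactly as many arguments as it declares, which is why a correct parse sequence always ends with a single stack entry.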
Since self-referential (see http://en.wikipedia.org/wiki/Selfreferential) formulas are allowed in Excel, a worksheet/workbook needs to be evaluated iteratively (see http://en.wikipedia.org/wiki/Iteration) until all values (hopefully) stabilize onto a final formula result.
The question of what ``stabilize'' means is answered by an epsilon range (see http://en.wikipedia.org/wiki/Limit_(mathematics)), against which the difference between the current and previous value of a cell is compared. If the absolute value of this difference is smaller than epsilon, the cell is considered ``stable''; otherwise the evaluation process needs another iteration.
Since there are cases where a self-referential formula complex may never stabilize onto a final value (e.g. when a RAND() function is involved), a limit needs to be placed on the maximum number of iterations.
Both epsilon and the iteration limit may be queried and set using corresponding accessors (see DESCRIPTION).
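The interplay of epsilon and the iteration limit can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the evaluate_until_stable function and the hash-of-code-references stand-in for a workbook are hypothetical, and stability is checked per cell as described in this section.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a workbook: cell name => code reference that
# computes the cell's next value from the current values of all cells.
sub evaluate_until_stable {
    my ($formulas, $values, $limit, $epsilon) = @_;
    for my $iteration (1 .. $limit) {
        my $stable = 1;
        for my $name (sort keys %$formulas) {
            my $old = $values->{$name};
            my $new = $formulas->{$name}->($values);
            # a cell is unstable if it moved by epsilon or more
            $stable = 0 if abs($new - $old) >= $epsilon;
            $values->{$name} = $new;
        }
        return 1 if $stable;    # all cells changed by less than epsilon
    }
    return;                     # iteration limit exceeded
}

# Self-referential cell: A1 = (A1 + 2/A1) / 2, which converges to sqrt(2).
my %formulas = ( A1 => sub { ($_[0]{A1} + 2 / $_[0]{A1}) / 2 } );
my %values   = ( A1 => 1 );
my $ok = evaluate_until_stable(\%formulas, \%values, 10, 1e-6);
printf "%s: A1 = %.6f\n", ($ok ? 'stable' : 'not stable'), $values{A1};
```

A formula such as A1 = A1 + 1 never settles, so the same function would return a false value for it once the limit is reached.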
LIMITATIONS

Only Excel 2003/97/XP formulas are parsed correctly (this is the so-called BIFF8 format). Trying to parse files produced with other versions may in the best case produce erroneous and unpredictable results.

Only a small but useful subset of possible formula syntax is implemented. Currently unimplemented formula features and constructs include:

Array constants such as {1, 2}.

Cell range intersections (the space operator).

Cell range lists/unions (the comma operator).

Defined names (variables), i.e. named cells or cell ranges.

Cell ranges using defined names (the colon operator with defined names), e.g. namedcell:B2. NOTE: Not to be confused with regular cell ranges like A1:B2; these are implemented and should work as expected.

All types of reference subexpressions (constant, reference, deleted, incomplete, etc.) used for encapsulation of the cell range and list operators.

3D cell references and 3D cell range references, i.e. cross-worksheet references of the form "OtherWorksheet"!A1. This means that formulas may only reference cells within the same worksheet.

All types of deleted cell references (2D, 3D, relative, etc.), as these indicate an erroneous formula. It is assumed that the worksheet to be evaluated has been debugged and works correctly within Excel itself.

Matrix formulas.

Multiple operation tables.

Natural language references.

The CHOOSE function control.

Assignment in macro sheets.


Only a small but useful subset (about one third) of the functions usable in formulas is implemented. Currently implemented functions are:
COUNT, IF, ISNA, ISERROR, SUM, AVERAGE, MIN, MAX, NA, DOLLAR, FIXED, SIN, COS, TAN, ATAN, PI, SQRT, EXP, LN, LOG10, ABS, INT, SIGN, ROUND, REPT, MID, LEN, VALUE, TRUE, FALSE, AND, OR, NOT, MOD, VAR, RAND, ATAN2, ASIN, ACOS, LOG, CHAR, LOWER, UPPER, PROPER, LEFT, RIGHT, EXACT, TRIM, REPLACE, SUBSTITUTE, CODE, FIND, ISERR, ISTEXT, ISNUMBER, ISBLANK, T, N, CLEAN, TRUNC, USDOLLAR, ROUNDUP, ROUNDDOWN, MEDIAN, SUMPRODUCT, SINH, COSH, TANH, ASINH, ACOSH, ATANH, EVEN, FLOOR, CEILING, ODD, CONCATENATE, POWER, RADIANS, DEGREES, SUMIF, COUNTIF

Boolean values are encoded as integers 0 and 1 as in Perl.

There is no such thing as an error type or object. Errors are implemented as simple strings beginning with # and ending with !, e.g. '#N/A!'.

All this means that even for the formula features and functions that are implemented, you might get different, if not completely erroneous, results when evaluating your particular Excel files, especially if calculations involve edge cases of a particular function, or if the evaluation of a nested function results in an error. YMMV, you have been warned!
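To make the consequence of string-based errors concrete, here is a small sketch of how a stability check could treat them. The helper names is_error_string and is_stable are hypothetical and not part of the module; the sketch only assumes the conventions stated above (errors begin with # and end with !, and are compared as strings).

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the value looks like an error string
# (begins with '#' and ends with '!'), 0 otherwise.
sub is_error_string {
    my ($value) = @_;
    my $matches = defined($value) && $value =~ /^#.*!\z/;
    return $matches ? 1 : 0;
}

# Hypothetical stability test: error strings are compared as strings,
# everything else numerically against epsilon.
sub is_stable {
    my ($old, $new, $epsilon) = @_;
    if (is_error_string($old) || is_error_string($new)) {
        return $old eq $new ? 1 : 0;
    }
    return abs($new - $old) < $epsilon ? 1 : 0;
}

print is_stable('#N/A!', '#N/A!', 1e-6), "\n";    # same error twice: stable
print is_stable(1.0, 1.5, 1e-6), "\n";            # value still moving
print is_stable(1.0, '#N/A!', 1e-6), "\n";        # number turned into error
```

Under these assumptions a cell that yields the same error on two successive iterations counts as stable, which is why a successful evaluation does not imply an error-free worksheet.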
TODO

A lot! You are encouraged to help improve and extend formula evaluation within Spreadsheet::ParseExcel!

Syntactical improvements: Parsing and evaluating currently unrecognized tokens such as constant arrays or 3D cell references.

Functional improvements: Extend the number of implemented functions, especially date and time functions and statistical functions.

Extensive testing: Write comprehensive test cases covering all edge cases and boundary conditions of the implemented functions, and improve error handling on formula evaluation errors.

Wishlist 1: Enable formula modification within Perl. This involves parsing the ASCII representation of Excel formulas, possibly in all languages supported by Excel, and storing it back in the internal RPN parse sequence.

Wishlist 2: Enable Spreadsheet::WriteExcel to write the internally stored RPN parse sequence of formulas back into the resulting Excel file. This involves reversing the process of binary token parsing, i.e. converting the RPN parse sequence into its corresponding binary representation.
AUTHOR
Franz Fasching (franz dot fasching at gmail dot com).
COPYRIGHT
Copyright (c) 2008 Franz Fasching.
All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself as specified in the Perl README file, i.e. the ``Artistic License'' or the ``GNU General Public License (GPL)''.
SEE ALSO
The Spreadsheet::ParseExcel, Spreadsheet::ParseExcel::SaveParser, and Spreadsheet::WriteExcel modules.
OpenOffice.org has made the specification of the Excel file format publicly available (see http://sc.openoffice.org/excelfileformat.pdf), which has recently been made available also by Microsoft.
ACKNOWLEDGEMENTS

Kawai Takanori and Gabor Szabo for their impressive Spreadsheet::ParseExcel module.

John McNamara for his excellent Spreadsheet::WriteExcel module.

Dr. Claus Fischer (TXware GmbH), who enabled me to write this module as part of a client project, and make it publicly available under the PERL Artistic License and the GPL.