This queue is for tickets about the Statistics-Lite CPAN distribution.

Report information
The Basics
Id:
22697
Status:
resolved
Priority:
Low/Low

People
Owner:
brianiacus [...] yahoo.com
Requestors:
cpan [...] clotho.com
Cc:
AdminCc:



Subject: Tests fail
The tests are failing. However, because you are using Test.pm instead of Test::More, CPAN.pm does not detect the failures. I recommend that the tests be corrected (I believe the code is right and the tests are wrong) and that the module be updated to use Test::More, which is actively maintained.

Below is a snippet from the end of the tests. This is Perl 5.8.6 on MacOSX 10.4.

-- Chris

ok 17
ok 18
not ok 19
# Test 19 got: "0.666666666666667" (test.pl at line 49)
# Expected: "1"
# test.pl line 49 is: ok($stats{variance},1);
not ok 20
# Test 20 got: "0.816496580927726" (test.pl at line 50)
# Expected: "1"
# test.pl line 50 is: ok($stats{stddev},1);
From: Alexandr Ciornii <alexchorny@gmail.com>
On Oct 30 09:57:09 2006, CLOTHO wrote:

Made the following changes:
- Switched the tests to Test::More
- Fixed bug in 'variance'
- Fixed tests for 'variance'

To author: You may simply upload the attached distribution to PAUSE.

-------
Alexandr Ciornii, http://chorny.net
--- Lite.pm.dist Sun Mar 26 19:48:49 2006
+++ Lite.pm Tue Jan 23 14:18:12 2007
@@ -87,7 +87,7 @@
   return unless @_;
   return 0 unless @_ > 1;
   my $mean= mean @_;
-  return (sum map { ($_ - $mean)**2 } @_) / $#_;
+  return (sum map { ($_ - $mean)**2 } @_) / scalar(@_);
 }

 sub stddev
use strict;
use Test::More tests => 21;

use_ok('Statistics::Lite', ':all');

is(min(1,2,3),1,'min');
is(max(1,2,3),3,'max');
is(range(1,2,3),2,'range');
is(sum(1,2,3),6,'sum');
is(count(1,2,3),3);
is(mean(1,2,3),2);
is(median(1,2,3),2);
is(mode(1,2,3),2);
ok(abs(variance(1,2,3)-0.66666666666666)<0.0000000001,'variance');
ok(abs(stddev(1,2,3)-0.81649658092772)<0.0000000001,'stddev');

my %stats= statshash(1,2,3);
is($stats{min},1);
is($stats{max},3);
is($stats{range},2);
is($stats{sum},6);
is($stats{count},3);
is($stats{mean},2);
is($stats{median},2);
is($stats{mode},2);
ok(abs($stats{variance}-0.66666666666666)<0.0000000001,'variance');
ok(abs($stats{stddev}-0.81649658092772)<0.0000000001,'stddev');


package Statistics::Lite;
use strict;
use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);
require Exporter;

$VERSION = '2.0';
@ISA = qw(Exporter);
@EXPORT = ();
@EXPORT_OK = qw(min max range sum count mean median mode variance stddev statshash statsinfo);
%EXPORT_TAGS=
(
  all => [ @EXPORT_OK ],
  funcs => [qw<min max range sum count mean median mode variance stddev>],
  stats => [qw<statshash statsinfo>],
);

sub count
{ return scalar @_; }

sub min
{
  return unless @_;
  return $_[0] unless @_ > 1;
  my $min= shift;
  foreach(@_) { $min= $_ if $_ < $min; }
  return $min;
}

sub max
{
  return unless @_;
  return $_[0] unless @_ > 1;
  my $max= shift;
  foreach(@_) { $max= $_ if $_ > $max; }
  return $max;
}

sub range
{
  return unless @_;
  return 0 unless @_ > 1;
  return abs($_[1]-$_[0]) unless @_ > 2;
  my $min= shift;
  my $max= $min;
  foreach(@_) { $min= $_ if $_ < $min; $max= $_ if $_ > $max; }
  return $max - $min;
}

sub sum
{
  return unless @_;
  return $_[0] unless @_ > 1;
  my $sum;
  foreach(@_) { $sum+= $_; }
  return $sum;
}

sub mean
{
  return unless @_;
  return $_[0] unless @_ > 1;
  return sum(@_)/scalar(@_);
}

sub median
{
  return unless @_;
  return $_[0] unless @_ > 1;
  @_= sort{$a<=>$b}@_;
  return $_[$#_/2] if @_&1;
  my $mid= @_/2;
  return ($_[$mid-1]+$_[$mid])/2;
}

sub mode
{
  return unless @_;
  return $_[0] unless @_ > 1;
  my %count;
  foreach(@_) { $count{$_}++; }
  my $maxhits= max(values %count);
  foreach(keys %count)
  { delete $count{$_} unless $count{$_} == $maxhits; }
  return mean(keys %count);
}

sub variance
{
  return unless @_;
  return 0 unless @_ > 1;
  my $mean= mean @_;
  return (sum map { ($_ - $mean)**2 } @_) / scalar(@_);
}

sub stddev
{
  return unless @_;
  return 0 unless @_ > 1;
  return sqrt variance @_;
}

sub statshash
{
  return unless @_;
  return
  (
    count => 1,
    min => $_[0],
    max => $_[0],
    range => 0,
    sum => $_[0],
    mean => $_[0],
    median => $_[0],
    mode => $_[0],
    variance => 0,
    stddev => 0,
  ) unless @_ > 1;
  my $count= scalar(@_);
  @_= sort{$a<=>$b}@_;
  my $median;
  if(@_&1) { $median= $_[$#_/2]; }
  else { my $mid= @_/2; $median= ($_[$mid-1]+$_[$mid])/2; }
  my $sum= 0;
  my %count;
  foreach(@_) { $sum+= $_; $count{$_}++; }
  my $mean= $sum/$count;
  my $variance= mean map { ($_ - $mean)**2 } @_;
  my $maxhits= max(values %count);
  foreach(keys %count)
  { delete $count{$_} unless $count{$_} == $maxhits; }
  return
  (
    count => $count,
    min => $_[0],
    max => $_[-1],
    range => ($_[-1] - $_[0]),
    sum => $sum,
    mean => $mean,
    median => $median,
    mode => mean(keys %count),
    variance => $variance,
    stddev => sqrt($variance),
  );
}

sub statsinfo
{
  my %stats= statshash(@_);
  return <<".";
min = $stats{min}
max = $stats{max}
range = $stats{range}
sum = $stats{sum}
count = $stats{count}
mean = $stats{mean}
median = $stats{median}
mode = $stats{mode}
variance = $stats{variance}
stddev = $stats{stddev}
.
}

1;

__END__

=head1 NAME

Statistics::Lite - Small stats stuff.

=head1 SYNOPSIS

    use Statistics::Lite qw(:all);

    $min= min @data;
    $mean= mean @data;

    %data= statshash @data;
    print "sum= $data{sum} stddev= $data{stddev}\n";

    print statsinfo(@data);

=head1 DESCRIPTION

This module is a lightweight, functional alternative to larger, more complete,
object-oriented statistics packages.
As such, it is likely to be better suited, in general, to smaller data sets.

This is also a module for dilettantes.

When you just want something to give some very basic, high-school-level
statistical values, without having to set up and populate an object first,
this module may be useful.

=over 6

=head2 NOTE

This version now uses unbiased estimators (previous versions used biased
estimators) for variance and standard deviation.

To get the same biased C<stddev()> and C<variance()> available in previous
versions, simply add a zero to the data set:

    $stddev_biased= stddev 0, @data;

=back

=head1 FUNCTIONS

=over 4

=item C<min(@data)>, C<max(@data)>, C<range(@data)>, C<sum(@data)>, C<count(@data)>

Return the minimum value, maximum value, range (max - min),
sum, or count of values in C<@data>.
(Count simply returns C<scalar(@data)>.)

=item C<mean(@data)>, C<median(@data)>, C<mode(@data)>

Calculates the mean, median, or mode average of the values in C<@data>.
(In the event of ties in the mode average, their mean is returned.)

=item C<variance(@data)>, C<stddev(@data)>

Return the standard deviation or variance of C<@data>.

=item C<statshash(@data)>

Returns a hash whose keys are the names of all the functions listed above,
with the corresponding values, calculated for the data set.

=item C<statsinfo(@data)>

Returns a string describing the data set, using the values detailed above.

=back

=head2 Import Tags

The C<:all> import tag imports all functions from this module
into the current namespace (use with caution).
To import the individual statistical functions, use the import tag C<:funcs>;
use C<:stats> to import C<statshash(@data)> and C<statsinfo(@data)>.

=head1 AUTHOR

Brian Lalonde E<lt>brian@webcoder.infoE<gt>

=head1 SEE ALSO

perl(1).

=cut
> --- Lite.pm.dist Sun Mar 26 19:48:49 2006
> +++ Lite.pm Tue Jan 23 14:18:12 2007
> @@ -87,7 +87,7 @@
>    return unless @_;
>    return 0 unless @_ > 1;
>    my $mean= mean @_;
> -  return (sum map { ($_ - $mean)**2 } @_) / $#_;
> +  return (sum map { ($_ - $mean)**2 } @_) / scalar(@_);
>  }
This actually depends on whether you treat the data as a sample drawn from a larger population or as the entire population. That determines whether the denominator is N-1 or N. In the real world, you almost always want N-1. N is only appropriate when your data set is the entire population rather than a sample of it.

So, I recommend that this particular part of the patch be rejected. Alternatively, one could include both implementations of variance (N and N-1), but that kind of defeats the ::Lite part of the module.

More info: http://en.wikipedia.org/wiki/Standard_deviation#Estimating_population_standard_deviation_from_sample_standard_deviation

Chris
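For reference, the two denominators can be compared on the data set the tests use, (1, 2, 3). This is only a minimal sketch: the helper names `mean_of`, `sum_sq_dev`, `variance_n` and `variance_n1` are made up for illustration and are not part of Statistics::Lite.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; not part of Statistics::Lite.
sub mean_of {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum / scalar(@_);
}

# Sum of squared deviations from the mean.
sub sum_sq_dev {
    my $mean = mean_of(@_);
    my $ss   = 0;
    $ss += ($_ - $mean) ** 2 for @_;
    return $ss;
}

# Population variance: divide by N.
sub variance_n  { return sum_sq_dev(@_) / scalar(@_); }

# Sample variance: divide by N-1 (needs at least two values).
sub variance_n1 { return sum_sq_dev(@_) / (scalar(@_) - 1); }

my @data = (1, 2, 3);
printf "N:   %.6f\n", variance_n(@data);    # prints "N:   0.666667"
printf "N-1: %.6f\n", variance_n1(@data);   # prints "N-1: 1.000000"
```

The first value matches what test 19 got and the second matches what it expected, so the failing tests come down to which denominator the code path uses.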
From: sabol@alderaan.gsfc.nasa.gov
On Tue Jan 23 09:55:54 2007, CDOLAN wrote:
> This actually depends on whether you treat the data as a sample drawn
> from a larger population or as the entire population. That determines
> whether the denominator is N-1 or N. In the real world, you almost
> always want N-1. N is only appropriate when your data set is the
> entire population rather than a sample of it.
>
> So, I recommend that this particular part of the patch be rejected.
> Alternatively, one could include both implementations of variance (N and
> N-1), but that kind of defeats the ::Lite part of the module.
I agree with Chris that this patch should be rejected. The docs make it clear that the variance and standard deviation computed by Statistics::Lite are the unbiased (N-1) kind, and a trivial workaround for computing the biased kind is provided in the documentation. The tests are correct; it is the statshash() code that is wrong. I recommend the following patch:

--- Lite.pm.orig Sun Mar 26 11:48:49 2006
+++ Lite.pm Thu Feb 1 03:00:08 2007
@@ -122,7 +122,7 @@
   my %count;
   foreach(@_) { $sum+= $_; $count{$_}++; }
   my $mean= $sum/$count;
-  my $variance= mean map { ($_ - $mean)**2 } @_;
+  my $variance= (sum map { ($_ - $mean)**2 } @_) / $#_;
   my $maxhits= max(values %count);
   foreach(keys %count)
   { delete $count{$_} unless $count{$_} == $maxhits; }

Also, I agree that it would be nice if the test suite were updated to use Test::More.
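For anyone reading the patches above, both of them hinge on the difference between `scalar(@_)` and `$#_`. A small sketch of that difference follows; the sub name `denominators` is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns both candidate denominators for its arguments.
sub denominators {
    my $n        = scalar(@_);  # element count, N
    my $last_idx = $#_;         # index of the last element, N-1
    return ($n, $last_idx);
}

my ($n, $n_minus_1) = denominators(1, 2, 3);
print 'scalar(@_) = ', $n, ', $#_ = ', $n_minus_1, "\n";
# prints "scalar(@_) = 3, $#_ = 2"
```

Since mean() in the module divides by `scalar(@_)`, the `mean map { ... }` line in statshash() amounts to the N denominator, which would explain the 0.666666666666667 reported by test 19.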
Going to mark this as fixed. If this is still happening, reopen or refile, please.

