This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information

The Basics
Id: 123320
Status: resolved
Priority: Low/Low
Queue: Text-CSV_XS

People
Owner: Nobody in particular
Requestors: CLemmen [...] excelsiorintegrated.com
Cc:
AdminCc:



Subject: Text::CSV_XS bug w/Mac format files
Date: Wed, 18 Oct 2017 13:56:31 +0000
To: "bug-Text-CSV_XS@rt.cpan.org" <bug-Text-CSV_XS@rt.cpan.org>
From: Charles Stuart Lemmen <CLemmen@excelsiorintegrated.com>

Hi,

 

I believe I’ve found a bug in the Text::CSV_XS package with regard to Mac-type files (a single carriage return as eol) when handling successive files with one csv object ref. Below are the relevant details.

Dist name/ver: Text::CSV_XS-1.32
Perl version: This is perl 5, version 24, subversion 0 (v5.24.0) built for MSWin32-x64-multi-thread
O/S: Windows 10 Home (Major version: 10  Minor Version: 0.15063)

Bug details:

If you create one Text::CSV_XS handle and use it with two different files that use only carriage returns as end-of-line markers, and the first file has an invalid header, attempting to get the second (good) file’s header will also fail. Skip getting the first file’s header and the second one succeeds. If you attempt to get the first (bad) file’s header but then clear the ‘_AHEAD’ instance var before getting the second (good) file’s header, the second succeeds, whereas before it did not.

 

Certain properties of the first file (the one with the bad header) help cause the second header call to fail:

  1. It has only one non-header data record, and that record does not end with the end-of-line carriage return.
  2. It has multiple non-header records that all have proper end-of-line carriage returns, but the first non-header data record (record #2) has an empty column (,,). It does not matter whether this empty column is double-quoted.
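
The two cases in the list above can be sketched as literal file contents. This is only an illustration: `mac_csv` is a hypothetical helper, the field values are placeholders, and fields are left unquoted for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper: join rows with a bare carriage return ("Mac" EOL).
# $final_eol controls whether the last record is terminated as well.
sub mac_csv {
    my ($rows, $final_eol) = @_;
    my $text = join "\r", map { join ",", @$_ } @$rows;
    $text .= "\r" if $final_eol;
    return $text;
}

# Both cases share the invalid header: the trailing comma yields an empty
# fourth column name.
my @bad_header = ("col1", "col2", "col3", "");

# Case 1: a single data record with no carriage return after it.
my $case1 = mac_csv([\@bad_header, [qw(One Two Three)]], 0);

# Case 2: every record is terminated, but record #2 has an empty column.
my $case2 = mac_csv([\@bad_header, ["One", "", "Three"], [qw(Four Five Six)]], 1);

# Print both with the carriage returns made visible.
for my $text ($case1, $case2) {
    (my $shown = $text) =~ s/\r/\\r/g;
    print "$shown\n";
}
```

Writing either string to a file gives a first ("bad") input of the kind described in the corresponding list item.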

 

So it sounds like “leftover” ‘_AHEAD’ data is somehow negatively influencing the handling of other files. Here’s a short code example:

 

    use strict;
    use warnings;
    use Carp;
    use Data::Dumper;
    use Text::CSV_XS;

    # First make two csv files, one with an empty (dangling) header column, one that's ok.
    # These are both "Mac" format meaning only carriage returns for EOL.
    my $bad_csv_file = 'test_bad_csv.csv';
    my $good_csv_file = 'test_good_csv.csv';
    my $bfh;
    my $gfh;
    if(open($bfh, '>', $bad_csv_file)) {
        print($bfh "col1,col2,col3,\r\"One\",\"\",\"Three\"\r\"Four\",\"Five and a half\",\"Six\"\r");
        close $bfh;
    }
    if(open($gfh, '>', $good_csv_file)) {
        print($gfh "col1,col2,col3\r\"One\",\"Two\",\"Three\"\r");
        close $gfh;
    }
    -e $bad_csv_file or croak "No bad file!\n";
    -e $good_csv_file or croak "No good file!\n";

    # Init csv ref to handle files.
    my $csv = Text::CSV_XS->new({binary => 1, auto_diag => 1, eol => "\r"});

    # Open and use the new files.
    open($bfh, '<', $bad_csv_file) or croak "$!\n";
    open($gfh, '<', $good_csv_file) or croak "$!\n";

    # Get the header of the bad file (this will fail).
    my @bad_header;
    eval {
        local $@;
        @bad_header = $csv->header($bfh);
        print "Got bad header ok:\n\n" . Dumper(\@bad_header) . "\n\n";
        1;
    }
    or do {
        print "Failed to get header from bad csv file!\n";
    };

    # Get the header of the good file (this will fail too but should not).
    my @good_header;
    eval {
        local $@;
        @good_header = $csv->header($gfh);
        print "Got good header ok:\n\n" . Dumper(\@good_header) . "\n\n";
        1;
    }
    or do {
        print "Failed to get header from good csv file!\n";
    };

    close $bfh;
    close $gfh;

 

My current workaround, before calling header on the next file, is to check whether ‘_AHEAD’ exists and is not empty and, if so, clear it (this should be safe if the name ever changes, since we check first). If ‘_AHEAD’ is not present, I attempt the header call inside an eval and, if it fails, create a new Text::CSV_XS instance (with the same options as the original) and attempt the header call a second time. If the second call fails, then we can be sure the second file is broken too.
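
A minimal sketch of that workaround, under stated assumptions: `header_with_retry` and its `$make_csv` factory argument are hypothetical names, not part of Text::CSV_XS; `_AHEAD` is an undocumented internal key taken from the report; and the rewind before the retry is an added assumption about handle state. It also collapses the two branches described above into one path (clear, try, then retry with a fresh object).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::CSV_XS).
#   $csv      - an existing parser object, possibly holding stale state
#   $fh       - a seekable handle opened for reading on the next file
#   $make_csv - code ref returning a fresh object built with the same options
# Returns the object that parsed the header, followed by the column names.
# Dies if the fresh object cannot parse the header either.
sub header_with_retry {
    my ($csv, $fh, $make_csv) = @_;

    # Only touch the internal key if it is present and non-empty.
    if (defined $csv->{_AHEAD} && length $csv->{_AHEAD}) {
        $csv->{_AHEAD} = '';
    }

    my @cols;
    return ($csv, @cols) if eval { @cols = $csv->header($fh); 1 };

    # First attempt failed: rewind (assumption: the failed call consumed
    # input) and retry once with a fresh object.
    seek $fh, 0, 0 or die "Cannot rewind: $!\n";
    my $fresh = $make_csv->();
    eval { @cols = $fresh->header($fh); 1 }
        or die "Header failed with a fresh object too: $@";
    return ($fresh, @cols);
}
```

A caller would pass something like `sub { Text::CSV_XS->new({ binary => 1, auto_diag => 1, eol => "\r" }) }` as the factory and keep using the returned object for the remaining rows of that file.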

 

Thanks!

 

Stuart Lemmen

IT Development & Support

Excelsior Integrated LLC

413-394-4340

clemmen@excelsiorintegrated.com

www.excelsiorintegrated.com


Subject: Re: [rt.cpan.org #123320] Text::CSV_XS bug w/Mac format files
Date: Wed, 18 Oct 2017 18:09:05 +0200
To: bug-Text-CSV_XS@rt.cpan.org
From: "H.Merijn Brand" <h.m.brand@xs4all.nl>
On Wed, 18 Oct 2017 11:30:54 -0400, "Charles Stuart Lemmen via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:
> I believe I've found a bug in the Text::CSV_XS package with regards
> to Mac type files (eol as single carriage return) and handling
> successive files with one csv object ref. Below are the relevant
> details.

Thanks for the report.

I think this is very much related to ticket #122764
https://rt.cpan.org/Public/Bug/Display.html?id=122764
which has been resolved in 1.32 but got a fix for an additional problem
that surfaced when using a BOM in combination with the \r EOL.

The fix is applied in the upcoming 1.33. If you too think it is related,
could you try

    $ wget --output-document=Text-CSV_XS-git.tgz \
        https://github.com/Tux/Text-CSV_XS/archive/master.tar.gz

and see if that fixes this issue too?

If not, I will start digging. That will not be quick, as I do not have
a Windows development box right here. (It might be related to Windows)

--
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.27   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/


Subject: RE: [rt.cpan.org #123320] Text::CSV_XS bug w/Mac format files
Date: Wed, 18 Oct 2017 17:52:03 +0000
To: "bug-Text-CSV_XS@rt.cpan.org" <bug-Text-CSV_XS@rt.cpan.org>
From: Charles Stuart Lemmen <CLemmen@excelsiorintegrated.com>
Just a quick update that may be relevant and/or shed some light on this issue. I'm experiencing some odd behavior with my test and your ver. 1.33 of this package (compiled and installed locally) on Linux. Keep in mind the perl binary being used is old:

    This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi

So that may play into this too. If I use the following code (which includes my kludgy fix):

    ### START ###
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Carp;
    BEGIN: {
        use lib '/home/clemmen/bin/pm/lib/perl5/x86_64-linux-thread-multi';
    }
    use Text::CSV_XS;

    # First make two csv files, one with an empty (dangling) header column, one that's ok.
    # These are both "Mac" format meaning only carriage returns for EOL.
    my $bad_csv_file = 'test_bad_csv.csv';
    my $good_csv_file = 'test_good_csv.csv';
    my $bfh;
    my $gfh;
    if(open($bfh, '>', $bad_csv_file)) {
        print($bfh "col1,col2,col3,\r\"One\",\"\",\"Three\"\r\"Four\",\"Five and a half\",\"Six\"\r");
        close $bfh;
    }
    if(open($gfh, '>', $good_csv_file)) {
        print($gfh "col1,col2,col3\r\"One\",\"Two\",\"Three\"\r");
        close $gfh;
    }
    -e $bad_csv_file or croak "No bad file!\n";
    -e $good_csv_file or croak "No good file!\n";

    # Init csv ref to handle files.
    my $csv = Text::CSV_XS->new({binary => 1, auto_diag => 1, eol => "\r"});

    # Open and use the new files.
    open($bfh, '<', $bad_csv_file) or croak "$!\n";
    open($gfh, '<', $good_csv_file) or croak "$!\n";

    # Get the header of the bad file (this will fail).
    my @bad_header;
    eval {
        local $@;
        @bad_header = $csv->header($bfh);
        print "Got bad header ok:\n\n" . Dumper(\@bad_header) . "\n\n";
        1;
    }
    or do {
        print "Failed to get header from bad csv file!\n";
    };

    # Get the header of the good file (this will fail too).
    $csv->{_AHEAD} = '';
    my @good_header;
    eval {
        local $@;
        @good_header = $csv->header($gfh);
        print "Got good header ok:\n\n" . Dumper(\@good_header) . "\n\n";
        1;
    }
    or do {
        print "Failed to get header from good csv file!\n";
    };

    close $bfh;
    close $gfh;
    exit 0;
    ### END ###

this way from the command line I get two errors:

    $ perl test_csv_bug.pl
    # CSV_XS ERROR: 1012 - INI - the header contains an empty field @ rec 1 pos 0
    Failed to get header from bad csv file!
    Failed to get header from good csv file!

However, if I simply include a debugging package I often use, the second error goes away!:

    $ perl -MData::Dumper test_csv_bug_clean.pl
    # CSV_XS ERROR: 1012 - INI - the header contains an empty field @ rec 1 pos 0
    Failed to get header from bad csv file!
    Got good header ok:

    $VAR1 = [
              'col1',
              'col2',
              'col3'
            ];

I'm not at this point sure why including that package fixes the issue.

-Stu
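
One possible explanation for the difference between the two runs above, offered as an assumption and not confirmed anywhere in this thread: the script calls `Dumper()` inside each `eval` without loading Data::Dumper, so the call itself may be what dies, after the header was already read successfully. The sketch below shows only that general Perl behavior; `try_block` is a hypothetical helper and no CSV parsing is involved.

```perl
use strict;
use warnings;

# Hypothetical helper: run a code ref inside eval, as the test script does,
# and return a success flag plus the captured error text.
sub try_block {
    my ($code) = @_;
    my $ok  = eval { $code->(); 1 };
    my $err = $ok ? '' : "$@";
    return ($ok ? 1 : 0, $err);
}

# Stand-in for the body of the eval in the script: by this point the header
# would already have been fetched; only the Dumper() call remains.
my $dump_header = sub {
    my @header = qw(col1 col2 col3);
    return "Got good header ok:\n\n" . Dumper(\@header);
};

# Without Data::Dumper loaded, the call dies at run time inside the eval.
my ($ok_before, $err_before) = try_block($dump_header);

# Loading the module at run time mirrors adding -MData::Dumper.
require Data::Dumper;
Data::Dumper->import('Dumper');
my ($ok_after, $err_after) = try_block($dump_header);

print "before: ok=$ok_before error=$err_before";
print "after:  ok=$ok_after\n";
```

If this is what happened, the message "Failed to get header from good csv file!" in the first run would not reflect a parser failure, which would be consistent with no second CSV_XS ERROR line being printed there.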
Subject: RE: [rt.cpan.org #123320] Text::CSV_XS bug w/Mac format files
Date: Wed, 18 Oct 2017 17:18:28 +0000
To: "bug-Text-CSV_XS@rt.cpan.org" <bug-Text-CSV_XS@rt.cpan.org>
From: Charles Stuart Lemmen <CLemmen@excelsiorintegrated.com>
H,

Just installed the .pm in the .tar.gz you linked in my Windows environment and the same problem persists. I tried to test that 1.33 package on Linux but I am experiencing some odd behavior that seems unrelated to Text::CSV_XS. Nonetheless, there does appear to be a difference with how this potential bug works on Windows vs. Linux vs. ?

Sorry I don't have more info for you at this time.

-Stu
Subject: Re: [rt.cpan.org #123320] Text::CSV_XS bug w/Mac format files
Date: Thu, 19 Oct 2017 12:58:09 +0200
To: bug-Text-CSV_XS@rt.cpan.org
From: "H.Merijn Brand" <h.m.brand@xs4all.nl>
On Wed, 18 Oct 2017 11:30:54 -0400, "Charles Stuart Lemmen via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:
> If you create one Text::CSV_XS handle and use it with two different
> files (one with a bad header and one with a good header) that use
> carriage returns only as end-of-line markers and the first has an
> invalid header, attempting to get the second (good) file's header
> will also fail. Skip getting the first file's header and the second
> one succeeds. If you attempt to get the first (bad) file's header but
> then clear the '_AHEAD' instance var before getting the second (good)
> file's header the second will succeed whereas before it did not.

So, I rewrote your test to what you see below and immediately see now
what is the cause of the failure ...

Before the fix:

    $ perl -Mblib sandbox/rt123320.pl
    # CSV_XS ERROR: 1012 - INI - the header contains an empty field @ rec 1 pos 0
    Failed to get header from rt123320_bad.csv!
    # CSV_XS ERROR: 1012 - INI - the header contains an empty field @ rec 2 pos 0
    Failed to get header from rt123320_good.csv!

After the fix:

    $ perl -Mblib sandbox/rt123320.pl
    # CSV_XS ERROR: 1012 - INI - the header contains an empty field @ rec 1 pos 0
    Failed to get header from rt123320_bad.csv!
    Use of uninitialized value in subroutine entry at /data/pro/3gl/CPAN/Text-CSV_XS/blib/lib/Text/CSV_XS.pm line 885, <$gfh> chunk 1.
    { header_from_good => [ 'col1', 'col2', 'col3' ] }

Fetch again to test this.

Though I think it is a bad idea to simply re-use $csv after a FAIL in
use for one file for another file, it should not fail like this.

Thanks again for spotting

--8<--- rt123320.pl
#!/pro/bin/perl

use 5.18.2;
use warnings;

use Data::Peek;
use Text::CSV_XS;

# First make two csv files, one with an empty (dangling) header column, one that's ok.
# These are both "Mac" format meaning only carriage returns for EOL.
my $fn_bad  = "rt123320_bad.csv";
my $fn_good = "rt123320_good.csv";
if (open my $bfh, ">", $fn_bad) {
    print $bfh join "\r" =>
        q{col1,col2,col3,},
        q{"One","","Three"},
        q{"Four","Five and a half","Six"},
        q{};
    close $bfh;
}
if (open my $gfh, ">", $fn_good) {
    print $gfh join "\r" =>
        q{col1,col2,col3},
        q{"One","Two","Three"},
        "";
    close $gfh;
}
-e $fn_bad  or die "$fn_bad missing!\n";
-e $fn_good or die "$fn_good missing!\n";

my $csv = Text::CSV_XS->new ({
    binary    => 1,
    auto_diag => 1,
    eol       => "\r",
});

open my $bfh, "<", $fn_bad  or die "$!\n";
open my $gfh, "<", $fn_good or die "$!\n";

# Get the header of the bad file (this will fail).
my @bad_header;
eval {
    local $@;
    @bad_header = $csv->header ($bfh);
    DDumper { header_from_bad => \@bad_header };
    1;
} or warn "Failed to get header from $fn_bad!\n";

# Get the header of the good file (this will fail too but should not).
my @good_header;
eval {
    local $@;
    @good_header = $csv->header ($gfh);
    DDumper { header_from_good => \@good_header };
    1;
} or print "Failed to get header from $fn_good!\n";

close $bfh;
close $gfh;
-->8---

--
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.27   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/



