This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information

The Basics
Id: 40507
Status: resolved
Priority: Low/Low
Queue: Text-CSV_XS

People
Owner: Nobody in particular
Requestors: lespea [...] gmail.com
Cc:
AdminCc:

BugTracker
Severity: Normal
Broken in: (no value)
Fixed in: 0.58



Subject: Parsing fails on escaped null byte
In one of the CSVs I have to regularly parse there are multiple lines that show up containing the following:

"Audit active: ""TRUE "[nul]","Desired:","Audit active: ""TRUE "[nul]"

I am enabling binary and when the line fails to parse I get the following error message (through error_diag):

2023EIQ - QUO character not allowed264

After reading your documentation I believe that the line should be parsed correctly (with the null byte being escaped by a ") but it isn't. Unless by "0 you meant the character 0 and not a null byte...

If I remove the "[nul] then the line parses okay. It also works if I remove the preceding " from the null byte.

I'm running perl 5.10 on Windows XP SP2. The 0.57 module was obtained through ppm.
Subject: Re: [rt.cpan.org #40507] Parsing fails on escaped null byte
Date: Thu, 30 Oct 2008 08:51:46 +0100
To: bug-Text-CSV_XS@rt.cpan.org
From: "H.Merijn Brand" <h.m.brand@xs4all.nl>
> "Audit active: ""TRUE "[nul]","Desired:","Audit active: ""TRUE "[nul]"
\0 does not have to be escaped. binary => 1 is sufficient.
> I am enabling binary and when the line fails to parse I get the
> following error message (through error_diag):
>
> 2023EIQ - QUO character not allowed264
>
> After reading your documentation I believe that the line should be
> parsed correctly (with the null byte being escaped by a ") but it isn't.
> Unless by "0 you meant the character 0 and not a null byte...

 "A field within CSV must be surrounded by double-quotes to contain
  an embedded double-quote, represented by a pair of consecutive
  double-quotes. In binary mode you may additionally use the sequence
  C<"0> for representation of a NULL byte."

Correct. "0 is the character 0, not a NULL byte.

> If I remove the "[nul] then the line parses okay. It also works if I
> remove the preceding " from the null byte.

As I can understand the ambiguity in the documentation, and allowing
*both* in the escape sequence does not break backward compatibility,
I have now changed the code to allow these. It will be in the next
release.

> I'm running perl 5.10 on Windows XP SP2. The 0.57 module was obtained
> through ppm.

--
H.Merijn Brand          Amsterdam Perl Mongers  http://amsterdam.pm.org/
using & porting perl 5.6.2, 5.8.x, 5.10.x, 5.11.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, SuSE 10.1, 10.2, and 10.3, AIX 5.2, and Cygwin.
http://mirrors.develooper.com/hpux/           http://www.test-smoke.org/
http://qa.perl.org      http://www.goldmark.org/jeff/stupid-disclaimers/

Subject: Re: [rt.cpan.org #40507] Parsing fails on escaped null byte
Date: Thu, 30 Oct 2008 09:06:39 +0100
To: bug-Text-CSV_XS@rt.cpan.org
From: "H.Merijn Brand" <h.m.brand@xs4all.nl>
On Thu, 30 Oct 2008 03:52:12 -0400, "h.m.brand@xs4all.nl via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:
> Correct. "0 is the character 0, not a NULL byte.
>
> > If I remove the "[nul] then the line parses okay. It also works if I
> > remove the preceding " from the null byte.
>
> As I can understand the ambiguity in the documentation, and allowing
> *both* in the escape sequence does not break backward compatibility,
> I have now changed the code to allow these. It will be in the next
> release.

Re-reading the docs, I think it is very very clear that the specs only
allow "0, and not "\0, so I will not change it.

You can choose to fix the generated data by one of the options, in
order of preference:

 * do not escape the NULL byte
 * escape the NULL byte as "0
 * add allow_loose_quotes => 1

The docs are clear in two places

1. Under SPECIFICATION

 "A field within CSV must be surrounded by double-quotes to contain
  an embedded double-quote, represented by a pair of consecutive
  double-quotes. In binary mode you may additionally use the sequence
  C<"0> for representation of a NULL byte."

2. Under FUNCTIONS, new (), binary

 "If this attribute is TRUE, you may use binary characters in quoted
  fields, including line feeds, carriage returns and NULL bytes. (The
  latter must be escaped as C<"0>.) By default this feature is off."

If you have a clear addition to that I will consider, but IMHO the
documentation is not ambiguous here (and my native language isn't
English)

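The first two options in the list above can be tried in memory before touching the generated files. What follows is only a minimal sketch: the helper names unescape_nul and escape_nul_as_zero are made up for illustration and are not part of Text::CSV_XS. The substitution is deliberately naive, so it would also rewrite an escaped quote that happens to be followed by a real NUL byte (""[nul]).

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of Text::CSV_XS): two ways to repair a
# quote followed by a NUL byte before handing the line to the parser.

# Option 1: drop the quote so the NUL byte is left unescaped.
sub unescape_nul {
    my ($line) = @_;
    (my $fixed = $line) =~ s/"\0/\0/g;
    return $fixed;
}

# Option 2: rewrite the pair as "0, the documented escape for a NUL byte.
sub escape_nul_as_zero {
    my ($line) = @_;
    (my $fixed = $line) =~ s/"\0/"0/g;
    return $fixed;
}

# The line from the report, with [nul] written as a real NUL byte.
my $line = qq{"Audit active: ""TRUE "\0","Desired:","Audit active: ""TRUE "\0"};

for my $fixed (unescape_nul ($line), escape_nul_as_zero ($line)) {
    (my $visible = $fixed) =~ s/\0/[nul]/g;   # make NUL bytes printable
    print "$visible\n";
}
```

Either repaired form should then be acceptable to a parser created with binary => 1, going by the documentation quoted above.
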
Subject: Re: [rt.cpan.org #40507] Parsing fails on escaped null byte
Date: Thu, 30 Oct 2008 10:12:40 -0500
To: bug-Text-CSV_XS@rt.cpan.org
From: "Adam Lesperance" <lespea@gmail.com>
Okay.  The output is generated using a third party tool so I will just have to continue to pre-process it before parsing the file using your module.  I look forward to being able to pass an array in instead of only a file handle...

Thank you for your quick response!
~Adam~


Subject: Re: [rt.cpan.org #40507] Parsing fails on escaped null byte
Date: Thu, 30 Oct 2008 16:53:26 +0100
To: bug-Text-CSV_XS@rt.cpan.org
From: "H.Merijn Brand" <h.m.brand@xs4all.nl>
On Thu, 30 Oct 2008 11:13:41 -0400, "Adam via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:
> Okay. The output is generated using a third party tool so I will just have
> to continue to pre-process it before parsing the file using your module. I

$ perl -pi -e's/"\0/"0/g' file.csv

Did you try to use "allow_loose_escapes" ?

     allow_loose_escapes
         By default, parsing fields that have "escape_char" characters that
         escape characters that do not need to be escaped, like:

          my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
          $csv->parse (qq{1,"my bar\'s",baz,42});

         would result in a parse error. Though it is still bad practice to
         allow this format, this option enables you to treat all escape
         character sequences equal.

my $csv = Text::CSV_XS->new ({ binary => 1, allow_loose_escapes => 1 });

> look forward to being able to pass an array in instead of only a file
> handle...

Can you expand on that? Why? How?

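To compare the default behaviour with allow_loose_escapes on the line from the report, something along these lines could be used. It is a sketch only: try_parse is a hypothetical helper, and the outcome depends on the installed Text::CSV_XS release (the ticket lists the fix under 0.58, so older releases are expected to fail in both cases).

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: parse one line with the given constructor options
# and report the outcome instead of dying.
sub try_parse {
    my ($line, %opts) = @_;
    my $csv = Text::CSV_XS->new ({ binary => 1, %opts })
        or die "Cannot create parser: " . Text::CSV_XS->error_diag ();
    if ($csv->parse ($line)) {
        return { ok => 1, fields => [ $csv->fields ] };
    }
    # In list context error_diag returns (code, message, position, ...).
    my ($code, $msg, $pos) = $csv->error_diag ();
    return { ok => 0, code => $code, message => $msg, position => $pos };
}

# The line from the report, with [nul] written as a real NUL byte.
my $line = qq{"Audit active: ""TRUE "\0","Desired:","Audit active: ""TRUE "\0"};

for my $opts ({}, { allow_loose_escapes => 1 }) {
    my $res   = try_parse ($line, %$opts);
    my $label = %$opts ? "allow_loose_escapes" : "strict";
    if ($res->{ok}) {
        printf "%s: parsed %d fields\n", $label, scalar @{ $res->{fields} };
    }
    else {
        printf "%s: error %d (%s) at position %d\n",
            $label, @{$res}{qw( code message position )};
    }
}
```

Taking the error apart in list context avoids the run-together output that results from printing error_diag directly.
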
Subject: Re: [rt.cpan.org #40507] Parsing fails on escaped null byte
Date: Thu, 30 Oct 2008 12:21:39 -0500
To: bug-Text-CSV_XS@rt.cpan.org
From: "Adam Lesperance" <lespea@gmail.com>
> $ perl -pi -e's/"\0/"0/g' file.csv

Yes that's [almost] the exact command I use in fact :)

> Did you try to use "allow_loose_escapes" ?

I did try that and I still get the same error :(

C:\>perl -e "use Text::CSV_XS;$csv=Text::CSV_XS->new({binary=>1,allow_loose_escapes=>1});open $fh, '<t.txt';while($line = $csv->getline($fh)){printf(qq{%s\n},$line->[1])};print $csv->error_diag;close $fh"
2023EIQ - QUO character not allowed23

C:\>type t.txt
"Audit active: ""TRUE "[nul]","Desired:","Audit active: ""TRUE "[nul]"


> Can you expand on that? Why? How?

I clearly remember reading in the "todo" section that it was going to eventually be possible to pass an array/string of the lines of a file (which you could get via slurp or w/e) instead of a filehandle.  Going back to find it however I see it isn't there... guess I was daydreaming :(

My bad.

~Adam~


Subject: Re: [rt.cpan.org #40507] Parsing fails on escaped null byte
Date: Thu, 30 Oct 2008 21:44:01 +0100
To: bug-Text-CSV_XS@rt.cpan.org
From: "H.Merijn Brand" <h.m.brand@xs4all.nl>
On Thu, 30 Oct 2008 13:22:33 -0400, "Adam via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:
> Queue: Text-CSV_XS
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=40507 >
>
> > $ perl -pi -e's/"\0/"0/g' file.csv
>
> Yes that's [almost] the exact command I use in fact :)
>
> > Did you try to use "allow_loose_escapes" ?
>
> I did try that and I still get the same error :(

I found the bug for allow_loose_escapes in the source just before error
2023, where it tested for allow_loose_quotes instead :(
Your case is now added to t/70_rt.t. I'll release it tomorrow.

Thanks for your persistence!

> > Can you expand on that? Why? How?
>
> I clearly remember reading in the "todo" section that it was going to
> eventually be possible to pass an array/string of the lines of a file (which
> you could get via slurp or w/e) instead of a filehandle. Going back to find
> it however I see it isn't there... guess I was daydreaming :(

With all the other optimizations I made to getline (), and the problems
you could end up with when slurping, I dropped that. What is the use of
having the module support parsing an array, when it just requires a
single loop?

I might reconsider though.

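The "single loop" over lines already held in memory might look roughly like the sketch below. parse_lines is a hypothetical helper, not something the module provides, and it assumes that no field contains an embedded line break, since such a record spans several array elements and cannot be parsed one element at a time.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: parse lines already held in memory, one parse ()
# call per element. Assumes no field contains an embedded line break.
sub parse_lines {
    my ($csv, $lines) = @_;
    my @rows;
    for my $i (0 .. $#$lines) {
        (my $line = $lines->[$i]) =~ s/\r?\n\z//;   # strip the line ending
        $csv->parse ($line)
            or die "Line " . ($i + 1) . ": " . $csv->error_diag () . "\n";
        push @rows, [ $csv->fields ];
    }
    return \@rows;
}

my $csv  = Text::CSV_XS->new ({ binary => 1 }) or die "Cannot create parser";
my $rows = parse_lines ($csv, [ qq{a,b\n}, qq{"c ""d""",e\n} ]);
print join ("|", @$_), "\n" for @$rows;
```
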
Subject: Re: [rt.cpan.org #40507] Parsing fails on escaped null byte
Date: Thu, 30 Oct 2008 15:49:03 -0500
To: bug-Text-CSV_XS@rt.cpan.org
From: "Adam Lesperance" <lespea@gmail.com>
> I found the bug for allow_loose_escapes in the source just before error
> 2023, where it tested for allow_loose_quotes instead :(
> Your case now added to t/70_rt.t. I'll release it tomorrow
>
> Thanks for your persistence!

Cool glad to hear that -- I'm looking forward to dropping the "pre-parse" step.  And no problem, I'm glad to help.  In fact: thank you for being so persistent!


> With all the other optimizations I made to getline (), and the problems
> you could enter up with slurp, I dropped that. What is the use of
> having the module support parsing an array, when it just requires a
> single loop.
>
> I might reconsider though.

Oh that was just going to be my solution to the problem... I was going to slurp the file in and s/// the problems away.  Now that loose-escapes works I won't need to so you're right... not too much of a use there.

Again, thank you for your hard work... this makes my job about 1_000 times easier!
~Adam~

