This queue is for tickets about the Mail-Box-Parser-C CPAN distribution.

Report information
The Basics
Id:
83749
Status:
resolved
Priority:
Low/Low

People
Owner:
Nobody in particular
Requestors:
jik [...] kamens.brookline.ma.us
Cc:
AdminCc:

BugTracker
Severity:
Important
Broken in:
3.006
Fixed in:
3.007

Attachments


Subject: Long header lines parsed improperly
Header lines longer than 1023 characters cause Mail::Box::Parser::C to parse the header improperly and corrupt the message. Yes, I realize that nothing is supposed to generate header lines that long, and yet, there are things that do, and "Be generous in what you accept" dictates that this could should do its best to parse them successfully. The attached patch implements a dynamic buffer for reading message lines, which is reallocated as needed to make enough space for the longest line in the mailbox, and freed when the mailbox is freed. I considered putting an upper limit on the line length to prevent memory exhaustion DoS attacks against the application running the code, but I decided not to because there is no length check on folded header lines in the existing code, which means the DoS potential is already there.
Subject: Mail-Box-Parser-C-3.006-long-lines.patch
--- C.xs.orig 2004-09-22 03:31:51.000000000 -0400 +++ C.xs 2013-03-04 08:41:08.083279924 -0500 @@ -12,8 +12,8 @@ #define TRACE_NOTICES 2 #define TRACE_DEBUG 1 -#ifndef MAX_LINE -#define MAX_LINE 1024 +#ifndef DFLT_LINE +#define DFLT_LINE 1024 #endif #ifndef NULL @@ -53,8 +53,9 @@ int strip_gt; int keep_line; /* unget line */ - char line[MAX_LINE+2]; /* one more for missing newline on * - * last line of file on Windows */ + char * line; + int line_buf_size; + long line_start; } Mailbox; @@ -78,6 +79,9 @@ New(0, box->filename, strlen(filename)+1, char); strcpy(box->filename, filename); + New(0, box->line, DFLT_LINE, char); + box->line_buf_size = DFLT_LINE; + return box; } @@ -144,8 +148,24 @@ } box->line_start = (long)ftell(box->file); - if(!fgets(box->line, MAX_LINE, box->file)) - return NULL; + + int already_read = 0; + while (fgets(box->line + already_read, box->line_buf_size - already_read, + box->file)) + { + int len = strlen(box->line); + already_read += len; + if (len == box->line_buf_size - 1 && box->line[len-1] != '\n') { + int new_size = box->line_buf_size * 2; + Renew(box->line, new_size, char); + box->line_buf_size = new_size; + continue; + } + break; + } + + if (! already_read) + return NULL; if(box->dosmode) { int len = strlen(box->line); @@ -640,6 +660,7 @@ } Safefree(box->filename); + Safefree(box->line); Safefree(box);
Subject: Re: [rt.cpan.org #83749] Long header lines parsed improperly
Date: Mon, 4 Mar 2013 15:18:29 +0100
To: Jonathan Kamens via RT <bug-Mail-Box-Parser-C@rt.cpan.org>
From: Mark Overmeer <solutions@overmeer.net>
* Jonathan Kamens via RT (bug-Mail-Box-Parser-C@rt.cpan.org) [130304 13:53]:
Show quoted text
> Mon Mar 04 08:53:06 2013: Request 83749 was acted upon. > Transaction: Ticket created by JIK > Queue: Mail-Box-Parser-C > Subject: Long header lines parsed improperly > Broken in: 3.006 > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=83749 > > > > Header lines longer than 1023 characters cause Mail::Box::Parser::C to > parse the header improperly and corrupt the message.
The latest email spec, RFC5322 section 2.1.1 says: There are two limits that this specification places on the number of characters in a line. Each line of characters MUST be no more than 998 characters, and SHOULD be no more than 78 characters, excluding the CRLF. The 998 character limit is due to limitations in many implementations that send, receive, or store IMF messages which simply cannot handle more than 998 characters on a line. Receiving implementations would do well to handle an arbitrarily large number of characters in a line for robustness sake. However, there are so many implementations that (in compliance with the transport requirements of [RFC5321]) do not accept messages containing more than 1000 characters including the CR and LF per line, it is important for implementations not to create such messages. So it partially agrees with you: "robustness". On the other hand, it complicates the code significantly. I will increase the max header line length to 4094. (I did not expect people still use the C parser version ;-) -- Regards, MarkOv ------------------------------------------------------------------------ Mark Overmeer MSc MARKOV Solutions Mark@Overmeer.net solutions@overmeer.net http://Mark.Overmeer.net http://solutions.overmeer.net
Subject: Re: [rt.cpan.org #83749] Long header lines parsed improperly
Date: Mon, 4 Mar 2013 09:27:48 -0500
To: bug-Mail-Box-Parser-C@rt.cpan.org
From: Jonathan Kamens <jik@kamens.us>

I am not sure I agree that my change complicates the code "significantly", and I don't really see the virtue in replacing one unnecessarily hard-coded constant with another, especially since the RFC recommends being able to handle arbitrarily long lines and I've given you a patch that does just that.

I will, however, concede that I've seen far fewer messages with lines longer than 4094 characters than 1022, so if all you're willing to do is increase the constant, that's better than doing nothing.

  Jik

Sent from my Android device

Subject: Re: [rt.cpan.org #83749] Long header lines parsed improperly
Date: Mon, 4 Mar 2013 15:35:07 +0100
To: Jonathan Kamens via RT <bug-Mail-Box-Parser-C@rt.cpan.org>
From: Mark Overmeer <mark@overmeer.net>
* Jonathan Kamens via RT (bug-Mail-Box-Parser-C@rt.cpan.org) [130304 14:28]:
Show quoted text
> I will, however, concede that I've seen far fewer messages with lines > longer than 4094 characters than 1022, so if all you're willing to do > is increase the constant, that's better than doing nothing.
What kind of broken application does produce header lines of more than two screen-pages? I cannot imagin that this is a valid use of MIME in the first place. -- Regards, MarkOv ------------------------------------------------------------------------ Mark Overmeer MSc MARKOV Solutions Mark@Overmeer.net solutions@overmeer.net http://Mark.Overmeer.net http://solutions.overmeer.net
Subject: Re: [rt.cpan.org #83749] Long header lines parsed improperly
Date: Mon, 04 Mar 2013 10:09:53 -0500
To: bug-Mail-Box-Parser-C@rt.cpan.org
From: Jonathan Kamens <jik@kamens.us>
On 03/04/2013 09:35 AM, Mark Overmeer via RT wrote:
Show quoted text
What kind of broken application does produce header lines of more than two screen-pages? I cannot imagin that this is a valid use of MIME in the first place.
Mark, you've been doing Internet standards programming long enough to know that's a silly question. :-) Trying to anticipate all the stupid things that other programmers will do that violate standards is an exercise in futility.

Having said that, the most common case I see of this problem is spam messages. While you could argue that correctly parsing spam messages is not terribly important, I would counter that my spam filters are hampered in their work if they can't parse messages properly.

Another point I want to bring up... Does the Perl parser handle arbitrarily long header lines. If so, it seems difficult to justify preserving different, incorrect behavior in the C parser for the sake of saving a few lines of code.

I did a scan of the messages in my IMAP folders on my server (only legitimate emails I've saved, not spam) and found two legitimate messages with long header lines. One of them was a DKIM-Signature line in a message from att.net, and the other was an X-Brightmail-Tracker line generated by convio.net. So it's not just the small guys who are doing stupid things with headers.

  jik

Subject: Re: [rt.cpan.org #83749] Long header lines parsed improperly
Date: Mon, 4 Mar 2013 16:36:36 +0100
To: Jonathan Kamens via RT <bug-Mail-Box-Parser-C@rt.cpan.org>
From: Mark Overmeer <solutions@overmeer.net>
* Jonathan Kamens via RT (bug-Mail-Box-Parser-C@rt.cpan.org) [130304 15:10]:
Show quoted text
> Queue: Mail-Box-Parser-C > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=83749 > > > Another point I want to bring up... Does the Perl parser handle > arbitrarily long header lines. If so, it seems difficult to justify > preserving different, incorrect behavior in the C parser for the sake of > saving a few lines of code.
This is a convincing argument.
Show quoted text
> I did a scan of the messages in my IMAP folders on my server (only > legitimate emails I've saved, not spam) and found two legitimate > messages with long header lines. One of them was a DKIM-Signature line > in a message from att.net, and the other was an X-Brightmail-Tracker > line generated by convio.net. So it's not just the small guys who are > doing stupid things with headers.
Hum... in stupidity, size doesn't matter ;-) Those lines are larger than 4k??? Or larger than 1k? What about this: int already_read = 0; while(fgets(box->line + already_read, box->line_buf_size - already_read, box->file)) { int len = strlen(box->line); already_read += len; if(len < box->line_buf_size-1 || box->line[len-1]=='\n') break; /* Extend header line buffer to contain more than DFLT_SIZE * chars. Rfc5322 restricts size to 998 octets, but larger * headers are encountered in the wild. */ box->line_buf_size *= 2; Renew(box->line, box->line_buf_size, char); } -- Regards, MarkOv ------------------------------------------------------------------------ Mark Overmeer MSc MARKOV Solutions drs Mark A.C.J. Overmeer MARKOV Solutions Mark@Overmeer.net solutions@overmeer.net http://Mark.Overmeer.net http://solutions.overmeer.net
Subject: Re: [rt.cpan.org #83749] Long header lines parsed improperly
Date: Mon, 04 Mar 2013 10:46:48 -0500
To: bug-Mail-Box-Parser-C@rt.cpan.org
From: Jonathan Kamens <jik@kamens.us>
On 03/04/2013 10:37 AM, Mark Overmeer via RT wrote:
Show quoted text
Hum... in stupidity, size doesn't matter ;-)
Those lines are larger than 4k???  Or larger than 1k?
Larger than 1k, not 4k. I did not find any larger than 4k. But for the DKIM-Signature line, at least, it could theoretically be, since its length is dependent on the headers that are signed, and an email message could have an arbitrary number of signed headers.
Show quoted text
What about this:
I think that looks fine. Thanks.

  jik

Subject: Re: [rt.cpan.org #83749] Long header lines parsed improperly
Date: Mon, 4 Mar 2013 16:50:02 +0100
To: Jonathan Kamens via RT <bug-Mail-Box-Parser-C@rt.cpan.org>
From: Mark Overmeer <solutions@overmeer.net>
* MARKOV Solutions (solutions@overmeer.net) [130304 16:36]:
Show quoted text
> int already_read = 0; > while(fgets(box->line + already_read, box->line_buf_size - already_read, > box->file)) > { int len = strlen(box->line); > already_read += len;
Shouldn't that be already_read = len; I think the loop is flawed. This is probably better: box->line_start = (long)ftell(box->file); int bufsize = box->line_buf_size; int bytes = 0; while(1) { fgets(box->line + bytes, bufsize - bytes, box->file) or break; bytes = strlen(box->line); if(bytes < bufsize-1 || box->line[bufsize-1]=='\n') break; /* Extend header line buffer to contain more than DFLT_SIZE * chars. Rfc5322 restricts size to 998 octets, but larger * headers are encountered in the wild. */ bufsize = box->line_buf_size *= 2; Renew(box->line, bufsize, char); } if(!bytes) return NULL; -- Regards, MarkOv ------------------------------------------------------------------------ Mark Overmeer MSc MARKOV Solutions drs Mark A.C.J. Overmeer MARKOV Solutions Mark@Overmeer.net solutions@overmeer.net http://Mark.Overmeer.net http://solutions.overmeer.net
Subject: Re: [rt.cpan.org #83749] Long header lines parsed improperly
Date: Mon, 04 Mar 2013 10:53:03 -0500
To: bug-Mail-Box-Parser-C@rt.cpan.org
From: Jonathan Kamens <jik@kamens.us>
Yes, that's right, sorry.

  jik

Subject: Re: [rt.cpan.org #83749] Long header lines parsed improperly
Date: Mon, 4 Mar 2013 22:41:05 +0100
To: Jonathan Kamens via RT <bug-Mail-Box-Parser-C@rt.cpan.org>
From: Mark Overmeer <mark@overmeer.net>
* Jonathan Kamens via RT (bug-Mail-Box-Parser-C@rt.cpan.org) [130304 13:53]:
Show quoted text
> Mon Mar 04 08:53:06 2013: Request 83749 was acted upon. > Transaction: Ticket created by JIK > Queue: Mail-Box-Parser-C > Subject: Long header lines parsed improperly > Broken in: 3.006
released as 3.007 Hopefully I did not introduce other bugs. -- Regards, MarkOv ------------------------------------------------------------------------ Mark Overmeer MSc MARKOV Solutions Mark@Overmeer.net solutions@overmeer.net http://Mark.Overmeer.net http://solutions.overmeer.net


This service runs on Request Tracker, is sponsored by The Perl Foundation, and maintained by Best Practical Solutions.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.