This queue is for tickets about the PDF-API2 CPAN distribution.

Report information

The Basics
Id: 48683
Status: resolved
Priority: Low/Low
Queue: PDF-API2

People
Owner: Nobody in particular
Requestors: cherdt [...] gmail.com
Cc:
AdminCc:



Subject: PDF-API2 throws error "Malformed xref in PDF file" for newer PDFs (PDF v1.5+)
Although PDF-API2 works well for older PDF formats (1.4), newer PDF formats (1.5, 1.6) cause it to throw an error: "Malformed xref in PDF file at [path to File.pm] line 1198"

I can reproduce the error with the following Perl script:

    use PDF::API2;
    $pdf = PDF::API2->open($ARGV[0]);

The PDFs I am using were produced by Adobe Acrobat Pro 9 (the attached file is an example). I saw the possibly related bug report submitted by abhinavk, but his fix did not work for me.

Environment: Perl v5.10.0 (ActiveState) on WinXP Pro 2002 SP3.
Subject: HowToArgueEffectively.pdf


Subject: [rt.cpan.org #48683]
Date: Fri, 14 Aug 2009 11:41:47 -0500
To: bug-PDF-API2@rt.cpan.org
From: Chris Herdt <cherdt@gmail.com>
I have since successfully used PDF::API2 to modify version 1.6 PDFs, so I'm not certain why the particular PDF in question produced the error. Perhaps that file in particular has an unfriendly xref value, which is apparently modified or removed when saved as an earlier PDF version.
On Fri Aug 14 12:42:10 2009, cherdt wrote:
> I have since successfully used PDF::API2 to modify version 1.6 PDFs,
> so I'm not certain why the particular PDF in question produced the
> error. Perhaps that file in particular has an unfriendly xref value,
> which is apparently modified or removed when saved as an earlier PDF
> version.
Since PDF 1.5, the spec changed to allow xref information to be in streams instead of tables. This isn't supported by PDF::API2 (though I'll be very happy if someone beats me to fixing that and sends a patch!). Acrobat 9 started using cross-reference streams by default, so this error is more common with newer files. PDF::API2 will work fine if you generate a PDF in Acrobat 9 without using cross-reference streams, however. The easiest way to do this is to make it compatible with Acrobat 5.0 and later when you save.
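To make the table-versus-stream distinction concrete, here is a minimal sketch (not part of PDF::API2) that reports which kind of cross-reference section a file's final "startxref" entry points to. It assumes the recorded offset is accurate, and the name xref_kind is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDF::API2: classify the cross-reference
# section that the final "startxref" entry points to.
# Takes the raw bytes of a PDF; returns 'table', 'stream', or 'unknown'.
sub xref_kind {
    my ($bytes) = @_;

    # The byte offset of the last xref section follows the final "startxref".
    my $pos = rindex($bytes, 'startxref');
    return 'unknown' if $pos < 0;

    my ($offset) = substr($bytes, $pos) =~ /^startxref\s+(\d+)/
        or return 'unknown';
    return 'unknown' if $offset >= length $bytes;

    my $head = substr($bytes, $offset, 32);
    return 'table'  if $head =~ /^xref\s/;             # classic xref table
    return 'stream' if $head =~ /^\d+\s+\d+\s+obj\b/;  # "N G obj": xref stream (PDF 1.5+)
    return 'unknown';
}

# Usage: perl xref_kind.pl file.pdf
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $bytes = do { local $/; <$fh> };
    print xref_kind($bytes), "\n";
}
```

A file that reports "stream" here is the kind that triggered the error in this ticket; re-saving it with Acrobat 5.0 compatibility, as described above, should make it report "table".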
From: pwomack@papermule.co.uk
> Acrobat 9 started using cross-reference streams by default, so this
> error is more common with newer files. PDF::API2 will work fine if you
> generate a PDF in Acrobat 9 without using cross-reference streams,
> however. The easiest way to do this is to make it compatible with
> Acrobat 5.0 and later when you save.
With the passage of time, these are becoming more common. I looked into adding cross-reference stream support myself, but it is too complex to be a "patch"; it's a significant piece of implementation. So can I just "+1" the importance of this?

BugBear
Version 2.020, just released, contains two updates relevant to this issue:

1) If PDF::API2 encounters a cross-reference stream, it will now give a more appropriate error message rather than saying that the cross-reference table is malformed.

2) The Known Issues section of the POD contains pointers to the PDF specification, which describes how both the old cross-reference table works and how the new cross-reference streams work.
From: don.huettl@grantstreet.com
I have attached three patches to implement read-only support for cross-reference streams and compressed objects. Saving the results will still write a v1.4 document.

Patches should be applied in the following order:

PDF-API2-2.023-XRefStm.patch
PDF-API2-2.023-Predictor.patch
PDF-API2-2.023-XRef-test.patch

The unit test that I added needs the example document attached to this ticket to be saved as t/resources/HowToArgueEffectively.pdf.

If this is applied, please credit my employer, Grant Street Group <gsg@cpan.org>, in addition to myself.
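For readers unfamiliar with the format, the following is a rough sketch of how the fixed-width entries of a cross-reference stream can be decoded once the stream data has been inflated and de-predicted. It follows the entry layout in the PDF 1.7 specification (section 7.5.8) rather than the attached patch, and the names be_uint and decode_xref_entries are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one big-endian unsigned integer of any width.
sub be_uint {
    my ($bytes) = @_;
    my $n = 0;
    $n = ($n << 8) | $_ for unpack 'C*', $bytes;
    return $n;
}

# Decode the body of a cross-reference stream.
#   $data   - stream bytes, already decompressed and de-predicted
#   $widths - the /W array: byte widths of the three fields
#   $first  - first object number covered (from /Index, or 0)
#   $count  - number of entries (from /Index, or /Size)
sub decode_xref_entries {
    my ($data, $widths, $first, $count) = @_;
    my @w   = @$widths;
    my $row = $w[0] + $w[1] + $w[2];
    my %entries;

    for my $i (0 .. $count - 1) {
        my $rec = substr($data, $i * $row, $row);
        last if length($rec) < $row;

        my (@f, $pos);
        $pos = 0;
        for my $width (@w) {
            push @f, $width ? be_uint(substr($rec, $pos, $width)) : undef;
            $pos += $width;
        }

        # A zero-width type field means "type 1"; other absent fields read as 0.
        my $type = defined($f[0]) ? $f[0] : 1;
        my ($f2, $f3) = map { defined($_) ? $_ : 0 } @f[1, 2];

        $entries{$first + $i} =
              $type == 0 ? { kind => 'free',       next   => $f2, gen   => $f3 }
            : $type == 1 ? { kind => 'in_use',     offset => $f2, gen   => $f3 }
            : $type == 2 ? { kind => 'compressed', stream => $f2, index => $f3 }
            :              { kind => 'unknown' };
    }
    return \%entries;
}
```

Type 2 entries are the "compressed objects" mentioned above: they carry no file offset, only the number of the object stream that contains the object and its index within that stream.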
Subject: PDF-API2-2.023-XRefStm.patch
diff -ur PDF-API2-2.023/lib/PDF/API2.pm PDF-API2-2.023.1/lib/PDF/API2.pm
--- PDF-API2-2.023/lib/PDF/API2.pm 2014-11-19 15:22:35.000000000 -0500
+++ PDF-API2-2.023.1/lib/PDF/API2.pm 2014-11-19 15:20:04.000000000 -0500
@@ -2421,13 +2421,6 @@
 
 This module does not work with perl's -l command-line switch.
 
-PDFs using cross-reference streams instead of cross-reference tables
-are not yet supported. Cross-reference streams were added as an
-option in version 1.5 of the PDF spec, but were only used infrequently
-until Adobe Acrobat 9 started using them by default. A patch would be
-welcome -- see the PDF 1.7 specification, sections 7.5.4 and 7.5.8 for
-implementation details.
-
 =head1 AUTHOR
 
 PDF::API2 was originally written by Alfred Reibenschuh.
diff -urN PDF-API2-2.023/lib/PDF/API2/Basic/PDF/File.pm PDF-API2-2.023.1/lib/PDF/API2/Basic/PDF/File.pm
--- PDF-API2-2.023/lib/PDF/API2/Basic/PDF/File.pm 2014-09-12 17:26:35.000000000 -0400
+++ PDF-API2-2.023.1/lib/PDF/API2/Basic/PDF/File.pm 2014-11-05 17:35:51.000000000 -0500
@@ -173,6 +173,8 @@
 use PDF::API2::Basic::PDF::Page;
 use PDF::API2::Basic::PDF::Pages;
 use PDF::API2::Basic::PDF::Null;
+use PDF::API2::Resource::XObject::Image::PNG;
+use POSIX qw(ceil floor);
 
 no warnings qw[ deprecated recursion uninitialized ];
 
@@ -453,16 +455,17 @@
     my $fh = $self->{' INFILE'};
     my ($result, $value);
 
-    $str = update($fh, $str);
+    my $update = $opts{update} // 1;
+    $str = update($fh, $str) if $update;
 
     # Dictionary
     if ($str =~ m/^<</s) {
         $str = substr ($str, 2);
-        $str = update($fh, $str);
+        $str = update($fh, $str) if $update;
         $result = PDFDict();
 
         while ($str !~ m/^>>/) {
-            if ($str =~ s|^/($reg_char+)||) {
+            if ($str =~ s|^/($reg_char+)$ws_char?||) {
                 my $key = PDF::API2::Basic::PDF::Name::name_to_string($1, $self);
                 ($value, $str) = $self->readval($str, %opts);
                 $result->{$key} = $value;
@@ -477,10 +480,10 @@
                 ($value, $str) = $self->readval($str, %opts);
                 $result->{'null'} = $value;
             }
-            $str = update($fh, $str); # thanks gareth.jones@stud.man.ac.uk
+            $str = update($fh, $str) if $update; # thanks gareth.jones@stud.man.ac.uk
         }
         $str =~ s/^>>//;
-        $str = update($fh, $str);
+        $str = update($fh, $str) if $update;
         # streams can't be followed by a lone carriage-return.
         # fredo: yes they can !!! -- use the MacOS Luke.
         if (($str =~ s/^stream(?:(?:\015\012)|\012|\015)//) and ($result->{'Length'}->val != 0)) { # stream
@@ -499,7 +502,7 @@
                 $value .= substr($str, 0, $length);
                 $result->{' stream'} = $value;
                 $result->{' nofilt'} = 1;
-                $str = update($fh, $str, 1); # tell update we are in-stream and only need an endstream
+                $str = update($fh, $str, 1) if $update; # tell update we are in-stream and only need an endstream
                 $str = substr($str, index($str, 'endstream') + 9);
             }
         }
@@ -543,7 +546,7 @@
             $self->add_obj($result, $num, $value);
             $result->{' realised'} = 1;
         }
-        $str = update($fh, $str); # thanks to kundrat@kundrat.sk
+        $str = update($fh, $str) if $update; # thanks to kundrat@kundrat.sk
         $str =~ s/^endobj//;
     }
 
@@ -619,12 +622,12 @@
     # Array
     elsif ($str =~ m/^\[/) {
         $str =~ s/^\[//;
-        $str = update($fh, $str);
+        $str = update($fh, $str) if $update;
         $result = PDFArray();
         while ($str !~ m/^\]/) {
             ($value, $str) = $self->readval($str, %opts);
             $result->add_elements($value);
-            $str = update($fh, $str); # str might just be exhausted!
+            $str = update($fh, $str) if $update; # str might just be exhausted!
         }
         $str =~ s/^\]//;
     }
@@ -684,9 +687,40 @@
     my ($self, $num, $gen, %opts) = @_;
 
     my $object_location = $self->locate_obj($num, $gen) || return undef;
+    my $object;
+
+    if (ref $object_location)
+    {
+        # Compressed object.
+        my $src = $self->read_objnum($object_location->[0], 0, %opts);
+        die 'Cannot find the compressed object stream' unless $src;
+
+        $src->read_stream if $src->{' nofilt'};
+
+        my ($map, $objects) = $src->{' stream'} =~ /^([\d ]+)(.*)$/;
+        my @mappings = split(/\s+/, $map);
+        my $count = scalar(@mappings);
+
+        my $index = $object_location->[1] * 2;
+
+        if ($mappings[$index] != $num)
+        {
+            die "Objind $num does not exist at index $index";
+        }
+
+        my $start = $mappings[++$index];
+        $index += 2;
+
+        my $length = $index > $count ? length($objects) : $mappings[$index];
+        my $stream = "$num 0 obj" . substr($objects, $start, $length);
+
+        ($object) = $self->readval($stream, %opts, objnum => $num, objgen => $gen, update => 0);
+        return $object;
+    }
+
     my $current_location = $self->{' INFILE'}->tell;
     $self->{' INFILE'}->seek($object_location, 0);
-    my ($object) = $self->readval('', %opts, 'objnum' => $num, 'objgen' => $gen);
+    ($object) = $self->readval('', %opts, 'objnum' => $num, 'objgen' => $gen);
     $self->{' INFILE'}->seek($current_location, 0);
     return $object;
 }
@@ -934,6 +968,8 @@
     while (defined $tdict) {
         if (ref $tdict->{' xref'}{$num}) {
             my $ref = $tdict->{' xref'}{$num};
+            return $ref unless scalar(@$ref) == 3;
+
             if ($ref->[1] == $gen) {
                 return $ref->[0] if ($ref->[2] eq 'n');
                 return undef; # if $ref->[2] eq 'f'
@@ -1026,6 +1062,29 @@
 
 =cut
 
+sub _unpack
+{
+    my ($self, $width, $data) = @_;
+
+    die "Invalid column width: $width" if $width < 1 || $width > 4;
+
+    my $template;
+
+    if ($width == 1)
+    {
+        return unpack('C', $data);
+    }
+    elsif ($width == 2)
+    {
+        return unpack('n', $data);
+    }
+    else
+    {
+        $data = "\x00$data" if $width == 3;
+        return unpack('N', $data);
+    }
+}
+
 sub readxrtr {
     my ($self, $xpos) = @_;
     my ($tdict, $buf, $xmin, $xnum, $xdiff);
@@ -1035,6 +1094,8 @@
     $fh->read($buf, 22);
     $buf = update($fh, $buf); # fix for broken JAWS xref calculation.
 
+    my $xlist = {};
+
     ## seams that some products calculate wrong prev entries (short)
     ## so we seek ahead to find one -- fredo; save for now
     #while($buf !~ m/^xref$cr/i && !eof($fh))
@@ -1043,42 +1104,120 @@
     #    $buf=update($fh,$buf);
     #}
 
-    unless ($buf =~ m/^xref$cr/i) {
-        if ($buf =~ m/^\d+\s+\d+\s+obj/i) {
-            die "The PDF file uses a cross-reference stream, which is not yet supported (see Known Issues in the PDF::API2 documentation)";
+    if ($buf =~ s/^xref$cr//i) {
+        # Plain XRef tables.
+        while ($buf =~ m/^$ws_char*([0-9]+)$ws_char+([0-9]+)$ws_char*$cr(.*?)$/s) {
+            my $old_buf = $buf;
+            $xmin = $1;
+            $xnum = $2;
+            $buf = $3;
+            unless ($old_buf =~ /^[0-9]+ [0-9]+$cr/) {
+                # See PDF 1.7 section 7.5.4: Cross-Reference Table
+                warn q{Malformed xref in PDF file: subsection shall begin with a line containing two numbers separated by a SPACE (20h)};
+            }
+            $xdiff = length($buf);
+
+            $fh->read($buf, 20 * $xnum - $xdiff + 15, $xdiff);
+            while ($xnum-- > 0 and $buf =~ s/^0*([0-9]*)$ws_char+0*([0-9]+)$ws_char+([nf])$cr//) {
+                $xlist->{$xmin} = [$1, $2, $3] unless exists $xlist->{$xmin};
+                $xmin++;
+            }
         }
-        else {
-            die "Malformed xref in PDF file $self->{' fname'}";
+
+        if ($buf !~ /^\s*trailer\b/i) {
+            die "Malformed trailer in PDF file $self->{' fname'} at " . ($fh->tell - length($buf));
         }
+
+        $buf =~ s/^\s*trailer\b//i;
+
+        ($tdict, $buf) = $self->readval($buf);
     }
-    $buf =~ s/^xref$cr//i;
+    elsif ($buf =~ m/^\d+\s+\d+\s+obj/i)
+    {
+        # XRef streams.
+        ($tdict, $buf) = $self->readval($buf);
 
-    my $xlist = {};
-    while ($buf =~ m/^$ws_char*([0-9]+)$ws_char+([0-9]+)$ws_char*$cr(.*?)$/s) {
-        my $old_buf = $buf;
-        $xmin = $1;
-        $xnum = $2;
-        $buf = $3;
-        unless ($old_buf =~ /^[0-9]+ [0-9]+$cr/) {
-            # See PDF 1.7 section 7.5.4: Cross-Reference Table
-            warn q{Malformed xref in PDF file: subsection shall begin with a line containing two numbers separated by a SPACE (20h)};
+        my $stream = $tdict->{' stream'};
+
+        unless ($stream)
+        {
+            die "Malformed XRefStm object in PDF file $self->{' fname'}";
+        }
+
+        my $p = $tdict->{DecodeParms}->val;
+        my $pred = defined $p->{Predictor} ? $p->{Predictor}->val : 1;
+
+        if ($pred > 1)
+        {
+            my $bpc = defined $p->{BitsPerComponent} ? $p->{BitsPerComponent} : 8;
+            my $colors = defined $p->{Colors} ? $p->{Colors}->val : 1;
+            my $columns = defined $p->{Columns} ? $p->{Columns}->val : 1;
+
+            my $bpp = ceil($bpc * $colors / 8);
+            my $scanline = 1 + ceil($bpp * $columns);
+
+            if ($pred == 2)
+            {
+                warn "The TIFF predictor logic has not been implemented";
+            }
+            elsif ($pred >= 10 && $pred <= 15)
+            {
+                $stream = PDF::API2::Resource::XObject::Image::PNG::unprocess(
+                    $bpc, $bpp, $colors, $columns, 0, $scanline,
+                    \$tdict->{' stream'}
+                );
+            }
+            else
+            {
+                warn "Invalid predictor: $pred";
+            }
         }
-        $xdiff = length($buf);
+
+        my @widths = map { $_->val } @{$tdict->{W}->val};
+
+        my $start = 0;
+        my $last;
+
+        if (defined $tdict->{Index})
+        {
+            my $index = $tdict->{Index}->val;
+
+            $start = $index->[0]->val;
+            $last = $start + $index->[1]->val - 1;
+        }
+        else
+        {
+            $last = $tdict->{Size}->val - 1;
+        }
+
+        for $xmin ($start...$last)
+        {
+            my @cols;
+
+            for my $w (@widths)
+            {
+                my $data;
+                $data = $self->_unpack($w, substr($stream, 0, $w, '')) if $w;
+
+                push @cols, $data;
+            }
 
-        $fh->read($buf, 20 * $xnum - $xdiff + 15, $xdiff);
-        while ($xnum-- > 0 and $buf =~ s/^0*([0-9]*)$ws_char+0*([0-9]+)$ws_char+([nf])$cr//) {
-            $xlist->{$xmin} = [$1, $2, $3] unless exists $xlist->{$xmin};
-            $xmin++;
+            $cols[0] //= 1;
+            die 'Invalid XRefStm entry type: ', $cols[0] if $cols[0] > 2;
+
+            next if exists $xlist->{$xmin};
+
+            my @objind = ($cols[1], $cols[2] // ($xmin ? 0 : 65535));
+            push @objind, ($cols[0] == 0 ? 'f' : 'n') if $cols[0] < 2;
+
+            $xlist->{$xmin} = \@objind;
         }
     }
-
-    if ($buf !~ /^\s*trailer\b/i) {
-        die "Malformed trailer in PDF file $self->{' fname'} at " . ($fh->tell - length($buf));
+    else
+    {
+        die "Malformed xref in PDF file $self->{' fname'}";
     }
 
-    $buf =~ s/^\s*trailer\b//i;
-
-    ($tdict, $buf) = $self->readval($buf);
     $tdict->{' loc'} = $xpos;
     $tdict->{' xref'} = $xlist;
     $self->{' maxobj'} = $xmin if $xmin > $self->{' maxobj'};
diff -urN PDF-API2-2.023/lib/PDF/API2/Resource/XObject/Image/PNG.pm PDF-API2-2.023.1/lib/PDF/API2/Resource/XObject/Image/PNG.pm
--- PDF-API2-2.023/lib/PDF/API2/Resource/XObject/Image/PNG.pm 2014-09-12 17:26:35.000000000 -0400
+++ PDF-API2-2.023.1/lib/PDF/API2/Resource/XObject/Image/PNG.pm 2014-11-05 14:42:22.000000000 -0500
@@ -274,7 +274,8 @@
     my $stream=uncompress($$sstream);
     my $prev='';
     my $clearstream='';
-    foreach my $n (0..$height-1) {
+    my $lastrow=($height||(length($stream)/$scanline))-1;
+    foreach my $n (0..$lastrow) {
         # print STDERR "line $n:";
         my $line=substr($stream,$n*$scanline,$scanline);
         my $filter=vec($line,0,8);
Subject: PDF-API2-2.023-XRef-test.patch
commit c198a9745c7a
Author: Don Huettl <don.huettl@grantstreet.com>
Date:   Thu Mar 19 15:28:53 2015 -0400

    tests to validate cross-reference stream logic

    Adds a reference PDF document containing XRef streams, and the
    associated unit tests.

diff --git a/PDF-API2/t/resources/HowToArgueEffectively.pdf b/PDF-API2/t/resources/HowToArgueEffectively.pdf
new file mode 100644
index 000000000000..8bfd9482b940
Binary files /dev/null and b/PDF-API2/t/resources/HowToArgueEffectively.pdf differ
diff --git a/PDF-API2/t/xref.t b/PDF-API2/t/xref.t
new file mode 100644
index 000000000000..0280251f76b3
--- /dev/null
+++ b/PDF-API2/t/xref.t
@@ -0,0 +1,27 @@
+use Test::More tests => 2;
+
+use warnings;
+use strict;
+
+use PDF::API2;
+
+my $pdf = eval {
+    PDF::API2->open('t/resources/HowToArgueEffectively.pdf');
+};
+
+isa_ok($pdf, 'PDF::API2', q{doc containing an XRef stream});
+
+my $file = $pdf->{pdf};
+my $pass = 1;
+
+while (my($id, $xref) = each %{$file->{' xref'}}) {
+    my $obj = $file->read_objnum($id, $xref->[1]);
+
+    unless (ref($obj)) {
+        $pass = 0;
+        last;
+    }
+}
+
+ok($pass, 'all XRef entries point to an object');
+
Subject: PDF-API2-2.023-Predictor.patch
commit 05fc95fdd98e
Author: Don Huettl <don.huettl@grantstreet.com>
Date:   Thu Mar 19 15:24:29 2015 -0400

    refactor PNG predictor logic into shared location

    This pulls the PNG predictor logic into its own filter module, to be
    used when parsing cross-reference streams as well.

    Also fixes a bug when the parser encounters an object stream near the
    end of the file. The stream location would be off by one, preventing
    it from parsing correctly.

diff --git a/PDF-API2/lib/PDF/API2/Basic/PDF/File.pm b/PDF-API2/lib/PDF/API2/Basic/PDF/File.pm
index 9c7503c09de6..c74fd774e43e 100644
--- a/PDF-API2/lib/PDF/API2/Basic/PDF/File.pm
+++ b/PDF-API2/lib/PDF/API2/Basic/PDF/File.pm
@@ -166,6 +166,7 @@ use PDF::API2::Basic::PDF::Utils;
 use PDF::API2::Basic::PDF::Array;
 use PDF::API2::Basic::PDF::Bool;
 use PDF::API2::Basic::PDF::Dict;
+use PDF::API2::Basic::PDF::Filter::Predictor;
 use PDF::API2::Basic::PDF::Name;
 use PDF::API2::Basic::PDF::Number;
 use PDF::API2::Basic::PDF::Objind;
@@ -490,6 +491,7 @@ sub readval {
             my $length = $result->{'Length'}->val;
             $result->{' streamsrc'} = $fh;
             $result->{' streamloc'} = $fh->tell - length($str);
+            $result->{' streamloc'}-- if $fh->eof;
             unless ($opts{'nostreams'}) {
                 if ($length > length($str)) {
                     $value = $str;
@@ -1137,47 +1139,17 @@ sub readxrtr {
         # XRef streams.
         ($tdict, $buf) = $self->readval($buf);
 
-        my $stream = $tdict->{' stream'};
-
-        unless ($stream)
+        unless ($tdict->{' stream'})
         {
             die "Malformed XRefStm object in PDF file $self->{' fname'}";
         }
 
-        my $p = $tdict->{DecodeParms}->val;
-        my $pred = defined $p->{Predictor} ? $p->{Predictor}->val : 1;
-
-        if ($pred > 1)
-        {
-            my $bpc = defined $p->{BitsPerComponent} ? $p->{BitsPerComponent} : 8;
-            my $colors = defined $p->{Colors} ? $p->{Colors}->val : 1;
-            my $columns = defined $p->{Columns} ? $p->{Columns}->val : 1;
-
-            my $bpp = ceil($bpc * $colors / 8);
-            my $scanline = 1 + ceil($bpp * $columns);
-
-            if ($pred == 2)
-            {
-                warn "The TIFF predictor logic has not been implemented";
-            }
-            elsif ($pred >= 10 && $pred <= 15)
-            {
-                $stream = PDF::API2::Resource::XObject::Image::PNG::unprocess(
-                    $bpc, $bpp, $colors, $columns, 0, $scanline,
-                    \$tdict->{' stream'}
-                );
-            }
-            else
-            {
-                warn "Invalid predictor: $pred";
-            }
-        }
-
+        my $stream = PDF::API2::Basic::PDF::Filter::Predictor->new($tdict)->infilt;
         my @widths = map { $_->val } @{$tdict->{W}->val};
-
+
         my $start = 0;
         my $last;
-
+
         if (defined $tdict->{Index})
         {
             my $index = $tdict->{Index}->val;
diff --git a/PDF-API2/lib/PDF/API2/Basic/PDF/Filter/FlateDecode.pm b/PDF-API2/lib/PDF/API2/Basic/PDF/Filter/FlateDecode.pm
index 5a9a9a8cdc15..86cc284594e7 100644
--- a/PDF-API2/lib/PDF/API2/Basic/PDF/Filter/FlateDecode.pm
+++ b/PDF-API2/lib/PDF/API2/Basic/PDF/Filter/FlateDecode.pm
@@ -43,6 +43,7 @@ sub infilt {
     my ($self, $dat, $last) = @_;
 
     my ($res, $status) = $self->{'infilt'}->inflate("$dat");
+    # TODO: Ideally we should call the Predictor filter from here.
     $res;
 }
 
diff --git a/PDF-API2/lib/PDF/API2/Basic/PDF/Filter/Predictor.pm b/PDF-API2/lib/PDF/API2/Basic/PDF/Filter/Predictor.pm
new file mode 100644
index 000000000000..7d2c388dcfc0
--- /dev/null
+++ b/PDF-API2/lib/PDF/API2/Basic/PDF/Filter/Predictor.pm
@@ -0,0 +1,140 @@
+package PDF::API2::Basic::PDF::Filter::Predictor;
+
+our $VERSION = '2.023.1'; # VERSION
+
+use base 'PDF::API2::Basic::PDF::Filter';
+
+use strict;
+no warnings qw[ deprecated recursion uninitialized ];
+
+use PDF::API2::Basic::PDF::Utils;
+use POSIX qw(ceil floor);
+
+# This does not behave like the other filters, as it needs access to the
+# source object.
+sub new {
+    my ($class, $obj) = @_;
+
+    my $self = {object => $obj};
+    bless $self, $class;
+}
+
+sub outfilt {
+    my ($self) = @_;
+
+    warn 'The "outfilt" method is not implemented';
+    return;
+}
+
+sub infilt {
+    my ($self) = @_;
+
+    # Decompress.
+    my $obj = $self->{object};
+    $obj->read_stream if $obj->{' nofilt'};
+
+    my $param = $obj->{DecodeParms};
+    my $predictor = defined $param->{Predictor} ? $param->{Predictor}->val : 0;
+
+    return $obj->{' stream'} unless $predictor > 1;
+
+    # Then de-predict.
+    if ($predictor == 2) {
+        $self->_depredict_tiff;
+    } elsif ($predictor >= 10 && $predictor <= 15) {
+        $self->_depredict_png;
+    } else {
+        warn "Invalid predictor: $predictor";
+    }
+
+    return $obj->{' stream'};
+}
+
+sub _paeth_predictor {
+    my ($a, $b, $c)=@_;
+    my $p = $a + $b - $c;
+    my $pa = abs($p - $a);
+    my $pb = abs($p - $b);
+    my $pc = abs($p - $c);
+    if(($pa <= $pb) && ($pa <= $pc)) {
+        return $a;
+    } elsif($pb <= $pc) {
+        return $b;
+    } else {
+        return $c;
+    }
+}
+
+sub _depredict_png {
+    my ($self) = @_;
+
+    my $obj = $self->{object};
+
+    my $param = $obj->{DecodeParms};
+    my $stream = $obj->{' stream'};
+
+    $param->{Alpha} = PDFNum(0) unless $param->{Alpha};
+    $param->{BitsPerComponent} = PDFNum(8) unless $param->{BitsPerComponent};
+    $param->{Colors} = PDFNum(1) unless $param->{Colors};
+    $param->{Columns} = PDFNum(1) unless $param->{Columns};
+    $param->{Height} = PDFNum(0) unless $param->{Height};
+
+    my $alpha = $param->{Alpha}->val;
+    my $bpc = $param->{BitsPerComponent}->val;
+    my $colors = $param->{Colors}->val;
+    my $columns = $param->{Columns}->val;
+    my $height = $param->{Height}->val;
+
+    my $bpp = ceil($bpc * $colors / 8);
+    my $comp = $colors + $alpha;
+    my $scanline = 1 + ceil($bpp * $columns);
+
+    my $prev='';
+    my $clearstream='';
+    my $lastrow=($height||(length($stream)/$scanline))-1;
+    foreach my $n (0..$lastrow) {
+        # print STDERR "line $n:";
+        my $line=substr($stream,$n*$scanline,$scanline);
+        my $filter=vec($line,0,8);
+        my $clear='';
+        $line=substr($line,1);
+        # print STDERR " filter=$filter";
+        if($filter==0) {
+            $clear=$line;
+        } elsif($filter==1) {
+            foreach my $x (0..length($line)-1) {
+                vec($clear,$x,8)=(vec($line,$x,8)+vec($clear,$x-$bpp,8))%256;
+            }
+        } elsif($filter==2) {
+            foreach my $x (0..length($line)-1) {
+                vec($clear,$x,8)=(vec($line,$x,8)+vec($prev,$x,8))%256;
+            }
+        } elsif($filter==3) {
+            foreach my $x (0..length($line)-1) {
+                vec($clear,$x,8)=(vec($line,$x,8)+floor((vec($clear,$x-$bpp,8)+vec($prev,$x,8))/2))%256;
+            }
+        } elsif($filter==4) {
+            # die "paeth/png filter not supported.";
+            foreach my $x (0..length($line)-1) {
+                vec($clear,$x,8)=(vec($line,$x,8)+_paeth_predictor(vec($clear,$x-$bpp,8),vec($prev,$x,8),vec($prev,$x-$bpp,8)))%256;
+            }
+        }
+        $prev=$clear;
+        foreach my $x (0..($columns*$comp)-1) {
+            vec($clearstream,($n*$columns*$comp)+$x,$bpc)=vec($clear,$x,$bpc);
+            # print STDERR "".vec($clear,$x,$bpc).",";
+        }
+        # print STDERR "\n";
+    }
+
+    $obj->{' stream'} = $clearstream;
+}
+
+sub _depredict_tiff {
+    my ($self) = @_;
+
+    warn "The TIFF predictor logic has not been implemented";
+}
+
+1;
+
diff --git a/PDF-API2/lib/PDF/API2/Resource/XObject/Image/PNG.pm b/PDF-API2/lib/PDF/API2/Resource/XObject/Image/PNG.pm
index 679c1400efb9..25646d4ce05f 100644
--- a/PDF-API2/lib/PDF/API2/Resource/XObject/Image/PNG.pm
+++ b/PDF-API2/lib/PDF/API2/Resource/XObject/Image/PNG.pm
@@ -4,8 +4,7 @@ our $VERSION = '2.023.1'; # VERSION
 
 use base 'PDF::API2::Resource::XObject::Image';
 
-use Compress::Zlib;
-use POSIX qw(ceil floor);
+use POSIX qw(ceil);
 
 use IO::File;
 use PDF::API2::Util;
@@ -152,9 +151,7 @@ sub new {
         # $dict->{Filter}=PDFArray(PDFName('ASCIIHexDecode'));
         $dict->{BitsPerComponent}=PDFNum(8);
         $self->{SMask}=$dict;
-        my $scanline=1+ceil($bpc*$w/8);
-        my $bpp=ceil($bpc/8);
-        my $clearstream=unprocess($bpc,$bpp,1,$w,$h,$scanline,\$self->{' stream'});
+        my $clearstream=PDF::API2::Basic::PDF::Filter::Predictor->new($self)->infilt;
         foreach my $n (0..($h*$w)-1) {
             vec($dict->{' stream'},$n,8)=vec($trns,vec($clearstream,$n,$bpc),8);
             # print STDERR vec($trns,vec($clearstream,$n,$bpc),8)."=".vec($clearstream,$n,$bpc).",";
@@ -173,6 +170,7 @@ sub new {
         my $dict=PDFDict();
         $self->{DecodeParms}=PDFArray($dict);
         # $dict->{Predictor}=PDFNum(15);
+        $dict->{Alpha}=PDFNum(1);
         $dict->{BitsPerComponent}=PDFNum($bpc);
         $dict->{Colors}=PDFNum(1);
         $dict->{Columns}=PDFNum($w);
@@ -189,9 +187,7 @@ sub new {
             $dict->{BitsPerComponent}=PDFNum($bpc);
             $self->{SMask}=$dict;
         }
-        my $scanline=1+ceil($bpc*2*$w/8);
-        my $bpp=ceil($bpc*2/8);
-        my $clearstream=unprocess($bpc,$bpp,2,$w,$h,$scanline,\$self->{' stream'});
+        my $clearstream=PDF::API2::Basic::PDF::Filter::Predictor->new($self)->infilt;
         delete $self->{' nofilt'};
         delete $self->{' stream'};
         foreach my $n (0..($h*$w)-1) {
@@ -210,6 +206,7 @@ sub new {
         my $dict=PDFDict();
         $self->{DecodeParms}=PDFArray($dict);
         # $dict->{Predictor}=PDFNum(15);
+        $dict->{Alpha}=PDFNum(1);
         $dict->{BitsPerComponent}=PDFNum($bpc);
         $dict->{Colors}=PDFNum(3);
         $dict->{Columns}=PDFNum($w);
@@ -226,9 +223,7 @@ sub new {
             $dict->{BitsPerComponent}=PDFNum($bpc);
             $self->{SMask}=$dict;
         }
-        my $scanline=1+ceil($bpc*4*$w/8);
-        my $bpp=ceil($bpc*4/8);
-        my $clearstream=unprocess($bpc,$bpp,4,$w,$h,$scanline,\$self->{' stream'});
+        my $clearstream=PDF::API2::Basic::PDF::Filter::Predictor->new($self)->infilt;
         delete $self->{' nofilt'};
         delete $self->{' stream'};
         foreach my $n (0..($h*$w)-1) {
@@ -254,64 +249,6 @@ sub new_api {
     return($obj);
 }
 
-sub PaethPredictor {
-    my ($a, $b, $c)=@_;
-    my $p = $a + $b - $c;
-    my $pa = abs($p - $a);
-    my $pb = abs($p - $b);
-    my $pc = abs($p - $c);
-    if(($pa <= $pb) && ($pa <= $pc)) {
-        return $a;
-    } elsif($pb <= $pc) {
-        return $b;
-    } else {
-        return $c;
-    }
-}
-
-sub unprocess {
-    my ($bpc,$bpp,$comp,$width,$height,$scanline,$sstream)=@_;
-    my $stream=uncompress($$sstream);
-    my $prev='';
-    my $clearstream='';
-    my $lastrow=($height||(length($stream)/$scanline))-1;
-    foreach my $n (0..$lastrow) {
-        # print STDERR "line $n:";
-        my $line=substr($stream,$n*$scanline,$scanline);
-        my $filter=vec($line,0,8);
-        my $clear='';
-        $line=substr($line,1);
-        # print STDERR " filter=$filter";
-        if($filter==0) {
-            $clear=$line;
-        } elsif($filter==1) {
-            foreach my $x (0..length($line)-1) {
-                vec($clear,$x,8)=(vec($line,$x,8)+vec($clear,$x-$bpp,8))%256;
-            }
-        } elsif($filter==2) {
-            foreach my $x (0..length($line)-1) {
-                vec($clear,$x,8)=(vec($line,$x,8)+vec($prev,$x,8))%256;
-            }
-        } elsif($filter==3) {
-            foreach my $x (0..length($line)-1) {
-                vec($clear,$x,8)=(vec($line,$x,8)+floor((vec($clear,$x-$bpp,8)+vec($prev,$x,8))/2))%256;
-            }
-        } elsif($filter==4) {
-            # die "paeth/png filter not supported.";
-            foreach my $x (0..length($line)-1) {
-                vec($clear,$x,8)=(vec($line,$x,8)+PaethPredictor(vec($clear,$x-$bpp,8),vec($prev,$x,8),vec($prev,$x-$bpp,8)))%256;
-            }
-        }
-        $prev=$clear;
-        foreach my $x (0..($width*$comp)-1) {
-            vec($clearstream,($n*$width*$comp)+$x,$bpc)=vec($clear,$x,$bpc);
-            # print STDERR "".vec($clear,$x,$bpc).",";
-        }
-        # print STDERR "\n";
-    }
-    return($clearstream);
-}
-
 1;
 
 __END__
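As a simplified illustration of the de-prediction step that this patch moves into its own module, the sketch below undoes PNG row filters 0 (None), 1 (Sub) and 2 (Up) only. It assumes 8 bits per component and a single color component, so each row is one filter-type byte followed by one byte per column; the name undo_png_rows is invented here and the code is not taken from the patch.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse PNG row filters on already-inflated data.
# Assumes 1 byte per pixel, so each row is 1 filter byte + $columns data bytes.
sub undo_png_rows {
    my ($data, $columns) = @_;
    my $row_len = $columns + 1;
    my @prev    = (0) x $columns;   # the row above the first row counts as zeros
    my $out     = '';

    for (my $pos = 0; $pos + $row_len <= length $data; $pos += $row_len) {
        my ($filter, @cur) = unpack 'C*', substr($data, $pos, $row_len);

        if ($filter == 1) {
            # Sub: add the already-decoded byte to the left.
            $cur[$_] = ($cur[$_] + $cur[$_ - 1]) % 256 for 1 .. $#cur;
        }
        elsif ($filter == 2) {
            # Up: add the byte directly above in the previous decoded row.
            $cur[$_] = ($cur[$_] + $prev[$_]) % 256 for 0 .. $#cur;
        }
        elsif ($filter != 0) {
            die "PNG filter type $filter is not handled in this sketch";
        }

        $out .= pack 'C*', @cur;
        @prev = @cur;
    }
    return $out;
}
```

For a cross-reference stream, $columns would be the /Columns value from /DecodeParms, which normally equals the sum of the /W field widths.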
From: don.huettl@grantstreet.com
I have one more patch that does a little clean-up, attached.
Subject: PDF-API2-Predictor-pt2.patch
diff --git PDF-API2/lib/PDF/API2/Basic/PDF/Filter/Predictor.pm PDF-API2/lib/PDF/API2/Basic/PDF/Filter/Predictor.pm
index 7d2c388dcfc0..813951d0f6fc 100644
--- PDF-API2/lib/PDF/API2/Basic/PDF/Filter/Predictor.pm
+++ PDF-API2/lib/PDF/API2/Basic/PDF/Filter/Predictor.pm
@@ -22,8 +22,7 @@ sub new {
 sub outfilt {
     my ($self) = @_;
 
-    warn 'The "outfilt" method is not implemented';
-    return;
+    die 'The "outfilt" method is not implemented';
 }
 
 sub infilt {
@@ -44,7 +43,7 @@ sub infilt {
     } elsif ($predictor >= 10 && $predictor <= 15) {
         $self->_depredict_png;
     } else {
-        warn "Invalid predictor: $predictor";
+        die "Invalid predictor: $predictor";
     }
 
     return $obj->{' stream'};
@@ -133,7 +132,7 @@ sub _depredict_png {
 sub _depredict_tiff {
     my ($self) = @_;
 
-    warn "The TIFF predictor logic has not been implemented";
+    die "The TIFF predictor logic has not been implemented";
 }
 
 1;
diff --git PDF-API2/lib/PDF/API2/Resource/XObject/Image/PNG.pm PDF-API2/lib/PDF/API2/Resource/XObject/Image/PNG.pm
index bdf3356a9f8d..3fd5832cb675 100644
--- PDF-API2/lib/PDF/API2/Resource/XObject/Image/PNG.pm
+++ PDF-API2/lib/PDF/API2/Resource/XObject/Image/PNG.pm
@@ -8,6 +8,7 @@ use POSIX qw(ceil);
 
 use IO::File;
 use PDF::API2::Util;
+use PDF::API2::Basic::PDF::Filter::Predictor;
 use PDF::API2::Basic::PDF::Utils;
 
 no warnings qw[ deprecated recursion uninitialized ];
@@ -31,7 +32,10 @@ sub new {
     open($fh,$file);
     binmode($fh);
     seek($fh,8,0);
+    $self->{Length}=PDFNum(-s $file);
     $self->{' stream'}='';
+    $self->{' streamloc'}=0;
+    $self->{' streamsrc'}=$fh;
     $self->{' nofilt'}=1;
     while(!eof($fh)) {
         read($fh,$buf,4);
I've tried out the patch as found in the xref-streams branch on https://github.com/ssimms/pdfapi2 but I get this failure with a PDF compiled by XeLaTeX:

Objind 1 does not exist at index 46 at lib/PDF/API2/Basic/PDF/File.pm line 710.

Pull request for the xref-streams branch issued at https://github.com/ssimms/pdfapi2/pull/3
On Thu Dec 24 04:08:40 2015, MELMOTHX wrote:
> I've tried out the patch as found in the xref-streams branch on
> https://github.com/ssimms/pdfapi2 but I get this failure with a PDF
> compiled by XeLaTeX:
>
> Objind 1 does not exist at index 46 at lib/PDF/API2/Basic/PDF/File.pm
> line 710.
>
> Pull request for the xref-streams branch issued at
> https://github.com/ssimms/pdfapi2/pull/3
As stated in the third commit, trying it out against PDF::Cropmarks leads to some memory-hungry (possibly endless) recursion. I'm way out of my depth here, though. Anyway, I hope this helps.
I've merged the xref-streams branch after making a few fixes, including compatibility for older versions of Perl and a fix for the issue that MELMOTHX discovered with object streams. Many thanks! This will be included in the upcoming 2.026 release.
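For context on the object-stream handling mentioned here, the sketch below shows one way to index a decoded object stream using its /N and /First dictionary entries, as laid out in the PDF 1.7 specification (section 7.5.7). It is not the fix that was merged, and split_object_stream is a name invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split a decoded object stream (/Type /ObjStm).
#   $data  - decoded stream bytes
#   $n     - the stream's /N value (number of objects)
#   $first - the stream's /First value (byte offset of the first object)
# The header holds $n pairs of "object-number offset"; each offset is
# relative to $first, and offsets are expected in increasing order.
sub split_object_stream {
    my ($data, $n, $first) = @_;

    my @header = split ' ', substr($data, 0, $first);
    die "Object stream header is shorter than /N implies" if @header < 2 * $n;

    my @objects;
    for my $i (0 .. $n - 1) {
        my ($num, $offset) = @header[2 * $i, 2 * $i + 1];

        # An object ends where the next one starts, or at the end of the data.
        my $end = $i < $n - 1 ? $header[2 * $i + 3] : length($data) - $first;

        push @objects, {
            num  => $num,
            body => substr($data, $first + $offset, $end - $offset),
        };
    }
    return \@objects;
}
```

Reading the header with /First as its boundary avoids having to guess where the list of numbers stops and the first object begins.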

