This queue is for tickets about the Apache-Tika CPAN distribution.

Report information

The Basics
Id: 118433
Status: rejected
Priority: Low/Low
Queue: Apache-Tika

People
Owner: Nobody in particular
Requestors: yahavamsi [...] gmail.com
Cc:
AdminCc:

BugTracker
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: CommonsDigester calculates wrong hashes on large files
Date: Wed, 19 Oct 2016 14:35:37 +0300
To: bug-Apache-Tika@rt.cpan.org
From: Yahav Amsalem <yahavamsi@gmail.com>
Hi,

I would like to report the following bug:

When more than one algorithm is passed to the CommonsDigester constructor and a file larger than 7.5 MB is then digested, the hashes are calculated incorrectly for every algorithm except the first.

The following code reproduces the bug:

// The file used was a plain text file larger than 7.5 MB
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.utils.CommonsDigester;

File file = new File("c:\\testLargeFile.txt");

BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(file));

Metadata metadata = new Metadata();

// 20000000 is the mark limit in bytes; the remaining arguments are the digest algorithms
CommonsDigester digester = new CommonsDigester(20000000,
                CommonsDigester.DigestAlgorithm.MD5,
                CommonsDigester.DigestAlgorithm.SHA1,
                CommonsDigester.DigestAlgorithm.SHA256);

// The third argument is the ParseContext, which is not needed here
digester.digest(bufferedInputStream, metadata, null);

// Will print the correct MD5 but wrong SHA1 and SHA256 values
System.out.println(metadata);
 
Initial direction: from a little research, it seems that the underlying buffered stream is not reset to position 0 after the first algorithm has consumed it.
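
For illustration, the pattern I would expect is roughly the following sketch. This is not Tika's actual source; digestAll, toHex, and markLimit are made-up names, and the only assumptions are java.security.MessageDigest and a stream that supports mark/reset, as BufferedInputStream does:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MultiDigestSketch {

    // Hypothetical helper: hashes the same stream once per algorithm.
    static void digestAll(BufferedInputStream stream, int markLimit, String... algorithms)
            throws IOException, NoSuchAlgorithmException {
        for (String algorithm : algorithms) {
            stream.mark(markLimit); // remember the starting position
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            byte[] buffer = new byte[8192];
            int read;
            while ((read = stream.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            // Without this reset, every algorithm after the first sees an
            // exhausted stream, which matches the symptom described above.
            // reset() also fails if more than markLimit bytes were read
            // since mark().
            stream.reset();
            System.out.println(algorithm + ": " + toHex(digest.digest()));
        }
    }

    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}

Called as digestAll(bufferedInputStream, 20000000, "MD5", "SHA-1", "SHA-256"), each pass would then hash the file from position 0.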


If there are any further questions, I would be happy to provide more details.


Thanks,

Yahav Amsalem


Hi,

this bug tracker is for the Apache::Tika CPAN Perl module; you sent the bug report to the wrong bug tracker. Maybe https://issues.apache.org/jira/browse/DIGESTER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel is the right place.

Gerard Ribugent
Subject: Re: [rt.cpan.org #118433] CommonsDigester calculates wrong hashes on large files
Date: Thu, 20 Oct 2016 14:54:30 +0300
To: bug-Apache-Tika@rt.cpan.org
From: Yahav Amsalem <yahavamsi@gmail.com>

Hi,

I noticed that after I sent the mail but thought it might be of help to you too.

Anyway, thanks for the kind reply,

Yahav Amsalem


On Oct 20, 2016 at 2:00 PM, "Gerard via RT" <bug-Apache-Tika@rt.cpan.org> wrote:
<URL: https://rt.cpan.org/Ticket/Display.html?id=118433 >

Hi,

this bug tracker is for the Apache::Tika CPAN Perl module; you sent the bug report to the wrong bug tracker. Maybe https://issues.apache.org/jira/browse/DIGESTER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel is the right place.

Gerard Ribugent

On Wed Oct 19 07:35:51 2016, yahavamsi@gmail.com wrote:
> Hi,
>
> I would like to report the following bug:
>
> When more than one algorithm is passed to the CommonsDigester constructor
> and a file larger than 7.5 MB is then digested, the hashes are calculated
> incorrectly for every algorithm except the first.
>
> The following code reproduces the bug:
>
> // The file used was a plain text file larger than 7.5 MB
> File file = new File("c:\\testLargeFile.txt");
>
> BufferedInputStream bufferedInputStream = new BufferedInputStream(new
> FileInputStream(file));
>
> Metadata metadata = new Metadata();
>
> CommonsDigester digester = new CommonsDigester(20000000,
>                 CommonsDigester.DigestAlgorithm.MD5,
>                 CommonsDigester.DigestAlgorithm.SHA1,
>                 CommonsDigester.DigestAlgorithm.SHA256);
>
> digester.digest(bufferedInputStream, metadata, null);
>
> // Will print the correct MD5 but wrong SHA1 and SHA256 values
> System.out.println(metadata);
>
> Initial direction: from a little research, it seems that the underlying
> buffered stream is not reset to position 0 after the first algorithm has
> consumed it.
>
>
> If there are any further questions, I would be happy to provide more details.
>
>
> Thanks,
>
> Yahav Amsalem




