
Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the Apache-Tika CPAN distribution.

Report information
The Basics
Id: 118433
Status: rejected
Priority: 0/
Queue: Apache-Tika

People
Owner: Nobody in particular
Requestors: yahavamsi [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: CommonsDigester calculates wrong hashes on large files
Date: Wed, 19 Oct 2016 14:35:37 +0300
To: bug-Apache-Tika [...] rt.cpan.org
From: Yahav Amsalem <yahavamsi [...] gmail.com>

Hi,

I would like to report the following bug:

When more than one algorithm is passed to the CommonsDigester constructor and a file larger than 7.5 MB is then digested, the hashes are calculated incorrectly for every algorithm except the first.

The following code reproduces the bug:

    import java.io.BufferedInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.utils.CommonsDigester;

    // The file used was a simple plain-text file larger than 7.5 MB.
    File file = new File("c:\\testLargeFile.txt");
    Metadata metadata = new Metadata();
    CommonsDigester digester = new CommonsDigester(20000000,
            CommonsDigester.DigestAlgorithm.MD5,
            CommonsDigester.DigestAlgorithm.SHA1,
            CommonsDigester.DigestAlgorithm.SHA256);

    // try-with-resources closes the file stream once digesting is done.
    try (BufferedInputStream bufferedInputStream =
            new BufferedInputStream(new FileInputStream(file))) {
        digester.digest(bufferedInputStream, metadata, null);
    }

    // Prints the correct MD5 but wrong SHA1 and SHA256 values.
    System.out.println(metadata);

Initial direction: from a little research, it seems that the inner buffered stream being used is not reset to position 0 after the first algorithm has run.

If you have any further questions, I would be happy to provide more details.

Thanks,
Yahav Amsalem
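Whether the inner stream really fails to reset is the reporter's hypothesis and is not confirmed here. One way to sidestep the question entirely is to read the stream only once and feed every chunk to all digests at the same time, so no mark/reset is ever needed. The sketch below uses only the JDK's java.security.MessageDigest with the standard JCA names "MD5", "SHA-1" and "SHA-256"; the class and method names (SinglePassDigests, digestAll) are hypothetical and are not part of Tika's CommonsDigester API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Tika API: computes several digests in one pass.
public class SinglePassDigests {

    // Reads the stream exactly once and updates every digest with each chunk,
    // so the stream never has to be marked or reset between algorithms.
    public static Map<String, String> digestAll(InputStream in, String... algorithms)
            throws IOException, NoSuchAlgorithmException {
        Map<String, MessageDigest> digests = new LinkedHashMap<>();
        for (String alg : algorithms) {
            digests.put(alg, MessageDigest.getInstance(alg));
        }
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            for (MessageDigest md : digests.values()) {
                md.update(buffer, 0, n);
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, MessageDigest> e : digests.entrySet()) {
            result.put(e.getKey(), toHex(e.getValue().digest()));
        }
        return result;
    }

    // Lower-case hex encoding of a digest value.
    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // 8 MB of synthetic data stands in for the >7.5 MB test file.
        byte[] data = new byte[8 * 1024 * 1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }
        Map<String, String> hashes = digestAll(
                new ByteArrayInputStream(data), "MD5", "SHA-1", "SHA-256");
        hashes.forEach((alg, hex) -> System.out.println(alg + "=" + hex));
    }
}
```

Because the input is consumed once, memory use is bounded by the 8 KB buffer regardless of file size, and no mark limit has to be chosen in advance.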
Hi,

This bug tracker is for the Apache::Tika CPAN Perl module; you sent the bug report to the wrong bug tracker. Maybe https://issues.apache.org/jira/browse/DIGESTER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel is the right place.

Gerard Ribugent

On Wed Oct 19 07:35:51 2016, yahavamsi@gmail.com wrote:
Subject: Re: [rt.cpan.org #118433] CommonsDigester calculates wrong hashes on large files
Date: Thu, 20 Oct 2016 14:54:30 +0300
To: bug-Apache-Tika [...] rt.cpan.org
From: Yahav Amsalem <yahavamsi [...] gmail.com>

Hi,

I noticed that after I sent the mail, but thought it might be of help to you too.

Anyway, thanks for the kind reply,
Yahav Amsalem

On 20 Oct 2016 2:00 PM, "Gerard via RT" <bug-Apache-Tika@rt.cpan.org> wrote:


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.