
This queue is for tickets about the Domain-PublicSuffix CPAN distribution.

Report information
The Basics
Id: 99490
Status: open
Priority: 0/
Queue: Domain-PublicSuffix

People
Owner: Nobody in particular
Requestors: baldwin [...] panix.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: amazonaws.com hosts fail to yield proper domain
Date: Tue, 14 Oct 2014 16:54:53 -0400
To: bug-Domain-PublicSuffix [...] rt.cpan.org
From: "J.D. Baldwin" <baldwin [...] panix.com>
Hi. I am running your package Domain::PublicSuffix on a Linux host with
Perl 5.20.0:

    $ uname -a
    Linux hostname 2.6.18-194.11.3.el5 #1 SMP Mon Aug 23 15:51:38 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
    amrndhl219 : ~/ti/projects/20141014_track_domains $ perl -v
    This is perl 5, version 20, subversion 0 (v5.20.0) built for x86_64-linux-thread-multi-ld
    ...

I have a version of effective_tld_names.dat I downloaded today. It
contains some special lines for amazonaws.com hosts. In part:

    // Amazon S3 : https://aws.amazon.com/s3/
    // Submitted by Courtney Eckhardt <coec@amazon.com> 2013-03-22
    s3.amazonaws.com
    s3-us-west-2.amazonaws.com
    s3-us-west-1.amazonaws.com
    s3-eu-west-1.amazonaws.com
    s3-ap-southeast-1.amazonaws.com
    s3-ap-southeast-2.amazonaws.com
    s3-ap-northeast-1.amazonaws.com
    s3-sa-east-1.amazonaws.com
    s3-us-gov-west-1.amazonaws.com
    s3-fips-us-gov-west-1.amazonaws.com
    s3-website-us-east-1.amazonaws.com
    s3-website-us-west-2.amazonaws.com
    s3-website-us-west-1.amazonaws.com
    s3-website-eu-west-1.amazonaws.com
    s3-website-ap-southeast-1.amazonaws.com
    s3-website-ap-southeast-2.amazonaws.com
    s3-website-ap-northeast-1.amazonaws.com
    s3-website-sa-east-1.amazonaws.com
    s3-website-us-gov-west-1.amazonaws.com

I wrote a test program to try to get the root domain of
s3.amazonaws.com, which should have returned amazonaws.com, but instead
throws an error:

    $ cat test.pl
    #!/opt/gnu/bin/perl5.20.0

    use strict;
    use warnings;

    use Domain::PublicSuffix;

    my $root_dom = Domain::PublicSuffix->new( {
        'data_file'               => 'effective_tld_names.dat',
        'domain_allow_underscore' => 1
    } ) or die "Cannot create Domain::PublicSuffix object: $!";

    print $root_dom->get_root_domain( 's3.amazonaws.com' ), "\n";
    print "ERROR: " . $root_dom->error(), "\n";

    $ ./test.pl
    ERROR: Domain not valid

Although I am mystified as to why s3.amazonaws.com should be considered
a "domain," apparently that is the intent of the
effective_tld_names.dat file. But it shouldn't be an error to submit
it. Seems like a bug to me.

I'd appreciate your taking a look at it. Thanks.

jd
On Tue Oct 14 16:55:07 2014, baldwin@panix.com wrote:
> Although I am mystified as to why s3.amazonaws.com should be
> considered a "domain," apparently that is the intent of the
> effective_tld_names.dat file. But it shouldn't be an error to submit
> it. Seems like a bug to me. I'd appreciate your taking a look at it.
> Thanks.

This is one of those cases where Domain::PublicSuffix is "working as
designed", but the source data is... making some odd choices. The
effective_tld_names file lists all of the potential top level domains,
that is, the public suffixes. Since someone submitted
's3.amazonaws.com' as a suffix, 's3.amazonaws.com' on its own is an
invalid domain name: the module expects an additional subdomain
component at the beginning. So, in the case of 'foo.s3.amazonaws.com',
the suffix is 's3.amazonaws.com' and the root domain is
'foo.s3.amazonaws.com'. It seems silly, but that's what Mozilla
apparently allowed.

Maybe an 'ignore' parameter is in order. ;)

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Test::More;
    use Domain::PublicSuffix;

    ok( my $dps = Domain::PublicSuffix->new({ domain_allow_underscore => 1 }) );

    is( $dps->get_root_domain('s3.amazonaws.com'), undef, 's3 invalid' );
    is( $dps->get_root_domain('foo.s3.amazonaws.com'), 'foo.s3.amazonaws.com', 'foo.s3 valid' );
    is( $dps->suffix(), 's3.amazonaws.com', 'foo.s3 suffix is s3' );

    done_testing();

    1;
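The matching rule described in this reply can be illustrated without the module. The following is a minimal sketch, not Domain::PublicSuffix's actual implementation: the helper `root_domain` and the two-entry `%suffixes` table are hypothetical, and the wildcard and exception rules of the real list are ignored. It assumes the root domain is the longest listed suffix plus exactly one more label, so a host that is itself a listed suffix has no root domain.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of Domain::PublicSuffix): returns
# (root_domain, suffix) for $host, or an empty list when $host has no
# label in front of its longest listed suffix.
sub root_domain {
    my ($host, $suffixes) = @_;
    my @labels = split /\./, lc $host;

    # Candidates run from the full host name down to the last label,
    # so the first hit is the longest matching suffix.
    for my $i (0 .. $#labels) {
        my $candidate = join '.', @labels[$i .. $#labels];
        next unless $suffixes->{$candidate};

        # The host is itself a suffix: there is nothing to register.
        return if $i == 0;

        # Root domain = one extra label + the suffix.
        return (join('.', @labels[$i - 1 .. $#labels]), $candidate);
    }
    return;    # no listed suffix matched
}

# Toy suffix table; the real list is far larger.
my %suffixes = map { $_ => 1 } qw(com s3.amazonaws.com);

for my $host (qw(s3.amazonaws.com foo.s3.amazonaws.com www.amazonaws.com)) {
    my ($root, $suffix) = root_domain($host, \%suffixes);
    printf "%-22s %s\n", $host,
        defined $root ? "root=$root suffix=$suffix" : 'not valid';
}
```

Under these assumptions, 'www.amazonaws.com' still resolves to 'amazonaws.com' through the plain 'com' rule. Only hosts at or below a listed S3 suffix are affected.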
Subject: Re: [rt.cpan.org #99490] amazonaws.com hosts fail to yield proper domain
Date: Tue, 14 Oct 2014 22:06:03 -0400
To: Nicholas Melnick via RT <bug-Domain-PublicSuffix [...] rt.cpan.org>
From: "J.D. Baldwin" <baldwin [...] panix.com>
Nice. Thanks for the help. I figured it was something odd like that.

jd

On Tue, Oct 14, 2014 at 09:30:44PM -0400, Nicholas Melnick via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=99490 >
On Tue Oct 14 22:06:15 2014, baldwin@panix.com wrote:
> Nice. Thanks for the help. I figured it was something odd like that.

Just to clarify -- does this all sound ok, or is there some room here for improvement? (Or, in other words, can I close this out?)
Subject: Re: [rt.cpan.org #99490] amazonaws.com hosts fail to yield proper domain
Date: Wed, 22 Oct 2014 18:55:38 -0400
To: Nicholas Melnick via RT <bug-Domain-PublicSuffix [...] rt.cpan.org>
From: "J.D. Baldwin" <baldwin [...] panix.com>
> <URL: https://rt.cpan.org/Ticket/Display.html?id=99490 >
>
> On Tue Oct 14 22:06:15 2014, baldwin@panix.com wrote:
> > Nice. Thanks for the help. I figured it was something odd like that.
>
> Just to clarify -- does this all sound ok, or is there some room here
> for improvement?

The "room for improvement" might be some kind of design element that
permits a module user to specify "domains to ignore." Maybe with string
*or* regex matching, e.g.,

    my $suffix = Domain::PublicSuffix->new( {
        'data_file' => '/tmp/effective_tld_names.dat',
        'ignore'    => [
            '/amazonaws\.com$/',
            '/otherodddomain\.com$/',
            '/yougettheidea\.com$/',
            '/\.thisonerequiresanotherlevel\.com$/'
        ]
    } );

But that's maybe more of an overhaul than you are interested in doing.

jd
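Until something like the proposed 'ignore' option exists, the same effect could be approximated by filtering the suffix rules before they are loaded. The sketch below shows only that filtering step. The helper `filter_suffixes` is hypothetical, and it assumes ignore entries are either compiled regexes (`qr//`) or exact strings, rather than the '/.../' string form suggested above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the suffix rules that match none of the
# ignore entries. A qr// entry is matched as a regex; any other entry
# is compared as an exact string.
sub filter_suffixes {
    my ($rules, $ignore) = @_;
    return grep {
        my $rule = $_;
        !grep { ref $_ eq 'Regexp' ? $rule =~ $_ : $rule eq $_ } @$ignore;
    } @$rules;
}

# Toy rule list standing in for the lines of effective_tld_names.dat.
my @rules  = qw(com co.uk s3.amazonaws.com s3-us-west-2.amazonaws.com blogspot.com);
my @ignore = (qr/amazonaws\.com$/, 'blogspot.com');

my @kept = filter_suffixes(\@rules, \@ignore);
print join(', ', @kept), "\n";
```

One possible use, again as an assumption rather than a documented workflow: write the surviving rules to a temporary file and pass its path through the existing 'data_file' option, so that 's3.amazonaws.com' falls back to the plain 'com' rule.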

