This queue is for tickets about the WordNet-Similarity CPAN distribution.

Report information
The Basics
Id: 86444
Status: open
Priority: 0/
Queue: WordNet-Similarity

People
Owner: Nobody in particular
Requestors: TPEDERSE [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: possible bug in hso, strong matching of compounds
This was reported by Hideki Shima of CMU.

-----------------------------------------------------
(4) HSO: strong match with compound words
-----------------------------------------------------

According to the definition from the paper by Hirst and St-Onge, "any link between two synsets if one word is a compound word or phrase that includes the other word" is a "strong relation" (score of 16).

For example, two synsets 01124794 (n) and 01125562 (n) have a hypernym/hyponym link between them, and words associated with these synsets are compound (government <--> misgovernment). So following the definition, I think there is a "strong relation" between the two synsets.

Now, using word-pos-sensenumber notation, the synset 01124794 (n) can be represented as government#n#2 etc., and the other synset 01125562 (n) can be represented in two ways: "misgovernment#n#1" and "misrule#n#1" (using WordNet 3.0).

WordNet::Similarity gives different results for different wps of the same synset:

The relatedness of government#n#2 and misgovernment#n#1 using hso is 16.
The relatedness of government#n#2 and misrule#n#1 using hso is 4.

I was wondering if line 329 in hso.pm:

    if($word1 =~ /$word2/ || $word2 =~ /$word1/) {

should ideally be a comparison between all words associated with the synsets, rather than only the words from the wps notation.

Below are some more examples.

protocol#n#1    tcp/ip#n#1 (= transmission_control_protocol/internet_protocol#n#1)
company#n#1     ltd.#n#1 (= limited_company#n#1)
cell_phone#v#1  call#v#3 (= phone#v#1)

This phenomenon is also very rare and has not been observed in 10k randomly generated noun-noun pairs of synsets.
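The difference between the two comparisons can be illustrated with a small, language-neutral sketch. It is written in Python, not Perl, and is not the code in hso.pm. The function names `words_overlap` and `synsets_strongly_related` are hypothetical, plain substring containment stands in for the regex match quoted above, and the word lists contain only the words named in this report (the full synsets may hold more). The sketch covers only the word comparison, not the link check or the scoring.

```python
from itertools import product


def words_overlap(word1: str, word2: str) -> bool:
    """Return True if either word contains the other as a substring.

    Assumption: plain containment approximates the check quoted from
    line 329 of hso.pm, which uses a regex match instead.
    """
    return word1 in word2 or word2 in word1


def synsets_strongly_related(words1, words2) -> bool:
    """Compare every word of one synset with every word of the other."""
    return any(words_overlap(a, b) for a, b in product(words1, words2))


# Only the words mentioned in the report; the real synsets may be larger.
government = ["government"]
misgovernment = ["misgovernment", "misrule"]

# Word-level check, as done when the query is government#n#2 vs misrule#n#1.
print(words_overlap("government", "misrule"))  # False

# Synset-level check: the same answer for every wps of the synset.
print(synsets_strongly_related(government, misgovernment))  # True
```

With the synset-level comparison the result no longer depends on which word of the synset was used in the query, which is the behaviour the report asks about.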
This problem has been documented in the TODO list of WordNet-Similarity 2.07. Patches are welcome :)