This queue is for tickets about the XML-Checker CPAN distribution.

Report information
The Basics
Id: 1239
Status: open
Priority: 0/
Queue: XML-Checker

People
Owner: Nobody in particular
Requestors: millsb [...] logica.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.12
Fixed in: (no value)

Attachments


Subject: Checking fails for an entity with 62 or more members
Download (untitled) / with headers
text/plain 1.1k
Distribution: XML-Checker-0.12
Perl version: 5.005_03
Operating system: Linux 2.2.14-5.0 (RedHat)

Bug details:
If a DTD has an element with 62 or more members and one or more of these members is optional or is permitted to occur multiple times, validation of this element always fails with error message 154, "Bad order of Elements".

This appears to be related to the way in which the element is tokenised (sub _tokenize): if there are 62 or more members, the tokens are hex strings (for fewer than 62, a single character from $IDS is used). The regular expression generated from these tokens (sub setModel) doesn't appear to be correctly formed for tokens that are longer than a single character.

E.g. if I have an element with 62 optional members:

    <!ELEMENT FRED (MEMBER1?, MEMBER2?, MEMBER3?, ... MEMBER62?) >

then the regular expression generated looks like:

    (01?02?03?04?...3d?3e?)

whereas it should, I think, be:

    ((01)?(02)?(03)?(04)?...(3d)?(3e)?)

The attached patch file seems to fix this problem for me, though I'm not confident that I really understand the code well enough to be certain it's right.

Regards
Brian Mills.
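The quantifier behaviour described in the report can be reproduced outside XML::Checker. The following is a minimal sketch, not the module's code: build_model is a hypothetical helper, and the two-character hex tokens (01, 02, 03) are assumed from the regular expressions quoted in the report. It builds the model both without and with parentheses around each token and matches an input in which only the second member is present.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of XML::Checker): joins tokens into a
# sequence model in which every member is optional. With $wrap set, each
# token is parenthesised before the '?' quantifier is appended.
sub build_model {
    my ($wrap, @tokens) = @_;
    my $body = join '', map { $wrap ? "(${_})?" : "${_}?" } @tokens;
    return "($body)";
}

# Assumed token encoding: two hex digits per member, starting at 01.
my @tokens = map { sprintf '%02x', $_ } 1 .. 3;

my $unwrapped = build_model(0, @tokens);   # (01?02?03?)
my $wrapped   = build_model(1, @tokens);   # ((01)?(02)?(03)?)

# Only the second member is present, so the encoded child list is "02".
my $input = '02';

for my $model ($unwrapped, $wrapped) {
    my $result = $input =~ /\A$model\z/ ? 'matches' : 'fails';
    print "$model $result\n";
}
```

In the unwrapped form each `?` applies only to the second character of a token, so the pattern still demands a literal `0` for every member and the input `02` is rejected. With each token parenthesised, the whole token becomes optional and the same input is accepted.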
Download C:\tmp\Checker.diff
text/x-diff 1011b
--- Checker.pm	2002-04-23 00:36:42.000000000 +0100
+++ Checker.pm.1	2002-07-05 14:06:58.000000000 +0100
@@ -10,6 +10,14 @@
 # - Implied handler?
 # - Notation, Entity, Unparsed checks, Default handler?
 # - check no root element (it's checked by expat) ?
+#
+# *****
+# Patched by B Mills, 05/07/2002:
+# sub setModel: Generation of regexp goes wrong if an element has more than 62 members and any
+# of these has cardinality other than 1:
+# Parentheses are required around each re token, because the tokens are encoded
+# as character pairs if there's 62 or more of them.
+# *****

 package XML::Checker::Term;
 use strict;
@@ -441,7 +449,7 @@
     # cp := ( name | choice | seq ) ('?' | '*' | '+')?
     $n++ while s/<[ncs](\d+)>([?*+]?)/_add
         (C => 'a', N => $_n++,
-         S => ($_map{$1}->re . $2))/eg;
+         S => ('('. $_map{$1}->re .')'. $2))/eg;

     # choice := '(' ch_l ')'
     $n++ while s/\(\s*<[ad](\d+)>\s*\)/_add
Download (untitled) / with headers
text/plain 432b
I applied the patch, but got an error with t/chk_batch.t:

    t/chk_batch.........FAILED tests 2, 16, 30, 44
    Failed 4/56 tests, 92.86% okay

This test worked before I applied the patch, so either the patch broke something or the test is flawed. If you can resubmit a corrected patch, or a patch for the test, I will apply it.

Also, you may want to look into XML::LibXML; it has much better support for DTD validation.

Thanks

