Skip Menu |
 

This queue is for tickets about the Bio-SamTools CPAN distribution.

Report information
The Basics
Id: 66387
Status: new
Priority: 0/
Queue: Bio-SamTools

People
Owner: Nobody in particular
Requestors: Yi-Shiou.Chen [...] accelrys.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Need to process CIGAR N in dna function
Date: Fri, 4 Mar 2011 15:28:53 -0800
To: "bug-Bio-SamTools [...] rt.cpan.org" <bug-Bio-SamTools [...] rt.cpan.org>
From: Yi-Shiou Chen <Yi-Shiou.Chen [...] accelrys.com>
Download (untitled) / with headers
text/plain 3.3k
According to samtools's calmd function, the MD tag of BAM doesn't carry base information for skipping segments (i.e. N in CIGAR). For example, a read with 40M100N35N in CIGAR can has 75 in MD, without any information about the 100N. The dna function of Bio::DB::Bam::AlignWrapper doesn't consider this case and returns wrong reference sequences for reads with N in CIGAR. Propose to fix the function as follows. Thanks, Yi-Shiou Chen Accelrys sub dna { my $self = shift; my $sam = $self->{sam}; if (my $md = $self->get_tag_values('MD')) { # try to use MD string my $qseq = $self->qseq; #preprocess qseq using cigar array my $cigar = $self->cigar_array; my $seq = ''; my @Npos = (); my @Ncount = (); for my $op (@$cigar) { my ($operation,$count) = @$op; if ($operation eq 'M') { $seq .= substr($qseq,0,$count,''); # include these residues } elsif ($operation eq 'N') { # record "N" positions push @Npos, length($seq); # 0-based push @Ncount, $count; my $ns = $self->start + length($seq); # insert bases in "N" segment my $nseq; if(defined $self->{sam}) { $nseq = $self->{sam}->seq($self->seq_id, $ns, $ns + $count - 1); } else{ $nseq = 'N' x $count; } $seq .= $nseq; } elsif ($operation eq 'S' or $operation eq 'I') { substr($qseq,0,$count,''); # skip soft clipped and inserted residues } } my $start = 0; my $result; while ($md =~ /(\d+)|\^([gatcn]+)|([gatcn]+)/ig) { if (defined $1) { my $c = $1; while($c > 0) { if(@Npos == 0) { $result .= substr($seq,$start,$c); $start += $c; last; } # may cross N segments, need to copy piece by piece if(($start + $c) > $Npos[0]) { my $toAdd = $Npos[0] - $start; $result .= substr($seq, $start, $toAdd).substr($seq, $Npos[0], $Ncount[0]); $start += $toAdd + $Ncount[0]; $c -= $toAdd; shift @Npos; shift @Ncount; } else { $result .= substr($seq, $start, $c); $start += $c; last; } } } elsif ($2) { $result .= $2; } elsif ($3) { $result .= $3; $start += length $3; # insert N segments while(@Npos > 0 and $start >= $Npos[0]) { my $offset = $start - $Npos[0]; $result = substr($result, 0, length($result) - $offset).substr($seq, $Npos[0], $Ncount[0]).substr($result, length($result) - $offset); $start += $Ncount[0]; shift @Npos; shift @Ncount; } } } return $result; } else { return $self->{sam}->seq($self->seq_id,$self->start,$self->end); } }
Download (untitled) / with headers
text/html 18.1k

Message body is not shown because it is too large.



This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.