Liyang Diao
2017-03-24 03:24:33 UTC
Dear all,
I have a large number of BAM files, where the number of reference "genomes"
is very large, about 1M (bacterial marker gene alignments). A small
fraction of these genomes is poorly named, resulting in the following error
when I run mpileup:
Could not parse the header line: "##contig=<ID=BADNAMES>"
Since this was a small fraction of the references and I am only interested
in a preliminary exploratory analysis, I went ahead and looked into the
VCFs that were generated, assuming (wrongly?) that alignments to these
areas would simply be ignored.
What I found, however, was that the variants called are incorrect--for
example, I have high-confidence SNPs found in regions of zero coverage.
So I am wondering if there is an easy workaround to this problem, or if I
will have to perform realignments of the data, removing or renaming the
culprit references.
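
As a starting point for the renaming route, the culprit names could be
located with something like the sketch below. It is only an illustration
under stated assumptions: it treats a name as "bad" if it falls outside the
character set that the current SAM specification allows for reference
sequence names, which may or may not be the exact rule that triggers the
header error above. The function names find_bad_names and sanitize, and the
underscore replacement rule, are hypothetical.

```python
import re

# Assumption: a reference name is acceptable if it matches the pattern the
# current SAM specification gives for reference sequence names.
VALID_NAME = re.compile(
    r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"
)


def find_bad_names(header_text):
    """Return the @SQ names in SAM header text that do not match VALID_NAME."""
    bad = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs; SN holds the name.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                name = field[3:]
                if VALID_NAME.fullmatch(name) is None:
                    bad.append(name)
    return bad


def sanitize(name):
    """Hypothetical renaming rule: replace each disallowed character with '_'."""
    cleaned = re.sub(r"[^0-9A-Za-z!#$%&*+./:;=?@^_|~-]", "_", name)
    # '*' and '=' are allowed inside a name but not as its first character.
    if cleaned[:1] in ("*", "="):
        cleaned = "_" + cleaned[1:]
    return cleaned
```

The header text could come from "samtools view -H", and a corrected header
could then be applied with "samtools reheader" instead of realigning. I have
not verified that this alone fixes the miscalled variants, and the reference
FASTA would need the same renamed sequences.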
I found that, for some reason, using the -r POSITION flag in mpileup
appears to give reasonable results, whereas -l produces the same bad
results as before. However, from searching this help archive I found that
-r cannot accept multiple positions from a file.
Any help would be greatly appreciated!
Thanks