Discussion:
[Samtools-help] Limit number of bam files Samtools1.3
h***@ecodev.vic.gov.au
2016-11-28 00:16:55 UTC
Dear All,

I have a large set of bovine samples (~2700 BAM files, about 10x coverage
on average) that I want to run mpileup on. I am getting an error that
seems to indicate a hard-coded limit on the number of BAM files in
Samtools.

I am running Samtools 1.3.1 with the following commands:
module load samtools/1.3.1
module load bcftools/1.3.1

samtools mpileup -r Chr2:10000-20000 -ugf reference.fa -b bams.txt -t AD,INFO/AD,ADF,ADR,SP | bcftools call -mv - > 1000Sam1.3.test.vcf

I get this error:
[mpileup] fail to load index for XXX.bam
Failed to open -: unknown file type

I regenerated the XXX.bam.bai file (with Samtools 1.3.1), reran, and
still get the same error. I then removed the first BAM file in
"bams.txt", and the error moved to the next BAM file in the list after
XXX.bam; let's call it YYY.bam.
[mpileup] fail to load index for YYY.bam
Failed to open -: unknown file type

This seems to indicate that XXX.bam.bai is OK, but that I hit another
cryptic error in Samtools. Possibly a limit on the number of BAM files?

XXX.bam is the 1995th file, and YYY.bam is the 1996th. When YYY.bam
becomes the 1995th file (because I removed an earlier BAM file from the
list), the error skips past XXX.bam and is reported at YYY.bam.

Any help would be greatly appreciated.

Cheers,

Hans

PS: There is already a 2012 post on this list about a limit on the
number of BAM files. It relates to file name length, and I have checked
that this is not the issue in my case.

-----------------------------------------
Dr. Hans Daetwyler | Research Leader Computational Biology
Biosciences Research | Agriculture Victoria | DEDJTR
Senior Research Fellow | Applied Systems Biology | La Trobe University
AgriBio Centre, 5 Ring Rd., Bundoora 3083, Victoria
T: 03 9032 7037 | E: ***@ecodev.vic.gov.au

********************************************************************************
Department of Economic Development, Jobs, Transport and Resources, Government of
Victoria, Victoria, Australia.

This email, and any attachments, may contain privileged and confidential
information. If you are not the intended recipient, you may not distribute or
reproduce this e-mail or the attachments. If you have received this message in
error, please notify us by return email.
********************************************************************************
James Bonfield
2016-11-30 17:51:33 UTC
Post by h***@ecodev.vic.gov.au
XXX.bam is the 1995th file, and YYY.bam is the 1996th. When YYY.bam
becomes the 1995th (because I removed an earlier bam file in the list)
file the error passes XXX.bam and prints at YYY.bam.
Try the Linux "ulimit -a" command to see if this is an OS-imposed
limitation.

Hopefully "ulimit -n" will let you adjust the number of open
files.
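A minimal bash sketch of that check, not taken from the thread: it compares the number of paths in bams.txt with the soft and hard open-file limits and raises the soft limit when the hard limit allows. It assumes roughly one open descriptor per BAM; the HEADROOM value and the function names are hypothetical.

```shell
# Minimal sketch (bash): check the open-file limit against the number of BAMs
# in a list file and raise the soft limit if the hard limit allows it.
# Assumption: about one descriptor per BAM; HEADROOM is a guessed allowance
# for stdin/stdout/stderr, the reference FASTA and similar.
HEADROOM=64

count_bams() {
  # Count non-empty lines (one BAM path per line); prints 0 for an empty file.
  grep -c . "$1" || true
}

ensure_nofile_limit() {
  local list=$1 n need soft hard
  [ -r "$list" ] || { echo "cannot read $list" >&2; return 1; }
  n=$(count_bams "$list")
  need=$((n + HEADROOM))
  soft=$(ulimit -Sn)
  hard=$(ulimit -Hn)
  echo "BAMs: $n  wanted: $need  soft: $soft  hard: $hard" >&2
  # Already high enough: nothing to change.
  if [ "$soft" = unlimited ] || [ "$soft" -ge "$need" ]; then
    return 0
  fi
  # Raising the soft limit up to the hard limit needs no privileges.
  if [ "$hard" = unlimited ] || [ "$hard" -ge "$need" ]; then
    ulimit -Sn "$need"
  else
    echo "hard limit $hard is below $need: ask an admin, or merge BAMs first" >&2
    return 1
  fi
}
```

Because ulimit is a shell builtin, call `ensure_nofile_limit bams.txt` directly (not inside `$(...)`) in the same shell or job script that then runs samtools mpileup; the new limit applies only to that shell and the processes it starts.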

If you're unable to raise it, about the only course of action left is
to merge groups of, say, 100 files into intermediate files first. (I'm
not sure whether this has any impact on mpileup & bcftools results,
though.)
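A rough bash sketch of that fallback, assuming bash 4 or later, GNU split, and that every input BAM carries its own @RG line with an SM tag so the samples stay separate after merging (samtools merge should carry the inputs' @RG header lines into the output; check the result with `samtools view -H`). The chunk_ and merged_ file names and the function name are hypothetical.

```shell
# Sketch of the "merge in groups" fallback (bash 4+, GNU split, samtools on PATH).
# Assumes every input BAM has its own @RG line with an SM tag, so samples remain
# distinguishable after merging. The chunk_/merged_ names are hypothetical.
merge_in_chunks() {
  local list=$1 size=${2:-100} out_list=${3:-merged_bams.txt}
  local part id files
  : > "$out_list"                                 # start a fresh list of merged BAMs
  split -l "$size" -d -a 4 "$list" chunk_         # chunk_0000, chunk_0001, ...
  for part in chunk_*; do
    id=${part#chunk_}
    mapfile -t files < <(grep . "$part")          # non-empty paths, one per element
    samtools merge -f "merged_$id.bam" "${files[@]}" || return 1
    samtools index "merged_$id.bam" || return 1   # mpileup -r needs an index
    echo "merged_$id.bam" >> "$out_list"
  done
}
```

For example, run `merge_in_chunks bams.txt 100 merged_bams.txt` and then give mpileup `-b merged_bams.txt`. Run it in an otherwise empty working directory, because leftover chunk_* files from an earlier run would be picked up by the glob. As noted above, whether this changes the mpileup and bcftools results is untested.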

James
--
James Bonfield (***@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
| Plurima gyrabant gymbolitare vabo;
A Staden Package developer: | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi.
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

h***@ecodev.vic.gov.au
2016-11-30 21:16:34 UTC
Dear James and Martin,

Thank you for the pointers. GATK UnifiedGenotyper (UG) gave me a clear
"Too many open files" error on the same set of files. Once I increased
the limit using "ulimit -n", everything worked fine.

I can confirm that "too many open files" was the issue.

Cheers,

Hans

Martin MOKREJŠ
2016-11-30 19:46:35 UTC
Hi Hans,
Post by h***@ecodev.vic.gov.au
Dear All,
I have a large set of bovine samples (~2700 bam files, on average about 10x coverage) that I want to run mpileup on. I am getting an error that seems to indicate that there is a hard coded limit of bam files in Samtools.
module load samtools/1.3.1
module load bcftools/1.3.1
samtools mpileup -r Chr2:10000-20000 -ugf reference.fa -b bams.txt -t AD,INFO/AD,ADF,ADR,SP | bcftools call -mv -> 1000Sam1.3.test.vcf
[mpileup] fail to load index for XXX.bam
Failed to open -: unknown file type
Sounds like a command-line argument parsing problem.
Post by h***@ecodev.vic.gov.au
I have gone and regenerated the XXX.bam.bai file (with Samtools1.3.1), and rerun and I still get the same error. I have then removed the first bam file in "bams.txt" and then the error moves to the next bam file in the list after XXX.bam, lets call it YYY.bam.
[mpileup] fail to load index for YYY.bam
Failed to open -: unknown file type
This seems to indicate that XXX.bam.bai is OK, but that I hit another cryptic error in Samtools. Possibly a limit of number of bams?
XXX.bam is the 1995th file, and YYY.bam is the 1996th. When YYY.bam becomes the 1995th (because I removed an earlier bam file in the list) file the error passes XXX.bam and prints at YYY.bam.
James suggested changing the limits with the ulimit command (beware: bash has a builtin ulimit, so /usr/bin/ulimit is not called at all), but IMHO this does not sound like a per-process file-descriptor limit.

Instead, I suggest splitting the command and running only the mpileup step:

samtools mpileup -r Chr2:10000-20000 -ugf reference.fa -b bams.txt -t AD,INFO/AD,ADF,ADR,SP > /tmp/1000Sam1.3.test.pileup

and, in addition, observing the process with strace:

strace -v -f -a 256 samtools mpileup -r Chr2:10000-20000 -ugf reference.fa -b bams.txt -t AD,INFO/AD,ADF,ADR,SP > /tmp/1000Sam1.3.test.pileup 2>/tmp/1000Sam1.3.test.stderr
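A narrower variant of the strace run, as a sketch assuming Linux with strace installed: -e trace=file limits the trace to file-related system calls, -o writes it to its own log instead of STDERR, and counting EMFILE ("Too many open files") failures shows whether the descriptor limit is being hit. The function names and log path are hypothetical.

```shell
# Sketch (Linux, strace installed): log file-related system calls of the whole
# process tree (-f) to a separate file with -o, so they are not mixed with
# samtools' own messages on STDERR, then count calls that failed with EMFILE
# ("Too many open files"), the error a per-process descriptor limit produces.
trace_file_calls() {
  local log=$1; shift
  strace -f -e trace=file -o "$log" "$@"
}

count_emfile() {
  # Prints the number of traced calls that returned EMFILE (0 if none).
  grep -c 'EMFILE' "$1" || true
}
```

For example, `trace_file_calls /tmp/mpileup.strace samtools mpileup -r Chr2:10000-20000 -ugf reference.fa -b bams.txt -t AD,INFO/AD,ADF,ADR,SP > /tmp/1000Sam1.3.test.pileup` followed by `count_emfile /tmp/mpileup.strace`; a non-zero count points to the per-process limit rather than to argument parsing.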

I assume the bcftools part has nothing to do with the error.
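The second error line, "Failed to open -: unknown file type", most likely comes from bcftools finding no readable input on stdin ("-") once mpileup had stopped, which fits the assumption that bcftools is not the cause. A bash alternative to splitting the command is to keep the pipeline and read PIPESTATUS right after it; this sketch assumes bash and reuses the file names from the original command, and the function name is hypothetical.

```shell
# Sketch (bash): run the original pipeline unchanged and report each stage's exit code.
# PIPESTATUS holds one status per command of the pipeline that just finished;
# it must be copied in the very next statement, before anything overwrites it.
run_pipeline() {
  samtools mpileup -r Chr2:10000-20000 -ugf reference.fa -b bams.txt \
      -t AD,INFO/AD,ADF,ADR,SP | bcftools call -mv - > 1000Sam1.3.test.vcf
  local status=("${PIPESTATUS[@]}")
  echo "samtools mpileup exit: ${status[0]}  bcftools call exit: ${status[1]}" >&2
  # Succeed only if both stages succeeded.
  [ "${status[0]}" -eq 0 ] && [ "${status[1]}" -eq 0 ]
}
```

A non-zero mpileup status with bcftools failing afterwards would confirm that bcftools only reported the empty stream.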

The strace output goes to STDERR, together with samtools' own messages, hence the 2> redirection.

Hope this helps,
Martin
--
Martin Mokrejs, Ph.D.
Adapter/artefact removal from datasets based on the following technologies:
454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
http://www.bioinformatics.cz/software/supported-protocols/
