Michael James Clark
2010-02-10 22:48:03 UTC
Hi,
I'm working with Picard MarkDuplicates with a whole genome dataset for the
first time and it has been running for a very long time (almost 24 hours)
and still is not finished.
One of my files is 240 GB in size, and currently the program is outputting
the results. In this case, I set REMOVE_DUPLICATES=TRUE and
VALIDATION_STRINGENCY=SILENT and my file is pre-sorted.
I'm more curious what the program is doing that is taking so long. Is there
a setting I should be using that might make it go faster?
Is there an overview of what the MarkDuplicates program does somewhere that
I could look at?
Thanks for your help,
--
Michael James Clark
Graduate Student
Dept. of Human Genetics, UCLA
Laboratory of Stanley Nelson
Gonda Bldg, Rm. 5554
Lab Ph. #: (310)825-7920
Cell Ph. #: (310)415-5207
Email: ***@ucla.edu