Discussion:
[Samtools-help] EOF marker is absent. The input is probably truncated
王森
2011-11-17 04:00:17 UTC
Hi,
What does this error mean for the completion of my samtools command (samtools sort)?
Why does it occur?
[bam_header_read] EOF marker is absent. The input is probably truncated.
The commands:
samtools view -buS -T all.fa -o s_5_1.bam s_5_1.sam > log
samtools sort -n s_5_1.bam s_5_1.sort > sort.log
Thank you!
Ivan Gregoretti
2011-11-17 14:43:37 UTC
Hello 王森,

That means that the bam file has a problem. It is truncated, meaning incomplete.

Very big files do get corrupted occasionally. You need to create the
bam file again.

I hope this helps.

Ivan


Ivan Gregoretti, PhD
_______________________________________________
Samtools-help mailing list
https://lists.sourceforge.net/lists/listinfo/samtools-help
Sendu Bala
2011-11-17 15:16:33 UTC
Actually, in this case the warning should be ignored. Using samtools view -u gives uncompressed bam, and these do not have the EOF marker.
Peter Cock
2011-11-17 15:18:38 UTC
Post by Sendu Bala
Actually, in this case the warning should be ignored. Using samtools
view -u gives uncompressed bam, and these do not have the EOF marker.
Why not? That sounds like a bug in samtools -u to me.

Peter
Alec Wysoker
2011-11-17 15:29:08 UTC
Hi Peter,

The EOF marker is a feature of BGZF compression. It is a gzip block
that is empty. It can't be represented if not compressing. It has
always been optional.

-Alec
Peter Cock
2011-11-17 15:35:38 UTC
Post by Alec Wysoker
Hi Peter,
The EOF marker is a feature of BGZF compression.  It is a gzip block that is
empty.  It can't be represented if not compressing.  It has always been
optional.
-Alec
Yes, the EOF marker is a 28-byte empty GZIP block (which should
be in the SAM/BAM specification but isn't), but what stops you writing
it to an uncompressed BAM file after all the data-containing but
uncompressed GZIP blocks? I'll have a look at the samtools code...

It isn't really optional if key tools complain loudly when it is missing.

Peter
Alec Wysoker
2011-11-17 15:43:09 UTC
Sorry, perhaps I misunderstood. I interpreted "not compressed" to mean
not in BGZF format, rather than BGZF format with compression level 0.
Peter Cock
2011-11-17 15:51:40 UTC
Post by Alec Wysoker
Sorry, perhaps I misunderstood. I interpreted "not compressed" to mean not
in BGZF format, rather than BGZF format with compression level 0.
Yeah, I made the same mistake recently too. In samtools, -u means
produce BAM files with BGZF using GZIP compression level zero.
It does not mean produce "naked" BAM files with no BGZF/GZIP
compression at all.

Anyway, I think I've worked out where the EOF marker is (or in
this case isn't) being recorded, function bgzf_close in bgzf.c
called via bam_close via samclose in sam.c

Peter
Sendu Bala
2011-11-17 15:51:40 UTC
Post by Peter Cock
It isn't really optional if key tools complain loudly when it is missing.
It is optional; as noted, it's not part of the spec. Heng was kind enough to add it for me to samtools so I could easily be aware of obvious truncation, but it's only a little 1 line warning that can indeed be ignored. It's not a "loud complaint" and it doesn't stop anything from working.

That said, if it's trivial to add an eof marker to uncompressed bam, there probably won't be any complaints.
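
To make the nature of the check concrete, here is a minimal Python sketch of a truncation test of this kind. It is not the samtools code: the function names are made up for illustration, and it assumes the check amounts to comparing the last 28 bytes of the file against the empty BGZF block whose bytes are quoted later in this thread.

```python
import os
import sys

# The 28-byte empty BGZF block used as the EOF marker (bytes as quoted in this thread).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_eof_marker(path):
    """Return True if the file ends with the 28-byte empty BGZF block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


def warn_if_marker_missing(path):
    """Print a one-line warning to stderr; never raise, never stop processing."""
    if not has_eof_marker(path):
        print(f"[warning] {path}: EOF marker is absent", file=sys.stderr)
```

Because the test is a plain byte comparison, any file that ends in a different (even if valid) empty block triggers the warning, although the data themselves may be complete.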
Peter Cock
2011-11-17 16:14:46 UTC
Post by Sendu Bala
Actually, in this case the warning should be ignored. Using samtools
view -u gives uncompressed bam, and these do not have the EOF marker.
Are you sure about that? Which version of samtools were you using?

I've just tried with the current code in Heng Li's github repository, and
the samtools SVN, and it seems to be producing uncompressed BAM
with the 28 byte empty BGZF block as an EOF marker.

Peter
Peter Cock
2011-11-17 16:31:44 UTC
Post by Peter Cock
Are you sure about that? Which version of samtools were you using?
I've just tried with the current code in Heng Li's github repository, and
the samtools SVN, and it seems to be producing uncompressed BAM
with the 28 byte empty BGZF block as an EOF marker.
Sorry, I was not looking closely enough at the hexdump. Currently there does
seem to be an empty BGZF block, but because it is using a gzip
compression level of zero it doesn't match the 28 bytes
expected as the EOF marker; rather, it looks like a different block:

$ ~/repositories/samtools-git/samtools view -u ex1_header.bam |
hexdump -C | tail
0006f8b0 31 31 34 5f 32 36 3a 37 3a 33 37 3a 37 39 3a 35 |114_26:7:37:79:5|
0006f8c0 38 31 00 30 02 00 00 88 88 88 88 88 88 88 88 88 |81.0............|
0006f8d0 88 88 82 18 42 21 41 11 10 12 0b 0b 0b 1c 1c 1c |....B!A.........|
0006f8e0 15 1c 1c 1c 1b 1c 1c 1c 1b 1a 1c 1c 1c 1c 1c 0c |................|
0006f8f0 1c 1c 1c 1c 1c 1c 1c 1c 1c 1c 1c 1c 4d 46 43 12 |............MFC.|
0006f900 41 71 43 1b 4e 4d 43 02 55 51 43 17 48 30 43 00 |AqC.NMC.UQC.H0C.|
0006f910 48 31 43 01 01 00 00 ff ff c7 03 eb 17 6e f9 00 |H1C..........n..|
0006f920 00 1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 |.............BC.|
0006f930 00 1e 00 01 00 00 ff ff 00 00 00 00 00 00 00 00 |................|
0006f940

We're getting this (31 bytes):

1f 8b 08 04 00 00 00 00
00 ff 06 00 42 43 02 00
1e 00 01 00 00 ff ff 00
00 00 00 00 00 00 00


We want this (28 bytes):

1f 8b 08 04 00 00 00 00
00 ff 06 00 42 43 02 00
1b 00 03 00 00 00 00
00 00 00 00 00

Or, "\x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00BC\x02\x00\x1b\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00"
or, "\037\213\010\4\0\0\0\0\0\377\6\0\102\103\2\0\033\0\3\0\0\0\0\0\0\0\0\0"
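
The three-byte difference between the two blocks comes from the deflate payload alone, which can be reproduced outside samtools. This is a minimal Python sketch using the standard zlib module; the helper name `deflate_empty` is made up for illustration, and it assumes samtools hands zlib the same kind of raw deflate stream (no zlib header) for each block.

```python
import zlib


def deflate_empty(level):
    """Raw-deflate zero bytes of input at the given zlib compression level."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)  # -15 selects raw deflate
    return comp.compress(b"") + comp.flush()


for level in (0, zlib.Z_DEFAULT_COMPRESSION):
    payload = deflate_empty(level)
    print(level, len(payload), payload.hex())
```

With a stock zlib, level 0 should give the five-byte stored block `01 00 00 ff ff` seen in the 31-byte dump, and the default level the two-byte `03 00` seen in the 28-byte marker. The other 26 bytes (gzip header, CRC and input size) are the same apart from the block-size field, `1e 00` versus `1b 00`.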

With my patch,

$ ~/repositories/samtools-git/samtools view -u ex1_header.bam |
hexdump -C | tail
[bgzf_close] compression level: 0
[bgzf_close] Forcing empty EOF block
0006f8b0 31 31 34 5f 32 36 3a 37 3a 33 37 3a 37 39 3a 35 |114_26:7:37:79:5|
0006f8c0 38 31 00 30 02 00 00 88 88 88 88 88 88 88 88 88 |81.0............|
0006f8d0 88 88 82 18 42 21 41 11 10 12 0b 0b 0b 1c 1c 1c |....B!A.........|
0006f8e0 15 1c 1c 1c 1b 1c 1c 1c 1b 1a 1c 1c 1c 1c 1c 0c |................|
0006f8f0 1c 1c 1c 1c 1c 1c 1c 1c 1c 1c 1c 1c 4d 46 43 12 |............MFC.|
0006f900 41 71 43 1b 4e 4d 43 02 55 51 43 17 48 30 43 00 |AqC.NMC.UQC.H0C.|
0006f910 48 31 43 01 01 00 00 ff ff c7 03 eb 17 6e f9 00 |H1C..........n..|
0006f920 00 1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 |.............BC.|
0006f930 00 1b 00 03 00 00 00 00 00 00 00 00 00 |.............|
0006f93d

I'll remove the debugging to stderr, and submit a github pull request.

Peter
Peter Cock
2011-11-17 16:46:19 UTC
Patch here: https://github.com/peterjc/samtools/tree/u-eof

Pull request here: https://github.com/lh3/samtools/pull/7

Before the patch,

$ ~/repositories/samtools-git/samtools view -u ex1_header.bam |
samtools sort - test

So using a pipe works fine, but using a file:

$ ~/repositories/samtools-git/samtools view -u ex1_header.bam > test_old.bam
$ samtools sort test_old.bam test
[bam_header_read] EOF marker is absent. The input is probably truncated.

With the patch,

$ ~/repositories/samtools-git/samtools view -u ex1_header.bam > test_new.bam
$ samtools sort test_new.bam test

(no errors - good)

Further testing welcome, for instance is it possible via the samtools
command line interface to select other compression levels? They too
may generate different empty BGZF blocks, in which case my patch
could be modified to always write the 28 bytes explicitly.
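
As an illustration of that alternative, here is a rough Python sketch of a writer that compresses the data blocks at any level but always ends the file with the fixed 28 bytes. It is not the patch itself: `bgzf_block` and `write_bgzf` are hypothetical names, the 0xff00 chunk size is an assumption chosen to keep each block under 64 KiB, and the block layout is the one visible in the hexdumps above.

```python
import struct
import zlib

# Fixed 28-byte EOF marker, written verbatim whatever the compression level.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)

CHUNK = 0xFF00  # assumed input size per block, small enough for a 16-bit BSIZE


def bgzf_block(data, level):
    """Wrap one chunk of data in a BGZF block compressed at the given level."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)  # raw deflate
    deflated = comp.compress(data) + comp.flush()
    bsize = 18 + len(deflated) + 8 - 1  # total block length minus one
    header = (
        b"\x1f\x8b\x08\x04"      # gzip magic, deflate method, FEXTRA flag
        b"\x00\x00\x00\x00"      # modification time (unset)
        b"\x00\xff"              # extra flags, unknown OS
        b"\x06\x00"              # extra field length
        b"BC\x02\x00"            # BGZF subfield identifier and length
        + struct.pack("<H", bsize)
    )
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + deflated + trailer


def write_bgzf(path, data, level):
    """Write data as BGZF blocks, then append the fixed EOF marker."""
    with open(path, "wb") as out:
        for start in range(0, len(data), CHUNK):
            out.write(bgzf_block(data[start:start + CHUNK], level))
        out.write(BGZF_EOF)
```

Since each block is an ordinary gzip member, a file written this way can still be read with any gzip reader, and it ends with the expected marker at level 0 as well as at the other levels.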

Regards,

Peter
John Marshall
2011-11-17 17:19:13 UTC
Post by Peter Cock
Further testing welcome, for instance is it possible via the samtools
command line interface to select other compression levels? They too
may generate different empty BGZF blocks, in which case my patch
could be modified to always write the 28 bytes explicitly.
Whether or not they can currently be selected via the samtools command line, other compression levels can certainly be selected via the bgzf_open() API function and friends. So for a patch here to be worthwhile, IMHO it ought to work with arbitrary compression levels.

John
Peter Cock
2011-11-17 17:20:54 UTC
Post by John Marshall
Whether or not they can currently be selected via the samtools command
line,
Well, fast compression can be selected with -1.
Post by John Marshall
other compression levels can certainly be selected via the bgzf_open()
API function and friends.  So for a patch here to be worthwhile, IMHO
it ought to work with arbitrary compression levels.
Indeed - I agree and have revised the patch on github.

Thanks,

Peter
Peter Cock
2012-02-16 13:29:29 UTC
Dear Heng,

I was reminded of this open bug in samtools bgzf.c via a query on SEQanswers,
http://seqanswers.com/forums/showthread.php?p=65067#post65067

Could you review the proposed patch to ensure the 'standard' 28 bytes
empty BGZF block EOF marker is always used at the end of a BAM file,
regardless of the compression level used for the previous data containing
blocks?

https://github.com/lh3/samtools/pull/7

Overall the patch is just adding one line of code:

diff --git a/bgzf.c b/bgzf.c
index 216cd04..62fb489 100644
--- a/bgzf.c
+++ b/bgzf.c
@@ -627,6 +627,9 @@ int bgzf_close(BGZF* fp)
 if (fp->open_mode == 'w') {
 if (bgzf_flush(fp) != 0) return -1;
 { // add an empty block
+ // add 28 bytes EOF empty block
+ fp->compress_level=Z_DEFAULT_COMPRESSION; //don't need value again
+ // (different compression levels would give different empty blocks)
 int count, block_length = deflate_block(fp, 0);
 #ifdef _USE_KNETFILE
 count = fwrite(fp->compressed_block, 1, block_length, fp->x.fpw);



Thank you,

Peter

王森
2011-11-17 02:21:43 UTC
Hi,
What does this error mean for the completion of my samtools command (samtools sort)?
Why does it occur?
[bam_header_read] EOF marker is absent. The input is probably truncated.

The commands:
samtools view -buS -T all.fa -o s_5_1.bam s_5_1.sam > log
samtools sort -n s_5_1.bam s_5_1.sort > sort.log
Thank you!
Peter Cock
2011-11-21 16:35:49 UTC
It is a minor bug with -u output (possibly -1 output as well), and
can be ignored. There is a patch,

http://sourceforge.net/mailarchive/message.php?msg_id=28413844

Peter