Shuffle reads?
flefebvre Staff asked 2 years ago



Hi, a colleague told me it is important to shuffle reads when converting a bam file to fastq. What is your take on this?

1 Answer
Best Answer
jflucier Staff answered 2 years ago



A BAM file stores read alignments in order, typically sorted by genomic coordinate, so consecutive reads come from neighboring positions.
The aligner uses blocks of paired reads to estimate the insert size. If you don’t shuffle your original BAM before converting it to FASTQ, the pairs in each block will not be randomly distributed across the genome. Instead, they will all come from the same region, which biases the insert size calculation. This is a very important step which is unfortunately often overlooked.
See https://gatkforums.broadinstitute.org/gatk/discussion/2908/howto-revert-a-bam-file-to-fastq-format for more information.
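
To make the point concrete, here is a minimal toy sketch in Python. It is not how any particular aligner works. The region count, block size, and function names are all made up for illustration. It only shows that the first block of a sorted read list comes from one region, while the first block of a shuffled list samples many regions.

```python
import random


def make_sorted_pairs(n_regions=10, pairs_per_region=1000):
    """Toy stand-in for a coordinate-sorted BAM: one region id per read pair.

    Hypothetical model: region ids 0..n_regions-1, already in sorted order.
    """
    return [region for region in range(n_regions) for _ in range(pairs_per_region)]


def regions_in_first_block(pairs, block_size=500):
    """Return the set of regions seen in the first block of read pairs."""
    return set(pairs[:block_size])


sorted_pairs = make_sorted_pairs()

# Shuffle a copy with a fixed seed so the result is reproducible.
shuffled_pairs = sorted_pairs[:]
random.Random(1).shuffle(shuffled_pairs)

sorted_regions = regions_in_first_block(sorted_pairs)
shuffled_regions = regions_in_first_block(shuffled_pairs)

print("regions in first block, sorted:  ", len(sorted_regions))
print("regions in first block, shuffled:", len(shuffled_regions))
```

In the sorted case every pair in the block comes from a single region, so anything estimated from that block reflects only that region. After shuffling, the same block size draws pairs from across the whole toy genome.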

flefebvre Staff replied 2 years ago

Excellent, thank you.