Hi, a colleague told me it is important to shuffle reads when converting a bam file to fastq. What is your take on this?
BAM files are usually sorted by alignment position, so the reads appear in genomic coordinate order.
The aligner estimates the insert size from blocks of paired reads. If you don't shuffle the original BAM before converting it, each block will come from a single region of the genome rather than from a random sample of it, which biases the insert size estimate. This step is important and unfortunately often overlooked.
See https://gatkforums.broadinstitute.org/gatk/discussion/2908/howto-revert-a-bam-file-to-fastq-format for more information.
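To make the batching effect concrete, here is a small toy simulation in Python. It is only a sketch, not how any particular aligner works: the function names (`batch_means`, `simulate`), the batch size, and above all the assumption that the typical insert size drifts from region to region are hypothetical choices made for illustration. It compares how much the per-batch mean insert size varies when pairs are read in coordinate order versus shuffled order.

```python
import random
import statistics


def batch_means(insert_sizes, batch_size):
    """Mean insert size of each consecutive batch of pairs."""
    return [
        statistics.mean(insert_sizes[i:i + batch_size])
        for i in range(0, len(insert_sizes), batch_size)
    ]


def simulate(n_regions=10, pairs_per_region=1000, batch_size=1000, seed=0):
    """Return (sorted_spread, shuffled_spread) of per-batch mean insert sizes."""
    rng = random.Random(seed)

    # Toy assumption: each genomic region has its own typical insert size.
    # Appending region by region mimics a coordinate-sorted file.
    coordinate_sorted = []
    for region in range(n_regions):
        region_mean = 300 + 10 * region
        for _ in range(pairs_per_region):
            coordinate_sorted.append(rng.gauss(region_mean, 30))

    # Shuffle whole pairs (one value per pair here), so mates stay together.
    shuffled = list(coordinate_sorted)
    rng.shuffle(shuffled)

    # Spread (population standard deviation) of the per-batch estimates.
    sorted_spread = statistics.pstdev(batch_means(coordinate_sorted, batch_size))
    shuffled_spread = statistics.pstdev(batch_means(shuffled, batch_size))
    return sorted_spread, shuffled_spread


if __name__ == "__main__":
    sorted_spread, shuffled_spread = simulate()
    print(f"spread of batch means, coordinate order: {sorted_spread:.2f}")
    print(f"spread of batch means, shuffled order:   {shuffled_spread:.2f}")
```

Under these assumptions, each coordinate-ordered batch only sees one region, so its estimate follows that region's insert size, while each shuffled batch samples all regions and the estimates stay close to the overall mean. With real data you would shuffle the BAM itself with an existing tool rather than in Python; the linked GATK article describes one such workflow.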
Excellent, thank you.