Duplicate read names in .bam file
flefebvre Staff asked 4 years ago

Hi, I have received Illumina paired-end sequencing data on the human genome as BAM files. I now need to submit the corresponding FASTQ files to a public repository, but it is giving me errors about duplicate reads in my FASTQs. I converted to FASTQ using the command bedtools bamtofastq -i <BAM> -fq <FASTQ>. The version of Bedtools was v2.25.0. I am concerned that the original BAM files had problems. The genome center that generated the data is telling me to use Picard, but I do not understand how this would change anything. Can you help me with this? Thanks

1 Answer
Best Answer
nibeh Staff answered 4 years ago

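The duplicate names you're seeing may be a result of secondary alignments. A read with secondary alignments has more than one record in the BAM file, so a converter that writes out every record will emit the same read name several times. Instead of using Bedtools, try using Picard's "SamToFastq". Setting the option "INCLUDE_NON_PRIMARY_ALIGNMENTS" to False might solve your problems. Let me know if this works.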
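If it helps to check a FASTQ before resubmitting, here is a minimal sketch in Python that counts repeated read names in a single file. The function name duplicate_read_names is made up for illustration, and the code assumes plain four-line FASTQ records (no wrapped sequence lines), optionally gzip-compressed. It only reports duplicates and does not say anything about why they are there.

```python
import gzip
from collections import Counter


def duplicate_read_names(fastq_path):
    """Return {read_name: count} for names seen more than once in one FASTQ.

    Hypothetical helper for illustration; assumes 4 lines per record.
    """
    # Choose a text-mode opener based on the file extension.
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Every fourth line, starting with the first, is a record header.
            if line_number % 4 == 0:
                # Drop the leading "@" and anything after the first whitespace.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                counts[name] += 1
    # Keep only the names that occur more than once.
    return {name: n for name, n in counts.items() if n > 1}
```

Calling duplicate_read_names("sample.fastq") (file name hypothetical) returns an empty dict when every name is unique. All names are held in memory, so for a full human dataset it may be more practical to run it on a subset of the file first.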

flefebvre Staff replied 4 years ago

That was the problem, thank you!