Forum category: Bioinformatics
Duplicate read names in .bam file
flefebvre (staff) asked 4 years ago

Hi, I have received Illumina paired-end sequencing data for the human genome as BAM files. I now need to submit the corresponding FASTQ files to a public repository, but the submission fails with errors about duplicate reads in my FASTQ files. I converted the BAM files to FASTQ with the command bedtools bamtofastq -i <BAM> -fq <FASTQ>, using Bedtools v2.25.0. I am concerned that the original BAM files had problems. The genome center that generated the data is telling me to use Picard, but I do not understand how this would change anything. Can you help me with this? Thanks

1 Answer
Best Answer
nibeh (staff) answered 4 years ago

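The duplicate names you're seeing may be the result of secondary alignments: a read with a secondary alignment has more than one record in the BAM file, so a converter that writes one FASTQ entry per record will repeat its name. Instead of using Bedtools, try Picard's "SamToFastq". Setting the option "INCLUDE_NON_PRIMARY_ALIGNMENTS" to false might solve your problem. Let me know if this works.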
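
In case it helps, here is a rough Python sketch of the two steps: building the Picard call and then checking the resulting FASTQ for repeated read names before submitting. The function names, the picard.jar location, and the file names are placeholders I made up for illustration, and the checker assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
import collections
import gzip
import itertools


def samtofastq_command(bam, fastq1, fastq2, picard_jar="picard.jar"):
    """Build a Picard SamToFastq call for paired-end data.

    The jar path is a placeholder; adjust it to your installation.
    """
    return [
        "java", "-jar", picard_jar, "SamToFastq",
        f"INPUT={bam}",
        f"FASTQ={fastq1}",
        f"SECOND_END_FASTQ={fastq2}",
        # Skip secondary alignments so each read is written once.
        "INCLUDE_NON_PRIMARY_ALIGNMENTS=false",
    ]


def duplicate_read_names(fastq_path):
    """Return {read_name: count} for names that occur more than once."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = collections.Counter()
    with opener(fastq_path, "rt") as handle:
        # Assumes 4-line records: the header is every 4th line.
        for header in itertools.islice(handle, 0, None, 4):
            fields = header.strip()[1:].split()
            if not fields:
                continue  # tolerate trailing blank lines
            # The read name is the first token after the leading '@'.
            counts[fields[0]] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

You could pass the list from samtofastq_command(...) to subprocess.run(..., check=True), then call duplicate_read_names(...) on each output file; an empty result means no read name is repeated within that file.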

flefebvre (staff) answered 4 years ago

That was the problem, thank you!