FORUMExtract all reads from bam file Except region
Phillip A Richmond asked 2 months ago



Hello,
I’m looking to extract reads from a bam file, for everything EXCEPT a single region. Ideally, I would be able to cut out all reads which map over this region, but keep all the other reads in the file. Just wondering if there’s a better solution other than what I show below:
A brute force solution is just:
samtools view -L regions.bed <in.bam>
where regions.bed is all regions in my file except my region. It would look something like, assuming the cut-out region is on chrom 5:1,000,000-2,000,000
regions.bed
chr1   1
1  1  249250621
2  1  243199373
3  1  198022430
4  1  191154276
5  1  1000000
5  2000000  180915260
6  1  171115067
7  1  159138663
8  1  146364022
9  1  141213431
10  1  135534747
11  1  135006516
12  1  133851895
13  1  115169878
14  1  107349540
15  1  102531392
16  1  90354753
17  1  81195210
18  1  78077248
19  1  59128983
20  1  63025520
21  1  48129895
22  1  51304566
X  1  155270560

If you’ve got a better solution I’d love to hear it!
Thanks,
Phi

2 Answers
jrosner Staff answered 2 months ago



Hi Phil,
Sorry about the delayed response, we had some issues with our mail server!  But all is good now…
Ok, so another way you could do this, which might be easier, would be to use the -U option:

-U FILE: Write alignments that are not selected by the various filter options to FILE. When this option is used, all alignments (or all alignments intersecting the regions specified) are written to either the output file or this file, but never both.

 
So you would get something like this:

samtools view -b -L regions_to_exclude.bed -U all_other_regions.bam input.bam >  excluded_regions.bam

Similar to what you were doing, but might be easier this way to create the bed file with excluded regions.  especially if there are many.
give it a try and see if it works.

Phillip A Richmond answered 4 days ago



Ended up using this code which works like a charm:
 
SIM_BASE=GLS_GCA_2-191745598-191745646_2-191743598-191747646
REF_STR_BAM=${STR_ANALYSIS_DIR}/STR_Simulation/str_bams/GRCh37/${SIM_BASE}_0_18x.sorted.bam
EXP_STR_BAM=${STR_ANALYSIS_DIR}/STR_Simulation/str_bams/GRCh37/${SIM_BASE}_650_18x.sorted.bam
OUTPUT_BAM=${OUTPUT_DIR}/ERR2304566.${SIM_BASE}_650_het.bam
# Check files
ls $NATIVE_BAM
ls $REF_STR_BAM
ls $EXP_STR_BAM
# Cut out the BAM given the bed file
samtools view -@ $THREADS -hu -L $CUTOUT_BED \
        $NATIVE_BAM \
        | samtools merge -r \
        -@ $THREADS \
        $OUTPUT_BAM \
        – \
        $REF_STR_BAM \
        $EXP_STR_BAM
samtools index $OUTPUT_BAM
Where cutout_bed is a bed file containing everything except the region to be cut out:

1 0 249250621
2 0 191743598
2 191747646 243199373
3 0 198022430
4 0 191154276
5 0 180915260
6 0 171115067
7 0 159138663
8 0 146364022
9 0 141213431
10 0 135534747
11 0 135006516
12 0 133851895
13 0 115169878
14 0 107349540
15 0 102531392
16 0 90354753
17 0 81195210
18 0 78077248
19 0 59128983
20 0 63025520
21 0 48129895
22 0 51304566
X 0 155270560
Y 0 59373566
MT 0 16569
 
Cheers,
Phil
 

jrosner Staff replied 4 hours ago

thanks for posting your solution Phil