BLAST against multiple genomes
spongemicrobiome asked 11 months ago



I am running BLAST on the command line. I'd like to BLAST one protein against more than 100 genomes. Normally I'd concatenate the genomes and make them into a database. However, the contig headers inside each genome file don't include the sample name, so the same headers appear in every genome. For example:
Genome 1

>contig1
>contig2

Genome 2

>contig1
>contig2

If I concatenate them, I'd have no way to tell which sample the query matched.
I tried: for i in *.faa; do blastp -db protein1.faa -query "$i" -out "$i.blasted"; done. But I get one output file for each genome (hundreds of files). Ideally, I'd like a single output file. Has anyone done this before?
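A minimal sketch of one way to get a single output file without renaming anything: search the protein against each genome in turn, ask BLAST+ for tabular output (-outfmt 6), and prepend the genome's file name to every hit line. It assumes BLAST+ blastp is installed, genome files end in .faa, and the query is a file such as protein1.faa; the function name blast_all_genomes and the output file name are hypothetical. Unlike the command above, it uses the protein as the query and each genome as the subject (-subject), so no makeblastdb step is needed.

```shell
#!/usr/bin/env bash
# Sketch: BLAST one query protein against every genome (*.faa) in the current
# directory and collect all hits in ONE tab-separated file.
# Assumptions: genome files end in ".faa"; the query file may sit in the same
# directory (it is skipped); blast_all_genomes is a hypothetical name.
blast_all_genomes() {
  local query="$1" out="$2" genome name
  : > "$out"                                  # start with an empty combined file
  shopt -s nullglob                           # no *.faa files -> loop body never runs (stays set afterwards)
  for genome in *.faa; do
    [ "$genome" = "$query" ] && continue      # don't search the query against itself
    name="${genome%%.*}"                      # genome name = text before the first "."
    # -subject searches the FASTA file directly (no makeblastdb needed);
    # -outfmt 6 prints one tab-separated line per hit.
    blastp -query "$query" -subject "$genome" -outfmt 6 |
      awk -v g="$name" 'BEGIN { OFS = "\t" } { print g, $0 }' >> "$out"
  done
}
```

For example, blast_all_genomes protein1.faa all_hits.tsv writes one file whose first column is the genome name, followed by the 12 standard -outfmt 6 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore).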

2 Answers
zhibin Staff answered 11 months ago



How about adding the genome name to the front of each contig header?
sed -i -e 's/^>/>genome1_/' genome1.faa
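Before running sed -i on real data, it can help to preview the substitution without -i, which prints the result and leaves the file untouched. A small sketch on a throwaway temporary file; the sequences and the genome1_ prefix are made up for illustration:

```shell
#!/usr/bin/env bash
# Preview the header rewrite on a toy FASTA file before editing real genomes.
# Without -i, sed writes the result to stdout and does not modify the file.
toy=$(mktemp)
printf '>contig1\nMKV\n>contig2\nMAA\n' > "$toy"
sed -e 's/^>/>genome1_/' "$toy"
# prints:
# >genome1_contig1
# MKV
# >genome1_contig2
# MAA
```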

spongemicrobiome replied 11 months ago

I was thinking about that, but I have a directory with more than 500 genomes. Which is easier: looping the BLAST or looping the renaming of the files? I'm stuck on both!

spongemicrobiome replied 11 months ago

Thanks for sed -i -e 's/^>/>genome1_/' genome1.faa
Super handy!

spongemicrobiome replied 11 months ago

Is there a way to loop that over a directory of genomes with different file names?

jhgalvez Staff replied 11 months ago

Use wildcards to match the parts of the file names that differ, so a single loop can cover every genome file in the directory: https://ryanstutorials.net/linuxtutorial/wildcards.php

spongemicrobiome replied 11 months ago

Cool! Thank you for your help, this is a great forum!

zhibin Staff replied 11 months ago

You can try this

for i in *.faa; do sed -i -e "s/^>/>$(echo "$i" | cut -f 1 -d '.')_/" "$i"; done

It takes the part of the file name before the first "." as the genome name.

Please test a couple of files first!
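If you'd rather not edit the originals in place, a variant of the same idea writes relabelled copies to another directory. It is a sketch assuming genome files end in .faa; the function name relabel_genomes and the output directory are hypothetical, and it uses bash parameter expansion (${f%%.*}) instead of echo | cut to get the text before the first ".".

```shell
#!/usr/bin/env bash
# Sketch: prefix every FASTA header with its genome name, writing relabelled
# copies to a separate directory so the original files stay untouched.
# Assumptions: genome files end in ".faa" and live in the current directory;
# relabel_genomes is a hypothetical name.
relabel_genomes() {
  local outdir="$1" f name
  mkdir -p "$outdir"
  shopt -s nullglob                 # no *.faa files -> loop body never runs
  for f in *.faa; do
    name="${f%%.*}"                 # text before the first ".", like cut -f 1 -d '.'
    # Double quotes let $name expand; a name containing "&" or "\" would need escaping.
    sed -e "s/^>/>${name}_/" "$f" > "$outdir/$f"
  done
}
```

For example, relabel_genomes renamed followed by cat renamed/*.faa > all_genomes.faa gives one combined file in which every header starts with its genome name, so concatenating is safe again.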

spongemicrobiome replied 11 months ago

I get:
bash: /: Is a directory

zhibin Staff replied 11 months ago

Make sure the " and ' characters are plain straight quotes when you copy the code.

When I copied it, " became “ and ”, and ' became ‘ and ’.
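If pasted commands keep picking up curly quotes, a small filter can turn them back into plain ASCII before you run anything. A sketch; the function name straighten_quotes is hypothetical. It replaces ‘ and ’ with ', “ and ” with ", and the en dash (–) with a hyphen:

```shell
#!/usr/bin/env bash
# Sketch: convert "smart" punctuation from web pages back to the ASCII
# characters the shell expects. Each character is replaced as a literal
# string, so it behaves the same in UTF-8 and C locales.
# Reads the files given as arguments, or stdin if none are given.
straighten_quotes() {
  sed -e "s/‘/'/g" -e "s/’/'/g" \
      -e 's/“/"/g' -e 's/”/"/g' \
      -e 's/–/-/g' "$@"
}
```

For example, straighten_quotes pasted.sh > fixed.sh, then read fixed.sh before running it (both file names are placeholders).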

spongemicrobiome replied 11 months ago

That's exactly what it was: the " and ' got changed when I copied it. It works now!!!! Thank you so much for your help! Again, really useful forum!

jhgalvez Staff answered 11 months ago



You can include more than one FASTA file when building the BLAST database: just list them as arguments separated by spaces, instead of concatenating them into a single file by hand. Still, if the same headers appear in several files, you might keep running into issues, because the hits only report the sequence ID, not the file it came from. In that case, the best practice is to relabel your headers so that they are unique, removing any ambiguity.
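A sketch of building one protein database from many relabelled files without concatenating them by hand. It assumes your BLAST+ version's makeblastdb accepts several FASTA files as a single space-separated -in value (worth confirming with makeblastdb -help); the function name build_protein_db and the database name are hypothetical, and file names must not contain spaces.

```shell
#!/usr/bin/env bash
# Sketch: build ONE protein BLAST database from several FASTA files.
# Assumption: makeblastdb takes multiple inputs as one space-separated -in
# value; build_protein_db is a hypothetical name.
build_protein_db() {
  local db="$1"; shift        # first argument: database name; the rest: FASTA files
  # "$*" joins the remaining file names with single spaces.
  makeblastdb -in "$*" -dbtype prot -out "$db"
}
```

For example, build_protein_db all_genomes genome1.faa genome2.faa, followed by blastp -query protein1.faa -db all_genomes -outfmt 6 -out hits.tsv, gives a single hit table; once the headers carry the genome name, the sseqid column shows which genome each hit came from.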
 
Hope this helps!