Blast to multiple genomes
spongemicrobiome asked 1 year ago



I am running BLAST on the command line. I'd like to BLAST one protein against >100 genomes. Normally I'd concatenate the genomes and make them into a database. However, the contig headers inside each genome file are not unique across samples and don't contain the sample name. For example:
Genome 1

>contig1
>contig2

Genome 2

>contig1
>contig2

If I concatenate them, I'd have no way of knowing which sample the query matched.
I tried: for i in *faa; do blastp -db protein1.faa -query $i -out $i.blasted; done. But I get one output file for each genome (hundreds of files). Ideally, I'd like a single output. Has anyone done this before?
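One way to get a single output file without renaming anything is to search the query against each genome with blastp -subject and prefix every tabular hit with the genome's file name. A minimal sketch, assuming BLAST+ is installed, the query is in protein1.faa and every genome is a protein FASTA ending in .faa (the function name blast_all_genomes and the output name all_hits.tsv are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: BLAST one query protein against every genome FASTA in the current
# directory and collect all hits in a single tab-separated file.
# Assumed (hypothetical) names: query protein1.faa, genomes *.faa.

blast_all_genomes() {
  local query=${1:-protein1.faa}   # query protein FASTA
  local out=${2:-all_hits.tsv}     # combined output file
  local genome
  : > "$out"                       # start from an empty output file
  for genome in *.faa; do
    [ -e "$genome" ] || continue          # glob matched nothing
    [ "$genome" = "$query" ] && continue  # don't search the query against itself
    # -outfmt 6 is tabular output; awk prepends the genome name (file name minus .faa)
    blastp -query "$query" -subject "$genome" -outfmt 6 \
      | awk -v g="${genome%.faa}" 'BEGIN { OFS = "\t" } { print g, $0 }' >> "$out"
  done
}

# Usage: blast_all_genomes protein1.faa all_hits.tsv
```

Each line of all_hits.tsv then starts with the genome name, followed by the usual 12 tabular columns. With -subject there is no database to build, but for hundreds of genomes a single database (as suggested later in the thread) is likely faster.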

2 Answers
zhibin (staff) answered 1 year ago



How about adding the genome name in front of each contig header?
sed -i -e 's/^>/>genome1_/' genome1.faa

spongemicrobiome answered 1 year ago

I was thinking about that, but I have a directory with >500 genomes. Which is easier: looping the BLAST or looping the renaming of the files? I'm stuck on both!

spongemicrobiome answered 1 year ago

Thanks for sed -i -e 's/^>/>genome1_/' genome1.faa
Super handy!

spongemicrobiome answered 1 year ago

Is there a way to loop that over a directory of genomes with different file names?

jhgalvez (staff) answered 1 year ago

Use wildcards to match the parts of the names that differ, so you can loop over files with different names: https://ryanstutorials.net/linuxtutorial/wildcards.php

spongemicrobiome answered 1 year ago

Cool! Thank you for your help, this is a great forum!

zhibin (staff) answered 1 year ago

You can try this:

for i in *.faa; do sed -i -e "s/^>/>$(echo "$i" | cut -f 1 -d '.')_/" "$i"; done

It takes the part of the file name before the first "." as the genome name (e.g. genome1.faa turns >contig1 into >genome1_contig1).

Please test it on a couple of files first, since sed -i edits the files in place!
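A variant of the same loop, sketched under the assumption that you'd rather not edit the originals in place: it uses bash parameter expansion (${f%%.*}, everything before the first ".") instead of echo | cut, and writes relabeled copies into a separate directory, so the originals survive if something goes wrong. Redirecting sed's output also sidesteps the fact that BSD/macOS sed treats the argument after -i as a backup suffix. The function name relabel_genomes and the directory name relabeled are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: prefix every FASTA header with its genome name, writing copies
# instead of editing in place. Function and directory names are hypothetical.

relabel_genomes() {
  local outdir=${1:-relabeled}     # where the relabeled copies go
  local f name
  mkdir -p "$outdir"
  for f in *.faa; do
    [ -e "$f" ] || continue        # glob matched nothing
    name=${f%%.*}                  # genome name = file name up to the first "."
    # ">contig1" becomes ">genome1_contig1"; sequence lines are copied unchanged.
    # Note: "&" or "\" in a file name would be special in the sed replacement.
    sed -e "s/^>/>${name}_/" "$f" > "$outdir/$f"
  done
}

# Usage: relabel_genomes relabeled
```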

spongemicrobiome answered 1 year ago

I get:
bash: /: Is a directory

zhibin (staff) answered 1 year ago

Make sure that when you copy the code, the " and ' characters come through correctly.

When I copied it, " became “ and ”, and ' became ‘ and ’.
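If you run into the same problem, a quick way to spot typographic quotes that crept in while copying is to search for their UTF-8 byte sequences. A minimal sketch, assuming the pasted commands were saved to a file (commands.sh and find_smart_quotes are hypothetical names); it also catches « and », which is what the straight double quotes in this thread turned into:

```shell
#!/usr/bin/env bash
# Sketch: report lines containing curly quotes or guillemets, which the shell
# does not treat as quoting characters.
# UTF-8 bytes: ‘ E2 80 98, ’ E2 80 99, “ E2 80 9C, ” E2 80 9D, « C2 AB, » C2 BB.

find_smart_quotes() {
  # LC_ALL=C makes grep match raw bytes rather than multibyte characters;
  # -n prints the line number of each match
  LC_ALL=C grep -nE $'\xe2\x80[\x98\x99\x9c\x9d]|\xc2[\xab\xbb]' "$@"
}

# Usage: find_smart_quotes commands.sh
```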

spongemicrobiome answered 1 year ago

That's exactly what it was: the " and ' got changed when I copied them. It works now! Thank you so much for your help! Again, a really useful forum!

jhgalvez (staff) answered 1 year ago



You can include more than one FASTA file when creating the BLAST database: pass them to makeblastdb -in as a single quoted, space-separated list (instead of manually concatenating them into one file). Still, if the same headers occur in different files, you may keep running into issues, because hits are reported by sequence header, not by source file. In that case, the best practice is to relabel your headers so that they are unique, removing any ambiguity.
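Combining the two suggestions, a minimal sketch assuming BLAST+'s makeblastdb and blastp are on your PATH, the headers have already been relabeled so they are unique, and the query is protein1.faa: build one protein database from all genome files at once, then run a single search. The function name build_and_search and the database name all_genomes are hypothetical, and because -in is split on spaces, file names must not contain spaces.

```shell
#!/usr/bin/env bash
# Sketch: one BLAST database from many genome files, then one search.
# Assumes headers are already unique (e.g. prefixed with the genome name).

build_and_search() {
  local query=${1:-protein1.faa}   # hypothetical query file
  local db=${2:-all_genomes}       # hypothetical database name
  local files=() f
  for f in *.faa; do
    [ -e "$f" ] || continue               # glob matched nothing
    [ "$f" = "$query" ] && continue       # keep the query out of the database
    files+=("$f")
  done
  # "${files[*]}" joins the file names with spaces into one -in argument;
  # -parse_seqids keeps the relabeled IDs (e.g. genome1_contig1) in the output
  makeblastdb -in "${files[*]}" -dbtype prot -parse_seqids -out "$db" || return 1
  # one search, one tabular output file; the subject IDs carry the genome name
  blastp -query "$query" -db "$db" -outfmt 6 -out "${db}_hits.tsv"
}

# Usage: build_and_search protein1.faa all_genomes
```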
 
Hope this helps!