I am running BLAST on the command line. I’d like to BLAST one protein against >100 genomes. Normally I’d concatenate the genomes and make them into a database. However, the contig headers inside each genome file are not unique to the sample. For example:
Genome 1
>contig1
>contig2
Genome 2
>contig1
>contig2
If I concatenate them, I’d have no way to know which sample the query matched.
I tried: for i in *faa; do blastp -db protein1.faa -query "$i" -out "$i.blasted"; done. But I get one output file for each genome (hundreds of files). Ideally, I’d like a single output. Has anyone tried to do this before?
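If the per-genome results from a loop like the one above already exist, one possible way to get a single output is to merge them and tag every hit with the genome it came from. This is only a sketch: it assumes the results are tab-separated (for example, written with -outfmt 6) and named <genome>.faa.blasted. The function name merge_blast_hits and the demo hits are made up for illustration.

```shell
# Sketch: merge per-genome tabular BLAST results into one file,
# prefixing every hit with the genome (file) it came from.
# Assumption: results are tab-separated and named <genome>.faa.blasted.
set -eu

merge_blast_hits() {
    local dir="$1"
    local out="$2"
    local f genome
    : > "$out"                       # start with an empty output file
    for f in "$dir"/*.blasted; do
        [ -e "$f" ] || continue      # no result files: skip the literal glob
        genome=$(basename "$f" .faa.blasted)
        # Prepend the genome name as the first column of every hit line.
        awk -v g="$genome" 'BEGIN { OFS = "\t" } { print g, $0 }' "$f" >> "$out"
    done
}

# Tiny demo with made-up hits (not real BLAST output).
demo=$(mktemp -d)
printf 'protein1\tcontig1\t98.5\n' > "$demo/genomeA.faa.blasted"
printf 'protein1\tcontig1\t71.2\n' > "$demo/genomeB.faa.blasted"
merge_blast_hits "$demo" "$demo/all_hits.tsv"
cat "$demo/all_hits.tsv"
```

With the genome name in the first column, identical contig names from different genomes stay distinguishable in the merged file.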
How about adding the genome name in front of each contig header?
sed -i -e 's/^>/>genome1_/' genome1.faa
I was thinking about that, but I have a directory with >500 genomes. I was wondering which is easier: looping the BLAST or looping the renaming of headers? I am stuck on both!
Thanks for sed -i -e 's/^>/>genome1_/' genome1.faa
Super handy!
Is there a way to loop that through a directory of genomes with different file names?
Use wildcards to substitute differences in the names so you can loop through directories with different names: https://ryanstutorials.net/linuxtutorial/wildcards.php
Cool! Thank you for your help, this is a great forum!
You can try this:
for i in *.faa; do sed -i -e "s/^>/>$(echo "$i" | cut -f 1 -d '.')_/" "$i"; done
It takes the part of the file name before the first "." as the genome name.
Please test a couple of files first!
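One way to follow that advice is to run the loop on throwaway copies and write the result to new files instead of editing in place. A minimal sketch, assuming the same naming rule as above; the file names and sequences are invented for the demo:

```shell
# Sketch: try the header-renaming loop on throwaway copies first.
# The genome file names and contents below are made up for the demo.
set -eu

scratch=$(mktemp -d)
printf '>contig1\nMKV\n>contig2\nMLA\n' > "$scratch/genome1.faa"
printf '>contig1\nMTT\n' > "$scratch/genome2.v2.faa"

for f in "$scratch"/*.faa; do
    # Genome name = file name up to the first "." (same rule as above).
    name=$(basename "$f" | cut -f 1 -d '.')
    # Write to a new file instead of editing in place, so the original
    # survives if the result is not what was expected.
    sed -e "s/^>/>${name}_/" "$f" > "$f.renamed"
done

# Inspect the new headers before touching any real files.
grep '^>' "$scratch"/*.renamed
```

Once the headers look right, the same substitution can be applied to the real files with sed -i.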
I get:
bash: /: Is a directory
Make sure the " and ' characters are correct when you copy the code.
When I copied it, " became “ and ”, and ' became ‘ and ’.
That’s exactly what it was: the " and ' got changed when I copied it. It works now! Thank you so much for your help! Again, a really useful forum!
You can include more than one fasta file when creating indices, just add them as arguments separated by spaces (as opposed to manually concatenating them into a single file). Still, if the headers are exactly the same and the file names are exactly the same, you might keep running into issues. In that case, the best practice would be to re-label your headers so that they are unique, removing any ambiguity.
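Before building a combined database, it may help to check whether any headers still collide across files. A rough sketch, assuming the sequence ID is the header text up to the first whitespace; the function name duplicate_ids and the demo files are made up:

```shell
# Sketch: list sequence IDs that occur more than once across all files.
# An empty result means every header ID is unique.
set -eu

duplicate_ids() {
    # Take header lines, drop ">" and anything after the first whitespace,
    # then report IDs that appear more than once.
    cat "$@" | grep '^>' | sed -e 's/^>//' -e 's/[[:space:]].*$//' | sort | uniq -d
}

# Demo with made-up genomes that share one contig name.
demo=$(mktemp -d)
printf '>contig1\nMKV\n>contig2\nMLA\n' > "$demo/genome1.faa"
printf '>contig1\nMTT\n>contig3 some description\nMAA\n' > "$demo/genome2.faa"
duplicate_ids "$demo"/*.faa
```

Any ID this prints is ambiguous in a combined database and is a candidate for re-labelling.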
Hope this helps!