Blast taxonomy
RocksandBugs asked 4 months ago



  • Hi, I’m interested in getting the scientific names of my BLAST hits, run locally, into my output files. I understand that the only taxonomy information stored directly in the BLAST database is the taxid, and that the rest needs to be pulled from an additional database also provided by NCBI: ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz

    I have downloaded the two NCBI files, taxdb.btd and taxdb.bti, into the same folder as the BLAST databases (I am using the downloaded refseq_protein database), and BLASTDB is set to that folder. I have since tried to use the extra output fields (e.g. sscinames or sskingdoms) with my blast commands, but they don’t return any output. For example:

    $ blastp -db refseq_protein.00 -max_target_seqs 1 -outfmt '6 qseqid sseqid evalue staxids sscinames scomnames stitle' -query RpL2.fasta

    Do I need to make a new BLAST database from the refseq_protein database, using the -taxid_map option? If so, how do I do this, and do I use the taxdb.btd or taxdb.bti files? There is some information on this in online forums, but nothing that is outlined clearly, so any help would be much appreciated! Thanks!

    jrosner Staff replied 4 months ago

    Hi there RocksandBugs, just wanted to let you know that I’m working on finding someone that can help you out with this.
    We’re a bit short on people right now with so many on vacation, so apologies! But I’m hoping I can get you some help by the end of the week.
    Cheers

    jrosner Staff replied 4 months ago

    oh, and forgot to mention… I did find this:
    https://www.mathworks.com/matlabcentral/mlc-downloads/downloads/submissions/15970/versions/2/previews/taxoblastdemo/html/taxoblastdemo.html?access_key=
    so, it’s written in Matlab, but have a look at the section “Classifying BLAST Hits by Scientific Name”
    might just provide some insight on how to do what you want… and if so, please let me know!

    RocksandBugs replied 4 months ago

    Hi jrosner,

    Thank you very much for your help. I didn’t have a chance to try this as the response from jflucier solved the problem.

    Thank you again for your time and help!

    1 Answer
    Best Answer
    jflucier Staff answered 4 months ago



  • Hello,

    I downloaded a preformatted database and it seems to work. Here is what I did:


    # update_blastdb provided with blast+ install
    update_blastdb 16SMicrobial

    # the archive includes the taxdb.btd and taxdb.bti files
    tar -xvzf 16SMicrobial.tar.gz

    $ blastn -db ./16SMicrobial -query test_16s.fa -outfmt '6 qseqid sseqid evalue staxids sscinames scomnames stitle'
    NC_001318.1:c446118-444581 gi|1230874590|ref|NR_148750.1| 0.0 64897 Borrelia bissettii Borrelia bissettii Borreliella bissettii strain DN127 16S ribosomal RNA, complete sequence
    NC_001318.1:c446118-444581 gi|507148149|ref|NR_102956.1| 0.0 64897 Borrelia bissettii Borrelia bissettii Borreliella bissettii strain DN127 16S ribosomal RNA, partial sequence
    NC_001318.1:c446118-444581 gi|444439539|ref|NR_074854.1| 0.0 664662 Borrelia bavariensis Borrelia bavariensis Borreliella bavariensis strain PBi 16S ribosomal RNA, partial sequence
    NC_001318.1:c446118-444581 gi|310974942|ref|NR_036806.1| 0.0 100177 Borrelia lusitaniae Borrelia lusitaniae Borrelia lusitaniae strain Poti B2 16S ribosomal RNA gene, partial sequence
    NC_001318.1:c446118-444581 gi|559795161|ref|NR_104748.1| 0.0 29518 Borrelia afzelii Borrelia afzelii Borrelia afzelii strain VS461 16S ribosomal RNA gene, partial sequence
    NC_001318.1:c446118-444581 gi|559795280|ref|NR_104871.1| 0.0 88916 Borrelia spielmanii Borrelia spielmanii Borrelia spielmanii strain PC-Eq17N5 16S ribosomal RNA gene, partial sequence
    NC_001318.1:c446118-444581 gi|315614511|ref|NR_024713.2| 0.0 87162 Borrelia sinica Borrelia sinica Borrelia sinica strain CMN3 16S ribosomal RNA gene, partial sequence
    NC_001318.1:c446118-444581 gi|310974943|ref|NR_036807.1| 0.0 445987 Borrelia valaisiana VS116 Borrelia valaisiana VS116 Borrelia valaisiana strain VS116 16S ribosomal RNA gene, partial sequence
    NC_001318.1:c446118-444581 gi|636558650|ref|NR_114707.1| 0.0 64897 Borrelia bissettii Borrelia bissettii Borreliella bissettii strain DN127 16S ribosomal RNA, partial sequence
    NC_001318.1:c446118-444581 gi|1230874663|ref|NR_148824.1| 0.0 373543 Borrelia californiensis Borrelia californiensis Borreliella californiensis strain CA446 16S ribosomal RNA, partial sequence
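
Once the names appear in the output, the hits can be summarised per species. The following is only a minimal sketch: it assumes the tabular output was saved to a file, that the columns are tab-separated (as -outfmt 6 normally writes them), and that they are in the order used in the command above, so sscinames is column 5. The function name count_hits_by_sciname and the file name hits.tsv are made up for illustration.

```shell
# Count hits per scientific name in BLAST tabular output (-outfmt 6).
# Assumed column order: qseqid sseqid evalue staxids sscinames scomnames stitle
count_hits_by_sciname() {
    # $1: path to the saved tabular output (hypothetical, e.g. hits.tsv)
    # awk tallies column 5 (sscinames); sort puts the most frequent name first,
    # breaking ties alphabetically.
    awk -F '\t' '{ n[$5]++ } END { for (s in n) printf "%d\t%s\n", n[s], s }' "$1" |
        sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Example usage (assumes the blastn command above was redirected to hits.tsv):
#   count_hits_by_sciname hits.tsv
```

If the format string is changed, the column number in the awk command has to be adjusted to match.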

    update_blastdb downloads precompiled databases from ftp://ftp.ncbi.nlm.nih.gov/blast/db/. The refseq_protein database is available for download there, along with the taxdb files. To download it, run this command:

    update_blastdb refseq_protein
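
Since the original problem came down to the taxdb files not being where BLAST looks for them, a quick check like the one below may help before running a search. It is a rough sketch, not part of BLAST+: the function name has_taxdb is made up, and it assumes BLASTDB points at a single directory (not a colon-separated list of paths).

```shell
# Check that both taxonomy files sit in a BLAST database directory.
# Uses the directory given as $1, else $BLASTDB, else the current directory.
# Returns 0 if taxdb.btd and taxdb.bti both exist, 1 otherwise.
has_taxdb() {
    dir=${1:-${BLASTDB:-.}}
    for f in taxdb.btd taxdb.bti; do
        if [ ! -f "$dir/$f" ]; then
            # Report the first missing file on stderr and stop.
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    echo "taxdb files found in $dir"
}

# Example usage:
#   has_taxdb "$BLASTDB"
```

If it reports a missing file, extracting the taxdb.tar.gz archive mentioned in the question into that directory should, as far as I can tell, be enough.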

    Hope this helps.

    RocksandBugs replied 4 months ago

    Hi jflucier,

    Thank you very much for this! Although I had downloaded the database previously, the taxdb files were never downloaded along with it.

    This works great!

    Cheers!