Continued from the question "Running BUSCO on Cedar"
sjossey asked 3 weeks ago



  • So I followed the suggestions from my previous question about running BUSCO. It runs great on the test data, but when I run it on my own data from a script, I get the following error:
     
    INFO ****** Step 3/3, current time: 11/19/2018 10:05:57 ******
    INFO Running HMMER to confirm orthology of predicted proteins:
    INFO [hmmsearch] Error: Failed to open sequence file /scratch/sjossey/BUSCO/run_M6_Mas3_8_Nov18/augustus_output/extracted_proteins/EOG090A00H1.faa.1 for reading
    INFO [hmmsearch] Error: Failed to open sequence file /scratch/sjossey/BUSCO/run_M6_Mas3_8_Nov18/augustus_output/extracted_proteins/EOG090A0CFQ.faa.1 for reading
     
    Any suggestions for solving this problem would be great.
    Thanks,
    Sushma

    Eloi Mercier Staff replied 3 weeks ago

    Let’s start by verifying that your files are there. What do you get when you type:
    ls /scratch/sjossey/BUSCO/run_M6_Mas3_8_Nov18/augustus_output/extracted_proteins/EOG090A00H1.faa.1
    ls /scratch/sjossey/BUSCO/run_M6_Mas3_8_Nov18/augustus_output/extracted_proteins/EOG090A0CFQ.faa.1
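    When a log reports many of these errors, it is quicker to check every reported path at once instead of typing each ls by hand. A minimal sketch, assuming the BUSCO output (including the "Failed to open sequence file" lines) was saved to a log file; check_missing and busco_run.log are hypothetical names, not part of BUSCO.

```shell
# check_missing LOGFILE (hypothetical helper)
# For every path named in a hmmsearch "Failed to open sequence file" error,
# print "present: PATH" if it exists on disk, otherwise "missing: PATH".
check_missing() {
  grep -o 'Failed to open sequence file [^ ]*' "$1" |
    awk '{print $NF}' |          # the last field of each match is the path
    while read -r f; do
      if [ -e "$f" ]; then
        echo "present: $f"
      else
        echo "missing: $f"
      fi
    done
}
```

    Usage: check_missing busco_run.log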

    Eloi

    sjossey replied 3 weeks ago

    The file is not there:

    ls: cannot access ‘/scratch/sjossey/BUSCO/run_M6_Mas3_8_Nov18/augustus_output/extracted_proteins/EOG090A00H1.faa.1’: No such file or directory

    So I think I see what the problem is, but why is it not there?

    Eloi Mercier Staff replied 3 weeks ago

    Can you please try again using another version of Augustus, located at: /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/gcc5.4/augustus/3.2.3/bin/augustus
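    One way a job script might make sure that this Augustus build is the one picked up is to put its bin directory first on PATH before calling BUSCO. This is a sketch that assumes BUSCO finds augustus through PATH; if your BUSCO configuration file points at an explicit Augustus directory, that setting would need to change instead, and AUGUSTUS_CONFIG_PATH may also need to point at a config directory matching this version. AUG_BIN is a hypothetical variable name.

```shell
# bin directory of the Augustus 3.2.3 build suggested above
AUG_BIN=/cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/gcc5.4/augustus/3.2.3/bin

# Prepend it so this build is found before any other augustus on PATH
export PATH="$AUG_BIN:$PATH"

# Report which augustus the shell will now run; warn if none is found at all
command -v augustus || echo "augustus not found on PATH" >&2
```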

    sjossey replied 3 weeks ago

    I will try that as soon as the Cedar performance issue is resolved.

    sjossey replied 2 weeks ago

    Thanks! The run looks much better now: just the two errors pasted below, whereas before there were ~150 ‘Error: Failed to open sequence file’ messages.

    INFO ****** Step 3/3, current time: 11/30/2018 14:04:56 ******
    INFO Running HMMER to confirm orthology of predicted proteins:
    INFO [hmmsearch] 50 of 499 task(s) completed at 11/30/2018 14:04:56
    INFO [hmmsearch] 100 of 499 task(s) completed at 11/30/2018 14:04:57
    INFO [hmmsearch] 150 of 499 task(s) completed at 11/30/2018 14:04:57
    INFO [hmmsearch] 250 of 499 task(s) completed at 11/30/2018 14:04:57
    INFO [hmmsearch] Error: Failed to open sequence file /scratch/sjossey/BUSCO/BUSCO_2/run_M6_Mas3_8_Nov18Aug3.2/augustus_output/extracted_proteins/EOG090A01PA.faa.1 for reading
    INFO [hmmsearch] 300 of 499 task(s) completed at 11/30/2018 14:04:57
    INFO [hmmsearch] 350 of 499 task(s) completed at 11/30/2018 14:04:58
    INFO [hmmsearch] 400 of 499 task(s) completed at 11/30/2018 14:04:58
    INFO [hmmsearch] Error: Failed to open sequence file /scratch/sjossey/BUSCO/BUSCO_2/run_M6_Mas3_8_Nov18Aug3.2/augustus_output/extracted_proteins/EOG090A0APF.faa.3 for reading
    INFO [hmmsearch] 450 of 499 task(s) completed at 11/30/2018 14:04:58
    INFO [hmmsearch] 499 of 499 task(s) completed at 11/30/2018 14:04:59

    If I could fix those two it would be great, but the results do look much better now.
    Sushma

    sjossey replied 2 weeks ago

    So does that mean the installation of Augustus v3.3 has some problems?

    Eloi Mercier Staff replied 2 weeks ago

    It looks like this is a minor bug in BUSCO. According to the author:
    “After checking, there is a minor issue with busco when augustus fails to make a prediction and produces an empty file. No protein can be extracted and the file is missing for hmmsearch which complains. However, it has no other consequences than this error message. It does not affect the final score and you know that the busco id in the error message is reported as a missing gene, which is correct.”

    You can read more at https://gitlab.com/ezlab/busco/issues/77
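    Following the author's explanation, each BUSCO ID named in one of these hmmsearch errors should simply end up reported as missing. A small sketch that extracts those IDs from a saved log so they can be looked up (for example with grep) in the run's full results table; failed_busco_ids is a hypothetical helper, and it assumes the ID is the file name before the .faa.N suffix, as in the paths quoted in this thread.

```shell
# failed_busco_ids LOGFILE (hypothetical helper)
# Print the unique BUSCO IDs whose extracted_proteins/*.faa.N file
# hmmsearch could not open.
failed_busco_ids() {
  grep -o 'extracted_proteins/[^ ]*\.faa\.[0-9]*' "$1" |
    sed -e 's#.*/##' -e 's#\.faa\.[0-9]*$##' |   # strip directory, then .faa.N
    sort -u
}
```

    For the two errors in the log pasted above, this would print EOG090A01PA and EOG090A0APF.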

    sjossey replied 2 weeks ago

    But the final results are very different. For the same genome with Augustus v3.3:
    INFO Results:
    INFO C:76.7%[S:75.2%,D:1.5%],F:13.3%,M:10.0%,n:4104

    Only 76.7% complete genes were found, which was a very disappointing result. I realized something was wrong after I ran it on some genomes with known BUSCO output and it produced low values for them.

    Whereas with Augustus 3.2:
    INFO Results:
    INFO C:94.8%[S:93.0%,D:1.8%],F:2.6%,M:2.6%,n:4104
    That is 94.8% complete genes. I am pleased with that, but wanted to see if it can be improved further.
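    To compare the two summaries in absolute terms, the percentages can be converted back into approximate BUSCO counts using n:4104 (C = S + D, and C + F + M = 100%). A small sketch, assuming the one-line summary format shown in these logs; because each percentage is rounded to one decimal place, the counts are only approximate. busco_counts is a hypothetical helper name.

```shell
# busco_counts SUMMARY (hypothetical helper)
# Convert a BUSCO summary such as "C:94.8%[S:93.0%,D:1.8%],F:2.6%,M:2.6%,n:4104"
# into approximate counts (percent * n / 100, rounded to the nearest integer).
busco_counts() {
  # Replace every character that is not a digit, '.' or newline with a space,
  # leaving six numbers in the order C S D F M n
  printf '%s\n' "$1" | tr -c '0-9.\n' ' ' |
    awk '{ n = $6
           printf "C=%.0f S=%.0f D=%.0f F=%.0f M=%.0f n=%d\n",
                  $1*n/100, $2*n/100, $3*n/100, $4*n/100, $5*n/100, n }'
}
```

    For example, busco_counts 'C:94.8%[S:93.0%,D:1.8%],F:2.6%,M:2.6%,n:4104' prints C=3891 S=3817 D=74 F=107 M=107 n=4104, while the v3.3 summary gives C=3148, so the Augustus 3.2 run recovers roughly 740 more complete BUSCOs.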

    Thanks,
    Sushma

    Eloi Mercier Staff replied 2 weeks ago

    I agree that the results look better with Augustus 3.2. There might be a configuration issue with 3.3.

    1 Answer
    jrosner Staff answered 3 weeks ago



  • (see reply in the comments section above)

    The original thread is here:

    Running BUSCO on Cedar