Trinity program freezing in Phase 2 of the de novo assembly when relatively big data (102G) files are used
vguerracanedo asked 10 months ago



  • Dear Compute Canada community, 
    Has anyone encountered the problem of not being able to get past Phase 2 in Trinity? If so, how did you solve it? My Trinity run freezes at Phase 2, and it only completes Phase 2 after I cancel the program and run it again.
     
    I’m trying to run a paired-end data set (one left data file of 102G and one right data file of 102G).
     
    ===
    Memory requested
    ===

    #SBATCH --nodes=1
    #SBATCH --cpus-per-task=32
    #SBATCH --mem=128000

    ===
    Trinity command
    ===
    #S._. are my left or right files

    Trinity \
        --seqType fq \
        --left "${S1_F}","${S2_F}","${S3_F}","${S4_F}","${S5_F}","${S6_F}","${S7_F}","${S8_F}","${S9_F}" \
        --right "${S1_R}","${S2_R}","${S3_R}","${S4_R}","${S5_R}","${S6_R}","${S7_R}","${S8_R}","${S9_R}" \
        --CPU 32 \
        --max_memory 62G \
        --output "${OUT_DIR}"

    ===
    slurm output
    ====

    ------------ Trinity Phase 2: Assembling Clusters of Reads ---------------------
    --------------------------------------------------------------------------------
    Thursday, October 26, 2017: 01:10:57 CMD: /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/intel2016.4/trinity/2.5.0/trinityrnaseq-Trinity-v2.5.0/trinity-plugins/BIN/ParaFly -c recursive_trinity.cmds -CPU 32 -v -shuffle 
    Number of Commands: 97589
    succeeded(97587)   99.9979% completed.
     ==== 
    # General note: Phase 2 won’t go to 100% and my final assembly ends up having an unusual number of assembled sequences. 

     
    Best, 
    Vanessa

    1 Answer
    jrosner Staff answered 10 months ago



  • Hi Vanessa,
    A couple of comments on this…
    First, I was going to recommend posting on the developers’ Google group, but I can see that you’ve already done this! What you might have missed is that there is an existing discussion thread on that topic titled “Trinity phase 2 : paraFly stuck at 99.9999% for days”.
    While there wasn’t a solid answer to this issue, there were some suggestions, one of which was using the parameter:

    --normalize_reads

    If you try this, be aware that you might need to lower your max_memory setting; otherwise the run might fail with an error message like “job killed: vmem XXXXXX exceeded limit”.
    If this is the case, try lowering your max_memory conservatively, say from 62G to 50G, and see if that makes a difference.  Keep lowering it until the memory error goes away.
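
One way to make that step-down less ad hoc is to derive the --max_memory value from the memory the job was actually given. The sketch below is only an illustration: `trinity_max_mem_gb` is a hypothetical helper name, the percentages are arbitrary starting points, and it assumes SLURM exports the `--mem` request in megabytes as `SLURM_MEM_PER_NODE`.

```shell
#!/bin/bash
# Hypothetical helper (not part of Trinity or SLURM): convert a memory
# allocation in MB into a whole-GB value that leaves headroom for the job.
#   $1 = allocated memory in MB (e.g. the value passed to #SBATCH --mem)
#   $2 = percentage of the allocation to hand to Trinity (default 50)
trinity_max_mem_gb() {
    local alloc_mb="$1" percent="${2:-50}"
    # Integer arithmetic: take the percentage first, then convert MB to GB.
    echo $(( alloc_mb * percent / 100 / 1024 ))
}

# Possible use inside a job script (assumes SLURM_MEM_PER_NODE is set in MB):
#   MAX_MEM="$(trinity_max_mem_gb "${SLURM_MEM_PER_NODE}" 40)G"
#   Trinity ... --max_memory "${MAX_MEM}" ...
```

With the 128000 MB request from the original post, 50 percent works out to 62 and 40 percent to 50, so lowering the percentage is equivalent to lowering max_memory by hand.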
    Now, here’s another thought… I’ve had similar issues with other software in the past when reading in sequence data.  In particular, if you have a corrupt file or even a single corrupt read, it can cause the software to hang.  Perhaps this is why killing and relaunching works?  That is, it picks up immediately after the problem sequence. 
    And so, does this happen when analyzing other data sets or just this one?  If it is in fact just this one dataset, you could try running FastQC on your sequence files to find any possible corrupted or truncated files.  If you find one (or more) corrupted reads, just delete them and try running again.
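
Alongside FastQC, a quick structural check can catch a truncated file before a long run. The following is a minimal sketch, not a replacement for a proper validator: `fastq_read_count` and `check_pair` are hypothetical helper names, and the check assumes standard 4-line FASTQ records (no wrapped sequence lines) that end with a newline.

```shell
#!/bin/bash
# Hypothetical helper: print the number of records in a FASTQ file, or fail
# if the file looks truncated. Assumes 4 lines per record.
fastq_read_count() {
    local file="$1" lines
    if [[ "$file" == *.gz ]]; then
        # gzip -t reports a truncated or corrupt archive before we read it.
        gzip -t "$file" || return 2
        lines=$(gzip -cd "$file" | wc -l)
    else
        lines=$(wc -l < "$file")
    fi
    lines=$((lines))   # normalise any whitespace that wc adds
    if (( lines % 4 != 0 )); then
        echo "ERROR: $file has $lines lines, not a multiple of 4" >&2
        return 1
    fi
    echo $(( lines / 4 ))
}

# Hypothetical helper: confirm a left/right pair holds the same number of reads.
check_pair() {
    local left_n right_n
    left_n=$(fastq_read_count "$1") || return 1
    right_n=$(fastq_read_count "$2") || return 1
    if [[ "$left_n" -ne "$right_n" ]]; then
        echo "ERROR: $1 has $left_n reads but $2 has $right_n" >&2
        return 1
    fi
    echo "OK: $left_n read pairs"
}
```

For the job above this could be called as `check_pair "${S1_F}" "${S1_R}"` for each sample. A file that passes can still contain malformed reads, so this only narrows down the obvious cases.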
    So, give these suggestions a try and let us know what works or doesn’t work.
    Cheers

    vguerracanedo replied 10 months ago

    Hi Jamie,

    Thank you for the fast response.

    I forgot to mention in my previous post that I’m using the new version of Trinity. This is the version that Brian said would work better for this problem in the thread you noted. In this version the normalization step has been added as a default step (so there is no need for --normalize_reads). I’m sorry for causing confusion by not noting the Trinity version that I am using.

    I have had a few memory problems in the recent past, so I’ll try to increase the ratio of the memory requested next just in case it may fix my problem. And I will double check the quality of my files.

    I’ll report back if I find a solution.

    Thank you for your time!

    V

    vguerracanedo replied 10 months ago

    Here is an update from Brian (one of the Trinity designers) in case it is helpful to someone here in the future:
    I do hear reports of this occasionally but I have no idea where the problem is, whether it’s java, openmp, etc. It’s not easily reproduced either. Sometimes keeping the CPU level lower such as at 20 or less results in fewer issues. I suspect it’s java related.

    If/when trinity v3 is available, it’ll lack java.

    Sorry I don’t have better insights.

    -Brian
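
Brian's suggestion of keeping the CPU level at 20 or less could be applied in a job script roughly as follows. This is a sketch under assumptions: `trinity_cpus` is a hypothetical helper, the default cap of 20 simply mirrors the number in his reply rather than any documented limit, and `SLURM_CPUS_PER_TASK` is assumed to be set because the job requests `--cpus-per-task`.

```shell
#!/bin/bash
# Hypothetical helper: cap the thread count handed to Trinity.
#   $1 = CPUs allocated to the job
#   $2 = upper limit to pass to --CPU (default 20, taken from Brian's reply)
trinity_cpus() {
    local allocated="$1" cap="${2:-20}"
    if (( allocated > cap )); then
        echo "$cap"
    else
        echo "$allocated"
    fi
}

# Possible use inside the job script from the original post
# (LEFT_FILES and RIGHT_FILES are placeholder variables for the file lists):
#   Trinity --seqType fq --left "${LEFT_FILES}" --right "${RIGHT_FILES}" \
#       --CPU "$(trinity_cpus "${SLURM_CPUS_PER_TASK}")" \
#       --max_memory 62G --output "${OUT_DIR}"
```

Capping inside the script rather than changing the `#SBATCH` request keeps the allocation unchanged while the lower thread count is being tried.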