Forum: Trinity
Audrée L asked 1 month ago



Hi! I’m a student who started with bioinformatics 1-2 months ago. For the past few days I’ve been trying to run Trinity on my quality-controlled reads, but it fails every time. I’ve tried increasing the memory up to 80G, but the job still fails. I have submitted a 200G job, but it’s still in the queue.
This is my sbatch script:
#!/bin/bash
#SBATCH -c 6 # Number of CPUS requested. If omitted, the default is 1 CPU.
#SBATCH --mem=80G # mem in gb
#SBATCH -t 14-0:0:0 # How long will your job run for? If omitted, the default is 3 hours.
#SBATCH -J essai_3 # Name of job
module load gcc/7.3.0
module load openmpi/3.1.4
module load samtools
module load jellyfish
module load salmon
module load trinity/2.9.0
Trinity --seqType fq --max_memory 80G --CPU 6 --left 003.Index_3.GR_RNA_BS3-3_R1_paired.fastq.gz --right 003.Index_3.GR_RNA_BS3-3_R2_paired.fastq.gz
And this is the error that I keep getting, at the “Inchworm step”:

done parsing 1428854866 Kmers, 1428854866 added, taking 2604 seconds.
TIMING KMER_DB_BUILDING 2604 s.
-populating the kmer seed candidate list.
Kcounter hash size: 1428854866
sh: line 1: 4374 Killed /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/gcc7.3/trinity/2.9.0/trinityrnaseq-v2.9.0/Inchworm/bin//inchworm --kmers jellyfish.kmers.25.asm.fa --run_inchworm -K 25 --monitor 1 --DS --num_threads 2 --PARALLEL_IWORM -L 25 --no_prune_error_kmers > /scratch/alemi055/ete2020/RNA_Arctic/quality_control_trimmed_paired/trinity_out_dir/inchworm.DS.fa.tmp
Error, cmd: /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/gcc7.3/trinity/2.9.0/trinityrnaseq-v2.9.0/Inchworm/bin//inchworm --kmers jellyfish.kmers.25.asm.fa --run_inchworm -K 25 --monitor 1 --DS --num_threads 2 --PARALLEL_IWORM -L 25 --no_prune_error_kmers > /scratch/alemi055/ete2020/RNA_Arctic/quality_control_trimmed_paired/trinity_out_dir/inchworm.DS.fa.tmp died with ret 35072 No such file or directory at /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/gcc7.3/trinity/2.9.0/trinityrnaseq-v2.9.0/PerlLib/Pipeliner.pm line 186.
Pipeliner::run(Pipeliner=HASH(0x17ec578)) called at /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/gcc7.3/trinity/2.9.0/trinityrnaseq-v2.9.0/Trinity line 2621
eval {...} called at /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/gcc7.3/trinity/2.9.0/trinityrnaseq-v2.9.0/Trinity line 2610
main::run_inchworm("/scratch/alemi055/ete2020/RNA_Arctic/quality_control_trimmed_"..., "/scratch/alemi055/ete2020/RNA_Arctic/quality_control_trimmed_"..., undef, "", 25, 0) called at /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/gcc7.3/trinity/2.9.0/trinityrnaseq-v2.9.0/Trinity line 1725
main::run_Trinity() called at /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/gcc7.3/trinity/2.9.0/trinityrnaseq-v2.9.0/Trinity line 1396
eval {...} called at /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/gcc7.3/trinity/2.9.0/trinityrnaseq-v2.9.0/Trinity line 1395
If it indicates bad_alloc(), then Inchworm ran out of memory. You'll need to either reduce the size of your data set or run Trinity on a server with more memory available.
** The inchworm process failed.
slurmstepd: error: Detected 1 oom-kill event(s) in step 43788761.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
Does anyone have any idea what might be causing this? I am working in /scratch, but I also tried running the script in my /projects directory and I still get the same error message.
Thank you 🙂

flefebvre Staff replied 1 month ago

Hi Audrée, Rob Syme has provided an answer below.

1 Answer
Rob Syme Staff answered 1 month ago



Hi Audrée
 
The memory requirements of Trinity depend to a large extent on the complexity of the sample, but a job exceeding 80G of RAM is certainly not unusual. Your approach of increasing the memory allocation and resubmitting to the queue is exactly what I’d recommend.
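To get a feel for how much memory the killed job had used before choosing the next allocation, Slurm's accounting tool can be queried. The following is only a sketch: the job ID is the one from the oom-kill message in the question, the choice of columns is my own, and the MaxRSS reported for a job that was killed may understate the true peak because it is sampled periodically.

```shell
#!/bin/bash
# Job ID taken from the "oom-kill event(s) in step 43788761.batch" message above;
# pass a different ID as the first argument to inspect another job.
job_id="${1:-43788761}"

# Columns to report: requested memory (ReqMem) next to the peak resident
# memory that Slurm recorded (MaxRSS), plus run time and final state.
fields="JobID,JobName,ReqMem,MaxRSS,Elapsed,State"

if command -v sacct >/dev/null 2>&1; then
    sacct -j "$job_id" --format="$fields"
else
    # sacct is only available on machines with the Slurm client tools installed.
    echo "sacct not found; run this on a cluster login node" >&2
fi
```

If MaxRSS sits right at the requested amount, the job was most likely stopped by the memory limit rather than by a problem with the data.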
Sometimes the complexity of the sample can be mitigated by in-silico read normalisation, as described here: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Insilico-Normalization
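As a rough illustration of the normalisation idea: as far as I know, recent Trinity releases already normalise reads by default to a target coverage of 200, and the `--normalize_max_read_cov` option lowers that target. The value of 50 below is an arbitrary assumption for illustration, not a recommendation from the Trinity authors, and the remaining options are simply copied from the script in the question.

```shell
#!/bin/bash
# Assumed, illustrative coverage target; lower values discard more redundant reads.
max_cov=50

# Same command as in the question, with a stricter normalisation target added.
trinity_cmd=(
    Trinity --seqType fq --max_memory 80G --CPU 6
    --normalize_max_read_cov "$max_cov"
    --left 003.Index_3.GR_RNA_BS3-3_R1_paired.fastq.gz
    --right 003.Index_3.GR_RNA_BS3-3_R2_paired.fastq.gz
)

# Print the command so it ends up in the job log.
printf '%s ' "${trinity_cmd[@]}"
echo

# Only run it where Trinity is actually available (e.g. after "module load trinity/2.9.0").
if command -v Trinity >/dev/null 2>&1; then
    "${trinity_cmd[@]}"
fi
```

Stronger normalisation reduces the number of k-mers Inchworm has to hold in memory, at the possible cost of losing some low-abundance transcripts, so it is worth comparing assemblies if you change this value.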
Another helpful tip when running Trinity is to reduce job time by running the job in $SLURM_TMPDIR. Trinity creates many thousands of small files, which can cause slowdowns on the networked file system. I’d recommend running cd $SLURM_TMPDIR before running Trinity, and making sure to cp your result files back to the networked file system when the run finishes.
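The $SLURM_TMPDIR workflow could look roughly like the sketch below, which would go after the `module load` lines of the sbatch script. It assumes the read files sit in the directory the job was submitted from, that the final assembly is written to `trinity_out_dir/Trinity.fasta` (the default location for this Trinity version, to my knowledge), and it falls back to a scratch directory from `mktemp` when $SLURM_TMPDIR is not set; the variable names are my own.

```shell
#!/bin/bash
# Directory the job was submitted from (set by Slurm); fall back to the current directory.
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"
# Temporary job directory provided by the cluster; fall back to a temp dir for testing.
work_dir="${SLURM_TMPDIR:-$(mktemp -d)}"

left=003.Index_3.GR_RNA_BS3-3_R1_paired.fastq.gz
right=003.Index_3.GR_RNA_BS3-3_R2_paired.fastq.gz

# Stage the input reads into the working directory.
for f in "$left" "$right"; do
    if [ -f "$submit_dir/$f" ]; then
        cp "$submit_dir/$f" "$work_dir/"
    fi
done

cd "$work_dir" || exit 1

# Run Trinity inside the working directory so its many small files stay off the network.
if command -v Trinity >/dev/null 2>&1; then
    Trinity --seqType fq --max_memory 80G --CPU 6 \
        --left "$left" --right "$right" \
        --output "$work_dir/trinity_out_dir"
fi

# Copy the final assembly back before the job ends.
if [ -f trinity_out_dir/Trinity.fasta ]; then
    cp trinity_out_dir/Trinity.fasta "$submit_dir/"
fi
```

Keep in mind that anything left in $SLURM_TMPDIR is typically removed when the job finishes, so intermediate files from a failed run will not be available for a restart.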
I hope this helps, feel free to post a follow-up question here if anything is unclear.