Running QIIME2 – Out of memory error
NickB asked 1 year ago

I’m trying to run the following DADA2 command in QIIME 2, but after a while I get an out-of-memory error.

qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end-demux.qza \
--p-trim-left-f 0 \
--p-trunc-len-f 245 \
--p-trim-left-r 0 \
--p-trunc-len-r 205 \
--p-max-ee 2 \
--p-n-threads 64 \
--o-representative-sequences rep-seqs-dada2-all.qza \
--o-table table-dada2-all.qza \
--o-denoising-stats stats-dada2-all.qza

The error is:

Plugin error from dada2:
An error was encountered while running DADA2 in R (return code -9), please inspect stdout and stderr to learn more.
Debug info has been saved to /tmp/qiime2-q2cli-err-5e5q8rfb.log

How do I get more memory for my instance?

3 Answers
jrosner Staff answered 1 year ago

Hi NickB,
Seems like more memory will do the trick. Just wondering, how are you submitting your job?
i.e. as a batch job, or are you requesting an interactive session?
Have a look here.
Also, I see that you’re requesting 64 threads. While there are a few nodes on Graham that can satisfy this request, most nodes have 32 cores, so you’d be better off specifying a maximum of 32. Be aware, though, that the more cores and memory you request, the longer you may wait in the queue for the resources to become available.
Now, looking at the QIIME forum, it looks like 8GB is a typical requirement, but you can always check this by launching your job and then running the “top” command to see how much memory it uses.
So, have a look at the link on running jobs and see if you can specify something like 8 cores and 16GB of memory, then change the thread flag in your qiime command to match the number of cores, i.e. --p-n-threads 8
Give it a go and let me know if you get it to run.
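Putting that together, a batch script along those lines might look like the sketch below. The account name, time limit, and file names are placeholders, not tested values; adjust them to your own allocation and data.

```shell
#!/bin/bash
# Sketch only: 8 cores and 16G total memory, with --p-n-threads
# matched to --cpus-per-task as suggested above.
#SBATCH --time=1:30:00
#SBATCH --job-name=dada2
#SBATCH --account=def-XXXX
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trim-left-f 0 --p-trunc-len-f 245 \
  --p-trim-left-r 0 --p-trunc-len-r 205 \
  --p-max-ee 2 \
  --p-n-threads 8 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza
```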

NickB replied 1 year ago

Hi jrosner,

This was only my 2nd day with Compute Canada, and I work only one day a week on the project. I haven’t read all the documentation yet; last time I had only just started on the job submission docs.

I have a 900MB file to process with 11 samples. How long do you think it will take?

Thank you

jrosner Staff replied 1 year ago

Hi Nick, I don’t have experience running QIIME, but even if I did, this would be difficult to answer. The amount of time really depends on the quality and quantity of the resources you’re able to throw at it. In particular, multi-threading will split up the analysis and run it in parallel, so in theory 2 threads would cut your time in half compared to a single thread; that doesn’t account for overhead and the non-parallelizable parts of the job, however.
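To put a rough number on that caveat, Amdahl’s law gives the ideal speedup when only a fraction of a job parallelizes. The 90%-parallel figure below is purely an assumption for illustration, not a measurement of DADA2:

```shell
# Amdahl's law: if a fraction P of the work is parallelizable,
# the speedup on N threads is 1 / ((1 - P) + P / N).
amdahl() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}
amdahl 0.9 2    # ~1.82x, not 2x, if 10% of the job is serial
amdahl 0.9 16   # ~6.40x on 16 threads: diminishing returns
```

This is why doubling the thread count rarely halves the wall-clock time, and why more cores help less and less past a certain point.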

NickB answered 1 year ago

I read the doc about running jobs, but do the nodes have QIIME2 installed?
Will this script run?

#SBATCH --time=1:30:00
#SBATCH --job-name=dada2
#SBATCH --output=sample_1_2_dada2.out
#SBATCH --account=def-XXXX
#SBATCH --workdir=/home/users/XXXX/cnete
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=32G
qiime dada2 denoise-paired --i-demultiplexed-seqs paired-end-demux-s1_2.qza --p-trim-left-f 0 --p-trunc-len-f 245 --p-trim-left-r 0 --p-trunc-len-r 205 --p-max-ee 2 --p-n-threads 64 --o-representative-sequences rep-seqs-dada2-all.qza --o-table table-dada2-s1_2.qza --o-denoising-stats stats-dada2-s1_2.qza


NickB replied 1 year ago

Confirmed this job does not run.

NickB answered 1 year ago

I still get the memory error when running this job, where I allocated 32G per CPU:

#SBATCH --time=1:30:00
#SBATCH --job-name=dada2
#SBATCH --output=sample_1_2_dada2.out
#SBATCH --account=def-XXX
#SBATCH --workdir=/home/users/XXX/cnete
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=32G
module load singularity/3.1
singularity shell -B ~/ core_2019.1.sif
cd cnete/dataset/
qiime dada2 denoise-paired --i-demultiplexed-seqs samples_1_2/paired-end-demux-s1_2.qza --p-trim-left-f 0 --p-trunc-len-f 245 --p-trim-left-r 0 --p-trunc-len-r 205 --p-max-ee 2 --p-n-threads 64 --o-representative-sequences rep-seqs-dada2-all.qza --o-table table-dada2-s1_2.qza --o-denoising-stats stats-dada2-s1_2.qza

What am I missing?

jrosner Staff replied 1 year ago

So, you’re asking for 16 cores with 32G per core, which I don’t think is right.
You could try either
#SBATCH --mem-per-cpu=2G
(this will give a total of 32G across your 16 cores)
or
#SBATCH --mem=32G
(this assigns a total of 32G for the entire node)
Give these a try, but my guess is the latter is the one you want.

Wade Klaver Staff replied 1 year ago

I spoke with a friend who has done a lot of QIIME2 work and he had some input. I think you are on the right track with 16-32GB/core. So, --cpus-per-task=8 and --mem-per-cpu=16G should be reasonable given that fits within the memory profile of the majority of nodes. Once you get that running effectively, you could titrate down to the minimal memory amount to get more cores on task. i.e. 16 cores and 8G? The --p-n-threads 64 should likely be dropped to the number of CPUs requested.
A couple of other things:
1) Depending on the size of your sequences, you may want to consider extending the time allotment.
2) 19.4 apparently includes a new version of DADA2 which is rumoured to be ~10x faster, potentially rendering my last comment obsolete. It may be time to update your singularity image.

NickB replied 1 year ago

Sorry for the late reply, I only work one day a week on the project.

I’m not familiar with Singularity. I’ve executed the image, but how do I run a command right after launching it?

When running my script, it just opens the command line waiting for inputs instead of executing my task. I don’t think my job loads.

I will update my image today.

Wade Klaver Staff replied 1 year ago

OK… if you update your singularity image, things should perform a bit faster.
Looking at your command, it seems I overlooked something. Not a lot will happen with what you have there. You are correct; it will essentially launch a shell in the Singularity image and wait for it to return before executing your QIIME command (outside the container). The job should also start in the cnete directory (your --workdir), so the “cd” command is not required. Generally, try to be explicit with paths.

Try something like:
singularity exec -B ~/ core_2019.1.sif qiime dada2 denoise-paired --i-demultiplexed-seqs dataset/samples_1_2/paired-end-demux-s1_2.qza --p-trim-left-f 0 --p-trunc-len-f 245 --p-trim-left-r 0 --p-trunc-len-r 205 --p-max-ee 2 --p-n-threads 16 --o-representative-sequences dataset/rep-seqs-dada2-all.qza --o-table dataset/table-dada2-s1_2.qza --o-denoising-stats dataset/stats-dada2-s1_2.qza

Note I have pushed the dataset directory name into the file names. This is not ideal for longer-term operation since, as you begin to run multiple jobs at once, you will start overwriting your own files, but it should serve to get you going. Once you have things working, I would look at encoding the data directories and input/output file names into environment variables so as to better manage your data locations, but we can cross that bridge when you come to it.
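As a rough sketch of that environment-variable idea (the directory and sample names here are illustrative placeholders, not your actual paths):

```shell
# Illustrative sketch: keep data locations and per-sample output names in
# variables so parallel jobs don't overwrite each other's files.
DATA_DIR="dataset"
SAMPLE="s1_2"
DEMUX="$DATA_DIR/samples_1_2/paired-end-demux-$SAMPLE.qza"
REP_SEQS="$DATA_DIR/rep-seqs-dada2-$SAMPLE.qza"
TABLE="$DATA_DIR/table-dada2-$SAMPLE.qza"
STATS="$DATA_DIR/stats-dada2-$SAMPLE.qza"
echo "$TABLE"
# The qiime call then becomes something like:
#   singularity exec -B ~/ core_2019.1.sif qiime dada2 denoise-paired \
#     --i-demultiplexed-seqs "$DEMUX" ... --o-table "$TABLE" ...
```

Changing SAMPLE per job then changes every input and output path in one place.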