I am running rpkm on a set of 14 samples (each with roughly the same size assembly and SAM file), but on one of the samples I get the following error:
/var/spool/slurmd/job23090490/slurm_script: line 9: 63436 Killed /home/xxxx//rpkm/./rpkm_runningfile -c /home/xxxx/assembly/contig.fa -a /home/xxxx/read_mapping/sample.sam -s /home/xxx/sample_rpkmstats.txt -o /home/xxxx/sample/_rpkm.csv -m
slurmstepd: error: Detected 1 oom-kill event(s) in step 23090490.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
I ran all the jobs with the following specs:
#!/bin/bash
#SBATCH --mem=128000M
#SBATCH --nodes=2
#SBATCH --cpus-per-task=32
#SBATCH --time=0-23:59
with 1 node, 32 CPUs, and 125G of memory.
Any ideas why one would fail and the others work?
Thanks!
Hi there,
It looks like your job is being killed because it is exceeding the memory limit you requested. So the first thing to try would be to increase the memory you’re requesting, perhaps by 50% for starters.
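As a rough sketch of what that would look like, assuming the original request was the 128000M from your script (the variable names here are just for illustration), you can work out the new value and the matching directive like this:

```bash
#!/bin/bash
# Illustrative only: compute a memory request 50% larger than the original.
old_mem_mb=128000                        # original --mem value, in megabytes
new_mem_mb=$(( old_mem_mb * 3 / 2 ))     # 50% more, using integer arithmetic

# Print the directive to paste into the job script in place of the old one.
echo "#SBATCH --mem=${new_mem_mb}M"
```

If Slurm accounting is enabled on your cluster, `sacct -j 23090490 --format=JobID,ReqMem,MaxRSS,State` should also show how much memory the killed job had actually used, which can help you pick a less arbitrary number.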
Give it a try and let us know.
P.S. What system are you working on: Cedar, Graham, or another?
Cedar. Thanks, using more memory (157000) than for the other samples did the trick!
Nice! Glad that worked.