I’m currently having trouble with a long job (~6 days) on mp2 that keeps hitting the walltime. I can’t (easily) split the job, and it’s already multi-threaded as much as possible. Now I’m wondering whether the choice of storage unit might help speed up the computation. The job creates a few dozen big files and writes to them constantly. They are ~2-3 GB when the job hits the walltime. Maybe faster I/O would help?
I was first using our “nobackup” folders in ‘/nfs3_ib/…’. Now I’m using ‘/mnt/parallel_scratch_mp2_wipe_on_december_2018/…’.
Speed-wise, is it worth trying to use the local scratch $LSCRATCH and move the files at the end of the job? Is that the recommended approach in general?
As flefebvre mentioned, you should write to mp2 support first. Sometimes, if it’s not a routine job, they are willing to extend the walltime beyond the allowed limit. This could help if you think that a few more days would allow the job to finish successfully. But again, you should write to mp2 support to see what they suggest, because if I/O speed is the main bottleneck, they could offer additional solutions.
Hope this helps!
Yes, good idea, thanks, I’ll ask them and report back!
Apparently it might be worth it, I’m trying it now. This is what mp2 support responded:
“Yes, it is worth trying LSCRATCH. For programs which do a lot of IO, I have seen a 10x speedup. Constant writing to the scratch is quite slow. The SCRATCH is optimized for a few big IOs and not many small ones.
If you are limited by IO, you could also try the $RAMDISK. It uses the node RAM; if you have enough unused memory, you could see a further speedup using it.”
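In case it helps anyone trying the same thing, here is a rough sketch of the "work in local scratch, copy back at the end" pattern. It is only an illustration under a few assumptions: `run_staged` and `example_job` are made-up names, the job is assumed to be callable from Python with a working directory, and the code falls back to the system temp directory when $LSCRATCH is not set. It needs Python 3.8 or newer for `dirs_exist_ok`.

```python
import os
import shutil
import tempfile
from pathlib import Path


def run_staged(job, final_dir, scratch_env="LSCRATCH"):
    """Run job(workdir) in node-local scratch, then copy outputs to final_dir."""
    # Fall back to the system temp dir when the variable is unset (e.g. on a laptop).
    scratch_root = os.environ.get(scratch_env) or tempfile.gettempdir()
    workdir = Path(tempfile.mkdtemp(prefix="job_", dir=scratch_root))
    final_dir = Path(final_dir)
    final_dir.mkdir(parents=True, exist_ok=True)
    try:
        job(workdir)
    finally:
        # Copy back even if the job raised, so partial results are not lost.
        for item in workdir.iterdir():
            target = final_dir / item.name
            if item.is_dir():
                shutil.copytree(item, target, dirs_exist_ok=True)
            else:
                shutil.copy2(item, target)
        shutil.rmtree(workdir, ignore_errors=True)
    return final_dir


def example_job(workdir):
    # Stand-in for the real computation: many small appends to one file.
    with open(workdir / "output.dat", "a") as fh:
        for i in range(1000):
            fh.write(f"{i}\n")
```

Calling `run_staged(example_job, "results")` would leave `results/output.dat` on the shared filesystem. One caveat: if the scheduler kills the job at the walltime, the copy-back step may never run, so for a job that is at risk of hitting the limit it is probably safer to also copy intermediate files back periodically.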
The IO rate on “parallel_scratch_mp2” is 10 Gb/s but is shared amongst all users. Performance will therefore vary depending on what other users are doing.
NFS is definitely slower, although I don’t know to what extent.
The IO rate on $LSCRATCH is 120 Mb/s but as you say, you will have to copy files in and out at job start and end.
Another option is $RAMDISK, with an IO rate of 1.8 Gb/s per core. That is probably your best option if the job doesn’t require much RAM and if you don’t mind copying files in and out, as with $LSCRATCH.
Hope this helps.
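Since the shared scratch varies with load, it may be easier to measure than to guess. Below is a minimal sketch of a small-write benchmark that could be run inside a job on a compute node. The function names are made up for illustration, and the numbers it gives only reflect many small flushed writes to a single file, which may or may not match what your program does.

```python
import os
import tempfile
import time


def small_write_rate(directory, n_writes=2000, chunk_size=4096):
    """Return MB/s achieved when appending many small chunks to one file."""
    payload = b"x" * chunk_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as fh:
            for _ in range(n_writes):
                fh.write(payload)
                fh.flush()  # push each small write out of Python's buffer
            os.fsync(fh.fileno())  # ask the OS to commit the data to storage
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (n_writes * chunk_size) / elapsed / 1e6


def compare(candidates):
    """Benchmark each existing directory; skip unset or missing ones."""
    results = {}
    for name, directory in candidates.items():
        if directory and os.path.isdir(directory):
            results[name] = small_write_rate(directory)
    return results


if __name__ == "__main__":
    candidates = {
        "LSCRATCH": os.environ.get("LSCRATCH"),
        "RAMDISK": os.environ.get("RAMDISK"),
        "tmp": tempfile.gettempdir(),
    }
    for name, rate in sorted(compare(candidates).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rate:.1f} MB/s")
```

You could add your shared scratch or NFS folder to `candidates` to compare all the options from within the same job.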
For your reference:
You can also email mp2 support at email@example.com for mp2-tailored advice.
Thanks for the information. So, for example, what do you do when you run your pipelines? Directly on “parallel_scratch_mp2”? Or first in $LSCRATCH and move the files at the end of the job?
Hi Jean, if you mean in GenPipes, I think $LSCRATCH is just used for tools that require a temporary directory. We certainly do not copy all files to local scratch, e.g. bam files.
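To illustrate the "temporary directory only" usage, here is a small sketch. It is not how GenPipes itself is implemented; `scratch_tempdir` is a hypothetical helper, and the file name is just a placeholder for whatever intermediate files a tool would write.

```python
import os
import tempfile


def scratch_tempdir(env_var="LSCRATCH"):
    """Return a temp-dir context manager under $LSCRATCH, or the default location."""
    base = os.environ.get(env_var)
    if not base or not os.path.isdir(base):
        base = None  # let tempfile choose (TMPDIR, /tmp, ...)
    return tempfile.TemporaryDirectory(prefix="tool_", dir=base)


with scratch_tempdir() as tmp:
    # A tool that accepts a temp-directory option would be pointed at `tmp` here.
    scratch_file = os.path.join(tmp, "sort_chunk.tmp")
    with open(scratch_file, "w") as fh:
        fh.write("intermediate data\n")
# The directory and its contents are removed on exit.
```

The idea is that only short-lived intermediate files go to local scratch, while large inputs and final outputs stay where they are on the shared filesystem.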
Ok, thanks for the information.