Best/recommended way to use the different storage units?
Jean M asked 4 months ago



  • I’m currently having trouble with a long job (~6 days) on mp2 that keeps hitting the walltime. I can’t (easily) split the job, and it’s already multi-threaded as much as possible. Now I’m wondering whether the choice of storage unit might help speed up the computation. The job creates a few dozen big files and is constantly writing to them. They are ~2-3 GB when the job hits the walltime. Maybe faster I/O would help?
    I was first using our “nobackup” folders in ‘/nfs3_ib/…’. Now I’m using ‘/mnt/parallel_scratch_mp2_wipe_on_december_2018/…’.
    Speed-wise, is it worth trying to use the local scratch $LSCRATCH and move the files at the end of the job? Is that the recommended approach in general?
    Thanks

    2 Answers
    Best Answer
    jhgalvez Staff answered 4 months ago



  • As flefebvre mentioned, you should write to mp2 support first. Sometimes, if it’s not a routine job, they are willing to extend the walltime beyond the allowed limit. This could help if you think that a few more days would allow the job to finish successfully. But again, you should write to mp2 support to see what they suggest, because if I/O speed is the main bottleneck, they could offer additional solutions. 
     
    Hope this helps! 

    Jean M replied 4 months ago

    Yes, good idea, thanks, I’ll ask them and report back!

    Jean M replied 4 months ago

    Apparently it might be worth it, I’m trying it now. This is what mp2 support responded:

    “Yes, it is worth trying LSCRATCH. For programs that do a lot of I/O, I have seen a 10x speedup. Constant writing to the scratch is quite slow: the SCRATCH is optimized for a few big I/O operations, not many small ones.
    If you are limited by I/O, you could also try $RAMDISK. It uses the node’s RAM; if you have enough unused memory, you could see a further speedup using it.”
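The approach support describes (write on node-local storage during the job, then copy the results out) can be sketched roughly as follows. This is only a minimal illustration, not mp2's recommended script: the helper name `run_with_local_scratch` and its parameters are hypothetical, and it assumes the scheduler exports `LSCRATCH` as an environment variable inside the job, falling back to the system temp directory otherwise.

```python
import os
import shutil
import tempfile
from pathlib import Path


def run_with_local_scratch(work, final_dir, scratch_root=None):
    """Run work(scratch_dir) on local storage, then copy its outputs to final_dir.

    Hypothetical helper: `work` is any callable that writes its files into the
    directory it receives.
    """
    # Assumption: LSCRATCH is set inside the job; otherwise use the system temp dir.
    root = scratch_root or os.environ.get("LSCRATCH") or tempfile.gettempdir()
    scratch_dir = Path(tempfile.mkdtemp(prefix="job_", dir=root))
    final_dir = Path(final_dir)
    final_dir.mkdir(parents=True, exist_ok=True)
    try:
        work(scratch_dir)
    finally:
        # Copy whatever was produced, even if work() raised, so partial
        # results are not lost with the scratch directory.
        for item in scratch_dir.iterdir():
            target = final_dir / item.name
            if item.is_dir():
                shutil.copytree(item, target, dirs_exist_ok=True)
            else:
                shutil.copy2(item, target)
        shutil.rmtree(scratch_dir, ignore_errors=True)
    return final_dir
```

One caveat with this sketch: if the scheduler kills the job outright at the walltime, the copy-back step may never run, so a job that risks hitting the limit would probably also want to copy intermediate results out periodically.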

    flefebvre Staff answered 4 months ago



  • Hello Jean, 
    The IO rate on “parallel_scratch_mp2” is 10Gb/s but is shared amongst all users. Performance will therefore vary depending on what other users are doing.
    NFS is definitely slower, although I don’t know to what extent.
    The IO rate on $LSCRATCH is 120 Mb/s but as you say, you will have to copy files in and out at job start and end.
    Another option is $RAMDISK, with an IO rate of 1.8 Gb/s per core. That is probably your best option if the job doesn’t require much RAM and if you don’t mind copying files in and out, as with $LSCRATCH.
    Hope this helps.
     
    For your reference:
    https://wiki.calculquebec.ca/w/Utiliser_l%27espace_de_stockage#tab=tab7
    You can also email mp2 support at mammouth@ccs.usherbrooke.ca for mp2-tailored advice.
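Since the shared scratch rate varies with what other users are doing, it may be easier to measure than to guess. The following is a rough sketch of a micro-benchmark for the "many small writes" pattern on each candidate directory; the function names `small_write_rate` and `compare` are hypothetical, the write count and chunk size are arbitrary, and the numbers it produces are only indicative of what a real job would see.

```python
import os
import tempfile
import time


def small_write_rate(directory, n_writes=2000, chunk_size=4096):
    """Return the MB/s achieved when appending many small chunks to one file."""
    chunk = b"x" * chunk_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        # buffering=0 disables Python's buffer, so each write() is one system call.
        with os.fdopen(fd, "wb", buffering=0) as handle:
            for _ in range(n_writes):
                handle.write(chunk)
            os.fsync(handle.fileno())  # make sure the data actually reached storage
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (n_writes * chunk_size) / elapsed / 1e6


def compare(directories):
    """Measure every directory that exists; skip unset or missing ones."""
    return {d: small_write_rate(d) for d in directories if d and os.path.isdir(d)}
```

Inside a job, something like `compare([os.environ.get("LSCRATCH"), os.environ.get("RAMDISK"), os.getcwd()])` would give one figure per storage unit, assuming those variables are set on the node.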
     

    Jean M replied 4 months ago

    Thanks for the information. So, for example, what do you do when you run your pipelines? Directly on “parallel_scratch_mp2”? Or first in $LSCRATCH and move the files at the end of the job?

    flefebvre Staff replied 4 months ago

    Hi Jean, if you mean in GenPipes, I think $LSCRATCH is just used for tools that require a temporary directory. We certainly do not copy all files to local scratch, e.g. bam files.

    Jean M replied 4 months ago

    Ok, thanks for the information.