Planning to use Snakemake for a bioinformatics pipeline: how is the wall time set for the whole pipeline?
mobze asked 9 months ago



Hello,

I’m thinking of using the Snakemake workflow manager for a bioinformatics pipeline. I’ve never used it, but I want to implement it and submit the job (snakemake) to Beluga. The problem I’m not sure how to address is: will Snakemake always crash because of the wall time that has to be set for the various steps? Basically, if I submit a Snakemake workflow to the cluster and the pipeline takes a long time to run, do I have to estimate the time for the whole analysis to complete, or do I need to submit a certain number of rules and estimate the time for just those rules to finish? I’m worried that running Snakemake on the cluster might cause compatibility issues…
Thanks for your support,

1 Answer
Best Answer
zhibin Staff answered 9 months ago



You need to create a cluster config file and put the resources needed for each step (rule) in it, then pass it to snakemake with --cluster-config. You can add a default (__default__) section that applies to any step not listed in the config file. When you submit the snakemake job, you also need to tell Snakemake how to submit jobs to the cluster, with something like --cluster "sbatch -A {cluster.account} -p {cluster.partition} -n {cluster.n} -t {cluster.time}". It is better to create a profile, though.
The details are at https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#cluster-configuration-deprecated
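To make the (deprecated) cluster-config mechanism concrete, here is a small Python model of what the answer describes: rule-specific entries are layered over the __default__ section, and the result fills the {cluster.*} placeholders in the --cluster string. This is not Snakemake's own code, just a sketch assuming the merge behaviour described in the cluster-configuration docs; the account (def-yourpi), partition, rule names and times are all hypothetical placeholders.

```python
from types import SimpleNamespace

# Hypothetical cluster config, i.e. what a cluster.yaml/cluster.json would load
# into. Account, partition, rule names and times are placeholders.
CLUSTER_CONFIG = {
    "__default__": {"account": "def-yourpi", "partition": "my-partition",
                    "n": 1, "time": "01:00:00"},
    "align": {"n": 8, "time": "12:00:00"},
}

# The --cluster template quoted in the answer.
CLUSTER_TEMPLATE = ("sbatch -A {cluster.account} -p {cluster.partition} "
                    "-n {cluster.n} -t {cluster.time}")


def resolve_cluster_settings(config, rule):
    """Start from __default__, then overlay any keys given for this rule."""
    settings = dict(config.get("__default__", {}))
    settings.update(config.get(rule, {}))
    return settings


def submit_command(config, rule, template=CLUSTER_TEMPLATE):
    """Fill the {cluster.<key>} placeholders with the rule's resolved settings."""
    cluster = SimpleNamespace(**resolve_cluster_settings(config, rule))
    return template.format(cluster=cluster)


if __name__ == "__main__":
    # "fastqc" has no entry of its own, so it falls back entirely to __default__.
    for rule in ("align", "fastqc"):
        print(rule, "->", submit_command(CLUSTER_CONFIG, rule))
```

Here "align" gets 8 tasks and 12 hours, while "fastqc" inherits the 1-task, 1-hour default, which is the point of having a __default__ section.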
 

zhibin Staff replied 9 months ago

You do still need to estimate how long the whole pipeline will take, though, because the main snakemake job has to keep running until all the steps are finished.
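As a rough way to think about that estimate: if the cluster runs independent steps in parallel, the main job has to outlive the longest chain of dependent rules, plus however long those jobs sit in the queue. The sketch below computes that chain for a hypothetical set of rules; the rule names, per-rule hours, queue-wait figure and 1.5 safety factor are all made-up assumptions, and real queue times on a shared cluster can vary a lot.

```python
# Hypothetical per-rule wall-time estimates, in hours.
RULE_HOURS = {"fastqc": 0.5, "trim": 1.0, "align": 12.0,
              "call_variants": 6.0, "report": 0.25}

# Hypothetical dependencies: each rule lists the rules whose outputs it needs.
DEPENDS_ON = {
    "fastqc": [],
    "trim": [],
    "align": ["trim"],
    "call_variants": ["align"],
    "report": ["fastqc", "call_variants"],
}


def critical_path_hours(rule_hours, depends_on):
    """Longest chain of dependent rules, assuming unlimited parallel job slots."""
    memo = {}

    def finish(rule):
        # A rule can only start once every rule it depends on has finished.
        if rule not in memo:
            start = max((finish(dep) for dep in depends_on[rule]), default=0.0)
            memo[rule] = start + rule_hours[rule]
        return memo[rule]

    return max(finish(rule) for rule in rule_hours)


def main_job_walltime(rule_hours, depends_on, queue_wait_hours, safety_factor=1.5):
    """Hypothetical rule of thumb for the main snakemake job's wall time."""
    return (critical_path_hours(rule_hours, depends_on) + queue_wait_hours) * safety_factor


if __name__ == "__main__":
    print(critical_path_hours(RULE_HOURS, DEPENDS_ON))
    print(main_job_walltime(RULE_HOURS, DEPENDS_ON, queue_wait_hours=4.75))
```

With these numbers the longest chain is trim -> align -> call_variants -> report (19.25 h), and adding 4.75 h of assumed queue wait and the 1.5 factor gives a 36 h request for the main job. If the main job is killed anyway, rerunning the same snakemake command generally skips outputs that were already completed (you may need --unlock first if the working directory is still locked).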

mobze replied 9 months ago

OK, that’s what I was looking for! It seems that cluster configuration is deprecated and that profiles are now recommended instead: https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles

Also, for future reference, this thread is relevant to cluster configuration: https://bioinformatics.stackexchange.com/questions/4977/running-snakemake-on-cluster. There is also a SlurmEasy script (https://github.com/dpryan79/Misc/blob/master/MPIIE_internal/SlurmEasy) that might be useful. I’m just starting to plan this kind of job, so I’m still learning how to run it on the cluster.
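For the profile route, a profile is a directory containing a config.yaml whose keys mirror snakemake's long command-line options. The sketch below writes a minimal one from Python. The directory, account (def-yourpi), job count and sbatch options are hypothetical, and the {resources.mem_mb} and {resources.runtime} placeholders assume every rule (or a default-resources entry) defines those resources, with runtime in minutes. It targets the Snakemake 7-era --cluster option; Snakemake 8 and later moved cluster submission to executor plugins, so check the profile docs for your version.

```python
import tempfile
from pathlib import Path

# Hypothetical profile settings. Each key mirrors a long snakemake option
# (--cluster, --jobs, --latency-wait); the values are placeholders.
PROFILE_SETTINGS = {
    # Assumes rules (or default-resources) define mem_mb and runtime (minutes).
    "cluster": "sbatch -A def-yourpi --cpus-per-task={threads} "
               "--mem={resources.mem_mb} --time={resources.runtime}",
    "jobs": 50,          # max number of jobs submitted/running at once
    "latency-wait": 60,  # seconds to wait for outputs on a shared filesystem
}


def render_profile(settings):
    """Render settings as simple 'key: value' YAML lines."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, str):
            # Double-quote strings so spaces and braces stay one YAML string.
            value = '"' + value + '"'
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


def write_profile(profile_dir, settings=PROFILE_SETTINGS):
    """Create profile_dir/config.yaml and return its path."""
    profile_dir = Path(profile_dir)
    profile_dir.mkdir(parents=True, exist_ok=True)
    config_path = profile_dir / "config.yaml"
    config_path.write_text(render_profile(settings))
    return config_path


if __name__ == "__main__":
    # Demo only: write into a throwaway directory, not ~/.config/snakemake.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_profile(Path(tmp) / "beluga")
        print(path.read_text())
```

You would then run something like snakemake --profile path/to/profile instead of repeating the submission options on every command line.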