Planning to use Snakemake for a bioinformatics pipeline: how is the wall time set for the whole pipeline?
mobze asked 8 months ago



Hello,

I'm thinking of using the Snakemake workflow manager for a bioinformatics pipeline. I've never used it, but I want to implement it and submit the Snakemake job to Beluga. What I'm not sure how to address is: will Snakemake always crash because of the wall time that needs to be set for the various steps? Basically, if I submit a Snakemake run to the cluster and the pipeline takes a long time, do I have to estimate the time for the whole analysis to complete, or do I submit a certain number of rules and estimate the time for just those rules to finish? I'm worried that running Snakemake on the cluster might have compatibility issues...
Thanks for your support,

1 Answer
Best Answer
zhibin (staff) answered 8 months ago



You need to create a config file and put the resources needed for each step in it. You can have a default (__default__) section for steps that are not listed in the config file. When you submit the Snakemake job, you need to tell Snakemake how to submit jobs to the cluster with something like: --cluster "sbatch -A {cluster.account} -p {cluster.partition} -n {cluster.n} -t {cluster.time}". It is better to create a profile.
The details are at https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#cluster-configuration-deprecated
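To make the __default__ fallback concrete, here is a minimal Python sketch of the idea. It is not Snakemake's own code: the rule names, account, partition and times are made-up placeholders, and the helper functions are hypothetical. It only illustrates how a per-rule entry could override the defaults before the sbatch command is filled in.

```python
# Hypothetical per-rule resources, mirroring what a cluster config file holds.
# All names and values below are placeholders, not real settings.
cluster_config = {
    "__default__": {
        "account": "def-someuser",
        "partition": "some_partition",
        "n": 1,
        "time": "01:00:00",
    },
    # Only the values that differ from the defaults need to be listed.
    "align_reads": {"n": 8, "time": "12:00:00"},
}


def resources_for(rule_name, config):
    """Start from the __default__ section, then apply the rule's own entry."""
    merged = dict(config.get("__default__", {}))
    merged.update(config.get(rule_name, {}))
    return merged


def sbatch_command(rule_name, config):
    """Fill an sbatch template with the resources resolved for one rule."""
    template = "sbatch -A {account} -p {partition} -n {n} -t {time}"
    return template.format(**resources_for(rule_name, config))


print(sbatch_command("align_reads", cluster_config))
print(sbatch_command("unlisted_rule", cluster_config))
```

A rule with no entry of its own, such as unlisted_rule here, simply gets the default wall time, so every step is submitted with some time limit.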
 

zhibin (staff) answered 8 months ago

You need to estimate how long the whole pipeline will take, though, because the main Snakemake job has to keep running until all the steps are finished.
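One rough way to get that estimate, sketched in Python under stated assumptions: if steps that do not depend on each other run in parallel, the main job has to outlast the longest chain of dependent steps. The rule names, hour estimates and the 1.5 safety factor below are all hypothetical, and the sketch ignores the time each submitted job spends waiting in the queue, which the main job also has to cover.

```python
from functools import lru_cache

# Hypothetical per-rule run-time estimates (hours) and dependencies.
est_hours = {"trim": 1.0, "align": 6.0, "qc": 0.5, "call_variants": 4.0}
depends_on = {
    "trim": [],
    "align": ["trim"],
    "qc": ["trim"],
    "call_variants": ["align"],
}


def longest_chain_hours(est_hours, depends_on):
    """Return the run time of the longest chain of dependent rules."""

    @lru_cache(maxsize=None)
    def finish(rule):
        # A rule can start only once its slowest dependency has finished.
        start = max((finish(dep) for dep in depends_on[rule]), default=0.0)
        return start + est_hours[rule]

    return max(finish(rule) for rule in est_hours)


def format_walltime(hours):
    """Format hours as days-hours:minutes:seconds, a form sbatch -t accepts."""
    total_minutes = round(hours * 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hh, mm = divmod(remainder, 60)
    return f"{days}-{hh:02d}:{mm:02d}:00"


chain = longest_chain_hours(est_hours, depends_on)
print(chain)                         # longest chain: trim -> align -> call_variants
print(format_walltime(chain * 1.5))  # with an assumed 1.5x safety margin
```

The safety margin is a judgment call; on a busy cluster the queue wait can easily exceed the run time, so a more generous limit for the main job may be needed.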

mobze answered 8 months ago

OK! That's what I was looking for! It seems that the cluster configuration is deprecated and that it's now recommended to use "profiles": https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles

Also, for future reference, this is relevant for the cluster configuration: https://bioinformatics.stackexchange.com/questions/4977/running-snakemake-on-cluster. There is also a SlurmEasy script (https://github.com/dpryan79/Misc/blob/master/MPIIE_internal/SlurmEasy) that might be useful. I'm just starting to plan this kind of job, so I'm still learning how to implement it on the cluster.