STAR reference genome on Graham, running an alignment job
OmriNach asked 7 months ago



Hello, new user and bioinformatics novice,
When running an alignment job with STAR, the first step is to build the reference genome index. Is that necessary on Graham, or is one already built? If one is already built, which path should I refer to? (I am looking for human genome 38-25.)

1 Answer
flefebvre (staff) answered 7 months ago



Hi OmriNach, the BNT and C3G maintain several NGS-ready genomes under "/cvmfs/ref.mugqic/genomes/species/", which include GRCh38 and its STAR index under "/cvmfs/ref.mugqic/genomes/species/Homo_sapiens.GRCh38/genome/star_index".
This should be available on all CC systems.
 
I believe the accession version would be GCA_000001405.15, but we would need to get back to you on this. If you absolutely need a specific patch version (e.g. 25?), you will be better off creating your own index. Because of the index file sizes, it is technically not feasible for us to maintain multiple versions.
You can also email requests or questions about this resource to help@c3g.ca.
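To confirm that the index mentioned above is visible from your session before pointing STAR at it, a small check along these lines may help. This is only a sketch: the function name `check_star_index` is made up for illustration, and the file list is an assumption based on the files that STAR's genome generation step normally writes (`Genome`, `SA`, `SAindex`, `genomeParameters.txt`, `chrName.txt`).

```shell
# Hypothetical helper (not part of STAR or the C3G resources): checks that a
# directory looks like a STAR index by testing for files STAR normally writes.
check_star_index() (
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        exit 1
    fi
    # Assumed list of index files; adjust if your STAR version differs.
    for f in Genome SA SAindex genomeParameters.txt chrName.txt; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing index file: $dir/$f" >&2
            exit 1
        fi
    done
    echo "looks like a STAR index: $dir"
)
```

A typical use would be to list the contents of the star_index directory first (for example with `ls /cvmfs/ref.mugqic/genomes/species/Homo_sapiens.GRCh38/genome/star_index`), then run `check_star_index` on whichever subdirectory you intend to pass to STAR.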
 
 

OmriNach replied 7 months ago

Thanks for the reply. I don’t think I’ll need a specific accession version. So in the --genomeDir option, I just specify the star_index path of the genome of interest on Graham?

flefebvre (staff) replied 7 months ago

Yes, for instance an example from our runs:

STAR --runMode alignReads \
--genomeDir /cvmfs/soft.mugqic/CentOS6/genomes/species/Homo_sapiens.GRCh38/genome/star_index/Ensembl87.sjdbOverhang99 \
--readFilesIn \
A.trim.pair1.fastq.gz \
A.trim.pair2.fastq.gz \
--runThreadN 16 \
--readFilesCommand zcat \
--outStd Log \
--outSAMunmapped Within \
--outSAMtype BAM Unsorted \
--outFileNamePrefix alignment_1stPass/A/ \
--outSAMattrRGline ID:"A" PL:"ILLUMINA" PU:"A" LB:"A" SM:"A" CN:"McGill University and Genome Quebec Innovation Centre" \
--limitGenomeGenerateRAM 100000000000 \
--limitIObufferSize 4000000000
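If you have several samples, the command above can be wrapped so that only the sample name and thread count change. The following is a minimal sketch, not how our pipelines do it: the function name `run_star` and the `GENOME_DIR` and `DRY_RUN` variables are invented for illustration, the file naming assumes the `<sample>.trim.pair1.fastq.gz` / `<sample>.trim.pair2.fastq.gz` pattern from the example, and the read-group and limit options are left out for brevity.

```shell
# Hypothetical wrapper around the STAR call shown above.
# GENOME_DIR and DRY_RUN are assumed variable names, not STAR settings.
run_star() (
    sample=$1
    threads=${2:-16}
    genome_dir=${GENOME_DIR:-/cvmfs/soft.mugqic/CentOS6/genomes/species/Homo_sapiens.GRCh38/genome/star_index/Ensembl87.sjdbOverhang99}
    outdir="alignment_1stPass/${sample}/"

    # Build the command as positional parameters so arguments stay intact.
    set -- STAR --runMode alignReads \
        --genomeDir "$genome_dir" \
        --readFilesIn "${sample}.trim.pair1.fastq.gz" "${sample}.trim.pair2.fastq.gz" \
        --runThreadN "$threads" \
        --readFilesCommand zcat \
        --outStd Log \
        --outSAMunmapped Within \
        --outSAMtype BAM Unsorted \
        --outFileNamePrefix "$outdir"

    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Default: only print the command so it can be inspected first.
        printf '%s\n' "$*"
    else
        # Create the output directory as a precaution, then run STAR.
        mkdir -p "$outdir" && "$@"
    fi
)
```

Calling `run_star A` prints the command for sample A; setting `DRY_RUN=0` beforehand runs it instead. Printing by default is a deliberate choice here so that a typo in a path shows up before any compute time is spent.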

If you aren’t already using a framework to run your jobs, our platform has also developed one, called GenPipes, to automate all of this on CC systems:

https://bitbucket.org/mugqic/genpipes/src/master/

It is good for novice users to learn how to run the tools directly. Eventually, however, you will want to use a pipeline framework like GenPipes, nextflow-core etc.

OmriNach replied 7 months ago

Thanks for the example, that helps a bunch. I am currently using Snakemake as my pipeline automator. I will check out the other ones you suggested as well.