Filtering a large WGS dataset with VCFtools, but no filtered VCF output produced. Any suggestions?
Cam asked 1 week ago



Hi,

I am trying to filter my ~30 GB WGS dataset (currently in VCF format) using VCFtools on Cedar. I have prepared a bash script, and according to the .out file it ran successfully. However, I can't locate the actual filtered dataset.

Here is my .sh script, which runs successfully but doesn't produce a filtered VCF:

#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --mem=5000M
#SBATCH --mail-user=moirc@uoguelph.ca
#SBATCH --mail-type=ALL
echo 'The job that Cam submitted is running.'
module load nixpkgs/16.09 intel/2018.3 vcftools/0.1.14
vcftools --vcf CunnerFreebayesCam.vcf --mac 2 --min-meanDP 7 --max-missing-count 1 --minQ 200 --out /projects/def-boulding/Sharedfiles/CunSNPfilter.vcf

Here is the Slurm output file, which shows that the job completed even though I can't locate a filtered VCF file:

VCFtools - 0.1.14
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
        --vcf CunnerFreebayesCam.vcf
        --mac 2
        --max-missing-count 1
        --min-meanDP 7
        --minQ 200
        --out CunSNPfilter.vcf

After filtering, kept 15 out of 15 Individuals
After filtering, kept 2941047 out of a possible 25402547 Sites
Run Time = 238.00 seconds

Any ideas would be greatly appreciated!

Best,
Camden Moir
University of Guelph
moirc@uoguelph.ca

1 Answer
nibeh Staff answered 1 week ago



Hi Cam,
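Try adding the --recode flag to produce a new VCF file, and let me know if that works. Without it, VCFtools only applies the filters and reports the counts in its log. Note also that --out sets a filename prefix rather than a full filename, so with your current --out value the new file will be named CunSNPfilter.vcf.recode.vcf.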
See here: http://vcftools.sourceforge.net/man_latest.html
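For reference, here is a rough sketch of the original job script with the suggested fix applied, assuming the same modules, input file and output directory as in the question. The variable name OUT_PREFIX is purely illustrative, and --recode-INFO-all is optional: it keeps the INFO fields, which --recode drops by default.

```shell
#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --mem=5000M
#SBATCH --mail-user=moirc@uoguelph.ca
#SBATCH --mail-type=ALL

echo 'The job that Cam submitted is running.'
module load nixpkgs/16.09 intel/2018.3 vcftools/0.1.14

# --out is a filename PREFIX (OUT_PREFIX is an illustrative name): with
# --recode, vcftools writes "${OUT_PREFIX}.recode.vcf" plus "${OUT_PREFIX}.log".
OUT_PREFIX=/projects/def-boulding/Sharedfiles/CunSNPfilter

# Same filters as before; --recode writes the filtered VCF and the optional
# --recode-INFO-all keeps the INFO fields that --recode would otherwise drop.
vcftools --vcf CunnerFreebayesCam.vcf \
  --mac 2 --min-meanDP 7 --max-missing-count 1 --minQ 200 \
  --recode --recode-INFO-all \
  --out "$OUT_PREFIX"

# Sanity check: count the non-header lines (one per site) in the new VCF.
grep -vc '^#' "${OUT_PREFIX}.recode.vcf"
```

Once the job finishes, the count printed at the end of the Slurm output should match the "kept ... Sites" line from VCFtools (2941047 in the run above).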