Filtering a large WGS dataset with VCFtools, but no filtered VCF output produced. Any suggestions?
Cam asked 1 year ago


I am trying to filter my ~30 GB WGS dataset (currently in VCF format) using VCFtools on Cedar. I have prepared a bash script and, judging by the .out file, it ran successfully; however, I am unable to locate the actual filtered dataset.

Here is my .sh that has run successfully but won’t produce a filtered VCF:

#SBATCH --time=1:00:00
#SBATCH --mem=5000M
#SBATCH --mail-user=<>
#SBATCH --mail-type=ALL
echo 'The job that Cam submitted is running.'
module load nixpkgs/16.09 intel/2018.3 vcftools/0.1.14
vcftools --vcf CunnerFreebayesCam.vcf --mac 2 --min-meanDP 7 --max-missing-count 1 --minQ 200 --out /projects/def-boulding/Sharedfiles/CunSNPfilter.vcf

Here is the slurm output file showing that it worked, even though a filtered VCF file can’t be located:

VCFtools - 0.1.14
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
        --vcf CunnerFreebayesCam.vcf
        --mac 2
        --max-missing-count 1
        --min-meanDP 7
        --minQ 200
        --out CunSNPfilter.vcf

After filtering, kept 15 out of 15 Individuals
After filtering, kept 2941047 out of a possible 25402547 Sites
Run Time = 238.00 seconds

Any ideas would be greatly appreciated!

Camden Moir
University of Guelph

1 Answer
nibeh Staff answered 1 year ago

Hi Cam,
Try adding the --recode flag to produce a new VCF file, and let me know if that works.
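For reference, here is a rough sketch of how the filtering step might look with --recode added. Without an output option such as --recode, VCFtools only reports the counts and writes a log, which matches what you are seeing. The file names and filters are copied from your script; two things are my own assumptions rather than part of your original command: I dropped ".vcf" from the --out prefix (VCFtools appends ".recode.vcf" itself), and I added --recode-INFO-all, which keeps the INFO fields in the new file. The #SBATCH header and module load lines from your script would stay as they are and are omitted here.

```sh
#!/bin/sh
# Sketch only: same filters as in the question, plus --recode so that
# VCFtools actually writes a filtered VCF. Paths come from the question.

in_vcf="CunnerFreebayesCam.vcf"
# Assumption: use a prefix without ".vcf"; VCFtools adds ".recode.vcf".
out_prefix="/projects/def-boulding/Sharedfiles/CunSNPfilter"
expected_out="${out_prefix}.recode.vcf"

run_filter() {
    # --recode writes the filtered sites to <prefix>.recode.vcf
    # --recode-INFO-all (my addition) keeps all INFO fields in that file
    vcftools --vcf "$in_vcf" \
        --mac 2 --min-meanDP 7 --max-missing-count 1 --minQ 200 \
        --recode --recode-INFO-all \
        --out "$out_prefix"
}

# Only run when vcftools is loaded and the input file is present.
if command -v vcftools >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
    run_filter && ls -lh "$expected_out"
else
    echo "vcftools or $in_vcf not found; nothing was run."
fi
```

If this runs on Cedar with the module loaded, the filtered data should end up in /projects/def-boulding/Sharedfiles/CunSNPfilter.recode.vcf, next to a CunSNPfilter.log file. With your original --out value the name would instead be CunSNPfilter.vcf.recode.vcf.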
See here: