In my project we have been processing tumour genome data from fresh sources with good success. Now that our fresh sample resource has been depleted, we are turning our attention to FFPE sources. Is there anything we should watch out for when moving to FFPE? I expect the data to be "lower quality", but I don't know the specifics of what that might look like.
There are a number of factors to consider when processing FFPE genomes. Below is a short list of some things to watch out for:
- FFPE-extracted DNA is often more fragmented than DNA extracted from fresh frozen tissue. This usually leads to smaller insert sizes, which in turn means a significant fraction of read pairs will overlap each other. This may cause difficulty in:
  - Estimating your X coverage of genomes (i.e. do you want to count the overlapping portion of a fragment twice?)
  - Variant calling, where a base error (more on this below) in a fragment may be sequenced twice, once in each mate. This can cause false positives when a variant caller treats every read as an independent observation. For example, this is the case for Strelka.
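To make the coverage question concrete, here is a minimal sketch (plain Python, toy numbers; the function name and parameters are mine) comparing naive depth with depth where each fragment's overlapping portion is counted only once:

```python
def mean_coverage(n_fragments, read_len, insert_size, genome_len, dedup_overlap):
    """Mean depth from paired-end fragments of a fixed insert size.

    When insert_size < 2 * read_len the mates overlap; if dedup_overlap
    is True, that overlapping region is counted only once per fragment.
    """
    if dedup_overlap:
        bases_per_fragment = min(insert_size, 2 * read_len)  # overlap counted once
    else:
        bases_per_fragment = 2 * read_len                    # every sequenced base
    return n_fragments * bases_per_fragment / genome_len

# 1,000 fragments on a 10 kb toy "genome": 150 bp reads, 200 bp FFPE-like inserts
naive = mean_coverage(1000, 150, 200, 10_000, dedup_overlap=False)  # 30.0x
dedup = mean_coverage(1000, 150, 200, 10_000, dedup_overlap=True)   # 20.0x
```

With short FFPE inserts the two numbers diverge sharply, so it matters which convention your coverage tool uses.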
- Reads may contain significant amounts of adapter sequence at their 3' ends (read-through, when the insert is shorter than the read length), which can cause difficulty when aligning with some older aligners. It can also cause difficulty for de novo assembly.
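As a toy illustration of read-through trimming (a sketch only; real trimmers such as cutadapt or fastp also handle mismatches and partial adapter matches at the read end):

```python
def trim_adapter(read, adapter):
    """Trim 3' read-through: drop everything from the first exact
    adapter match onward. Returns the read unchanged if no match."""
    i = read.find(adapter)
    return read if i == -1 else read[:i]

# A 10 bp insert read through into adapter sequence
# ("AGATCGGAAG" is a commonly used Illumina adapter prefix)
read = "ACGTACGTAC" + "AGATCGGAAG"
trim_adapter(read, "AGATCGGAAG")  # -> "ACGTACGTAC"
```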
- Base errors are more common in FFPE genome data; formalin fixation causes cytosine deamination, which shows up as excess C>T/G>A substitution artifacts. There are reagents (e.g. uracil-DNA glycosylase treatment) that reduce this, so care must be taken when comparing variant calls from data that were treated differently during library construction.
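A quick QC comparison between differently treated libraries is the fraction of calls in the deamination-prone substitution classes. A sketch, assuming your SNV calls are available as simple (ref, alt) pairs (the function name is mine):

```python
def ct_ga_fraction(calls):
    """Fraction of SNV calls that are C>T or G>A, the substitution
    classes inflated by formalin-induced cytosine deamination."""
    artifact_like = {("C", "T"), ("G", "A")}
    hits = sum(1 for ref, alt in calls if (ref, alt) in artifact_like)
    return hits / len(calls)

calls = [("C", "T"), ("G", "A"), ("A", "G"), ("T", "C")]
ct_ga_fraction(calls)  # -> 0.5
```

A markedly higher fraction in an untreated library than in a repaired one is a red flag for artifact contamination in its call set.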
- Genome coverage will be less even than you would have seen with PCR-free genomes. PCR amplification is usually required to make FFPE libraries, so GC-rich and AT-rich regions end up under-represented.
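The usual diagnostic for this is a GC-bias curve: bin the genome, compute GC content and mean depth per bin, and look for depth falling off at the GC extremes. A toy sketch (made-up window values; in practice you would compute these from your BAM):

```python
from collections import defaultdict

def gc_bias_curve(windows):
    """windows: iterable of (gc_fraction, depth) per genomic window.
    Returns mean depth per GC decile (0-9)."""
    bins = defaultdict(list)
    for gc, depth in windows:
        bins[min(int(gc * 10), 9)].append(depth)
    return {b: sum(d) / len(d) for b, d in sorted(bins.items())}

# Depth sags in the AT-rich (GC ~0.2) and GC-rich (~0.8) windows
toy = [(0.2, 10), (0.45, 30), (0.48, 32), (0.8, 8)]
gc_bias_curve(toy)  # -> {2: 10.0, 4: 31.0, 8: 8.0}
```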
- CNV results will be much noisier when using FFPE-derived data. The noise is more than what we can account for with the above metrics. Some of the best improvements in this area simply treat the FFPE depth profiles with signal-processing techniques rather than modelling what FFPE does to the DNA.
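As a minimal example of that signal-processing angle, a sliding-window median over a binned depth profile knocks down isolated spikes without assuming anything about their cause (a sketch; real CNV tools use more sophisticated smoothing and segmentation):

```python
def median_smooth(depths, k=3):
    """Sliding-window median filter over a per-bin depth profile.
    k is the (odd) window size; edges use a truncated window."""
    half = k // 2
    out = []
    for i in range(len(depths)):
        window = sorted(depths[max(0, i - half): i + half + 1])
        out.append(window[len(window) // 2])
    return out

# Spikes at bins 2 (95) and 5 (2) are suppressed; the flat profile survives
median_smooth([30, 31, 95, 29, 30, 2, 31], k=3)  # -> [31, 31, 31, 30, 29, 30, 31]
```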