DESeq2 error in design~ code
Morris asked 4 months ago



Hello everyone,

I’m using DESeq2 for RNA-seq differential gene expression (DGE) analysis, and I’m getting an error with this code:
dds <- DESeqDataSetFromTximport(txi, colData = coldata, design = ~ batch + condition)
Error in DESeqDataSet(se, design = design, ignoreRank) :
all variables in design formula must be columns in colData
Here is how I read my coldata:
coldata <- read.table(file.path("Salm_txt_DEseq_update.txt"), header = TRUE)
It has a column condition with 16 rows: 12 treated samples and 4 untreated samples.
If I remove batch from the command:
dds <- DESeqDataSetFromTximport(txi, colData = coldata, design = ~ condition)
it works OK, but I’d like to find out what’s wrong with the batch argument.
Any help would be appreciated.
Thank you

Rob Syme Staff replied 4 months ago

Hi Morris.

You’ve described that your file “Salm_txt_DEseq_update.txt” contains a column “condition”. Does that same file contain a column “batch”?

Rob Syme Staff answered 4 months ago



For those coming to this question through search, the problem is probably a missing column “batch” in the coldata data frame (read from “Salm_txt_DEseq_update.txt” in this case). Variables used in the design formula (condition and batch in Morris’ example) must refer to columns of the data frame passed as the colData argument in the call to DESeqDataSetFromTximport.
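DESeq2 runs this check itself, but it can be handy to run it before building the object. The sketch below uses only base R: all.vars() extracts the variable names from a formula, and they are compared against colnames(coldata). The helper name missing_design_vars and the small sample table are hypothetical. Note that R column names are case-sensitive, so a column named Batch does not satisfy ~ batch.

```r
# Hypothetical helper: return the design-formula variables that are not
# columns of the sample table (matching is case-sensitive).
missing_design_vars <- function(design, coldata) {
  vars <- all.vars(design)
  vars[!vars %in% colnames(coldata)]
}

# Hypothetical sample table that has "condition" but no "batch" column
coldata <- data.frame(
  sample    = paste0("s", 1:4),
  condition = c("treated", "treated", "untreated", "untreated")
)

missing_design_vars(~ batch + condition, coldata)
#> [1] "batch"
```

An empty result means every variable in the formula has a matching column.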

Morris replied 4 months ago

Hi Rob, sorry for the delay
Yes, my file now contains a column “batch” and the error is fixed, but I can’t see variation after adding batch, which is probably due to a mistake in how I’m using this command.
I have 4 different conditions (A, B, C, D), with two samples per condition. There is a batch effect in only one condition (D), so should I use a different batch value for only these two samples (D)?
For example, here are the condition and batch columns of the data frame:
Condition Batch
A I
A I
B II
B II
C III
C III
D II
D III

Thank you
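One thing worth checking with a layout like this, which the thread does not go into, is whether batch can be estimated alongside condition at all. In the example table, batch I coincides with condition A, and the batch II and batch III indicator columns add up to exactly the sum of the condition B, C and D indicator columns. The model matrix for ~ batch + condition is therefore not full rank, and DESeq2 stops with an error for such designs. A minimal base-R check, assuming the table above is the real layout (is_full_rank is a hypothetical helper):

```r
# Hypothetical helper: TRUE if the design's model matrix has full column rank.
is_full_rank <- function(design, coldata) {
  mm <- model.matrix(design, data = coldata)
  qr(mm)$rank == ncol(mm)
}

# The example layout from the table above
coldata <- data.frame(
  condition = factor(c("A", "A", "B", "B", "C", "C", "D", "D")),
  batch     = factor(c("I", "I", "II", "II", "III", "III", "II", "III"))
)

is_full_rank(~ condition, coldata)          # TRUE
is_full_rank(~ batch + condition, coldata)  # FALSE: 6 columns, rank 5
```

Running the same check on the actual sample sheet is what matters; the outcome depends on how batches are spread across conditions.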

Rob Syme Staff replied 3 months ago

Hi Morris. I’m unsure what you mean when you say that you “can’t see variation after adding batch”. What outputs are you considering? It would be helpful if you could describe the output you were expecting and the output you are seeing.

Morris replied 3 months ago

Hi Rob,
I mean that when I look at plots such as the PCA or heatmap to see how much variation there is among my samples, they look exactly the same.
I would expect an output where samples of the same group are close to each other and less scattered.

Thanks

Rob Syme Staff replied 3 months ago

Gotcha. I’m pretty sure that both the PCA and heatmap figures will show data that is independent of the design formula. These figures show (sometimes transformed) per-sample metrics, but not per-condition metrics.
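This can be made concrete: a PCA (or heatmap) only sees the matrix it is given. In DESeq2, vst() and rlog() default to blind = TRUE, which ignores the design, and plotPCA() uses intgroup only to color the points. If the goal is plots in which batch no longer separates samples, the matrix itself has to change; the DESeq2 vignette points to limma::removeBatchEffect() for that. The sketch below is a simplified base-R stand-in for the idea, not limma’s implementation: for each gene it fits condition + batch by least squares and subtracts only the fitted batch shift. The helper remove_batch and the toy values are hypothetical.

```r
# Hypothetical helper: per gene, fit condition + batch by least squares and
# subtract only the fitted batch shift, keeping the condition effect.
remove_batch <- function(mat, batch, condition) {
  batch     <- factor(batch)
  condition <- factor(condition)
  X_cond  <- model.matrix(~ condition)                  # intercept + condition
  X_batch <- model.matrix(~ batch)[, -1, drop = FALSE]  # batch shifts only
  X <- cbind(X_cond, X_batch)
  # One column of coefficients per gene (rows of mat are genes)
  beta <- qr.coef(qr(X), t(mat))
  batch_rows <- ncol(X_cond) + seq_len(ncol(X_batch))
  mat - t(X_batch %*% beta[batch_rows, , drop = FALSE])
}

# Toy log-scale values: 3 genes x 8 samples, batch b2 shifted up by about 2
condition <- c("ctrl", "ctrl", "trt", "trt", "ctrl", "ctrl", "trt", "trt")
batch     <- c("b1", "b1", "b1", "b1", "b2", "b2", "b2", "b2")
logmat <- rbind(
  gene1 = c(5.0, 5.2, 7.0, 7.1, 7.0, 7.2, 9.1, 9.0),
  gene2 = c(3.0, 3.1, 3.0, 2.9, 5.1, 5.0, 4.9, 5.0),
  gene3 = c(8.0, 8.2, 6.1, 6.0, 10.0, 10.1, 8.0, 8.1)
)
colnames(logmat) <- paste0("s", 1:8)

corrected <- remove_batch(logmat, batch, condition)

# PCA works on samples, so transpose to samples x genes
pca_before <- prcomp(t(logmat))
pca_after  <- prcomp(t(corrected))
```

After correction, the two ctrl samples from different batches (s1 and s5) sit close together, which is the kind of picture Morris expected. A corrected matrix like this is for visualization only; the differential expression test should still use the raw counts with batch in the design formula.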

The design formula is used to build the model from which the differential expression p-values are calculated. Do the p-values that are returned from the results() function change when you modify the design formula?
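The difference can be illustrated with a toy example. DESeq2 fits a negative binomial GLM per gene; the sketch below swaps in ordinary least squares on log2 counts for a single gene, which is simpler but shows the same mechanism: the data are identical in both fits, only the design formula changes, and adding batch moves batch-driven variation out of the residuals. The counts are made up for illustration.

```r
# Made-up counts for one gene: 2 conditions x 2 batches x 2 replicates,
# with batch b2 roughly 3x higher than b1
counts    <- c(10, 12, 20, 22, 30, 28, 60, 58)
condition <- factor(c("ctrl", "ctrl", "trt", "trt", "ctrl", "ctrl", "trt", "trt"))
batch     <- factor(c("b1", "b1", "b1", "b1", "b2", "b2", "b2", "b2"))
y <- log2(counts)

# Same data, two design formulas
fit_condition <- lm(y ~ condition)
fit_batch     <- lm(y ~ batch + condition)

# p-value of the condition term under each design
p_condition <- summary(fit_condition)$coefficients["conditiontrt", "Pr(>|t|)"]
p_batch     <- summary(fit_batch)$coefficients["conditiontrt", "Pr(>|t|)"]
c(condition_only = p_condition, with_batch = p_batch)
```

In this balanced toy layout the condition estimate is the same in both fits; what changes is the residual variance, and with it the p-value, which is smaller once batch is in the model. In DESeq2 itself, the analogous comparison is to rerun DESeq() and results() with each design.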

Morris replied 3 months ago

Hi Rob,
Yes, the p-values improve. That’s what has to change, and it’s the most important part, I’d say.

Thank you 🙂

Rob Syme Staff replied 3 months ago

Great! It sounds like your problem is resolved 🙂
In summary:
The initial problem of “Error in DESeqDataSet” was resolved by adding the relevant column to the data frame passed as the colData argument of DESeqDataSetFromTximport.
The subsequent confusion was that Morris expected the PCA and heatmaps to change in response to changes in the design formula. That they stay the same is expected: the design formula only affects the results of comparisons, whereas the PCA and heatmaps plot transforms of the read counts and are independent of the experimental design.
Does that sound about right, Morris?

Morris replied 3 months ago

Yes, Rob,
thanks a lot for your time.