I’m using DESeq2 for RNA sequencing DGE analysis and I’m getting an error with this code:
dds=DESeqDataSetFromTximport(txi,colData = coldata,design= ~ batch+condition)
Error in DESeqDataSet(se, design = design, ignoreRank) :
all variables in design formula must be columns in colData
Here is my coldata
coldata <- read.table(file.path("Salm_txt_DEseq_update.txt"), header=TRUE)
and I have a column condition with 16 rows: 12 treated samples and 4 untreated samples.
If I remove batch from the command:
dds=DESeqDataSetFromTximport(txi,colData = coldata,design=~condition)
it works OK, but I’d like to find out what’s wrong with the batch term.
any help would be appreciated
You’ve described that your file “Salm_txt_DEseq_update.txt” contains a column “condition”. Does that same file contain a column “batch”?
For those coming to this question through search, the problem is probably a missing column “batch” in the coldata data frame (read from “Salm_txt_DEseq_update.txt” in this case). Variables used in the design formula (batch in Morris’ example) must be columns of the data frame passed as colData.
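For anyone who wants to catch this before building the dataset, here is a minimal base-R sketch of the check. The sample table below is a made-up stand-in for the real file (only a condition column, 12 treated and 4 untreated, as described in the question), and missing_vars is just an illustrative name.

```r
# Hypothetical stand-in for the sample table read from the file;
# it has a "condition" column but no "batch" column.
coldata <- data.frame(
  condition = rep(c("treated", "untreated"), times = c(12, 4))
)

# The design formula from the failing call.
design <- ~ batch + condition

# Variables named in the design that are not columns of the sample table.
missing_vars <- setdiff(all.vars(design), colnames(coldata))
print(missing_vars)
```

If missing_vars is not empty, those are the columns that need to be added to the sample table (or dropped from the design) before the dataset can be built.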
Hi Rob, sorry for the delay.
Yes, my file now contains a column “batch” and the error is fixed, but I can’t see any change in the variation after adding batch, which is probably due to a mistake in how I’m using this command.
I have 4 different conditions (A, B, C, D), and for each condition I have two samples. There is a batch effect in only one condition (D), so should I use a different batch value for only these two samples (D)?
for example, here is the batch column of the data frame:
Hi Morris. I’m unsure what you mean when you say that you “can’t see variation after adding batch”. What outputs are you considering? It would be helpful if you could describe the output you were expecting and the output you are seeing.
I mean that when I look at plots like the PCA or heatmap to see how much variation there is in my samples, they look exactly the same.
I would expect an output where samples of the same group are closer to each other and less scattered.
Gotcha. I’m pretty sure that both the PCA and heatmap figures will show data that is independent of the design formula. These figures show (sometimes transformed) per-sample metrics, but not per-condition metrics.
The design formula is used to build the model from which the differential expression p-values are calculated. Do the p-values that are returned from the results() function change when you modify the design formula?
Yes, the p-values improve; that’s what has to change, and it’s the most important part, I’d say.
thank you 🙂
Great! It sounds like your problem is resolved 🙂
The initial problem of “Error in DESeqDataSet” was resolved by introducing the relevant column into the data frame passed in as the colData argument of the function DESeqDataSetFromTximport.
The subsequent confusion was that Morris was expecting the PCA and heatmaps to change in response to changes in the design formula. The plots staying the same is expected: the design formula only affects the results of comparisons, whereas the PCA and heatmaps plot transforms of the read counts and are independent of the experimental design.
Does that sound about right, Morris?
thanks a lot for your time
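As a footnote to the point above about PCA versus the design formula, here is a toy sketch in base R. It is not DESeq2: it uses simulated log-scale values and an ordinary linear model (lm) in place of DESeq2's negative binomial model, and the sample layout, effect sizes, and variable names are all made up for illustration. The idea it shows is the same, though: the PCA only ever sees the expression matrix, while the per-gene test changes when batch is added to the model.

```r
set.seed(1)

# Made-up layout (not the thread's data): 8 samples, 2 conditions, 2 batches,
# with batches balanced across conditions.
condition <- factor(rep(c("A", "B"), each = 4))
batch     <- factor(rep(c("b1", "b2"), times = 4))

# Simulated log-scale expression: 50 genes (rows) x 8 samples (columns).
expr <- matrix(rnorm(50 * 8), nrow = 50)
# Add a batch shift to every gene for samples in batch b2.
expr <- sweep(expr, 2, ifelse(batch == "b2", 2, 0), "+")
# Give gene 1 a true condition effect.
expr[1, ] <- expr[1, ] + ifelse(condition == "B", 1.5, 0)

# PCA takes only the expression matrix; there is no design formula here,
# so changing the design cannot change this plot.
pca <- prcomp(t(expr))

# Per-gene test for gene 1, without and with batch in the model.
p_without <- summary(lm(expr[1, ] ~ condition))$coefficients["conditionB", "Pr(>|t|)"]
p_with    <- summary(lm(expr[1, ] ~ batch + condition))$coefficients["conditionB", "Pr(>|t|)"]

print(c(without_batch = p_without, with_batch = p_with))
```

If the goal is a PCA or heatmap where the batch effect is visually removed, that has to be done on the transformed data itself rather than through the design; as far as I know, the DESeq2 vignette describes one way to do this using limma's removeBatchEffect on the variance-stabilized values.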