Transcript expression using HISAT, StringTie and Ballgown
Morris asked 2 years ago

Hello everyone,
I’m new to the Tuxedo suite and I’m having a problem with ballgown.
Basically, when I run the command:
pf_rna <- ballgown(dataDir = "ballgown/", samplePattern = sample, pData = pheno_data)
I get this:
Rows of pData did not seem to be in the same order as the columns of the expression data. Attempting to rearrange pData…
I’m sure that the sample names in pData are the same, and in the same order, as the folders under the dataDir path, and I can’t understand why it is giving me this error.

Since I’m new to this, I want to make sure that I’m doing it right, because I have a doubt about samplePattern.
For samplePattern I create a vector of my sample names without the path; here it is:
sample = c("PF382_1_EC_E8",
Can you please tell me if that is correct?

thank you
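
For reference, the ballgown documentation describes `samplePattern` as a regular expression that identifies the subdirectories of `dataDir` to load, so a single pattern that matches every sample folder is the usual form, rather than a vector of names. The sketch below previews which folders a pattern would select, using base R's `list.files` on a temporary directory. The temporary directory, the `logs` folder, and the pattern `"^PF382_"` are assumptions made for illustration; the sample names are the ones from this thread.

```r
# Stand-in for the real ballgown/ directory (assumption: one folder per sample).
data_dir <- file.path(tempdir(), "ballgown_pattern_demo")
sample_names <- c("PF382_1_EC_E8", "PF382_2_EC_E8",
                  "PF382_1_dGSK_Cherry_E8", "PF382_2_dGSK_Cherry_E8",
                  "PF382_2_dGSK_Cherry_FOXO3ATM_mCD8",
                  "PF382_1_dGSK_Cherry_FOXO3ATM_mCD8")
for (s in c(sample_names, "logs")) {
  dir.create(file.path(data_dir, s), recursive = TRUE, showWarnings = FALSE)
}

# One regular expression that matches every sample folder (hypothetical choice).
sample_pattern <- "^PF382_"

# Preview the folders the pattern selects; "logs" is not matched.
matched <- list.files(data_dir, pattern = sample_pattern)
print(matched)
```

If the preview lists exactly the expected sample folders, the same pattern string is a reasonable candidate for `samplePattern`.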

1 Answer
jhgalvez Staff answered 2 years ago

First of all, for an experiment of your size, it might just be easier to use `samples` and define the path to each sample’s output data individually than to bother with `dataDir` and `samplePattern`. I find that it tends to give fewer errors, and when experiments are not too large, the time saved by the other two arguments is not that important.
That being said, it seems to me that your error is related to how you are defining pData, and not necessarily samplePattern. Could you show me how you are defining `pheno_data`? It needs to be a data frame with each sample in a separate row, and the phenotypic data (i.e. experimental groups) in the columns. If that data frame does not have the appropriate dimensions, I believe that you will get an error message similar to the one you are getting.
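
As a rough illustration of the shape described above, here is a minimal sketch of such a data frame in base R. The sample names `sample_A` to `sample_D` and the column names `id` and `group` are hypothetical placeholders, not values from the experiment.

```r
# One row per sample; the first column holds the sample (folder) names,
# the remaining columns hold phenotype information such as the group.
pheno_example <- data.frame(
  id    = c("sample_A", "sample_B", "sample_C", "sample_D"),  # hypothetical names
  group = c("SP", "SP", "DP", "DP"),
  stringsAsFactors = FALSE
)

# Basic sanity checks: dimensions and unique sample names.
print(dim(pheno_example))
print(anyDuplicated(pheno_example$id) == 0)
```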

Morris replied 2 years ago

Hi jhgalvez,

thank you very much for your advice.

I tried something similar to what you suggested and it worked:

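# Read the phenotype table (one row per sample)
pheno_data = read.table(file = "/phonotype.txt", header = TRUE, sep = "\t")
# Build the full path to each sample folder from the first column
sample_full_path = paste("/ballgown", pheno_data[, 1], sep = "/")
bg = ballgown(samples = as.vector(sample_full_path), pData = pheno_data, verbose = TRUE)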

I will have to include more samples in my analysis and will have a bigger dataset, so I would like to understand and solve the issue with the command that I was using.

This is how I read in the data frame:

pheno_data = read.table(file = "/phonotype.txt", header = TRUE, sep = "\t")

And this is the data frame in the .txt file:

ids category
PF382_1_EC_E8 SP
PF382_2_EC_E8 SP
PF382_1_dGSK_Cherry_E8 SP
PF382_2_dGSK_Cherry_E8 SP
PF382_2_dGSK_Cherry_FOXO3ATM_mCD8 DP
PF382_1_dGSK_Cherry_FOXO3ATM_mCD8 DP

Can you help me to fix it please?

Thank you very much

jhgalvez Staff replied 2 years ago

It seems that your `pheno_data` table is good, as far as I can tell. One thing that I noticed is that in your first example you are defining the `dataDir` like this:

dataDir = "ballgown/"

It is better to define this variable using the `file.path` function, because by adding the / symbol at the end of your previous definition, you might be messing up the paths. So in the end, the command would look something like this:

pf_rna <- ballgown(dataDir = file.path("ballgown"), samplePattern = sample, pData = pheno_data)

Try that and let me know if it works.
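
To illustrate the point about the trailing slash, here is a small base R sketch comparing the two ways of building a path. It only shows how the strings are assembled; whether the doubled separator actually causes trouble on a given system is not something this example establishes.

```r
# Joining a directory that already ends in "/" with another "/" doubles the separator.
with_slash <- paste("ballgown/", "PF382_1_EC_E8", sep = "/")
print(with_slash)   # "ballgown//PF382_1_EC_E8"

# file.path() inserts the separator itself, so the directory is given without a trailing slash.
built <- file.path("ballgown", "PF382_1_EC_E8")
print(built)        # "ballgown/PF382_1_EC_E8"
```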

Morris replied 2 years ago

It didn’t work.
I can’t understand what it doesn’t like about my pData:

bg <- ballgown (dataDir = file.path("ballgown"), samplePattern=sample ,pData=pheno_data)
Mon Jul 8 18:23:14 2019
Mon Jul 8 18:23:14 2019: Reading linking tables
Mon Jul 8 18:23:15 2019: Reading intron data files
Mon Jul 8 18:23:16 2019: Merging intron data
Mon Jul 8 18:23:18 2019: Reading exon data files
Mon Jul 8 18:23:19 2019: Merging exon data
Mon Jul 8 18:23:22 2019: Reading transcript data files
Mon Jul 8 18:23:23 2019: Merging transcript data
Error in ballgown(dataDir = file.path("ballgown"), samplePattern = sample, :
first column of pData does not match the names of the folders containing the ballgown data.
In addition: Warning message:
In ballgown(dataDir = file.path("ballgown"), samplePattern = sample, :
Rows of pData did not seem to be in the same order as the columns of the expression data. Attempting to rearrange pData…

thank you
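
One way to narrow down an error like this is to compare the first column of the phenotype table against the folder names directly. The sketch below is a hypothetical helper (`check_pdata_ids` is not part of ballgown) that uses base R's `list.files`, which returns names in sorted order; it assumes that this listing is the order the sample names need to follow. The temporary directory stands in for the real `ballgown/` folder.

```r
# Hypothetical helper: compare sample ids against the folders found in data_dir.
check_pdata_ids <- function(ids, data_dir) {
  folders <- list.files(data_dir)          # sorted listing of the directory
  list(
    folders         = folders,
    only_in_pdata   = setdiff(ids, folders),   # ids with no matching folder
    only_in_folders = setdiff(folders, ids),   # folders with no matching id
    same_order      = identical(as.character(ids), folders)
  )
}

# Demo on a temporary directory with the sample names from this thread.
demo_dir <- file.path(tempdir(), "ballgown_check_demo")
ids <- c("PF382_1_EC_E8", "PF382_2_EC_E8",
         "PF382_1_dGSK_Cherry_E8", "PF382_2_dGSK_Cherry_E8",
         "PF382_2_dGSK_Cherry_FOXO3ATM_mCD8",
         "PF382_1_dGSK_Cherry_FOXO3ATM_mCD8")
for (s in ids) dir.create(file.path(demo_dir, s), recursive = TRUE, showWarnings = FALSE)

res <- check_pdata_ids(ids, demo_dir)
print(res$only_in_pdata)
print(res$only_in_folders)
print(res$same_order)
```

Empty `only_in_pdata` and `only_in_folders` with `same_order` equal to `FALSE` would point to an ordering problem rather than a naming problem.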

Morris replied 2 years ago

Probably something goes wrong in ballgown when it is used with this script.

jhgalvez Staff replied 2 years ago

Yes, I’m not sure either what is wrong. One last strategy would be to define `pData` after building the ballgown object, and not while building it. One way to do this would be:

pf_rna <- ballgown(dataDir = file.path("ballgown"), samplePattern = sample)
pData(pf_rna) = data.frame(id=sampleNames(pf_rna), group=c("SP","SP","SP","SP","SP","SP","DP","DP"))

Maybe doing it this way will solve the issue. Otherwise, I'm out of ideas. Good luck!

Morris replied 2 years ago

If I do it this way, only one sample is recognized:

id group
1 PF382_1_EC_E8 SP
2 PF382_1_EC_E8 SP
3 PF382_1_EC_E8 SP
4 PF382_1_EC_E8 SP
5 PF382_1_EC_E8 SP
6 PF382_1_EC_E8 SP
7 PF382_1_EC_E8 DP
8 PF382_1_EC_E8 DP
> pf_rna
ballgown instance with 249321 transcripts and 1 samples
It doesn’t seem to recognize the names of the folders under the ballgown/ path.

Thank you for your time, jhgalvez.

Morris replied 2 years ago

Problem solved by arranging the sample names in file.csv in the same order in which they appear in RStudio (probably alphabetical order).

tnx 🙂
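
The fix described above can also be done in R instead of by editing the file by hand. The following is a minimal sketch, assuming the required order is the sorted folder listing; `reorder_pdata` is a hypothetical helper, and `sort()` on the ids is used here as a stand-in for listing the real sample folders with `list.files`.

```r
# Phenotype table as given in this thread (file order, not sorted).
pheno_data <- data.frame(
  ids = c("PF382_1_EC_E8", "PF382_2_EC_E8",
          "PF382_1_dGSK_Cherry_E8", "PF382_2_dGSK_Cherry_E8",
          "PF382_2_dGSK_Cherry_FOXO3ATM_mCD8",
          "PF382_1_dGSK_Cherry_FOXO3ATM_mCD8"),
  category = c("SP", "SP", "SP", "SP", "DP", "DP"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reorder rows so the first column follows folder_names.
reorder_pdata <- function(pheno_data, folder_names) {
  idx <- match(folder_names, pheno_data[[1]])
  if (anyNA(idx)) stop("some folders have no matching row in pheno_data")
  pheno_data[idx, , drop = FALSE]
}

# Stand-in for list.files() on the real data directory (assumption).
folder_names <- sort(pheno_data$ids)

pheno_sorted <- reorder_pdata(pheno_data, folder_names)
print(pheno_sorted)
```

Each row keeps its own category; only the row order changes.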