Running RepeatMasker
sjossey asked 4 months ago



  • I ran RepeatMasker with the following command:

    RepeatMasker -species mammal -pa 32 -s GCF_000493695.1_BalAcu1.0_genomic.fna

    The organism is a minke whale, a mammal, and is known to have ~40% repeats, but my analysis gives me only 2%.
    It is only detecting simple repeats and low-complexity regions in the genome.
    How am I supposed to make RepeatMasker detect all kinds of repeats? Do I need to download the database to my scratch folder?
    Thanks,
    Sushma

    2 Answers
    jrosner Staff answered 4 months ago



  • Hi sjossey,

    Full disclosure, I’m not an expert with RepeatMasker, but I did some digging around and it seems that, without a repeat library such as Repbase available, this software only reports low complexity and simple repeats. You can adjust the sensitivity (as you’ve done using the -s option), but that’s about it. Another option might be to find an alternate database.

    Now, I was able to dig up the following paper as well:
    “Minke whale genome and aquatic adaptation in cetaceans”
    https://www.nature.com/articles/ng.2835

    In this paper, they describe using a number of different masking tools:
    “The genome was searched for repetitive elements using Tandem Repeats Finder version 4.04. Transposable elements were identified using homology-based approaches. The Repbase (version 16.10) database of known repeats and a de novo repeat library generated by RepeatModeler were used. This database was used to find repeats with software such as RepeatMasker version 3.3.0.”
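The de novo route mentioned in the quoted paragraph could look roughly like the dry-run sketch below. It only builds and prints the command lines and runs nothing. The database name `minke` and the helper `denovo_masking_commands` are made up for illustration. The `-pa` option is assumed to be accepted by the installed RepeatModeler (newer releases may expect a different threading option), and the `<name>-families.fa` output name is how RepeatModeler usually names its consensus library, so check both against your installation.

```python
import shlex


def denovo_masking_commands(genome_fasta, db_name, threads):
    """Build (but do not run) the command lines for a de novo masking pass.

    Hypothetical helper: returns one argument list per step.
    """
    # Assumed name of the consensus library that RepeatModeler writes.
    library = f"{db_name}-families.fa"
    return [
        # 1. Index the assembly for RepeatModeler.
        ["BuildDatabase", "-name", db_name, genome_fasta],
        # 2. Build a de novo repeat library from the assembly.
        ["RepeatModeler", "-database", db_name, "-pa", str(threads)],
        # 3. Mask the assembly with the custom library instead of -species.
        ["RepeatMasker", "-lib", library, "-pa", str(threads), "-s", genome_fasta],
    ]


if __name__ == "__main__":
    genome = "GCF_000493695.1_BalAcu1.0_genomic.fna"
    for cmd in denovo_masking_commands(genome, "minke", 32):
        print(shlex.join(cmd))
```

A custom library passed with `-lib` replaces the `-species` lookup for that run, so results from a Repbase-based run and a de novo run would need to be combined separately.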

    Let me know if this is helpful…
    but if you’re still having problems, I’ll see if I can find someone with a little more knowledge to help you out
    (many of us are on vacation right now!!!)

    sjossey replied 4 months ago

    Thanks for the help and taking the time

    I presumed that RepeatMasker on Cedar is using Repbase. My output looks like this:

                         number of          length   percentage
                         elements*        occupied  of sequence
    -----------------------------------------------------------
    SINEs:                       0            0 bp       0.00 %
      Alu/B1                     0            0 bp       0.00 %
      MIRs                       0            0 bp       0.00 %

    LINEs:                       0            0 bp       0.00 %
      LINE1                      0            0 bp       0.00 %
      LINE2                      0            0 bp       0.00 %
      L3/CR1                     0            0 bp       0.00 %
      RTE                        0            0 bp       0.00 %

    LTR elements:                0            0 bp       0.00 %
      ERVL                       0            0 bp       0.00 %
      ERVL-MaLRs                 0            0 bp       0.00 %
      ERV_classI                 0            0 bp       0.00 %
      ERV_classII                0            0 bp       0.00 %

    DNA elements:                0            0 bp       0.00 %
      hAT-Charlie                0            0 bp       0.00 %
      TcMar-Tigger               0            0 bp       0.00 %

    Unclassified:                0            0 bp       0.00 %

    Total interspersed repeats:               0 bp       0.00 %

    Small RNA:                   0            0 bp       0.00 %

    Satellites:                  0            0 bp       0.00 %
    Simple repeats:         941278     37980227 bp       1.56 %
    Low complexity:         196297      9559739 bp       0.39 %
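As a rough sanity check on the table above, the two nonzero rows can be added up and used to back out the sequence length that RepeatMasker scanned. This is only arithmetic on the numbers in the table. The variable names are illustrative, and the implied lengths are approximate because the percentages are rounded to two decimals.

```python
# Values copied from the two nonzero rows of the summary table.
masked_bp = {"Simple repeats": 37_980_227, "Low complexity": 9_559_739}
reported_pct = {"Simple repeats": 1.56, "Low complexity": 0.39}

# Total masked bases and total reported percentage.
total_masked_bp = sum(masked_bp.values())
total_pct = sum(reported_pct.values())

# Sequence length implied by each row: length = bp / (pct / 100).
implied_length_bp = {
    name: masked_bp[name] / (reported_pct[name] / 100) for name in masked_bp
}

print(f"total masked: {total_masked_bp} bp ({total_pct:.2f} %)")
for name, length in implied_length_bp.items():
    print(f"{name}: implied sequence length of about {length / 1e9:.2f} Gb")
```

The rows sum to about 1.95 %, which matches the "only 2%" figure, and both imply a scanned length of roughly 2.4 Gb. That suggests the run covered a genome-sized input and the interspersed classes really came back empty, rather than the run having stopped early.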

    The paper ‘Insights into the Evolution of Longevity from the Bowhead Whale Genome’ has the following paragraph under Experimental Procedures:
    “Evaluation of Repeat Elements
    To evaluate the percentage of repeat elements, RepeatMasker (v.4.0.3; http://www.repeatmasker.org/) was used to identify repeat elements, with parameters set as “-s -species mammal.” RMBlast was used as a sequence search engine to list out all types of repeats. Percentage of repeat elements was calculated as the total number of repeat region divided by the total length of the genome, excluding the N-region.”
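The percentage calculation described in that paragraph can be sketched as follows. This is a minimal illustration, not the paper's actual script. The function names are made up, and it assumes the masked base count is taken from the RepeatMasker summary while the total and N counts come from the unmasked assembly FASTA.

```python
def count_bases(fasta_lines):
    """Return (total_bp, n_bp) for an iterable of FASTA lines."""
    total_bp = 0
    n_bp = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # skip sequence headers
        seq = line.strip()
        total_bp += len(seq)
        n_bp += seq.count("N") + seq.count("n")
    return total_bp, n_bp


def repeat_percentage(masked_bp, genome_bp, n_bp):
    """Percent of the genome in repeats, excluding N-regions from the length."""
    effective_bp = genome_bp - n_bp
    if effective_bp <= 0:
        raise ValueError("genome length must exceed the N-region length")
    return 100.0 * masked_bp / effective_bp


if __name__ == "__main__":
    # Toy input: 14 bases in total, 6 of them N.
    toy = [">chr1", "ACGTNNNNAC", "GTNN"]
    total_bp, n_bp = count_bases(toy)
    print(total_bp, n_bp, repeat_percentage(2, total_bp, n_bp))
```

For a real assembly, the lines would come from iterating over the open FASTA file rather than from a list.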

    So do I need to have the Repbase database from the site (https://www.girinst.org/server/RepBase/index.php) in my scratch folder, or can it be installed on Cedar so I can run it?

    Thanks!
    Sushma

    jrosner Staff answered 4 months ago



  • Hi Sushma,

    For the sake of expediency, I would advise that you download this yourself for now.
    There are some licensing details we need to look into, and currently the RepBase registration link is broken.
    But we will look into how we might be able to provide this centrally in the near future, and follow up here.

    Cheers!

    sjossey replied 4 months ago

    I am trying to register in the database too.
    The link for registration is
    https://www.girinst.org/accountservices/register.php

    Thanks again,
    Sushma

    jrosner Staff replied 4 months ago

    Hi again, I had a chance to read through the RepBase license agreement and also spoke with some other members of the team…
    Under the current conditions of the license, it looks like Compute Canada won’t be able to host this centrally, which is unfortunate, but it is what it is.
    And so, my earlier advice holds true, that is, you need to register and download your own personal copy that can be shared only within your research group.

    sjossey replied 4 months ago

    Thank you very much,
    Sushma