Downloading/saving the database for VirMet
sjossey asked 4 months ago



  • Hi,
    I installed VirMet in my scratch folder on Cedar, but when I try to download the database using

    $ virmet fetch --viral n

    it gives the following error:

    Traceback (most recent call last):
      File "/scratch/sjossey/ENV/bin/virmet", line 11, in <module>
        load_entry_point('VirMet==1.1.1', 'console_scripts', 'virmet')()
      File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/__main__.py", line 120, in main
        args.func(args)
      File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/__main__.py", line 44, in fetch_db
        fetch.main(args)
      File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/fetch.py", line 21, in main
        cml_search = viral_query('n')
      File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/common.py", line 134, in viral_query
        os.mkdir(target_dir)
    FileNotFoundError: [Errno 2] No such file or directory: '/data/virmet_databases/viral_nuccore'

    
    

    Since we don't have permission to create folders under /data on Cedar, I was wondering whether anyone has figured out how to build the database in the scratch folder instead of the default location.
    Thanks,
    Sushma

    3 Answers
    jhgalvez Staff answered 4 months ago



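  • Change line 13 of `src/virmet/common.py` so that it points to your scratch folder (or wherever you want to save the data), then reinstall VirMet so that the installed copy under `site-packages` (the one shown in the traceback) picks up the change. That way, the database will be downloaded to a directory where you have write permission.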
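The original `FileNotFoundError` comes from `os.mkdir(target_dir)` in `viral_query`: `os.mkdir` creates only the last path component and fails with `Errno 2` when a parent directory (here `/data/virmet_databases`) does not exist. A minimal sketch of the idea behind the fix, assuming the database root is a plain path string; `prepare_db_dir` and the temporary path are hypothetical illustrations, not VirMet's actual code:

```python
import os
import tempfile


def prepare_db_dir(db_root, subdir="viral_nuccore"):
    """Create db_root/subdir, including any missing parents, and return its path.

    Hypothetical helper: VirMet itself calls os.mkdir(target_dir), which
    raises FileNotFoundError when the parent directory is missing.
    """
    target_dir = os.path.join(db_root, subdir)
    # makedirs creates every missing parent; exist_ok avoids errors on reruns
    os.makedirs(target_dir, exist_ok=True)
    # Fail early with a clear message if the directory is not writable
    if not os.access(target_dir, os.W_OK):
        raise PermissionError("no write permission for " + target_dir)
    return target_dir


if __name__ == "__main__":
    # A throwaway directory stands in for the scratch folder
    scratch = tempfile.mkdtemp()
    root = os.path.join(scratch, "virmet_databases")
    try:
        os.mkdir(os.path.join(root, "viral_nuccore"))  # parent does not exist yet
    except FileNotFoundError as err:
        print("os.mkdir failed:", err.strerror)
    print("created:", prepare_db_dir(root))
```

Using `os.makedirs(..., exist_ok=True)` makes the directory setup idempotent, so rerunning the download does not fail just because the folder already exists.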

    sjossey answered 4 months ago



  • Thanks for the quick response.
    However, after making the change to common.py, I now get a new error:

    $ virmet fetch --viral n
    Traceback (most recent call last):
    File "/scratch/sjossey/ENV/bin/virmet", line 11, in <module>
    load_entry_point('VirMet==1.1.1', 'console_scripts', 'virmet')()
    File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/__main__.py", line 120, in main
    args.func(args)
    File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/__main__.py", line 44, in fetch_db
    fetch.main(args)
    File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/fetch.py", line 42, in main
    assert accs_1 == accs_2, accs_1 ^ accs_2
    AssertionError: {''}

    I hope you can help me figure out this error. When I check the folder I pointed it to, the folder was created with some files in it, but they are all empty.
    Thanks,
    Sushma
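For reference, `accs_1 ^ accs_2` is Python's set symmetric difference, so the assertion message lists accession numbers that appear in one set but not the other. With empty downloaded files, a message of exactly `{''}` is easy to get. The sketch below assumes one accession set is built by splitting the file text on newlines and the other by keeping only non-blank lines; the thread does not show how `fetch.py` actually builds `accs_1` and `accs_2`, so both helper functions are hypothetical illustrations, not VirMet's code:

```python
def accs_by_split(text):
    # Splitting an empty string on "\n" yields [""], i.e. one empty "accession"
    return set(text.split("\n"))


def accs_by_lines(text):
    # Skipping blank lines yields no accessions at all for empty input
    return {line.strip() for line in text.splitlines() if line.strip()}


empty_download = ""  # what an empty output file contains
accs_1 = accs_by_split(empty_download)
accs_2 = accs_by_lines(empty_download)
print(accs_1 ^ accs_2)  # {''}, the same message as in the AssertionError
```

In other words, the assertion is most likely a downstream symptom of the empty files, not a separate bug.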
     

    jhgalvez Staff replied 4 months ago

    It's hard to tell where the error is coming from using this output alone. There should be a log file somewhere; its contents will be more useful for debugging. If you can send it, I will try to look into what is happening.

    sjossey replied 4 months ago

    I found a log file; I hope you can help figure it out:

    $ cat virmet.log
    INFO 2018/07/12 12:34:30 __main__.py: main() 116: /scratch/sjossey/ENV/bin/virmet fetch --viral n
    DEBUG 2018/07/12 12:34:40 __init__.py: wrapper() 410: $HOME=/home/sjossey
    DEBUG 2018/07/12 12:34:40 __init__.py: wrapper() 410: matplotlib data path /scratch/sjossey/ENV/lib/python3.5/site-packages/matplotlib/mpl-data
    DEBUG 2018/07/12 12:34:40 __init__.py: rc_params_from_file() 1157: loaded rc file /scratch/sjossey/ENV/lib/python3.5/site-packages/matplotlib/mpl-data/matplotlibrc
    DEBUG 2018/07/12 12:34:40 __init__.py: () 1867: matplotlib version 2.2.2
    DEBUG 2018/07/12 12:34:40 __init__.py: () 1868: interactive is False
    DEBUG 2018/07/12 12:34:40 __init__.py: () 1869: platform is linux
    DEBUG 2018/07/12 12:34:40 __init__.py: () 1870: loaded modules: [...]
    INFO 2018/07/12 12:34:43 fetch.py: main() 16: now in fetch_data
    INFO 2018/07/12 12:34:43 fetch.py: main() 19: downloading viral nuccore sequences
    INFO 2018/07/12 12:34:43 common.py: run_child() 56: Running instance of esearch
    ERROR 2018/07/12 12:34:43 common.py: run_child() 64: Execution of esearch -db nuccore -query "txid10239 [orgn] AND \"complete genome\" [Title] NOT txid131567 [orgn]" > ncbi_search failed with returncode 127: /bin/sh: esearch: command not found

    ERROR 2018/07/12 12:34:43 common.py: run_child() 65: esearch -db nuccore -query "txid10239 [orgn] AND \"complete genome\" [Title] NOT txid131567 [orgn]" > ncbi_search
    INFO 2018/07/12 12:34:43 common.py: run_child() 56: Running instance of efetch
    ERROR 2018/07/12 12:34:43 common.py: run_child() 64: Execution of efetch -format fasta viral_database.fasta failed with returncode 127: /bin/sh: efetch: command not found

    ERROR 2018/07/12 12:34:43 common.py: run_child() 65: efetch -format fasta viral_database.fasta
    INFO 2018/07/12 12:34:43 common.py: run_child() 56: Running instance of efetch
    ERROR 2018/07/12 12:34:43 common.py: run_child() 64: Execution of efetch -format docsum viral_seqs_info.tsv failed with returncode 127: /bin/sh: efetch: command not found
    /bin/sh: xtract: command not found

    ERROR 2018/07/12 12:34:43 common.py: run_child() 65: efetch -format docsum viral_seqs_info.tsv
    INFO 2018/07/12 12:34:43 fetch.py: main() 35: downloaded viral seqs info in /home/sjossey/scratch/VirMet-1.1.1/data/virmet_databases/viral_nuccore
    INFO 2018/07/12 12:34:43 fetch.py: main() 36: saving viral taxonomy
    INFO 2018/07/12 12:34:43 common.py: run_child() 56: Running instance of cut
    DEBUG 2018/07/12 12:34:43 common.py: run_child() 62: Completed
    INFO 2018/07/12 12:34:43 common.py: run_child() 56: Running instance of grep
    DEBUG 2018/07/12 12:34:43 common.py: run_child() 62: Completed

    Thanks!
    Sushma
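The most telling lines in this log are `returncode 127` together with `/bin/sh: esearch: command not found` (and the same for `efetch` and `xtract`). A POSIX shell exits with status 127 when it cannot find the program to run, so these NCBI EDirect tools were simply not on the `PATH` of the shell that VirMet started. A small sketch reproducing that status with the standard `subprocess` module; the command name is deliberately nonexistent and `shell_status` is a hypothetical helper:

```python
import subprocess


def shell_status(command):
    """Run command through /bin/sh (as the log suggests VirMet does) and return (status, stderr)."""
    result = subprocess.run(command, shell=True,
                            stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
    return result.returncode, result.stderr.decode().strip()


code, err = shell_status("no-such-edirect-tool -db nuccore")
print(code)  # 127: the shell could not find the program
print(err)   # the shell's "not found" message; exact wording varies by shell
```

Because the later steps (`cut`, `grep`) still ran and "Completed", the pipeline produced empty files instead of stopping at the first failure, which matches the empty folder described earlier.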

    jhgalvez Staff replied 4 months ago

    I have a few other things to do today, but I took a quick look at the logs, and the main issue seems to be that VirMet cannot download the database from NCBI. That leads me to ask: have you only tried running this on interactive compute nodes? Those nodes have no internet connection, so they can't download databases. I suggest running only the steps that require downloads on a login node (which has an internet connection), or contacting Cedar support to see whether they can give you access to a compute node with internet access so you can run this job.

    Once you have tried to run this on a node with internet access, let me know if it works or if it still gives you errors. I will be happy to help more if I can.
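To check whether a particular node can reach NCBI at all, one option is a plain TCP connection test with a timeout, using only the standard library. A minimal sketch; `can_reach` is a hypothetical helper, and `eutils.ncbi.nlm.nih.gov` on port 443 is the HTTPS host that serves the E-utilities:

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts
        # (socket.timeout and socket.gaierror are subclasses of OSError)
        return False
```

For example, calling `can_reach("eutils.ncbi.nlm.nih.gov")` on a login node and on a compute node shows whether the two differ. A successful connection only shows that the network path is open; it does not rule out problems further up, such as a proxy or missing tools.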

    sjossey replied 4 months ago

    Thank you, I will try that.
    Sushma

    sjossey replied 4 months ago

    From what I've learned from Cedar support, all Cedar nodes have an internet connection, so that is not the issue.

    jhgalvez Staff replied 4 months ago

    Yes, I later remembered that all nodes on Cedar have internet access, so that couldn't be the reason the database download is failing. Looking at the logs, though, it is clear that the `efetch` and `esearch` commands are giving errors, and I think they are why the `virmet fetch --viral n` command fails. However, I'm not sure how to fix it. I suspect it might have to do with the change NCBI made to its API last May (see https://www.ncbi.nlm.nih.gov/books/NBK25497/#chapter2.Usage_Guidelines_and_Requiremen ). This change might mean that the VirMet commands are no longer valid, which would explain the error. Unfortunately, if this is the case, I don't see a way for us to fix it, and you should open an issue with the VirMet developers (https://github.com/ozagordi/VirMet/issues). Good luck!
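The usage guidelines linked above limit E-utilities clients to 3 requests per second without an API key and 10 per second with one; the key is passed as the `api_key` URL parameter. The sketch below builds (but does not send) the ESearch request for the query shown in the log, optionally with a key. `build_esearch_url` is a hypothetical helper, and whether VirMet's failure is related to these limits at all remains only a suspicion:

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# Query copied from the esearch call in the log above
QUERY = 'txid10239 [orgn] AND "complete genome" [Title] NOT txid131567 [orgn]'


def build_esearch_url(term, db="nuccore", api_key=None):
    """Build an ESearch URL; an api_key raises the limit from 3 to 10 requests/s."""
    params = {"db": db, "term": term}
    if api_key:
        params["api_key"] = api_key
    # urlencode percent-encodes the quotes and turns spaces into "+"
    return ESEARCH + "?" + urlencode(params)


print(build_esearch_url(QUERY))
print(build_esearch_url(QUERY, api_key="YOUR_KEY_HERE"))  # placeholder, not a real key
```

For the EDirect command-line tools themselves, NCBI's documentation describes supplying the key through the `NCBI_API_KEY` environment variable.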

    sjossey replied 4 months ago

    I realized that, so I installed the NCBI EDirect tools in my scratch folder and added them to my PATH, but it still doesn't seem to work. I suspect I haven't installed EDirect correctly and may need to reinstall it.
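A quick way to confirm that the EDirect programs VirMet calls (`esearch`, `efetch` and `xtract`, according to the log) are visible to Python and the shells it starts is `shutil.which`, which searches `PATH` for an executable much as the shell does. A minimal sketch; `missing_tools` is a hypothetical helper, and the `$HOME/scratch/edirect` location in the comment is only an example install directory:

```python
import os
import shutil

# Programs the log shows VirMet invoking through /bin/sh
EDIRECT_TOOLS = ("esearch", "efetch", "xtract")


def missing_tools(tools=EDIRECT_TOOLS, path=None):
    """Return the tools not found on PATH (or on the given os.pathsep-separated path string)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("not on PATH:", ", ".join(missing))
        # e.g. export PATH="$PATH:$HOME/scratch/edirect" in the same shell
        # (or job script) that runs virmet, then check again
    else:
        for tool in EDIRECT_TOOLS:
            print(tool, "->", shutil.which(tool))
    print("PATH =", os.environ.get("PATH", ""))
```

Running this from the same shell session that runs `virmet` matters: a `PATH` change made in `~/.bashrc` or in another terminal is not seen by a shell that was started before it.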

    sjossey answered 4 months ago



  • I am running it on the login node, and unfortunately that error message is all I can see.