Downloading/saving database for Virmet
sjossey asked 1 year ago



Hi,
I installed Virmet in my scratch folder of Cedar but when I try to download the database using

$ virmet fetch --viral n

It gives the following error

Traceback (most recent call last):
  File "/scratch/sjossey/ENV/bin/virmet", line 11, in <module>
    load_entry_point('VirMet==1.1.1', 'console_scripts', 'virmet')()
  File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/__main__.py", line 120, in main
    args.func(args)
  File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/__main__.py", line 44, in fetch_db
    fetch.main(args)
  File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/fetch.py", line 21, in main
    cml_search = viral_query('n')
  File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/common.py", line 134, in viral_query
    os.mkdir(target_dir)
FileNotFoundError: [Errno 2] No such file or directory: '/data/virmet_databases/viral_nuccore'


Since we do not have permission to create folders under /data on Cedar, I was hoping someone has figured out how to build the database in the scratch folder instead of the default location.
Thanks,
Sushma

3 Answers
jhgalvez Staff answered 1 year ago



Modify line 13 of the script `src/virmet/common.py` so that it points to your scratch directory (or wherever you want to save the data). That way, the database will be downloaded to a directory for which you have write permission.
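
A minimal sketch of the kind of change described here, assuming the database location in `common.py` is a single module-level path. The name `DB_DIR`, the `VIRMET_DB_DIR` environment variable, and the `ensure_dir` helper are all hypothetical and used only for illustration; check your own copy of the file for the real variable. The sketch uses `os.makedirs`, which creates missing parent directories, whereas `os.mkdir` raises `FileNotFoundError` when the parent directory does not exist, which matches the error in the traceback above.

```python
import os
import tempfile

# Hypothetical: read the database root from an environment variable,
# falling back to the default path seen in the traceback.
DB_DIR = os.environ.get("VIRMET_DB_DIR", "/data/virmet_databases")


def ensure_dir(root, name):
    """Create root/name (and any missing parents) and return its path."""
    target_dir = os.path.join(root, name)
    os.makedirs(target_dir, exist_ok=True)
    return target_dir


if __name__ == "__main__":
    # Demonstrate on a temporary directory instead of the real database root.
    with tempfile.TemporaryDirectory() as scratch:
        root = os.path.join(scratch, "virmet_databases")
        print(ensure_dir(root, "viral_nuccore"))
```

Whether you hard-code the scratch path or read it from the environment, the point is the same: the root directory has to be somewhere you can write to.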

sjossey answered 1 year ago



Thanks for the quick response.
After I made the changes to common.py, I now get a new error:

$ virmet fetch --viral n
Traceback (most recent call last):
  File "/scratch/sjossey/ENV/bin/virmet", line 11, in <module>
    load_entry_point('VirMet==1.1.1', 'console_scripts', 'virmet')()
  File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/__main__.py", line 120, in main
    args.func(args)
  File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/__main__.py", line 44, in fetch_db
    fetch.main(args)
  File "/scratch/sjossey/ENV/lib/python3.5/site-packages/VirMet-1.1.1-py3.5.egg/virmet/fetch.py", line 42, in main
    assert accs_1 == accs_2, accs_1 ^ accs_2
AssertionError: {''}

I hope you can help me figure out the error. When I check the folder I gave the path to, it was created and contains some files, but all of them are empty.
Thanks,
Sushma
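
For readers puzzling over `AssertionError: {''}`: the assertion message is `accs_1 ^ accs_2`, the symmetric difference of two sets. The sketch below shows one way that difference can come out as `{''}` when the downloaded files are empty. How VirMet actually builds `accs_1` and `accs_2` is an assumption here, and `accessions_from_text` is a hypothetical helper.

```python
def accessions_from_text(text):
    """Split file contents into a set of accession strings, one per line."""
    return set(text.strip().split("\n"))


# An empty file still yields one element: the empty string.
accs_1 = accessions_from_text("")
# A parser that finds no records yields a truly empty set.
accs_2 = set()

difference = accs_1 ^ accs_2  # symmetric difference
print(difference)
```

Under these assumptions the assertion fails because the download produced nothing, not because the accessions themselves disagree.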
 

jhgalvez Staff replied 1 year ago

It's hard for me to tell where the error is coming from using this output alone. You should have a log file somewhere; its contents will be more useful for debugging. If you could send that, I will try to look into what is happening.

sjossey replied 1 year ago

I found a log file; I hope it helps you figure it out:

$ cat virmet.log
INFO 2018/07/12 12:34:30 __main__.py: main() 116: /scratch/sjossey/ENV/bin/virmet fetch --viral n
DEBUG 2018/07/12 12:34:40 __init__.py: wrapper() 410: $HOME=/home/sjossey
DEBUG 2018/07/12 12:34:40 __init__.py: wrapper() 410: matplotlib data path /scratch/sjossey/ENV/lib/python3.5/site-packages/matplotlib/mpl-data
DEBUG 2018/07/12 12:34:40 __init__.py: rc_params_from_file() 1157: loaded rc file /scratch/sjossey/ENV/lib/python3.5/site-packages/matplotlib/mpl-data/matplotlibrc
DEBUG 2018/07/12 12:34:40 __init__.py: () 1867: matplotlib version 2.2.2
DEBUG 2018/07/12 12:34:40 __init__.py: () 1868: interactive is False
DEBUG 2018/07/12 12:34:40 __init__.py: () 1869: platform is linux
DEBUG 2018/07/12 12:34:40 __init__.py: () 1870: loaded modules: [‘copyreg’, ‘pandas.core.dtypes.common’, ‘keyword’, ‘pandas._libs.properties’, ‘pandas.io.formats.format’, ‘selectors’, ‘pkg_resources.extern.six.moves.urllib’, ‘dateutil.relativedelta’, ‘dateutil.easter’, ‘datetime’, ’email.feedparser’, ‘_random’, ‘numpy.linalg.linalg’, ‘pathlib’, ‘_warnings’, ‘pandas.core.indexes.multi’, ‘pandas.core.frame’, ‘platform’, ‘collections.abc’, ‘pandas._libs.tslibs.frequencies’, ‘pandas.plotting._tools’, ‘numpy.lib.twodim_base’, ‘pandas.tseries’, ‘linecache’, ‘dateutil.parser’, ‘binascii’, ‘six’, ‘numpy.core.machar’, ‘pandas.core.indexes.datetimes’, ‘pandas.io.formats.common’, ‘glob’, ‘token’, ‘matplotlib._color_data’, ‘pandas.core.series’, ‘numpy.lib.utils’, ‘_thread’, ‘matplotlib.colors’, ‘pandas.compat’, ‘numpy.testing.nose_tools.decorators’, ‘pandas._libs.algos’, ‘pandas.core.indexes.interval’, ‘pandas.core.util.hashing’, ‘pkg_resources.extern.packaging._compat’, ‘distutils.version’, ‘io’, ‘numpy._globals’, ‘plistlib’, ‘urllib.request’, ‘numpy.version’, ‘pandas.tseries.offsets’, ‘_io’, ‘pandas.plotting._style’, ‘pkg_resources._vendor.six.moves’, ‘numpy.polynomial.laguerre’, ‘pandas.core.index’, ‘numpy.lib.arrayterator’, ‘imp’, ‘ctypes’, ‘ssl’, ‘matplotlib.cbook’, ‘pandas.api.types’, ‘pandas.core.tools.datetimes’, ‘logging.handlers’, ‘importlib.machinery’, ‘json.scanner’, ‘pkg_resources.extern.appdirs’, ‘uu’, ‘numpy.lib.mixins’, ‘_opcode’, ‘itertools’, ‘numpy._import_tools’, ‘numpy.testing.nose_tools’, ‘random’, ‘pandas._libs.lib’, ‘tokenize’, ‘zipimport’, ‘unittest.loader’, ‘_bisect’, ‘pyexpat’, ‘numpy.core.memmap’, ‘six.moves.urllib’, ‘pandas._libs.hashing’, ‘numpy’, ‘pandas.io.formats.printing’, ‘marshal’, ’email.iterators’, ‘re’, ’email._policybase’, ‘numpy.polynomial.chebyshev’, ‘pandas._libs.hashtable’, ‘numpy.core._internal’, ‘pandas._libs.tslibs.timedeltas’, ‘pandas.core.indexes.datetimelike’, ‘pandas.core.indexes.base’, ‘collections’, ‘pandas.core.sparse’, 
‘pwd’, ‘mpl_toolkits’, ‘pandas.compat.numpy’, ‘importlib.util’, ‘pkg_resources.extern.packaging._structures’, ‘urllib’, ‘dis’, ‘dateutil’, ‘_ssl’, ‘matplotlib.testing’, ‘opcode’, ‘unittest.case’, ‘bisect’, ‘unittest.util’, ‘sysconfig’, ‘numpy.ctypeslib’, ‘numpy.matrixlib’, ‘getopt’, ‘pandas._libs’, ‘numpy.lib._datasource’, ‘_bootlocale’, ‘pkg_resources’, ‘_locale’, ‘pandas.core.strings’, ‘distutils.util’, ‘pandas.io.common’, ‘cython_runtime’, ‘numpy.core.umath’, ‘bz2’, ‘numpy.core’, ‘matplotlib.fontconfig_pattern’, ‘numpy.ma.extras’, ‘ipaddress’, ‘pandas.core.indexes.timedeltas’, ‘xml.parsers.expat.errors’, ‘warnings’, ’email._parseaddr’, ‘base64’, ‘pandas.core.dtypes.api’, ‘numpy.lib.ufunclike’, ‘numpy.lib.stride_tricks’, ‘pandas.core.dtypes.missing’, ‘numpy.lib.shape_base’, ‘urllib.response’, ‘pandas.core.dtypes.concat’, ‘pytz.tzfile’, ‘queue’, ‘numpy.lib.financial’, ‘_hashlib’, ‘pandas.core.common’, ‘numpy.core.fromnumeric’, ‘numpy.compat’, ‘unittest’, ‘pandas.plotting._converter’, ‘xml’, ’email.base64mime’, ‘importlib._bootstrap’, ‘distutils.debug’, ‘pandas.core.dtypes.generic’, ‘distutils.log’, ‘pandas.core.indexing’, ‘numpy.lib.format’, ‘numpy.compat.py3k’, ‘pandas.core.internals’, ‘encodings.utf_8’, ‘gzip’, ‘pyexpat.model’, ‘pandas.core.dtypes.dtypes’, ‘pytz.tzinfo’, ‘argparse’, ‘encodings.aliases’, ‘subprocess’, ‘pandas._libs.sparse’, ‘pandas.core.dtypes’, ‘matplotlib’, ‘_csv’, ‘pandas.core’, ‘zlib’, ‘pandas.core.sparse.array’, ‘pandas.core.tools.timedeltas’, ‘pandas.core.indexes.range’, ‘ntpath’, ‘pkg_resources.extern’, ‘numpy.linalg.lapack_lite’, ‘pandas.core.indexes.numeric’, ‘numpy.polynomial.polyutils’, ‘numpy.core.numeric’, ‘distutils.errors’, ‘unittest.runner’, ‘dateutil.parser.isoparser’, ‘numpy.core.defchararray’, ‘_compression’, ‘pandas.errors’, ‘numpy._distributor_init’, ‘xml.parsers.expat’, ‘_decimal’, ‘locale’, ‘_json’, ‘_posixsubprocess’, ‘shutil’, ’email.message’, ‘pandas’, ‘pandas._libs.index’, ‘ast’, ‘numpy.core.records’, ’email.header’, 
‘virmet.__main__’, ‘math’, ‘__future__’, ‘string’, ‘_ast’, ‘pkg_resources._vendor’, ‘pandas.core.missing’, ‘numpy.lib.scimath’, ‘_virtualenv_distutils’, ‘numpy.core.numerictypes’, ‘_string’, ‘numpy.core.getlimits’, ‘unittest.suite’, ‘unittest.main’, ‘copy’, ‘_datetime’, ‘dateutil.tz.tz’, ‘pickle’, ‘json.encoder’, ‘six.moves’, ‘matplotlib.cbook._backports’, ‘matplotlib.cbook.deprecation’, ‘numpy.polynomial.legendre’, ‘socket’, ‘pkg_resources._vendor.six’, ‘types’, ‘functools’, ‘operator’, ‘threading’, ‘_codecs’, ‘pandas.compat.chainmap’, ’email.errors’, ‘calendar’, ‘distutils.fancy_getopt’, ‘enum’, ‘virmet’, ‘pandas.plotting._compat’, ‘_heapq’, ‘pytz.lazy’, ‘pandas.core.categorical’, ‘pkg_resources.extern.packaging.version’, ‘sys’, ‘http.client’, ‘pandas.core.base’, ‘importlib’, ‘numpy.fft.fftpack’, ‘pandas.core.dtypes.inference’, ‘urllib.parse’, ‘pandas.core.indexes.period’, ‘numbers’, ‘pandas._libs.tslibs.timezones’, ‘encodings.latin_1′, ’email.charset’, ‘sre_constants’, ‘numpy.ma.core’, ‘_operator’, ‘zipfile’, ‘numpy.lib.arraypad’, ‘_lzma’, ‘pkg_resources.extern.six.moves’, ‘pkg_resources.extern.packaging.requirements’, ‘pandas.core.util’, ‘errno’, ‘pandas.core.algorithms’, ‘contextlib’, ‘heapq’, ‘distutils.spawn’, ‘numpy.lib.arraysetops’, ‘_compat_pickle’, ‘distutils.sysconfig’, ‘pandas._libs.tslibs.fields’, ‘dateutil._version’, ‘numpy.testing’, ‘numpy.polynomial.hermite’, ‘numpy.__config__’, ‘site’, ‘hashlib’, ‘pandas.io.formats’, ‘numpy.polynomial’, ‘numpy.matrixlib.defmatrix’, ‘inspect’, ‘distutils.dep_util’, ‘pkg_resources._vendor.packaging.__about__’, ‘matplotlib.compat.subprocess’, ‘pkg_resources.extern.packaging.specifiers’, ’email.utils’, ‘_cython_0_27_2’, ‘dateutil._common’, ‘difflib’, ‘weakref’, ‘numpy.lib.index_tricks’, ‘pandas.core.indexes.category’, ‘pandas.core.indexes.frozen’, ‘numpy.core.info’, ‘lzma’, ‘importlib._bootstrap_external’, ‘pkg_resources.py31compat’, ‘__main__’, ‘stat’, ‘pandas.io’, ‘_imp’, ‘pandas.api’, ‘codecs’, ‘numpy.linalg’, 
‘pprint’, ‘gettext’, ‘numpy.testing.nose_tools.nosetester’, ‘gc’, ‘pandas.core.generic’, ‘numpy.polynomial._polybase’, ‘pandas.core.dtypes.cast’, ‘unittest.result’, ‘_collections’, ‘numpy.testing.nose_tools.utils’, ‘_stat’, ‘_weakref’, ‘numpy.random.mtrand’, ‘numpy.linalg.info’, ‘pandas.util’, ‘dateutil.tz._common’, ‘numpy.lib.info’, ‘urllib.error’, ‘pkg_resources.extern.six’, ‘pkgutil’, ‘matplotlib._version’, ‘numpy.lib._version’, ‘time’, ‘_socket’, ‘pandas.core.config_init’, ’email’, ‘textwrap’, ‘numpy.core.multiarray’, ‘pandas.core.accessor’, ‘numpy.lib._iotools’, ‘pandas.core.sorting’, ‘pandas.core.ops’, ‘numpy.linalg._umath_linalg’, ‘posix’, ‘numpy.lib.nanfunctions’, ‘builtins’, ‘numpy.core.shape_base’, ‘_weakrefset’, ‘_frozen_importlib_external’, ’email.parser’, ‘numpy.lib.type_check’, ‘_frozen_importlib’, ‘dateutil.tz._factories’, ‘pandas.plotting’, ‘xml.parsers’, ‘os’, ‘atexit’, ‘pytz’, ‘numpy.testing.decorators’, ‘pandas._libs.tslib’, ‘_sre’, ‘posixpath’, ‘numpy.lib.function_base’, ‘json’, ’email.encoders’, ‘numpy.compat._inspect’, ‘pandas.core.groupby’, ‘pandas._libs.tslibs.strptime’, ‘json.decoder’, ‘pkg_resources.extern.packaging.markers’, ‘fnmatch’, ‘numpy.polynomial.hermite_e’, ‘pytz.exceptions’, ‘_signal’, ‘numpy.fft.helper’, ‘pandas.core.api’, ‘pandas.io.formats.console’, ‘numpy.add_newdocs’, ‘os.path’, ‘encodings.cp437’, ‘dateutil.parser._parser’, ‘cycler’, ‘numpy.random.info’, ‘numpy.lib.polynomial’, ‘traceback’, ‘tempfile’, ‘pyparsing’, ‘pandas._libs.interval’, ‘numpy.fft.info’, ‘numpy.polynomial.polynomial’, ‘pandas.util._decorators’, ‘_bz2’, ‘pandas.core.indexes.api’, ‘numpy.random’, ‘virmet.fetch’, ‘reprlib’, ‘pandas.compat.numpy.function’, ‘http’, ‘sre_compile’, ‘numpy.ma’, ‘importlib.abc’, ‘decimal’, ‘numpy.core.arrayprint’, ‘numpy.lib’, ‘_struct’, ‘grp’, ‘pandas.core.nanops’, ‘pkg_resources.extern.packaging’, ‘pandas.core.indexes.accessors’, ‘pandas._libs.period’, ‘pandas.core.indexes’, ‘_functools’, ‘pandas.core.config’, ‘unicodedata’, 
‘distutils.dist’, ‘csv’, ‘numpy.fft.fftpack_lite’, ‘numpy.core._methods’, ‘numpy.testing.utils’, ‘sre_parse’, ‘numpy.core.einsumfunc’, ’email.quoprimime’, ‘logging’, ‘xml.parsers.expat.model’, ‘_pickle’, ‘pandas._libs.join’, ‘pandas.io.formats.terminal’, ‘dateutil.tz’, ‘numpy.lib.npyio’, ‘quopri’, ‘encodings’, ‘genericpath’, ‘numpy.testing.nosetester’, ‘six.moves.urllib.request’, ’email._encoded_words’, ‘_ctypes’, ‘ctypes._endian’, ‘abc’, ‘_collections_abc’, ‘pyexpat.errors’, ‘signal’, ‘pkg_resources.extern.pyparsing’, ‘unittest.signals’, ‘numpy.fft’, ‘pandas.util._validators’, ‘distutils’, ‘pandas._libs.tslibs’, ‘mmap’, ‘pandas.tseries.frequencies’, ‘pandas.plotting._misc’, ‘mtrand’, ‘pandas.plotting._core’, ‘matplotlib.rcsetup’, ‘struct’, ‘matplotlib.compat’, ‘pandas._libs.tslibs.parsing’, ‘select’, ‘pandas.core.tools’, ‘numpy.core.function_base’]
INFO 2018/07/12 12:34:43 fetch.py: main() 16: now in fetch_data
INFO 2018/07/12 12:34:43 fetch.py: main() 19: downloading viral nuccore sequences
INFO 2018/07/12 12:34:43 common.py: run_child() 56: Running instance of esearch
ERROR 2018/07/12 12:34:43 common.py: run_child() 64: Execution of esearch -db nuccore -query "txid10239 [orgn] AND \"complete genome\" [Title] NOT txid131567 [orgn]" > ncbi_search failed with returncode 127: /bin/sh: esearch: command not found

ERROR 2018/07/12 12:34:43 common.py: run_child() 65: esearch -db nuccore -query "txid10239 [orgn] AND \"complete genome\" [Title] NOT txid131567 [orgn]" > ncbi_search
INFO 2018/07/12 12:34:43 common.py: run_child() 56: Running instance of efetch
ERROR 2018/07/12 12:34:43 common.py: run_child() 64: Execution of efetch -format fasta viral_database.fasta failed with returncode 127: /bin/sh: efetch: command not found

ERROR 2018/07/12 12:34:43 common.py: run_child() 65: efetch -format fasta viral_database.fasta
INFO 2018/07/12 12:34:43 common.py: run_child() 56: Running instance of efetch
ERROR 2018/07/12 12:34:43 common.py: run_child() 64: Execution of efetch -format docsum viral_seqs_info.tsv failed with returncode 127: /bin/sh: efetch: command not found
/bin/sh: xtract: command not found

ERROR 2018/07/12 12:34:43 common.py: run_child() 65: efetch -format docsum viral_seqs_info.tsv
INFO 2018/07/12 12:34:43 fetch.py: main() 35: downloaded viral seqs info in /home/sjossey/scratch/VirMet-1.1.1/data/virmet_databases/viral_nuccore
INFO 2018/07/12 12:34:43 fetch.py: main() 36: saving viral taxonomy
INFO 2018/07/12 12:34:43 common.py: run_child() 56: Running instance of cut
DEBUG 2018/07/12 12:34:43 common.py: run_child() 62: Completed
INFO 2018/07/12 12:34:43 common.py: run_child() 56: Running instance of grep
DEBUG 2018/07/12 12:34:43 common.py: run_child() 62: Completed

Thanks!
Sushma

jhgalvez Staff replied 1 year ago

I have a few other things to do today, but I took a quick look at the logs and it seems to me that the main issue is that virmet is not able to download the database from ncbi. Which leads me to the question: you have only tried to run this using interactive working nodes, right? Those nodes have no internet connection, so they can’t download databases. I suggest you run only the steps that require downloads on a login node (since they have internet connection) or contact Cedar support to see if they can grant you access to a working node with internet connection so you can run this job.

Once you have tried to run this on a node with internet access, let me know if it works or if it still gives you errors. I will be happy to help more if I can.

sjossey replied 1 year ago

Thank you, I will try that.
Sushma

sjossey replied 1 year ago

From what I learned from Cedar support, all Cedar nodes have an internet connection, so that is not the issue.

jhgalvez Staff replied 1 year ago

Yes, I later remembered that all nodes on Cedar have internet access, so that couldn't be the reason why it is failing to download the database. But it is clear to me, looking at the logs, that the `efetch` and `esearch` commands are giving errors, and I think they are the reason why the `virmet fetch --viral n` command is failing. However, I'm not sure how to fix it. I suspect it might have to do with the change NCBI made this last May to its API (see here https://www.ncbi.nlm.nih.gov/books/NBK25497/#chapter2.Usage_Guidelines_and_Requiremen ). This change might mean that the virmet commands are no longer valid, and that is why you are getting this error. Unfortunately, if this is the case, then I don't see a way we can fix it, and it would be a matter of opening a ticket with the virmet developers (https://github.com/ozagordi/VirMet/issues). Good luck!

sjossey replied 1 year ago

I realized that, so I installed the NCBI EDirect tools in my scratch and added them to my PATH, but it still does not work for some reason. I may not have installed EDirect correctly and may need to reinstall it.
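
The log lines ending in `returncode 127: ... command not found` mean the shell could not locate `esearch`, `efetch`, or `xtract`. A quick way to check whether they are visible, sketched with the standard-library `shutil.which` (the `find_missing` helper and `REQUIRED_TOOLS` name are hypothetical, not part of VirMet):

```python
import shutil

# Tools named in the "command not found" lines of virmet.log.
REQUIRED_TOOLS = ("esearch", "efetch", "xtract")


def find_missing(tools, path=None):
    """Return the tools that cannot be found on the given PATH."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("not on PATH:", ", ".join(missing))
    else:
        print("all EDirect tools found")
```

Run it from the same shell session and environment in which you launch `virmet`. The log shows the commands being executed through `/bin/sh`, so the directory containing the tools has to be on the exported `PATH` of that session.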

sjossey answered 1 year ago



I am running it on the login node, and unfortunately that error message is all I can see.