clusterjob.utils module

Collection of utility functions

clusterjob.utils.set_executable(filename)

Set the executable bit on the given filename.
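
As a rough illustration of what setting the executable bit involves, here is a minimal sketch using os.chmod. It assumes the execute bit is added for user, group and others (like `chmod a+x`) while all other permission bits are preserved; the actual clusterjob implementation may set a different subset of bits.

```python
import os
import stat
import tempfile


def set_executable(filename):
    """Add execute permission for user, group and others (like ``chmod a+x``).

    Sketch only: assumes all three execute bits are set; the documented
    function may behave differently.
    """
    mode = os.stat(filename).st_mode
    # OR in the execute bits so that read/write bits are left untouched
    os.chmod(filename, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Usage: mkstemp creates the file with mode 0o600, which becomes 0o711.
fd, path = tempfile.mkstemp()
os.close(fd)
set_executable(path)
print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # 0o711 on POSIX systems
os.unlink(path)
```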

clusterjob.utils.write_file(filename, data)

Write data to the file with the given filename.

clusterjob.utils.split_seq(seq, n_chunks)

Split the given sequence into n_chunks contiguous sublists of roughly equal length. This is suitable for distributing an array of jobs over a fixed number of workers.

>>> split_seq([1,2,3,4,5,6], 3)
[[1, 2], [3, 4], [5, 6]]
>>> split_seq([1,2,3,4,5,6], 2)
[[1, 2, 3], [4, 5, 6]]
>>> split_seq([1,2,3,4,5,6,7], 3)
[[1, 2], [3, 4, 5], [6, 7]]
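
One way to obtain the chunking shown above is to place chunk boundary i at i·len(seq)/n_chunks, rounded half-up. The sketch below does this in exact integer arithmetic to avoid floating-point error; it reproduces the three examples above, but it is an assumption, not necessarily how clusterjob computes the boundaries. It returns lists regardless of the input sequence type and requires n_chunks > 0.

```python
def split_seq(seq, n_chunks):
    """Split seq into n_chunks contiguous sublists of near-equal length.

    Sketch: chunk i is seq[b(i):b(i + 1)], where b(i) is
    i * len(seq) / n_chunks rounded half-up.
    """
    seq = list(seq)
    n = len(seq)

    def boundary(i):
        # floor(i*n/n_chunks + 1/2), computed exactly with integers
        return (2 * i * n + n_chunks) // (2 * n_chunks)

    return [seq[boundary(i):boundary(i + 1)] for i in range(n_chunks)]


# Distributing 10 jobs over 4 workers:
print([len(chunk) for chunk in split_seq(range(10), 4)])  # [3, 2, 3, 2]
```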

clusterjob.utils.read_file(filename)

Return the contents of the file with the given filename as a string.

>>> import os
>>> write_file('read_write_file.txt', 'Hello World')
>>> read_file('read_write_file.txt')
'Hello World'
>>> os.unlink('read_write_file.txt')
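
write_file and read_file are thin wrappers around ordinary file I/O. A minimal sketch of the pair, assuming text mode with the platform's default encoding (the real functions may open files differently):

```python
import os
import tempfile


def write_file(filename, data):
    """Write the string data to filename, replacing any existing content."""
    with open(filename, "w") as out_fh:
        out_fh.write(data)


def read_file(filename):
    """Return the full contents of filename as a string."""
    with open(filename) as in_fh:
        return in_fh.read()


# Usage: round-trip a string through a file in a throwaway directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "read_write_file.txt")
    write_file(path, "Hello World")
    print(read_file(path))  # Hello World
```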

clusterjob.utils.upload_file(localfile, remote, remotefile, scp='scp')

Run {scp} {localfile} {remote}:{remotefile}, i.e. copy localfile to remotefile on the host remote.

Parameters:
  • localfile (str) – relative or absolute path to a local file
  • remote (str) – host on which to put the file
  • remotefile (str) – remote path where to put the file. May start with ‘~’ to indicate the home directory.
  • scp (str) – the scp executable. If not a full path, the executable must be in $PATH.
Raises: subprocess.CalledProcessError – if the call to scp fails.
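
The documented behavior maps directly onto a single subprocess call. The sketch below is an assumption about how that could look, not the clusterjob source; scp_command is a hypothetical helper split out only so the constructed command can be inspected without contacting a host, and cluster.example.org is a placeholder hostname.

```python
import subprocess


def scp_command(localfile, remote, remotefile, scp="scp"):
    """Hypothetical helper: argument list for {scp} {localfile} {remote}:{remotefile}."""
    return [scp, localfile, "%s:%s" % (remote, remotefile)]


def upload_file(localfile, remote, remotefile, scp="scp"):
    """Copy localfile to remotefile on the host remote via scp."""
    # check_call raises subprocess.CalledProcessError on a non-zero exit code;
    # a missing scp executable raises FileNotFoundError instead.
    subprocess.check_call(scp_command(localfile, remote, remotefile, scp=scp))


# The list form bypasses the local shell, so a leading '~' is passed to scp
# unexpanded and, as documented above, refers to the remote home directory.
print(scp_command("job.sh", "cluster.example.org", "~/jobs/job.sh"))
```

Passing an argument list rather than a shell string also means local filenames containing spaces need no quoting.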

clusterjob.utils.run_cmd(cmd, remote, rootdir='', workdir='', ignore_exit_code=False, ssh='ssh')

Run the given cmd in the given workdir, either locally or remotely, and return its combined stdout and stderr output as a string.

Parameters:
  • cmd (list of str or str) – Command to execute, given as a list consisting of the command and its options. Alternatively, the command can be given as a single string, which will then be executed as a shell command. Only use shell commands when necessary, e.g. when the command involves a pipe.
  • remote (None or str) – If None, run the command locally. Otherwise, run it on the given host (via SSH).
  • rootdir (str, optional) – Local or remote root directory. The workdir variable is taken relative to rootdir. If not specified, effectively the current working directory is used as the root for local commands, and the home directory for remote commands. Note that ~ may be used to indicate the home directory locally or remotely.
  • workdir (str, optional) – Local or remote directory from which to run the command, relative to rootdir. If rootdir is empty, ~ may be used to indicate the home directory.
  • ignore_exit_code (boolean, optional) – By default, subprocess.CalledProcessError will be raised if the call has an exit code other than 0. This exception can be suppressed by passing ignore_exit_code=True.
  • ssh (str, optional) – The executable to be used for ssh. If not a full path, the executable must be in $PATH.

Example

>>> import tempfile, os, shutil
>>> tempfolder = tempfile.mkdtemp()
>>> scriptfile = os.path.join(tempfolder, 'test.sh')
>>> with open(scriptfile, 'w') as script_fh:
...     script_fh.writelines(["#!/bin/bash\n", "echo Hello $1\n"])
>>> set_executable(scriptfile)
>>> run_cmd(['./test.sh', 'World'], remote=None, workdir=tempfolder)
'Hello World\n'
>>> run_cmd("./test.sh World | tr '[:upper:]' '[:lower:]'", remote=None,
...         workdir=tempfolder)
'hello world\n'
>>> shutil.rmtree(tempfolder)
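
The following sketch models the documented behavior of run_cmd with subprocess.run; it is not the clusterjob source. The hypothetical helper remote_command and the `cd <dir> && <cmd>` string sent over ssh are assumptions about how the remote case could work. The remote directory is deliberately left unquoted so that a leading ~ is expanded by the remote shell, which assumes the path contains no spaces or shell metacharacters.

```python
import os
import posixpath
import shlex
import subprocess


def _as_shell_string(cmd):
    """Turn a list command into one safely quoted shell string."""
    if isinstance(cmd, str):
        return cmd  # already a shell command (may contain pipes etc.)
    return " ".join(shlex.quote(arg) for arg in cmd)


def remote_command(cmd, remote, rootdir="", workdir="", ssh="ssh"):
    """Hypothetical helper: ssh invocation running cmd in rootdir/workdir."""
    remote_dir = posixpath.join(rootdir, workdir)
    prefix = ""
    if remote_dir:
        # Unquoted on purpose so that '~' expands remotely (no spaces allowed).
        prefix = "cd %s && " % remote_dir
    # An empty remote_dir means ssh starts in the remote home directory.
    return [ssh, remote, prefix + _as_shell_string(cmd)]


def run_cmd(cmd, remote, rootdir="", workdir="", ignore_exit_code=False, ssh="ssh"):
    """Return the combined stdout/stderr of cmd, run locally or via ssh."""
    if remote is None:
        local_dir = os.path.join(os.path.expanduser(rootdir),
                                 os.path.expanduser(workdir))
        proc = subprocess.run(
            cmd,
            cwd=local_dir or None,        # empty -> current working directory
            shell=isinstance(cmd, str),   # a string is run as a shell command
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,     # merge stderr into stdout
            universal_newlines=True,
        )
    else:
        proc = subprocess.run(
            remote_command(cmd, remote, rootdir, workdir, ssh),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    if proc.returncode != 0 and not ignore_exit_code:
        raise subprocess.CalledProcessError(proc.returncode, cmd, output=proc.stdout)
    return proc.stdout
```

For example, remote_command(["ls", "-l"], "cluster.example.org", rootdir="~/runs", workdir="job1") builds ['ssh', 'cluster.example.org', 'cd ~/runs/job1 && ls -l'] (the hostname is a placeholder).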

clusterjob.utils.time_to_seconds(time_str)

Convert a string describing a time duration into seconds. The supported formats are:

  • minutes
  • minutes:seconds
  • hours:minutes:seconds
  • days-hours
  • days-hours:minutes
  • days-hours:minutes:seconds
  • days:hours:minutes:seconds
Raises: ValueError – if time_str has an invalid format.

Examples

>>> time_to_seconds('10')
600
>>> time_to_seconds('10:00')
600
>>> time_to_seconds('10:30')
630
>>> time_to_seconds('1:10:30')
4230
>>> time_to_seconds('1-1:10:30')
90630
>>> time_to_seconds('1-0')
86400
>>> time_to_seconds('1-10')
122400
>>> time_to_seconds('1-1:10')
90600
>>> time_to_seconds('1:1:10:30')
90630
>>> time_to_seconds('1 1:10:30')
Traceback (most recent call last):
...
ValueError: '1 1:10:30' has invalid pattern
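
One way to support exactly the formats listed above is to try one regular expression per format, each with named groups, and sum the matched fields weighted by their length in seconds. This is a sketch under that assumption; the names _TIME_PATTERNS and _SECONDS_PER_UNIT are hypothetical, but the function reproduces every example above, including the ValueError message.

```python
import re

# One pattern per documented format; fullmatch() anchors both ends.
_TIME_PATTERNS = [
    re.compile(r"(?P<minutes>\d+)"),
    re.compile(r"(?P<minutes>\d+):(?P<seconds>\d+)"),
    re.compile(r"(?P<hours>\d+):(?P<minutes>\d+):(?P<seconds>\d+)"),
    re.compile(r"(?P<days>\d+)-(?P<hours>\d+)"),
    re.compile(r"(?P<days>\d+)-(?P<hours>\d+):(?P<minutes>\d+)"),
    re.compile(r"(?P<days>\d+)-(?P<hours>\d+):(?P<minutes>\d+):(?P<seconds>\d+)"),
    re.compile(r"(?P<days>\d+):(?P<hours>\d+):(?P<minutes>\d+):(?P<seconds>\d+)"),
]

_SECONDS_PER_UNIT = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}


def time_to_seconds(time_str):
    """Convert a duration string in one of the formats above to seconds."""
    for pattern in _TIME_PATTERNS:
        match = pattern.fullmatch(time_str)
        if match:
            # Every named group in the matching pattern is required, so all
            # values are digit strings here.
            return sum(int(value) * _SECONDS_PER_UNIT[unit]
                       for unit, value in match.groupdict().items())
    raise ValueError("%r has invalid pattern" % time_str)


print(time_to_seconds("1-1:10:30"))  # 90630
```

Ordering matters only in that each string can match at most one pattern: the separators (`-` versus `:`) and the number of fields distinguish the formats.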

clusterjob.utils.mkdir(name, mode=488)

Implementation of mkdir -p: create the folder with the given name and the given permissions (mode; the default 488 is the octal value 0o750, i.e. rwxr-x---):

  • Create missing parent folders.
  • Do nothing if the folder with the given name already exists.
  • Raise OSError if there is already a file with the given name.
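
The three rules above coincide with the behavior of the standard library's os.makedirs with exist_ok=True, so a minimal sketch can delegate to it. This is an assumption about a possible implementation, not the clusterjob source. Note that, as documented for os.makedirs, the process umask still applies, and since Python 3.7 mode affects only the final directory, not newly created parents.

```python
import os
import tempfile


def mkdir(name, mode=0o750):
    """Sketch of ``mkdir -p`` built on os.makedirs (0o750 == 488)."""
    # exist_ok=True accepts an existing *directory* silently, but if a
    # non-directory (e.g. a regular file) already has this name, os.makedirs
    # still raises FileExistsError, which is a subclass of OSError.
    os.makedirs(name, mode=mode, exist_ok=True)


with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "a", "b", "c")
    mkdir(target)  # creates a, a/b and a/b/c
    mkdir(target)  # second call is a no-op
    print(os.path.isdir(target))  # True
```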