Concepts

What is it?

Talon is a system for periodically running jobs that have constraints.

A job will be run for a period when triggered, but will block until all its constraints have been met and it can be allocated suitable resources.

The schedule contains details of jobs to be run and any context needed to run them. Periods and the runs and artifacts that occur within them are stored in the state store along with details of the resources currently available to the system.

Configuration specifies which implementations to use for the schedule, state and scheduler, along with the types of job, run, period, trigger, constraint and resource available to the schedule.

Anti-goals

Talon does not aim to do any of the following:

  • Get software, including itself, to where executors need it.
  • Get data used by jobs run by executors to those executors.
  • Provide executor implementations; the focus is on managing constraints and state.

Job

Attributes:

  • name
  • poll frequency - how long to wait between checks of whether constraints have been met (defaults to once a minute)
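
The poll frequency suggests a simple polling loop. The following is a minimal sketch in Python, not Talon's implementation: check is a hypothetical callable that returns True once all constraints have been met, and wait_until_met, max_polls and the injectable sleep are illustrative names only.

```python
import time


def wait_until_met(check, poll_seconds=60.0, max_polls=None, sleep=time.sleep):
    """Poll `check` until it returns True.

    poll_seconds mirrors the job's poll frequency (once a minute by default).
    max_polls is a hypothetical safety limit; None means poll forever.
    Returns True when the check passes, False if max_polls is exhausted.
    """
    polls = 0
    while True:
        if check():
            return True
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        # Wait one poll interval before checking the constraints again.
        sleep(poll_seconds)
```

Passing sleep as a parameter keeps the loop testable without real delays.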

Run

  • job
  • period
  • executor
  • started
  • progress
  • finished

The executor indicates who ‘owns’ the job.

Each run may produce zero or more artifacts.

Artifacts

  • each artifact is identified by a URL giving its location
  • each artifact is associated with a period and a run
  • an artifact could be a file, or some data
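
The attributes of runs and artifacts listed above could be modelled as plain records. This is a sketch only: the field types, the use of strings to refer to jobs, periods and runs, and the default progress of "blocked" are all assumptions, since the notes above name the attributes but not their representation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Artifact:
    url: str     # location of the artifact; a file or some data
    period: str  # assumed: identifier of the period it belongs to
    run: str     # assumed: identifier of the run that produced it


@dataclass
class Run:
    job: str                            # assumed: name of the job being run
    period: str                         # assumed: identifier of the period
    executor: Optional[str] = None      # who 'owns' the job, once assigned
    started: Optional[datetime] = None
    progress: str = "blocked"           # assumed initial state after triggering
    finished: Optional[datetime] = None
    # A run may produce zero or more artifacts.
    artifacts: List[Artifact] = field(default_factory=list)
```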

Progress

Run progress has a state that is one of the following:

  • blocked - triggered, but one or more constraints have not been met
  • ready - constraints have been met
  • running - an executor is executing this run
  • succeeded
  • failed
  • skipped - cancelled by a constraint

When running, progress may be expressed as:

  • percentage-based
  • duration-based
  • item-based

When item-based, each item may also have a state.
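
The progress states, and item-based progress in particular, might be sketched as follows. The State names follow the list above; the TERMINAL set and the ItemProgress class are hypothetical helpers introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class State(Enum):
    BLOCKED = "blocked"      # triggered, but constraints not yet met
    READY = "ready"          # constraints have been met
    RUNNING = "running"      # an executor is executing this run
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    SKIPPED = "skipped"      # cancelled by a constraint


# Assumption: these states are final and a run never leaves them.
TERMINAL = {State.SUCCEEDED, State.FAILED, State.SKIPPED}


@dataclass
class ItemProgress:
    """Item-based progress, where each item has its own state."""
    items: Dict[str, State]

    def fraction_done(self) -> float:
        """Fraction of items that have reached a terminal state."""
        if not self.items:
            return 0.0
        done = sum(1 for state in self.items.values() if state in TERMINAL)
        return done / len(self.items)
```

Under these assumptions, a percentage-based view can be derived from item-based progress by multiplying fraction_done() by 100.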

Period

Attributes:

  • type
  • start
  • environment?

Types:

  • daily
  • weekly
  • monthly
  • quarterly
  • one shot?
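
As an illustration of how a period's start might be derived from its type, here is a small sketch. The period_start function is hypothetical, and it assumes that weeks start on Monday and that quarters start in January, April, July and October; the notes above do not specify either. The tentative "one shot" type is left out.

```python
from datetime import date, timedelta


def period_start(period_type: str, day: date) -> date:
    """Return the first day of the period of the given type containing `day`."""
    if period_type == "daily":
        return day
    if period_type == "weekly":
        # date.weekday() is 0 for Monday, so this steps back to Monday.
        return day - timedelta(days=day.weekday())
    if period_type == "monthly":
        return day.replace(day=1)
    if period_type == "quarterly":
        first_month = 3 * ((day.month - 1) // 3) + 1
        return day.replace(month=first_month, day=1)
    raise ValueError(f"unknown period type: {period_type!r}")
```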

Constraint

Types:

  • start by
  • once other specified job has completed
  • not more than one successful run in a period
  • not more than one instance can be running at once
  • not more than one instance can be blocked at once

When checked, constraints will return one of the following actions to take:

  • block - this constraint has not been met
  • fail - this constraint can never be met and the run should be failed
  • pass - this constraint has been met
  • cancel - this constraint has indicated that the current run should be skipped
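
When a run has several constraints, their actions need to be combined into a single outcome. The sketch below shows one way to do that. The precedence order (fail, then cancel, then block, then pass) is an assumption, as nothing above defines what happens when constraints disagree; combine and next_state are hypothetical names. The mapping from action to run state follows the Progress states.

```python
from enum import Enum


class Action(Enum):
    BLOCK = "block"    # this constraint has not been met
    FAIL = "fail"      # this constraint can never be met
    PASS = "pass"      # this constraint has been met
    CANCEL = "cancel"  # the current run should be skipped


# Assumed precedence, strongest first; not specified by the notes above.
PRECEDENCE = [Action.FAIL, Action.CANCEL, Action.BLOCK, Action.PASS]

# Run state implied by each action.
STATE_FOR = {
    Action.FAIL: "failed",
    Action.CANCEL: "skipped",
    Action.BLOCK: "blocked",
    Action.PASS: "ready",
}


def combine(actions):
    """Return the strongest action among the constraint results."""
    seen = set(actions)
    for action in PRECEDENCE:
        if action in seen:
            return action
    # With no constraints at all, nothing holds the run back.
    return Action.PASS


def next_state(actions):
    """Return the run state implied by the combined constraint results."""
    return STATE_FOR[combine(actions)]
```

With this ordering, a run becomes ready only when every constraint passes, and a single fail overrides everything else.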

Resources

  • host of a specific type
  • a particular environment?
  • actively generated by a job? (or should those be artifacts?)
  • resource within a period?
  • resource within an environment?
  • resources are namespaced; think of them as a nested dictionary hierarchy
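
The namespacing idea can be sketched with an ordinary nested dictionary and a dotted-path lookup. The lookup function, the "." separator and the example resource names are all hypothetical; they illustrate the shape of the data rather than Talon's actual resource model.

```python
def lookup(resources, path, sep="."):
    """Follow a separator-delimited path through nested dictionaries.

    Returns the value found, or None if any part of the path is missing.
    """
    node = resources
    for key in path.split(sep):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Example data only: hosts of specific types and a particular environment.
resources = {
    "hosts": {
        "gpu": {"count": 2},
        "cpu": {"count": 16},
    },
    "environments": {
        "production": {"available": True},
    },
}
```

For example, lookup(resources, "hosts.gpu.count") walks from "hosts" to "gpu" to "count", while a path that names a missing namespace yields None.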

Trigger

These create the run.

Examples:

  • extended cron-style, e.g. “every 15 minutes between 7-10am, hourly until 4pm, every 5 minutes between 4-5pm”
  • external
  • prior job run completed

Executor

  • SGE
  • SSH
  • in-process
  • remote node

Schedule

  • jobs
  • parameters for jobs:
      • simple data
      • complex data?
      • artifact specifications?

State

Stores information about the runs of jobs.

Config

  • config for state store
  • config for config2 store
  • types of resource, constraint, etc
  • auth data sources