
Principle:Neuml Txtai Workflow Scheduling

From Leeroopedia


Knowledge Sources
Domains Data_Processing, Workflow, Pipeline
Last Updated 2026-02-09 00:00 GMT

Overview

Workflow Scheduling and Serving is the principle of moving composed workflows from ad-hoc execution to production deployment through two complementary mechanisms: cron-based scheduling and YAML-driven application configuration. Scheduling enables workflows to run automatically on a recurring basis, while the Application class provides a declarative configuration layer that instantiates pipelines, workflows, agents, and embeddings from a single YAML file and exposes them as an API-ready service.

Description

This principle addresses the gap between development-time workflow prototyping and production deployment. During development, workflows are composed and executed programmatically in Python. For production, two additional capabilities are needed:

Cron Scheduling: The Workflow.schedule method accepts a cron expression, an iterable of input elements, and an optional iteration count. It blocks in a loop that sleeps until the next scheduled time, executes the workflow against the input elements, and repeats. Key behaviors:

  • Uses the croniter library to parse cron expressions and compute the next execution time.
  • Computes the seconds remaining until the next scheduled run and sleeps for that interval with time.sleep.
  • Catches and logs exceptions during workflow execution, preventing a single failure from terminating the schedule.
  • Supports both indefinite execution (iterations=None) and a fixed number of runs.
  • Uses local timezone for scheduling via datetime.now().astimezone().
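The loop described above can be sketched with the standard library alone. This is a simplified illustration, not txtai's implementation: the `next_run` callable stands in for croniter's computation of the next execution time.

```python
import logging
import time
from datetime import datetime, timedelta

def schedule(workflow, elements, next_run, iterations=None):
    """Simplified scheduling loop. `next_run` is a stand-in for croniter:
    given the current datetime, it returns the next scheduled datetime."""
    while iterations is None or iterations > 0:
        # Sleep until the next scheduled time (local timezone)
        now = datetime.now().astimezone()
        time.sleep(max(0, (next_run(now) - now).total_seconds()))

        try:
            # Execute the workflow; generators must be consumed to run
            list(workflow(elements))
        except Exception:
            # A single failure is logged but does not stop the schedule
            logging.exception("Scheduled workflow run failed")

        if iterations is not None:
            iterations -= 1
```

With `iterations=None` the loop runs indefinitely; a caught exception simply logs and waits for the next scheduled time, matching the behaviors listed above.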

YAML-Driven Application: The Application class reads a YAML configuration (from a file path, a YAML string, or a Python dictionary) and automatically:

  1. Creates all configured pipelines via PipelineFactory, resolving dependencies between them (e.g., extractor depending on similarity).
  2. Creates all configured workflows via WorkflowFactory, resolving task actions to pipeline callables and scheduling workflows that have a schedule configuration.
  3. Creates all configured agents with resolved LLM and tool references.
  4. Initializes the embeddings index, loading existing data if available.

The Application class also manages a ThreadPool that runs scheduled workflows in background threads, and provides a wait() method to block until all scheduled workflows complete.
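A configuration of this shape might look like the following. This is an illustrative sketch: the model paths, workflow name, and element values are placeholders, not recommendations.

```yaml
# Illustrative txtai application config (model paths are examples)
writable: true

embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2

summary:
  path: sshleifer/distilbart-cnn-12-6

workflow:
  index:
    schedule:
      cron: "0 * * * *"        # run hourly
      elements: ["data/docs"]
    tasks:
      - action: summary
      - action: index
```

From this single file the Application creates the summary pipeline, the embeddings index, and the scheduled workflow, with task actions resolved to pipeline instances by name.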

Usage

Use Workflow Scheduling and Serving when you need to:

  • Automate recurring data processing -- for example, ingesting new documents every hour, rebuilding search indexes nightly, or running translation pipelines on a schedule.
  • Deploy workflows as services by defining the entire pipeline/workflow/embeddings stack in a YAML file and serving it via the txtai API server.
  • Manage complex pipeline dependencies declaratively, letting the Application class handle resolution order and cross-references.
  • Run multiple scheduled workflows concurrently in a single application using the built-in thread pool.
  • Separate configuration from code by defining pipelines, workflows, and their parameters in YAML rather than Python.

Theoretical Basis

This principle draws on several established patterns:

Cron Scheduling: The Unix cron model provides a well-understood, compact notation for specifying recurring schedules. By integrating cron directly into the Workflow class, txtai avoids requiring external schedulers (e.g., Airflow, Celery Beat) for simple recurring tasks, reducing deployment complexity.

Declarative Configuration: The YAML-driven Application follows the Configuration as Code principle, where the behavior of the system is defined by a declarative specification rather than imperative code. This enables:

  • Version-controlled configuration alongside code.
  • Environment-specific overrides without code changes.
  • Easier auditing and review of the deployed pipeline topology.

Service Locator Pattern: The Application class acts as a service locator, resolving named pipelines and workflows to their instances. Tasks reference pipelines by name (e.g., "summary", "translate"), and the Application resolves these to the actual callable objects during construction.
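The resolution step can be sketched as a small registry. This is a hypothetical stand-in for the Application's internals, showing only the name-to-callable lookup:

```python
from typing import Callable, Dict

class Registry:
    """Toy service locator: resolves pipeline names to callables,
    mirroring how tasks reference pipelines such as "summary" by name."""

    def __init__(self):
        self.pipelines: Dict[str, Callable] = {}

    def register(self, name: str, pipeline: Callable):
        self.pipelines[name] = pipeline

    def resolve(self, action):
        # A task action may be a registered name or already a callable
        return self.pipelines[action] if isinstance(action, str) else action

registry = Registry()
registry.register("summary", lambda texts: [t[:10] for t in texts])

# Resolving "summary" yields the registered callable
task = registry.resolve("summary")
```

Because resolution happens once at construction time, a misspelled pipeline name fails fast rather than at the first scheduled run.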

Background Execution: Scheduled workflows run in a ThreadPool, implementing the Active Object pattern where long-running operations execute in separate threads while the main thread remains responsive to API requests.
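A minimal sketch of this pattern, assuming only the standard library's ThreadPool (class and method names here are illustrative, not txtai's API):

```python
from multiprocessing.pool import ThreadPool

class App:
    """Sketch of background scheduled execution: each scheduled workflow
    runs in a pool thread while the main thread stays responsive."""

    def __init__(self, threads=4):
        self.pool = ThreadPool(threads)
        self.results = []

    def schedule(self, workflow, elements):
        # Submit to the pool instead of blocking the caller
        self.results.append(self.pool.apply_async(workflow, (elements,)))

    def wait(self):
        # Block until all scheduled workflows complete
        for result in self.results:
            result.get()
        self.pool.close()
        self.pool.join()
```

`schedule` returns immediately, so multiple workflows can run concurrently; `wait` mirrors the blocking wait() method described above.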

Related Pages

Implemented By
