
Implementation:Togethercomputer Together python Evaluation CLI

From Leeroopedia
Knowledge Sources
Domains CLI, Evaluation
Last Updated 2026-02-15 16:00 GMT

Overview

Concrete CLI tool, provided by the Together Python SDK, for creating and managing LLM evaluation jobs from the command line.

Description

The evaluation Click command group provides terminal commands for creating evaluation jobs (classify, score, compare), listing jobs as formatted tables, and retrieving job details and status. It supports two ways of supplying the responses to be judged: a simple field-reference mode, in which pre-generated responses are already present in the input data, and a detailed model-configuration mode, in which responses are generated on the fly.
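As an illustration of field-reference mode, the sketch below writes a tiny input file whose records already carry model output under response, the field name the usage example passes to --model-field. The JSON Lines layout, the file name eval_input.jsonl, and the prompt field are assumptions made for illustration; consult Together's evaluation documentation for the actual input schema.

```shell
#!/bin/sh
# Hypothetical field-reference input: every JSON Lines record already holds
# the model's answer in a "response" field (schema assumed for illustration).
cat > eval_input.jsonl <<'EOF'
{"prompt": "What is 2 + 2?", "response": "4"}
{"prompt": "Name a primary color.", "response": "Blue"}
EOF

# Sanity check: count non-empty lines that lack the response field.
missing=$(grep -v '^$' eval_input.jsonl | grep -vc '"response"')
echo "records missing a response field: $missing"
```

In model-configuration mode the response field would not be needed, since responses are generated during the job.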

Usage

Use these CLI commands when managing LLM evaluations from a terminal or shell script rather than Python code.

Code Reference

Source Location

Signature

together evaluation create --type classify|score|compare --judge-model MODEL --judge-model-source SOURCE --judge-system-template TEMPLATE --input-data-file-path PATH [options...]
together evaluation list [--status STATUS] [--limit N]
together evaluation retrieve EVALUATION_ID
together evaluation status EVALUATION_ID

Import

together evaluation <subcommand>

I/O Contract

Inputs

Name                     Type    Required        Description
--type                   Choice  Yes             Evaluation type: classify, score, or compare
--judge-model            str     Yes             Judge model name or URL
--judge-model-source     Choice  Yes             Source: serverless, dedicated, or external
--judge-system-template  str     Yes             System template for the judge
--input-data-file-path   str     Yes             Path to input data file
--labels                 str     Yes (classify)  Comma-separated classification labels
--pass-labels            str     Yes (classify)  Comma-separated passing labels
--min-score              float   Yes (score)     Minimum score boundary
--max-score              float   Yes (score)     Maximum score boundary
--pass-threshold         float   Yes (score)     Passing threshold

The usage example below also passes --model-field, which names the field in the input data that holds pre-generated responses (field-reference mode).
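The classify and score options only make sense together: pass labels are presumably drawn from the label set, and a pass threshold presumably lies between the score bounds. The sketch below encodes those two assumptions as POSIX shell checks a script could run before calling together evaluation create. The function names check_pass_labels and check_score_bounds are hypothetical, and the CLI may perform its own, different validation.

```shell
#!/bin/sh
# Hypothetical client-side pre-flight checks for type-specific options.
# Both rules are assumptions, not documented CLI behavior.

# check_pass_labels LABELS PASS_LABELS
# Succeeds only if every comma-separated pass label also appears in LABELS.
check_pass_labels() {
  labels=",$1,"
  old_ifs=$IFS
  IFS=,                      # split PASS_LABELS on commas
  for pass in $2; do
    case $labels in
      *",$pass,"*) ;;        # exact match between commas
      *) IFS=$old_ifs
         echo "pass label '$pass' is not in labels" >&2
         return 1 ;;
    esac
  done
  IFS=$old_ifs
  return 0
}

# check_score_bounds MIN THRESHOLD MAX
# Succeeds only if MIN < MAX and MIN <= THRESHOLD <= MAX (awk compares floats).
check_score_bounds() {
  awk -v lo="$1" -v t="$2" -v hi="$3" \
    'BEGIN { exit !(lo + 0 < hi + 0 && lo + 0 <= t + 0 && t + 0 <= hi + 0) }'
}
```

A script can chain a check in front of the real call, for example check_pass_labels "good,bad" "good" && together evaluation create ..., so an inconsistent option set fails before any job is submitted.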

Outputs

Name             Type   Description
create output    JSON   Evaluation job creation response with workflow_id
list output      Table  Formatted table of evaluation jobs
retrieve output  JSON   Full evaluation job details
status output    JSON   Current status and results
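Since create returns a JSON response containing workflow_id, a script can capture that ID and pass it to status or retrieve. The hypothetical helper extract_workflow_id below pulls the value out with sed, assuming create prints only JSON on stdout and that workflow_id is a string-valued key; jq is a sturdier choice where it is installed.

```shell
#!/bin/sh
# extract_workflow_id: read create output on stdin, print the first workflow_id.
# Assumes a string-valued "workflow_id" key; works for compact or
# pretty-printed JSON as long as the key and value share one line.
extract_workflow_id() {
  sed -n 's/.*"workflow_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example (not run here):
#   WORKFLOW_ID=$(together evaluation create ... | extract_workflow_id)
#   together evaluation status "$WORKFLOW_ID"
```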

Usage Examples

# Create a classify evaluation
together evaluation create \
  --type classify \
  --judge-model meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --judge-model-source serverless \
  --judge-system-template "Classify this response." \
  --input-data-file-path file-abc123 \
  --model-field response \
  --labels "good,bad" \
  --pass-labels "good"

# List evaluation jobs
together evaluation list --limit 10

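# Check status, using the workflow_id returned by create
together evaluation status WORKFLOW_ID

# Retrieve full job details
together evaluation retrieve WORKFLOW_ID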
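The usage examples cover only a classify job. A score job swaps the label options for --min-score, --max-score, and --pass-threshold. The sketch below wraps that call in a hypothetical shell function, create_score_eval, that uses only flags listed on this page. The TOGETHER_CLI override is an assumption added so the function can be dry-run with echo, and the template text and score values in the example comment are illustrative.

```shell
#!/bin/sh
# create_score_eval: hypothetical wrapper for a score evaluation in
# field-reference mode. Set TOGETHER_CLI=echo to print the arguments
# instead of calling the real CLI.
create_score_eval() {
  # $1 judge model  $2 judge model source  $3 judge system template
  # $4 input data file  $5 response field  $6 min score  $7 max score
  # $8 pass threshold
  "${TOGETHER_CLI:-together}" evaluation create \
    --type score \
    --judge-model "$1" \
    --judge-model-source "$2" \
    --judge-system-template "$3" \
    --input-data-file-path "$4" \
    --model-field "$5" \
    --min-score "$6" \
    --max-score "$7" \
    --pass-threshold "$8"
}

# Example (not run here):
#   create_score_eval meta-llama/Llama-4-Scout-17B-16E-Instruct serverless \
#     "Rate this response from 1 to 10." file-abc123 response 1 10 7
```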
