Implementation:Togethercomputer Together python Evaluation CLI

Knowledge Sources	Together Python
Domains	CLI, Evaluation
Last Updated	2026-02-15 16:00 GMT

Overview

A concrete CLI tool, provided by the Together Python SDK, for creating and managing LLM evaluation jobs from the command line.

Description

The evaluation Click command group provides terminal commands for creating evaluation jobs (classify, score, compare), listing jobs in a formatted table, and retrieving job details and status. It supports two modes: a simple field-reference mode, in which the responses are pre-generated and stored in the input data, and a detailed model configuration mode, in which responses are generated on the fly.

Usage

Use these CLI commands when managing LLM evaluations from a terminal or shell script rather than Python code.

Code Reference

Source Location

Repository: Together Python
File: src/together/cli/api/evaluation.py
Lines: 1-479

Signature

together evaluation create --type classify|score|compare --judge-model MODEL --judge-model-source SOURCE --judge-system-template TEMPLATE --input-data-file-path PATH [options...]
together evaluation list [--status STATUS] [--limit N]
together evaluation retrieve EVALUATION_ID
together evaluation status EVALUATION_ID

Import

together evaluation <subcommand>

I/O Contract

Inputs

Name	Type	Required	Description
--type	Choice	Yes	Evaluation type: classify, score, or compare
--judge-model	str	Yes	Judge model name or URL
--judge-model-source	Choice	Yes	Source: serverless, dedicated, or external
--judge-system-template	str	Yes	System template for the judge
--input-data-file-path	str	Yes	Path to input data file
--labels	str	Yes (classify)	Comma-separated classification labels
--pass-labels	str	Yes (classify)	Comma-separated passing labels
--min-score	float	Yes (score)	Minimum score boundary
--max-score	float	Yes (score)	Maximum score boundary
--pass-threshold	float	Yes (score)	Passing threshold
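
The type-dependent requirements in the table above can be checked before the CLI is invoked. The following is a minimal sketch, not part of the SDK: build_create_args, REQUIRED_BY_TYPE, and COMMON_REQUIRED are hypothetical names introduced here for illustration. It only assembles an argument list from the flags documented in the table, and it assumes that compare has no type-specific required options because the table lists none.

```python
from typing import Dict, List

# Type-specific options marked "Yes (classify)" / "Yes (score)" in the Inputs table.
# The table lists no compare-specific options, so none are assumed here.
REQUIRED_BY_TYPE: Dict[str, List[str]] = {
    "classify": ["labels", "pass_labels"],
    "score": ["min_score", "max_score", "pass_threshold"],
    "compare": [],
}

# Options the table marks as required for every evaluation type.
COMMON_REQUIRED = [
    "judge_model",
    "judge_model_source",
    "judge_system_template",
    "input_data_file_path",
]


def build_create_args(eval_type: str, **options: object) -> List[str]:
    """Return an argv list for `together evaluation create` (hypothetical helper)."""
    if eval_type not in REQUIRED_BY_TYPE:
        raise ValueError(f"unknown evaluation type: {eval_type!r}")

    # Fail early if any option required for this type is absent.
    required = COMMON_REQUIRED + REQUIRED_BY_TYPE[eval_type]
    missing = [name for name in required if options.get(name) is None]
    if missing:
        flags = ", ".join("--" + name.replace("_", "-") for name in missing)
        raise ValueError(f"missing required options for {eval_type}: {flags}")

    # Convert keyword names such as pass_threshold into flags such as --pass-threshold.
    args = ["together", "evaluation", "create", "--type", eval_type]
    for name, value in options.items():
        if value is None:
            continue
        args += ["--" + name.replace("_", "-"), str(value)]
    return args
```

The returned list can be passed to subprocess.run as is. Validating locally gives a clearer error message in a script than waiting for the CLI to reject the call, though the CLI remains the authority on which options it accepts.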

Outputs

Name	Type	Description
create output	JSON	Evaluation job creation response with workflow_id
list output	Table	Formatted table of evaluation jobs
retrieve output	JSON	Full evaluation job details
status output	JSON	Current status and results
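
Because create, retrieve, and status are documented as emitting JSON, a script can capture and decode that output. The sketch below is one way to do so, under the assumption that the command writes only the JSON document to stdout; if the CLI prints additional text, json.loads will raise. run_json_command and get_status are hypothetical helpers, and no field names in the response are assumed.

```python
import json
import subprocess
from typing import Any, Callable, List


def run_json_command(args: List[str], runner: Callable[..., Any] = subprocess.run) -> Any:
    """Run a CLI command and decode its stdout as JSON.

    `runner` defaults to subprocess.run and can be swapped for a stub in tests.
    """
    completed = runner(args, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


def get_status(evaluation_id: str, runner: Callable[..., Any] = subprocess.run) -> Any:
    """Fetch the status of one evaluation job via `together evaluation status`."""
    return run_json_command(["together", "evaluation", "status", evaluation_id], runner)
```

With check=True, a non-zero exit code from the CLI raises subprocess.CalledProcessError, so failures surface as exceptions rather than as empty output.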

Usage Examples

# Create a classify evaluation
together evaluation create \
  --type classify \
  --judge-model meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --judge-model-source serverless \
  --judge-system-template "Classify this response." \
  --input-data-file-path file-abc123 \
  --model-field response \
  --labels "good,bad" \
  --pass-labels "good"

# List evaluation jobs
together evaluation list --limit 10

# Check status (pass the job ID, e.g. the workflow_id from the create output)
together evaluation status WORKFLOW_ID
