Implementation:Togethercomputer Together python Endpoints Resource

From Leeroopedia
Revision as of 13:56, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Togethercomputer_Together_python_Endpoints_Resource.md)
Knowledge Sources
Domains: Infrastructure, Model_Deployment
Last Updated: 2026-02-15 16:00 GMT

Overview

Concrete tool, provided by the Together Python SDK, for managing dedicated inference endpoints on the Together AI platform.

Description

The Endpoints class provides a complete API resource for creating, listing, retrieving, updating, and deleting dedicated inference endpoints. It supports autoscaling configuration, hardware selection, availability zone management, and endpoint lifecycle control. Both synchronous (Endpoints) and asynchronous (AsyncEndpoints) variants are provided.
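
As a rough illustration of why the asynchronous variant can be useful, the sketch below lists the caller's dedicated endpoints and then fetches their full details concurrently. It assumes that the async resource exposes `list()` and `get()` as coroutines with the same parameters as the synchronous signatures on this page, and that list items carry an `id` attribute; `get_my_dedicated_details` is a hypothetical helper name, not part of the SDK.

```python
import asyncio
from typing import Any, List


async def get_my_dedicated_details(client: Any) -> List[Any]:
    """List the caller's dedicated endpoints, then fetch full details concurrently.

    Assumption: `client.endpoints.list` and `client.endpoints.get` are
    coroutines on the async client, and each list item has an `id`.
    """
    summaries = await client.endpoints.list(type="dedicated", mine=True)
    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(
        *(client.endpoints.get(summary.id) for summary in summaries)
    )


# Hypothetical usage (needs the SDK installed and an API key in the environment):
#   from together import AsyncTogether
#   details = asyncio.run(get_my_dedicated_details(AsyncTogether()))
```

Passing the client in as a parameter keeps the helper independent of how the client is constructed, which also makes it easy to exercise with a stub.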

⚠️ Deprecation Notice: The `disable_prompt_cache` parameter (CLI flag: `--no-prompt-cache`) in `create()` and `update()` is deprecated and will be removed in a future version.

Usage

Import this class when you need to programmatically manage dedicated model deployment endpoints, including creating new endpoints with specific hardware, adjusting autoscaling parameters, or querying available hardware configurations.

Code Reference

Source Location

Signature

class Endpoints:
    def __init__(self, client: TogetherClient) -> None: ...

    def list(
        self,
        type: Optional[Literal["dedicated", "serverless"]] = None,
        usage_type: Optional[Literal["on-demand", "reserved"]] = None,
        mine: Optional[bool] = None,
    ) -> List[ListEndpoint]: ...

    def create(
        self,
        *,
        model: str,
        hardware: str,
        min_replicas: int,
        max_replicas: int,
        display_name: Optional[str] = None,
        disable_prompt_cache: bool = True,
        disable_speculative_decoding: bool = True,
        state: Literal["STARTED", "STOPPED"] = "STARTED",
        inactive_timeout: Optional[int] = None,
        availability_zone: Optional[str] = None,
    ) -> DedicatedEndpoint: ...

    def get(self, endpoint_id: str) -> DedicatedEndpoint: ...
    def delete(self, endpoint_id: str) -> None: ...

    def update(
        self,
        endpoint_id: str,
        *,
        min_replicas: Optional[int] = None,
        max_replicas: Optional[int] = None,
        state: Optional[Literal["STARTED", "STOPPED"]] = None,
        display_name: Optional[str] = None,
        inactive_timeout: Optional[int] = None,
    ) -> DedicatedEndpoint: ...

    def list_hardware(self, model: Optional[str] = None) -> List[HardwareWithStatus]: ...
    def list_avzones(self) -> List[str]: ...

Import

from together import Together

client = Together()
# Access via client.endpoints

I/O Contract

Inputs

Name | Type | Required | Description
model | str | Yes (create) | The model to deploy on the endpoint
hardware | str | Yes (create) | Hardware configuration ID (e.g., "1x_nvidia_h100_80gb_sxm")
min_replicas | int | Yes (create) | Minimum number of replicas to maintain
max_replicas | int | Yes (create) | Maximum number of replicas to scale to
endpoint_id | str | Yes (get/update/delete) | Unique identifier of the endpoint
type | Literal["dedicated", "serverless"] | No | Filter endpoints by type
state | Literal["STARTED", "STOPPED"] | No | Desired endpoint state
inactive_timeout | int | No | Minutes of inactivity before auto-stop (0 to disable)
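
The inputs above could be checked locally before calling `create()`. The following is a minimal sketch under stated assumptions: `build_create_kwargs` is a hypothetical helper, and the rules it enforces (non-negative replica counts, `max_replicas >= min_replicas`, non-negative `inactive_timeout`) are assumptions about sensible values rather than constraints confirmed by the SDK.

```python
from typing import Any, Dict, Optional


def build_create_kwargs(
    model: str,
    hardware: str,
    min_replicas: int,
    max_replicas: int,
    inactive_timeout: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate autoscaling inputs and return keyword arguments for create().

    The checks are assumed sanity rules, not documented API constraints.
    """
    if min_replicas < 0:
        raise ValueError("min_replicas must be non-negative")
    if max_replicas < min_replicas:
        raise ValueError("max_replicas must be >= min_replicas")
    if inactive_timeout is not None and inactive_timeout < 0:
        raise ValueError("inactive_timeout must be >= 0 (0 disables auto-stop)")

    kwargs: Dict[str, Any] = {
        "model": model,
        "hardware": hardware,
        "min_replicas": min_replicas,
        "max_replicas": max_replicas,
    }
    # Only include optional fields the caller actually set.
    if inactive_timeout is not None:
        kwargs["inactive_timeout"] = inactive_timeout
    return kwargs
```

The result can then be unpacked into the call, for example `client.endpoints.create(**build_create_kwargs(...))`.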

Outputs

Name | Type | Description
list() returns | List[ListEndpoint] | List of endpoint summary objects
create() returns | DedicatedEndpoint | Created endpoint details with ID, state, autoscaling config
get() returns | DedicatedEndpoint | Full endpoint details
update() returns | DedicatedEndpoint | Updated endpoint details
list_hardware() returns | List[HardwareWithStatus] | Available hardware configs with pricing and availability
list_avzones() returns | List[str] | Available deployment zones

Usage Examples

Create a Dedicated Endpoint

from together import Together

client = Together()

# Create a dedicated endpoint with autoscaling
endpoint = client.endpoints.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    hardware="1x_nvidia_h100_80gb_sxm",
    min_replicas=1,
    max_replicas=3,
    display_name="My Llama Endpoint",
    state="STARTED",
)

print(f"Endpoint ID: {endpoint.id}")
print(f"State: {endpoint.state}")
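
A newly created endpoint may not be ready immediately, so callers often poll `get()` until the reported state matches the one they asked for. The sketch below shows one way to do that. `wait_for_state` is a hypothetical helper; this page only documents the `STARTED` and `STOPPED` values, so any other state is simply treated as "keep waiting", and error states are not handled specially.

```python
import time
from typing import Any, Callable


def wait_for_state(
    client: Any,
    endpoint_id: str,
    target: str = "STARTED",
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll client.endpoints.get() until endpoint.state equals `target`.

    Raises TimeoutError if the target state is not reached in `timeout_s`.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        endpoint = client.endpoints.get(endpoint_id)
        if endpoint.state == target:
            return endpoint
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"endpoint {endpoint_id} is {endpoint.state!r}, expected {target!r}"
            )
        sleep(poll_s)  # injectable so callers can swap in their own wait logic
```

With the example above, this could be called as `wait_for_state(client, endpoint.id)`. The timeout and poll interval defaults are arbitrary illustrative values.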

List and Manage Endpoints

# Reuses the `client` created in the previous example
# List only your dedicated endpoints
my_endpoints = client.endpoints.list(type="dedicated", mine=True)

# Get details for a specific endpoint
details = client.endpoints.get("endpoint-abc123")

# Update autoscaling
updated = client.endpoints.update(
    "endpoint-abc123",
    min_replicas=2,
    max_replicas=5,
)

# Stop an endpoint
client.endpoints.update("endpoint-abc123", state="STOPPED")

# Delete an endpoint
client.endpoints.delete("endpoint-abc123")
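
The list and update calls above can be combined, for instance to stop every running dedicated endpoint you own. This is a sketch only: `stop_my_started_endpoints` is a hypothetical helper, and it assumes that the objects returned by `list()` expose `id` and `state` attributes.

```python
from typing import Any, List


def stop_my_started_endpoints(client: Any) -> List[str]:
    """Stop each of the caller's dedicated endpoints that is currently STARTED.

    Returns the IDs of the endpoints for which a stop was requested.
    """
    stopped: List[str] = []
    for ep in client.endpoints.list(type="dedicated", mine=True):
        # Assumption: list items carry `id` and `state` attributes.
        if ep.state == "STARTED":
            client.endpoints.update(ep.id, state="STOPPED")
            stopped.append(ep.id)
    return stopped
```

Stopping rather than deleting keeps the endpoint definition around, so it can be restarted later with `update(..., state="STARTED")`.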

Query Hardware and Availability Zones

# Reuses the `client` created in the first example
# List hardware compatible with a specific model
hardware = client.endpoints.list_hardware(model="meta-llama/Llama-4-Scout-17B-16E-Instruct")
for hw in hardware:
    print(f"{hw.id}: {hw.specs.gpu_type} x{hw.specs.gpu_count} - ${hw.pricing.cents_per_minute/100:.2f}/min")

# List availability zones
zones = client.endpoints.list_avzones()
print(f"Available zones: {zones}")
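
The hardware listing can feed a simple selection step before `create()`. The sketch below picks the lowest-priced option and estimates an hourly cost, using only the `id` and `pricing.cents_per_minute` attributes shown in the example above. Both helper names are hypothetical, and the cost estimate assumes that the listed price applies per replica, which this page does not confirm.

```python
from typing import Any, Iterable, Optional


def cheapest_hardware(options: Iterable[Any]) -> Optional[Any]:
    """Return the option with the lowest per-minute price, or None if empty."""
    return min(options, key=lambda hw: hw.pricing.cents_per_minute, default=None)


def estimated_hourly_cost_usd(hw: Any, replicas: int = 1) -> float:
    """Estimate hourly cost in dollars.

    Assumption: cents_per_minute is charged per replica.
    """
    return hw.pricing.cents_per_minute * 60 * replicas / 100
```

For example, `cheapest_hardware(client.endpoints.list_hardware(model=...))` returns a candidate whose `id` could be passed as the `hardware` argument. The sketch ignores availability, so a real selection step would likely filter on that first.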
