Principle:OpenHands OpenHands Solvability Analysis
| Knowledge Sources | |
|---|---|
| Domains | Platform_Integration, Webhook_Processing |
| Last Updated | 2026-02-11 21:00 GMT |
Overview
Solvability analysis is the pattern of using LLM-based classification to assess whether an issue or task is suitable for automated resolution before committing expensive compute resources.
Description
Not every issue that triggers a webhook event is a good candidate for automated resolution. Some issues are too vague, require human judgment, involve complex architectural decisions, or need access to resources the agent cannot reach. Solvability analysis acts as a triage step that evaluates the issue before spawning a full agent session, preventing wasted computation and improving the quality of agent outputs by filtering out tasks that would likely fail.
The solvability analysis pattern involves three phases:
- Context Gathering -- Collecting the issue title, body, comments, repository metadata, and related files into a structured context document. This provides the LLM with the information it needs to make an informed assessment.
- LLM Classification -- Submitting the context to a large language model with a carefully engineered prompt that asks it to evaluate task difficulty, clarity, and feasibility. The LLM acts as a "judge" that produces a structured assessment.
- Timeout-Bounded Execution -- Running the analysis within a strict time budget to prevent the triage step itself from becoming a bottleneck. If the LLM does not respond within the timeout, the system proceeds with a default assumption (typically "attempt resolution").
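The context-gathering phase can be sketched as below. This is a minimal illustration, not the project's implementation: the `IssueContext` class, its field names, and the `max_comment_chars` parameter are hypothetical, and the plain-text layout produced by `format_context` is only one possible rendering.

```python
from dataclasses import dataclass, field


@dataclass
class IssueContext:
    """Hypothetical container for the inputs listed under Context Gathering."""
    title: str
    body: str
    comments: list[str] = field(default_factory=list)
    repo_metadata: dict[str, str] = field(default_factory=dict)
    related_files: list[str] = field(default_factory=list)


def format_context(ctx: IssueContext, max_comment_chars: int = 500) -> str:
    """Render the context as one plain-text document for the LLM prompt."""
    lines = [f"# Issue: {ctx.title}", "", "## Body", ctx.body.strip() or "(empty)"]
    lines += ["", "## Repository"]
    lines += [f"- {key}: {value}" for key, value in sorted(ctx.repo_metadata.items())]
    lines += ["", "## Comments"]
    # Truncate long comments so one noisy thread cannot dominate the prompt.
    lines += [f"- {comment[:max_comment_chars]}" for comment in ctx.comments]
    lines += ["", "## Related files"]
    lines += [f"- {path}" for path in ctx.related_files]
    return "\n".join(lines)
```

Keeping the rendering in one function makes it easy to cap the prompt size, which in turn helps the analysis stay inside its time budget.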
Usage
Apply solvability analysis when:
- Agent execution is expensive (GPU time, API calls, long-running sandboxes)
- The issue stream contains a mix of automatable and non-automatable tasks
- Stakeholders want visibility into why certain issues were or were not attempted
- Resource allocation needs to be optimized (e.g., prioritizing high-confidence issues)
Theoretical Basis
1. LLM-as-a-Judge Pattern
The LLM-as-a-judge pattern uses a language model as a classifier or evaluator rather than as a generative agent. The key distinction is that the model produces a judgment (a structured assessment) rather than a solution (code changes). This pattern has been studied extensively in the context of automated evaluation. In pseudocode:
    judge(context, prompt) -> assessment:
        assessment = LLM(
            system_prompt = SOLVABILITY_CRITERIA,
            user_prompt = format_context(context),
        )
        return parse_assessment(assessment)
The judgment typically includes:
- A difficulty rating (trivial, moderate, complex, infeasible)
- A confidence score
- A natural-language rationale
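One way to turn the judge's reply into the structured judgment described above is sketched here. It assumes the prompt asks the model to answer in JSON with `difficulty`, `confidence`, and `rationale` keys, and that confidence lies in the range 0 to 1; neither assumption comes from the source, and the `Assessment` class is hypothetical.

```python
import json
from dataclasses import dataclass

DIFFICULTIES = ("trivial", "moderate", "complex", "infeasible")


@dataclass(frozen=True)
class Assessment:
    difficulty: str
    confidence: float  # assumed to lie in [0, 1]
    rationale: str


def parse_assessment(raw: str) -> Assessment:
    """Parse and validate the judge's reply; raise ValueError on bad output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"judge reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("judge reply must be a JSON object")

    difficulty = str(data.get("difficulty", "")).lower()
    if difficulty not in DIFFICULTIES:
        raise ValueError(f"unknown difficulty: {difficulty!r}")

    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError) as exc:
        raise ValueError("confidence must be a number") from exc
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")

    return Assessment(difficulty, confidence, str(data.get("rationale", "")))
```

Validating strictly and raising on malformed output lets the caller decide how to treat a judge failure, for example by falling back to the same default used when the analysis times out.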
2. Task Difficulty Estimation
Solvability analysis is a form of task difficulty estimation -- predicting the effort required to complete a task before beginning it. This is analogous to story point estimation in agile development, but performed by an LLM rather than a human team. The estimation function maps from task features to a difficulty space:
    difficulty: TaskFeatures -> {trivial, moderate, complex, infeasible}

    TaskFeatures = {
        title_clarity: float,       # how well-defined the task title is
        description_detail: float,  # level of detail in the description
        scope_boundedness: float,   # whether the scope is bounded
        dependency_count: int,      # external dependencies required
        context_available: bool,    # whether sufficient context is provided
    }
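To make the signature concrete, the sketch below implements the mapping with a hand-written rule set. This is a stand-in for illustration only: in the pattern described here the LLM performs the estimation, and the 0 to 1 score range, the thresholds, and the `estimate_difficulty` name are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TaskFeatures:
    title_clarity: float       # assumed range 0.0-1.0
    description_detail: float  # assumed range 0.0-1.0
    scope_boundedness: float   # assumed range 0.0-1.0
    dependency_count: int
    context_available: bool


def estimate_difficulty(f: TaskFeatures) -> str:
    """Rule-based stand-in for the LLM judge; thresholds are arbitrary."""
    if not f.context_available:
        # Without sufficient context the agent has nothing to work from.
        return "infeasible"
    clarity = (f.title_clarity + f.description_detail + f.scope_boundedness) / 3
    if clarity >= 0.8 and f.dependency_count == 0:
        return "trivial"
    if clarity >= 0.5 and f.dependency_count <= 2:
        return "moderate"
    return "complex"
```

A rule set like this can also serve as a cheap baseline against which the LLM judge's ratings are compared.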
3. Resource Allocation Under Uncertainty
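Solvability analysis enables a tiered resource allocation strategy: feasible issues go to the agent at one of two priority levels, and infeasible issues are escalated to a human: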
    allocate(issue):
        solvability = analyze_solvability(issue)
        if solvability.difficulty in {trivial, moderate}:
            assign_to_agent(issue, priority=HIGH)
        elif solvability.difficulty == complex:
            assign_to_agent(issue, priority=LOW)
        else:  # infeasible
            notify_human(issue, solvability.rationale)
This strategy aims to maximize the expected value of agent compute by allocating resources first to the issues most likely to be resolved successfully.
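The allocation logic could be realized with a priority queue, as in the following sketch. The queue and inbox data structures, the numeric priority values, and the `allocate` signature are assumptions made for the example; the source only specifies the three-way branching.

```python
import heapq
from itertools import count

HIGH, LOW = 0, 1   # smaller value is served first by the heap
_order = count()   # tie-breaker that keeps FIFO order within a priority


def allocate(issue_id, difficulty, rationale, agent_queue, human_inbox):
    """Route an issue to the agent queue or to a human, based on difficulty."""
    if difficulty in ("trivial", "moderate"):
        heapq.heappush(agent_queue, (HIGH, next(_order), issue_id))
    elif difficulty == "complex":
        heapq.heappush(agent_queue, (LOW, next(_order), issue_id))
    else:
        # "infeasible" (and any unexpected label) is escalated with the rationale.
        human_inbox.append((issue_id, rationale))
```

Popping from `agent_queue` with `heapq.heappop` then yields high-priority issues before low-priority ones, in arrival order within each level.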
4. Timeout-Bounded Analysis
The analysis itself is bounded by a timeout to prevent it from becoming a bottleneck:
    async analyze_with_timeout(issue, timeout):
        try:
            result = await wait_for(analyze(issue), timeout=timeout)
            return result
        except TimeoutError:
            return DEFAULT_ASSESSMENT  # "proceed with attempt"
The timeout value represents a trade-off between assessment quality (more time allows more thorough analysis) and pipeline throughput (shorter timeouts keep the overall processing latency low).
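A runnable version of the timeout wrapper, using Python's `asyncio.wait_for`, might look like this. The `analyze` coroutine is a stand-in that only sleeps to simulate LLM latency, and the contents of `DEFAULT_ASSESSMENT` (including the "moderate" rating) are assumptions; the source states only that the default is to proceed with an attempt.

```python
import asyncio

# Assumed default: treat a timed-out analysis as "proceed with attempt".
DEFAULT_ASSESSMENT = {
    "difficulty": "moderate",
    "rationale": "analysis timed out; proceeding with attempt",
}


async def analyze(issue: str, delay: float) -> dict:
    """Stand-in for the LLM call; sleeps to simulate response latency."""
    await asyncio.sleep(delay)
    return {"difficulty": "trivial", "rationale": f"analyzed {issue}"}


async def analyze_with_timeout(issue: str, delay: float, timeout: float) -> dict:
    try:
        # wait_for cancels the inner coroutine if the timeout expires.
        return await asyncio.wait_for(analyze(issue, delay), timeout=timeout)
    except asyncio.TimeoutError:
        return DEFAULT_ASSESSMENT
```

Calling `asyncio.run(analyze_with_timeout("issue-1", delay=0.0, timeout=1.0))` returns the stand-in analysis, while a `delay` longer than `timeout` returns `DEFAULT_ASSESSMENT` instead.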