# Principle: Materialize Test Result Aggregation
| Knowledge Sources | CI noise reduction, automated error triage, pattern matching for defect classification |
|---|---|
| Domains | Continuous Integration, Error Classification, Issue Tracking, Test Analytics |
| Last Updated | 2026-02-08 |
## Overview
Test result aggregation is the principle of automatically classifying CI test failures as known issues or unknown errors by matching error patterns against a database of open GitHub issues, thereby reducing CI noise and accelerating developer triage.
## Description
In large-scale CI systems, test failures are a daily occurrence. Many of these failures are caused by known issues -- bugs that have already been identified, filed as GitHub issues, and are being actively worked on. When developers see a CI failure, they must manually investigate the logs, identify the error, search for related issues, and determine whether the failure is new or known. This manual triage process is time-consuming and demoralizing.
Automated error triage eliminates this manual work by:
- Collecting errors from multiple sources: JUnit XML test reports, plain-text log files, and secret detection results.
- Fetching known issues from GitHub, where each issue contains a regex pattern in a specially formatted body that describes the error signature.
- Pattern matching each observed error against all known issue patterns. Errors are classified into three categories:
  - Known issues (open): The error matches an open GitHub issue. The failure is expected and may be annotated with `ci-ignore-failure: true` to prevent it from failing the build.
  - Potential regressions (closed): The error matches a closed GitHub issue. This may indicate a regression -- a bug that was fixed but has reappeared.
  - Unknown errors: The error does not match any known issue. This is likely a new bug that requires investigation.
- Annotating the build with structured information about each error, including links to the matching GitHub issue, error details, and failure history on the main branch.
- Recording analytics in a test analytics database for trend analysis and flakiness tracking.
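As a concrete sketch of the second step, a known issue's body might embed its error signature on a dedicated marker line. The `ci-regexp:` marker below is an illustrative assumption, not a documented format:

```python
import re

def extract_error_pattern(issue_body: str):
    """Pull the error-signature regex out of a GitHub issue body.

    Assumes the signature sits on a line of the form `ci-regexp: <pattern>`
    (an illustrative marker format, not a fixed convention).
    """
    found = re.search(r"^ci-regexp:\s*(.+)$", issue_body, re.MULTILINE)
    return re.compile(found.group(1).strip()) if found else None

# Hypothetical issue body containing an embedded error signature.
body = "Worker panics during restart.\n\nci-regexp: thread '.+' panicked\n"
pattern = extract_error_pattern(body)
```

Keeping the pattern inside the issue itself means the triage database is just the issue tracker: closing or editing an issue immediately changes how future failures are classified.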
The classification result also influences the build's final exit code:
- If all errors are known and marked `ci-ignore-failure: true`, the build can be marked as successful despite the test failure.
- If there are unknown errors but the test itself passed, the build is marked as failed (to flag unexpected error logs).
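A minimal sketch of this exit-code decision, assuming each classified error is a dict with a `kind` field and, for known issues, an `ignore_failure` flag (the field names are illustrative, not a real API):

```python
def final_exit_code(errors: list, tests_failed: bool) -> int:
    """Return 0 (success) or 1 (failure) for a build from its classified errors.

    `kind` is "known", "regression", or "unknown"; `ignore_failure` mirrors
    the ci-ignore-failure annotation on the matching issue.
    """
    if not errors:
        return 1 if tests_failed else 0
    if all(e["kind"] == "known" and e.get("ignore_failure") for e in errors):
        return 0  # every failure maps to an ignorable known issue
    return 1  # unknown errors or non-ignorable known issues fail the build
```

Note that any unknown error forces a failure even when the test suite itself passed, which is what surfaces unexpected error logs.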
## Usage
Apply automated error triage when:
- The CI system has a significant volume of test failures, and many are caused by known issues.
- Developers spend substantial time investigating failures that turn out to be known.
- The project uses GitHub issues to track known bugs, and those issues can include error regex patterns.
- The team wants to distinguish between new failures (requiring immediate attention) and known flaky tests.
- Build analytics and failure trend tracking are desired.
## Theoretical Basis
The theoretical foundation of automated error triage is pattern-based classification, a simplified form of text classification where each class (known issue) is defined by a regex pattern rather than a trained model.
Algorithm:
For each observed error `e` with text `t(e)`:

- For each known issue `i` with regex `r(i)`, state `s(i)`, and optional filters `apply_to(i)`, `location(i)`:
  - If `r(i)` matches `t(e)` and `s(i) = open` and the filters pass: classify `e` as a known issue linked to `i`.
  - If `r(i)` matches `t(e)` and `s(i) = closed` and the filters pass: classify `e` as a potential regression linked to `i`.
- If no issue matched: classify `e` as an unknown error.
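The loop above can be sketched as follows (the `apply_to`/`location` filters are omitted, and the type names are illustrative):

```python
import re
from dataclasses import dataclass

@dataclass
class KnownIssue:
    number: int        # GitHub issue number
    regex: re.Pattern  # error signature r(i)
    state: str         # "open" or "closed"

def classify(error_text: str, issues: list):
    """Classify one error: ("known"|"regression", issue) on a match,
    else ("unknown", None)."""
    for issue in issues:
        if issue.regex.search(error_text):
            kind = "known" if issue.state == "open" else "regression"
            return kind, issue
    return "unknown", None

# Hypothetical known-issue database.
issues = [
    KnownIssue(101, re.compile(r"connection refused"), "open"),
    KnownIssue(102, re.compile(r"deadlock detected"), "closed"),
]
```

First match wins here; a real implementation might instead collect all matches or prefer open issues over closed ones.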
Deduplication: Each known issue is reported at most once per build step, even if the pattern matches multiple errors. This prevents annotation spam for frequently recurring errors.
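A sketch of that deduplication, keyed on the (build step, issue number) pair:

```python
def dedupe(matches):
    """Drop repeated (step, issue_number) pairs so each known issue
    is reported at most once per build step."""
    seen = set()
    for step, issue_number in matches:
        if (step, issue_number) not in seen:
            seen.add((step, issue_number))
            yield step, issue_number
```

The same issue can still appear once per step, so a failure that spans several build steps remains visible in each of them.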
Ignore-failure semantics: The ci-ignore-failure flag on a known issue indicates that the failure is expected and should not block the build. The build's exit code is overridden to 0 if all errors are known and all matching issues have this flag set. This allows known flaky tests to continue running (providing signal about whether they are still failing) without blocking development.
Regression detection: Matching against closed issues provides early warning of regressions. If a closed issue's pattern matches a new error, it is likely that the fix has regressed and the issue should be reopened.
Error sources: The system handles multiple error formats:
- Log files: Scanned for regex matches against raw log text.
- JUnit XML: Parsed to extract test case failures with structured message and text fields.
- Secret detection: Identifies accidentally leaked credentials in logs.
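For the JUnit XML source, the structured message and text fields can be extracted with the standard library; the sample report below is fabricated for illustration:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<testsuite name="sqllogictest">
  <testcase classname="joins" name="outer_join">
    <failure message="wrong result">expected 3 rows, got 2</failure>
  </testcase>
  <testcase classname="joins" name="inner_join"/>
</testsuite>"""

def extract_failures(junit_xml: str):
    """Yield (test name, message, text) for each failed or errored test case."""
    root = ET.fromstring(junit_xml)
    for case in root.iter("testcase"):
        for node in case.findall("failure") + case.findall("error"):
            yield case.get("name"), node.get("message"), (node.text or "").strip()
```

The resulting message and text would then be fed into the same pattern-matching classification as errors scraped from raw logs.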