Implementation:Tensorflow Serving Retrier
| Knowledge Sources | |
|---|---|
| Domains | Resilience, Utility |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A utility function that retries a given operation with a fixed interval between attempts, supporting configurable retry counts and an optional predicate to cancel retries based on error status.
Description
The Retry() function executes the provided function (retried_fn) and, if it fails, retries it up to max_num_retries additional times, for at most max_num_retries + 1 invocations in total. Before each retry, it sleeps for retry_interval_micros microseconds using the default environment. An optional should_retry predicate is consulted with the status of a failed attempt and can cancel the remaining retries by returning false. The function logs each retry attempt at INFO level and each failure at ERROR level, and it returns the status from the last invocation of retried_fn. The retry loop terminates when (1) retried_fn returns OK, (2) the maximum number of retries is exhausted, or (3) should_retry returns false.
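The loop described above can be sketched in standalone C++ as follows. This is an illustrative re-creation of the documented behavior, not the retrier.cc source: SketchStatus and RetrySketch are hypothetical names, std::this_thread::sleep_for stands in for the environment's sleep call, and std::clog stands in for the INFO and ERROR logging.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <thread>

// Hypothetical stand-in for absl::Status so the sketch compiles on its own.
struct SketchStatus {
  bool is_ok;
  std::string message;
  bool ok() const { return is_ok; }
};

// Illustrative fixed-interval retry loop (assumed shape, not the real source).
SketchStatus RetrySketch(
    const std::string& description, uint32_t max_num_retries,
    int64_t retry_interval_micros,
    const std::function<SketchStatus()>& retried_fn,
    const std::function<bool(const SketchStatus&)>& should_retry =
        [](const SketchStatus&) { return true; }) {
  SketchStatus status{true, ""};
  uint64_t num_tries = 0;
  do {
    if (num_tries > 0) {
      // Sleep only before a retry, never before the first attempt.
      std::this_thread::sleep_for(
          std::chrono::microseconds(retry_interval_micros));
      std::clog << "Retrying " << description << ", retry " << num_tries
                << "\n";
    }
    status = retried_fn();
    if (!status.ok()) {
      std::clog << description << " failed: " << status.message << "\n";
    }
    ++num_tries;
    // Stop on success, on exhausted retries, or when the predicate says no.
  } while (!status.ok() && num_tries <= max_num_retries &&
           should_retry(status));
  return status;
}
```

In this sketch the predicate is evaluated only after a failed attempt and only while retries remain, so a predicate that returns false yields exactly one call to retried_fn.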
Usage
Use this for operations that may fail transiently, such as connecting to a remote model source, loading a model file, or establishing a gRPC connection. Because the interval is fixed, with no backoff between attempts, it is best suited to a small number of retries with a short, predictable delay.
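Since the interval between attempts is fixed, the time spent sleeping is easy to bound when choosing parameters. The helper below is a minimal sketch of that arithmetic; WorstCaseSleepMicros is a hypothetical name and not part of the library, and the bound excludes the time retried_fn itself takes.

```cpp
#include <cstdint>

// Hypothetical helper: upper bound on total sleep time, in microseconds.
// One sleep precedes each retry, so there are at most max_num_retries sleeps.
constexpr int64_t WorstCaseSleepMicros(uint32_t max_num_retries,
                                       int64_t retry_interval_micros) {
  return static_cast<int64_t>(max_num_retries) * retry_interval_micros;
}
```

For example, 5 retries at 1000000 microseconds add at most 5 seconds of sleeping, and 3 retries at 500000 microseconds add at most 1.5 seconds.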
Code Reference
Source Location
- Repository: Tensorflow_Serving
- File: tensorflow_serving/util/retrier.h (header), tensorflow_serving/util/retrier.cc (implementation)
- Lines: 1-44 (header), 1-57 (implementation)
Signature
absl::Status Retry(
    const string& description,
    uint32 max_num_retries,
    int64_t retry_interval_micros,
    const std::function<absl::Status()>& retried_fn,
    const std::function<bool(absl::Status)>& should_retry =
        [](absl::Status status) { return true; });
Import
#include "tensorflow_serving/util/retrier.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| description | const string& | Yes | A human-readable description used for logging |
| max_num_retries | uint32 | Yes | Maximum number of retry attempts after the initial call |
| retry_interval_micros | int64_t | Yes | Microseconds to sleep between retry attempts |
| retried_fn | std::function<absl::Status()> | Yes | The function to execute and potentially retry |
| should_retry | std::function<bool(absl::Status)> | No | Predicate to decide whether to continue retrying; defaults to always true |
Outputs
| Name | Type | Description |
|---|---|---|
| return | absl::Status | The status returned by the last call to retried_fn |
Usage Examples
Retrying a Connection
auto status = Retry(
    "connecting to model source",
    /*max_num_retries=*/5,
    /*retry_interval_micros=*/1000000,  // 1 second
    [&]() -> absl::Status {
      return ConnectToSource(source_path);
    });
TF_RETURN_IF_ERROR(status);
Retrying with Conditional Cancellation
auto status = Retry(
    "loading model",
    /*max_num_retries=*/3,
    /*retry_interval_micros=*/500000,  // 0.5 seconds
    [&]() { return loader->Load(); },
    [](absl::Status s) {
      // Don't retry on permission denied; returning false stops the loop.
      return s.code() != absl::StatusCode::kPermissionDenied;
    });