A QOS dynamic config is a map from a full method name (`namespace.of.Service/Method`) to the settings for that method. It has the type ugrpc::client::ClientQos and is usually provided to the ugrpc::client::SimpleClientComponent constructor.
There is also a `__default__` entry, which is the default for all methods not listed explicitly; it is typically used for most "lightweight" methods of the service.
`__default__` is NOT recursively merged with per-method settings: if a key for a specific method exists, `__default__` is ignored for that method.
Carefully check that you have entered the full method name correctly. There is currently no protection against typos in this area; an invalid method is silently ignored.
If the `timeout-ms` field is not specified for the current request, the timeout is considered infinite.
For a request, in addition to `timeout-ms`, the deadline of the current handler is also taken into account; the minimum of the two is used.
Timeouts apply to the entire RPC, from the moment the RPC is created until it is closed. To prevent a stream from being unexpectedly terminated, override the QOS for streaming RPCs to an infinite timeout (by omitting the `timeout-ms` field).
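For illustration, a QOS dynamic config combining the rules above might look like this (a sketch: the service and method names are hypothetical):

```yaml
__default__:
    timeout-ms: 500
namespace.of.Service/HeavyMethod:
    timeout-ms: 10000
# No timeout-ms: the streaming RPC gets an infinite timeout.
namespace.of.Service/SubscribeMethod: {}
```

Note that `namespace.of.Service/HeavyMethod` gets only its own settings; nothing is inherited from `__default__`.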
There is no option to configure a timeout for each individual message in a stream (from sending a request to receiving the corresponding response). Instead, it is common practice in gRPC to use keepalive pings to ensure that the server is still alive; they are configured via the `channel-args` static config option of ugrpc::client::ClientFactoryComponent.

For unary RPCs, clients automatically retry calls that fail with an error.
Conditions for a retry to occur:

- fewer than `MaxAttempts` attempts have been performed.

For non-codegenerated clients, retries are disabled by default. They can be enabled in any of the ways listed below.
The maximum number of attempts can be configured as follows, in order of increasing priority:
Retries can be disabled by specifying 1 attempt.
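As a sketch, assuming the per-method QOS settings accept an `attempts` field alongside `timeout-ms` (the field name is an assumption), the number of attempts could be set via the QOS dynamic config:

```yaml
__default__:
    attempts: 3
    timeout-ms: 500
namespace.of.Service/NonIdempotentMethod:
    attempts: 1  # 1 attempt = retries disabled for this method
    timeout-ms: 500
```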
The following tags are set in the call span:

- `max_attempts=<MaxAttempts>` (the limit on the number of attempts)
- `attempts=<attempts performed>` (the number of attempts actually made)

The following tags are also added to all client logs:

- `max_attempts=<MaxAttempts>` (the limit on the number of attempts)
- `attempts=<current attempt>` (the number of the current attempt, starting from 1)

Middlewares are run for each attempt, and logs are written for each attempt, including a log with the error status, if any.
The client span is shared across all attempts. With retries, a single client span in the tracing system will have multiple child spans from the downstream service's handler.
By default, ugrpc leaves the default retry behavior of grpc-core as is, without any additional configuration. This means that only low-level HTTP/2 errors will be retried (such retries are called Transparent retries).
You can set your own settings for grpc-core retries via the static config:
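For example, retries could be configured via the `default-service-config` option in the grpc-core format (a sketch: the component name `grpc-client-factory` and the surrounding layout are assumptions, while the service-config JSON follows the documented grpc-core retry policy format):

```yaml
components_manager:
    components:
        grpc-client-factory:
            default-service-config: |
                {
                  "methodConfig": [{
                    "name": [{"service": "namespace.of.Service"}],
                    "retryPolicy": {
                      "maxAttempts": 3,
                      "initialBackoff": "0.1s",
                      "maxBackoff": "1s",
                      "backoffMultiplier": 2,
                      "retryableStatusCodes": ["UNAVAILABLE"]
                    }
                  }]
                }
```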
You can read more about grpc-core retry settings in the official documentation (https://grpc.io/docs/guides/retry/).
| Feature | userver-grpc retries | grpc-core retries |
|---|---|---|
| QOS | + | - |
| static config | retry-config.attempts | default-service-config (grpc-core format) |
| middlewares | hooks are called on every retry attempt | hooks are called once for the entire RPC |
| observability | each attempt is written separately to logs and metrics, span tags are available | no information about intermediate attempts |
| retriable status codes | hardcoded list | can be configured via static config |
| gRPC call becomes committed | retries are performed regardless of response metadata | no further retries will be attempted |
Conditions for a retry to occur:

- the call failed with `RST_STREAM(REFUSED_STREAM)`, but not with `RST_STREAM(INTERNAL)`.

See "When Retries are Valid" for more info.
It was also discovered experimentally that a Python grpcio server causes retries to be aborted if the status is returned as follows:
To prevent this issue, you should do this instead:
The grpc-core retry config also has a `perAttemptTimeoutMs` field. However, after we started using this option, it turned out that grpc-core sometimes encounters a race condition and hangs indefinitely.