With deadline propagation, the timeout of each specific request is set not statically, but depending on the time remaining to process the calling service's request.
If the service sees that the parent request will time out sooner than its own timeouts would fire, it performs the request with the time remaining for the parent request. As a result, once the deadline is reached, no further time is spent processing a request that no one is waiting for anymore.
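The rule above can be sketched as taking the minimum of the static timeout and the time left for the parent request. This is only an illustration: `EffectiveTimeout` and the `Milliseconds` alias are hypothetical names, not part of the userver API.

```cpp
#include <algorithm>
#include <chrono>

using Milliseconds = std::chrono::milliseconds;

// Hypothetical helper (not a userver function): picks the timeout for an
// outgoing request from the client's static timeout and the time the caller
// is still willing to wait for the parent request.
inline Milliseconds EffectiveTimeout(Milliseconds static_timeout,
                                     Milliseconds parent_time_left) {
  // A parent deadline that has already passed leaves no time at all.
  const Milliseconds left = std::max(parent_time_left, Milliseconds::zero());
  // Never wait longer than the caller does.
  return std::min(static_timeout, left);
}
```

For instance, with a static timeout of 10 seconds and 3 seconds left for the parent request, the outgoing request would be given 3 seconds.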
In general, deadline propagation only starts to matter, and to save significant CPU resources, when many requests are being canceled by deadline: when the CPU is overloaded, Congestion Control is turned on, some hosts or services are failing, and so on. In other words, it is a mechanism for increasing the stability of multiservice architectures.
Let's say, for example, that a request goes through the following services:
A -> B -> C
Let the static timeout of service A be 20 seconds, of service B 15 seconds, and of service C 10 seconds.
Without deadline propagation, request processing may look like this:
With deadline propagation:
See also Google's guide.
A -> B -> C
How canceling requests through connection closure would supposedly work:
In practice, it won't work that way:
In HTTP, we use custom headers, as described below.
In gRPC, we use the built-in deadline mechanism based on the grpc-timeout header.
Database clients use timeout fields specific for them.
Summary:
The engine uses a deadline (i.e. time_point) to measure time, and it is transmitted between hosts as a timeout (i.e. duration). This approach has the drawback that it does not take into account the RTT between services.
The decision to transmit the duration was made because the clocks may not be synchronized accurately enough between hosts. If a deadline were transmitted instead, then under an unfortunate combination of circumstances the service might reject the request prematurely. This problem especially affects requests with small timeouts.
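A minimal sketch of this conversion, assuming std::chrono::steady_clock as the local clock. The names `ToWireTimeout` and `FromWireTimeout` are hypothetical and only illustrate the idea; they are not userver functions.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;
using Milliseconds = std::chrono::milliseconds;

// Sender side (hypothetical helper): turn the local deadline into a duration
// that can be put on the wire. An already expired deadline yields zero.
inline Milliseconds ToWireTimeout(Clock::time_point deadline,
                                  Clock::time_point now) {
  const auto left = std::chrono::duration_cast<Milliseconds>(deadline - now);
  return left > Milliseconds::zero() ? left : Milliseconds::zero();
}

// Receiver side (hypothetical helper): rebuild a deadline on the receiver's
// own clock, counting from the moment the request was received.
inline Clock::time_point FromWireTimeout(Milliseconds timeout,
                                         Clock::time_point received_at) {
  return received_at + std::chrono::duration_cast<Clock::duration>(timeout);
}
```

Because the receiver starts counting from the moment it reads the request, its reconstructed deadline is later than the sender's by the delivery time, which is the RTT drawback mentioned above. In exchange, the two hosts never compare absolute clock readings.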
The deadline propagation mechanism in userver uses custom headers.
What follows are the minimum semantic requirements for services that interact with userver-based services over HTTP to support deadline propagation.
X-YaTaxi-Client-TimeoutMs
X-YaTaxi-Deadline-Expired: <any non-empty value>
Deadline expired
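How a non-userver service might honor these headers can be sketched as follows. This is an illustration under stated assumptions, not the userver implementation: the `Headers` and `Response` types and both functions are hypothetical, the header value is assumed to be a non-negative integer number of milliseconds, header lookup is simplified to an exact-case match, and 498 is used as the default status code described later in this document.

```cpp
#include <charconv>
#include <chrono>
#include <map>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical minimal request/response model for the sketch.
using Headers = std::map<std::string, std::string>;

struct Response {
  int status;
  Headers headers;
  std::string body;
};

// Reads the timeout attached by the caller, if any. Assumes the value is a
// non-negative integer number of milliseconds; anything else is ignored.
inline std::optional<std::chrono::milliseconds> ParseClientTimeout(
    const Headers& request_headers) {
  const auto it = request_headers.find("X-YaTaxi-Client-TimeoutMs");
  if (it == request_headers.end()) return std::nullopt;
  const std::string& value = it->second;
  const char* const end = value.data() + value.size();
  long long ms = 0;
  const auto [ptr, ec] = std::from_chars(value.data(), end, ms);
  if (ec != std::errc{} || ptr != end || ms < 0) return std::nullopt;
  return std::chrono::milliseconds{ms};
}

// Builds the response to send once the deadline has expired.
inline Response MakeDeadlineExpiredResponse(int status_code = 498) {
  Response response{status_code, {}, "Deadline expired"};
  // Any non-empty value marks the response as caused by the deadline.
  response.headers["X-YaTaxi-Deadline-Expired"] = "1";
  return response;
}
```

A real service would compare header names case-insensitively, as HTTP requires.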
The task-inherited deadline is propagated by default from the handler task to child tasks created via utils::*Async*. There it is used in all clients that support it. This is implemented via server::request::kTaskInheritedData and server::request::GetTaskInheritedDeadline.
In background tasks that are started from the request task but do not affect its completion, the deadline should not be propagated from the request task. Such deadline propagation can be blocked by the following mechanisms:
concurrent::BackgroundTaskStorage::AsyncDetach
utils::AsyncBackground
engine::AsyncNoSpan (don't use it if you are not sure that you need it!)
If background tasks are started via utils::Async (instead of utils::AsyncBackground), requests performed in them will be interrupted along with the parent task.
In some cases, it makes sense to ignore the inherited deadline and complete the request even if no one is waiting for a response from the current handle. To do this, make such a request in the scope of a server::request::DeadlinePropagationBlocker.
If there is a header X-YaTaxi-Client-TimeoutMs in the request, the handler:
server::request::TaskInheritedData::deadline, which is then used in clients
498 Deadline Expired is returned;
Deadline expired.
Metrics:
deadline-received (monotonic counter) - counts requests that have a deadline specified;
cancelled-by-deadline (monotonic counter) - counts requests whose handling was cancelled by deadline (the deadline expired by the end of handling, or some operation estimated that the deadline would surely expire).
Log tags of the request's tracing::Span:
deadline_received_ms=... if the calling service has set a deadline for the request
cancelled_by_deadline=1;
dp_original_body - the user-provided response body (if any) that was replaced by Deadline expired;
dp_original_body_size - the size of this body in bytes.
To disable deadline propagation in the static config:
server.listener.handler-defaults.deadline_propagation_enabled: false
<handle component>.deadline_propagation_enabled: false
To disable deadline propagation in the dynamic config:
false
The default HTTP status code for Deadline expired responses is a custom userver-specific 498 Deadline Expired code. The code is deliberately chosen in the 4xx range, because it is not a server error by itself. Given infinite time, the server would probably handle the request successfully. However, some environments may fail to handle a non-standard code, in which case you may want to configure it.
To configure the HTTP status code for Deadline expired responses:
server.listener.handler-defaults.deadline_expired_status_code: 504
<handle component>.deadline_expired_status_code: 504
The mechanism works similarly to HTTP handlers. The deadline set in the context of the gRPC client is automatically passed to the context of the gRPC service. If there is a deadline:
If the deadline has already expired by the time the request is processed, then:
grpc::StatusCode::DEADLINE_EXCEEDED is returned with the message Deadline propagation: Not enough time to handle this call.
Checking the deadline when performing streaming Read or Write operations is not yet implemented.
Metrics:
grpc.server.by-destination.deadline-propagated {grpc_destination=SERVICE_NAME/METHOD_NAME} (RATE) - counts calls with a set deadline;
grpc.server.by-destination.cancelled-by-deadline-propagation {grpc_destination=SERVICE_NAME/METHOD_NAME} (RATE) - counts calls for which the RPC was canceled by deadline.
Log tags of the request's tracing::Span:
deadline_received_ms=... if the calling client has set a deadline for the request (deadlines over a year are not taken into account);
cancelled_by_deadline=1 if the request processing was interrupted by the deadline.
To disable deadline propagation in the static config:
Remove deadline_propagation from the list of middlewares of components of services
To disable deadline propagation in the dynamic config:
false
If there is a task-inherited deadline, the client:
clients::http::CancelException
X-YaTaxi-Client-TimeoutMs from the timeout (regardless of whether it was decreased through deadline propagation)
clients::http::CancelException
server::request::DeadlineSignal to the current handle that we cancelled the request because of DP
X-YaTaxi-Deadline-Expired header, it converts the response to clients::http::TimeoutException
Metrics:
timeout-updated-by-deadline (monotonic counter) - counts requests for which the deadline was set and affected the timeout (while the request was not necessarily canceled by the deadline)
cancelled-by-deadline (monotonic counter) - counts requests that were cancelled by the task-inherited deadline
Log tags of the request's tracing::Span:
propagated_timeout_ms - if the deadline was set and affected the timeout (while the request was not necessarily canceled by the deadline)
cancelled_by_deadline=1 - if the request was canceled by the task-inherited deadline
To disable deadline propagation in the static config:
http-client.set-deadline-propagation-header: false
To disable deadline propagation in the dynamic config:
false
If there is a task-inherited deadline, the client uses it as an upper bound for the built-in RPC deadline as implemented by grpc++.
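The upper-bound rule can be sketched as follows. This is only an illustration, not the grpc++ or userver API: `RpcDeadline` and `ApplyInheritedDeadline` are hypothetical names.

```cpp
#include <chrono>
#include <optional>

using Clock = std::chrono::steady_clock;

// Hypothetical result type: the deadline to use for the RPC and whether the
// inherited deadline replaced the one originally requested.
struct RpcDeadline {
  Clock::time_point deadline;
  bool updated;
};

// Hypothetical helper: caps the RPC deadline by the task-inherited deadline,
// if there is one and it is stricter.
inline RpcDeadline ApplyInheritedDeadline(
    Clock::time_point rpc_deadline,
    std::optional<Clock::time_point> inherited) {
  if (inherited && *inherited < rpc_deadline) {
    return {*inherited, true};
  }
  return {rpc_deadline, false};
}
```

The `updated` flag in this sketch corresponds to the case where the original deadline was overridden, which is not the same as the RPC being canceled by the deadline.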
Metrics:
grpc.client.by-destination.deadline-propagated {grpc_destination=SERVICE_NAME/METHOD_NAME} (RATE) - RPCs for which the original deadline was overridden by the propagated deadline (while the request was not necessarily canceled by deadline);
grpc.client.by-destination.cancelled-by-deadline-propagation {grpc_destination=SERVICE_NAME/METHOD_NAME} (RATE) - RPCs that were canceled due to the propagated deadline.
Request span tags:
deadline_updated=1 - if the RPC deadline was overridden by the task-inherited deadline;
timeout_ms=... - the final deadline value, represented as a timeout.
To disable deadline propagation in the static config:
Remove deadline_propagation from the list of middlewares of components of gRPC services or clients
To disable deadline propagation in the dynamic config:
false
storages::mongo::CancelledException is thrown
maxTimeMS is not set or is less strict than the deadline, then maxTimeMS is updated
storages::mongo::ClusterUnavailableException)
If the deadline has expired before the request is actually sent, then:
cancelled
cancelled_by_deadline=true
If the request is sent with deadline propagation enabled, then:
timeout_ms is included in the log - it is the time remaining until the deadline expires at the moment the request is initiated
max_time_ms is included in the log - it corresponds to the value of maxTimeMS (only included for those request types where maxTimeMS is allowed)
Execute, the exception storages::postgres::ConnectionInterrupted is thrown
redis::RequestException is thrown with GetStatus() == redis::ReplyStatus::kTimeoutError
redis::RequestException with GetStatus() == redis::ReplyStatus::kTimeoutError