Production configs and best practices

A good production-ready service should provide functionality for various cases:

  • Overload
    • The service should respond with HTTP 429 codes to some requests while still being able to handle the rest
  • Debugging of a running service
    • inspect logs
    • get more logs from a suspiciously behaving service, then turn the logging level back
    • profile memory usage
    • see requests in flight
  • Experiments
    • There should be a way to turn arbitrary functionality on/off without restarting the service
  • Metrics and Logs
  • Functional testing

This tutorial shows the configuration of a typical production-ready service. For information about service interactions with other utilities and services in a container, see Deploy Environment Specific Configurations.

Before you start

Make sure that you can compile and run the core tests, and that you have read the basic example Writing your first HTTP server.

int main

utils::DaemonMain initializes and starts the component system with the provided command line arguments:

#include <userver/alerts/handler.hpp>
#include <userver/components/minimal_server_component_list.hpp>
#include <userver/storages/secdist/provider_component.hpp>
#include <userver/utils/daemon_run.hpp>

int main(int argc, char* argv[]) {
    // Start from the minimal list of components required to run an HTTP server
    const auto component_list =
        components::MinimalServerComponentList()
            .Append<components::DefaultSecdistProvider>()
            .Append<alerts::Handler>()
        // Put your handlers and components here
        ;
    return utils::DaemonMain(argc, argv, component_list);
}

A path to the static config file should be passed from the command line to start the service:

bash
./samples/userver-samples-production_service --config /etc/production_service/static_config.yaml

Static config

The full static config can be found at samples/production_service/static_config.yaml

The important parts are described below.

Variables

Static configs tend to become quite big, so it is a good idea to move the changing parts into variables. To do that, declare a config_vars field in the static config and point it to a file with variables.

# yaml
config_vars: /etc/production_service/config_vars.yaml

A file with config variables could look like the following sketch; the variable names come from this tutorial, the values are illustrative.
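# yaml
# config_vars.yaml (illustrative values)
server-name: production-service
server-port: 8080
monitor-server-port: 8086
config-cache: /var/cache/production_service/config-cache.json
config-server-url: http://localhost:8083
service-name: production-service
secdist-path: /etc/production_service/secure_data.json
testsuite-enabled: false
fs_worker_threads: 2
main_worker_threads: 6
monitor_worker_threads: 1
event_threads: 2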

Now, in the static config, you can use $variable-name to refer to a variable; the *#fallback fields are used if there is no variable with that name in the config variables file:

# yaml
http-client:
    fs-task-processor: fs-task-processor
    user-agent: $server-name
    user-agent#fallback: 'userver-based-service 1.0'

Task processors

A good practice is to have at least 3 different task processors:

# yaml
task_processors:
    fs-task-processor:          # for blocking operations
        thread_name: fs-worker
        worker_threads: $fs_worker_threads
        worker_threads#fallback: 2
    main-task-processor:        # for nonblocking operations
        thread_name: main-worker
        worker_threads: $main_worker_threads
        worker_threads#fallback: 6
    monitor-task-processor:     # for monitoring
        thread_name: mon-worker
        worker_threads: $monitor_worker_threads
        worker_threads#fallback: 1
event_thread_pool:              # ev pools to deal with OS events
    threads: $event_threads
    threads#fallback: 2

Moving blocking operations into a separate task processor improves responsiveness and CPU usage of your service (see the sketch below). The monitor task processor helps to get statistics and diagnostics from the server under heavy load or from a server with deadlocked threads in the main task processor.

Warning
This setup is for an abstract service on an abstract 8 core machine. Benchmark your service on your hardware and hand-tune the thread numbers to get optimal performance.
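As an illustration of the first point, a blocking filesystem call can be wrapped in utils::Async bound to the blocking task processor, so that main-task-processor coroutine threads are never blocked. This is a minimal sketch; the function name and file path are illustrative:

cpp
#include <string>

#include <userver/components/component_context.hpp>
#include <userver/fs/blocking/read.hpp>
#include <userver/utils/async.hpp>

// Run a blocking filesystem read on fs-task-processor instead of
// occupying a main-task-processor thread.
std::string ReadTemplate(const components::ComponentContext& context) {
    auto& fs_task_processor = context.GetTaskProcessor("fs-task-processor");
    return utils::Async(fs_task_processor, "read-template", [] {
               return fs::blocking::ReadFileContents(
                   "/etc/my-service/template.html");  // illustrative path
           }).Get();
}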

Listeners/Monitors

Note the components::Server configuration:

# yaml
server:
    listener:
        # If your service is behind nginx or some other local proxy, it is
        # efficient to accept incoming requests from a unix-socket
        #
        # unix-socket: /var/run/production_service/service.socket
        port: $server-port
        port#fallback: 8085
        connection:
            in_buffer_size: 32768
            requests_queue_size_threshold: 100
        task_processor: main-task-processor
    listener-monitor:
        # Listen on localhost:8086 for developer/utility requests
        port: $monitor-server-port
        port#fallback: 8086
        connection:
            in_buffer_size: 32768
            requests_queue_size_threshold: 100
        task_processor: monitor-task-processor
    logger_access: ''
    logger_access_tskv: ''
    max_response_size_in_flight: 1000000000
    server-name: $server-name

In this example, we have two listeners. This is done to separate client handlers from utility/diagnostic handlers, so they listen on different ports or even interfaces.

Utility handlers

Your server has the following utility handlers:

# yaml
handler-inspect-requests:
    path: /service/inspect-requests
    method: GET
    task_processor: monitor-task-processor
handler-jemalloc:
    path: /service/jemalloc/prof/{command}
    method: POST
    task_processor: monitor-task-processor
handler-log-level:
    path: /service/log-level/{level}
    method: GET,PUT
    task_processor: monitor-task-processor
handler-on-log-rotate:
    path: /service/on-log-rotate/
    method: POST
    task_processor: monitor-task-processor
handler-dynamic-debug-log:
    path: /service/log/dynamic-debug
    method: GET,PUT,DELETE
    task_processor: monitor-task-processor
handler-dns-client-control:
    path: /service/dnsclient/{command}
    method: POST
    task_processor: monitor-task-processor
handler-server-monitor:
    path: /service/monitor
    method: GET
    task_processor: monitor-task-processor
handler-fired-alerts:
    path: /service/fired-alerts
    method: GET
    task_processor: monitor-task-processor

All those handlers live on the separate components.server.listener-monitor address, so you have to query them via the listener-monitor port:

bash
$ curl http://localhost:8086/service/log-level/
{"init-log-level":"info","current-log-level":"info"}
$ curl -X PUT 'http://localhost:8086/service/log-level/warning'
{"init-log-level":"info","current-log-level":"warning"}

Ping

This is a server::handlers::Ping handler that returns 200 if the service is OK and 500 otherwise. Useful for load balancers, which stop sending traffic to the server if it responds with codes other than 200.

# yaml
handler-ping:
    path: /ping
    method: GET
    task_processor: main-task-processor # !!!
    throttling_enabled: false
    url_trailing_slash: strict-match

Note that the ping handler lives on the same task processor as all the other handlers, so its response time degrades together with the service. Smart balancers may measure response times and send less traffic to heavily loaded services.

bash
$ curl --unix-socket service.socket http://localhost/ping -i
HTTP/1.1 200 OK
Date: Thu, 01 Jul 2021 12:46:07 UTC
Content-Type: text/html; charset=utf-8
X-YaRequestId: 39e3f54b86984b8ca5235876dc566b27
Server: sample-production-service 1.0
X-YaTraceId: 4d7f8aa03e2d4e4d80a92a3ccecfbe6d
Connection: keep-alive
Content-Length: 0

Dynamic configs of a sample production service

Here's the configuration of the dynamic config related components: components::DynamicConfigClient, components::DynamicConfig, and components::DynamicConfigClientUpdater.

The service starts with default dynamic config values and updates them from the configs service at startup. If the first update fails, the values are retrieved from the dynamic-config.fs-cache-path file (if it exists).

# yaml
dynamic-config:
    updates-enabled: true
    fs-cache-path: $config-cache
    fs-task-processor: fs-task-processor
dynamic-config-client:
    config-url: $config-server-url
    http-retries: 5
    http-timeout: 20s
    service-name: $service-name
dynamic-config-client-updater:
    config-settings: false
    first-update-fail-ok: true
    full-update-interval: 1m
    update-interval: 5s
Note
Dynamic configs are an essential part of a reliable service with high availability. They can serve as an emergency switch for new functionality, a selector for experiments, and a place for limits/timeouts/log-level and proxy setup. See Dynamic config schemas for more info and Writing your own configs server for insights on how to implement such a service.
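As an illustration, reading a dynamic config value in code could look like the following minimal sketch, assuming a recent userver with the name-plus-default dynamic_config::Key constructor; the MY_SERVICE_NEW_FEATURE_ENABLED key is hypothetical:

cpp
#include <userver/components/component_context.hpp>
#include <userver/dynamic_config/snapshot.hpp>
#include <userver/dynamic_config/source.hpp>
#include <userver/dynamic_config/storage/component.hpp>

// Hypothetical kill switch with a safe default used until the first update.
const dynamic_config::Key<bool> kMyServiceNewFeatureEnabled{
    "MY_SERVICE_NEW_FEATURE_ENABLED", false};

bool IsNewFeatureEnabled(const components::ComponentContext& context) {
    const auto source =
        context.FindComponent<components::DynamicConfig>().GetSource();
    // A snapshot keeps all reads consistent within one piece of logic
    const auto snapshot = source.GetSnapshot();
    return snapshot[kMyServiceNewFeatureEnabled];
}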

Congestion Control

See Congestion Control.

congestion_control::Component limits the active requests count. In case of overload, it responds with HTTP 429 codes to some requests, allowing your service to properly handle the rest.

All the significant parts of the component are configured by dynamic config options USERVER_RPS_CCONTROL and USERVER_RPS_CCONTROL_ENABLED.

# yaml
congestion-control:
    fake-mode: $testsuite-enabled
    load-enabled: true

It is a good idea to disable it in unit tests to avoid getting HTTP 429 on an overloaded CI server.

Metrics

Metrics are a convenient way to monitor the health of your service.

Typical setup of components::SystemStatisticsCollector and components::StatisticsStorage is quite trivial:

# yaml
system-statistics-collector:
    fs-task-processor: fs-task-processor

With such a setup you can poll the metrics from the server::handlers::ServerMonitor handler that we configured in the previous section. However, a much more mature approach is to write a component that pushes the metrics directly into the remote metrics aggregation service, or to write a handler that provides the metrics in the aggregation service's native format.

To produce metrics in a declarative style, use utils::statistics::MetricTag, or register your metrics writer in utils::statistics::Storage via RegisterWriter to produce metrics on a per-component basis (see the sketch below). To test metrics, refer to the testsuite metrics testing.
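For instance, a per-component writer registration could look like this minimal sketch, assuming a recent userver; the component and metric names are illustrative:

cpp
#include <atomic>
#include <cstdint>
#include <string_view>

#include <userver/components/component.hpp>
#include <userver/components/component_base.hpp>
#include <userver/components/statistics_storage.hpp>
#include <userver/utils/statistics/entry.hpp>
#include <userver/utils/statistics/writer.hpp>

// Illustrative component that exports a single counter metric.
class MyComponent final : public components::ComponentBase {
public:
    static constexpr std::string_view kName = "my-component";

    MyComponent(const components::ComponentConfig& config,
                const components::ComponentContext& context)
        : components::ComponentBase(config, context) {
        auto& storage =
            context.FindComponent<components::StatisticsStorage>().GetStorage();
        // The writer callback is invoked on each metrics collection
        statistics_holder_ = storage.RegisterWriter(
            "my-component", [this](utils::statistics::Writer& writer) {
                writer["requests-processed"] = requests_processed_.load();
            });
    }

    ~MyComponent() override { statistics_holder_.Unregister(); }

private:
    std::atomic<std::uint64_t> requests_processed_{0};
    utils::statistics::Entry statistics_holder_;
};

Do not forget to Append<MyComponent>() to the component list in main().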

A list of userver built-in metrics can be found at Service Statistics and Metrics (Prometheus/Graphite/...).

Alerts

Alerts are a way to propagate critical errors from your service to a monitoring system.

When the code identifies that something bad happened and a user should be notified about that, alert_storage.FireAlert() is called with the appropriate arguments. The alert subsystem then notifies an external monitoring system (or a user) about the alert event through the specific HTTP handler (handler-fired-alerts from the utility handlers above).
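A call site could look roughly like the following sketch; the alerts::StorageComponent lookup and the exact FireAlert() argument list here are assumptions, so check the alerts API reference for the precise signature:

cpp
#include <userver/alerts/component.hpp>
#include <userver/components/component_context.hpp>

// Sketch only: the FireAlert() arguments below are assumptions.
void ReportCacheFailure(const components::ComponentContext& context) {
    auto& alert_storage =
        context.FindComponent<alerts::StorageComponent>().GetStorage();
    alert_storage.FireAlert("cache_update_failure",
                            "Failed to update the vital cache");
}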

Secdist - secrets distributor

Storing sensitive data separately from the configs is a good practice that allows you to set different access rights for the two files.

components::Secdist configuration is straightforward:

# yaml
default-secdist-provider:
    config: $secdist-path

Refer to the storages::secdist::SecdistConfig config for more information on the data retrieval.
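For example, retrieving a secret could look like this minimal sketch; the ApiToken struct and the JSON layout are illustrative:

cpp
#include <string>

#include <userver/components/component_context.hpp>
#include <userver/formats/json/value.hpp>
#include <userver/storages/secdist/component.hpp>

// Illustrative strongly typed view into a part of the secdist JSON.
struct ApiToken {
    explicit ApiToken(const formats::json::Value& doc)
        : token(doc["api-token"].As<std::string>()) {}

    std::string token;
};

std::string GetApiToken(const components::ComponentContext& context) {
    const auto& secdist_config =
        context.FindComponent<components::Secdist>().Get();
    return secdist_config.Get<ApiToken>().token;
}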

Testsuite related components

server::handlers::TestsControl is a handler that allows controlling the service from test environments. That handler is used by the testsuite in functional tests to mock time, invalidate caches, handle testpoints and do many other things (see the config sketch below). This component should be disabled in production environments.
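A typical configuration for it could look like this sketch, reusing the $testsuite-enabled variable from the congestion control section to keep the handler off in production:

# yaml
tests-control:
    load-enabled: $testsuite-enabled
    path: /tests/{action}
    method: POST
    task_processor: main-task-processor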

components::TestsuiteSupport is a lightweight storage to keep minor testsuite data. This component is required by many high-level components and it is safe to use this component in production environments.

Build

This sample requires a configs service, so we build and start one from our previous tutorials.

bash
mkdir build_release
cd build_release
cmake -DCMAKE_BUILD_TYPE=Release ..
make userver-samples-config_service
./samples/userver-samples-config_service &
make userver-samples-production_service
python3 ../samples/tests/prepare_production_configs.py
./samples/userver-samples-production_service --config /tmp/userver/production_service/static_config.yaml

Functional testing

Functional tests are used to make sure that the service is working fine and implements the required functionality. A recommended practice is to build the service in Debug and Release modes and test both of them, then deploy the Release build to production, disabling all the testing-related handlers.

Debug builds of userver provide numerous assertions that validate framework usage and help to detect bugs at early stages.

Typical functional tests for a service consist of a conftest.py file with mocks and configs for the service, and a bunch of test_*.py files with the actual tests (see the sketch below). Such an approach allows reusing mocks and configurations in different tests.
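For instance, a minimal test file could look like the following sketch, using the service_client fixture provided by the userver testsuite (pytest-userver):

python
# test_ping.py — minimal functional test (illustrative)
async def test_ping(service_client):
    response = await service_client.get('/ping')
    assert response.status == 200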

Full sources

See the full example at samples/production_service.