.. _telem-guide:

Telemetrics
###########

This guide describes the |CL-ATTR| telemetry solution.

.. important::

   Telemetry in |CL| is **opt-in**. The telemetry client is **not** active
   and sends **no** data until you explicitly enable it.

.. note::

   The telemetry functionality adheres to `Intel privacy policies `_
   regarding the collection and use of
   :abbr:`PII (Personally Identifiable Information)` and is open source.
   No intentionally identifiable information about the user or system owner
   is collected.

.. contents::
   :local:
   :depth: 1

Overview
********

Telemetrics in |CL| is a client and server solution used to collect data from
running |CL| systems to help quickly identify and fix bugs in the OS. Both
client and server are customizable, and an API is available on the client
side for instrumenting your code for debug and analysis. Telemetry, one of
the key features of |CL|, enables developers to observe and proactively
address issues in the OS before end users are impacted.

Telemetrics is a `portmanteau word `_ made from:

* Telemetry, which is sensing and reporting data.
* Analytics, which is using visualization and statistical inferencing to
  make sense of the reported data.

|CL| telemetry reports system-level debug/crash information using
specialized probes. The probes monitor system tasks such as swupd, kernel
oops, machine error checks, and the BIOS error report table for unhandled
hardware failures. Telemetry enables real-time issue reporting to allow
system developers to focus quickly on an issue and monitor corrective
actions.

|CL| telemetry is fully customizable and can also be used during software
development for debugging purposes. You can use the libtelemetry library in
your code to create custom telemetry records. You can also use the
telem-record-gen utility in script files for light-touch record creation
where instrumenting code files doesn't make sense.

For more information on configuring the telemetry client, refer to section
`Client Configuration`_.

The |CL| telemetrics solution is an **opt-in** choice on the client side. By
default, the telemetry client is disabled until you choose to enable it.
Enabling the client is covered in this guide.

Architecture
============

|CL| telemetry has two fundamental components, which are shown in Figure 1:

* Client, which generates and delivers records to the backend server via
  the network.
* Backend, which receives records sent from the client and displays the
  cumulative content through a specialized web interface.

.. figure:: /_figures/telemetrics/telemetry-e2e.png
   :alt: Figure 1, Telemetry Architecture

   Figure 1: :guilabel:`|CL| Telemetry Architecture`

The telemetry client provides the front end of the telemetrics solution and
includes the following components:

* telemprobd, which is a daemon that receives and prepares telemetry records
  from probes and spools them to disk.
* telempostd, which is a daemon that manages spooled telemetry records and
  delivers these records according to configurable settings.
* probes, which collect specific types of data from the operating system.
* libtelemetry, which is the API that telemetrics probes use to create
  records.

The telemetry backend provides the server-side component of the telemetrics
solution and consists of:

* Nginx web server.
* Two Flask apps:

  * Collector, which is an ingestion web app for records received from
    client probes.
  * TelemetryUI, which is a web app that exposes different views to
    visualize the telemetry data.

* PostgreSQL as the underlying database server.

.. note::

   The default telemetry backend server is hosted by the Intel |CL|
   development team and is not viewable outside the Intel firewall. To
   collect your own records, you must set up your own telemetry backend
   server.

How to use
**********

From a workflow perspective, the |CL| telemetrics system is straightforward.
On the client side, the main decisions after installation and enabling
telemetry involve what to do with the record data generated by the probes.
You can send the data to the default telemetry server or a custom backend
server, keep the data local to the system, or both. The backend server has a
more complex setup, but once it's running, it is simple to configure and use.

This section describes some of the possible scenarios for configuring the
|CL| telemetrics system, and suggests which ones make sense according to your
needs. For more information on configuring the telemetry client, refer to
section `Client Configuration`_.

Scenarios
=========

#. Enable telemetry:

   You must opt in and start telemetry before probes can generate records.
   You can configure the client before starting telemetry by creating a
   custom :file:`telemetrics.conf` file that you place in the
   :file:`/etc/telemetrics` directory. If you choose to use the built-in
   default settings, records will be sent to the telemetrics backend server
   managed by the |CL| development team at Intel.

#. Save record data locally:

   You can configure the telemetry client to save records locally. This is
   convenient when you want instant feedback during a development cycle, or
   to track system issues if you believe there is a machine-specific problem.
   The client can be set either not to send records at all, or to keep the
   records locally and also send them to the backend server.

#. Set up a server to collect data:

   Whether you are managing a network of |CL| systems or you don't want to
   send records to the default telemetry server, you can set up a backend
   server to collect your records. The backend server can be installed on any
   Linux system and provides the same dashboard as the default server.

#. Instrument your code with the libtelemetry API:

   The :command:`telemetrics` bundle includes the libtelemetry C library,
   which exposes an API used by the telemprobd and telempostd daemons. You
   can use this API in your applications as well. The API documentation is
   located in the :file:`telemetry.h` file in the `Telemetrics client`_
   repository.

Examples
********

.. contents::
   :local:
   :depth: 1

Enable or disable telemetry
===========================

#. Enabling during installation:

   During the initial installation of |CL|, you are requested to join the
   stability enhancement program and allow |CL| to collect anonymous reports
   to improve system stability. If you choose not to join this program, then
   the telemetry software bundle is not added to your system. If you do
   choose to join the program, the installer will automatically enable
   telemetry on your system by installing the telemetrics bundle, creating
   the file :file:`/etc/telemetrics/opt-in`, and enabling the telemetrics
   systemd services to run after installation is complete and the system is
   restarted.

#. Enabling after install:

   To install telemetry on your system, run the following commands:

   .. code-block:: bash

      sudo swupd bundle-add telemetrics
      sudo telemctl opt-in
      sudo telemctl start

   This installs the necessary software, enables telemetry by creating the
   file :file:`/etc/telemetrics/opt-in`, and starts the
   :command:`telemprobd` and :command:`telempostd` daemons. Your system will
   begin to send telemetry data to the backend server.

#. Disabling after install:

   To disable both of the telemetry daemons, run the following command:

   .. code-block:: bash

      sudo telemctl stop

#. Opt in to telemetry:

   To opt in to the telemetry services, run the following commands:

   .. code-block:: bash

      sudo telemctl opt-in
      sudo telemctl start

   This creates the :file:`/etc/telemetrics/opt-in` file, if it doesn't
   already exist. You will need to explicitly start the telemetry services
   after you have opted in.

#. Opt out of telemetry:

   To stop sending telemetrics data from your system, opt out of the
   telemetry service:

   .. code-block:: bash

      sudo telemctl opt-out

   This removes the file :file:`/etc/telemetrics/opt-in` and stops the
   telemetry services.

Saving data locally
===================

This example requires |CL| to be installed and telemetry to be enabled on the
system.

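
A quick way to check this prerequisite from a script is sketched below. It
relies only on what this guide states, namely that opting in creates the file
:file:`/etc/telemetrics/opt-in`. The function name ``telemetry_opted_in`` and
its optional directory argument are hypothetical and exist only for
illustration.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper (not part of the telemetrics client): succeed if
   # the opt-in marker file exists. The optional argument overrides the
   # directory, which makes the function easy to try on a scratch directory.
   telemetry_opted_in() {
       local conf_dir="${1:-/etc/telemetrics}"
       [ -f "${conf_dir}/opt-in" ]
   }

   if telemetry_opted_in; then
       echo "opted in"
   else
       echo "not opted in: run 'sudo telemctl opt-in' first"
   fi

The marker file only shows that you opted in. To confirm that the daemons are
actually running, use :command:`sudo telemctl is-active`.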
To change how records are managed, copy the default
:file:`/usr/share/defaults/telemetrics/telemetrics.conf` file to
:file:`/etc/telemetrics/telemetrics.conf` and edit it. The changes in the
:file:`/etc/telemetrics/telemetrics.conf` file will override the built-in
defaults referenced in the
:file:`/usr/share/defaults/telemetrics/telemetrics.conf` file. You will need
root permissions to create and edit files in :file:`/etc`. For each example,
and any time you make changes to the configuration file, you must restart the
client daemons to pick up the changes:

.. code-block:: bash

   sudo telemctl restart

The :command:`telemctl journal` command gives you access to features and
options of the telemetry journal to assist with system analytics and debug.
:command:`telemctl journal` has a number of options to help filter records.
Use :command:`-h` or :command:`--help` to view usage options.

#. Keep a local copy and send records to backend server:

   To keep a local copy of the telemetry record and also send it on to the
   backend server, change the :guilabel:`record_retention_enabled`
   configuration key value to :guilabel:`true`.

#. Keep all records and do not send them to the backend server:

   To keep records on the system without sending them to a backend server,
   set the :guilabel:`record_server_delivery_enabled` key value to
   :guilabel:`false`. Note that you must also set the
   :guilabel:`record_retention_enabled` configuration key value to
   :guilabel:`true`, or the system will not keep local copies.

#. Keep and send records to custom server:

   This assumes you have set up a custom server according to the next
   example. The server is identified by the :guilabel:`server` setting, and
   by default records are sent to the |CL| server
   :guilabel:`server=https://clr.telemetry.intel.com/v2/collector`. To change
   this, set the value to use the IP address or fully qualified domain name
   of your server.

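
The cases above differ only in the values of a few keys, so the edit can be
scripted. The following is a minimal sketch, not an official tool. It assumes
that every key uses the same ``key=value`` syntax shown above for
:guilabel:`server`, and that you run it against a copy of the default file so
that the rest of the file's layout stays intact. The helper ``set_conf_key``
and the ``CONF`` variable are hypothetical names introduced for illustration.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper (not part of the telemetrics client): set KEY=VALUE
   # in a config file, replacing an existing line for KEY or appending one.
   # Assumes GNU sed and values that contain no '|' or '&' characters.
   set_conf_key() {
       local file="$1" key="$2" value="$3"
       touch "$file"
       if grep -q "^${key}=" "$file"; then
           sed -i "s|^${key}=.*|${key}=${value}|" "$file"
       else
           printf '%s=%s\n' "$key" "$value" >> "$file"
       fi
   }

   # CONF points at a scratch file by default so the sketch can be tried
   # safely. On a real system it would be a copy of the default file at
   # /etc/telemetrics/telemetrics.conf, edited as root.
   CONF="${CONF:-./telemetrics.conf}"

   # Second case above: keep records locally, do not deliver to a server.
   set_conf_key "$CONF" record_retention_enabled true
   set_conf_key "$CONF" record_server_delivery_enabled false

After changing the real configuration file, restart the client with
:command:`sudo telemctl restart` so that the daemons pick up the new values.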

Set up a backend server to collect telemetry records
====================================================

For this example, start with a clean installation of |CL| on a new system
using the :ref:`bare-metal-install-server` getting started guide and:

#. Join the :guilabel:`Stability Enhancement Program` to install and enable
   the telemetrics components.

#. Select the manual installation method with the following settings:

   * Set the hostname to :guilabel:`clr-telem-server`.
   * Create an administrative user named :guilabel:`clear` and add this user
     to sudoers.

#. Log in as your administrative user and, from your :file:`$HOME`
   directory, run :command:`git` to clone the
   :guilabel:`telemetrics-backend` repository into the
   :file:`$HOME/telemetrics-backend` directory:

   .. code-block:: console

      git clone https://github.com/clearlinux/telemetrics-backend

   .. note::

      You may need to set up the :envvar:`https_proxy` environment variable
      if you have issues reaching github.com.

#. Change your current working directory to
   :file:`telemetrics-backend/scripts`.

#. Before you install the telemetrics backend with the :file:`deploy.sh`
   script file in the next step, here is an explanation of the options to be
   specified:

   * :command:`-a install` to perform an install
   * :command:`-d clr` to install to a |CL| distro
   * :command:`-H localhost` to set the domain to localhost

   .. caution::

      The :file:`deploy.sh` shell script has minimal error checking and makes
      several changes to your system. Be sure that the options you define on
      the command line are correct before proceeding.

#. Run the shell script from the :file:`$HOME/telemetrics-backend/scripts`
   directory:

   .. code-block:: console

      ./deploy.sh -H localhost -a install -d clr

   The script starts and lists all the defined options and prompts you for
   the :guilabel:`PostgreSQL` database password.

   .. code-block:: console

      Options:
       host: localhost
       distro: clr
       action: install
       repo: https://github.com/clearlinux/telemetrics-backend
       source: master
       type: git
      DB password: (default: postgres):

#. For the :guilabel:`DB password:`, press the :kbd:`Enter` key to accept the
   default password `postgres`.

   .. note::

      The :file:`deploy.sh` script uses :command:`sudo` to run commands and
      you may be prompted to enter your user password at any time while the
      script is executing. If this occurs, enter your user password to
      execute the :command:`sudo` command.

#. After all the server components have been installed, you are prompted to
   enter the :guilabel:`PostgreSQL` database password to change it as
   illustrated below:

   .. code-block:: console

      Enter password for 'postgres' user:
      New password:
      Retype new password:
      passwd: password updated successfully

   Enter `postgres` for the current value of the password and then enter a
   new password. Retype it to verify the new password and the
   :guilabel:`PostgreSQL` database password will be updated.

#. After the installation is complete, you can use your web browser to view
   the new server by opening the browser on the system and typing in
   :command:`localhost` in the address bar. You should see a web page similar
   to the one shown in Figure 2 below.

   .. figure:: /_figures/telemetrics/telemetry-backend-1.png
      :alt: Telemetry UI

      Figure 2: :guilabel:`Telemetry UI`

Create records with telem-record-gen
====================================

The :command:`telemetrics` bundle provides a record generator tool called
`telem-record-gen`. This tool can be used to create records from shell
scripts or the command line when it is not desirable to write a probe in C.
Records are sent to the backend server, and can also be echoed to stdout.

There are three ways to supply the payload to the record:

#. On the command line, use the :command:`-p` option:

   .. code-block:: bash

      telem-record-gen -c a/b/c -n -o -p 'payload goes here'

   .. code-block:: console

      record_format_version: 4
      classification: a/b/c
      severity: 1
      machine_id: FFFFFFFF
      creation_timestamp: 1539023189
      arch: x86_64
      host_type: innotek GmbH|VirtualBox|1.2
      build: 25180
      kernel_version: 4.14.71-404.lts
      payload_format_version: 1
      system_name: clear-linux-os
      board_name: VirtualBox|Oracle Corporation
      cpu_model: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
      bios_version: VirtualBox
      event_id: 2236710e4fc11e4a646ce956c7802788

      payload goes here

#. Specify a file that contains the payload with the option
   :command:`-P path/to/file`.

   .. code-block:: bash

      telem-record-gen -c a/b/c -n -o -P ./payload_file.txt

   .. code-block:: console

      record_format_version: 4
      classification: a/b/c
      severity: 1
      machine_id: FFFFFFFF
      creation_timestamp: 1539023621
      arch: x86_64
      host_type: innotek GmbH|VirtualBox|1.2
      build: 25180
      kernel_version: 4.14.71-404.lts
      payload_format_version: 1
      system_name: clear-linux-os
      board_name: VirtualBox|Oracle Corporation
      cpu_model: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
      bios_version: VirtualBox
      event_id: d73d6040afd7693cccdfece479df9795

      payload read from file

#. If the :command:`-p` or :command:`-P` options are absent, the tool reads
   from stdin, so you can use it with a heredoc in scripts.

   .. code-block:: bash

      telem-record-gen -c a/b/c -n -o << HEOF
      payload read from stdin
      HEOF

   .. code-block:: console

      record_format_version: 4
      classification: a/b/c
      severity: 1
      machine_id: FFFFFFFF
      creation_timestamp: 1539023621
      arch: x86_64
      host_type: innotek GmbH|VirtualBox|1.2
      build: 25180
      kernel_version: 4.14.71-404.lts
      payload_format_version: 1
      system_name: clear-linux-os
      board_name: VirtualBox|Oracle Corporation
      cpu_model: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
      bios_version: VirtualBox
      event_id: 2f070e8e71679f2b1f28794e3a6c42ee

      payload read from stdin

Set a static machine id
=======================

The machine id reported by the telemetry client is rotated every three days
for privacy reasons.
If you wish to have a static machine id for testing purposes, you can opt in
by creating a file named :file:`opt-in-static-machine-id` in the directory
:file:`/etc/telemetrics/`.

#. Create the :file:`/etc/telemetrics` directory if it does not already
   exist.

   .. code-block:: bash

      sudo mkdir -p /etc/telemetrics

#. Create the file and replace the "unique machine ID" with your desired
   static machine ID.

   .. code-block:: bash

      echo "unique machine ID" | sudo tee /etc/telemetrics/opt-in-static-machine-id

   .. note::

      The machine ID is different from the system hostname.

Instrument your code with the libtelemetry API
==============================================

Prerequisites
-------------

Confirm that the telemetrics header file is located on the system at
:file:`/usr/include/telemetry.h`. The `latest version`_ of the file can also
be found on github for reference, but installing the :command:`telemetrics`
bundle will install the header file that matches your |CL| version.

#. Includes and variables:

   You must include the following headers in your code to use the API:

   .. code-block:: c

      #define _GNU_SOURCE      /* needed for asprintf() */
      #include <stdlib.h>
      #include <stdio.h>
      #include <string.h>
      #include <telemetry.h>

   Use the following code to create the variables needed to hold the data
   for the record to be created:

   .. code-block:: c

      uint32_t severity = 1;
      uint32_t payload_version = 1;
      char classification[30] = "org.clearlinux/hello/world";
      struct telem_ref *tm_handle = NULL;
      char *payload;
      int ret = 0;

   Severity:
      Type: uint32_t

      Value: Severity field value. Accepted values are in the range 1-4
      (low, med, high, crit), with 1 being the lowest severity and 4 being
      the highest severity. Values outside of this range are clamped to 1 or
      4.

   Payload_version:
      Type: uint32_t

      Value: Payload format version. The only currently supported value is
      1, which indicates that the payload is a freely-formatted
      (unstructured) string. Values greater than 1 are reserved for future
      use.

   Classification:
      Type: char array

      Value: It should have the form DOMAIN/PROBENAME/REST, where DOMAIN is
      the reverse domain to use as a namespace for the probe (e.g.
      org.clearlinux), PROBENAME is the name of the probe, and REST is an
      arbitrary value that the probe should use to classify the record. The
      maximum length for the classification string is 122 bytes. Each
      sub-category may be no longer than 40 bytes. Two ``/`` delimiters are
      required.

   Tm_handle:
      Type: struct telem_ref pointer

      Value: Struct pointer declared by the caller. The struct is
      initialized if the function returns success.

   Payload:
      Type: char pointer

      Value: The payload to set.

#. For this example, we'll set the payload to "hello" by using
   :command:`asprintf()`:

   .. code-block:: c

      if (asprintf(&payload, "hello\n") < 0) {
              exit(EXIT_FAILURE);
      }

   The functions :command:`asprintf()` and :command:`vasprintf()` are
   analogs of :command:`sprintf(3)` and :command:`vsprintf(3)`, except that
   they allocate a string large enough to hold the output including the
   terminating null byte (``'\0'``), and return a pointer to it via the
   first argument. This pointer should be passed to :command:`free(3)` to
   release the allocated storage when it is no longer needed.

#. Create the new telemetry record:

   The function :command:`tm_create_record()` initializes a telemetry record
   and sets the severity and classification of that record, as well as the
   payload version number. The memory needed to store the telemetry record
   is allocated and should be freed with :command:`tm_free_record()` when no
   longer needed.

   .. code-block:: c

      if ((ret = tm_create_record(&tm_handle, severity, classification,
                                  payload_version)) < 0) {
              printf("Failed to create record: %s\n", strerror(-ret));
              ret = 1;
              goto fail;
      }

#. Set the payload field of a telemetrics record:

   The function :command:`tm_set_payload()` attaches the provided telemetry
   record data to the telemetry record. The current maximum payload size is
   8192 bytes.

   .. code-block:: c

      if ((ret = tm_set_payload(tm_handle, payload)) < 0) {
              printf("Failed to set record payload: %s\n", strerror(-ret));
              ret = 1;
              goto fail;
      }
      free(payload);

   The :command:`free()` function frees the memory space pointed to by
   `ptr`, which must have been returned by a previous call to
   :command:`malloc()`, :command:`calloc()`, or :command:`realloc()`.
   Otherwise, or if :command:`free(ptr)` has already been called before,
   undefined behavior occurs. If `ptr` is NULL, no operation is performed.

#. Send a record to the telemetrics daemon:

   The function :command:`tm_send_record()` delivers the record to the local
   :command:`telemprobd(1)` service. Since the telemetry record was
   allocated by the program, it should be freed with
   :command:`tm_free_record()` when it is no longer needed.

   .. code-block:: c

      if ((ret = tm_send_record(tm_handle)) < 0) {
              printf("Failed to send record to daemon: %s\n", strerror(-ret));
              ret = 1;
              goto fail;
      } else {
              printf("Successfully sent record to daemon.\n");
              ret = 0;
      }

      fail:
              tm_free_record(tm_handle);
              tm_handle = NULL;
              return ret;

#. A full sample application with compiling flags:

   Create a new file :file:`test.c` and add the following code:

   .. code-block:: c

      #define _GNU_SOURCE      /* needed for asprintf() */
      #include <stdlib.h>
      #include <stdio.h>
      #include <string.h>
      #include <telemetry.h>

      int main(int argc, char **argv)
      {
              uint32_t severity = 1;
              uint32_t payload_version = 1;
              char classification[30] = "org.clearlinux/hello/world";
              struct telem_ref *tm_handle = NULL;
              char *payload;
              int ret = 0;

              /* Build the payload string. */
              if (asprintf(&payload, "hello\n") < 0) {
                      exit(EXIT_FAILURE);
              }

              /* Allocate and initialize the record. */
              if ((ret = tm_create_record(&tm_handle, severity, classification,
                                          payload_version)) < 0) {
                      printf("Failed to create record: %s\n", strerror(-ret));
                      ret = 1;
                      goto fail;
              }

              /* Attach the payload to the record. */
              if ((ret = tm_set_payload(tm_handle, payload)) < 0) {
                      printf("Failed to set record payload: %s\n", strerror(-ret));
                      ret = 1;
                      goto fail;
              }
              free(payload);

              /* Hand the record to the local telemprobd daemon. */
              if ((ret = tm_send_record(tm_handle)) < 0) {
                      printf("Failed to send record to daemon: %s\n", strerror(-ret));
                      ret = 1;
                      goto fail;
              } else {
                      printf("Successfully sent record to daemon.\n");
                      ret = 0;
              }

      fail:
              tm_free_record(tm_handle);
              tm_handle = NULL;
              return ret;
      }

   Compile with the gcc compiler, using this command:

   .. code-block:: bash

      gcc test.c -ltelemetry -o test_telem

   Test to ensure the program is working:

   .. code-block:: bash

      ./test_telem
      Successfully sent record to daemon.

.. note::

   A full example of the `heartbeat probe`_ in C is documented in the source
   code.

Reference
*********

.. contents::
   :local:
   :depth: 1

The telemetry API
=================

Installing the :command:`telemetrics` bundle includes the libtelemetry C
library, which exposes an API used by the telemprobd and telempostd daemons.
You can use this API in your applications as well. The API documentation is
found in the :file:`telemetry.h` file in the `Telemetrics client`_
repository.

Client configuration
====================

The telemetry client will look for the configuration file located at
:file:`/etc/telemetrics/telemetrics.conf` and use it if it exists. If the
file does not exist, the client will use the default configuration defined at
build time.
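
The lookup order described above can be sketched as a small shell function.
This only mirrors the documented behavior and does not reflect how the client
itself is implemented. The function name ``effective_conf`` and its optional
path argument are hypothetical.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: report which configuration source applies,
   # following the order described in this section. The optional argument
   # overrides the custom file path so the function can be tried safely.
   effective_conf() {
       local custom="${1:-/etc/telemetrics/telemetrics.conf}"
       if [ -f "$custom" ]; then
           echo "$custom"
       else
           echo "built-in defaults"
       fi
   }

   effective_conf

This is mainly useful in troubleshooting scripts, to confirm whether a custom
file is in place before looking for a misconfigured key.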
A sample configuration file is located at
:file:`/usr/share/defaults/telemetrics/telemetrics.conf`. It contains the
default values that are used when the programs are built. To modify or
customize the configuration, copy the file from
:file:`/usr/share/defaults/telemetrics/telemetrics.conf` to the file
:file:`/etc/telemetrics/telemetrics.conf` and edit it to add your
customizations.

.. code-block:: bash

   sudo mkdir -p /etc/telemetrics
   sudo cp /usr/share/defaults/telemetrics/telemetrics.conf /etc/telemetrics/telemetrics.conf

.. note::

   Telemetrics configuration is a layered mechanism since the defaults are
   defined at build time and each field can be overwritten individually.
   Therefore you only need to add the specific field that you want to change
   from the default value to your customized value in the
   :file:`/etc/telemetrics/telemetrics.conf` file.

Configuration options
---------------------

The client can use the following configuration options from the config file:

server
   This specifies the web server to which telempostd sends the telemetry
   records.

socket_path
   This specifies the path of the unix domain socket on which telemprobd
   listens for connections from the probes.

spool_dir
   This configuration option is related to spooling. If the daemon is not
   able to send the telemetry records to the backend server for reasons such
   as network availability, then it stores the records in a spool directory.
   This option specifies the path of the spool directory. This directory
   should be owned by the same user as the daemon.

record_expiry
   This is the time, in minutes, after which the records in the spool
   directory are deleted by the daemon.

spool_process_time
   This specifies the time interval, in seconds, that the daemon waits
   before checking the spool directory for records. The daemon picks up the
   records in the order of modification date and tries to send the record to
   the server. It sends a maximum of 10 records at a time.
   If it sends a record successfully, it deletes the record from the spool.
   If the daemon finds a record older than the "record_expiry" time, then it
   deletes that record. The daemon looks at a maximum of 20 records in a
   single spool run loop.

rate_limit_enabled
   This determines whether rate-limiting is enabled or disabled. When
   enabled, there is a threshold on both records sent within a window of
   time, and record bytes sent within a window of time.

record_burst_limit
   This is the maximum number of records allowed to be passed by the daemon
   within the record_window_length of time. If set to -1, the rate-limiting
   for record bursts is disabled.

record_window_length
   This is the time, in minutes (0-59), that establishes the window length
   for the record_burst_limit. For example, if record_burst_limit=1000 and
   record_window_length=15, then no more than 1000 records can be passed
   within any given fifteen-minute window.

byte_burst_limit
   This is the maximum number of bytes that can be passed by the daemon
   within the byte_window_length of time. If set to -1, the rate-limiting
   for byte bursts is disabled.

byte_window_length
   This is the time, in minutes (0-59), that establishes the window length
   for the byte_burst_limit.

rate_limit_strategy
   This is the strategy chosen once the rate-limiting threshold has been
   reached. Currently the options are 'drop' or 'spool', with spool being
   the default. If spool is chosen, records will be spooled and sent at a
   later time.

record_retention_enabled
   When this key is enabled (true), the daemon saves a copy of the payload
   on disk from all valid records. To avoid the excessive use of disk space,
   only the latest 100 records are kept. The default value for this
   configuration key is false.

record_server_delivery_enabled
   This key controls the delivery of records to the server; when enabled
   (default value), the record will be posted to the address in the
   configuration file.
   If this configuration key is disabled (false), records will not be
   spooled or posted to backend. This configuration key can be used in
   combination with record_retention_enabled to keep copies of telemetry
   records locally only.

.. note::

   Configuration options may change as the telemetry client evolves. Please
   use the comments in the default file itself as the most accurate
   reference for configuration.

Client run-time options
=======================

The |CL| telemetry client provides an admin tool called
:guilabel:`telemctl` for managing the telemetry services and probes. The
tool is located in :file:`/usr/bin`. Running it with no argument results in
the following:

.. code-block:: bash

   sudo telemctl

.. code-block:: console

   /usr/bin/telemctl - Control actions for telemetry services
     stop       Stops all running telemetry services
     start      Starts all telemetry services
     restart    Restarts all telemetry services
     is-active  Checks if telemprobd and telempostd are active
     opt-in     Opts in to telemetry, and starts telemetry services
     opt-out    Opts out of telemetry, and stops telemetry services
     journal    Prints telemetry journal contents. Use -h argument for more
                options

start/stop/restart
------------------

The commands to start, stop, and restart the telemetry services manage all
required services and probes on the system. There is no need to separately
start/stop/restart the two client daemons telemprobd and telempostd. The
:command:`restart` command option will call :command:`telemctl stop`
followed by :command:`telemctl start`.

is-active
---------

The :command:`is-active` option reports whether the two client daemons are
active. This is useful to verify that the :command:`opt-in` and
:command:`opt-out` options have taken effect, or to ensure that telemetry is
functioning on the system. Note that both daemons are verified.

.. code-block:: bash

   sudo telemctl is-active

.. code-block:: console

   telemprobd : active
   telempostd : active

.. _Telemetrics client: https://github.com/clearlinux/telemetrics-client/

.. _latest version: https://github.com/clearlinux/telemetrics-client/tree/master/src

.. _heartbeat probe: https://github.com/clearlinux/telemetrics-client/tree/master/src/probes/hello.c
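
For scripted health checks, the :command:`telemctl is-active` output can be
parsed. The sketch below assumes the output consists of exactly the
``name : state`` lines shown in this guide, which may differ between client
versions. The function name ``both_daemons_active`` is hypothetical.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: read `telemctl is-active` output on stdin and
   # succeed only if both telemprobd and telempostd are reported as active.
   both_daemons_active() {
       awk -F' *: *' '
           $1 == "telemprobd" && $2 == "active" { probd = 1 }
           $1 == "telempostd" && $2 == "active" { postd = 1 }
           END { exit !(probd && postd) }
       '
   }

   # Demonstration using the sample output from this guide rather than a
   # live system.
   printf 'telemprobd : active\ntelempostd : active\n' | both_daemons_active \
       && echo "telemetry is running"

On a live system, the same check would be
``sudo telemctl is-active | both_daemons_active``.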