.. _performance:

Performance
###########

|CL-ATTR| is built with optimizations across the whole stack for improved
performance. |CL| achieves its performance through a variety of design
decisions and software building techniques.

.. contents::
   :local:
   :depth: 1

Overview
********

The |CL| philosophy is to do everything with performance in mind. The |CL|
team applies this philosophy in the project's codebase and operating culture.

Below are some examples of the |CL| philosophy:

**Consider performance holistically.** Performance optimizations are
considered across hardware and software. |CL| shows the performance potential
of a holistic approach on Linux, using Intel® architecture with optimizations
across the full stack.

**Optimize for runtime performance.** In general, |CL| will trade the one-time
cost of longer build time and larger storage footprint for the repeated
benefit of improved runtime performance. |CL| users benefit from the optimized
software but aren't affected by the increased build time because the |CL| team
builds the software before distributing it to |CL| clients.

**Optimize performance for server and cloud use cases first.** Design
decisions that optimize performance for server and cloud also benefit other
use cases, such as IoT devices and desktop clients.

|CL| has become well-known for the performance it can deliver.
`Phoronix publishes Linux performance comparisons `_ that include |CL|.

Software build toolchain
************************

|CL| uses many techniques in its software build toolchain to improve software
performance, such as aggressive compiler flags and CPU-specific optimizations.

If maintained manually, these techniques can become complex to support due to
the volume of packages and the potential for technical drift of package
performance configurations. The |CL| team built the :ref:`autospec` tool to
manage this complexity and to apply the techniques used in the software build
toolchain across the entire project.
autospec is available as part of the OS for developers to use when they build
their own projects on |CL|.

Latest versions of compilers and low-level libraries
====================================================

|CL| is a rolling release distribution and follows upstream software
repositories, including compilers and libraries, for updates. |CL| includes
upstream source-level optimizations as soon as they're available.

A benchmark approach to compiler performance
============================================

|CL| chooses the compiler used to build each software package on a
case-by-case basis to maximize performance.

Typically, |CL| uses the open source `GNU Compiler Collection `_ (GCC) with
the standard low-level libraries `Glibc `_ and `libstdc++ `_ for C and C++
programming languages. If there is a performance advantage, |CL| will build
packages with `Clang / LLVM `_.

|CL| uses patched compilers and low-level libraries for exact control of the
software build. Patches include changes that default to more aggressive
optimizations or optimizations that haven't yet been merged upstream. View the
full list of patches in the autospec repositories on GitHub:

* https://github.com/clearlinux-pkgs/gcc
* https://github.com/clearlinux-pkgs/glibc
* https://github.com/clearlinux-pkgs/llvm

Aggressive compiler flags
=========================

|CL| uses aggressive `compiler flags `_ to optimize software builds for
runtime performance. Some significant flags that |CL| often implements are:

`mtune and march `_
   Options used to tune generated code with optimized instructions for
   specific CPU types instead of creating generic code for maximum
   compatibility.

   |CL| defines its minimum hardware requirements to be second-generation
   Intel® microarchitecture code name Westmere (released in 2010) or later.
   This enables compiler optimizations that are available only on newer
   architectures. Whenever possible, |CL| tunes code for the Haswell
   generation processors or newer.

   |CL| sets :command:`march=westmere` and :command:`mtune=haswell`.

   .. note::

      |CL| doesn't require Advanced Encryption Standard (AES), so it should
      run on some Intel CPUs from the first generation of Intel®
      microarchitecture code name Nehalem (released in 2008). Refer to the
      `recommended minimum system requirements `_ for specific requirements.

`O3 `_
   The largest preset of compiler optimization options for performance. O3
   favors runtime performance. View the "Optimize Options" section of the GCC
   man page for additional information: :command:`man gcc`.

`LTO `_
   Link-time optimization, which optimizes between the compilation of object
   files and the creation of executable binaries by adding extra information
   to the compiled objects to help the linker.

`PGO `_
   Profile-guided optimization or field guided optimization performs
   optimization based on information sampled during the execution of the
   program.

Compiler flags are set at different levels in the |CL| build environment:

User flags
   The set of default flags used by |CL| when a user compiles software from
   source. The flags are exported as system-wide environment variables from
   the `/usr/share/defaults/etc/profile `_ file to the user’s shell by
   default. These are the standard variables read by the compiler, named
   :command:`*FLAGS`, depending on the compiler.

   .. note::

      Source code may come with software build systems that override these
      values, so the flags actually used may differ from those expected. The
      |CL| autospec tooling will attempt to ignore these overrides, but the
      build system may still need patching. A manual build will not ignore
      the build system override values if they exist.

Global flags
   Compiler flags applied at a global level for all packages. The |CL| RPM
   configuration (`clr-rpm-config `_) contains global compiler flags. Search
   the :file:`macros` file for :command:`global_cflags` and search the
   :file:`rpmrc` file for :command:`optflags`. Global compiler flags may be
   overridden.

   .. note::

      |CL| doesn't use RPMs to install software. |CL| distributes software in
      the form of :ref:`bundles-guide`. The RPM format is only used during
      the |CL| build process as a way to resolve dependencies.

Per-package flags
   Compiler flags applied at a per-package level. The package's autospec
   repository contains the package-specific compiler flags. Search the
   :file:`.spec` file for the section starting with
   :command:`export CFLAGS`.

Multiple builds of libraries with CPU-specific optimizations
============================================================

To fully use the capabilities in different generations of CPU hardware, |CL|
will perform multiple builds of libraries with CPU-specific optimizations.
For example, |CL| builds libraries with Intel® Advanced Vector Extensions 2
(Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512).
|CL| can then dynamically link to the library with the newest optimization
based on the processor in the running system.

Runtime libraries used by ordinary applications benefit from these
CPU-specific optimizations.

The autospec repository for Python* shows an example of this optimization:
https://github.com/clearlinux-pkgs/python3

Kernel
******

A modern kernel with variants optimized for different platforms
===============================================================

|CL| is a rolling release distribution that uses the newest upstream Linux
kernel. The Linux kernel has frequent updates, which can include performance
enhancements. It's a policy of the |CL| team to try to upstream any
performance enhancements in the Linux kernel for all to use.

|CL| `builds different kernel variants `_ for compatibility with specific
platforms. For example, kernels meant to run on virtual machines skip support
for much of the physical hardware that doesn’t show up in VM environments and
would otherwise slow down boot.
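
As a rough illustration of how kernel variants can be told apart on a running
system, the sketch below extracts the variant name from the kernel release
string. It assumes, without confirmation from this document, that |CL|
release strings end in a dot-separated variant suffix such as ``native`` or
``kvm`` (for example ``5.3.8-854.native``). The function name
``kernel_variant`` and the release pattern are hypothetical and chosen for
illustration only.

```python
import platform
import re
from typing import Optional

# Assumed release format: <version>-<build>.<variant>, e.g. "5.3.8-854.native".
# This pattern is an illustrative assumption, not a documented contract.
_RELEASE_RE = re.compile(r"^\d+(?:\.\d+)*-\d+\.(?P<variant>[A-Za-z0-9_-]+)$")


def kernel_variant(release: str) -> Optional[str]:
    """Return the variant suffix of a kernel release string, or None."""
    match = _RELEASE_RE.match(release)
    return match.group("variant") if match else None


if __name__ == "__main__":
    release = platform.release()  # the same string that `uname -r` reports
    print(f"{release}: variant = {kernel_variant(release)}")
```

On a system whose release string does not follow the assumed pattern, the
function returns ``None`` rather than guessing.
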

View the kernel configuration and patches to the default native kernel in the
autospec repository: https://github.com/clearlinux-pkgs/linux/

Utility to enforce kernel runtime parameters
============================================

The Linux kernel exposes parameters for tuning the behavior of drivers and
devices, such as certain buffers and resource management strategies.

|CL| uses a small utility, `clr-power-tweaks `_, to set and enforce kernel
parameter values weighted towards performance upon boot. View the values that
are set by running :command:`sudo clr_power --debug`.

Operating system
****************

Operating system and software build-time optimizations set the stage for high
performance. Decisions made after the installation of |CL| are equally
important.

CPU performance governor
========================

|CL| uses the performance CPU governor, which calls for the CPU to operate at
maximum clock frequency, in other words, P-state P0. The idea behind
prioritizing maximum CPU performance is that the faster a program finishes
execution, the faster the CPU can return to a low energy idle state. See the
`CPU Power and Performance documentation `_ for further details.

Restructured boot sequence
==========================

To optimize boot speed, |CL| uses a restructured order for boot processes that
minimizes the time services wait on slow operations and the time boot
processes wait on each other.

Systemd-bootchart is a tool that graphs the boot sequence and writes logs to
a file under :file:`/run/log`. The tool and corresponding log file make
diagnosing slow boot problems easier. All |CL| systems have
`systemd-bootchart `_ enabled by default for every boot. The
systemd-bootchart configuration is non-blocking, so it does not materially
slow down boot.
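
To locate the boot graph from the most recent boot, a short script can scan
the log directory. The sketch below assumes the output files are SVG files
named ``bootchart-<timestamp>.svg`` under :file:`/run/log`. That naming
pattern and the helper name ``latest_bootchart`` are assumptions made for
illustration, so adjust them to match what your system actually writes.

```python
from pathlib import Path
from typing import Optional


def latest_bootchart(log_dir: str = "/run/log") -> Optional[Path]:
    """Return the newest bootchart file in log_dir, or None if none exist.

    Assumes file names embed a sortable timestamp
    (bootchart-YYYYMMDD-HHMM.svg), so the lexicographically largest
    name is the most recent chart.
    """
    directory = Path(log_dir)
    if not directory.is_dir():
        return None
    charts = sorted(directory.glob("bootchart-*.svg"))
    return charts[-1] if charts else None


if __name__ == "__main__":
    chart = latest_bootchart()
    print(chart if chart else "No bootchart files found")
```

Sorting by name instead of modification time keeps the result stable even if
the files are copied elsewhere for later analysis.
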

Related topics
**************

* :ref:`cpu-performance`
* `A Linux* OS for Linux Developers `_
* `The Performance Race `_
* `Boosting Python* from profile-guided to platform-specific optimizations `_
* `Transparent use of library packages optimized for Intel® architecture `_

*Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries.*