Maintenance, Power and Performance

Toolchains—a fresh wind in the sails of a new tech world

06 Oct, 2019

by Victor Rodriguez Bahena

This blog describes the latest features of GNU* Compiler Collection (GCC) and GNU C Library (GLIBC).

Introduction

The technology industry evolves rapidly. Every day, new mobile and cloud technologies are devised to solve challenging problems. All of these software projects must first be built, and that requires tools. Compilers, assemblers, linkers, and libraries are some of the elements in a developer's toolbox. The open source community behind the GNU toolchain projects uses this innovation to propel the core of the technology we use every day.

A broad range of open source projects are built using the GNU toolchains. To give some examples, the operating system kernel, image processing libraries, and web server-side scripting languages are built with GNU toolchains. Every year, new functional features, performance improvements, and security protections are released in the key toolchain projects: GNU Compiler Collection (GCC) and GNU C Library (GLIBC).

In the Clear Linux* Project, we decided to use and improve the latest GCC compiler technology to boost the performance and security of a Linux-based system for open source developers. We encourage users to employ the latest technologies that can improve applications for customers by boosting their performance and by providing a more robust layer of protection against security attacks. The following examples showcase some of these features.

GLIBC

getcpu wrapper function

One of the first features listed on the GNU C Library 2.29 release notes is the getcpu() wrapper function. The getcpu() function identifies the processor and node on which the calling thread or process is currently running. This functionality is shown in the following example:

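#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>

int main(void){
      unsigned int cpu;   /* CPU the calling thread is running on */
      unsigned int node;  /* NUMA node that CPU belongs to */

      /* getcpu() writes through its arguments, so pass the addresses
         of real variables rather than uninitialized pointers */
      if (getcpu(&cpu, &node) != 0) {
            perror("getcpu");
            return 1;
      }
      printf("cpu : %u\n", cpu);
      printf("node : %u\n", node);
      return 0;
}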

This functionality has been available in the GNU C Library since version 2.29. It provides a fast way to identify the CPU and node on which the current process is running. With this GLIBC feature, it is possible to take advantage of this CPU information at the user space level to work on code optimizations on a multi-node NUMA (Non-Uniform Memory Access) system.

Optimized mathematical functions

Another feature listed in the GNU C Library release notes is a set of optimizations to the generic mathematical functions. One of these functions is sincosf(), which serves the many applications that need both the sine and the cosine of the same angle x. This function computes both at the same time and stores the results through the *sin and *cos pointers. Its prototype is:

     void sincosf(float x, float *sin, float *cos);

The core algorithm of this function is in the s_sincosf.c file and the s_sincosf.h header, where it uses the sincosf_poly function. This function computes the sine and cosine using the polynomial P algorithm [3]. There are some advantages to using the polynomial method. First, the memory requirements for implementing such polynomials are quite small. Also, polynomials only require multiplication, addition, and subtraction of floating-point numbers, which normally take very few CPU cycles on processors with floating-point units. The implementation of this algorithm is the sincosf_poly function in s_sincosf.h.

A series of patches was merged into the latest version of the GNU C Library. For x86_64 platforms, the patches showed that the generic C implementation was faster than s_sincosf-sse2.S, which was written in assembly. At the same time, a set of patches was integrated to update s_sincosf.h to use generic vector computations, and to use the generic s_sincosf.c for s_sincosf-fma.c. An example of code using this function could be:

#define _GNU_SOURCE
#include <math.h>
#include <stdio.h>

int main(int argc, const char * argv[]){
    float value = 0.5;

    float _cosine;
    float _sine;

    sincosf(value, &_sine, &_cosine);

    printf("The sine of %f is %f\n", value,_sine);
    printf("The cosine of %f is %f\n", value,_cosine);

    return 0;
}

During the latest GLIBC release, there were several optimizations on mathematical functions such as exp, exp2, log, log2, pow, sinf, cosf, and tanf. This blog just presents the background of one of those optimizations. More information about others is public on the GLIBC repository.

Optimized string manipulation functions with Intel® AVX2 technology

Applications often need to manipulate strings. To help with this, the C library provides a large number of string handling functions, which reduce the complexity, the risk of errors, and the time wasted in recreating popular functions. During the development of GLIBC 2.29, there were optimizations to x86-64 string functions such as strcat/strncat, strcpy/strncpy, and stpcpy/stpncpy by using Intel® Advanced Vector Extensions 2 (Intel® AVX2) technology. After this change, the functions use vector comparison as much as possible. In general, the larger the source string, the greater the performance gain observed compared to the SSE2 unaligned routines.

GCC

Every year, the Linux* community awaits the release of a new version of the GNU Compiler Collection (GCC). The GCC community works hard to provide usability improvements, bug fixes, new security features, and performance improvements.

The GCC 9 Release Series changes list includes a full list of changes, new features, and fixes for this release. This blog provides some code examples to show how to use some of the new compiler features.

Improve diagnostics and debugging information

One of the new features is the improvement in diagnostics and debugging information provided to developers. GCC's diagnostics now print source code with a left margin showing code line numbers. For example, with the GCC 9 defaults, if you compile code that is missing a semicolon ( ; ):

 1 #include <stdio.h>
 2
 3 int main(){
 4  int a = 0;
 5  printf("%d\n",a)
 6 }

You get this output:

diagnostic.c: In function ‘main’:
diagnostic.c:5:18: error: expected ‘;’ before ‘}’ token
    5 | printf("%d\n",a)
      |         ^
      |         ;
    6 | }
      |

The above example shows the exact line number where the error exists. If you don't want this information, you can disable line numbers with the flag -fno-diagnostics-show-line-numbers:

 $ gcc diagnostic.c -o diagnostic -fno-diagnostics-show-line-numbers
diagnostic.c: In function ‘main’:
diagnostic.c:5:18: error: expected ‘;’ before ‘}’ token
  printf("%d\n",a)
                  ^
                  ;
 }
 ~

Another feature introduced in GCC 9 is the new option -fdiagnostics-format=FORMAT, which selects the format used for printing diagnostics. FORMAT is ‘text’ (the default) or ‘json’, which emits diagnostics in a machine-readable format. If you have only a few warnings, you can handle them by reading them on the screen. However, if you have many, you might need to post-process them with scripts, and the JSON format makes that easier. The next example prints the report in JSON format:

#include <stdio.h>

void foo(int a){
      a +10;
}

void main(){
      char *a;
      foo(a);
}

Compiling the example source code above generates this warning:

diagnostic.c: In function ‘main’:
diagnostic.c:8:6: warning: passing argument 1 of ‘foo’ makes integer from pointer without a cast [-Wint-conversion]
    8 | foo(a);
      |   ^
      |   |
      |   char *
diagnostic.c:3:14: note: expected ‘int’ but argument is of type ‘char *’
    3 | void foo(int a){
      |

Using the flag -fdiagnostics-format=json generates the next output:

$ gcc diagnostic.c -fdiagnostics-format=json
[{"kind": "warning", "option": "-Wint-conversion", "children": [{"kind": "note", "locations": [{"caret": {"line": 3, "file": "diagnostic.c", "column": 14}, "start": {"line": 3, "file": "diagnostic.c", "column": 10}}], "message": "expected ‘int’ but argument is of type ‘char *’"}], "locations": [{"caret": {"line": 8, "file": "diagnostic.c", "column": 6}, "label": "char *"}], "message": "passing argument 1 of ‘foo’ makes integer from pointer without a cast"}]

Opened in a JSON editor, the same output looks like this:

[ 
   { 
      "kind":"warning",
      "option":"-Wint-conversion",
      "children":[ 
         { 
            "kind":"note",
            "locations":[ 
               { 
                  "caret":{ 
                     "line":3,
                     "file":"diagnostic.c",
                     "column":14
                  },
                  "start":{ 
                     "line":3,
                     "file":"diagnostic.c",
                     "column":10
                  }
               }
            ],
            "message":"expected ‘int’ but argument is of type ‘char *’"
         }
      ],
      "locations":[ 
         { 
            "caret":{ 
               "line":8,
               "file":"diagnostic.c",
               "column":6
            },
            "label":"char *"
         }
      ],
      "message":"passing argument 1 of ‘foo’ makes integer from pointer without a cast"
   }
]

Information on inlining decisions

The new release of GCC also makes numerous improvements to the information provided by the -fopt-info flag. In the new log, messages are prefaced with optimized, missed, or note rather than the old behavior of all messages being prefixed with the same note label.

One example where the optimized label is used is in reporting inlining decisions. Inlining is when the compiler decides to place a copy of a function's body at each place where the function is called. Functions that are good candidates for inlining tend to be small, so their bodies can be substituted at the call site without much code growth.

Let’s take the next simple code block as an example and see if the compiler made the inline optimization:

#include <stdio.h>
void foo( int * x ){
    *x = *x + 10;
}

int main() {
      int var = 0;
      for(int cnt=0;cnt<100;cnt++){
            foo(&var);
      }
      printf("var is now %d\n",var);
      return 0;
}

$ gcc inline.c -O2 -fopt-info-inline-all
inline.c:14:3: note: Considering inline candidate foo/11.
inline.c:14:3: optimized: Inlining foo/11 into main/12.
inline.c:17:2: missed: not inlinable: main/12 -> printf/13, function body not available

Unit growth for small function inlining: 16->16 (0%)

Inlined 1 calls, eliminated 0 functions

The compiler found that the foo function is small and is called from inside a loop (in this case, 100 iterations), and decided to inline it into main. This is not the case for printf, which is marked as missed and cannot be inlined because its function body is not available to the compiler.

Another example where the optimized and missed labels appear in the log is vectorization. The output from the vectorizer has been rationalized, so failed attempts to vectorize a loop are displayed in the form shown below:

[LOOP-LOCATION]: couldn't vectorize this loop
[PROBLEM-LOCATION]: because of [REASON]

An example of this is shown in the simple code block below:

$ cat vect.c
#define MAX 1000000

int a[256], b[256], c[256];

void foo()
{
    int i,x;
    for (x=0; x<MAX; x++)
    {
        for (i=0; i<256; i++)
        {
            a[i] = b[i] + c[i];
        }
     }
}

int main()
{
    foo();
    return 0;
}

When you compile with the -O2 -ftree-vectorize -fopt-info-all-vec flags, all the debug information is displayed:

$ gcc -O2 -ftree-vectorize -fopt-info-all-vec vect.c
vect.c:8:5: missed: couldn't vectorize loop
vect.c:12:18: missed: not vectorized: complicated access pattern.
vect.c:10:9: optimized: loop vectorized using 16 byte vectors
vect.c:5:6: note: vectorized 1 loops in function.
vect.c:8:5: missed: couldn't vectorize loop
vect.c:12:18: missed: not vectorized: complicated access pattern.
vect.c:10:9: optimized: loop vectorized using 16 byte vectors
vect.c:17:5: note: vectorized 1 loops in function.

It is crucial for some applications to clearly see the internal steps the compiler takes for optimization. Because of this, a new option, -fsave-optimization-record, was added. It writes a <SRCFILE>.opt-record.json.gz file describing the optimization decisions made by GCC. This is similar to the output of -fopt-info, but with additional metadata, such as the inlining chain and profile information (if available).

Using the previous example, the command is:

$ gcc -O2 -ftree-vectorize -fopt-info-all-vec vect.c -fsave-optimization-record

This generates the file:

vect.c.opt-record.json.gz

Improvements on code generation

GCC also has improvements related to code generation. One example is the improvement made to switch statement conversion. In this new release of GCC, switch statements can be translated to a linear function expression through the -ftree-switch-conversion flag. This means that the compiler tries to find a linear function a * x + b that produces the values assigned in the switch. Let's take one of the examples proposed in the patch that introduced this change.

int foo (int how) {
  switch (how) {
    case 2: how = 205; break;
    case 3: how = 305; break;
    case 4: how = 405; break;
    case 5: how = 505; break;
    case 6: how = 605; break;
  }
  return how;
}

void main(){
      int var = 3;
      foo(var);
}

When we compile the code block above without the -ftree-switch-conversion flag, the foo function is generated as:

0000000000001129 <foo>:
    1129:   83 ff 06          cmp  $0x6,%edi
    112c:   77 34             ja  1162 <foo+0x39>
    112e:   89 fa             mov  %edi,%edx
    1130:   48 8d 0d cd 0e 00 00    lea  0xecd(%rip),%rcx    # 2004 <_IO_stdin_used+0x4>
    1137:   48 63 04 91       movslq (%rcx,%rdx,4),%rax
    113b:   48 01 c8          add  %rcx,%rax
    113e:   ff e0             jmpq *%rax
    1140:   b8 cd 00 00 00   mov  $0xcd,%eax
    1145:   c3         retq
    1146:   b8 31 01 00 00   mov  $0x131,%eax
    114b:   eb f8             jmp  1145 <foo+0x1c>
    114d:   b8 95 01 00 00   mov  $0x195,%eax
    1152:   eb f1             jmp  1145 <foo+0x1c>
    1154:   b8 f9 01 00 00   mov  $0x1f9,%eax
    1159:   eb ea             jmp  1145 <foo+0x1c>
    115b:   b8 5d 02 00 00   mov  $0x25d,%eax
    1160:   eb e3             jmp  1145 <foo+0x1c>
    1162:   89 f8             mov  %edi,%eax
    1164:   eb df             jmp  1145 <foo+0x1c>

However, when we compile with the flag (included in -O2), the objdump of the function looks like:

0000000000001129 <foo>:
    1129:   89 f8             mov  %edi,%eax
    112b:   8d 57 fe          lea  -0x2(%rdi),%edx
    112e:   83 fa 04          cmp  $0x4,%edx
    1131:   77 06             ja  1139 <foo+0x10>
    1133:   6b c7 64          imul $0x64,%edi,%eax
    1136:   83 c0 05          add  $0x5,%eax
    1139:   c3         retq

The compiler took the switch statement and transformed it into 100 * how + 5 (for this example). This is the linear function expression for which the compiler was searching to optimize the generated code.

More examples like this are described in the GCC 9 Release Series changes list, the git history logs, and the community mailing list. Other options, such as -flive-patching, have been introduced in this release to provide safe compilation for live patching. The live-patching support gives developers control over the optimizations applied when compiling code that will be delivered as a live patch.

Conclusion

With the early adoption of the latest GNU toolchain technologies, the Clear Linux Project sustains its leading-edge adoption of the latest open source technologies. Many of these new features allow developers to showcase the improved performance of their applications, especially for mathematical algorithms. At the same time, these new capabilities improve the developer experience with a detailed summary of the decisions being made by the compiler. The scalability and application of these new features are limited only by the imagination of developers worldwide.

Call to Action

Want to get involved?

Join the GCC GNU mailing list: https://gcc.gnu.org/ml/gcc/

Find out more about the Clear Linux Project

Site: clearlinux.org
Twitter: @clearlinux
Forum: community.clearlinux.org

References

[1] https://sourceware.org/ml/libc-announce/2019/msg00000.html

[2] https://www.phoronix.com/scan.php?page=news_item&px=Glibc-2.29-Released

[3] http://www.krisgarrett.net/upload/481408/documents/9F5ADB2DA8146659.pdf
