Function Multi-Versioning
In this tutorial, we will use Function Multi-Versioning (FMV) on general code and on FFT library code (FFTW). After completing the tutorial, you will be able to use this technology on your own code and use the libraries to deploy architecture-based optimizations to your application code.
Description
CPU architectures often gain interesting new instructions as they evolve, but application developers find it difficult to take advantage of those instructions. Reluctance to lose backward compatibility is one of the main roadblocks that keeps developers from using advancements in newer computing architectures. FMV, which first appeared in GCC 4.8, is a way to have multiple implementations of a function, each using different architecture-specialized instruction-set extensions. GCC 6 introduces changes to FMV that make it even easier to bring architecture-based optimizations to application code.
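To make the idea concrete, the following is a minimal sketch of the kind of hand-written dispatch that FMV is meant to automate. It assumes an x86 target and a GCC-compatible compiler; the function names (`add_default`, `add_avx2`, `add_arrays`) are hypothetical and only the `target` attribute and the `__builtin_cpu_*` builtins come from the compiler.

```c
#include <stddef.h>

/* Hypothetical names; assumes an x86 target and a GCC-compatible compiler. */

/* Baseline version, compiled for the default target. */
static void add_default(int *dst, const int *x, const int *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}

/* Same loop, but the compiler may use AVX2 instructions in this function only. */
__attribute__((target("avx2")))
static void add_avx2(int *dst, const int *x, const int *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}

/* Hand-written dispatcher: choose an implementation at run time. */
void add_arrays(int *dst, const int *x, const int *y, size_t n)
{
    __builtin_cpu_init();                 /* initialize CPU feature detection */
    if (__builtin_cpu_supports("avx2"))
        add_avx2(dst, x, y, n);
    else
        add_default(dst, x, y, n);
}
```

Callers only ever see `add_arrays`, so the binary still runs on CPUs without AVX2. With FMV, the compiler generates both the specialized copies and the run-time selection from a single function definition.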
Install and configure a Clear Linux OS host on bare metal
First, follow our guide to Install Clear Linux* OS from the live desktop. Once the bare metal installation and initial configuration are complete, add the desktop-dev bundle to the system. desktop-dev contains the necessary development tools like GCC and Perl*.
To install the bundle, run the following command in the $HOME directory:
sudo swupd bundle-add desktop-dev
Detect loop vectorization candidates
Now, we need to detect the loop vectorization candidates to be cloned for multiple platforms with FMV. As an example, we will use the following simple C code:
 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 #include <sys/time.h>
 4 #define MAX 1000000
 5
 6 int a[256], b[256], c[256];
 7
 8 void foo(){
 9     int i,x;
10     for (x=0; x<MAX; x++){
11         for (i=0; i<256; i++){
12             a[i] = b[i] + c[i];
13         }
14     }
15 }
16
17
18 int main(){
19     foo();
20     return 0;
21 }
Save the example code as example.c in the current directory and build with the following flags:
gcc -O3 -fopt-info-vec example.c -o example
The build generates the following output:
example.c:11:9: note: loop vectorized
example.c:11:9: note: loop vectorized
The output shows that line 11 is a good candidate for vectorization:
for (i=0; i<256; i++){
    a[i] = b[i] + c[i];
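The loop on line 11 is a candidate because each iteration is independent of the others. As a hedged illustration, the sketch below contrasts an independent loop with one that carries a dependency between iterations. The function names are hypothetical, and whether GCC reports a given loop as vectorized depends on the compiler version and flags.

```c
#include <stddef.h>

/* Independent iterations: dst[i] depends only on x[i] and y[i],
 * so several elements can be computed at once. */
void add_independent(int *restrict dst, const int *restrict x,
                     const int *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}

/* Loop-carried dependency: v[i] needs the value of v[i - 1] that was
 * written by the previous iteration, which usually blocks vectorization. */
void running_sum(int *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        v[i] = v[i] + v[i - 1];
}
```

Building a file like this with -O3 -fopt-info-vec should typically report only the first loop. GCC also accepts -fopt-info-vec-missed, which lists loops it could not vectorize.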
Generate the FMV patch
To generate the FMV patch with the make-fmv-patch project, we must clone the project and generate a log file with the loop vectorization information:
git clone https://github.com/clearlinux/make-fmv-patch.git
gcc -O3 -fopt-info-vec example.c -o example &> log
To generate the patch files, execute:
perl ./make-fmv-patch/make-fmv-patch.pl log .
The make-fmv-patch.pl script takes two arguments: <buildlog> and <sourcecode>. Replace <buildlog> and <sourcecode> with the proper values and execute:
perl make-fmv-patch.pl <buildlog> <sourcecode>
The command generates the following example.c.patch patch:
--- ./example.c 2017-09-27 16:05:42.279505430 +0000
+++ ./example.c~ 2017-09-27 16:19:11.691544026 +0000
@@ -5,6 +5,7 @@
 
 int a[256], b[256], c[256];
 
+__attribute__((target_clones("avx2","arch=atom","default")))
 void foo(){
     int i,x;
     for (x=0; x<MAX; x++){
We recommend using the make-fmv-patch script to add the attribute that generates the target clones for the function foo. The patched code looks as follows:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#define MAX 1000000

int a[256], b[256], c[256];

__attribute__((target_clones("avx2","arch=atom","default")))
void foo(){
    int i,x;
    for (x=0; x<MAX; x++){
        for (i=0; i<256; i++){
            a[i] = b[i] + c[i];
        }
    }
}

int main(){
    foo();
    return 0;
}
To change the target clones, edit the attribute when adding the patches, or change the value of the $avx2 variable in the make-fmv-patch.pl script:
my $avx2 = '__attribute__((target_clones("avx2","arch=atom","default")))'."\n";
Compile the code again with FMV, adding the -g option so that the objdump output can be analyzed alongside the source:
gcc -O3 example.c -o example -g
objdump -S example | less
You can see the multiple clones of the foo function:
foo
foo.avx2.0
foo.arch_atom.1
The cloned functions use AVX2 registers and vectorized instructions. To verify, look for instructions like the following in the objdump output:
vpaddd (%r8,%rax,1),%ymm0,%ymm0
vmovdqu %ymm0,(%rcx,%rax,1)
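As a rough illustration of what these two instructions do, the sketch below writes the same addition by hand with AVX2 intrinsics. On a ymm register, vpaddd adds eight 32-bit integers at once, and vmovdqu moves 256 bits to or from unaligned memory. The helper names are hypothetical, the sketch assumes an x86 target with a GCC-compatible compiler, and the compiler is free to pick different instructions than the ones noted in the comments.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: adds n ints, eight per iteration, with AVX2 intrinsics. */
__attribute__((target("avx2")))
static void add_avx2_intrinsics(int *dst, const int *x, const int *y, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i vx = _mm256_loadu_si256((const __m256i *)(x + i));
        __m256i vy = _mm256_loadu_si256((const __m256i *)(y + i));
        __m256i sum = _mm256_add_epi32(vx, vy);          /* usually vpaddd  */
        _mm256_storeu_si256((__m256i *)(dst + i), sum);  /* usually vmovdqu */
    }
    for (; i < n; i++)                                   /* scalar tail */
        dst[i] = x[i] + y[i];
}

/* Scalar fallback for CPUs without AVX2. */
static void add_scalar(int *dst, const int *x, const int *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}

/* Only call the AVX2 path when the running CPU supports it. */
void add_eight_wide(int *dst, const int *x, const int *y, size_t n)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        add_avx2_intrinsics(dst, x, y, n);
    else
        add_scalar(dst, x, y, n);
}
```

With FMV there is no need to write intrinsics like these: the auto-vectorizer produces the equivalent instructions in the avx2 clone from the plain C loop.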
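FFT project example using FFTW
To follow the same approach with a package like FFTW, use the -fopt-info-vec flag to get a build log file, then run the make-fmv-patch.pl script on the log and the source tree. The script reports the files and lines it patches, similar to: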
~/make-fmv-patch/make-fmv-patch.pl results/build.log fftw-3.3.6-pl2/
patching fftw-3.3.6-pl2/libbench2/verify-lib.c @ lines (36 114 151 162 173 195 215 284)
patching fftw-3.3.6-pl2/tools/fftw-wisdom.c @ lines (150)
patching fftw-3.3.6-pl2/libbench2/speed.c @ lines (26)
patching fftw-3.3.6-pl2/tests/bench.c @ lines (27)
patching fftw-3.3.6-pl2/libbench2/util.c @ lines (181)
patching fftw-3.3.6-pl2/libbench2/problem.c @ lines (229)
patching fftw-3.3.6-pl2/tests/fftw-bench.c @ lines (101 147 162 249)
patching fftw-3.3.6-pl2/libbench2/mp.c @ lines (79 190 215)
patching fftw-3.3.6-pl2/libbench2/caset.c @ lines (5)
patching fftw-3.3.6-pl2/libbench2/verify-r2r.c @ lines (44 187 197 207 316 333 723)
For example, the patch generated for the fftw-3.3.6-pl2/libbench2/verify-lib.c file contains the following changes:
--- fftw-3.3.6-pl2/libbench2/verify-lib.c 2017-01-27 21:08:13.000000000 +0000
+++ fftw-3.3.6-pl2/libbench2/verify-lib.c~ 2017-09-27 17:49:21.913802006 +0000
@@ -33,6 +33,7 @@
 
 double dmax(double x, double y) { return (x > y) ? x : y; }
 
+__attribute__((target_clones("avx2","arch=atom","default")))
 static double aerror(C *a, C *b, int n)
 {
      if (n > 0) {
@@ -111,6 +112,7 @@
 }
 
 /* make array hermitian */
+__attribute__((target_clones("avx2","arch=atom","default")))
 void mkhermitian(C *A, int rank, const bench_iodim *dim, int stride)
 {
      if (rank == 0)
@@ -148,6 +150,7 @@
 }
 
 /* C = A + B */
+__attribute__((target_clones("avx2","arch=atom","default")))
 void aadd(C *c, C *a, C *b, int n)
 {
      int i;
@@ -159,6 +162,7 @@
 }
 
 /* C = A - B */
+__attribute__((target_clones("avx2","arch=atom","default")))
 void asub(C *c, C *a, C *b, int n)
 {
      int i;
@@ -170,6 +174,7 @@
 }
 
 /* B = rotate left A (complex) */
+__attribute__((target_clones("avx2","arch=atom","default")))
 void arol(C *b, C *a, int n, int nb, int na)
 {
      int i, ib, ia;
@@ -192,6 +197,7 @@
      }
 }
With these patches, we can select where to apply the FMV technology, which makes it even easier to bring architecture-based optimizations to application code.
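To show what one of these patched functions amounts to, here is a minimal, self-contained sketch in the spirit of the "C = A + B" routine. The `cplx` type and the `cplx_add` name are hypothetical stand-ins, not FFTW's own definitions, and the clone list is shortened to two targets. The sketch assumes GCC 6 or later on an x86 Linux system, since target_clones relies on run-time resolver support from the toolchain.

```c
/* Hypothetical stand-in for a complex number: {real, imaginary}. */
typedef double cplx[2];

/* The compiler emits one copy of this function per listed target plus a
 * resolver that selects a copy when the program runs. */
__attribute__((target_clones("avx2","default")))
void cplx_add(cplx *c, cplx *a, cplx *b, int n)
{
    for (int i = 0; i < n; i++) {
        c[i][0] = a[i][0] + b[i][0];   /* real part */
        c[i][1] = a[i][1] + b[i][1];   /* imaginary part */
    }
}
```

The function body and every call site stay unchanged; only the attribute line is added, which is why the change can be distributed as a small patch.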
Congratulations!
You have successfully installed an FMV development environment on Clear Linux OS. Furthermore, you used cutting-edge compiler technology to improve the performance of your application on Intel® architecture, based on the compiler's report of which loops in your code can be vectorized.
Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries.