Deep Learning Reference Stack v6.0 Now Available

Beth Dean

20 Apr, 2020

Today, we’re pleased to announce the Deep Learning Reference Stack 6.0 release, which incorporates customer feedback and delivers an enhanced user experience with support for expanded use cases. Like the Deep Learning Reference Stack 5.0, this release further deepens support for Natural Language Processing (NLP), which helps computers process and analyze large amounts of natural language data, among other features.

With this update, Intel further enables developers to quickly prototype and deploy DL workloads, reducing complexity while maintaining the ability to customize solutions. Among the features in this release:

  TensorFlow* 1.15.2 and TensorFlow* 2.2.0, an end-to-end open source platform for machine learning (ML).

  PyTorch* 1.4.0, an open source machine learning framework that accelerates the path from research prototyping to production deployment.

  PyTorch Lightning*, a lightweight wrapper for PyTorch designed to help researchers set up boilerplate state-of-the-art training.

  Transformers*, a state-of-the-art Natural Language Processing (NLP) library for TensorFlow 2.0 and PyTorch.

  Intel® OpenVINO™ model server version 2020.1, delivering improved neural network performance on Intel processors and helping unlock cost-effective, real-time vision applications [1].

  Intel® Deep Learning Boost (Intel® DL Boost) with AVX-512 Vector Neural Network Instructions (Intel® AVX-512 VNNI), designed to accelerate deep neural network-based algorithms.

  Deep Learning Compilers (TVM* 0.6), an end-to-end compiler stack.
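
As a taste of how little code these components require, here is a minimal PyTorch Lightning training sketch; the tiny classifier and synthetic data below are illustrative placeholders, not part of the stack itself:

# Minimal PyTorch Lightning training sketch (illustrative placeholders only).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class TinyClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self(x), y)
        return {"loss": loss}  # Lightning runs backward and the optimizer step

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Synthetic data stands in for a real dataset.
data = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
trainer = pl.Trainer(max_epochs=1)
trainer.fit(TinyClassifier(), DataLoader(data, batch_size=32))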

Benefits of the Deep Learning Reference Stack

With this release, Intel focused on incorporating Natural Language Processing (NLP) into the Deep Learning Reference Stack to demonstrate that pretrained language models can be used to achieve state-of-the-art results [2]. The NLP libraries included in the stack can be used for language processing, machine translation, and building embedding layers for transfer learning.

Kubeflow Pipelines*, a platform for building and deploying portable, scalable machine learning (ML) workflows, is used to deploy the containerized deep learning images. This simplifies orchestration of machine learning pipelines, making it easy for developers to build a wide range of use cases and applications on the Deep Learning Reference Stack.
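
As a rough sketch of what a pipeline definition looks like, the following uses the kfp SDK to declare and compile a one-step pipeline; the container image and training command are hypothetical placeholders:

# Sketch of a one-step Kubeflow pipeline running a containerized training job.
import kfp
from kfp import dsl

@dsl.pipeline(name="dlrs-train", description="Train inside a DLRS container")
def dlrs_pipeline():
    dsl.ContainerOp(
        name="train",
        image="docker.io/example/dlrs-tensorflow:latest",  # placeholder image
        command=["python", "/workspace/train.py"],          # placeholder entrypoint
    )

if __name__ == "__main__":
    # Compile to an archive that can be uploaded to the Kubeflow Pipelines UI.
    kfp.compiler.Compiler().compile(dlrs_pipeline, "dlrs_pipeline.tar.gz")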

We incorporated the Transformers [3] library, a state-of-the-art general-purpose library that includes a number of pretrained models for Natural Language Understanding (NLU) and Natural Language Generation (NLG). This helps developers move seamlessly from pretrained or fine-tuned models to production.
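
For instance, the Transformers pipeline API takes a pretrained model to inference in a few lines (a minimal sketch; the default sentiment-analysis model is downloaded on first use):

# Pretrained model to inference in three lines with the Transformers pipeline API.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a pretrained model
print(classifier("The Deep Learning Reference Stack made setup painless."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]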

For the PyTorch-based Deep Learning Reference Stack, we also incorporated Flair [4] alongside Transformers. Flair is a simple NLP library that allows developers to apply natural language processing models to text for tasks such as named entity recognition (NER), part-of-speech (PoS) tagging, word sense disambiguation, and classification.
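
A minimal Flair sketch for named entity recognition, using its pretrained English NER tagger:

# Named entity recognition with Flair's pretrained English tagger.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("ner")        # pretrained 4-class NER model
sentence = Sentence("George Washington went to Washington.")
tagger.predict(sentence)
for entity in sentence.get_spans("ner"):   # tagged spans with labels and scores
    print(entity)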

In addition to the Docker deployment model, we integrated the Deep Learning Reference Stack with Function-as-a-Service (FaaS) technologies, which are scalable event-driven compute platforms. We created a Conditional Generative Adversarial Nets (cGAN) use case where Fn [5] and OpenFaaS [6] dynamically manage and deploy event-driven, independent inference functions on the Deep Learning Reference Stack based on machine resources.
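
On the function side, an OpenFaaS Python handler exposes a single handle() entry point per request. The sketch below assumes a TorchScript model and JSON-encoded inputs, both hypothetical placeholders:

# handler.py -- sketch of an OpenFaaS Python handler serving inference.
# Model path and input format are hypothetical; OpenFaaS calls handle()
# once per request with the raw request body.
import json
import torch

MODEL = torch.jit.load("/home/app/model.pt")  # placeholder TorchScript model
MODEL.eval()

def handle(req):
    features = torch.tensor(json.loads(req), dtype=torch.float32)
    with torch.no_grad():
        output = MODEL(features)
    return json.dumps(output.tolist())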

Additionally, we provided simple, easy-to-use, end-to-end use cases for the Deep Learning Reference Stack to help developers quickly prototype and bring up the stack in their environments. Some examples:

  Galaxies Identification: Demonstrates the use of the Deep Learning Reference Stack to detect and classify galaxies by their morphology using image processing and computer vision algorithms on Intel® Xeon® processors.

  Using AI to Help Save Lives: A Data-Driven Approach for Intracranial Hemorrhage Detection: Focuses on solving two problems facing the domains of medical diagnosis and artificial intelligence. First, we designed an AI training pipeline to help detect intracranial hemorrhage (ICH), a serious condition often caused by traumatic brain injuries, which must be diagnosed and treated as quickly as possible to avoid disability or death [7]. Second, we tackled the complexity of creating an AI pipeline with multiple software frameworks, configurations, and dependencies. Our solution was to use the System Stacks for Linux* OS, a purpose-built collection of containers that provide integrated, tuned AI frameworks.

We’ll unveil additional use cases targeting developer and service provider needs in the coming weeks.
This release also supports the latest versions of popular developer tools and frameworks:

  Operating System: Clear Linux* OS, an open source Linux* distribution.

  Orchestration: Kubernetes to manage and orchestrate containerized applications for multi-node clusters with Intel platform awareness.

  Containers: Docker Containers and Kata Containers with Intel® Virtualization Technology (Intel® VT) for enhanced protection.

  Libraries: oneAPI Deep Neural Network Library (oneDNN), an open-source performance library for deep learning applications. The library includes basic building blocks for neural networks optimized for Intel Architecture Processors and Intel Processor Graphics.

  Runtimes: Python application and service execution support.

  Deployment: As mentioned earlier, Kubeflow*, Seldon*, and Kubeflow Pipelines* are used to deploy the Deep Learning Reference Stack.

  User Experience: JupyterHub*, a multi-user hub that spawns, manages, and proxies multiple instances of the single-user Jupyter Notebook server.

Multiple layers of the Deep Learning Reference Stack are performance-tuned for Intel Architecture (IA), offering significant advantages over other stacks.
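
A quick way to confirm the tuned libraries are actually in play: in the PyTorch image, torch.backends.mkldnn reports whether the oneDNN (formerly MKL-DNN) backend is available, and setting DNNL_VERBOSE=1 (MKLDNN_VERBOSE=1 on older releases) makes the library print each primitive it executes. A minimal sketch:

# Verify that oneDNN-accelerated kernels are available and in use.
import os
os.environ.setdefault("DNNL_VERBOSE", "1")    # oneDNN v1.x verbose switch
os.environ.setdefault("MKLDNN_VERBOSE", "1")  # older MKL-DNN releases

import torch
print("oneDNN/MKL-DNN backend available:", torch.backends.mkldnn.is_available())
conv = torch.nn.Conv2d(3, 8, kernel_size=3)
_ = conv(torch.randn(1, 3, 224, 224))  # emits verbose lines when tuned kernels run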

Performance gains for the Deep Learning Reference Stack with TensorFlow 2.2.0 and Inception-v4 were measured on the following system configurations:

Second Generation Intel® Xeon® Scalable Platform: 2-socket Intel® Xeon® Platinum 8280 Processor (2.7 GHz, 28 cores), HT On, Turbo On, Total Memory 384 GB (12 slots / 32 GB / 2933 MHz), BIOS: SE5C620.86B.02.01.0010.010620200716 (ucode: 0x500002c), Clear Linux 32700, Kernel 5.5.13-924.native, Deep Learning Framework: TensorFlow* v2.2.0, Inception-v4 (https://github.com/IntelAI/models/tree/master/benchmarks/image_recognition/tensorflow/inceptionv4), Compiler: gcc v9.3.1, oneDNN version: v1.2.2, BS=32,64,128, synthetic data, 2 inference instances / 2 sockets, Datatype: INT8

Intel® Xeon® Scalable Platform: 2-socket Intel® Xeon® Platinum 8180 Processor (2.5 GHz, 28 cores), HT On, Turbo On, Total Memory 384 GB (12 slots / 32 GB / 2666 MHz), BIOS: SE5C620.86B.02.01.0010.010620200716 (ucode: 0x2000065), Clear Linux 32700, Kernel 5.5.13-924.native, Deep Learning Framework: TensorFlow* v2.2.0, Inception-v4 (https://github.com/IntelAI/models/tree/master/benchmarks/image_recognition/tensorflow/inceptionv4), Compiler: gcc v9.3.1, oneDNN version: v1.2.2, BS=32,64,128, synthetic data, 2 inference instances / 2 sockets, Datatype: INT8, FP32

Performance gains for the Deep Learning Reference Stack with PyTorch 1.4.0 and ResNet-50 v1.5 were measured on the following system configurations:

Second Generation Intel® Xeon® Scalable Platform: 2-socket Intel® Xeon® Platinum 8280 Processor (2.7 GHz, 28 cores), HT On, Turbo On, Total Memory 384 GB (12 slots / 32 GB / 2933 MHz), BIOS: SE5C620.86B.02.01.0010.010620200716 (ucode: 0x500002c), Clear Linux 32700, Kernel 5.5.13-924.native, Deep Learning Framework: PyTorch* v1.4.0, ResNet-50 v1.5 (https://github.com/intel/optimized-models/tree/v1.0.10/pytorch), Compiler: gcc v9.3.1, oneDNN version: v0.21.1, BS=32,64,128, synthetic data, 2 inference instances / 2 sockets, Datatype: INT8

Intel® Xeon® Scalable Platform: 2-socket Intel® Xeon® Platinum 8180 Processor (2.5 GHz, 28 cores), HT On, Turbo On, Total Memory 384 GB (12 slots / 32 GB / 2666 MHz), BIOS: SE5C620.86B.02.01.0010.010620200716 (ucode: 0x2000065), Clear Linux 32700, Kernel 5.5.13-924.native, Deep Learning Framework: PyTorch* v1.4.0, ResNet-50 v1.5 (https://github.com/intel/optimized-models/tree/v1.0.10/pytorch), Compiler: gcc v9.3.1, oneDNN version: v0.21.1, BS=32,64,128, synthetic data, 2 inference instances / 2 sockets, Datatype: INT8, FP32

Intel will continue working to help ensure popular frameworks and topologies run best on Intel architecture, giving customers a choice in the right solution for their needs. We are using this stack to innovate on our current Intel® Xeon® Scalable processors and plan to continue performance optimizations for coming generations.

Visit the Clear Linux* Stacks page to learn more, download the Deep Learning Reference Stack code, and contribute feedback. As always, we welcome ideas for further enhancements through the stacks mailing list.

[1] https://software.intel.com/en-us/articles/OpenVINO-RelNotes

[2] https://ruder.io/nlp-imagenet/

[3] https://www.clearlinux.org/clear-linux-documentation/guides/stacks/dlrs.html#using-transformers-for-natural-language-processing

[4] https://github.com/flairNLP/flair

[5] https://github.com/intel/stacks-usecase/tree/master/pix2pix/fn

[6] https://github.com/intel/stacks-usecase/tree/master/pix2pix/openfaas

[7] Hssayeni, M. (2020). Computed Tomography Images for Intracranial Hemorrhage Detection and Segmentation (version 1.3.1). PhysioNet. https://doi.org/10.13026/4nae-zg36

Notices and Disclaimers

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.

Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, visit www.intel.com/benchmarks.

Performance results are based on testing as of 04/06/2020 and may not reflect all publicly available security updates. No product or component can be absolutely secure.

System configuration: see the test configurations listed above.

Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Intel technologies may require enabled hardware, software or service activation.

Notice Revision #20110804

Intel, the Intel logo, and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.

© Intel Corporation