No
Yes
View More
View Less
Working...
Close
OK
Cancel
Confirm
System Message
Delete
My Schedule
An unknown error has occurred and your request could not be completed. Please contact support.
Scheduled
Wait Listed
Personal Calendar
Speaking
Conference Event
Meeting
Interest
There aren't any available sessions at this time.
Conflict Found
This session is already scheduled at another time. Would you like to...
Loading...
Please enter a maximum of {0} characters.
{0} remaining of {1} character maximum.
Please enter a maximum of {0} words.
{0} remaining of {1} word maximum.
must be 50 characters or less.
must be 40 characters or less.
Session Summary
We were unable to load the map image.
This has not yet been assigned to a map.
Search Catalog
Reply
Replies ()
Search
New Post
Microblog
Microblog Thread
Post Reply
Post
Your session timed out.
This web page is not optimized for viewing on a mobile device. Visit this site in a desktop browser or download the mobile app to access the full set of features.
GTC 2018 Silicon Valley
Favorite
Remove from My Interests
Browse and search for sessions, then click "Add to Schedule" to save sessions to your agenda.

Note: sessions are first come, first served on the day of the conference. Arrive early to the room for high-priority sessions.

Sign-up is required for Conference + Training pass holders to reserve seats in Instructor-Led Labs.

Featured Sessions

TDLIW01 - Pre-GTC DLI Workshop: Fundamentals of Deep Learning for Computer Vision

Explore the fundamentals of deep learning by training neural networks and using results to improve performance and capabilities.

In this hands-on course, you’ll learn the basics of deep learning by training and deploying neural networks. You’ll learn how to:

  • Implement common deep learning workflows, such as image classification and object detection.
  • Experiment with data, training parameters, network structure, and other strategies to increase performance and capability.
  • Deploy your neural networks to start solving real-world problems.

Upon completion, you’ll be able to start solving problems on your own with deep learning. You will need to purchase a special pass to attend this full-day workshop.

See GTC Pricing for more information.

8 hours Pre-GTC DLI Workshops Mike Mendelson - Deep Learning Institute Curriculum Developer, NVIDIA
TDLIW02 - Pre-GTC DLI Workshop: Fundamentals of Natural Language Processing

Prerequisite: ‘Fundamentals of Deep Learning for Computer Vision’ or similar deep learning experience

In this course, you will receive hands-on training on the latest techniques for understanding textual input using Natural Language Processing. You’ll learn how to:

  • Classify words to accurately understand their meaning
  • Handle factual queries and their semantic meaning
  • Train machine translators from one language to another

Upon completion of this course, you'll be proficient in Natural Language Processing using neural networks in any application.

You will need to purchase a special pass to attend this full-day workshop.

See GTC Pricing for more information.

8 hours Pre-GTC DLI Workshops Charles Killam - Certified Instructor, NVIDIA
TDLIW03 - Pre-GTC DLI Workshop: Perception for Autonomous Vehicles

Prerequisite: ‘Fundamentals of Deep Learning for Computer Vision’ or similar deep learning experience

In this course, you’ll learn how to design, train, and deploy deep neural networks for autonomous vehicles using the NVIDIA DRIVE™ PX2 development platform. Learn how to:

  • Integrate sensor input using the DriveWorks software stack
  • Train a semantic segmentation neural network
  • Optimize, validate, and deploy a trained neural network using TensorRT

Upon completion of this course, students will be able to create and optimize perception components for autonomous vehicles using DRIVE PX2.

You will need to purchase a special pass to attend this full-day workshop. See GTC Pricing for more information.

8 hours Pre-GTC DLI Workshops Aaraadhya Narra - Solutions Architect, DLI Certified Instructor, NVIDIA
TDLIW04 - Pre-GTC DLI Workshop: Fundamentals of Accelerated Computing with CUDA C/C++

Pre-requisite: None

Duration: 8 hours

Format: Self-paced online or instructor-led

Languages: English

The CUDA computing platform enables the acceleration of CPU-only applications to run on the world's fastest massively parallel GPUs. Experience C/C++ application acceleration by:

  • Accelerating CPU-only applications by running their latent parallelism on GPUs
  • Utilizing essential CUDA memory management techniques to optimize accelerated applications
  • Exposing accelerated application potential for concurrency and exploiting it with CUDA streams
  • Leveraging command line and visual profiling to guide and check your work

Upon completion of this workshop, you'll be able to accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques. You’ll understand an iterative style of CUDA development that will allow you to ship accelerated applications fast.

See GTC Pricing for more information.

8 hours Pre-GTC DLI Workshops Joshua Wyatt - Content Developer, NVIDIA Deep Learning Institute, NVIDIA
SE0000 - Welcome Reception

At this reception, meet NVIDIA staff and other GTC alumni to get tips, especially if you're a first-timer.

Special Event - 2 h Special Event
SE0002 - Dinner with Strangers (Sun)

Join a random group of GTC attendees for enlightening conversations over a self-hosted dinner in great restaurants nearby. Less creepy than it sounds, this is one of the more popular programs at GTC.

Sign up in Main Lobby.

Special Event - 2 h Special Event
CE8164 - Connect with the Experts: CUDA-based Raytracing and Rendering

We will answer your questions on the design and implementation of renderers based on raytracing using CUDA, and discuss how to get the best performance out of NVIDIA hardware in your renderer. 

Connect with the Experts sessions are informal opportunities to ask experts from NVIDIA and other organizations your burning questions about a specific subject.

1 Hour Connect with the Experts Carsten Waechter - Ray Tracing Software Architect, NVIDIA
Pascal Gautron - Senior Developer Technology Engineer, NVIDIA
L8111A - Jetson Developer Tools Training Labs

This lab focuses on teaching you how to maximize your productivity when developing software for the Jetson platform. You will experience firsthand how to manage source code on the host PC to cross-compile the software and how to initiate remote debugging sessions to debug CPU C/C++ and CUDA C code. Through a comprehensive set of exercises, you will also learn how to use the CUDA Visual Profiler for optimizing CUDA kernels, the Tegra System Profiler for optimizing CPU code and tracing multi-process system-wide activities, and the Tegra Graphics Debugger for debugging and profiling 3D graphics applications. Prerequisites: Basic CUDA C and C++ coding skills.

120 Minutes Instructor-Led Lab Sebastien Domine - VP SW Eng. Developer Tools, NVIDIA
L8119 - Programming GPU-Accelerated OpenPOWER Systems with OpenACC

In this tutorial you will learn how to harness the massive computing performance offered by POWER systems with NVLink-attached GPUs – the technology also powering Sierra and Summit, two of the fastest supercomputers in the US. We will present the POWER architecture and highlight the available software stack before diving into programming the attached GPUs with OpenACC. Using real-world examples, we will get to know the hardware architectures of both CPU and GPU and learn the most important OpenACC directives along the way. The resulting GPU-accelerated program can easily be used on other GPU-equipped machines and architectures, thanks to OpenACC's portable approach. The lab requires attendees to bring their own laptops. We will work on IBM Minsky servers (POWER8 CPUs with P100 GPUs).

120 Minutes Instructor-Led Lab Andreas Herten - Post-Doctoral Researcher GPUs in HPC, Jülich Supercomputing Centre
L8143 - Image Segmentation with TensorFlow

Prerequisites: Image Classification with DIGITS

Duration: 2 hours

Framework: TensorFlow

Image (or semantic) segmentation is the task of placing each pixel of an image into a specific class. In this lab, you'll segment MRI images to measure parts of the heart by:

  • Comparing image segmentation with other computer vision problems
  • Experimenting with TensorFlow tools such as TensorBoard and the TensorFlow Python API
  • Learning to implement effective metrics for assessing model performance

Upon completion of this lab, you'll be able to set up most computer vision workflows using deep learning.

Presented by the NVIDIA Deep Learning Institute (DLI).
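
As a rough illustration of the kind of evaluation metric this lab covers (a minimal NumPy sketch, not the lab's own code), the Dice coefficient measures the overlap between a predicted segmentation mask and the ground truth:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice overlap between two binary masks (1.0 = perfect agreement)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy 4x4 masks: the prediction covers 2 of the 3 target pixels.
pred = np.zeros((4, 4)); pred[1, 1] = pred[1, 2] = 1
target = np.zeros((4, 4)); target[1, 1] = target[1, 2] = target[1, 3] = 1
print(round(dice_coefficient(pred, target), 2))  # → 0.8
```

The epsilon term keeps the metric defined when both masks are empty, a common edge case in medical images with no pathology present.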

120 Minutes Instructor-Led Lab Mike Mendelson - Deep Learning Institute Curriculum Developer, NVIDIA
L8152 - Medical Image Segmentation with DIGITS

Prerequisites: 'Fundamentals of Deep Learning with Computer Vision' or similar experience

Duration: 2 hours

Framework: Caffe

Image (or semantic) segmentation is the task of placing each pixel of an image into a specific class. In this lab, you'll segment MRI images to measure parts of the heart by:

  • Extending Caffe with custom Python layers
  • Implementing the process of transfer learning
  • Creating fully convolutional neural networks from popular image classification networks

Upon completion, you'll be able to set up most computer vision workflows using deep learning.

Presented by the NVIDIA Deep Learning Institute (DLI).

120 Minutes Instructor-Led Lab Steven Steinke - Curriculum Developer, NVIDIA
L8167 - Image Creation using Generative Adversarial Networks using TensorFlow and DIGITS

This lab will guide you through the process of training a Generative Adversarial Network (GAN) to generate image contents in DIGITS. You'll learn how to:

  • Use Generative Adversarial Networks (GANs) to create handwritten numbers
  • Visualize the feature space and use attribute vectors to generate image analogies
  • Train a GAN to generate images with set attributes

Upon completion, you'll be able to use GANs to generate images by manipulating feature space.

Prerequisites: 'Fundamentals of Deep Learning with Computer Vision' or similar experience

120 Minutes Instructor-Led Lab Jonathan Bentz, NVIDIA
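
The feature-space manipulation this lab describes can be sketched in a few lines: linearly interpolating between two latent codes gives the intermediate points a trained generator would decode into image analogies. A minimal NumPy sketch with random stand-ins for learned GAN latents, not the lab's DIGITS code:

```python
import numpy as np

def interpolate_latents(z_a, z_b, steps):
    """Linear path between two latent codes, endpoints included."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1.0 - a) * z_a + a * z_b for a in alphas])

rng = np.random.default_rng(0)
z_a = rng.normal(size=100)  # stand-in for a learned GAN latent code
z_b = rng.normal(size=100)
path = interpolate_latents(z_a, z_b, steps=5)
print(path.shape)  # → (5, 100)
```

Feeding each row of `path` to a generator would yield a smooth visual transition between the two images the endpoints encode.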
S8225 - Sharing Physically Based Materials Between Renderers with MDL

We'll discuss the basics of NVIDIA's Material Definition Language, showing how a single material can be used to define matching appearances between different renderers and rendering techniques. End users will learn how physically based material definitions are created, while developers will learn what's entailed in supporting MDL within their own products or renderers.

50-minute Talk Jan Jordan - Software Product Manager MDL, NVIDIA
Lutz Kettner - Director, Rendering Software and Material Definition, NVIDIA
S8236 - Singularity: Reproducible, Trusted Containers for Scientific Computing

Singularity is a container technology that is widely supported by HPC centers and service providers because it facilitates extreme mobility of compute via verifiable, trusted containers. This talk will cover a high-level view of container computing, an introduction to Singularity, a description of the Singularity Image Format (SIF), and technical recipes and usage examples with GPUs. After attending this talk, you will have a strong understanding of containerization and how to leverage this technology to create extremely reproducible workflows.

50-minute Talk Gregory Kurtzer - CEO, SyLabs
S8250 - Maximizing The Power of GPU For Diverse Workloads of Enterprise Digital Workspaces On VMware vSphere

Enterprise Digital Workspaces support diverse workloads, including virtual desktops, deep learning, and big data. NVIDIA GPUs bring high performance computing (HPC) to graphics, GPGPU, and especially machine learning workloads. They also provide hardware encode and decode to accelerate the processing of video content. In this session, we will explore performance and resource utilization of various workloads that leverage different capabilities of the GPU, such as graphics, compute, and H.264 hardware encode/decode. NVIDIA virtualized GPUs and VMware vSphere bring tremendous combined benefits for both GPU-based workloads and data center management via virtualization. We will present results of our research on running diverse workloads on the vSphere platform using NVIDIA GRID GPUs. We explore the vSphere features of Suspend/Resume and vMotion for vGPU-based virtual machines. We will quantify the benefits of vGPU for data center management using VMware vSphere and describe techniques for efficient management of workloads and datacenter resources.

50-minute Talk Uday Kurkure - Staff Engineer, VMware
Hari Sivaraman - Staff Engineer, VMware
S8286 - Quick and Easy DL Workflow Proof of Concept

Spin up a deep learning (DL) proof of concept on a budget. We'll walk you through a DL workflow in the cloud leveraging DIGITS, then download a trained model and run inference on a Jetson TX2. This session considers multiple options, such as Nimbix, AMI, and NGC on Tesla P100, Tesla V100, and NVIDIA DGX-1 servers. The tutorial will be a combination of lecture, live demos, and detailed instructions.

50-minute Talk Jeffrey Weiss - Director, Solution Architects, NVIDIA
Alec Gunny - Solutions Architect, NVIDIA
Kenneth Hester - Solution Architect, NVIDIA
S8382 - Zero to GPU Hero with OpenACC

GPUs are often the fastest way to obtain your scientific results, but many students and domain scientists don't know how to get started. In this tutorial we will take an application from simple, serial loops to a fully GPU-enabled application. Students will learn a profile-guided approach to accelerating applications, including how to find hotspots, how to use OpenACC to accelerate important regions of code, and how to get the best performance they can on GPUs. No prior experience in GPU programming or OpenACC is required, but experience with C, C++, or Fortran is a must. Several books will be given away to attendees who complete this tutorial.

80 Minutes Tutorial Jeffrey Larkin - Senior DevTech Software Engineer, NVIDIA
S8391 - Investigating Data Augmentation Strategies for Advancing Deep Learning Training

We have seen the huge success of the deep learning paradigm and its superhuman capability on numerous benchmarks in image, video, audio, and text. However, adopting these methods in industrial applications poses huge challenges, mainly because neural networks consume enormous numbers of parameters and require relatively large amounts of quality training data, which is often lacking. We'll investigate "data augmentation" strategies – increasing quality training data for robust inference – across different learning problems, mainly in image, video, 3D, and IoT data streams. We'll first quantify the importance of training data for deep neural networks, then review numerous strategies, such as crawling from the web, utilizing generative models, 3D computer graphics, augmented reality, engagement in social media, and gaming, and compare the effectiveness of these diverse strategies. Since data is often taken from other domains, we also need to deal with the cross-domain learning problem. We'll provide detailed insights from our recent work published in top conferences (e.g., CVPR, ICCV, AAAI) and from cases in industrial applications.

50-minute Talk Winston Hsu - Professor, National Taiwan University
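
As a toy illustration of the general idea behind data augmentation (a minimal NumPy sketch, not the speaker's method), simple label-preserving transforms multiply the effective size of a training set:

```python
import numpy as np

def augment(image, rng):
    """Return simple augmented variants: original, flips, and noise jitter."""
    noisy = np.clip(image + rng.normal(0.0, 0.05, image.shape), 0.0, 1.0)
    return [image, np.fliplr(image), np.flipud(image), noisy]

rng = np.random.default_rng(42)
image = rng.random((8, 8))  # stand-in for a training image with values in [0, 1]
batch = augment(image, rng)
print(len(batch))  # → 4
```

Real pipelines add crops, rotations, and color jitter, and as the talk notes, go well beyond geometric transforms to generative models and rendered 3D data.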
S8467 - Playing FPS Games with Deep Reinforcement Learning

Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. We present the first architecture to tackle 3D environments in first-person shooter games that involve partially observable states. Typically, deep reinforcement learning methods only utilize visual input for training. We present a method to augment these models to exploit game feature information, such as the presence of enemies or items, during the training phase. Our model is trained to simultaneously learn these features along with minimizing a Q-learning objective, which is shown to dramatically improve the training speed and performance of our agent. Our architecture is also modularized to allow different models to be independently trained for different phases of the game. We show that the proposed architecture substantially outperforms built-in AI agents of the game as well as average humans in deathmatch scenarios.

25-minute Talk Devendra Singh Chaplot - Ph.D. student, Carnegie Mellon University
S8483 - Empowering CUDA Developers with Virtual Desktops

You've just been tasked with deploying the NVIDIA CUDA Toolkit to a group of developers. Wouldn't it be great if you could save time deploying it, protect the developers' work, reduce the amount of unique workstation hardware needed, and get more out of your hardware investment? This session will show how this can be done with VMware Horizon virtual desktops leveraging vGPUs and the CUDA Toolkit. The CUDA Toolkit is a core component of most developers' desktops and provides the underpinnings for many development operations that take advantage of GPU technology. It can be, and often is, difficult to install on virtual machines. We will walk through its deployment on Linux virtual machines, ensuring requirements for both the CUDA Toolkit and VMware Horizon with vGPU are met.

50-minute Talk Tony Foster - Sr. Advisor, Technical Marketing Ready Bundles for HPC, Dell EMC
S8512 - Accelerating Generative Design by Leveraging GPUs on the Cloud

We'll walk through the use of a GPU-accelerated voxel-based stress solver in the level set topology optimization engine used for Autodesk Generative Design. We'll discuss how the solver benefits from executing on the GPU over our CPU implementation and why this is important from both a cost and efficiency standpoint. Autodesk has partnered closely with Amazon to deliver cloud-based simulation on their platform, and we will talk about how we are driving GPU usage on the cloud and how we have used the nvidia-docker plugin for PCIe passthrough to run on Amazon's GPU compute systems.

50-minute Talk Jerran Schmidt - Design Engineer, Autodesk
Christopher Hebert - Developer Technology Engineer, NVIDIA
S8586 - Writing Graph Primitives with Gunrock

Learn how to use Gunrock, a state-of-the-art CUDA-based graph-processing library specifically designed for the GPU, to develop fast, efficient, and complex graph primitives. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. Gunrock is a stable, powerful, and forward-looking substrate for GPU-based, graph-centric research and development. Like many graph frameworks, it leverages a bulk-synchronous programming model and targets iterative convergent graph computations. We believe that Gunrock offers both the best performance on GPU graph analytics and the widest range of primitives.

80 Minutes Tutorial Muhammad Osama - Graduate Researcher, University of California Davis
S8587 - Recent Progress in Accelerating Monte Carlo Simulation on GPU for Pricing and Risk Management of Financial Instruments

Learn about recent progress in accelerating Monte Carlo simulation on the GPU in applications for pricing financial instruments and risk management. We'll focus on the forward Monte Carlo simulation, which allows for a natural parallelization across CUDA cores, and present a recent extension of our implementation to a broad selection of industry-standard valuation models for different asset classes, including hybrid models that can be used to price multi-currency and multi-asset portfolios. Even with increasing complexity and dimensionality of valuation models, our benchmarks show stable GPU speedup factors in the ranges of 20x and 30x for calculations with double precision (FP64) and single precision (FP32) floating point, respectively. We also briefly summarize our most recent research project on a more complex backward (American/Least Squares) Monte Carlo simulation method, based on regression algorithms used to price general financial instruments with optionality. The latter method relies heavily on matrix calculations and benefits from using GPU-accelerated libraries: cuBLAS for linear algebra and cuSOLVER for solvers.
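
As a toy version of the forward simulation described above (a sketch under Black-Scholes assumptions, not the speaker's implementation), a European call can be priced by simulating independent terminal asset prices and discounting the average payoff:

```python
import numpy as np

def mc_european_call(s0, k, r, sigma, t, n_paths, seed=0):
    """Forward Monte Carlo price of a European call under geometric Brownian motion."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    # Each simulated path is independent, which is why this style of
    # simulation parallelizes naturally across CUDA cores.
    s_t = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
    payoff = np.maximum(s_t - k, 0.0)
    return np.exp(-r * t) * payoff.mean()

price = mc_european_call(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0, n_paths=200_000)
print(round(price, 2))  # close to the Black-Scholes value of about 10.45
```

The backward (Least Squares) method the talk also covers replaces this single expectation with per-timestep regressions, which is where the cuBLAS/cuSOLVER matrix work comes in.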

25-minute Talk Serguei Issakov - Global Head of Quantitative Research and Development, Senior Vice Pres, Numerix
S8596 - Overcoming Missing Modalities in Remote Sensing

Recent advances in earth observation are opening up an exciting new area for exploration of satellite image data. We'll teach you how to analyze this new data source with deep neural networks. Focusing on emergency response, you will learn how to apply deep neural networks for semantic segmentation on satellite imagery. We will specifically focus on multimodal segmentation and the challenge of overcoming missing modality information at inference time. It is assumed that registrants are already familiar with the fundamentals of deep neural networks.

25-minute Talk Damian Borth - Director, German Research Center for Artificial Intelligence (DFKI)
Benjamin Bischke - PhD Candidate, German Research Center for Artificial Intelligence (DFKI)
S8660 - A Deep Neural Network for Estimating Depth from Stereo

We present a deep neural network architecture for estimating 3D depth from stereo images. The network is modeled after computer vision stereo matching pipelines to simplify the training process. Our loss function consists of a photometric loss term and Lidar-based loss terms. This combination makes it possible to train our DNN in supervised, semi-supervised, and completely unsupervised ways. Our DNN produces depth maps with accuracy similar to Lidar-based depth. We also compare our stereo DNN architecture to other stereo architectures, as well as to a monocular depth DNN architecture. We demonstrate qualitative and quantitative test results.

50-minute Talk Nikolai Smolyanskiy - Principal Deep Learning and Computer Vision Engineer, NVIDIA
Alexey Kamenev - Senior Deep Learning and Computer Vision Engineer, NVIDIA
S8666 - Deploying Autonomous Vehicles with NVIDIA DRIVE

DRIVE PX is an open platform for the autonomous driving ecosystem. It has been adopted by over 300 partners in the automotive ecosystem to develop solutions for vehicles that are intelligent and autonomous. This talk will outline the technical challenges facing the development of autonomous intelligent vehicles and provide details of how the next generation of DRIVE AI car computers, DRIVE Xavier and DRIVE Pegasus, addresses these challenges.

50-minute Talk Srikanth Sundaram - Senior Product Manager DRIVE PX 2, NVIDIA
S8704 - NVIDIA IndeX - Advanced Large-Scale Data Visualizations on the NVIDIA GPU Cloud (NGC)

NVIDIA IndeX incorporates NVIDIA's hardware and software technology to enable interactive, high-quality 3D visual exploration and real-time evaluation of computed and simulated large data for a wide range of scientific fields. NVIDIA IndeX is deployed for DGX technology and can be made available as a container in the cloud, such as on AWS or NGC. With NVIDIA IndeX, scientists gain unique insights into 3D data of unlimited size and complexity, and IndeX's in-situ solution allows scientists to envision remarkable new data simulation and visualization workflows. We present NVIDIA IndeX's CUDA programming interface for implementing novel visualization techniques, illustrate CUDA programs that produce various high-fidelity visualizations, and demonstrate large-scale data visualization on the NVIDIA GPU Cloud based on custom visualization techniques.

25-minute Talk Marc Nienhaus - Sr. Manager Software Engineering, NVIDIA IndeX, NVIDIA
Alexander Kuhn - Senior Software Engineer, NVIDIA
Henning Lux - Senior Software Engineer, NVIDIA
S8727 - Improving NAMD Performance on Volta GPUs

In 2007, NAMD was the first full-featured production molecular dynamics software to use CUDA for accelerating its costliest computations. We'll describe our latest efforts, techniques, and results in our quest to optimize NAMD to make the best use of the tremendous computational capabilities of state-of-the-art Volta GPUs, particularly in new dense node configurations such as the NVIDIA DGX and ORNL Summit systems that feature NVLink-connected GPUs. In existence now for over 20 years, NAMD is a sophisticated parallel molecular dynamics program. NAMD development has emphasized parallel scalability to support large-size and long-timescale biomolecular simulations running on petascale supercomputers. As GPU technology has evolved, NAMD has benefited from moving greater amounts of work to the GPU. NVIDIA's release of Volta has now shifted the balance almost entirely to the GPU, with the small remaining CPU calculations often posing bottlenecks to NAMD's performance. Attendees will learn optimization strategies and pitfalls for achieving higher performance as Amdahl's Law poses an ever-increasing challenge for mature GPU-accelerated codes like NAMD.

50-minute Talk David Hardy - Research Programmer, University of Illinois at Urbana-Champaign
Ke Li - HPC Developer Technology Engineer, NVIDIA
John Stone - Senior Research Programmer, University of Illinois at Urbana Champaign
S8782 - A Cross-Field VR Case Study to Treat Children with Autism Spectrum Disorder

We built a contextualized learning system with realistic interaction for medical education. This system integrates virtual reality (VR) with the knowledge of occupational therapy, especially for autistic children. Our system supports a variety of scenes to facilitate training of children's confidence, adaptability, and social ability. With our system, the training content is no longer limited to the traditional treatment room. Therapists and children are able to save preparation time and focus on immersive training.

25-minute Talk Huai-Sheng Huang - Assistant Professor, Fu Jen Catholic University - Department of Information Management
S8823 - Latest Tools and Techniques for Training and Deploying Deep Neural Networks in Educational Environments

Craig Morioka, UCLA Adjunct Associate Professor of Radiological Sciences, and Dima Lituiev, Postdoctoral Scholar at the University of California San Francisco, Institute for Computational Health Sciences, will discuss how they empower their fellow faculty, staff, and students with the latest techniques in training and deploying deep neural networks through NVIDIA's Deep Learning Institute (DLI) University Ambassador Program, a new AI and deep learning education enablement program for universities. This will include a dive into the benefits of an online learning platform, which uses GPUs in the cloud, by stepping through the DLI's online Image Segmentation and Radiomics labs.

The Image Segmentation lab leverages an example from medical image analysis, where it is often important to separate pixels corresponding to different types of tissue or cells for the purposes of diagnostics and treatment planning. Dima uses image segmentation in his research to facilitate diagnostics of kidney rejection by analyzing histological slides from patients with kidney transplants. We will explore how the TensorFlow code is structured and how the TensorBoard tool can be used to visualize the structure and training dynamics of segmentation models. The focus of the Radiomics lab is detection of the 1p19q co-deletion biomarker using deep learning, specifically convolutional neural networks, using the Keras and TensorFlow computing frameworks.

Attendees will also learn how they can apply to become a DLI University Ambassador and bring the latest in deep learning and AI education to their academic communities.

50-minute Talk Joseph Bungo - Deep Learning Institute (DLI) Program Manager, NVIDIA
Dmytro Lituiev - Postdoctoral Research Fellow, UC Berkeley and UCSF
Craig Morioka - Adjunct Associate Professor of Radiological Sciences, UCLA
S8873 - GBM Inferencing on GPU

We'll present a novel GPU implementation for batched GBM inferencing. We'll also present a detailed performance comparison of our implementation against state-of-the-art libraries such as XGBoost and Treelite. We'll then compare inference performance on various real-world datasets.

50-minute Talk Shankara Rao Thejasw Nanditale - Compute Devtech Engineer, NVIDIA
Vinay Deshpande - Compute DevTech Engineer, NVIDIA
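
To ground the terminology (a toy sketch, not the talk's GPU implementation or XGBoost's API): GBM inference walks each input through every tree in the boosted ensemble and sums the leaf values. With depth-1 stumps as hypothetical trees:

```python
def gbm_predict(trees, x, base_score=0.0):
    """Score one sample with a gradient-boosted ensemble of depth-1 stumps.

    Each stump is (feature_index, threshold, left_value, right_value).
    """
    score = base_score
    for feat, thresh, left, right in trees:
        score += left if x[feat] < thresh else right
    return score

# Two toy stumps: x[0] >= 0.5 contributes 0.2, x[1] < 1.0 contributes 0.3.
trees = [(0, 0.5, -0.1, 0.2), (1, 1.0, 0.3, -0.05)]
print(round(gbm_predict(trees, [0.7, 0.4]), 2))  # → 0.5
```

Because each sample's traversal is independent, batches of samples map naturally onto GPU threads, which is the parallelism the talk exploits.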
S8891 - Computer-Augmented Healthcare: Opportunities and Challenges

The Role of Data in Achieving Precision and Value in Healthcare

The goal of healthcare is to provide the most effective treatment to every patient in the most efficient way. Data plays a key role in every aspect of this process — from decision support systems that provide a clinician with the right information at the right time, to scheduling algorithms that predict patient flow and schedule accordingly, to analytics to coach and support patients in achieving or maintaining a healthy lifestyle. Achieving the vision of a data-informed healthcare system will require fundamental advances in many areas including causal inference, inference on complex, high-dimensional and heterogeneous data, missing data, process modeling, bias reduction, statistical validation, and model adaptation, to name a few. In this talk, I will illustrate some of these challenges through concrete examples within the Malone Center.

25-minute Talk Gregory Hager - Professor and Director, The Malone Center for Engineering in Healthcare, Johns Hopkins University
S8964 - Sensing Technologies for an Autonomous Tomorrow (Presented by Analog Devices)

The future of autonomous transport is upon us. In order to provide safe, reliable transport for all, it is essential to have the most accurate, real-time 3D map around the vehicle. The 360-degree safety shield created using radar, LIDAR, cameras, and IMUs makes up the perception sensor suite that is the foundation of making this a reality. Data from high-performance imaging radar, LIDAR, and cameras are fused together to give the vehicle its sense of sight, whereas the IMU gives the vehicle its sense of feeling while also ensuring it maintains its heading. The large amount of data generated by Analog Devices' Drive360 sensors will require high-performance AI computers in the vehicle, such as NVIDIA's DRIVE Pegasus, to generate the real-time 3D map. Together, Analog Devices and NVIDIA can enable safe, reliable autonomous transportation for all.

25-minute Talk Chris Jacobs - VP, Autonomous Transportation & Automotive Safety, Analog Devices
S8979 - An Introduction to CUDA Programming Session 1 of 4 (Presented by Acceleware)

Join us for an informative introduction to CUDA programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple CUDA kernel will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!

80 Minutes Tutorial Dan Cyca - Chief Technology Officer, Acceleware
Chris Mason - Technical Product Manager, Acceleware
S81028 - Earth Observation From Space: Deep Learning based Satellite Image Analysis

Learn how recent advances in Earth observation are opening up an exciting new area for exploration of satellite image data with deep learning. Focusing on real-world scenarios, we will teach you how to analyze this exciting remote sensing data source with deep neural networks. Automated satellite image understanding is of high interest to various research fields and industry sectors, such as the insurance, agriculture, and investment industries. You will learn how to apply deep neural networks in natural disaster situations and for the classification of land use, land cover, and building types.

25-minute Talk Patrick Helber - PhD candidate, German Research Center for Artificial Intelligence
S8122 - Dissecting the Volta GPU Architecture through Microbenchmarking

We'll present the architectural details of the Volta GPU discovered via our micro-benchmarks and reveal the geometry and latency of Volta's complex memory hierarchy, the format of its encoded instructions, and the latency of commonly used instructions. The knowledge being shared enables developers to craft better optimized code than what is currently possible through publicly available information and tool chains.
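The micro-benchmarks themselves are not published in this abstract, but the classic pointer-chasing idea behind latency measurement can be sketched as follows. The helper names are our own, and a real GPU micro-benchmark would run the chase inside a CUDA kernel with a stride chosen to probe each cache level, not on the host:

```python
import random
import time

def make_chain(n, seed=0):
    """Build a single random cycle: nxt[i] is the index to visit after i."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    nxt = [0] * n
    for i in range(n):
        nxt[order[i]] = order[(i + 1) % n]
    return nxt

def chase(nxt, steps):
    """Follow the chain; each load depends on the previous one, so the
    timing reflects dependent-access latency, not bandwidth."""
    i = 0
    t0 = time.perf_counter()
    for _ in range(steps):
        i = nxt[i]
    per_access = (time.perf_counter() - t0) / steps
    return i, per_access
```

Because every access depends on the result of the last, the hardware cannot overlap them, which is what isolates latency from throughput in this style of benchmark.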

25-minute Talk Zhe Jia - R&D Engineer, Citadel Securities
S8743 - Deep Learning for Locomotion Animation

We'll examine tools and technologies that NVIDIA's GameWorks team is building to leverage the power of deep learning for content creation, demonstrating recent research into ways that neural networks can be used to generate realistic-looking human animation. We'll talk about how to apply GPUs for high-performance runtime inferencing of these networks for use in games or real-time VFX scenarios.

25-minute Talk Gavriel State - Senior Director, System Software, NVIDIA
S8934 - Stitching 8K Video in the Cloud with Pixvana SPIN Studio and VRWorks

At Pixvana, we are building a video creation and delivery platform for the emerging mediums of virtual and mixed reality (XR). Pixvana SPIN Studio is built on a cloud media processing system that uses AWS and Azure GPU instances to create and deliver high-quality VR video. In this talk, Sean Safreed and Paul Barsic will discuss the new cloud-based stitching module built on NVIDIA VRWorks and running on CUDA/Linux. The talk will introduce the architecture for cloud stitching and its interaction with AWS and Azure, and dive into the end-to-end functions used to go from camera source to final 360 VR media.

25-minute Talk Sean Safreed - Co-Founder and CPO, Pixvana
S8956 - Interactive Visualization of Massive Geoscience Datasets

DecisionSpace® Geosciences (DSG) delivers a collaborative geoscience interpretation environment with integration across multi-domain workflows and data types. In this session, we will discuss our approach to integrating NVIDIA IndeX with DSG. This integration provides a scalable visualization solution capable of interactively rendering and manipulating massive geoscience datasets, such as terabyte-size seismic volumes and horizons. We will also discuss our use of the latest features of IndeX to provide advanced visualization capabilities.

25-minute Talk Venkat Viswanathan - Development Manager, Platform & Visualization, Halliburton-Landmark
S8963 - How Will Machine Learning and Artificial Intelligence Change the Practice of Healthcare

This session will give an overview of new methods that leverage machine learning and causal inference to enable reliable individualized decision-making. We will present applications in different areas of healthcare where real-time inference is changing the practice of medicine. The latter also gives rise to new challenges in developing human-machine collaborative systems.

25-minute Talk Suchi Saria - John C. Malone Assistant Professor, Johns Hopkins University
CE8120 - Connect with the Experts: Data Analytics and Machine Learning

Join us in the hangout area to get your technical questions about optimizing data analytics pipelines and machine learning algorithms answered by NVIDIA experts. Learn about the latest capabilities to accelerate entire data analytics pipelines, from databases to analytic algorithms, machine learning, and graph analytics. How can GPUs excel at data-intensive workloads like complex data analytics tasks? By example, we will demonstrate how to accelerate critical components, covering benchmarks, tools, frameworks, and more. Related presentations: S8289 (How to Get the Most out of GPU Accelerated Database Operators), S8417 (Breaking the Speed of Interconnect with Compression for Database Applications), and S8502 (GOAI One Year Later).

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Michael Wendt - Manager, Applied Engineering Solutions, NVIDIA
Keith Kraus - Senior Engineer, NVIDIA
Andrey Adinets - Developer, NVIDIA
Levs Dolgovs - Developer, NVIDIA
Nikolay Sakharnykh - Sr. Developer Technology Engineer, NVIDIA
CE8126 - Connect with the Experts: Deep Learning Basics

Attend this session to get your questions on deep learning basics and concepts answered. NVIDIA experts can help you with the fundamentals and provide guidance on how and when to apply deep learning and GPUs to your work. No question is too basic to ask.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject.  

1 Hour Connect with the Experts Rajan Arora - Solution Architect, NVIDIA
Xuan Vinh Nguyen, NVIDIA
Robert Crovella - SA Mgr., NVIDIA
CE8135 - Connect with the Experts: CUDA Libraries

CUDA libraries accelerate AI and HPC applications, spanning deep learning, linear algebra, signal processing, and core math. Stop by to chat with NVIDIA experts, whether you are a beginner with "how-to" questions or a CUDA ninja who wants to dive deep into strategies for speeding up your applications.

1 Hour Connect with the Experts Lung-Sheng Chien - Software Engineer, NVIDIA
Lukasz Ligowski, NVIDIA
Harun Bayraktar, NVIDIA
Steven Rennich, NVIDIA
Murat Guney - AI devtech engineer, NVIDIA
S81014 - Advancing State-of-the-Art of Autonomous Vehicles and Robotics Research using AWS GPU Instances (Presented by Amazon Web Services)

Toyota Research Institute's (TRI) mission is to improve the quality of human life through advances in artificial intelligence, automated driving, and robotics. Learn more about their research and how they are using AWS EC2 P3 instances, the industry's most powerful GPU instances, in combination with other AWS services to enable autonomous vehicles and robots at scale.

50-minute Talk Chetan Kapoor - Senior Product Manager - EC2, Amazon Web Services
Adrien Gaidon - Machine Learning Lead, Toyota Research Institute
Mike Garrison - Senior Infrastructure Engineer, Toyota Research Institute
S8200 - Domain Adaptation Using Adversarial Training for Semantic Segmentation and Caption Style Transfer

We'll introduce the basic concept of domain adaptation and how to use adversarial training to achieve unsupervised domain adaptation. We'll then describe how the technique is used in two tasks: improving semantic segmentation across cities, and transferring language style for image captioning. In particular, we combine domain adaptation with a policy-gradient-based reinforcement learning approach to transfer language style. The details and results of both tasks are published in ICCV 2017.

25-minute Talk Min Sun - Assistant Professor, National Tsing Hua University
S8216 - Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image

Learn how to predict a dense depth image from a sparse set of depth measurements and a single RGB image. This approach can serve as a plug-in module in simultaneous localization and mapping to convert sparse maps to dense maps, and as a super-resolution of LiDAR depth data. We'll describe the performance of our prediction method, explain how to train the depth prediction network, and showcase examples of its applications. Code and a video demonstration are also publicly available. This session is for registrants who are already familiar with basic machine learning techniques.

25-minute Talk Fangchang Ma - Ph.D. Candidate, Massachusetts Institute of Technology
S8227 - Integrating the NVIDIA Material Definition Language MDL in Your Application

The NVIDIA MDL SDK provides a rich toolset for integrating MDL in a wide range of renderers, from physically based ray tracing to real-time applications. In this tutorial-like session, we'll show how MDL materials and texturing functions can be compiled for OptiX/CUDA, x86, and OpenGL target platforms. We'll present how the MDL Distiller can be used to simplify MDL materials for use with real-time rendering solutions. Developers will learn about the available APIs and example code.

50-minute Talk Sandra Pappenguth - Senior Software Engineer, NVIDIA
Matthias Raab - Senior Graphics Software Engineer, NVIDIA
S8264 - Practical Applications of Virtual Reality in Architecture

We'll provide an overview of various VR delivery methods, including software like Enscape and Fuzor, as well as hardware like Oculus Rift and HTC Vive. Perhaps more important than the software and hardware, we'll discuss the dynamics experienced within multi-discipline design teams and with clients. We'll talk about public meetings and client feedback. All this is made possible using NVIDIA Quadro P5000 and P6000 graphics cards.

50-minute Talk Daniel Stine - VDC/BIM Administrator, LHB
S8343 - Detection of Financial Statement Fraud using Deep Autoencoder Networks

Explore how auditors are applying deep learning to detect "anomalous" records in large volumes of accounting data. The Association of Certified Fraud Examiners estimates in its Global Fraud Study 2016 that the typical organization loses 5% of its annual revenues to fraud. At the same time, organizations are accelerating the digitization of business processes affecting Enterprise Resource Planning (ERP) systems. These systems collect vast quantities of electronic journal entry data in general- and sub-ledger accounts at an almost atomic level. To conduct fraud, perpetrators need to deviate from regular system usage or posting patterns. This deviation will be weakly recorded and reflected accordingly by a very limited number of "anomalous" journal entries. To detect such anomalous journal entries, several deep autoencoder networks are trained using NVIDIA's DGX-1 system. The empirical evaluation on two real-world accounting datasets underpins the effectiveness of the trained networks in capturing journal entries highly relevant for a detailed audit, while outperforming several baseline methods.
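The core idea, scoring records by how poorly a learned model reconstructs them, can be sketched in a few lines. This toy uses feature-wise means as a stand-in for a trained deep autoencoder's reconstruction; the function names and the top-k flagging rule are our own illustration, not the authors' method:

```python
def anomaly_scores(records):
    """Score each record by squared reconstruction error.
    Stand-in 'reconstruction': feature-wise means of the dataset.
    A real detector would use a trained autoencoder's output here."""
    n, d = len(records), len(records[0])
    means = [sum(r[j] for r in records) / n for j in range(d)]
    return [sum((r[j] - means[j]) ** 2 for j in range(d)) for r in records]

def flag_anomalies(records, top_k=1):
    """Return indices of the top_k records with the highest scores."""
    scores = anomaly_scores(records)
    ranked = sorted(range(len(records)), key=lambda i: -scores[i])
    return ranked[:top_k]
```

Records that deviate from the regular posting pattern reconstruct badly and surface at the top of the ranking, which is the property the talk exploits for audit triage.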

25-minute Talk Marco Schreyer - Researcher, German Research Center for Artificial Intelligence
Timur Sattarov - Forensic Data Analyst, PricewaterhouseCoopers GmbH WPG
S8380 - Image Data Augmentation on GPU: One Method That Does It All

Data augmentation is an effective method to boost your deep learning training performance. There are many ways of doing this augmentation, the approaches are not well standardized, and not all deep learning frameworks support augmentation natively. We present a method of doing data augmentation based on transformation matrices that perturb both space and color, in a way that is easy to use and understand, framework-agnostic, and fast (it runs on the GPU). This method works especially well for augmentations that need to be applied to both images and labels, as is typical in object detection and segmentation tasks. Image augmentation is a job that GPUs excel at, and it will significantly reduce the load on, and need for, a fast CPU.

25-minute Talk Tim Zaman - Software Engineer, NVIDIA
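The transformation-matrix idea can be sketched with homogeneous 3x3 matrices: compose the perturbations once, then apply the identical transform to image pixels and label geometry so they stay in sync. This is a minimal host-side illustration in pure Python (the helper names are our own), whereas the talk's method runs the equivalent math on the GPU:

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 homogeneous transform matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def apply(m, pt):
    """Apply a homogeneous transform to a 2D point."""
    x, y = pt
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Compose once; apply the SAME matrix to pixel coordinates and to
# label geometry (e.g. bounding-box corners) so both stay aligned.
aug = matmul3(translation(5, 0), rotation(90))
```

Composing perturbations into one matrix also means the image is resampled only once, regardless of how many augmentation steps are stacked.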
S8399 - Driver Drowsiness Detection for ADAS

We'll present an in-car ADAS technology to detect drowsy driving. This technique can be used to alert and awaken the driver, or take corrective actions if required. We employ a CNN-based approach for this technique, which is trained on a mix of synthetic and real images. We'll cover the details of the detection system pipeline and the synthetic dataset generation. We'll also show a demonstration of the detection system in action.

25-minute Talk Sidharth Varier - Senior System Software Engineer, NVIDIA
S8504 - Creating Immersive AI-Powered Virtual Reality Simulation Training For Medical Professionals

Experiential learning is among the best ways to practice for pediatric emergencies. However, hospitals are spending millions on expensive and inefficient mannequin-based training that does not consistently offer an authentic experience for medical students and doctors, or offer convenient repeatability. Come hear about a groundbreaking pilot program that brought together a hospital and two unique VR and AI developer teams to deliver virtual reality training simulations for some of the highest-stakes emergencies hospitals see: pediatric trauma. Learn how doctors aided in the design process to create authentic trauma room scenarios; how expert content and simulation developers crafted a VR experience that would have impact in a world where there is no room for error; and why Oculus supported this project with funding and hardware.

25-minute Talk Shauna Heller - President, North America, AiSolve
S8608 - A Low-Latency Inference System for Recurrent Neural Networks

We'll present cellular batching, a new way of performing batching on GPUs to accelerate model inference for recurrent neural networks (RNNs). Existing deep learning systems perform batching by collecting a fixed set of input samples and fusing their underlying dataflow graphs together for execution. This approach does not perform well for RNNs with input-dependent dataflow graphs. Cellular batching, by contrast, can significantly improve both the latency and throughput of RNN inference. It performs batching at the granularity of an RNN "cell" (a subgraph with shared weights) and dynamically assembles a batched block for execution as requests join and leave the system. We show that this new way of batching can reduce inference latency by 50 to 90 percent, while also increasing throughput by 10 to 200 percent.

50-minute Talk Jinyang Li - Associate Professor, New York University
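The batching policy can be sketched as follows: instead of padding a fixed batch to the longest sequence, process one cell step at a time and re-form the batch from whatever requests are still active. This is a toy serial illustration of the scheduling idea only (the cell is a trivial accumulator and the names are our own), not the paper's GPU system:

```python
def rnn_cell(state, token):
    # Toy stand-in for a shared-weight RNN cell: accumulate token values.
    return state + token

def cellular_batching(requests):
    """requests: list of token sequences of varying lengths.
    Advances all active requests by one cell step per iteration;
    finished requests leave the batch, so no padding is needed."""
    states = [0] * len(requests)
    positions = [0] * len(requests)
    done = [False] * len(requests)
    while not all(done):
        # Gather this step's batch: every request with tokens remaining.
        batch = [i for i in range(len(requests)) if not done[i]]
        for i in batch:  # one fused batched "cell" launch in a real system
            states[i] = rnn_cell(states[i], requests[i][positions[i]])
            positions[i] += 1
            if positions[i] == len(requests[i]):
                done[i] = True
    return states
```

Because short requests exit as soon as they finish and new ones could join mid-stream, latency does not depend on the longest sequence in the batch, which is the effect the talk quantifies.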
S8669 - Deep Learning Demystified

What is deep learning? In what fields is it useful? How does it relate to artificial intelligence? We'll discuss what deep learning is and why this powerful new technology is getting so much attention, how deep neural networks are trained to perform tasks with superhuman accuracy, and the challenges organizations face in adopting this new approach. We'll also cover some of the best practices, software, hardware, and training resources that many organizations are using to overcome these challenges and deliver breakthrough results.

50-minute Talk William Ramey - Director, Developer Programs, NVIDIA
S8681 - Design Empowered by the 3DEXPERIENCE and NVIDIA with Advanced Visualization and VR Experiences

Through the 3DEXPERIENCE platform, Dassault Systèmes provides a paradigm shift in design, a major step forward for designers and engineers in the design creativity and decision workflow. By providing built-in, native, high-end real-time rendering and visualization plus an immersive VR experience within the 3D design environment of CATIA, designers can now access a new level of creativity by combining 3D modeling, visualization, and VR. We'll cover how CATIA Design solutions on the 3DEXPERIENCE platform use the latest technologies for real-time visualization and immersive VR experiences to create, collaborate, and experience 3D product design on native and massive models.

25-minute Talk Stephan Ritz - CATIA Design, Product Experience Roles Portfolio Director, Dassault Systemes
S8689 - In-situ Visualization for Novel Earth System Modeling Framework using NVIDIA IndeX ParaView Plugin and Catalyst

We'll cover a novel application of a recently developed state-of-the-art modeling framework that integrates an in-situ visualization and data analysis approach with a model coupling framework. The modeling framework uses NVIDIA IndeX to gain more insight into vast amounts of data through in-situ visualization. Enabling analysis of fast-moving processes and their evolution in both time and space supports better understanding of the underlying physical mechanisms. The designed system also provides scalable, real-time visualization and computing of multi-valued volumetric data by integrating the Earth System Modeling Framework (ESMF) and NVIDIA IndeX to create an easy-to-use, efficient, generic, and standardized modeling environment for earth system science applications.

25-minute Talk Mahendra Roopa - Software Product Manager, NVIDIA
Ufuk Utku Turuncoglu - Assistant Professor, Istanbul Technical University
S8715 - Reinforcement Learning for Multiplayer Agents at SEED

Over the last couple of years, neural nets have enabled significant breakthroughs in computer vision, voice generation and recognition, translation, and self-driving cars. Neural nets will also be a powerful enabler for future game development. We'll give an overview of the potential of neural nets in game development, as well as provide an in-depth look at how we can use neural nets combined with reinforcement learning for new types of game AI.  We will also show some new exciting results from applying deep reinforcement learning to AAA games.

50-minute Talk Magnus Nordin - Technical Director, Electronic Arts / SEED
S8750 - Porting VASP to GPUs with OpenACC

VASP is a software package for atomic-scale materials modeling. It's one of the most widely used codes for electronic-structure calculations and first-principles molecular dynamics. We'll give an overview and status of porting VASP to GPUs with OpenACC. Parts of VASP were previously ported to CUDA C with good speed-ups on GPUs, but also with an increase in the maintenance workload, as VASP is otherwise written wholly in Fortran. We'll discuss OpenACC performance relative to CUDA, the impact of OpenACC on VASP code maintenance, and challenges encountered in the port related to management of aggregate data structures. Finally, we'll discuss possible future solutions for data management that would simplify both new development and maintenance of VASP and similar large production applications on GPUs.

50-minute Talk Markus Wetzstein - HPC DevTech Engineer, NVIDIA
Stefan Maintz - DevTech Engineer, NVIDIA
S8806 - NVIDIA vGPU and Red Hat Virtualization: High End Virtual Workstations

A shared physical graphics processing unit (GPU) exposed to virtual guests as a virtual GPU drastically changes the dynamics of what is possible, both technically and monetarily, in high-end virtual workstations. You can run many GPU-based workloads in multiple VMs on one host using NVIDIA Tesla cards. Attendees will learn about vGPU technology, Virtual Function I/O (VFIO), and the associated roadmaps.

25-minute Talk Andre Beausoleil - Senior Principal Partner Manager, Red Hat
S8881A - NVIDIA Vulkan 2018 Update

Two years after release, Vulkan is a mature and full-featured low-level graphics API, with significant adoption in the developer community.

NVIDIA will present a status update on our Vulkan software stack. We will cover latest Vulkan developments, including extensions, software libraries and tools. We will also cover best practices and lessons learned from our own work with the Vulkan API in the past year.

50-minute Talk Nuno Raposo Subtil - Senior Software Engineer, NVIDIA
S8895 - A Component-Based AI Engine Platform for Medical Workflow

As deep learning techniques have been applied to the field of healthcare, more and more AI-based medical systems continue to come forth, accompanied by new heterogeneity, complexity, and security risks. In the real world, we've seen this situation constrain demand and hinder AI application development in China's hospitals. First, we'll share our experience in building a unified, GPU-accelerated AI engine system to feed component-based functionality into the existing workflow of clinical routine and medical imaging. Then, we'll demonstrate a pipeline that integrates different types of AI applications (detecting lung cancer, predicting childhood respiratory disease, and estimating bone age) as microservices into medical station, CDSS, PACS, and HIS systems to support the medical decision-making of local clinicians. On this basis, we'll describe the purpose of establishing an open, unified, standardized, and legal cooperation framework to help AI participants enter the Chinese market and build a collaborative ecosystem.

25-minute Talk Xu Chen - Director of AI Research, Winning Health
S8215 - Displaying and Interacting with Desktop Apps in VR

Displaying traditional desktop applications in virtual reality requires techniques to overcome the limited resolution of current displays while simultaneously taking advantage of the 360-degree real estate. Interaction with these applications is aided by gestures using the controllers and hands. We'll go over the use of mixed reality for easier keyboard typing when necessary, general safety, and finding things around you, such as cables, chairs, and coffee. All techniques described are implemented and available in the commercially available software VR Toolbox.

25-minute Talk Rouslan Dimitrov - Programmer, VR Toolbox
S8399b - Driver Drowsiness Detection for ADAS (2)

We'll present an in-car ADAS technology to detect drowsy driving. This technique can be used to alert and awaken the driver, or take corrective actions if required. We employ a CNN-based approach for this technique, which is trained on a mix of synthetic and real images. We'll cover the details of the detection system pipeline and the synthetic dataset generation. We'll also show a demonstration of the detection system in action.

25-minute Talk Sidharth Varier - Senior System Software Engineer, NVIDIA
S8416 - Real-Time Inference of Deep Generative Models on Tegra X1 at Traffic Intersections

Detecting objects, whether they're pedestrians, bicyclists, or other vehicles, at a traffic intersection is essential to ensure efficient traffic flow and the safety of all participants. We'll present an experiment to assess training and real-time inference on an NVIDIA Tegra X1 SoC module with a suite of GigE Flea3 Point Grey cameras installed on a vehicle. The system is to be trained using a subset of data collected from different types of busy intersections on a university campus, and testing is to be done on the remaining data. We'll use a deep generative model that can learn and reconstruct the traffic scene. We'll share our CUDA optimization strategies on the Tegra X1 and the real-time performance of the inference model.

25-minute Talk Menna El-Shaer - Doctoral Student/Researcher, The Ohio State University
S8425 - Deep Learning for Surface Reconstruction

We'll present a deep learning algorithm for reconstructing surfaces from massive point datasets. The network consists of multiple layers that organize the neurons for optimal neighborhood representations. The implementation is done by slicing the standard self-organizing map (SOM) network in half to form multiple layers. The Z-axis distance is omitted when computing the neighborhood distance for updating the weighted neurons, to avoid surface point discontinuity due to the layers' depth. In this scenario, the distance determining the winning node is computed in 2D from four directions. As the number of layers increases, the computational complexity rises, and the required processing power increases as well. Thus, we use CUDA to update the weights and the distance of the winning node. Reduction techniques are implemented to obtain the smallest distance for the winning node. For the weight-updating process, each thread is given several nodes for which to calculate the distance between the winning node and the current node. Two parts are involved in designing and developing the algorithms: point reduction and point optimization for surface reconstruction.
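The two SOM steps the abstract describes, finding the winning node by 2D distance (Z omitted) and pulling the winner's neighborhood toward the data point, can be sketched serially. This is a simplified illustration under our own assumptions (1D node indexing, fixed learning rate and radius), whereas the talk's version runs these steps as CUDA reductions and per-thread weight updates:

```python
def winner(nodes, point):
    """Winning node = smallest squared 2D distance (Z-axis omitted)."""
    def d2(n):
        return (n[0] - point[0]) ** 2 + (n[1] - point[1]) ** 2
    return min(range(len(nodes)), key=lambda i: d2(nodes[i]))

def update(nodes, point, win, lr=0.5, radius=1):
    """Pull the winner and its index-neighbors toward the data point."""
    out = []
    for i, (x, y) in enumerate(nodes):
        if abs(i - win) <= radius:
            out.append((x + lr * (point[0] - x), y + lr * (point[1] - y)))
        else:
            out.append((x, y))  # outside the neighborhood: unchanged
    return out
```

In the GPU version, the `min` over distances becomes a parallel reduction and the neighborhood update is distributed across threads, each owning several nodes.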

50-minute Talk Siti Mariyam Shamsuddin - Director, UTM Big Data Centre, Universiti Teknologi Malaysia
Shafaatunnur Hasan - Senior Lecturer and GPU Principal Research, UTM Big Data Centre, Universiti Teknologi Malaysia
S8426 - Mixing NVIDIA Virtual GPU Solutions and Computational Loads - An Introduction

Attendees will learn about considerations for mixing VDI and computational workloads on commercially available hypervisors such as VMware vSphere, and the limitations of doing so. Current GPU capabilities will be discussed as applied to various industries.

25-minute Talk Eric Kana - Senior Solution Architect, NVIDIA
S8450 - VTK-m: Moving HPC Scientific Visualization Forward

The VTK-m project is a library that enables scientific visualization algorithms across a range of GPUs, accelerators, and CPUs. VTK-m is designed around fine-grained concurrency and provides flexible data and execution models. The abstraction between the low-level hardware architectures and the data-parallel high-level code allows algorithms to be designed independently of where they will be executed. We will cover not only the status of the existing VTK-m algorithms and the supporting architecture, but also the unique challenges and solutions involved in writing performance-oriented code that targets multiple hardware architectures.

25-minute Talk Robert Maynard - Principal Engineer, Kitware Inc
S8476 - Accelerating Graph Algorithms for Government and Industry

We'll discuss our efforts regarding the acceleration of large-scale graph algorithms in the context of projects funded by various government agencies. Graph methods are key kernels for large-scale data analytics, as well as for several exascale application domains, including smart grids, computational biology, computational chemistry, and climate science. We'll present our latest results on distributed implementations of graph kernels, such as community detection and b-matching, employing GPUs and accelerators, showing how we can tackle large-scale problems with heterogeneous supercomputers. Based on our experience and results in optimizing these algorithms for high performance computing platforms, we'll then discuss new requirements, upcoming opportunities, and potential solutions for next-generation, high-performance, integrated graph toolkits.

50-minute Talk Antonino Tumeo - Senior Research Scientist, Pacific Northwest National Laboratory
Mahantesh Halappanavar - Senior Research Scientist, Pacific Northwest National Laboratory
S8497 - Inside NVIDIA GPU Cloud Deep Learning Framework Containers

In this technical deep dive, get an in-depth look at the deep learning containers on NVIDIA GPU Cloud (NGC) and learn how they can simplify your AI projects. NVIDIA pre-integrates and optimizes the top deep learning frameworks, such as TensorFlow, PyTorch, and MXNet, and makes them available on NVIDIA GPU Cloud, removing time-consuming do-it-yourself software integration. We'll look at the NVIDIA framework optimizations, such as reducing GPU memory overhead, improving multi-GPU scaling, and reducing latency. And we'll talk about the integration of runtimes and drivers in the containers to ensure the correct versions of software are working together for peak performance. You'll leave with an understanding of what makes an NVIDIA GPU-optimized deep learning container tick.

25-minute Talk Christopher Lamb - Senior Director, CUDA & Cloud Computing Software, NVIDIA
John Barco - Sr. Director, NVIDIA GPU Cloud, NVIDIA
S8528 - Accelerating Bioinformatics: End-to-End Computation of NASA GeneLab Data with GPU Data Frame

Protecting crew health is a critical concern for NASA in preparation for long-duration, deep-space missions such as those to Mars. Spaceflight is known to affect immune cells: splenic B-cells decrease during spaceflight and in ground-based physiological models. The key technical innovation presented in our work is end-to-end computation on the GPU with the GPU Data Frame (GDF), running on the DGX Station, to accelerate the integration of immunoglobulin gene segments, junctional regions, and modifications that contribute to cellular specificity and diversity. Study results are applicable to understanding processes that induce immunosuppression, like cancer therapy, AIDS, and stressful environments here on Earth.

25-minute Talk Venkat Krishnamurthy - Head, Product Management, MapD Technologies
Jacqueline Cenci-McGrody - Solutions Architect (Partner SA), NVIDIA
S8580 - Modernizing OpenMP for an Accelerated World

OpenMP has come a long way in its first 20 years, but the last few have brought by far the most change. With accelerated computing on the rise, OpenMP integrated features to address distributed memory devices and offloading to accelerators. Now, as we prepare for the next generation of supercomputers and GPUs, OpenMP is growing to meet the challenges of productively programming scientific applications in a world of accelerators, unified memory, and explicitly hierarchical memories. This talk will discuss the present and future of OpenMP as we ramp up to version 5.0, presenting some of the new features incorporated so far, how they are shaped by large-scale scientific applications, and in turn how they shape those applications.

50-minute Talk Bronis de Supinski - Chief Technology Officer for Livermore Computing, Lawrence Livermore National Laboratory
Tom Scogland - Computer Scientist, Lawrence Livermore National Laboratory
S8638 - Make Yield Curve Construction More Intelligent with GPU

The yield curve provides information on bond returns across maturities and reflects extremely complex market interactions and monetary policy. Yield curve construction models, such as the spline fitting model, use a number of bond sample points and model parameters to deduce the yield curve. This involves repeated experiments in choosing appropriate bond samples, which rely heavily on manual operation. Due to the amount of relevant information and the rapid growth of transaction data, this task becomes even more challenging. Some literature shows that deep learning can detect and exploit interactions in the data that are invisible to any existing financial economic theory. By discovering latent patterns in historical data, it can be a good supplement for choosing active samples and assessing a curve's quality. In financial applications, accuracy and speed are both of critical importance, so the GPU is applied to both the deep learning framework and the yield curve construction. Intelligent, fast, and accurate, our yield curve construction framework achieves a 5x speedup versus manual operation and provides a feasible path for future practice.
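The basic fitting step, deducing a smooth curve from a set of bond sample points, can be illustrated with a deliberately tiny model. This toy fits y(t) = a + b·ln(1+t) by closed-form least squares; it is our own stand-in for illustration, far simpler than the spline models the talk describes, and all names are hypothetical:

```python
import math

def fit_curve(maturities, yields):
    """Least-squares fit of y = a + b*ln(1+t) to (maturity, yield) samples.
    Returns a callable curve usable at any maturity t."""
    xs = [math.log(1 + t) for t in maturities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(yields) / n
    # Closed-form simple linear regression on the transformed maturity.
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, yields))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda t: a + b * math.log(1 + t)
```

In the framework described, the hard part is not this fit but choosing which bond samples feed it; that sample-selection step is what the deep learning component automates.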

25-minute Talk Joe Zhang - Project Manager, Shanghai Clearing House
S8725 - MDL & Substance in Automotive We'll present how Substance Suite helps Color & Trim Designer and Visualization expert for digital material creation. We will specifically focus on material scanning pipeline including AxF format from XRite, Tac7 material scanner and MDL material description in Substance Designer. We will finally highlight for the first time the "Automotive Drop" in Substance Source that will be announced at GTC and will include a wide range of advanced quality materials for this industry including leather, textiles, metal and plastics. 25-minute Talk Nicolas Paulhac - Color, Material and Finish designer, Substance Source product manager, Allegorithmic
Jerome Derel - Chief Product Officer, Allegorithmic
S8792 - Geometry-Aware Learning of Maps for Camera Localization

Maps are a key component in image-based camera localization and visual SLAM systems: they are used to establish geometric constraints between images, correct drift in relative pose estimation, and relocalize cameras after lost tracking. The exact definitions of maps, however, are often application-specific and hand-crafted for different scenarios (e.g., 3D landmarks, lines, planes, bags of visual words). We propose to represent maps as a deep neural net called MapNet, which enables learning a data-driven map representation. Unlike prior work on learning maps, MapNet exploits cheap and ubiquitous sensory inputs like visual odometry and GPS in addition to images and fuses them together for camera localization. Geometric constraints expressed by these inputs, which have traditionally been used in bundle adjustment or pose-graph optimization, are formulated as loss terms in MapNet training and also used during inference. In addition to directly improving localization accuracy, this allows us to update the MapNet (i.e., maps) in a self-supervised manner using additional unlabeled video sequences from the scene.
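The loss formulation described above can be sketched in simplified form: an absolute pose term per frame plus a relative term between consecutive frames that mimics a visual-odometry constraint. This is a toy plain-Python version, not the paper's exact parameterization (which uses log-quaternion rotations and learned loss weighting); poses here are plain coordinate tuples.

```python
def pose_loss(pred, target):
    """L1 distance between a predicted and a target pose vector."""
    return sum(abs(p - t) for p, t in zip(pred, target))

def mapnet_style_loss(pred_seq, target_seq):
    """Absolute pose loss plus a relative (odometry-like) loss
    between consecutive frames, summed over a sequence."""
    absolute = sum(pose_loss(p, t) for p, t in zip(pred_seq, target_seq))

    def rel(a, b):
        # Relative motion between two poses (frame i -> frame i+1).
        return tuple(bi - ai for ai, bi in zip(a, b))

    relative = sum(
        pose_loss(rel(pred_seq[i], pred_seq[i + 1]),
                  rel(target_seq[i], target_seq[i + 1]))
        for i in range(len(pred_seq) - 1))
    return absolute + relative
```

The relative term is where additional unlabeled sequences can contribute: visual odometry supplies `rel(...)` targets even when absolute ground truth is unavailable.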

25-minute Talk Jinwei Gu - Senior Research Scientist, NVIDIA
S8976 - Create Customer Value with Google Cloud AI (Presented by Google)

In this session, you will learn how Google Cloud helps enterprises make the most out of data, and deliver customer value. We will provide an in-depth overview of the Cloud AI and Data Analytics offering that helps enterprises manage their ML lifecycle, from data ingestion to insights and prediction. We will also demonstrate some breakthrough solutions, like AutoML, that are making ML accessible to everyone.

50-minute Talk Chris Kleban - Product Manager, GPUs on Google Cloud, Google Inc.
S8980 - An Introduction to the GPU Memory Model - Session 2 of 4 (Presented by Acceleware)

Explore the memory model of the GPU! This session will begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. Features such as shared memory configurations and Read-Only Data Cache are introduced and optimization techniques discussed. A programming demonstration of shared and constant memory will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!

80 Minutes Tutorial Dan Cyca - Chief Technology Officer, Acceleware
Chris Mason - Technical Product Manager, Acceleware
CE8111 - Connect with the Experts: The Convergence of High Performance Computing and Artificial Intelligence

In this Connect with the Experts session, we'll discuss ways that deep learning and artificial intelligence can be combined with traditional large-scale simulations to accelerate the pace of scientific discovery, from high energy physics to life sciences and healthcare. Traditional approaches have large-scale simulation at the core, with data analytics used on the edges for pre- and post-processing of the data. In more recent approaches, AI and large-scale simulations are applied cooperatively, where the strengths of each converge to form a powerful new tool for science. Both approaches can be discussed in this session.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts Christoph Angerer - Senior Developer Technology Engineer, NVIDIA
CE8123 - Connect with the Experts: NVIDIA Deep Learning Institute

Want to know how to get started with hands-on training in AI and deep learning? The NVIDIA Deep Learning Institute offers online self-paced courses and instructor-led workshops for developers and data scientists around the world. Join this session to learn more about DLI offerings and get your technical questions answered by our certified instructors and content developers.

1 Hour Connect with the Experts Kelvin Lwin - Certified Instructor, NVIDIA
Joseph Bungo - Deep Learning Institute (DLI) Program Manager, NVIDIA
Steven Steinke - Curriculum Developer, NVIDIA
CE8173 - Connect with the Experts: HPC Visualization using NVIDIA IndeX

Attend this session to learn about HPC visualization using the NVIDIA IndeX for NGC and NVIDIA IndeX for ParaView Plugin containers on NVIDIA GPU Cloud. We'll present the latest features and solutions for HPC visualization with NVIDIA IndeX.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject.  

1 Hour Connect with the Experts Marc Nienhaus - Sr. Manager Software Engineering, NVIDIA IndeX, NVIDIA
S8123 - Finance - Parallel Processing for Derivative Pricing

We'll discuss how massively parallel processing on GPUs can be exploited for pricing financial derivatives. GPU solutions for pricing options enable the development and implementation of more complex, robust financial models that capture the behavior of stock price volatility observed in financial markets. A model is developed to incorporate both stochastic volatility and simultaneous jumps in price and volatility, and is used to price S&P 500 index options (SPX) and S&P 500 ETF options (SPY). The model is based on empirical analysis of the time series for the S&P 500 index and its volatility index, VIX. Numerical solutions, both Monte Carlo simulation and finite difference, are developed and programmed in CUDA C. Compute times are reduced significantly with the GPU solutions, and time tests versus CPU-only implementations show speedup factors between 35 and 500. With GPU solutions, one can use more complex models to price and risk-manage an option portfolio.
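A minimal Monte Carlo pricer illustrates the simulation approach, here under plain geometric Brownian motion rather than the talk's stochastic-volatility-with-jumps model; all parameter values are illustrative.

```python
import math
import random

def mc_call_price(s0, k, r, sigma, t, n_paths, seed=0):
    """Monte Carlo price of a European call under geometric Brownian
    motion (a deliberate simplification of the model in the talk).
    The GPU version would evaluate the paths in parallel."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma * sigma) * t
    vol = sigma * math.sqrt(t)
    payoff_sum = 0.0
    for _ in range(n_paths):
        # Terminal stock price for one simulated path.
        st = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoff_sum += max(st - k, 0.0)
    # Discounted average payoff.
    return math.exp(-r * t) * payoff_sum / n_paths

price = mc_call_price(s0=100.0, k=100.0, r=0.01, sigma=0.2, t=1.0, n_paths=20000)
```

Each path is independent, which is what makes the Monte Carlo formulation map so naturally onto thousands of GPU threads.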

25-minute Talk Louis Scott - Officer, Federal Reserve Bank of New York
S8173 - Continuous Delivery of AI Applications

Deep learning systems are usually developed by data scientists, who are good at mathematics and computer science. But to deploy and operationalize these models for broader use, you need a devops mindset and tools. We'll show you how to connect the workflow between the data scientists and devops. We'll also explore basic continuous integration and delivery concepts and how they can be applied to deep learning models. Using a number of AWS services, we'll showcase how you can take the output of a deep learning model and deploy it to perform predictions in real time with low latency and high availability. In particular, we'll showcase the ease of deploying DL prediction functions using Apache MXNet (a deep learning library), Amazon ECS, Amazon S3, Amazon ECR, Amazon developer tools, and AWS CloudFormation.

25-minute Talk Asif Khan - Tech Leader, Amazon
S8210 - Using NVIDIA VRWorks to Optimize Warehouse-Scale VR Experiences

Using NVIDIA VRWorks, the team at VRstudios was able to optimize Terminal 17, their flagship 8-player, 30-minute warehouse-scale title, to run at 90 fps. In this talk, we describe the process and challenges related to optimizing Terminal 17.

25-minute Talk James Kelly - VRcade Product Manager, VRstudios, Inc.
S8273 - Programming GPU Supercomputers Ten Years From Now

We'll briefly review how programming for GPU computing has progressed over the past ten years, and where it is going over the next ten years, specifically for data management and parallel compute management. CUDA languages expose all aspects of data and compute management, allowing and sometimes requiring programmers to take control of both. Libraries typically internalize all compute management, and some internalize all data management as well. Directives virtualize both data and compute management, but don't completely hide either. Future hardware and software capabilities will allow programs to enjoy automatic data movement between DDR memory and GPU device memory, and enhanced caching hardware will reduce the need for explicit scratchpad memory programming. As parallel constructs are added to standard programming languages, writing parallel programs for GPU computing will become no more or less difficult than multicore programming.

25-minute Talk Michael Wolfe - Compiler Engineer, NVIDIA
S8314 - Multi GPU Programming with MPI

Learn how to program multi-GPU systems or GPU clusters using the message passing interface (MPI) and OpenACC or NVIDIA CUDA. We'll start with a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Then we'll cover advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. We'll also cover the latest improvements with CUDA-aware MPI, interaction with unified memory, the multi-process service (MPS, aka Hyper-Q for MPI), and MPI support in NVIDIA performance analysis tools.

50-minute Talk Jiri Kraus - Senior Devtech Compute, NVIDIA
S8446 - Porting Quantum ESPRESSO's PWscf Solver to GPUs with CUDA Fortran

Learn how to effectively leverage CUDA Fortran to port scientific applications written in Fortran to GPUs. We'll present in detail the porting effort of Quantum ESPRESSO's Plane-Wave Self-Consistent Field (PWscf) solver, from profiling and identifying time-consuming procedures to performance analysis of the GPU-accelerated solver on several benchmark problems on systems ranging in size from small workstations to large distributed GPU clusters. We'll highlight several tools available in CUDA Fortran to accomplish this, from high-level CUF kernel directives to lower level kernel programming, and provide guidance and best practices in several use cases with detailed examples.

50-minute Talk Everett Phillips - Senior Applied Engineer, NVIDIA
Filippo Spiga - Ph.D., Cambridge University
Joshua Romero - Developer Technology Engineer, NVIDIA
S8524 - Leveling Up to Autonomous Design

Data fuels so much of our lives. It accelerates our conversations, our decisions, our very ideas. And in the physical world, data is already acting as an accelerant to how we take these ideas and make them real. As the things we make become increasingly connected, our world becomes increasingly computable. And anything that becomes easily computable, becomes equally mutable. What does this mean for the world we live in? As we let go of our design tools and hand more control to intelligent algorithms, we'll see this reflected in the real world: the world of self-driving everything.

25-minute Talk Radha Mistry - Story Strategist, Autodesk
S8532 - Cascaded 3D Fully Convolutional Networks for Medical Image Segmentation

We'll show how recent advances in 3D fully convolutional networks (FCNs) have made it feasible to produce dense voxel-wise predictions of volumetric images. FCNs can be trained to automatically segment 3D medical images, such as computed tomography (CT) scans, based on manually annotated anatomies like organs and vessels. The presented methods achieve competitive segmentation results in a clinical setting while avoiding the need for handcrafted features or class-specific models. We'll explain a two-stage, coarse-to-fine approach that first uses a 3D FCN based on the 3D U-Net architecture to roughly define a candidate region. This candidate region then serves as input to a second 3D FCN for fine prediction. The cascaded approach reduces the number of voxels the second FCN has to classify to around 10 percent of the original 3D medical image, and therefore allows it to focus on more detailed segmentation of the organs and vessels. Our experiments illustrate the promise and robustness of current 3D FCN-based semantic segmentation of medical images, achieving state-of-the-art results on many datasets. Code and trained models will be made available.
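The coarse-to-fine idea can be sketched in 2D (the talk operates on 3D volumes): threshold the first network's output probabilities and crop a padded candidate region for the second stage. A toy version with hypothetical values:

```python
def candidate_region(coarse_mask, threshold=0.5, margin=1):
    """Bounding box (row0, row1, col0, col1), inclusive-exclusive, covering
    all coarse-stage probabilities above threshold, padded by a margin.
    Returns None when nothing exceeds the threshold."""
    rows = [i for i, row in enumerate(coarse_mask)
            if any(p > threshold for p in row)]
    cols = [j for j in range(len(coarse_mask[0]))
            if any(row[j] > threshold for row in coarse_mask)]
    if not rows:
        return None
    r0 = max(min(rows) - margin, 0)
    r1 = min(max(rows) + 1 + margin, len(coarse_mask))
    c0 = max(min(cols) - margin, 0)
    c1 = min(max(cols) + 1 + margin, len(coarse_mask[0]))
    return (r0, r1, c0, c1)

# Hypothetical coarse-stage probability map with a small hot region.
coarse = [
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.8, 0.1, 0.1],
    [0.1, 0.7, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1],
]
box = candidate_region(coarse)
```

The second-stage network would then see only the voxels inside `box`, which is what shrinks its workload to roughly 10 percent of the volume.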

25-minute Talk Holger Roth - Assistant Professor (Research), Nagoya University
S8561 - "Free" In Situ Volume Compression Using NVENC

Scientific simulations typically store just a small fraction of their computed timesteps (as few as one in 500) due to I/O and storage limitations. Previous work has demonstrated in situ software-based compression, but at the cost of compute cycles that simulation scientists are loath to sacrifice. We propose the use of the special-purpose video processing unit (VPU), currently unutilized in the HPC context, for inexpensive lossy encoding. Our work demonstrates that video encoding quality is suitable for volumes and recommends encoder settings. We'll show that data can be encoded in parallel with a hybrid CPU/GPU computation with minimal impact on run time. We'll also demonstrate that decoding is fast enough for on-the-fly decompression during analysis.

25-minute Talk Nick Leaf - Graduate Student Researcher, University of California, Davis
S8601 - NVIDIA GPU Video Technologies and Video Codec SDK: Updates and Roadmap

NVIDIA's Video Codec SDK is a set of APIs for hardware-accelerated video encoding and decoding using NVIDIA GPUs. We'll provide an overview of the APIs, with particular emphasis on the latest features, such as FFmpeg support of NVIDIA-accelerated transcoding, and quality and performance enhancements. We'll discuss some strategies on efficient usage of GPU video hardware acceleration for use cases such as video inferencing, transcoding, and media archiving.

50-minute Talk Abhijit Patait - Director, System Software, NVIDIA
S8617 - Deep Generative Modeling for Speech Synthesis and Sensor Data Augmentation

We'll discuss how we can use deep generative modeling in two application domains: speech synthesis and sensor data modeling. We'll give an overview of what generative modeling is and how it can be used for practical AI tasks through these examples. We'll also give a flavor of latent space methods, which we can use to learn more about our data so as to transform it in meaningful ways, with uses in both reconstruction and generation.

50-minute Talk Praveen Narayanan - Research Scientist, Ford Motor Company
S8642 - HPC in Containers - Why Containers, Why HPC, How and Why NVIDIA

Are you wondering whether the cloud is relevant to HPC and how it works? Increasingly, applications in high-performance computing are using containers to ease deployment. In this talk, you'll learn what containers are, how they are orchestrated to run together in the cloud, and how communication among containers works. You'll get a snapshot of current support from the ecosystem, and gain insight into why NVIDIA is leading the charge to provide best performance and usability.

50-minute Talk Christopher Newburn - Principal HPC Architect, NVIDIA
S8675 - Latest Innovations In Graphics Virtualization With Citrix and NVIDIA

The number of virtualized workloads that require hardware acceleration of graphics continues to grow in the data center, from the most demanding 3D applications used by designers and engineers to the rich media content in web browsers and office productivity applications used by power users and knowledge workers. Learn about the latest innovations from Citrix and NVIDIA that continue to push the limits of what is possible with graphics virtualization.

25-minute Talk Allen Furmanski - Senior Product Marketing Manager, Citrix Systems
James Hsu - Technology Integrations and Technical Solutions Development for Window, Citrix Systems
S8734 - Production-Level Performance Capture Using Deep Convolutional Neural Networks

We'll present a machine learning solution that enables cost-efficient creation of large amounts of high-quality facial animation for digital doubles in games. Remedy Entertainment, NVIDIA, and the University of Southern California recently published "Production-Level Facial Performance Capture Using Deep Convolutional Neural Networks" as part of the Symposium on Computer Animation. We'll cover topics including recording a facial animation dataset for an actor, setting up a deep learning project, preprocessing the data, training a deep convolutional neural network, and evaluating the results. We'll also present a summary of the findings and discuss potential future work.

50-minute Talk Antti Herva - Lead Character Technical Artist, Remedy Games
S8796 - Deep Neural Network-Based Cooperative Visual Tracking Through Multiple Flying Robots

Human and animal full-body motion capture (MoCap) in outdoor scenarios is a challenging and largely unsolved problem. We'll introduce a solution based on multiple flying robots. MoCap systems like Vicon, OptiTrack, and the 4D Dynamic Body Scanner at MPI-IS Tuebingen achieve high degrees of accuracy in indoor settings. Besides being bulky, they make use of reflected infrared light and rely heavily on precisely calibrated wall- or ceiling-mounted fixed cameras. Consequently, such systems cannot be used to perform MoCap in outdoor scenarios, where ambient light conditions keep changing and permanent fixtures cannot be installed in the environment. Our outdoor MoCap solution involves flying robots with on-board cameras, Intel i7 CPUs, NVIDIA Jetson TX1 GPU modules, and a deep learning-based approach.

50-minute Talk Aamir Ahmad - Research Scientist, Max Planck Institute for Intelligent Systems
Eric Price - PhD Student, Max Planck Institute for Intelligent Systems
S8822 - Optimizing NMT with TensorRT

OpenNMT is an open source neural machine translation and sequence modeling system. Using Volta Tensor Cores and TensorRT, we're able to improve performance by 100 times over a CPU implementation. We'll discuss OpenNMT and how we implement it via TensorRT, and show how, using our plugin interface and new TensorRT features, we're able to run this network at high performance.

50-minute Talk Micah Villmow - Senior Deep Learning Software Engineer, NVIDIA
SE00001 - Lunch

Join us for Lunch in the South Hall.

Special Event - 2 h Special Event
S8177 - Hornet: An Efficient Data Structure for Dynamic Sparse Graphs and Matrices

We'll present Hornet, formerly known as cuSTINGER, a data structure designed for sparse dynamic graphs and matrices. Hornet scales to massive datasets while supporting very fast updates: over 200 million updates per second on a single Tesla P100 GPU. We'll show that replacing CSR, a popular data structure for sparse data, with Hornet does not change the execution time. We'll also show that the memory utilization of Hornet is within that of CSR and COO, and briefly show performance results of several analytics using Hornet. We'll cover the programming model for Hornet in a separate talk.

25-minute Talk Oded Green - Research Scientist, Georgia Institute of Technology
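As a rough illustration of the interface such a dynamic structure exposes (Hornet itself uses block-allocated GPU memory, not Python lists), consider batched edge insertions and deletions against per-vertex adjacency lists:

```python
class DynamicGraph:
    """Toy dynamic adjacency structure: per-vertex edge lists that grow
    and shrink under batched updates. This only mirrors the interface;
    it says nothing about Hornet's actual memory layout."""

    def __init__(self, num_vertices):
        self.adj = [[] for _ in range(num_vertices)]

    def insert_batch(self, edges):
        for u, v in edges:
            if v not in self.adj[u]:   # ignore duplicate edges
                self.adj[u].append(v)

    def delete_batch(self, edges):
        for u, v in edges:
            if v in self.adj[u]:
                self.adj[u].remove(v)

    def degree(self, u):
        return len(self.adj[u])

g = DynamicGraph(4)
g.insert_batch([(0, 1), (0, 2), (1, 2), (0, 1)])  # last edge is a duplicate
g.delete_batch([(0, 2)])
```

The hard part Hornet solves is doing this on the GPU, in bulk, without reallocating or rebuilding the whole structure on every batch.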
S8207 - Demystifying the Available Codec Options for your NVIDIA Virtual GPU Deployments

We're sure you've heard about Citrix's HDX and VMware's Blast Extreme protocols. Maybe you know about different codecs like H.264, H.265/HEVC, VP9, AV1, MJPEG, and 2DRLE. We'd like to give you some insight into which codec technology can be used in which remoting protocol and what you can expect in terms of density, image quality, and granularity when configuring these codecs. Which do you think is better: Adaptive Display V2 or full-screen H.264? YUV 4:2:0 or YUV 4:4:4 for H.264? PCoIP or Blast Extreme? Should you use NVENC or not, and which options are available in VDI and RDSH on Kepler, Maxwell, and Pascal? You probably want to ask what's recommended. As always: it depends. Join our session to learn more and discuss the pros and cons of the available technologies and how you can make the best of them for YOUR deployment.

25-minute Talk Simon Schaber - GRID Solution Architect, NVIDIA
Ronald Grass - Sr. Sales Engineer, Citrix Systems GmbH
S8303 - Intelligent Talent Management - AI Drives Transformation

Artificial intelligence helps you hire faster and smarter. It also helps you determine your career path, learning, and development. Wondering how? AI platforms have a brain that reads, understands, and analyzes just as human beings do. They can read thousands or millions of resumes, job descriptions, career progressions, and pieces of learning content in a matter of seconds. This equips them with intelligence, creating a neural network of skills, demographics, industries, occupations, and courses/certifications that acts as the central intelligence powering search-and-match algorithms to find accurate matches to job demands in a few seconds. The NLP layer helps understand intent; for example, it differentiates between 'worked with a PM' and 'worked as a PM' to determine that the former could work collaboratively and the latter could drive projects. AI platforms mimic a recruiter's or hiring manager's brain to find the right match. What takes HR 20-30 days is done in a few seconds by an AI platform. It helps HR leaders in workforce planning by forecasting which skills and domains to invest in, maintain, or upgrade in their organizations, which could be a game changer, especially for people-centric organizations.

25-minute Talk Arjun Pratap - CEO, AVR EdGE Networks Pvt. Ltd.
S8324 - Synthetic Data Generation for an All-in-One Driver Monitoring System

Driver monitoring systems are used to detect many driver attributes like gaze, head pose, eye openness, and other features pertaining to attention and assistance. We'll present a synthetic method of generating data for training DNNs that caters to the above-mentioned features of the subject. We use Blender, powered by NVIDIA GPUs, for generating synthetic images, which can be scaled to match training needs. Synthetic data generation allows precise control over data points that are difficult to control in a real environment, like pupil dilation. This approach avoids noisy measurements and results in high accuracy without the need for a high-precision 3D sensor.

25-minute Talk Sagar Bhokre - Senior System Software Engineer, NVIDIA
S8355 - How To Train and Execute a Deep Learning Model Able to Re-identify and Extract Attributes from Humans

We'll present a deep learning system able to decide whether two people are similar. The system uses the global appearance of a person, not just the face, to perform re-identification. It also provides attributes (top color, bottom color, gender, clothing length, and hair). We'll describe how to train the system with TensorFlow on a GPU cluster and how to use it in a global video analysis system running on GPU devices.

25-minute Talk Matthieu Ospici - AI Engineer, Atos
S8389 - Multi-Resolution 3D-Convolutional Neural Network for Object Recognition

Voxelized representations of 3D objects are commonly used for training 3D convolutional neural networks for object detection and classification. However, high-resolution voxelization of CAD models is memory-intensive, so it is not possible to load multiple models on the GPU for training. We have developed a GPU-accelerated voxelization technique that generates multi-level voxel grids of 3D objects. Instead of creating a single high-resolution voxel grid for the whole object, this technique generates selective region-based high-resolution voxel grids to represent detailed features in the object. We have also developed a multi-resolution 3D convolutional neural network that uses this hybrid voxelization for accurate object recognition and classification.

25-minute Talk Adarsh Krishnamurthy - Assistant Professor, Iowa State University
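The selective refinement idea can be sketched in plain Python: voxelize points at a coarse resolution, then build finer sub-grids only inside occupied coarse cells. This is an illustrative toy on point sets, not the authors' GPU implementation for CAD models:

```python
def voxelize(points, grid, cell):
    """Occupied cells of a uniform grid: each point maps to an (i, j, k)
    index; indices outside [0, grid) are discarded."""
    occ = set()
    for x, y, z in points:
        idx = (int(x // cell), int(y // cell), int(z // cell))
        if all(0 <= i < grid for i in idx):
            occ.add(idx)
    return occ

def multires_voxelize(points, coarse_grid, coarse_cell, refine=2):
    """Coarse occupancy plus a finer sub-grid only inside occupied
    coarse cells -- the selective-refinement idea from the talk."""
    coarse = voxelize(points, coarse_grid, coarse_cell)
    fine_cell = coarse_cell / refine
    fine = {}
    for ci in coarse:
        # Keep only the points falling in this coarse cell, shifted to
        # local coordinates, then voxelize them at the finer resolution.
        local = [
            (x - ci[0] * coarse_cell, y - ci[1] * coarse_cell, z - ci[2] * coarse_cell)
            for x, y, z in points
            if (int(x // coarse_cell), int(y // coarse_cell), int(z // coarse_cell)) == ci
        ]
        fine[ci] = voxelize(local, refine, fine_cell)
    return coarse, fine
```

Memory now scales with the number of occupied coarse cells rather than with the full high-resolution grid, which is what lets multiple models fit on the GPU at once.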
Sambit Ghadai - Graduate Student, Iowa State University
S8492 - Fire Simulation & Visualization at a Nuclear Power Plant using GPUs

See how computational risk analysis is aided by advanced visualization and simulation techniques of a fire propagating in a nuclear plant room using NVIDIA's GPU-based GVDB library. Dive into the visualization techniques used when performing analysis to design and incorporate additional safety measures for nuclear plants.

25-minute Talk Ramprasad Sampath - Director of R&D, Centroid LAB
S8650 - Cadillac in VR

"Cadillac in VR" is the premier VR showroom experience. In our presentation we want to highlight the needs Cadillac came to us with, our approach for creating this experience, key challenges we faced during development, our final results, and what this might mean for the future of car buying. The needs we discuss involve key points of change in the automotive industry and how Cadillac wanted to adapt to those changes. Our approach will touch on how we established the underlying philosophy that guided our decision-making process throughout development. Following that, we will dive deeper into the technical challenges we faced while developing the experience: the environment, level of detail, lighting, UX/UI, and hardware are all key areas of discussion. We hope to have someone on stage at this point with the experience running to further add emphasis and clarification. Finally, we'll cover how all this came together in our final product and where we think it might take the future of buying a car.

25-minute Talk Mike Konchalski - Director of Technology, All Things Media
S8778 - Real Time and Dynamic Risk Assessments for Autonomous Vehicles

Incorporating high fidelity or HD map data and real time traffic data such as speeds and congestion patterns into risk assessments, particularly for ADAS and highly autonomous vehicle operation, is mostly uncharted territory. We'll explore this domain by deploying data tools and techniques that are the intersection of automotive, deep learning, and insurance industries.

25-minute Talk Sunil Chintakindi - Director, Product Engineering and Data Research, Allstate
S8805 - Managing Memory of Complex Aggregate Data Structures in OpenACC

It is extremely challenging to move data between host and device memories when deeply nested, complex aggregate data structures are used in an application. This talk takes users into VASP, ICON, and other real-world applications to see how the deep copy issue is solved with the PGI compiler and OpenACC APIs. The OpenACC 2.6 specification includes directives and rules that enable programmer-controlled manual deep copy, albeit in a form that can be intrusive in terms of the number of directives required. The OpenACC committee is designing new directives to extend explicit data management to aggregate data structures in a form that is more elegant and concise. The talk also covers a comparison of unified memory, manual deep copy, full deep copy, and true deep copy.

25-minute Talk Xiaonan Tian - GPU Compiler Engineer, NVIDIA
S8919 - Medical Imaging with TensorFlow

Dive in to recent work in medical imaging, where TensorFlow is used to spot cancerous cells in gigapixel images and to help physicians diagnose disease. During this talk, we'll introduce concepts in deep learning and show concrete code examples you can use to train your own models. Beyond the technology, we'll cover the problem-solving process of thoughtfully applying it to a meaningful problem. We'll close with our favorite educational resources you can use to learn more about TensorFlow.

25-minute Talk Josh Gordon - Developer Advocate for TensorFlow, Google
S8924 - Block-Sparse Recurrent Neural Networks

Recurrent neural networks are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modeling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their dense counterparts, the speed-up observed by using sparse operations is less than expected on different hardware platforms. To address this issue, we prune blocks of weights in a layer instead of individual weights. Using these techniques, we can create block-sparse RNNs with sparsity ranging from 80% to 90% with a small loss in accuracy. This technique allows us to reduce the model size by 10x. Additionally, we can prune a larger dense network to recover this loss in accuracy while maintaining high block sparsity and reducing the overall parameter count. Our technique works with a variety of block sizes up to 32x32. Block-sparse RNNs eliminate overheads related to data storage and irregular memory accesses while increasing hardware efficiency compared to unstructured sparsity.

25-minute Talk Eric Undersander - Research Engineer, Baidu USA
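Block pruning as described can be sketched as follows: zero out every tile whose largest-magnitude weight falls below a threshold. A toy plain-Python version on a list-of-lists matrix (the actual method operates on GPU tensors during training, with the threshold tuned per layer):

```python
def prune_blocks(w, block=2, threshold=0.1):
    """Zero out block x block tiles of a weight matrix whose maximum
    absolute entry falls below threshold (block-sparse pruning)."""
    rows, cols = len(w), len(w[0])
    out = [list(row) for row in w]
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = [abs(out[i][j])
                    for i in range(r, min(r + block, rows))
                    for j in range(c, min(c + block, cols))]
            if max(tile) < threshold:
                # Whole tile is weak: prune it as a unit.
                for i in range(r, min(r + block, rows)):
                    for j in range(c, min(c + block, cols)):
                        out[i][j] = 0.0
    return out

# Hypothetical 4x4 weight matrix: two strong tiles, two weak ones.
w = [
    [0.50, 0.40, 0.01, 0.02],
    [0.30, 0.20, 0.03, 0.04],
    [0.05, 0.06, 0.90, 0.80],
    [0.07, 0.08, 0.70, 0.60],
]
pruned = prune_blocks(w)
```

Pruning whole tiles rather than individual weights is what keeps memory accesses regular, which is the hardware-efficiency point the talk makes against unstructured sparsity.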
Sharan Narang - Systems Researcher, Baidu USA
L8111B - Jetson Developer Tools Training Labs - Repeat

This lab focuses on teaching you how to maximize productivity when developing software for the Jetson platform. You'll experience firsthand how to manage source code on the host PC to cross-compile the software and initiate remote debugging sessions to debug CPU C/C++ and CUDA C code. Through a comprehensive set of exercises, you'll also learn how to use the CUDA Visual Profiler for optimizing CUDA kernels, the Tegra System Profiler for optimizing CPU code and tracing multi-process system-wide activities, and the Tegra Graphics Debugger for debugging and profiling 3D graphics applications. Prerequisites: Basic CUDA C and C++ coding skills.

120 Minutes Instructor-Led Lab Sebastien Domine - VP SW Eng. Developer Tools, NVIDIA
L8129 - Generate Financial Time Series with Variational Autoencoders

In this lab we explain how generative models such as deep variational autoencoders can generate realistic time series data of all kinds: stock prices, FX rates, etc. The ability to generate realistic time series is of great importance for improving the robustness of risk management and algorithmic trading applications. We look at the theory behind variational autoencoders and walk step by step through the implementation of a simple Gaussian recurrent variational autoencoder in TensorFlow. After this lab, attendees will be able to build more general generative models and train them with data.
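Two building blocks the lab walks through, the reparameterization trick and the KL term of the variational loss, can be sketched in plain Python (the lab itself uses TensorFlow, where these appear as differentiable tensor ops):

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1): the
    reparameterization trick that keeps sampling differentiable
    with respect to mu and logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def kl_divergence(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.
    This is the regularization term of the variational loss."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))

rng = random.Random(0)
z = reparameterize([0.0, 0.0], [0.0, 0.0], rng)  # sample from the prior
```

When the encoder outputs `mu = 0` and `logvar = 0`, the posterior equals the standard-normal prior and the KL term vanishes; the full training loss adds a reconstruction term over the decoded time series.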

120 Minutes Instructor-Led Lab Daniel Egloff - Founder, Flink AI
L8144 - Word Generation with TensorFlow

Prerequisites: 'Image Segmentation with TensorFlow'

Duration: 2 hours

Framework: TensorFlow

Predict the next word of a sentence using a Recurrent Neural Network. Neural networks can transform complex inputs to complex outputs with many different types of data. In this lab, you'll train a network to predict the next word of a sentence using the MSCOCO dataset by:

• Introducing Natural Language Processing (NLP) and Recurrent Neural Networks (RNNs)

• Creating network inputs from text data

• Testing with new data and iterating to improve performance

Upon completion of this lab, you'll be able to train neural networks to understand both images and text.
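The core recurrent step behind next-word prediction can be sketched in plain Python with hypothetical tiny weights (the lab uses TensorFlow and the MSCOCO dataset; real models use learned embeddings and far larger dimensions):

```python
import math

def rnn_step(x, h, Wx, Wh, b):
    """One vanilla-RNN step: h' = tanh(Wx . x + Wh . h + b)."""
    return [math.tanh(sum(wx * xi for wx, xi in zip(Wx[i], x))
                      + sum(wh * hi for wh, hi in zip(Wh[i], h))
                      + b[i])
            for i in range(len(b))]

def next_word(h, Wo, vocab):
    """Greedy decode: pick the word with the largest output logit."""
    logits = [sum(w * hi for w, hi in zip(Wo[k], h)) for k in range(len(vocab))]
    return vocab[max(range(len(vocab)), key=logits.__getitem__)]

# Hypothetical tiny model: 2-dim input and hidden state, 3-word vocabulary.
vocab = ["cat", "sat", "mat"]
Wx = [[1.0, 0.0], [0.0, 1.0]]
Wh = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]
Wo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

h = rnn_step([1.0, 1.0], [0.0, 0.0], Wx, Wh, b)
word = next_word(h, Wo, vocab)
```

Feeding the chosen word back in as the next `x` and repeating this step is what lets the network generate a sentence one word at a time.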

Presented by the NVIDIA Deep Learning Institute (DLI).

120 Minutes Instructor-Led Lab Rajan Arora - Solution Architect, NVIDIA
L8150 - Image Style Transfer with Torch

Framework: TensorFlow, DIGITS

This lab will guide you through how to transfer the look and feel of one image to another image by extracting distinct visual features. See how convolutional neural networks are used for feature extraction, and how these features feed into a generator to create a new image. You'll learn how to:

•Transfer the look and feel of one image to another image by extracting distinct visual features

•Qualitatively determine whether a style is transferred correctly using different techniques

•Use architectural innovations and training techniques for arbitrary style transfer

Upon completion, you'll be able to use neural networks to do arbitrary style transfer that's fast enough to apply even to videos.

Presented by the NVIDIA Deep Learning Institute (DLI).

120 Minutes Instructor-Led Lab Steven Harpster - Technical Marketing Engineer, NVIDIA
Favorite
L8169 - Medical Image Analysis with R and MXNet

Convolutional neural networks (CNNs) can be applied to medical image analysis to infer patient status from non-visible images. Train a CNN to infer the volume of the left ventricle of the human heart from time-series MRI data and learn to:

• Extend a canonical 2D CNN to more complex data

• Use the framework MXNet through the standard Python API and through R

• Process high-dimensionality imagery that may be volumetric and have a temporal component

Upon completion, you'll know how to use CNNs for non-visible images.

Prerequisites: Some experience training neural networks using datasets

120 Minutes Instructor-Led Lab Abel Brown - Certified Instructor, NVIDIA
Favorite
CE8102 - Connect with the Experts: OpenACC - Quick On-ramp to GPUs

This session is designed for anyone who is either looking to start with GPUs or already accelerating their code with OpenACC on GPUs or CPUs. Join OpenACC experts and your fellow OpenACC developers to get expert advice, discuss your code, and learn how OpenACC directives are used by others.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts Sunita Chandrasekaran - Assistant Professor, Department of Computer & Information Sciences, University of Delaware
Jeffrey Larkin - Senior DevTech Software Engineer, NVIDIA
Robert Crovella, NVIDIA
Guido Juckeland - Head of Computational Science Group, Helmholtz-Zentrum Dresden-Rossendorf
Michael Wolfe - Compiler Engineer, NVIDIA
Robert Henschel - Director Science Community Tools, Indiana University
Randy Allen - Director, Mentor Graphics
Favorite
CE8128 - Connect with the Experts: Deep Learning Deployment

Attend this session to get your questions on deep neural network deployment answered. Learn more about deployment platforms such as cloud, data center, and embedded, and the merits and limitations of each approach. NVIDIA experts can help you choose the right deployment platform for your application and project.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts Xuan Vinh Nguyen, NVIDIA
Craig Wittenbrink, NVIDIA
Siddharth Sharma, NVIDIA
Christopher Gottbrath - Senior Manager, NVIDIA
Ryan Olson - Solutions Architect, NVIDIA
Dilip Sequeira, NVIDIA
Favorite
CE8148 - Connect with the Experts: Deep Technical Dive into NVIDIA Virtual Workstation and Desktop Solutions

We will take a deep dive into both the software and hardware for NVIDIA vGPU technology. Whether you are in the process of implementing vGPU technology for your enterprise or want help with a proof of concept (POC), stop by for a chat and a whiteboard session.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject.  

1 Hour Connect with the Experts Luke Wignall - Senior Manager, Pro Viz Performance Engineering & Technical Marketing, NVIDIA
Chenghuan Jia, NVIDIA
Andrew Currid, NVIDIA
Favorite
S81008 - Speed at Scale: Using GPUs to Accelerate Analytics for Extreme Use Cases (Presented by MapD)

It is common knowledge that GPUs can dramatically accelerate HPC and machine learning/AI workloads, but can they do the same for general-purpose analytics? In this talk, Todd Mostak, CEO of MapD, will provide real-world examples of how a new generation of GPU-powered analytics platforms can enable enterprises from a range of verticals to dramatically accelerate insight generation at scale. In particular, he will focus on how the key technical differentiators of GPUs (massive computational bandwidth, fast memory, and a native rendering pipeline) make them uniquely suited to letting analysts and data scientists query, visualize, and power machine learning over large, often high-velocity datasets. Using the open source MapD analytics platform as an example, Todd will detail the technical approaches his team took to leverage the full parallelism of GPUs and demo how the platform allows analysts to interactively explore datasets containing tens of billions of records.

25-minute Talk Todd Mostak - CEO & Founder, MapD
Favorite
S81041 - Using HPC Computational Physics Tools for Advanced Engineering Simulations and Production Deployment (Presented by Amazon Web Services)

AWS offers the most powerful GPU-accelerated cloud infrastructure that delivers unparalleled computational efficiency for advanced engineering simulations and analysis, enabling High Performance Computing (HPC) workloads to run in the cloud at scale. This session features a real-world use case from the advanced product engineering team at Western Digital, which is using HPC solutions to model new technologies and capabilities prior to production. Western Digital's computational tools incorporate a description of the physics occurring during the HDD recording process and ultimately produce input to a recording subsystem channel model, which produces an error rate. The length scales involved in the recording model range from a few nanometers in the description of the recording media to microns in the description of the recording head. The power of the current generation of NVIDIA GPUs allows Western Digital to generate enough simulation data that the same recording subsystem channel model used in experiments can be employed in studies that include fabrication process variances.

50-minute Talk David Hinz - Senior Director, Engineering Services, Cloud and Data Center Computing Operations, Western Digital Technologies, Inc.
David Pellerin - Head of WW Business Development for Hitech/Semiconductor, Amazon Web Services
Byron Lengsfield - Research Scientist, Western Digital
Favorite
S8249 - Machine Learning on VMware vSphere Using NVIDIA Virtualized GPUs

You'll learn about enabling virtualized GPUs for machine learning workloads on VMware vSphere, combining GPU performance with the data center management benefits of VMware vSphere. NVIDIA's Pascal GPU is the first GPU to offer both virtualized compute (CUDA) and virtualized graphics, supporting multiple virtual machines (VMs) that share a GPU for both compute and graphics capabilities. We will present our research results for machine learning workloads on the vSphere platform using NVIDIA virtualized GPUs. Learn different ways to deploy GPU-based workloads developed with popular machine learning frameworks like TensorFlow and Caffe using VMware DirectPath I/O and NVIDIA vGPU solutions. We will discuss use cases for the scheduling methods Equal Share, Fixed Share, and Best Effort for virtualized GPUs and illustrate their benefits via our performance study. We address the scalability of machine learning workloads in terms of the number of VMs per vSphere server and the number of GPUs per VM. Data center resource utilization of these workloads on vSphere with NVIDIA GPUs is also analyzed and presented.

50-minute Talk Uday Kurkure - Staff Engineer, VMware
Lan Vu - California, VMware
Favorite
S8297 - HornetsNest - Scalable Static and Dynamic Graph Algorithms Made Easy

We'll present HornetsNest, a framework for developing static and dynamic graph algorithms with relative ease. Through a small set of graph primitives, which form the API of our framework, it is possible to implement parallel graph algorithms in a fairly small number of lines of code. These primitives are optimized in the backend, so programmers can focus on algorithm design rather than load balancing, system utilization, and optimization. Using these primitives, it's possible to implement BFS in roughly 10 lines of code. Performance-wise, this BFS performs as well as its counterpart in the Gunrock library. More importantly, HornetsNest is the first framework to support a wide range of high-performing dynamic graph analytics, including new algorithms for dynamic triangle counting, dynamic PageRank, and dynamic Katz centrality. Finally, we'll cover the performance of numerous graph algorithms.

25-minute Talk Oded Green - Research Scientist, Georgia Institute of Technology
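HornetsNest's primitives are GPU-side, but the algorithm they compactly express is ordinary level-synchronous BFS, which is also about ten lines in plain Python (shown here as a generic sketch, not the HornetsNest API):

```python
from collections import deque

def bfs(adj, source):
    """Level-synchronous BFS over an adjacency-list graph.
    Returns the hop distance from `source` to every reachable vertex."""
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if u not in dist:          # first visit fixes the distance
                dist[u] = dist[v] + 1
                frontier.append(u)
    return dist

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
```

A framework like HornetsNest parallelizes the frontier expansion across GPU threads while keeping the algorithm this short.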
Favorite
S8475 - Accelerating Linear Algebra on Small Matrices - from Batched BLAS to Large Scale Solvers

Learn how to accelerate many small-sized linear algebra problems, from kernels to large-scale solvers. We describe techniques targeting parallelization, vectorization, and communication, which have become extremely challenging on many-core architectures and GPUs. Standard interfaces, called batched APIs, are proposed for inclusion in highly optimized libraries like MAGMA, which provides the most extensive set of batched BLAS and LAPACK functionalities to date. We'll describe the developments as well as their use to accelerate applications from big data analytics to high-order FEM tensor computations, and low-rank approximations for solvers and preconditioners. We'll also concentrate on the GPU acceleration of a large-scale distributed-memory solver that uses a hierarchically compressed coefficient matrix.

50-minute Talk Ichitaro Yamazaki - Research Scientist, UTK
Stanimire Tomov - Research Director, UTK
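The batched idea can be previewed with NumPy (an illustration, not MAGMA's batched APIs): one fused call applies the same small GEMM across an entire batch, which is the regime where per-call launch overhead would otherwise dominate:

```python
import numpy as np

rng = np.random.default_rng(0)
# A batch of 1000 independent small (8 x 8) matrix products.
A = rng.standard_normal((1000, 8, 8))
B = rng.standard_normal((1000, 8, 8))

# One "batched GEMM" over the leading dimension.
C = np.matmul(A, B)

# The equivalent loop of tiny GEMMs that batching replaces.
C_loop = np.stack([a @ b for a, b in zip(A, B)])
```

On a GPU, batched BLAS routines launch the whole batch at once so the many small problems saturate the device together.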
Favorite
S8496 - Low-Latency GPU Accelerated Inferencing with TensorRT

Come learn how you can optimize the deployment of your trained neural networks using TensorRT, the GPU-accelerated inference library. TensorRT is a high-performance tool for low-latency, high-throughput deep neural network (DNN) inference that runs on NVIDIA GPUs. The latest release of TensorRT introduces a novel, framework-agnostic network definition format called the Universal Framework Format (UFF), allowing TensorRT to support and optimize DNN models trained in multiple deep learning frameworks like Caffe and TensorFlow. It also provides the capability to run inference at reduced precision, giving developers the ability to take advantage of new GPU hardware features like the Volta Tensor Core architecture. This session will be a combination of lecture and live demos.

50-minute Talk Prethvi Kashinkunti - Solutions Architect, NVIDIA
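The numeric effect of reduced-precision inference can be previewed without TensorRT itself: round-tripping float32 weights through float16 shows the small per-weight error that the speedup costs (a NumPy sketch under that assumption, not TensorRT code):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)   # stand-in "weights"

# Simulate storing/computing in half precision, then reading back.
w_half = w.astype(np.float16).astype(np.float32)

max_err = float(np.abs(w - w_half).max())   # worst-case rounding error
```

For values of this magnitude the worst-case float16 rounding error is on the order of 1e-3, which well-trained networks typically tolerate; Tensor Cores then exploit the narrower format for throughput.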
Favorite
S8518 - An Introduction to NVIDIA OptiX

We'll explain the NVIDIA OptiX ray tracing engine, a sophisticated library for performing GPU ray tracing. We'll provide an overview of the OptiX ray tracing pipeline and the programmable components. OptiX can be used in many domains, ranging from rendering to acoustic modeling to scientific visualization. We'll dive deeper into the new features of OptiX and present code samples demonstrating best practices for writing a high-performance ray tracer using the OptiX programming model.

80 Minutes Tutorial Ankit Patel - Senior Product Manager, NVIDIA
Detlef Roettger - Senior Developer Technology Engineer, NVIDIA
Favorite
S8600 - Realizing the Future of Making Things with Generative Design

Autodesk Generative Design harnesses the compute power of the NVIDIA GPU to deliver a full Design-to-Make workflow for today's product designers and engineers. Learn how the future of computing will enable better performing designs to be created with less time and effort than traditional engineering approaches. Autodesk Generative Design allows the user to fully explore possible design spaces, incorporating materials and manufacturing methods into the creation of design solutions.

50-minute Talk Brian Frank - Sr. Product Line Manager | Simulation, Autodesk
Favorite
S8604 - Developing Agile UAV Autonomy in a Virtual Reality Environment

Despite the high level of interest in autonomous unmanned aerial vehicles (UAVs) over the last few years, the gap between human pilots and UAVs without an external infrastructure remains exceedingly large. Autonomous UAVs face limitations, both in autonomy algorithms and in the platforms and testing environments required to develop the algorithms. We'll discuss a UAV system built around a Jetson TX1 module and a custom carrier board to provide the computation, sensors, and agility required for high-performance flight; a real-time photorealistic image simulation testing environment that acts as a virtual reality environment while a UAV is in flight; and the vision-based algorithms developed using the aforementioned two that enable autonomous agile flight.

25-minute Talk Thomas Sayre-McCord - PhD Candidate, MIT
Favorite
S8607 - Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

We find 99.9 percent of the gradient exchange in distributed SGD is redundant, and we propose deep gradient compression (DGC) to greatly reduce the communication bandwidth and improve the scalability of distributed training. To preserve accuracy during this compression, DGC employs four methods: momentum correction, local gradient clipping, momentum factor masking, and warm-up training. We have applied DGC to image classification, speech recognition, and language modeling with multiple datasets including Cifar10, ImageNet, Penn Treebank, and Librispeech Corpus. In all these scenarios, DGC achieves a gradient compression ratio from 270x to 600x without losing accuracy, cutting the gradient size of ResNet-50 from 97MB to 0.35MB, and for DeepSpeech from 488MB to 0.74MB. DGC enables large-scale distributed training on inexpensive commodity 1Gbps Ethernet and facilitates distributed training on mobile.

50-minute Talk Song Han - scientist, Stanford/Google/MIT
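The core sparsification step of this approach keeps only the largest-magnitude gradients and accumulates the rest locally for later rounds. A minimal NumPy sketch of that step (omitting momentum correction, clipping, masking, and warm-up, which the talk describes):

```python
import numpy as np

def sparsify(grad, keep_ratio=0.001):
    """Keep only the top `keep_ratio` fraction of gradients by magnitude;
    the remainder becomes a residual accumulated for future iterations."""
    k = max(1, int(grad.size * keep_ratio))
    idx = np.argpartition(np.abs(grad), -k)[-k:]   # indices of k largest
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    residual = grad - sparse    # carried over instead of discarded
    return sparse, residual

rng = np.random.default_rng(0)
g = rng.standard_normal(10_000)
sparse, residual = sparsify(g)  # only ~0.1% of entries are exchanged
```

Only the nonzero entries (values plus indices) cross the network, which is where the 270x-600x compression ratios come from.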
Favorite
S8651 - Extracting Data from Tables and Charts in Natural Document Formats

Financial analysis depends on accurate financial data, and these data are often distributed via PDF and other "natural document" formats. While these formats are optimized for easy human comprehension, automatically extracting the data can be quite challenging. We'll describe our work using a deep learning pipeline to extract data from tables and charts in PDF documents. We'll also show some of our latest research, inspired by image captioning models, for directly going from images of tables to a markup language (LaTeX) representation.

50-minute Talk David Rosenberg - Data Scientist, Office of the CTO, Bloomberg
Philipp Meerkamp - Financial Software Engineer, Bloomberg
Favorite
S8685 - Learning with Opponent-Learning Awareness

We'll discuss deep reinforcement learning in multi-agent settings, focusing on learning with opponent-learning awareness, a novel multi-agent reinforcement learning method that allows one agent to consider the learning dynamics of another agent. You'll learn that this not only stabilizes learning in multi-agent settings, but also leads to the emergence of cooperation. A key question relevant to autonomous cars is how to maintain cooperation between self-interested learning agents in a multi-agent setting.

50-minute Talk Jakob Foerster - Ph.D. Student, University of Oxford
Favorite
S8695 - NVIDIA VR Update

Building on our previous talks, we'll give an update on what's happening in the professional VR space at NVIDIA. We'll first cover OpenGL and Vulkan VR functionality and then talk about how to drive dual-input HMDs from two GPUs efficiently.

50-minute Talk Jan Robert Menzel - Developer Technology Engineer, NVIDIA
Kai Ingo Esser - Sr. Devtech Engineer, NVIDIA
Favorite
S8718 - Optimizing HPC Simulation and Visualization Codes Using NVIDIA Nsight Systems

Are you readying your application for dense multi-GPU compute hardware like the NVIDIA DGX or ORNL Summit? Are you sure your CPUs and GPUs are all working to their capabilities? Are you looking for a tool to squeeze out that last bit of performance? Come and learn how the new NVIDIA Nsight Systems can help you maximize the performance of your simulation and visualization applications on GPU-accelerated clusters. Learn suggested techniques and best practices for optimizing HPC workloads. NVIDIA engineers and the developers of molecular modeling tools at University of Illinois will share their experiences using the NVIDIA Nsight Systems to analyze and optimize several of their HPC applications, including NAMD, VMD, and Lattice Microbes. The session will highlight several intermediate and advanced profiling techniques and will demonstrate how incorporation of NVTX profiling hooks into the application can help focus profiling activity and improve clarity of profiling results in complex HPC apps.

50-minute Talk Robert Knight - Software Engineer, NVIDIA
Daniel Horowitz - Director of Platform Developer Tools, NVIDIA
John Stone - Senior Research Programmer, University of Illinois at Urbana Champaign
Favorite
S8720 - Crash: Practical Applications of Deep Learning in the Insurance Claims Industry

Deep learning, assisted with GPU acceleration, is pervading many sectors and the insurance space is no exception. We'll illustrate how deep learning applications in image and speech recognition are forming the backbone of innovative applications in the insurance industry. Real-world examples of image and speech deep learning technology are presented, demonstrating how ground-breaking applications have been engineered in the industry to automate decision-support, assist humans, improve customer experiences and reduce costs.

50-minute Talk Nigel Cannings - CTO, Intelligent Voice
Favorite
S8742 - Optimizing for Real-Time Inference

Real-time games have an extremely small computation budget for each frame. Learn the right way to approach real-time performance with inference workloads, taking advantage of the newest technologies available.

50-minute Talk Donald Brittain - Principal Engineer, NVIDIA
Favorite
S8764 - GPU-Enabled Ultrasound Imaging for Real-Time, Fully-Flexible Data Processing

Explore how parallelized programming and deep learning (DL) can radically impact medical ultrasound imaging. In this session, we will describe how the processing of ultrasound signals can be implemented to provide not only real-time capabilities but also a flexible environment for research and innovative new products. With this in view, we will i) demonstrate 2D and 3D real-time imaging using open hardware platforms, and ii) provide an overview of how both radical parallelization and DL can be integrated into processing pipelines, providing new applications and improved image quality at unprecedented speed.

25-minute Talk Christoph Hennersperger - Senior Research Scientist, Technical University of Munich | Trinity College Dublin
Favorite
S8766 - Visual Search at eBay

We'll share information and lessons learned from developing a scalable visual search engine to handle a massive, volatile inventory like eBay's. We'll describe how eBay data is challenging for visual search, how to leverage a single deep neural network to perform multiple tasks efficiently, how to deploy our solution in a distributed cloud infrastructure, and which optimizations we have made to trade off relevance and latency. We'll give examples and insights to benefit computer vision practitioners in the industry who intend to build visual search engines from scratch.

50-minute Talk Fan Yang - Research Scientist, eBay
Favorite
S8816 - How Deep Learning Could Predict Weather Events

How do meteorologists predict weather or weather events such as hurricanes, typhoons, and heavy rain? Predicting weather events has traditionally relied on supercomputer (HPC) simulations using numerical models such as WRF, UM, and MPAS. Recently, however, deep learning-based research has shown a variety of outstanding results. We'll introduce several case studies related to meteorological research, describe how meteorological tasks differ from general deep learning tasks, and detail their approaches and input data, such as weather radar images and satellite images. We'll also cover typhoon detection and tracking, rainfall amount prediction, forecasting future cloud figures, and more.

50-minute Talk Sa-Kwang Song - Principal Researcher, Korea Institute of Science and Technology
Favorite
S8827 - ANI-AL: Universal Deep Learning Potentials for Organic Molecules and Materials

We'll introduce ANI-AL molecular potentials, which are deep learning-based potential energy functions for the fast and accurate prediction of quantum mechanical energies and forces of molecular systems. Thanks to GPU acceleration of training and inference, we successfully implement an automated sampling method that borrows techniques from active learning to automatically drive the systematic improvement of ANI-AL potentials. We'll also present results from applications of the ANI-AL potential to various problems in computational chemistry, such as molecular structure optimization, reaction path prediction, vibrational frequency calculation, and molecular dynamics simulations.

50-minute Talk Justin Smith - Graduate Research Assistant, University of Florida
Favorite
S8851 - The Road From GPU-Powered Prototypes to Production-Ready ECUs

GPUs provide power-efficient hardware acceleration for graphics processing and deep learning algorithms, making them ideal compute processors for highly automated driving functionality. Despite the predominance of GPUs in the development of prototypes, the actual market penetration of GPUs in series-production electronic control units (ECUs) remains comparatively low. In this talk we will focus on a key contributor to this problem: deficient support for integration into the design processes of the automotive supply chain and automotive software standards.

50-minute Talk Alexander Much - Chief Expert, Elektrobit Automotive GmbH
Christoph Herzog - Head of Portfolio Management, Elektrobit Automotive GmbH
Favorite
S8908 - ORNL Summit: Enabling Large Scale Science on Summit Through the Center for Accelerated Application Readiness

The Center for Accelerated Application Readiness within the Oak Ridge Leadership Computing Facility is a program to prepare scientific applications for next-generation supercomputer architectures. Currently the program consists of thirteen domain science application development projects focused on preparing codes for efficient use on Summit. Over the last three years, these teams have developed and executed a development plan based on detailed information about Summit's architecture and system software stack. This presentation will highlight the progress made by the teams that have used Titan, the 27 PF Cray XK7 with NVIDIA K20X GPUs; SummitDev, an early-access IBM Power8+ system with NVIDIA P100 GPUs; and, very recently, Summit, OLCF's new IBM Power9 system with NVIDIA V100 GPUs. The program covers a wide range of domain sciences, with applications including ACME, DIRAC, FLASH, GTC, HACC, LSDALTON, NAMD, NUCCOR, NWCHEM, QMCPACK, RAPTOR, SPECFEM, and XGC.

25-minute Talk Jack Wells - Director of Science, Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
Favorite
S8981 - Asynchronous Operations and Dynamic Parallelism in CUDA - Session 3 of 4 (Presented by Acceleware)

This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We will demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. The second part of the session will focus on dynamic parallelism. A programming demo involving asynchronous operations will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!

80 Minutes Tutorial Chris Mason - Technical Product Manager, Acceleware
Dan Cyca - Chief Technology Officer, Acceleware
Favorite
S8488 - Leveraging GPUs for Bayesian Inference

We'll present results on speeding up Bayesian inference on the NVIDIA DGX-1 server for medical diagnostics. Bayesian inference is an AI technique for reasoning under uncertainty that is computationally and data intensive. We'll discuss the implications for both inference and training of Bayesian networks.

25-minute Talk Alex Kozlov - Solutions Architect, NVIDIA
Alec Gunny - Solutions Architect, NVIDIA
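The elementary building block of Bayesian-network diagnostics is Bayes' rule; the session's networks chain many such updates, but a single diagnostic update looks like this (a toy sketch with hypothetical numbers, not the presenters' system):

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' rule.
    prior: P(disease); sensitivity: P(positive | disease);
    false_positive_rate: P(positive | no disease)."""
    p_pos = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_pos

# A rare condition with a good test: the posterior is still far
# from certain, which is why chained evidence matters.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
```

Full Bayesian networks repeat this kind of update over many interdependent variables, which is the computationally and data-intensive part that GPUs accelerate.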
Favorite
S8551 - ORNL Summit: Scaling Deep Learning for Scientific Workloads on Summit

HPC centers have traditionally been configured for simulation workloads, but deep learning is increasingly applied alongside simulation on scientific datasets. These frameworks do not always fit well with job schedulers, large parallel file systems, and MPI backends. We'll discuss examples of how deep learning workflows are being deployed on next-generation systems at the Oak Ridge Leadership Computing Facility. We'll share benchmarks of native compiled code versus containers on Power systems like Summit, as well as best practices for deploying learning frameworks and models on HPC resources for scientific workflows.

25-minute Talk Jack Wells - Director of Science, Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
Arjun Shankar - Group Leader, Advanced Data & Workflow, and Director of the Compute and Data Environment for Science (CADES), Oak Ridge National Lab
Favorite
S8960 - Computational Pathology at Scale: Changing Clinical Practice One Petabyte at a Time

How can we train medical deep learning models at a petabyte scale, and how can these models impact clinical practice? We will discuss possible answers to these questions in the field of computational pathology. Pathology is in the midst of a revolution from a qualitative to a quantitative discipline. This transformation is fundamentally driven by machine learning in general and computer vision and deep learning in particular. With the help of PAIGE.AI, we are building a clinical-grade AI at Memorial Sloan Kettering Cancer Center. The models are trained on petabytes of image and clinical data on top of the largest DGX-1 V100 cluster in pathology. The goal is not only to automate cumbersome and repetitive tasks, but to impact diagnosis and treatment decisions in the clinic. This talk will focus on our recent advances in deep learning for tumor detection and segmentation, on how we train these high-capacity models with annotations collected from pathologists, and on how the resulting systems are implemented in the clinic.

50-minute Talk Thomas Fuchs - Associate Professor, Memorial Sloan Kettering Cancer Center
Favorite
CE8127 - Connect with the Experts: Deep Learning Frameworks for Training

Attend this session to get your questions on deep learning frameworks answered. Learn more about widely used deep learning frameworks such as Caffe, Theano, Torch, TensorFlow, CNTK, and MXNet, and let NVIDIA experts help you choose the right framework for your research or project.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts Pooya Davoodi, NVIDIA
Simon Layton - Senior Deep Learning Engineer, NVIDIA
Michael O'Connor - Director, NVIDIA
Richard Carter, NVIDIA
Natalia Gimelshein, NVIDIA
Alexander James, NVIDIA
Deyu Fu, NVIDIA
Jie Jiang, NVIDIA
Favorite
CE8141 - Connect with the Experts: VR: GL, DX & VK

Come talk to us about anything VR related. We invite you to discuss anything from efficient rendering and multi-GPU rendering to the newest hardware features.

1 Hour Connect with the Experts Jan Robert Menzel - Developer Technology Engineer, NVIDIA
Kai Ingo Esser - Sr. Devtech Engineer, NVIDIA
Favorite
CE8143 - Connect with the Experts: GVDB Voxels for Raytracing & Simulation

GVDB Voxels is a new SDK from NVIDIA for raytracing, modeling, and simulation of sparse volumes. GVDB 1.1 is now available at GTC 2018 with improved performance, dynamic topology, and high-quality raytracing via OptiX integration. Talk with the experts on GVDB Voxels about application integration and experimentation in the areas of motion pictures, 3D printing, and scientific visualization.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Prerna Dogra - Product Manager, DesignWorks, NVIDIA
Rama Hoetzlein - Graphics Research Engineer, NVIDIA
Favorite
L8145 - Image and Video Captioning by Combining CNNs and RNNs

Prerequisites: 'Deep Learning Fundamentals with Computer Vision' or similar deep learning experience, and 'Word Generation with TensorFlow'

Duration: 2 hours

Framework: TensorFlow

Learn to combine computer vision and natural language processing to describe scenes. Many applications of deep learning require the processing of multiple data types. Train a model that generates a description of an image from raw pixel data by:

• Making use of the output of layers in the middle of neural networks

• Combining data from multiple networks through concatenation and/or averaging

• Harnessing the functionality of CNNs and RNNs

Upon completion of this lab, you'll be able to combine workflows and data to innovate using deep learning.

Presented by the NVIDIA Deep Learning Institute (DLI).
120 Minutes Instructor-Led Lab Rajan Arora - Solution Architect, NVIDIA
Favorite
L8151 - Rendered Image Denoising with Autoencoders

Prerequisites: 'Fundamentals of Deep Learning with Computer Vision' or similar experience

Duration: 2 hours

Learn how a neural network with an autoencoder can be used to dramatically speed up the removal of noise in ray traced images. You'll learn how to:

• Determine whether noise exists in rendered images

• Use a pre-trained network to denoise some sample images or your own images

• Train your own denoiser using the provided dataset

Upon completion, you'll be able to use autoencoders inside neural networks to train your own rendered image denoiser.

Presented by the NVIDIA Deep Learning Institute (DLI).
120 Minutes Instructor-Led Lab Steven Harpster - Technical Marketing Engineer, NVIDIA
Favorite
L8161A - Jetson 101: Deep Learning Workflow with DIGITS and TensorRT

Learn the workflow of deploying deep learning models on Jetson, NVIDIA's embedded AI supercomputer. This is a hands-on lab, and attendees will have access to both a cloud server for training neural network models and a physical Jetson TX2 Developer Kit. The lab introduces the basics of deep learning, works through image classification model training using NVIDIA DIGITS, and examines how the model is deployed on Jetson by understanding the network architecture. It will also touch on TensorRT, NVIDIA's technology for accelerating inference performance.

120 Minutes Instructor-Led Lab Chitoku Yato - Technical Product Marketing Manager, NVIDIA
Favorite
L8170 - Image Classification with TensorFlow: Radiomics - 1p19q Chromosome Status Classification

Thanks to work being performed at the Mayo Clinic, using deep learning techniques to detect radiomics from MRI imaging has led to more effective treatments and better health outcomes for patients with brain tumors. Learn to detect the 1p19q co-deletion biomarker by:

• Designing and training convolutional neural networks (CNNs)

• Using imaging genomics (radiomics) to create biomarkers that identify the genomics of a disease without the use of an invasive biopsy

• Exploring the radiogenomics work being done at the Mayo Clinic

Upon completion, you'll have unique insight into the novelty and promising results of using deep learning to predict radiomics.

Prerequisites: Basic understanding of convolutional neural networks and genomics

120 Minutes Instructor-Led Lab Colin Compas - Deep Learning Certified Instructor, NVIDIA
Favorite
L8172 - DRL for Optimal Execution of Portfolio Transactions

Deep reinforcement learning (DRL) can be trained to optimize the execution of large portfolio transactions using a simple environment simulator. The simulator generates features such as current and lagged values of spread, best bid/ask, volume, volatility, log-return, and bid-ask imbalance for the neural network to optimize trading trajectories. Upon completion, you'll have a starting framework for incorporating higher-dimensional order book data into a methodology that allows state-of-the-art neural networks to be used for optimization.

Prerequisites: Working knowledge of basic scientific Python and basic knowledge of TensorFlow

120 Minutes Instructor-Led Lab Onur Yilmaz - Deep Learning Solution Architect and Certified Instructor, NVIDIA
S81013 - GPU Performance Testing and PowerAI on IBM Cloud (Presented by IBM Cloud)

In this session, you will learn about the latest IBM PowerAI solution, IBM Cloud GPU offerings and see a price-performance comparison, with supporting data, on the number of CPUs required to optimize GPU performance. We've also aggregated extensive test data to determine general best practices such as half-precision deep learning advantages on the Tesla V100 and the implications of neural-network model variable distribution and gradient aggregation techniques on your performance results. Join us to see why NVIDIA GPUs on IBM Cloud offer superior results.

50-minute Talk Brian Wan - IBM Watson and Cloud Platform Software Engineer, IBM
Alex Hudak - IBM Cloud Offering Manager, IBM
S8158 - Graph-Centric AI for Cybersecurity

Large enterprise networks and computer systems face the daily challenge of cyberattacks, which originate from software and hardware vulnerabilities and result in data theft, service interruption, and monetary loss. To address this challenge, we've developed a set of graph-based machine learning techniques for accelerating threat detection on GPUs. We'll present our research on graph-centric AI that can be used to discover malicious actions in time to prevent irreversible damage to the systems. In the era of big data, these techniques help us develop a deep understanding of critical relationships in computer systems, social networks, and IoT, which is essential in many industry segments, including defense, software, finance, e-commerce, and healthcare.

50-minute Talk Howie Huang - Associate Professor, The George Washington University
S8161 - GPU Acceleration of Direct Sparse Matrix Solver for ANSYS Electronics

A GPU-accelerated direct sparse matrix solver has been in use at ANSYS since 2016. It achieves high performance on CPUs and GPUs for a wide range of electromagnetic problems compared with state-of-the-art commercial and open-source software. We'll review the current GPU acceleration technique and describe our recent improvements to the GPU-enabled matrix solver, observing up to 1.5x speedup over the existing GPU algorithm. This innovation enables GPU acceleration of matrix computations that would not have benefited from GPUs before.

25-minute Talk Zhen Wang - Senior Research and Development Engineer, ANSYS
S8188 - Application of OpenACC to the Computer-Aided Drug Discovery Software Suite "Sanjeevini"

We will demonstrate the features and capabilities of OpenACC for porting and optimizing the ParDOCK docking module of the Sanjeevini suite for computer-aided drug discovery, developed at the Supercomputing Facility for Bioinformatics and Computational Biology at the Indian Institute of Technology Delhi. We used OpenACC to efficiently port the existing C++ code of the ParDOCK software, with minimal modifications, to run on the latest NVIDIA P100 GPU. These code modifications and tuning resulted in an average six-times speedup in turnaround time. With OpenACC, the code is now able to sample ten times more ligand conformations, leading to an increase in accuracy. The OpenACC-ported ParDOCK code now predicts a correct pose of a protein-ligand interaction 96.8 percent of the time, compared with 94.3 percent earlier (for poses under 1 A), and 89.9 percent of the time, compared with 86.7 percent earlier (for poses under 0.5 A).

25-minute Talk Bharatkumar Sharma - Senior Solution Architect, NVIDIA
Abhilash Jayaraj - Research Scholar, Indian Institute of Technology Delhi
S8394 - Cerner SkyVue Remote Review with NVIDIA GRID and VMware Horizon

Clinical staff need the ability to access cardiology imagery at any time of day, sometimes in very low-bandwidth situations. Cerner identified this need and built a solution that delivers high-performance graphics computing centered in the data center rather than requiring a physical machine on site. The Cerner SkyVue Cardiology application traditionally requires higher-performance workstations with substantial graphics cards. By centrally locating these within data centers, clinical staff can access the application from any workstation or mobile device, with VMware Horizon View and NVIDIA GRID as the enabling technologies.

50-minute Talk Stuart Jackson - Senior Technology Architect, Cerner Corporation
S8455 - Deep Learning of Severe Weather Forecast Data

Attendees will learn how deep learning models identify severe weather hazards, how deep learning severe weather diagnosis compares with other machine learning methods, and which weather features deep learning considers most important for determining whether a storm will produce severe weather. Severe weather hazards, such as tornadoes, hail, high winds, and flash floods, cause billions of dollars in property damage and injure or kill hundreds of people in the U.S. each year. Improved forecasts of the potential for severe weather enable decision makers to take actions to save lives and property. Machine learning and deep learning models extract spatial information from observations and numerical weather prediction model output to predict the probability of severe weather, based on whether some form of severe weather was reported by the public. Convolutional neural networks and generative adversarial networks are compared against principal component analysis encodings to determine how much skill deep learning adds over traditional methods. The deep learning models are interrogated to identify important variables and spatial features for severe weather prediction.

50-minute Talk David Gagne - Postdoctoral Fellow, National Center for Atmospheric Research
S8479 - Efficient Communication Library for Large-Scale Deep Learning

We'll talk about the challenges in large-scale distributed GPU-based deep learning and propose an efficient communication algorithm that achieves state-of-the-art scalability. In detail, we'll explain various ways to speed up GPU-based deep learning and motivate large-scale deep learning in the performance context. We'll then show that efficient communication is a grand challenge in large-scale deep learning, especially with more powerful upcoming GPUs such as the Volta-architecture Tesla V100. We'll present the technical details of the proposed communication algorithm, along with supporting data collected on more than 100 GPUs.

25-minute Talk Minsik Cho - Research Staff Member, IBM Research
S8517 - Applying AI to Simplify Support - Lessons Learnt

We'll provide insights into how customer support built on a foundation of AI can help streamline customer support for large enterprises, especially manufacturers. With AI technologies like image recognition and natural language processing maturing, enterprises should strongly consider building an AI-based support platform, especially those with an omni-channel strategy. Delivering an amazing and differentiated user experience will lead to higher net promoter and customer satisfaction scores. By employing AI-based technologies, enterprises can reduce their support contacts, consequently reducing their costs. It will also help them sell more replacement parts online.

25-minute Talk Satish Mandalika - CEO & Co-Founder, Drishyam.ai
S8527 - Efficient Parallel Distributed Approaches for Deep Reinforcement Learning

Deep reinforcement learning is quickly becoming one of the most exciting fields in machine learning. Continuous interaction with an environment allows learning agents to learn the optimal execution policy from past experience by optimizing parameterized neural network models. However, a single learning agent suffers from limited computation resources as well as limited exposure to the environment. To counter these limitations, we can scale the deep reinforcement learning process by parallelizing training and the processes that collect data from the environment. Existing efforts include novel distributed deep reinforcement learning algorithms such as G-A3C and TRPO, and open-source libraries for implementation, including BigDL and Ray. In this session, we will review key parallel distributed algorithms and libraries for deep reinforcement learning.

50-minute Talk Marcos Campos - Head of Artificial Intelligence, Bonsai
S8534 - Making Business Application Intelligent Using SAP Leonardo Machine Learning

SAP Leonardo Machine Learning provides capabilities, micro-services, applications, and technology that enable the integration and adoption of ML in the enterprise. We'll present how ML technology works and how you can transform your business with SAP Leonardo ML and the power of SAP Cloud Platform (SCP). One of the ML use cases we built is called Catalog Normalization. This solution processes catalogs received from suppliers, extracts attributes from free-text descriptions, and normalizes attribute names and values. In this talk, we'll also review this solution to show how deep learning models can be used to solve this problem for enterprises using SAP Leonardo Machine Learning.

50-minute Talk Frank Wu - Head of SAP Machine Learning Business Network, SAP Labs
Nazanin Zaker - Lead Data Scientist, SAP
S8571 - Towards AI Agents That Can See, Talk, and Act

We are witnessing unprecedented advances in computer vision and AI. What lies next for AI? We believe that the next generation of intelligent systems (say, the next generation of Google's Assistant, Facebook's M, Apple's Siri, or Amazon's Alexa) will need to possess the ability to perceive their environment (through vision, audition, or other sensors), communicate (i.e., hold a natural language dialog with humans and other agents), and act (e.g., aid humans by executing API calls or commands in a virtual or embodied environment). Example tasks include aiding visually impaired users in understanding their surroundings; interacting with an AI assistant (Human: "Alexa, can you see the baby in the baby monitor?", AI: "Yes, I can", Human: "Is he sleeping or playing?"); and robotics applications (e.g., search and rescue missions) where the operator may be situationally blind and operating via language. We'll present work from our lab on a range of projects on such visually grounded conversational agents.

25-minute Talk Dhruv Batra - Assistant Professor and Researcher, Georgia Tech and Facebook AI Research
S8630 - What the Profiler is Telling You: Optimizing GPU Kernels

In this session we explore how to analyze and optimize the performance of kernels running on the GPU. Working with a real-world example, we'll walk through an analysis-driven process leading to a series of kernel-level optimizations, using NVIDIA's profiling tools. Attendees will learn about the fundamental performance limiters: instruction throughput, memory throughput, and latency. We'll present strategies to identify and tackle each type of limiter.

50-minute Talk Jakob Progsch - Developer Technology Engineer, NVIDIA
Christoph Angerer - Senior Developer Technology Engineer, NVIDIA
Mathias Wagner - Developer Technology Engineer, NVIDIA
S8636 - Autoware on NVIDIA DRIVE: The Open-Source Self-Driving Platform

We'll present a complete open-source software stack for self-driving vehicles, called Autoware, and its open integration with the NVIDIA DRIVE platform. Autoware implements working modules for localization and 3D mapping with LiDAR and GNSS, object detection and traffic light recognition with deep learning, path planning with lattice and search methods, and vehicle dynamics control. Compute-intensive tasks in these modules are accelerated using CUDA, and timing-aware tasks are protected by RTOS capabilities. We'll discuss the impact of CUDA acceleration on self-driving vehicles and evaluate its performance. Learn how Autoware enables any by-wire vehicle to become a high-quality self-driving vehicle that can operate in real-world environments.

50-minute Talk Shinpei Kato - CTO, Tier IV, Inc.
S8698 - Optimizing Depth Fusion for Mixed Reality with an NVIDIA Quadro GP100

We'll show how a headset-mounted depth camera can be used to enable scalable real-time scene reconstruction for immersive mixed reality applications. We'll discuss and profile optimized CUDA kernels that leverage the tremendous performance of an NVIDIA Quadro GP100. Furthermore, we'll show how knowledge of the headset's position and orientation in space can be leveraged to make the reconstruction process better and more robust.

50-minute Talk Sven Middelberg - Developer Technology Engineer, NVIDIA
S8735 - A.I. Disrupting the Future of Content Creation for Games

The artistic manpower needed to create a video game has been increasing exponentially over the years. Thanks to the computational power of NVIDIA GPUs, new AI-accelerated workflows are poised to solve this problem, saving artists and studios time and money, and driving greater creativity. Artomatix is the leading pioneer in this space; its AI-based approach to content creation helps automate many of the mundane, tedious, and repetitive tasks artists and designers face every day. This talk introduces the academic theory and history behind creative AI and then delves into specific use cases and applications such as texture synthesis, material enhancement, hybridization, and style transfer. Finally, it presents the next generation of AI-powered tools for the creative industries and gives case studies on how they've been solving some of the game industry's largest problems over the past year. Join this session to gain insight into the future of game creation.

50-minute Talk Eric Risser - Founder & CTO, Artomatix
S8747 - ORNL Summit: Petascale Molecular Dynamics Simulations on the Summit POWER9/Volta Supercomputer

Learn the opportunities and pitfalls of running billion-atom science at scale on a next-generation pre-exascale GPU-accelerated supercomputer. The highly parallel molecular dynamics code NAMD has long been used on the GPU-accelerated Cray XK7 Blue Waters and ORNL Titan machines to perform petascale biomolecular simulations, including a 64-million-atom model of the HIV virus capsid. In 2007, NAMD was one of the first codes to run on a GPU cluster, and it is now one of the first on the new ORNL Summit supercomputer, which features IBM POWER9 CPUs, NVIDIA Volta GPUs, and the NVLink CPU-GPU interconnect. This talk will cover the latest NAMD performance improvements and scaling results on Summit and other leading supercomputers.

25-minute Talk James Phillips - Senior Research Programmer, University of Illinois
S8760 - Deep-Learning Inferencing on IBM Cloud with NVIDIA TensorRT

We'll focus on deep-learning neural network model deployment and inference on the IBM Cloud, and how well NVIDIA GPUs perform in this area compared to FPGAs that have been tuned for deep-learning primitives. We believe this topic is very relevant today because, with the emergence of new powerful NVIDIA GPUs, more and more artificial intelligence has become part of our daily lives, from Siri and Alexa to language translation, image recognition, and self-driving cars. The cognitive era has truly begun. Toward this end, IBM has formed a close partnership with NVIDIA to offer GPU-enabled systems - both dedicated servers and on the cloud - to our customers and developers to run their cognitive workloads.

50-minute Talk Larry Brown - Senior Software Engineer, IBM
Khoa Huynh - Senior Technical Staff Member (STSM), IBM
S8843 - Building an Enterprise Machine Learning Center of Excellence

Algorithmic advancements and new research capabilities frequently overshadow the infrastructure that enables that research and serves it to customers in production applications. Having a solid infrastructure for real world machine learning often ends up being the biggest determinant of success and is an exciting area of research and engineering in its own right. These environments are what allow brilliant algorithms to deliver value at scale. We'll detail how Capital One has designed its GPU computing environment to accelerate machine learning efforts and outline the services used, the framework to leverage those services, and the engineering practices used to develop and deploy well-governed, accurate models to high-volume production environments. Beyond production deployments, we'll discuss how this infrastructure performs large-scale testing of models and frameworks to explore the interactions of deep learning tools like MXNet and TensorFlow. We'll also discuss the practices that enabled Capital One to hire a high-performing team in this incredibly desirable field.

25-minute Talk Zachary Hanif - Director of Machine Learning, Capital One
S8182 - CUDA Based Stitching of Teravoxel Microscopy Images

Learn how to use (multi-)GPU and CUDA to speed up the process of stitching very large images (up to terabytes in size). Image stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Image stitching is widely used in many important fields, such as high-resolution photo mosaics in digital maps and satellite photos, and medical images. Motivated by the need to combine images produced in the study of the brain, we developed and released for free the TeraStitcher tool, which we recently enhanced with a CUDA plugin that allows an astonishing speedup of the most computing-intensive part of the procedure. The code can be easily adapted to compute different kinds of convolution. We describe how we leverage shuffle operations to guarantee optimal load balancing among threads, and CUDA streams to hide the overhead of moving images back and forth between the CPU and the GPU when their size exceeds the amount of available memory. The speedup we obtain is such that jobs that took several hours are now completed in a few minutes.
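At the heart of stitching is finding the displacement that best aligns the overlapping regions of neighboring tiles. TeraStitcher's CUDA plugin does this at scale with far more sophistication; the pure-Python sketch below only illustrates the core offset search on a toy 1D strip.

```python
def best_offset(ref, mov, max_shift=5):
    """Find the integer shift of `mov` that best matches `ref` in their
    overlap, scored by mean squared difference (lower is better)."""
    best, best_score = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(ref[i], mov[i - s]) for i in range(len(ref))
                 if 0 <= i - s < len(mov)]
        score = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if score < best_score:
            best, best_score = s, score
    return best

ref = [0, 1, 4, 9, 16, 9, 4, 1, 0, 0]
mov = ref[2:] + [0, 0]          # the same signal shifted left by 2
print(best_offset(ref, mov))    # 2
```

The real tool works on 3D image tiles and runs this search as a massively parallel GPU correlation rather than a Python loop.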

25-minute Talk Massimo Bernaschi - Director of Technology, National Research Council of Italy
S8193 - Prototyping Vision-Based Classifiers in Constrained Environments

SOFWERX developed a vision-based classifier using commodity hardware and machine learning libraries to satisfy an urgent high-level requirement. To track the usage of tank ammunition, the team had to address challenges involving unavailable training data, varying spatial orientations, and limited power consumption. To resolve these challenges, SOFWERX generated an augmented dataset using synthetic models, implemented spatial transformers, and experimented with different hardware/software optimizations.

25-minute Talk Ted Hromadka - Senior Software Engineer, Integrity Applications Incorporated
Cameron Hunt - CIO, SOFWERX
S8458 - Capture Sparsity in DL Applications

We'll present a new technique for improving the efficiency of inference and training in deep learning in the presence of sparse workloads. We'll start with a brief overview of applications of sparse linear algebra in engineering and data analysis. Then, we'll analyze the presence of sparsity in both the training and inference phases of deep learning. To exploit this sparsity, we present our method of improving the memory locality of sparse applications. We'll establish lower and upper bounds for sparse matrix operations and the crossover with dense matrix operations. We'll demonstrate how to minimize memory traffic by tiling matrix operations and making efficient use of L2, L1, and SMEM. We'll conclude with a performance comparison of our method with existing techniques on real pruned weight matrices from GoogLeNet and OpenNMT's multiway translation network. This is the joint work of Michael Frumkin, Jeff Pool, and Lung Sheng Chien.

25-minute Talk Michael Frumkin - Sr. Compute Architect, NVIDIA
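As background for the sparse-versus-dense discussion, the compressed sparse row (CSR) layout is the baseline representation that locality-improving methods like the one above build on: only nonzeros are stored, so a matrix-vector product touches far less memory on pruned weight matrices. A minimal pure-Python sketch (illustrative only, not the talk's tiling technique):

```python
def dense_to_csr(mat):
    """Compress a dense matrix, keeping only nonzeros (CSR layout)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in mat:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))   # row r spans row_ptr[r]:row_ptr[r+1]
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x, touching only the stored nonzeros."""
    y = []
    for r in range(len(row_ptr) - 1):
        y.append(sum(values[k] * x[col_idx[k]]
                     for k in range(row_ptr[r], row_ptr[r + 1])))
    return y

A = [[2, 0, 0],
     [0, 0, 3],
     [1, 0, 4]]
vals, cols, ptr = dense_to_csr(A)
print(csr_matvec(vals, cols, ptr, [1, 1, 1]))  # [2, 3, 5]
```

The crossover question in the talk is essentially when this reduced memory traffic beats the regular access patterns of a dense kernel.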
S8462 - Multi-GPU Training with NCCL

We'll cover recent features and performance improvements in the NVIDIA Collective Communications Library (NCCL). NCCL is designed to make computing on multiple GPUs easy and is integrated into most deep learning frameworks to accelerate training. NCCL supports communication over shared memory, PCIe, NVLink, sockets, and InfiniBand verbs, supporting both multi-GPU machines and multi-node clusters.
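NCCL's collectives are commonly implemented with ring algorithms. The following pure-Python simulation sketches the ring all-reduce idea (a reduce-scatter phase followed by an all-gather phase); it illustrates the algorithm only and is not NCCL's implementation.

```python
import copy

def ring_allreduce(rank_data):
    """Simulate ring all-reduce: each 'rank' holds a vector; afterwards
    every rank holds the element-wise sum across all ranks."""
    n = len(rank_data)                 # number of simulated GPUs
    size = len(rank_data[0])
    assert size % n == 0
    c = size // n                      # chunk length owned per rank
    data = [list(v) for v in rank_data]

    # Reduce-scatter: after n-1 steps, rank r holds the full sum of
    # chunk (r+1) % n. Snapshots model simultaneous sends.
    for t in range(n - 1):
        snapshot = copy.deepcopy(data)
        for r in range(n):
            src = (r - 1) % n
            idx = (r - t - 1) % n      # chunk received from the left neighbor
            for i in range(idx * c, (idx + 1) * c):
                data[r][i] += snapshot[src][i]

    # All-gather: circulate each fully reduced chunk around the ring.
    for t in range(n - 1):
        snapshot = copy.deepcopy(data)
        for r in range(n):
            src = (r - 1) % n
            idx = (r - t) % n
            for i in range(idx * c, (idx + 1) * c):
                data[r][i] = snapshot[src][i]
    return data

print(ring_allreduce([[1, 1, 2, 2], [3, 3, 4, 4]]))
# [[4, 4, 6, 6], [4, 4, 6, 6]]
```

Each rank sends and receives only one chunk per step, which is what makes the ring bandwidth-optimal regardless of the number of GPUs.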

25-minute Talk Sylvain Jeaugey - Senior Computing/Networking engineer, NVIDIA
S8540 - Deep Learning for Molecular Docking

Molecular docking is an important tool for computational drug discovery that aims to predict the binding pose of a ligand (drug) to a target protein. Identifying a correctly oriented pose requires a scoring function that has a global optimum close to the experimentally observed pose. Additionally, it should also be differentiable with respect to atomic positions so that it can be used for gradient-based pose optimization. We'll describe a differentiable grid-based convolutional neural network scoring function and explore its application in an end-to-end GPU-optimized molecular docking workflow. We'll show that convolutional neural networks trained on experimental data can successfully identify correct binding modes and meaningfully rank and score compounds. We'll also describe several visualization approaches that map the CNN score back to the atomic inputs to help guide medicinal chemistry optimization and provide insight into the functioning of the neural network. The entirety of our approach is available under an open-source license as part of our gnina package (https://github.com/gnina).

25-minute Talk David Koes - Assistant Professor, University of Pittsburgh
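A grid-based CNN scoring function takes the protein-ligand complex voxelized onto a 3D grid, with one channel per atom type. gnina's actual gridding uses smooth, differentiable atom densities computed on the GPU; this simplified sketch just bins point atoms to illustrate the input representation.

```python
def voxelize(atoms, grid_dim=8, resolution=0.5):
    """Bin atoms (x, y, z, channel) into a coarse occupancy grid,
    one channel per atom type, centered on the origin."""
    n_channels = 1 + max(a[3] for a in atoms)
    grid = [[[[0.0] * grid_dim for _ in range(grid_dim)]
             for _ in range(grid_dim)] for _ in range(n_channels)]
    origin = -grid_dim * resolution / 2
    for x, y, z, ch in atoms:
        i = int((x - origin) / resolution)
        j = int((y - origin) / resolution)
        k = int((z - origin) / resolution)
        if all(0 <= v < grid_dim for v in (i, j, k)):
            grid[ch][i][j][k] += 1.0
    return grid

# Two carbon-like atoms (channel 0) and one oxygen-like atom (channel 1)
atoms = [(0.1, 0.1, 0.1, 0), (-0.6, 0.2, 0.0, 0), (1.2, -0.3, 0.4, 1)]
grid = voxelize(atoms)
total = sum(v for ch in grid for plane in ch for row in plane for v in row)
print(total)  # 3.0 - every atom landed in the grid
```

Because the real density functions are smooth in atom position, the CNN score stays differentiable with respect to atomic coordinates, enabling the gradient-based pose optimization described above.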
S8550 - Performance Optimization for Deep Image Matting in Photoshop

Learn how a research paper from Adobe Research Labs makes it into a real customer product like Photoshop. We tackled a number of challenging issues in applying the technology to real-world use cases, including large model size, heavy memory consumption, and slow runtime performance.

25-minute Talk Salil Tambe - Computer Vision Engineer, Adobe Systems
Betty Leong - Photoshop Engineering Manager, Adobe Systems
Christopher Hebert - Developer Technology Engineer, NVIDIA
S8560 - Towards Theory of AI's Mind

To effectively leverage the progress in artificial intelligence (AI) to make our lives more productive, it is important for humans and AI to work well together in a team. Traditionally, research has focused primarily on making AI more accurate, and (to a lesser extent) on having it better understand human intentions, tendencies, beliefs, and contexts. The latter involves making AI more human-like and having it develop a theory of our minds. In this talk, I will argue that for human-AI teams to be effective, humans must also develop a theory of AI's mind: get to know its strengths, weaknesses, beliefs, and quirks. I will present some (very) initial results in the context of visual question answering and visual dialog, where the AI agent is trained to answer natural language questions about images.

25-minute Talk Devi Parikh - Assistant Professor and Researcher, Georgia Tech and Facebook AI Research
S8776 - GAN Fashion Photo Shoot: Garment to Model Images Using Conditional GANs

Learn how VUE.ai's model generator uses conditional GANs to produce product-specific images suitable for replacing photographs in catalogs. We'll present networks that generate images of fashion models wearing specific garments, using an image of the garment as a conditioning variable. Network architecture variants, training, and manipulation of latent variables to control attributes such as model pose, build, or skin color will be addressed.

25-minute Talk Costa Colbert - Chief Scientist, Senior Vice President, MAD Street Den, Inc. / VUE.ai
S8802 - Juicing Up Ye Olde GPU Monte Carlo Code

We'll discuss the GPU-accelerated Monte Carlo compute at JP Morgan, which was architected for C1060 cards and revamped several times as new architectures were released. The key features of the code are the exclusive use of double precision, data caching, and a structure in which a significant amount of CPU pre-compute is followed by running multiple GPU kernels. On the latest devices, memory per flop is a throughput-limiting factor for a class of our GPU-accelerated models. As the byte/flop ratio continues to fall from one generation of GPU to the next, we are exploring ways to re-architect the Monte Carlo simulation code to decrease memory requirements and improve the TCO of GPU-enabled compute. Obvious next steps are to store less, re-calculate more, and use unified memory.
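The "store less, re-calculate more" idea can be illustrated with seeded random number generation: if each Monte Carlo path is derived from its own seed, the path can be rebuilt exactly on demand instead of being kept in memory. This toy sketch is not JP Morgan's code; the model and all parameters are invented for illustration.

```python
import random

def simulate_path(seed, steps=16, s0=100.0, sigma=0.02):
    """Regenerate a full price path on demand from its seed alone."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(steps):
        path.append(path[-1] * (1 + rng.gauss(0, sigma)))
    return path

# "Store less": keep one seed per path instead of the whole path.
n_paths = 1000
payoff = sum(max(simulate_path(s)[-1] - 100.0, 0.0)
             for s in range(n_paths)) / n_paths

# "Re-calculate more": any path can be rebuilt bit-exactly when needed,
# e.g. for a second pass computing sensitivities.
assert simulate_path(42) == simulate_path(42)
```

The trade is explicit: memory footprint drops from O(paths x steps) to O(paths) seeds, at the cost of re-running the (flop-cheap) path generation, which suits hardware where flops are plentiful relative to bytes.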

25-minute Talk Oleg Rasskazov - Executive Director, Quantitative Research, JP Morgan Chase
Richard Hayden - Vice President, JP Morgan Chase
S8880 - Khronos Standards Update: Vulkan, glTF, OpenCL and OpenXR for Cross-Platform VR/AR

Discover how over 100 companies cooperate at the Khronos Group to create open, royalty-free standards that enable developers to access the power of the GPU to accelerate demanding compute, graphics and vision applications. This session includes the very latest updates on many Khronos cross-platform standards, including OpenXR for portable AR and VR, Vulkan, SPIR-V, OpenGL and OpenCL. The session also provides insights into how these open standards APIs are supported across NVIDIA's product families.

25-minute Talk Neil Trevett - Vice President Developer Ecosystem, NVIDIA
S8909 - ORNL Summit: Exposing Particle Parallelism in the XGC PIC Code by Exploiting GPU Memory Hierarchy

XGC is a kinetic whole-volume modeling code with unique capabilities to study tokamak edge plasmas in real geometry and answer important questions about the design of ITER and other future fusion reactors. The main technique is the particle-in-cell method, which models the plasma as billions of quasiparticles representing ions and electrons. Ostensibly, the process of advancing each particle in time is embarrassingly parallel. However, the electric and magnetic fields must be known in order to push a particle, which requires an implicit gather operation from XGC's sophisticated unstructured mesh. In this session, we'll show how careful mapping of field and particle data structures to GPU memory allowed us to decouple the performance of the critical electron push routine from the size of the simulation mesh and allowed the true particle parallelism to dominate. This improvement enables performant, high-resolution, ITER-scale simulations on Summit.

25-minute Talk Stephen Abbott - Solutions Architect, NVIDIA
S8982 - Essential CUDA Optimization Techniques - Presented by Acceleware (Session 4 of 4)

Learn how to optimize your algorithms for NVIDIA GPUs. This informative tutorial provides an overview of the key optimization strategies for compute-, latency-, and memory-bound problems. The session includes techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. For compute-bound algorithms, we'll discuss how to improve branching efficiency, use intrinsic functions, and apply loop unrolling. For memory-bound algorithms, optimal access patterns for global and shared memory will be presented. Cooperative groups will also be introduced as an additional optimization technique. The session includes code examples throughout and a programming demonstration highlighting the optimal global memory access pattern, which is applicable to all GPU architectures. Printed copies of the material will be provided to all attendees for each session - collect all four!
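The demonstration's theme of coalesced global memory access can be previewed with a simplified cost model: a warp's loads cost roughly one transaction per distinct memory segment touched. The sketch below assumes hypothetical 32-byte segments (real hardware has more nuance, with sectors, cache lines, and alignment effects) and contrasts coalesced and strided patterns.

```python
def transactions(addresses, segment=32):
    """Count the memory segments a warp's byte addresses touch; in this
    simplified model, each distinct segment costs one transaction."""
    return len({a // segment for a in addresses})

warp = range(32)                        # 32 threads, 4-byte elements
coalesced = [4 * t for t in warp]       # thread t -> element t
strided = [4 * 32 * t for t in warp]    # thread t -> element 32 * t

print(transactions(coalesced))  # 4  (128 contiguous bytes / 32-byte segments)
print(transactions(strided))    # 32 (every thread lands in its own segment)
```

The 8x difference in transaction count is why restructuring data layout so consecutive threads read consecutive addresses is usually the first fix for memory-bound kernels.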

80 Minutes Tutorial Chris Mason - Technical Product Manager, Acceleware
Dan Cyca - Chief Technology Officer, Acceleware
CE8107 - Connect with the Experts: Video Codec SDK and Capture SDK

Come by and ask us all you want to know about the NVIDIA Video Codec SDK and Capture SDK. Let's talk about H.264/HEVC, desktop capturing, encoding and decoding performance, and your requirements and problems.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts Stefan Schoenefeld - Senior DevTech Engineer and Manager, NVIDIA
Abhijit Patait - Director, System Software, NVIDIA
CE8132 - Connect with the Experts: Building Autonomous Vehicles using DRIVE Platforms

Connect with NVIDIA experts and discuss why autonomous technologies powered by deep learning have become a key focus for every car manufacturer, as well as transportation services and technology companies. The car needs to know exactly where it is, recognize the objects around it, and continuously calculate the optimal path for a safe driving experience. This situational and contextual awareness of the car and its surroundings demands a powerful visual computing system that can merge data from cameras and other sensors, plus navigation sources, while also figuring out the safest path - all in real-time. This autonomous driving platform is NVIDIA DRIVE PX.

Connect with the Experts sessions are informal opportunities to ask experts from NVIDIA and other organizations your burning questions about a specific subject.

1 Hour Connect with the Experts Richard Albayaty, NVIDIA
Dmitry Chichkov, NVIDIA
Felix Schmitt, NVIDIA
CE8138 - Connect with the Experts: Jetson

NVIDIA Jetson is the world's leading computing platform for AI at the edge. High in performance and low in power, it's ideal for compute-intensive embedded applications like robots, drones, mobile medical imaging, and Intelligent Video Analytics (IVA). OEMs, independent developers, makers and hobbyists can use the NVIDIA Jetson Developer Kit and module to explore the future of embedded computing and artificial intelligence. Have questions? Jetson experts and the NVIDIA Developer Tools team will be present to cover CUDA debugging and profiling, system trace and graphics debugging and profiling tools, and more.

Connect with the Experts sessions are informal opportunities to ask experts from NVIDIA and other organizations your burning questions about a specific subject.

1 Hour Connect with the Experts Ashok Kelur, NVIDIA
Eric Brower - Software Management, NVIDIA
Hung Chen, NVIDIA
Winnie Hsu, NVIDIA
Avraham Shapira - Sr. Director Software Engineering, NVIDIA
Rohit Vaswani, NVIDIA
Andrey Trachenko, NVIDIA
Sanjiv Satoor, NVIDIA
Felix Schmitt, NVIDIA
John Welsh, NVIDIA
Zheng Ba, NVIDIA
Jeffrey Kiel - Senior Manager of Graphics Tools, NVIDIA
S8133 - Managing Multi-User Virtual Reality Environments at Scale

Enterprise virtual reality presents challenges to traditional IT approaches. Discuss how enterprises can address integrating virtual reality into business workflows using common enterprise IT management tools. Learn how virtualization with virtual reality can increase density and reduce complexity, and discuss how collaborative virtual reality affects enterprise deployment strategies. We'll present several real-world use cases of multi-user VR system deployment, with examples from industries and applications such as location-based entertainment, manufacturing collaborative design review, and public sector training and simulation. We'll outline both the business and technical requirements that drove design decisions toward fewer, larger systems, each consolidating multiple virtualized VR-Ready virtual machines instead of many individual PCs serving one VR user each. The final solution architecture will be presented for each example, along with early user feedback where available. Attendees of this session will learn how to evaluate the pros and cons of deploying multi-user VR systems versus individual VR-ready PCs and be better equipped to design a solution that fits VR.

25-minute Talk Friederich Devoir - Sr. Solutions Architect, NVIDIA
Thomas Kaye - Sr. Solution Architect, NVIDIA
S8134 - Instance Sizing for Your GPU Fleet: Lessons from Developing Smart Kitchen Technology

Learn how to size your GPU fleet by following examples from our work in computer vision for the smart kitchen. Although GPU technology has enabled us to dramatically improve quality of service and reduce costs, to obtain optimal value we had to consider our needs in terms of both GPU and CPU capabilities. In this talk we give an overview of the problem domain we have been working in, then dive into a demonstration of how memory requirements, along with raw performance needs, have played a key role in determining our choice of AWS GPU instances. Innit has pioneered technology addressing all aspects of people's interactions with food, from meal planning to shopping, storage, and cooking; in this talk we focus on our food intelligence platform, which relies heavily on recognizing both generic food items and specific packaged goods in a variety of contexts.

50-minute Talk Hristo Bojinov - CTO, Innit, Inc.
Rob Laber - Senior Computer Vision Engineer, Innit, Inc
Favorite
S8165 - Hard Facts - Benchmarking NVIDIA Virtual GPU Accelerated Remote Desktop User Experience NVIDIA vGPU Community Advisors Ruben Spruijt and Benny Tritsch present their latest findings on benchmarking user experience performance in NVIDIA vGPU-accelerated environments hosted on-premises and in the cloud. Get in-depth information on the latest versions of Citrix XenApp/XenDesktop, VMware Horizon and Microsoft RDS when accelerated by NVIDIA GPUs. What is the performance impact caused by remoting protocol settings, latency and common WAN scenarios? Hundreds of recorded screen videos and telemetry data sets, combined with a unique visualisation tool, allow Ruben and Benny to analyse and compare the performance of selected NVIDIA vGPU-accelerated remote desktops and VDI scenarios, live on stage. 50-minute Talk Benny Tritsch - Principal Consultant, RDSGURUS
Ruben Spruijt - Field CTO, Frame
Favorite
S8179 - Performance Evaluation of GPU-Accelerated Linear Solvers on TCAD Examples We'll present the results of our evaluation of the GPU-accelerated sparse linear solvers from PARALUTION and MAGMA and compare them with our CPU-only sparse linear solvers on technology computer-aided design (TCAD) examples. TCAD is a category of software tools for designing semiconductor devices, which can be found in almost every area of modern life. The purpose of TCAD tools is to replace cumbersome physical experiments with computer simulations. A significant part of the total simulation time is spent solving the linear systems, so the performance of the linear solvers is extremely important. 25-minute Talk Ana Iontcheva - Senior Development Engineer Numerics, Silvaco
Favorite
S8229 - GPU Accelerated Sequence Learning for Action Recognition We'll introduce several approaches to modeling long-term sequence dependence to improve action recognition performance. First, we'll introduce a feature that fuses deep and hand-crafted features to demonstrate their complementarity. We'll also introduce an attention model to illustrate the effectiveness of the attention mechanism for action recognition. We'll then introduce shuttleNet, a biologically inspired neural network. Finally, we'll present some exploratory experiments on action recognition that point to potential research directions. 25-minute Talk Yemin Shi - PhD, Peking University
Favorite
S8254 - De Novo Drug Design using Artificial Intelligence We propose a novel computational strategy based on deep and reinforcement learning techniques for de novo design of molecules with desired properties. This strategy integrates two deep neural networks, generative and predictive, to generate novel chemical structures with the desired properties. In the first phase of the method, the generative and predictive models are separately trained with supervised learning algorithms. In the second phase, both models are jointly trained with a reinforcement learning approach to bias newly generated chemical structures toward those with desired physical and biological properties. In this proof-of-concept study, we employed this strategy to design chemical libraries biased toward compounds with a maximal, minimal, or specific range of physical properties, such as melting point and hydrophobicity, as well as to develop novel putative inhibitors of JAK2. This new approach can find general use in generating targeted chemical libraries optimized for a single desired property or multiple properties. 25-minute Talk Olexandr Isayev - Assistant Professor, University of North Carolina
Favorite
S8289 - How to Get the Most out of GPU Accelerated Database Operators Memory bandwidths more than an order of magnitude higher than those of conventional processors have long made GPUs an attractive platform for data-intensive applications. While there are many success stories about GPU-accelerated databases built from scratch, GPU-accelerated operations in large-scale, general-purpose databases are the exception rather than the norm. We characterize fundamental database operators like scan, filter, join, and group-by based on their memory access patterns. From these characteristics, we derive their potential for GPU acceleration, such as upper bounds for performance on current and future architectures. Starting from basic GPU implementations, we deep dive into aspects like optimizing data transfers, access, and layout. 50-minute Talk Tim Kaldewey - Senior Manager Developer Technology for AI and Data Analytics, NVIDIA
Nikolay Sakharnykh - Sr. Developer Technology Engineer, NVIDIA
Jiri Kraus - Senior Devtech Compute, NVIDIA
Favorite
S8292 - Fraud Detection via Deep Learning

We'll discuss the role of deep learning in "nontraditional" settings -- domains that don't involve images, speech, or language. We'll highlight C3 IoT's work using deep learning to detect electricity fraud, and discuss how deep learning compares to traditional machine learning methods.

50-minute Talk Mehdi Maasoumy, C3 IoT
Favorite
S8318 - 3D Convolutional Neural Networks (CNNs) with Fast and Memory Efficient Cross-Hair Filters Over the years, state-of-the-art architectures have been built with convolutional layers and employed successfully on 2D image processing and classification tasks. This success naturally invites the extension of 2D convolutional layers to 3D convolutional layers to handle higher-dimensional tasks such as video and 3D volume processing. However, this extension comes with a steep increase in the number of computations and parameters in each convolutional layer. Because of this cost, 2D convolutional layers are still widely used on 3D images, at the expense of 3D context information. In view of this, we'll present a 3D fully convolutional neural network (FCNN) with 2D orthogonal cross-hair filters that makes use of 3D context information while avoiding the scaling described above. By replacing 3D filters with 2D orthogonal cross-hair filters, we achieve over 20% improvement in execution time and a 40% reduction in the overall number of parameters while accuracy is preserved. 25-minute Talk Marie Piraud - Senior Researcher, Technical University of Munich
Favorite
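The parameter savings claimed above can be sanity-checked with a quick count. This is a minimal sketch, not the authors' network: a full 3D convolution with kernel size k uses k³ weights per input/output channel pair, while the cross-hair scheme replaces it with three orthogonal 2D filters using 3·k² weights. The kernel size and channel counts below are hypothetical, chosen only to illustrate the scaling.

```python
def conv3d_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a dense 3D convolution (bias terms omitted)."""
    return k ** 3 * c_in * c_out

def crosshair_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in three orthogonal 2D (cross-hair) filters."""
    return 3 * k ** 2 * c_in * c_out

# Hypothetical layer: 5x5x5 kernel, 32 input channels, 64 output channels.
k, c_in, c_out = 5, 32, 64
full = conv3d_params(k, c_in, c_out)       # 125 weights per channel pair
cross = crosshair_params(k, c_in, c_out)   # 75 weights per channel pair
print(full, cross, round(1 - cross / full, 2))  # 256000 153600 0.4
```

For a 5x5x5 kernel the cross-hair substitution happens to give exactly the 40% per-layer reduction the abstract cites; larger kernels save proportionally more.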
S8363 - Now I C U: Analyzing Data Flow Inside an Autonomous Driving Car Learn about large-scale data analytics and anomaly detection in intelligent networks for autonomous vehicles. Autonomous driving is no longer a research topic but a reality in the making. Internally, an autonomous vehicle is a very complex network (boardnet) of electronic control units (ECUs) communicating with each other over multiple networking protocols, such as CAN, Ethernet, and FlexRay. Learn how BMW Group is using novel machine learning approaches to right-size boardnet resources, including precise provisioning of ECUs and message buses to optimize network throughput for quicker decision making. We'll showcase a demo of boardnet traffic over time and demonstrate how to perform anomaly detection to find performance and security bottlenecks in communication flows. The demo also shows machine learning and visualization on top of a GPU-accelerated database running on an NVIDIA DGX-1 to find said anomalies. 50-minute Talk Selam Getachew Woldetsadick - Data Scientist, BMW Group
Arpit Mehta - Data Scientist, Product Owner: Big Data Architectures, BMW Group
Favorite
S8508 - Monitoring Honey Bee Health Using TensorRT and Microsoft Cognitive Toolkit

We'll take a deep dive into honey bee hive health monitoring with NVIDIA's TX2, TensorRT (a high-performance deep learning inference optimizer), Kinetica's insight engine running on DGX-1/DGX Station, and Microsoft Cognitive Toolkit to rapidly optimize, validate, and deploy trained neural networks for inference. In recent years, the media has reported that bees seem to be dying at an unprecedented rate. We'll explore how new accelerated analytics technologies and their corresponding compute platforms can deliver game-changing possibilities for innovation as we follow a honey bee farm scientist in California, who agreed to field test this real-time monitoring solution with her beehives. See first-hand how adaptable and accessible these complex, cutting-edge technologies have become, and how we can use intelligent monitoring technologies to help rescue the honey bee in this real-world environmental analytics opportunity.

50-minute Talk Anusua Trivedi - Data Scientist, Microsoft
Jacqueline Cenci-McGrody - Solutions Architect (Partner SA), NVIDIA
Favorite
S8591 - Continuously Learning AI Pathologist : A Smart Microscope that can Automatically Screen Different Biological Specimen

Clinical laboratories play a crucial role in the healthcare ecosystem: they act as a screening subsystem, providing early inference for disease and abnormality diagnosis. An estimated 70% of clinical decisions regarding prevention, diagnosis, and treatment involve lab tests. Surprisingly, 60% of the inferencing done at a clinical laboratory can be performed with one "wonder-tool": the microscope. Microscopy has helped pathologists assess and analyse patients for several centuries. The key hurdles in microscopic examination are the amount of time that pathologists must spend on manual analysis and the need for pathologists to be co-located with the specimen. In this talk, we introduce SigTuple's AI-powered smart microscope, which can automatically learn, analyse, and summarize the inferences of several hundred abnormalities across different biological specimens (blood, urine, and semen). It also utilizes the power of GPU computing in the cloud to provide higher-order analysis of the samples, and acts as a tele-pathology enabler by giving pathologists the power to view or review any analysis or report from any part of the world.

25-minute Talk Tathagato Rai Dastidar - Chief Scientific Officer, SigTuple Technologies Pvt Ltd
Favorite
S8610 - Scaling Convolutional Neural Networks with Kubernetes and TensorFlow on AWS GPUs In this session, we present a Kubernetes deployment on Amazon AWS GPUs that provides customized computer vision to a large number of users. Reza offers an overview of Matroid's pipeline and demonstrates how to customize computer vision neural network models in the browser, followed by building, training, and visualizing TensorFlow models, which are provided at scale to monitor video streams. 50-minute Talk Reza Zadeh - CEO, Matroid
Favorite
S8656 - Analyzing Sequences of Time Series Security Data with Recurrent Residual Networks

Analyzing time series data from security controls for signs of malicious activity is a common challenge in financial networks. We show how one tool, a recurrent residual deep learning (DL) model, can be used to rapidly analyze variable-length time series data to achieve meaningful analysis. Recurrent networks have long been a popular choice in DL for analyzing data with multiple time-steps, where the meaning of data at one point in time depends on data at other time-steps. For example, natural language processing solutions frequently utilize recurrent DL models to achieve state-of-the-art results in classification tasks. However, recurrent models are often plagued by training difficulty that grows with model depth, and these issues are exacerbated by the desire to create very deep models for particularly difficult tasks. Applying the ResNet concept developed by Microsoft Research to a recurrent model, we show how models analyzing long sequences can achieve state-of-the-art results with fewer parameters and faster training times.

50-minute Talk Leon DeFrance - VP, Security R&D, US Bank
Ivko Cvejic - Data Scientist, US Bank
Favorite
S8739 - Machine Learning with StarCraft II

We'll present an overview of the StarCraft II machine learning environment, including some basic API examples using C++ and Python.

50-minute Talk Chris Lee - Lead Software Engineer, Blizzard
Timo Ewalds - London, DeepMind
Favorite
S8811 - An Agile Approach to Building a GPU-enabled and Performance-portable Global Cloud-resolving Atmospheric Model We'll give a high-level overview of our efforts to build a GPU-enabled, performance-portable global cloud-resolving atmospheric model, and describe how we built a cross-organizational partnership to achieve these results. Ours is a directive-based approach using OpenMP and OpenACC to achieve portability. We have focused on achieving good performance on the three main architectural branches available to us, namely: traditional multi-core processors (e.g., Intel Xeon), many-core processors such as the Intel Xeon Phi, and, of course, NVIDIA GPUs. Our focus has been on creating tools for accelerating the optimization process, techniques for effective cross-platform optimization, and methodologies for characterizing and understanding performance. The results are encouraging, suggesting a path forward based on standard directives for responding to the pressures of future architectures. 25-minute Talk Richard Loft - Director, Technology Development, National Center for Atmospheric Research
Favorite
S8903 - Dense Connection Networks for Conversational Speech Recognition Densely connected neural networks were originally introduced to avoid the problem of layer-wise vanishing gradients when CNNs are stacked in a very deep fashion, specifically for image recognition tasks. Inspired by these works, we've explored the use of dense network connections within LSTM models for the task of automatic speech recognition. By introducing additional connections to connect (almost) every layer to at least one other layer, we mitigate the vanishing gradient effect between LSTM layers and enable error signals to propagate back to the very first layer during training. In this presentation, we'll present the fundamentals of speech recognition and introduce different neural network model structures that have been shown to be effective for this task. We'll then introduce identity, highway, and dense connections and demonstrate how they improve the performance of these models. We'll evaluate the performance of these models across different datasets, and show that with a lattice-based system combination, densely connected LSTMs contributed significantly to reaching word error rates (WER) of 5.0% and 9.1% on the Switchboard and CallHome test sets. 50-minute Talk Ian Lane - Associate Research Professor, Carnegie Mellon University
Kyu Han - Principal Machine Learning Scientist, Capio Inc.
Favorite
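The dense connectivity pattern described above can be made concrete with a small sketch. This is an illustrative stand-in, not the speakers' implementation: each layer's input is the concatenation of the original input and all earlier layers' outputs, which is what gives gradients a short path back to the first layer. The widths below are hypothetical.

```python
def dense_stack(input_width: int, layer_widths: list[int]) -> list[int]:
    """Return the input width seen by each layer under dense connectivity.

    Each layer consumes the concatenation of the network input and the
    outputs of every preceding layer, so input widths grow layer by layer.
    """
    seen = [input_width]   # feature blocks available so far (input included)
    input_sizes = []
    for width in layer_widths:
        input_sizes.append(sum(seen))  # concatenate everything seen so far
        seen.append(width)             # this layer's output joins the pool
    return input_sizes

# Four layers, each emitting 64 features, on a 128-wide input:
print(dense_stack(128, [64, 64, 64, 64]))  # [128, 192, 256, 320]
```

The growing input widths show why dense connections add parameters per layer; the trade-off the talk explores is that the shortened gradient paths make deep recurrent stacks trainable.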
S8910 - ORNL Summit: GPU Acceleration of Multiphysics CFD Software for Propulsion and Power Flow Systems Simulation and analysis of flow and combustion processes in propulsion and power systems presents many new and interesting challenges. A multitude of strongly coupled fluid dynamic, thermodynamic, transport, chemical, multiphase, and heat transfer processes must be considered simultaneously in the complex domains associated with devices such as gas-turbine and rocket engines. The problem is compounded by the effects of turbulence and high-pressure phenomena, which require treatment of nonideal fluid mixtures at supercritical conditions. The combination of complex multicomponent property evaluations and the computational grid resolution requirements makes these simulations expensive and cumbersome. Recent advances in high performance computing (HPC) systems, such as graphics processing unit (GPU) based architectures, provide an opportunity for significant advances in dealing with these complexities while reducing the time to solution. 25-minute Talk Joseph C. Oefelein - Professor in the Daniel Guggenheim School of Aerospace Engineering, Georgia Institute of Technology
Favorite
S8918 - High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs We'll present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks. Conditional GANs have enabled a variety of applications, but the results are often limited to low-res and still far from realistic. We'll show that we're capable of generating 2048x1024 visually appealing results with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual manipulation with two additional features. First, we incorporate object instance segmentation information, which enables object manipulations such as removing/adding objects and changing the object category. Second, we propose a method to generate diverse results given the same input, allowing users to edit the object appearance interactively. Human opinion studies demonstrate that our method significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing. 25-minute Talk Ting-Chun Wang - Research Scientist, NVIDIA
Favorite
S8968 - Autoregressive Wavenet Inference on Volta GPUs

Autoregressive wavenets have demonstrated extremely high-quality real-time speech synthesis results. However, their compute requirements and tight latency bounds have made them impractical for deployment on traditional CPU-only systems. In this talk, we demonstrate that Volta GPUs provide excellent real-time inference performance on these networks, making practical deployments possible. We discuss several alternative implementation techniques and demonstrate their achieved performance on a V100 GPU.

25-minute Talk Brian Pharris - Principal Architect, NVIDIA
Favorite
S8974 - Architecting a Complete Data Infrastructure for AI and Deep Learning (Presented by NetApp)

Enterprises are eager to take advantage of artificial intelligence technologies such as deep learning to introduce new services and enhance insights from company data. As data science teams move past proof of concept and begin to operationalize deep learning, it becomes necessary to focus on the creation of a complete data architecture that eliminates bottlenecks to facilitate faster model iteration. Designing a data architecture involves thinking holistically about the deep learning pipeline, from data ingest and edge analytics, to data prep and training in the core data center, to archiving in the cloud. It is necessary to understand the performance requirements and data services needed, but one should also consider future extensibility and supportability as deep learning hardware and cloud approaches evolve over time. This session will examine all the factors involved in the architecture of a deep learning pipeline, focusing on data management and the hybrid cloud. Careful infrastructure planning can smooth the flow of data through your deep learning pipeline, lead to faster time to deployment, and thus maximize competitive differentiation.

50-minute Talk Santosh Rao - AI & Data Engineering, NetApp
Kesari Mishra - Principal Engineer, NetApp
Favorite
S8149 - Using Virtual Reality To Enhance The Quality Of Machine Learning Data

We'll discuss our use of virtual reality to enhance the quality of machine learning data in a real-time, collaborative environment. Radiant Solutions provides highly specialized, innovative geospatial multisource data, analytics, software, and services to deliver critical insights and intelligence where and when it matters. DigitalGlobe is the world's leading provider of high-resolution Earth imagery and adds over 3 million square kilometers of imagery to its library every day. At Radiant Solutions, we use DigitalGlobe imagery to create one of the largest satellite imagery machine learning datasets in the world. With great data comes great responsibility, and that's why we are innovating on new methods to control the quality of machine learning data. Powered by GPUs, we can view massive amounts of image data in an immersive VR experience. We believe that methods like these will help push the boundaries of what is possible for machine learning data.

25-minute Talk Kevin McGee - Production Lead, Radiant Solutions
Favorite
S8290 - The Big Picture: How to Build Display Walls Using NVIDIA APIs/Tools The need to drive multiple displays, be it for digital signage, a corporate conference room, or even an immersive VR room, is becoming more common. We'll provide an overview of the display management tools and APIs that are part of NVIDIA's DesignWorks SDK. You'll learn about NVIDIA MOSAIC; display setup and management using NVAPI + NVWMI; synchronization methods; and warp and blend APIs. 80 Minutes Tutorial William VanDyken - Senior Solution Architect, NVIDIA
Favorite
S8333 - Prediction of Heterodimeric Protein Complexes from Protein-Protein Interaction Networks Using Deep Learning

We'll present how to apply deep learning to predict small-sized protein complexes using multiple sources of biological information and a hybrid deep learning model. We'll describe the background of the problem, what kinds of biological information are useful for accurately predicting small-sized protein complexes, and how to improve prediction accuracy by using hybrid deep learning models for different information, and we'll compare the performance of multiple deep learning models on this problem.

25-minute Talk Peiying Ruan - Deep Learning Solution Architect, NVIDIA
Favorite
S8386 - Identifying New Therapeutics for Parkinson's Disease Using Virtual Neurons on an Azure Hosted GPU Cluster

Learn how to apply recent advances in GPU computing and open data to unravel the mysteries of biology and the etiology of disease. Our team has built data-driven simulated neurons using CUDA and open data, and is using this platform to identify new therapeutics for Parkinson's disease with funding from the Michael J. Fox Foundation. In this session, I'll discuss the open data that enables our approach, and how we use NVIDIA Tesla cards on Microsoft Azure to dynamically scale to more than 100,000 GPU cores while managing technology costs.

25-minute Talk Andy Lee - CTO, Neuroinitiative
Favorite
S8406 - Model Architectures and Training Techniques for High-Precision Landmark Localization

We'll discuss training techniques and deep learning architectures for high-precision landmark localization. In the first part of the session, we'll talk about ReCombinator Networks, which maintain pixel-level image information for high-accuracy landmark localization. This model combines coarse-to-fine features to first observe global (coarse) image information and then recombine local (fine) information. Using this model, we report state-of-the-art results on three facial landmark datasets. This model can be used for other tasks that require pixel-level accuracy (for example, image segmentation and image-to-image translation). In the second part, we'll talk about improving landmark localization in a semi-supervised setting, where less labeled data is provided. Specifically, we consider a scenario where few labeled landmarks are given during training, but many weaker labels (for example, face emotions or hand gestures) that are easier to obtain are provided. We'll describe training techniques and model architectures that can leverage weaker labels to improve landmark localization.

25-minute Talk Pavlo Molchanov - Sr. Research Scientist, NVIDIA
Sina Honari - Ph.D. Student, University of Montreal - MILA
Favorite
S8495 - Deploying Deep Neural Networks as a Service Using TensorRT and NVIDIA-Docker Learn how you can utilize TensorRT and NVIDIA Docker to quickly configure and deploy a GPU-accelerated inference server and start gaining insights from your trained deep neural network (DNN) models. TensorRT is a high-performance tool for low-latency, high-throughput DNN inference. The latest release of TensorRT introduces a novel, framework-agnostic network definition format called universal framework format, which allows TensorRT to support and optimize DNN models trained in multiple deep learning frameworks. We'll leverage the TensorRT Python API to create a lightweight Python Flask application capable of serving multiple DNN models trained using TensorFlow, PyTorch, and Caffe, and also discuss how to containerize this inference service using NVIDIA Docker for ease of deployment at scale. This session will consist of a lecture, live demos, and detailed instructions. 80 Minutes Tutorial Alec Gunny - Solutions Architect, NVIDIA
Prethvi Kashinkunti - Solutions Architect, NVIDIA
Favorite
S8812 - An Approach to Developing MPAS on GPUs MPAS-A is a general circulation (global) model of the Earth's atmosphere that is designed to work down to so-called non-hydrostatic scales where convective (vertical) cloud processes are resolved. To date, MPAS-A has been used primarily for meteorological research applications, although climate applications in the community earth system model are being contemplated. At a high level, MPAS-A consists of a dynamics part, a fluid flow solver that integrates the non-hydrostatic time dependent nonlinear partial differential equations of the atmosphere, and a physics part, which computes the forcings of these equations due to radiative transport, cloud physics, and surface and near surface processes. The dynamics is in turn divided into the dry dynamics and moist dynamics parts. Algorithmically, the dynamics uses a finite volume method on an unstructured centroidal Voronoi mesh (grid, or tessellation) with a C-grid staggering of the state variables as the basis for the horizontal discretization. 25-minute Talk Raghu Raj Prasanna Kumar - Project Scientist I & Group Head, Special Technical Project Group, Tec, National Center for Atmospheric Research
Favorite
S8850 - Autotuning Dense Batched QR Factorizations on GPU

The increasing complexity and heterogeneity of computer architectures makes it challenging to design codes that are both efficient and portable. Indeed, a generic GPU kernel that attempts to fit all GPU architectures will not be efficient on any given architecture. Moreover, a kernel carefully customized for a specific GPU will hardly be efficient on the next generation of GPUs. Furthermore, writing tailored kernels for every GPU is a daunting task that would require too much time and effort. We'll present our work applying the autotuning idea to this problem for batched QR factorization kernels on GPUs, automatically generating code specific to a given GPU.

25-minute Talk Wissam M. Sid-Lakhdar - PostDoctoral Researcher, Lawrence Berkeley National Laboratory
Favorite
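The autotuning loop described above can be sketched in a few lines. This is a hedged illustration, not the authors' code generator: `batched_qr_stub` is a hypothetical stand-in for a generated kernel variant, and the tile/thread grid is invented for the example. The essential idea is simply to time each candidate configuration and keep the fastest.

```python
import time

def batched_qr_stub(tile: int, threads: int, n: int = 64) -> None:
    """Placeholder workload standing in for one generated kernel variant.

    Its cost depends on the launch configuration, which is all the
    autotuner needs in order to have something to compare.
    """
    _ = [[i * j for j in range(tile)] for i in range(n * 256 // threads)]

def autotune(candidates):
    """Time every candidate (tile, threads) pair and return the fastest."""
    best, best_time = None, float("inf")
    for tile, threads in candidates:
        start = time.perf_counter()
        batched_qr_stub(tile, threads)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = (tile, threads), elapsed
    return best

# Exhaustive search over a small hypothetical configuration grid.
grid = [(t, th) for t in (8, 16, 32) for th in (64, 128, 256)]
print(autotune(grid))  # winning configuration for this machine
```

Real autotuners replace the exhaustive sweep with pruned or model-guided search, since the configuration space for GPU kernels is far larger than nine points.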
S8867 - Attention GAN for Fine-Grained Language-to-Image Generation We have long envisioned that machines one day can perform human-like perception, reasoning, and expression across multiple modalities including vision and language, which will augment and transform the ways humans communicate with each other and with the real world. With this vision, we'll introduce the latest work of developing a deep attention GAN for fine-grained language-to-image synthesis. We'll discuss the open problems behind the task that we're thrilled to solve, including image and language understanding, joint reasoning across both modalities, and expressing abstract concepts into full imagination, which are of fundamental importance to reaching general intelligence. 25-minute Talk Pengchuan Zhang - Researcher, Microsoft Research
Favorite
S8926 - ORNL Summit: Accelerated Simulations of Stellar Explosions with FLASH: Towards Exascale Capability Multiphysics and multiscale simulations are found in a variety of computational science subfields, but their disparate computational characteristics can make GPU implementations complex and often difficult. Simulations of supernovae are ideal examples of this complexity. We use the scalable FLASH code to model these astrophysical cataclysms, incorporating hydrodynamics, thermonuclear kinetics, and self-gravity across considerable spans in space and time. Using OpenACC and GPU-enabled libraries coupled to new NVIDIA GPU hardware capabilities, we have improved the physical fidelity of these simulations by increasing the number of evolved nuclear species by more than an order of magnitude. I will discuss these and other performance improvements to the FLASH code on the Summit supercomputer at Oak Ridge National Laboratory. 25-minute Talk Austin Harris - Distinguished Postdoctoral Research Associate, Oak Ridge National Laboratory
Favorite
CE8105 - Connect with the Experts: Ray Tracing with the NVIDIA OptiX SDK

Discuss with experts how to apply the NVIDIA OptiX GPU ray casting SDK to solve graphics visualization tasks and other algorithms requiring ray tracing. Ask questions about ways to use the OptiX API to implement fast and flexible ray tracing applications.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts Detlef Roettger - Senior Developer Technology Engineer, NVIDIA
Favorite
CE8131 - Connect with the Experts: Performance Analysis and Optimization

Come ask your GPU code optimization questions to experts in the field.

Connect with the Experts sessions are informal gatherings where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject.

1 Hour Connect with the Experts Mathias Wagner - Developer Technology Engineer, NVIDIA
Peng Wang, NVIDIA
Lei Wu, NVIDIA
Vinay Deshpande - Compute DevTech Engineer, NVIDIA
Favorite
CE8145 - Connect with the Experts: Deep Learning On Windows Workstation

We'll cover what you need to know to get up and running with the major frameworks and lower-level tools on the Windows platform. We are here to assist you with building frameworks for Windows, using tools such as CUDA and cuDNN on Windows, and ensuring that you get the best out of newer hardware features.

Connect with the Experts sessions are informal gatherings where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject.

1 Hour Connect with the Experts Salil Tambe - Computer Vision Engineer, Adobe Systems
Christopher Hebert - Developer Technology Engineer, NVIDIA
Donald Brittain - Principal Engineer, NVIDIA
Yury Uralsky, NVIDIA
Favorite
L8113 - Detection of Anomalies in Financial Transactions using Deep Autoencoder Networks The "unsupervised" and "end-to-end" detection of anomalies in transactional data is one of the long-standing challenges in financial statement audits and fraud investigations. This lab will walk you through a use case showing how autoencoder neural networks can be trained to detect such anomalies by learning a compressed but "lossy" model of regular transactions. In detail, we will (1) introduce the basic concepts, intuition, and major building blocks of autoencoder neural networks; (2) learn how to preprocess financial data in order to learn a model of its characteristics; (3) design, implement, and train a deep autoencoder network using PyTorch to detect anomalies in large-scale financial data; and (4) interpret and evaluate the network's detection results as well as its reconstruction loss. 120 Minutes Instructor-Led Lab Timur Sattarov - Forensic Data Analyst, PricewaterhouseCoopers GmbH WPG
Marco Schreyer - Researcher, German Research Center for Artificial Intelligence
Favorite
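The detection step the lab builds up to can be previewed with a minimal sketch. This is not the lab's PyTorch model: here a rank-1 linear projection fitted on synthetic "regular" transaction features stands in for a trained autoencoder, and all data and the 99th-percentile threshold are invented for illustration. The principle is the same: entries the compressed model reconstructs poorly are flagged as anomalies.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "regular" transactions: 500 entries, 8 features, one dominant axis.
regular = rng.normal(0, 1, size=(500, 8)) @ np.diag([3, 1, 1, 1, 1, 1, 1, 1])

# Stand-in "autoencoder": encode/decode through a 1-dimensional bottleneck
# spanned by the top principal component of the regular data.
mean = regular.mean(axis=0)
_, _, vt = np.linalg.svd(regular - mean, full_matrices=False)
basis = vt[:1]

def reconstruction_error(x: np.ndarray) -> np.ndarray:
    """Squared reconstruction error per row (high error => anomalous)."""
    z = (x - mean) @ basis.T        # encode into the bottleneck
    x_hat = z @ basis + mean        # decode back to feature space
    return ((x - x_hat) ** 2).sum(axis=1)

# Threshold chosen so ~1% of regular entries would be flagged.
threshold = np.percentile(reconstruction_error(regular), 99)

anomaly = np.full(8, 10.0)          # an entry unlike anything seen in training
print(reconstruction_error(anomaly[None])[0] > threshold)  # True
```

A deep autoencoder replaces the linear projection with stacked nonlinear layers, but the scoring logic (compress, reconstruct, threshold the error) is unchanged.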
L8118 - VR Development in Unity You'll learn the fundamentals of working with the Unity engine for developing Virtual Reality experiences. We'll cover general workflows, Unity C# scripting, the graphics pipeline, animation, VR optimization and more. 120 Minutes Instructor-Led Lab Daniel Miller - VR/AR Evangelist, Unity Technologies
Favorite
L8140 - Image Classification with DIGITS

Deep learning enables entirely new solutions by replacing hand-coded instructions with models learned from examples. Train a deep neural network to recognize handwritten digits by:

• Loading image data to a training environment

• Choosing and training a network

• Testing with new data and iterating to improve performance

Upon completion of this lab, you'll be able to assess what data you should be using for training.

Presented by the NVIDIA Deep Learning Institute (DLI).

120 Minutes Instructor-Led Lab Mike Mendelson - Deep Learning Institute Curriculum Developer, NVIDIA
Favorite
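The load/train/test loop that the L8140 lab performs in DIGITS can be sketched in miniature. This is an illustrative stand-in (synthetic two-class data and a logistic-regression "network", my own choices, not DLI material):

```python
import numpy as np

# The lab's workflow in miniature: load data, choose and train a model,
# then test on held-out data and iterate.
rng = np.random.default_rng(0)

# "Load image data": two well-separated Gaussian classes stand in for digits.
X_train = np.vstack([rng.normal(-2, 1, size=(100, 2)),
                     rng.normal(2, 1, size=(100, 2))])
y_train = np.array([0] * 100 + [1] * 100)
X_test = np.vstack([rng.normal(-2, 1, size=(50, 2)),
                    rng.normal(2, 1, size=(50, 2))])
y_test = np.array([0] * 50 + [1] * 50)

# "Choose and train a network": logistic regression by gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(200):
    p = 1 / (1 + np.exp(-(X_train @ w + b)))       # predicted probabilities
    w -= 0.1 * X_train.T @ (p - y_train) / len(y_train)
    b -= 0.1 * (p - y_train).mean()

# "Test with new data": measure accuracy on the held-out set, then iterate.
pred = (X_test @ w + b) > 0
accuracy = (pred == y_test).mean()
```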
L8153 - Modeling Time Series Data with Recurrent Neural Networks in Keras

Prerequisites: Some experience training CNNs

Duration: 2 hours

Framework: Keras

Recurrent Neural Networks (RNNs) allow models to classify or forecast time-series data, like natural language, markets, and, in the case of this lab, a patient's health over time. You'll:

• Create training and testing datasets using electronic health records in HDF5 (hierarchical data format version five)

• Prepare datasets for use with recurrent neural networks, which allows modeling of very complex data sequences

• Construct a Long Short-Term Memory (LSTM) model, a specific RNN architecture, using the Keras library running on top of Theano, and evaluate model performance against baseline data

Upon completion, you'll be able to model time-series data using Recurrent Neural Networks.

Presented by the NVIDIA Deep Learning Institute (DLI).

120 Minutes Instructor-Led Lab Cameron Carlin - Data Scientist, Children's Hospital Los Angeles, Virtual Pediatric ICU
Steven Steinke - Curriculum Developer, NVIDIA
David Ledbetter - Senior Data Scientist, Children's Hospital Los Angeles Virtual Pediatric ICU
Favorite
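For readers new to the L8153 lab's building block: the gating arithmetic inside one LSTM cell can be written out in a few lines of NumPy. This is the generic textbook formulation, not the lab's Keras code:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step (a NumPy sketch of the cell the lab builds in Keras).

    x: input (d,), h: hidden state (n,), c: cell state (n,).
    W: (4n, d) input weights, U: (4n, n) recurrent weights, b: (4n,) bias,
    stacked in the order [input gate, forget gate, candidate, output gate].
    """
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[0:n]))        # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))      # forget gate
    g = np.tanh(z[2*n:3*n])              # candidate cell update
    o = 1 / (1 + np.exp(-z[3*n:4*n]))    # output gate
    c_new = f * c + i * g                # cell state: gated memory
    h_new = o * np.tanh(c_new)           # hidden state: gated output
    return h_new, c_new

# Run a toy sequence (e.g. one patient's vitals over 20 time steps).
rng = np.random.default_rng(1)
d, n, T = 8, 16, 20
W = rng.normal(size=(4 * n, d)) * 0.1
U = rng.normal(size=(4 * n, n)) * 0.1
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for t in range(T):
    h, c = lstm_step(rng.normal(size=d), h, c, W, U, b)
```

In Keras this entire loop is a single `LSTM` layer; the sketch just makes the gates visible.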
L8174 - Building Autonomous Vehicles with DRIVE PX A Sensor Abstraction Layer (SAL) is required for software to interface with the hardware sensors. NVIDIA DriveWorks is a Software Development Kit (SDK) that contains reference applications, tools, and library modules, including SAL. This session will go through key modules included in DriveWorks. You'll learn how to: • Interface with sensors using the Sensor Abstraction Layer on DRIVE PX 2 • Use the tools and modules to do perception on the vehicle • Integrate DriveWorks modules into custom code or applications Upon completion, you'll be able to use DriveWorks modules to create a perception pipeline or integrate them into your own solution. Prerequisites: Fundamentals or equivalent background/experience 120 Minutes Instructor-Led Lab Aaraadhya Narra - Solutions Architect, DLI Certified Instructor, NVIDIA
Favorite
S81015 - Learn How IBM Visual Insights from Watson IoT Uses Deep Learning to Help Manufacturers "See" Defects Instantly (Presented by IBM)

Learn how the power of AI-powered image recognition along with NVIDIA GPUs helps clients detect and classify production-line defects more quickly, accurately, and reliably. See customer use cases of how IBM's real-time visual inspection solution has helped manufacturing companies successfully transform their quality management processes by reducing inspection time and costs, improving efficiency and reliability, reducing scrap, and increasing manufacturing yield.

50-minute Talk Jayashree Ravichandran - Senior Offering Manager Visual Insights, Acoustic Insights, Quality Analytics, IoT for Manufacturing, IBM
Favorite
S81029 - Chopout: A Simple Way to Train Various Sized Neural Networks at Once

Variable-sized networks are hard to train, since each change in a layer's size requires re-learning its parameter values. We present a novel operator, "Chopout", that simultaneously learns networks of multiple sizes and can perform inference at any desired network size.

Several previous approaches design deeper architectures to improve the accuracy of deep neural nets; however, they are not efficient in cost or inference speed, and selecting a smaller architecture from these designs requires re-learning the network. The Chopout operator learns random subnetworks as well as the full network, providing flexibility in network size at inference time. The method can be easily integrated into any neural network architecture without additional parameters, and its effectiveness is evaluated through experiments.

50-minute Talk Takanori Ogata - Co-founder & Chief Research Officer, ABEJA, Inc.
Favorite
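Based only on the abstract above, the core Chopout operation appears to be a random truncation of a layer's units during training, so that prefixes of the layer form valid subnetworks. A speculative NumPy sketch (my reading of the abstract, not the authors' implementation):

```python
import numpy as np

def chopout(activations, rng, w=None):
    """Zero all units beyond a cut-off width w (drawn at random if unset).

    During training a random width is sampled each pass, so every prefix of
    the layer is trained to work on its own; at inference you simply keep
    the first w units your deployment budget allows.
    """
    n = activations.shape[-1]
    if w is None:                      # training: sample a width each pass
        w = rng.integers(1, n + 1)
    out = activations.copy()
    out[..., w:] = 0.0
    return out

rng = np.random.default_rng(0)
a = np.ones((4, 10))                   # batch of 4, layer width 10
trained = chopout(a, rng)              # random-width pass during training
small = chopout(a, rng, w=3)           # inference with a width-3 subnetwork
```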
S8197 - Recurrent Generative Adversarial Neural Networks for Compressive Imaging We'll present recurrent generative adversarial networks (GANs) for image recovery from compressed measurements, which has applications ranging from undersampled medical image reconstruction to image super-resolution. State-of-the-art methods are not aware of image perceptual quality and demand iterative algorithms that incur significant computational overhead for real-time tasks. To sidestep these hurdles, we introduce a novel compressive imaging framework using deep neural networks that approximates a low-dimensional manifold of images with GANs. To ensure the images are consistent with the measurements, a recurrent GAN architecture is deployed that consists of multiple alternating blocks of generator networks and affine projection, followed by a discriminator network that scores the perceptual quality of the generated images. A deep residual network with skip connections is used for the generator, while the discriminator is a multilayer perceptron. Experiments performed with real-world contrast-enhanced MRI data corroborate the superior diagnostic quality and faster reconstruction of the retrieved images relative to state-of-the-art schemes. 50-minute Talk Morteza Mardani - Postdoctoral Research Fellow, Stanford University
Favorite
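One concrete piece of the S8197 pipeline, the affine projection that keeps generated images consistent with the compressed measurements y = Ax, can be sketched directly. Names and dimensions below are illustrative, not the authors':

```python
import numpy as np

def project_onto_measurements(x, A, y):
    """Project x onto the affine set {x : A x = y} (minimum-norm correction).

    This is the measurement-consistency step the talk interleaves with
    generator blocks: the least-squares solve removes exactly the component
    of the residual A x - y that the measurements can see.
    """
    correction, *_ = np.linalg.lstsq(A, A @ x - y, rcond=None)
    return x - correction

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 100))       # 30 compressed measurements of a 100-pixel image
x_true = rng.normal(size=100)
y = A @ x_true                       # observed measurements
x_gen = rng.normal(size=100)         # imperfect generator output
x_proj = project_onto_measurements(x_gen, A, y)
```

After the projection, `A @ x_proj` matches `y` exactly (up to floating point), while `x_proj` stays as close as possible to the generator's proposal.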
S8238 - Teaching Machines to See, Communicate, and Act

A successful autonomous system needs to not only understand the visual world but also communicate its understanding with humans. To make this possible, language can serve as a natural link between high level semantic concepts and low level visual perception. We'll discuss recent work in the domain of vision and language, covering topics such as image/video captioning and retrieval, and question-answering. We'll also talk about our recent work on task execution via language instructions.

25-minute Talk Sanja Fidler - Assistant Professor, University of Toronto
Favorite
S8241 - Sunny Skies Ahead! Versioning GPU-Accelerated WRF to 3.7.1 We'll detail the inherent challenges in porting a GPU-accelerated community code to a newer major version, integrating the community's non-GPU changes with OpenACC directives from the earlier version. This is a non-trivial exercise: this particular version upgrade contained 143,000 modified lines of code that required reintegration into our accelerator directives. This work is important in providing support for newer features whilst still providing GPU support for the users. We'll also look at efforts to improve the maintainability of GPU-accelerated community codes. 25-minute Talk Stanley Posey - Program Manager, ESM and CFD Solution Development, NVIDIA
Jeffrey Adie - Principal Solutions Architect, NVIDIA
Favorite
S8281 - Instance-Aware Image and Sentence Matching with Selective Multimodal LSTM We'll present a unique framework for cross-modal image and sentence matching, namely selective multimodal long short-term memory (LSTM), which incorporates a new deep learning module, a multimodal context-modulated attention network, to selectively attend to pairwise semantic concepts. In detail, effective image and sentence matching depends on measuring their global visual-semantic similarity. Based on the observation that such a global similarity arises from a complex aggregation of multiple local similarities between pairwise instances of the image (objects) and sentence (words), we propose a selective multimodal LSTM network (sm-LSTM) for instance-aware image and sentence matching. The sm-LSTM includes a multimodal context-modulated attention scheme at each timestep that can selectively attend to a pair of instances of the image and sentence by predicting pairwise instance-aware saliency maps. For the selected pairwise instances, their representations are obtained from the predicted saliency maps and then compared to measure their local similarity. By similarly measuring multiple local similarities over a few timesteps, the sm-LSTM sequentially aggregates them to obtain the global visual-semantic similarity. 25-minute Talk Yan Huang - Assistant Professor, Institute of Automation, Chinese Academy of Sciences
Favorite
S8424 - Graph Partitioning Using Bayesian Inference on GPU We implement an efficient CUDA algorithm that solves the graph clustering problem using the stochastic block model for the first time on GPUs. The algorithm views the graph as generated by a degree-corrected stochastic block model and performs statistical inference to discover the partition most likely to have generated the graph. A greedy agglomerative heuristic is used with Markov Chain Monte Carlo (MCMC) to perform Bayesian inference. A comparison is made with the baseline GraphChallenge implementation on synthetic datasets. Our implementation achieves speed-ups of 11.5x and 4.1x over single-threaded and multi-threaded OpenMP implementations on the CPU, respectively. We'll provide empirical evidence that even though our method of parallelizing MCMC leads to worse convergence in terms of iteration count, we are able to harness the parallelism of the GPU to discover clusters at the same accuracy in less time. 25-minute Talk Carl Yang - Graduate Student, UC Davis
Favorite
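To make the S8424 approach concrete, here is a toy version of a greedy single-node-move heuristic over block assignments. It uses a simple within-block-edges score as a stand-in for the degree-corrected SBM likelihood and omits the MCMC refinement the talk describes:

```python
from itertools import combinations
from collections import Counter

def partition_score(edges, labels):
    """Within-block edge count minus a block-size penalty (a toy stand-in
    for the degree-corrected SBM likelihood used in the talk)."""
    within = sum(labels[u] == labels[v] for u, v in edges)
    penalty = sum(c * (c - 1) / 2 for c in Counter(labels).values())
    return within - 0.5 * penalty

def greedy_partition(n, edges):
    labels = list(range(n))                      # start: every node in its own block
    improved = True
    while improved:
        improved = False
        for u in range(n):
            base = partition_score(edges, labels)
            for b in sorted(set(labels)):        # try moving u into each block
                old, labels[u] = labels[u], b
                if partition_score(edges, labels) > base:
                    improved = True
                    break                        # keep the first improving move
                labels[u] = old
    return labels

# Two disconnected 5-cliques; the greedy sweeps recover one block per clique.
edges = [e for grp in ([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])
         for e in combinations(grp, 2)]
labels = greedy_partition(10, edges)
```

The GPU version replaces this sequential sweep with massively parallel MCMC proposals over the SBM posterior, but the move being evaluated (reassign one node to another block and score the change) is the same.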
S8457 - Deep Learning in Medical Imaging: Learning from Regions of Interest through Segmentation

Attendees will learn about some of the key opportunities for deep learning in medical imaging, current challenges, and exciting recent developments that are tackling them. We will provide an overview of medical imaging and key applications of deep learning for improving image interpretation. Unlike natural images (e.g., ImageNet), medical images often have regions of interest that account for less than 0.1% of the image by pixel count. By using a pixel-wise loss, segmentation networks can learn subtle local signals. We will survey projects where segmentation has been successful, including learning from a dataset of fewer than 30 images without transfer learning. Using coarse ground-truth labels allows for easier and more scalable approaches to data acquisition. We will also discuss the ability to reuse representations learned by segmentation networks in related tasks such as classification, and the role of segmentation in the broader field of medical image informatics and deep learning.

50-minute Talk Daniel Rubin - Associate Professor, Stanford University
Darvin Yi - Graduate Student, Stanford University
Favorite
S8469 - Compression-Aware Training of Neural Networks

We'll demonstrate the importance of accounting for compression during deep neural network training. We introduce a regularizer in the training loss to encourage the parameter matrix of each layer to have low rank. In essence, and by contrast with methods that aim to learn uncorrelated units to prevent overfitting, our new approach seeks to learn correlated ones, which can then easily be pruned in a second phase. In addition, we analyze the case where this regularizer is combined with a sparsity-inducing regularizer to achieve even higher compression. The proposed compression-aware training scheme yields networks that are well adapted to the subsequent post-processing stage. As a result, our approach achieves high compression rates at virtually no loss in prediction accuracy. On ICDAR, the algorithm achieves a compression rate of 95.5% and a reduction in training time of up to 70%, with a 1% increase in top-1 accuracy. When used on ImageNet for ResNet-50, our approach yields compression rates of up to 35% with no significant drop in performance, compared to the 4% compression rate achieved by state-of-the-art post-processing methods.

50-minute Talk Jose Alvarez - Senior Research Scientist, Toyota Research Institute
Favorite
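The low-rank idea in S8469 can be illustrated with plain linear algebra: penalize each layer's nuclear norm during training, then compress by truncating small singular values. A minimal sketch (the regularizer term and the pruning step only, not the authors' training scheme):

```python
import numpy as np

def nuclear_norm(W):
    """Sum of singular values: the standard convex surrogate for rank,
    added to the training loss to push each layer toward low rank."""
    return np.linalg.svd(W, compute_uv=False).sum()

def truncate_rank(W, k):
    """Post-training compression: keep only the top-k singular directions,
    replacing an m-by-n matrix with k*(m + n) parameters."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 3)) @ rng.normal(size=(3, 64))  # a truly rank-3 layer
reg = nuclear_norm(W)                 # term added to the training loss
W_c = truncate_rank(W, 3)             # rank-3 copy: 2 * 64 * 3 params vs 64 * 64
```

When training has driven a layer close to low rank, the truncation step loses almost nothing, which is what makes the compression "virtually free" in accuracy.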
S8539 - Pooling and Orchestrating NVIDIA Jetson for AI and Deep Learning on the Edge Attendees will learn how NVIDIA's Jetson TX-series processors can be scaled out to create an adaptive supercomputing platform for bespoke deployments and edge computing environments. Advancements in composable infrastructure technology now make it possible to pool and orchestrate Jetson processors for deployments with specialized parallel computing requirements. Use cases include Jetson deployments in non-embedded environments for edge computing where traditional HPC architectures are not hospitable. Clusters of NVIDIA Jetson TX devices can be deployed in edge compute environments connected to arrays of sensors for neural net training, pattern recognition, and deep learning. Applications for autonomous transportation can also benefit from clustering massive numbers of Jetson TX devices to simulate fleets of vehicles to train machine learning algorithms in parallel. Jetson use cases can be expanded well beyond embedded applications when deployed with PCIe-based fabric composable infrastructure technology, permitting a 16x networking performance improvement over the embedded 1Gb Ethernet interface. 25-minute Talk Sumit Puri - CEO/Cofounder, Liqid Inc.
Favorite
S8543 - GPUs for Everyone: Why Optimize Windows 10 and Every Application with GRID With the switch to Windows 10, more applications are being developed with the assumption of a GPU being present. GPUs are in our desktops, laptops, tablets, and even in the mobile phones in our pockets. Why should VDI be any different? Come see how the University of Arkansas is giving everyone the fastest possible experience and opening doors to new ways of learning by serving up VDI desktops and applications with pervasive GPU access. When every app has GPU acceleration, the user experience is better than ever. 50-minute Talk Jon Kelley - Associate Director of Enterprise Innovation, University of Arkansas
Favorite
S8557 - Tricks, Tips, and Timings: The Data Movement Strategies You Need to Know Learn the latest strategies to efficiently move complicated data structures between GPUs and CPUs. We'll go beyond basic data movement, showing techniques that have been used in practice to port and optimize large-scale production applications. These include a look at the unique benefits of zero copy, how to set up a deep copy to avoid having to flatten data structures, and how this can be done in OpenMP 4. We'll cover both CUDA and directive approaches using examples written in modern Fortran and applicable in any language. 50-minute Talk David Appelhans - Research Staff Member, IBM
Favorite
S8569 - Flexible and Fast Machine Learning and Deep Learning with Alluxio With the exponentially growing deluge of data today, data lakes are pooling everywhere. So how can you harness them for critical insights, and is there an easy way to tap into the multitude of different storage systems they're stored in? Enter Alluxio, an agnostic and fast storage abstraction which, when paired with deep learning and GPU-accelerated analytics, yields a quick and easy way to harness the data. Join NVIDIA's Applied Solutions Engineering (ASE) team as they walk through how to use Alluxio for fun and profit. 25-minute Talk Yupeng Fu - Founding Member and Senior Architect, Alluxio
Michael Wendt - Manager, Applied Engineering Solutions, NVIDIA
Favorite
S8576 - Achieving Human Parity in Conversational Speech Recognition Using CNTK and a GPU Farm Microsoft's speech recognition research system has recently achieved a milestone by matching professional human transcribers in how accurately it transcribes natural conversations, as measured by government benchmark tasks. In this talk, we will discuss the significance of the result, give a high-level overview of the deep learning and other machine learning techniques used, and detail the software techniques used. A key enabling factor was the use of CNTK, the Microsoft Cognitive Toolkit, which allowed us to train hundreds of acoustic models during development using a farm of GPU servers and parallelized training. Model training was parallelized on GPU host machines using the 1-bit distributed stochastic gradient descent algorithm. LSTM acoustic and language model training takes advantage of CNTK's optimizations for recurrent models, such as operation fusion, dynamic unrolling, and automatic packing and padding of variable-length sequences. We also give an overview of CNTK's functional API. 50-minute Talk Frank Seide - Principal Researcher, Microsoft
Andreas Stolcke - Principal Researcher, Microsoft
Favorite
S8597 - A Map of Knowledge: Using Behavioral Data in Higher-Ed to Surface Novel Semantic Structure and Personalized Guidance Personalized learning has been a promising but often elusive ideal sought after in education. We'll demonstrate the progress made with two concrete examples of personalized learning supports implemented at scale in a massive open online course (MOOC) and on the UC Berkeley campus in a collaboration with the Office of the Registrar. Both approaches employ long short-term memory to leverage a collaborative signal out of millions of historic learner actions. In the case of the MOOC, the next page a learner is expected to spend considerable time on is predicted and offered as a real-time suggestion. At the university, we consider sequences of millions of historic enrollments over the past eight years. These sequences of course identifiers, when modeled with representation learning approaches most commonly applied to natural language, reveal a tremendous degree of semantic relational information about the courses which can be visualized, reasoned about, and surfaced to students. Our course information platform uses this automatically inferred semantic information to help students navigate the university's offerings and provides personalized course suggestions based on topic preference. 50-minute Talk Zachary Pardos - Professor, UC Berkeley
Favorite
S8679 - Trade and Manage Wealth with Deep Reinforcement Learning and Memory We'll present how deep reinforcement learning (DRL) and memory-extended networks can be used to train agents that optimize asset allocations or propose trading actions. The memory component is crucial for improved mini-batch parallelization and helps mitigate catastrophic forgetting. We also address how concepts from risk-sensitive and safe reinforcement learning apply to improve the robustness of the learned policies. The DRL approach has several advantages over the industry-standard approach, which is still based on mean-variance portfolio optimization. The most significant benefit is that the information bottleneck between the statistical return model and the portfolio optimizer is removed, and available market data and trade history are used much more efficiently. 50-minute Talk Daniel Egloff - Founder, Flink AI
Favorite
S8709 - Accelerating Molecular Modeling Tasks on Desktop and Pre-Exascale Supercomputers We'll showcase recent successes in the use of GPUs to accelerate challenging molecular simulation analysis tasks on the latest Volta-based Tesla V100 GPUs on both Intel and IBM/OpenPOWER hardware platforms, and with large scale runs on petascale computers such as ORNL Summit. We'll highlight the performance benefits obtained from die-stacked memory on Tesla V100, the NVLink interconnect on the IBM OpenPOWER platforms, and the use of advanced features of CUDA, Volta's new Tensor units, and just-in-time compilation to increase the performance of key analysis algorithms. We'll present results obtained with OpenACC parallel programming directives, current challenges, and future opportunities. Finally, we'll describe GPU-accelerated machine learning algorithms for tasks such as clustering of structures resulting from molecular dynamics simulations. 50-minute Talk John Stone - Senior Research Programmer, University of Illinois at Urbana Champaign
Favorite
S8732 - VACnet: Using Deep Learning to Combat Cheating in 'Counter-Strike: Global Offensive' We'll delve into the nuts and bolts of how Valve has utilized deep learning to combat cheating in "Counter-Strike: Global Offensive." We'll cover total system details, from the high-level server architecture to the low-level features fed into the AI. Deep learning has proven to be very effective at identifying cheating behavior without any client-side instrumentation, making it robust against malicious attack by cheaters and cheat vendors. By retraining regularly, the network continues to evolve, picking up new cheating behaviors within hours of their appearance. As a result of this approach, certain types of cheats have been reduced by a factor of 100. 50-minute Talk John McDonald - Programmer, Valve
Favorite
S8883 - Saccade-Driven Redirected Walking for Dynamic Room-Scale VR Redirected walking techniques can enhance the immersive capabilities of virtual reality (VR) navigation while maintaining visual-vestibular consistency. However, they are often limited by the size, shape, and content of the physical rooms. We propose a redirected walking system that can improve VR mobility in small physical rooms. Our key contributions include the use of eye-tracking to perform redirection during saccadic blindness, and a dynamic path planning algorithm to guide the redirection. Our use of saccadic suppression significantly improves the opportunity to apply rotation and translation gains, and our path planning algorithm runs in real time to avoid stationary as well as moving obstacles. Our system is applicable to large open virtual spaces and small physical rooms. 25-minute Talk Anjul Patney - Senior Research Scientist, NVIDIA
Qi Sun - Ph.D., Stony Brook University
Favorite
S8888 - Multi-GPU Parallel Processing with NVLink

Multi-GPU processing with the GP100 and NVLink will be discussed using a hypervelocity impact problem. Multi-GPU processing has always been possible via the PCIe interface, which means communication between GPUs goes through the CPU. The NVLink connection allows software to bypass this slower path and enables direct communication between GPUs to improve performance. An SPH (smoothed-particle hydrodynamics) solver, a particle-based method, is used to solve the hypervelocity problem. The SPH solver does all calculations on the GPU, so it is a perfect choice for comparing performance between the various GPUs, past and present. Results for single- and multiple-GPU simulations on the K20, K40, P6000, and GP100 are presented.

25-minute Talk Wayne Mindle - Director of Sales & Marketing, CertaSIM, LLC
Favorite
S8937 - ORNL Summit: GPU-Accelerated Performance of QMCPACK on Leadership-Class HPC Systems Using CUDA and cuBLAS QMCPACK is an open-source, massively parallel quantum Monte Carlo code enabling the accurate calculation of quantum many-body problems such as systems of atoms, molecules, and even solids. Here, we demonstrate the implementation of a rank-k matrix update scheme leading to increased compute density and performance improvements of up to 1.5x compared to the current rank-1 update at every step. We compare performance results on Oak Ridge's next supercomputer, Summit, as well as its development precursor, SummitDev, against the current machine, Titan. Based on detailed runtime traces, we illustrate how speed-ups were achieved and give an outlook on which future library features could be most beneficial to our application's performance. 25-minute Talk Andreas Tillack - Distinguished Postdoctoral Research Associate, Oak Ridge National Laboratory
Favorite
S81003 - Faster than Real-Time Computing in Tsunami Early Warning Systems

When used as predictive tools in natural disasters such as tsunamis, numerical models require extremely fast computations. Just a few years ago, real-time computing in Tsunami Early Warning Systems (TEWS) was unthinkable. Nevertheless, the EDANYA Group has revolutionized tsunami science paradigms. With the goal of saving lives in the framework of TEWS, our group has developed Tsunami-HySEA, a GPU-based numerical model aimed at producing numerical simulations of tsunami events faster than ever. Based on highly efficient, robust mathematical algorithms, together with the computational power of NVIDIA GPUs, Tsunami-HySEA is able to simulate a tsunami event in only a few minutes. Nowadays, one of the main challenges in tsunami science is producing accurate assessments of tsunami wave impact just a few minutes after the generating earthquake is triggered. Such timely prediction would save many lives in a tsunami scenario. When the response is needed in only a few minutes, the requirements are challenging: robustness, low dissipation, large domains, and an extremely fast response are difficult to combine in a single numerical tool.

25-minute Talk Jorge Macias - Associate Professor, EDANYA Group (University of Malaga)
Favorite
S8159 - SSD++: Boosting Performance of Single-Shot MultiBox Detection Using Convolution Autoencoders We'll showcase how you can apply a wealth of unlabeled image data to significantly improve the accuracy and speed of single-shot object detection techniques. Our approach, SSD++, advances the state of the art of single-shot multibox-based object detectors (such as SSD and YOLO) by employing a novel combination of convolution-deconvolution networks to learn robust feature maps, thus making use of unlabeled datasets, and a fresh approach that combines convolution and deconvolution features to produce generic as well as semantically rich feature maps. As a result, SSD++ drastically reduces the requirement for labeled datasets, works on low-end GPUs, identifies small as well as large objects with high fidelity, and speeds up inference by decreasing the number of required default boxes. SSD++ achieves state-of-the-art results on the PASCAL VOC and MS COCO datasets. Through an ablation study, we'll explain the effectiveness of the different components of our architecture that help us achieve improved accuracy on these datasets. We'll further show a case study of SSD++ used to identify shoppable objects in the fashion, home decor, and food industries from images in the wild. 25-minute Talk Vijay Gabale - Co-founder and CTO, Huew
Favorite
S8270 - Acceleration of a LLNL Production Fortran Application on SIERRA Supercomputer The U.S. Department of Energy's (DOE) stockpile stewardship mission relies heavily on petascale simulations that have traditionally run on homogeneous architecture supercomputers. The DOE and Lawrence Livermore National Lab's newest computer, SIERRA, which is scheduled to be the second most powerful supercomputer in the nation, is being installed and employs a heterogeneous architecture leveraging both IBM Power9 CPUs and NVIDIA Volta GPUs. This talk presents performance results for Teton, a mission-critical radiative transport application, as it is re-engineered to leverage heterogeneous computing platforms. The data structure and algorithm optimizations necessary to increase thread level parallelism 1,000 times and achieve GPU, CPU, and network concurrency will also be discussed. 25-minute Talk Aaron Black - Computer Scientist, Lawrence Livermore National Laboratory
Favorite
S8384 - Datasets and Algorithms for Road Identification Via Satellite Imagery Road identification and route prediction in near real time remains a challenging problem for many geographic regions, particularly in the case of natural disasters or crisis situations. Existing methods such as manual road labeling or aggregation of mobile GPS track data are currently insufficient in dynamic scenarios. The frequent revisits of satellite imaging constellations may accelerate efforts to rapidly update road network and optimal path prediction, provided routing information can be extracted from imaging pixels. We'll demonstrate deep learning segmentation methods for identifying road center lines and intersections from satellite imagery, and inferring networks from these road segments. We'll also explore data quality requirements by comparing open source labels with-high precision labels created as part of the SpaceNet Roads challenge. 25-minute Talk Adam Van Etten - Senior Research Scientist, In-Q-Tel
Favorite
S8438 - Disrupting 3D Design - GPU Based Real-Time Simulation for Rapid Concepting Join us for an exciting presentation that will unveil the latest use of GPU technology to aid real-time engineering simulation. You will see a new technology, called ANSYS Discovery Live, that provides instant, invaluable feedback, enabling engineering designs that are more optimized and better understood than previously possible. Rather than consuming time with non-value-added tasks, engineers can turn the design process into an interactive, educational experience. The marriage of simulation technology with the technological advances of NVIDIA graphics is fundamentally changing the way products are designed and developed. The possibilities are endless with this technology. 25-minute Talk Justin Hendrickson - Director of Product Development, ANSYS
Favorite
S8568 - Supporting DGX Air-Gapped Production Environments This tutorial will cover the issues encountered when deploying NVIDIA DGX-1/DGX Station into a secure environment. For security reasons, some installations require that systems be isolated from the internet or outside networks. Since most DGX-1 software updates are accomplished through an over-the-network process with NVIDIA servers, this session will walk participants through how updates can be made by maintaining an intermediary server. The session will combine lecture, live demos, and detailed instructions. 80 Minutes Tutorial Sumit Kumar - Solutions Architect, NVIDIA
Jeffrey Weiss - Director, Solution Architects, NVIDIA
Favorite
S8582 - Embodied Question Answering Building intelligent agents that possess the ability to perceive the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and execute actions in a physical environment, has been a long-term goal of Artificial Intelligence. In this talk, I will present my recent work on an instantiation of this goal -- Embodied Question Answering (EQA) -- where an agent that is spawned at a random location in an environment (a house or building) is asked a natural language question ("What color is the car?"). The agent perceives its environment through first-person vision and can perform a few 'atomic' actions: move-{forward, backward, right, left}, and turn-{right, left}. The objective of the agent is to explore the environment and gather visual information necessary to answer the question ("orange"). I'll introduce our OpenGL-based environments, a large-scale dataset of expert demonstrations for this task and deep models, trained end-to-end using reinforcement learning, from raw pixels to multi-step navigation control to visual question answering. 25-minute Talk Abhishek Das - PhD Student, Georgia Tech
Favorite
S8696 - Audio Recognition, Context-Awareness, and its Applications

We'll explain the concept and importance of audio recognition, which aims to understand all the information contained in audio, not limiting its scope to speech recognition. This includes an introduction to the various types of non-verbal information contained in audio, such as acoustic scenes/events, speech, and music. This session is helpful for people who are not familiar with audio processing but are interested in context-aware systems. It might also inspire anyone who develops AI applications such as AI home assistants, humanoid robots, and self-driving cars. It also covers potential use cases and creative applications, including a video demonstration of the audio context-aware system applied to a media-art performance for real-time music generation.

25-minute Talk Yoonchang Han - CEO, cochlear.ai
Favorite
S8933 - Design Creativity Empowered by Living Immersive Experiences with Dassault Systèmes and NVIDIA Solutions

Our new and upcoming solutions provide a paradigm shift in Design, with natively built-in VR immersive experiences. These experiences happen directly within the 3D Design Environment of DASSAULT Systems CATIA. Designers and Engineers can now access a new level of creativity by combining creative tools for Sketching, 3D Modeling, CAD and Simulation with Virtual Reality.  We'll present how easy it is to be immersed in your Design by yourself or even as a team. This will bring Designers and Engineers a major step forward in the design validation and collaborative decision workflow. We'll cover how CATIA Design solutions on the 3DEXPERIENCE platform use the latest technologies from NVIDIA for VR Immersive experiences to create, collaborate, and do 3D product design on native and massive models.

25-minute Talk Stephan Ritz - CATIA Design, Product Experience Roles Portfolio Director, Dassault Systemes
Favorite
CE8113 - Connect with the Experts: Full-Stack GPU Computing with Julia

Learn how the Julia programming language can be used for GPU programming, both for (1) low-level kernel programming, and (2) high-level array and AI libraries. This full-stack support drastically simplifies code bases, and GPU programmers can take advantage of all of Julia's most powerful features: generic programming, n-dimensional kernels, higher-order functions, and custom numeric types. The session will give an overview of the compiler's implementation and performance characteristics via the Rodinia benchmark suite. We'll show how these techniques enable highly flexible AI libraries with state-of-the-art performance, and allow a major government user to run highly computational threat modelling on terabytes of data in real time.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts Mike Innes - Software Engineer, Julia Computing
Tim Besard - PhD Student, Ghent University
Favorite
CE8121 - Connect with the Experts: Unified Memory and Future Memory Technologies

Learn about modern memory management techniques on heterogeneous systems. Whether you are new to GPU programming and trying to get your application running on the GPU as quickly as possible, or a CUDA veteran exploring new ways to manage memory efficiently and stressing the limits of the hardware, this session is for you! Ask us any questions you have on Unified Memory: page faults, migrations, prefetching, memory oversubscription, interaction with the OS, and anything else that comes to mind. Share your use cases and feedback so we can improve our hardware and software and make the GPU developer's life better in the future.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts Jiri Kraus - Senior Devtech Compute, NVIDIA
Nikolay Sakharnykh - Sr. Developer Technology Engineer, NVIDIA
Favorite
CE8149 - Connect with the Experts: Deep Learning Basics (2)

Attend this session to get your questions on deep learning basics and concepts answered. NVIDIA experts can help you with the fundamentals and provide guidance on how and when to apply Deep Learning and GPUs to your work. No question is too basic to ask.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject.  

1 Hour Connect with the Experts Robert Crovella, NVIDIA
Hans Mortensen - Sr. Solutions Architect, NVIDIA
Favorite
S81033 - Accelerating Cancer Research with Deep Learning

The Department of Energy (DOE) entered into a partnership with the National Cancer Institute (NCI) of the National Institutes of Health (NIH) to accelerate cancer research. This "Cancer Moonshot" aims to tackle three main objectives: better understand the mechanisms of cancer, use large amounts of diverse medical data for predictive models, and enable precision medicine by providing guidance for treatment to individual patients. Leveraging the compute expertise of DOE in high performance computing (HPC) and new methods for deep learning in artificial intelligence, this HPC+AI approach aims to create a single scalable deep neural network code called CANDLE (CANcer Distributed Learning Environment) that will be used to address all three challenges. This talk aims to give an overview of the project and highlight how GPU accelerated systems in the DOE ecosystem, Summit and Sierra, have contributed to the project.

50-minute Talk Fernanda Foertter - GPU Developer Advocate (Healthcare, HPC + AI), NVIDIA
Favorite
S8130 - Building a GPU-Accelerated Short-Read Aligner for Bisulfite-Treated DNA Sequences

It is not always easy to accelerate a complex serial algorithm with CUDA parallelization. A case in point is that of aligning bisulfite-treated DNA (bsDNA) sequences to a reference genome. A simple CUDA adaptation of a CPU-based implementation can improve the speed of this particular kind of sequence alignment, but it's possible to achieve order-of-magnitude improvements in throughput by organizing the implementation so as to ensure that the most compute-intensive parts of the algorithm execute on GPU threads.
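A minimal sketch of why bisulfite-treated reads defeat a naive aligner, using the standard C-to-T collapsing trick. This illustrates the problem only; it is not the speaker's GPU implementation, and `ct_collapse`/`bs_align` are hypothetical names.

```python
# Bisulfite treatment converts unmethylated cytosines (C) to thymines (T),
# so an exact search for the read fails against the original reference.
# Collapsing C->T on BOTH read and reference recovers the true locus.

def ct_collapse(seq):
    """Replace every C with T, the usual bisulfite-alignment reduction."""
    return seq.replace("C", "T")

def bs_align(read, reference):
    """Return the 0-based offset of the collapsed read in the collapsed
    reference, or -1 if there is no hit."""
    return ct_collapse(reference).find(ct_collapse(read))

reference = "ACGTCGATCGA"
read = "GTTGAT"  # bisulfite-converted copy of reference[2:8] == "GTCGAT"

naive = reference.find(read)   # exact match fails
hit = bs_align(read, reference)  # collapsed match finds offset 2
```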

25-minute Talk Richard Wilton - Associate Research Scientist, Johns Hopkins University
Favorite
S8138 - AI and Deep Learning in R

We'll discuss use cases for machine learning on GPUs and how to implement them easily in the R programming language by walking through the ideas behind several modern techniques, including penalized regression, boosted trees, and deep nets. Along with briefly introducing the concepts, we'll discuss some of the math behind the models and look at code examples that run the models on GPUs in R.

50-minute Talk Jared Lander - Chief Data Scientist, Lander Analytics
Favorite
S8224 - Deep Learning at BMW: Robust AI in the Production Chain

We'll present new solutions in deploying robust and efficient deep learning approaches in production chain and logistics. For process automation in industrial environments, highly accurate and error-resistant models are needed. To reach this level of confidence for error-prone tasks, such as object detection or pose estimation, we developed an efficient NVIDIA-based development pipeline: From tools for fast (semi-automated) data labeling over concurrent detection models, to deep reinforcement solutions for adaptive online learning.

50-minute Talk Jimmy Nassif - Project Manager, BMW AG
Norman Müller - AI Specialist, BMW Group
Favorite
S8350 - Realtime Signal Processing on NVIDIA TX2 using CUDA

In our presentation, we will focus on low latency real-time signal processing on the NVIDIA Jetson TX2. Originally designed for image processing, the Jetson TX2 incorporates a vast amount of embedded GPU processing power. However, it has not been widely used for signal processing so far. There are two main challenges that have to be addressed: a constantly high input and output data rate for arbitrary digital signals, as well as a very short and deterministic latency requirement (processing time and data transfer time). Using the example of multichannel digital audio processing, we will look at details of CUDA kernel programming, which is a precondition for uninterrupted signal processing. Moreover, we explain efficient data I/O transfer to Jetson TX2 GPU memory, synchronization between CPU and GPU, as well as update mechanisms for control data.

25-minute Talk Armin Weiss - Researcher, Zurich University of Applied Sciences
Favorite
S8515 - Studying Autonomous Driving Corner Cases, Powered by the Jetson TX Series

Learn how to train neural networks for all-purpose autonomous driving, without the need for highly structured streets or intelligent infrastructure. A methodology is presented and examples will be shown, where we used RC model cars for training autonomous driving in chaotic real-life scenarios. The fleet of autonomous cars, based on NVIDIA Jetson systems, will be presented in detail. Results of different learning approaches, using convolutional neural nets, are explained, and different ways of interpreting the performance will be discussed. We will also show visualization techniques for better understanding of the neural network.

50-minute Talk Baladitya Yellapragada - Ph.D. Student UC Berkeley, International Computer Science Institute - UC Berkeley
Sascha Hornauer - Visiting Research Scholar, International Computer Science Institute - UC Berkeley
Favorite
S8546 - Deploying Containerized Distributed GPU Applications at Scale

We'll demonstrate how to seamlessly deploy containerized GPU-enabled distributed applications on a cluster. We'll review a number of existing solutions that address resource management and containerization before arriving at our proposed system architecture: a one-stop solution addressing GPU allocation, containerization, deployment, and scheduling. Our solution can be deployed on premises or in the cloud and accelerates your workflows while easing the burden of deploying applications across a cluster. Examples will be given, including distributed TensorFlow applications and hyperparameter optimization.

50-minute Talk Thuc Tran - Machine Learning Engineer, Capital One Financial
Athanassios Kintsakis - Machine Learning Engineer, Capital One Financial
Favorite
S8577 - Discover Orders in Unordered Datasets: Generative Markov Networks

In this work, we argue that for any dataset, even one without explicit orders, there exist implicit orders and relationships in the data. Aiming to find these orders and relationships, we introduce novel generative Markov networks (GMNs) that consider a Markov chain data-generation process. To make learning the transition operator tractable and flexible, we utilize neural networks as smooth function approximators. Moreover, we propose a batch-wise permutation training regime to ensure an ergodic training process for the Markov chain. We'll show that GMNs are able to discover orders and relationships in datasets, and can also perform well on a benchmark one-shot recognition task.

25-minute Talk Yao-Hung Tsai - Ph.D. Student, Carnegie Mellon University
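As a toy illustration of "discovering an order in an unordered dataset", the sketch below replaces the paper's learned neural transition operator with a greedy nearest-neighbor chain; `discover_order` is a hypothetical name, and this is a crude stand-in for sampling a trained GMN, not the authors' method.

```python
# Recover an implicit order from shuffled 1-D samples by repeatedly
# stepping to the nearest unvisited point -- a deterministic stand-in
# for sampling a learned Markov-chain transition operator.

def discover_order(points, start=0):
    """Return a visit order (list of indices) over all points."""
    order = [start]
    remaining = set(range(len(points))) - {start}
    while remaining:
        cur = points[order[-1]]
        nxt = min(remaining, key=lambda i: abs(points[i] - cur))
        order.append(nxt)
        remaining.remove(nxt)
    return order

data = [0.9, 0.1, 0.5, 0.3, 0.7]       # shuffled samples from a 1-D chain
order = discover_order(data, start=1)  # start at the smallest value
recovered = [data[i] for i in order]   # values now appear in sorted order
```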
Favorite
S8609 - Recent Advances in Neural Machine Translation: Multilingual, Non-Parametric to Unsupervised Neural Machine Translation

We'll describe the latest advances in neural machine translation from three different perspectives. We'll start with character-level, multilingual neural machine translation, which aims at harnessing positive language transfer among multiple languages to improve the translation quality and the robustness of such a multilingual translation model to intra-sentence code-switching and typos. We'll then discuss recent research on exploiting data besides the oft-used parallel corpora. We'll discuss how another modality, such as vision, can be used to enable zero-resource machine translation, and how purely unsupervised neural machine translation can be done by exploiting the similarity between the language distributions of two languages. Finally, we'll discuss a recent trend of retrieval-based approaches to deep learning, with a specific example of non-parametric neural machine translation.

50-minute Talk Kyunghyun Cho - Assistant Professor, New York University
Favorite
S8621 - Deploying, Profiling, and Optimizing Distributed TensorFlow in Production with GPUs

Using the latest advancements from TensorFlow, including the Accelerated Linear Algebra (XLA) framework, JIT/AOT compiler, and Graph Transform Tool, we'll demonstrate how to optimize, profile, and deploy TensorFlow models in GPU-based production environments. We'll cover many demos based on open source tools. You can completely reproduce all demos through Docker on your own GPU cluster. See http://pipeline.ai for links to the GitHub repo.

50-minute Talk Chris Fregly - Founder & Research Engineer, PipelineAI
Favorite
S8632 - Innovations with Blast Extreme and Horizon with NVIDIA GRID

This panel discusses innovation with the VMware Blast Extreme protocol and the Horizon software offering, along with NVIDIA GRID. We will focus on providing IT administrators with tools and technologies to help deliver the best experience for virtualized 3D graphics applications in their environments.

50-minute Talk Kiran Rao - Director, Product Management, VMware Inc
Cory Smith - CIO/CTO, City of Davenport, Iowa
Luke Wignall - Senior Manager, Pro Viz Performance Engineering & Technical Marketing, NVIDIA
Favorite
S8682 - Defect Inspection from Scratch to Production

In order to fulfill customers' requirements, companies have to guarantee the quality of delivered products, which can often be achieved only by manual inspection of the finished product. Since human-based defect inspection and classification are time-consuming and the results vary by individual, automatic defect detection and classification has the potential to reduce the cost of quality assurance significantly. In this talk, we will demonstrate how to utilize deep learning algorithms, i.e., fully convolutional neural networks, to build a general defect inspection and classification model. We will also share experiences on how to effectively collect labeled data, deal with imbalanced data, and optimize the model in terms of latency and throughput with TensorRT before deploying it to the production line.

50-minute Talk Sheng-Ting Shen - Solution Architect, NVIDIA
Kuan-Liang (Andrew) Liu - Solution Architect, NVIDIA
Favorite
S8740 - Democratizing Deep Learning with Unity ML-Agents

Unity ML-Agents is an open source AI toolkit that enables machine learning developers and researchers to train agents in realistic, complex scenarios with decreased technical barriers. ML-Agents offers a flexible way to use the Unity Editor and Engine to develop and test new AI algorithms quickly and efficiently across games, robotics, and beyond. We'll walk you through new learning methods that are bundled with the latest version of Unity ML-Agents. These include: (1) Imitation Learning: train agents to mimic human behavior; and (2) Multi-Agent Reinforcement Learning: train multiple agents together to fulfill cooperative, competitive, and general tasks. We'll showcase these new learning methods in some interesting training scenarios with real game examples.

50-minute Talk Arthur Juliani - Senior Machine Learning Engineer, Unity Technologies
Favorite
S8744 - The Future of Real-Time: Experience Design

Epic Games presents a panel discussion with partners who are using Unreal Engine to bring real-time, high-fidelity interactive experiences to their customers. From product design and visualization to virtual production, photorealism, and final pixels, content creators are uncovering the power of Unreal Engine. Hear how Unreal Engine customers are applying game engine technology to revolutionize the conventions of architectural design, automotive, aerospace, and product design, and the future of customer engagement.

50-minute Panel Leighton Carr - Research Program Lead, Boeing
Shital Shah - Washington, Microsoft
Owen Coffee - Associate Principal, HKS
Ashley Micks - Technical Specialist, Ford
Marc Petit - General Manager, Unreal Engine, Epic Games
Favorite
S8837 - OpenCL at NVIDIA - Recent Improvements and Future Plans

Learn about recent improvements in OpenCL on NVIDIA platforms. We'll share our learnings and touch upon improvements we recently made, in particular our efforts to improve data-transfer performance and the improvements to the memory allocation extension we introduced last year. We'll also discuss our future plans for OpenCL.

50-minute Talk Nikhil Joshi - Engineering Manager, OpenCL Driver, NVIDIA
Favorite
S8892 - Machine Learning in Precision Medicine: Patient-Specific Treatment Enabled by Quantitative Medical Imaging, Artificial Intelligence, and GPU Efficiency

Attendees will learn about the need for and use of machine learning in today's patient-centered healthcare. The talk will focus on general approaches requiring machine learning to obtain image-based quantitative features, reach patient diagnoses, predict disease outcomes, and identify proper precision-treatment strategies. While the presented methods are general in nature, examples from cardiovascular disease management will be used to demonstrate the need for and power of machine learning enabled by the performance advantages of GPU computation.

25-minute Talk Milan Sonka - Professor, University of Iowa
Favorite
S8900 - How Microservices and Serverless Computing Enable the Next Generation of Machine Intelligence

We'll discuss why AI and machine learning are a natural fit for serverless computing, and present a general architecture for scalable and serverless machine learning in production. We'll discuss issues encountered while implementing our own on-demand scaling over GPU clusters, show how these apply to more general solutions, and present one possible vision for the future of cloud-based machine learning.

50-minute Talk Diego Oppenheimer - CEO, Algorithmia
Favorite
S8995 - What it Takes to Drive Autonomously on Chinese roads

Pony.ai will share the key technological milestones it has achieved in the past several months of road testing in China, including the company's soft launch of China's first-ever autonomous vehicle robotaxi service. CEO James Peng will share the unique challenges posed by a Chinese road environment and how we leveraged deep learning and computational models to conquer those challenges. Pony.ai's mission is to build the safest and most reliable L4 autonomous driving technology. The startup was founded at the end of 2016 and is co-located in the heart of Silicon Valley and China.

25-minute Talk Yiming Liu - Infrastructure Lead, Pony.ai
Favorite
S8128 - Image-Domain Gridding on Accelerators

We will present our latest results on Image Domain Gridding, an algorithm for radio astronomical imaging. This algorithm outperforms the state of the art in traditional imaging algorithms both in terms of image quality (by applying more corrections) and performance. In this talk, we will first introduce the algorithm and then demonstrate that it works very well on highly parallel accelerators. We will show the in-depth performance analysis and optimization techniques that we applied to get there.

25-minute Talk Bram Veenboer - PhD Researcher, ASTRON
Favorite
S8258 - Deep Learning Autonomous Driving Simulation

Realistic automotive simulation platforms, where virtual cars travel virtual roads in virtual cities in remarkably true-to-life conditions, will be a vital part of developing and testing autonomous vehicles. The technology behind the Cognata simulation engine is heavily based on deep learning, computer vision, and other advanced AI methods. We'll present a cloud-based simulation engine, and discuss how it works and how to develop with it.

25-minute Talk Danny Atsmon - CEO, Cognata
Favorite
S8471 - From Bits to Bedside: Translating Large-Scale Routine Clinical Datasets into Precision Mammography

We'll demonstrate how to use deep learning (DL) approaches to translate big data from routine clinical care into medical innovation that directly improves routine clinical care. Typically, large healthcare institutions have sufficient quantities of clinical data to facilitate precision medicine through a DL paradigm. However, this clinical data is rarely translated into direct clinical innovation because computer algorithms cannot readily ingest or reason over it. Using routine mammographic screening data for breast cancer as an example, we first downloaded over 30,000 free-text pathology reports and used long short-term memory (LSTM) DL algorithms to infer cancer outcomes for individual patients. We then labeled over 700,000 mammographic views of breast imaging with our inferred pathology outcomes. Finally, we trained convolutional neural network DL algorithms to directly predict pathology outcomes from breast imaging. With our approach, we demonstrate how to leverage DL to realize precision oncology and significantly improve the interpretation of routine screening mammography for millions of women using routine clinical big data.

25-minute Talk Dexter Hadley - Assistant Professor, UCSF
Favorite
SE0001 - Posters & Beer Reception

Check out over 150 research posters and mingle, beverage in hand, with their brilliant authors at the Monday reception (18:00-20:00). See how big ideas are accelerated through the power of GPUs. 

Special Event - 2 h Special Event
Favorite
SE0016 - Dinner with Strangers (Mon)

Join a random group of GTC attendees for enlightening conversations over a self-hosted dinner in great restaurants nearby. Less creepy than it sounds, this is one of the more popular programs at GTC.

Sign up on Concourse.

Special Event - 2 h Special Event
Favorite
S8885 - Opening Keynote

The 2018 GTC opening keynote is delivered by the NVIDIA Founder and CEO, Jensen Huang, speaking on the future of computing.

2 Hour Keynote Jen-Hsun Huang - Founder & CEO, NVIDIA
Favorite
SE0003 - Lunch (Tue/Wed/Thu)

Lunch will be served in the South Hall.

Special Event - 2 h Special Event
More Times Favorite
CE8140 - Connect with the Experts: Deep Libraries for Training - cuDNN, cuBLAS

In this session, we will discuss the CUDA libraries that are foundational for training on GPUs. Learn about the latest and upcoming features, talk to NVIDIA experts about your use case, and discuss the latest developments.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Philippe Vandermersch, NVIDIA
Mostafa Hagog, NVIDIA
Yang Xu, NVIDIA
Khairul Kabir, NVIDIA
Seth Walters, NVIDIA
Slawomir Stepniewski, NVIDIA
Kevin Vincent, NVIDIA
Favorite
CE8146 - Connect with the Experts: OpenACC - Quick On-ramp to GPUs (2)

This session is designed for anyone who is either looking to start with GPUs or already accelerating their code with OpenACC on GPUs or CPUs. Join OpenACC experts and your fellow OpenACC developers to get expert advice, discuss your code, and learn how OpenACC directives are used by others.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Sunita Chandrasekaran - Assistant Professor, Department of Computer & Information Sciences, University of Delaware
Michael Wolfe - Compiler Engineer, NVIDIA
Guido Juckeland - Head of Computational Science Group, Helmholtz-Zentrum Dresden-Rossendorf
Randy Allen - Director, Mentor Graphics
Robert Henschel - Director Science Community Tools, Indiana University
Robert Crovella, NVIDIA
Jeffrey Larkin - Senior DevTech Software Engineer, NVIDIA
Favorite
CE8150 - Connect with the Experts: Deep Learning Frameworks for Training (2)

Attend this session to get your questions on deep learning frameworks answered. Learn more about widely used deep learning frameworks such as Caffe, Theano, Torch, TensorFlow, CNTK, and MXNet, and let NVIDIA experts help you choose the right framework for your research or project.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject.  

1 Hour Connect with the Experts Cho Che Cheng, NVIDIA
John Woolley, NVIDIA
Sami Kama, NVIDIA
Andrei Ivanov, NVIDIA
Pooya Davoodi, NVIDIA
Jie Jiang, NVIDIA
Przemyslaw Tredak - Senior Deep Learning Engineer, NVIDIA
Michael O'Connor - Director, NVIDIA
Favorite
L8120 - Building a Reinforcement Learning Agent in Starcraft 2

In this lab you will train a deep reinforcement learning agent to play Starcraft 2. The agent will train in Blizzard's Starcraft 2 Machine Learning environment and use DeepMind's Starcraft 2 Learning Environment (SC2LE) to communicate with the game engine in Python. You will implement deep reinforcement learning algorithms and test their effectiveness in minigames of increasing difficulty. SC2LE is designed so that AI will have to play similarly to a human. The AI can only see the units that are in its field of view and has to input commands at a rate that is comparable to human play. With AlphaGo recently beating the world's best Go players, Starcraft 2 is the next great challenge in AI and reinforcement learning.
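A scaled-down sketch of the kind of training loop the lab builds toward: tabular Q-learning on a one-dimensional "minigame" where the agent must march right to a goal. This is an illustrative simplification under stated assumptions (SC2LE feeds observations to a deep network; the environment and names here are invented for the example).

```python
import random

# Tabular Q-learning on a tiny corridor "minigame": reach the rightmost
# cell. A deliberately small stand-in for deep RL agents trained in SC2LE.
def train(n_states=5, episodes=200, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0=left, 1=right
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step budget per episode
            greedy = 1 if q[s][1] >= q[s][0] else 0
            a = rng.randrange(2) if rng.random() < eps else greedy
            s2 = min(n_states - 1, max(0, s + (1 if a == 1 else -1)))
            r = 1.0 if s2 == n_states - 1 else 0.0  # reward only at the goal
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
            if s == n_states - 1:
                break
    return q

q = train()
# Greedy policy after training: move right from every non-goal state.
policy = [1 if q[s][1] >= q[s][0] else 0 for s in range(4)]
```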

120 Minutes Instructor-Led Lab Miroslav Enev - Sr. Solution Architect Deep Learning, NVIDIA
Eric Harper - Solutions Architect Deep Learning, NVIDIA
Favorite
L8124 - Text-to-Image Using Deep Learning

Generative adversarial networks (GANs) have proven to be a powerful approach to generating landscapes, transferring style from one photo to another, and generating an image from a simple sentence. In this lab, participants will learn how to use GANs to perform these three tasks in a popular deep learning framework.

120 Minutes Instructor-Led Lab Myrieme Demouth - Solutions Architect, NVIDIA
Allison Seidel - Solutions Architect, NVIDIA
Favorite
L8141 - Object Detection with DIGITS

Prerequisites: 'Image Classification with DIGITS'

Duration: 2 hours

Framework: Caffe with DIGITS interface

Many problems have established deep learning solutions, but sometimes the problem that you want to solve does not. Learn to create custom solutions through the challenge of detecting whale faces from aerial images by:

• Combining traditional computer vision with deep learning

• Performing minor "brain surgery" on an existing neural network using the deep learning framework Caffe

• Harnessing the knowledge of the deep learning community by identifying and using a purpose-built network and end-to-end labeled data

Upon completion of this lab, you'll be able to solve custom problems with deep learning.

Presented by the NVIDIA Deep Learning Institute (DLI).
120 Minutes Instructor-Led Lab Griffin Lacey, NVIDIA
Favorite
L8160 - Genomics: Using Deep Learning to Massively Accelerate the Accurate Identification of Genetic Variants

Identifying genetic changes (variants) is the first step in making genomics useful for healthcare and life sciences. Determining what the changes mean (annotation) and whether those changes are harmful (pathogenic) is the critical step. Attendees will apply deep learning methods to genomic data to identify pathogenic variants with high accuracy and speed. They will perform data reduction and train their own models. By varying parameters in the model, DANN (Deleterious Annotation of Genetic Variants Using Neural Networks), run on NVIDIA's GPC, they will create an ROC curve to compare with variant annotators from prior scientific publications. Attendees will complete the lab understanding how deep learning improves genomic data analysis using the GPU platform.

120 Minutes Instructor-Led Lab Chris Yoo - Chairman and CEO, Systems Imagination Inc
David Schneider - Director of Knowledge Engineering, Systems Imagination
Kendyl Douglas - Applied Mathematician, Systems Imagination
Margaret Linan - Deep Learning Bioinformatics Scientist, Systems Imagination
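The ROC comparison mentioned in the lab description boils down to sweeping a decision threshold over classifier scores. A self-contained sketch with toy scores (not DANN outputs); the function names are illustrative.

```python
# Build ROC points by sweeping thresholds over scores, then compute the
# area under the curve (AUC) with the trapezoidal rule.

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs for every score threshold, high to low."""
    thresholds = sorted(set(scores), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        pts.append((fp / neg, tp / pos))
    return pts

def auc(pts):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

scores = [0.9, 0.8, 0.4, 0.2]  # toy classifier outputs
labels = [1, 1, 0, 0]          # pathogenic / benign ground truth
pts = roc_points(scores, labels)
area = auc(pts)  # perfect separation gives an AUC of 1.0
```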
Favorite
L8175 - Training Semantic Segmentation for DRIVE PX

The level of accuracy needed for urban driving is different than for highway driving due to the density of different objects in a given scene. Using the CamVid dataset, this lab will go through all the steps required to do semantic segmentation given the computation capabilities of DRIVE PX2. You'll learn how to:

• Convert an existing network into a fully convolutional network

• Explore different design choices to fit into the computation budget

• Train a semantic segmentation neural network

Upon completion, you'll be able to create and train a fully convolutional network for semantic segmentation tasks in self-driving cars. Prerequisites: Fundamentals or equivalent background/experience

120 Minutes Instructor-Led Lab Aaraadhya Narra - Solutions Architect, DLI Certified Instructor, NVIDIA
Favorite
S81004 - The Early Detection of Pancreatic Cancer Using Deep Learning: Preliminary Observations

This talk will present the challenges and opportunities in developing a deep learning program for use in medical imaging. It will present a hands-on approach to the challenges that need to be overcome and the need for a multidisciplinary approach to help define the problems and potential solutions. The role of highly curated data for training the algorithms, and the challenges in creating such datasets, is addressed; the annotation of data becomes a key point in training and testing the algorithms. The roles of experts in computer vision and radiology will be addressed, as well as how this project can prove to be a roadmap for others planning collaborative efforts. Finally, I will discuss the early results of the Felix project, whose goal is nothing short of the early detection of pancreatic cancer, to help improve detection and ultimately improve patient outcomes.

50-minute Talk Elliot Fishman - Professor of Radiology, Surgery, Oncology and Urology, Johns Hopkins Hospital
Favorite
S81047 - Introduction to Deep Stream SDK

An introduction to high-performance deep learning inference for video analytics. The NVIDIA DeepStream SDK simplifies the development of scalable intelligent video analytics (IVA) applications powered by deep learning for smart cities and hyperscale datacenters.

25-minute Talk Kaustubh Purandare, NVIDIA
Favorite
S81048 - IBM PowerAI: Realizing Business Value with Machine Learning (Presented by IBM)

There is no shortage of hype around AI, but realizing value through machine and deep learning comes with its challenges. IBM PowerAI removes the inhibitors across each stage of a workflow, allowing enterprises to rapidly realize business value with AI.

50-minute Talk Adel El-Hallak - Director of Product Management, Machine & Deep Learning for IBM Cognitive Systems, IBM
Favorite
S8135 - Programming GPU-based Extreme-Scale HPC Systems: OpenSHMEM and SharP

This talk will introduce two programming models, OpenSHMEM and SharP, to address the programming challenges of HPC systems with multiple GPUs per node, high-performing networks, and huge amounts of hierarchical heterogeneous memory. SharP uses a distributed data-structure approach to abstract the memory and provide uniform interfaces for data abstraction, locality, sharing, and resiliency across these memory systems. OpenSHMEM is a well-established library-based PGAS programming model for programming HPC systems. We show how NVSHMEM, an implementation of OpenSHMEM, can enable communication in CUDA kernels and realize the OpenSHMEM model for GPU-based HPC systems. These two complementary programming models provide the ability to program emerging architectures for Big-Compute and Big-Data applications. After the introduction, we will present experimental results for a wide variety of applications, including QMCPack, HPGMG, CoMD, and Memcached, demonstrating the programming models' advantages.

50-minute Talk Sreeram Potluri - Senior Software Engineer, NVIDIA
Manjunath Gorentla Venkata - Research Scientist, Oak Ridge National Laboratory
Favorite
S8219 - GUNREAL: GPU-Accelerated Unsupervised Reinforcement and Auxiliary Learning

We'll introduce the GPU-accelerated unsupervised reinforcement and auxiliary learning (GUNREAL) algorithm. Recent state-of-the-art deep reinforcement learning algorithms, such as A3C and UNREAL, are designed to train on a single device with only CPUs. Using GPU acceleration for these algorithms results in low GPU utilization, which means the full performance of the GPU is not reached. Motivated by the architecture changes made by the GA3C algorithm, which gave A3C better GPU acceleration, together with the high learning efficiency of the UNREAL algorithm, we extend GA3C with the auxiliary tasks from UNREAL to create GUNREAL. We show that our GUNREAL system finishes training faster than UNREAL and reaches higher scores than GA3C.

25-minute Talk Koichi Shirahata - Researcher, Fujitsu Laboratories Ltd.
S8266 - AstroAccelerate - GPU-Accelerated Signal Processing for Next Generation Radio Telescopes AstroAccelerate is a GPU-enabled software package that focuses on enabling real-time processing of time-domain radio-astronomy data. It uses the CUDA programming language for NVIDIA GPUs. The massive computational power of modern day GPUs allows the code to perform algorithms such as de-dispersion, single pulse searching, and Fourier domain acceleration searching in real time on very large datasets, which are comparable to those that will be produced by next-generation radio telescopes such as the Square Kilometre Array. 50-minute Talk Wes Armour - Director, University of Oxford
S8278 - CUDA - New Features and Beyond

CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest release of CUDA, preview upcoming GPU programming technology, and gain insight into the philosophy driving the development of CUDA and how it will take advantage of current and future GPUs. You'll also learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development.

50-minute Talk Stephen Jones - Software Architect, NVIDIA
S8368 - Containerizing Deep Learning with Singularity We'll talk about how to use Singularity to containerize deep learning applications. We'll provide compelling reasons to choose Singularity over Docker. We'll cover deep learning frameworks, including TensorFlow, NV-Caffe, MXNet, and others. We'll present the current challenges and workarounds when using Singularity in an HPC cluster. We'll compare the performance of Singularity to bare-metal systems. 25-minute Talk Nishanth Dandapanthula - Engineering Manager, HPC and Deep Learning Solutions, Dell EMC
S8376 - Compute-Enabled Efficiency: Technology Adapted AEC Workflows for Data Capture, Analysis & Presentation Learn how and why the architecture, engineering, and construction (AEC) community is using technology (e.g., drones, photogrammetry) to enable efficiency in data capture, analysis, and presentation workflows. Market drivers are pointing the AEC community toward compute-enabled efficiencies, such as AI-powered automation, to address labor-capacity and cost-competition challenges. New data capture techniques are the norm for efficiently capturing accurate data in support of activities across the full project life cycle. Improved drone processes allow for geo-referenced data, and GPUs provide the post-processing capability that enables this workflow. 50-minute Talk Bill Dale - Director, Applied Technology, Jacobs
Chris Torres - VDC Director, Emerging Technologies, Jacobs
S8393 - CatBoost: Fast Open-Source Gradient Boosting Library For GPU Learn how to use GPUs to accelerate gradient boosting on decision trees. We'll discuss the CUDA implementation of CatBoost — an open-source library that successfully handles categorical features and shows better quality compared to other open-source gradient-boosted decision tree libraries. We'll provide a brief overview of problems that can be solved with CatBoost. Then, we'll discuss challenges and key optimizations in the most significant computation blocks. We'll describe how one can efficiently build histograms in shared memory to construct decision trees and how to avoid atomic operations during this step. We'll provide benchmarks showing that our GPU implementation is five to 40 times faster than Intel server CPUs. We'll also provide a performance comparison against GPU implementations of gradient boosting in other open-source libraries. 25-minute Talk Vasily Ershov - Software Developer, Yandex
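The histogram-building step this abstract centers on can be illustrated in miniature. The sketch below is illustrative only, not CatBoost's code: it accumulates per-bin gradient sums for one binned feature with a scatter-add, the operation that on the GPU maps to shared-memory histograms and motivates the atomic-avoidance optimizations mentioned above. The function name and shapes are hypothetical.

```python
import numpy as np

def gradient_histograms(bin_idx, grads, n_bins):
    """Accumulate per-bin gradient sums for one feature.

    bin_idx : int array (n_samples,) -- binned feature values
    grads   : float array (n_samples,) -- per-sample gradients
    Returns a (n_bins,) array of gradient sums; split scores are then
    evaluated from prefix sums over the bins.
    """
    hist = np.zeros(n_bins)
    # Unbuffered scatter-add; the GPU analog is shared-memory
    # accumulation (with or without atomics).
    np.add.at(hist, bin_idx, grads)
    return hist

bins = np.array([0, 1, 1, 2, 0])
g = np.array([0.5, -1.0, 0.25, 2.0, 0.5])
hist = gradient_histograms(bins, g, 3)
```

A split candidate between bins then compares the gradient mass on each side, which is why fast histogram construction dominates tree-building time.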
S8502 - GOAI One Year Later

This talk will discuss the evolution of the GPU Open Analytics Initiative (GoAi) from its inception to today. GoAi, at its core, is a collection of libraries, frameworks, and APIs that lower the barrier of GPU adoption for data scientists. The goal of GoAi is to enable end-to-end data science workflows across many multi-GPU servers, to analyze and understand data more efficiently than ever before. To date, GoAi includes methods for performing SQL, machine learning, data processing and feature engineering, graph analytics, and graph visualization, all on the GPU. This talk will discuss the who, what, when, where, and whys of GoAi, and its integration into the traditional big data world through leading open source projects like Apache Arrow and Apache Parquet. Finally, this talk will highlight major achievements of GoAi, our plans for the future, and how developers can become a part of this rapidly evolving ecosystem.

50-minute Talk Joshua Patterson - Director AI Infrastructure, NVIDIA
S8754 - Deep Thinking: The Challenges of Deep Learning and GPU Acceleration of Financial Data

Accelerated analytics offer massive upside over conventional computing in the financial industry. Deep learning and AI-accelerated analytics have many applications in finance, such as fraud detection, risk management, and loss forecasting. GPUs are leveraged to provide high-performance computing on a scalable platform for quantitative analysis of big data, providing agile methods for ingesting data, performing automated data mining, and implementing robust deep learning architectures. By applying deep learning methods to complex financial data, we can exploit non-linear relationships that lead to critical risk events.

50-minute Talk Erind Brahimi - Quantitative Associate, Wells Fargo
S8769 - Commoditizing GPU-as-a-Service Providers with Red Hat OpenShift Container Platform

Red Hat OpenShift Container Platform, with Kubernetes at its core, can play an important role in building flexible hybrid cloud infrastructure. By abstracting infrastructure away from developers, workloads become portable across any cloud. With NVIDIA Volta GPUs now available in every public cloud, as well as from every computer maker, an abstraction layer like OpenShift becomes even more valuable. Through demonstrations, this session will introduce you to declarative models for consuming GPUs via OpenShift, as well as the two-level scheduling decisions that provide fast placement and stability.

25-minute Talk Jeremy Eder - Senior Principal Performance Engineer, Red Hat
Andre Beausoleil - Senior Principal Partner Manager, Red Hat
S8784 - Deep Generative Models for Image and Video Creation We'll focus on recent developments in deep learning-based generative models for image and video creation. The last two to three years have seen explosive growth in the development of generative adversarial networks, variational autoencoders, and related autoregressive methods that have made it possible to automatically generate images and videos by harnessing the power of GPUs and deep learning libraries. These methods present interesting possibilities in automatic generation of datasets for training machine learning methods, as well as in real-world applications for image and video processing such as morphing, editing, advertising, design, and art. We'll present the technical details of these methods and recent results in various settings. 25-minute Talk Vineeth N Balasubramanian - Assistant Professor, Indian Institute of Technology (IIT), Hyderabad, India
S8817 - PyTorch: A Fast and Flexible Deep Learning Framework (Presented by Facebook)

We'll discuss how to get started with PyTorch from the creator of the project, Soumith Chintala. PyTorch is a fast and flexible deep learning framework that has been called a 'breath of fresh air' by researchers and developers alike for its ease of use, flexibility, and similarity to Python programming. It consists of an ndarray library that natively supports GPU execution, an automatic differentiation engine that is flexible and fast, and an optimization package for gradient-based optimization methods.

25-minute Talk Soumith Chintala - AI Research Engineer, Facebook
S8845 - NVIDIA Holodeck: The VR Design Lab of the Future

Bring your ideas to life with NVIDIA Holodeck, the world's first intelligent, photorealistic, and collaborative virtual reality platform. With Holodeck, designers can visualize large, highly detailed models and explore them in photoreal fidelity, in real time. Design teams can collaborate on these complex models remotely to discover new ideas, streamline reviews, and minimize costly physical prototyping. Holodeck even promises to tap into AI to accelerate design workflows and complex simulations. Come hear the talk and then experience Holodeck demos in the VR Village!

25-minute Talk Zvi Greenstein - General Manager, VR, NVIDIA
S8878 - Cinematic Lighting in Unreal Engine

Join Epic's Kim Libreri and Marcus Wassmer along with NVIDIA's Ignacio Llamas and Edward Liu as they provide an in-depth view of the creative and technical aspects of creating photorealistic cinematic content that runs in real time.

80 Minutes Tutorial Marcus Wassmer - Rendering Team Lead, Epic Games
Ignacio Llamas - Senior Manager of Real Time Rendering Software, NVIDIA
Kim Libreri - CTO, Epic Games
Edward Liu - Senior Real Time Rendering Engineer, NVIDIA
S8915 - AI at the Edge - Intelligent Machines

Artificial intelligence is impacting almost every part of the industrial and agricultural supply chain. From robots that quickly adapt to build new products, to automated vehicles that address last-mile challenges for product delivery, to UAVs that can automatically detect failing infrastructure, the world is transitioning from processes that are largely manual to ones that are largely automated. We'll discuss how AI and deep learning are enabling these advances. We'll also analyze a sampling of early successes across different applications. And finally we'll describe some of the remaining challenges to wide-scale deployment, and the work NVIDIA is doing to address those challenges via its Isaac initiative.

25-minute Talk Jesse Clayton - Senior Manager of Product Management for Intelligent Machines, NVIDIA
S8952 - Rapid Pace of Change and Industry Progress We are still in the early stages of AI, and its impact on industries is already significant, from healthcare to financial services to retail. Businesses are seeing unprecedented levels of efficiency and productivity, which will only continue to rise and transform how companies operate. This session will explore the progress of AI adoption over the last year, the industries that are leaping ahead, new AI innovations that will serve cross-industry concerns, and what businesses should expect in terms of adoption maturity in 2018. 50-minute Talk Nick Patience - Founder & Research Vice President, 451 Research
John Abbott - Founder & Research Vice President, 451 Research
S8970 - Creating AI-Based Digital Companion for Mercedes-Benz Vehicles

In-vehicle user experience needs intelligence not only to delight its users with a truly personalized experience and to simplify repetitive actions but also to minimize cognitive load and to decrease distractions.

When driving becomes fully autonomous, the vehicle needs to understand its users' intent without getting explicit directions from them. To achieve such an experience, customers' behavior and interactions are analyzed in real time to understand their intent and to predict what they will do next.

25-minute Talk Rigel Smiroldo - Principal Engineer, Machine Learning & Predictive UX, Mercedes-Benz Research & Development North America Inc.
S81005 - Developing and Deploying Software for Robotics

NVIDIA is launching a new effort designed to empower the deployment of robotics.

25-minute Talk Claire Delaunay - VP of Engineering, NVIDIA
S8581 - Object-Level Deep Reinforcement Learning We'll show how deep reinforcement learning can be greatly sped up by separating perception and action, with a reward function specified in terms of objects and their motions, which are supplied by the perceptual system. In the past five years, reinforcement learners have become vastly more powerful by incorporating deep learning techniques, playing Atari, Mario, Go, and other games with superhuman skill. However, these learners require vast amounts of training data to become skilled. For example, to master Pong, state-of-the-art reinforcement learners require tens of millions of game frames, equivalent to months of play time at human speed. We show that endowing the learner with a minimal perceptual system, capable of detecting and tracking objects, greatly reduces the number of frames needed for learning. This shifts the learning bottleneck from the amount of training data available to computations easily accelerated with GPUs. 25-minute Talk William Agnew - PhD Student, University of Washington
S8663 - Microsoft AI and Research - Infrastructure Overview for Deep Learning and Other Research Microsoft Research leverages a wide variety of open-source, free, and custom tools to manage a complex infrastructure for doing research. We are in a unique position at Microsoft and in the industry, where we serve academic experts who expect access to the latest open source tools, in an environment where Microsoft solutions should also be considered. See examples of how we manage popular/constrained assets and enforce fairness across many systems. Linux/Docker, Windows, on-site, Azure, or a hybrid of all of the above – we see it all. In this session, you will learn which tools can be easily leveraged to manage your own on-site and cloud GPU infrastructure. We'll touch on cluster-management fabrics, scheduling, authentication, hot storage, configuration management, software portability/container management, and high-performance hardware selection. 25-minute Talk Jim Jernigan - Sr. R&D Systems Engineer, Microsoft Research
S8666A - Deploying Autonomous Vehicles with NVIDIA DRIVE

DRIVE PX is an open platform for the autonomous driving ecosystem. It's been adopted by over 300 partners in the automotive ecosystem to develop solutions for vehicles that are intelligent and autonomous. This talk will outline the technical challenges facing development of autonomous intelligent vehicles and provide details of how the next generation of DRIVE AI car computers, DRIVE Xavier and DRIVE Pegasus, addresses these challenges.

25-minute Talk Srikanth Sundaram - Senior Product Manager DRIVE PX 2, NVIDIA
S8686 - Progressive Growing of GANs for Improved Quality, Stability, and Variation We'll describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds up the training and greatly stabilizes it, allowing us to produce images of unprecedented quality, for example, CelebA images at 1024². We'll also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we'll describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we'll suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we'll construct a higher-quality version of the CelebA dataset. 25-minute Talk Samuli Laine - Researcher, NVIDIA
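The fade-in mechanism behind progressive growing can be sketched numerically. The snippet below is a simplified illustration based on the abstract, not the authors' implementation: while a new, higher-resolution block is being introduced, its output is linearly blended with the upsampled output of the previous resolution, with the blend weight alpha ramped from 0 to 1 over training. All names and shapes are illustrative.

```python
import numpy as np

def nearest_upsample(img):
    # 2x nearest-neighbor upsampling: (H, W) -> (2H, 2W)
    return img.repeat(2, axis=0).repeat(2, axis=1)

def faded_output(low_res_img, new_layer_img, alpha):
    """Blend the upsampled previous-resolution output with the new
    block's output; alpha ramps 0 -> 1 as the new block 'grows in'."""
    return (1.0 - alpha) * nearest_upsample(low_res_img) + alpha * new_layer_img

low = np.ones((2, 2))        # stand-in for the previous-resolution output
new = np.full((4, 4), 3.0)   # stand-in for the new block's output
out = faded_output(low, new, alpha=0.25)
```

Early in the transition (small alpha) the network's output is dominated by the already-trained low-resolution pathway, which is what stabilizes training as capacity is added.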
S8818 - ONNX: Interoperable Deep Learning (Presented by Facebook)

We'll discuss how to transfer models seamlessly from one framework to another using open neural network exchange (ONNX) from the project's lead developer. ONNX is an open specification to provide a common intermediate representation for deep learning models. This specification and set of tools, launched by Facebook, Microsoft, and Amazon, is now supported by a community of partners that includes hardware vendors, startups, and a growing number of deep learning frameworks. The ONNX ecosystem also includes support by hardware-optimized libraries such as NVIDIA's TensorRT. ONNX is the crucial first step toward an open ecosystem that empowers AI developers to choose the most effective tools for each project and accelerate AI research to production scale. 

25-minute Talk Dmytro Dzhulgakov - Technical Lead Manager, Facebook
S8966 - Building Smarter Cities with AI-Powered Applications

Learn how Verizon is helping create safer streets, reducing traffic congestion, aiding the navigation of both vehicles and pedestrians, and reducing energy costs and consumption through AI-enabled, sensor-based networks that leverage LED street lighting infrastructure. We will discuss our Vision Zero application and how we use deep learning to recognize, detect, classify, and concurrently track vehicles in traffic, pedestrians, bicyclists, and parked cars, turning this into actionable data to help make better urban planning decisions and quantify the results.

25-minute Talk Andrew Herson - Head of Computer Vision Products, Verizon
CE8130 - Connect with the Experts: Deep Learning Training for Volta Tensor Cores

Attend this session to get your questions on mixed-precision training and inference answered. Learn more about using Tensor Cores for training and inference with popular deep learning frameworks such as TensorFlow, PyTorch, Caffe2, MXNet, and Cognitive Toolkit. Let NVIDIA experts provide guidance on implementation questions and concerns.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Hoo Chang Shin - Deep Learning Institute Certified Instructor, NVIDIA
Carl Case, NVIDIA
John Woolley, NVIDIA
Paulius Micikevicius - Compute Architecture, NVIDIA
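One core ingredient of the mixed-precision training this session covers is loss scaling. The following is a minimal numeric illustration of why it is needed, not framework code: gradients that underflow to zero in half precision survive when the loss (and hence the gradients) is multiplied by a scale factor before the fp16 step and divided back out in fp32 afterwards. The scale value here is an arbitrary example.

```python
import numpy as np

grad = 1e-8                        # a "true" small gradient value
naive = np.float16(grad)           # underflows to 0.0 in half precision
scale = 2.0 ** 14                  # example loss-scale factor
scaled = np.float16(grad * scale)  # now representable in fp16
recovered = float(scaled) / scale  # unscale in fp32 before the weight update
```

In real frameworks the scale is often adjusted dynamically, growing until overflows appear and backing off when they do.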
CE8151 - Connect with the Experts: NVIDIA vGPU Performance Engineering - How many vGPU Users Per Host Can I get?

Meet the experts from NVIDIA's Professional Visualization Performance Engineering team. Learn about our testing methodology and our game-changing benchmarking toolset for quantifying user experience in GPU-enabled VDI environments, and learn how to benchmark your own solution.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Luke Wignall - Senior Manager, Pro Viz Performance Engineering & Technical Marketing, NVIDIA
Vinay Bagade, NVIDIA
Jeremy Main, NVIDIA
CE8169 - Connect with the Experts: Performance Analysis and Optimization (2)

Come ask your GPU code optimization questions to experts in the field.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Alexey Romanenko, NVIDIA
Lei Wu, NVIDIA
Kamesh Arumugam Karunanithi, NVIDIA
Jakob Progsch - Developer Technology Engineer, NVIDIA
Peng Wang, NVIDIA
Milos Maric, NVIDIA
Alan Gray - Developer Technology Engineer, NVIDIA
S81009 - Accelerate TensorFlow Inference with New TensorRT Integration

TensorFlow is an open source software library for numerical computation using data flow graphs. NVIDIA TensorRT is an inference optimizer and runtime for production deployment. TensorRT provides optimizations for deep neural networks and uses reduced precision to increase throughput and reduce latency while maintaining accuracy. Today we announced tighter TensorRT integration in TensorFlow through new TensorFlow APIs, sub-graph optimizations, and INT8 calibration that automatically leverage Tensor Cores on Volta GPUs. TensorRT delivers 2.5x faster inference throughput compared to inference without TensorRT. In this session, NVIDIA developers will use an example-based workflow to show how to use this new capability.

25-minute Talk Julie Bernauer, NVIDIA
S8117 - Learning-Free Universal Style Transformer Universal style transfer aims to transfer arbitrary visual styles to content images. Existing feed-forward methods, while enjoying inference efficiency, are mainly limited by an inability to generalize to unseen styles or by compromised visual quality. We'll present a simple yet effective method that tackles these limitations without training on any predefined styles. The key ingredient of our method is a pair of feature transforms -- whitening and coloring -- that are embedded into an image reconstruction network. The whitening and coloring transforms reflect a direct matching of the feature covariance of the content image to that of a given style image, which shares a similar spirit with the optimization of the Gram matrix-based cost in neural style transfer. We demonstrate the effectiveness of our algorithm by generating high-quality stylized images, with comparisons to a number of recent methods. We also analyze our method by visualizing the whitened features and synthesizing textures via simple feature coloring. 25-minute Talk Chen Fang - Research Scientist, Adobe Research
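The whitening and coloring pair at the heart of this method can be sketched on flattened feature maps. The code below is a simplified NumPy illustration based on the abstract's description, not the authors' implementation: content features are whitened to approximately identity covariance via an eigendecomposition, then colored with the style features' covariance; in the full method this pair is embedded into an image reconstruction network operating on VGG features.

```python
import numpy as np

def wct(content, style, eps=1e-5):
    """Whitening-coloring transform on (channels, pixels) feature matrices."""
    def cov_power(feat, power):
        # Center the features and form a regularized covariance matrix,
        # then raise it to +1/2 (coloring) or -1/2 (whitening) via eigh.
        f = feat - feat.mean(axis=1, keepdims=True)
        c = f @ f.T / (f.shape[1] - 1) + eps * np.eye(f.shape[0])
        w, v = np.linalg.eigh(c)
        return v @ np.diag(w ** power) @ v.T, f

    whiten, fc = cov_power(content, -0.5)   # decorrelate content features
    color, _ = cov_power(style, 0.5)        # re-correlate with style statistics
    return color @ (whiten @ fc) + style.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
content = rng.normal(size=(3, 256))
style = rng.normal(size=(3, 256)) * np.array([[2.0], [0.5], [1.0]]) + 1.0
stylized = wct(content, style)
```

After the transform, the per-channel covariance of the stylized features matches that of the style features, which is the feature-statistics matching the abstract describes.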
S8287 - Building and Optimizing AI Cloud: Better Leveraging GPU in Container Cloud Infrastructure The past year's AI explosion has been accompanied by rapid growth in AI cloud and AI-as-a-service offerings. We'll report our continuing progress on bringing NVIDIA GPUs to container clouds, optimizing GPU scheduling, and running AI workloads on container clouds. First, building on the work we reported at GTC 2017, we'll present our latest progress on adding GPU features to Kubernetes, including two advanced GPU schedulers and GPU resource namespace control. This year, we brought GPU-enabled Kubernetes to IBM Cloud Private, IBM's commercial on-premise container cloud, and several other important IBM products, including the IBM AI product PowerAI Vision, while continuing to share our technology with the open source community. Second, we'll share lessons learned about how to design, manage, optimize, and operate an AI cloud, drawing on two years of product experience and user feedback. 50-minute Talk Yubo Li - Research Staff Member, IBM Research China
Seetharami Seelam - Research Staff Member, IBM Research Watson
S8316 - Multi-GPU Programming Models Do you need to compute larger problems or run faster than a single GPU allows? Learn how to scale your application to multiple GPUs, how to use the different available multi-GPU programming models, and what their individual advantages are. All programming models will be introduced using the same example, applying a domain decomposition strategy. 50-minute Talk Jiri Kraus - Senior Devtech Compute, NVIDIA
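The domain decomposition strategy used as the talk's running example can be sketched in miniature. Below, plain NumPy arrays stand in for per-GPU subdomains: each part receives one halo cell from each neighbor, then updates its interior independently with a 1D Jacobi stencil. In a real multi-GPU code the halo copies would be peer-to-peer, NCCL, or MPI transfers; everything here, including the function name, is illustrative.

```python
import numpy as np

def jacobi_step_decomposed(grid, n_parts):
    """One Jacobi averaging step on a 1D grid split into n_parts subdomains."""
    parts = np.array_split(grid, n_parts)
    padded = []
    for i, p in enumerate(parts):
        # Halo exchange: borrow one boundary cell from each neighbor
        # (domain edges simply reuse their own edge value here).
        left = parts[i - 1][-1] if i > 0 else p[0]
        right = parts[i + 1][0] if i < n_parts - 1 else p[-1]
        padded.append(np.concatenate(([left], p, [right])))
    # Interior update is now independent per subdomain -- the parallel part.
    new = [0.5 * (q[:-2] + q[2:]) for q in padded]
    return np.concatenate(new)

grid = np.array([0.0, 0.0, 4.0, 0.0, 0.0])
stepped = jacobi_step_decomposed(grid, 2)
```

Because only the one-cell halos cross subdomain boundaries, communication volume stays small relative to computation, which is what makes the decomposition scale.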
S8339 - Powering Real-Time Radio Astronomy Signal Processing with Latest GPU Architectures

We'll present a summary of ongoing work that targets the use of newer GPU architecture (Pascal and Volta) features in real-time signal processing applications in radio astronomy telescopes, and outline the future growth path for this exciting new application of GPUs. For the Pascal and Volta architectures, we'll discuss the advantages of using higher memory bandwidth, half precision, and integer arithmetic in existing GPU-based correlator pipeline code. This is an ongoing effort between the National Centre for Radio Astrophysics and NVIDIA. We'll look at the various processing stages of the pipeline to explore optimization possibilities, and highlight interesting results that were achieved. We'll address in detail the effect of using half precision with respect to accuracy, performance, and required library changes.

25-minute Talk Harshavardhan Reddy - Engineer-C, NCRA
S8417 - Breaking the Speed of Interconnect with Compression for Database Applications Learn strategies for efficiently employing various cascaded compression algorithms on the GPU. Many database input fields are amenable to compression since they have repeating or gradually increasing patterns, such as dates and quantities. Fast implementations of decompression algorithms such as RLE-Delta will be presented. By utilizing compression, we can achieve 10 times greater effective read bandwidth than the interconnect allows for raw data transfers. However, I/O bottlenecks still play a big role in overall performance, and data has to be moved efficiently in and out of the GPU to ensure an optimal decompression rate. After a deep dive into the implementation, we'll show a real-world example of how BlazingDB leverages these compression strategies to accelerate database operations. 50-minute Talk Nikolay Sakharnykh - Sr. Developer Technology Engineer, NVIDIA
Felipe Aramburu - CTO, Blazing DB
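As a rough illustration of the cascaded compression idea (this is not BlazingDB's code, and the format shown is a hypothetical simplification), an RLE-Delta-compressed column stores (base, delta, run_length) triples and expands each into an arithmetic run, which is why gently increasing fields like dates compress so well:

```python
def rle_delta_decompress(triples):
    """Expand (base, delta, run_length) triples into the original column."""
    out = []
    for base, delta, run in triples:
        # Each triple expands to base, base+delta, base+2*delta, ...
        out.extend(base + i * delta for i in range(run))
    return out

# Twelve values stored as two triples -- the kind of ratio that lets
# effective read bandwidth exceed the raw interconnect rate.
dates = rle_delta_decompress([(20180326, 1, 4), (20180401, 1, 8)])
```

On the GPU the expansion is parallelized per run (with a prefix sum locating each run's output offset), so decompression keeps pace with the incoming transfers.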
S8429 - At the Intersection of AI Cities and ANNs Artificial intelligence has promised a lot. Now decades old, it's still hard to tell the theoretical from the practical. Smart cities are a clear example of where AI technology is solving real-world problems today. Public transportation, smart grids, and data-focused city planning are all being pushed forward, but more visionary goals like safety, walkability, and improved citizens' experience remain elusive. Intersections have long been as much an opportunity as a challenge, with complicated traffic patterns and cars interacting in multiple directions and at varied velocities. Add pedestrians, and quantifying the complex interactions is only possible through advanced technology. At Motionloft, we've committed to digitizing the physical world. To bring the most meaningful data possible, we realized that we couldn't rely on the marketplace for sensors. Since there is no path to accurate data without artificial neural networks and weatherproof computer vision, we endeavored to develop our own heuristics using all tools available. We'll describe a variety of use cases this has enabled, with particular focus on traffic intersections. 25-minute Talk Paul McAlpine - VP of Engineering, Motionloft
S8593 - Doing Bayesian Deep Learning with ZhuSuan

We'll introduce the basic concepts of Bayesian deep learning in a hands-on tutorial that walks through several example applications using ZhuSuan (https://github.com/thu-ml/zhusuan). We'll start with simpler models like Bayesian logistic regression, then proceed to deeper ones like Bayesian neural networks (BNNs) and variational autoencoders (VAEs). Learn how to use Bayesian methods to capture the uncertainty of deep learning, including modeling the data distribution, calibrating the confidence of outputs, and smoothing predictions to prevent overfitting. Real problems (e.g., regression, image generation, semi-supervised classification) will be used to illustrate the models.

50-minute Talk Jiaxin Shi - PhD candidate, Tsinghua University
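The tutorial's central idea, capturing uncertainty by averaging predictions over posterior weight samples, can be sketched without ZhuSuan itself. The snippet below is a hand-rolled illustration, not ZhuSuan's API: given samples from an approximate posterior over Bayesian logistic regression weights, the predictive mean and spread are computed per input. The stand-in normal "posterior" and all names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_with_uncertainty(weight_samples, x):
    """weight_samples: (n_samples, dim); x: (dim,). Returns (mean, std).

    Each posterior weight sample gives one prediction; their mean is the
    Bayesian predictive probability and their spread is model uncertainty."""
    probs = sigmoid(weight_samples @ x)
    return probs.mean(), probs.std()

rng = np.random.default_rng(1)
# Stand-in for an approximate posterior over two weights.
samples = rng.normal(loc=[1.0, -2.0], scale=0.3, size=(1000, 2))
near_mean, near_std = predict_with_uncertainty(samples, np.array([3.0, 0.0]))
ambig_mean, ambig_std = predict_with_uncertainty(samples, np.array([0.2, 0.1]))
```

The first query sits far on one side of the decision boundary (predictive mean near 1), while the second sits on the boundary (mean near 0.5); the std quantifies how much the posterior samples disagree.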
S8603 - Deep Reinforcement Learning for Real-World Robotic Manipulation Deep reinforcement learning (deep RL) has emerged as a promising direction for autonomous acquisition of complex behaviors due to its ability to process complex sensory input and to acquire elaborate behavior skills, using general-purpose neural network representations. Since learning expressive function approximators requires large quantities of data, deep RL has been mostly applied to simulated domains, such as video games and simulated robotic locomotion and manipulation tasks, where the data collection can occur faster than real time and be trivially parallelized. We'll address techniques that have been proposed to enable deep RL for real-world robotics, and discuss how the maximum-entropy principle can be leveraged to reduce the required amount of real-world interaction. 25-minute Talk Tuomas Haarnoja - Graduate Student, UC Berkeley
S8635 - Designing Large-Scale Machine Learning Systems with NVIDIA GPUs and Mellanox Interconnect Come join us and learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We'll present state-of-the-art techniques for distributed machine learning and discuss what special requirements they impose on the system, followed by an overview of interconnect technologies used to scale and accelerate distributed machine learning, including RDMA and NVIDIA's GPUDirect technology, with a special focus on the in-network computing SHARP technology used to accelerate large-scale deployments in artificial intelligence and high performance computing. 50-minute Talk Gil Bloch - Principal Architect, Mellanox Technologies
S8637 - Analysis of Performance Gap Between OpenACC and the Native Approach on P100 GPU and SW26010: A Case Study with GTC-P We'll present our experience using OpenACC to port GTC-P, a real-world plasma turbulence simulation, to the NVIDIA P100 GPU and SW26010, the Chinese home-grown many-core processor. We also developed the GTC-P code with the native approach on the Sunway TaihuLight supercomputer so that we can analyze the performance gap between OpenACC and the native approach on the P100 GPU and SW26010. The experimental results show that the performance gap between OpenACC and CUDA on the P100 GPU is less than 10% with the PGI compiler. However, the gap on SW26010 is more than 50%, since the register-level communication supported only by the native approach can avoid low-efficiency main memory access. Our case study demonstrates that OpenACC can deliver impressively portable performance on the P100 GPU, but the lack of a software cache via RLC in the OpenACC compiler on SW26010 results in a large performance gap between OpenACC and the native approach. 25-minute Talk Stephen Wang - GPU Specialist, Shanghai Jiao Tong University
S8655 - GPU Accelerated Machine Learning for Bond Price Prediction We'll discuss our application of deep learning and classical machine learning (ML) to the prediction of bond prices. The performance gains obtained from using GPUs over conventional high-performance CPUs for the model training process will be discussed. 25-minute Talk Rafael Nicolas Fermin Cota - CEO, RN Financial Corporation
Venkat Bala - Data Scientist, RN Financial Corporation
S8748 - Simulate and Validate your DNN Inference with CATIA before ADAS Industrial Deployment One of the tough aspects of deep neural networks is validating their behavior. Although actual driving with physical cars is needed to train the neural network, there is today no tool to appropriately prepare a data acquisition campaign or perform stress validation before further on-road testing and industrial deployment. This talk will show how hardware- and software-in-the-loop on 3DEXPERIENCE CATIA can now be extended to AI-in-the-loop, with the ability to run the full system engineering simulation with the actual neural network meant to run in the autonomous vehicle, accurately reproducing the neural network inference and checking overall vehicle behavior in various conditions. Every stage, from full 3D synthetic data ingest and real-time software simulation through actual hardware-in-the-loop validation, with both use cases leveraging TensorRT GPU inference, can now consistently be proofed for appropriate in-depth understanding of the network's reactions before it drives on the road. A proof of concept showing TensorRT and DNN behavior validation will be presented in detail, opening new opportunities to validate GPU inference and to compare its actual performance impact versus CPUs. 50-minute Talk Simon Berard - Senior Strategic Planning Analyst, Dassault Systèmes
Cecile Doan - VP CATIA Strategic Planning and Market Development, Dassault Systèmes
S8834 - In-Vehicle Change Detection, Closing the Loop in the Car The world isn't static. It's constantly shifting and evolving. Mapping systems that support autonomous driving must therefore constantly detect and verify the changes that are happening in the world, in near real time, and make appropriate updates to the map. The only way for a map to obtain this level of freshness is to crowdsource data from sensors installed on vehicle fleets in order to adapt to, and match, the constantly changing environment—it needs the ability to self-heal. Yet constant creation and transmission of vehicle environment data over the air, from a large vehicle fleet, is not practical economically. Hence a strategy for minimizing the necessary bandwidth, and the subsequent cost of data transmission, is crucial. HERE Technologies is developing an in-vehicle solution to ensure autonomous vehicles have the most up-to-date HD Live Map data while minimizing bandwidth and costs for data transmission. 25-minute Talk Stephen O'Hara - Principal Research Engineer, HERE Technologies
S8849 - GE's Evolution from HPC to AI in Healthcare

For more than a decade, GE has partnered with NVIDIA in healthcare to power our most advanced modality equipment, from CT to ultrasound. Part 1 of this session will offer an introduction to the deep learning efforts at GEHC and the platform we're building on top of NGC to accelerate new algorithm development, followed by a deep dive into a case study of the evolution of our cardiovascular ultrasound scanner and its underlying extensible software stack. It will contain three main parts: (a) cardiovascular ultrasound imaging from a user perspective, which problems we need to solve for our customers, and the global impact of cardiovascular disease; (b) an introduction to the Vivid E95 and the cSound platform, GPU-based real-time image reconstruction and visualization, and how GPU performance can be translated to customer value and outcomes and has evolved the platform over the last 2½ years; (c) the role of deep learning in cardiovascular ultrasound imaging, how we are integrating deep learning inference into our imaging system, and preliminary results from automatic cardiac view detection.

50-minute Talk Keith Bigelow - VP Analytics, GE Healthcare Waukesha
Erik Steen - Chief Engineer, GE Healthcare
S8920 - Holodeck in the Enterprise Environment for Automotive and Heavy Equipment Development

Join industry experts in automotive and heavy equipment design for a 50-minute discussion to learn how they are exploring ways to use Holodeck, and VR in general, to integrate new tools into their product development environments. Our panel will discuss VR’s potential to evolve the automotive and equipment design processes. Learn how professionals are using Holodeck to bridge the gap between visualization, prototyping, and global collaboration. Get the latest information on how VR will expand use cases and overcome industry challenges, and how Holodeck and AI can potentially improve quality and decision making in tomorrow’s design environment.

50-minute Panel Galen Faidley - Sr Engineering Project Team Leader, Caterpillar Inc.
Elizabeth Baron - Virtual Reality & Advanced Visualization Technical Specialist, Ford
Philippe Lesueur - Senior Manager Digital Acceleration, Nissan
Brandon Barach - Senior Manager, Toyota
S8984 - Success in the Age of AI From healthcare to financial services to retail, businesses are seeing unprecedented levels of efficiency and productivity, which will only continue to rise and transform how companies operate. This session will look at how Accenture as an enterprise is optimizing itself in the age of AI, as well as how it guides its customers to success, sharing best practices, insights, and measurements to help the audience inform their AI roadmap and journey. 50-minute Talk Michael Sutcliff - CEO, Accenture Digital, Accenture
S8999 - How GPU Server Architectures Deliver Increased Productivity for Deep Learning Training Workloads & HPC Customers (Presented by Supermicro)

An overview of GPU hardware platforms designed for today's taxing AI/machine learning and HPC workloads, including custom solutions targeted at deep learning inference and deep learning training. The talk will cover systems based on PCIe GPUs as well as GPU systems with the NVLink interface.

50-minute Talk Jason Pai - Director, Super Micro Computer, Inc.
Sarosh Irani - Principal Product Manager, Supermicro
S81010 - The Real-Time Revolution GPU-accelerated creative development platforms are no longer just for games; they're revolutionizing areas from film to automotive. See how Unity is being used to enable unheard-of levels of productivity and create even deeper collaboration between teams. 25-minute Talk Adam Myhill - Head of Cinematics, Unity
S8156 - Deep Learning for Transportation: Fast Estimation of Travel Times Using Historical Routes During this presentation, we will review a deep neural network architecture and the training approaches used for producing a high volume of travel time estimates on a road graph with historical routes and traffic. This includes initial and continuous online training, finding various sources to produce training data, challenges of quality control, and, of course, the invaluable role of GPUs for computation during both training and inference. 25-minute Talk Dmitry Kudinov - Senior Data Scientist, Esri Inc.
S8522 - DirectX: Evolving Microsoft's Graphics Platform For over 20 years, DirectX has been the platform used by game developers to create the fastest, most visually impressive games on the planet. Come and learn our plans to deliver the next generation of graphics innovation. 50-minute Talk Matt Sandy - Program Manager, Microsoft
S8618 - GPU Accelerated LIDAR Based Localization for Automated Driving Applications We'll discuss pattern matching techniques for vehicle localization on GPUs. The audience will gain insights into the problems that arise when pattern matching sensor data, and the challenges these problems pose in GPU implementation. We'll elaborate on techniques used in accelerating GPU localization solvers, yielding up to 10X speedups. 25-minute Talk Praveen Narayanan - Research Scientist, Ford Motor Company
S8787 - Differentiable Tree Planning for Deep Reinforcement Learning We'll discuss recent research in deep reinforcement learning (RL), with a focus on applying intuitions from planning to neural network architectures for deep RL. Planning in complex visual environments has thus far been held back by the difficulty of learning accurate predictive models. To address this, we embedded a model inside a differentiable, dynamically constructed tree-planning architecture, so that the model we identify is effective when used within that planner. We'll share our work on developing these architectures, as well as our approaches to various technical obstacles associated with the efficient optimization of deep tree-structured models on GPUs. 50-minute Talk Gregory Farquhar - DPhil Candidate, University of Oxford
S8836 - The New Era of Investments

We'll discuss Qraft Technologies' plans to deliver: 1) the remarkable performance Qraft's AI engines have achieved in the financial industry; 2) the technology concepts the AI engines use to generate strategic investment portfolios. Qraft provides materials that include actual examples of a robo-fund, where AI is used to create a mutual fund; a robo-advisor, where AI recommends an optimal portfolio of mutual funds that fully reflects an investor's propensity; and other important achievements Qraft has obtained in the financial industry. Qraft is constructing an ecosystem of AI in investment that includes well-known institutions and researchers around the world.

25-minute Talk Hyung Sik - Chief Executive, Qraft Technologies
S8847 - Solar Storm Modeling using OpenACC: From HPC Cluster to "In-House" We explore using OpenACC to migrate applications required for modeling solar storms from CPU HPC clusters to an "in-house" multi-GPU system. We describe the software pipeline and the use of OpenACC in the computationally heavy codes. A major step forward is the initial implementation of OpenACC in our magnetohydrodynamics code MAS. We discuss strategies for overcoming some of the difficulties encountered, including handling Fortran derived types, array reductions, and performance tuning. Production-level time-to-solution results will be shown for multi-CPU and multi-GPU systems of various sizes. The timings show that it is possible to achieve acceptable times-to-solution on a single multi-GPU server or workstation for problems that previously required multiple HPC CPU nodes. 25-minute Talk Ronald Caplan - Computational Scientist, Predictive Science Inc.
S8914 - Automating the Last Mile

Self-driving vehicles will transform every aspect of how we work and play. Humanity spends 500 million hours each day driving to and from the grocery store. The impact of automating these tasks is huge. Marble is building self-driving delivery vehicles to give you back this time and make delivery a delightful experience. I'll talk about why delivery is a good application of robotics, and how deep learning enables us to automate driving.

50-minute Talk Kevin Peterson - Cofounder, Marble
CE8119 - Connect with the Experts: Data Analytics and Machine Learning (2)

Join us in the hangout area to get your technical questions about optimizing data analytics pipelines and machine learning algorithms answered by NVIDIA experts. Learn about the latest capabilities for accelerating entire data analytics pipelines, from databases to analytic algorithms, machine learning, and graph analytics. How can GPUs excel at data-intensive workloads like complex data analytics tasks? We'll demonstrate by example how to accelerate critical components, covering benchmarks, tools, frameworks, and more. Related presentations: S8289 - How to Get the Most out of GPU Accelerated Database Operators; S8417 - Breaking the Speed of Interconnect with Compression for Database Applications; S8502 - GOAI One Year Later

1 Hour Connect with the Experts Jiri Kraus - Senior Devtech Compute, NVIDIA
Tim Kaldewey - Senior Manager Developer Technology for AI and Data Analytics, NVIDIA
Hugo Braun - Content Tech, NVIDIA
Shankara Rao Thejasw Nanditale - Compute Devtech Engineer, NVIDIA
CE8129 - Connect with the Experts: Advanced Deep Learning

Attend this session to get your technical questions about NN architecture and scaling DL applications answered. Learn more about strategies you can employ to explore the right neural network architectures for your problem and to train at scale, converging to your solution faster. NVIDIA deep learning research and HPC experts can provide the right guidance to maximize the performance and accuracy of your DL-based solution.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts Michael Houston, NVIDIA
Michelle Gill, NVIDIA
Thomas Cullison, NVIDIA
Evan Acharya, NVIDIA
CE8152 - Connect with the Experts: Deep Learning Basics (3)

Attend this session to get your questions on deep learning basics and concepts answered. NVIDIA experts can help you with the fundamentals and provide guidance on how and when to apply Deep Learning and GPUs to your work. No question is too basic to ask.

Connect with the Experts sessions are informal opportunities to ask experts from NVIDIA and other organizations your burning questions about a specific subject.

1 Hour Connect with the Experts Adam Thompson - Senior Solutions Architect, NVIDIA
Philippe Vandermersch, NVIDIA
Manish Gupta, NVIDIA
Hassan Kianinejad, NVIDIA
Kevin Vincent, NVIDIA
Yang Xu, NVIDIA
L8142 - Neural Network Deployment with DIGITS and TensorRT Prerequisites: Image Classification with DIGITS

Duration: 2 hours

Framework: Caffe with DIGITS and TensorRT

Deep learning lets us map inputs to outputs through models that are extremely computationally intense. Learn to deploy deep learning in applications that recognize images and detect pedestrians in real time by:

• Accessing and understanding the files that make up a trained model

• Building from each function's unique input and output

• Optimizing the most computationally intense parts of your application for different performance metrics like throughput and latency

Upon completion of this lab, you'll be able to implement deep learning to solve problems in the real world.

Presented by the NVIDIA Deep Learning Institute (DLI).
120 Minutes Instructor-Led Lab Mike Mendelson - Deep Learning Institute Curriculum Developer, NVIDIA
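The lab's third bullet contrasts throughput and latency as deployment metrics. As a hedged illustration (not the lab's code — the `dummy_infer` model and its timing constants are invented here), a plain-Python sketch shows how batching trades one for the other:

```python
import time

def dummy_infer(batch):
    # Stand-in for a deployed network: a fixed per-call overhead plus a
    # small per-image cost. Both constants are made up for illustration.
    time.sleep(0.010 + 0.001 * len(batch))
    return [x * 2 for x in batch]

def measure(batch_size, n_images=64):
    images = list(range(n_images))
    start = time.perf_counter()
    for i in range(0, n_images, batch_size):
        dummy_infer(images[i:i + batch_size])
    elapsed = time.perf_counter() - start
    # Latency: roughly how long one request waits for its batch to finish.
    latency_ms = 1000.0 * elapsed * batch_size / n_images
    throughput = n_images / elapsed  # images per second overall
    return latency_ms, throughput

for bs in (1, 8, 32):
    lat, thr = measure(bs)
    print(f"batch={bs:2d}  latency~{lat:5.1f} ms  throughput~{thr:6.1f} img/s")
```

Larger batches amortize per-call overhead (higher throughput) at the cost of each image waiting longer for its batch (higher latency); an inference optimizer like TensorRT exposes the same trade-off through its batch size and precision settings.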
L8165 - The Necessity of Explainability Explained Because of its success, the use of AI is expanding and is now being applied to tasks where the consequences of errors can be very unpleasant or even dangerous. Unfortunately, it is difficult to understand what AI is really recognizing: it may not be recognizing what you expect and can be easily fooled. This lab will demonstrate that a high level of accuracy does not guarantee that a network performs recognition as you may expect. We'll demonstrate a new method, developed by Optimizing Mind and usable with any type of neural network, that reveals the patterns a network is looking for by deriving them internally from the learned network weights. You will be able to evaluate a network, see what patterns it looks for, and manipulate the patterns to create adversarial examples. 120 Minutes Instructor-Led Lab Tsvi Achler - CTO, Optimizing Mind
L8168 - Image Generation Using CycleGAN AI can automatically re-render every horse as a zebra, while the same process can generate satellite imagery from any map. The same AI can take a sprite sheet and generate a sheet with a different theme for automatic digital asset creation. In this lab, you will learn how to: • Use image analogies to translate image to image • Create an autoencoder architecture using an encoder, transformer, and decoder • Employ a PatchGAN discriminator to complete the generative adversarial network After completion of this lab, you will be able to automatically create analogous images using CycleGAN. Prerequisites: Fundamentals of Deep Learning with Computer Vision or similar experience 120 Minutes Instructor-Led Lab Kelvin Lwin - Certified Instructor, NVIDIA
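The training signal that makes unpaired translation like this work is cycle consistency: translate a sample to the other domain and back, and penalize the reconstruction error. A minimal numpy sketch of that term, with toy invertible linear maps standing in for CycleGAN's convolutional generators (everything here is invented for illustration):

```python
import numpy as np

# Toy "generators": G maps domain A -> B, F maps B -> A.
# Real CycleGAN generators are deep conv nets; these linear maps
# just make the cycle-consistency idea concrete.
def G(a):  # A -> B
    return 2.0 * a + 1.0

def F(b):  # B -> A
    return (b - 1.0) / 2.0  # exact inverse of G, so the cycle loss is ~0

def cycle_consistency_loss(a, b):
    # L1 norm of the "go there and back" reconstruction errors,
    # averaged over the batch, as in the CycleGAN objective.
    forward = np.mean(np.abs(F(G(a)) - a))   # a -> B -> back to A
    backward = np.mean(np.abs(G(F(b)) - b))  # b -> A -> back to B
    return forward + backward

a = np.random.rand(4, 8)  # batch of domain-A samples
b = np.random.rand(4, 8)  # batch of domain-B samples
print(cycle_consistency_loss(a, b))  # near zero, since F inverts G exactly
```

In the real model this term is added to the adversarial losses of both PatchGAN discriminators, which is what keeps the translations faithful to the input rather than arbitrary members of the target domain.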
L8176 - Deployment of Semantic Segmentation Network Using TensorRT NVIDIA TensorRT is a high-performance neural network inference engine for production deployment of deep learning applications. You will learn how to: • Profile inference performance using DRIVE PX • Optimize using giexec or your own executable • Take a deep dive into the INT8 calibration workflow Upon completion, you will know how to use TensorRT to optimize, validate, and deploy a trained neural network for inference in a self-driving car application. Prerequisites: Fundamentals of Deep Learning with Computer Vision or similar experience 120 Minutes Instructor-Led Lab Joohoon Lee - Certified Instructor, NVIDIA
L8181 - Deep Learning for Genomics using DragoNN with Keras and Theano Learn to interpret deep learning models to discover predictive genome sequence patterns. Use the DragoNN toolkit on simulated and real regulatory genomic data to: • Demystify popular DragoNN (Deep RegulAtory GenOmics Neural Network) architectures • Explore guidelines for modeling and interpreting regulatory sequence using DragoNN models • Identify when DragoNN is a good choice for a learning problem in genomics and how to obtain high-performance models Upon completion, you'll be able to use the discovery of predictive genome sequence patterns to gain new biological insights. 120 Minutes Instructor-Led Lab Steven Steinke - Curriculum Developer, NVIDIA
Yonatan Israeli - Consultant, NVIDIA
S8152 - Meet Horovod: Uber's Open Source Distributed Deep Learning Framework for TensorFlow

Horovod makes it easy to train a single-GPU TensorFlow model on many GPUs, both on a single server and across multiple servers. We'll cover Uber's explorations of distributed deep learning, how to use Horovod, and what kind of performance you can get on standard models such as Inception V3 and ResNet-101. Learn how to speed up training of your TensorFlow model with Horovod.

25-minute Talk Alexander Sergeev - Senior Software Engineer, Uber
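The operation at the heart of Horovod is allreduce: each worker computes gradients on its own data shard, and after the allreduce every worker holds the mean gradient. This numpy sketch only simulates that end result (real Horovod runs a bandwidth-optimal ring allreduce over NCCL or MPI); the function and worker setup are invented for illustration:

```python
import numpy as np

def simulated_allreduce(grads):
    """Average one gradient tensor across simulated workers.

    Horovod performs this as a ring reduce-scatter followed by an
    allgather; here we just compute what every worker ends up holding.
    """
    n = len(grads)
    averaged = np.sum(grads, axis=0) / n  # gradient averaging for synchronous SGD
    return [averaged.copy() for _ in range(n)]  # identical copy on every worker

# Four "workers" compute different local gradients on their data shards.
local_grads = [np.full(3, float(rank)) for rank in range(4)]
synced = simulated_allreduce(local_grads)
print(synced[0])  # every replica now steps with the mean gradient: [1.5 1.5 1.5]
```

Because every replica applies the same averaged gradient, the model weights stay identical across workers without a central parameter server, which is what lets this scheme scale across servers.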
S8263 - The Journey from a Small Development Lab Environment to a Production Datacenter for Deep Learning Applications

We'll dive deep into best practices and real-world examples of leveraging the power and flexibility of local GPU workstations, such as the DGX Station, to rapidly develop and prototype deep learning applications. We'll demonstrate the setup of our small lab, which is capable of supporting a team of several developers and researchers, and our journey as we moved from lab to data center. Specifically, we'll walk through our experience building the TensorRT Inference Demo, aka Flowers, used by Jensen to demonstrate the value of GPU computing at GTCs worldwide. As an added bonus, get first-hand insights into the latest advancements coming to AI workstations this year. The flexibility for fast prototyping provided by our lab was an invaluable asset as we experimented with different software and hardware components. As the models and applications stabilized and we moved from lab to data center, we were able to run fully load-balanced over 64 V100s, serving video inference that demonstrated software-in-the-loop (SIL) ReSim capabilities for autonomous vehicles at GTC Europe. Real live examples will be given.

50-minute Talk Ryan Olson - Solutions Architect, NVIDIA
Markus Weber - Senior Product Manager, NVIDIA
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and Engineering Teams The Krylov Project is the key component of eBay's AI Platform initiative, providing an easy-to-use, open, and fast AI orchestration engine deployed as managed services in eBay's cloud. The main goals of the project are: every AI and machine learning algorithm should be shareable and easily implementable with a choice of frameworks; machine learning engineers should be able to build end-to-end training pipelines that distribute and parallelize over many machines; model training should be automated and allow easy access to vast eBay datasets; and engineers should be able to search past job submissions, view results, and share them with others. We built Krylov from the ground up, leveraging the JVM, Python, and Go as the main technologies for its components, while standing on the shoulders of giants such as Docker, Kubernetes, and Apache Hadoop. Using Krylov, AI scientists can access eBay's massive datasets; build and train AI models; spin up powerful compute (high-memory or GPU instances) on the Krylov HPC cluster; and set up machine learning pipelines, using declarative constructs that stitch together the pipeline lifecycle. 50-minute Talk Henry Saputra - Technical Lead, eBay
S8310 - Can FPGAs Compete with GPUs?

Previously, FPGAs were known to be highly energy efficient, but notoriously difficult to program, and unsuitable for complex HPC applications. This is changing due to new technology developments: a high-level programming language (OpenCL), hard floating-point units, and tight integration with CPU cores. We'll compare FPGAs and GPUs with respect to architecture, programming model, programming effort, performance, and energy efficiency, using some radio-astronomical signal-processing and imaging algorithms as examples. Can they compete with GPUs?

25-minute Talk John W. Romein - Senior Researcher, ASTRON (Netherlands Institute for Radio Astronomy)
S8330 - Protecting Pulsed High-Power Lasers with Real-Time Image Classification

Learn how to combine computer vision techniques and deep learning to improve the sensitivity of a real-time, GPU-powered safety system. In petawatt laser systems firing at 10 Hz, suddenly appearing scatterers can damage components. The spread of damage can be avoided by suspending operation immediately when such an event occurs. We'll present our approach for automatically detecting critical failure states from intensity profiles of the laser beam. By incorporating quick feature detection and learned heuristics for feature classification, both real-time constraints and the limited available training data are accommodated. Localizing the triggering feature is crucial for cases where the problem lies in non-sensitive sections and will not be removed from the beam in production.

25-minute Talk Jeffrey Kelling - Researcher, Helmholtz-Zentrum Dresden - Rossendorf
S8344 - OpenMP on GPUs, First Experiences and Best Practices OpenMP has a long history on shared memory, CPU-based machines, but has recently begun to support offloading to GPUs and other parallel accelerators. This talk will discuss the current state of compilers for OpenMP on NVIDIA GPUs, showing results and best practices from real applications. Developers interested in writing OpenMP codes for GPUs will learn how best to achieve good performance and portability. 50-minute Talk Jeffrey Larkin - Senior DevTech Software Engineer, NVIDIA
S8383 - SmartSense: Real-Time, Field-Deployed CV Traffic Analysis System Miovision presents a video-based traffic analytics system, capable of tracking and classifying vehicles in real time throughout cities. The system leverages Jetson TX2 modules and inferencing to accurately classify vehicles at over 50 frames per second using single-shot multibox detection and DAC, a VGG-based network. We'll cover many of the issues our teams went through to design and implement the system, including data collection, annotation, training, incorporating continuous training, and deep learning iteration. We'll also illustrate how the measured traffic trends were used to reduce congestion and evaluate the health of traffic corridors. 25-minute Talk Justin Eichel - Technical Director, Miovision
S8398 - Designing Human Centric Spaces with Holodeck and Machine Learning The growth in housing density in cities like London and New York has resulted in higher demand for efficient, smaller apartments. These designs challenge the use of space and function while trying to ensure occupants perceive a larger space than is provided. The process of designing these spaces has always rested on the judgment and perception of a handful of designers using static 2D and 3D platforms as part of the overall building design and evaluation, typically constrained by a prescriptive program and functional requirements. A combination of human- and AI-based agents creating and testing these spaces through design and virtual immersive environments (NVIDIA Holodeck) will attempt to ensure the final results are efficient and best fit for human occupancy prior to construction. 25-minute Talk Cobus Bothma - Applied Research Director, KPF
Xin Zhang - BIM Specialist, Kohn Pedersen Fox Associates
S8412 - Deep Imaging: Quantitative Biomarkers for Clinical Decision Making

The transformation towards value-based healthcare needs inventive ways to lower cost and increase patient health outcomes. Artificial intelligence is vital for realizing value-based care. Turning medical images into biomarkers helps to increase effectiveness of care.

25-minute Talk Razvan Ionasec - Global Product Manager for Artificial Intelligence, Siemens Healthineers
S8484 - Blazing Fast SQL Analytics on Your Data Lake Extract analytical value out of your enterprise data lake with a state-of-the-art GPU SQL analytics engine. As businesses continue to consolidate massive datasets into data lake technologies (HDFS, AWS S3, Azure Blob, etc.), they find themselves unable to fully leverage the value these lakes hold. Data engineering departments need to produce unique, costly ETL processes for every dataset and every tool that hopes to interact with that dataset. At BlazingDB, we've built an analytics engine that runs SQL directly on open source file formats inside data lakes, currently BlazingDB's Simpatico and Apache Parquet. These file formats can be easily accessed from a variety of different tools, limit duplication of large volumes of data, and support improved data governance. Learn strong practices for ensuring your data lake doesn't turn into a swamp and how to extract the full value of your data lake investment. 25-minute Talk Rodrigo Aramburu - CEO, BlazingDB
William Malpica - VP of Engineering, BlazingDB
S8489 - Scaling Molecular Dynamics Across 25,000 GPUs on Sierra & Summit As a part of the Department of Energy/National Cancer Institute pilot programs and the Sierra Institutional Center of Excellences, Lawrence Livermore National Laboratory has developed strong scaling molecular dynamics codes for atomic-level simulation in physics, materials science, and biology. Our implementation is portable from tablets and laptops to supercomputers, and can efficiently scale up to tens of thousands of GPUs. In particular, we target the Department of Energy leadership computing facilities, Sierra and Summit, at the Livermore and Oak Ridge National Laboratories. These are over 100-petaflops supercomputers powered by IBM and NVIDIA hardware. We'll discuss the performance and scaling of our code, and its application to cancer biology research, material science, and high-energy physics. 50-minute Talk Shiv Sundram - Scientific Software Developer, Lawrence Livermore National Laboratory
Tomas Oppelstrup - Staff Scientist, Lawrence Livermore National Laboratory
S8562 - The Future of AI for Media & Entertainment AI has already had a major impact on Media & Entertainment – from connecting people with relevant content, to video analytics and dynamic distribution. Join our panelists to gain high-level insights about new ways AI will impact the Film, Television, AR/VR, and Broadcast industries. We'll discuss advancements in content creation, dynamic delivery, and intelligent interactivity. 50-minute Panel Munika Lay - Director, Strategy & Business Development, End Cue
Vicki Dobbs Beck - Executive in Charge, ILMxLAB
Shalini De Mello - Senior Research Scientist, NVIDIA
Marcie Jastrow - SVP of Immersive Media, Technicolor
Rick Champagne - Global Media & Entertainment Strategy and Marketing, NVIDIA
S8614 - Digital Twin for the Railway Network We describe the concept of a digital twin for the railway network. Railroad customers across the world manage thousands of miles of track infrastructure, consisting of rails, ballast, ties, bridges, tunnels, wayside equipment, etc. This talk demonstrates a new approach to track infrastructure monitoring that GE is piloting for customers using the digital twin concept. Using an offline GPU infrastructure, deep learning models are created and trained on large volumes of video data to learn the state of healthy track and predict anomalies. During the talk, real customer use-case videos will show analytics on video from locomotive-mounted cameras, with deep learning models calculating a health index displayed on a map to drive maintenance decisions. 50-minute Talk Dattaraj Rao - Principal Architect, General Electric
S8751 - Bringing Data to Life - Data Management and Visualization Techniques

We'll give a practical overview of popular data ingestion and pre-processing techniques used today, and their financial implications. We'll provide creative techniques for using GPU database technology to better understand financial industry data, focusing on the use of Spark, Alluxio, Arrow, NiFi, Sqoop, Kafka, TensorFlow Datasets, and GPU database techniques throughout the different phases of data management and analysis.

50-minute Talk Benika Hall - Analytics Consultant, Wells Fargo
Rob Harrison - Analytics Consultant, Wells Fargo
S8921 - Development of a Self-Learning AI-Based L4 Vehicle - The Dream Car

The development of self-driving cars requires a stronger relationship between partners than we know today; this might be the only way to successfully bring self-driving vehicles to the road. ZF, Virtual Vehicle, and NVIDIA have joined forces to develop an AI-based L4 vehicle for urban scenarios in only six months: the so-called dream car. Learning while sleeping is the groundbreaking idea behind the dream car, which was realized in the second half of 2017. Without driving around, the car constantly learns and adapts itself based on data acquired from other cars driving elsewhere in the world. The key is AI and ZF's ProAI, which was developed with NVIDIA over the past year. ProAI interprets the data in real time, learns from it, validates it, checks its plausibility, and adjusts the vehicle's behavior. We'll summarize the implementation steps, hardware and software architecture, relevant driving and testing scenarios, our AI approach, and the challenges met in realizing the dream car.

25-minute Talk Oliver Briemle - Head of L4 Feature Development, Domain Control and V2X, ZF
Daniel Watzenig - Head of Department and Full Professor, Virtual Vehicle
S8953 - AI for Social Good as an Innovation Driver Innovation can take many forms and be led by varying stakeholders across an organization. One successful model is using AI for social good to drive a proof of concept that advances a critical strategic goal. The Data Science Bowl (DSB) is an ideal example: launched by Booz Allen Hamilton in 2014, it galvanizes thousands of data scientists to participate in competitions that have far-reaching impact across key industries such as healthcare. This session will explore the DSB model, as well as look at other ways organizations are using AI for social good to create business and industry transformation. 50-minute Panel Richard Wender - Chief Cancer Control Officer, American Cancer Society
Ben Hamner - Cofounder and CTO, Kaggle
Josh Sullivan - Senior Vice President, Booz Allen Hamilton
Catherine Ordun - Senior Data Scientist, Booz Allen Hamilton
S8222 - Deep Learning for Heliophysics NASA's heliophysics division operates a fleet of spacecraft, the so-called Heliophysics System Observatory, to monitor the Sun's activity and how its changes drive space weather in interplanetary space and in the near-Earth environment. We'll present case studies of how a number of challenging problems encountered in heliophysics can be tackled using deep learning: spectropolarimetric inversions for measuring the magnetic field on the solar surface, and mega-Kelvin thermometry of the Sun's corona by using a deep neural network to solve a compressed sensing problem. These low-cost solutions make possible new concepts for deep space missions for space weather monitoring. Some of the work in this presentation was made possible by NASA's Frontier Development Lab, a public-private partnership between the agency and industry partners (including the SETI Institute, NVIDIA, IBM, Intel, kx & Lockheed Martin), whose mission is to use artificial intelligence to tackle problems related to planetary defense and heliophysics. 25-minute Talk Mark Cheung - Staff Physicist, Lockheed Martin Solar & Astrophysics Laboratory
S8223 - Simulating a City: GPU Simulations of Traffic, Crowds and Beyond Learn how to simulate transportation systems and crowds for smart city applications at massive scale. This talk will give insights into novel algorithms and techniques being applied to: 1) national (entire UK) scale road network flow simulations; 2) city-sized simulations of intelligent, individually modelled vehicles; and 3) integrated simulations of national infrastructure with pedestrian crowds, vehicles, and rail. Examples of techniques include low-density high-diameter graph traversal, multi-agent simulation, and virtual reality interaction using the OmniDeck treadmill and the Oculus Rift. 25-minute Talk Paul Richmond - Research Software Engineering Fellow and Senior Lecturer, University of Sheffield
S8312 - Learning Affinity via Spatial Propagation Networks We provide a unified framework for learning affinity in a purely data-driven fashion using a linear propagation structure. This is a GPU- and deep-learning-friendly pairwise learning module that does not require solving linear equations, iterative inference, or manually defined kernels. Specifically, we develop a three-way connection for the linear propagation model, which formulates a sparse transformation matrix, in which all elements can be the output of a deep CNN, but results in a dense affinity matrix that effectively models any task-specific pairwise similarity matrix. The spatial propagation network can be applied to many affinity-related tasks, such as image matting, segmentation, and colorization, to name a few. Essentially, the model can learn semantically aware affinity relations for high-level vision tasks thanks to the powerful learning capability of the deep CNN. We validate the framework on the task of refining image segmentation boundaries. Experiments on face parsing and semantic segmentation tasks show that the spatial propagation network provides a general, effective, and efficient solution for generating high-quality segmentation results. 25-minute Talk Sifei Liu, NVIDIA
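The linear propagation idea behind this kind of module can be illustrated in one dimension. The sketch below is a deliberately simplified, one-directional, scalar-gated version (the actual network uses CNN-predicted three-way connections swept in four directions); the function name and values are invented for illustration:

```python
import numpy as np

def propagate_left_to_right(x, p):
    """One-directional linear propagation along a row:

        h[i] = (1 - p[i]) * x[i] + p[i] * h[i-1]

    In a spatial propagation network the gates p come from a deep CNN.
    Even though each step only touches a neighbor, the recurrence lets
    every output depend on every input to its left, which is why the
    implied affinity matrix is dense despite the sparse transformation.
    """
    h = np.empty_like(x)
    h[0] = x[0]
    for i in range(1, len(x)):
        h[i] = (1.0 - p[i]) * x[i] + p[i] * h[i - 1]
    return h

x = np.array([1.0, 0.0, 0.0, 0.0])   # a single "seed" value at the left
p = np.array([0.0, 0.8, 0.8, 0.8])   # strong propagation gates
print(propagate_left_to_right(x, p)) # the seed diffuses rightward, decaying geometrically
```

Refining a segmentation boundary with such a module amounts to letting confident labels diffuse into uncertain neighboring pixels wherever the learned gates say the pixels are similar.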
S8519 - New Features in OptiX

NVIDIA OptiX is a sophisticated library for performing GPU ray tracing in many domains, ranging from film rendering to acoustic modeling to scientific visualization. Version 5.0 adds many new features, including motion blur, MDL support, and an AI-accelerated denoiser. We'll discuss the newest features in the OptiX SDK, diving deep into some of them with code samples and integration tips.

50-minute Talk Ankit Patel - Senior Product Manager, NVIDIA
Jesse Lacewell - DevTech, NVIDIA
S8563 - Building a GPU-Focused CI Solution

As the number of GPU-accelerated applications has multiplied, the need for better development tools and services has grown as well. Chief among such services is continuous integration (CI), which dramatically improves and speeds up the development life cycle through automated builds and integration testing. CI for GPU-accelerated applications comes with its own set of challenges, but the rewards can be enormous. We'll walk through how we implemented CI for the NVIDIA GPU Cloud by leaning on open source solutions such as Jenkins, discuss the lessons we learned in the process, and demonstrate how similar systems should be built in the future.

25-minute Talk Michael Wendt - Manager, Applied Engineering Solutions, NVIDIA
S8649 - VR and AI in the Hospitality Industry

Virtual reality and artificial intelligence are keys to revolutionizing the hospitality industry. From how hotels are designed, to how guests shop for their rooms, to the complete gamut of on-premise experiences, the entire hospitality experience is on the cusp of change. At The Gettys Group, we're embracing VR throughout our design projects, and we've begun exploring how AI can simultaneously enhance guest experiences and reduce hotel staffing costs. In this presentation, we'll share examples of our new VR-enhanced workflows, highlighting how we're leveraging NVIDIA's VR-Ready systems and the new Holodeck platform to accelerate our processes and win new business. We'll conclude with our wish list for additional immersive experience functionality and our thoughts on how this revolution will affect the broader travel industry.

25-minute Talk Stephen Phillips - Chief Technology Officer, Theia Interactive
Ron Swidler - Principal, The Gettys Group
S8700 - Unlocking Access to HD Maps for Autonomous Driving

Autonomous vehicles require highly accurate, up-to-date maps for a safe, comfortable and optimized experience. TomTom's multi-source, multi-sensor approach leads to HD Maps that have greater coverage, are more richly attributed, and have higher quality than single-source, single-sensor maps. Autonomous vehicles also need to be able to access the latest, most up-to-date HD Maps with minimal latency. Learn how TomTom is taking on this challenge.

25-minute Talk Willem Strijbosch - Head of Autonomous Driving, TomTom
S8795 - Research To Production: How Facebook does AI at Scale (Presented by Facebook)

Facebook's strength in AI innovation comes from the ability to quickly bring cutting-edge research into large scale production using a multi-faceted toolset. We'll discuss how Facebook leverages open source software to perform truly iterative AI research, scale it seamlessly for inference, and deploy it across the data center and mobile environments with ONNX. 

50-minute Talk Howard Mansell - Engineering Manager, Facebook AI Research
Sarah Bird - Technical Program Manager, Facebook
S8844 - Applied AI: Inference on Live Data from Autonomous UAVs on Industrial Job Sites

Using a live video stream from a UAV, we perform real-time edge inference to detect and classify active machinery and heavy equipment on mining and construction sites. Our goal is to show a foundation for applied AI on industrial sites that connects to a necessary workflow, giving site supervisors real-time updates, automatic analysis, and automated responses that benefit the productivity and efficiency of their sites.

What motivates this kind of real-time detection system? As jobsites scale and become increasingly complex, with more machines, people, and assets, there is too much data for a human to track and process, preventing industrial site managers from getting full oversight and control. Sensors mounted on drones to capture data, combined with AI systems that break this data down into actionable insights, give us the power to provide visibility and quantified information on activity occurring at every moment on the site. This allows supervisors to get real-time updates on key factors like jobsite progress, material deliveries, changes or delays to the schedule, safety issues, and so on.

50-minute Talk Angela Sy - Head of AI, Skycatch, Inc.
S8871 - AI Models to Clinical Practice: Open AI Marketplace for Diagnostic Imaging

Learn from a radiologist about the importance of clinical domain expertise in AI algorithm/model development and its incorporation into clinical workflow, specifically in medical imaging. With growing media attention, there is much fear, hype, and hope when it comes to using DL in radiology. We will present, through examples, why it is essential to incorporate clinical domain expertise when developing DL models, and demonstrate the various ways AI can augment radiologists, both in image interpretation and in the overall workflow beyond it. In the second portion of this talk, we will address the gap between developing a great AI model in isolation and having it become part of daily medical practice. From integration and hospital connectivity to serving algorithms at scale to meet growing demand, we will show how an AI Marketplace can create the ecosystem that allows AI to flourish.

25-minute Talk Woojin Kim - Chief Medical Information Officer, Nuance Communications
Arman Sharafshahi - Engineering Director, Nuance Communications
S8899 - Scaling Deep Learning for Immersive User Interfaces

Deep learning advances follow a virtuous recipe: model architecture search, creating large training datasets, and scaling computation. Baidu Research's Silicon Valley AI Lab develops state-of-the-art conversational user interfaces following this DL recipe. We research new model architectures and features for speech recognition (Deep Speech 3), speech generation (Deep Voice 3), and natural language processing. To deploy these models in impactful products, we need a deep understanding of how the recipe's components combine to drive accuracy improvements. Through large-scale empirical studies, we find intriguing results about how deep learning is likely to scale: as training set size increases, DL generalization error and model size follow particular power-law relationships, and for a fixed dataset size, training time remains roughly constant as model size grows -- larger models require fewer steps to converge to the same accuracy. These scaling relationships have significant implications for DL research, practice, and systems: they can assist model debugging, help set accuracy targets, and inform decisions about dataset growth and future computing system design.

50-minute Talk Joel Hestness - Systems Research Scientist, Baidu Research
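The power-law scaling the abstract describes can be illustrated with a small sketch: if generalization error follows error(N) ≈ a·N^(−b), then a linear fit in log-log space recovers the exponent b. The constants and training-set sizes below are synthetic placeholders, not Baidu's measurements:

```python
import numpy as np

# Hypothetical power-law scaling of generalization error with training
# set size N: error(N) = a * N**(-b). Values here are illustrative only.
a_true, b_true = 5.0, 0.35
train_sizes = np.array([1e4, 1e5, 1e6, 1e7])
errors = a_true * train_sizes ** (-b_true)

# On a log-log plot a pure power law is a straight line, so a linear
# fit to (log N, log error) recovers the scaling exponent as -slope.
slope, intercept = np.polyfit(np.log(train_sizes), np.log(errors), 1)
exponent = -slope  # recovers b_true = 0.35
```

Fitting this exponent on pilot-scale runs is what makes such relationships practically useful: it lets teams extrapolate how much additional data or compute a target accuracy would require.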