No
Yes
View More
View Less
Working...
Close
OK
Cancel
Confirm
System Message
Delete
My Schedule
An unknown error has occurred and your request could not be completed. Please contact support.
Scheduled
Wait Listed
Personal Calendar
Speaking
Conference Event
Meeting
Interest
There aren't any available sessions at this time.
Conflict Found
This session is already scheduled at another time. Would you like to...
Loading...
Please enter a maximum of {0} characters.
{0} remaining of {1} character maximum.
Please enter a maximum of {0} words.
{0} remaining of {1} word maximum.
must be 50 characters or less.
must be 40 characters or less.
Session Summary
We were unable to load the map image.
This has not yet been assigned to a map.
Search Catalog
Reply
Replies ()
Search
New Post
Microblog
Microblog Thread
Post Reply
Post
Your session timed out.
This web page is not optimized for viewing on a mobile device. Visit this site in a desktop browser or download the mobile app to access the full set of features.
GTC 2018 Silicon Valley
Favorite
Remove from My Interests
Browse and search for sessions, then click "Add to Schedule" to save sessions to your agenda.

Note: sessions are first come, first served on the day of the conference. Arrive early to the room for high-priority sessions.

Sign-up is required for Conference + Training pass holders to reserve seats in Instructor-Led Labs.

TDLIW04 - Pre-GTC DLI Workshop: Fundamentals of Accelerated Computing with CUDA C/C++

Pre-requisite: None

Duration: 8 hours

Format: Self-paced online or instructor-led

Languages: English

The CUDA computing platform enables the acceleration of CPU-only applications to run on the world's fastest massively parallel GPUs. Experience C/C++ application acceleration by:

  • Accelerating CPU-only applications by exposing their latent parallelism on GPUs
  • Utilizing essential CUDA memory management techniques to optimize accelerated applications
  • Exposing accelerated application potential for concurrency and exploiting it with CUDA streams
  • Leveraging command line and visual profiling to guide and check your work

Upon completion of this workshop, you'll be able to accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques. You’ll understand an iterative style of CUDA development that will allow you to ship accelerated applications fast.

See GTC Pricing for more information.

8 hours Pre-GTC DLI Workshops Joshua Wyatt - Content Developer, NVIDIA Deep Learning Institute, NVIDIA
Favorite
SE0000 - Welcome Reception

At this reception, meet NVIDIA staff and other GTC alumni to get tips, especially if you're a first-timer.

Special Event - 2 h Special Event
Favorite
SE0002 - Dinner with Strangers (Sun)

Join a random group of GTC attendees for enlightening conversations over a self-hosted dinner in great restaurants nearby. Less creepy than it sounds, this is one of the more popular programs at GTC.

Sign up in Main Lobby.

Special Event - 2 h Special Event
Favorite
CE8164 - Connect with the Experts: CUDA-based Raytracing and Rendering

We will answer your questions on the design and implementation of renderers based on raytracing using CUDA, and discuss how to get the best performance out of NVIDIA hardware in your renderer. 

Connect with the Experts sessions are informal opportunities to ask experts from NVIDIA and other organizations your burning questions about a specific subject.

1 Hour Connect with the Experts Carsten Waechter - Ray Tracing Software Architect, NVIDIA
Pascal Gautron - Senior Developer Technology Engineer, NVIDIA
Favorite
L8111A - Jetson Developer Tools Training Labs

This lab focuses on maximizing your productivity when developing software for the Jetson platform. You will experience firsthand how to manage source code on the host PC, cross-compile the software, and initiate remote debugging sessions to debug CPU C/C++ and CUDA C code. Through a comprehensive set of exercises, you will also learn how to use the CUDA Visual Profiler to optimize CUDA kernels, the Tegra System Profiler to optimize CPU code and trace multi-process system-wide activities, and the Tegra Graphics Debugger to debug and profile 3D graphics applications. Prerequisites: Basic CUDA C and C++ coding skills.

120 Minutes Instructor-Led Lab Sebastien Domine - VP SW Eng. Developer Tools, NVIDIA
Favorite
L8167 - Image Creation using Generative Adversarial Networks with TensorFlow and DIGITS

This lab will guide you through the process of training a Generative Adversarial Network (GAN) to generate image contents in DIGITS. You'll learn how to:

  • Use Generative Adversarial Networks (GANs) to create handwritten numbers
  • Visualize the feature space and use an attribute vector to generate image analogies
  • Train a GAN to generate images with set attributes

Upon completion, you'll be able to use GANs to generate images by manipulating feature space. Prerequisites: Fundamentals of Deep Learning with Computer Vision or similar experience

120 Minutes Instructor-Led Lab Jonathan Bentz, NVIDIA
Favorite
S8225 - Sharing Physically Based Materials Between Renderers with MDL

We'll discuss the basics of NVIDIA's Material Definition Language (MDL), showing how a single material can be used to define matching appearances between different renderers and rendering techniques. End users will learn how physically based definitions can be created, while developers will learn what's entailed in supporting MDL within their own products or renderers.

50-minute Talk Jan Jordan - Software Product Manager MDL, NVIDIA
Lutz Kettner - Director, Rendering Software and Material Definition, NVIDIA
Favorite
S8236 - Singularity: Reproducible, Trusted Containers for Scientific Computing

Singularity is a container technology that is widely supported by HPC centers and service providers because it enables extreme mobility of compute via verifiable, trusted containers. This talk will give a high-level view of container computing, an introduction to Singularity, a description of the Singularity Image Format (SIF), and technical recipes and usage examples with GPUs. After attending this talk, you will have a strong understanding of containerization and how to leverage this technology to create highly reproducible workflows.

50-minute Talk Gregory Kurtzer - CEO, SyLabs
Favorite
S8286 - Quick and Easy DL Workflow Proof of Concept

Spin up a deep learning (DL) proof of concept on a budget. We'll walk you through a DL workflow in the cloud leveraging DIGITS, then show how to download the trained model and run inference on a Jetson TX2. This session considers multiple options such as Nimbix, AMI, and NGC on Tesla P100, Tesla V100, and NVIDIA DGX-1 servers. The tutorial will combine lecture, live demos, and detailed instructions.

50-minute Talk Jeffrey Weiss - Director, Solution Architects, NVIDIA
Alec Gunny - Solutions Architect, NVIDIA
Kenneth Hester - Solution Architect, NVIDIA
Favorite
S8382 - Zero to GPU Hero with OpenACC

GPUs are often the fastest way to obtain your scientific results, but many students and domain scientists don't know how to get started. In this tutorial, we will take an application from simple, serial loops to a fully GPU-enabled application. Students will learn a profile-guided approach to accelerating applications, including how to find hotspots, how to use OpenACC to accelerate important regions of code, and how to get the best performance they can on GPUs. No prior experience in GPU programming or OpenACC is required, but experience with C, C++, or Fortran is a must. Several books will be given away to attendees who complete this tutorial.

80 Minutes Tutorial Jeffrey Larkin - Senior DevTech Software Engineer, NVIDIA
Favorite
S8483 - Empowering CUDA Developers with Virtual Desktops

You've just been tasked with deploying the NVIDIA CUDA Toolkit to a group of developers. Wouldn't it be great if you could save time deploying it, protect the developers' work, reduce the amount of unique workstation hardware needed, and get more out of your hardware investment? This session will show how this can be done with VMware Horizon virtual desktops leveraging vGPUs and the CUDA Toolkit. The CUDA Toolkit is a core component of most developers' desktops and provides the underpinnings for many development operations that take advantage of GPU technology. It can be, and often is, difficult to install on virtual machines. We will walk through its deployment on Linux virtual machines, ensuring the requirements for both the CUDA Toolkit and VMware Horizon with vGPU are met.

50-minute Talk Tony Foster - Sr. Advisor, Technical Marketing Ready Bundles for HPC, Dell EMC
Favorite
S8587 - Recent Progress in Accelerating Monte Carlo Simulation on GPU for Pricing and Risk Management of Financial Instruments

Learn about recent progress in accelerating Monte Carlo simulation on the GPU in applications for pricing financial instruments and risk management. We'll focus on forward Monte Carlo simulation, which allows for natural parallelization across CUDA cores, and present a recent extension of our implementation to a broad selection of industry-standard valuation models for different asset classes, including hybrid models that can be used to price multi-currency and multi-asset portfolios. Even with increasing complexity and dimensionality of valuation models, our benchmarks show stable GPU speedup factors in the range of 20x and 30x for calculations in double precision (FP64) and single precision (FP32), respectively. We also briefly summarize a recent research project on a more complex backward (American/Least Squares) Monte Carlo simulation method, based on regression algorithms, used to price general financial instruments with optionality. The latter method relies heavily on matrix calculations and benefits from GPU-accelerated libraries: cuBLAS for linear algebra and cuSOLVER for solvers.

25-minute Talk Serguei Issakov - Global Head of Quantitative Research and Development, Senior Vice Pres, Numerix
Favorite
S8596 - Overcoming Missing Modalities in Remote Sensing

Recent advances in earth observation are opening up a new exciting area for exploration of satellite image data. We'll teach you how to analyse this new data source with deep neural networks. Focusing on emergency response, you will learn how to apply deep neural networks for semantic segmentation on satellite imagery. We will specifically focus on multimodal segmentation and the challenge of overcoming missing modality information during inference time. It is assumed that registrants are already familiar with fundamentals of deep neural networks.

25-minute Talk Damian Borth - Director, German Research Center for Artificial Intelligence (DFKI)
Benjamin Bischke - PhD Candidate, German Research Center for Artificial Intelligence (DFKI)
Favorite
S8660 - A Deep Neural Network for Estimating Depth from Stereo

We present a deep neural network architecture for estimating 3D depth from stereo images. The network is modeled after computer vision stereo matching pipelines to simplify the training process. Our loss function consists of a photometric loss term and lidar-based loss terms. This combination makes it possible to train our DNN in a supervised, semi-supervised, or completely unsupervised way. Our DNN produces depth maps with accuracy similar to lidar-based depth. We also compare our stereo DNN architecture to other stereo architectures, as well as to a monocular depth DNN architecture, and demonstrate qualitative and quantitative test results.

50-minute Talk Nikolai Smolyanskiy - Principal Deep Learning and Computer Vision Engineer, NVIDIA
Alexey Kamenev - Senior Deep Learning and Computer Vision Engineer, NVIDIA
Favorite
S8666 - Deploying Autonomous Vehicles with NVIDIA DRIVE

DRIVE PX is an open platform for the autonomous driving ecosystem. It has been adopted by over 300 partners in the automotive ecosystem to develop solutions for intelligent, autonomous vehicles. This talk will outline the technical challenges facing the development of autonomous intelligent vehicles and detail how the next generation of DRIVE AI car computers, DRIVE Xavier and DRIVE Pegasus, addresses these challenges.

50-minute Talk Srikanth Sundaram - Senior Product Manager DRIVE PX 2, NVIDIA
Favorite
S8704 - NVIDIA IndeX - Advanced Large-Scale Data Visualizations on the NVIDIA GPU Cloud (NGC)

NVIDIA IndeX incorporates NVIDIA's hardware and software technology to enable interactive, high-quality 3D visual exploration and real-time evaluation of computed and simulated large-scale data across a wide range of scientific fields. NVIDIA IndeX is deployed on DGX technology and can be made available as a container in the cloud, such as on AWS or NGC. With NVIDIA IndeX, scientists gain unique insights into 3D data of unlimited size and complexity, and its in-situ solution allows them to envision remarkable new data simulation and visualization workflows. We present NVIDIA IndeX's CUDA programming interface for implementing novel visualization techniques, illustrate CUDA programs that produce various high-fidelity visualizations, and demonstrate large-scale data visualization on the NVIDIA GPU Cloud based on custom visualization techniques.

25-minute Talk Marc Nienhaus - Sr. Manager Software Engineering, NVIDIA IndeX, NVIDIA
Alexander Kuhn - Senior Software Engineer, NVIDIA
Henning Lux - Senior Software Engineer, NVIDIA
Favorite
S8727 - Improving NAMD Performance on Volta GPUs

In 2007, NAMD was the first full-featured production molecular dynamics software to use CUDA for accelerating its costliest computations. We'll describe our latest efforts, techniques, and results in our quest to optimize NAMD to make best use of the tremendous computational capabilities of state-of-the-art Volta GPUs, particularly in new dense node configurations such as the NVIDIA DGX and ORNL Summit systems that feature NVLink-connected GPUs. In existence now for over 20 years, NAMD is a sophisticated parallel molecular dynamics program. NAMD development has emphasized parallel scalability to support large-size and long-timescale biomolecular simulations running on petascale supercomputers. As GPU technology has evolved, NAMD has benefited from moving greater amounts of work to the GPU. NVIDIA's release of Volta has now shifted the balance almost entirely to the GPU, with the small remaining CPU calculations often posing bottlenecks to NAMD's performance. Attendees will learn optimization strategies and pitfalls for achieving higher performance as Amdahl's Law poses an ever-increasing challenge for mature GPU-accelerated codes like NAMD.

50-minute Talk David Hardy - Research Programmer, University of Illinois at Urbana-Champaign
Ke Li - HPC Developer Technology Engineer, NVIDIA
John Stone - Senior Research Programmer, University of Illinois at Urbana Champaign
Favorite
S8782 - A Cross-Field VR Case Study to Treat Children with Autism Spectrum Disorder

We built a contextualized learning system with realistic interaction for medical education. The system integrates virtual reality (VR) with the knowledge of occupational therapy, especially for autistic children. It supports a variety of scenes to facilitate training for children's confidence, adaptability, and social ability. With our system, the training content is no longer limited to the traditional treatment room, and therapists and children can save preparation time and focus on immersive training.

25-minute Talk Huai-Sheng Huang - Assistant Professor, Fu Jen Catholic University - Department of Information Management
Favorite
S8873 - GBM Inferencing on GPU

We'll present a novel GPU implementation for batched GBM inferencing, along with a detailed performance comparison of our implementation against state-of-the-art libraries such as XGBoost and Treelite. We'll then compare inference performance on various real-world datasets.

50-minute Talk Shankara Rao Thejasw Nanditale - Compute Devtech Engineer, NVIDIA
Vinay Deshpande - Compute DevTech Engineer, NVIDIA
Favorite
S8979 - An Introduction to CUDA Programming Session 1 of 4 (Presented by Acceleware)

Join us for an informative introduction to CUDA programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple CUDA kernel will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!

80 Minutes Tutorial Dan Cyca - Chief Technology Officer, Acceleware
Chris Mason - Technical Product Manager, Acceleware
Favorite
S8963 - How Will Machine Learning and Artificial Intelligence Change the Practice of Healthcare

This session will give an overview of new methods that leverage machine learning and causal inference to enable reliable individualized decision-making. We will present applications in different areas of healthcare where real-time inference is changing the practice of medicine. The latter also gives rise to new challenges in developing human-machine collaborative systems.

25-minute Talk Suchi Saria - John C. Malone Assistant Professor, Johns Hopkins University
Favorite
CE8126 - Connect with the Experts: Deep Learning Basics

Attend this session to get your questions on deep learning basics and concepts answered. NVIDIA experts can help you with the fundamentals and provide guidance on how and when to apply Deep Learning and GPUs to your work. No question is too basic to ask.

Connect with the Experts sessions are informal opportunities to ask experts from NVIDIA and other organizations your burning questions about a specific subject.

1 Hour Connect with the Experts Rajan Arora - Solution Architect, NVIDIA
Xuan Vinh Nguyen, NVIDIA
Robert Crovella - SA Mgr., NVIDIA
Favorite
S81014 - Advancing State-of-the-Art of Autonomous Vehicles and Robotics Research using AWS GPU Instances (Presented by Amazon Web Services)

Toyota Research Institute's (TRI) mission is to improve the quality of human life through advances in artificial intelligence, automated driving, and robotics. Learn more about their research and how they are using AWS EC2 P3 instances, the industry's most powerful GPU instances, in combination with other AWS services to enable autonomous vehicles and robots at scale.

50-minute Talk Chetan Kapoor - Senior Product Manager - EC2, Amazon Web Services
Adrien Gaidon - Machine Learning Lead, Toyota Research Institute
Mike Garrison - Senior Infrastructure Engineer, Toyota Research Institute
Favorite
S8216 - Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image

Learn how to predict a dense depth image from a sparse set of depth measurements and a single RGB image. This approach can serve as a plug-in module in simultaneous localization and mapping to convert sparse maps to dense maps, and as a super-resolution of LiDAR depth data. We'll describe the performance of our prediction method, explain how to train the depth prediction network, and showcase examples of its applications. Code and a video demonstration are publicly available. This session is for registrants who are already familiar with basic machine learning techniques.

25-minute Talk Fangchang Ma - Ph.D. Candidate, Massachusetts Institute of Technology
Favorite
S8227 - Integrating the NVIDIA Material Definition Language MDL in Your Application

The NVIDIA MDL SDK provides a rich toolset to integrate MDL in a wide range of renderers, from physically based ray tracing to real-time applications. In this tutorial-like session, we'll show how MDL materials and texturing functions can be compiled for OptiX/CUDA, x86, and OpenGL target platforms. We'll present how the MDL Distiller can be used to simplify MDL materials for use with real-time rendering solutions. Developers will learn about the available APIs and example code.

50-minute Talk Sandra Pappenguth - Senior Software Engineer, NVIDIA
Matthias Raab - Senior Graphics Software Engineer, NVIDIA
Favorite
S8343 - Detection of Financial Statement Fraud using Deep Autoencoder Networks

Explore how auditors are applying deep learning to detect "anomalous" records in large volumes of accounting data. The Association of Certified Fraud Examiners estimates in its Global Fraud Study 2016 that the typical organization loses 5% of its annual revenues to fraud. At the same time, organizations are accelerating the digitization of business processes affecting Enterprise Resource Planning (ERP) systems. These systems collect vast quantities of electronic journal entry data in general- and sub-ledger accounts at an almost atomic level. To commit fraud, perpetrators need to deviate from regular system usage or posting patterns. This deviation will be weakly recorded and reflected in a very limited number of "anomalous" journal entries. To detect such anomalous journal entries, several deep autoencoder networks are trained using NVIDIA's DGX-1 system. The empirical evaluation on two real-world accounting datasets underscores the effectiveness of the trained networks in capturing journal entries highly relevant for a detailed audit, while outperforming several baseline methods.

25-minute Talk Marco Schreyer - Researcher, German Research Center for Artificial Intelligence
Timur Sattarov - Forensic Data Analyst, PricewaterhouseCoopers GmbH WPG
Favorite
S8504 - Creating Immersive AI-Powered Virtual Reality Simulation Training For Medical Professionals

Experiential learning is among the best ways to practice for pediatric emergencies. However, hospitals are spending millions on expensive and inefficient mannequin-based training that does not consistently offer an authentic experience for med students and doctors, or offer convenient repeatability. Come hear about a groundbreaking pilot program that brought together a hospital and two unique VR and AI developer teams to deliver virtual reality training simulations for some of the most high-stakes emergencies hospitals see: pediatric trauma. Learn how doctors aided in the design process to create authentic trauma room scenarios; how expert content and simulation developers crafted a VR experience that would have impact in a world where there is no room for error; and why Oculus supported this project with funding and hardware.

25-minute Talk Shauna Heller - President, North America, AiSolve
Favorite
S8669 - Deep Learning Demystified

What is deep learning? In what fields is it useful? How does it relate to artificial intelligence? We'll discuss deep learning and why this powerful new technology is getting so much attention, explain how deep neural networks are trained to perform tasks with superhuman accuracy, and examine the challenges organizations face in adopting this new approach. We'll also cover some of the best practices, software, hardware, and training resources that many organizations are using to overcome these challenges and deliver breakthrough results.

50-minute Talk William Ramey - Director, Developer Programs, NVIDIA
Favorite
S8715 - Reinforcement Learning for Multiplayer Agents at SEED

Over the last couple of years, neural nets have enabled significant breakthroughs in computer vision, voice generation and recognition, translation, and self-driving cars. Neural nets will also be a powerful enabler for future game development. We'll give an overview of the potential of neural nets in game development, as well as provide an in-depth look at how we can use neural nets combined with reinforcement learning for new types of game AI.  We will also show some new exciting results from applying deep reinforcement learning to AAA games.

50-minute Talk Magnus Nordin - Technical Director, Electronic Arts / SEED
Favorite
S8750 - Porting VASP to GPUs with OpenACC VASP is a software package for atomic-scale materials modeling. It's one of the most widely used codes for electronic-structure calculations and first-principles molecular dynamics. We'll give an overview and status of porting VASP to GPUs with OpenACC. Parts of VASP were previously ported to CUDA C with good speed-ups on GPUs, but also with an increase in the maintenance workload as VASP is otherwise written wholly in Fortran. We'll discuss OpenACC performance relative to CUDA, the impact of OpenACC on VASP code maintenance, and challenges encountered in the port related to management of aggregate data structures. Finally, we'll discuss possible future solutions for data management that would simplify both new development and maintenance of VASP and similar large production applications on GPUs. 50-minute Talk Markus Wetzstein - HPC DevTech Engineer, NVIDIA
Stefan Maintz - DevTech Engineer, NVIDIA
Favorite
S8806 - NVIDIA vGPU and Red Hat Virtualization: High End Virtual Workstations

A shared physical graphics processing unit (GPU) exposed to virtual guests as a virtual GPU drastically changes the dynamics of what is possible, both technically and monetarily, in high-end virtual workstations. Using NVIDIA Tesla cards, you can run many GPU-based workloads in multiple VMs on one host. Attendees will learn about vGPU technology, Virtual Function I/O (VFIO), and associated roadmaps.

25-minute Talk Andre Beausoleil - Senior Principal Partner Manager, Red Hat
Favorite
S8881A - NVIDIA Vulkan 2018 Update

Two years after release, Vulkan is a mature and full-featured low-level graphics API, with significant adoption in the developer community.

NVIDIA will present a status update on our Vulkan software stack. We will cover latest Vulkan developments, including extensions, software libraries and tools. We will also cover best practices and lessons learned from our own work with the Vulkan API in the past year.

50-minute Talk Nuno Raposo Subtil - Senior Software Engineer, NVIDIA
Favorite
S8895 - A Component-Based AI Engine Platform for Medical Workflow

As deep learning techniques are applied to healthcare, more and more AI-based medical systems continue to come forth, accompanied by new heterogeneity, complexity, and security risks. In the real world, we've seen this situation lead to demand constraints, hindering AI application development in China's hospitals. First, we'll share our experience in building a unified, GPU-accelerated AI engine system that feeds component-based functionality into the existing workflow of clinical routine and medical imaging. Then, we'll demonstrate a pipeline that integrates different types of AI applications (detecting lung cancer, predicting childhood respiratory disease, and estimating bone age) as microservices into medical station, CDSS, PACS, and HIS systems to support the medical decision-making of local clinicians. On this basis, we'll describe the purpose of establishing an open, unified, standardized, and legal cooperation framework to help AI participants enter the Chinese market and build a collaborative ecology.

25-minute Talk Xu Chen - Director of AI Research, Winning Health
Favorite
S8399b - Driver Drowsiness Detection for ADAS (2)

We'll present an in-car ADAS technology to detect drowsy driving. This technique can be used to alert and awaken the driver, or take corrective actions if required. We employ a CNN-based approach for this technique, which is trained on a mix of synthetic and real images. We'll cover the details of the detection system pipeline and the synthetic dataset generation. We'll also show a demonstration of the detection system in action.

25-minute Talk Sidharth Varier - Senior System Software Engineer, NVIDIA
Favorite
S8416 - Real-Time Inference of Deep Generative Models on Tegra X1 at Traffic Intersections

Detecting objects, whether they're pedestrians, bicyclists, or other vehicles, at a traffic intersection is essential to ensure efficient traffic flow and the safety of all participants. We'll present an experiment to assess training and real-time inference on an NVIDIA Tegra X1 SoC module with a suite of GigE Flea3 Point Grey cameras installed on a vehicle. The system is to be trained using a subset of data collected from different types of busy intersections on a university campus, and testing is to be done on the remaining data. We'll use a deep generative model that can learn and reconstruct the traffic scene. We'll share our CUDA optimization strategies on the Tegra X1 and the real-time performance of the inference model.

25-minute Talk Menna El-Shaer - Doctoral Student/Researcher, The Ohio State University
Favorite
S8528 - Accelerating Bioinformatics: End-to-End Computation of NASA GeneLab Data with GPU Data Frame

Protecting crew health is a critical concern for NASA in preparation for long-duration, deep-space missions such as those to Mars. Spaceflight is known to affect immune cells: splenic B-cells decrease during spaceflight and in ground-based physiological models. The key technical innovation presented in our work is end-to-end computation on the GPU with the GPU Data Frame (GDF), running on the DGX Station, to accelerate the integration of immunoglobulin gene segments, junctional regions, and modifications that contribute to cellular specificity and diversity. Study results are applicable to understanding processes that induce immunosuppression, such as cancer therapy, AIDS, and stressful environments here on earth.

25-minute Talk Venkat Krishnamurthy - Head, Product Management, MapD Technologies
Jacqueline Cenci-McGrody - Solutions Architect (Partner SA), NVIDIA
Favorite
S8580 - Modernizing OpenMP for an Accelerated World

OpenMP has come a long way in its first 20 years, but the last few have brought by far the most change. With accelerated computing on the rise, OpenMP integrated features to address distributed memory devices and offloading to accelerators. Now, as we prepare for the next generation of supercomputers and GPUs, OpenMP is growing to meet the challenges of productively programming scientific applications in a world of accelerators, unified memory, and explicitly hierarchical memories. This talk will discuss the present and future of OpenMP as we ramp up to version 5.0, presenting some of the new features incorporated so far, how they are shaped by large-scale scientific applications, and how they in turn shape those applications.

25-minute Talk Bronis de Supinski - Chief Technology Officer for Livermore Computing, Lawrence Livermore National Laboratory
Tom Scogland - Computer Scientist, Lawrence Livermore National Laboratory
Favorite
S8638 - Make Yield Curve Construction More Intelligent with GPU

The yield curve provides information on bond returns across maturities and reflects extremely complex market interactions and monetary policy. Yield curve construction models, such as the spline fitting model, use a number of bond sample points and model parameters to deduce the yield curve. Construction involves repeated experiments in choosing appropriate bond samples, which rely heavily on manual operation. Given the amount of relevant information and the rapid growth of transaction data, this task becomes even more challenging. Some literature shows that deep learning can detect and exploit interactions in the data that are invisible to any existing financial economic theory. By discovering latent patterns in historical data, it can be a good supplement for choosing active samples and assessing a curve's quality. In financial applications, accuracy and speed are both of critical importance, so the GPU is applied to both the deep learning framework and the yield curve construction. Intelligent, fast, and accurate, our yield curve construction framework achieves a 5x speedup versus manual operation and provides a feasible path for future practice.

25-minute Talk Joe Zhang - Project Manager, Shanghai Clearing House
Favorite
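The curve-fitting step the abstract describes can be sketched with a cubic polynomial in log-maturity fitted to made-up bond yields (a minimal numpy stand-in, not the actual spline fitting model or the presenters' method):

```python
import numpy as np

# Hypothetical bond sample points: maturities (years) and observed yields (%)
maturities = np.array([0.5, 1, 2, 3, 5, 7, 10, 20, 30], dtype=float)
yields = np.array([1.5, 1.7, 2.0, 2.2, 2.5, 2.7, 2.9, 3.1, 3.2])

# Fit a cubic polynomial in log-maturity as a simple stand-in for a spline model
coeffs = np.polyfit(np.log(maturities), yields, deg=3)

def curve(t):
    """Interpolated yield (%) at maturity t (years)."""
    return np.polyval(coeffs, np.log(t))

# Yield at an unobserved 4-year maturity falls between the 3y and 5y samples
y4 = curve(4.0)
```

Selecting *which* bond samples enter the fit is the manual step the talk proposes to assist with deep learning.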
S8976 - Create Customer Value with Google Cloud AI (Presented by Google)

In this session, you will learn how Google Cloud helps enterprises make the most out of data, and deliver customer value. We will provide an in-depth overview of the Cloud AI and Data Analytics offering that helps enterprises manage their ML lifecycle, from data ingestion to insights and prediction. We will also demonstrate some breakthrough solutions, like AutoML, that are making ML accessible to everyone.

50-minute Talk Chris Kleban - Product Manager, GPUs on Google Cloud, Google Inc.
Favorite
S8980 - An Introduction to the GPU Memory Model - Session 2 of 4 (Presented by Acceleware)

Explore the memory model of the GPU! This session will begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. Features such as shared memory configurations and Read-Only Data Cache are introduced and optimization techniques discussed. A programming demonstration of shared and constant memory will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!

80 Minutes Tutorial Dan Cyca - Chief Technology Officer, Acceleware
Chris Mason - Technical Product Manager, Acceleware
Favorite
S8273 - Programming GPU Supercomputers Ten Years From Now We'll briefly review how programming for GPU computing has progressed over the past ten years, and where it is going over the next ten years, specifically for data management and parallel compute management. CUDA languages expose all aspects of data and compute management, allowing and sometimes requiring programmers to take control of both. Libraries typically internalize all compute management, and some internalize all data management as well. Directives virtualize both data and compute management, but don't completely hide either. Future hardware and software capabilities will allow programs to enjoy automatic data movement between DDR memory and GPU device memory, and enhanced caching hardware reduces the need for explicit scratchpad memory programming. As parallel constructs are added to standard programming languages, writing parallel programs for GPU computing will become no more or less difficult than multicore programming. 25-minute Talk Michael Wolfe - Compiler Engineer, NVIDIA
Favorite
S8314 - Multi GPU Programming with MPI Learn how to program multi-GPU systems or GPU clusters using the message passing interface (MPI) and OpenACC or NVIDIA CUDA. We'll start with a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Then we'll cover advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. We'll also cover the latest improvements with CUDA-aware MPI, interaction with unified memory, the multi-process service (MPS, aka Hyper-Q for MPI), and MPI support in NVIDIA performance analysis tools. 50-minute Talk Jiri Kraus - Senior Devtech Compute, NVIDIA
Favorite
S8524 - Leveling Up to Autonomous Design

Data fuels so much of our lives. It accelerates our conversations, our decisions, our very ideas. And in the physical world, data is already acting as an accelerant to how we take these ideas and make them real. As the things we make become increasingly connected, our world becomes increasingly computable. And anything that becomes easily computable, becomes equally mutable. What does this mean for the world we live in? As we let go of our design tools and hand more control to intelligent algorithms, we'll see this reflected in the real world: the world of self-driving everything.

25-minute Talk Radha Mistry - Story Strategist, Autodesk
Favorite
S8532 - Cascaded 3D Fully Convolutional Networks for Medical Image Segmentation

We'll show how recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. FCNs can be trained to automatically segment 3D medical images, such as computed tomography (CT) scans based on manually annotated anatomies like organs and vessels. The presented methods achieve competitive segmentation results while avoiding the need for handcrafting features or training class-specific models, in a clinical setting. We'll explain a two-stage, coarse-to-fine approach that will first use a 3D FCN based on the 3D U-Net architecture to roughly define a candidate region. This candidate region will then serve as input to a second 3D FCN to do a fine prediction. This cascaded approach reduces the number of voxels the second FCN has to classify to around 10 percent of the original 3D medical image, and therefore allows it to focus on more detailed segmentation of the organs and vessels. Our experiments will illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results on many datasets. Code and trained models will be made available.

25-minute Talk Holger Roth - Assistant Professor (Research), Nagoya University
Favorite
S8561 - "Free" In Situ Volume Compression Using NVENC Scientific simulations typically store just a small fraction of their computed timesteps--as few as one in 500--due to I/O and storage limitations. Previous work has demonstrated in situ software-based compression, but at the cost of compute cycles that simulation scientists are loath to sacrifice. We propose the use of the special-purpose video processing unit (VPU), currently unutilized in the HPC context, for inexpensive lossy encoding. Our work demonstrates that video encoding quality is suitable for volumes and recommends encoder settings. We'll show that data can be encoded in parallel with a hybrid CPU/GPU computation with minimal impact on run time. We'll also demonstrate that decoding is fast enough for on-the-fly decompression during analysis. 25-minute Talk Nick Leaf - Graduate Student Researcher, University of California, Davis
Favorite
S8601 - NVIDIA GPU Video Technologies and Video Codec SDK: Updates and Roadmap NVIDIA's video SDK is a set of APIs for hardware-accelerated video encoding and decoding using NVIDIA GPUs. We'll provide an overview of the APIs, with particular emphasis on the latest features, such as FFmpeg support of NVIDIA-accelerated transcoding, quality and performance enhancements. We'll discuss some strategies on efficient usage of GPU video hardware acceleration for use cases such as video inferencing, transcoding, and media archiving. 50-minute Talk Abhijit Patait - Director, System Software, NVIDIA
Favorite
S8796 - Deep Neural Network-Based Cooperative Visual Tracking Through Multiple Flying Robots Human and animal full-body motion capture (MoCap) in outdoor scenarios is a challenging and largely unsolved problem. We'll introduce a multiple flying robots-based solution for it. MoCap systems like Vicon, Optitrack, and the 4D Dynamic Body Scanner at MPI-IS Tuebingen achieve high degrees of accuracy in indoor settings. Besides being bulky, they make use of reflected infrared light and heavily rely on precisely calibrated wall or ceiling-mounted fixed cameras. Consequently, such systems cannot be used to perform MoCap in outdoor scenarios where changing ambient light conditions persist and permanent fixtures in the environment cannot be made. Our outdoor MoCap solution involves flying robots with on-board cameras, Intel i7 CPUs, NVIDIA Jetson TX1 GPU modules, and a deep learning-based approach. 50-minute Talk Aamir Ahmad - Research Scientist, Max Planck Institute for Intelligent Systems
Eric Price - PhD Student, Max Planck Institute for Intelligent Systems
Favorite
S8207 - Demystifying the Available Codec Options for your NVIDIA Virtual GPU Deployments

You've surely heard of Citrix's HDX and VMware's Blast Extreme protocols. Maybe you know about different codecs like H.264, H.265/HEVC, VP9, AV-1, MJPEG, and 2DRLE. We'd like to give you some insight into which codec technology can be used with which remoting protocol, and what you can expect in terms of density, image quality, and granularity when configuring these codecs. Which do you think is better: Adaptive Display V2 or full-screen H.264? YUV 4:2:0 or YUV 4:4:4 for H.264? PCoIP or Blast Extreme? Should you use NVENC or not, and which options are available in VDI and RDSH on Kepler, Maxwell, and Pascal? You probably want to ask what's recommended. As always: it depends. Join our session to learn more, discuss the pros and cons of the available technologies, and find out how to make the best choices for YOUR deployment.

25-minute Talk Simon Schaber - GRID Solution Architect, NVIDIA
Ronald Grass - Sr. Sales Engineer, Citrix Systems GmbH
Favorite
S8324 - Synthetic Data Generation for an All-in-One Driver Monitoring System Driver monitoring systems are used to detect many driver attributes like gaze, head pose, eye openness, and other features pertaining to attention and assistance. We'll present a synthetic method of generating data for training DNNs that caters to the above-mentioned features of the subject. We use Blender, powered by NVIDIA GPUs, for generating synthetic images, and the process can be scaled to match training needs. Synthetic data generation allows precise control over data points that are difficult to control in a real environment, like pupil dilation. This approach avoids noisy measurements and results in high accuracy without the need for a high-precision 3D sensor. 25-minute Talk Sagar Bhokre - Senior System Software Engineer, NVIDIA
Favorite
S8355 - How To Train and Execute a Deep Learning Model Able to Re-identify and Extract Attributes from Humans

We'll present a deep learning system able to decide whether two people are similar or not. This system uses the global appearance of a person, not just the face, to perform re-identification. Our system also provides attributes (top color, bottom color, gender, length of clothes, and hair). We'll describe how to train the system with TensorFlow on a GPU cluster and how to use it in a global video analysis system running on GPU devices.

25-minute Talk Matthieu Ospici - AI Engineer, Atos
Favorite
S8492 - Fire Simulation & Visualization at a Nuclear Power Plant using GPUs See how computational risk analysis is aided by advanced visualization and simulation of a fire propagating in a nuclear power plant room, using NVIDIA's GPU-based GVDB library. Dive into the visualization techniques used when performing analysis to design and incorporate additional safety measures for nuclear plants. 25-minute Talk Ramprasad Sampath - Director of R&D, Centroid LAB
Favorite
S8650 - Cadillac in VR "Cadillac in VR" is the premier VR showroom experience. In our presentation we want to highlight the needs Cadillac came to us with, our approach for creating this experience, key challenges we faced during development, our final results, and what this might mean for the future of car buying. The needs we discuss will involve key points of change in the automotive industry and how Cadillac wanted to adapt to those changes. Our approach will touch on how we established the underlying philosophy that guided our decision-making process throughout development. Following that, we will dive deeper into the technical challenges we faced while developing the experience. The environment, level of detail, lighting, UX/UI, and hardware are all key areas of discussion. We hope to have someone on stage at this point with the experience running to further add emphasis and clarification. Finally, we'll cover how all this came together in our final product and where we think it might take the future of buying a car. 25-minute Talk Mike Konchalski - Director of Technology, All Things Media
Favorite
S8805 - Managing Memory of Complex Aggregate Data Structures in OpenACC It is extremely challenging to move data between host and device memories when deeply nested, complex aggregate data structures are commonly used in an application. This talk dives into VASP, ICON, and other real-world applications to show how the deep copy issue is solved with the PGI compiler and OpenACC APIs. The OpenACC 2.6 specification includes directives and rules that enable programmer-controlled manual deep copy, albeit in a form that can be intrusive in terms of the number of directives required. The OpenACC committee is designing new directives that extend explicit data management to aggregate data structures in a form that is more elegant and concise. The talk will also compare unified memory, manual deep copy, full deep copy, and true deep copy. 25-minute Talk Xiaonan Tian - GPU Compiler Engineer, NVIDIA
Favorite
S8919 - Medical Imaging with TensorFlow

Dive into recent work in medical imaging, where TensorFlow is used to spot cancerous cells in gigapixel images and helps physicians diagnose disease. During this talk, we'll introduce concepts in deep learning and show concrete code examples you can use to train your own models. Beyond the technology, we'll cover the problem-solving process of thoughtfully applying it to a meaningful problem. We'll close with our favorite educational resources you can use to learn more about TensorFlow.

25-minute Talk Josh Gordon - Developer Advocate for TensorFlow, Google
Favorite
S8924 - Block-Sparse Recurrent Neural Networks Recurrent neural networks are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modeling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their dense counterparts, the speed-up observed by using sparse operations is less than expected on different hardware platforms. To address this issue, we prune blocks of weights in a layer instead of individual weights. Using these techniques, we can create block-sparse RNNs with sparsity ranging from 80% to 90% with a small loss in accuracy. This technique allows us to reduce the model size by 10x. Additionally, we can prune a larger dense network to recover this loss in accuracy while maintaining high block sparsity and reducing the overall parameter count. Our technique works with a variety of block sizes up to 32x32. Block-sparse RNNs eliminate overheads related to data storage and irregular memory accesses while increasing hardware efficiency compared to unstructured sparsity. 25-minute Talk Eric Undersander - Research Engineer, Baidu USA
Sharan Narang - Systems Researcher, Baidu USA
Favorite
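The block-pruning idea described in the abstract above can be sketched in numpy: rank fixed-size tiles of a weight matrix by average magnitude and zero out the weakest ones (a hypothetical illustration of the technique, not Baidu's implementation):

```python
import numpy as np

def block_prune(W, block=4, sparsity=0.9):
    """Zero whole (block x block) tiles of a weight matrix, keeping only
    the tiles with the largest average magnitude."""
    r, c = W.shape
    assert r % block == 0 and c % block == 0
    # Average |w| per tile as the tile's importance score
    tiles = np.abs(W).reshape(r // block, block, c // block, block)
    scores = tiles.mean(axis=(1, 3))
    # Keep only tiles scoring above the `sparsity` quantile
    cutoff = np.quantile(scores, sparsity)
    mask = (scores > cutoff).repeat(block, 0).repeat(block, 1)
    return W * mask

rng = np.random.default_rng(1)
W = rng.standard_normal((32, 32))
Wp = block_prune(W, block=4, sparsity=0.9)
# Roughly 90% of weights are now zero, in aligned 4x4 blocks
```

Because entire aligned blocks are zeroed, the surviving weights keep regular memory access patterns, which is the hardware-efficiency argument the talk makes against unstructured sparsity.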
L8111B - Jetson Developer Tools Training Labs - Repeat

This lab focuses on teaching you how to maximize your productivity when developing software for the Jetson platform. You will experience firsthand how to manage source code on the host PC to cross-compile software and how to initiate remote debugging sessions to debug CPU C/C++ and CUDA C code. Through a comprehensive set of exercises, you will also learn how to use the CUDA Visual Profiler to optimize CUDA kernels, the Tegra System Profiler to optimize CPU code and trace multi-process system-wide activities, and the Tegra Graphics Debugger to debug and profile 3D graphics applications. Prerequisites: Basic CUDA C and C++ coding skills.

120 Minutes Instructor-Led Lab Sebastien Domine - VP SW Eng. Developer Tools, NVIDIA
Favorite
L8129 - Generate Financial Time Series with Variational Autoencoders

In this lab we explain how generative models such as deep variational autoencoders can generate realistic time series data of all kinds, such as stock prices and FX rates. The ability to generate realistic time series is of great importance for improving the robustness of risk management and algorithmic trading applications. We look at the theory behind variational autoencoders and walk step by step through the implementation of a simple Gaussian recurrent variational autoencoder in TensorFlow. After this lab, attendees will be able to build more general generative models and train them with data.

120 Minutes Instructor-Led Lab Daniel Egloff - Founder, Flink AI
Favorite
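At the heart of the Gaussian variational autoencoder this lab implements is the reparameterization trick, which keeps the latent sampling step differentiable. A minimal numpy sketch (illustrative only; the shapes and values are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as mu + sigma * eps, where eps ~ N(0, 1),
    so gradients can flow through mu and log_var during training."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Pretend encoder output: a batch of 4 latent codes of dimension 2
mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))  # log_var = 0 means sigma = 1
z = reparameterize(mu, log_var)
```

In the lab's TensorFlow model the same trick appears inside the training graph; the randomness is isolated in `eps`, so backpropagation only touches the deterministic path.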
L8150 - Image Style Transfer with Torch

Framework: TensorFlow, DIGITS

This lab will guide you through how to transfer the look and feel of one image to another image by extracting distinct visual features. See how convolutional neural networks are used for feature extraction, and how these features feed into a generator to create a new image. You'll learn how to:

•Transfer the look and feel of one image to another image by extracting distinct visual features

•Qualitatively determine whether a style is transferred correctly using different techniques

•Use architectural innovations and training techniques for arbitrary style transfer

Upon completion, you'll be able to use neural networks to do arbitrary style transfer that's fast enough to apply even to videos.

Presented by the NVIDIA Deep Learning Institute (DLI).

120 Minutes Instructor-Led Lab Steven Harpster - Technical Marketing Engineer, NVIDIA
Favorite
L8169 - Medical Image Analysis with R and MXNet

Convolutional neural networks (CNNs) can be applied to medical image analysis to infer patient status from non-visible images. Train a CNN to infer the volume of the left ventricle of the human heart from time-series MRI data and learn to:

•Extend a canonical 2D CNN to more complex data

•Use the framework MXNet through the standard Python API and through R

•Process high-dimensionality imagery that may be volumetric and have a temporal component

Upon completion, you'll know how to use CNNs for non-visible images. Prerequisites: Some experience training neural networks using datasets

120 Minutes Instructor-Led Lab Abel Brown - Certified Instructor, NVIDIA
Favorite
CE8102 - Connect with the Experts: OpenACC - Quick On-ramp to GPUs

This session is designed for anyone who is either looking to get started with GPUs or already accelerating code with OpenACC on GPUs or CPUs. Join OpenACC experts and your fellow OpenACC developers to get expert advice, discuss your code, and learn how OpenACC directives are used by others.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts Sunita Chandrasekaran - Assistant Professor, Department of Computer & Information Sciences, University of Delaware
Jeffrey Larkin - Senior DevTech Software Engineer, NVIDIA
Robert Crovella, NVIDIA
Guido Juckeland - Head of Computational Science Group, Helmholtz-Zentrum Dresden-Rossendorf
Michael Wolfe - Compiler Engineer, NVIDIA
Robert Henschel - Director Science Community Tools, Indiana University
Randy Allen - Director, Mentor Graphics
Favorite
S81041 - Using HPC Computational Physics Tools for Advanced Engineering Simulations and Production Deployment (Presented by Amazon Web Services)

AWS offers the most powerful GPU-accelerated cloud infrastructure that delivers unparalleled computational efficiency for advanced engineering simulations and analysis, enabling High Performance Computing (HPC) workloads to run in the cloud at scale. This session features a real-world use case from the advanced product engineering team at Western Digital, which is using HPC solutions to model new technologies and capabilities prior to production. Western Digital's computational tools incorporate a description of the physics occurring during the HDD recording process and ultimately produce input to a recording subsystem channel model that yields an error rate. The length scales involved in the recording model range from a few nanometers, in the description of the recording media, to microns, in the description of the recording head. The power of the current generation of NVIDIA GPUs allows Western Digital to generate enough simulation data that the same recording subsystem channel model used in experiments can be employed in studies that include fabrication process variances.

50-minute Talk David Hinz - Senior Director, Engineering Services, Cloud and Data Center Computing Operations, Western Digital Technologies, Inc.
David Pellerin - Head of WW Business Development for Hitech/Semiconductor, Amazon Web Services
Byron Lengsfield - Research Scientist, Western Digital
Favorite
S8249 - Machine Learning on VMware vSphere using NVIDIA's Virtualized GPUs You'll learn about enabling virtualized GPUs for machine learning workloads on VMware vSphere, combining GPU performance with the data center management benefits of VMware vSphere. NVIDIA's Pascal GPU is the first GPU to offer both virtualized compute/CUDA and virtualized graphics, supporting multiple virtual machines (VMs) sharing a GPU for both compute and graphics capabilities. We will present our research results for machine learning workloads on the vSphere platform using NVIDIA's virtualized GPUs. Learn different ways to deploy GPU-based workloads developed with popular machine learning frameworks like TensorFlow and Caffe using VMware DirectPath I/O and NVIDIA vGPU solutions. We will discuss use cases for the scheduling methods Equal Share, Fixed Share, and Best Effort for virtualized GPUs and illustrate their benefits via our performance study. We address the scalability of machine learning workloads in terms of the number of VMs per vSphere server and the number of GPUs per VM. Data center resource utilization of these workloads on vSphere with NVIDIA GPUs is also analyzed and presented. 50-minute Talk Uday Kurkure - Staff Engineer, VMware
Lan Vu - California, VMware
Favorite
S8297 - HornetsNest - Scalable Static and Dynamic Graph Algorithms Made Easy We'll present HornetsNest, a framework for developing static and dynamic graph algorithms with relative ease. Through a small subset of graph primitives, which form the API of our framework, it is possible to implement parallel graph algorithms in a fairly small number of lines of code. These graph primitives are optimized in the backend, so programmers can focus on algorithm design rather than load balancing, system utilization, and optimizations. Using these primitives, it's possible to implement BFS in roughly 10 lines of code. Performance-wise, this BFS performs as well as its counterpart in the Gunrock library. More importantly, HornetsNest is the first framework to support a wide range of high-performing dynamic graph analytics, including new algorithms for dynamic triangle counting, dynamic PageRank, and dynamic Katz centrality. Finally, we'll cover the performance of numerous graph algorithms. 25-minute Talk Oded Green - Research Scientist, Georgia Institute of Technology
Favorite
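As an illustration of how little code a frontier-based formulation of BFS requires, here is a level-synchronous BFS in plain Python (a sketch of the pattern only, not the HornetsNest API or its GPU implementation):

```python
from collections import defaultdict

def bfs_levels(edges, source):
    """Level-synchronous BFS: repeatedly advance a frontier one hop,
    recording each vertex's distance from the source."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    level = {source: 0}
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in level:
                    level[v] = level[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return level

levels = bfs_levels([(0, 1), (1, 2), (0, 3), (3, 4), (4, 5)], source=0)
```

In a framework like the one described, the inner frontier-advance loop becomes a single optimized primitive, which is what keeps user code short.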
S8475 - Accelerating Linear Algebra on Small Matrices - from Batched BLAS to Large Scale Solvers Learn how to accelerate many small-sized linear algebra problems - from kernels to large-scale solvers. We describe techniques targeting parallelization, vectorization, and communication, which have become extremely challenging on many-core architectures/GPUs. Standard interfaces, called batched APIs, are proposed to be included in highly-optimized libraries like MAGMA that provide the most extended set of batched BLAS and LAPACK functionalities to date. We'll describe the developments as well as their use to accelerate applications from big data analytics to high-order FEM tensor computations, and low-rank approximations for solvers and preconditioners. We'll also concentrate on the GPU acceleration of a large-scale distributed-memory solver that uses a hierarchically compressed coefficient matrix. 50-minute Talk Ichitaro Yamazaki - Research Scientist, UTK
Stanimire Tomov - Research Director, UTK
Favorite
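The batched-API idea from the abstract above can be illustrated with numpy's stacked linear algebra, where one call solves a whole batch of small independent systems (an analogy to batched BLAS/LAPACK routines such as those in MAGMA, not MAGMA itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# A "batch" of 1000 small independent 8x8 systems A_i x_i = b_i
A = rng.standard_normal((1000, 8, 8)) + 8 * np.eye(8)  # well-conditioned
b = rng.standard_normal((1000, 8))

# One batched call solves all 1000 systems at once; numpy broadcasts
# linalg operations over the leading batch dimension
x = np.linalg.solve(A, b[..., None])[..., 0]

# Verify the solutions: max residual over the whole batch
residual = np.abs(np.einsum('bij,bj->bi', A, x) - b).max()
```

Batching amortizes kernel-launch and scheduling overheads that would dominate if each tiny system were solved in its own call, which is the motivation for batched APIs on GPUs.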
S8518 - An Introduction to NVIDIA OptiX We'll explain the NVIDIA OptiX ray tracing engine, a sophisticated library for performing GPU ray tracing. We'll provide an overview of the OptiX ray tracing pipeline and the programmable components. OptiX can be used in many domains, ranging from rendering to acoustic modeling to scientific visualization. We'll dive deeper into the new features of OptiX and present code samples demonstrating best practices for writing a high-performance ray tracer using the OptiX programming model. 80 Minutes Tutorial Ankit Patel - Senior Product Manager, NVIDIA
Detlef Roettger - Senior Developer Technology Engineer, NVIDIA
Favorite
S8604 - Developing Agile UAV Autonomy in a Virtual Reality Environment

Despite the high level of interest in autonomous unmanned aerial vehicles (UAVs) over the last few years, the gap between human pilots and UAVs without an external infrastructure remains exceedingly large. Autonomous UAVs face limitations, both in autonomy algorithms and in the platforms and testing environments required to develop the algorithms. We'll discuss a UAV system built around a Jetson TX1 module and a custom carrier board to provide the computation, sensors, and agility required for high-performance flight; a real-time photorealistic image simulation testing environment that acts as a virtual reality environment while a UAV is in flight; and the vision-based algorithms developed using the aforementioned two that enable autonomous agile flight.

25-minute Talk Thomas Sayre-McCord - PhD Candidate, MIT
Favorite
S8607 - Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training We find 99.9 percent of the gradient exchange in distributed SGD is redundant, and we propose deep gradient compression (DGC) to greatly reduce the communication bandwidth and improve the scalability of distributed training. To preserve accuracy during this compression, DGC employs four methods: momentum correction, local gradient clipping, momentum factor masking, and warm-up training. We have applied DGC to image classification, speech recognition, and language modeling with multiple datasets including Cifar10, ImageNet, Penn Treebank, and Librispeech Corpus. In all these scenarios, DGC achieves a gradient compression ratio from 270x to 600x without losing accuracy, cutting the gradient size of ResNet-50 from 97MB to 0.35MB, and for DeepSpeech from 488MB to 0.74MB. DGC enables large-scale distributed training on inexpensive commodity 1Gbps Ethernet and facilitates distributed training on mobile. 50-minute Talk Song Han - Scientist, Stanford/Google/MIT
Favorite
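The core idea behind this kind of gradient compression, sending only the largest-magnitude entries and accumulating the rest locally, can be sketched in numpy (a simplification for illustration; momentum correction, clipping, masking, and warm-up from the abstract are omitted):

```python
import numpy as np

def sparsify(grad, residual, ratio=0.001):
    """Select the top `ratio` fraction of gradient entries by magnitude
    for communication; keep everything else in a local residual so no
    gradient information is permanently discarded."""
    acc = grad + residual                     # add back previously unsent parts
    k = max(1, int(acc.size * ratio))
    thresh = np.partition(np.abs(acc).ravel(), -k)[-k]
    mask = np.abs(acc) >= thresh
    sent = np.where(mask, acc, 0.0)           # sparse tensor to communicate
    new_residual = np.where(mask, 0.0, acc)   # accumulated locally for next step
    return sent, new_residual

rng = np.random.default_rng(0)
g = rng.standard_normal(10000)
sent, res = sparsify(g, np.zeros_like(g), ratio=0.001)
# Only ~0.1% of entries are nonzero, and sent + res reconstructs g exactly
```

The local residual is what makes aggressive ratios like 0.1% viable: small gradient components are delayed, not dropped.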
S8651 - Extracting Data from Tables and Charts in Natural Document Formats Financial analysis depends on accurate financial data, and these data are often distributed via PDF and other "natural document" formats. While these formats are optimized for easy human comprehension, automatically extracting the data can be quite challenging. We'll describe our work using a deep learning pipeline to extract data from tables and charts in PDF documents. We'll also show some of our latest research, inspired by image captioning models, for directly going from images of tables to a markup language (LaTeX) representation. 50-minute Talk David Rosenberg - Data Scientist, Office of the CTO, Bloomberg
Philipp Meerkamp - Financial Software Engineer, Bloomberg
Favorite
S8718 - Optimizing HPC Simulation and Visualization Codes Using NVIDIA Nsight Systems

Are you readying your application for dense multi-GPU compute hardware like the NVIDIA DGX or ORNL Summit? Are you sure your CPUs and GPUs are all working to their capabilities? Are you looking for a tool to squeeze out that last bit of performance? Come and learn how the new NVIDIA Nsight Systems can help you maximize the performance of your simulation and visualization applications on GPU-accelerated clusters, along with suggested techniques and best practices for optimizing HPC workloads. NVIDIA engineers and the developers of molecular modeling tools at the University of Illinois will share their experiences using NVIDIA Nsight Systems to analyze and optimize several of their HPC applications, including NAMD, VMD, and Lattice Microbes. The session will highlight several intermediate and advanced profiling techniques and will demonstrate how incorporating NVTX profiling hooks into the application can help focus profiling activity and improve the clarity of profiling results in complex HPC apps.

50-minute Talk Robert Knight - Software Engineer, NVIDIA
Daniel Horowitz - Director of Platform Developer Tools, NVIDIA
John Stone - Senior Research Programmer, University of Illinois at Urbana Champaign
Favorite
S8742 - Optimizing for Real-Time Inference Real-time games have an extremely small budget for computations of each frame. Learn the right way to approach real-time performance with inference workloads, taking advantage of the newest technologies available. 50-minute Talk Donald Brittain - Principal Engineer, NVIDIA
Favorite
S8816 - How Deep Learning Could Predict Weather Events How do meteorologists predict weather or weather events such as hurricanes, typhoons, and heavy rain? Predicting weather events was traditionally done with supercomputer (HPC) simulations using numerical models such as WRF, UM, and MPAS. Recently, however, many deep learning-based studies have shown outstanding results. We'll introduce several case studies related to meteorological research. We'll also describe how meteorological tasks differ from general deep learning tasks, their detailed approaches, and their input data, such as weather radar and satellite images. We'll also cover typhoon detection and tracking, rainfall amount prediction, forecasting of future cloud imagery, and more. 50-minute Talk Sa-Kwang Song - Principal Researcher, Korea Institute of Science and Technology
Favorite
S8851 - The Road From GPU-Powered Prototypes to Production-Ready ECUs GPUs provide power-efficient hardware acceleration for graphics processing and deep learning algorithms, making them the ideal compute processors for highly automated driving functionalities. Despite the predominance of GPUs in the development of prototypes, the actual market penetration of GPUs in series-production electronic control units (ECUs) remains comparably low. In this talk we will focus on a key contributor to this problem: deficient support for integration into the design processes of the automotive supply chain and automotive software standards. 50-minute Talk Alexander Much - Chief Expert, Elektrobit Automotive GmbH
Christoph Herzog - Head of Portfolio Management, Elektrobit Automotive GmbH
Favorite
S8908 - ORNL Summit: Enabling Large Scale Science on Summit Through the Center for Accelerated Application Readiness

The Center for Accelerated Application Readiness within the Oak Ridge Leadership Computing Facility is a program to prepare scientific applications for next-generation supercomputer architectures. Currently the program consists of thirteen domain science application development projects focused on preparing codes for efficient use on Summit. Over the last three years, these teams have developed and executed a development plan based on detailed information about Summit's architecture and system software stack. This presentation will highlight the progress made by the teams using Titan, the 27 PF Cray XK7 with NVIDIA K20X GPUs; SummitDev, an early-access IBM Power8+ system with NVIDIA P100 GPUs; and, very recently, Summit, OLCF's new IBM Power9 system with NVIDIA V100 GPUs. The program covers a wide range of domain sciences, with applications including ACME, DIRAC, FLASH, GTC, HACC, LSDALTON, NAMD, NUCCOR, NWCHEM, QMCPACK, RAPTOR, SPECFEM, and XGC.

25-minute Talk Jack Wells - Director of Science, Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
Favorite
S8981 - Asynchronous Operations and Dynamic Parallelism in CUDA - Session 3 of 4 (Presented by Acceleware)

This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We will demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. The second part of the session will focus on dynamic parallelism. A programming demo involving asynchronous operations will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!

80 Minutes Tutorial Chris Mason - Technical Product Manager, Acceleware
Dan Cyca - Chief Technology Officer, Acceleware
S8488 - Leveraging GPUs for Bayesian Inference We'll present results on speeding up Bayesian inference on an NVIDIA DGX-1 server for medical diagnostics. Bayesian inference is an AI technique for reasoning under uncertainty that is computationally and data intensive. We'll discuss the implications for both inference and training of Bayesian networks. 25-minute Talk Alex Kozlov - Solutions Architect, NVIDIA
Alec Gunny - Solutions Architect, NVIDIA
S8960 - Computational Pathology at Scale: Changing Clinical Practice One Petabyte at a Time

How can we train medical deep learning models at petabyte scale, and how can these models impact clinical practice? We will discuss possible answers to these questions in the field of computational pathology. Pathology is in the midst of a revolution from a qualitative to a quantitative discipline. This transformation is fundamentally driven by machine learning in general, and computer vision and deep learning in particular. With the help of PAIGE.AI, we are building clinical-grade AI at Memorial Sloan Kettering Cancer Center. The models are trained on petabytes of image and clinical data on top of the largest DGX-1 V100 cluster in pathology. The goal is not only to automate cumbersome and repetitive tasks, but to impact diagnosis and treatment decisions in the clinic. This talk will focus on our recent advances in deep learning for tumor detection and segmentation, on how we train these high-capacity models with annotations collected from pathologists, and on how the resulting systems are implemented in the clinic.

50-minute Talk Thomas Fuchs - Associate Professor, Memorial Sloan Kettering Cancer Center
CE8141 - Connect with the Experts: VR: GL, DX & VK

Come talk to us about anything VR related. We invite you to discuss anything from efficient rendering and multi-GPU rendering to the newest hardware features.

1 Hour Connect with the Experts Jan Robert Menzel - Developer Technology Engineer, NVIDIA
Kai Ingo Esser - Sr. Devtech Engineer, NVIDIA
L8172 - DRL for Optimal Execution of Portfolio Transactions Deep reinforcement learning (DRL) can be trained to optimize the execution of large portfolio transactions using a simple environment simulator. The simulator generates features such as current and lagged values of spread, best bid/ask, volume, volatility, log-return, and bid-ask imbalance for a neural network to optimize trading trajectories. Upon completion, you should have a starting framework for incorporating higher-dimensional order book data into a methodology that allows state-of-the-art neural networks to be used for optimization. Prerequisites: working knowledge of basic scientific Python; basic knowledge of TensorFlow. 120 Minutes Instructor-Led Lab Onur Yilmaz - Deep Learning Solution Architect and Certified Instructor, NVIDIA
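For attendees who want a feel for what such an environment simulator involves before the lab, here is a minimal pure-Python sketch; the class name, the linear market-impact cost model, and all parameter values are illustrative assumptions, not the lab's actual code:

```python
import random

class ExecutionEnv:
    """Toy simulator for liquidating a position over discrete steps.

    The dynamics (random spread, linear temporary impact) are
    illustrative stand-ins for a real order-book simulator.
    """
    def __init__(self, shares=1000, steps=10, impact=0.001, seed=0):
        self.total, self.steps, self.impact = shares, steps, impact
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.t, self.remaining, self.cost = 0, self.total, 0.0
        return self._state()

    def _state(self):
        # Features analogous to spread, inventory, and time remaining
        return (self.rng.uniform(0.01, 0.05), self.remaining, self.steps - self.t)

    def step(self, sell_qty):
        sell_qty = min(sell_qty, self.remaining)
        spread, _, _ = self._state()
        # Cost = half-spread plus linear temporary market impact
        self.cost += sell_qty * (spread / 2 + self.impact * sell_qty)
        self.remaining -= sell_qty
        self.t += 1
        done = self.t >= self.steps or self.remaining == 0
        if done:  # force liquidation of any leftovers at a penalty
            self.cost += self.remaining * 0.10
            self.remaining = 0
        return self._state(), -self.cost, done

env = ExecutionEnv()
state, done = env.reset(), False
while not done:
    state, reward, done = env.step(100)  # naive TWAP-like policy
```

A DRL agent would replace the fixed `env.step(100)` policy with actions chosen by a neural network trained on the simulated rewards.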
S8158 - Graph-Centric AI for Cybersecurity Large enterprise networks and computer systems face the daily challenge of cyberattacks, which originate from software and hardware vulnerabilities and result in data theft, service interruption, and monetary loss. To address this challenge, we've developed a set of graph-based machine learning techniques for accelerating threat detection on GPUs. We'll present our research on graph-centric AI that can be used to discover malicious actions in time to prevent irreversible damage to the systems. In the era of big data, these techniques help us to have a deep understanding of critical relationships in computer systems, social networks, and IoT, which is essential in many industry segments, including defense, software, finance, e-commerce, and healthcare. 50-minute Talk Howie Huang - Associate Professor, The George Washington University
S8161 - GPU Acceleration of Direct Sparse Matrix Solver for ANSYS Electronics A GPU-accelerated direct sparse matrix solver has been in use at ANSYS since 2016. It achieves high performance on CPUs and GPUs for a wide range of electromagnetic problems, compared with state-of-the-art commercial and open-source software. We'll review the current GPU acceleration technique and describe our recent improvements to the GPU-enabled matrix solver, observing up to 1.5x speedup over the existing GPU algorithm. This innovation enables GPU acceleration of matrix computations that would not have benefited from GPUs before. 25-minute Talk Zhen Wang - Senior Research and Development Engineer, ANSYS
S8188 - Application of OpenACC to the Computer-Aided Drug Discovery Software Suite "Sanjeevini"

We will demonstrate the features and capabilities of OpenACC for porting and optimizing the ParDOCK docking module of the Sanjeevini computer-aided drug discovery suite, developed at the Supercomputing Facility for Bioinformatics and Computational Biology at the Indian Institute of Technology Delhi. We used OpenACC to efficiently port the existing C++ code of ParDOCK, with minimal modifications, to run on the latest NVIDIA P100 GPU. These code modifications and tuning resulted in an average six-times speedup in turnaround time. With OpenACC, the code can now sample ten times more ligand conformations, leading to an increase in accuracy: the ported ParDOCK code predicts a correct pose of a protein-ligand interaction 96.8 percent of the time, up from 94.3 percent (for poses under 1 A), and 89.9 percent of the time, up from 86.7 percent (for poses under 0.5 A).

25-minute Talk Bharatkumar Sharma - Senior Solution Architect, NVIDIA
Abhilash Jayaraj - Research Scholar, Indian Institute of Technology Delhi
S8517 - Applying AI to Simplify Support - Lessons Learnt We'll provide insights into how customer support built on a foundation of AI can streamline support for large enterprises, especially manufacturers. With AI technologies like image recognition and natural language processing maturing, enterprises should strongly consider building an AI-based support platform, especially those with an omni-channel strategy. Delivering an amazing and differentiated user experience will lead to higher net promoter and customer satisfaction scores. By employing AI-based technologies, enterprises can reduce their contact volume and, consequently, their costs. It will also help them sell more replacement parts online. 25-minute Talk Satish Mandalika - CEO & Co Founder, Drishyam.ai
S8527 - Efficient Parallel Distributed Approaches for Deep Reinforcement Learning

Deep reinforcement learning is quickly becoming one of the most exciting fields in machine learning. Continuous interaction with an environment allows learning agents to learn the optimal execution policy from past experience by optimizing parameterized neural network models. However, a single learning agent is limited by its computational resources as well as its unary exposure to the environment. To counter these limitations, we can scale the deep reinforcement learning process by parallelizing training and the processes that collect data from the environment. Existing efforts include novel distributed deep reinforcement learning algorithms such as G-A3C and TRPO, and open-source libraries for implementation, including BigDL and Ray. In this session, we will review key parallel distributed algorithms and libraries for deep reinforcement learning.

50-minute Talk Marcos Campos - Head of Artificial Intelligence, Bonsai
S8534 - Making Business Application Intelligent Using SAP Leonardo Machine Learning

SAP Leonardo Machine Learning provides capabilities, micro-services, applications, and technology that enable the integration and adoption of ML in the enterprise. We will present how the ML technology works and how you can transform your business with SAP Leonardo ML and the power of SAP Cloud Platform (SCP). One of the ML use cases we built is called Catalog Normalization. This solution processes catalogs received from suppliers, extracts attributes from free-text descriptions, and normalizes attribute names and values. In this talk, we will also review this solution to show how deep learning models can solve this problem for enterprises using SAP Leonardo Machine Learning.

50-minute Talk Frank Wu - Head of SAP Machine Learning Business Network, SAP Labs
Nazanin Zaker - Lead Data Scientist, SAP
S8571 - Towards AI Agents That Can See, Talk, and Act We are witnessing unprecedented advances in computer vision and AI. What lies next for AI? We believe that the next generation of intelligent systems (say the next generation of Google's Assistant, Facebook's M, Apple's Siri, Amazon's Alexa) will need to possess the ability to perceive their environment (through vision, audition, or other sensors), communicate (i.e., hold a natural language dialog with humans and other agents), and act (e.g., aid humans by executing API calls or commands in a virtual or embodied environment), for tasks such as: aiding visually impaired users in understanding their surroundings; interacting with an AI assistant (Human: 'Alexa – can you see the baby in the baby monitor?', AI: 'Yes, I can', Human: 'Is he sleeping or playing?'); robotics applications (e.g. search and rescue missions) where the operator may be situationally blind and operating via language. We'll present work from our lab on a range of projects on such visually grounded conversational agents. 25-minute Talk Dhruv Batra - Assistant Professor and Researcher, Georgia Tech and Facebook AI Research
S8735 - A.I. Disrupting the Future of Content Creation for Games

The artistic manpower needed to create a video game has been increasing exponentially over the years. Thanks to the computational power of NVIDIA GPUs, new AI-accelerated workflows are poised to solve this problem, saving artists and studios time and money, and driving greater creativity. Artomatix is the leading pioneer in this space; its AI-based approach to content creation helps automate many of the mundane, tedious, and repetitive tasks artists and designers face every day. This talk introduces the academic theory and history behind creative AI and then delves into specific use cases and applications such as texture synthesis, material enhancement, hybridization, and style transfer. Finally, it presents the next generation of AI-powered tools for the creative industries and gives case studies on how they've been solving some of the game industry's largest problems over the past year. Join this session to gain insight into the future of game creation.

50-minute Talk Eric Risser - Founder & CTO, Artomatix
S8760 - Deep-Learning Inferencing on IBM Cloud with NVIDIA TensorRT We'll focus on deep learning neural network model deployment and inference on the IBM Cloud, and how well NVIDIA GPUs perform in this area compared to FPGAs tuned for deep learning primitives. We believe this topic is very relevant today because, with the emergence of new powerful NVIDIA GPUs, more and more artificial intelligence has become part of our daily lives, from Siri, Alexa, language translation, and image recognition to self-driving cars. The cognitive era has truly begun. Toward this end, IBM has formed a close partnership with NVIDIA to offer GPU-enabled systems - both dedicated servers and on the cloud - to our customers and developers to run their cognitive workloads. 50-minute Talk Larry Brown - Senior Software Engineer, IBM
Khoa Huynh - Senior Technical Staff Member (STSM), IBM
S8843 - Building an Enterprise Machine Learning Center of Excellence

Algorithmic advancements and new research capabilities frequently overshadow the infrastructure that enables that research and serves it to customers in production applications. Having a solid infrastructure for real world machine learning often ends up being the biggest determinant of success and is an exciting area of research and engineering in its own right. These environments are what allow brilliant algorithms to deliver value at scale. We'll detail how Capital One has designed its GPU computing environment to accelerate machine learning efforts and outline the services used, the framework to leverage those services, and the engineering practices used to develop and deploy well-governed, accurate models to high-volume production environments. Beyond production deployments, we'll discuss how this infrastructure performs large-scale testing of models and frameworks to explore the interactions of deep learning tools like MXNet and TensorFlow. We'll also discuss the practices that enabled Capital One to hire a high-performing team in this incredibly desirable field.

25-minute Talk Zachary Hanif - Director of Machine Learning, Capital One
S8193 - Prototyping Vision-Based Classifiers in Constrained Environments SOFWERX developed a vision-based classifier using commodity hardware and machine learning libraries to satisfy an urgent high-level requirement. To track the usage of tank ammunition, the team had to address challenges involving unavailable training data, varying spatial orientations, and limited power consumption. To resolve these challenges, SOFWERX generated an augmented dataset using synthetic models, implemented spatial transformers, and experimented with different hardware/software optimizations. 25-minute Talk Ted Hromadka - Senior Software Engineer, Integrity Applications Incorporated
Cameron Hunt - CIO, SOFWERX
S8458 - Capture Sparsity in DL Applications We'll present a new technique for improving the efficiency of inference and training in deep learning in the presence of sparse workloads. We'll start with a brief overview of applications of sparse linear algebra in engineering and data analysis. Then we'll analyze the presence of sparsity in both the training and inference phases of deep learning. To exploit this sparsity, we present our method for improving the memory locality of sparse applications. We'll establish lower and upper bounds for sparse matrix operations and their crossover with dense matrix operations. We'll demonstrate how to minimize memory traffic by tiling matrix operations and making efficient use of the L2 cache, L1 cache, and shared memory (SMEM). We'll conclude with a performance comparison of our method with existing techniques on real pruned weight matrices from GoogLeNet and OpenNMT's multiway translation network. This is the joint work of Michael Frumkin, Jeff Pool, and Lung Sheng Chien. 25-minute Talk Michael Frumkin - Sr. Compute Architect, NVIDIA
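For context, the memory-locality issue arises because sparse formats index into the dense operand indirectly. A compressed sparse row (CSR) matrix-vector product, the kind of kernel whose locality the talk addresses, can be sketched in a few lines (an illustration of the access pattern, not the presenters' method):

```python
def dense_to_csr(A):
    """Convert a dense matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x, touching only the stored nonzeros."""
    y = []
    for r in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            # x[col_idx[k]] is an indirect, data-dependent load --
            # the irregular access that hurts memory locality
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y

A = [[0, 2, 0], [1, 0, 0], [0, 0, 3]]
v, c, p = dense_to_csr(A)
print(csr_matvec(v, c, p, [1.0, 1.0, 1.0]))  # -> [2.0, 1.0, 3.0]
```

The gather `x[col_idx[k]]` is what tiling and careful use of L2/L1/SMEM aim to make local.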
S8462 - Multi-GPU Training with NCCL

We'll cover recent features and performance improvements in the NVIDIA Collective Communication Library (NCCL). NCCL is designed to make computing on multiple GPUs easy and is integrated into most deep learning frameworks to accelerate training. NCCL supports communication over shared memory, PCIe, NVLink, sockets, and InfiniBand verbs, supporting both multi-GPU machines and multi-node clusters.
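As background on what a collective communication library computes, here is a pure-Python simulation of a ring all-reduce, the bandwidth-efficient pattern that NCCL's ring algorithm is based on (a sketch for intuition, not NCCL's implementation or API):

```python
def ring_allreduce(buffers):
    """Simulate a ring all-reduce: a reduce-scatter pass followed by
    an all-gather pass. buffers[r][c] is chunk c held by rank r; each
    of the n ranks only ever talks to its ring neighbor."""
    n = len(buffers)
    # Reduce-scatter: after n-1 steps, rank r holds the full sum of chunk (r+1) % n
    for step in range(n - 1):
        sends = [((r + 1) % n, (r - step) % n, buffers[r][(r - step) % n])
                 for r in range(n)]
        for dest, chunk, value in sends:
            buffers[dest][chunk] += value
    # All-gather: circulate the reduced chunks until every rank has all of them
    for step in range(n - 1):
        sends = [((r + 1) % n, (r + 1 - step) % n, buffers[r][(r + 1 - step) % n])
                 for r in range(n)]
        for dest, chunk, value in sends:
            buffers[dest][chunk] = value
    return buffers

ranks = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(ring_allreduce(ranks))  # every rank ends with [111, 222, 333]
```

Each rank sends each byte roughly twice regardless of the number of ranks, which is why the ring pattern scales well for large buffers.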

25-minute Talk Sylvain Jeaugey - Senior Computing/Networking engineer, NVIDIA
S8540 - Deep Learning for Molecular Docking Molecular docking is an important tool for computational drug discovery that aims to predict the binding pose of a ligand (drug) to a target protein. Identifying a correctly oriented pose requires a scoring function that has a global optimum close to the experimentally observed pose. Additionally, it should also be differentiable with respect to atomic positions so that it can be used for gradient-based pose optimization. We'll describe a differentiable grid-based convolutional neural network scoring function and explore its application in an end-to-end GPU-optimized molecular docking workflow. We'll show that convolutional neural networks trained on experimental data can successfully identify correct binding modes and meaningfully rank and score compounds. We'll also describe several visualization approaches that map the CNN score back to the atomic inputs to help guide medicinal chemistry optimization and provide insight into the functioning of the neural network. The entirety of our approach is available under an open-source license as part of our gnina package (https://github.com/gnina). 25-minute Talk David Koes - Assistant Professor, University of Pittsburgh
S8550 - Performance Optimization for Deep Image Matting in Photoshop Learn how a research paper from Adobe Research Labs makes it into a real customer product like Photoshop. We attempted to solve a number of challenging issues about applying the technology to real-world use cases, including large model size, heavy memory consumption, and slow runtime performance. 25-minute Talk Salil Tambe - Computer Vision Engineer, Adobe Systems
Betty Leong - Photoshop Engineering Manager, Adobe Systems
Christopher Hebert - Developer Technology Engineer, NVIDIA
S8880 - Khronos Standards Update: Vulkan, glTF, OpenCL and OpenXR for Cross-Platform VR/AR

Discover how over 100 companies cooperate at the Khronos Group to create open, royalty-free standards that enable developers to access the power of the GPU to accelerate demanding compute, graphics and vision applications. This session includes the very latest updates on many Khronos cross-platform standards, including OpenXR for portable AR and VR, Vulkan, SPIR-V, OpenGL and OpenCL. The session also provides insights into how these open standards APIs are supported across NVIDIA's product families.

25-minute Talk Neil Trevett - Vice President Developer Ecosystem, NVIDIA
S8909 - ORNL Summit: Exposing Particle Parallelism in the XGC PIC Code by Exploiting GPU Memory Hierarchy XGC is a kinetic whole-volume modeling code with unique capabilities to study tokamak edge plasmas in real geometry and answer important questions about the design of ITER and other future fusion reactors. The main technique is the Particle-in-Cell method, which models the plasma as billions of quasiparticles representing ions and electrons. Ostensibly, the process of advancing each particle in time is embarrassingly parallel. However, the electric and magnetic fields must be known in order to push a particle, which requires an implicit gather operation from XGC's sophisticated unstructured mesh. In this session, we'll show how careful mapping of field and particle data structures to GPU memory allowed us to decouple the performance of the critical electron push routine from the size of the simulation mesh and allowed the true particle parallelism to dominate. This improvement enables performant, high-resolution, ITER-scale simulations on Summit. 25-minute Talk Stephen Abbott - Solutions Architect, NVIDIA
S8982 - Essential CUDA Optimization Techniques - Presented by Acceleware (Session 4 of 4)

Learn how to optimize your algorithms for NVIDIA GPUs. This informative tutorial will provide an overview of the key optimization strategies for compute-, latency-, and memory-bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. For compute-bound algorithms, we will discuss branching efficiency, intrinsic functions, and loop unrolling. For memory-bound algorithms, optimal access patterns for global and shared memory will be presented. Cooperative groups will also be introduced as an additional optimization technique. This session will include code examples throughout and a programming demonstration highlighting the optimal global memory access pattern, which is applicable to all GPU architectures. Printed copies of the material will be provided to all attendees for each session - collect all four!

80 Minutes Tutorial Chris Mason - Technical Product Manager, Acceleware
Dan Cyca - Chief Technology Officer, Acceleware
CE8107 - Connect with the Experts: Video Codec SDK and Capture SDK

Come by and ask us anything you want to know about the NVIDIA Video Codec SDK and Capture SDK. Let's talk about H.264/HEVC, desktop capturing, encoding and decoding performance, and your requirements and problems.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts Stefan Schoenefeld - Senior DevTech Engineer and Manager, NVIDIA
Abhijit Patait - Director, System Software, NVIDIA
S8133 - Managing Multi-User Virtual Reality Environments at Scale Enterprise virtual reality presents challenges to traditional IT approaches. We'll discuss how enterprises can integrate virtual reality into business workflows using common enterprise IT management tools, how virtualization with virtual reality can increase density and reduce complexity, and how collaborative virtual reality affects enterprise deployment strategies. We'll present several real-world use cases of multi-user VR system deployment, with examples spanning industries and applications such as location-based entertainment, manufacturing collaborative design review, and public-sector training and simulation. We'll outline both the business and technical requirements that drove design decisions toward fewer, larger systems, each consolidating multiple virtualized VR-ready virtual machines, instead of many individual PCs serving one VR user each. The final solution architecture will be presented for each example, along with early user feedback where available. Attendees will learn how to evaluate the pros and cons of deploying multi-user VR systems versus individual VR-ready PCs and be better equipped to design a solution that fits their VR needs. 25-minute Talk Friederich Devoir - Sr. Solutions Architect, NVIDIA
Thomas Kaye - Sr. Solution Architect, NVIDIA
S8134 - Instance Sizing for Your GPU Fleet: Lessons from Developing Smart Kitchen Technology Learn how to size your GPU fleet by following examples from our work in Computer Vision for the Smart Kitchen. Although GPU technology has enabled us to dramatically improve quality of service and reduce costs, in order to obtain optimal value we had to consider our needs in terms of both GPU and CPU capabilities. In this talk we give an overview of the problem domain that we have been working in, then dive into a demonstration of how memory requirements, along with raw performance needs, have played a key role in determining our choice of AWS GPU instances. Innit has pioneered technology addressing all aspects of people's interactions with food, from meal planning to shopping, storage and cooking; in this talk we focus on our food intelligence platform which relies heavily on recognizing both generic food items and specific packaged goods in a variety of contexts. 50-minute Talk Hristo Bojinov - CTO, Innit, Inc.
Rob Laber - Senior Computer Vision Engineer, Innit, Inc
S8179 - Performance Evaluation of GPU-Accelerated Linear Solvers on TCAD Examples We'll present the results of our evaluation of GPU-accelerated sparse linear solvers from Paralution and MAGMA and compare them with our CPU-only sparse linear solvers on technology computer-aided design (TCAD) examples. TCAD is a category of software tools for designing semiconductor devices, which can be found in almost every area of modern life. The purpose of TCAD tools is to replace cumbersome physical experiments with computer simulations. A significant part of the whole simulation time is spent solving the linear systems, so the performance of the linear solvers is extremely important. 25-minute Talk Ana Iontcheva - Senior Development Engineer Numerics, Silvaco
S8289 - How to Get the Most out of GPU-Accelerated Database Operators Memory bandwidths more than an order of magnitude higher than those of conventional processors made GPUs an attractive platform for data-intensive applications early on. While there are many success stories about GPU-accelerated databases built from scratch, GPU-accelerated operations for large-scale, general-purpose databases are the exception rather than the norm. We characterize fundamental database operators like scan, filter, join, and group-by based on their memory access patterns. From these characteristics, we derive their potential for GPU acceleration, such as upper bounds for performance on current and future architectures. Starting from basic GPU implementations, we deep-dive into aspects like optimizing data transfers, access patterns, and layout. 50-minute Talk Tim Kaldewey - Senior Manager Developer Technology for AI and Data Analytics, NVIDIA
Nikolay Sakharnykh - Sr. Developer Technology Engineer, NVIDIA
Jiri Kraus - Senior Devtech Compute, NVIDIA
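The contrast in access patterns the talk characterizes can be sketched in a few lines of Python (illustrative stand-ins, not the presenters' GPU code): a scan-plus-filter streams rows sequentially, while a hash group-by makes data-dependent, scattered accesses.

```python
def scan_filter(rows, pred):
    """Scan + filter: streams the table once in order, so memory
    access is sequential -- the pattern that maps best onto GPU
    memory bandwidth."""
    return [row for row in rows if pred(row)]

def hash_group_by(rows, key, agg_col):
    """Hash-based group-by with sum aggregation. Accesses to the hash
    table are data-dependent and scattered, which is why group-by has
    a lower acceleration ceiling than a pure scan."""
    groups = {}
    for row in rows:
        groups[row[key]] = groups.get(row[key], 0) + row[agg_col]
    return groups

rows = [{"store": "A", "amount": 10},
        {"store": "B", "amount": 5},
        {"store": "A", "amount": 7}]
big = scan_filter(rows, lambda r: r["amount"] > 6)
totals = hash_group_by(rows, "store", "amount")
print(totals)  # -> {'A': 17, 'B': 5}
```

On a GPU, each thread would process one row, with group-by updates done via atomics on the shared hash table.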
S8292 - Fraud Detection via Deep Learning

We'll discuss the role of deep learning in "nontraditional" settings -- domains that don't involve images, speech, or language. We'll highlight C3 IoT's work using deep learning to detect electricity fraud, and discuss how deep learning compares to traditional machine learning methods.

50-minute Talk Mehdi Maasoumy, C3 IoT
S8318 - 3D Convolutional Neural Networks (CNNs) with Fast and Memory-Efficient Cross-Hair Filters Over the years, state-of-the-art architectures have been built with convolutional layers and employed successfully on 2D image processing and classification tasks. This success naturally calls for extending 2D convolutional layers to 3D convolutional layers to handle higher-dimensional tasks in the form of video and 3D volume processing. However, this extension comes with a steep increase in the number of computations and parameters in each convolutional layer. Because of this, 2D convolutional layers are still widely used to handle 3D images, at the cost of losing 3D context information. In view of this, we'll present a 3D fully convolutional neural network (FCNN) with 2D orthogonal cross-hair filters that makes use of 3D context information while avoiding the scaling problem described above. By replacing 3D filters with 2D orthogonal cross-hair filters, we achieve over 20% improvement in execution time and a 40% reduction in the overall number of parameters while accuracy is preserved. 25-minute Talk Marie Piraud - Senior Researcher, Technical University of Munich
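The parameter saving can be checked with simple arithmetic. The sketch below (our illustration of the scaling argument, not the authors' implementation) compares a dense k×k×k filter bank against three orthogonal k×k filter banks:

```python
def full_3d_params(k, c_in, c_out):
    """Parameters in a dense k*k*k 3D convolution layer."""
    return k ** 3 * c_in * c_out

def crosshair_params(k, c_in, c_out):
    """Three orthogonal 2D (k*k) filter banks replacing one 3D bank,
    in the spirit of the cross-hair scheme described above."""
    return 3 * k ** 2 * c_in * c_out

# Per layer, the ratio is 3*k^2 / k^3 = 3/k, so larger kernels save more.
k = 5
print(crosshair_params(k, 32, 64) / full_3d_params(k, 32, 64))  # -> 0.6
```

At k = 5 the cross-hair layer keeps 60% of the parameters, consistent in spirit with the overall 40% reduction quoted in the abstract.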
S8363 - Now I C U: Analyzing Data Flow Inside an Autonomous Driving Car Learn about large-scale data analytics and anomaly detection in intelligent networks for autonomous vehicles. Autonomous driving is no longer a research topic but a reality in the making. Internally, an autonomous vehicle is a very complex network (boardnet) of electronic control units (ECUs) communicating with each other using multiple networking protocols, such as CAN, Ethernet, and FlexRay. Learn how BMW Group is using novel machine learning approaches to right-size boardnet resources, including precise provisioning of ECUs and message buses to optimize network throughput for quicker decision making. We'll showcase a demo of boardnet traffic over time and demonstrate how to perform anomaly detection to find performance and security bottlenecks in communication flows. The demo also shows machine learning and visualization on top of a GPU-accelerated database running on an NVIDIA DGX-1 machine to find said anomalies. 50-minute Talk Selam Getachew Woldetsadick - Data Scientist, BMW Group
Arpit Mehta - Data Scientist, Product Owner: Big Data Architectures, BMW Group
S8508 - Monitoring Honey Bee Health Using TensorRT and Microsoft Cognitive Toolkit

We'll take a deep dive into honey bee hive health monitoring with NVIDIA's TX2, TensorRT (a high-performance deep learning inference optimizer), Kinetica's insight engine running on DGX-1/DGX Station, and Microsoft Cognitive Toolkit to rapidly optimize, validate, and deploy trained neural networks for inference. In recent years, the media has reported that bees seem to be dying at an unprecedented rate. We'll explore how new accelerated analytics technologies and their corresponding compute platforms can deliver game-changing possibilities for innovation as we follow a honey bee farm scientist in California who agreed to field-test this real-time monitoring solution with her beehives. See first-hand how adaptable and accessible these complex, cutting-edge technologies have become, and how intelligent monitoring can help rescue the honey bee in a real-world environmental analytics opportunity.

50-minute Talk Anusua Trivedi - Data Scientist, Microsoft
Jacqueline Cenci-McGrody - Solutions Architect (Partner SA), NVIDIA
S8591 - Continuously Learning AI Pathologist: A Smart Microscope that can Automatically Screen Different Biological Specimens

Clinical laboratories play a crucial role in the healthcare ecosystem: they act as a screening sub-system, providing early inference in disease and abnormality diagnosis. An estimated 70% of clinical decisions regarding prevention, diagnosis, and treatment involve lab tests. Surprisingly, 60% of the inferencing done at a clinical laboratory can be performed by one "wonder tool": the microscope. Microscopy has helped pathologists assess and analyze patients for centuries. The key hurdles in microscopic examination are the amount of time pathologists must spend on manual analysis and the need for pathologists to be co-located with the specimen. In this talk, we introduce SigTuple's AI-powered smart microscope, which can automatically learn, analyze, and summarize inferences for several hundred abnormalities across different biological specimens (blood, urine, and semen). It also utilizes the power of GPU computing on the cloud to provide higher-order analysis of samples, and acts as a tele-pathology enabler by giving pathologists the power to view or review any analysis or report from any part of the world.

25-minute Talk Tathagato Rai Dastidar - Chief Scientific Officer, SigTuple Technologies Pvt Ltd
S8610 - Scaling Convolutional Neural Networks with Kubernetes and TensorFlow on AWS GPUs In this session, we present a Kubernetes deployment on Amazon AWS GPUs that provides customized computer vision to a large number of users. Reza offers an overview of Matroid's pipeline and demonstrates how to customize computer vision neural network models in the browser, followed by building, training, and visualizing TensorFlow models, which are served at scale to monitor video streams. 50-minute Talk Reza Zadeh - CEO, Matroid
S8739 - Machine Learning with StarCraft II

We'll present an overview of the StarCraft II machine learning environment, including some basic API examples using C++ and Python.

50-minute Talk Chris Lee - Lead Software Engineer, Blizzard
Timo Ewalds - DeepMind, London
S8903 - Dense Connection Networks for Conversational Speech Recognition Densely connected neural networks were originally introduced to avoid layer-wise vanishing gradients when CNNs are stacked in a very deep fashion, specifically for image recognition tasks. Inspired by this work, we've explored the use of dense connections within LSTM models for automatic speech recognition. By introducing additional connections that link (almost) every layer to at least one other layer, we mitigate the vanishing gradient effect between LSTM layers and enable error signals to propagate back to the very first layer during training. In this presentation, we'll cover the fundamentals of speech recognition and introduce different neural network model structures that have been shown to be effective for this task. We'll then introduce identity, highway, and dense connections and demonstrate how they improve the performance of these models. We'll evaluate the performance of these models across different datasets, and show that with lattice-based system combination, densely connected LSTMs contributed significantly to reaching word error rates (WER) of 5.0% and 9.1% on the Switchboard and CallHome test sets. 50-minute Talk Ian Lane - Associate Research Professor, Carnegie Mellon University
Kyu Han - Principal Machine Learning Scientist, Capio Inc.
S8910 - ORNL Summit: GPU Acceleration of Multiphysics CFD Software for Propulsion and Power Flow Systems Simulation and analysis of flow and combustion processes in propulsion and power systems present many new and interesting challenges. A multitude of fluid dynamic, thermodynamic, transport, chemical, multiphase, and heat transfer processes are intrinsically coupled and must be considered simultaneously in complex domains associated with devices such as gas-turbine and rocket engines. The problem is compounded by the effects of turbulence and high-pressure phenomena, which require treatment of nonideal fluid mixtures at supercritical conditions. The combination of complex multicomponent property evaluations along with the computational grid resolution requirements makes these simulations expensive and cumbersome. Recent advances in high performance computing (HPC) systems, such as graphics processing unit (GPU) based architectures, provide an opportunity for significant advances in dealing with these complexities while reducing the time to solution. 25-minute Talk Joseph C. Oefelein - Professor in the Daniel Guggenheim School of Aerospace Engineering, Georgia Institute of Technology
S8968 - Autoregressive Wavenet Inference on Volta GPUs

Autoregressive wavenets have demonstrated extremely high quality real-time speech synthesis results.  However, the compute requirements and tight latency bounds have made them impractical for deployment on traditional CPU-only systems.  In this talk we demonstrate that Volta GPUs provide excellent real-time inference performance on these networks, making practical deployments possible.  We discuss several alternative implementation techniques and demonstrate their achieved performance on a V100 GPU.

25-minute Talk Brian Pharris - Principal Architect, NVIDIA
S8974 - Architecting a Complete Data Infrastructure for AI and Deep Learning (Presented by NetApp)

Enterprises are eager to take advantage of artificial intelligence technologies such as deep learning to introduce new services and enhance insights from company data. As data science teams move past proof of concept and begin to operationalize deep learning, it becomes necessary to focus on the creation of a complete data architecture that eliminates bottlenecks to facilitate faster model iteration. Designing a data architecture involves thinking holistically about the deep learning pipeline, from data ingest and edge analytics, to data prep and training in the core data center, to archiving in the cloud. It is necessary to understand performance requirements and data services needed, but one should also consider future extensibility and supportability as deep learning hardware and cloud approaches evolve over time. This session will examine all the factors involved in the architecture of a deep learning pipeline, focusing on data management and the hybrid cloud. Careful infrastructure planning can smooth the flow of data through your deep learning pipeline, lead to faster time to deployment, and thus maximize competitive differentiation.

50-minute Talk Santosh Rao - AI & Data Engineering, NetApp
Kesari Mishra - Principal Engineer, NetApp
S8149 - Using Virtual Reality To Enhance The Quality Of Machine Learning Data

We'll discuss our use of Virtual Reality to enhance the quality of machine learning data in a real-time, collaborative environment. Radiant Solutions provides highly specialized, innovative geospatial multisource data, analytics, software and services to deliver critical insights and intelligence where and when it matters. DigitalGlobe is the world's leading provider of high-resolution Earth imagery and adds over 3 million square kilometers of imagery to their library every day. At Radiant Solutions, we use DigitalGlobe imagery to create one of the largest satellite imagery machine learning data sets in the world. With great data, comes great responsibility, and that's why we are innovating on new methods to control the quality of machine learning data. Powered by GPUs, we can view massive amounts of image data all in an immersive VR experience. We believe that methods like these will help push the boundaries of what is possible for machine learning data.

25-minute Talk Kevin McGee - Production Lead, Radiant Solutions
S8386 - Identifying New Therapeutics for Parkinson's Disease Using Virtual Neurons on an Azure Hosted GPU Cluster

Learn how to apply recent advances in GPUs and open data to unravel the mysteries of biology and the etiology of disease. Our team has built data-driven simulated neurons using CUDA and open data, and is using this platform to identify new therapeutics for Parkinson's disease with funding from the Michael J. Fox Foundation. In this session I'll discuss the open data that enables our approach, and how we are using NVIDIA Tesla cards on Microsoft Azure to dynamically scale to more than 100,000 GPU cores while managing technology costs.

25-minute Talk Andy Lee - CTO, Neuroinitiative
S8406 - Model Architectures and Training Techniques for High-Precision Landmark Localization

We'll discuss training techniques and deep learning architectures for high-precision landmark localization. In the first part of the session, we'll talk about ReCombinator Networks, which aim at maintaining pixel-level image information for high-accuracy landmark localization. This model combines coarse-to-fine features to first observe global (coarse) image information and then recombine local (fine) information. By using this model, we report state-of-the-art (SOTA) results on three facial landmark datasets. This model can be used for other tasks that require pixel-level accuracy (for example, image segmentation and image-to-image translation). In the second part, we'll talk about improving landmark localization in a semi-supervised setting, where less labeled data is provided. Specifically, we consider a scenario where few labeled landmarks are given during training, but many weaker labels (for example, face emotions, hand gestures) that are easier to obtain are provided. We'll describe training techniques and model architectures that can leverage weaker labels to improve landmark localization.

25-minute Talk Pavlo Molchanov - Sr. Research Scientist, NVIDIA
Sina Honari - Ph.D. Student, University of Montreal - MILA
S8812 - An Approach to Developing MPAS on GPUs MPAS-A is a general circulation (global) model of the Earth's atmosphere that is designed to work down to so-called non-hydrostatic scales where convective (vertical) cloud processes are resolved. To date, MPAS-A has been used primarily for meteorological research applications, although climate applications in the community earth system model are being contemplated. At a high level, MPAS-A consists of a dynamics part, a fluid flow solver that integrates the non-hydrostatic time dependent nonlinear partial differential equations of the atmosphere, and a physics part, which computes the forcings of these equations due to radiative transport, cloud physics, and surface and near surface processes. The dynamics is in turn divided into the dry dynamics and moist dynamics parts. Algorithmically, the dynamics uses a finite volume method on an unstructured centroidal Voronoi mesh (grid, or tessellation) with a C-grid staggering of the state variables as the basis for the horizontal discretization. 25-minute Talk Raghu Raj Prasanna Kumar - Project Scientist I & Group Head, Special Technical Project Group, Tec, National Center for Atmospheric Research
S8850 - Autotuning Dense Batched QR Factorizations on GPU

The increasing complexity and heterogeneity of computer architectures make it challenging to design both efficient and portable codes. Indeed, generic GPU kernels that attempt to fit all GPU architectures would not be efficient on any given architecture. Moreover, the careful, customized design of a GPU kernel for a specific GPU will hardly be efficient on the next generation of GPUs. Furthermore, writing tailored kernels for every GPU is a daunting task that would require too much time and effort. We'll present our work on applying the autotuning idea to this issue for batched QR factorization kernels on GPUs by automatically generating code specific to a given GPU.

25-minute Talk Wissam M. Sid-Lakhdar - Postdoctoral Researcher, Lawrence Berkeley National Laboratory
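The empirical autotuning loop described above can be sketched in miniature: generate candidate configurations, benchmark each on the target hardware, and keep the fastest. This is an illustrative harness only, not the authors' code; NumPy's batched `qr` (NumPy 1.22+) stands in for a CUDA kernel, and the `chunk` parameter stands in for the tile/block-size choices one would tune on a GPU.

```python
import time
import numpy as np

rng = np.random.default_rng(5)
mats = rng.normal(size=(256, 32, 32))      # a batch of small dense matrices

def batched_qr(batch, chunk):
    """Factor the batch in chunks of `chunk` matrices; the chunk size
    stands in for a tunable kernel launch parameter."""
    for i in range(0, len(batch), chunk):
        np.linalg.qr(batch[i:i + chunk])

def autotune(candidates):
    """Empirical autotuning: benchmark every candidate, keep the best."""
    timings = {}
    for chunk in candidates:
        t0 = time.perf_counter()
        batched_qr(mats, chunk)
        timings[chunk] = time.perf_counter() - t0
    return min(timings, key=timings.get), timings

best, timings = autotune([8, 32, 128, 256])
```

In a real GPU setting the candidates would be generated code variants and the timings measured on the device itself, with the winning variant cached per GPU model.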
L8113 - Detection of Anomalies in Financial Transactions using Deep Autoencoder Networks The "unsupervised" and "end-to-end" detection of anomalies in transactional data is one of the long-standing challenges in financial statement audits or fraud investigations. This lab will walk you through a use case of how autoencoder neural networks can be trained to detect such anomalies by learning a compressed but "lossy" model of regular transactions. In detail, we will (1) introduce the basic concepts, intuition, and major building blocks of autoencoder neural networks; (2) learn how to preprocess financial data in order to learn a model of its characteristics; (3) design, implement, and train a deep autoencoder network using PyTorch to detect anomalies in large-scale financial data; and (4) interpret and evaluate the network's detection results as well as its reconstruction loss. 120 Minutes Instructor-Led Lab Timur Sattarov - Forensic Data Analyst, PricewaterhouseCoopers GmbH WPG
Marco Schreyer - Researcher, German Research Center for Artificial Intelligence
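The lab's core idea, training an autoencoder on mostly regular transactions and flagging entries with high reconstruction error, can be sketched without PyTorch. A minimal NumPy stand-in with a linear encoder/decoder and synthetic data; all dimensions and names here are illustrative, not from the lab:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "transactions": 10 features that actually live on a
# 2-dimensional subspace (regular journal entries are highly redundant).
basis = rng.normal(size=(2, 10))
normal = rng.normal(size=(500, 2)) @ basis

# Linear autoencoder with a rank-2 bottleneck, trained by plain
# gradient descent on the reconstruction (MSE) loss.
W1 = rng.normal(scale=0.1, size=(10, 2))   # encoder
W2 = rng.normal(scale=0.1, size=(2, 10))   # decoder
lr = 0.01
for _ in range(2000):
    H = normal @ W1                        # latent codes
    err = H @ W2 - normal                  # reconstruction residual
    gW2 = H.T @ err / len(normal)
    gW1 = normal.T @ (err @ W2.T) / len(normal)
    W1 -= lr * gW1
    W2 -= lr * gW2

def reconstruction_error(x):
    return float(np.mean((x @ W1 @ W2 - x) ** 2))

# An anomalous entry off the learned subspace reconstructs poorly,
# which is exactly the "lossy model" detection signal.
anomaly = rng.normal(size=(1, 10)) * 3.0
scores_normal = [reconstruction_error(normal[i:i + 1]) for i in range(50)]
score_anomaly = reconstruction_error(anomaly)
```

The lab's deep PyTorch network replaces the two linear maps with nonlinear encoder/decoder stacks, but the anomaly score is the same reconstruction loss.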
L8118 - VR Development in Unity You'll learn the fundamentals of working with the Unity engine for developing Virtual Reality experiences. We'll cover general workflows, Unity C# scripting, the graphics pipeline, animation, VR optimization and more. 120 Minutes Instructor-Led Lab Daniel Miller - VR/AR Evangelist, Unity Technologies
L8140 - Image Classification with DIGITS

Deep learning enables entirely new solutions by replacing hand-coded instructions with models learned from examples. Train a deep neural network to recognize handwritten digits by:

• Loading image data to a training environment

• Choosing and training a network

• Testing with new data and iterating to improve performance

Upon completion of this lab, you'll be able to assess what data you should be using for training.

Presented by the NVIDIA Deep Learning Institute (DLI).

120 Minutes Instructor-Led Lab Mike Mendelson - Deep Learning Institute Curriculum Developer, NVIDIA
L8153 - Modeling Time Series Data with Recurrent Neural Networks in Keras

Prerequisites: Some experience training CNNs

Duration: 2 hours

Framework: Keras

Recurrent Neural Networks (RNNs) allow models to classify or forecast time-series data, like natural language, markets—and in the case of this Lab, a patient's health over time. You'll:

• Create training and testing datasets using electronic health records in HDF5 (hierarchical data format version five)

• Prepare datasets for use with recurrent neural networks, which allows modeling of very complex data sequences

• Construct a Long Short-Term Memory (LSTM) model, a specific RNN architecture, using the Keras library running on top of Theano, and evaluate model performance against baseline data

Upon completion, you'll be able to model time-series data using Recurrent Neural Networks.

Presented by the NVIDIA Deep Learning Institute (DLI).

120 Minutes Instructor-Led Lab Cameron Carlin - Data Scientist, Children's Hospital Los Angeles, Virtual Pediatric ICU
Steven Steinke - Curriculum Developer, NVIDIA
David Ledbetter - Senior Data Scientist, Children's Hospital Los Angeles Virtual Pediatric ICU
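The recurrence that the Keras `LSTM` layer implements can be written out explicitly. A single-step NumPy sketch, with gate packing order and layer sizes as illustrative assumptions rather than Keras internals:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. x: input (d,), h: hidden (u,), c: cell (u,).
    W (4u, d), U (4u, u), b (4u,) pack the input/forget/cell/output gates."""
    u = h.shape[0]
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[:u]))           # input gate
    f = 1 / (1 + np.exp(-z[u:2 * u]))      # forget gate
    g = np.tanh(z[2 * u:3 * u])            # candidate cell state
    o = 1 / (1 + np.exp(-z[3 * u:]))       # output gate
    c_new = f * c + i * g                  # cell state carries long-term memory
    h_new = o * np.tanh(c_new)             # hidden state is the step's output
    return h_new, c_new

rng = np.random.default_rng(1)
d, u, T = 8, 16, 24                        # features, units, time steps
W = rng.normal(scale=0.1, size=(4 * u, d))
U = rng.normal(scale=0.1, size=(4 * u, u))
b = np.zeros(4 * u)

h, c = np.zeros(u), np.zeros(u)
series = rng.normal(size=(T, d))           # one patient's measurements over time
for x in series:
    h, c = lstm_step(x, h, c, W, U, b)
# h now summarizes the whole sequence and could feed a classifier head.
```

In the lab, Keras handles this loop (and training) for you; the sketch only shows why the cell state lets the model retain information across long sequences of health records.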
S81015 - Learn How IBM Visual Insights from Watson IoT Uses Deep Learning to Help Manufacturers "See" Defects Instantly (Presented by IBM)

Learn how AI-powered image recognition along with NVIDIA GPUs helps clients detect and classify production line defects more quickly, accurately, and reliably. See customer use cases of how IBM's real-time visual inspection solution has helped manufacturing companies successfully transform their quality management processes by reducing inspection time and costs, improving efficiency and reliability, and reducing scrap and increasing manufacturing yield.

50-minute Talk Jayashree Ravichandran - Senior Offering Manager Visual Insights, Acoustic Insights, Quality Analytics, IoT for Manufacturing, IBM
S81029 - Chopout: A Simple Way to Train Various Sized Neural Networks at Once

Variable-sized networks are hard to train, since each change in a layer's size requires re-learning its parameter values. We present a novel operator, "Chopout", that simultaneously learns variable-sized networks and can perform inference at any desired network size.

Several previous approaches design deeper architectures to improve the accuracy of deep neural nets; however, they are not efficient in cost or inference speed, and selecting a smaller architecture from such designs requires re-learning the network. The Chopout operator learns random subnetworks alongside the full network, providing versatility in network size at inference time. The method can easily be integrated into any neural network architecture without additional parameters, and its effectiveness is evaluated through experiments.

50-minute Talk Takanori Ogata - Co-founder & Chief Research Officer, ABEJA, Inc.
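Based only on the abstract, the operator appears to truncate each layer's activations at a randomly drawn width during training, so that every prefix of the network learns to function as a smaller standalone network. A hypothetical NumPy sketch of that reading, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def chopout(h, width=None):
    """Keep only the first `width` units of activation h, zeroing the rest.
    With width=None (training mode) the width is drawn at random, so every
    prefix of the layer is trained to work as a smaller subnetwork."""
    if width is None:
        width = int(rng.integers(1, h.shape[-1] + 1))
    out = h.copy()
    out[..., width:] = 0.0
    return out

h = rng.normal(size=(4, 32))      # a batch of hidden activations
noisy = chopout(h)                # training pass: random width this step
small = chopout(h, width=8)       # inference with an 8-unit "chopped" network
```

Applied after every layer during training, this would let a single trained model be deployed at several sizes by simply choosing the width at inference time.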
S8197 - Recurrent Generative Adversarial Neural Networks for Compressive Imaging We'll present recurrent generative adversarial networks (GANs) for image recovery from compressed measurements, which has applications ranging from undersampled medical image reconstruction to image super-resolution. State-of-the-art analytics are not aware of the image perceptual quality, and demand iterative algorithms that incur significant computational overhead for real-time tasks. To sidestep these hurdles, we introduce a novel compressive imaging framework using deep neural networks that approximates a low-dimensional manifold of images using GANs. To ensure the images are consistent with the measurements, a recurrent GAN architecture is deployed that consists of multiple alternative blocks of generator networks and affine projection, which is then followed by a discriminator network to score the perceptual quality of the generated images. A deep residual network with skip connections is used for the generator, while the discriminator is a multilayer Perceptron. Experiments performed with real-world contrast enhanced MRI data corroborate the superior diagnostic quality and faster reconstruction for the retrieved images relative to state-of-the-art schemes. 50-minute Talk Morteza Mardani - Postdoctoral Research Fellow, Stanford University
S8241 - Sunny Skies Ahead! Versioning GPU-Accelerated WRF to 3.7.1 We'll detail the inherent challenges in porting a GPU-accelerated community code to a newer major version, integrating the community non-GPU changes with OpenACC directives from the earlier version. This is a non-trivial exercise: this particular version upgrade contained 143,000 modified lines of code, which required reintegration into our accelerator directives. This work is important in providing support for newer features whilst still providing GPU support for users. We'll also look at efforts to improve the maintainability of GPU-accelerated community codes. 25-minute Talk Stanley Posey - Program Manager, ESM and CFD Solution Development, NVIDIA
Jeffrey Adie - Principal Solutions Architect, NVIDIA
S8281 - Instance-Aware Image and Sentence Matching with Selective Multimodal LSTM We'll present a unique framework for cross-modal image and sentence matching: selective multimodal long short-term memory (LSTM), which incorporates a new deep learning module, a multimodal context-modulated attention network, to selectively attend to pairwise semantic concepts. In detail, effective image and sentence matching depends on measuring their global visual-semantic similarity. Based on the observation that such a global similarity arises from a complex aggregation of multiple local similarities between pairwise instances of image (objects) and sentence (words), we propose a selective multimodal LSTM network (sm-LSTM) for instance-aware image and sentence matching. The sm-LSTM includes a multimodal context-modulated attention scheme at each timestep that can selectively attend to a pair of instances of image and sentence by predicting pairwise instance-aware saliency maps. For selected pairwise instances, their representations are obtained from the predicted saliency maps and compared to measure their local similarity. By measuring multiple such local similarities within a few timesteps, the sm-LSTM sequentially aggregates them into the global visual-semantic similarity. 25-minute Talk Yan Huang - Assistant Professor, Institute of Automation, Chinese Academy of Sciences
S8424 - Graph Partitioning Using Bayesian Inference on GPU We implement an efficient CUDA algorithm that solves the graph clustering problem using the stochastic block model for the first time on GPUs. The algorithm views the graph as a generative model called degree-corrected stochastic block model, and performs statistical inference to discover the graph partitions most likely to result in the graph. A greedy agglomerative heuristic is used with Markov Chain Monte Carlo (MCMC) to do Bayesian inference. A comparison is made with the baseline GraphChallenge implementation on synthetic datasets. Our implementation achieves speed-ups of 11.5x and 4.1x over single-threaded and multi-threaded OpenMP implementations on the CPU. We'll provide empirical evidence that even though our method of parallelizing MCMC leads to worse convergence in terms of iteration number, we are able to harness the parallelism of the GPU to discover clusters at the same accuracy in less time. 25-minute Talk Carl Yang - Graduate Student, UC Davis
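The inference loop, proposing single-node block moves and accepting them Metropolis-style, can be sketched on a toy planted-partition graph. For brevity this stand-in maximizes within-block edge count rather than the degree-corrected SBM log-likelihood used in the talk, and it is serial rather than GPU-parallel:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy planted-partition graph: two dense blocks, sparse between them.
n, B = 60, 2
truth = np.repeat([0, 1], n // 2)
p = np.where(truth[:, None] == truth[None, :], 0.3, 0.02)
A = (rng.random((n, n)) < p).astype(int)
A = np.triu(A, 1)
A = A + A.T                                # simple undirected adjacency

def objective(labels):
    # Edges falling inside blocks: a crude stand-in for the
    # degree-corrected SBM log-likelihood the talk's method infers.
    return int(A[labels[:, None] == labels[None, :]].sum()) // 2

labels = rng.integers(0, B, size=n)        # random initial partition
score = start = objective(labels)
best_score = start
for _ in range(4000):                      # single-node Metropolis moves
    v = rng.integers(n)
    old = labels[v]
    labels[v] = rng.integers(B)
    new = objective(labels)
    if new >= score or rng.random() < np.exp(new - score):
        score = new                        # accept the proposed move
        best_score = max(best_score, score)
    else:
        labels[v] = old                    # reject: undo the move
```

The GPU implementation in the talk evaluates many such proposals in parallel, which is why convergence per iteration can worsen while wall-clock time to a given accuracy still drops.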
S8457 - Deep Learning in Medical Imaging: Learning from Regions of Interest through Segmentation

Attendees will learn about some of the key opportunities for deep learning in medical imaging, current challenges, and exciting recent developments that are tackling them. We will provide an overview of medical imaging and key applications of deep learning for improving image interpretation. Unlike natural images (e.g., ImageNet), medical images often have regions of interest comprising fewer than 0.1% of pixels. By using a pixel-wise loss, segmentation networks can learn subtle local signals. We will survey projects where segmentation has been successful, including learning from a dataset of fewer than 30 images without transfer learning. Using coarse ground truth labels allows for easier and more scalable approaches to data acquisition. We will also discuss the ability to use the representations learned by segmentation networks in related tasks such as classification, and the role of segmentation in the broader field of medical image informatics and deep learning.

50-minute Talk Daniel Rubin - Associate Professor, Stanford University
Darvin Yi - Graduate Student, Stanford University
S8469 - Compression-Aware Training of Neural Networks

We'll demonstrate the importance of accounting for compression during deep neural network training. We introduce a regularizer in the training loss to encourage the parameter matrix of each layer to have low rank. In essence, and by contrast with methods that aim to learn uncorrelated units to prevent overfitting, our new approach seeks to learn correlated ones, which can then easily be pruned in a second phase. In addition, we analyze the case where this regularizer is combined with a sparsity-inducing regularizer to achieve even higher compression. The proposed compression-aware training scheme yields networks that are well adapted to the subsequent post-processing stage. As a result, our approach achieves high compression rates at virtually no loss in prediction accuracy. On ICDAR, the algorithm achieves a compression rate of 95.5% and a reduction in training time of up to 70% with a 1% increase in top-1 accuracy. When used on ImageNet with ResNet-50, our approach yields compression rates of up to 35% with no significant drop in performance, compared to the 4% compression rate achieved by state-of-the-art post-processing methods.

50-minute Talk Jose Alvarez - Senior Research Scientist, Toyota Research Institute
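The low-rank idea can be illustrated with NumPy: during training, a nuclear-norm term (the sum of singular values, a standard convex surrogate for rank) would be added to the loss, and afterwards the weight matrix is compressed by truncating small singular values. The regularizer weight and exact formulation from the talk are not reproduced here; this is only a sketch of the mechanism:

```python
import numpy as np

rng = np.random.default_rng(4)

# A layer's weight matrix; the talk's regularizer encourages it to have
# low rank so that small singular values can be pruned after training.
W = rng.normal(size=(64, 64))

def nuclear_norm(M):
    """Sum of singular values: the term one would add to the training
    loss as  loss + lam * nuclear_norm(W)  (lam is a hyperparameter)."""
    return float(np.linalg.svd(M, compute_uv=False).sum())

def truncate(M, keep):
    """Post-training compression: keep only the top singular directions."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :keep] * s[:keep]) @ Vt[:keep]

W_small = truncate(W, keep=8)   # rank-8 approximation of the layer
# Storage drops from 64*64 numbers to 8*(64+64) via the factored form.
```

A network trained with the regularizer concentrates energy in few singular directions, so this truncation costs little accuracy; on an unregularized matrix like the random one above it would discard much more information.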
S8543 - GPUs for Everyone: Why Optimize Windows 10 and Every Application with GRID With the switch to Windows 10, more applications are being developed with the assumption of a GPU being present. GPUs are in our desktops, laptops, tablets, and even in the mobile phones in our pockets. Why should VDI be any different? Come see how the University of Arkansas is giving everyone the fastest possible experience and opening doors to new ways of learning by serving up VDI desktops and applications with pervasive GPU access. When every app has GPU acceleration, the user experience is better than ever. 50-minute Talk Jon Kelley - Associate Director of Enterprise Innovation, University of Arkansas
S8557 - Tricks, Tips, and Timings: The Data Movement Strategies You Need to Know Learn the latest strategies to efficiently move complicated data structures between GPUs and CPUs. We'll go beyond basic data movement, showing techniques that have been used in practice to port and optimize large-scale production applications. These include a look at the unique benefits of zero copy, how to set up a deep copy to avoid having to flatten data structures, and how this can be done in OpenMP 4. We'll cover both CUDA and directive approaches using examples written in modern Fortran and applicable in any language. 50-minute Talk David Appelhans - Research Staff Member, IBM
S8569 - Flexible and Fast Machine Learning and Deep Learning with Alluxio With the exponentially-growing deluge of data today, data lakes are pooling everywhere. So, how can you harness them for critical insights and is there an easy way to tap into the multitude of different storage systems that they're stored in? Enter Alluxio, an agnostic and fast storage abstraction, which, when paired with deep learning and GPU-accelerated analytics yields a quick and easy way to harness the data. Join NVIDIA's Applied Solutions Engineering (ASE) team as they walk through how to use Alluxio for fun and profit. 25-minute Talk Yupeng Fu - Founding Member and Senior Architect, Alluxio
Michael Wendt - Manager, Applied Engineering Solutions, NVIDIA
S8709 - Accelerating Molecular Modeling Tasks on Desktop and Pre-Exascale Supercomputers We'll showcase recent successes in the use of GPUs to accelerate challenging molecular simulation analysis tasks on the latest Volta-based Tesla V100 GPUs on both Intel and IBM/OpenPOWER hardware platforms, and with large scale runs on petascale computers such as ORNL Summit. We'll highlight the performance benefits obtained from die-stacked memory on Tesla V100, the NVLink interconnect on the IBM OpenPOWER platforms, and the use of advanced features of CUDA, Volta's new Tensor units, and just-in-time compilation to increase the performance of key analysis algorithms. We'll present results obtained with OpenACC parallel programming directives, current challenges, and future opportunities. Finally, we'll describe GPU-accelerated machine learning algorithms for tasks such as clustering of structures resulting from molecular dynamics simulations. 50-minute Talk John Stone - Senior Research Programmer, University of Illinois at Urbana Champaign
S8732 - VACnet: Using Deep Learning to Combat Cheating in 'Counter-Strike: Global Offensive' We'll delve into the nuts and bolts of how Valve has utilized deep learning to combat cheating in "Counter-Strike: Global Offensive." We'll cover total system details, from the high-level server architecture to the low-level features fed into the AI. Deep learning has proven to be very effective at identifying cheating behavior without any client-side instrumentation, making it robust against malicious attack by cheaters and cheat vendors. By retraining regularly, the network continues to evolve, picking up new cheating behaviors within hours of their appearance. As a result of this approach, certain types of cheats have been reduced by a factor of 100. 50-minute Talk John McDonald - Programmer, Valve
S8888 - Multi-GPU Parallel Processing with NVLink

Multi-GPU processing with the GP100 and NVLink will be discussed using a hypervelocity impact problem. Multi-GPU processing has always been possible via the PCIe interface, which means communication between GPUs is accomplished through the CPU. The NVLink connection allows software to bypass this slower path and enables direct communication between GPUs to improve performance. An SPH solver, a particle-based method, is used to solve the hypervelocity problem. The SPH solver does all calculations on the GPU, so it is a perfect choice for comparing performance between various GPUs past and present. Results for single- and multiple-GPU simulations on the K20, K40, P6000, and GP100 are presented.

25-minute Talk Wayne Mindle - Director of Sales & Marketing, CertaSIM, LLC
S8937 - ORNL Summit: GPU Accelerated Performance of QMCPACK on Leadership-Class HPC Systems Using CUDA and cuBLAS QMCPACK is an open-source, massively parallel quantum Monte Carlo code enabling the accurate calculation of quantum many-body problems such as systems of atoms, molecules, and even solids. Here, we demonstrate the implementation of a rank-k matrix update scheme leading to increased compute density and performance improvements of up to 1.5-fold compared to the current rank-1 update at every step. We compare performance results on Oak Ridge's next supercomputer, Summit, as well as its development precursor, SummitDev, to the current machine, Titan. Based on detailed runtime traces, we illustrate how speed-ups were achieved and give an outlook on which future library features could be most beneficial to our application's performance. 25-minute Talk Andreas Tillack - Distinguished Postdoctoral Research Associate, Oak Ridge National Laboratory
S81003 - Faster than Real-Time Computing in Tsunami Early Warning Systems

When used as predictive tools in natural disasters such as tsunamis, numerical models require extremely fast computations. Just a few years ago, real-time computing in Tsunami Early Warning Systems (TEWS) was unthinkable. Nevertheless, the EDANYA Group has revolutionized tsunami science paradigms. With the goal of saving lives in the framework of TEWS, our group has developed Tsunami-HySEA, a GPU-based numerical model aimed at producing numerical simulations of tsunami events faster than ever. Based on highly efficient, robust mathematical algorithms, together with the computational power of NVIDIA GPUs, Tsunami-HySEA is able to simulate a tsunami event in only a few minutes. Nowadays, one of the main challenges in tsunami science is producing accurate assessments of tsunami wave impacts just a few minutes after the generating earthquake is triggered; such timely prediction would save many lives in a tsunami scenario. When the response is needed in only a few minutes, the resulting scenario is challenging, since the required characteristics are difficult to combine in a single numerical tool: robustness, low dissipation, large domains, and an extremely fast response.

25-minute Talk Jorge Macias - Associate Professor, EDANYA Group (University of Malaga)
S8384 - Datasets and Algorithms for Road Identification Via Satellite Imagery Road identification and route prediction in near real time remains a challenging problem for many geographic regions, particularly in the case of natural disasters or crisis situations. Existing methods such as manual road labeling or aggregation of mobile GPS track data are currently insufficient in dynamic scenarios. The frequent revisits of satellite imaging constellations may accelerate efforts to rapidly update road network and optimal path prediction, provided routing information can be extracted from imaging pixels. We'll demonstrate deep learning segmentation methods for identifying road center lines and intersections from satellite imagery, and inferring networks from these road segments. We'll also explore data quality requirements by comparing open source labels with-high precision labels created as part of the SpaceNet Roads challenge. 25-minute Talk Adam Van Etten - Senior Research Scientist, In-Q-Tel
S8438 - Disrupting 3D Design - GPU Based Real-Time Simulation for Rapid Concepting Join us for an exciting presentation unveiling the latest use of GPU technology to aid real-time engineering simulation. You will see a new technology, called ANSYS Discovery Live, that provides instant, invaluable feedback, yielding engineering designs that are more optimized and better understood than previously possible. Rather than consuming time with non-value-added tasks, engineers can turn the design process into an interactive, educational experience. The marrying of simulation technology with the technological advances of NVIDIA graphics is fundamentally changing the way products are designed and developed. The possibilities are endless with this technology. 25-minute Talk Justin Hendrickson - Director of Product Development, ANSYS
S8568 - Supporting DGX Air-Gapped Production Environments This tutorial will cover the issues encountered when deploying the NVIDIA DGX-1/DGX Station into a secure environment. For security reasons, some installations require that systems be isolated from the internet or outside networks. Since most DGX-1 software updates are accomplished through an over-the-network process with NVIDIA servers, this session will walk participants through how updates can be made by maintaining an intermediary server. The session will combine lecture and live demos with detailed instructions. 80 Minutes Tutorial Sumit Kumar - Solutions Architect, NVIDIA
Jeffrey Weiss - Director, Solution Architects, NVIDIA
S8582 - Embodied Question Answering Building intelligent agents that possess the ability to perceive the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and execute actions in a physical environment, has been a long-term goal of Artificial Intelligence. In this talk, I will present my recent work on an instantiation of this goal -- Embodied Question Answering (EQA) -- where an agent that is spawned at a random location in an environment (a house or building) is asked a natural language question ("What color is the car?"). The agent perceives its environment through first-person vision and can perform a few 'atomic' actions: move-{forward, backward, right, left}, and turn-{right, left}. The objective of the agent is to explore the environment and gather visual information necessary to answer the question ("orange"). I'll introduce our OpenGL-based environments, a large-scale dataset of expert demonstrations for this task and deep models, trained end-to-end using reinforcement learning, from raw pixels to multi-step navigation control to visual question answering. 25-minute Talk Abhishek Das - PhD Student, Georgia Tech
S8696 - Audio Recognition, Context-Awareness, and its Applications

We'll explain the concept and importance of audio recognition, which aims to understand all the information contained in audio, not limiting its scope to speech recognition. We'll introduce the various types of non-verbal information contained in audio, such as acoustic scenes/events, speech, and music. This session is helpful for people who are not familiar with audio processing but are interested in context-aware systems. It may also inspire anyone developing AI applications such as home assistants, humanoid robots, and self-driving cars. It also covers potential use cases and creative applications, including a video demonstration of the audio context-aware system applied to a media-art performance for real-time music generation.

25-minute Talk Yoonchang Han - CEO, cochlear.ai
S8933 - Design Creativity Empowered by Living Immersive Experiences with Dassault Systèmes and NVIDIA Solutions

Our new and upcoming solutions provide a paradigm shift in design, with natively built-in VR immersive experiences. These experiences happen directly within the 3D design environment of Dassault Systèmes CATIA. Designers and engineers can now access a new level of creativity by combining creative tools for sketching, 3D modeling, CAD, and simulation with virtual reality. We'll present how easy it is to be immersed in your design, by yourself or even as a team. This brings designers and engineers a major step forward in the design validation and collaborative decision workflow. We'll cover how CATIA design solutions on the 3DEXPERIENCE platform use the latest technologies from NVIDIA for immersive VR experiences to create, collaborate, and do 3D product design on native and massive models.

25-minute Talk Stephan Ritz - CATIA Design, Product Experience Roles Portfolio Director, Dassault Systemes
CE8113 - Connect with the Experts: Full-Stack GPU Computing with Julia

Learn how the Julia programming language can be used for GPU programming, both for (1) low-level kernel programming and (2) high-level array and AI libraries. This full-stack support drastically simplifies code bases, and GPU programmers can take advantage of all of Julia's most powerful features: generic programming, n-dimensional kernels, higher-order functions, and custom numeric types. The session will give an overview of the compiler's implementation and performance characteristics via the Rodinia benchmark suite. We'll show how these techniques enable highly flexible AI libraries with state-of-the-art performance, and allow a major government user to run highly computational threat modelling on terabytes of data in real time.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts Mike Innes - Software Engineer, Julia Computing
Tim Besard - PhD Student, Ghent University
CE8149 - Connect with the Experts: Deep Learning Basics (2)

Attend this session to get your questions on deep learning basics and concepts answered. NVIDIA experts can help you with the fundamentals and provide guidance on how and when to apply Deep Learning and GPUs to your work. No question is too basic to ask.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject.  

1 Hour Connect with the Experts Robert Crovella, NVIDIA
Hans Mortensen - Sr. Solutions Architect, NVIDIA
S8621 - Deploying, Profiling, and Optimizing Distributed TensorFlow in Production with GPUs Using the latest advancements from TensorFlow including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, we'll demonstrate how to optimize, profile, and deploy TensorFlow models in GPU-based production environments. We'll cover many demos based on open source tools. You can completely reproduce all demos through Docker on your own GPU cluster. See http://pipeline.ai for links to the GitHub Repo. 50-minute Talk Chris Fregly - Founder & Research Engineer, PipelineAI
S8632 - Innovations with Blast Extreme and Horizon with NVIDIA GRID In this panel, we'll discuss innovation with the VMware Blast Extreme protocol and the Horizon software offering, along with NVIDIA GRID. We will focus on providing IT administrators with tools and technologies to help deliver the best experience for virtualized 3D graphics applications in their environments. 50-minute Talk Kiran Rao - Director, Product Management, VMware Inc
Cory Smith - CIO/CTO, City of Davenport, Iowa
Luke Wignall - Senior Manager, Pro Viz Performance Engineering & Technical Marketing, NVIDIA
S8682 - Defect Inspection from Scratch to Production In order to fulfill customers' requirements, companies have to guarantee the quality of delivered products, which often can be achieved only by manual inspection of the finished product. Since human-based defect inspection and classification are time-consuming and the results vary between individuals, automatic defect detection and classification has the potential to reduce the cost of quality assurance significantly. In this talk, we will demonstrate how to use deep learning algorithms, such as fully convolutional networks, to build a general defect inspection and classification model. We will also share experience on how to effectively collect labeled data, deal with imbalanced data, and optimize the model in terms of latency and throughput with TensorRT before deploying it to the production line. 50-minute Talk Sheng-Ting Shen - Solution Architect, NVIDIA
Kuan-Liang (Andrew) Liu - Solution Architect, NVIDIA
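The imbalanced-data problem mentioned in the abstract is commonly countered by weighting the loss inversely to class frequency. A minimal pure-Python sketch with hypothetical defect labels (illustrative only, not the speakers' pipeline):

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights, normalized so a balanced set gives 1.0 per class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    # Each class weight is proportional to 1 / frequency.
    return {c: n / (k * count) for c, count in counts.items()}

# Hypothetical defect labels: mostly "ok" parts, few "scratch"/"dent" defects.
labels = ["ok"] * 90 + ["scratch"] * 8 + ["dent"] * 2
weights = class_weights(labels)
# Rare classes receive larger weights, so their errors count more in training.
```

These weights would typically be passed to a framework's weighted cross-entropy loss so that rare defect classes are not drowned out by the majority class.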
S8744 - The Future of Real-Time: Experience Design Epic Games presents a panel discussion with partners who are using Unreal Engine to bring real-time, high-fidelity interactive experiences to their customers. From product design and visualization to virtual production, photorealism, and final pixels, content creators are uncovering the power of Unreal Engine. Hear how Unreal Engine customers are applying game engine technology to revolutionize the conventions of architectural design, automotive, aerospace, and product design, and the future of customer engagement. 50-minute Panel Leighton Carr - Research Program Lead, Boeing
Shital Shah - Washington, Microsoft
Owen Coffee - Associate Principal, HKS
Ashley Micks - Technical Specialist, Ford
Marc Petit - General Manager, Unreal Engine, Epic Games
S8837 - OpenCL at NVIDIA - Recent Improvements and Future Plans Learn about recent improvements to OpenCL on NVIDIA platforms. We'll share what we've learned and cover the improvements we recently made, in particular our efforts to improve and smooth data-transfer performance, as well as improvements to the memory allocation extension we introduced last year. We'll also discuss our future plans for OpenCL. 50-minute Talk Nikhil Joshi - Engineering Manager, OpenCL Driver, NVIDIA
S8892 - Machine Learning in Precision Medicine: Patient-Specific Treatment Enabled by Quantitative Medical Imaging, Artificial Intelligence, and GPU Efficiency

Attendees will learn about the need for and use of machine learning in today's patient-centered healthcare. The talk will focus on general approaches requiring machine learning to obtain image-based quantitative features, reach patient diagnoses, predict disease outcomes, and identify proper precision-treatment strategies. While the presented methods are general in nature, examples from cardiovascular disease management will be used to demonstrate the need for and power of machine learning enabled by the performance advantages of GPU computation.

25-minute Talk Milan Sonka - Professor, University of Iowa
S8995 - What it Takes to Drive Autonomously on Chinese Roads

Pony.ai will share the key technological milestones it has achieved in the past several months of road testing in China, including the company's soft launch of China's first-ever autonomous vehicle robotaxi service. CEO James Peng will share the unique challenges posed by a Chinese road environment and how we leveraged deep learning and computational models to conquer those challenges. Pony.ai's mission is to build the safest and most reliable L4 autonomous driving technology. The startup was founded at the end of 2016 and is co-located in the heart of Silicon Valley and China.

25-minute Talk Yiming Liu - Infrastructure Lead, Pony.ai
S8128 - Image-Domain Gridding on Accelerators We will present our latest results on Image Domain Gridding, an algorithm for radio astronomical imaging. This algorithm outperforms the state of the art in traditional imaging algorithms both in terms of image quality (by applying more corrections) and performance. In this talk, we will first introduce the algorithm and then demonstrate that this algorithm works very well on highly parallel accelerators. We will show the in-depth performance analysis and optimization techniques that we applied to get there. 25-minute Talk Bram Veenboer - PhD Researcher, ASTRON
SE0001 - Posters & Beer Reception

Check out over 150 research posters and mingle, beverage in hand, with their brilliant authors at the Monday reception (18:00-20:00). See how big ideas are accelerated through the power of GPUs. 

Special Event - 2 h Special Event
SE0016 - Dinner with Strangers (Mon)

Join a random group of GTC attendees for enlightening conversations over a self-hosted dinner in great restaurants nearby. Less creepy than it sounds, this is one of the more popular programs at GTC.

Sign up on Concourse.

Special Event - 2 h Special Event
S8885 - Opening Keynote

The 2018 GTC opening keynote is delivered by NVIDIA founder and CEO Jensen Huang, speaking on the future of computing.

2 Hour Keynote Jen-Hsun Huang - Founder & CEO, NVIDIA
SE0003 - Lunch (Tue/Wed/Thu)

Lunch will be served in the South Hall.

Special Event - 2 h Special Event
CE8140 - Connect with the Experts: Deep Libraries for Training - cuDNN, cuBLAS

In this session, we will discuss the CUDA libraries that are foundational for training on GPUs. Learn about the latest and upcoming features, and talk to NVIDIA experts about your use case and the latest developments.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Philippe Vandermersch, NVIDIA
Mostafa Hagog, NVIDIA
Yang Xu, NVIDIA
Khairul Kabir, NVIDIA
Seth Walters, NVIDIA
Slawomir Stepniewski, NVIDIA
Kevin Vincent, NVIDIA
CE8146 - Connect with the Experts: OpenACC - Quick On-ramp to GPUs (2)

This session is designed for anyone who is either looking to get started with GPUs or already accelerating their code with OpenACC on GPUs or CPUs. Join OpenACC experts and your fellow OpenACC developers to get expert advice, discuss your code, and learn how others are using OpenACC directives.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Sunita Chandrasekaran - Assistant Professor, Department of Computer & Information Sciences, University of Delaware
Michael Wolfe - Compiler Engineer, NVIDIA
Guido Juckeland - Head of Computational Science Group, Helmholtz-Zentrum Dresden-Rossendorf
Randy Allen - Director, Mentor Graphics
Robert Henschel - Director Science Community Tools, Indiana University
Robert Crovella, NVIDIA
Jeffrey Larkin - Senior DevTech Software Engineer, NVIDIA
CE8150 - Connect with the Experts: Deep Learning Frameworks for Training (2)

Attend this session to get your questions on deep learning frameworks answered. Learn more about widely used deep learning frameworks such as Caffe, Theano, Torch, TensorFlow, CNTK, and MXNet, and let NVIDIA experts help you choose the right framework for your research or project.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject.  

1 Hour Connect with the Experts Cho Che Cheng, NVIDIA
John Woolley, NVIDIA
Sami Kama, NVIDIA
Andrei Ivanov, NVIDIA
Pooya Davoodi, NVIDIA
Jie Jiang, NVIDIA
Przemyslaw Tredak - Senior Deep Learning Engineer, NVIDIA
Michael O'Connor - Director, NVIDIA
L8160 - Genomics: Using Deep Learning to Massively Accelerate the Accurate Identification of Genetic Variants Identifying genetic changes (variants) is the first step in making genomics useful for healthcare and life sciences. Determining what the changes mean (annotation) and whether those changes are harmful (pathogenic) is the critical step. Attendees will apply deep learning methods to genomic data to identify pathogenic variants with high accuracy and speed. They will perform data reduction and train their own models. By varying parameters in the model DANN (Deleterious Annotation of Genetic Variants Using Neural Networks), run on NVIDIA's GPC, they will create an ROC curve to compare with variant annotators from prior scientific publications. Attendees will complete the lab understanding how deep learning improves genomic data analysis using the GPU platform. 120 Minutes Instructor-Led Lab Chris Yoo - Chairman and CEO, Systems Imagination Inc
David Schneider - Director of Knowledge Engineering, Systems Imagination
Kendyl Douglas - Applied Mathematician, Systems Imagination
Margaret Linan - Deep Learning Bioinformatics Scientist, Systems Imagination
L8175 - Training Semantic Segmentation for DRIVE PX The level of accuracy needed for urban driving differs from highway driving due to the density of different objects in a given scene. Using the CamVid dataset, this lab will go through all the steps required to do semantic segmentation given the computation capabilities of DRIVE PX2. You'll learn how to: • Convert an existing network into a fully convolutional network • Explore different design choices to fit into the computation budget • Train a semantic segmentation neural network Upon completion, you'll be able to create and train a fully convolutional network for semantic segmentation tasks in self-driving cars. Prerequisites: Fundamentals or equivalent background/experience 120 Minutes Instructor-Led Lab Aaraadhya Narra - Solutions Architect, DLI Certified Instructor, NVIDIA
S81004 - The Early Detection of Pancreatic Cancer Using Deep Learning: Preliminary Observations

This talk will present the challenges and opportunities in developing a deep learning program for use in medical imaging. It will present a hands-on approach to the challenges that need to be overcome and the need for a multidisciplinary approach to help define the problems and potential solutions. The role of highly curated data for training the algorithms, and the challenges in creating such datasets, will be addressed; the annotation of data becomes a key point in training and testing the algorithms. The roles of experts in computer vision and radiology will be discussed, as will how this project can serve as a roadmap for others planning collaborative efforts. Finally, I will discuss the early results of the Felix project, whose goal is nothing short of the early detection of pancreatic cancer, to help improve detection and ultimately improve patient outcomes.

50-minute Talk Elliot Fishman - Professor of Radiology, Surgery, Oncology and Urology, Johns Hopkins Hospital
S81047 - Introduction to DeepStream SDK

An introduction to high-performance deep learning inference for video analytics. The NVIDIA DeepStream SDK simplifies the development of scalable intelligent video analytics (IVA) applications powered by deep learning for smart cities and hyperscale datacenters.

25-minute Talk Kaustubh Purandare, NVIDIA
S8135 - Programming GPU-based Extreme-Scale HPC Systems: OpenSHMEM and SharP This talk will introduce two programming models, OpenSHMEM and SharP, that address the programming challenges of HPC systems with multiple GPUs per node, a high-performing network, and huge amounts of hierarchical, heterogeneous memory. SharP uses a distributed data-structure approach to abstract the memory and provide uniform interfaces for data abstraction, locality, sharing, and resiliency across these memory systems. OpenSHMEM is a well-established library-based PGAS programming model for programming HPC systems. We show how NVSHMEM, an implementation of OpenSHMEM, can enable communication in CUDA kernels and realize the OpenSHMEM model for GPU-based HPC systems. Together, these two complementary programming models provide the ability to program emerging architectures for Big-Compute and Big-Data applications. After the introduction, we will present experimental results for a wide variety of applications, including QMCPack, HPGMG, CoMD, and Memcached, demonstrating the programming models' advantages. 50-minute Talk Sreeram Potluri - Senior Software Engineer, NVIDIA
Manjunath Gorentla Venkata - Research Scientist, Oak Ridge National Laboratory
S8219 - GUNREAL: GPU-Accelerated Unsupervised Reinforcement and Auxiliary Learning We'll introduce GPU-accelerated unsupervised reinforcement and auxiliary learning (UNREAL) algorithm. Recent state-of-the-art deep reinforcement learning algorithms, such as A3C and UNREAL, are designed to train on a single device with only CPUs. Using GPU acceleration for these algorithms results in low GPU utilization, which means the full performance of the GPU is not reached. Motivated by the architecture changes made by the GA3C algorithm, which gave A3C better GPU acceleration, together with the high learning efficiency of the UNREAL algorithm, we extend GA3C with the auxiliary tasks from UNREAL to create GUNREAL. We show that our GUNREAL system finished training faster than UNREAL and reached higher scores than GA3C. 25-minute Talk Koichi Shirahata - Researcher, Fujitsu Laboratories Ltd.
S8266 - AstroAccelerate - GPU-Accelerated Signal Processing for Next Generation Radio Telescopes AstroAccelerate is a GPU-enabled software package that focuses on enabling real-time processing of time-domain radio-astronomy data. It uses the CUDA programming language for NVIDIA GPUs. The massive computational power of modern day GPUs allows the code to perform algorithms such as de-dispersion, single pulse searching, and Fourier domain acceleration searching in real time on very large datasets, which are comparable to those that will be produced by next-generation radio telescopes such as the Square Kilometre Array. 50-minute Talk Wes Armour - Director, University of Oxford
S8278 - CUDA - New Features and Beyond

CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest release of CUDA, preview upcoming GPU programming technology, and gain insight into the philosophy driving the development of CUDA and how it will take advantage of current and future GPUs. You'll also learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development.

50-minute Talk Stephen Jones - Software Architect, NVIDIA
S8376 - Compute-Enabled Efficiency: Technology Adapted AEC Workflows for Data Capture, Analysis & Presentation Learn how and why the architecture, engineering, and construction (AEC) community is using technology (e.g., drones, photogrammetry) to enable efficiency in data capture, analysis, and presentation workflows. Market drivers are pointing the AEC community toward compute-enabled efficiencies, such as AI-powered automation, to address labor-capacity and cost-competition challenges. New data capture techniques are the norm for efficiently capturing accurate data in support of activities across the full project life-cycle. Improved drone processes allow for geo-referenced data, and GPUs provide the post-processing capability that enables this workflow. 50-minute Talk Bill Dale - Director, Applied Technology, Jacobs
Chris Torres - VDC Director, Emerging Technologies, Jacobs
S8502 - GOAI One Year Later

This talk will discuss the evolution of the GPU Open Analytics Initiative (GoAi) from its inception to today. GoAi, at its core, is a collection of libraries, frameworks, and APIs that lower the barrier to GPU adoption for data scientists. The goal of GoAi is to enable end-to-end data science workflows across many multi-GPU servers, to analyze and understand data more efficiently than ever before. To date, GoAi includes methods for performing SQL, machine learning, data processing and feature engineering, graph analytics, and graph visualization, all on the GPU. This talk will discuss the who, what, when, where, and why of GoAi, and its integration with the traditional big data world through leading open source projects like Apache Arrow and Apache Parquet. Finally, this talk will highlight major achievements of GoAi, our plans for the future, and how developers can become a part of this rapidly evolving ecosystem.

50-minute Talk Joshua Patterson - Director AI Infrastructure, NVIDIA
S8769 - Commoditizing GPU-as-a-Service Providers with Red Hat OpenShift Container Platform

Red Hat OpenShift Container Platform, with Kubernetes at its core, can play an important role in building flexible hybrid cloud infrastructure. By abstracting infrastructure away from developers, workloads become portable across any cloud. With NVIDIA Volta GPUs now available in every public cloud, as well as from every computer maker, an abstraction layer like OpenShift becomes even more valuable. Through demonstrations, this session will introduce you to declarative models for consuming GPUs via OpenShift, as well as the two-level scheduling decisions that provide fast placement and stability.

25-minute Talk Jeremy Eder - Senior Principal Performance Engineer, Red Hat
Andre Beausoleil - Senior Principal Partner Manager, Red Hat
S8784 - Deep Generative Models for Image and Video Creation We'll focus on recent developments in deep learning-based generative models for image and video creation. The last two to three years have seen explosive growth in the development of generative adversarial networks, variational autoencoders, and related autoregressive methods, which have made it possible to automatically generate images and videos by harnessing the power of GPUs and deep learning libraries. These methods present interesting possibilities in the automatic generation of datasets for training machine learning methods, as well as in real-world applications for image and video processing such as morphing, editing, advertising, design, and art. We'll present the technical details of these methods and recent results in various settings. 25-minute Talk Vineeth N Balasubramanian - Assistant Professor, Indian Institute of Technology (IIT), Hyderabad, India
S8878 - Cinematic Lighting in Unreal Engine

Join Epic's Kim Libreri and Marcus Wassmer, along with NVIDIA's Ignacio Llamas and Edward Liu, as they provide an in-depth view of the creative and technical aspects of creating photo-realistic cinematic content that runs in real time.

80 Minutes Tutorial Marcus Wassmer - Rendering Team Lead, Epic Games
Ignacio Llamas - Senior Manager of Real Time Rendering Software, NVIDIA
Kim Libreri - CTO, Epic Games
Edward Liu - Senior Real Time Rendering Engineer, NVIDIA
S8915 - AI at the Edge - Intelligent Machines

Artificial intelligence is impacting almost every part of the industrial and agricultural supply chain. From robots that quickly adapt to build new products, to automated vehicles that address last-mile challenges for product delivery, to UAVs that can automatically detect failing infrastructure, the world is transitioning from processes that are largely manual to ones that are largely automated. We'll discuss how AI and deep learning are enabling these advances. We'll also analyze a sampling of early successes across different applications. And finally we'll describe some of the remaining challenges to wide-scale deployment, and the work NVIDIA is doing to address those challenges via its Isaac initiative.

25-minute Talk Jesse Clayton - Senior Manager of Product Management for Intelligent Machines, NVIDIA
S8952 - Rapid Pace of Change and Industry Progress We are still in the early stages of AI, and its impact on industries is already significant - from healthcare to financial services to retail. Businesses are seeing unprecedented levels of efficiencies and productivity, which will only continue to rise and transform how companies operate. This session will explore the progress of AI adoption over the last year, the industries that are leaping ahead, new AI innovations that will serve cross-industry concerns, and what businesses should expect in terms of adoption maturity in 2018. 50-minute Talk Nick Patience - Founder & Research Vice President, 451 Research
John Abbott - Founder & Research Vice President, 451 Research
S8970 - Creating AI-Based Digital Companion for Mercedes-Benz Vehicles

In-vehicle user experience needs intelligence not only to delight its users with a truly personalized experience and to simplify repetitive actions but also to minimize cognitive load and to decrease distractions.

When driving becomes fully autonomous, the vehicle needs to understand its users’ intent without getting explicit directions from them. To achieve such an experience, customers’ behavior and interactions are analyzed in real time to understand their intent and to predict what they will do next.

25-minute Talk Rigel Smiroldo - Principal Engineer, Machine Learning & Predictive UX, Mercedes-Benz Research & Development North America Inc.
S8581 - Object-Level Deep Reinforcement Learning We'll show how deep reinforcement learning can be greatly sped up by separating perception and action, with a reward function specified in terms of objects and their motions, which are supplied by the perceptual system. In the past five years, reinforcement learners have become vastly more powerful by incorporating deep learning techniques, playing Atari, Mario, Go, and other games with superhuman skill. However, these learners require vast amounts of training data to become skilled. For example, to master Pong, state-of-the-art reinforcement learners require tens of millions of game frames, equivalent to months of play time at human speed. We show that endowing the learner with a minimal perceptual system, capable of detecting and tracking objects, greatly reduces the number of frames needed for learning. This shifts the learning bottleneck from the amount of training data available to computations easily accelerated with GPUs. 25-minute Talk William Agnew - PhD Student, University of Washington
S8663 - Microsoft AI and Research - Infrastructure Overview for Deep Learning and Other Research Microsoft Research leverages a wide variety of open-source, free, and custom tools to manage a complex infrastructure for doing research. We are in a unique position at Microsoft and in the industry, where we serve academic experts who expect access to the latest open source tools, in an environment where Microsoft solutions should also be considered. See examples of how we manage popular, constrained assets and enforce fairness across many systems. Linux/Docker, Windows, on-site, Azure, or a hybrid of all of the above – we see it all. In this session, you will learn what tools can be easily leveraged to manage your own onsite and cloud GPU infrastructure. We touch on cluster management fabrics, scheduling, authentication, hot storage, configuration management, software portability/container management, and high-performance hardware selection. 25-minute Talk Jim Jernigan - Sr. R&D Systems Engineer, Microsoft Research
S8666A - Deploying Autonomous Vehicles with NVIDIA DRIVE

DRIVE PX is an open platform for the autonomous driving ecosystem. It’s been adopted by over 300 partners in the automotive ecosystem to develop solutions for vehicles that are intelligent and autonomous. This talk will outline the technical challenges facing the development of autonomous intelligent vehicles and provide details of how the next generation of DRIVE AI car computers, DRIVE Xavier and DRIVE Pegasus, addresses these challenges.

25-minute Talk Srikanth Sundaram - Senior Product Manager DRIVE PX 2, NVIDIA
S8966 - Building Smarter Cities with AI-Powered Applications

Learn how Verizon is helping create safer streets, reduce traffic congestion, aid the navigation of both vehicles and pedestrians, and cut energy costs and consumption through AI-enabled, sensor-based networks that leverage LED street lighting infrastructure. We will discuss our Vision Zero application and how we use deep learning to recognize, detect, classify, and concurrently track vehicles in traffic, pedestrians, bicyclists, and parked cars, and turn it into actionable data to help make better urban planning decisions and quantify the results.

25-minute Talk Andrew Herson - Head of Computer Vision Products, Verizon
CE8169 - Connect with the Experts: Performance Analysis and Optimization (2)

Come ask your GPU code optimization questions to experts in the field.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Alexey Romanenko, NVIDIA
Lei Wu, NVIDIA
Kamesh Arumugam Karunanithi, NVIDIA
Jakob Progsch - Developer Technology Engineer, NVIDIA
Peng Wang, NVIDIA
Milos Maric, NVIDIA
Alan Gray - Developer Technology Engineer, NVIDIA
S8117 - Learning-Free Universal Style Transformer Universal style transfer aims to transfer arbitrary visual styles to content images. Existing feed-forward methods, while enjoying inference efficiency, are mainly limited by an inability to generalize to unseen styles or by compromised visual quality. We'll present a simple yet effective method that tackles these limitations without training on any pre-defined styles. The key ingredient of our method is a pair of feature transforms -- whitening and coloring -- embedded in an image reconstruction network. The whitening and coloring transforms reflect a direct matching of the content image's feature covariance to that of a given style image, which shares a similar spirit with the optimization of the Gram-matrix-based cost in neural style transfer. We demonstrate the effectiveness of our algorithm by generating high-quality stylized images, with comparisons to a number of recent methods. We also analyze our method by visualizing the whitened features and synthesizing textures via simple feature coloring. 25-minute Talk Chen Fang - Research Scientist, Adobe Research
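As a rough illustration of the whitening-and-coloring idea described above, here is a simplified per-channel (diagonal-covariance) sketch in pure Python: whitening standardizes a content feature channel, and coloring imposes the style channel's mean and spread. The full method matches entire covariance matrices of deep feature maps; this toy version and its feature values are illustrative assumptions only:

```python
from statistics import mean, pstdev

def wct_channel(content, style, eps=1e-8):
    """Whiten one channel of content features, then color it with style statistics."""
    # Whitening (diagonal case): zero mean, unit spread.
    whitened = [(x - mean(content)) / (pstdev(content) + eps) for x in content]
    # Coloring: impose the style channel's spread and mean.
    return [w * pstdev(style) + mean(style) for w in whitened]

# Toy 1-D feature channels standing in for deep network feature maps.
content = [0.2, 0.4, 0.9, 1.1]
style = [3.0, 3.5, 4.0, 4.5]
out = wct_channel(content, style)
# The output channel now matches the style channel's mean and spread.
```

In the actual approach the transform is applied to multi-channel feature maps, so whitening and coloring involve eigendecompositions of full covariance matrices rather than per-channel standard deviations.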
S8316 - Multi-GPU Programming Models Do you need to compute larger problems, or compute faster, than a single GPU allows? Learn how to scale your application to multiple GPUs, how to use the different available multi-GPU programming models, and what their individual advantages are. All programming models will be introduced using the same example, applying a domain decomposition strategy. 50-minute Talk Jiri Kraus - Senior Devtech Compute, NVIDIA
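The domain decomposition strategy mentioned in the abstract can be sketched without any GPU code: split the grid into subdomains, give each a ghost ("halo") cell copied from its neighbor, update, then discard the halos. A pure-Python 1-D Jacobi example showing only the partitioning logic (the per-subdomain updates are what would run on separate GPUs):

```python
def jacobi_step(u):
    """One Jacobi relaxation step on a 1-D array with fixed endpoints."""
    return [u[0]] + [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)] + [u[-1]]

def decomposed_step(left, right):
    """Apply one step to two subdomains, exchanging one halo cell at the seam."""
    left_ext = left + [right[0]]            # halo: first cell of the right subdomain
    right_ext = [left[-1]] + right          # halo: last cell of the left subdomain
    new_left = jacobi_step(left_ext)[:-1]   # drop ghost cells after the update
    new_right = jacobi_step(right_ext)[1:]
    return new_left, new_right

# The decomposed update matches the whole-domain update exactly.
u = [0.0, 4.0, 1.0, 3.0, 2.0, 5.0]
ref = jacobi_step(u)
l, r = decomposed_step(u[:3], u[3:])
# l + r == ref
```

In a real multi-GPU code the halo exchange would be a device-to-device copy (e.g., via CUDA peer access or MPI), but the decomposition bookkeeping is identical.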
S8339 - Powering Real-Time Radio Astronomy Signal Processing with Latest GPU Architectures

We'll present a summary of ongoing work that targets the use of newer GPU architecture (Pascal and Volta) features in real-time signal processing applications in radio astronomy telescopes, and outline the future growth path for this exciting new application of GPUs. For the Pascal and Volta architectures, we'll discuss the advantages of using higher memory bandwidth, half and single precision, and integer arithmetic in existing GPU-based correlator pipeline code. This is an ongoing effort between the National Centre for Radio Astrophysics and NVIDIA. We'll look at the various processing stages involved in the pipeline to explore optimization possibilities, and highlight interesting results that were achieved. We'll address in detail the effect of using half precision on accuracy, performance, and the required library changes.

25-minute Talk Harshavardhan Reddy - Engineer-C, NCRA
S8429 - At the Intersection of AI Cities and ANNs Artificial intelligence has promised a lot. Now decades old, it's still hard to tell the theoretical from the practical. Smart cities are a clear example of where AI technology is solving real-world problems today. Public transportation, smart grids, and data-focused city planning are all being pushed forward, but more visionary goals like safety, walkability, and improved citizens' experience remain elusive. Intersections have long been as much an opportunity as a challenge, with complicated traffic patterns and cars interacting in multiple directions and at varied velocities. Add pedestrians, and quantifying the complex interaction is only possible through advanced technology. At Motionloft, we've committed to digitizing the physical world. To deliver the most meaningful data possible, we realized that we couldn't rely on the marketplace for sensors. Since there is no path to accurate data without artificial neural networks and no weatherproof computer vision, we endeavored to develop our own heuristics using all the tools available. We'll describe a variety of use cases this allowed, with particular focus on traffic intersections. 25-minute Talk Paul McAlpine - VP of Engineering, Motionloft
Favorite
S8637 - Analysis of Performance Gap Between OpenACC and the Native Approach on P100 GPU and SW26010: A Case Study with GTC-P We'll present our experience using OpenACC to port GTC-P, a real-world plasma turbulence simulation, to the NVIDIA P100 GPU and SW26010, the Chinese home-grown many-core processor. We also developed the GTC-P code with the native approach on the Sunway TaihuLight supercomputer so that we could analyze the performance gap between OpenACC and the native approach on the P100 and SW26010. The experimental results show that the performance gap between OpenACC and CUDA on the P100 is less than 10% with the PGI compiler. However, the gap on SW26010 is more than 50%, since register-level communication (RLC), supported only by the native approach, can avoid low-efficiency main-memory access. Our case study demonstrates that OpenACC can deliver impressively portable performance on the P100, but the lack of software-cache support via RLC in the OpenACC compiler on SW26010 results in a large performance gap between OpenACC and the native approach. 25-minute Talk Stephen Wang - GPU Specialist, Shanghai Jiao Tong University
Favorite
S8834 - In-Vehicle Change Detection, Closing the Loop in the Car The world isn't static. It's constantly shifting and evolving. Mapping systems that support autonomous driving must therefore constantly detect, verify, and update the changes that are happening in the world, in near real time, and make appropriate updates to the map. The only way for a map to obtain this level of freshness is to crowdsource data from sensors installed on vehicle fleets in order to adapt to, and to match to, the constantly changing environment—it needs the ability to self-heal. Yet constant creation and transmission of vehicle environment data over the air, from a large vehicle fleet, is not economically practical. Hence a strategy for minimizing the necessary bandwidth, and the subsequent cost of data transmission, is crucial. HERE Technologies is developing an in-vehicle solution to ensure autonomous vehicles have the most up-to-date HD Live Map data while minimizing bandwidth and costs for data transmission. 25-minute Talk Stephen O'Hara - Principal Research Engineer, HERE Technologies
Favorite
S8849 - GE's Evolution from HPC to AI in Healthcare

For more than a decade, GE has partnered with NVIDIA in healthcare to power our most advanced modality equipment, from CT to ultrasound. Part 1 of this session will offer an introduction to the deep learning efforts at GEHC and the platform we're building on top of NGC to accelerate new algorithm development, followed by a deep dive into a case study of the evolution of our cardiovascular ultrasound scanner and its underlying extensible software stack. It will contain three main parts: (a) cardiovascular ultrasound imaging from a user perspective: which problems we need to solve for our customers, and the global impact of cardiovascular disease; (b) an introduction to the Vivid E95 and the cSound platform for GPU-based real-time image reconstruction and visualization: how GPU performance can be translated to customer value and outcomes, and how this has evolved the platform over the last two and a half years; (c) the role of deep learning in cardiovascular ultrasound imaging: how we are integrating deep learning inference into our imaging system, and preliminary results from automatic cardiac view detection.

50-minute Talk Keith Bigelow - VP Analytics, GE Healthcare Waukesha
Erik Steen - Chief Engineer, GE Healthcare
Favorite
S8984 - Success in the Age of AI From healthcare to financial services to retail, businesses are seeing unprecedented levels of efficiency and productivity, which will only continue to rise and transform how companies operate. This session will look at how Accenture as an enterprise is optimizing itself in the age of AI, as well as how it guides its customers to success, sharing best practices, insights, and measurements to help the audience inform their AI roadmap and journey. 50-minute Talk Michael Sutcliff - CEO, Accenture Digital, Accenture
Favorite
S8999 - How GPU Server Architectures Deliver Increased Productivity for Deep Learning Training Workloads & HPC Customers (Presented by Supermicro)

We'll give an overview of numerous GPU hardware platforms designed for today's taxing AI, machine learning, and HPC workloads, including custom solutions targeted at deep learning inference and deep learning training. The talk will cover systems based on PCIe GPUs as well as GPU systems with the NVLink interface.

50-minute Talk Jason Pai - Director, Super Micro Computer, Inc.
Sarosh Irani - Principal Product Manager, Supermicro
Favorite
S81010 - The Real-Time Revolution GPU accelerated creative development platforms are no longer just for games, they're revolutionizing areas from film to automotive. See how Unity is being used to enable unheard-of levels of productivity and create even deeper collaboration between teams. 25-minute Talk Adam Myhill - Head of Cinematics, Unity
Favorite
S8156 - Deep Learning for Transportation: Fast Estimation of Travel Times Using Historical Routes During this presentation we'll review a deep neural network architecture and the training approaches used to produce a high volume of travel-time estimates on a road graph with historical routes and traffic. This includes initial and continuous online training, finding various sources to produce training data, challenges of quality control, and, of course, the invaluable role of GPUs for computation during both training and inference. 25-minute Talk Dmitry Kudinov - Senior Data Scientist, Esri Inc.
Favorite
S8522 - DirectX: Evolving Microsoft's Graphics Platform For over 20 years, DirectX has been the platform used by game developers to create the fastest, most visually impressive games on the planet. Come and learn our plans to deliver the next generation of graphics innovation. 50-minute Talk Matt Sandy - Program Manager, Microsoft
Favorite
S8787 - Differentiable Tree Planning for Deep Reinforcement Learning We'll discuss recent research in deep reinforcement learning (RL), with a focus on applying intuitions from planning to neural network architectures for deep RL. Planning in complex visual environments has thus far been held back by the difficulty of learning accurate predictive models. To address this, we embedded a model inside a differentiable, dynamically constructed tree-planning architecture, so that we learn a model that is effective when used within that planner. We'll share our work on developing these architectures, as well as our approaches to various technical obstacles associated with the efficient optimization of deep tree-structured models on GPUs. 50-minute Talk Gregory Farquhar - DPhil Candidate, University of Oxford
Favorite
S8836 - The New Era of Investments

We'll discuss Qraft Technologies' plans, covering: 1) the remarkable performance Qraft's AI engines have achieved in the financial industry; 2) the technology concepts used in the AI engines to generate strategic investment portfolios. Qraft provides materials that include actual examples of a robo-fund, where AI is used to create a mutual fund; a robo-advisor, where AI recommends an optimal portfolio of mutual funds that fully reflects an investor's propensity; and other important achievements Qraft has obtained in the financial industry. Qraft is constructing an ecosystem of AI in investment that includes the world's well-known institutions and researchers.

25-minute Talk Hyung Sik - Chief Executive, Qraft Technologies
Favorite
S8847 - Solar Storm Modeling using OpenACC: From HPC Cluster to "In-House" We explore using OpenACC to migrate applications required for modeling solar storms from CPU HPC clusters to an "in-house" multi-GPU system. We describe the software pipeline and the use of OpenACC in the computationally heavy codes. A major step forward is the initial implementation of OpenACC in our magnetohydrodynamics code MAS. Strategies for overcoming some of the difficulties encountered are discussed, including handling Fortran derived types, array reductions, and performance tuning. Production-level time-to-solution results will be shown for multi-CPU and multi-GPU systems of various sizes. The timings show that it is possible to achieve acceptable time-to-solution on a single multi-GPU server or workstation for problems that previously required multiple HPC CPU nodes. 25-minute Talk Ronald Caplan - Computational Scientist, Predictive Science Inc.
Favorite
S8914 - Automating the Last Mile

Self-driving vehicles will transform every aspect of how we work and play. Humanity spends 500 million hours each day driving to and from the grocery store. The impact of automating these tasks is huge. Marble is building self-driving delivery vehicles to give you back this time and make delivery a delightful experience. I'll talk about why delivery is a good application of robotics, and how deep learning enables us to automate driving.

50-minute Talk Kevin Peterson - Cofounder, Marble
Favorite
CE8152 - Connect with the Experts: Deep Learning Basics (3)

Attend this session to get your questions on deep learning basics and concepts answered. NVIDIA experts can help you with the fundamentals and provide guidance on how and when to apply Deep Learning and GPUs to your work. No question is too basic to ask.

Connect with the Experts sessions are informal opportunities to ask experts from NVIDIA and other organizations your burning questions about a specific subject.

1 Hour Connect with the Experts Adam Thompson - Senior Solutions Architect, NVIDIA
Philippe Vandermersch, NVIDIA
Manish Gupta, NVIDIA
Hassan Kianinejad, NVIDIA
Kevin Vincent, NVIDIA
Yang Xu, NVIDIA
Favorite
L8142 - Neural Network Deployment with DIGITS and TensorRT Prerequisites: Image Classification with DIGITS

Duration: 2 hours

Framework: Caffe with DIGITS and TensorRT

Deep learning lets us map inputs to outputs using models that are extremely computationally intense. Learn to deploy deep learning to applications that recognize images and detect pedestrians in real time by:

• Accessing and understanding the files that make up a trained model

• Building from each function's unique input and output

• Optimizing the most computationally intense parts of your application for different performance metrics like throughput and latency

Upon completion of this lab, you'll be able to implement deep learning to solve problems in the real world.
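The throughput/latency tradeoff named in the last bullet can be illustrated with a minimal timing harness. This is a hedged sketch in plain Python with a stand-in `infer_batch` function (a hypothetical placeholder, not the lab's Caffe/TensorRT code): per-call overhead is amortized over larger batches, which typically raises throughput while increasing per-batch latency.

```python
import time

def infer_batch(batch):
    # Stand-in for a deployed model's forward pass (hypothetical;
    # the lab itself deploys trained Caffe models with TensorRT).
    return [x * 2 for x in batch]

def measure(batch_size, n_batches=100):
    """Return (latency_seconds_per_batch, throughput_items_per_second)."""
    items = list(range(batch_size))
    start = time.perf_counter()
    for _ in range(n_batches):
        infer_batch(items)
    elapsed = time.perf_counter() - start
    latency = elapsed / n_batches
    throughput = batch_size * n_batches / elapsed
    return latency, throughput

if __name__ == "__main__":
    # Compare a latency-oriented setting (batch=1) with a
    # throughput-oriented one (batch=32).
    for bs in (1, 32):
        lat, thr = measure(bs)
        print(f"batch={bs:3d}  latency={lat * 1e6:9.1f} us/batch  "
              f"throughput={thr:12.0f} items/s")
```

The same two numbers are what you would tune against real metrics when optimizing a deployed model.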

Presented by the NVIDIA Deep Learning Institute (DLI).
120 Minutes Instructor-Led Lab Mike Mendelson - Deep Learning Institute Curriculum Developer, NVIDIA
Favorite
L8168 - Image Generation Using CycleGAN AI can automatically render every horse as a zebra, while the same process can generate satellite imagery from any map. The same AI can take a sprite sheet and generate a sheet with a different theme for automatic digital asset creation. In this lab, you will learn how to: • Use image analogies to translate image to image • Create an autoencoder architecture using an encoder, transformer, and decoder • Employ a PatchGAN discriminator to complete the generative adversarial network Upon completion of this lab, you will be able to automatically create analogous images using CycleGAN. Prerequisites: Fundamentals of Deep Learning with Computer Vision or similar experience 120 Minutes Instructor-Led Lab Kelvin Lwin - Certified Instructor, NVIDIA
Favorite
L8181 - Deep Learning for Genomics using DragoNN with Keras and Theano Learn to interpret deep learning models to discover predictive genome sequence patterns. Use the DragoNN toolkit on simulated and real regulatory genomic data to: • Demystify popular DragoNN (Deep RegulAtory GenOmics Neural Network) architectures • Explore guidelines for modeling and interpreting regulatory sequence using DragoNN models • Identify when DragoNN is a good choice for a learning problem in genomics and how to build high-performance models Upon completion, you'll be able to use the discovery of predictive genome sequence patterns to gain new biological insights. 120 Minutes Instructor-Led Lab Steven Steinke - Curriculum Developer, NVIDIA
Yonatan Israeli - Consultant, NVIDIA
Favorite
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and Engineering Teams The Krylov Project is the key component of eBay's AI Platform initiative, providing an easy-to-use, open, and fast AI orchestration engine deployed as managed services in eBay's cloud. The main goals of the project are: every AI and machine learning algorithm should be shareable and easily implementable with a choice of frameworks; machine learning engineers should be able to build end-to-end training pipelines that distribute and parallelize over many machines; model training should be automated and allow easy access to vast eBay datasets; and engineers should be able to search past job submissions, view results, and share them with others. We have built Krylov from the ground up, leveraging the JVM, Python, and Go as the main technologies for the Krylov components, while standing on the shoulders of technology giants such as Docker, Kubernetes, and Apache Hadoop. Using Krylov, AI scientists can access eBay's massive datasets; build and train AI models; spin up powerful compute (high-memory or GPU instances) on the Krylov HPC cluster; and set up machine learning pipelines, for example using declarative constructs that stitch together the pipeline lifecycle. 50-minute Talk Henry Saputra - Technical Lead, eBay
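The idea of declarative constructs that stitch together a pipeline lifecycle can be sketched in a few lines. This is a hypothetical illustration only: the step names (`prepare`, `train`, `evaluate`) and the `run_pipeline` helper are invented for this sketch and are not Krylov's actual API.

```python
def run_pipeline(steps, data):
    """Execute named steps in order, feeding each step's output
    to the next. The list of (name, fn) pairs is the declarative
    part: it states what runs and in what order, separately from
    how each step is implemented."""
    for name, fn in steps:
        data = fn(data)
    return data

# Declarative description of a toy training lifecycle.
pipeline = [
    ("prepare",  lambda rows: [r for r in rows if r is not None]),
    ("train",    lambda rows: {"model": sum(rows) / len(rows)}),
    ("evaluate", lambda m: {"model": m["model"], "ok": m["model"] > 0}),
]

result = run_pipeline(pipeline, [1, 2, None, 3])
```

A real orchestration engine adds scheduling, distribution, and artifact tracking around this same stitched-step structure.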
Favorite
S8310 - Can FPGAs Compete with GPUs?

Previously, FPGAs were known to be highly energy efficient, but notoriously difficult to program, and unsuitable for complex HPC applications. This is changing due to new technology developments: a high-level programming language (OpenCL), hard floating-point units, and tight integration with CPU cores. We'll compare FPGAs and GPUs with respect to architecture, programming model, programming effort, performance, and energy efficiency, using some radio-astronomical signal-processing and imaging algorithms as examples. Can they compete with GPUs?

25-minute Talk John W. Romein - Senior Researcher, ASTRON (Netherlands Institute for Radio Astronomy)
Favorite
S8344 - OpenMP on GPUs, First Experiences and Best Practices OpenMP has a long history on shared memory, CPU-based machines, but has recently begun to support offloading to GPUs and other parallel accelerators. This talk will discuss the current state of compilers for OpenMP on NVIDIA GPUs, showing results and best practices from real applications. Developers interested in writing OpenMP codes for GPUs will learn how best to achieve good performance and portability. 50-minute Talk Jeffrey Larkin - Senior DevTech Software Engineer, NVIDIA
Favorite
S8383 - SmartSense: Real-Time, Field-Deployed CV Traffic Analysis System Miovision presents a video-based traffic analytics system, capable of tracking and classifying vehicles in real time throughout cities. The system leverages Jetson TX2 modules and inferencing to accurately classify vehicles at over 50 frames per second using single-shot multibox detection and DAC, a VGG-based network. We'll cover many of the issues our teams went through to design and implement the system, including data collection, annotation, training, incorporating continuous training, and deep learning iteration. We'll also illustrate how the measured traffic trends were used to reduce congestion and evaluate the health of traffic corridors. 25-minute Talk Justin Eichel - Technical Director, Miovision
Favorite
S8398 - Designing Human Centric Spaces with Holodeck and Machine Learning The growth in housing density in cities like London and New York has resulted in higher demand for efficient, smaller apartments. These designs challenge the use of space and function while trying to ensure occupants perceive a larger space than is provided. Designing these spaces has always been the responsibility of a handful of designers using static 2D and 3D platforms as part of the overall building design and evaluation, typically constrained by a prescriptive program and functional requirements. A combination of human- and AI-based agents creating and testing these spaces through design and virtual immersive environments (NVIDIA Holodeck) will attempt to ensure the final results are efficient and best fit for human occupancy prior to construction. 25-minute Talk Cobus Bothma - Applied Research Director, KPF
Xin Zhang - BIM Specialist, Kohn Pedersen Fox Associates
Favorite
S8412 - Deep Imaging: Quantitative Biomarkers for Clinical Decision Making

The transformation towards value-based healthcare needs inventive ways to lower cost and increase patient health outcomes. Artificial intelligence is vital for realizing value-based care. Turning medical images into biomarkers helps to increase effectiveness of care.

25-minute Talk Razvan Ionasec - Global Product Manager for Artificial Intelligence, Siemens Healthineers
Favorite
S8489 - Scaling Molecular Dynamics Across 25,000 GPUs on Sierra & Summit As a part of the Department of Energy/National Cancer Institute pilot programs and the Sierra Institutional Center of Excellences, Lawrence Livermore National Laboratory has developed strong scaling molecular dynamics codes for atomic-level simulation in physics, materials science, and biology. Our implementation is portable from tablets and laptops to supercomputers, and can efficiently scale up to tens of thousands of GPUs. In particular, we target the Department of Energy leadership computing facilities, Sierra and Summit, at the Livermore and Oak Ridge National Laboratories. These are over 100-petaflops supercomputers powered by IBM and NVIDIA hardware. We'll discuss the performance and scaling of our code, and its application to cancer biology research, material science, and high-energy physics. 50-minute Talk Shiv Sundram - Scientific Software Developer, Lawrence Livermore National Laboratory
Tomas Oppelstrup - Staff Scientist, Lawrence Livermore National Laboratory
Favorite
S8562 - The Future of AI for Media & Entertainment AI has already had a major impact on Media & Entertainment – from connecting people with relevant content, to video analytics and dynamic distribution. Join our panelists to gain high-level insights about new ways AI will impact the Film, Television, AR/VR, and Broadcast industries. We'll discuss advancements in content creation, dynamic delivery, and intelligent interactivity. 50-minute Panel Munika Lay - Director, Strategy & Business Development, End Cue
Vicki Dobbs Beck - Executive in Charge, ILMxLAB
Shalini De Mello - Senior Research Scientist, NVIDIA
Marcie Jastrow - SVP of Immersive Media, Technicolor
Rick Champagne - Global Media & Entertainment Strategy and Marketing, NVIDIA
Favorite
S8614 - Digital Twin for the Railway Network We describe the concept of a Digital Twin for the railway network. Railroad customers across the world manage thousands of miles of track infrastructure consisting of rails, ballast, ties, bridges, tunnels, wayside equipment, and more. This talk demonstrates a new approach to track infrastructure monitoring that GE is piloting for customers using the network Digital Twin concept. Using offline GPU infrastructure, deep learning models are created and trained on large volumes of video data to learn the state of healthy track and predict anomalies. During the talk, real customer use-case videos will show analytics on video from locomotive-mounted cameras, with deep learning models calculating a health index displayed on a map to drive maintenance decisions. 50-minute Talk Dattaraj Rao - Principal Architect, General Electric
Favorite
S8751 - Bringing Data to Life - Data Management and Visualization Techniques

We'll give a practical overview, including the financial implications, of popular data ingestion and pre-processing techniques used today. We'll provide creative techniques for using GPU database technology to better understand financial industry data, focusing on the use of Spark, Alluxio, Arrow, NiFi, Sqoop, Kafka, TensorFlow Datasets, and GPU database techniques throughout the different phases of data management and analysis.

50-minute Talk Benika Hall - Analytics Consultant, Wells Fargo
Rob Harrison - Analytics Consultant, Wells Fargo
Favorite
S8921 - Development of a Self-Learning AI-Based L4 Vehicle - The Dream Car

The development of self-driving cars requires stronger relationships between partners than we know today; this may be the only way to successfully bring self-driving vehicles to the road. ZF, Virtual Vehicle, and NVIDIA have joined forces to develop an AI-based L4 vehicle for urban scenarios in only six months: the so-called dream car. Learning while sleeping is the groundbreaking idea behind the dream car, which was realized in the second half of 2017. Without driving around, the car constantly learns and adapts itself based on data acquired from other cars driving elsewhere in the world. The key is AI and ZF's ProAI, which was developed with NVIDIA over the past year. ProAI interprets the data in real time, learns from it, validates it, checks its plausibility, and adjusts the vehicle's behavior. We'll summarize the implementation steps, hardware and software architecture, relevant driving and testing scenarios, our AI approach, and the challenges met in realizing the dream car.

25-minute Talk Oliver Briemle - Head of L4 Feature Development, Domain Control and V2X, ZF
Daniel Watzenig - Head of Department and Full Professor, Virtual Vehicle
Favorite
S8953 - AI for Social Good as an Innovation Driver Innovation can take many forms and be led by varying stakeholders across an organization. One successful model is using AI for social good to drive a proof of concept that advances a critical strategic goal. The Data Science Bowl (DSB) is an ideal example: launched by Booz Allen Hamilton in 2014, it galvanizes thousands of data scientists to participate in competitions with far-reaching impact across key industries such as healthcare. This session will explore the DSB model, as well as other ways organizations are using AI for social good to create business and industry transformation. 50-minute Panel Richard Wender - Chief Cancer Control Officer, American Cancer Society
Ben Hamner - Cofounder and CTO, Kaggle
Josh Sullivan - Senior Vice President, Booz Allen Hamilton
Catherine Ordun - Senior Data Scientist, Booz Allen Hamilton
Favorite
S8563 - Building a GPU-Focused CI Solution As the number of GPU-accelerated applications has multiplied, the need for better development tools and services has increased as well. Chief among such services is continuous integration (CI), which dramatically improves and speeds up the development life cycle through automated builds and integration testing. CI for GPU-accelerated applications comes with its own set of challenges, but the rewards can be enormous. We'll walk through how we implemented CI for the NVIDIA GPU Cloud by leaning on open source solutions such as Jenkins, discuss the lessons we learned in the process, and demonstrate how other such systems should be built in the future. 25-minute Talk Michael Wendt - Manager, Applied Engineering Solutions, NVIDIA
Favorite
S8649 - VR and AI in the Hospitality Industry Virtual Reality and Artificial Intelligence are the keys to revolutionizing the hospitality industry. From how hotels are designed, to how guests shop for their rooms, to the complete gamut of on-premise experiences, the entire hospitality experience is on the cusp of change. At Gettys Group we're embracing VR throughout our design projects, and we've begun exploring how AI can simultaneously enhance guest experiences and reduce hotel staffing costs. In this presentation, we'll share examples of our new VR-enhanced workflows, highlighting how we're leveraging NVIDIA's VR-Ready systems and the new Holodeck platform to simultaneously accelerate our processes and win new business. We'll conclude with our wishlist for additional immersive experience functionality and our thoughts on how this revolution will affect the broader travel industry. 25-minute Talk Stephen Phillips - Chief Technology Officer, Theia Interactive
Ron Swidler - Principal, The Gettys Group
Favorite
S8795 - Research To Production: How Facebook does AI at Scale (Presented by Facebook)

Facebook's strength in AI innovation comes from the ability to quickly bring cutting-edge research into large scale production using a multi-faceted toolset. We'll discuss how Facebook leverages open source software to perform truly iterative AI research, scale it seamlessly for inference, and deploy it across the data center and mobile environments with ONNX. 

50-minute Talk Howard Mansell - Engineering Manager, Facebook AI Research
Sarah Bird - Technical Program Manager, Facebook
Favorite
S8871 - AI Models to Clinical Practice: Open AI Marketplace for Diagnostic Imaging

Learn from a radiologist about the importance of clinical domain expertise in AI algorithm and model development, and its incorporation into clinical workflow, specifically in medical imaging. With growing media attention, there is much fear, hype, and hope when it comes to using DL in radiology. We will present, through examples, why it is essential to incorporate clinical domain expertise when developing DL models. We will demonstrate various ways AI can augment radiologists, both in image interpretation and beyond, within the overall workflow. In the second portion of this talk, we will present the gap between developing a great AI model in isolation and having it become part of daily medical practice. From integration and hospital connectivity to serving algorithms at scale to meet growing demand, we will show how an AI Marketplace can create the ecosystem that allows AI to flourish.

25-minute Talk Woojin Kim - Chief Medical Information Officer, Nuance Communications
Arman Sharafshahi - Engineering Director, Nuance Communications
Favorite
SE1001 - “I am AI” Docuseries Screening

Come to a special screening of the “I am AI” docuseries, where we’ll show the first five episodes of the series and debut the never-before-seen sixth episode. This original docuseries explores the world’s greatest artificial intelligence achievements and the people who are making them happen. To learn more about the docuseries, visit https://www.nvidia.com/en-us/deep-learning-ai/industries/ai-innovators/.   

40-minute Special Event
Favorite
CE8154 - Connect with the Experts: Jetson (2)

NVIDIA Jetson is the world's leading computing platform for AI at the edge. High in performance and low in power, it's ideal for compute-intensive embedded applications like robots, drones, mobile medical imaging, and Intelligent Video Analytics (IVA). OEMs, independent developers, makers and hobbyists can use the NVIDIA Jetson Developer Kit and module to explore the future of embedded computing and artificial intelligence. Have questions? Jetson experts and the NVIDIA Developer Tools team will be present to cover CUDA debugging and profiling, system trace and graphics debugging and profiling tools, and more.

Connect with the Experts sessions are informal opportunities to ask experts from NVIDIA and other organizations your burning questions about a specific subject.

1 Hour Connect with the Experts Eric Brower - Software Management, NVIDIA
Rohit Vaswani, NVIDIA
Amiya Trivedi, NVIDIA
Andrey Trachenko, NVIDIA
Avraham Shapira - Sr. Director Software Engineering, NVIDIA
Michael Colburn, NVIDIA
Sean Pieper, NVIDIA
Sanjiv Satoor, NVIDIA
Felix Schmitt, NVIDIA
Dustin Franklin - Sr. Technical Marketing Manager & Evangelist, NVIDIA
John Welsh, NVIDIA
Favorite
S8244 - How Multi-User Collaborative VR is Changing the Way Architects Design Spaces Learn how CannonDesign leverages a multiuser VR environment to create shared experiences for collaboration in architectural building models. During this presentation, we'll cover the build of the environment, basic configuration, and how it was put into production. We'll also explore an overview of some of the software being used, and use cases for both local and remote collaboration. 25-minute Talk Jimmy Rotella - Digital Practice Director, CannonDesign
Ernesto Pacheco - Design Visualization Lead, CannonDesign
Favorite
S8584 - GPU-Powered Megacity Scale Transport Management, Municipal Services and Public Safety Solutions Learn how VisionLabs GPU-powered solutions contribute to creating a safer, smarter megacity: a metropolitan area with a total population in excess of ten million people. We'll take a deep dive into three implemented and ongoing huge-scale smart-city projects, covering the challenges, technical specifics, and how GPU computing impacts each case: face-authentication-based immobilizer and driver-monitoring systems for municipal service vehicles powered by the NVIDIA Jetson TX2 embedded platform; megacity-scale vehicle traffic analysis and anomaly detection powered by NVIDIA Tesla P40 GPUs handling over 80 million daily recognition requests; and a national-scale face identification platform for financial services with over 110 million faces in its database. The foundation of all these projects is VisionLabs LUNA, cross-platform object recognition software based on a proprietary deep neural network (DNN) inference framework. To build cost-effective solutions, VisionLabs applies its know-how in DNN quantization and acceleration. In terms of accuracy, VisionLabs is recognized as among the top three in the world by the National Institute of Standards and Technology's Face Recognition Vendor Test and the University of Massachusetts' LFW challenge. 25-minute Talk Anton Nazarkin - International Development Director, VisionLabs
Favorite
S8667 - space.ml: Artificial Intelligence Meets Data-Driven Astrophysics We'll present a suite of artificial intelligence applications and computation geared towards increasing our understanding of the universe. The intensive collaboration between astrophysics and computer science dates back to Jim Gray and Alex Szalay. Nowadays, astrophysics continues to offer rich datasets, which are ideal for exploration with the latest in AI and computer science in general. We'll present successful projects in our space.ml initiative that try to answer a range of fascinating astrophysics questions. We'll show how we can use generative adversarial networks to go slightly beyond the Nyquist resolution limit in images, and to study the host galaxies of powerful quasars. We'll demonstrate how we can use transfer learning to identify rare galaxy mergers, and how to use variational autoencoders to forward-model the processes in cosmology and galaxy evolution. We'll illustrate how we can use GPUs for compressive sensing to better analyze data from radio arrays, and to model the evolution of black holes over the age of the universe. Attendees will not only get our current answers to these questions but also get a taste of how AI is reshaping science today. 50-minute Talk Ce Zhang - Assistant Professor, ETH Zurich
Kevin Schawinski - Assistant Professor, ETH Zurich
Favorite
S8763 - Mergers & Acquisitions using Deep Learning

We'll present a case study of how a bank used machine learning to perform due diligence during company acquisitions, covering the techniques, strategy, and decision-making mechanisms that ensured potential risks were illuminated and mitigated. Technical details of the machine learning will be briefly discussed. We'll also discuss how to employ cutting-edge compute to slash costs and raise your ROI, using the NVIDIA DGX-1 to perform deep learning in real time on millions of documents.

25-minute Talk Chris Ryan - Director, EDM Consultancy
Jonathan Bailey - Director, EDM Consultancy
Favorite
S8780 - Monte Carlo Methods and Neural Networks The average human brain has about 100 billion nerve cells. We therefore investigate the question of whether there are algorithms for artificial neural networks that are linear in the number of neurons, while the number of connections incident to a neuron is bounded by a constant. We offer two approaches to answer this question: first, we derive an algorithm that quantizes a trained artificial neural network such that the resulting complexity is linear. Second, we demonstrate that training networks whose connections are determined by uniform sampling can achieve similar precision to using fully connected layers. Due to the sparsity up front, these networks can be trained much faster. Both approaches are made plausible by relating artificial neural units to Monte Carlo integration. We'll demonstrate the results for classic test datasets. 25-minute Talk Noah Gamboa - Student, Stanford University
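The link the abstract draws between neural units and Monte Carlo integration can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: a unit's weighted sum is treated as a quantity that can be estimated without bias by uniformly sampling a fixed number of its incoming connections.

```python
import random

def full_preactivation(weights, inputs):
    """Exact weighted sum over all n connections of a unit."""
    return sum(w * x for w, x in zip(weights, inputs))

def sampled_preactivation(weights, inputs, k, rng):
    """Monte Carlo estimate using k uniformly sampled connections.

    Scaling the sampled sum by n/k makes the estimator unbiased:
    its expectation equals the full weighted sum, which mirrors
    viewing the unit as an integral estimated by sampling."""
    n = len(weights)
    idx = [rng.randrange(n) for _ in range(k)]
    return (n / k) * sum(weights[i] * inputs[i] for i in idx)

if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    weights = [rng.uniform(-1, 1) for _ in range(n)]
    inputs = [rng.uniform(-1, 1) for _ in range(n)]
    exact = full_preactivation(weights, inputs)
    approx = sampled_preactivation(weights, inputs, k=2_000, rng=rng)
    print("exact:", exact, "sampled estimate:", approx)
```

Bounding k by a constant per unit is what makes the per-neuron work constant and the total work linear in the number of neurons.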
Favorite
S8887 - Computational Precision Medicine - What Healthcare May Look Like in 10 Years Thanks to GPUs

This talk will give an overview of the fields of Personalised Computational Medicine and In Silico Clinical Trials, which are revolutionizing medicine and medical product development. It will introduce these concepts, provide examples of how they can transform healthcare, and emphasize why artificial intelligence and machine learning are relevant to them. We will also explain the limitations of these approaches and why it is paramount to engage in both phenomenological (data-driven) and mechanistic (principle-driven) modelling. Both areas are in desperate need of better infrastructures - software and hardware - giving access to computational and storage resources. The talk will be thought-provoking and eye-opening as to opportunities in this space for researchers and industries alike.

25-minute Talk Alejandro Frangi - Professor / CISTIB Director, CISTIB / The University of Sheffield
Favorite
S8902 - The Latest on Project Apollo and a Centralized In-Car Computing Platform for Autonomous Driving The Apollo Computing Unit (ACU), a mass-production-oriented autonomous driving computing platform launched by Baidu, mainly features the Apollo Pilot system and the Intelligent Map service. As an important part of the Apollo platform, the ACU is launched for mass production by Baidu's partners. Based on the different computing capabilities required by different scenarios, it is divided into three product series: ACU-Basic, ACU-Advanced, and ACU-Professional. 25-minute Talk Xing Yuan - General Manager of Strategy and Head of Automotive Services of Baidu I, Baidu
Favorite
S8987 - Deep Learning Institute Executive Workshop

This can't-miss workshop kicks off with a 20-minute talk, "Practical Applications of AI," from Bryan Catanzaro, VP of Applied Research at NVIDIA. The talk will focus on how NVIDIA thinks about applying AI to practical problems and the characteristics of successful AI applications. Afterward, there will be a Best Practices and Industry Use Cases panel featuring PayPal, Kaiser Permanente, Deserve, and KickView. The workshop ends with a robust Q&A session.

80 Minutes Tutorial Thomas Fuchs - Associate Professor, Memorial Sloan Kettering Cancer Center
David Ohm - CEO & Co-Founder, KickView
William Ramey - Director, Developer Programs, NVIDIA
Ajay Gopal - Chief Data Scientist, Deserve
Venkatesh Ramanathan - Senior Data Scientist, PayPal
Taposh Dutta Roy - Manager, Finance Innovation, Kaiser Permanente
Bryan Catanzaro - VP, Applied Deep Learning Research, NVIDIA
Favorite
S8233 - Performance Improvements for CUDA Accelerated Real-Time Diagnostic Ultrasound Medical Imaging Motion Tracking

Motion tracking with motion compensation is an important component of modern advanced diagnostic ultrasonic medical imaging with microbubble contrast agents. Search based on the sum of absolute differences — a well-known technique for motion estimation — is very amenable to efficient implementations, which exploit the fine-grained parallelism inherent in GPUs. We'll demonstrate a real-world application for motion estimation and compensation in the generation of real-time maximum intensity projections over time to create vascular roadmaps in medical images of organs, such as the liver with ultrasound contrast agents. We'll provide CUDA kernel code examples which make this application possible as well as performance measurements demonstrating the value of instruction-level parallelism and careful control of memory access patterns for kernel performance improvement. We hope to provide insight to CUDA developers interested in motion estimation and compensation as well as general insight into kernel performance optimization relevant for any CUDA developer.
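As a rough illustration of the underlying technique (a hypothetical sketch, not the Siemens implementation), SAD-based block matching is an exhaustive search over candidate displacements; on a GPU, each candidate displacement maps naturally onto its own thread.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(abs(a - b)
               for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def block(frame, y, x, size):
    """Extract a size x size block whose top-left corner is (y, x)."""
    return [row[x:x + size] for row in frame[y:y + size]]

def best_match(ref, cur, y, x, size, radius):
    """Exhaustive search: compare the current block against every block
    in a (2*radius+1)^2 window of the reference frame and keep the
    displacement with minimal SAD. Each candidate is independent, which
    is what makes the method so GPU-friendly."""
    target = block(cur, y, x, size)
    best = (None, float("inf"))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= len(ref) - size and 0 <= xx <= len(ref[0]) - size:
                cost = sad(block(ref, yy, xx, size), target)
                if cost < best[1]:
                    best = ((dy, dx), cost)
    return best
```

For a frame shifted by one pixel in each direction, the search recovers the (-1, -1) displacement with zero cost.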

25-minute Talk Ismayil Guracar - Senior Key Expert, Siemens Medical Solutions, USA Inc. Ultrasound Group
Favorite
S8430 - Everything You Need to Know About Unified Memory We'll cover all the things you need to know about Unified Memory: fundamental principles, important use cases, advances in the latest GPU architectures, HMM and ATS details, performance considerations and optimization ideas, and new application results, including data analytics and deep learning. 2018 is going to be the year of Unified Memory. Both HMM and ATS will be available and developers will start using the true Unified Memory model with the system allocator "the way it's meant to be played." We'll discuss all the caveats and differences between cudaMallocManaged and malloc. A big part of the talk will be related to performance aspects of Unified Memory: from migration throughput optimizations to improving the overlap between kernels and prefetches. 50-minute Talk Nikolay Sakharnykh - Sr. Developer Technology Engineer, NVIDIA
Favorite
S8520 - Predictive Learning of Factor Based Strategies Using Deep Neural Networks for Investment and Risk Management We develop and implement an approach using deep neural networks to process financial and macroeconomic signals to help identify key inflection points in equity market-based factor performance such as momentum and volatility. The model may be used to calibrate factor rotation strategies and better assess portfolio risks associated with factor-based exposures. The machine learning algorithm relies on the GPU for high-performance computations to drive an optimization framework within a deep neural network. 25-minute Talk Yigal Jhirad - Head of Quantitative and Derivatives Strategies, Cohen & Steers
Blay Tarnoff - Senior Application Developer and Database Architect, Cohen & Steers
Favorite
S8531 - Deep Learning Infrastructure for Autonomous Vehicles We'll introduce deep learning infrastructure for building and maintaining autonomous vehicles, including techniques for managing the lifecycle of deep learning models, from definition, training, and deployment to reloading and life-long learning. The DNN auto-curates and pre-labels data in the loop. Given data, the infrastructure finds the best run-time-optimized deep learning models, and training scales with data size beyond multiple nodes. With these methodologies, one only takes data from the application and feeds DL predictors back to it. The infrastructure is divided into multiple tiers and is modular, with each module containerized for deployment on underlying infrastructure such as GPU-based clouds. 50-minute Talk Pradeep Gupta - Head Solutions Architect, Autonomous Driving, NVIDIA
Favorite
S8856 - Improving Commercial Fleet Safety and Performing High-Def Mapping at the Same Time

In this talk, we'll discuss how deploying cameras, sensors, and deep learning in commercial vehicles, accelerated by NVIDIA Jetson, can help analyze the driving environment in real time and improve driver safety, while at the same time performing dynamic HD mapping.

25-minute Talk David Julian - CTO, Netradyne
Favorite
S8893 - The Path to GPU as a Service in Kubernetes We'll cover modern Kubernetes production patterns for deep learning applications and take a deep dive into the Kubernetes GPU subsystem and its challenges (performance, scheduling, monitoring). Autonomous vehicles, face recognition, high performance computing, virtual reality: NVIDIA GPUs are enabling a new computing era with cloud computing at its center. With Kubernetes being the next iteration in cloud technologies, the NVIDIA container team, together with the Kubernetes community, is driving advances in GPU integration. During this talk, we will review how to deploy a GPU-enabled Kubernetes cluster and the modern production patterns for deploying GPU-enabled services and applications. We will also dive into the details of the Kubernetes device plugin (its GPU subsystem), the NVIDIA container stack, and the limitations of the Kubernetes infrastructure. Finally, we'll discuss the latest improvements in the device plugin subsystem of Kubernetes and the challenges ahead of it, such as NUMA, sharing, and scheduling. 25-minute Talk Viraj Chavan - Director - NVIDIA GPU Cloud Compute Software, NVIDIA
Renaud Gaubert - Software Engineer on Kubernetes at NVIDIA, NVIDIA
Favorite
S8951 - Exploring Holodeck Use Cases for Architecture, Engineering and Construction

NVIDIA Holodeck is a collaborative, high-fidelity, virtual reality platform that empowers designers and inventors to bring their ideas to life. NVIDIA Holodeck revolutionizes the design process by enabling designers to:

• Visualize large and highly detailed models photorealistically, and at any scale

• Simulate accurate physical interactions between people, objects, and environments

• Collaborate naturally and in real-time, within the same virtual environment

• Enhance their workflows with AI-powered simulation tools

Since its first early access release at GTC 2017, Holodeck has been used for design review in many different industries, the AEC industry among them. This panel will focus on Holodeck for AEC. The panelists will discuss:

• Holodeck use cases in the AEC industry

• Pain points and challenges in the architectural workflow that can be addressed with Holodeck

The panelists are experienced architects and early access Holodeck users. They will share their vision on how virtual reality and Holodeck will shape AEC in the future.

50-minute Panel Jimmy Rotella - Digital Practice Director, CannonDesign
Gregory Jones, NVIDIA
Nicholas Cameron - Director Digital Practice, Associate Principal, Perkins+Will
Andrew Burdick - Associate Principal /Director, Ennead Architects
Cobus Bothma - Applied Research Director, KPF
Brian Hopkins - Director of Applied Computing, Ennead Architects
Favorite
S8972 - Designing and Deploying End to End HPC and AI Solutions: Lessons learned from large scale HPC and AI Clusters (Presented by Penguin Computing)

We will discuss challenges and lessons learned from deploying multiple large-scale HPC and AI clusters in different industries. Lessons learned will focus on end-to-end aspects of designing and deploying large-scale GPU clusters, including datacenter and environmental challenges, network performance and optimization, and data pipeline and storage challenges, as well as workload orchestration and optimization. You will learn more about open architectures for HPC, AI, and deep learning, combining flexible compute architectures, rack-scale platforms, and software-defined networking and storage to provide a scalable software-defined deep learning environment. We will discuss strategies, providing insight into everything from specialty compute for training vs. inference to high-performance storage for data workflows to orchestration and workflow management tools. We will also discuss deploying deep learning environments from development to production at scale, from private cloud to public cloud.

25-minute Talk Kevin Tubbs - Sr. Director Technology and Business Development, Penguin Computing
Favorite
S81016 - How AI Technology Lifts the Ads Business in JD.com Deep learning and reinforcement learning are widely used in the ads products of JD.com, e.g., ranking models in recommender systems, bidding models in the ad exchange business, and automatic ads review systems. These technologies have brought great benefits to JD.com, and all of them are built on NVIDIA GPUs. 25-minute Talk YAN YAN - AI Scientist, JD.com
Juyan Song - Account Manager, NVIDIA
Favorite
S81031 - Enabling Future Mobility Solutions with Automatic Vehicle Inspection Using Deep Learning

In recent times, there have been many advances in anomaly detection for computer vision applications. Despite this, the problem of anomaly detection on a vehicle's undercarriage remains very challenging for two main reasons:
First, the data domain for a vehicle undercarriage is very unique; there is no publicly available dataset, and it's not readily available online. Second, there is no dataset of threats to be detected, which can appear in any place or form (weapons, contraband, etc.). Essentially, this is a semi-supervised anomaly detection problem, where the anomaly class does not exist in the dataset.
In this presentation, we will describe the steps we took to solve this problem, including deep learning models for representations of vehicles, similarity metrics, segmentation, anomaly detection, and how all these models are combined into a single system that analyzes a vehicle in just a few seconds. We will also show how models trained for security purposes have great value in the automotive industry, where similar systems can detect various types of mechanical problems and damage to the exterior of any vehicle.

25-minute Talk Amir Hever - CEO, UVeye
Favorite
S8432 - Disrupting Logistics and Optimization with AI In this talk, you will get a detailed yet accessible look at how AI is disrupting logistics. Firms have for years been using classical optimization algorithms to make decisions such as how to deliver goods to multiple clients in a city, place packages in a warehouse or route orders. Such algorithms are often built on heuristics which experts have designed to get reasonable solutions quickly. Recent advances in Deep Learning and Reinforcement learning are however making it possible to build AI systems that tackle these optimization problems from scratch. Through constant learning, a modern AI system can match and even beat existing optimization algorithms, or deliver faster solutions thanks to GPU parallel processing. Companies can now leverage these advances into significant efficiency gains for their operations. 25-minute Talk Karim Beguir - Co-Founder & CEO, InstaDeep
Favorite
S8848 - Adapting Minisweep, a Proxy Application, on Heterogeneous Systems Using OpenACC Directives Learn how OpenACC, the widely popular, high-level, directive-based programming model, can help port radiation transport scientific codes to large-scale heterogeneous systems consisting of state-of-the-art accelerators such as GPUs. Architectures are rapidly evolving, and exascale machines are expected to offer billion-way concurrency. We need to rethink algorithms, languages, and programming models, among other components, in order to increase parallelism from a programming standpoint and migrate large-scale applications to these massively powerful platforms. This talk will discuss programming challenges and their corresponding solutions for porting a wavefront-based miniapplication for Denovo, a production code for nuclear reactor modeling, using OpenACC. Our OpenACC implementation running on NVIDIA's next-generation Volta GPU boasts an 85.06x speedup over serial code, which is larger than CUDA's 83.72x speedup over the same serial implementation. 25-minute Talk Sunita Chandrasekaran - Assistant Professor, Department of Computer & Information Sciences, University of Delaware
Robert Searles - Research Assistant, University of Delaware
Favorite
SE0004 - Happy Hour & Exhibits

Check out the GTC expo with startups, solution providers, and NVIDIA products. Grab a drink and some appetizers and experience tons of cool demos.

Special Event - 2 h Special Event
Favorite
SE152859 - WebGL, WebVR and glTF Meetup

This WebGL Meetup will be a fast-paced review of the very latest in tools, applications, and demos for WebGL (3D on the Web), WebVR (Virtual Reality on the Web using WebGL), and glTF (3D format for efficient downloading of scenes and objects). Refreshments will be provided!


iQuarkt - Oswald Campesato - Deep Learning in Your Browser: Powered by WebGL

Google – Brandon Jones - Update on the WebXR initiative

Google and Mozilla – what’s next in WebGL

NVIDIA – Neil Trevett – glTF Ecosystem update

AGI - Patrick Cozzi - Autonomous driving vehicles using WebGL

Google - Ricardo Cabello (aka Mr.doob) - Insights into the latest Three.JS features

Special Event - 2 h Special Event
Favorite
SE0009 - AI in Telecoms Breakfast Session

Kick off Your Day@GTC with fellow speakers and attendees from Telecommunications.

Join us for breakfast from 7:15 - 7:45 in the West Lobby of the Convention Center followed by a session from 7:45 - 8:45 in room 231!

Special Event - 1.5 h Special Event Somasundaram Velayutham, NVIDIA
Favorite
CE8110 - Connect with the Experts: Multi-GPU Programming

Wondering how to scale your code to multiple GPUs in a node or cluster? Need to discuss CUDA-aware MPI details? This is the right session for you to ask your beginner or expert questions on multi-GPU programming, GPUDirect, NVSHMEM, and MPI.

Connect with the Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Jiri Kraus - Senior Devtech Compute, NVIDIA
Favorite
CE8171 - Connect with the Experts: Building Autonomous Vehicles using DRIVE Platforms (2)

Connect with NVIDIA experts and discuss why autonomous technologies powered by deep learning have become a key focus for every car manufacturer, as well as transportation services and technology companies. The car needs to know exactly where it is, recognize the objects around it, and continuously calculate the optimal path for a safe driving experience. This situational and contextual awareness of the car and its surroundings demands a powerful visual computing system that can merge data from cameras and other sensors, plus navigation sources, while also figuring out the safest path - all in real-time. This autonomous driving platform is NVIDIA DRIVE PX.

Connect with the Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject.

1 Hour Connect with the Experts Murat Durus - Senior Solutions Architect, NVIDIA
Luke Harvey, NVIDIA
Favorite
S8105 - Tackling the Realities of Virtual Reality

David Luebke, NVIDIA’s VP of Graphics Research, will describe NVIDIA’s vision for the future of virtual and augmented reality. Luebke will review some of the “realities of virtual reality”: challenges presented by Moore’s Law, battery technology, optics, wired and wireless connections. He will then discuss the implications and opportunities presented by these challenges, such as foveation and specialization, and conclude with a deep dive into how rendering technology, such as ray tracing, can evolve to solve the realities of virtual reality.

50-minute Talk David Luebke - Vice President of Graphics Research, NVIDIA
Favorite
S8131 - Trends and Opportunities for ML and AI in Consumer Insights Industries We'll examine business value drivers for artificial intelligence and machine learning in retail and consumer goods industries. Traditionally, traction in AI and ML has been in deep research, scientific, and technical communities. Retailers and consumer products companies are finding great success applying AI and ML technology to distinct use cases and business challenges. Join us to hear project descriptions and customer examples where AI and ML can impact the business by increasing revenue, protecting margin, and improving consumer satisfaction. 50-minute Talk Eric Thorsen - Retail Business Development, NVIDIA
Paul Hendricks, NVIDIA
Favorite
S8145 - Network Security with Machine Learning Connections have behavioral patterns that are unique to protocols, loads, window sizes, and the type of traffic. A CDN enterprise behaves completely differently than how a cloud service company would behave and they both would be different from a corporation. This also means that attack vectors and attack landscapes are different in all these places. We'll speak about modeling different kinds of attacks and building a model that is able to identify these different kinds of attacks using machine learning. The ability to bring in the expertise of a network domain expert in Driverless AI allows for quickly iterating through valuable features across the data-space. The ability to harness NVIDIA's powerful GPUs cores and the extremely optimized CUDA library changes the rate at which newer and accurate models are built for identifying attacks across the internet or a corporate network. This is truly valuable for anyone defending attacks on a variable attack surface. 50-minute Talk Ashrith Barthur - Security Scientist, H2O.ai
Favorite
S8214 - Prototyping and Developing GPU Accelerated Solutions with Python and CUDA Python is a programming language with increasing adoption in the development community due to its fast learning curve, flexibility, and ease of use and integration with other technologies. Due to its level of abstraction, it is possible to use the same Python code on different platforms like x86, RISC, and ARM. The Python development community is growing fast, and many community members are interested in moving to GPU-accelerated programming but don't know how to start or what is needed. We'll go through the steps and adoption path to start developing Python solutions that take advantage of GPU acceleration, including some details, advantages, and challenges for the strongest and most popular Python 3 modules to be used with GPUs: scikit-cuda, PyCUDA, Numba, cudamat, and CuPy. Some code samples and program execution statistics will be shown as a performance analysis exercise as well. 50-minute Talk Luciano Martins - Principal Software Engineer, Oracle Corporation
Robert Sohigian - Technical Marketing Engineer, NVIDIA
Favorite
S8237 - Parallel Hashing on Multi-GPU Nodes We'll discuss WarpDrive – a high-speed, scalable, multi-GPU implementation for hashing billions of key-value pairs. Hash maps are among the most versatile data structures because of their compact data layout and expected constant time complexity for insertion and querying. CUDA-enabled GPUs can speed up hashing by virtue of their fast video memory featuring almost one terabyte per second of bandwidth in comparison to state-of-the-art CPUs. However, the size of hash maps supported by single-GPU hashing implementations is restricted by the limited amount of available video RAM. We propose a novel subwarp/coalesced group-based probing scheme featuring coalesced memory access over consecutive memory regions in order to mitigate the high latency of irregular access patterns. Our implementation achieves around 1.3 billion insertions per second in single-GPU mode for a load factor of 0.95, clearly outperforming other implementations. We'll also present transparent scaling to multiple GPUs within the same node with over 4.5 billion operations per second for high load factors on four Tesla P100 GPUs connected by NVLink technology. WarpDrive is freely available at https://github.com/sleeepyjack/warpdrive. 50-minute Talk Christian Hundt - PostDoc, Johannes Gutenberg University Mainz
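The group-probing idea can be illustrated with a small CPU-side sketch (a hypothetical toy, far simpler than WarpDrive itself): keys probe fixed-size groups of consecutive slots instead of scattered single slots, so on a GPU a subwarp could service a whole group with one coalesced memory transaction.

```python
class GroupProbingHashMap:
    """Open-addressing map that probes fixed-size groups of consecutive
    slots, mimicking how a subwarp reads adjacent memory in one coalesced
    transaction. Capacity must be a multiple of the group size."""
    EMPTY = object()

    def __init__(self, capacity, group=4):
        self.capacity, self.group = capacity, group
        self.slots = [self.EMPTY] * capacity

    def _groups(self, key):
        # Start at the group containing the hashed slot, then walk groups.
        start = (hash(key) % self.capacity) // self.group * self.group
        for i in range(self.capacity // self.group):
            yield (start + i * self.group) % self.capacity

    def insert(self, key, value):
        for g in self._groups(key):
            for s in range(g, g + self.group):  # one coalesced read per group
                slot = self.slots[s]
                if slot is self.EMPTY or slot[0] == key:
                    self.slots[s] = (key, value)
                    return True
        return False  # table full

    def lookup(self, key):
        for g in self._groups(key):
            for s in range(g, g + self.group):
                slot = self.slots[s]
                if slot is self.EMPTY:
                    return None  # key would have been placed here
                if slot[0] == key:
                    return slot[1]
        return None
```

The early return on an empty slot works because insertion always fills the first empty slot along the same probe sequence.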
Bertil Schmidt - Professor, Johannes Gutenberg University Mainz
Favorite
S8348 - Advances in Discrete Element Particle Modelling Using the GPU Based Code Blaze-DEM In this talk we will look at advances in the simulation of particulate systems in Computer Aided Engineering (CAE) applications. We will in particular be focusing on the Discrete Element Method (DEM) and the strides made in terms of the number of particles and particle shape using the GPU-based code Blaze-DEM. A variety of industrial applications ranging from mining, agriculture, and civil engineering to pharmaceuticals will be discussed. We will also touch on how we can leverage the next wave of GPU computing, namely half precision and tensor cores, in scientific computing, which is still predominantly double-precision based. Finally, we look at the work being done by various groups to create a multi-physics GPU-based platform using Blaze-DEM. 50-minute Talk Nicolin Govender - Senior Scientist, RCPE/University of Surrey
Daniel Wilke - Senior Lecturer (PhD), Department of Mechanical and Aeronautical Engineering, University of Pretoria
Favorite
S8906 - Fast Data Pipelines for Deep Learning Training

With every generation of GPU, it becomes increasingly difficult to keep the data pipeline full so that the GPU can be fully utilized. We'll propose a method for offloading the CPU and using the GPU to process image data to increase throughput.

50-minute Talk Simon Layton - Senior Deep Learning Engineer, NVIDIA
Przemyslaw Tredak - Senior Deep Learning Engineer, NVIDIA
Trevor Gale - Senior Computer Engineering student, Northeastern University
Favorite
S8945 - Analyzing Urban activity at City-scale with GPUs (Presented by YITU Tech)

Parsing millions of video cameras in real time to provide situational awareness is an enormous challenge. We will discuss how YITU Tech has overcome this using GPUs and TensorRT. We learned from 1 billion faces to win first place in face identification accuracy in FRPC 2017, hosted by NIST. We will show how we analyze data from 10 million cameras using several thousand NVIDIA Tesla P4s and achieve 99% accuracy in identifying pedestrians with 100 days of data from the cameras. The result is an ability to do big data analysis on things like population density and traffic flows that enable the development of smart cities.

25-minute Talk Hao Lu - Chief Innovation Officer, YITU Technology
Favorite
S8948 - High-Performance Input Pipelines for Scalable Deep Learning Learn how to keep your GPUs fed with data as you train the next-generation of deep learning architectures. As GPU technology continues to advance, the demand for faster data continues to grow. In deep learning, input pipelines are responsible for a complex chain of actions that ultimately feed data into GPU memory: defining how files are read from storage, deserializing them into data structures, pre-processing on a CPU, and copying to the GPU. These pipelines bring together complex hardware systems--including cluster networks, peripheral interconnects, modern CPUs, and storage devices--along with sophisticated software systems to drive the data movement and transformation. In this talk, we present a new benchmark suite for evaluating and tuning input pipelines. We will examine results with TensorFlow's DataSets API on a DGX-1 with V100 and provide guidance on key tuning parameters and diagnostic techniques for improving performance. 25-minute Talk Brian Gold - R&D Director, Pure Storage
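The general pattern behind such pipelines — overlapping host-side reading and preprocessing with consumption so the accelerator never starves — can be sketched with stdlib threading (a toy stand-in for frameworks such as TensorFlow's DataSets API, not the benchmark suite described above):

```python
import queue
import threading

def prefetch(iterable, depth=4):
    """Run the producer (read + decode + preprocess in a real pipeline)
    on a background thread, buffering up to `depth` batches, so the
    consumer (the GPU in a real pipeline) rarely waits on input."""
    q = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for item in iterable:
            q.put(item)  # blocks when the buffer is full (backpressure)
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item
```

A single bounded queue preserves ordering and applies backpressure when the consumer falls behind, which is the same tuning knob (buffer depth) the talk's benchmarks examine.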
Favorite
S8221 - Using Multimodal Learning for TV Show Summarization We'll explore new techniques for TV show summarization using multimodal deep learning for saliency detection and fusion. For TV show summarization, the goal is a compact visual summary with informativeness and enjoyability to attract audiences. In our work, we propose a multimodal summarization platform to integrate the multimodal saliences learned from video, audio, and text. Our work focuses on three aspects: 1) the saliency extraction for video, audio, and text using deep learning networks; 2) fusion framework design for multimodal information integration; 3) developing tools to speed up video processing. Using AI Vision, which is a public cloud-based AI service, we summarize a TV show with 11 hours duration in one minute. 25-minute Talk Qing Wang - Research Staff Member, IBM Research China
Yonghua Lin - Distinguished Engineer, Leader of IBM AI System Research, IBM Research China
Kewei Sun - Research Staff Member, IBM Research China
Favorite
S8299 - Highly-Efficient Caching with Tiling & Chaining in CNN Learn how to achieve 100% R/W cache hit rate for most intermediate tensors in CNN and over 80% typical DRAM traffic saving, with general applicability to a limited cache size and large tensors. The high-throughput NVIDIA Tensor Core and DLA demand high memory traffic. Chaining of consecutive layers in CNN can save DRAM traffic by reusing intermediate tensors in cache. This strategy is effective only with small tensors and a large cache. In this work, we slice tensors into small tiles (with halo) and chain these tiles so the requirement for perfect caching can always be fulfilled. Our implementation of this approach is proven to be very effective in saving DRAM traffic. This work allows us to solve the memory bandwidth issue of CNN with a relatively small but high-bandwidth cache. 25-minute Talk Yao Yao - Senior Compute Architect, NVIDIA
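The halo arithmetic behind tiling and chaining can be sketched abstractly (a hypothetical 1D illustration, not NVIDIA's implementation): walking a chain of convolution layers backwards from an output tile tells you how large an input slice must stay resident in cache for the whole chain to run without touching DRAM again.

```python
def input_range_for_tile(out_lo, out_hi, kernel=3, stride=1):
    """Input index range [lo, hi) needed to produce output rows
    [out_lo, out_hi) of one convolution layer (odd kernel, 'same'-style
    indexing assumed)."""
    halo = kernel // 2
    return out_lo * stride - halo, (out_hi - 1) * stride + halo + 1

def chained_input_range(out_lo, out_hi, layers):
    """Walk a chain of (kernel, stride) layers backwards: each layer
    enlarges the tile by its halo, giving the input slice that must fit
    in cache for the chained tile to be computed in one pass."""
    lo, hi = out_lo, out_hi
    for kernel, stride in reversed(layers):
        lo, hi = input_range_for_tile(lo, hi, kernel, stride)
    return lo, hi
```

For two chained 3x3 stride-1 layers, an 8-row output tile needs a 12-row input slice: each layer adds a one-row halo on each side.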
Favorite
S8511 - SmartIO: Dynamic Sharing of GPUs and IO in a PCIe Cluster Learn how GPUs, NVMe drives, and other IO devices can be efficiently shared in a PCI Express cluster using SmartIO from Dolphin Interconnect Solutions. Traditionally, IO devices have been statically assigned to a single root complex (host machine), and features such as hot-add, device migration, and remote access are not supported in a flexible way without complex software frameworks. Dolphin SmartIO eliminates these restrictions and provides a flexible framework for handling PCIe devices and systems. Devices such as GPUs, NVMe drives, and other IO devices can be flexibly accessed from remote systems. We demonstrate how SmartIO is implemented using standard PCIe and Non-Transparent Bridging, show that our system gets near-native performance when moving data from local GPUs to remote NVMe drives, and how we can dynamically add more GPUs to scale performance. 25-minute Talk Haakon Stensland - Research Scientist, Simula Research Laboratory
Favorite
S8994 - Workflow and Regulatory Challenges to Algorithm Implementation

AI in medical imaging has the potential to provide radiology with an array of new tools that will significantly improve patient care. To realize this potential, AI algorithm developers must engage with physician experts and navigate domains such as radiology workflow and regulatory compliance. This session will discuss a pathway for clinical implementation, and cover ACR's efforts in areas such as use case development, validation, workflow integration, and monitoring.

25-minute Talk Mike Tilkin - ACR Executive Vice President & Chief Information Officer, American College of Radiology (ACR)
Favorite
L8115 - In-depth Performance Analysis for OpenACC/CUDA/OpenCL Applications with Score-P and Vampir

Work with Score-P/Vampir to learn how to dive into the execution properties of CUDA and OpenACC applications. We'll show how to use Score-P to generate a trace file and how to study it with Vampir. Additionally, we'll use the newly established OpenACC tools interface to present how OpenACC applications can be studied for performance bottlenecks. This lab uses GPU resources in the cloud, so you are required to bring your own laptop. Prerequisites: Basic knowledge of CUDA/OpenACC and MPI is recommended but not required.

120 Minutes Instructor-Led Lab Robert Henschel - Director Science Community Tools, Indiana University
Guido Juckeland - Head of Computational Science Group, Helmholtz-Zentrum Dresden-Rossendorf
Favorite
L8140b - Image Classification with DIGITS (2)

Deep learning enables entirely new solutions by replacing hand-coded instructions with models learned from examples. Train a deep neural network to recognize handwritten digits by:

• Loading image data to a training environment

• Choosing and training a network

• Testing with new data and iterating to improve performance

Upon completion of this lab, you'll be able to assess what data you should be using for training.

Presented by the NVIDIA Deep Learning Institute (DLI).

120 Minutes Instructor-Led Lab Mike Mendelson - Deep Learning Institute Curriculum Developer, NVIDIA
Favorite
S8272 - Multi-GPU Accelerated Methods in Deep Reinforcement Learning

Deep reinforcement learning (RL) has achieved many recent successes, yet experiment turn-around time remains a key bottleneck in research and in practice. We investigate how to optimize existing deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs. We confirm that both policy gradient and Q-value learning algorithms can be adapted to learn using many parallel simulator instances. We further find it possible to train using batch sizes considerably larger than are standard, without negatively affecting sample complexity or final performance. We leverage these facts to build a unified framework for parallelization that dramatically hastens experiments in both classes of algorithm. All neural network computations use GPUs, accelerating both data collection and training. Our results include using an entire NVIDIA DGX-1 to learn successful strategies in Atari games in single-digit minutes, using both synchronous and asynchronous algorithms.
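The many-parallel-simulators pattern can be sketched in plain Python (a hypothetical toy, not the authors' framework): one object steps a batch of environment instances per call, so actions in and observations out move as batches that are ready to be shipped to the GPU for inference and training.

```python
class VectorEnv:
    """Steps many simulator instances with one batched call — the pattern
    used to keep large training batches flowing through the network."""
    def __init__(self, make_env, n):
        self.envs = [make_env() for _ in range(n)]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        """Apply one action per environment; return batched results."""
        obs, rews = [], []
        for env, a in zip(self.envs, actions):
            o, r = env.step(a)
            obs.append(o)
            rews.append(r)
        return obs, rews

class CounterEnv:
    """Hypothetical toy simulator: the state accumulates the actions."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        return self.state, float(self.state)
```

In a real system the inner loop would run simulator instances in worker processes, but the batched interface is the same.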

50-minute Talk Adam Stooke - Graduate Student, UC Berkeley
S8291 - Acceleration of a Computational Fluid Dynamics Code with GPU Using OpenACC

The goal of this session is to report the knowledge acquired at the Oak Ridge GPU Hackathon that took place on October 9th-13th 2017, through the acceleration of a CFD (Computational Fluid Dynamics) solver. We'll focus on the approach used to make the application suitable for the GPU, the acceleration obtained, and the overall experience at the Hackathon. OpenACC was used to implement GPU directives in this work. We'll detail the different OpenACC directives used, their advantages and disadvantages, as well as the particularities of CFD applications.

25-minute Talk Nicholson Koukpaizan - PhD. Candidate, Georgia Institute of Technology
S8349 - Light Field Rendering and Streaming for VR and AR

We'll discuss OTOY's cutting-edge light field rendering toolset and platform, which allows for immersive experiences on mobile HMDs and next-gen displays, making it ideal for VR and AR. OTOY is developing a groundbreaking light field rendering pipeline, including the world's first portable 360 LightStage capture system and a cloud-based graphics platform for creating and streaming light field media for VR and emerging holographic displays.

25-minute Talk Jules Urbach - CEO, OTOY, Inc.
S8414 - Walt Disney Imagineering Technology Preview: Real-time Rendering of a Galaxy Far, Far Away

Walt Disney Imagineering strives to create amazing guest experiences at Disney Parks worldwide. Partnering with NVIDIA and Epic Games, Imagineering has developed new technology to drive one of the key attractions at the upcoming Star Wars: Galaxy's Edge, opening at Disneyland Resort, CA and Disney's Hollywood Studios, FL. Come learn how we took advantage of the newest NVIDIA hardware and the technical modifications we made to Unreal Engine that allow eight GPUs to render at unprecedented quality and speed.

50-minute Talk Eric Smolikowski - Disney Imagineer, Walt Disney Imagineering
Bei Yang - Technology Studio Executive, Walt Disney Imagineering
S8433 - Challenges in Real-Time Rendering and Software Design for Interactive Immersive Visualization

In the field of virtual engineering and design, countless application scenarios for interactive visualization exist. Huge diversity in the kind of data that needs to be handled -- construction data straight out of CAD solutions, results obtained through structural mechanics simulation, or fluid dynamics data -- intersects with an ever increasing number of use cases, ranging from engineering reviews and exploratory simulation for digital twin or HybridTwin (TM) up to physically based high-quality rendering. Virtual reality's full potential as an environment for collaboration, communication, and decision making is enabled today by a complex, heterogeneous hardware landscape with output devices as diverse as HMDs, CAVEs, or even mobile streaming clients. We'll talk about how these challenges have been addressed in the design and implementation of the Helios rendering architecture, which serves as the underlying visualization engine in various ESI products and projects. First, we'll take a closer look at the structure and inner workings of Helios, before demonstrating the benefits of ESI's Helios visualization system through practical examples.

50-minute Talk Galen Faidley - Sr Engineering Project Team Leader, Caterpillar Inc.
Andreas Dietrich - Senior Software Developer, ESI Group
S8595 - NVSHMEM: A Partitioned Global Address Space Library for NVIDIA GPU Clusters

Addressing the apparent Amdahl's fraction of synchronizing with the CPU for communication is critical for strong scaling of applications on GPU clusters. GPUs are designed to maximize throughput and have enough state and parallelism to hide long latencies to global memory. It's important to take advantage of these inherent capabilities of the GPU and the CUDA programming model when tackling communications between GPUs. NVSHMEM provides a Partitioned Global Address Space (PGAS) that spans memory across GPUs and provides an API for fine-grained GPU-GPU data movement and synchronization from within a CUDA kernel. NVSHMEM also provides CPU-side API for GPU-GPU data movement that provides a progression for applications to move to NVSHMEM. CPU-side communication can be issued in stream order, similar to CUDA operations. It implements the OpenSHMEM programming model that is of great interest to government agencies and national labs. We'll give an overview of capabilities, API, and semantics of NVSHMEM. We'll use examples from a varied set of applications (HPGMG, Multi-GPU Transpose, Graph500, etc.) to demonstrate the use and benefits of NVSHMEM.

50-minute Talk Sreeram Potluri - Senior Software Engineer, NVIDIA
Anshuman Goswami - Senior Software Engineer, NVIDIA
S8633 - The Long Road to Model Deployment, or How to Make a Good Model Great!

In this talk, we will cover the essential building blocks of the AI platform NVIDIA engineers are using to build a world-class automotive perception stack. Through a computer vision application example, we will see how to improve a baseline model to produce better, faster predictions.

The talk will focus on:

- hyper-parameter optimization,

- model complexity reduction (pruning),

- target platform optimizations (TensorRT integration),

- automation of complex workflows

50-minute Talk Gregory Heinrich - Software Engineer, NVIDIA
S8646 - Autodesk BIM Cloud Workspace on Azure and Citrix Customer Panel Discussion

GPU virtualization in the cloud has ushered in a new era for architects, builders, designers, and engineers. In this case study session, you will learn how TBI personnel are now using Autodesk applications including BIM 360, Stingray, Revit, and Navisworks through a digital workspace hosted on Citrix XenDesktop HDX 3D Pro, running on Microsoft Azure NV-series virtual machines with NVIDIA Quadro workstation technology. This technology stack enables TBI employees to work together in real time, from any location, while enjoying a highly optimized 3D user experience on any device, even the low-cost Raspberry Pi. In their technology journey, TBI progressed from an age of 2D flatland, to the more advanced age of optimizing 3D digital data, to the present-day era of interoperability and collaboration in a new age where connectivity is key. This session will also include a Citrix customer panel discussion. Hear from customers who have implemented virtualized 3D workloads to solve complex business challenges. Bring your questions along and join in on the knowledge sharing in an interactive setting.

50-minute Panel Frank Wolbertus - BIM-Expert / Developer, TBI
Marc Sleegers - Technical Consultant, Autodesk
Adam Jull - CEO, IMSCAD Global
Allen Furmanski - Senior Product Marketing Manager, Citrix Systems
S8993 - Accelerating Medical Device Development in Medical Imaging

TBA

25-minute Talk Alejandro Frangi - Professor / CISTIB Director, CISTIB / The University of Sheffield
S8189 - Automatic Generation of 1D Recursive Filter Code for GPUs

Learn how to automatically generate 1D recursive filter code for GPUs using PLR, a domain-specific compiler. It requires only the filter coefficients as input and emits high-performance CUDA code. In digital filters, later result values depend on earlier result values, making it a challenge to compute them in parallel. We'll present the new work- and space-efficient algorithm PLR uses to implement digital filters and other linear recurrences, and we'll explain how it automatically parallelizes and optimizes the GPU code. Our evaluation shows that, for single-stage IIR filters, the generated code reaches the throughput of memory copy for large inputs, which cannot be surpassed. On other digital filters, the automatically parallelized code outperforms the fastest prior implementations.

25-minute Talk Martin Burtscher - Professor, Texas State University
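The dependence that makes such filters hard to parallelize is easy to see in code. This sequential reference (an illustration, not PLR output) computes a first-order IIR filter, where each output needs the previous one:

```python
# y[n] = b0*x[n] + a1*y[n-1], computed sequentially: the loop-carried
# dependence on y[n-1] is exactly what PLR-style compilers must break
# to run the recurrence in parallel.
def iir_first_order(x, b0, a1):
    y = [0.0] * len(x)
    y[0] = b0 * x[0]
    for n in range(1, len(x)):
        y[n] = b0 * x[n] + a1 * y[n - 1]
    return y

out = iir_first_order([1.0, 0.0, 0.0, 0.0], b0=1.0, a1=0.5)
print(out)  # impulse response: [1.0, 0.5, 0.25, 0.125]
```

Because the recurrence is linear, it can be re-expressed as a prefix scan, which is the standard route to a parallel GPU formulation.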
S8404 - Production-Quality, Final-Frame Rendering on a GPU

We'll discuss the latest features of Redshift, the GPU-accelerated renderer running on NVIDIA GPUs that is redefining the industry's perception of GPU final-frame rendering. We'll demonstrate a few examples of customer work. This talk will be of interest to industry professionals who want to learn more about GPU-accelerated production-quality rendering, as well as software developers who are interested in GPU-accelerated rendering.

25-minute Talk Robert Slater - Immersive Experience, VP Engineering, Redshift
S8710 - CPG Product Capture Under 48 Hours - From Production Lines to Retail Shelves

New Consumer Packaged Goods (CPG) products need not just sit on retail shelves without a link back to their production lines. Our solutions, through a patent-pending process, create a feedback link between retail shelves, the manufacturing entities and production lines that make CPG products, and distribution channels, enabling processes that solve out-of-stock issues on retail shelves and reveal customer behavior at the shelf level. Evolving from a solution intended to solve out-of-stock issues on retail shelves in real time, our current generic CPG product training platform is set to create a global database of CPG products with their respective descriptions, including ingredients, nutrition, and more. In this session, we will walk you through how this ecosystem was built, with deep learning at its core. You will get insights into how GPUs have helped speed up the creation of the ecosystem. The session ends with what the future of retail holds in terms of maximizing the human experience through empathy and responsibility: AI-enabled nested distribution networks for redistributing unsold retail shelf food to low-income groups.

25-minute Talk Pradeep Pydah - CEO, Maxerience
S8989 - Scaling AI POCs Across the Enterprise

Has your team developed an AI proof-of-concept with promising metrics? The next step is to broaden its scope to impact larger areas of the enterprise. With its unique challenges and complexities, scaling POCs across multiple business units is a significant part of any company's AI roadmap. This session will look at best practices, insights, and successes, rooted in Element AI's experience with enterprise customers.

25-minute Talk Omar Dhalla - SVP Industry Solutions, Element AI
S8992 - From Promising Algorithms to Clinical Practice: Next Generation of Challenges

There is great promise in machine learning methods for the automated analysis of medical imaging data to support disease detection, diagnosis, and prognosis. Examples include the extraction of quantitative imaging biomarkers related to the presence and stage of disease, radiomics approaches for tumor classification and therapy selection, and deep learning methods for directly linking imaging data to clinically relevant outcomes. However, the translation of such approaches requires methods for objective validation in clinically realistic settings or clinical practice. In this talk, I will discuss the role of next-generation challenges in this domain.

25-minute Talk Wiro Niessen - MICCAI Society President, Medical Image Computing and Computer Assisted Interventions (MICCAI)
CE8147 - Connect with the Experts: OpenACC - Quick On-ramp to GPUs (3)

This session is designed for anyone who is either looking to start with GPUs or already accelerating their code with OpenACC on GPUs or CPUs. Join OpenACC experts and your fellow OpenACC developers to get expert advice, discuss your code, and learn how OpenACC directives are used by others.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Michael Wolfe - Compiler Engineer, NVIDIA
Jeffrey Larkin - Senior DevTech Software Engineer, NVIDIA
Robert Crovella - SA Mgr., NVIDIA
Randy Allen - Director, Mentor Graphics
Guido Juckeland - Head of Computational Science Group, Helmholtz-Zentrum Dresden-Rossendorf
Sunita Chandrasekaran - Assistant Professor, Department of Computer & Information Sciences, University of Delaware
Robert Henschel - Director Science Community Tools, Indiana University
CE8165 - Connect with the Experts: Jetson (3)

NVIDIA Jetson is the world's leading computing platform for AI at the edge. High in performance and low in power, it's ideal for compute-intensive embedded applications like robots, drones, mobile medical imaging, and Intelligent Video Analytics (IVA). OEMs, independent developers, makers and hobbyists can use the NVIDIA Jetson Developer Kit and module to explore the future of embedded computing and artificial intelligence. Have questions? Jetson experts and the NVIDIA Developer Tools team will be present to cover CUDA debugging and profiling, system trace and graphics debugging and profiling tools, and more.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Joonhwa Shin, NVIDIA
Zheng Liu, NVIDIA
Bhanu Pisupati, NVIDIA
Andrey Trachenko, NVIDIA
Shiva Dubey, NVIDIA
Amulya Yarlagadda, NVIDIA
Bhanu Velukuru, NVIDIA
Philip Lawrence, NVIDIA
Sanjiv Satoor, NVIDIA
Avraham Shapira - Sr. Director Software Engineering, NVIDIA
S8212 - Training Deep AutoEncoders for Collaborative Filtering

This session will describe an approach to building personalized recommendations using (very) deep autoencoders. We will explore the effects of different activation functions, network depth, and novel algorithmic approaches. The model is trained end-to-end without any layer-wise pre-training, and our PyTorch-based code is publicly available.

25-minute Talk Oleksii Kuchaiev - Senior Applied Scientist, NVIDIA
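The core idea, reconstructing a user's sparse rating vector and scoring the loss only on observed entries, can be sketched in a few lines of NumPy. This is an illustration of the setup, not the speaker's PyTorch code; the sizes and weights are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

n_items, hidden = 8, 3
ratings = np.array([5.0, 0.0, 3.0, 0.0, 0.0, 4.0, 0.0, 1.0])  # 0 = unrated
mask = ratings > 0

W1 = rng.normal(0, 0.1, (n_items, hidden))
W2 = rng.normal(0, 0.1, (hidden, n_items))

def forward(x):
    h = np.maximum(0.0, x @ W1)  # ReLU encoder
    return h @ W2                # linear decoder

recon = forward(ratings)
# Masked MSE: unrated items contribute nothing to the loss, so the
# autoencoder is never penalized for "predicting" missing entries.
loss = np.mean((recon[mask] - ratings[mask]) ** 2)
print(loss)
```

The masking is what distinguishes collaborative-filtering autoencoders from ordinary ones: the reconstruction of unobserved items becomes the recommendation.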
S8239 - Tools for Improving Cross-Platform Software Development

Building software for the wide variety of heterogeneous computers often requires writing multiple versions of everything from low-level computational kernels to high-level problem partitioning and communication schemes. Recently, EM Photonics has undertaken several efforts to develop tools to assist developers in this work. These tools have two primary focuses: 1) to ease the process of developing cross-platform and mixed-device software, and 2) to allow application developers to focus more on their specific domain expertise than on the intricacies of building efficient, scalable software. In this talk, we will provide an overview of the tools we have developed and discuss their use on real-world applications. In particular, we will present our work with the climate modeling and computational fluid dynamics teams at NASA.

25-minute Talk Eric Kelmelis - CEO, EM Photonics
S8360 - Interactive and Production Rendering with V-Ray GPU

Come learn the latest advances in GPU acceleration for the Academy Award-winning V-Ray renderer, and how it's improving artistic workflows and speeding final-frame rendering.

25-minute Talk Blagovest Taskov - Lead Developer, Chaos Group
Vladimir Koylazov - CTO, Chaos Group
Teodor Kirilov - V-Ray GPU developer, Chaos Group
S8371 - How We Can Analyze Profiles from Real-Time Conversation by Unsupervised Learning

To convert the phonemes of telephone conversations and meeting responses into text in real time, we pass the text to a computational model created on DGX-1, label it with unsupervised learning, and form clusters; on this basis, we are developing a system that compares objects and analyzes the meaning of a conversation and the profiles of the interlocutors. With this technology, customers can receive appropriate responses at the beginning of a conversation with a help desk, and patients can receive care during a remote diagnosis with a doctor based solely on their dialogue and examination results. By using TensorFlow as a platform and running k-means, Word2vec, Doc2Vec, and similar methods in a clustered DGX-1 environment, the results of arithmetic processing are obtained at conversational speed. Even as the amount of text increases, the learning effect increases linearly, demonstrating that validity can be improved without taking into account the grammar of languages other than English (e.g., Japanese).

50-minute Talk Shigehisa Omatsu - CEO, dAIgnosis, Inc.
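As an illustration of the unsupervised clustering step (not the speakers' TensorFlow/DGX-1 pipeline), a few lines of NumPy k-means on toy "document vectors" standing in for Doc2Vec embeddings:

```python
import numpy as np

def kmeans(X, k, iters=20):
    # Seed centroids from evenly spaced points so the run is deterministic.
    centroids = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated blobs standing in for two conversation topics.
X = np.vstack([np.zeros((5, 4)), np.ones((5, 4)) * 10.0])
labels, _ = kmeans(X, k=2)
print(labels)
```

In the session's setting, each row of `X` would be an embedding of an utterance or document, and the cluster labels become the unsupervised "teacher-less" annotations.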
S8478 - New Frontiers for Dense Linear Solvers: Towards Extreme Performance and Energy Efficiency

Learn how to develop fast and energy-efficient linear solvers using GPUs. Hybrid CPU-GPU techniques achieve high performance at the cost of extra power consumption. New advancements in GPU architectures enable full-GPU solutions that are high performance, energy efficient, and CPU-independent. In addition, new technologies such as half-precision arithmetic (FP16) help the design of new solvers that are significantly faster and even more energy efficient. While FP16 arithmetic has been a powerful tool for deep learning applications, our designs show that it is also very useful for boosting the performance and energy efficiency of linear solvers. The new developments complement the hybrid algorithms in the MAGMA library, and provide users with a wide variety of designs that fit different requirements of performance, energy efficiency, and numerical accuracy.

50-minute Talk Azzam Haidar - Research Scientist II, Innovative Computing Laboratory, University of Tennessee
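The standard way to recover full accuracy from a low-precision solve is iterative refinement, which can be sketched in NumPy. Since NumPy's LAPACK bindings don't run in FP16, float32 stands in for the low precision and float64 for the high precision; this is an illustration of the technique, not MAGMA code:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned system
b = rng.standard_normal(n)

# Cheap low-precision solve gives an approximate answer.
A32 = A.astype(np.float32)
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)

# Refine: residual in high precision, correction in low precision.
for _ in range(3):
    r = b - A @ x
    d = np.linalg.solve(A32, r.astype(np.float32))
    x = x + d.astype(np.float64)

residual = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print(residual)
```

The expensive factorization runs entirely in the fast low precision; only the cheap residual computation needs high precision, which is why FP16 tensor-core arithmetic pays off for solvers.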
Ahmad Abdelfattah - Research Scientist I, Innovative Computing Laboratory University of Tennessee
S8688 - Inside DGX-2

This session presents an overview of the hardware and software architecture of the DGX-2 platform. This talk will discuss the NVSwitch hardware that enables all 16 GPUs on the DGX-2 to achieve 24x the bandwidth of two DGX-1V systems. CUDA developers will learn ways to utilize the full GPU connectivity to quickly build complex applications and utilize the high bandwidth NVLINK connections to scale up performance.

50-minute Talk Vyas Venkataraman - System Software Manager, NVIDIA
Glenn Dearth - Architect, NVIDIA
S8916 - Taking Virtual Graphics to Eleven

Autodesk has many well-known, large products. Our engineering and development teams run development and testing across a wide variety of operating systems, and across versions of current and older products. Many developers have secondary systems, or in some cases three, four, and beyond. Most systems had high-end NVIDIA graphics cards. As is the nature of development, these secondary systems were used in cycles, often lying idle for weeks at a time. And despite the need for graphics in our products, we found that developers used only 50% of their graphics resources in their development cycle. By virtualizing those additional workstations, the graphics, CPU, and other resources were used more efficiently. And by not replacing the physical systems every three years, our cost avoidance rose rapidly. By early 2017, we were hearing from our customers that they wanted Autodesk to provide the same kind of service to them. We are in pilot with a platform that will provide this service to our enterprise customers. Behind the successful virtualization of desktops is a people story: mindsets, culture, and skill sets had to evolve and change.

50-minute Talk Rachel O'Gorman - Service Manager, Desktop Virtualization Services, Autodesk
S8925 - A Data-Driven Future in Visual Effects Pipelines

Visual effects pipelines at Digital Domain are undergoing a transition from traditional parametric models to data-driven generative models. We are extensively developing example-based systems that are geared to accommodate artist input while staying within these generative models. Underlying all of these technologies is heavy GPU usage that is helping artists iterate quickly and reach their creative goals much faster than ever before. GPU-ready toolkits and libraries have also provided the ability to quickly iterate and try different methods in the development of data-driven approaches, and have been key to finding a production-ready solution. We'll go over how this transition occurred, along with examples where our creature development pipeline has benefited from these changes. We'll also discuss where we see machine learning taking us in the future.

25-minute Talk Rishabh Battulwar - Software Engineer, Digital Domain
S8298 - Re3: Realtime Recurrent Regression Networks for Visual Tracking of Generic Objects

Robust object tracking requires knowledge and understanding of the object being tracked: its appearance, motion, and change over time. A tracker must be able to modify its underlying model and adapt to new observations. We present Re3, a real-time deep object tracker capable of incorporating temporal information into its model. Rather than focusing on a limited set of objects or training a model at test time to track a specific instance, we pretrain our generic tracker on a large variety of objects and efficiently update it on the fly; Re3 simultaneously tracks and updates the appearance model with a single forward pass. This lightweight model is capable of tracking objects at 150 FPS, while attaining competitive results on challenging benchmarks. We also show that our method handles temporary occlusion better than other comparable trackers, using experiments that directly measure performance on sequences with occlusion.

25-minute Talk Daniel Gordon - Graduate Student, University of Washington
S8708 - Immersed Boundary Solver Parallelization using OpenACC

We'll discuss how multi-physics flow problems like fluid-structure interaction (FSI) involve complex interaction physics and require the solution of non-linear partial differential equations. Efficient numerical solvers are extremely useful tools for researchers studying multi-physics interaction behavior, and the advent of parallel algorithms and high performance computing has further revolutionized the field of computational engineering. It is therefore important to accelerate legacy solvers using state-of-the-art parallelization techniques. Currently, optimization of a discrete finite-difference-based immersed boundary (IB) solver is underway to efficiently study external and internal flow behavior around complex geometries at low Reynolds number. Performance enhancement is required in the computationally heaviest components of the solver, i.e., tagging of the intercepted cells and solving the continuity and momentum equations. Computational efficiency is improved by utilizing OpenACC programming standards for parallel computing on graphics processing units (GPUs) and by using different iterative solvers for the velocity-pressure correction equation.
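The pressure-correction step mentioned above typically reduces to an elliptic (Poisson-type) equation. A minimal Jacobi sweep in NumPy (an illustration, not the authors' solver; the grid and boundary values are invented) shows the data-parallel structure that makes such loops good candidates for OpenACC offload:

```python
import numpy as np

n = 20
p = np.zeros((n, n))      # pressure-correction field
rhs = np.zeros((n, n))    # right-hand side (zero here: Laplace equation)
p[0, :] = 1.0             # fixed boundary value on one wall

for _ in range(500):
    # Jacobi sweep: every interior point is updated from its four
    # neighbors' old values, so all updates are independent -- exactly
    # the pattern a `#pragma acc parallel loop` can exploit.
    p[1:-1, 1:-1] = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] +
                            p[1:-1, :-2] + p[1:-1, 2:] -
                            rhs[1:-1, 1:-1])

print(p[n // 2, n // 2])
```

NumPy evaluates the whole right-hand side before assigning, so each sweep reads only old values, which is what makes this a Jacobi (rather than Gauss-Seidel) iteration.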

25-minute Talk Bharatkumar Sharma - Senior Solution Architect, NVIDIA
Apurva Raj - PhD candidate, Department of Aerospace Engineering, IIT KGP
S8840 - Isotropix Even Faster: Accelerating Clarisse iFX with GPU

From interactive to real time, discover how GPU acceleration boosts ray-tracing performance in Clarisse iFX. Clarisse iFX is the world's first 3D DCC featuring a fully ray-traced CPU 3D viewport, which has enabled CG artists to create amazing VFX on over 50 Hollywood blockbusters. Thanks to NVIDIA's OptiX 5, the Clarisse iFX 3D viewport can now benefit directly from the GPU's power, letting users manipulate life-like environments in real time while displaying noise-free lighting scenarios.

25-minute Talk Sebastian Guichou - CTO, Isotropix
S8985 - Cyber Defense - Fighting DGAs with Machine Intelligence

Cyberattacks have become more sophisticated than ever before, and traditional cyber defense tools can no longer scale to protect today's complex organizational networks. Booz Allen and NVIDIA have partnered to test some of the latest GPU-driven machine intelligence techniques on cyber use cases. Join us as we explore our progress in identifying Domain Generation Algorithms (DGAs) through network monitoring. DGAs are commonly used by hackers to communicate and exfiltrate data from your networks. They are designed to circumvent traditional cyber defenses and have been extremely successful. We'll discuss our environments, what types of data we collected and trained on, and what frameworks and software we used to test and evaluate this model.

25-minute Talk Greg McCullough - Director, Cyber Machine Intelligence Capability Development, Booz Allen Hamilton
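As a toy illustration of the kind of hand-crafted signal a DGA classifier can draw on (this is not Booz Allen's model), character entropy tends to separate algorithmically generated domain names from human-chosen ones:

```python
import math
from collections import Counter

def char_entropy(domain):
    """Shannon entropy (bits/char) of a domain name's character distribution."""
    counts = Counter(domain)
    total = len(domain)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Human-chosen names reuse pronounceable letters; DGA-style names look
# near-uniform, so their entropy is higher.
print(char_entropy("google"))
print(char_entropy("xq3kz9vw1tjh"))
```

Real detectors combine many such features, or learn them directly from raw strings, but entropy remains a common baseline input.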
Aaron Sant-Miller - Lead Data Scientist, Booz Allen Hamilton
L8141b - Object Detection with DIGITS (2)

Prerequisites: 'Image Classification with DIGITS'

Duration: 2 hours

Framework: Caffe with DIGITS interface

Many problems have established deep learning solutions, but sometimes the problem that you want to solve does not. Learn to create custom solutions through the challenge of detecting whale faces from aerial images by:

• Combining traditional computer vision with deep learning

• Performing minor "brain surgery" on an existing neural network using the deep learning framework Caffe

• Harnessing the knowledge of the deep learning community by identifying and using a purpose-built network and end-to-end labeled data

Upon completion of this lab, you'll be able to solve custom problems with deep learning.

Presented by the NVIDIA Deep Learning Institute (DLI).

120 Minutes Instructor-Led Lab Mike Mendelson - Deep Learning Institute Curriculum Developer, NVIDIA
L8164A - Learning Robotics on the Jetson TX2

Are you ready to learn how to use your Jetson TX2 for robotics applications? Join AI and Robotics instructors from Udacity in this lab which will teach you how to get started using your TX2 to apply AI to your robotics application. You will get the chance to build a simple "Lights Out" robot that will use reinforcement learning to push buttons and turn out corresponding LEDs. We will cover how to wire up, program, and run the robot. Additionally, we will discuss the basics of applied reinforcement learning.
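The reinforcement-learning idea behind the "Lights Out" robot can be sketched with a toy tabular Q-learning loop. This is an illustration with invented states and rewards, not the lab's code: states are tuples of LED bits, pressing button i toggles LED i, and reward is earned when all LEDs are off:

```python
import random

random.seed(0)
n_leds = 2
actions = range(n_leds)

def step(state, action):
    nxt = list(state)
    nxt[action] ^= 1                 # pressing button i toggles LED i
    nxt = tuple(nxt)
    done = not any(nxt)              # goal: all LEDs off
    return nxt, (1.0 if done else -0.1), done

Q = {}
for _ in range(500):                 # episodes, each starting with both LEDs on
    state = (1, 1)
    for _ in range(10):
        # Epsilon-greedy action selection.
        a = random.choice(actions) if random.random() < 0.2 else \
            max(actions, key=lambda x: Q.get((state, x), 0.0))
        nxt, r, done = step(state, a)
        best_next = max(Q.get((nxt, x), 0.0) for x in actions)
        q = Q.get((state, a), 0.0)
        # Q-learning update with learning rate 0.5 and discount 0.9.
        Q[(state, a)] = q + 0.5 * (r + 0.9 * (0.0 if done else best_next) - q)
        state = nxt
        if done:
            break

# Greedy policy: with only LED 0 lit, the agent should press button 0.
policy = max(actions, key=lambda a: Q.get(((1, 0), a), 0.0))
print(policy)
```

The lab's robot does the same thing at a larger scale: physical button presses are the actions, the observed LED pattern is the state, and the Q-values are learned from trial and error.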

120 Minutes Instructor-Led Lab Alexis Cook - Content Developer, Udacity
Erica Tiberia - Robotics Engineer, Udacity
L8177 - Managing Accelerated Application Memory with CUDA C/C++ Unified Memory and nvprof

With the advent of Unified Memory, the CUDA computing platform has made the essential skill of managing accelerated application memory, which must be coordinated between a CPU and at least one GPU, rather straightforward. Upon completion of this lab, you'll be able to profile accelerated application performance, including kernel runtime and Unified Memory behavior, using nvprof, and you'll be able to further optimize accelerated applications with your understanding of Unified Memory behavior in conjunction with asynchronous memory prefetching.

Prerequisites: Accelerating Applications with CUDA C/C++

120 Minutes Instructor-Led Lab Joshua Wyatt - Content Developer, NVIDIA Deep Learning Institute, NVIDIA
CE8139 - Connect with the Experts: NVIDIA Inception Program

If you are an Inception member, or interested in the NVIDIA Inception program, please join us to connect with experts and make sure you are taking advantage of all the benefits the program has to offer. The Global Inception team will be joined by NVIDIA experts who can help answer questions and share best practices, including the Deep Learning Institute, the Inception marketing team, solution architects, GPU Ventures, and sales account managers.

1 Hour Connect with the Experts Margaret Albrecht, NVIDIA
Alain Tiquet, NVIDIA
Qingchun Huang - Inception Program China Lead, NVIDIA
Ekram Mukbil, NVIDIA
Daniel Saaristo, NVIDIA
Kristin Blomquist, NVIDIA
Arjun Dutt, NVIDIA
Ettikan Kandasamy Karuppiah, NVIDIA
Lisa Lahde, NVIDIA
Fausto Milletari, NVIDIA
Luke Rundel, NVIDIA
Marcio Aguiar, NVIDIA
Powon Lee, NVIDIA
Mukundhan Srinivasan, NVIDIA
Anton Dzhoraev, NVIDIA
S81026 - Advancing Machine Learning in Medical Imaging through Competitions: The RSNA Initial Experience and Future Perspectives

The goal of this session is to describe how the Radiological Society of North America (RSNA) is helping to promote advancements in machine learning research through public competitions, to ultimately improve patient care. The presentation will review the 2017 RSNA Pediatric Bone Age Challenge from its conception to final results, emphasizing key components relevant to the machine learning community. The discussion will cover important aspects related to public datasets, the competition itself, and the results. A glimpse into upcoming competitions will also be provided.

25-minute Talk Luciano Prevedello - Chief of Medical Imaging Informatics, Ohio State University Wexner Medical Center
S8240 - An Architectural Design Firm's Journey through Virtual GPU Technology for Global Collaboration

Learn the benefits that virtualization provides for an architecture and engineering design firm, along with the journey through the advancements in virtualization technology it took to finally meet the graphics-intensive needs of our design software. We'll share our experiences of how virtualization allows a large company, with over 15 offices and 1,000 people worldwide, to collaborate and work as a single firm. We'll show some cost comparisons with virtualization, along with their management benefits and requirements. We'll also look at the methods we used to set and test metrics specific to our requirements, and follow the results of those metrics through the changes in graphics virtualization technology.

50-minute Talk Jimmy Rotella - Digital Practice Director, CannonDesign
Andrew Schilling - Chief Infrastructure Officer, CannonDesign
S8260 - Deep Learning for Intelligent Multi-Sensor Analytics

Go beyond working with a single sensor and enter the realm of Intelligent Multi-Sensor Analytics (IMSA). We'll introduce concepts and methods for using deep learning with multi-sensor, or heterogeneous, data. There are many resources and examples available for learning how to leverage deep learning with public imagery datasets. However, few resources exist to demonstrate how to combine and use these techniques to process multi-sensor data. As an example, we'll introduce some basic methods for using deep learning to process radio frequency (RF) signals and make it part of your intelligent video analytics solutions. We'll also introduce methods for adapting existing deep learning frameworks to multiple sensor signal types (for example, RF, acoustic, and radar). We'll share multiple use cases and examples for leveraging IMSA in smart city, telecommunications, and security applications.

25-minute Talk David Ohm - CEO & Co-Founder, KickView
Kyle Muchmore - Software Engineer, KickView
S8274 - From Dark Matter Detection to Deep Learning in Enterprise

Advancements in deep learning are enabling enterprise companies to make meaningful impacts on bottom-line profits. Enterprises capture thousands of hours of customer phone call recordings per day. This voice data is extremely valuable because it contains insights that businesses can use to improve customer experience and operations. We'll follow Deepgram CEO Dr. Scott Stephenson's path from working in a particle physics lab two miles underground to founding a deep learning company for voice understanding. We'll describe applications of cutting-edge AI techniques that make enterprise voice datasets mineable for valuable business insights. Companies today use these insights to drive the bottom line.

50-minute Talk Scott Stephenson - Co-Founder, CEO, Deepgram
S8365 - Genesis: MPC's Virtual Production Platform Where the Stories Begin

We'll showcase the MPC Virtual Production Platform called Genesis. While we won't be able to show any datasets currently in production, we'll showcase the technology and have some MPC original content to share.

25-minute Talk Damien Fagnou - CTO, Moving Picture Company
Francesco Giordana - Realtime Software Architect, MPC
Favorite
S8373 - MVAPICH2-GDR: Pushing the Frontier of Designing MPI Libraries Enabling GPUDirect Technologies Learn about the latest developments in the high-performance message passing interface (MPI) over InfiniBand, iWARP, and RoCE (MVAPICH2) library that simplify the task of porting MPI applications to supercomputing clusters with NVIDIA GPUs. MVAPICH2 supports MPI communication directly from GPU device memory and optimizes it using various features offered by the CUDA toolkit, providing optimized performance on different GPU node configurations. These optimizations are integrated transparently under the standard MPI API for better programmability. Recent advances in MVAPICH2 include designs for MPI-3 RMA using the GPUDirect RDMA framework, MPI datatype processing using CUDA kernels, support for GPUDirect Async, support for heterogeneous clusters with GPU and non-GPU nodes, and more. We use the popular Ohio State University micro-benchmark suite and example applications to demonstrate how developers can effectively take advantage of MVAPICH2 in applications using MPI and CUDA/OpenACC. We provide guidance on issues like processor affinity to GPU and network that can significantly affect the performance of MPI applications that use MVAPICH2. 50-minute Talk Hari Subramoni - Research Scientist, The Ohio State University
Dhabaleswar Panda - Professor and University Distinguished Scholar, The Ohio State University
Favorite
S8629 - Large-Scale Self-Supervised Robot Learning with GPU-Enabled Video-Prediction Models To acquire rich repertoires of skills, robots must be able to learn from their own autonomously collected data. We'll describe a video-prediction model that predicts what a robot will see next, and show how this model can be used to solve complex manipulation tasks in real-world settings. Our model was trained on 44,000 video sequences in which the manipulator autonomously pushes various objects. Using the model, the robot is capable of moving objects that were not seen during training to desired locations, handling multiple objects and pushing objects around obstructions. Unlike other methods in robotic learning, video prediction does not require any human labels. Our experiments show that the method achieves a significant advance in the range and complexity of skills that can be performed entirely with self-supervised robotic learning. This session is for attendees who possess a basic understanding of convolutional and recurrent neural networks. 25-minute Talk Frederik Ebert - Graduate Student, UC Berkeley
Favorite
S8670 - Multi-GPU Programming Techniques in CUDA Systems with multiple GPUs in a single node are almost universal in the cloud and high-performance computing worlds, and are increasingly common in power-user desktop systems such as NVIDIA's DGX station. Effective use of these GPUs is critical to scaling programs, but developers have typically treated them as independent machines. Targeting multiple GPUs from a single process offers the potential for far greater performance, especially with the advent of NVLink which transforms the way that these GPUs can cooperate. We will cover a number of techniques and pitfalls for direct multi-GPU programming in CUDA, then look in depth at one novel method of using NVLink to scale some programs with minimal effort. 50-minute Talk Stephen Jones - Software Architect, NVIDIA
Favorite
S8839 - Adding GPU Acceleration to Pixar's RenderMan

We'll discuss photo-realistic rendering in modern movie production and present the path that led us to leverage GPUs and CPUs in a new scalable rendering architecture. You'll learn about RenderMan XPU, Pixar's next-gen physically-based production path tracer, and how we solve the problem of heterogeneous compute using a shared code base. Come hear about our partnership with NVIDIA to create the technology that will enable the art and creativity in future feature animation and live action visual effects blockbusters.

25-minute Talk Max Liani - Senior Lead Engineer, Pixar
Favorite
S8935 - Revolutionizing Virtual Production with VR and Deep Learning Virtual production is revolutionizing the way the world creates cinematic and immersive content. Advances in virtual reality (VR) and deep learning (DL) are bringing new capabilities to storytellers, enabling them to interactively design new worlds, animate lifelike characters, and visualize complex scenes, all created, edited, and reviewed in real time. This panel will look into the future to explore the process, tools, and workflows made possible by NVIDIA's advances in artificial intelligence, real-time graphics, and high-performance computing. Panelists will dive into the challenges this paradigm shift brings to on-set production, as well as the potential for greater efficiency and enhanced exploration. 50-minute Panel Ben Grossmann - Co-Founder, Magnopus
Darren Hendler - Director Digital Human Group, Digital Domain
Michael Ford - CA, Sony Pictures Imageworks
Lap Luu - CTO, Magnopus
Rev Lebaredian - Vice President, GameWorks & Lightspeed Studios, NVIDIA
Richard Grandy - Sr Solutions Architect, NVIDIA
John Root - Virtual Production Supervisor, Technicolor
Favorite
S8949 - Expanding the Boundaries of AI Revolution: An In-depth Study of HBM (Presented by SK hynix)

Participants will take part in in-depth discussions of the revolutionary HBM (High Bandwidth Memory) product, its distinguishing technical features, and the role it plays in expanding the boundaries of the AI revolution. The session will also cover current technical and business challenges, as well as future considerations for the next-generation HBM line-up.

25-minute Talk Nayoung Lee - Technical Marketing, SK hynix America
Sunghak Lee - Technical Marketing, SK hynix
Favorite
S8977 - Matchbox: Automatic Batching for Dynamic Deep Learning Matchbox is an open source PyTorch-based tool that lets users implement their deep learning models as imperative code that applies to individual data samples, then efficiently train and validate them on batched data using GPUs. By automatically keeping track of batch-level masking and padding and rewriting data-dependent control flow, Matchbox simplifies model code, eliminates a class of implementation bugs, and allows programmers to work directly at a more natural level of abstraction. 50-minute Talk James Bradbury - Senior Research Scientist, Salesforce
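Matchbox's value is easiest to see next to the manual alternative. Below is a small, framework-free Python sketch (hypothetical helper names, not Matchbox's API) of the padding and masking bookkeeping that automatic batching removes from user code:

```python
# Framework-free sketch (hypothetical helpers, not Matchbox's API) of
# the padding/masking bookkeeping that automatic batching removes from
# user code.

def pad_batch(sequences, pad_value=0):
    """Pad variable-length sequences into a rectangular batch plus a
    parallel boolean mask marking the real (non-padded) positions."""
    max_len = max(len(s) for s in sequences)
    batch = [s + [pad_value] * (max_len - len(s)) for s in sequences]
    mask = [[True] * len(s) + [False] * (max_len - len(s))
            for s in sequences]
    return batch, mask

def masked_sum(batch, mask):
    """Per-sequence sum that ignores padded positions."""
    return [sum(v for v, keep in zip(row, m) if keep)
            for row, m in zip(batch, mask)]

sequences = [[1, 2, 3], [4, 5], [6]]
batch, mask = pad_batch(sequences)
# Padding must not change per-sequence results:
assert masked_sum(batch, mask) == [sum(s) for s in sequences]
```

In a real framework the mask must be threaded through every batched operation and every branch of data-dependent control flow, which is exactly the bookkeeping Matchbox tracks automatically.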
Favorite
SE151922 - Iray Leaders

In this session, experienced Iray users will show how they take advantage of NVIDIA's Iray and MDL SDK to create applications with state-of-the-art visualization quality and leading-edge technology, such as GPU-based rendering in the cloud or cloud-based applications.

Speakers will tell you how and why Iray and MDL were the best choice for them in order to deliver innovative solutions for various use cases.

Configurators, architectural walkthroughs, game engine support, or design visualization - all these solutions are built on our Iray and MDL SDKs, delivering great functionality and features developed for professional applications.

Presenters:

Pascal Artus | 51VR 
Justin Slick | Cadalog 
Steve Spencer | DAZ3D 
Christian Dorfwirth | Firstinvision 
Gert-Jan van der Wel | Floorplanner 
Steve Knight | Gulfstream 
Matthew Gueller | Harley Davidson 
Rich Rabbitz | Lockheed Martin 
Paul Arden | migenius 
Guy Alroy | Optitex 
Greg Demchak | Syncro 
Bastian Krueckeberg | Dassault Systèmes 
Manjeet Kohli | Ford 
Jawed Rafai | Lumiscaphe
Ben Widdowsen | Lightworks

Special Event - 3 h Special Event Alexander Fuchs, NVIDIA
Favorite
S8253 - Computational Zoom: A Framework to Manipulate Image Composition in Post-Capture Telling the right story with a picture requires the ability to create the right composition. Two critical parameters controlling composition are the camera position and the focal length of the lens. The traditional paradigm to capture a picture is for a photographer to mentally visualize the desired result, select the capture parameters to produce it, and finally take the photograph, thus committing to a particular composition. To break this paradigm, we introduce computational zoom, a framework that allows a photographer to manipulate several aspects of composition in post-capture. Our approach also defines a multi-perspective camera that can generate compositions that are not attainable with a physical lens. Our framework requires a high-quality estimation of the scene's depth. Existing methods to estimate 3D information generally fail to produce dense maps, or sacrifice depth uncertainty to avoid missing estimates. We propose a novel GPU-based depth estimation technique that outperforms the state of the art in terms of quality, while ensuring that each pixel is associated with a depth value. 25-minute Talk Orazio Gallo - Sr. Research Scientist, NVIDIA
Favorite
S8295 - Frontiers of AI in Medical Imaging: Overcoming Current Challenges and Moving Beyond Classification

Learn about the key types of clinical use cases for AI methods in medical imaging beyond simple image classification that will ultimately improve medical practice, as well as the critical challenges and progress in applying AI to these applications. We'll first describe the types of medical imaging and the key clinical applications for deep learning for improving image interpretation. Next, we'll describe recent developments in word-embedding methods that leverage narrative radiology reports associated with images to automatically generate rich labels for training deep learning models, and a recent AI project that pushes beyond image classification and tackles the challenging problem of clinical prediction. We'll also describe emerging methods to leverage multi-institutional data for creating AI models that do not require data sharing, and recent innovative approaches of providing explanations about AI model predictions to improve clinician acceptance.

50-minute Talk Daniel Rubin - Associate Professor, Stanford University
Imon Banerjee - Postdoctoral Scholar, Stanford University
Favorite
S8684 - Deep Learning Applications in E-Commerce In this talk we will present four applications of deep learning in e-commerce: 1) A deep neural net architecture which has been successfully deployed as a large-scale visual search and recommendation system for e-commerce. The deployment has been at Flipkart, India's largest e-commerce vendor, over a catalog of 50M products, supporting 2K queries per second. Our results beat the state of the art on the Exact Street2Shop dataset. 2) Visual semantic embedding of e-commerce products for enhanced searchability and product ranking. 3) Neural network based click prediction. 4) A novel neural network architecture for demand prediction. 25-minute Talk Krishnendu Chaudhury - CTO, Drishti Technologies
Favorite
S8841 - Bringing the Arnold Renderer to the GPU Arnold is a high quality production renderer for visual effects in film and feature animation used by more than 300 studios worldwide on projects such as Blade Runner 2049 and Game of Thrones. Arnold was instrumental in the shift toward physically-based light transport simulation in production rendering; in fact, this role was recognized with an Academy Award in 2017. Arnold's success is rooted in its ability to efficiently produce artifact-free images of dynamic scenes with massive complexity while simplifying the user's workflow. For the first time publicly, Autodesk will be demonstrating GPU acceleration inside Arnold using NVIDIA OptiX. 25-minute Talk Adrien Herubel - Lead GPU engineer, Autodesk/Solid Angle
Favorite
S8988 - Compute Engineering Simulation Processing in Oracle Cloud Infrastructure (Presented by Oracle)

Pre- and post-process CAE data near your cloud compute to save time, money, and IT headaches. Whether you're building the next supercar or visualizing a medical dataset, you can now eliminate the need for data transfer to and from on-premises by running professional design and engineering applications in the cloud. See new Oracle Cloud Infrastructure GPUs in live demonstrations of data transfer, CAD pre-processing, and CAE post-processing.

25-minute Talk Taylor Newill - Principal Product Manager, Oracle HPC
Favorite
CE8103 - Connect with the Experts: Multi-GPU, Multi-node Computing with NCCL

NCCL is a library designed to make GPU communication easy and efficient, helping deep learning frameworks and parallel applications scale to large numbers of GPUs. It is currently integrated into most deep learning frameworks, such as TensorFlow/Horovod, MXNet, Cognitive Toolkit, PyTorch, and Caffe2.

Connect directly with NVIDIA engineers and experts from other organizations on specific topics. Come on in and ask a question.

1 Hour Connect with the Experts John Woolley, NVIDIA
Nathan Luehr, NVIDIA
Sylvain Jeaugey - Senior Computing/Networking engineer, NVIDIA
David Addison, NVIDIA
Michael Houston, NVIDIA
Favorite
S8306 - Optimizing Distributed GPU Collectives for Deep Learning Workloads In this session, we present MPI collective algorithms optimized for distributed deep learning frameworks. The performance of large-message MPI collectives such as broadcast, allreduce, and reduce is critical to the performance of these workloads. There is a need for a novel approach to the design of large-scale collective communication algorithms for CUDA-aware MPI runtimes. The session will deep dive into our implementation of these collectives and its performance advantages on IBM POWER9 systems with NVIDIA V100 GPUs for the OSU benchmark and distributed TensorFlow. 25-minute Talk Pidad Gasfar D'Souza - System Performance Architect, IBM Systems Development Lab
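For intuition about what a large-message allreduce does, here is a single-process Python simulation of the classic ring allreduce (reduce-scatter followed by allgather). This is an illustrative sketch of the general algorithm, not the IBM/OSU implementation discussed in the talk:

```python
# Single-process simulation of the ring allreduce pattern
# (reduce-scatter, then allgather). Illustrative sketch only; a real
# implementation exchanges chunks between GPUs/nodes in parallel.

def ring_allreduce(rank_data):
    """rank_data: one equal-length vector per simulated rank.
    Returns each rank's buffer after the allreduce (all identical)."""
    n = len(rank_data)
    size = len(rank_data[0])
    assert size % n == 0, "vector length must divide into n chunks"
    c = size // n
    bufs = [list(v) for v in rank_data]

    def sl(i):  # slice covering chunk i (index taken modulo n)
        start = (i % n) * c
        return slice(start, start + c)

    # Reduce-scatter: in step s, rank r sends chunk (r - s) to rank
    # r + 1, which accumulates it. After n - 1 steps, rank r holds the
    # complete sum of chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [(r, r - step, bufs[r][sl(r - step)]) for r in range(n)]
        for r, i, data in sends:
            dst, s = (r + 1) % n, sl(i)
            for k, v in enumerate(data):
                bufs[dst][s.start + k] += v

    # Allgather: circulate the completed chunks once more around the ring.
    for step in range(n - 1):
        sends = [(r, r + 1 - step, bufs[r][sl(r + 1 - step)])
                 for r in range(n)]
        for r, i, data in sends:
            bufs[(r + 1) % n][sl(i)] = data
    return bufs
```

Each rank sends and receives 2(n-1) chunks of size/n elements, which is why ring-based collectives stay bandwidth-optimal as the message size grows.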
Favorite
S8326 - GPU-Accelerated Scalable Solver for Large Linear Systems over Finite Fields

Attendees will get details of how to optimize solvers for large, dense linear systems over the finite field GF(2) on single-node multi-GPU systems as well as multi-node multi-GPU systems. The focus will be on how to scale this basic algorithm, which is used in several cryptanalytic techniques. Our parallel solver is implemented using NVIDIA CUDA and MPI to utilize the multi-level parallelism on multi-node, multi-GPU systems, which are becoming increasingly common. CUDA-aware MPI is used to leverage GPUDirect P2P and GPUDirect RDMA for optimized intra- and internode communication.

25-minute Talk Bharatkumar Sharma - Senior Solution Architect, NVIDIA
Prashant Verma - Deputy Director, Scientific Analysis Group
Favorite
S8367 - Using TensorRT Optimizations for Embedded Facial Recognition Learn how to optimize performance and accuracy of face recognition systems running on the NVIDIA Jetson platform using TensorRT. Using the NVIDIA Jetson embedded platform for face recognition allows significant savings on server and network equipment, along with increasing overall system security and performance. Because biometric feature extraction and matching happen on site (not on the server), it is possible to reduce the necessary capacities by 1.5-2 times in comparison with similar projects using "traditional" server-side calculation technology. However, the use of NVIDIA Jetson as a biometric platform has a number of peculiarities related both to operating conditions (power consumption limitations, thermal conditions) and to the features of the platform itself. The presentation will demonstrate how to use TensorRT optimizations to construct neural networks for FR cameras and other devices with embedded facial recognition. We will also provide an overview of real-world commercial and public safety related projects. 25-minute Talk Alexey Kadeishvili - CTO, Vocord
Favorite
S8505 - GPU Monitoring and Management with NVIDIA Data Center GPU Manager (DCGM) NVIDIA DCGM is a monitoring and management daemon, GPU Diagnostic, and SDK geared towards managing GPUs in a cluster environment. DCGM is widely deployed both internally at NVIDIA and externally at large HPC labs and Cloud Service Providers. We will go over the core features of DCGM and features that have been added in the last year. We will also demonstrate how DCGM can be used to monitor GPU health and alert on GPU errors using both the dcgmi command-line tools and the DCGM SDK. 50-minute Talk David Beer - Senior Data Center Tools Engineer, NVIDIA
Brent Stolle - Software Developer, NVIDIA
Favorite
S8533 - Reinventing Real-Time Multidimensional Analytics Powered by GPU We'll provide answers and business cases for three main questions: First, what were the main problems of big data analytics, and how were they solved with GPUs? Second, how can you quickly analyze data and get maximum profit from it? Third, what is the future of business intelligence (BI)? We'll discuss the new way of analytics: a unique BI solution powered by GPU, which provides real-time multidimensional analytics for all kinds of businesses. The online analytical processing and data mining server with hybrid CPU and GPU architecture gives users freedom of analytics with no pre-aggregates, and provides the fastest analytical tool for enterprise-sized raw data volumes. We'll show the results of the latest tests of analytical platform operations on different hardware, which prove the efficiency of work on GPUs. One example of the use cases we'll show is how companies around the world use this solution to analyze billions of raw business data records, and to optimize and automate their business. We'll also show the future of BI: how analytical platforms will look in the near future, and how the world of big data will change. 25-minute Talk Roman Raevsky - Founder & CTO, Polymatica
Favorite
S8788 - Adaptive Ray Tracing Rendering Powered by Deep Learning This session will present a proof of concept in which a deep neural network was trained with pairs of Iray ray-traced images (one at an arbitrary ray tracing iteration and one fully converged) and their structural similarity (SSIM) index. Originally conceived as a method for measuring the similarity between two images, the SSIM index can also be viewed as a quality measure versus a reference image or, in our case, as a measure of ray tracing rendering progress. From any render iteration of an arbitrary scene, the DNN can now infer a rendering progress estimate, and it also provides heat maps of the scene that can be used for adaptive rendering, focusing the ray tracing engine's power on the appropriate zones. 25-minute Talk Andrew Tao - Distinguished Engineer, Director for Deep Learning Applied research, NVIDIA
Carsten Waechter - Ray Tracing Software Architect, NVIDIA
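For reference, the SSIM statistic being repurposed here can be sketched in a few lines of Python. This is the global, single-window form of the formula; production implementations average the statistic over local windows:

```python
# Global (single-window) SSIM between two equal-length grayscale pixel
# lists. Illustrative only: real SSIM averages this statistic over
# local windows, but the formula is the same one the talk repurposes
# as a rendering-progress signal.

def ssim(x, y, dynamic_range=255.0):
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2  # standard stabilizing constants
    c2 = (0.03 * dynamic_range) ** 2
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

noisy = [100, 130, 90, 120, 110, 80]      # early render iteration
converged = [105, 125, 95, 115, 105, 85]  # converged reference
```

SSIM of an image with itself is 1.0, and it drops toward 0 as a noisy early iteration diverges from the converged reference, which is what makes it usable as a progress estimate.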
Favorite
S8794 - Synthetic Facial Data for Training Deep Neural Networks

Training AI agents that can successfully generalize requires large amounts of diverse labeled training data. Collecting and labeling data is a significant cost in the development of AI applications, which, in some cases, may not even be feasible. We'll describe computer graphics facial models that we are developing to generate large labeled synthetic facial data for training deep neural networks. Facial analysis is central to many vision applications that involve human-computer interaction, including robotics, autonomous cars, rehabilitation, and extended usability. Generating and animating human faces with high realism is a well-studied problem in computer graphics; however, very few computer vision AI techniques take advantage of rendered facial data to augment or replace manually collected training data. We'll share key insights of how we successfully use synthetic facial data for training facial analysis classifiers. We'll also demonstrate many sub-tasks on which synthetic data helps to significantly improve accuracy and reduces the need for manual data collection.

25-minute Talk Shalini De Mello - Senior Research Scientist, NVIDIA
Favorite
L8142b - Neural Network Deployment with DIGITS and TensorRT (2)

Prerequisites: Image Classification with DIGITS

Duration: 2 hours

Framework: Caffe with DIGITS and TensorRT

Deep learning lets us map inputs to outputs using models that are extremely computationally intense. Learn to deploy deep learning to applications that recognize images and detect pedestrians in real time by:

• Accessing and understanding the files that make up a trained model

• Building from each function's unique input and output

• Optimizing the most computationally intense parts of your application for different performance metrics like throughput and latency

Upon completion of this lab, you'll be able to implement deep learning to solve problems in the real world.
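The throughput/latency trade-off named in the last bullet can be sketched with a toy timing model; the numbers below are assumptions for illustration, not measurements from TensorRT:

```python
# Toy timing model (assumed numbers, not measurements) showing why
# larger batches raise throughput but also raise per-request latency.
FIXED_OVERHEAD_MS = 5.0  # assumed per-inference-call overhead
PER_IMAGE_MS = 2.0       # assumed per-image compute time

def latency_ms(batch_size):
    """Time until the whole batch has been processed."""
    return FIXED_OVERHEAD_MS + PER_IMAGE_MS * batch_size

def throughput_ips(batch_size):
    """Images processed per second at this batch size."""
    return batch_size / (latency_ms(batch_size) / 1000.0)

for b in (1, 8, 32):
    print(f"batch={b:3d}  latency={latency_ms(b):6.1f} ms  "
          f"throughput={throughput_ips(b):7.1f} img/s")
```

Larger batches amortize the fixed per-call overhead, so throughput climbs while each individual request waits longer; picking the batch size is picking a point on that curve.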

Presented by the NVIDIA Deep Learning Institute (DLI).

120 Minutes Instructor-Led Lab Mike Mendelson - Deep Learning Institute Curriculum Developer, NVIDIA
Favorite
L8164B - Learning Robotics on the Jetson TX2 - Repeat

Are you ready to learn how to use your Jetson TX2 for robotics applications? Join AI and robotics instructors from Udacity in this lab, which will teach you how to get started using your TX2 to apply AI to your robotics application. You will get the chance to build a simple "Lights Out" robot that uses reinforcement learning to push buttons and turn out the corresponding LEDs. We will cover how to wire up, program, and run the robot. Additionally, we will discuss the basics of applied reinforcement learning.

120 Minutes Instructor-Led Lab Erica Tiberia - Robotics Engineer, Udacity
Alexis Cook - Content Developer, Udacity
Favorite
L8178 - Asynchronous Streaming, and Visual Profiling for Accelerated Applications with CUDA C/C++ The NVIDIA Visual Profiler (nvvp) makes identifying opportunities for accelerated application optimizations easy. One common optimization is to utilize multiple CUDA streams for concurrent kernel execution and memory transfers. Learn to utilize nvvp and concurrent streams to exploit the potential for concurrency in your accelerated applications by: •Using nvvp to identify opportunities for application concurrency •Launching multiple CUDA kernels concurrently with streams •Refactoring accelerated applications to expose their potential for concurrency Upon completion of this lab, you'll be able to visually profile your accelerated applications for concurrency opportunities, and to exploit them using CUDA streams. 120 Minutes Instructor-Led Lab Joshua Wyatt - Content Developer, NVIDIA Deep Learning Institute, NVIDIA
Favorite
S8111 - High-Performance Image Processing Routines for Video and Film Processing

Basic image processing functions for convolution, morphological, and arithmetic operators are at the heart of many important high-level computer vision algorithms. We'll describe how to implement these routines efficiently on the GPU, using unique GPU capabilities like the texture cache and a large register file. We'll give information about several applications where these routines are employed, like film and video restoration (either locally or in the cloud) or automatic real-time quality assessment and automatic camera path calculation (virtual director) in omnidirectional video.

25-minute Talk Hannes Fassold - senior researcher, JOANNEUM RESEARCH
Favorite
S8132 - (Deep) Learning to Grasp with a Closed-Loop DNN Controller The paradigm for robot programming is changing with the adoption of the deep learning approach in the field of robotics. Instead of hard coding a complex sequence of actions, tasks are acquired by the robot through an active learning procedure. This introduces new challenges that have to be solved to achieve effective training. We'll show several issues that can be encountered while learning a closed-loop DNN controller aimed at a fundamental task like grasping, and their practical solutions. First, we'll illustrate the advantages of training using a simulator, as well as the effects of choosing different learning algorithms in the reinforcement learning and imitation learning domains. We'll then show how separating the control and vision modules in the DNN can simplify and speed up the learning procedure in the simulator, although the learned controller hardly generalizes to the real-world environment. Finally, we'll demonstrate how to use domain transfer to train a DNN controller in a simulator that can be effectively employed to control a robot in the real world. 25-minute Talk Mengyuan Yan - Stanford Ph.D. (NVIDIA Research Intern), Stanford University
Iuri Frosio - Senior Research Scientist, NVIDIA
Favorite
S8242 - Deep Learning for Computational Science

We'll review our study of the use of artificial intelligence to augment various domains of computational science in order to improve time to solution for various HPC problems. We'll discuss the current state-of-the-art approaches and performance gains where applicable. We'll also investigate current barriers to adoption and consider possible solutions.

25-minute Talk Yang Juntao - Solutions Architect, NVIDIA
Jeffrey Adie - Principal Solutions Architect, NVIDIA
Favorite
S8427 - Breaking Through the Barriers to GPU Accelerated Monte Carlo Particle Transport A new method of accelerating Monte Carlo particle transport with GPUs will be presented that can be implemented in modern and legacy Monte Carlo codes with little development cost. Two major barriers exist for accelerating Monte Carlo particle transport with GPUs: high development costs and limited performance. World-class Monte Carlo particle transport codes require decades of development. Completely re-writing such codes for the high-performance computing platform du jour is not practical. A review of seven implementations of Monte Carlo neutron transport on GPUs indicates a performance wall of 4.5 times the speed of 8 CPU cores. The new method, which is based on ray casting, calculates neutron and photon fluence tallies on the GPU while the random walk is maintained on the CPU. This method significantly lowers the software development cost and increases performance. A performance of 7 times that of 8 CPU cores has been demonstrated for the calculation of neutron fluence in a pressurized water reactor (PWR) fuel assembly. For photons, performance increases of up to 29 times have been demonstrated when simulating both medical and industrial radiography. 25-minute Talk Jeremy Sweezy - Scientist, Los Alamos National Laboratory
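To illustrate the kind of CPU-side random walk involved, here is a minimal, hypothetical Monte Carlo sketch (not the Los Alamos code) that estimates uncollided particle transmission through a 1-D slab, a case with the known analytic answer exp(-mu * t):

```python
# Minimal CPU Monte Carlo sketch (hypothetical, not LANL's code):
# estimate uncollided transmission of particles through a 1-D slab
# of thickness t with total interaction coefficient mu. The analytic
# answer is exp(-mu * t), so the estimate is easy to check.
import math
import random

def transmission(mu, thickness, n_particles, seed=1):
    rng = random.Random(seed)
    through = 0
    for _ in range(n_particles):
        # Distance to first collision, sampled from an exponential
        # distribution with mean free path 1 / mu.
        d = -math.log(1.0 - rng.random()) / mu
        if d > thickness:
            through += 1
    return through / n_particles

mu, t = 0.5, 2.0
estimate = transmission(mu, t, 200_000)
exact = math.exp(-mu * t)
```

Each particle history is independent, which is what makes the tally side of such simulations a natural fit for GPU parallelism even when the random walk stays on the CPU.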
Favorite
S8454 - Render Your Projects 10x Faster with SOLIDWORKS Visualize!

Learn how SOLIDWORKS Visualize and Iray's AI Denoiser can increase render speeds by 10x, saving you hours each day. Complex interior scenes that used to take 10 minutes to render will now only take 1 minute! Product renderings that took 1 minute to render now only take seconds! Whether you're a designer, engineer, marketer, or executive, this session is for you. Discover how easy it is to set up your CAD data in SOLIDWORKS Visualize and create photo-quality images, animations, and other immersive 3D content in a snap. Drastically reduce physical prototyping and bring your products to market faster than ever before. Also joining on stage is a high-level customer from the Hollywood film industry, presenting their exciting 3D visualization workflow using SOLIDWORKS products to design vehicles, props, and sets for recent blockbuster Marvel films.

25-minute Talk Brian Hillner - Product Portfolio Manager, DS SOLIDWORKS
Joseph Hiura - Art Director, Set Designer and Illustrator, Member, IATSE Local 800 AD, SDMM & IMA Crafts
Favorite
S8583 - Micro-Weather Forecasting Using Wireless Network Data

Just think about one weather event in the U.S., Hurricane Irma: millions of people were evacuated from their family homes, thousands of homes and small businesses were devastated, and the entire economy of the area was and will be affected for many years to come. At ClimaCell, we've decided to improve weather data on three aspects: inputs, models, and computing power. We are the only company in the world with a new set of proprietary data (inputs), derived from wireless networks, whose coverage correlates with population density rather than GDP, whether in the U.S. or India. We bring both accuracy and coverage at the same time, with no trade-offs.

25-minute Talk Itai Zlotnik - Chief Visionary Officer, ClimaCell
Favorite
S8724 - Finding the Right Dress at Scale at Rent The Runway

Rent The Runway gets millions of visitors every day. We serve them personalized recommendations based on browsing behavior and some explicit feedback. To add to the complexity, we have multiple membership programs. Fashion has some unique challenges regarding seasonality, fit, and feedback. We also have a distinctive business model in which order fulfillment and reservations are tied together. To scale, we have moved to a GPU-first infrastructure instead of Spark clusters. We will discuss how we are moving to power all our algorithms this way.

25-minute Talk Saurabh Bhatnagar - Sr. Data Scientist, Rent The Runway
Favorite
CE8159 - Connect with the Experts: Deep Learning Inference with TensorRT (2)

Are you ready to start using deep learning to enable features or capabilities in an app or device? Do you need more throughput for a DNN in the cloud, or lower latency in an embedded device? Attend to learn about the TensorRT deep learning inference software. Experts will be standing by to talk about your use case and to discuss recent developments such as reduced-precision inference, user-defined custom layers, and recurrent neural network (LSTM/GRU) support.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Dilip Sequeira, NVIDIA
Siddharth Sharma, NVIDIA
Craig Wittenbrink, NVIDIA
Micah Villmow - Senior Deep Learning Software Engineer, NVIDIA
Ryan Olson - Solutions Architect, NVIDIA
Braden Robison, NVIDIA
Han Vanholder, NVIDIA
Christopher Gottbrath - Senior Manager, NVIDIA
Martin Doe, NVIDIA
Favorite
S81017 - Spectre/Meltdown Impact on High Performance Workloads

The impact of the recent Spectre and Meltdown security vulnerabilities has reached every corner of the compute ecosystem. Red Hat's Performance Engineering team has a keen interest in quantifying a wide variety of workloads in order to provide feedback to the upstream developers working on these problems. This presentation will detail our team's involvement over the last several months, share selected performance impacts from a variety of common enterprise and HPC workloads, explain how to potentially mitigate overheads, and inform the audience about what's being done to reduce impacts going forward.

25-minute Talk Jeremy Eder - Senior Principal Performance Engineer, Red Hat
Favorite
S81045 - TuSimple Autonomous Trucks: Prototypes to Products

An overview of TuSimple's unique full vision-based autonomous driving solution, with a case study of its camera-based perception solution. Deep dive into the hidden elements of the autonomous truck system in terms of algorithms, big data, and hardware. Plus, a look into the future of autonomous driving developments in sensors (camera vs. lidar), redundant systems, and computational resources.

25-minute Talk Xiaodi Hou - CTO, TuSimple
Favorite
S81049 - Putting AI to Work in an Enterprise: Deep Learning as a Service (Presented by IBM)

Now that deep learning has moved out of the lab and into production, how do you provide training environments to all your internal customers working across business units with different requirements, and avoid provisioning separate clusters? IBM has applied decades of HPC experience to build a production-ready deep learning stack, including servers accelerated with NVIDIA GPUs, workload and resource management software, and ready-to-use open source frameworks, all covered by IBM support. The solution provides a secure multi-tenant environment so multiple data scientists can share a common set of resources, eliminating silos, while running multiple instances of the same or different applications. The deep learning effort is enhanced with end-to-end pipeline support from data ingestion and preparation, through model training and tuning, to inference. In this session, we will explore what an enterprise deep learning environment looks like and provide insights into the unique IBM value for accelerating the use of deep learning across a wide variety of industries.

50-minute Talk Nick Werstiuk, IBM
S8190 - Performance Optimization for Scientific Applications We'll take you on a journey through enabling applications for GPUs; interoperability of different languages (including Fortran, OpenACC, C, and CUDA); CUDA library interfacing; data management, movement, and layout tuning; kernel optimization; tool usage; multi-GPU data transfer; and performance modeling. We'll show how careful optimizations can have a dramatic effect and push application performance towards the maximum possible on the hardware. We'll describe tuning of multi-GPU communications, including efficient exploitation of high-bandwidth NVLink hardware. The applications used in this study are from the domain of numerical weather prediction, and also feature in the ESCAPE European collaborative project, but we'll present widely relevant techniques in a generic and easily transferable way. 50-minute Talk Alan Gray - Developer Technology Engineer, NVIDIA
S8388 - One System to Render Them All To enable our users to Image, Design and Create Anything using a vast gamut of product offerings, a consistent visualization and design experience has been our main focus. This session provides a quick tour of Autodesk's internal graphics system for interactive GPU rendering, listing some of the challenges faced and how they were overcome. 25-minute Talk Ashwin Bhat - Sr. Principal Engineer, Autodesk Inc.
Rama Hoetzlein - Graphics Research Engineer, NVIDIA
S8598 - Building Seeing AI: The Talking Camera App for the Blind We'll detail the journey of building Seeing AI, an app from Microsoft AI & Research that narrates the world around you. Designed for the blind and low-vision community, this research project harnesses the power of AI to describe people, text, and objects. Seeing AI leverages object classification, detection, image captioning, and more, with several models running on the device in real time at more than 15 frames per second. We'll go over the lessons, challenges, hits, and misses we encountered while developing the application. 50-minute Talk Anirudh Koul - Senior Data Scientist, Microsoft
S8683 - Advancing Representation Learning for Language and Vision As an NVAIL partner, the Machine Perception Group at the University of Tokyo is focusing on various research areas of AI, particularly natural language processing, computer vision, and their cross-disciplinary domain. Since deep learning has revolutionized all these fields, one of the core issues has been how to effectively extract powerful semantic representations from low-level inputs in an end-to-end manner. Indeed, remarkable progress has been made on this point in recent years, enabling many spectacular cross-modal applications. In this talk, we will introduce several research projects in our group related to representation learning for language and vision, and discuss future directions. 50-minute Talk Hideki Nakayama - Assistant Professor, The University of Tokyo
S8729 - Pioneering AI for All Businesses of all sizes are increasingly recognizing the potential value of AI, but few are sure how to prepare for the transformational change it is sure to bring to their organizations. Danny Lange rolled out company-wide AI platforms at Uber and Amazon; now, through Unity Technologies, he's making AI available to the rest of us. He'll also share his thoughts for the most exciting advances that AI will bring over the next year. His insights will help you understand the true potential of AI, regardless of your role or industry. 50-minute Talk Danny Lange - VP of AI and Machine Learning, Unity
S8791 - Designing Wireless Systems with Deep Learning - An Autoencoder-Based Approach to PHY Layer Design The field of wireless engineering is on the cusp of a revolution, driven by deep learning, that will define the next paradigm in wireless system design. While wireless communications technology has advanced considerably since its invention in the 1890s, the fundamental design methodology has remained unchanged throughout its history - expert engineers hand-designing radio systems for specific applications. Deep learning enables a new, radically different approach, where systems are learned from wireless channel data. As the world becomes more connected and the Internet of Things becomes a reality, it is difficult to overstate the enormity of the impact to both commercial and military systems. This talk will provide a high-level overview of deep learning applied to wireless communications, discuss the current state of the technology and research, and present a vision for the future of wireless engineering. 25-minute Talk Tim O'Shea - CTO, DeepSig Inc.
Ben Hilburn - Director of Engineering, DeepSig Inc.
S8804 - Predictive Rendering for Industrial Application Physics-based rendering (PBR) has become a standard in the field of rendering and visualization. However, PBR alone is not enough for a tool to be predictive: the industrial markets (automotive, aerospace, architecture) are very demanding, and having them accept software as a decision-making tool requires a lot of effort and validation. At Optis, we've focused strongly on optics and photometry simulation for nearly 30 years. Having worked on real-time GPU-based applications for 10 years, we're now releasing the first GPU-accelerated predictive renderer that mixes rasterization, deterministic ray tracing, and Monte Carlo ray tracing to provide highly realistic spectral propagation of light. Combined with our spatially varying BRDF model, which can be captured with the optical material scanner we develop, the photometry results computed by the simulation are trustworthy enough to support costly decisions in industrial use cases such as color and trim, reflection analysis, lighting design, and high-end VR configurators. 25-minute Talk Nicolas Dalmasso - Innovation Director, OPTIS
S8842 - How to Win the Amazon Robotics Challenge with Deep Learning and Robotic Vision Sixteen teams competed at the 2017 Amazon Robotics Challenge global finals in Nagoya, Japan. Each team was challenged to design a pick-and-place robot for autonomous warehousing, addressing the need for development in robotic vision and manipulation. We'll present Cartman, our custom-built, cost-effective robot system, which won first place in the competition finals by stowing 14 (out of 16) items and picking all nine items in 27 minutes. We'll highlight our experience-centered design methodology and key aspects of our system that contributed to our competitiveness. In particular, for the perception system we built a deep-learned semantic segmentation network, which was trained on only about a dozen images per previously unseen item. By conducting training on four GeForce GTX 1080 GPUs, we created an effective robot system with robust perception that was tightly integrated with our hardware and critical to our win. 50-minute Talk Doug Morrison - PhD Researcher, Australian Centre for Robotic Vision
Juxi Leitner - Research Fellow, Australian Centre for Robotic Vision
S8461 - Extreme Multi-View Rendering for Light-Field Displays A light-field display projects a 3D aerial scene that is visible to the unaided eye without glasses or head tracking and allows for the perspective correct visualization of the scene within the display's projection volume. The light-field display computes a synthetic radiance image from a 3D scene/model and projects the radiance image through a lens system to construct the 3D aerial scene. Binocular disparity, occlusion, specular highlights, gradient shading, and other expected depth cues are correct from the viewer's perspective as in the natural real-world light-field. There are a few processes for generating the synthetic radiance image; the two most common rasterization approaches differ in the order in which they decompose the 4D light-field (two dimensions of position, two dimensions of direction) into 2D rendering passes. This talk will describe Double Frustum and Oblique Slice and Dice synthetic radiance image rendering algorithms and their effective use for wide-area light-field displays. 25-minute Talk Thomas Burnett - CTO, FoVI3D
S8506 - Mapping MPI+X Applications to Multi-GPU Architectures: A Performance-Portable Approach Learn how to map parallel scientific applications on multi-GPU architectures using a performance-portable approach. This approach is built on three fundamental aspects: Firstly, the memory hierarchy is the primary design consideration; secondly, there is a global awareness of hybrid programming abstractions, such as MPI+CUDA+OpenMP; and thirdly, a framework that enables the integration and support of heterogeneous devices. We'll provide example mappings on a CORAL early access system consisting of IBM Power8+ processors with NVIDIA Pascal GPUs. We'll also discuss the performance of micro-benchmarks and an earthquake ground motion simulation code relative to other mapping approaches. 25-minute Talk Edgar Leon - Computer Scientist, Lawrence Livermore National Laboratory
S8613 - Hail-O on DGX-1, Law Enforcement's AI Tool to Combat Child Abuse

Every law enforcement agency receives tens if not hundreds of suspected child abuse cases every month. Each case may contain one or more hard disks and/or other storage media. On average, each hard disk contains about 200,000 images and hundreds of hours of video recordings. To successfully prosecute the offenders, every one of those images/videos needs to be correctly graded to help the courts assess the level of the offence. Even though many simple tools are used to accelerate this laborious process, every case can take hours if not days to prepare. This puts a significant strain on law enforcement worldwide. At this scale, the process is also very error prone, and evidence can be missed or ignored. The Hail-O platform, the primary focus of this talk, rapidly inspects all images and videos detected on the offender's disk and, using artificial intelligence, automatically detects and grades indecent images of children. The adoption of Hail-O allows law enforcement agencies to significantly reduce the workload required in prosecuting offenders while ensuring the consistency of the grading process.

25-minute Talk Elan Raja - Director, Scan
Eyal Lemberger - Owner, Lemberger & Associates Limited
S81002 - Accelerated Deep Learning Discovery in Fusion Energy Science

Deep learning/artificial intelligence methods are increasingly being deployed to enable new avenues of big-data-driven discovery in key scientific application areas such as the quest to deliver Fusion Energy – identified by the 2015 CNN "Moonshots for the 21st Century" series as one of 5 prominent modern grand challenges. Princeton University's associated R&D methods have been successfully applied to accelerate progress in reliably predicting and avoiding large-scale losses (called "disruptions") of the thermonuclear plasma fuel in magnetically-confined devices – the largest of which is the $25B international ITER device – a burning plasma experiment under construction with the potential to exceed "breakeven" fusion power (i.e., "power out = power in") by a factor of 10 or more.

50-minute Talk William Tang - Princeton University
S81012 - Training Neural Networks with Mixed Precision: Real Examples We will cover the techniques for training DNNs with Tensor Cores described in "S8923 - Training Neural Networks with Mixed Precision: Theory and Practice". These methods were introduced for AI processing with the Volta GPU architecture. Tensor Cores provide up to 120 TFlops of throughput, mixing operations on IEEE half- and single-precision floats. Techniques covered will include loss scaling, a master weights copy, and choosing the proper precision for a given operation. For each of TensorFlow and PyTorch, we will describe an fp32 network definition and then demonstrate the same network using mixed precision techniques. 80 Minutes Tutorial Benjamin Barsdell - Senior Deep Learning Engineer, NVIDIA
Michael O'Connor - Director, NVIDIA
Christian M. Sarofeen - Senior Deep Learning Engineer, NVIDIA
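As a taste of the loss-scaling technique covered in the session, here is a minimal NumPy sketch (the function name and the bare SGD update are our own illustration, not code from the tutorial): the fp16 gradient of the scaled loss is unscaled in fp32, the step is skipped on overflow, and weights live in an fp32 master copy.

```python
import numpy as np

def scaled_step(w_master, grad_fp16_fn, loss_scale=1024.0, lr=0.01):
    """One SGD step with loss scaling and fp32 master weights.
    grad_fp16_fn computes the fp16 gradient of (loss * loss_scale)."""
    w16 = w_master.astype(np.float16)                      # fp16 forward/backward
    g = grad_fp16_fn(w16).astype(np.float32) / loss_scale  # unscale in fp32
    if not np.all(np.isfinite(g)):                         # overflow: skip step
        return w_master, False
    return (w_master - lr * g).astype(np.float32), True
```

Without the scale factor, gradients below roughly 6e-8 underflow to zero in fp16; scaling the loss shifts them into fp16's representable range before they are cast back to fp32 for the weight update.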
S81021 - A Physician-Data Scientist Grand Vision: A Virtual Medical Oracle

The 10 necessary steps for this astounding pinnacle of "medical" intelligence to be realized will be outlined. Among these requisite steps are: an AI-inspired acquisition of medical data; a universal common medical data repository; an Internet of Everything strategy for medical data; a close physician-data scientist collaboration; and a true deep learning/cognitive architecture hybrid structure for medical knowledge.

25-minute Talk Anthony Chang - Department Advisor, CHOC
S81023 - The SETI Institute: Using GPUs for Systems Science, Technology, and Exploration

The SETI Institute (SI) approaches the question of the origin and nature of life in the universe. Our NASA Astrobiology Institute team develops new exploration strategies and detection methods to support the search for biosignatures on Mars and other planets. SI is also driving a new paradigm for the exploration of biosignatures and signs of technology at all scales, using a holistic approach. This new direction requires the rapid analysis of vast amounts of data. In this presentation, we'll describe the history, successes, and challenges to current approaches, and describe SI's current and future efforts in FDL and other areas to incorporate AI and deep learning to drive this new big data paradigm for finding life in the universe.

50-minute Talk Nathalie A. Cabrol - Senior Research Scientist and Director of the Carl Sagan Center, SETI Institute
Graham Mackintosh - AI Consultant for Space Science Applications, NASA-STC, SETI Institute
S8115 - BigQuery and TensorFlow: Data Warehouse + Machine Learning Enables the "Smart" Query BigQuery is Google's fully managed, petabyte-scale data warehouse. Its user-defined functions realize "smart" queries with the power of machine learning, such as similarity search or recommendation on images or documents with feature vectors and neural network prediction. We'll see how TensorFlow and its GPU-accelerated training environment enable a powerful "data warehouse + machine learning" solution. 25-minute Talk Kaz Sato - Developer Advocate, Google Cloud, Google
S8405 - Large-Scale Multi-Parameter Waveform Inversion with GPUs on the Cloud: A Pipelined Implementation We'll describe how we accelerate the estimation of multiple subsurface properties with GPU-equipped cloud computers while saving cost at the same time. Traditionally, institutions spend millions of dollars to build and maintain computing infrastructures that are rarely occupied at full capacity. Cloud computing offers a solution to this via on-demand provisioning that can flexibly meet an institution's needs, but it comes with two potential problems: preemption and no guarantee of low-latency inter-node communication. To sidestep these issues, we implement a pipeline processing model that fully utilizes CPU memory and GPU global memory to hide latency without having to decompose the computational domain across multiple nodes. 25-minute Talk Huy Le - Ph.D. Student, Stanford University
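The pipeline model the abstract describes, overlapping the transfer of chunk i+1 with computation on chunk i, can be sketched generically in its simplest double-buffered form (our illustration of the pattern, not the authors' code):

```python
from concurrent.futures import ThreadPoolExecutor

def pipelined(chunks, transfer, compute):
    """Run `compute` on chunk i while `transfer` prefetches chunk i+1,
    hiding transfer latency behind computation."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(transfer, chunks[0])   # prefetch first chunk
        for nxt in chunks[1:]:
            data = pending.result()                # wait for current chunk
            pending = io.submit(transfer, nxt)     # start next transfer
            results.append(compute(data))          # overlaps with transfer
        results.append(compute(pending.result()))  # drain the last chunk
    return results
```

With `transfer` bound to a cloud download and `compute` to a GPU kernel launch, the steady-state cost per chunk approaches max(transfer, compute) rather than their sum.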
S8526 - Creative AI: The Developing Friendship Between Human and Machine (Presented by Autodesk)

Generative design technologies for architecture, manufacturing and construction are rapidly becoming more useful and automated. Generative design, what we call "Creative A.I.", will soon reach a level of maturity that will empower designers not only to create what was not possible before, but to predict and reveal non-intuitive, high-performing design variations. One of the key drivers for this success is the availability of practical applications for artificial intelligence and machine learning that power these generative approaches. New human interfaces must be simple, approachable, and visualize trade-offs across many dimensions simultaneously, but also learn from and include the preference and style cues of the human being. This will be the key to building and maintaining trust between designers and their new team of cloud collaborators. In this new way of working, designers and machines will finally be in a true partnership. A new type of relationship will be formed, a friendship between human and computer.

50-minute Talk Mark Davis - Senior Director - Design Research, Autodesk
S8668 - Scaling Machine Learning through Decentralization, Quantization, and Structured Sparsity

In this session, participants will get a taste of state-of-the-art techniques for scaling deep learning on GPU clusters. We present SuperML, a general and efficient communication layer for machine learning, which can scale neural network training to hundreds of GPU nodes. SuperML builds on three main ideas: decentralization, which allows algorithms to converge without a centralized coordinator (parameter server) or all-to-all communication; communication quantization, which significantly speeds up point-to-point messaging; and structured sparsity, by which SuperML induces model updates that have only a limited number of non-zero entries. From the technical perspective, SuperML provides a new implementation of the classic MPI standard, re-designed and re-implemented to provide efficient support for quantization and sparsity. We illustrate the performance characteristics of SuperML on CSCS Piz Daint, Europe's most powerful supercomputer, and on Amazon EC2, improving upon other highly optimized implementations such as CrayMPI and NVIDIA NCCL.

50-minute Talk Ce Zhang - Assistant Professor, ETH Zurich
Dan Alistarh - Professor, IST Austria
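As a taste of the communication-quantization idea, here is a toy uniform gradient quantizer (the names and the scheme are our own illustration; SuperML's actual codec is more sophisticated): each fp32 entry is sent as a few bits plus one shared scale.

```python
import numpy as np

def quantize(g, bits=4):
    """Map each gradient entry to an integer in [-levels//2, levels//2],
    sharing one fp32 scale for the whole vector."""
    levels = 2 ** bits - 1
    scale = float(np.abs(g).max())
    if scale == 0.0:
        scale = 1.0
    q = np.round(g / scale * (levels // 2)).astype(np.int8)
    return q, scale, levels

def dequantize(q, scale, levels):
    """Recover an approximate fp32 gradient from the integer codes."""
    return q.astype(np.float32) * scale / (levels // 2)
```

At 4 bits per entry the payload shrinks roughly 8x versus fp32, at the cost of a bounded per-entry error of at most scale/14 for this toy scheme.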
S8854 - CUTLASS: Software Primitives for Dense Linear Algebra at All Levels and Scales within CUDA Audience members will learn how to implement efficient deep learning computations using CUDA C++ in the context of CUTLASS. CUTLASS is an open-source collection of C++ template abstractions for implementing high-performance matrix multiplication (GEMM) at all levels of the CUDA thread hierarchy. We will describe many of the algorithmic strategies used by cuBLAS and cuDNN, and how they can be implemented using C++ templates to cover an extensive space of problem sizes, data layouts, and data types. In particular, we will emphasize how to support alternative and mixed precision math operations such as Pascal's integer DP4A operation and Volta's Tensor Cores. Finally, we will illustrate how CUTLASS primitives can be combined with custom functionality to implement related algorithms such as convolution. Although this talk highlights CUTLASS, the architecture concepts and algorithm details are relevant to any CUDA programmer focused on deep learning. 50-minute Talk Andrew Kerr - Senior GPU Compute Architect, NVIDIA
Duane Merrill - Senior Research Scientist, NVIDIA
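The tiling hierarchy the talk describes can be previewed with a plain blocked GEMM; the NumPy sketch below shows only the loop decomposition (it is not the CUTLASS API, and the tile size is an arbitrary choice), with the k-loop playing the role of CUTLASS's shared-memory tile loop.

```python
import numpy as np

def tiled_gemm(A, B, tile=4):
    """Blocked matrix multiply: each (i, j) output tile accumulates
    partial products over k-tiles, mirroring the threadblock-level
    decomposition CUTLASS implements with C++ templates."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=np.float32)
    for i in range(0, M, tile):            # threadblock rows
        for j in range(0, N, tile):        # threadblock cols
            acc = np.zeros((min(tile, M - i), min(tile, N - j)), np.float32)
            for k in range(0, K, tile):    # shared-memory k-tiles
                acc += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
            C[i:i+tile, j:j+tile] = acc
    return C
```

In CUDA the same structure repeats at the warp and thread levels, with each level's `acc` held in progressively faster storage (shared memory, then registers).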
S8861 - Crowd-sourcing, Map updates, and Predictions as Complementary Solutions for Mapping Autonomous driving requires thorough mapping capabilities in the car and in the cloud. In this session, various vendors present their mapping approaches to identify synergies and opportunities in the self-driving ecosystem. 80 Minutes Tutorial Kevin Tsurutome - VP, Business Development, DeepMap
Justyna Zander - Senior Automotive Product Manager, NVIDIA
Akshay Goel - CEO and Founder, explorer.ai
Peter Atalla - Founder and CEO, VoxelMaps
S8927 - Generalizable Autonomy for Robotic Mobility and Manipulation Understanding the link between perception and action is key to building autonomous agents that can perform challenging tasks in unstructured environments among humans. The Stanford Vision & Learning Lab works at the interface of vision, language and robotics and, in this talk, we will discuss recent advances with deep learning in both mobility and manipulation. We will talk about our mobile experimental platform, JackRabbot, which is equipped with an on-board GPU to perform visual tasks in real time, and discuss topics related to human motion understanding. We will also talk about robot autonomy, requiring both understanding of perceptual inputs and reasoning at different levels of abstractions. We will present new approaches to imitation learning for robot manipulation, including Neural Task Programming. This new approach to meta-learning capitalizes on hierarchy and learns to "program" with a modular Robot API to perform unseen tasks with a single test example. Finally, we will discuss task structure learning as an intermediate step towards imitation from videos for complex tasks. 50-minute Talk Marynel Vázquez - Postdoctoral Scholar, Stanford University
Animesh Garg - Postdoctoral Researcher, Stanford University
S8941 - Synthetic Label Data for Training Deep Learning ISR Algorithms This presentation will walk through the research and development Harris has performed in creating an automated pipeline for synthetic label data generation for training deep learning ISR algorithms. The benefit of artificial intelligence, machine learning, and specifically deep learning to various industries has become obvious, but adoption still faces challenges and hurdles. In the realm of Intelligence, Surveillance, and Reconnaissance applications, the availability of labeled training data has proven to be a costly barrier to entry. Remote sensing physics-based modeling and simulation provide a solution to this challenge by synthesizing radiometrically accurate labeled training data in massive quantities. 50-minute Talk William Rorrer - Program Manager, Harris Corporation
S8996 - AI Solutions and Use Cases Up Close (Presented by Inspur Systems)

Inspur has been deploying AI solutions with customers such as Microsoft, Alibaba, Baidu, and BMW for many years. We will share AI use cases showing how we deploy AI at scale, and take a close look at the technologies that enable these AI deployments.

50-minute Talk Dolly Wu - Vice President/GM, Inspur
S81024 - A Deep Learning-Assisted Platform for Precision Cardiovascular Phenotyping

Beyond the debate on whether or not AI-based technologies may one day replace physicians lies the promise of AI to change how we phenotype, in both research and clinical settings. Utilities for precision phenotyping and opportunities for cross-disciplinary collaboration will be discussed, particularly in the context of image-based phenotypes. Imaging is both information-rich and essential in medicine but complex to acquire and interpret, currently requiring trained human experts. A prime example is cardiac ultrasound, which comprises manually acquired, multi-view, multi-modality image views of the heart. We present a use case for CNNs in classifying these views, highlighting the use of small datasets and real-world clinical images, as a foundational step in artificial intelligence-assisted cardiac image interpretation.

25-minute Talk Rima Arnaout - Assistant Professor of Medicine, Cardiology, Institute for Computational Health Sciences, University of California, San Francisco
S8459 - Performance Portability of Sparse Tensor Decomposition for GPUs using Kokkos Tensors have found utility in a wide range of applications, such as chemometrics, network traffic analysis, neuroscience, and signal processing. Many of these applications have increasingly large amounts of data to process and require high-performance methods to provide a reasonable turnaround time for analysts. In this work, we consider decomposition of sparse count data using CANDECOMP-PARAFAC alternating Poisson regression (CP-APR) with both multiplicative update and quasi-Newton methods. For these methods to remain effective on modern large-core-count CPU, many-integrated-core, and GPU architectures, it is essential to expose thread- and vector-level parallelism and take into account the memory hierarchy and access patterns of each device to obtain the best possible performance. We'll highlight the effect of the sparsity patterns in the input tensor on each architecture, how that relates to the access patterns and effective cache and/or shared memory utilization of each solver method, and how to potentially exploit the sparsity structure in performance optimizations. 25-minute Talk Keita Teranishi - Principal Member of Technical Staff, Sandia National Laboratories
CE8160 - Connect with the Experts: Deep Learning Basics (5)

Attend this session to get your questions on deep learning basics and concepts answered. NVIDIA experts can help you with the fundamentals and provide guidance on how and when to apply Deep Learning and GPUs to your work. No question is too basic to ask.

Connect with the Experts sessions are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject.

1 Hour Connect with the Experts Hans Mortensen - Sr. Solutions Architect, NVIDIA
Rajan Arora - Solution Architect, NVIDIA
L8116 - Best GPU Code Practices Combining OpenACC, CUDA, and OmpSs We'll guide you step by step to port and optimize an oil-and-gas miniapplication to efficiently leverage the amazing computing power of NVIDIA GPUs. While OpenACC focuses on coding productivity and portability, CUDA enables extracting the maximum performance from NVIDIA GPUs. OmpSs, on the other hand, is a GPU-aware task-based programming model that may be combined with CUDA, and recently with OpenACC as well. Using OpenACC, we'll start benefiting from GPU computing, obtaining great coding productivity, and a nice performance improvement. We can next fine-tune the critical application parts developing CUDA kernels to hand-optimize the problem. OmpSs combined with either OpenACC or CUDA will enable seamless task parallelism leveraging all system devices. 120 Minutes Instructor-Led Lab Antonio J. Peña - Sr. Researcher, BSC
Pau Farre - Software Engineer, BSC
L8149 - Signal Processing with DIGITS

Prerequisites: 'Fundamentals of Deep Learning with Computer Vision' or similar experience

Duration: 2 hours

Framework: Caffe, DIGITS

The fact that deep neural networks are better at classifying images than humans has implications beyond what we typically think of as computer vision.

In this hands-on lab, you'll convert radio frequency (RF) signals into images to detect a weak signal corrupted by noise. You'll learn how to:

● Treat non-image data as image data

● Implement a deep learning workflow (load, train, test, adjust) in DIGITS

● Test performance programmatically and guide performance improvements

Upon completion, you'll be able to classify both image and image-like data using deep learning.

Presented by the NVIDIA Deep Learning Institute (DLI).
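The "treat non-image data as image data" idea above can be previewed outside DIGITS: the toy function below turns a 1-D RF signal into a 2-D magnitude spectrogram that an image classifier can consume (the function name and the window/hop sizes are our illustrative choices, not the lab's settings).

```python
import numpy as np

def signal_to_image(x, win=64, hop=32):
    """Slice a 1-D signal into overlapping Hann-windowed frames and
    stack their FFT magnitudes into a time-by-frequency 'image'."""
    frames = [x[i:i + win] * np.hanning(win)
              for i in range(0, len(x) - win + 1, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1))  # rows: time, cols: frequency
    return spec / spec.max()                    # normalize to [0, 1] "pixels"
```

A tone buried in noise shows up as a bright horizontal stripe in the resulting array, which is exactly the kind of spatial structure convolutional networks detect well.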
120 Minutes Instructor-Led Lab Mike Mendelson - Deep Learning Institute Curriculum Developer, NVIDIA
L8156 - Object Detection for Full Motion Video

Detecting specific types of objects is often the first step in more sophisticated workflows such as identification, classification, segmentation, prediction/recommendation, etc. In this lab, you'll learn how to implement object detection in full motion videos through a series of hands-on exercises, using convolutional neural networks to analyze video captured from low-altitude platforms.

You'll:

• Use the TensorFlow Object Detection API to train and evaluate deep learning models

• Review state-of-the-art object tracking methods and performance metrics

• Learn how to overcome challenges associated with overhead video

Upon completion, you'll know how to analyze video data using deep learning, implement state-of-the-art object detection methods, and train and evaluate model performance.

Presented by the NVIDIA Deep Learning Institute (DLI).

120 Minutes Instructor-Led Lab Jonathan Howe - DLI Instructor, NVIDIA
S81046 - Accelerating AI Adoption and Impact (Presented by Dell EMC)

Attendees will learn and understand why AI techniques are so powerful, why developing and deploying optimal AI solutions is complex, why using AI techniques effectively is still difficult, and what Dell Technologies is doing to remove these difficulties and bring easier, effective AI to everyone. Dell Technologies includes seven companies with a comprehensive portfolio of technology products, services, and solutions for global industry, government, and education markets, and aims to be the leader in designing and delivering the best AI solutions for every customer, of every type and scale. From Dell Precision workstations for developers and Gateways for edge sensors, to Dell EMC GPU-optimized PowerEdge Servers and Ready Solutions for Deep Learning and hybrid cloud offerings, Dell is leveraging its leadership in technology and in enterprise relationships to design a world-class portfolio of AI solutions for diverse customer workloads, requirements, and objectives. This presentation will cover AI and deep learning in an enterprise context, including customer challenges and needs, and then discuss Dell AI solutions and strategy to empower people to use AI rapidly and effectively.

50-minute Talk Jay Boisseau - AI & HPC Technology Strategist, Dell
S8172 - Evaluation of Hybrid Cache-Coherent Concurrent Hash Table on POWER9 System with NVLink 2 At the 2014 GTC, we described a novel concurrent cache-aware hash table that used a multi-level bounded linear probing hashing algorithm. This year we'll discuss how the design has expanded to a hybrid (CPU-GPU) hash table where the data is stored in host CPU memory and accessed by the GPU using unified memory constructs. The hash table is designed such that multiple CPU threads can update it concurrently and multiple GPU threads can fetch data from it in a cache-coherent manner using NVLink 2.0. The hash table is implemented on a POWER9 system with NVLink 2.0-connected Tesla V100 GPUs. We'll present detailed performance measurements of throughput and virtual memory activities from CPU updates and GPU fetches. We'll also compare the performance of our design against a hybrid hash table built using the Cuckoo hashing approach. 50-minute Talk Rajesh Bordawekar - Principal Research Staff Member, IBM T. J. Watson Research Center
Pidad Gasfar D'Souza - System Performance Architect, IBM Systems Development Lab
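The bounded linear probing scheme from the 2014 design can be sketched in plain Python; the single-threaded toy below (class name and parameters are ours, not IBM's implementation) shows the core idea: each key probes at most `bound` slots per level and falls through to the next level on failure.

```python
class BoundedProbeHash:
    """Multi-level hash table with bounded linear probing (toy sketch)."""
    def __init__(self, capacity=16, bound=4, levels=2):
        self.bound = bound
        self.tables = [[None] * capacity for _ in range(levels)]

    def _slot(self, key, lvl):
        return hash((key, lvl)) % len(self.tables[lvl])

    def put(self, key, val):
        for lvl, t in enumerate(self.tables):
            base = self._slot(key, lvl)
            for off in range(self.bound):          # bounded probe window
                i = (base + off) % len(t)
                if t[i] is None or t[i][0] == key:
                    t[i] = (key, val)
                    return True
        return False                               # every window was full

    def get(self, key):
        for lvl, t in enumerate(self.tables):
            base = self._slot(key, lvl)
            for off in range(self.bound):
                i = (base + off) % len(t)
                if t[i] is not None and t[i][0] == key:
                    return t[i][1]
        return None
```

Bounding the probe length is what keeps lookups GPU-friendly: a fetch touches at most `bound * levels` slots, so the worst-case work per thread is fixed and small.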
S8257 - Rebooting VDI: How NVIDIA GRID Technology Has Propelled DigitalGlobe's Success Building on the somewhat unlikely success of our vGPU-accelerated Linux virtual desktops, DigitalGlobe is teaming up with NVIDIA and Nutanix to replace an ailing Windows 7 VDI environment that has been unable to keep up with the company's demands of it. As the world leader in commercial satellite imagery and data, DigitalGlobe has continually challenged the limits of virtual desktops. By leveraging NVIDIA GRID technology, we've been able to step beyond niche use-cases for VDI and broaden adoption across the company while leaving multiple avenues open for us to fulfill a rapidly broadening customer base. 50-minute Talk Mike Bantz - Virtualization Engineer, DigitalGlobe
S8413 - Wildlife Conservation Using Autonomous Drones Timely responses to suspected poaching activity are critically important for people working in areas where poaching is a problem. Our fixed-wing autonomous UAV solution is designed specifically to detect objects (people, vehicles, elephants, etc.) for wildlife conservation. Some other solutions use either a purely offline approach or stream all data back to a base station for processing. Detection of objects at altitude is difficult, as they are farther away and have less pixel area than what is expected in canonical object recognition tasks. Performing detection in real time onboard the drone reduces operator fatigue and ensures expedited response times to each imminent threat. 50-minute Talk Jeremy Bensley - Senior Research Software Engineer, Vulcan Inc.
Paul Aarseth - Lead Research Software Engineer, Vulcan Inc.
S8428 - Highly Accurate Brain Stroke Diagnosis System and Generative Stroke Lesion Model

Learn about CAIDE Systems' unique diagnosis system with highly accurate prediction and delineation of brain stroke lesions. We'll present how we increase sensitivity in a medical diagnosis system and how we developed a state-of-the-art generative deep learning model for acquiring segmented stroke lesion CT images, and demonstrate our market-ready product: a diagnostic tool as well as a medical deep learning platform. We trained our diagnostic system using CT image data from thousands of patients with brain stroke and tested it to assess the commercial feasibility of use in hospitals and mobile ambulances.

25-minute Talk Junghwan Cho - Chief Research Scientist, CAIDE Systems, Inc
S8440 - CuLE: A Companion Library for Accelerated RL Training Traditional RL training is dominated by experience-collection processes executing on the CPU. However, this CPU-oriented design pattern limits the utility of DL accelerators such as GPUs. In this talk we present CuLE (CUDA Learning Environment), an experimental deep RL companion library that facilitates the generation of RL updates directly on the GPU. CuLE provides an implementation of ALE (Atari Learning Environment), a challenging RL benchmark for discrete episodic tasks, executing directly on the GPU with the number of environments ranging from a few hundred to several thousand. Although traditional deep RL implementations use 12-16 agents coupled with replay memory to achieve training efficiency, CuLE can generate a massive number of samples per step and supports new training scenarios that minimize expensive data-movement operations. With 1,024 agents, CuLE achieves an 8-10x performance improvement by executing directly on the GPU compared to 1,024 agents running in parallel on a 12-core CPU. We plan to extend CuLE to support a new set of GPU-centric deep RL training schemes and new challenging training environments through integration with GFN. 25-minute Talk Iuri Frosio - Senior Research Scientist, NVIDIA
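The key pattern behind running hundreds to thousands of environments on a GPU is stepping them all in lockstep, so one call advances every agent at once. The toy sketch below (CuLE's actual API is not shown; `BatchedToyEnv` and its dynamics are purely illustrative) uses NumPy vectorization as a stand-in for per-environment GPU threads.

```python
# Illustrative batched-environment sketch; not CuLE's API.
import numpy as np

class BatchedToyEnv:
    """A toy batched environment: state is a counter, episode ends at 10."""
    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.states = np.zeros(num_envs, dtype=np.int64)

    def step(self, actions):
        # One vectorized update advances all environments at once; on a GPU
        # this maps naturally to one thread (or block) per environment.
        self.states += actions
        rewards = (self.states % 2 == 0).astype(np.float64)
        done = self.states >= 10
        self.states[done] = 0  # auto-reset finished episodes in place
        return self.states.copy(), rewards, done
```

Because states, rewards, and done flags stay in device memory in a real GPU implementation, the expensive per-step host-device copies of a CPU experience-collection loop disappear, which is the data-movement saving the abstract describes.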
S8474 - GPUDirect: Life in the Fast Lane

Hear about the latest developments in the NVIDIA GPUDirect family of technologies, which are aimed at improving both the data and the control path among GPUs in combination with third-party devices. We'll introduce the fundamental concepts behind GPUDirect and present the latest developments, such as changes to pre-existing APIs and newly introduced APIs. We'll also discuss the expected performance on the new computing platforms that emerged last year.

50-minute Talk Davide Rossetti - Senior Software Engineer, NVIDIA
Elena Agostini - Software Engineer, NVIDIA
S8705 - Implementing Advanced Data Visualizations with NVIDIA IndeX Assessing complex, multi-dimensional scientific data visually requires custom, well-designed visualization strategies that target human visual perception. NVIDIA IndeX grants scientists and developers access to its innermost raycasting operations and lets them inject CUDA code that is compiled and executed at runtime to customize the appearance of data samples. In this way, scientists and developers can leverage NVIDIA IndeX as an effective tool for creating user-defined, domain-specific advanced visualization techniques and for applying them interactively to enormously large scientific data for substantial visual insight. This tutorial guides scientists and developers through the process of creating and implementing custom visualization techniques using NVIDIA IndeX and its CUDA runtime-compilation functionality. With the help of well-chosen and constantly evolving examples, the tutorial will show the audience how easily comprehensive visualization techniques can be implemented to gain deep insight into complex scientific data. 50-minute Talk Alexander Kuhn - Senior Software Engineer, NVIDIA
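The "inject code compiled and executed at runtime" idea can be sketched loosely in Python. IndeX compiles injected CUDA source on the fly; as a language-level analogy only (the source string, function name, and color-ramp logic below are all invented for illustration), we compile a user-supplied sampler at runtime and let it map a raw data sample to an RGBA tuple.

```python
# Loose Python analogy for runtime-compiled sample programs; IndeX itself
# compiles user-injected CUDA source, which is not shown here.
USER_SAMPLER_SRC = """
def sample_program(value):
    # Domain-specific transfer function supplied by the scientist at runtime.
    intensity = min(max(value, 0.0), 1.0)   # clamp the raw sample to [0, 1]
    return (intensity, 0.0, 1.0 - intensity, 1.0)  # blue-to-red RGBA ramp
"""

def compile_sampler(source):
    """Compile a sampler source string at runtime and return the callable."""
    namespace = {}
    exec(compile(source, "<user_sampler>", "exec"), namespace)
    return namespace["sample_program"]

sampler = compile_sampler(USER_SAMPLER_SRC)
```

The design point mirrors the abstract: because the sampler is compiled at runtime rather than linked in ahead of time, scientists can swap in a new domain-specific appearance function without rebuilding the host application.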
S8765 - Performance Optimization for Deep Learning on the Latest OpenPOWER Systems

We'll discuss how cognitive workloads can leverage the latest OpenPOWER systems with NVIDIA Volta V100 GPUs and fast NVLink 2.0 CPU-GPU interconnects. IBM has formed a close partnership with NVIDIA to offer GPU-enabled OpenPOWER systems and PowerAI software to our customers and developers. We'll focus on the latest OpenPOWER systems and how large-scale deep learning neural network training can leverage the unique capabilities of these systems with PowerAI Release 4. Also discussed is the new IBM distributed deep learning (DDL) technology that allows neural network model training to scale almost linearly across hundreds of NVIDIA GPUs.

50-minute Talk Khoa Huynh - Senior Technical Staff Member (STSM), IBM
Brian Wan - Staff Software Engineer, IBM
Jonathan Samn - Software Engineer, IBM
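The near-linear scaling claim rests on the standard data-parallel pattern: each worker computes gradients on its own data shard, then an all-reduce averages the gradients so every replica applies the identical update. The sketch below is schematic (it is not IBM's DDL API; the function names are illustrative, and a plain sum stands in for a ring all-reduce).

```python
# Schematic data-parallel training step; not the PowerAI DDL API.
import numpy as np

def allreduce_average(gradients):
    """Average per-worker gradient arrays (stand-in for a ring all-reduce)."""
    return sum(gradients) / len(gradients)

def training_step(weights, per_worker_grads, lr=0.1):
    """Apply one synchronized SGD update, identical on every replica."""
    avg_grad = allreduce_average(per_worker_grads)
    return weights - lr * avg_grad
```

Scaling is "almost" linear rather than linear because the all-reduce adds communication cost per step; efficient collectives over fast interconnects such as NVLink keep that overhead small relative to the per-GPU compute.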
S8813 - Design Intelligence Providing the industry's reference CAD tools with CATIA and SOLIDWORKS, Dassault Systèmes has invested heavily in 3DEXPERIENCE, a collaborative platform covering enterprise value streams from concept ideation, design, simulation, manufacturing, and project management to marketing. We'll present the latest improvements achieved by EXALEAD, the core search engine technology of the 3DEXPERIENCE platform, to enable cognitive augmented design, taking into account the exhaustive digital knowledge acquired during the design process, multi-physics simulation studies, and the user experience. The 3DEXPERIENCE platform, instantly accessible to the different company departments, partners, and customers, will drastically reduce product conception time. Instead of improving conventional design methods, the 3DEXPERIENCE platform aims to change the paradigm from an iterative design/simulation/validation process automated through templates to a cognitive design process with proactive design proposals leveraging the company's historical know-how and expertise. 50-minute Talk Arnaud Nonclercq - EXALEAD R&D Applications Director, Dassault Systèmes
Morgan Zimmerman - EXALEAD CEO, Dassault Systèmes
S8866 - Deep Learning Brings Disruptive Changes to Ophthalmology

Hear about how GPU technology is disrupting the way your eye doctor works and how ophthalmic research is performed today. The rise of electronic medical records has created mountains of big data, particularly in ophthalmology, where many discrete quantitative clinical elements like visual acuity can be tied to rich imaging datasets. In this session, we will explore the transformative role GPU acceleration has played in accelerating clinical research and show real-life examples of deep learning applications in ophthalmology that create new steps forward in automated diagnosis, image segmentation, and computer-aided diagnosis.

25-minute Talk Aaron Lee - Assistant Professor, University of Washington
S81006 - Volta: Architecture and Performance Optimization This talk will review the Volta GPU architecture and related guidance for optimizing the performance of compute applications. Details will include the memory hierarchy, execution pipelines, and some new additions to the programming model. 50-minute Talk Guillaume Thomas Collignon - Developer Technology Engineer, NVIDIA
Paulius Micikevicius - Compute Architecture, NVIDIA
S8567 - Novel Deep Neural Networks for Seismic Interpretation: Bridging the Gap between Geophysicists and AI This talk explores a robust algorithm to detect subsurface fault architecture from seismic images covering large areas with complex geologic varieties. By incorporating deep learning models in geophysics, the time to evaluate subsurface fault architecture has been reduced from years to months. Optimal selections of optimization methods, learning parameters, and model hyperparameters are discussed. In addition, extensive experience is presented specifically for seismic images, including training dataset preparation, image augmentation techniques, color scale effects, transfer learning, generation of high-fidelity 3D probability cubes, and model self-adjustment capabilities. Finally, a Gulf of Mexico offshore asset is selected as a showcase to illustrate the robustness of the developed algorithm in detecting very large counts of deep, shallow, and subtle faults that are too time-intensive for asset geophysicists to interpret manually. All in all, the algorithm has proven remarkably effective at rapidly building high-fidelity fault architectures, and has provided invaluable data-driven decision-making capability to E&P oil companies' daily business. 25-minute Talk Ping Lu - Senior Data Scientist, Anadarko Petroleum Corporation
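The image augmentation step mentioned in the abstract can be sketched minimally. This is illustrative, not Anadarko's pipeline: geometric transforms such as flips and transposes multiply each labeled seismic patch into several training samples, with the same transform applied to the fault-label mask so image and label stay aligned.

```python
# Minimal image-augmentation sketch for paired (image, label) patches;
# illustrative only, not the speaker's actual pipeline.
import numpy as np

def augment_patch(image, label):
    """Yield (image, label) variants; each transform is applied to both arrays."""
    transforms = (
        lambda a: a,        # identity (original patch)
        np.fliplr,          # mirror across the trace axis
        np.flipud,          # mirror across the depth axis
        np.transpose,       # swap axes (valid for square patches)
    )
    for transform in transforms:
        yield transform(image), transform(label)
```

Applying the transform jointly to image and label is the essential detail: a fault mask flipped independently of its image would teach the network wrong fault positions.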