GTC 2018 Silicon Valley
Browse and search for sessions, and click "Add to Schedule" to save sessions to your agenda.

Note: sessions are first come, first served on the day of the conference. Arrive at the room early for high-priority sessions.

Sign-up is required for Conference + Training pass holders to reserve seats in Instructor-Led Labs.

TDLIW04 - Pre-GTC DLI Workshop: Fundamentals of Accelerated Computing with CUDA C/C++

Prerequisite: None

Duration: 8 hours

Format: Self-paced online or instructor-led

Languages: English

The CUDA computing platform enables CPU-only applications to be accelerated on the world's fastest massively parallel GPUs. Experience C/C++ application acceleration by:

  • Accelerating CPU-only applications by exploiting their latent parallelism on GPUs
  • Utilizing essential CUDA memory management techniques to optimize accelerated applications
  • Exposing accelerated application potential for concurrency and exploiting it with CUDA streams
  • Leveraging command line and visual profiling to guide and check your work

Upon completion of this workshop, you'll be able to accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques. You’ll understand an iterative style of CUDA development that will allow you to ship accelerated applications fast.
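As a flavor of the techniques listed above, here is a minimal, hypothetical sketch (not workshop material) of one of the topics: using CUDA streams and pinned host memory to overlap host-device transfers with kernel execution. All names and sizes are illustrative.

    // Sketch: overlap copies and compute with CUDA streams. Error checks omitted.
    #include <cuda_runtime.h>

    __global__ void scale(float *x, int n, float a) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= a;
    }

    int main() {
        const int N = 1 << 22, CHUNKS = 4, CHUNK = N / CHUNKS;
        float *h, *d;
        cudaMallocHost((void **)&h, N * sizeof(float)); // pinned memory enables async copies
        cudaMalloc((void **)&d, N * sizeof(float));
        cudaStream_t streams[CHUNKS];
        for (int c = 0; c < CHUNKS; ++c) cudaStreamCreate(&streams[c]);

        for (int c = 0; c < CHUNKS; ++c) {
            size_t off = (size_t)c * CHUNK;
            cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                            cudaMemcpyHostToDevice, streams[c]);
            scale<<<(CHUNK + 255) / 256, 256, 0, streams[c]>>>(d + off, CHUNK, 2.0f);
            cudaMemcpyAsync(h + off, d + off, CHUNK * sizeof(float),
                            cudaMemcpyDeviceToHost, streams[c]);
        }
        cudaDeviceSynchronize();

        for (int c = 0; c < CHUNKS; ++c) cudaStreamDestroy(streams[c]);
        cudaFree(d); cudaFreeHost(h);
        return 0;
    }

Because each chunk uses its own stream and pinned host memory, one chunk's transfers can proceed while another chunk's kernel runs.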

See GTC Pricing for more information.

8-hour Pre-GTC DLI Workshop
SE0000 - Welcome Reception

At this reception, meet NVIDIA staff and other GTC alumni to get tips, especially if you're a first-timer.

2-hour Special Event
SE0002 - Dinner with Strangers (Sun)

Join a random group of GTC attendees for enlightening conversations over a self-hosted dinner in great restaurants nearby. Less creepy than it sounds, this is one of the more popular programs at GTC.

Sign up in Main Lobby.

2-hour Special Event
CE8164 - Connect with the Experts: CUDA-based Raytracing and Rendering

We will answer your questions on the design and implementation of renderers based on raytracing using CUDA, and discuss how to get the best performance out of NVIDIA hardware in your renderer. 

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject. 

1 Hour Connect with the Experts Carsten Waechter - Ray Tracing Software Architect, NVIDIA
Pascal Gautron - Senior Developer Technology Engineer, NVIDIA
L8111A - Jetson Developer Tools Training Labs

This lab focuses on teaching you how to maximize your productivity when developing software for the Jetson platform. You will experience firsthand how to manage source code on the host PC to cross-compile software, and how to initiate remote debugging sessions to debug CPU C/C++ and CUDA C code. Through a comprehensive set of exercises, you will also learn how to use the CUDA Visual Profiler to optimize CUDA kernels, the Tegra System Profiler to optimize CPU code and trace multi-process, system-wide activities, and the Tegra Graphics Debugger to debug and profile 3D graphics applications. Prerequisites: Basic CUDA C and C++ coding skills.

120 Minutes Instructor-Led Lab Sebastien Domine - VP SW Eng. - Developer Tools, NVIDIA
L8167 - Image Creation using Generative Adversarial Networks using TensorFlow and DIGITS

This lab will guide you through the process of training a Generative Adversarial Network (GAN) to generate image contents in DIGITS. You'll learn how to:

  • Use Generative Adversarial Networks (GANs) to create handwritten numbers
  • Visualize the feature space and use attribute vectors to generate image analogies
  • Train a GAN to generate images with set attributes

Upon completion, you'll be able to use GANs to generate images by manipulating feature space.

Prerequisites: Fundamentals of Deep Learning with Computer Vision or similar experience

120 Minutes Instructor-Led Lab Jonathan Bentz, NVIDIA
S8225 - Sharing Physically Based Materials Between Renderers with MDL

We'll discuss the basics of NVIDIA's Material Definition Language, showing how a single material can be used to define matching appearances between different renderers and rendering techniques. End users will learn how physically based materials can be defined, while developers will learn what's entailed in supporting MDL within their own products or renderers.

50-minute Talk Jan Jordan - Software Product Manager MDL, NVIDIA
Lutz Kettner - Director, Rendering Software and Material Definition, NVIDIA
S8236 - Singularity: Reproducible, Trusted Containers for Scientific Computing

Singularity is a container technology widely supported by HPC centers and service providers because it facilitates extreme mobility of compute via verifiable, trusted containers. This talk covers a high-level view of container computing, an introduction to Singularity, a description of the Singularity Image Format (SIF), and technical recipes and usage examples with GPUs. After attending this talk, you will have a strong understanding of containerization and how to leverage this technology to create extremely reproducible workflows.

50-minute Talk Gregory Kurtzer - CTO, SyLabs
S8286 - Quick and Easy DL Workflow Proof of Concept

Spin up a deep learning (DL) proof-of-concept on a budget. We'll walk you through a DL workflow in the cloud leveraging DIGITS, then download a trained model, and run inference on a Jetson TX2. This session considers multiple options such as Nimbix, AMI, and NGC on Tesla P100, Tesla V100, and NVIDIA DGX-1 servers. This tutorial will be a combination of lecture, live demos, and detailed instructions.

50-minute Talk Jeff Weiss - Director, Solution Architects, NVIDIA
Alec Gunny - Solutions Architect, NVIDIA
Ken Hester - Solution Architect, NVIDIA
S8382 - Zero to GPU Hero with OpenACC

GPUs are often the fastest way to obtain your scientific results, but many students and domain scientists don't know how to get started. In this tutorial we will take an application from simple, serial loops to a fully GPU-enabled application. Students will learn a profile-guided approach to accelerating applications, including how to find hotspots, how to use OpenACC to accelerate important regions of code, and how to get the best performance they can on GPUs. No prior experience in GPU programming or OpenACC is required, but experience with C, C++, or Fortran is a must. Several books will be given away to attendees who complete this tutorial.

80 Minutes Tutorial Jeff Larkin - Senior DevTech Software Engineer, NVIDIA
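A minimal, hypothetical sketch of the profile-guided approach S8382 describes: once a hotspot loop is identified, a single OpenACC directive offloads it to the GPU. The function and array names are illustrative, not tutorial material.

    // Hypothetical hotspot loop offloaded with one OpenACC directive.
    // Build with an OpenACC compiler, e.g. nvc -acc or pgcc -acc.
    void relax(const float *in, float *out, int n) {
        // Data clauses move the arrays to the GPU; the compiler generates the kernel.
        #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
        for (int i = 1; i < n - 1; ++i)
            out[i] = 0.5f * (in[i - 1] + in[i + 1]);
    }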
S8483 - Empowering CUDA Developers with Virtual Desktops

You've just been tasked with deploying the NVIDIA CUDA Toolkit to a group of developers. Wouldn't it be great if you could save time deploying it, protect the developers' work, reduce the amount of unique workstation hardware needed, and get more out of your hardware investment? This session will show how this can be done with VMware Horizon virtual desktops leveraging vGPUs and the CUDA Toolkit. The CUDA Toolkit is a core component of most developers' desktops and provides the underpinnings for many development operations that take advantage of GPU technology. It can be, and often is, difficult to install on virtual machines. We will walk through its deployment on Linux virtual machines, ensuring requirements for both the CUDA Toolkit and VMware Horizon with vGPU are met.

50-minute Talk Tony Foster - Sr. Advisor, Technical Marketing - Ready Bundles for HPC, Dell EMC
S8587 - Recent Progress in Accelerating Monte Carlo Simulation on GPU for Pricing and Risk Management of Financial Instruments

Learn about recent progress in accelerating Monte Carlo simulation on the GPU in applications for pricing financial instruments and risk management. We'll focus on forward Monte Carlo simulation, which allows for a natural parallelization across CUDA cores, and present a recent extension of our implementation to a broad selection of industry-standard valuation models for different asset classes, including hybrid models that can be used to price multi-currency and multi-asset portfolios. Even with increasing complexity and dimensionality of valuation models, our benchmarks show stable GPU speedup factors in the range of 20x and 30x for calculations in double precision (FP64) and single precision (FP32), respectively. We'll also briefly summarize a recent research project on the more complex backward (American/Least Squares) Monte Carlo simulation method, based on regression algorithms and used to price general financial instruments with optionality. The latter method relies heavily on matrix calculations and benefits from GPU-accelerated libraries: cuBLAS for linear algebra and cuSOLVER for solvers.
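As a rough illustration of the forward-simulation parallelism mentioned above (one simulated path per CUDA thread), here is a hedged kernel sketch using cuRAND with a simple geometric Brownian motion model. The model and all parameter names are assumptions made for illustration, not the speakers' valuation models.

    // Sketch: one path per thread, discounted European call payoff per path.
    #include <curand_kernel.h>

    __global__ void mc_paths(float *payoff, int nPaths, int nSteps,
                             float S0, float K, float r, float sigma, float dt,
                             unsigned long long seed) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid >= nPaths) return;

        curandState state;
        curand_init(seed, tid, 0, &state);      // independent random stream per path

        float S = S0;
        for (int s = 0; s < nSteps; ++s) {
            float z = curand_normal(&state);    // standard normal draw
            S *= expf((r - 0.5f * sigma * sigma) * dt + sigma * sqrtf(dt) * z);
        }
        payoff[tid] = expf(-r * nSteps * dt) * fmaxf(S - K, 0.0f);
    }

The per-path payoffs would then be averaged (for example with a reduction) to obtain the price estimate.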

25-minute Talk Serguei Issakov - Global Head of Quantitative Research and Development, Senior Vice President, Numerix
S8596 - Overcoming Missing Modalities in Remote Sensing

Recent advances in earth observation are opening up an exciting new area for the exploration of satellite image data. We'll teach you how to analyze this new data source with deep neural networks. Focusing on emergency response, you will learn how to apply deep neural networks for semantic segmentation on satellite imagery. We will specifically focus on multimodal segmentation and the challenge of overcoming missing modality information at inference time. It is assumed that registrants are already familiar with the fundamentals of deep neural networks.

25-minute Talk Damian Borth - Director, German Research Center for Artificial Intelligence (DFKI)
Benjamin Bischke - PhD Candidate, German Research Center for Artificial Intelligence (DFKI)
S8660 - A Deep Neural Network for Estimating Depth from Stereo

We present a deep neural network architecture for estimating 3D depth from stereo images. The network is modeled after computer vision stereo matching pipelines to simplify the training process. Our loss function consists of a photometric loss term and Lidar-based loss terms. This combination makes it possible to train our DNN in a supervised, semi-supervised, or completely unsupervised way. Our DNN produces depth maps with accuracy similar to Lidar-based depth. We also compare our stereo DNN architecture to other stereo architectures as well as to a monocular depth DNN architecture, and demonstrate qualitative and quantitative test results.

50-minute Talk Nikolai Smolyanskiy - Principal Deep Learning and Computer Vision Engineer, NVIDIA
Alexey Kamenev - Senior Deep Learning and Computer Vision Engineer, NVIDIA
S8666 - Deploying Autonomous Vehicles with NVIDIA DRIVE

DRIVE PX is an open platform for the autonomous driving ecosystem. It has been adopted by over 300 partners in the automotive ecosystem to develop solutions for vehicles that are intelligent and autonomous. This talk will outline the technical challenges facing the development of autonomous intelligent vehicles and detail how the next generation of DRIVE AI car computers, DRIVE Xavier and DRIVE Pegasus, addresses these challenges.

50-minute Talk Shri Sundaram - Senior Product Manager - DRIVE PX 2, NVIDIA
S8704 - NVIDIA IndeX - Advanced Large-Scale Data Visualizations on the NVIDIA GPU Cloud (NGC)

NVIDIA IndeX incorporates NVIDIA's hardware and software technology to enable interactive, high-quality 3D visual exploration and real-time evaluation of computed and simulated large-scale data for a wide range of scientific fields. NVIDIA IndeX is deployed on DGX technology and can be made available as a container in the cloud, such as on AWS or NGC. With NVIDIA IndeX, scientists gain unique insights into 3D data of virtually unlimited size and complexity, and its in-situ solution allows scientists to envision remarkable new data simulation and visualization workflows. We'll present NVIDIA IndeX's CUDA programming interface for implementing novel visualization techniques, illustrate CUDA programs that produce various high-fidelity visualizations, and demonstrate large-scale data visualization on the NVIDIA GPU Cloud based on custom visualization techniques.

25-minute Talk Marc Nienhaus - Sr. Manager Software Engineering, NVIDIA IndeX, NVIDIA
Alexander Kuhn - Senior Software Engineer, NVIDIA
Christopher Lux - Senior Software Engineer, NVIDIA
S8727 - Improving NAMD Performance on Volta GPUs

In 2007, NAMD was the first full-featured production molecular dynamics software to use CUDA for accelerating its costliest computations. We'll describe our latest efforts, techniques, and results in our quest to optimize NAMD to make the best use of the tremendous computational capabilities of state-of-the-art Volta GPUs, particularly in new dense node configurations such as the NVIDIA DGX and ORNL Summit systems that feature NVLink-connected GPUs. In existence now for over 20 years, NAMD is a sophisticated parallel molecular dynamics program whose development has emphasized parallel scalability to support large-size and long-timescale biomolecular simulations running on petascale supercomputers. As GPU technology has evolved, NAMD has benefited from moving greater amounts of work to the GPU. NVIDIA's release of Volta has now shifted the balance almost entirely to the GPU, with the small remaining CPU calculations often posing bottlenecks to NAMD's performance. Attendees will learn optimization strategies and pitfalls for achieving higher performance as Amdahl's Law poses an ever-increasing challenge for mature GPU-accelerated codes like NAMD.

50-minute Talk David Hardy - Research Programmer, University of Illinois at Urbana-Champaign
Ke Li - HPC Developer Technology Engineer, NVIDIA
John Stone - Senior Research Programmer, University of Illinois at Urbana-Champaign
S8782 - A Cross-Field VR Case Study to Treat Children with Autism Spectrum Disorder

We built a contextualized learning system with realistic interaction for medical education. The system integrates virtual reality (VR) with the knowledge of occupational therapy, especially for autistic children. It supports a variety of scenes to facilitate training of children's confidence, adaptability, and social ability. With our system, training content is no longer limited to the traditional treatment room, and therapists and children can save setup time and focus on immersive training.

25-minute Talk Huai-Sheng Huang - Assistant Professor, Fu Jen Catholic University - Department of Information Management
S8873 - GBM Inferencing on GPU

We'll present a novel GPU implementation for batched GBM inferencing, along with a detailed performance comparison of our implementation against state-of-the-art libraries such as XGBoost and Treelite. We'll then compare inference performance on various real-world datasets.

50-minute Talk Shankara Rao Thejaswi Nanditale - Compute DevTech Engineer, NVIDIA
Vinay Deshpande - Compute DevTech Engineer, NVIDIA
S8979 - An Introduction to CUDA Programming Session 1 of 4 (Presented by Acceleware)

Join us for an informative introduction to CUDA programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple CUDA kernel will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!
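For readers who want a preview of the concepts listed above (kernels, host and device responsibilities, and the thread hierarchy), here is a minimal, illustrative vector-add sketch; it is not the tutorial's demonstration code.

    // Sketch: a simple kernel plus the host-side responsibilities around it.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged((void **)&a, bytes);           // unified memory: visible to CPU and GPU
        cudaMallocManaged((void **)&b, bytes);
        cudaMallocManaged((void **)&c, bytes);
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        int block = 256, grid = (n + block - 1) / block; // enough blocks to cover n elements
        vecAdd<<<grid, block>>>(a, b, c, n);
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);                     // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }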

80 Minutes Tutorial Dan Cyca - Chief Technology Officer, Acceleware
Chris Mason - Technical Product Manager, Acceleware
S8963 - How Will Machine Learning and Artificial Intelligence Change the Practice of Healthcare

This session will give an overview of new methods that leverage machine learning and causal inference to enable reliable individualized decision-making. We will present applications in different areas of healthcare where real-time inference is changing the practice of medicine. The latter also gives rise to new challenges in developing human-machine collaborative systems.

25-minute Talk Suchi Saria - John C. Malone Assistant Professor, Johns Hopkins University
CE8126 - Connect with the Experts: Deep Learning Basics

Attend this session to get your questions on deep learning basics and concepts answered. NVIDIA experts can help you with the fundamentals and provide guidance on how and when to apply Deep Learning and GPUs to your work. No question is too basic to ask.

Connect with Experts are informal sessions where you can ask experts from NVIDIA and other organizations your burning questions about a specific subject.  

1 Hour Connect with the Experts
S81014 - Advancing State-of-the-Art of Autonomous Vehicles and Robotics Research using AWS GPU Instances (Presented by Amazon Web Services)

Toyota Research Institute's (TRI) mission is to improve the quality of human life through advances in artificial intelligence, automated driving, and robotics. Learn more about their research and how they are using AWS EC2 P3 instances, the industry's most powerful GPU instances, in combination with other AWS services to enable autonomous vehicles and robots at scale.

50-minute Talk Chetan Kapoor - Senior Product Manager - EC2, Amazon Web Services
Adrien Gaidon - California, Toyota Research Institute (TRI)
Mike Garrison - Senior Infrastructure Engineer, Toyota Research Institute (TRI)
S8216 - Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image

Learn how to predict a dense depth image from a sparse set of depth measurements and a single RGB image. This approach can serve as a plug-in module in simultaneous localization and mapping to convert sparse maps to dense maps, and as a super-resolution of LiDAR depth data. We'll describe the performance of our prediction method, explain how to train the depth prediction network, and showcase examples of its applications. Code and a video demonstration are also publicly available. This session is for registrants who are already familiar with basic machine learning techniques.

25-minute Talk Fangchang Ma - Ph.D. Candidate, Massachusetts Institute of Technology
S8227 - Integrating the NVIDIA Material Definition Language MDL in Your Application

The NVIDIA MDL SDK provides a rich toolset to integrate MDL in a wide range of renderers, from physically based ray tracing to real-time applications. In this tutorial-like session, we'll show how MDL materials and texturing functions can be compiled for OptiX/CUDA, x86, and OpenGL target platforms. We'll present how the MDL Distiller can be used to simplify MDL materials for use with real-time rendering solutions. Developers will learn about the available APIs and example code.

50-minute Talk Sandra Pappenguth - Senior Software Engineer, NVIDIA
Matthias Raab - Senior Graphics Software Engineer, NVIDIA
S8380 - Image Data Augmentation on GPU: One Method That Does It All

Data augmentation is an effective method to boost your deep learning training performance. There are many ways to perform augmentation, the approaches are not well established, and not all deep learning frameworks support augmentation natively. We present a method of data augmentation based on transformation matrices that perturb both space and color, in a way that is easy to use and understand, framework-agnostic, and fast (it runs on the GPU). This method works especially well for augmentations that need to be applied to both images and labels, typical of object detection and segmentation tasks. Image augmentation is a job that GPUs excel at, and it significantly reduces the load on, and need for, a fast CPU.

25-minute Talk Tim Zaman - Software Engineer, NVIDIA
S8504 - Creating Immersive AI-Powered Virtual Reality Simulation Training For Medical Professionals

Experiential learning is among the best ways to practice for pediatric emergencies. However, hospitals are spending millions on expensive and inefficient mannequin-based training that does not consistently offer an authentic experience for medical students and doctors, or offer convenient repeatability. Come hear about a groundbreaking pilot program that brought together a hospital and two unique VR and AI developer teams to deliver virtual reality training simulations for some of the highest-stakes emergencies hospitals see: pediatric trauma. Learn how doctors aided in the design process to create authentic trauma room scenarios; how expert content and simulation developers crafted a VR experience that would have impact in a world where there is no room for error; and why Oculus supported this project with funding and hardware.

25-minute Talk Shauna Heller - President, North America, AiSolve
S8599 - Object Detection Meets Object Tracking

We'll discuss an end-to-end solution that holistically reasons about object detection and multiple object tracking. To effectively track objects for real-time decision making in an autonomous vehicle, we need to extract features from a tracked object in one frame and localize the same object via feature correlation in the next frame. In the proposed approach, the object detection specifies which object to track and where to focus in the previous frame, so that the tracker can localize the same object in the current frame. The holistic model follows the spirit of one-phase object detection (e.g., YOLO, SSD), allowing it to run in real time in a self-driving car.

50-minute Talk Jian Yao - Senior Machine Learning and Data Scientist, NVIDIA
S8608 - A Low-Latency Inference System for Recurrent Neural Networks

We'll present cellular batching, a new way of performing batching on GPUs to accelerate model inference for recurrent neural networks (RNNs). Existing deep learning systems perform batching by collecting a fixed set of input samples and fusing their underlying dataflow graphs together for execution. This approach does not perform well for RNNs with input-dependent dataflow graphs. Cellular batching can significantly improve both the latency and throughput of RNN inference: it performs batching at the granularity of an RNN "cell" (a subgraph with shared weights) and dynamically assembles a batched block for execution as requests join and leave the system. We show that this new way of batching can reduce inference latency by 50 to 90 percent while also increasing throughput by 10 to 200 percent.

50-minute Talk Jinyang Li - Associate Professor, New York University
S8669 - Deep Learning Demystified

What is deep learning? In what fields is it useful? How does it relate to artificial intelligence? We'll discuss deep learning and why this powerful new technology is getting so much attention, learn how deep neural networks are trained to perform tasks with superhuman accuracy, and examine the challenges organizations face in adopting this new approach. We'll also cover some of the best practices, software, hardware, and training resources that many organizations are using to overcome these challenges and deliver breakthrough results.

50-minute Talk Will Ramey - Director, Developer Programs, NVIDIA
S8715 - Reinforcement Learning for Multiplayer Agents at SEED

Over the last couple of years, neural nets have enabled significant breakthroughs in computer vision, voice generation and recognition, translation, and self-driving cars. Neural nets will also be a powerful enabler for future game development. We'll give an overview of the potential of neural nets in game development, as well as provide an in-depth look at how we can use neural nets combined with reinforcement learning for new types of game AI.

50-minute Talk Magnus Nordin - Technical Director, Electronic Arts / SEED
S8750 - Porting VASP to GPUs with OpenACC

VASP is a software package for atomic-scale materials modeling. It's one of the most widely used codes for electronic-structure calculations and first-principles molecular dynamics. We'll give an overview and status of porting VASP to GPUs with OpenACC. Parts of VASP were previously ported to CUDA C with good speed-ups on GPUs, but also with an increase in the maintenance workload, as VASP is otherwise written wholly in Fortran. We'll discuss OpenACC performance relative to CUDA, the impact of OpenACC on VASP code maintenance, and challenges encountered in the port related to management of aggregate data structures. Finally, we'll discuss possible future solutions for data management that would simplify both new development and maintenance of VASP and similar large production applications on GPUs.

50-minute Talk Markus Wetzstein - HPC DevTech Engineer, NVIDIA
Stefan Maintz - DevTech Engineer, NVIDIA
S8806 - NVIDIA vGPU and Red Hat Virtualization: High End Virtual Workstations

A shared physical graphics processing unit (GPU) exposed to virtual guests as a virtual GPU drastically changes the dynamics of what is possible, from both a technical and a monetary standpoint, in high-end virtual workstations. You are able to run many GPU-based workloads in multiple VMs on one host utilizing NVIDIA Tesla cards. Attendees will learn about vGPU technology, Virtual Function IO (VFIO), and the associated roadmaps.

25-minute Talk Andre Beausoleil - Senior Principal Partner Manager, Red Hat, Inc.
S8895 - A Component-Based AI Engine Platform for Medical Workflow

As deep learning techniques have been applied to the field of healthcare, more and more AI-based medical systems continue to come forth, accompanied by new heterogeneity, complexity, and security risks. In the real world, we've seen this situation lead to demand constraints that hinder AI application development in China's hospitals. First, we'll share our experience building a unified, GPU-accelerated AI engine system that feeds component-based functionality into the existing workflow of clinical routine and medical imaging. Then, we'll demonstrate a pipeline that integrates different types of AI applications (detecting lung cancer, predicting childhood respiratory disease, and estimating bone age) as microservices into medical stations, CDSS, PACS, and HIS systems to support the medical decision-making of local clinicians. On this basis, we'll describe the goal of establishing an open, unified, standardized, and legal cooperation framework to help AI participants enter the Chinese market and build a collaborative ecosystem.

25-minute Talk Xu Chen - Director of AI Research, Winning Health
S8416 - Real-Time Inference of Deep Generative Models on Tegra X1 at Traffic Intersections

Detecting objects, whether they're pedestrians, bicyclists, or other vehicles, at a traffic intersection is essential to ensure efficient traffic flow and the safety of all participants. We'll present an experiment to assess training and real-time inference on an NVIDIA Tegra X1 SoC module with a suite of GigE Flea3 Point Grey cameras installed on a vehicle. The system is to be trained using a subset of data collected from different types of busy intersections on a university campus, and testing is to be done on the remaining data. We'll use a deep generative model that can learn and reconstruct the traffic scene. We'll share our CUDA optimization strategies on the Tegra X1 and the real-time performance of the inference model.

25-minute Talk Menna El-Shaer - Doctoral Student/Researcher, The Ohio State University
S8425 - Deep Learning for Surface Reconstruction

We'll present a deep learning algorithm for reconstructing surfaces from massive point data. The network consists of multiple layers that organize neurons into optimal neighborhood representations. The implementation slices the standard self-organizing map (SOM) network in half to form multiple layers. The Z-axis distance is omitted from the neighborhood-distance computation when updating the weighted neurons, to avoid surface-point discontinuities caused by layer depth; in this scenario, the distance determining the winning node is computed in 2D from four directions. As the number of layers increases, the computational complexity rises, and the required processing power increases as well. We therefore use CUDA to update the weights and the distance of the winning node: reduction techniques obtain the smallest distance for the winning node, and in the weight-update process each thread is assigned several nodes and calculates the distance between the winning node and the current node. Two parts are involved in designing and developing the algorithms: point reduction and point optimization for surface reconstruction.
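A hedged sketch of the reduction step described above, finding the winning node with the smallest distance within a block; the names and the two-pass structure are illustrative assumptions, not the authors' implementation.

    // Sketch: block-level argmin reduction over per-node distances.
    // Launch: argminDistance<<<grid, block, block * (sizeof(float) + sizeof(int))>>>(...)
    __global__ void argminDistance(const float *dist, int n,
                                   float *blockMin, int *blockArg) {
        extern __shared__ float sVal[];               // shared distances
        int *sIdx = (int *)&sVal[blockDim.x];         // shared indices follow the floats

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        sVal[threadIdx.x] = (i < n) ? dist[i] : 3.4e38f;  // pad out-of-range with large value
        sIdx[threadIdx.x] = i;
        __syncthreads();

        for (int s = blockDim.x / 2; s > 0; s >>= 1) {    // tree reduction in shared memory
            if (threadIdx.x < s && sVal[threadIdx.x + s] < sVal[threadIdx.x]) {
                sVal[threadIdx.x] = sVal[threadIdx.x + s];
                sIdx[threadIdx.x] = sIdx[threadIdx.x + s];
            }
            __syncthreads();
        }
        if (threadIdx.x == 0) {                           // one candidate per block;
            blockMin[blockIdx.x] = sVal[0];               // a second pass (or host loop)
            blockArg[blockIdx.x] = sIdx[0];               // picks the global winner
        }
    }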

50-minute Talk Siti Mariyam Shamsuddin - Director, UTM Big Data Centre, Universiti Teknologi Malaysia
Shafaatunnur Hasan - Senior Lecturer and GPU Principal Research, UTM Big Data Centre, Universiti Teknologi Malaysia
S8476 - Accelerating Graph Algorithms for Government and Industry

We'll discuss our efforts on accelerating large-scale graph algorithms in the context of projects funded by various government agencies. Graph methods are key kernels for large-scale data analytics, as well as for several exascale application domains, including smart grids, computational biology, computational chemistry, and climate science. We'll present our latest results on distributed implementations of graph kernels, such as community detection and B-matching, employing GPUs and accelerators, showing how we can tackle large-scale problems with heterogeneous supercomputers. Based on our experience and results optimizing these algorithms for high performance computing platforms, we'll then discuss new requirements, upcoming opportunities, and potential solutions for next-generation, high-performance, integrated graph toolkits.

50-minute Talk Antonino Tumeo - Senior Research Scientist, Pacific Northwest National Laboratory
Mahantesh Halappanavar - Senior Research Scientist, Pacific Northwest National Laboratory
S8528 - Accelerating Bioinformatics: End-to-End Computation of NASA GeneLab Data with GPU Data Frame

Protecting crew health is a critical concern for NASA in preparation for long-duration, deep-space missions such as a mission to Mars. Spaceflight is known to affect immune cells: splenic B-cells decrease during spaceflight and in ground-based physiological models. The key technical innovation presented in our work is end-to-end computation on the GPU with the GPU Data Frame (GDF), running on the DGX Station, to accelerate the integration of immunoglobulin gene segments, junctional regions, and modifications that contribute to cellular specificity and diversity. Study results are applicable to understanding processes that induce immunosuppression, such as cancer therapy, AIDS, and stressful environments here on Earth.

25-minute Talk Venkat Krishnamurthy - Head, Product Management, MapD Technologies
Jacci Cenci - Solutions Architect (Partner SA), NVIDIA
S8580 - Modernizing OpenMP for an Accelerated World

OpenMP has come a long way in its first 20 years, but the last few have brought by far the most change. With accelerated computing on the rise, OpenMP integrated features to address distributed memory devices and offloading to accelerators. Now, as we prepare for the next generation of supercomputers and GPUs, OpenMP is growing to meet the challenges of productively programming scientific applications in a world of accelerators, unified memory, and explicitly hierarchical memories. This talk will discuss the present and future of OpenMP as we ramp up to version 5.0, presenting some of the new features incorporated so far, how they are shaped by large-scale scientific applications, and how they in turn shape those applications.

25-minute Talk Bronis de Supinski - Chief Technology Officer for Livermore Computing, Lawrence Livermore National Laboratory
Tom Scogland - Computer Scientist, Lawrence Livermore National Laboratory
S8638 - Make Yield Curve Construction More Intelligent with GPU

The yield curve provides information on bond returns across maturities and reflects extremely complex market interactions and monetary policy. Yield curve construction models, such as the spline-fitting model, use a number of bond sample points and model parameters to deduce the yield curve. Construction involves repeated experiments in choosing appropriate bond samples, which relies heavily on manual operation. Due to the amount of relevant information and the rapid growth of transaction data, this task becomes even more challenging. Some literature shows that deep learning can detect and exploit interactions in the data that are invisible to existing financial economic theory. By discovering latent patterns in historical data, it can be a good supplement for choosing active samples and assessing the curve's quality. In financial applications, accuracy and speed are both of critical importance, so the GPU is applied to both the deep learning framework and yield curve construction. Intelligent, fast, and accurate, our yield curve construction framework achieves a 5x speedup versus manual operation and provides a feasible path for future practice.

25-minute Talk Joe Zhang - Project Manager, Shanghai Clearing House
S8976 - Create Customer Value with Google Cloud AI (Presented by Google)

In this session, you will learn how Google Cloud helps enterprises make the most out of data, and deliver customer value. We will provide an in-depth overview of the Cloud AI and Data Analytics offering that helps enterprises manage their ML lifecycle, from data ingestion to insights and prediction. We will also demonstrate some breakthrough solutions, like AutoML, that are making ML accessible to everyone.

50-minute Talk Chris Kleban - Product Manager, GPUs on Google Cloud, Google Inc.
S8980 - An Introduction to the GPU Memory Model - Session 2 of 4 (Presented by Acceleware)

Explore the memory model of the GPU! This session will begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. Features such as shared memory configurations and Read-Only Data Cache are introduced and optimization techniques discussed. A programming demonstration of shared and constant memory will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!
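As a preview of two of the memory spaces the session defines, here is a small, illustrative kernel that keeps filter coefficients in constant memory and stages reused neighbors in shared memory. It is a sketch under assumed names, not session material.

    // Sketch: constant memory for small read-only coefficients,
    // shared memory for data reused by neighboring threads in a block.
    __constant__ float coeff[3];                          // cached and broadcast to all threads

    __global__ void stencil(const float *in, float *out, int n) {
        extern __shared__ float tile[];                   // blockDim.x + 2 floats (tile plus halo)
        int g = blockIdx.x * blockDim.x + threadIdx.x;    // global index
        int l = threadIdx.x + 1;                          // local index (skip left halo cell)

        if (g < n) tile[l] = in[g];
        if (threadIdx.x == 0 && g > 0)                    tile[0] = in[g - 1];
        if (threadIdx.x == blockDim.x - 1 && g < n - 1)   tile[l + 1] = in[g + 1];
        __syncthreads();

        if (g > 0 && g < n - 1)
            out[g] = coeff[0] * tile[l - 1] + coeff[1] * tile[l] + coeff[2] * tile[l + 1];
    }
    // Host side: cudaMemcpyToSymbol(coeff, h_coeff, sizeof(h_coeff));
    // launch with dynamic shared memory of (blockDim.x + 2) * sizeof(float).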

80 Minutes Tutorial Dan Cyca - Chief Technology Officer, Acceleware
Chris Mason - Technical Product Manager, Acceleware
S8273 - Programming GPU Supercomputers Ten Years From Now

We'll briefly review how programming for GPU computing has progressed over the past ten years, and where it is going over the next ten years, specifically for data management and parallel compute management. CUDA languages expose all aspects of data and compute management, allowing and sometimes requiring programmers to take control of both. Libraries typically internalize all compute management, and some internalize all data management as well. Directives virtualize both data and compute management, but don't completely hide either. Future hardware and software capabilities will allow programs to enjoy automatic data movement between DDR memory and GPU device memory, and enhanced caching hardware reduces the need for explicit scratchpad memory programming. As parallel constructs are added to standard programming languages, writing parallel programs for GPU computing will become no more or less difficult than multicore programming.

25-minute Talk Michael Wolfe - Compiler Engineer, NVIDIA / PGI
S8314 - Multi GPU Programming with MPI

Learn how to program multi-GPU systems or GPU clusters using the message passing interface (MPI) and OpenACC or NVIDIA CUDA. We'll start with a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Then we'll cover advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. We'll also cover the latest improvements with CUDA-aware MPI, interaction with unified memory, the multi-process service (MPS, aka Hyper-Q for MPI), and MPI support in NVIDIA performance analysis tools.

50-minute Talk Jiri Kraus - Senior Devtech Compute, NVIDIA
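A minimal sketch of the CUDA-aware MPI idea covered in S8314 above: with a CUDA-aware MPI build, device pointers can be passed directly to MPI calls, avoiding explicit host staging. The device selection and buffer size are illustrative assumptions.

    // Sketch: rank 0 sends a device buffer directly to rank 1 (CUDA-aware MPI).
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 1 << 20;
        float *d_buf;
        cudaSetDevice(rank % 4);                  // assumes up to 4 GPUs per node
        cudaMalloc((void **)&d_buf, n * sizeof(float));

        if (size >= 2) {
            if (rank == 0)
                MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);      // device pointer
            else if (rank == 1)
                MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }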
S8524 - Self-Driving Everything

Data fuels so much of our lives. It accelerates our conversations, our decisions, our very ideas. And in the physical world, data is already acting as an accelerant to how we take these ideas and make them real. As the things we make become increasingly connected, our world becomes increasingly computable. And anything that becomes easily computable becomes equally mutable. What does this mean for the world we live in? As we let go of our design tools and hand more control to intelligent algorithms, we'll see this reflected in the real world: the world of self-driving everything.

25-minute Talk Radha Mistry - Story Strategist, Autodesk
S8532 - Cascaded 3D Fully Convolutional Networks for Medical Image Segmentation

We'll show how recent advances in 3D fully convolutional networks (FCNs) have made it feasible to produce dense voxel-wise predictions of volumetric images. FCNs can be trained to automatically segment 3D medical images, such as computed tomography (CT) scans, based on manually annotated anatomies like organs and vessels. The presented methods achieve competitive segmentation results in a clinical setting while avoiding the need for handcrafted features or class-specific models. We'll explain a two-stage, coarse-to-fine approach that first uses a 3D FCN based on the 3D U-Net architecture to roughly define a candidate region. This candidate region then serves as input to a second 3D FCN that makes a finer prediction. This cascaded approach reduces the number of voxels the second FCN has to classify to around 10 percent of the original 3D medical image, and therefore allows it to focus on more detailed segmentation of the organs and vessels. Our experiments illustrate the promise and robustness of current 3D FCN-based semantic segmentation of medical images, achieving state-of-the-art results on many datasets. Code and trained models will be made available.

25-minute Talk Holger Roth - Assistant Professor (Research), Nagoya University
S8561 - "Free" In Situ Volume Compression Using NVENC Scientific simulations typically store just a small fraction of their computed timesteps--as few as one in 500--due to I/O and storage limitations. Previous work has demonstrated in situ software-based compression, but at the cost of compute cycles that simulation scientists are loath to sacrifice. We propose the use of the special-purpose video processing unit (VPU), currently unutilized in the HPC context, for inexpensive lossy encoding. Our work demonstrates that video encoding quality is suitable for volumes and recommends encoder settings. We'll show that data can be encoded in parallel with a hybrid CPU/GPU computation with minimal impact on run time. We'll also demonstrate that decoding is fast enough for on-the-fly decompression during analysis. 25-minute Talk Nick Leaf - Graduate Student Researcher, University of California, Davis
S8601 - NVIDIA GPU Video Technologies and Video Codec SDK: Updates and Roadmap

NVIDIA's Video Codec SDK is a set of APIs for hardware-accelerated video encoding and decoding using NVIDIA GPUs. We'll provide an overview of the APIs, with particular emphasis on the latest features, such as FFmpeg support of NVIDIA-accelerated transcoding and quality and performance enhancements. We'll discuss some strategies for efficient usage of GPU video hardware acceleration for use cases such as video inferencing, transcoding, and media archiving.

50-minute Talk Abhjit Patait - Director, System Software, NVIDIA
S8796 - Deep Neural Network-Based Cooperative Visual Tracking Through Multiple Flying Robots

Human and animal full-body motion capture (MoCap) in outdoor scenarios is a challenging and largely unsolved problem. We'll introduce a solution based on multiple flying robots. MoCap systems like Vicon, OptiTrack, and the 4D Dynamic Body Scanner at MPI-IS Tuebingen achieve high degrees of accuracy in indoor settings. Besides being bulky, they make use of reflected infrared light and heavily rely on precisely calibrated wall- or ceiling-mounted fixed cameras. Consequently, such systems cannot be used to perform MoCap in outdoor scenarios, where changing ambient light conditions persist and permanent fixtures in the environment cannot be made. Our outdoor MoCap solution involves flying robots with on-board cameras, Intel i7 CPUs, NVIDIA Jetson TX1 GPU modules, and a deep learning-based approach.

50-minute Talk Aamir Ahmad - Research Scientist, Max Planck Institute for Intelligent Systems
Eric Price - PhD Student, Max Planck Institute for Intelligent Systems
S8177 - Hornet: An Efficient Data Structure for Dynamic Sparse Graphs and Matrices

We'll present Hornet, formerly known as cuSTINGER, a data structure designed for sparse dynamic graphs and matrices. Hornet scales to massive datasets while supporting very fast updates, over 200 million updates per second on a single Tesla P100 GPU. We'll show that replacing CSR, a popular data structure for sparse data, with Hornet does not change the execution time. We'll also show that the memory utilization of Hornet is within that of CSR and COO, and briefly show performance results of several analytics using Hornet. We'll cover the programming model for Hornet in a separate talk.

25-minute Talk Oded Green - Research Scientist, Georgia Institute of Technology