GTC 2018 Silicon Valley
New sessions added every week. Be sure to check back often for the latest additions.

S8318 - 3D Convolutional Neural Networks (CNNs) with Fast and Memory Efficient Cross-Hair Filters Over the years, state-of-the-art architectures have been built with convolutional layers and have been employed successfully on 2D image processing and classification tasks. This success naturally motivates extending 2D convolutional layers to 3D convolutional layers to handle higher dimensional tasks in the form of video and 3D volume processing. However, this extension comes with an exponential increase in the number of computations and parameters in each convolutional layer. Because of this, 2D convolutional layers are still widely used to handle 3D images, at the cost of discarding 3D context information. In view of this, we'll present a 3D fully convolutional neural network (FCNN) with 2D orthogonal cross-hair filters that makes use of 3D context information while avoiding the exponential scaling described above. By replacing 3D filters with 2D orthogonal cross-hair filters, we achieve over 20% improvement in execution time and a 40% reduction in the overall number of parameters while preserving accuracy. 25-minute Talk Giles Tetteh - Researcher, Technische Universitaet Muenchen (TUM)
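To make the cross-hair idea concrete, here is a minimal sketch (PyTorch assumed; the abstract does not specify a framework): a 3D convolution built from three orthogonal planar filters whose responses are summed, so parameters grow as 3*k^2 rather than k^3. It illustrates the general technique only, not the speaker's FCNN architecture.

```python
# Minimal sketch (PyTorch assumed): a "cross-hair" 3D convolution that replaces
# a full k*k*k filter with three orthogonal planar filters whose responses are
# summed, so parameters scale with 3*k^2 instead of k^3.
import torch
import torch.nn as nn

class CrossHairConv3d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        p = k // 2
        # One planar convolution per orthogonal plane of the volume.
        self.conv_dh = nn.Conv3d(in_ch, out_ch, (k, k, 1), padding=(p, p, 0))
        self.conv_dw = nn.Conv3d(in_ch, out_ch, (k, 1, k), padding=(p, 0, p))
        self.conv_hw = nn.Conv3d(in_ch, out_ch, (1, k, k), padding=(0, p, p))

    def forward(self, x):
        # x: (batch, channels, depth, height, width)
        return self.conv_dh(x) + self.conv_dw(x) + self.conv_hw(x)

x = torch.randn(1, 1, 32, 32, 32)
print(CrossHairConv3d(1, 8)(x).shape)  # torch.Size([1, 8, 32, 32, 32])
```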
S8420 - Accelerated Functional Mapping of the World with NVIDIA GPUs and Deep Learning The functional mapping of man-made facilities from high-resolution remote sensing images provides timely, high-fidelity land-use information and population distribution estimates, which helps federal agencies, non-governmental organizations, and industry operate more efficiently. We'll share our quest to deliver functional maps of the world that include building extraction, human settlement maps, mobile home parks, and facility mapping using a variety of remote sensing imagery. Our research addresses three frontier challenges: distinct characteristics of remote sensing data for deep learning, including the model distribution shifts encountered with remote sensing images, multisensor sources, and multiple data modalities; training very large deep learning models on multi-GPU and multi-node HPC platforms; and large-scale inference using ORNL's Titan and Summit with NVIDIA TensorRT. We'll also talk about developing workflows to minimize I/O inefficiency, doing parallel gradient-descent learning, and managing remote sensing data in an HPC environment. 50-minute Talk Dalton Lunga - Geospatial Research Scientist, Oak Ridge National Laboratory
Christopher Layton - Linux Systems Engineer, Oak Ridge National Laboratory
H. Lexie Yang - Geospatial Data Scientist, Oak Ridge National Laboratory
S8464 - Accelerating Analytical Database Queries Analytical query processors can benefit from GPU execution, with popular database operations accelerated by up to 30 times. However, the necessary data transfer between CPU and GPU memory is still a challenge. We'll highlight different strategies for exchanging data between GPU and CPU, and discuss the implications for overall query execution performance. 25-minute Talk Christian Tinnefeld - Research Manager, SAP
S8476 - Accelerating Graph Algorithms for Government and Industry We'll discuss our efforts regarding the acceleration of large-scale graph algorithms in the context of projects funded by various government agencies. Graph methods are key kernels for large-scale data analytics, as well as for several exascale application domains, including smart grids, computational biology, computational chemistry, and climate science. We'll present our latest results on distributed implementations of graph kernels, such as community detection and B-matching, that employ GPUs and accelerators, showing how we can tackle large-scale problems with heterogeneous supercomputers. On the basis of our experience and results in optimizing these algorithms for high performance computing platforms, we'll then discuss new requirements, upcoming opportunities, and potential solutions for next-generation, high-performance, integrated graph toolkits. 50-minute Talk Antonino Tumeo - Senior Research Scientist, Pacific Northwest National Laboratory
Mahantesh Halappanavar - Senior Research Scientist, Pacific Northwest National Laboratory
S8296 - Accelerating Object Detection with TensorRT for Large-Scale Video Surveillance We'll discuss a detailed scale-up method for accelerating a deep learning-based object detection inference engine with INT8 by using NVIDIA's TensorRT. Converting convolutional neural networks (CNNs) from 32-bit floating-point arithmetic (FP32) to 8-bit integer (INT8) for classification tasks has been researched previously. However, there is little solid work on accelerating CNN-based object detection tasks. We'll explain how to accelerate YOLO-v2, the state-of-the-art CNN-based object detector, with TensorRT using INT8. We improved the YOLO-v2 network for better acceleration and higher accuracy in surveillance settings, and named our network SIDNet. We verified SIDNet on several benchmark object detection and intrusion detection datasets and confirmed that SIDNet with INT8 has only a 1% accuracy drop compared with FP32 mode and is 5x faster than the original YOLO-v2 on an NVIDIA Tesla P40. 25-minute Talk Shounan An - Deep Learning & Machine Vision Engineer, SK Telecom
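As background on the FP32-to-INT8 conversion mentioned above, the following NumPy sketch shows symmetric per-tensor INT8 quantization with a max-abs scale. It is purely illustrative: TensorRT chooses scales through entropy calibration, and this is not the TensorRT API.

```python
# Minimal sketch (NumPy): symmetric per-tensor INT8 quantization. The scale is
# taken from the max absolute value; TensorRT instead calibrates scales with an
# entropy-based method, which is not reproduced here.
import numpy as np

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0                      # map the FP32 range onto [-127, 127]
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(x)
print(np.abs(dequantize(q, s) - x).max())                # small quantization error
```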
S8270 - Acceleration of an LLNL Production Fortran Application on the SIERRA Supercomputer The U.S. Department of Energy's (DOE) stockpile stewardship mission relies heavily on petascale simulations that have traditionally run on homogeneous-architecture supercomputers. The DOE and Lawrence Livermore National Lab's newest computer, SIERRA, which is scheduled to be the second most powerful supercomputer in the nation, is being installed and employs a heterogeneous architecture leveraging both IBM Power9 CPUs and NVIDIA Volta GPUs. This talk presents performance results for Teton, a mission-critical radiative transport application, as it is re-engineered to leverage heterogeneous computing platforms. The data structure and algorithm optimizations necessary to increase thread-level parallelism 1,000 times and achieve GPU, CPU, and network concurrency will also be discussed. 25-minute Talk Aaron Black - Computer Scientist, Lawrence Livermore National Laboratory
S8434 - Acceleration of HPC Applications on Hybrid CPU-GPU Systems: When Can Multi-Process Service Help? When does the Multi-Process Service (MPS) improve performance of HPC codes? We aim to answer this question by exploring the effectiveness of MPS in a number of HPC applications combining distributed and shared memory parallel models. A single complex application typically includes stages with a limited degree of parallelism, where CPU cores are more effective than GPUs, and highly parallelizable stages that can be accelerated by offloading to the GPUs. MPS allows offloading computation from a number of processes to the same GPU, and, as a result, more CPU cores per node can tackle tasks characterized by limited shared memory parallelism. We demonstrate the effectiveness of MPS in large-scale simulations on IBM Minsky and Witherspoon nodes with two multi-core POWER CPUs combined with 4-6 NVIDIA GPUs. 25-minute Talk Olga Pearce - Computer Scientist, Lawrence Livermore National Laboratory
S8352 - Affective Categorization Using Contactless-Based Accelerometers We'll cover the four known methods for emotion detection: vision, speech, sentiment analysis, and wearable technology. We'll provide a quick dive through each presented solution, and then introduce a novel approach aimed for the future of autonomous vehicles. 25-minute Talk Refael Shamir - Founder and CEO, Letos
S8608 - A Low-Latency Inference System for Recurrent Neural Networks We'll present cellular batching, which is a new way of performing batching on GPUs to accelerate model inference for recurrent neural networks (RNNs). Existing deep learning systems perform batching by collecting a fixed set of input samples and fusing their underlying dataflow graphs together for execution. This approach does not perform well for RNNs with input-dependent dataflow graphs. We propose cellular batching, which can significantly improve both the latency and throughput of RNN inference. Cellular batching performs batching at the granularity of an RNN "cell" (a subgraph with shared weights) and dynamically assembles a batched block for execution as requests join and leave the system. We show that this new way of batching can reduce the inference latency by 50 to 90 percent, while also increasing the throughput by 10 to 200 percent. 50-minute Talk Jinyang Li - Associate Professor, New York University
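The scheduling idea behind cellular batching can be sketched in a few lines of plain Python. This is a simplified, single-threaded illustration, not the system presented in the talk: the request format and the dummy `run_cell_batched` function are assumptions made for the example. Requests are batched per time-step at cell granularity and leave as soon as their own sequences end.

```python
# Minimal sketch (plain Python): batching at RNN-"cell" granularity. Every
# request that still has a token at the current time-step is executed in one
# batched cell call; finished requests leave immediately instead of waiting
# for a fixed batch to drain. `run_cell_batched` stands in for one batched
# RNN cell evaluation.
def run_cellular_batching(requests, run_cell_batched):
    """requests: dict of request_id -> list of per-step inputs."""
    states = {rid: None for rid in requests}
    step = 0
    while requests:
        batch_ids = [rid for rid, seq in requests.items() if step < len(seq)]
        inputs = [requests[rid][step] for rid in batch_ids]
        new_states = run_cell_batched(inputs, [states[r] for r in batch_ids])
        for rid, state in zip(batch_ids, new_states):
            states[rid] = state
        for rid in list(requests):                 # retire finished requests now
            if step + 1 >= len(requests[rid]):
                del requests[rid]
        step += 1
    return states

# Toy "cell": accumulate inputs into the hidden state.
cell = lambda xs, sts: [(s or 0) + x for x, s in zip(xs, sts)]
print(run_cellular_batching({"a": [1, 2, 3], "b": [10, 20]}, cell))  # {'a': 6, 'b': 30}
```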
S8656 - Analyzing Sequences of Time Series Security Data with Recurrent Residual Networks Analyzing time series data from security controls for signs of malicious activity is a common challenge in financial networks. We show how one tool, a recurrent residual deep learning (DL) model, can be used to rapidly analyze variable-length time series data to achieve meaningful analysis. Recurrent networks have long been a popular choice in DL for analyzing data with multiple time-steps where the meaning of data at one point in time is dependent upon data at other time-steps. For example, natural language processing solutions frequently utilize recurrent DL models to achieve state-of-the-art results in classification tasks. However, recurrent models are often plagued by issues concerning training difficulty as a function of the model depth. These issues are often exacerbated by the desire to create very deep models for particularly difficult tasks. Utilizing the ResNet concept developed by Microsoft research applied to a recurrent model, we show how models analyzing large sequences can achieve state-of-the-art results with fewer parameters and faster training times. 50-minute Talk Leon DeFrance - VP, Security R&D, US Bank
Ivko Cvejic - Data Scientist, US Bank
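For reference on the "recurrent residual" pattern described in S8656, the sketch below wraps a recurrent layer in a ResNet-style identity shortcut so that deep stacks remain trainable (PyTorch assumed; the layer sizes and use of LayerNorm are illustrative choices, not the speakers' production model).

```python
# Minimal sketch (PyTorch assumed): a recurrent layer wrapped in a ResNet-style
# identity shortcut so that deep stacks remain trainable.
import torch
import torch.nn as nn

class ResidualGRUBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        out, _ = self.gru(x)
        return self.norm(x + out)          # identity shortcut around the recurrence

model = nn.Sequential(*[ResidualGRUBlock(64) for _ in range(8)])
seq = torch.randn(32, 100, 64)             # (batch, time-steps, features)
print(model(seq).shape)                    # torch.Size([32, 100, 64])
```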
S8517 - Applying AI to Simplify Support - Lessons Learnt We'll provide insights into how customer support built on the foundation of AI can help streamline customer support for large enterprises, especially manufacturers. With AI technologies like image recognition and natural language processing maturing, enterprises should strongly consider building an AI-based support platform, especially those with an omni-channel strategy. Delivering an amazing and differentiated user experience will lead to higher net promoter and customer satisfaction scores. By employing AI-based technologies, enterprises can reduce their support contacts and, consequently, their costs. It will also help them sell more replacement parts online. 25-minute Talk Satish Mandalika - CEO & Co-Founder, Drishyam.ai
S8266 - AstroAccelerate - GPU-Accelerated Signal Processing for Next Generation Radio Telescopes AstroAccelerate is a GPU-enabled software package that focuses on enabling real-time processing of time-domain radio-astronomy data. It uses the CUDA programming language for NVIDIA GPUs. The massive computational power of modern day GPUs allows the code to perform algorithms such as de-dispersion, single pulse searching, and Fourier domain acceleration searching in real time on very large datasets, which are comparable to those that will be produced by next-generation radio telescopes such as the Square Kilometre Array. 50-minute Talk Wes Armour - Director, OeRC, Department of Engineering Science, University of Oxford
S8429 - At the Intersection of AI Cities and ANNs Artificial intelligence has promised a lot. Now decades old, it's still hard to tell the theoretical from the practical. Smart cities are a clear example of where AI technology is solving real-world problems today. Public transportation, smart grids, and data-focused city planning are all being pushed forward, but more visionary goals like safety, walkability, and improved citizen experience remain elusive. Intersections have long been as much an opportunity as a challenge, with complicated traffic patterns and cars interacting in multiple directions and at varied velocities. Add pedestrians, and quantifying the complex interactions is only possible through advanced technology. At Motionloft, we've committed to digitizing the physical world. To bring the most meaningful data possible, we realized that we couldn't rely on the marketplace for sensors. Since there is no path to accurate data without artificial neural networks, and no weatherproof computer vision was available, we endeavored to develop our own heuristics using all tools available. This session describes the variety of use cases this has enabled, with a particular focus on traffic intersections. 50-minute Talk Paul McAlpine - VP of Engineering, Motionloft
S8525 - Automated Segmentation of Suspicious Breast Masses from Ultrasound Images Learn how to apply deep learning for detecting and segmenting suspicious breast masses from ultrasound images. Ultrasound images are challenging to work with due to the lack of standardization of image formation. Learn the appropriate data augmentation techniques, which do not violate the physics of ultrasound imaging. Explore the possibilities of using raw ultrasound data to increase performance. Ultrasound images collected from two different commercial machines are used to train an algorithm to segment suspicious breast masses with a mean Dice coefficient of 0.82. The algorithm is shown to perform on par with a conventional seeded algorithm. However, a drastic reduction in computation time is observed, enabling real-time segmentation and detection of breast masses. 25-minute Talk Viksit Kumar - Senior Research Fellow, Mayo Clinic College of Medicine and Science
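The Dice coefficient quoted above (0.82) measures overlap between a predicted mask and the annotation; a minimal NumPy definition is shown below for reference.

```python
# Minimal sketch (NumPy): Dice coefficient between a predicted mask and the
# ground-truth annotation (1.0 means perfect overlap).
import numpy as np

def dice(pred, target, eps=1e-7):
    pred, target = pred.astype(bool), target.astype(bool)
    return 2.0 * np.logical_and(pred, target).sum() / (pred.sum() + target.sum() + eps)

a = np.zeros((128, 128), dtype=np.uint8); a[32:96, 32:96] = 1
b = np.zeros((128, 128), dtype=np.uint8); b[40:104, 32:96] = 1
print(round(dice(a, b), 3))                # 0.875
```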
S8636 - AUTOWARE on DRIVE PX: The Open-Source Self-Driving Platform We'll present a complete open-source software stack for self-driving vehicles, called Autoware, and its open integration with the NVIDIA DRIVE PX platform. Autoware implements working modules of localization and 3D mapping with LiDAR and GNSS, object detection and traffic light recognition with deep learning, path planning with lattice and search methods, and vehicle dynamics control. Compute-intensive tasks of these modules are accelerated by using CUDA, and timing-aware tasks are protected by RTOS capabilities. We'll discuss the impact of CUDA acceleration on self-driving vehicles and its performance evaluation. Learn how Autoware enables any by-wire vehicles to become high-quality self-driving vehicles that can operate in real-world environments. 50-minute Talk Shinpei Kato - CTO, Tier IV, Inc.
S8115 - BigQuery and TensorFlow: Data Warehouse + Machine Learning Enables the "Smart" Query BigQuery is Google's fully managed, petabyte-scale data warehouse. Its user-defined functions realize "smart" queries with the power of machine learning, such as similarity search or recommendation on images or documents with feature vectors and neural network prediction. We'll see how TensorFlow and its GPU-accelerated training environment enable a powerful "data warehouse + machine learning" solution. 25-minute Talk Kaz Sato - Developer Advocate, Google Cloud, Google
S8723 - BMW VR Experience and Design Visualization with Substance BMW Design Visualisation will present how Allegorithmic's Substance software is being used to create material and texture content for their hyper-realistic, real-time VR experiences. The Substance PBR material description format has allowed them to leverage a unique library of materials across all their VR tools and workflows. We'll also cover car staging within an adapted environment, a key step for better VR immersion. 50-minute Talk Gareth Rogers - Head of Design Visualisation, BMW
Pierre Maheut - Product Manager for Architecture and Industrial Design, Allegorithmic
S8698 - Boosting Depth Fusion for Mixed Reality with NVIDIA Quadro GPUs We'll discuss how a headset-mounted depth camera can be used to enable real-time scene reconstruction for immersive mixed reality applications using NVIDIA Quadro GPUs. We'll present and benchmark optimized CUDA kernels that squeeze the last bit of performance from an NVIDIA GP100. Furthermore, we'll show how knowledge of the headset's position and orientation in space can be leveraged to improve and make more robust the reconstruction process. 50-minute Talk Sven Middelberg - Developer Technology Engineer, NVIDIA
S8417 - Breaking the Speed of Interconnect with Compression for Database Applications Learn strategies for efficiently employing various cascaded compression algorithms on the GPU. Many database input fields are amenable to compression since they have repeating or gradually increasing patterns, such as dates and quantities. Fast implementations of decompression algorithms such as RLE-Delta will be presented. By utilizing compression, we can achieve 10 times greater effective read bandwidth than the interconnect allows for raw data transfers. However, I/O bottlenecks still play a big role in the overall performance, and data has to be moved efficiently in and out of the GPU to ensure an optimal decompression rate. After a deep dive into the implementation, we'll show a real-world example of how BlazingDB leverages these compression strategies to accelerate database operations. 50-minute Talk Nikolay Sakharnykh - Sr. Developer Technology Engineer, NVIDIA Corporation
Felipe Aramburu - CTO, Blazing DB
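As a plain illustration of the cascaded RLE + Delta scheme mentioned in S8417 (not BlazingDB's GPU implementation), the following NumPy sketch decodes a column stored as delta values with run lengths; the storage layout is an assumption made for this example.

```python
# Minimal sketch (NumPy): decoding a cascaded Delta + RLE column. Values are
# stored as deltas from the previous value, and each (delta, run_length) pair
# expands into a run of identical values.
import numpy as np

def decode_rle_delta(deltas, run_lengths, first_value):
    values = first_value + np.cumsum(deltas)   # undo delta encoding
    return np.repeat(values, run_lengths)      # undo run-length encoding

# Column [100, 100, 100, 101, 101, 103] encoded as deltas and run lengths:
print(decode_rle_delta(np.array([0, 1, 2]), np.array([3, 2, 1]), 100))
```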
S8130 - Building a GPU-Accelerated Short-Read Aligner for Bisulfite-Treated DNA Sequences It is not always easy to accelerate a complex serial algorithm with CUDA parallelization. A case in point is that of aligning bisulfite-treated DNA (bsDNA) sequences to a reference genome. A simple CUDA adaptation of a CPU-based implementation can improve the speed of this particular kind of sequence alignment, but it's possible to achieve order-of-magnitude improvements in throughput by organizing the implementation so as to ensure that the most compute-intensive parts of the algorithm execute on GPU threads. 25-minute Talk Richard Wilton - Associate Research Scientist, Johns Hopkins University
S8310 - Can FPGAs Compete with GPUs? Previously, FPGAs were known to be highly energy efficient, but notoriously difficult to program, and unsuitable for complex HPC applications. This is changing due to new technology developments: a high-level programming language (OpenCL), hard floating-point units, and tight integration with CPU cores. We'll compare FPGAs and GPUs with respect to architecture, programming model, programming effort, performance, and energy efficiency, using some radio-astronomical signal-processing and imaging algorithms as examples. Can they compete with GPUs? 25-minute Talk John W. Romein - Senior Researcher, ASTRON (Netherlands Institute for Radio Astronomy)
S8458 - Capture Sparsity in DL Applications We'll present a new technique for improving the efficiency of inference and training in deep learning in the presence of sparse workloads. We'll start with a brief overview of applications of sparse linear algebra in engineering and data analysis, then analyze the presence of sparsity in both the training and inference phases of deep learning. To exploit this sparsity, we'll present our method of improving the memory locality of sparse applications. We'll establish lower and upper bounds for sparse matrix operations and the crossover point with dense matrix operations. We'll demonstrate how to minimize memory traffic by tiling matrix operations and making efficient use of the L2 cache, L1 cache, and shared memory (SMEM). We'll conclude with a performance comparison of our method against existing techniques on real pruned weight matrices from GoogLeNet and OpenNMT's multiway translation network. This is joint work by Michael Frumkin, Jeff Pool, and Lung Sheng Chien. 25-minute Talk Michael Frumkin - Sr. Compute Architect, NVIDIA
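As a small, CPU-side illustration of exploiting pruned-weight sparsity (the GPU tiling and shared-memory techniques from the talk are not reproduced here), the sketch below stores a 90%-pruned weight matrix in CSR form and multiplies only its nonzeros (SciPy/NumPy assumed).

```python
# Minimal sketch (SciPy/NumPy): a 90%-pruned weight matrix stored in CSR form so
# that the multiply touches only the surviving nonzero weights.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dense_w = rng.standard_normal((1024, 1024)).astype(np.float32)
dense_w[rng.random((1024, 1024)) < 0.9] = 0.0          # prune ~90% of the weights

w_csr = sparse.csr_matrix(dense_w)                     # keep only the ~10% nonzeros
x = rng.standard_normal((1024, 64)).astype(np.float32)
print(np.allclose(w_csr @ x, dense_w @ x, atol=1e-3))  # same result, sparse storage
```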
S8532 - Cascaded 3D Fully Convolutional Networks for Medical Image Segmentation We'll show how recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. FCNs can be trained to automatically segment 3D medical images, such as computed tomography (CT) scans based on manually annotated anatomies like organs and vessels. The presented methods achieve competitive segmentation results while avoiding the need for handcrafting features or training class-specific models, in a clinical setting. We'll explain a two-stage, coarse-to-fine approach that will first use a 3D FCN based on the 3D U-Net architecture to roughly define a candidate region. This candidate region will then serve as input to a second 3D FCN to do a fine prediction. This cascaded approach reduces the number of voxels the second FCN has to classify to around 10 percent of the original 3D medical image, and therefore allows it to focus on more detailed segmentation of the organs and vessels. Our experiments will illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results on many datasets. Code and trained models will be made available. 25-minute Talk Holger Roth - Assistant Professor (Research), Nagoya University
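The coarse-to-fine cascade described above can be sketched generically as follows (PyTorch assumed; the placeholder networks, 4x downsampling, and 0.5 threshold are illustrative assumptions, not the presented 3D U-Net models): a coarse pass proposes a candidate region, and the fine pass only segments voxels inside its bounding box.

```python
# Minimal sketch (PyTorch assumed): a generic coarse-to-fine cascade with
# stand-in networks. Works on a single-volume, single-channel batch.
import torch
import torch.nn.functional as F

def cascade_segment(volume, coarse_net, fine_net, threshold=0.5):
    # Stage 1: coarse candidate region from a downsampled volume.
    small = F.interpolate(volume, scale_factor=0.25, mode="trilinear", align_corners=False)
    coarse = F.interpolate(torch.sigmoid(coarse_net(small)), size=volume.shape[2:],
                           mode="trilinear", align_corners=False)
    mask = coarse[0, 0] > threshold
    if not mask.any():
        return torch.zeros_like(volume)
    # Stage 2: fine prediction restricted to the candidate bounding box.
    idx = mask.nonzero()
    z0, y0, x0 = idx.min(dim=0).values.tolist()
    z1, y1, x1 = (idx.max(dim=0).values + 1).tolist()
    out = torch.zeros_like(volume)
    crop = volume[:, :, z0:z1, y0:y1, x0:x1]
    out[:, :, z0:z1, y0:y1, x0:x1] = torch.sigmoid(fine_net(crop))
    return out

coarse_net = fine_net = torch.nn.Conv3d(1, 1, 3, padding=1)   # stand-in "networks"
volume = torch.randn(1, 1, 64, 64, 64)
print(cascade_segment(volume, coarse_net, fine_net).shape)
```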
S8433 - Challenges in Real-Time Rendering and Software Design for Interactive Immersive Visualization In the field of virtual engineering and design, countless application scenarios for interactive visualization exist. Huge diversity in the kind of data that needs to be handled -- construction data straight out of CAD solutions, results obtained through structural mechanics simulation or fluid dynamics data -- intersects with an ever increasing number of use cases ranging from engineering reviews, exploratory simulation for digital twin or HybridTwin (TM) up to physically based high-quality rendering. Virtual reality's full potential as an environment for collaboration, communication, and decision taking is enabled today by a complex, heterogeneous hardware landscape with output devices as diverse as HMDs, CAVEs, or even mobile streaming clients. We'll talk about how these challenges have been addressed in the design and implementation of the Helios rendering architecture, which serves as the underlying visualization engine in various ESI products and projects. First, we'll have a closer look at the structure and inner workings of Helios, before we demonstrate the benefits of ESI's Helios visualization system through practical examples. 50-minute Talk Jan Wurster - Solution & Technology Expert, ESI Group
Andreas Dietrich - Senior Software Developer, ESI Group
S8253 - Computational Zoom: A Framework to Manipulate Image Composition in Post-Capture Telling the right story with a picture requires the ability to create the right composition. Two critical parameters controlling composition are the camera position and the focal length of the lens. The traditional paradigm to capture a picture is for a photographer to mentally visualize the desired result, select the capture parameters to produce it, and finally take the photograph, thus committing to a particular composition. To break this paradigm, we introduce computational zoom, a framework that allows a photographer to manipulate several aspects of composition in post-capture. Our approach also defines a multi-perspective camera that can generate compositions that are not attainable with a physical lens. Our framework requires a high-quality estimation of the scene's depth. Existing methods to estimate 3D information generally fail to produce dense maps, or sacrifice depth uncertainty to avoid missing estimates. We propose a novel GPU-based depth estimation technique that outperforms the state of the art in terms of quality, while ensuring that each pixel is associated with a depth value. 25-minute Talk Orazio Gallo - Sr. Research Scientist, NVIDIA
S8538 - Connected Automated Driving: Overview, Design, and Technical Challenges We'll discuss the important emerging field of connected automated driving, including technical and policy topics in this area. We'll provide background on vehicular safety communications and current deployments in various parts of the world. Vehicular communication will enable sensor data sharing between vehicles, which could be the key for achieving higher levels of automation. Novel artificial intelligence techniques exploiting sensor data (camera, radar, GPS etc.) from neighboring cars can be used for designing perception and mapping functionalities for automated vehicles. We'll discuss results from field testing and show advantages of connected automated driving. 50-minute Talk Gaurav Bansal - Principal Researcher, Toyota InfoTechnology Center, USA
S8422 - Cooperative Groups and Domain Decomposition for Explicit PDE Solvers Discover the swept rule, a new technique for communication-avoiding domain decomposition for explicit PDE solvers. We'll build on previous work exploring the effects of memory hierarchy and launch configuration on implicitly synchronized, 1-D solver performance using swept and naive decomposition schemes by (1) extending the swept scheme to two dimensions, and (2) comparing the performance of implicit synchronization to in-kernel grid synchronization enabled by the cooperative group interface introduced in CUDA 9. We'll provide examples that use the wave equation on the readily extensible, open-source software developed for this project. 50-minute Talk Daniel Magee - Graduate Research Assistant, Oregon State University
S8680 - Creating Immersive Audio Effects in Games and Applications Using VRWorks Audio We'll provide a tutorial on how to enable immersive audio effects in games using NVIDIA VRWorks Audio game engine plugins. We'll also do a walk-through of the VRWorks Audio SDK C API and how it can be used to simulate acoustics for architectural and pro use cases. We'll also cover usage of VRWorks Audio within the Carbon game engine and Holodeck. 50-minute Talk Ambrish Dantrey - Manager, System Software, NVIDIA
S8278 - CUDA [Version] and Beyond CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest release of CUDA, preview upcoming GPU programming technology, and gain insight into the philosophy driving the development of CUDA and how it will take advantage of current and future GPUs. You'll also learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development. 50-minute Talk Mark Harris - Chief Technologist, NVIDIA
S8412 - Deep Imaging: Quantitative Biomarkers for Clinical Decision Making The transformation towards value-based healthcare needs inventive ways to lower cost and increase patient health outcomes. Artificial intelligence is vital for realizing value-based care. Turning medical images into biomarkers helps to increase effectiveness of care. 50-minute Talk Joerg Aumueller - Global Product Management Artificial Intelligence, Siemens Healthineers
S8224 - Deep Learning at BMW: Robust AI in the Production Chain We'll present new solutions for deploying robust and efficient deep learning approaches in the production chain and logistics. For process automation in industrial environments, highly accurate and error-resistant models are needed. To reach this level of confidence for error-prone tasks, such as object detection or pose estimation, we developed an efficient NVIDIA-based development pipeline: from tools for fast (semi-automated) data labeling, through concurrent detection models, to deep reinforcement learning solutions for adaptive online learning. 50-minute Talk Markus Bonisch - General Manager, BMW Group
Norman Muller - Ph.D. Candidate in Artificial Intelligence, BMW Group
S8258 - Deep Learning Autonomous Driving Simulation Realistic automotive simulation platforms, where virtual cars travel virtual roads in virtual cities in remarkably true-to-life conditions, will be a vital part of developing and testing autonomous vehicles. The technology behind the Cognata simulation engine is heavily based on deep learning, computer vision, and other advanced AI methods. We'll present a cloud-based simulation engine, and discuss how it works and how to develop with it. 25-minute Talk Danny Atsmon - CEO, Cognata
S8140 - Deep Learning for Automated Systems: From the Warehouse to the Road Learn about our application of deep learning techniques for perception systems in autonomous driving, reinforcement learning for autonomous systems, label detection in warehouse inventory management, and undergraduate engagement in this research. In collaboration with Clemson University's International Center for Automotive Research, we've developed a perception module that processes camera inputs to provide environmental information for use by a planning module to actively control the autonomous vehicle. We're extending this work to include an unsupervised planning module for navigation with reinforcement learning. We've also applied these techniques to automate the job of warehouse inventory management using a deep neural network running on a mobile, embedded platform to automatically detect and scan labels and report inventory, including its location in the warehouse. Finally, we'll discuss how we involve undergraduate students in this research. 50-minute Talk Melissa Smith - Associate Professor, Clemson University
S8242 - Deep Learning for Computational Science We'll review our study of the use of artificial intelligence to augment various domains of computational science in order to improve time to solution for various HPC problems. We'll discuss the current state-of-the-art approaches and performance gains where applicable. We'll also investigate current barriers to adoption and consider possible solutions. 25-minute Talk Jeff Adie - Principal Solutions Architect, NVIDIA
S8626 - Deep Learning for Driver State Sensing We'll explore how deep learning approaches can be used for perceiving and interpreting the driver's state and behavior during manual, semi-autonomous, and fully-autonomous driving. We'll cover how convolutional, recurrent, and generative neural networks can be used for applications of glance classification, face recognition, cognitive load estimation, emotion recognition, drowsiness detection, body pose estimation, natural language processing, and activity recognition in a mixture of audio and video data. 50-minute Talk Lex Fridman - Research Scientist, MIT
S8222 - Deep Learning for Heliophysics NASA's heliophysics division operates a fleet of spacecraft, the so-called Heliophysics System Observatory, to monitor the Sun's activity and how its changes drive space weather in interplanetary space and in the near-Earth environment. We'll present case studies of how a number of challenging problems encountered in heliophysics can be tackled using deep learning: spectropolarimetric inversions for measuring the magnetic field on the solar surface, and mega-Kelvin thermometry of the Sun's corona by using a deep neural network to solve a compressed sensing problem. These low-cost solutions make possible new concepts for deep space missions for space weather monitoring. Some of the work in this presentation was made possible by NASA's Frontier Development Lab, a public-private partnership between the agency and industry partners (including the SETI Institute, NVIDIA, IBM, Intel, kx & Lockheed Martin), whose mission is to use artificial intelligence to tackle problems related to planetary defense and heliophysics. 25-minute Talk Mark Cheung - Staff Physicist, Lockheed Martin Solar & Astrophysics Laboratory
S8602 - Deep Learning for Shallow Sequencing The NVIDIA Genomics Group has developed a deep learning platform to transform noisy, low quality DNA sequencing data into clean, high quality data. Hundreds of DNA sequencing protocols are used to profile phenomena such as protein-DNA binding and DNA accessibility. For example the ATAC-seq protocol identifies open genomic sites by sequencing open DNA fragments; genome-wide fragment counts provide a profile of DNA accessibility. Recent advances enable profiling from smaller patient samples than previously possible. To reduce sequencing cost, we developed a convolutional neural network that denoises data from a small number of DNA fragments, making the data suitable for various downstream tasks. Our platform aims to accelerate adoption of DNA sequencers by minimizing data requirements. 50-minute Talk Johnny Israeli - Consultant, NVIDIA
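A minimal sketch of the denoising idea, mapping a noisy low-coverage count track to a cleaned track with a small 1D convolutional network, is shown below (PyTorch assumed; the architecture and the Poisson toy data are illustrative assumptions, not the platform's actual model).

```python
# Minimal sketch (PyTorch assumed): a small 1D convolutional network mapping a
# noisy, low-coverage read-count track to a denoised track of the same length.
import torch
import torch.nn as nn

denoiser = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=25, padding=12), nn.ReLU(),
    nn.Conv1d(32, 32, kernel_size=25, padding=12), nn.ReLU(),
    nn.Conv1d(32, 1, kernel_size=25, padding=12),
)

noisy_coverage = torch.poisson(torch.rand(8, 1, 10_000) * 2)   # shallow-depth counts
print(denoiser(noisy_coverage).shape)                          # torch.Size([8, 1, 10000])
```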
S8531 - Deep Learning Infrastructure for Autonomous Vehicles We'll introduce deep learning infrastructure for building and maintaining autonomous vehicles, including techniques for managing the lifecycle of deep learning models, from definition, training, and deployment to reloading and life-long learning. DNNs auto-curate and pre-label data in the loop. Given the data, the infrastructure finds the best run-time-optimized deep learning models. Training scales with data size across multiple nodes and beyond. With these methodologies, one takes only data from the application and feeds DL predictors back to it. This infrastructure is divided into multiple tiers and is modular, with each module containerized for deployment on lower-level infrastructure such as GPU-based cloud platforms. 50-minute Talk Pradeep Gupta - Head - Solutions Architect, Autonomous Driving, NVIDIA
S8132 - (Deep) Learning to Grasp with a Closed-Loop DNN Controller The paradigm for robot programming is changing with the adoption of the deep learning approach in the field of robotics. Instead of hard-coding a complex sequence of actions, tasks are acquired by the robot through an active learning procedure. This introduces new challenges that have to be solved to achieve effective training. We'll show several issues that can be encountered while learning a closed-loop DNN controller aimed at a fundamental task like grasping, and their practical solutions. First, we'll illustrate the advantages of training using a simulator, as well as the effects of choosing different learning algorithms in the reinforcement learning and imitation learning domains. We'll then show how separating the control and vision modules in the DNN can simplify and speed up the learning procedure in the simulator, although the learned controller hardly generalizes to the real-world environment. Finally, we'll demonstrate how to use domain transfer to train a DNN controller in a simulator that can be effectively employed to control a robot in the real world. 50-minute Talk Iuri Frosio - Senior Research Scientist, NVIDIA
S8441 - Deep Learning with Myia Myia is a new, experimental deep learning framework that aims to provide to deep learning researchers both the expressive power and the performance that they need. Symbolic frameworks such as TensorFlow only cover a curated subset of programming language features and do not support second order gradients very well. Dynamic frameworks such as PyTorch, while very powerful, use an operator overloading approach for automatic differentiation, which does not lend itself well to optimization. With Myia, we attempt to have the best of both worlds: we implement a general and composable approach to automatic differentiation over a functional abstraction of a subset of the Python programming language. That subset includes if, while, for, and recursion, providing plenty of expressive power, and yet it can also be analyzed statically to provide the best possible performance. We'll present the Myia language from a high-level technical perspective, including a short primer on functional programming and automatic differentiation. It is of special interest to deep learning framework or library implementers. 50-minute Talk Olivier Breuleux - Computer Analyst, MILA
S8603 - Deep Reinforcement Learning for Real-World Robotic Manipulation Deep reinforcement learning (deep RL) has emerged as a promising direction for autonomous acquisition of complex behaviors due to its ability to process complex sensory input and to acquire elaborate behavior skills, using general-purpose neural network representations. Since learning expressive function approximators requires large quantities of data, deep RL has been mostly applied to simulated domains, such as video games and simulated robotic locomotion and manipulation tasks, where the data collection can occur faster than real time and be trivially parallelized. We'll address techniques that have been proposed to enable deep RL for real-world robotics, and discuss how the maximum-entropy principle can be leveraged to reduce the required amount of real-world interaction. 25-minute Talk Tuomas Haarnoja - Graduate Student , UC Berkeley
S8624 - Democratizing Autonomous Driving AutoX is striving to democratize autonomy and make autonomous driving universally accessible to everyone. With over 10 years of experience in computer vision and robotics, AutoX founder and CEO Jianxiong Xiao is working to reduce the price of entry into the autonomous driving field by several orders of magnitude with an innovative camera-first solution -- using cameras as primary sensors. By doing so, the safety and convenience benefits of autonomy will be delivered to more people. He will share how he founded a company with the mission of democratizing autonomy, gathered an expert team, and commercialized his passion. 25-minute Talk Jianxiong Xiao - CEO, AutoX
S8546 - Deploying Containerized Distributed GPU Applications at Scale We'll demonstrate how to seamlessly deploy containerized GPU-enabled distributed applications on a cluster. We'll review a number of existing solutions that address resource management and containerization before arriving at our proposed system architecture: a one-stop solution addressing GPU allocation, containerization, deployment, and scheduling. Our solution can be deployed on premises or on the cloud and accelerate your workflows while easing the burden of deploying applications across a cluster. Examples will be given, including distributed TensorFlow applications and hyperparameter optimization. 50-minute Talk Thuc Tran - Machine Learning Engineer, Capital One Financial
Athanassios Kintsakis - Machine Learning Engineer, Capital One Financial
S8398 - Designing Human Centric Spaces with Holodeck and AI The growth in density of housing in cities like London and New York has resulted in higher demand for efficient smaller apartments. These designs challenge the use of space and function while trying to ensure the occupants have the perception of a larger space than provided. The process of designing these spaces has always been the responsibility and perception of a handful of designers using 2D and 3D static platforms as part of the overall building design and evaluation, typically constrained by a prescriptive program and functional requirements. A combination of human- and AI-based agents creating and testing these spaces through design and virtual immersive environments (NVIDIA Holodeck) will attempt to ensure the final results are efficient and the best fit for human occupancy prior to construction. 50-minute Talk Cobus Bothma - Applied Research Director, KPF
S8215 - Displaying and Interacting with Desktop Apps in VR Displaying traditional desktop applications in virtual reality requires techniques to overcome the limited resolution of current displays while simultaneously taking advantage of the 360-degree real estate. Interacting with these applications is aided by gestures using the controllers and hands. We'll go over the use of mixed reality for easier keyboard typing when necessary, general safety, and finding things nearby, such as cables, chairs, and coffee. All techniques described are implemented and available in the commercially available software VR Toolbox. 25-minute Talk Rouslan Dimitrov - Programmer, VR Toolbox
S8375 - Enabling Deep Learning Applications in Radio Frequency Systems Artificial intelligence has made great strides in many technology sectors; however, it has yet to impact the design and applications of radio frequency (RF) and wireless systems. This is primarily due to the industry's preference for field-programmable gate array (FPGA) systems. Conversely, the deep learning revolution has been fueled by GPUs and the ease with which they may be programmed for highly parallel computations. Next-generation RF and wireless technology will require wide-band systems capable of real-time machine learning with GPUs. Working with strategic partners, we've designed a software-configurable wide-band RF transceiver system capable of performing real-time signal processing and machine learning with a Jetson TX2. We'll discuss system performance, collection of RF training data, and the software used by the community to create custom applications. Additionally, we'll present data demonstrating applications in the field of RF machine learning and deep learning. 50-minute Talk John Ferguson - President / CEO, Deepwave Digital
S8172 - Evaluation of Hybrid Cache-Coherent Concurrent Hash Table on POWER9 System with NVLink 2 At GTC in 2014, we described a novel concurrent cache-aware hash table that used a multi-level bounded linear probing hashing algorithm. We'll extend this design to develop a hybrid (CPU-GPU based) hash table in which the data is stored on the host CPU memory, and accessed via the GPU using the unified memory constructs. The hash table is designed such that multiple CPU threads can update it concurrently and multiple GPU threads can fetch data from the hash table in a cache-coherent manner using NVLink 2.0. We implement this hash-table on a POWER9 system, with NVLink 2.0 connected Tesla V100 GPUs, and present detailed performance measurements of throughput and virtual memory activities from CPU updates and GPU fetches. We also compare the performance of our design against a hybrid hash table built using the Cuckoo hashing approach. 50-minute Talk Rajesh Bordawekar - Principal Research Staff Member, IBM T. J. Watson Research Center
Pidad Gasfar D'Souza - System Performance Architect, IBM Systems Development Lab
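For readers unfamiliar with bounded linear probing, the plain-Python sketch below shows the insert and lookup logic of a single-level bounded-probing table; the concurrency, multi-level design, unified-memory, and NVLink aspects of the talk are not modeled, and all names here are illustrative.

```python
# Minimal sketch (plain Python): a single-level bounded linear-probing table.
# A key is probed in at most `bound` slots after its home slot.
class BoundedLinearProbingTable:
    def __init__(self, capacity=1024, bound=8):
        self.slots = [None] * capacity
        self.capacity, self.bound = capacity, bound

    def insert(self, key, value):
        home = hash(key) % self.capacity
        for i in range(self.bound):                    # probe a bounded window
            idx = (home + i) % self.capacity
            if self.slots[idx] is None or self.slots[idx][0] == key:
                self.slots[idx] = (key, value)
                return True
        return False                                   # window full: caller must rehash or spill

    def lookup(self, key):
        home = hash(key) % self.capacity
        for i in range(self.bound):
            idx = (home + i) % self.capacity
            if self.slots[idx] is None:
                return None
            if self.slots[idx][0] == key:
                return self.slots[idx][1]
        return None

table = BoundedLinearProbingTable()
table.insert("key-42", "value")
print(table.lookup("key-42"))                          # value
```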
S8430 - Everything You Need to Know About Unified Memory We'll cover all the things you need to know about Unified Memory: fundamental principles, important use cases, advances in the latest GPU architectures, HMM and ATS details, performance considerations and optimization ideas, and new application results, including data analytics and deep learning. 2018 is going to be the year of Unified Memory. Both HMM and ATS will be available and developers will start using the true Unified Memory model with the system allocator "the way it's meant to be played." We'll discuss all the caveats and differences between cudaMallocManaged and malloc. A big part of the talk will be related to performance aspects of Unified Memory: from migration throughput optimizations to improving the overlap between kernels and prefetches. 50-minute Talk Nikolay Sakharnykh - Sr. Developer Technology Engineer, NVIDIA Corporation
S8113 - Experiences of End2end Deep Learning Optimization on Alibaba PAI Deep Learning Platform Experiences of end-to-end deep learning optimization on Alibaba's Platform of Artificial Intelligence (PAI) will be shared in this session, including both offline training and online inference. For offline training, dedicated optimization is made for local and distributed environments. For online inference, the optimization is done from both algorithm and system perspectives. Both the methodology and benchmark numbers will be shared in this presentation. Several business applications driven by these optimizations will also be shared so attendees can learn to bridge the gap between low-level optimization and real business scenarios. 50-minute Talk Jun Yang - Algorithm Architect, Alibaba
S8651 - Extracting Data from Tables and Charts in Natural Document Formats Financial analysis depends on accurate financial data, and these data are often distributed via PDF and other "natural document" formats. While these formats are optimized for easy human comprehension, automatically extracting the data can be quite challenging. We'll describe our work using a deep learning pipeline to extract data from tables and charts in PDF documents. We'll also show some of our latest research, inspired by image captioning models, for directly going from images of tables to a markup language (LaTeX) representation. 50-minute Talk David Rosenberg - Data Scientist, Office of the CTO, Bloomberg
Philipp Meerkamp - Financial Software Engineer, Bloomberg