Opencl benchmark ucsd

5/9/2023

Hardware generator: Basic buliding blocks for neural networks, and address generation unit (RTL).DeepBurning: Automatic Generation of FPGA-based Learning Accelerators for the Neural Network Family.Opportunistic Turbo Execution in NTC: Exploiting the Paradigm Shift in Performance Bottlenecks.Design Methodology for Operating in Near-Threshold Computing (NTC) Region.Scalable Effort Classifiers for Energy Efficient Machine Learning.(Universtiy of Pittsburgh, Tsinghua University, San Francisco State University, Air Force Research Laboratory, University of Massachusetts.) Reno: A Highly-Efficient Reconfigurable Neuromorphic Computing Accelerator Design.Papers of significance are marked in bold. The emphasis is focused on, but not limited to neural networks on silicon. This is a collection of conference papers that interest me. A layer-based scheduling framework is proposed to optimize both system-level energy efficiency and performance.A 4-level CONV engine is designed to to support different tiling parameters for higher resource utilization and performance.This is the first work to assign Input/Output/Weight Reuse to different layers of a CNN, which optimizes system-level energy consumption based on different CONV parameters.Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns.Hope my new works will come out soon in the near future. I'm trying to improve the architecture for ultra low-power compting. A deep convoultional neural network architecture (DNA) has been designed with 1~2 orders higher energy efficiency over the state-of-the-art works. I'm working on energy-efficient architecture design for deep learning. This is an exciting field where fresh ideas come out every day, so I'm collecting works on related topics. One of my research interests is architecture design for deep learning. For more informantion about me and my research, you can go to my homepage. degree with the Institute of Microelectronics, Tsinghua University, Beijing, China. Workshop on automatic performance tuning. Evaluating performance and portability of OpenCL programs. OpenCL Performance Evaluation on Modern Multi Core CPUs. A portable- and high-performance matrix operations library for CPUs, GPUs and beyond. Improving Performance Portability in OpenCL Programs. CUDA Occupancy Calculator NVidia, 2009.The scalable heterogeneous computing (SHOC) benchmark suite. PARTANS: An autotuning framework for stencil computation on multi-GPU systems. Auto-tuning a high-level language targeted to gpu codes.

An experimental study on performance portability of OpenCL kernels. A static task partitioning approach for heterogeneous systems using OpenCL. The Architecture and Evolution of CPU-GPU Systems for General Purpose Computing. Scalability and Parallel Execution of OmpSs-OpenCL Tasks on Heterogeneous CPU-GPU Environment.

OmpSs-OpenCL Programming Model for Heterogeneous Systems. Heterogeneous Computing with OpenCL Morgan Kaufmann.

Kepler Architecture - White Paper Google Scholar.
In addition, the Auto-Tune tool integrated with OmpSs-OpenCL offered a maximum performance gain of 10% for Matmul workload at runtime. We evaluate our proposal on 7 different benchmarks with varied characteristics ported across three different GPU architectures.

The proposed tool focuses on modifying the OpenCL kernel execution configuration based on GPU specifications and does not require any user intervention providing reasonable performance portability. In this paper, we propose an Auto-Tune tool for OmpSs-OpenCL kernels to ease the process of porting applications across different generations of GPUs.

There is, however a lack of tool-chain support that can help with porting applications across GPUs with myriad computing capabilities. In this light, performance portability of applications across different generations of GPUs with minimal programmer intervention, is desired. The semiconductor industry has enabled a continuous increase in computing capabilities of GPUs across successive product generations. Perceived benefits of the heterogeneous computing models have led to GPUs becoming an indispensable part of existing/future HPC infrastructures.

0 Comments

Opencl benchmark ucsd

Leave a Reply.

Author

Archives

Categories