Dr.-Ing. Nicolas Weber

https://www.mergian.de/
Recent content on Dr.-Ing. Nicolas Weber

Facilitate high-performance hardware integration into AI Frameworks with the NEC SOL AI compiler
https://www.mergian.de/2025/iwapt-sol/
Fri, 11 Apr 2025
AI development has become increasingly driven by powerful frameworks like PyTorch and TensorFlow, supported by major tech companies. However, the rapid release cycles of these frameworks – every 3-6 months – pose a challenge for new hardware vendors. They struggle to develop the necessary AI functionality and keep pace with frequent updates. In this talk, we introduce NEC’s SOL AI compiler, which seamlessly integrates with PyTorch, TensorFlow, ONNX, NumPy, and soon JAX.

Higher-Rank Irreducible Cartesian Tensors for Equivariant Message Passing
https://www.mergian.de/2024/poster-neurips/
Thu, 26 Sep 2024
arXiv, Poster

COMBINATION OF MULTIPLE DATA PROCESSING AND MACHINE LEARNING FRAMEWORKS FOR A TARGET HARDWARE
https://www.mergian.de/2024/patent-multitarget/
Tue, 27 Feb 2024

FULL ASYNCHRONOUS EXECUTION QUEUE FOR ACCELERATOR HARDWARE
https://www.mergian.de/2023/patent-veda/
Tue, 28 Feb 2023

VEDA: Best practices to use hybrid programming on the NEC SX-Aurora TSUBASA
https://www.mergian.de/2022/sxaurora-veda/
Sat, 12 Nov 2022
The Vector Engine Driver API (VEDA) was developed to enable easy porting of existing CUDA applications to NEC’s SX-Aurora TSUBASA. While the API enables a smooth transition between the different architectures, there are unique features that require special attention to achieve optimal performance. In this article we present multiple methods to improve your code.
First, we explain how to use C++ function overloading and templates. Second, we show how to make best use of the unique features of VEDAdeviceptrs.

Keras Merge
https://www.mergian.de/2022/keras-merge/
Wed, 09 Nov 2022
Today we released my newest Open Source project: Keras Merge! Keras Merge allows you to merge two Keras models, even when you don’t have access to their building functions! Just run pip3 install keras-merge to install it.
    A = init_model_a() # -> keras.Model
    B = init_model_b() # -> keras.Model
    input_a = init_input_a()
    input_b = init_input_b()
    c = B(input_b, A(input_a))
    import keras_merge as km
    C = km.merge(A, B, # models
        [*A.inputs, B.

ACCELERATION OF NEURAL NETWORKS USING DEPTH-FIRST PROCESSING
https://www.mergian.de/2022/patent-brainslug/
Tue, 30 Aug 2022

SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks
https://www.mergian.de/2022/sxaurora-sol/
Sun, 01 May 2022
The increased interest in Artificial Intelligence (AI) raised the need for highly optimized and sophisticated AI frameworks. Starting with the Lua-based Torch, many frameworks have emerged over time, such as Theano, Caffe, Chainer, CNTK, MxNet, PyTorch, DL4J, or TensorFlow. All of these provide a high-level scripting API that allows users to easily design neural networks and run these on various kinds of hardware. What the user usually does not see is the high effort put into these frameworks to provide peak execution performance.

SOL: Single middleware for optimized multi-architecture AI training and deployment
https://www.mergian.de/2022/nug-sol/
Sat, 01 Jan 2022
NEC User Group Meeting

AVEO-VEDA: Hybrid Programming for the NEC Vector Engine
https://www.mergian.de/2021/sxaurora-aveo-veda/
Wed, 14 Jul 2021
Hybrid programming is a state-of-the-art method for incorporating compute accelerators such as GPUs or vector processors into applications that run on a host system. The main reason for hybrid programming is that compute accelerators are well suited for compute- and memory-heavy tasks but perform poorly in control-flow-dominated code sections. Therefore, the latter are usually executed on CPUs while the compute-heavy parts are offloaded to accelerators.

Flynn’s reconciliation: Automating the register cache idiom for cross-accelerator programming
https://www.mergian.de/2021/acmtaco-flynn/
Sat, 01 May 2021
ACM Transactions on Architecture and Code Optimization (TACO)

SOL: Transparent Neural Network Acceleration on NEC SX-Aurora TSUBASA
https://www.mergian.de/2020/icm-sol/
Tue, 01 Sep 2020
Slides

SOL: Effortless Device Support for AI Frameworks without Source Code Changes
https://www.mergian.de/2020/hpml-sol/
Fri, 01 May 2020
High Performance Machine Learning (HPML) ’20

SOL4VE: Bringing Deep Neural Networks to the NEC SX-Aurora TSUBASA
https://www.mergian.de/2020/nug-sol/
Wed, 01 Jan 2020
NEC User Group Meeting

SOL4VE: Running Deep Neural Networks on the NEC SX-Aurora Tsubasa
https://www.mergian.de/2020/auroraforum-sol/
Wed, 01 Jan 2020

BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism
https://www.mergian.de/2018/arxiv-brainslug/
Tue, 01 May 2018

BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism
https://www.mergian.de/2018/deepmobile-sol/
Tue, 01 May 2018
International Workshop on Embedded and Mobile Deep Learning

Detail-Preserving Pooling in Deep Networks
https://www.mergian.de/2018/cvpr-dpp/
Tue, 01 May 2018
arXiv, Source Code

Sol: Transparent Neural Network Acceleration Platform
https://www.mergian.de/2018/sc-sol/
Mon, 01 Jan 2018
SuperComputing (SC), Poster

GPU Array Access Auto-Tuning
https://www.mergian.de/2017/thesis-phd/
Sun, 01 Jan 2017
Source Code

MATOG: Array Access Auto-Tuning
https://www.mergian.de/2017/acmtaco-matog/
Sun, 01 Jan 2017
ACM Transactions on Architecture and Code Optimization (TACO), Source Code

Prospect for Knowledge in Survey Data: An Artificial Neural Network Sensitivity Analysis
https://www.mergian.de/2017/ssce-prospect/
Sun, 01 Jan 2017

Adaptive GPU Array Layout Auto-Tuning
https://www.mergian.de/2016/sem4hpc-matog/
Fri, 01 Jan 2016
Software Engineering Methods for Parallel and High Performance Applications (SEM4HPC), Source Code

Rapid, Detail-Preserving Image Downscaling
https://www.mergian.de/2016/siggraphasia-dpid/
Fri, 01 Jan 2016
ACM Transactions on Graphics (TOG), SIGGRAPH Asia, Supplemental Material, Source Code

Guided Profiling for Auto-Tuning Array Layouts on GPUs
https://www.mergian.de/2015/pmbs-matog/
Thu, 01 Jan 2015
Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), Source Code

Auto-Tuning Complex Array Layouts for GPUs
https://www.mergian.de/2014/egpgv-matog/
Wed, 01 Jan 2014
Eurographics Symposium on Parallel Graphics and Visualization (EGPGV), Supplemental Material, Source Code

Construction of Ray-Tracing Acceleration Structures in an Out-of-Core Multi-GPU Environment
https://www.mergian.de/2013/thesis-msc/
Tue, 01 Jan 2013

Fast Dynamic Memory Allocator for Massively Parallel Architectures
https://www.mergian.de/2013/gpgpu-fgmalloc/
Tue, 01 Jan 2013
Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU), Source Code

Transportprotokoll und Systemdienste für den Controller Area Network Bus
https://www.mergian.de/2010/thesis-bsc/
Fri, 01 Jan 2010
Source Code

About me
https://www.mergian.de/page/short/
I am Nicolas Weber, a research engineer in the Intelligent Software Systems Group at NEC Laboratories Europe. Before that, I was a PhD student in the Graphics, Capture and Massively Parallel Computing Group at TU Darmstadt, supervised by Prof. Michael Goesele, and an Associate of the Graduate School of Computational Engineering at TU Darmstadt. My main research interests are the automated optimization of code running on accelerator hardware, especially for Scientific and High Performance Computing, Biomedical and Artificial Intelligence applications.