<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Dr.-Ing. Nicolas Weber</title>
    <link>https://www.mergian.de/</link>
    <description>Recent content on Dr.-Ing. Nicolas Weber</description>
    <generator>Hugo -- gohugo.io</generator>
    <lastBuildDate>Fri, 11 Apr 2025 00:00:00 +0000</lastBuildDate>
    
        <atom:link href="https://www.mergian.de/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Facilitate high-performance hardware integration into AI Frameworks with the NEC SOL AI compiler</title>
      <link>https://www.mergian.de/2025/iwapt-sol/</link>
      <pubDate>Fri, 11 Apr 2025 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2025/iwapt-sol/</guid>
      <description>AI development has become increasingly driven by powerful frameworks like PyTorch and TensorFlow, supported by major tech companies. However, the rapid release cycles of these frameworks &amp;ndash; every 3-6 months &amp;ndash; pose a challenge for new hardware vendors. They struggle to develop the necessary AI functionality and keep pace with frequent updates. In this talk, we introduce NEC&amp;rsquo;s SOL AI compiler, which seamlessly integrates with PyTorch, TensorFlow, ONNX, NumPy, and soon JAX.</description>
    </item>
    
    <item>
      <title>Higher-Rank Irreducible Cartesian Tensors for Equivariant Message Passing</title>
      <link>https://www.mergian.de/2024/poster-neurips/</link>
      <pubDate>Thu, 26 Sep 2024 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2024/poster-neurips/</guid>
      <description>arXiv, Poster</description>
    </item>
    
    <item>
      <title>COMBINATION OF MULTIPLE DATA PROCESSING AND MACHINE LEARNING FRAMEWORKS FOR A TARGET HARDWARE</title>
      <link>https://www.mergian.de/2024/patent-multitarget/</link>
      <pubDate>Tue, 27 Feb 2024 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2024/patent-multitarget/</guid>
      <description></description>
    </item>
    
    <item>
      <title>FULL ASYNCHRONOUS EXECUTION QUEUE FOR ACCELERATOR HARDWARE</title>
      <link>https://www.mergian.de/2023/patent-veda/</link>
      <pubDate>Tue, 28 Feb 2023 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2023/patent-veda/</guid>
      <description></description>
    </item>
    
    <item>
      <title>VEDA: Best practices to use hybrid programming on the NEC SX-Aurora TSUBASA</title>
      <link>https://www.mergian.de/2022/sxaurora-veda/</link>
      <pubDate>Sat, 12 Nov 2022 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2022/sxaurora-veda/</guid>
      <description>The Vector Engine Driver API (VEDA) was developed to enable easy porting of existing CUDA applications to NEC&amp;rsquo;s SX-Aurora TSUBASA. While the API enables a smooth transition between the different architectures, there are unique features that require special attention to achieve optimal performance.
In this article, we present multiple methods to improve your code. First, we explain how to use C++ function overloading and templates. Second, we show how to make the best use of the unique features of VEDAdeviceptrs.</description>
    </item>
    
    <item>
      <title>Keras Merge</title>
      <link>https://www.mergian.de/2022/keras-merge/</link>
      <pubDate>Wed, 09 Nov 2022 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2022/keras-merge/</guid>
      <description>Today we released my newest Open Source project: Keras Merge! Keras Merge allows you to merge two Keras models, even when you don&amp;rsquo;t have access to their building functions! Just run pip3 install keras-merge to install it.
A = init_model_a() # -&amp;gt; keras.Model B = init_model_b() # -&amp;gt; keras.Model input_a = init_input_a() input_b = init_input_b() c = B(input_b, A(input_a)) import keras_merge as km C = km.merge(A, B, # models [*A.inputs, B.</description>
    </item>
    
    <item>
      <title>ACCELERATION OF NEURAL NETWORKS USING DEPTH-FIRST PROCESSING</title>
      <link>https://www.mergian.de/2022/patent-brainslug/</link>
      <pubDate>Tue, 30 Aug 2022 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2022/patent-brainslug/</guid>
      <description></description>
    </item>
    
    <item>
      <title>SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks</title>
      <link>https://www.mergian.de/2022/sxaurora-sol/</link>
      <pubDate>Sun, 01 May 2022 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2022/sxaurora-sol/</guid>
      <description>The increased interest in Artificial Intelligence (AI) has raised the need for highly optimized and sophisticated AI frameworks. Starting with the Lua-based Torch, many frameworks have emerged over time, such as Theano, Caffe, Chainer, CNTK, MXNet, PyTorch, DL4J, or TensorFlow.
All of these provide a high-level scripting API that allows users to easily design neural networks and run them on various kinds of hardware. What the user usually does not see is the considerable effort put into these frameworks to provide peak execution performance.</description>
    </item>
    
    <item>
      <title>SOL: Single middleware for optimized multi-architecture AI training and deployment</title>
      <link>https://www.mergian.de/2022/nug-sol/</link>
      <pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2022/nug-sol/</guid>
      <description>NEC User Group Meeting</description>
    </item>
    
    <item>
      <title>AVEO-VEDA: Hybrid Programming for the NEC Vector Engine</title>
      <link>https://www.mergian.de/2021/sxaurora-aveo-veda/</link>
      <pubDate>Wed, 14 Jul 2021 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2021/sxaurora-aveo-veda/</guid>
      <description>Hybrid programming is a state-of-the-art method for incorporating compute accelerators such as GPUs or vector processors into applications that run on a host system. The main reason for hybrid programming is that compute accelerators are well suited for compute- and memory-heavy tasks but perform poorly in code sections dominated by control flow. Therefore, the latter are usually executed on CPUs while the compute-heavy parts are offloaded to accelerators.</description>
    </item>
    
    <item>
      <title>Flynn’s reconciliation: Automating the register cache idiom for cross-accelerator programming</title>
      <link>https://www.mergian.de/2021/acmtaco-flynn/</link>
      <pubDate>Sat, 01 May 2021 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2021/acmtaco-flynn/</guid>
      <description>ACM Transactions on Architecture and Code Optimization (TACO)</description>
    </item>
    
    <item>
      <title>SOL: Transparent Neural Network Acceleration on NEC SX-Aurora TSUBASA</title>
      <link>https://www.mergian.de/2020/icm-sol/</link>
      <pubDate>Tue, 01 Sep 2020 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2020/icm-sol/</guid>
      <description>Slides</description>
    </item>
    
    <item>
      <title>SOL: Effortless Device Support for AI Frameworks without Source Code Changes</title>
      <link>https://www.mergian.de/2020/hpml-sol/</link>
      <pubDate>Fri, 01 May 2020 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2020/hpml-sol/</guid>
      <description>High Performance Machine Learning (HPML) ’20</description>
    </item>
    
    <item>
      <title>SOL4VE: Bringing Deep Neural Networks to the NEC SX-Aurora TSUBASA</title>
      <link>https://www.mergian.de/2020/nug-sol/</link>
      <pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2020/nug-sol/</guid>
      <description>NEC User Group Meeting</description>
    </item>
    
    <item>
      <title>SOL4VE: Running Deep Neural Networks on the NEC SX-Aurora TSUBASA</title>
      <link>https://www.mergian.de/2020/auroraforum-sol/</link>
      <pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2020/auroraforum-sol/</guid>
      <description></description>
    </item>
    
    <item>
      <title>BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism</title>
      <link>https://www.mergian.de/2018/arxiv-brainslug/</link>
      <pubDate>Tue, 01 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2018/arxiv-brainslug/</guid>
      <description></description>
    </item>
    
    <item>
      <title>BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism</title>
      <link>https://www.mergian.de/2018/deepmobile-sol/</link>
      <pubDate>Tue, 01 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2018/deepmobile-sol/</guid>
      <description>International Workshop on Embedded and Mobile Deep Learning</description>
    </item>
    
    <item>
      <title>Detail-Preserving Pooling in Deep Networks</title>
      <link>https://www.mergian.de/2018/cvpr-dpp/</link>
      <pubDate>Tue, 01 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2018/cvpr-dpp/</guid>
      <description>ArXiv, Source Code</description>
    </item>
    
    <item>
      <title>Sol: Transparent Neural Network Acceleration Platform</title>
      <link>https://www.mergian.de/2018/sc-sol/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2018/sc-sol/</guid>
      <description>SuperComputing (SC), Poster</description>
    </item>
    
    <item>
      <title>GPU Array Access Auto-Tuning</title>
      <link>https://www.mergian.de/2017/thesis-phd/</link>
      <pubDate>Sun, 01 Jan 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2017/thesis-phd/</guid>
      <description>Source Code</description>
    </item>
    
    <item>
      <title>MATOG: Array Access Auto-Tuning</title>
      <link>https://www.mergian.de/2017/acmtaco-matog/</link>
      <pubDate>Sun, 01 Jan 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2017/acmtaco-matog/</guid>
      <description>ACM Transactions on Architecture and Code Optimization (TACO), Source Code</description>
    </item>
    
    <item>
      <title>Prospect for Knowledge in Survey Data: An Artificial Neural Network Sensitivity Analysis</title>
      <link>https://www.mergian.de/2017/ssce-prospect/</link>
      <pubDate>Sun, 01 Jan 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2017/ssce-prospect/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Adaptive GPU Array Layout Auto-Tuning</title>
      <link>https://www.mergian.de/2016/sem4hpc-matog/</link>
      <pubDate>Fri, 01 Jan 2016 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2016/sem4hpc-matog/</guid>
      <description>Software Engineering Methods for Parallel and High Performance Applications (SEM4HPC), Source Code</description>
    </item>
    
    <item>
      <title>Rapid, Detail-Preserving Image Downscaling</title>
      <link>https://www.mergian.de/2016/siggraphasia-dpid/</link>
      <pubDate>Fri, 01 Jan 2016 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2016/siggraphasia-dpid/</guid>
      <description>ACM Transactions on Graphics (TOG), SIGGRAPH Asia, Supplemental Material, Source Code</description>
    </item>
    
    <item>
      <title>Guided Profiling for Auto-Tuning Array Layouts on GPUs</title>
      <link>https://www.mergian.de/2015/pmbs-matog/</link>
      <pubDate>Thu, 01 Jan 2015 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2015/pmbs-matog/</guid>
      <description>Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), Source Code</description>
    </item>
    
    <item>
      <title>Auto-Tuning Complex Array Layouts for GPUs</title>
      <link>https://www.mergian.de/2014/egpgv-matog/</link>
      <pubDate>Wed, 01 Jan 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2014/egpgv-matog/</guid>
      <description>Eurographics Symposium on Parallel Graphics and Visualization (EGPGV), Supplemental Material, Source Code</description>
    </item>
    
    <item>
      <title>Construction of Ray-Tracing Acceleration Structures in an Out-of-Core Multi-GPU Environment</title>
      <link>https://www.mergian.de/2013/thesis-msc/</link>
      <pubDate>Tue, 01 Jan 2013 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2013/thesis-msc/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Fast Dynamic Memory Allocator for Massively Parallel Architectures</title>
      <link>https://www.mergian.de/2013/gpgpu-fgmalloc/</link>
      <pubDate>Tue, 01 Jan 2013 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2013/gpgpu-fgmalloc/</guid>
      <description>Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU), Source Code</description>
    </item>
    
    <item>
      <title>Transportprotokoll und Systemdienste für den Controller Area Network Bus</title>
      <link>https://www.mergian.de/2010/thesis-bsc/</link>
      <pubDate>Fri, 01 Jan 2010 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/2010/thesis-bsc/</guid>
      <description>Source Code</description>
    </item>
    
    <item>
      <title>About me</title>
      <link>https://www.mergian.de/page/short/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://www.mergian.de/page/short/</guid>
      <description>I am Nicolas Weber, a research engineer in the Intelligent Software Systems Group at NEC Laboratories Europe. Previously, I was a PhD student in the Graphics, Capture and Massively Parallel Computing Group at TU Darmstadt, supervised by Prof. Michael Goesele, and an Associate of the Graduate School of Computational Engineering at TU Darmstadt.
My main research interests are the automated optimization of code running on accelerator hardware, especially for Scientific and High Performance Computing, Biomedical and Artificial Intelligence applications.</description>
    </item>
    
  </channel>
</rss>
