Prof. Dr. Jürgen Teich

Department of Computer Science

Our research centers on the systematic design (CAD) of hardware/software systems, ranging from embedded systems to HPC platforms. One principal research direction is domain-specific computing, which tackles the highly complex programming and design challenge of parallel heterogeneous computer architectures. Domain-specific computing strictly separates the concerns of algorithm development and target architecture implementation, including parallelization and low-level implementation details. The key idea is to exploit the knowledge inherent in a particular problem area or field of application, i.e., a particular domain, in a well-directed manner and thus to master the complexity of heterogeneous systems. Such domain knowledge can be captured by suitable abstractions, augmentations, and notations, e.g., libraries, domain-specific languages (DSLs), or combinations of both (e.g., embedded DSLs implemented via template metaprogramming). On this basis, patterns can be utilized to transform and optimize the input description in a goal-oriented way during compilation and, finally, to generate code for a specific target architecture. DSLs thus provide high productivity and typically also high performance.

We develop DSLs and target platform languages to capture both domain and architecture knowledge, which is utilized during the different phases of compilation, parallelization, mapping, and code generation for a wide variety of architectures, e.g., multi-core processors, GPUs, MPSoCs, and FPGAs. All these steps usually go along with optimizing and exploring the vast space of design options and trading off multiple objectives such as performance, cost, energy, and reliability.
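As a small illustration of the embedded-DSL idea, consider the following sketch (all class and function names are invented for this example and not taken from any of our tools): operator overloading records a domain computation as an expression tree instead of evaluating it eagerly, so that a backend can later transform, optimize, and lower it for a chosen target.

```python
# Minimal embedded-DSL sketch: expressions are recorded as a tree
# instead of being evaluated eagerly, so a backend can transform
# and lower them for a chosen target architecture.

class Expr:
    def __add__(self, other):
        return BinOp("+", self, wrap(other))

    def __mul__(self, other):
        return BinOp("*", self, wrap(other))

class Const(Expr):
    def __init__(self, value):
        self.value = value

class Var(Expr):
    def __init__(self, name):
        self.name = name

class BinOp(Expr):
    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs

def wrap(x):
    return x if isinstance(x, Expr) else Const(x)

def emit_c(e):
    """Lower the recorded expression tree to a C-like expression string."""
    if isinstance(e, Const):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    return f"({emit_c(e.lhs)} {e.op} {emit_c(e.rhs)})"

# Domain description: written once, independent of the target.
x, y = Var("x"), Var("y")
expr = x * 2 + y
print(emit_c(expr))  # -> ((x * 2) + y)
```

The same tree could equally be lowered to CUDA, OpenCL, or a hardware description; the point is that the domain description is written once, independently of the target.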

Research projects

  • Diffusion-weighted imaging and quantitative susceptibility mapping of the breast, liver, prostate, and brain
  • Development of new MRI pulse sequences
  • Development of new MRI post-processing schemes
  • Joint evaluation of new MR methods with radiology
  • Domain-Specific Computing for Medical Imaging
  • Hipacc – the Heterogeneous Image Processing Acceleration Framework
  • AI Laboratory for System-level Design of ML-based Signal Processing Applications
  • Architecture Modeling and Exploration of Algorithms for Medical Image Processing

Current projects

  • Neural Approximate Accelerator Architecture Optimization for DNN Inference on Lightweight FPGAs

    (Third Party Funds Single)

    Term: 1. March 2024 - 28. February 2027
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)

    Embedded Machine Learning (ML) is a fast-growing field comprising ML algorithms, hardware, and software capable of performing on-device sensor data analysis at extremely low power, thus enabling a range of always-on and battery-powered applications and services. Running ML-based applications on embedded edge devices attracts phenomenal research and business interest for many reasons, including accessibility, privacy, latency, cost, and security. Embedded ML is primarily represented by artificial intelligence (AI) at the edge (EdgeAI) and on tiny, ultra-resource-constrained devices, a.k.a. TinyML. TinyML demands energy efficiency and low latency while retaining accuracy at acceptable levels, thus mandating optimization of the software and hardware stack.
    GPUs are the default platform for DNN training workloads due to their highly parallel computing, originating from their massive number of processing cores. However, GPUs are often not an optimal solution for DNN inference acceleration because of their high energy cost and lack of reconfigurability, especially for high-sparsity models or customized architectures. Field-Programmable Gate Arrays (FPGAs), on the other hand, have the unique privilege of potentially lower latency and higher efficiency than GPUs while offering high customization and faster time-to-market, combined with a potentially longer useful life than ASIC solutions.
    In the context of TinyML, NA³Os focuses on a neural approximate accelerator-architecture co-search targeting specifically lightweight FPGA devices. This project investigates design techniques to optimally and automatically map DNNs to resource-constrained FPGAs while exploiting principles of approximate computing. Our particular topics of investigation include:

    • Efficient mapping of DNN operations onto approximate hardware components (e.g., multipliers, adders, DSP Blocks, BRAMs).
    • Techniques for fast and automated design space exploration of mappings of DNNs defined by a set of approximate operators and a set of FPGA platform constraints.
    • Investigation of a hardware-aware neural architecture co-search methodology targeting FPGA-based DNN accelerators.
    • Evaluation of robustness vs. energy efficiency tradeoffs.
    • Finally, all developed methods shall be evaluated experimentally by providing a proper synthesis path and comparing the quality of generated solutions with state-of-the-art solutions.
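    As a hedged sketch of what mapping a DNN multiplication onto an approximate operator can look like (the truncation scheme and all names here are illustrative, not the project's actual operator library), the following models a truncation-based approximate multiplier and characterizes its error exhaustively over 8-bit operands:

```python
def approx_mul(a, b, trunc_bits=2):
    """Truncation-based approximate multiplier: the lowest `trunc_bits`
    bits of each operand are dropped before multiplying, shrinking the
    required hardware at the cost of a bounded, predictable error."""
    a_t = (a >> trunc_bits) << trunc_bits
    b_t = (b >> trunc_bits) << trunc_bits
    return a_t * b_t

# Exhaustive error statistics over all 8-bit unsigned operand pairs --
# the kind of metric used to select an operator during design space
# exploration.
errors = [abs(a * b - approx_mul(a, b))
          for a in range(256) for b in range(256)]
mean_err = sum(errors) / len(errors)
max_err = max(errors)
print(f"mean abs error: {mean_err:.1f}, worst case: {max_err}")
```

Note that the error vanishes whenever both operands already have zero low bits, which is exactly the kind of input-dependent behavior that a design space exploration must trade off against hardware savings.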
  • Automatic Cross-Layer Synthesis of High Performance, (Ultra-)Low Power Hardware Implementations from Data Flow Specifications by Integration of Emerging FeFET Technology

    (Third Party Funds Single)

    Term: 1. March 2024 - 1. March 2027
    Funding source: Deutsche Forschungsgemeinschaft (DFG)

    High-throughput data and signal processing applications are preferably specified by dataflow networks, as these naturally allow the exploitation of parallelism both globally (at the level of a network of communicating actors) and locally at the actor level, e.g., by implementing each actor as a hardware circuit. Today, a few system-level design approaches exist to aid an algorithm designer in compiling a dataflow network to a set of processors or, alternatively, in synthesizing the network directly in hardware to achieve high processing speeds. But embedded systems, particularly in the context of IoT applications, have additional requirements: safe operation, even in an environment of intermittent power shortages, and in general (ultra-)low power consumption. Altogether, these requirements seem contradictory.

    Our proposed project, named HiLoDa (High performance, (ultra-)Low power Dataflow) Nets, attacks this obvious discrepancy and conflict in requirements by a) introducing, exploiting, and integrating for the first time emerging FeFET technology for the design of actor networks, i.e., by investigating and designing persistable FIFO-based memory units. b) In particular, circuit devices able to operate in a mixed volatile/non-volatile mode of operation shall be modeled, characterized, and designed. c) By combining the system-level concept of dataflow, which is based on self-scheduled activation of computations, with emerging CMOS-compatible FeFET technology, inactive actors or even subnets shall inherit the capability of self-powering (down and wakeup). In addition, for a continuously safe mode of operation, a power-down must also be triggered upon any intermittent shortage of the power supply. Analogously, actors shall perform an auto-wakeup after recovery from a power shortage, but also subject to fireability.

    HiLoDa Nets will be able to combine high clock-speed data processing of each synthesized actor circuit in power-on mode and automatic state retention using FeFET technology in power-off mode, self-triggered during time intervals of either data unavailability or power shortage. d) A fully automatic cross-layer synthesis from system-level dataflow specification to optimized circuit implementation involving FeFET devices shall be developed. This includes e) the DSE (design space exploration) of actor clusterings at the system level to explore individual power domains for the optimization of throughput, circuit cost, energy savings, and endurance. Finally, f) HiLoDa Nets shall be compared to conventional CMOS technology implementations with respect to energy consumption for applications such as spiking neural networks. Likewise, shutdown (backup) and recovery latencies from power shortages shall be evaluated and optimized.
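    The self-scheduled firing semantics underlying such actor networks can be sketched in a few lines of simulation code (a toy model: real HiLoDa actors are hardware circuits, and the power state below is only a flag, not FeFET-backed state retention):

```python
from collections import deque

class Actor:
    """An actor fires only when all input FIFOs hold data; otherwise it
    is idle and (in HiLoDa terms) could power itself down, with its
    FIFO state retained in non-volatile FeFET memory."""
    def __init__(self, name, func, inputs, output):
        self.name, self.func = name, func
        self.inputs, self.output = inputs, output
        self.awake = False  # power state, modeled only as a flag here

    def fireable(self):
        return all(q for q in self.inputs)

    def fire(self):
        self.awake = True  # auto-wakeup on data availability
        args = [q.popleft() for q in self.inputs]
        if self.output is not None:
            self.output.append(self.func(*args))
        self.awake = False  # power down again when idle

def run(actors, max_steps=100):
    """Self-scheduled execution: repeatedly fire any fireable actor."""
    for _ in range(max_steps):
        ready = [a for a in actors if a.fireable()]
        if not ready:
            break
        for a in ready:
            a.fire()

# A two-actor pipeline: scale, then forward to a sink FIFO.
q_in, q_mid, q_out = deque([1, 2, 3]), deque(), deque()
scale = Actor("scale", lambda x: 2 * x, [q_in], q_mid)
sink = Actor("sink", lambda x: x, [q_mid], q_out)
run([scale, sink])
print(list(q_out))  # -> [2, 4, 6]
```

In hardware, no such global scheduler exists; each actor's firing rule is evaluated locally, which is precisely what makes per-actor power domains attractive.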

  • Optimization and Toolchain for Embedding AI

    (Third Party Funds Single)

    Term: 1. March 2023 - 28. February 2026
    Funding source: Industry

    Artificial Intelligence (AI) methods have quickly progressed from research to productive applications in recent years. Typical AI models (e.g., deep neural networks) impose high memory demands and computational effort for training and when making predictions during operation. This conflicts with the typically limited resources of embedded controllers used in automotive or industrial applications. To comply with these limitations, AI models must be streamlined on different levels to become applicable to a given specific embedded target hardware, e.g., by architecture and feature selection, pruning, and other compression techniques. Currently, model adaptation to fit the target hardware is achieved by iterative, manual changes in a "trial-and-error" manner: the model is designed, trained, and compiled to the target hardware while applying different optimization techniques. The model is then checked for compliance with the hardware constraints, and the cycle is repeated if necessary. This approach is time-consuming and error-prone.

    Therefore, this project, funded by the Schaeffler Hub for Advanced Research at Friedrich-Alexander-Universität Erlangen-Nürnberg (SHARE at FAU), seeks to establish guidelines for hardware selection and a systematic toolchain for optimizing and embedding AI in order to reduce the current efforts of porting machine learning models to automotive and industrial devices.
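    One of the compression techniques mentioned above, magnitude pruning, reduces to a small, self-contained procedure. The following is a minimal sketch on plain Python lists; real toolchains operate on framework tensors:

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the
    weights -- a common first step when shrinking a model to fit an
    embedded target's memory budget."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    threshold = flat[k - 1] if k > 0 else -1.0
    return [[0.0 if abs(w) <= threshold else w for w in row]
            for row in weights]

w = [[0.9, -0.05, 0.4],
     [0.01, -0.7, 0.1]]
pruned = prune_by_magnitude(w, sparsity=0.5)
print(pruned)  # -> [[0.9, 0.0, 0.4], [0.0, -0.7, 0.0]]
```

Note that ties at the threshold may prune slightly more than the requested fraction; production implementations typically break such ties deterministically. After pruning, the model must still be re-checked against the hardware constraints, which is exactly the loop the project aims to automate.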

  • HYPNOS – Co-Design of Persistent, Energy-efficient and High-speed Embedded Processor Systems with Hybrid Volatility Memory Organisation

    (Third Party Funds Group – Sub project)

    Overall project: DFG Priority Programme (SPP) 2377 - Disruptive Memory Technologies
    Term: 21. September 2022 - 21. September 2025
    Funding source: DFG / Schwerpunktprogramm (SPP)

    This project is funded by the German Research Foundation (DFG) within the Priority Programme SPP 2377 "Disruptive Memory Technologies".

    HYPNOS explores how emerging non-volatile memory (NVM) technologies could beneficially replace not only the main memory in modern embedded processor architectures, but potentially also one or multiple levels of the cache hierarchy or even the registers, and how to optimize such a hybrid-volatile memory hierarchy to offer high-speed and low-energy tradeoffs for a multitude of application programs while providing persistence of data structures and processing state in a simple and efficient way.

    On the one hand, fully non-volatile (memory) processors (NVPs), which have emerged for IoT devices, are known to suffer from the slow write times of current NVM technologies as well as from orders-of-magnitude lower endurance than, e.g., SRAM, thus prohibiting operation at GHz speeds. On the other hand, existing NVM main-memory computer solutions require the programmer to explicitly persist data structures through the cache hierarchy.

    HYPNOS (named after the Greek god of sleep) systematically attacks this intertwined performance/endurance/programmability gap by taking a hardware/software co-design approach:

    Our investigations include techniques for

    a) design space exploration of hybrid NVM memory processor architectures w.r.t. speed and energy consumption, including hybrid (mixed-volatile) register and cache-level designs,

    b) offering instruction-level persistence for (non-transactional) programs in case of, e.g., instantaneous power failures through low-cost and low-latency control unit (hardware) design of checkpointing and recovery functions, and additionally providing

    c) application-programmer (software) persistence control on a multi-core HYPNOS system for user-defined checkpointing and recovery from these and other errors or access conflicts, backed by size-limited hardware transactional memory (HTM).

    d) The explored processor architecture designs and different types of NVM technologies will be systematically evaluated for achievable speed and energy gains, and for testing co-designed backup and recovery mechanisms (e.g., wakeup latencies), using a gem5-based multi-core simulation platform with ARM processors featuring HTM instruction extensions.

    As benchmarks, i) simple data structures, ii) sensor (peripheral device) I/O and finally iii) transactional database applications shall be investigated and evaluated. 
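    The checkpoint/recovery principle behind b) and c) can be illustrated with a toy software model (this is only a sketch of the semantics; HYPNOS realizes backup and recovery in hardware, which this Python model does not capture):

```python
import copy

class HybridMemory:
    """Toy model of a hybrid-volatile hierarchy: `volatile` state is
    lost on a power failure unless checkpointed into `nvm` first."""
    def __init__(self):
        self.volatile = {}   # registers/caches: fast, lost on power-off
        self.nvm = None      # non-volatile backup area

    def checkpoint(self):
        # Backup: persist a snapshot of the volatile state.
        self.nvm = copy.deepcopy(self.volatile)

    def power_failure(self):
        # All volatile contents vanish; the NVM survives.
        self.volatile = {}

    def recover(self):
        # Wakeup: restore the last consistent snapshot, if any.
        if self.nvm is not None:
            self.volatile = copy.deepcopy(self.nvm)

mem = HybridMemory()
mem.volatile["counter"] = 41
mem.checkpoint()              # state persisted
mem.volatile["counter"] = 42  # progress after the checkpoint...
mem.power_failure()           # ...is lost here
mem.recover()
print(mem.volatile["counter"])  # -> 41 (rolled back to the checkpoint)
```

The interesting hardware questions, which the sketch omits, are precisely how cheaply and how quickly checkpoint and recover can be made, and how much progress between checkpoints one can afford to lose.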

  • ACoF -- Approximate Computing on FPGAs

    (Third Party Funds Single)

    Term: since 1. June 2021
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)

    Approximate computing systematically exploits the trade-off between accuracy, power/energy consumption, performance, and cost in many applications of daily life, e.g., computer vision, machine learning, multimedia, big data analysis, and gaming. Computing results approximately is a viable approach here thanks to inherent human perceptual limitations, redundancy, or noise in input data.

    In this project, we investigate novel techniques for the design and optimization of approximate logic circuits for FPGA (field-programmable gate array) targets. These devices are known to perfectly combine the high performance of hardware designs with the re-programmability of software and are used in many products of daily life and even in cloud servers. The goals of our research are a) to investigate novel techniques for function approximation exploiting FPGA artifacts, i.e., DSP blocks and BRAM, b) to study new error metrics and a calculus for error propagation in networks of approximate arithmetic modules, c) to develop novel FPGA-specific optimization techniques for design space exploration and synthesis of approximate multi-output Boolean functions, and d) to study how to integrate error modeling and analysis techniques into existing high-level programming languages and the subsequent synthesis of approximate Verilog or VHDL designs.
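    To illustrate goal b), the following characterizes a published approximate adder design, the lower-part OR adder (LOA), by exhaustive error metrics. The parameter choice (k = 4 approximated low bits, 8-bit operands) is ours, for illustration only:

```python
def loa_add(a, b, k=4, width=8):
    """Lower-part OR adder (LOA): the k low bits are combined with a
    bitwise OR instead of a full adder, saving carry logic; the high
    bits are added exactly."""
    low_mask = (1 << k) - 1
    low = (a & low_mask) | (b & low_mask)   # approximate low part
    high = ((a >> k) + (b >> k)) << k       # exact high part
    return (high | low) & ((1 << (width + 1)) - 1)

# Exhaustive error metrics over all 8-bit operand pairs -- the kind of
# per-module characterization a calculus of error propagation builds on.
errs = [abs((a + b) - loa_add(a, b))
        for a in range(256) for b in range(256)]
error_rate = sum(e > 0 for e in errs) / len(errs)
print(f"mean abs error {sum(errs)/len(errs):.2f}, "
      f"max error {max(errs)}, error rate {error_rate:.2%}")
```

Since a + b = (a | b) + (a & b), the LOA's error equals the AND of the two low parts, so an error occurs exactly when the low parts share a set bit; such closed-form error behavior is what makes an analytical error calculus feasible.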
  • Cybercrime and Forensic Computing -- Hardware Security

    (Third Party Funds Group – Sub project)

    Overall project: Research Training Group 2475: Cybercrime and Forensic Computing
    Term: 1. October 2019 - 1. October 2028
    Funding source: DFG / Graduiertenkolleg (GRK)
    This project is funded by the German Research Foundation (DFG) within the Research Training Group 2475 "Cybercrime and Forensic Computing".

    Cybercrime is becoming an ever greater threat in view of the growing societal importance of information technology. At the same time, new opportunities are emerging for law enforcement, such as automated data collection and analysis on the Internet or via surveillance programs. But how should the fundamental rights of those affected be dealt with when forensic computing is applied? The RTG "Cybercrime and Forensic Computing" brings together experts in computer science and law to investigate the research field of the prosecution of cybercrime in a systematic way.

    At the Chair of Computer Science 12, aspects of hardware security are investigated. The focus is on researching techniques to extract information and traces from technical devices via side channels. In addition to the actual processing of input data into output data, the physical implementation of a system emits further, so-called side-channel information to its environment. Known side channels include, for example, the data-dependent timing behavior of an algorithm implementation, as well as power consumption, electromagnetic radiation, and temperature development.
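    A classic example of such data-dependent timing behavior is an early-exit comparison of secret data. The sketch below (invented example code, not taken from an actual case) counts loop iterations as a deterministic proxy for execution time:

```python
import hmac

def early_exit_equal(secret, guess):
    """Naive comparison that returns on the first mismatch. The number
    of loop iterations -- a proxy for execution time -- leaks how many
    leading bytes of the guess are correct."""
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return len(secret) == len(guess), steps

secret = b"s3cret!"
_, t_bad = early_exit_equal(secret, b"xxxxxxx")    # wrong first byte
_, t_close = early_exit_equal(secret, b"s3cxxxx")  # 3 correct bytes
print(t_bad, t_close)  # -> 1 4

# A constant-time comparison removes this particular side channel:
ok = hmac.compare_digest(secret, b"s3cret!")
print(ok)  # -> True
```

An attacker who can measure such timing differences can recover the secret byte by byte; Python's hmac.compare_digest is the standard constant-time remedy for this channel.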
