Our research centers on the systematic design (CAD) of hardware/software systems, ranging from embedded systems to HPC platforms. One principal research direction is domain-specific computing, which tackles the complex programming and design challenge posed by parallel heterogeneous computer architectures. Domain-specific computing strictly separates the concerns of algorithm development and target architecture implementation, including parallelization and low-level implementation details. The key idea is to exploit the knowledge inherent in a particular problem area or field of application, i.e., a particular domain, in a well-directed manner and thus to master the complexity of heterogeneous systems. Such domain knowledge can be captured by suitable abstractions, augmentations, and notations, e.g., libraries, domain-specific languages (DSLs), or combinations of both (e.g., embedded DSLs implemented via template metaprogramming). On this basis, patterns can be utilized to transform and optimize the input description in a goal-oriented way during compilation and, finally, to generate code for a specific target architecture. Thus, DSLs provide high productivity and typically also high performance. We develop DSLs and target platform languages to capture both domain and architecture knowledge, which is exploited during the different phases of compilation, parallelization, mapping, and code generation for a wide variety of architectures, e.g., multi-core processors, GPUs, MPSoCs, and FPGAs. All these steps usually involve exploring the vast space of design options and trading off multiple objectives, such as performance, cost, energy, and reliability.
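To make the last point concrete, multi-objective design space exploration typically amounts to identifying the Pareto-optimal set of design points. The following minimal Python sketch is purely illustrative; the candidate points and objective names are hypothetical and not taken from our tool flow. It filters a set of candidate implementations, each annotated with objectives to be minimized, down to its Pareto front:

```python
from typing import Dict, List

# A design point maps objective names to values; all objectives are minimized.
DesignPoint = Dict[str, float]

def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if 'a' is no worse than 'b' in every objective and strictly better in at least one."""
    no_worse = all(a[k] <= b[k] for k in a)
    strictly_better = any(a[k] < b[k] for k in a)
    return no_worse and strictly_better

def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep only the non-dominated design points."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical candidate implementations of the same kernel on different targets.
candidates = [
    {"latency_ms": 2.0, "energy_mJ": 40.0, "cost": 10.0},   # e.g., GPU
    {"latency_ms": 5.0, "energy_mJ": 15.0, "cost": 3.0},    # e.g., multi-core CPU
    {"latency_ms": 1.5, "energy_mJ": 12.0, "cost": 25.0},   # e.g., FPGA
    {"latency_ms": 6.0, "energy_mJ": 50.0, "cost": 12.0},   # dominated point
]
print(pareto_front(candidates))  # prints the three non-dominated points
```

In practice, such a filter is only the last step of an exploration loop that also has to generate and evaluate the candidate design points.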
Research projects
Diffusion-weighted imaging and quantitative susceptibility mapping of the breast, liver, prostate, and brain
Development of new MRI pulse sequences
Development of new MRI post-processing schemes
Joint evaluation of new MR methods with radiology
Domain-Specific Computing for Medical Imaging
Hipacc – the Heterogeneous Image Processing Acceleration Framework
AI Laboratory for System-level Design of ML-based Signal Processing Applications
Architecture Modeling and Exploration of Algorithms for Medical Image Processing
Current projects
Optimization and Toolchain for Embedding AI
(Third Party Funds Single)
Term: March 1, 2023 – February 28, 2026
Funding source: Industry
Artificial Intelligence (AI) methods have quickly progressed from research to productive applications in recent years. Typical AI models (e.g., deep neural networks) have high memory demands and computational costs, both for training and for making predictions during operation. This conflicts with the typically limited resources of embedded controllers used in automotive or industrial applications. To comply with these limitations, AI models must be streamlined on different levels to fit a given embedded target hardware, e.g., by architecture and feature selection, pruning, and other compression techniques. Currently, model adaptation to the target hardware is achieved by iterative, manual changes in a “trial-and-error” manner: the model is designed, trained, and compiled to the target hardware while applying different optimization techniques; the model is then checked for compliance with the hardware constraints, and the cycle is repeated if necessary. This approach is time-consuming and error-prone.
Therefore, this project, funded by the Schaeffler Hub for Advanced Research at Friedrich-Alexander-Universität Erlangen-Nürnberg (SHARE at FAU), seeks to establish guidelines for hardware selection and a systematic toolchain for optimizing and embedding AI in order to reduce the current efforts of porting machine learning models to automotive and industrial devices.
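The envisioned toolchain automates exactly this design–train–compile–check cycle. The following Python sketch shows the idea under simplifying assumptions; the callables train, prune_step, and model_size_bytes are hypothetical placeholders and not part of any existing SHARE at FAU tooling:

```python
def fit_model_to_target(model, train, prune_step, model_size_bytes,
                        memory_budget_bytes, max_iterations=10):
    """Iteratively prune and retrain a model until it fits the target memory budget.

    All callables are hypothetical placeholders:
      train(model)             -> trains/fine-tunes the model in place
      prune_step(model, ratio) -> returns a model with 'ratio' of its weights/filters removed
      model_size_bytes(model)  -> estimated footprint on the embedded target
    """
    train(model)
    for _ in range(max_iterations):
        if model_size_bytes(model) <= memory_budget_bytes:
            return model                      # hardware constraint met: done
        model = prune_step(model, ratio=0.2)  # remove, e.g., 20% of parameters/filters
        train(model)                          # fine-tune to recover accuracy
    raise RuntimeError("Model does not fit the target within the iteration limit")
```

A real toolchain would additionally check latency and accuracy constraints and let a search strategy choose the compression technique per iteration.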
HYPNOS – Co-Design of Persistent, Energy-efficient and High-speed Embedded Processor Systems with Hybrid Volatility Memory Organisation
(Third Party Funds Group – Sub project)
Term: September 21, 2022 – September 21, 2025
Funding source: DFG / Schwerpunktprogramm (SPP)
URL: https://spp2377.uos.de/
This project is funded by the German Research Foundation (DFG) within the Priority Program SPP 2377 "Scalable Data Management for Future Hardware".
HYPNOS explores how emerging non-volatile memory (NVM) technologies could beneficially replace not only main memory in modern embedded processor architectures, but potentially also one or more levels of the cache hierarchy or even the registers. It further investigates how to optimize such a hybrid-volatile memory hierarchy to offer attractive speed/energy trade-offs for a multitude of application programs while providing persistence of data structures and processing state in a simple and efficient way.
On the one hand, fully non-volatile (memory) processors (NVPs), which have emerged for IoT devices, are known to suffer from the slow write times of current NVM technologies as well as from an endurance that is orders of magnitude lower than that of, e.g., SRAM, thus prohibiting operation at GHz speeds. On the other hand, existing NVM main-memory computer solutions require the programmer to explicitly persist data structures through the cache hierarchy.
HYPNOS (named after the Greek god of sleep) systematically attacks this intertwined performance/endurance/programmability gap by taking a hardware/software co-design approach.
Our investigations include techniques for
a) design space exploration of hybrid NVM processor architectures with respect to speed and energy consumption, including hybrid (mixed-volatile) register and cache-level designs,
b) offering instruction-level persistence for (non-transactional) programs in case of, e.g., instantaneous power failures, through a low-cost and low-latency control-unit (hardware) design of checkpointing and recovery functions, and additionally providing
c) application-programmer (software) persistence control on a multi-core HYPNOS system for user-defined checkpointing and recovery from these and other errors or access conflicts, backed by size-limited hardware transactional memory (HTM); a purely illustrative software-level sketch of such user-defined checkpointing follows below.
d) The explored processor architecture designs and different types of NVM technologies will be systematically evaluated for achievable speed and energy gains and for testing the co-designed backup and recovery mechanisms (e.g., wake-up latencies) on a gem5-based multi-core simulation platform using ARM processors with HTM instruction extensions.
As benchmarks, (i) simple data structures, (ii) sensor (peripheral device) I/O, and finally (iii) transactional database applications will be investigated and evaluated.
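As announced in c) above, the following Python sketch is a purely software-level analogue of user-defined checkpointing and recovery, with a file standing in for a non-volatile memory region and all names hypothetical; it is not the HYPNOS hardware mechanism itself. A long-running computation periodically persists its state and resumes from the last checkpoint after an (assumed) instantaneous power failure:

```python
import os
import pickle

CHECKPOINT_FILE = "state.ckpt"   # stands in for a region of non-volatile memory

def save_checkpoint(state: dict) -> None:
    """Atomically persist the processing state (write a temp file, then rename)."""
    tmp = CHECKPOINT_FILE + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, CHECKPOINT_FILE)  # atomic rename: either old or new checkpoint survives

def load_checkpoint() -> dict:
    """Resume from the last checkpoint, or start from scratch if none exists."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE, "rb") as f:
            return pickle.load(f)
    return {"next_item": 0, "partial_sum": 0}

# Example: a long-running accumulation that survives a power failure at any point.
state = load_checkpoint()
data = list(range(1_000_000))
for i in range(state["next_item"], len(data)):
    state["partial_sum"] += data[i]
    if i % 100_000 == 0:             # user-defined checkpoint interval
        state["next_item"] = i + 1
        save_checkpoint(state)
print(state["partial_sum"])
```

In the actual project, the equivalent of save_checkpoint is intended to be provided by low-latency hardware support and HTM rather than by file I/O.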
ACoF – Approximate Computing on FPGAs
(Third Party Funds Single)
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
Approximate computing systematically exploits the trade-off between accuracy, power/energy consumption, performance, and cost in many applications of daily life, e.g., computer vision, machine learning, multimedia, big data analysis, and gaming. Computing results only approximately is a viable approach here thanks to inherent human perceptual limitations, redundancy, or noise in the input data. In this project, we investigate novel techniques for the design and optimization of approximate logic circuits for FPGA (field-programmable gate array) targets. These devices are known to combine the high performance of hardware designs with the re-programmability of software and are used in many products of daily life and even in cloud servers. The goal of our research is a) to investigate novel techniques for function approximation exploiting FPGA artifacts, i.e., DSP blocks and BRAM, b) to study new error metrics and a calculus for error propagation in networks of approximate arithmetic modules (see the sketch below), c) to develop novel FPGA-specific optimization techniques for design space exploration and synthesis of approximate multi-output Boolean functions, and d) to study how to integrate error modeling and analysis techniques into existing high-level programming languages and the subsequent synthesis of approximate Verilog or VHDL designs.
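As a concrete illustration of such error metrics, the following Python sketch (illustrative only, not part of the ACoF tool flow) exhaustively computes the error rate, mean error distance (MED), and worst-case error of a simple variant of the well-known lower-part-OR approximate adder, which replaces the k least significant full adders by a bitwise OR:

```python
from itertools import product

def lower_or_adder(a: int, b: int, k: int, width: int = 8) -> int:
    """Approximate adder: exact addition on the upper bits, bitwise OR on the k lower bits."""
    low_mask = (1 << k) - 1
    high_sum = ((a >> k) + (b >> k)) << k        # carries from the lower part are dropped
    low_or = (a & low_mask) | (b & low_mask)
    return (high_sum | low_or) & ((1 << (width + 1)) - 1)

def error_metrics(k: int, width: int = 8):
    """Exhaustively evaluate error rate, mean error distance (MED), and worst-case error."""
    errors = [abs((a + b) - lower_or_adder(a, b, k, width))
              for a, b in product(range(1 << width), repeat=2)]
    n = len(errors)
    return {
        "error_rate": sum(e > 0 for e in errors) / n,
        "mean_error_distance": sum(errors) / n,
        "worst_case_error": max(errors),
    }

print(error_metrics(k=4))   # metrics for an 8-bit adder with 4 approximated lower bits
```

Exhaustive evaluation is only feasible for small operand widths; for larger modules and for networks of such modules, analytical error propagation or sampling is needed, which is exactly what goal b) addresses.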
Cyberkriminalität und forensische Informatik (Cybercrime and Forensic Computing)
(Third Party Funds Single)
Funding source: Deutsche Forschungsgemeinschaft (DFG)
URL: https://www.cybercrime.fau.de
Given the growing societal importance of information technology, cybercrime is becoming an ever greater threat. At the same time, new opportunities for law enforcement are emerging, such as automated data collection and analysis on the internet or surveillance programs. But how should the fundamental rights of those affected be handled when "forensic computing" is used? The research training group (GRK) "Cyberkriminalität und Forensische Informatik" brings together experts from computer science and law to systematically open up the research field of prosecuting cybercrime.
Recent publications
2023
Sabih, M., Yayla, M., Hannig, F., Teich, J., & Chen, J.-J. (2023). Robust and Tiny Binary Neural Networks using Gradient-based Explainability Methods. In Eiko Yoneki, Luigi Nardi (Eds.), EuroMLSys '23: Proceedings of the 3rd Workshop on Machine Learning and Systems (pp. 87–93). Rome, Italy: Association for Computing Machinery (ACM).
Sixdenier, P.-L.A.E., Wildermann, S., Ottens, M., & Teich, J. (2023). Seque: Lean and Energy-aware Data Management for IoT Gateways. In Proceedings of the IEEE International Conference on Edge Computing and Communications (EDGE). Chicago, IL, USA: IEEE.
2022
Hahn, T., Becher, A., Wildermann, S., & Teich, J. (2022). Raw Filtering of JSON data on FPGAs. In Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe. Antwerpen, BE.
Hahn, T., Wildermann, S., & Teich, J. (2022). Auto-Tuning of Raw Filters for FPGAs. In IEEE Proceedings of the 32nd International Conference on Field Programmable Logic and Applications. Belfast, United Kingdom.
Heidorn, C., Meyerhöfer, N., Schinabeck, C., Hannig, F., & Teich, J. (2022). Hardware-Aware Evolutionary Filter Pruning. In Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS XXII) (pp. 283-299). Pythagoreio, Samos, Greece: Springer Nature.
Sabih, M., Mishra, A., Hannig, F., & Teich, J. (2022). MOSP: Multi-Objective Sensitivity Pruning of Deep Neural Networks. In IEEE (Eds.), 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC) (pp. 1-8). Virtual: Pittsburgh, PA, USA: Institute of Electrical and Electronics Engineers (IEEE).
Snelting, G., Teich, J., Fried, A., Hannig, F., & Witterauf, M. (2022). Compilation and Code Generation for Invasive Programs. In Jürgen Teich, Jörg Henkel, Andreas Herkersdorf (Eds.), Invasive Computing. (pp. 309-333). FAU University Press.
Teich, J., Brand, M., Hannig, F., Heidorn, C., Walter, D., & Witterauf, M. (2022). Invasive Tightly-Coupled Processor Arrays. In Jürgen Teich, Jörg Henkel, Andreas Herkersdorf (Eds.), Invasive Computing. (pp. 177-202). FAU University Press.
Teich, J., Esper, K., Falk, J., Pourmohseni, B., Schwarzer, T., & Wildermann, S. (2022). Basics of Invasive Computing. In Jürgen Teich, Jörg Henkel, Andreas Herkersdorf (Eds.), Invasive Computing. (pp. 69-95). FAU University Press.
Teich, J., Henkel, J., & Herkersdorf, A. (2022). Introduction to Invasive Computing. In Jürgen Teich, Jörg Henkel, Andreas Herkersdorf (Eds.), Invasive Computing. (pp. 1-66). FAU University Press.
Trautmann, J., Patsiatzis, N., Becher, A., Teich, J., & Wildermann, S. (2022). Real-Time Waveform Matching with a Digitizer at 10 GS/s. In IEEE Proceedings of the 32nd International Conference on Field Programmable Logic and Applications. Belfast, United Kingdom.
2021
Alhaddad, S., Förstner, J., Groth, S., Grünewald, D., Grynko, Y., Hannig, F.,... Wende, F. (2021). HighPerMeshes – A Domain-Specific Language for Numerical Algorithms on Unstructured Grids. In Proceedings of the 18th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar) in Euro-Par 2020: Parallel Processing Workshops. Warsaw, PL: Springer.
Keszöcze, O., Brand, M., Witterauf, M., Heidorn, C., & Teich, J. (2021). Aarith: An Arbitrary Precision Number Library. In Proceedings of the ACM/SIGAPP Symposium On Applied Computing. virtual conference, KR.
Sabih, M., Hannig, F., & Teich, J. (2021). Fault-Tolerant Low-Precision DNNs using Explainable AI. In 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). Virtual Workshop: IEEE Xplore.
Streit, F.-J., Krüger, P., Becher, A., Schlumberger, J., Wildermann, S., & Teich, J. (2021). CHOICE – A Tunable PUF-Design for FPGAs. In IEEE Proceedings of the 31st International Conference on Field Programmable Logic and Applications. Dresden, Germany.
Streit, F.-J., Krüger, P., Becher, A., Wildermann, S., & Teich, J. (2021, December). Design and Evaluation of a Tunable PUF Architecture for FPGAs. Paper presentation at International Conference on Field-Programmable Technology (FPT), Auckland, New Zealand, NZ.
Streit, F.-J., Wildermann, S., Pschyklenk, M., & Teich, J. (2021). Providing Tamper-Secure SoC Updates through Reconfigurable Hardware. In Springer Proceedings of the 17th International Symposium on Applied Reconfigurable Computing. Rennes, France, FR: Springer Computer Science Proceedings.
2020
Lengauer, C., Apel, S., Bolten, M., Chiba, S., Rüde, U., Teich, J.,... Schmitt, J. (2020). ExaStencils: Advanced Multigrid Solver Generation. In Hans-Joachim Bungartz, Severin Reiz, Benjamin Uekermann, Philipp Neumann, Wolfgang E. Nagel (Eds.), Software for Exascale Computing – SPPEXA 2016-2019 (Lecture Notes in Computational Science and Engineering) (pp. 405-452). Cham: Springer.
Qiao, B., Reiche, O., Özkan, M.A., Teich, J., & Hannig, F. (2020). Efficient Parallel Reduction on GPUs with Hipacc. In Proceedings of the 23rd International Workshop on Software and Compilers for Embedded Systems (SCOPES) (pp. 58-61). Sankt Goar, DE.
Streit, F.-J., Fritz, F., Becher, A., Wildermann, S., Werner, S., Schmidt-Korth, M.,... Teich, J. (2020). Secure Boot from Non-Volatile Memory for Programmable SoC-Architectures. In IEEE Proceedings of the 13th International Symposium on Hardware Oriented Security and Trust. San José, USA, US.
Özkan, M.A., Pérard-Gayot, A., Membarth, R., Slusallek, P., Leißa, R., Hack, S.,... Hannig, F. (2020). AnyHLS: High-Level Synthesis with Partial Evaluation. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS). Hamburg, DE.
Related Research Fields
Contact: