I work on query processing in databases and on the use of artificial intelligence to boost database performance. On the application side, I investigate the potential of collecting the queries first and then deriving the database design from them.
- Query-driven database design, integration, and optimization
- Using modern hardware (FPGAs) to speed up database-query processing
- Investigating AI methods (e.g. autoencoders) for the improvement of database technology
- Build a repository of neural-ne
Query Optimisation and Near-Data Processing on Reconfigurable SoCs for Big Data Analysis (Phase II)
(Third Party Funds Single)
Term: 1 August 2021 - 31 July 2024
Funding source: Deutsche Forschungsgemeinschaft (DFG)
URL: https://www.dfg-spp2037.de/me943-9/
Analysing petabytes of data in an affordable amount of time and energy requires massively parallel processing of data at their source. Active research is therefore directed towards emerging hardware architectures to reduce the volume of data close to their source and towards sound query analysis and optimisation techniques to exploit such novel architectures. The goal of the ReProVide project is to investigate FPGA-based solutions for smart storage and near-data processing together with novel query-optimisation techniques that exploit the speed and reconfigurability of FPGA hardware for a scalable and powerful (pre-)filtering of Big Data.
In the first funding phase, we have laid the foundations for this endeavour. In particular, we have designed an FPGA-based PSoC architecture of so-called Reconfigurable Data Provider Units (RPUs). For data processing and filtering, an RPU exploits the capabilities of dynamic (run-time) hardware reconfiguration of modern FPGAs to load pre-designed hardware accelerators on the fly. An RPU is able to process SQL queries or parts of them in hardware in combination with CPU cores also available on the PSoC. For the integration of RPUs into a DBMS, new cost models had to be developed, taking the capabilities and characteristics of an RPU into account. Here, we have elaborated a novel hierarchical (multi-level) query optimisation to determine which operations are worthwhile to be assigned to an RPU (query partitioning) and how to deploy and execute the assigned (sub-)queries or database operators on the RPU (query placement). The implemented query optimiser shares the work between the global optimiser of the DBMS (in our case Apache Calcite) and an architecture-specific local optimiser running on the RPU.
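The query-partitioning decision described above can be illustrated with a toy cost comparison. This is only a minimal sketch and not the ReProVide cost model: the `Operator` fields, the cost constants, and the functions `host_cost`, `rpu_cost`, and `partition` are all hypothetical names and values chosen for illustration. It assumes that pushing an operator to the RPU costs a one-time reconfiguration but ships only the surviving rows to the host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    rows_in: int          # number of input rows
    selectivity: float    # fraction of rows that survive the operator
    rpu_supported: bool   # assumption: a pre-designed accelerator exists


# All constants are invented for illustration, not measured values.
CPU_COST_PER_ROW = 1.0
RPU_COST_PER_ROW = 0.2
TRANSFER_COST_PER_ROW = 0.5
RECONFIG_COST = 50_000.0


def host_cost(op: Operator) -> float:
    # Every input row is shipped to the host and processed on its CPU.
    return op.rows_in * (TRANSFER_COST_PER_ROW + CPU_COST_PER_ROW)


def rpu_cost(op: Operator) -> float:
    # Pay for reconfiguration, filter near the data, ship only survivors.
    rows_out = op.rows_in * op.selectivity
    return (RECONFIG_COST
            + op.rows_in * RPU_COST_PER_ROW
            + rows_out * TRANSFER_COST_PER_ROW)


def partition(ops: list[Operator]) -> dict[str, str]:
    # Assign each operator to the RPU only if it is supported and cheaper.
    plan = {}
    for op in ops:
        on_rpu = op.rpu_supported and rpu_cost(op) < host_cost(op)
        plan[op.name] = "rpu" if on_rpu else "host"
    return plan


ops = [
    Operator("filter_big", 1_000_000, 0.01, True),
    Operator("filter_small", 1_000, 0.5, True),
    Operator("unsupported_op", 1_000_000, 0.01, False),
]
plan = partition(ops)
```

Under these made-up constants, only the large, highly selective filter is worth offloading; for the small input the reconfiguration overhead dominates, and the unsupported operator stays on the host regardless of cost.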
In the second funding phase, our major research goals will be:
1.) Stream processing: RPUs could equally be beneficial for the filtering of streams. Here, a plethora of fundamentally new module functionality will have to be investigated to support non-standard operators, leading to RPUs applicable to a much more diverse class of tasks including window operations and data-preparation functionality.
2.) Scalability: User interaction with modern databases usually involves not only one, but a sequence of queries. At the same time, multiple applications are running concurrently. Here, we will design an eight-node RPU cluster attached to storage and network to enable the distributed and parallel data processing of large databases and data streams. Also required are concepts for data partitioning and novel query optimisation techniques, making use of query-sequence information.
3.) Demonstrator & Evaluation: As a testbed and a proof of the benefits of the ReProVide approach in general and an FPGA-based RPU cluster in particular, we want to evaluate, both analytically and experimentally, the energy savings that become possible through near-data processing.
- Benenson, Z., Freiling, F., & Meyer-Wegener, K. (2022). Soziotechnische Einflussfaktoren auf die "digitale Souveränität" des Individuums. In Glasze, Georg; Odzuck, Eva; Staples, Ronald (Eds.), Was heißt digitale Souveränität? Diskurse, Praktiken und Voraussetzungen "individueller" und "staatlicher Souveränität" im digitalen Zeitalter (pp. 61-87). Bielefeld: transcript Verlag.
- Beena Gopalakrishnan Nair, L., Becher, A., Wildermann, S., Meyer-Wegener, K., & Teich, J. (2021). Speculative Dynamic Reconfiguration and Table Prefetching Using Query Look-Ahead in the ReProVide Near-Data-Processing System. Datenbank-Spektrum. https://dx.doi.org/10.1007/s13222-020-00363-7
- Beena Gopalakrishnan Nair, L., & Meyer-Wegener, K. (2021). COPRAO: A Capability Aware Query Optimizer for reconfigurable Near Data Processors. In Proc. 37th IEEE International Conference on Data Engineering Workshops, ICDE Workshops 2021, Chania, Greece, April 19-22, 2021 (pp. 54-59). Chania, Crete, Greece, GR: IEEE.
- Schwab, P., Röckl, J., Langohr, M., & Meyer-Wegener, K. (2021). Performance Evaluation of Policy-Based SQL Query Classification for Data-Privacy Compliance. Datenbank-Spektrum. https://dx.doi.org/10.1007/s13222-021-00385-9
- Beena Gopalakrishnan Nair, L., Becher, A., & Meyer-Wegener, K. (2020). The ReProVide Query-Sequence Optimization in a Hardware-Accelerated DBMS. In DaMoN '20: Proceedings of the 16th International Workshop on Data Management on New Hardware (pp. 1-3). Portland, Oregon USA: ACM Digital Library.
- Beena Gopalakrishnan Nair, L., Becher, A., Meyer-Wegener, K., Wildermann, S., & Teich, J. (2020). SQL Query Processing Using an Integrated FPGA-based Near-Data Accelerator in ReProVide. In Proceedings of EDBT (pp. 4). Copenhagen, DK.
- Ripperger, S., Carter, G., Page, R., Duda, N., Kölpin, A., Weigel, R., ... Kapitza, R. (2020). Thinking small: next-generation sensor networks close the size gap in vertebrate biologging. PLOS Biology. https://dx.doi.org/10.1371/journal.pbio.3000655
- Schwab, P., Langohr, M., & Meyer-Wegener, K. (2020). A Framework for DSL-Based Query Classification Using Relational and Graph-Based Data Models. In ACM (Eds.), GRADES-NDA'20: Proceedings of the 3rd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (pp. 10:1-10:5). Portland, OR, US: ACM.
- Schwab, P., Langohr, M., & Meyer-Wegener, K. (2020). We Know What You Did Last Session: Policy-Based Query Classification for Data-Privacy Compliance With the DataEconomist. In Association for Computing Machinery (Eds.), Proceedings of the SSDBM 2020: 32nd International Conference on Scientific and Statistical Database Management (pp. 30:1 - 30:4). Vienna, Virtual Conference, AT: New York, NY, United States: International Conference Proceeding Series (ICPS).
- Schwab, P., & Meyer-Wegener, K. (2020). Towards Evolutionary, Domain-Specific Query Classification Based on Policy Rules. In Daniel Trabold, Pascal Welke, Nico Piatkowski (Eds.), Proceedings of the Conference "Lernen, Wissen, Daten, Analysen" (pp. 291-295). Online, DE: CEUR-WS.
- Vöhringer, D., & Meyer-Wegener, K. (2020). Future Fetch -- Towards a ticket-based data access for secondary storage in database systems. In Proc. Conf. "Lernen, Wissen, Daten, Analysen" (pp. 270-278). Bonn / Online, DE: CEUR-WS.