databases :meets: AI :meets: modeling
Bad Machine Learning on Big Data may bring you from one dead end to another.
Good Machine Learning is finding that perfect balance between a model that is too simple and a model that is too complex.
Good Machine Learning can take you anywhere.

Our Approach

Typical Big Data Toolbox:

© iStockphoto.com SergZSV

Our Toolbox:


If everything you own is a hammer, every problem looks like a nail. That is not what we do. We do not blindly focus on a single tool. We know the entire toolset. MapReduce, Spark, and NoSQL won't solve all of your problems, and neither will machine learning, modeling, and statistics on their own. However, the right combination will solve your problem!

Our Process

Here is an overview of how we approach each new project.

Understand and Identify

In a short interview, you describe your company's pain points and problems. We then identify potential for improving your business using big data, AI, machine learning, and modeling.

Grab Real Data

We take a subset of your data and analyze it together with your company's domain experts, who explain the semantics of the data to us. This analysis takes a short period of time (a few weeks, one month at most).

Data Preparation and Processing

We perform ETL (extract-transform-load), clean your data, repair missing values, and normalize schemas.
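As an illustration of what repairing missing values can look like, here is a minimal Python sketch. It assumes a simple mean imputation on a numeric column; the function name `impute_missing` and the sample records are hypothetical, and the method used in a real project depends on the data.

```python
from statistics import mean


def impute_missing(rows, column):
    """Return copies of `rows` where a missing (None) value in `column`
    is replaced by the mean of the observed values in that column."""
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    fill_value = mean(observed)
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical sample records; the second temperature reading is missing.
rows = [
    {"sensor": "a", "temp": 20.0},
    {"sensor": "b", "temp": None},
    {"sensor": "c", "temp": 22.0},
]
cleaned = impute_missing(rows, "temp")
```

The input records are left untouched, so the raw data remains available for comparison with the cleaned version.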

Data Visualization

We visualize your data in several different ways to spot interesting patterns, correlations, features, events, and anomalies.

Feature Boosting

We boost the value of your data by semi-automatically adding features. We also drop unimportant features, and we standardize and rescale data distributions.
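Standardizing and rescaling can be sketched in a few lines of Python. The example below assumes z-score standardization and min-max rescaling, two common choices; the function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Shift and scale values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rescale(values, lo=0.0, hi=1.0):
    """Linearly map values onto the interval [lo, hi] (min-max scaling)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


# Hypothetical feature column.
revenue = [10.0, 20.0, 30.0]
z = standardize(revenue)
scaled = rescale(revenue)
```

Both transformations put features measured in different units on a comparable scale, which many learning methods benefit from.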

Automatic Modeling

We run your data through Daimond's predictor generator engine. We identify promising prediction models and quantify their quality. To do so, we generate precision/recall and ROC curves.
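To make these quality measures concrete, here is a minimal Python sketch of how precision, recall, and the points of a ROC curve can be computed for a binary classifier. It is not the engine mentioned above; the function names, labels, and scores are hypothetical.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when every score >= threshold counts as positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical ground-truth labels (1 = positive) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

precision, recall = precision_recall(labels, scores, threshold=0.65)
curve = roc_points(labels, scores)
auc = area_under_curve(curve)
```

An area under the ROC curve close to 1 indicates that the model ranks positive cases above negative ones; a value near 0.5 is no better than guessing.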

Present Insights and Act

We present our results and insights to your team in a joint workshop. Together with your team, we identify the impact of our analysis on your company. We help you identify ways to act on it.

Our Expertise

Our team offers a unique combination of decade-long experience in virtually all subfields of data science, including:

High-performance databases
Main-memory databases
Large-scale/big data
Extract-Transform-Load (ETL)
Deep Learning
Keras and TensorFlow
Data Science in Python
Data Science in R
Large timeseries
Event data
Dirty Data
Hidden Markov Models
Decision trees/Random forests
Importance sampling
Bayesian methods
Discrete-event simulation
Parameter Inference

Our Team

© Uwe Bellhäuser

We are a group of computer scientists who are enthusiastic about applying machine learning to real-world problems.

Prof. Dr. Verena Wolf

Verena Wolf is a Full Professor of Computer Science at Saarland University. She holds a degree in Computer Science and a Ph.D. (2008) from the University of Mannheim. In 2009 she joined the Cluster of Excellence on Multimodal Interaction and Computing at Saarland University as a junior research group leader before receiving a call to a full professorship in 2012. She is a member of the Center for Bioinformatics at Saarland University and an Associate Editor of the journal ACM Transactions on Modeling and Computer Simulation. In 2013, she received the "Young Innovator under 35 Award" of the Technology Review Magazine. She frequently serves on the scientific program committees of distinguished international conferences, including the conferences on Analytical and Stochastic Modelling Techniques and Applications; Computational Methods in Systems Biology; Hybrid Systems: Computation and Control; Measurement, Modelling and Evaluation of Computing Systems and Dependability and Fault Tolerance; and Quantitative Evaluation of Systems. Her research focuses on probabilistic modeling and data science, in particular on statistical and numerical analysis methods, efficient discrete-event simulation techniques, parameter inference, sensitivity analysis, and rare event simulation. She enjoys designing hybrid models that do not rely exclusively on a descriptive/mechanistic approach but are augmented with the results of machine learning techniques.

Prof. Dr. Jens Dittrich

Jens Dittrich is a Full Professor of Computer Science in the area of Databases, Data Management, and Big Data at Saarland University, Germany. Previous affiliations include U Marburg, SAP AG, and ETH Zurich. He received an Outrageous Ideas and Vision Paper Award at CIDR 2011 (Conference on Innovative Data Systems Research), a BMBF VIP Grant in 2011, a best paper award at VLDB 2014 (Conference on Very Large Data Bases), two CS teaching awards in 2011 and 2013, as well as several presentation awards including a qualification for the interdisciplinary German science slam finals in 2012 and three presentation awards at CIDR (2011, 2013, and 2015). He has been a PC member and area chair/group leader of prestigious international database conferences and journals such as PVLDB/VLDB, SIGMOD, ICDE, and VLDB Journal. He is on the scientific advisory board of Software AG. He is a keynote speaker at VLDB 2017: “Deep Learning (m)eats Databases”. He will also be a keynote speaker at the DEEM workshop (Data Management for End-To-End Machine Learning) at SIGMOD 2018. At Saarland University he co-organizes the Data Science Summer School (http://datasciencemaster.de). His research focuses on fast access to big data, including in particular: data analytics on large datasets, scalability, main-memory databases, database indexing, timeseries, reproducibility, and deep learning. He enjoys coding data science problems in Python, in particular using the Keras and TensorFlow libraries for Deep Learning.

Thilo Krüger

Thilo Krüger is a PhD student in Computer Science at Saarland University, in the final phase of his doctorate. He works in the area of statistical modeling, especially on the simulation of (bio)chemical processes. He holds a degree in chemistry (minor in Computer Science) from the Institute for Technical Chemistry at Hamburg University. He has a strong interdisciplinary background, including stochastics, modeling and simulation, Bayesian statistics, machine learning, and data analysis, but also polymer and technical chemistry as well as epigenetics. As a PhD student, he has published several papers at international conferences in the field of statistical modeling, and he is an active reviewer for numerous international conferences and journals. He enjoys using sophisticated statistical methods to solve problems from completely new and diverse application areas.

Dr. Endre Palatinus

Endre Palatinus is a postdoctoral researcher and data scientist at the Information Systems chair of Prof. Jens Dittrich in the Computer Science Department of Saarland University, where he finished his Ph.D. studies in 2016. His research focuses on data layouts, robustness, and code generation from hand-written queries to entire database systems. He has published work in these areas at major conferences and workshops in the field of databases and information systems, including PVLDB/VLDB and IMDM. In addition, he has served as an external reviewer for several prestigious data management conferences including ACM SIGMOD, VLDB, ICDE, EDBT, SOCC, and BTW. He enjoys solving data science problems in R and Tableau, in particular data munging and visualisation.

Contact Us

Please tell us about yourself, and we will identify potential for improving your business using big data, AI, machine learning, and modeling.