My research interest is to develop methods and software that help to answer current questions in biology and evolution. A particular focus is on tools to analyze and visualize large-scale datasets from different sources, such as high-throughput sequencing data, phylogenetic information, and environmental factors. By combining these data, patterns and interactions between them can be found and interpreted in a biological context. My methods draw inspiration from and make use of concepts from diverse fields, such as graph theory, statistical data analysis, and machine learning.
One of the main applications of my research is the so-called phylogenetic placement of metagenomic sequences from environmental samples. These methods make it possible to understand the connections and patterns in the samples in the context of a phylogenetic reference tree, while taking the associated metadata into account. See below for the publications on these methods.
Over the years, I have also contributed to several empirical studies, in particular large-scale metagenomic studies. This helped me to gain an understanding of the approaches and needs of biologists, and inspired a lot of my research.
Due to the current and future growth of biological datasets, an important consideration is the scalability of the software. To this end, I mostly develop in modern C++11, which allows for highly efficient implementations. Still, my software design strives to be easy for other researchers to understand and adapt. See my main library genesis for an example. Furthermore, for end users who simply want to use the methods, I offer the command line tool gappa. Both are also described in the software section.