About

About me

I am a German data scientist currently living in London. I have a curious mindset, a sustained enthusiasm for data, and a passion for applying my skill set in a pragmatic way.

My main tool is R, which I use to interact with data sources (data warehouses, cluster infrastructures, web APIs) and to filter, reshape, process, and explore data. I contribute award-winning code to open-source R libraries, and my proprietary R code has increased operational efficiency and predictive accuracy for my employers. I also use Python and JavaScript on occasion.

The methods I most commonly use span optimization and supervised and unsupervised machine learning (GLMs, boosted trees, k-means clustering), along with techniques that ensure the validity and generalisability of my model results (cross-validation, lasso / elastic net regularization, bagging). I use components of the Hadoop ecosystem (such as Hive and Spark) to build systems that can handle large amounts of data, as well as cloud computing (AWS) to engineer reliable and scalable data pipelines.
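As a small illustration of two of the techniques mentioned above, here is a generic sketch (in Python with scikit-learn, on synthetic data; not code from my actual work) of fitting an elastic-net-regularised linear model whose regularisation strength is chosen by cross-validation:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

# Synthetic regression data: 200 samples, 20 features, only 5 informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# 5-fold cross-validation selects the regularisation strength (alpha)
# for a fixed mix of L1 and L2 penalties (l1_ratio=0.5).
model = ElasticNetCV(l1_ratio=0.5, cv=5, random_state=42)
model.fit(X_train, y_train)

print("chosen alpha:", model.alpha_)
print("held-out R^2:", model.score(X_test, y_test))
```

The L1 component of the penalty shrinks the coefficients of uninformative features towards zero, which is why this kind of regularisation helps models generalise beyond the training data.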

Please have a look at the links section to find me elsewhere on the web, such as on Twitter, LinkedIn, and StackOverflow.

About this blog

I intend to use this blog

  1. as an excuse to conduct little side projects,
  2. to showcase work that I do elsewhere, and
  3. to archive all of it in one place for future reference.