1. Building Our Own Open Source Supercomputer with R and AWS

    How to build a scalable computing cluster on AWS and run hundreds or thousands of models in a short amount of time. We will rely entirely on R and open source R packages. This is post 1 of 2.


  2. Nesting Birds and Models in R Dataframes

    R Dataframes in the tidyverse are more than just simple tables these days. They can store complex information in list columns, and this becomes an immensely powerful framework when we use it to apply methods to different sets of data in parallel. In this article, I illustrate this approach using data for a rare UK bird species to investigate whether its distribution has been impacted by climate change.


  3. Data Science Machine and Command Line Setup

    Data Scientists require a very particular toolset for their everyday tasks, but unlike software developers, few of them spend a lot of time optimising this toolset for their specific needs. I compiled a simple step-by-step guide that helps to automate the process of setting up a brand new data science machine and making it work for you by customising the command prompt and using a dotfile approach to manage configuration, identity, and access information. This gets you from zero to Data Science in minutes on macOS.


  4. fastai Deep Learning Image Classification

    Here I summarise learnings from lesson 1 of the fast.ai course on deep learning. fast.ai is an online deep learning course for coders, taught by Jeremy Howard. Its tagline is to “make neural nets uncool again”. I started the class a couple of days ago and have been impressed with how fast it got me to apply the methods, an approach they describe as top-down learning. I am writing this blog post to document and reflect on the things that I learned and to help other people who may be interested in getting started with the class.


  5. Airdrop delivery with A* pathfinding

    This post is an event report and a quick walkthrough of a submission that I developed with a group of participants at an Alibaba / Met Office UK hackathon. We use the A* algorithm with a couple of tweaks to route cargo balloons from London to a number of cities in the UK.


  6. Editable Plots from R to PowerPoint

    In this post, I give a quick overview of how to create editable plots in PowerPoint from R. These plots consist of simple vector-based shapes and thus allow you to change labels, colours, or text position in seconds. Your project managers will love it!


  7. Extracting location history

    If you have an Android phone, then Google logs your location. Fortunately, it makes all of that data available to you via the “timeline” dashboard. Unfortunately, there is no easy way to get it off there and into an IDE. So we’ll have to do this the hard way!


  8. Exploring Sales Data

    A big part of the interview process for many data science positions is a data science task or assignment. Companies usually choose a data set that is typical for them, and only in rare cases a sample of their actual production data. Here, I explore such a data set, sent out by a leading UK retailer.


  9. Database Connections in rMarkdown

    Connecting R to an enterprise data warehouse? Do it properly and do not hard-code your passwords! Here is how you can do it in R with rMarkdown and RStudio version 1.0+.


  10. Old-fashioned Podcasts

    How to listen to your favourite podcast without a podcast app? These few lines of Python might be able to help!