It’s Tuesday afternoon in Copenhagen, and it’s about time to show our hand with some code. This article covers the third of our “Six Steps” — you can read the overview here.
This article is aimed at engineers, researchers and execs in the wind industry, to help them understand the process of digitalisation. If you’re qualified in Systems Architecture or Data/Software Engineering, you’re way ahead of this; just get stuck in already!
About the ‘Engineer’ step
Partly because we’re keen to just show you what we’re up to, and partly because examples are much more useful than essays, I’ll only provide a short general background.
This step is where you get stuck in and actually do something other than fun diagrams and fantasising about worlds covered in hexagons. If you’re not a pro, Cloud Engineering can seem really daunting at first, but follow some tutorials (or have a go at re-implementing our exact setup below) to get started.
We can’t resist mentioning just THREE things you’ll see us do this week. Get these set up for your team and you’ll save a TON of time, we promise.
Use Terraform (if you do nothing else, do this)
We were a bit late to the party with this, and recently started using Terraform to specify our Infrastructure as Code (IaC). It works with all the major cloud providers, like GCP, AWS and Azure.
Terraform has been a revelation. Start now. Not later. Don’t even create a single bucket using the console.
Terraform is SO simple to set up, don’t think of it as “something to do later”. You’ll recoup the setup and learning time within your first day of work. We especially like that the identity/access permissions (the most difficult and most important part to get right!) are made explicit.
There are plenty of alternatives to Terraform (for example Pulumi syncs nicely with existing infrastructure). We wouldn’t recommend a provider-specific solution — one of the strengths of Terraform is being provider agnostic.
Version control and conventional commits
Even if you’re working alone, having code on GitHub and using the git workflow lets you remember, check and compare things easily.
Adopting a conventional commits pattern means you not only get a great engineering logbook, but can also auto-generate releases and version numbers. At Octue, we built a whole system of open-source tools to help (see an example of the autogenerated releases here).
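To give a flavour, here are a few hypothetical commit messages in the conventional commits style (the prefixes are what let tooling work out whether the next release is a patch, minor or major version bump):

```
fix: handle requests containing an empty cell list
feat: add an optional resolution field to the request schema
docs: explain the lazy-loading strategy in the README
feat!: key returned elevations by h3 index
```

A fix implies a patch bump, a feat a minor bump, and a breaking change (the ! suffix) a major bump.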
Continuous Delivery
In each of the repositories we release in this task, look in the .github/workflows folder. These “actions” deploy code to the cloud systems, create release notes, or run other automations every time we merge code into the main branch. We don’t ever think about how to get code onto servers; it’s just there a couple of minutes after we finish. HUGE time saver and great for quality control.
Just show me the code!!!
Here you go. The windeurope72hours-elevations-api repository defines an API service that queries a database. All the cloud infrastructure is also defined in the same repo. There will be two more repositories coming up soon.
Remember, this event is NOT a hackathon — we didn’t write this all in the hall, although we are refining it throughout the week. For transparency, it’s taken about 4 days to produce this basic service, and there’s lots of room for improvement — look at the commits to see a detailed history of how it evolved.
What it does
The API comprises basically one function. It:
- Accepts a request (POSTed data complying with a schema we’ve published)
- Checks the input isn’t outside some basic sensible ranges
- Queries the database for cell contents
- For hexagon cells that aren’t in the database, asks a “question” to a scientific data service (more about that in our next step!), in order to populate the database
- Responds with results from the database, plus an “ask again later” for the cells that are still being populated
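To make that flow concrete, here’s a minimal Python sketch of such a function. The schema, and the database and data_service objects, are illustrative stand-ins rather than the actual code in the repository:

```python
# A minimal sketch of the API's single function (illustrative names, not the
# actual identifiers from the windeurope72hours-elevations-api repository).
from jsonschema import validate, ValidationError

REQUEST_SCHEMA = {
    "type": "object",
    "properties": {"h3_cells": {"type": "array", "items": {"type": "integer"}}},
    "required": ["h3_cells"],
}


def get_elevations(request_data, database, data_service):
    """Validate a request, return known elevations and trigger population of unknown cells."""
    # 1. Check the POSTed data complies with the published schema.
    try:
        validate(instance=request_data, schema=REQUEST_SCHEMA)
    except ValidationError as error:
        return {"error": str(error)}, 400

    # 2. Query the database for the requested cells.
    known = database.get_elevations(request_data["h3_cells"])

    # 3. Ask the scientific data service to populate any cells we don't have yet.
    missing = [cell for cell in request_data["h3_cells"] if cell not in known]
    if missing:
        data_service.ask_to_populate(missing)

    # 4. Respond with what we have, plus an instruction to ask again later.
    return {"elevations": known, "later": missing}, 200
```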
Figuring out the database
We spent some time reading a variety of material around graph databases and how to store tree-like data such as this. In the end, we opted for (almost) the simplest arrangement we could:
By storing elevations on a separate node, we doubled the number of nodes in the database. Why do such a thing? We were bearing in mind the ability to federate additional data in the future — having one set of nodes representing the position allows us to query for those nodes, and all the types of data associated with them, more straightforwardly than if each data type had its own mesh.
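In Cypher terms, the arrangement looks something like the snippets below (the labels and property names are our illustration here, not necessarily the repository’s exact schema):

```python
# Illustrative Cypher for the arrangement described above: one node per h3 cell
# (the position) plus a separate node per data type, linked by a relationship.
# Labels and property names are assumptions, not the repository's exact schema.
MERGE_CELL_ELEVATION = """
MERGE (c:Cell {h3_index: $h3_index})
MERGE (c)-[:HAS_DATA]->(e:Elevation)
SET e.value = $elevation
"""

# Querying "everything we know about this position" is then a single pattern,
# regardless of how many data types are attached to the cell:
GET_ALL_DATA_FOR_CELL = """
MATCH (c:Cell {h3_index: $h3_index})-[:HAS_DATA]->(d)
RETURN labels(d) AS data_type, properties(d) AS data
"""
```

Adding a new data type later just means attaching another kind of node to the same position nodes, without touching the elevations.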
We were inspired a lot by this paper, worth a read:
Choosing a Cloud Provider
We use GCP for everything at Octue; we find that it has some subtle technical details (like the atomic clocks synchronising all the cloud stores globally) that make things run more smoothly overall. And the console has a pretty consistent interface across all the different offerings, which makes a big difference to the learning curve.
But mostly, we use it because we’re used to it. If Azure or AWS are your thing, there’s nothing here that doesn’t have an equivalent in those providers.
Single-endpoint API
The API that we laid out in our architecture is super simple — it has a single endpoint. So rather than spin up a whole server to handle that, we’ve gone with a “serverless” Cloud Function which is nice and easy to deploy:
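If you haven’t deployed one before, a Python Cloud Function is just a single HTTP handler. Here’s a minimal sketch using Google’s functions-framework (the handler name and response shape are illustrative, not the repository’s actual entrypoint):

```python
# Minimal sketch of a single-endpoint HTTP Cloud Function in Python.
# The handler name and response shape are illustrative assumptions.
import functions_framework


@functions_framework.http
def elevations(request):
    """Handle a POST to the single endpoint."""
    if request.method != "POST":
        return {"error": "Only POST is supported"}, 405

    payload = request.get_json(silent=True)
    if payload is None:
        return {"error": "Request body must be JSON"}, 400

    # ...validate the payload, query the database and build the response here...
    return {"elevations": {}, "later": payload.get("h3_cells", [])}, 200
```

Locally, the functions-framework CLI will serve this on a development server, so you can test it before deploying.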
Cloud Infrastructure — Terraform
The entire infrastructure is defined in our terraform files here. We’ve added notes to each entry describing why and what it’s for. You’ll notice there are some things we haven’t talked about yet (like a Cloud Run service)... We’ll come back to those!
To reproduce this entire project for yourself, you would:
- Create an account on GCP
- Create a new project in your GCP account
- Fork the code repository to your GitHub account
- Check out code to your laptop and install terraform
- Create a service account in your GCP project, named terraform, with Editor permissions. Save the JSON key file to terraform/gcp-credentials.json
- Change the variables in variables.tf to match your project, then run terraform apply
You’ll be asked to enable a lot of APIs, which is a chore, but once you’ve clicked through the links and enabled them all, you’re done.
Pro tip: We always choose our regions to have the lowest CO2 impact. Here, we’ve set everything up in europe-west1 — there’s no need to worry about global query speed or redundancy for a little demo service like this.
Cloud Infrastructure — Database
The only thing we didn’t provision using Terraform is the database. Because we prefer to use managed database services, we signed up for a paid tier on AuraDB, hosted by Neo4j. The main motivation for this selection was simply that we wanted to try it.
Pro tip: Moving from the free to the paid tier on AuraDB got us a dedicated instance in the same region as our cloud function is deployed, which sped up API access calls from 5s to 1s!
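For reference, connecting to an AuraDB instance from Python is only a few lines with the official neo4j driver. This is a sketch with placeholder environment variable names and the assumed node schema from above, not the repository’s actual code:

```python
# Sketch of connecting to a Neo4j AuraDB instance from the cloud function.
# The URI and credentials are placeholders, typically injected as environment
# variables or secrets rather than hard-coded.
import os

from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    os.environ["NEO4J_URI"],  # e.g. "neo4j+s://<instance-id>.databases.neo4j.io"
    auth=(os.environ["NEO4J_USER"], os.environ["NEO4J_PASSWORD"]),
)


def get_known_elevations(h3_indices):
    """Fetch elevations for cells that are already in the database."""
    query = """
    MATCH (c:Cell)-[:HAS_DATA]->(e:Elevation)
    WHERE c.h3_index IN $h3_indices
    RETURN c.h3_index AS h3_index, e.value AS elevation
    """
    with driver.session() as session:
        result = session.run(query, h3_indices=h3_indices)
        return {record["h3_index"]: record["elevation"] for record in result}
```

Creating the driver at module level means the connection is reused across warm invocations of the cloud function, rather than being rebuilt on every request.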
Pro tip: Managing databases requires a dedicated expert. Unless you have access to one of those people in your organisation, always use one of the managed database services from your cloud provider. It costs less in time, and carries much lower risk (unless your data is easily reproducible).
Limiting Costs
The bright-eyed among you will have realised by now that to handle the trillions of nodes we’d need an EXTREMELY large and costly database!
So, rather than populating the entire planet, we chose to “lazily load” data. That means that, on a request for a particular location / h3 cell, we populate that area on demand.
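As an illustration of what “populating an area on demand” can mean with h3, here’s a sketch (using the h3-py v4 API; the resolutions are arbitrary choices for this example, not the ones we actually use):

```python
# Illustration of populating an area on demand with the h3 library (h3-py v4
# API). The resolutions are arbitrary choices for this example.
import h3


def cells_to_populate(latitude, longitude, coarse_resolution=6, fine_resolution=10):
    """Map a requested location to its h3 cell, then list the finer cells covering that area."""
    coarse_cell = h3.latlng_to_cell(latitude, longitude, coarse_resolution)
    return h3.cell_to_children(coarse_cell, fine_resolution)
```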
Pro tip: Provided you’ve wrapped complexity up behind a straightforward API, you can always change the mechanics later…
If this takes off, we can use all sorts of strategies to reduce storage cost. We could use advanced heuristics to remove rarely accessed data, use more advanced tree-walking to cover areas of ocean without storing all the fine-level cells, or ditch the database altogether and pre-process the raw .tif images into a much quicker-to-access custom binary format, just to name a few ideas.
We’ve also heavily limited the number of simultaneous attempts to access the API. This is purely to keep our costs down (we’re providing this for free, after all!).
The cloud function keeps a cache of all the cells it has asked to be populated, so it doesn’t ask for the same ones again.
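As a sketch, such a cache can be as simple as module-level state in the function (our actual implementation may differ in detail):

```python
# A naive cache of cells we've already asked to be populated. Because it's
# module-level state, it lives in a single Cloud Function instance's memory.
_already_requested = set()


def request_population(cells, data_service):
    """Ask the data service to populate only the cells we haven't asked about before."""
    new_cells = [cell for cell in cells if cell not in _already_requested]
    if new_cells:
        data_service.ask_to_populate(new_cells)
        _already_requested.update(new_cells)
```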
Something to ponder: What do you think is wrong with our caching strategy? Could it be improved and how much would that cost? Answers on a postcard!
Wrapping Up
We made a really simple API and deployed a free tier database, which we later upgraded to the (cheapest!) paid tier for performance reasons.
We knew that we couldn’t store all the data, so we developed a way of lazy-loading it on demand, which can be improved and streamlined later.
But that’s useless right now. The next step? Let’s get some data into it!