Why
As I started working more with data in AWS, I wanted to get my feet wet. My home automation system gives me historical data on all my sensors going back almost two years now, and something I’ve wanted alongside it is historical weather data. That data can be fairly expensive, but weatherapi.com will return the prior several days of historical weather for free, so every day I grab that data and save it to S3, where I can query it with Athena later. I also save it to a local MariaDB database, just to have a local copy in case I ever decide not to use the AWS data, or just… just because. “Just because” is always a good enough reason to try something.
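A minimal sketch of building the daily request for the prior day. It assumes weatherapi.com’s History API endpoint (/v1/history.json) with key, q (location), and dt (yyyy-MM-dd) query parameters. The helper names prior_day and history_url, the location, and the API key are hypothetical placeholders. It only builds the URL, so it runs without network access or a real key.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumption: weatherapi.com's History API lives at /v1/history.json and takes
# key (API key), q (location) and dt (yyyy-MM-dd) query parameters.
HISTORY_ENDPOINT = "https://api.weatherapi.com/v1/history.json"


def prior_day(today: date) -> date:
    """Return the calendar day before `today` (the day a morning job would fetch)."""
    return today - timedelta(days=1)


def history_url(api_key: str, location: str, day: date) -> str:
    """Build the History API request URL for a single day (hypothetical helper)."""
    params = {"key": api_key, "q": location, "dt": day.isoformat()}
    return f"{HISTORY_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical key and location; substitute your own.
    url = history_url("YOUR_API_KEY", "London", prior_day(date(2024, 3, 1)))
    print(url)
    # prints "https://api.weatherapi.com/v1/history.json?key=YOUR_API_KEY&q=London&dt=2024-02-29"
```

Computing the date from an explicit `today` argument, rather than calling date.today() inside the helper, keeps it easy to test and lets a scheduler backfill a specific day.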
Architecture Diagram
This is pretty straightforward and a fairly common pattern; nothing fancy here. I have a web service on-prem that queries weatherapi.com for the prior day’s historical data, triggered on a schedule every morning by Airflow. It saves the data to MariaDB locally and also ships a Parquet file to S3. Since I’m a penny-pincher and I’m not querying the data constantly, I only crawl the new data once a week, on Saturdays. I’m not even using the data for anything yet, other than a few POCs to demonstrate that I can.
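Here is a rough sketch of the step between fetching and shipping: flattening the nested response into one row per hour and choosing an S3 key for the file. It assumes the History API nests hourly readings under forecast, then forecastday, then hour, with fields such as time, temp_c, humidity, and precip_mm. The names flatten_hours, s3_key, FIELDS, and the "weather" prefix are hypothetical, and the Hive-style dt=YYYY-MM-DD layout is a choice made for illustration, not necessarily what the service above does.

```python
from datetime import date

# Assumption: the History API response nests hourly readings under
# forecast -> forecastday[] -> hour[]. Only a few fields are kept here;
# any field missing from an hour becomes None.
FIELDS = ("time", "temp_c", "humidity", "precip_mm")


def flatten_hours(response: dict) -> list[dict]:
    """Turn a nested History API response into flat rows, one per hour."""
    rows = []
    for day in response.get("forecast", {}).get("forecastday", []):
        for hour in day.get("hour", []):
            rows.append({field: hour.get(field) for field in FIELDS})
    return rows


def s3_key(prefix: str, day: date) -> str:
    """Build a Hive-style partitioned key, e.g. weather/dt=2024-02-29/weather.parquet."""
    return f"{prefix}/dt={day.isoformat()}/weather.parquet"


if __name__ == "__main__":
    # Made-up sample values for illustration only, not real weather data.
    sample = {"forecast": {"forecastday": [{"date": "2024-02-29", "hour": [
        {"time": "2024-02-29 00:00", "temp_c": 4.1, "humidity": 87, "precip_mm": 0.0},
    ]}]}}
    print(flatten_hours(sample))
    print(s3_key("weather", date(2024, 2, 29)))
```

Folder names like dt=2024-02-29 are what Athena and Glue crawlers recognize as partitions, and a new partition generally isn’t queryable until it has been registered in the catalog, which is why a weekly crawl is enough when nothing reads the data in between. Writing the Parquet file (for example with pandas’ DataFrame.to_parquet, which needs pyarrow or fastparquet) and uploading it (for example with boto3’s upload_file) are left out so the sketch runs on the standard library alone.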
I wanted to see how easily I could accomplish this, and I did. I learned what I needed to from it.
Future Plans
- React dashboard app to show charts (just because)
- Tie historical weather data to historical sensor data within Home Assistant
- MariaDB has this information, of course, but again, half my projects are “just to see if I can”