I learned this when I was 20. It's not hard to start.
There are 2 types of queries: reads and writes. Most of your queries are reads, so handle those first.
Do this, in order:
1/ Reduce reads - Cache, cache, cache. Use read-through and write-through caching. Use Redis.
2/ Optimize
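A rough sketch of step 1/ in Python, to show what read-through and write-through mean. It assumes a plain dict standing in for Redis and a made-up FakeDatabase class that counts reads; both names are hypothetical and only for illustration, not a real Redis client.

```python
from typing import Dict, Optional


class FakeDatabase:
    """Hypothetical stand-in for the real database; counts how often it is read."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.rows.get(key)

    def write(self, key: str, value: str) -> None:
        self.rows[key] = value


class ReadWriteThroughCache:
    """In-memory cache in front of the database (a dict here, Redis in practice)."""

    def __init__(self, database: FakeDatabase) -> None:
        self._database = database
        self._cache: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        # Read-through: on a miss, load from the database and remember the value.
        if key in self._cache:
            return self._cache[key]
        value = self._database.read(key)
        if value is not None:
            self._cache[key] = value
        return value

    def set(self, key: str, value: str) -> None:
        # Write-through: update the database and the cache in the same call.
        self._database.write(key, value)
        self._cache[key] = value


database = FakeDatabase({"user:1": "Ada"})
cache = ReadWriteThroughCache(database)
cache.get("user:1")            # miss: loads from the database, then caches
cache.get("user:1")            # hit: served from the cache
cache.set("user:2", "Grace")   # goes to the database and the cache together
print(database.reads)          # 1
```

Two reads of the same key cost one database read, and a freshly written key never needs a database read at all. A real setup would also need expiry and invalidation, which this sketch leaves out.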
Every single software company CFO saw what happened with Twitter (90% labor cost reduction) and started asking hard questions of leadership.
Also Section 174.
Building a data pipeline in 2020 is like building a bridge in the 14th century
• You do a lot of work that gets thrown away
• Half the job is getting rid of the stuff you don't want
• The folks who started it are dead by the time it's done
i’m coaching someone through a data analyst job search, and when i asked if they knew sql their response was “i don’t need to learn it bc it’s easy to pick up on the job”
sql is deceptively complex. easy to pick up. difficult to master. please take the time to learn sql
I have a teeny little announcement:
I've worked with cloud data warehouses for 9 years. And for 9 years, I've felt frustrated at the dev experience.
To fix it, I'm working on a new open source project. I'm calling it Titan.
Let me give you a sneak peek at what Titan is –
1/
Snowflake released MATCH_RECOGNIZE. I'd never heard of it before.
Say you run an ecomm store and want to analyze the shopping funnel. To model the steps in SQL you either need a messy set of JOINs or a daisy-chain of window functions.
Or you write 15 lines of MATCH_RECOGNIZE:
This is the story of how I independently landed on the same ideas that make up dbt today.
I joined a DTC startup in 2013. This is how their data infrastructure looked shortly before I inherited it.