#10 How to Get Started with Analytics Engineering

dbt Template and a New ML Tool!

📊 How to Get Started with Analytics Engineering

For a brief introduction to Analytics Engineering, check out my previous article “What is Analytics Engineering and Why Should You Care?”.

Analytics Engineering consists of 3 parts:

  • Data Collection: Utilize tools like Google Analytics, Mixpanel, or custom Python and SQL scripts for comprehensive data gathering. Employ techniques such as event tracking and API integration to ensure data completeness.

  • Data Transformation: Clean and structure raw data using tools like Apache Spark, dbt, Python with Pandas, or SQL queries. Apply data normalization and feature engineering techniques to derive actionable insights from transformed data.

  • Quality Assurance Testing: Validate data integrity and analytics processes using tests for every metric or column you track. Employ unit testing and monitoring techniques to ensure ongoing quality and reliability.
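To make the testing step concrete, here is a minimal sketch in plain Python of three column-level checks, loosely analogous to dbt's built-in not_null, unique, and accepted_values tests. The sample rows, the helper names, and the set of allowed trade types are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample rows with the columns used in this article's example.
trades = [
    {"customer_id": 1, "trade_id": 100, "trade_type": "X", "day": "2024-01-01"},
    {"customer_id": 2, "trade_id": 101, "trade_type": "Y", "day": "2024-01-01"},
    {"customer_id": None, "trade_id": 102, "trade_type": "X", "day": "2024-01-02"},
    {"customer_id": 3, "trade_id": 102, "trade_type": "Z", "day": "2024-01-02"},
]


def not_null(rows, column):
    """Return the rows where `column` is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique(rows, column):
    """Return the values of `column` that appear more than once."""
    counts = Counter(r.get(column) for r in rows)
    return [value for value, n in counts.items() if n > 1]


def accepted_values(rows, column, allowed):
    """Return the rows whose `column` value is not in `allowed`."""
    return [r for r in rows if r.get(column) not in allowed]


# Each check returns the offending rows or values; an empty list means it passed.
failures = {
    "customer_id_not_null": not_null(trades, "customer_id"),
    "trade_id_unique": unique(trades, "trade_id"),
    "trade_type_accepted": accepted_values(trades, "trade_type", {"X", "Y"}),
}

for name, bad in failures.items():
    print(name, "PASS" if not bad else f"FAIL ({len(bad)} issue(s))")
```

Returning the failing rows rather than a plain True/False makes it easier to see what went wrong when a check fails.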

Example: Let’s calculate the daily number of active users per month who place a trade of type X, from a table with the columns customer id, trade id, trade type, and day. In dbt, our SQL would look something like this.

Let’s try to break this down:

  1. The dbt template defines a macro called daily_active_users_trade_x() to calculate the daily active users for trade type X.

  2. It retrieves the incremental values from the previous day's data using get_incremental_values() and subtracts yesterday's count from today's count to get the new active users.

  3. Finally, it utilizes incremental() to ensure this job runs incrementally per day, updating only the new data each day.
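The same idea can be sketched outside dbt with Python's built-in sqlite3 module. This is a rough illustration under stated assumptions, not the dbt template itself: the table names trades and daily_active_users_trade_x, the column names, and the refresh_incrementally helper are hypothetical. It counts distinct customers per day for trade type X and, on each run, only aggregates days newer than the latest day already stored, which is roughly the role dbt's incremental materialization plays.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trades (
        customer_id INTEGER, trade_id INTEGER, trade_type TEXT, day TEXT
    );
    CREATE TABLE daily_active_users_trade_x (
        day TEXT PRIMARY KEY, active_users INTEGER
    );
    """
)

# Hypothetical sample data; days are ISO strings so they sort correctly as text.
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?)",
    [
        (1, 100, "X", "2024-01-01"),
        (1, 101, "X", "2024-01-01"),  # same customer twice, counted once
        (2, 102, "X", "2024-01-01"),
        (3, 103, "Y", "2024-01-01"),  # other trade type, ignored
        (2, 104, "X", "2024-01-02"),
    ],
)


def refresh_incrementally(conn):
    """Aggregate only the days newer than the latest day already stored."""
    conn.execute(
        """
        INSERT INTO daily_active_users_trade_x (day, active_users)
        SELECT day, COUNT(DISTINCT customer_id)
        FROM trades
        WHERE trade_type = 'X'
          AND day > COALESCE(
              (SELECT MAX(day) FROM daily_active_users_trade_x), ''
          )
        GROUP BY day
        """
    )
    conn.commit()


refresh_incrementally(conn)  # first run: processes every day in the table

# A new day of data arrives; the second run only touches that day.
conn.execute("INSERT INTO trades VALUES (?, ?, ?, ?)", (4, 105, "X", "2024-01-03"))
refresh_incrementally(conn)

result = conn.execute(
    "SELECT day, active_users FROM daily_active_users_trade_x ORDER BY day"
).fetchall()
print(result)
```

One trade-off of this simple "newer than the last stored day" filter is that late-arriving trades for a day that was already processed are not picked up; handling those would need a lookback window or a full refresh.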

To get started with Analytics Engineering, I recommend learning dbt through one of these resources:

  • What is dbt?

  • The Complete dbt (Data Build Tool) Bootcamp: Zero to Hero

 📰 Data Tools, Articles and Resources 

Featured

MLflow: Build better models and generative AI apps on a unified, end-to-end, open-source MLOps platform