12 Comments

What are your thoughts on, like, examples of different data modelling techniques? Going into the intricacies of something like Data Vault feels out of scope (fair), but a high-level explanation with examples might be cool.

And perhaps a section on what *not* to do ("How to avoid chaos"). Otherwise, can't wait!

Stay tuned

yummy

How to buy this book?

The launch date is still TBD. Probably Q1 2025.

In the meantime, you can get early draft chapters on this Substack.

Every now and then I see tools trying to provide a semantic layer on top of their somewhat proprietary modeling standards (LookML, dbt). Is that something you will talk about in your book as well?

I'm trying to keep the book as technology/tool agnostic as possible. If I mention tools, it will be in the context of "as of the time I write this." That said, semantic layers will be covered.

I would like you to please include data product data modeling as well.

Yep. That's going to be an undercurrent and theme throughout the book.

That’s great, and please include me in one of your podcasts if you can, so we can discuss data products 😀. I am actually working on data products.

Looks good. Especially interested in the History chapter and the Analytical modeling chapters.

I like the outline for Part 3 (chapters 11, 12, 13, and 14). I am looking forward to more as it becomes available!

Some thoughts come to mind in the context of the outline:

I'm having an ongoing discussion [debate] about data acquisition strategy for analytics data modeling. The application architecture team believes data for analytics can be sourced (or integrated, to use their words) from enterprise events (i.e., EDA, business-process originated). The argument goes that a canonical data model can serve all downstream data needs (application integration, analytics, etc.), and who better than the application owner to define what data should be published to an event stream? They also see making all raw data available for analytics data modeling as a non-starter: much of that data is never needed, they argue, so moving all of it into analytics acquisition is a wasted resource.

My POV is different:

1. Defining a canonical data model that satisfies application integration and analytics data modeling is challenging. It will get overly bloated and complicated. I instead think of analytics data modeling as a different use case, perhaps with some overlap with application data modeling.

2. We typically acquire all the source data for analytics using the most robust, cost-effective approach with minimal effort, making it available for data modeling where value creation occurs.

3. Application owners usually don't have the experience to assess the data needed for analytics data modeling, let alone machine learning.

4. Not all valuable application data can be sourced from a business process-related activity (i.e., EDA).

I see potential for a Venn diagram showing a slight overlap between what I call data events (i.e., CDC; primarily for analytics use cases) and business process events (i.e., EDA; primarily for application integration use cases).
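The contrast might be easier to see with two hypothetical payloads side by side: a CDC-style data event carrying the raw row change, and an EDA-style business event curated by the application owner. All field and table names here are illustrative, not from any real system.

```python
# Hypothetical CDC-style "data event": a raw row-level change captured
# from the application database. Nothing is filtered out up front, so
# fields like discount_code remain available for analytics.
cdc_event = {
    "op": "update",                      # insert / update / delete
    "table": "orders",
    "before": {"order_id": 42, "status": "pending", "discount_code": "FALL10"},
    "after":  {"order_id": 42, "status": "shipped", "discount_code": "FALL10"},
    "ts_ms": 1714060800000,
}

# Hypothetical EDA-style "business process event": a curated, canonical
# payload published by the application owner for integration. Fields the
# owner deems irrelevant (e.g., discount_code) may never be published --
# exactly the analytics concern raised above.
business_event = {
    "event_type": "OrderShipped",
    "order_id": 42,
    "shipped_at": "2024-04-25T16:00:00Z",
}

# The middle of the Venn diagram -- fields present in both views -- is small:
overlap = set(cdc_event["after"]) & set(business_event)
print(overlap)  # {'order_id'}
```

In this sketch, analytics modeling built only on the business event would never see `discount_code`, while the CDC stream preserves it at the cost of moving more raw data.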

Thoughts? Experiences to share? Challenges?
