Practical Data Modeling

Sep 9

mileage will certainly vary. I expect there will be a ton of new approaches with LLMs as well, which is something I'll speculate upon in the final part of the book. That's a whole 'nother can of worms...

Grant Steans

Aug 24 (edited)

Very clear read, and I agree that practitioners will often have to start with reverse engineering in order to forward engineer something new. Specifically, I speak from the perspective of a business/process analyst looking to achieve more valuable BI. Coming into a process that "works" but is not very data-informed gives us a starting point based on a few "obvious" entities that already exist. However, only by reverse engineering the process through the lens of "what data is involved" can we discover where the gaps are (missing entities, missing relationships, wrongly defined attributes, etc.).

Márton Horváth

While I totally agree that reverse engineering is more common than forward engineering, I’m strongly against it. Modelling is a means of communication. Software / systems should help to answer or solve business questions. If you approach modelling from business questions, it’s forward engineering, through which you may arrive at the conclusion that something is missing or incorrect in your systems. I cannot find a compelling narrative for reverse engineering (apart from copying something that works), because it would sound something like “I don’t know the questions, but let’s see what we have and then try to answer something”. I’m happy to hear a compelling narrative for reverse engineering!

Good points. I'm curious how you work with third-party data, or with datasets that already exist and have little to no documentation.

Márton Horváth

Sure, it's RE, but I always emphasize stating first, in plain business terms, what we are looking for, or at least drafting a conceptual model of the domain the software covers. E.g.: SAP ERP tables can be a deep jungle, so in spend analysis I always put forward three foundational questions for CFOs before jumping into the MM / GL data model: 1) from whom are we buying things (-> supplier / partner)? 2) what are we buying (-> products and services)? 3) in what amount (-> purchase order)? If you keep these in mind as a compass to drive your RE effort, you can avoid getting confused when you see, e.g., partner and supplier data in different tables at the same time (or at least you can say that something is off with SAP's data modelling skills :) I know the reason, but still...).
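
To make the "compass" idea concrete, here is a minimal sketch of those three questions written down as a conceptual model before touching any source tables. It is only an illustration: the class names, field names, and the `spend_by_supplier` helper are hypothetical, and none of them correspond to real SAP tables or columns.

```python
from dataclasses import dataclass


# Question 1: from whom are we buying? (hypothetical entity)
@dataclass(frozen=True)
class Supplier:
    supplier_id: str
    name: str


# Question 2: what are we buying? (hypothetical entity)
@dataclass(frozen=True)
class Product:
    product_id: str
    description: str


# Question 3: in what amount? The purchase order links the other two.
@dataclass(frozen=True)
class PurchaseOrder:
    order_id: str
    supplier: Supplier
    product: Product
    amount: float


def spend_by_supplier(orders):
    """Sum purchase order amounts per supplier name."""
    totals = {}
    for order in orders:
        name = order.supplier.name
        totals[name] = totals.get(name, 0.0) + order.amount
    return totals
```

With a model like this in hand, each table found during reverse engineering can be checked against it: either the table maps onto one of the three entities, or it points to a gap in the conceptual model.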

Aug 20 (edited)

How do you think of this paradigm with Applications?

When I first glanced at this post I thought that System meant "Application" or "Software" (which it still might). In that case, I think many people do Reverse Modeling (you already note this pattern, as many systems already exist), because the data output is often "exhaust". For example: I built a POS application (interpret the acronym as you wish) and now I want to get transaction and/or customer data from it. How do I model that based on my application (rhetorical)?

Being uninformed here - what is the typical design path for SWEs given that much of "our" data comes from applications where the data is not the "intended goal"?

That's a good question. I intentionally used the term "system" to avoid narrowing the discussion to only databases, which is historically what people fixate on in these discussions. If you're reverse engineering, I think it's necessary to investigate the code base. Do you think it's worth including a mention of reverse engineering an application here, or in the chapter on transactional/operational systems?

Yeah I assumed as much re "system" and would agree to do that given what it represents.

Given the atomic nature of this piece and the notion of forward / reverse, I think a small mention that a "system" could be something other than a database or combination of databases is worthwhile, but I wouldn't change much outside of that.

I'm trying to think through what actually changes when it's software and not databases directly.

That’s the crux. A lot of application code surrounds a data model; it isn’t the data model itself.

I think I've seen too much bad software that doesn't surround a data model. This might sound odd, but I think the focus on the application itself makes data a second-class citizen. Though I would be curious how good developers feel about this. It might simply circle back to "priorities" and "hygiene" (read: don't be lazy), which apply to both "data systems" and "software systems".

To flip the question, how do shitty devs view data or work with it?

I'd be curious to see the spectrum of views from shitty -> 10x

As it relates to data modeling, please expound

For the moment, my most significant thought is the one from our other subthread: the proximity of the system to the data matters. Closer to data = more likely to consider the data model, and vice versa.

I need to think more about how I've seen the relationship between systems and data influence the relationship with data modeling. At the moment I am digesting the dimensions as the position of the system (organizationally) and the physical relationship (software vs data products).

I'd throw in a somewhat simple thought re: what's your (the system dev's) familiarity with / exposure to data?

Also just for shits and giggles - throw in the heady "no model is a model" argument because unintentional / unconscious models happen.
