16 Comments
Maury's avatar

How much forward or reverse engineering you do may depend on the project team you are working on, but also on the data management maturity of your organization. If you are building a fully custom application, you should be forward engineering most of the time. Reverse engineering comes in when you are taking in third-party data or working with other systems you need to interact with.

As you mentioned, reverse engineering, which is partly data archaeology, is necessary for understanding data relationships before extracting data or making enhancements. That means getting information from data catalogs but, just as importantly, enriching the data catalog with the (visual) data model and its relationships.
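To make the reverse engineering step concrete, here is a minimal sketch that reads tables, columns, and foreign-key relationships back out of an existing database. It assumes SQLite purely for illustration (other databases expose a similar catalog through `information_schema`), and the `supplier` and `purchase_order` tables are hypothetical.

```python
import sqlite3


def reverse_engineer(conn):
    """Return {table: {"columns": [...], "foreign_keys": [...]}} read from the SQLite catalog."""
    model = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": row[1], "type": row[2], "pk": bool(row[5])}
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            {"column": row[3], "ref_table": row[2], "ref_column": row[4]}
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        model[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return model


# Hypothetical source database standing in for an undocumented system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE purchase_order (
    po_id INTEGER PRIMARY KEY,
    supplier_id INTEGER NOT NULL REFERENCES supplier (supplier_id),
    amount NUMERIC
);
""")
model = reverse_engineer(conn)
```

The resulting dictionary is the kind of structure that could be pushed into a data catalog or rendered as a diagram. It only recovers relationships that were declared as constraints, so undeclared ones still need the "archaeology".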

Data models, and the resulting DDL, DML, and scripts, should be treated like development code: checked into source control and deployed to each environment.
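One way to picture "DDL as code" is a set of versioned migration scripts applied in order, with the database recording which ones have already run. The sketch below is an assumption-laden illustration, not a recommendation of any particular tool: the migration names, the `schema_migration` table, and the use of SQLite are all hypothetical.

```python
import sqlite3

# Hypothetical migrations; in practice each would be a .sql file tracked in source control.
MIGRATIONS = [
    ("001_create_customer",
     "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    ("002_add_customer_email",
     "ALTER TABLE customer ADD COLUMN email TEXT"),
]


def deploy(conn, migrations):
    """Apply migrations that have not run yet, in order; return the versions applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migration (version TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migration")}
    newly_applied = []
    for version, ddl in migrations:
        if version in applied:
            continue  # already deployed to this environment
        conn.execute(ddl)
        conn.execute("INSERT INTO schema_migration (version) VALUES (?)", (version,))
        newly_applied.append(version)
    conn.commit()
    return newly_applied


conn = sqlite3.connect(":memory:")  # stands in for one target environment
first_run = deploy(conn, MIGRATIONS)
second_run = deploy(conn, MIGRATIONS)  # rerunning applies nothing new
```

Because the same ordered scripts run against every environment, development, test, and production end up with the same physical model.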

After reading the excerpt, it is interesting that I never thought of the logical model being "mapped" to a physical model. My experience is more with data modeling tools, where the logical model is "generated" into a physical data model. I guess a translation and mapping does occur behind the scenes: attributes become columns, a naming standard is applied, data types are assigned for the target platform, and primary key syntax and the like are generated into DDL. On the flip side, when reverse engineering with a data modeling tool, the physical and logical data models can be created automatically from the source database. Your mileage may vary, depending on how well the original model was created. I understand the book is meant to be tool-independent / agnostic, and meant to teach the underlying need for, and benefits of, data modeling as part of software development.
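The behind-the-scenes mapping described above can be sketched in a few lines. This is a toy illustration of the idea, not how any particular modeling tool works: the `Attribute` class, the type map, and the snake_case naming standard are all assumptions.

```python
import re
from dataclasses import dataclass

# Assumed logical-to-physical type mapping for a generic SQL target.
TYPE_MAP = {
    "identifier": "INTEGER",
    "text": "VARCHAR(255)",
    "money": "DECIMAL(18, 2)",
    "date": "DATE",
}


@dataclass
class Attribute:
    name: str            # business-facing name from the logical model
    logical_type: str    # key into TYPE_MAP
    is_key: bool = False
    required: bool = True


def physical_name(logical_name):
    """Apply an assumed naming standard: lower snake_case."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", logical_name.strip()).lower()


def generate_ddl(entity, attributes):
    """Map a logical entity and its attributes to a CREATE TABLE statement."""
    lines = []
    for attr in attributes:
        column = f"    {physical_name(attr.name)} {TYPE_MAP[attr.logical_type]}"
        if attr.required or attr.is_key:
            column += " NOT NULL"
        lines.append(column)
    keys = [physical_name(a.name) for a in attributes if a.is_key]
    if keys:
        lines.append(f"    PRIMARY KEY ({', '.join(keys)})")
    return f"CREATE TABLE {physical_name(entity)} (\n" + ",\n".join(lines) + "\n);"


ddl = generate_ddl("Purchase Order", [
    Attribute("Purchase Order ID", "identifier", is_key=True),
    Attribute("Order Date", "date"),
    Attribute("Total Amount", "money", required=False),
])
print(ddl)
```

Each step in the function corresponds to one of the mappings mentioned: attribute to column, naming standard, data type, and primary key syntax.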

Joe Reis's avatar

Mileage will certainly vary. I expect there will be a ton of new approaches with LLMs as well, which is something I'll speculate upon in the final part of the book. That's a whole 'nother can of worms...

Grant Steans's avatar

Very clear read, and I agree that practitioners will often have to start with reverse engineering in order to forward engineer something new. Specifically, I speak from the perspective of a business/process analyst looking to achieve more valuable BI. Coming into a process that "works" but is not very data-informed gives us a starting point based on a few "obvious" entities that already exist. However, only by reverse engineering the process through the lens of "what data is involved" can we discover where the gaps are (missing entities, relationships, wrongly defined attributes, etc.).

Márton Horváth's avatar

While I totally agree that reverse engineering is more common than forward engineering, I’m strongly against it. Modelling is a means of communication. Software / systems should help to answer or solve business questions. If you approach modelling from business questions, it’s forward engineering, through which you may arrive at the conclusion that something is missing or incorrect in your systems. I cannot find a compelling narrative for reverse engineering (apart from copying something that works), because it would sound something like “I don’t know the questions, but let’s see what we have and then try to answer something”. I’m happy to hear a compelling narrative for reverse engineering!

Joe Reis's avatar

Good points. I'm curious how you work with third party data, or datasets that already exist, and for which there's little to no documentation.

Márton Horváth's avatar

Sure, it's RE, but I always emphasize first stating in plain business terms what we are looking for, or at least drafting a conceptual model of the domain the software covers. E.g., SAP ERP tables can be a deep jungle, so in spend analysis I always put three foundational questions to CFOs before jumping into the MM / GL data model: 1) From whom are we buying things? (-> supplier / partner) 2) What are we buying? (-> products and services) 3) In what amount? (-> purchase order). If you keep these in mind as a compass to drive your RE effort, you can avoid getting confused when you see, e.g., partner and supplier data in different tables at the same time (or at least say that something is off with SAP's data modelling skills :) I know the reason, but still...).
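As a rough sketch of using the three questions as a compass, the snippet below groups discovered tables under the conceptual entities before any column-level digging. The table names are commonly cited SAP ERP tables, but the assignments here are illustrative assumptions and should be checked against the actual system; `group_by_entity` is a hypothetical helper.

```python
# Conceptual entities taken from the three questions above.
COMPASS = {
    "supplier": "From whom are we buying?",
    "product_or_service": "What are we buying?",
    "purchase_order": "In what amount?",
}

# Illustrative, assumed assignments of physical tables to conceptual entities.
TABLE_TO_ENTITY = {
    "LFA1": "supplier",            # vendor master data
    "BUT000": "supplier",          # business partner data
    "MARA": "product_or_service",  # material master data
    "EKKO": "purchase_order",      # purchasing document header
    "EKPO": "purchase_order",      # purchasing document item
}


def group_by_entity(tables):
    """Group discovered tables under the conceptual entity they help answer."""
    grouped = {entity: [] for entity in COMPASS}
    grouped["unmapped"] = []  # tables the compass does not (yet) explain
    for table in tables:
        grouped[TABLE_TO_ENTITY.get(table, "unmapped")].append(table)
    return grouped


# Hypothetical list of tables found while exploring the source system.
discovered = ["EKKO", "LFA1", "BUT000", "MARA", "T001"]
grouped = group_by_entity(discovered)
for entity, tables in grouped.items():
    print(entity, tables)
```

Seeing two tables land under `supplier` is expected rather than confusing once the conceptual entity is fixed first, and anything left in `unmapped` is a prompt for another business question.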

JT's avatar
Aug 20 (edited)

How do you think of this paradigm with Applications?

When I first glanced at this post I thought that System meant "Application" or "Software" (which it still might). In that case, I think many people do Reverse Modeling (you already note this pattern, as many systems already exist) because the data output is often "exhaust". E.g., I built a POS application (interpret the acronym as you wish) and now I want to get transaction and/or customer data from it. How do I model that based on my application (rhetorical)?

Being uninformed here - what is the typical design path for SWEs given that much of "our" data comes from applications where the data is not the "intended goal"?

Joe Reis's avatar

That's a good question. I intentionally used the term "system" to avoid narrowing the discussion to only databases, which is historically what people fixate on in these discussions. If you're reverse engineering, I think it's necessary to investigate the code base. Do you think it's worth including a mention of reverse engineering an application here, or in the chapter on transactional/operational systems?

JT's avatar

Yeah I assumed as much re "system" and would agree to do that given what it represents.

Given the atomic nature of this piece and the notion of forward / reverse, I think a small mention that a "system" could be something other than a database or combination of databases is worthwhile, but I wouldn't change much outside of that.

I'm trying to think through what actually changes when it's software and not databases directly.

Joe Reis's avatar

That’s the crux. A lot of application code surrounds a data model; it isn’t the data model itself.

JT's avatar

I think I've seen too much bad software that doesn't surround a data model. This might sound odd, but I think the focus on the application itself makes data a second-class citizen. Though I would be curious how good developers feel about this. It might simply circle back to "priorities" and "hygiene" (read: don't be lazy), which apply to both "data systems" and "software systems".

Joe Reis's avatar

To flip the question, how do shitty devs view data or work with it?

JT's avatar

I'd be curious to see the spectrum of views from shitty -> 10x

Joe Reis's avatar

As it relates to data modeling, please expound.

JT's avatar

For the moment, I think my most significant thought here is the one from our other subthread about the proximity of the system to the data. Closer to data = more likely to consider the data model, and vice versa.

I need to think more about how I've seen the relationship between systems and data influence the relationship with data modeling. At the moment I am digesting the dimensions as the position of the system (organizationally) and the physical relationship (software vs data products).

I'd throw in a somewhat simple thought re: what's your (the system dev's) familiarity with / exposure to data.

Also just for shits and giggles - throw in the heady "no model is a model" argument because unintentional / unconscious models happen.

Joe Reis's avatar

Expect to see a lot more book excerpts posted on Practical Data Modeling. Now that my data engineering course is nearing completion, I'm full-time on this book :)
