I can at least see an argument that OBT might be sufficient if the database has embedded metadata that is extremely descriptive for AI to use in finding and properly querying that big table. But that doesn’t solve the problem that OBT analytics databases have the same metrics repeated over and over again across tables, making AI discoverability just a speed run of the disaster that is human discoverability in such systems.
In the end I think OBT has an important role to play, and that role is ‘we need this shit now!!’
But when you need it to be accurate, performant, understandable, and repeatable, you need to model for those outcomes.
This is exactly what we've done in our few successful GenAI scenarios. Metadata is king
Totally agree with the metadata angle here. The only version of OBT that might survive GenAI isn’t just “one big table”; it’s “one big table with deeply structured metadata.” That means every column has context: who defined it, how it was derived, what assumptions it encodes, what else it touches, lineage, ownership, friction, everything. Wide tables aren’t the problem; wide tables with shallow metadata are, in my opinion. Even a thoughtfully modeled system needs metadata to stay trustworthy.
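To make the “deeply structured metadata” idea concrete, here is a minimal sketch in Python of what per-column context on a wide table might look like if it were carried alongside the data. All column names, fields, and owners are hypothetical, purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMetadata:
    """Context a human (or an AI agent) needs before trusting a column."""
    name: str
    description: str
    derived_from: list       # upstream columns/tables (lineage)
    defined_by: str          # owning team or person
    assumptions: list = field(default_factory=list)
    grain: str = ""          # the level at which the value is valid

# Hypothetical metadata for two columns on a wide "orders_obt" table.
ORDERS_OBT_METADATA = {
    "net_revenue": ColumnMetadata(
        name="net_revenue",
        description="Order revenue after discounts, before tax and refunds.",
        derived_from=["orders.gross_amount", "orders.discount_amount"],
        defined_by="finance-analytics",
        assumptions=["Refunds are handled downstream, not here."],
        grain="one row per order line",
    ),
    "is_repeat_customer": ColumnMetadata(
        name="is_repeat_customer",
        description="True if the customer had any prior completed order.",
        derived_from=["orders.customer_id", "orders.order_date"],
        defined_by="growth-team",
        assumptions=["Guest checkouts are never counted as repeat."],
        grain="evaluated as of the order date",
    ),
}

if __name__ == "__main__":
    for col, meta in ORDERS_OBT_METADATA.items():
        print(f"{col}: {meta.description} (owner: {meta.defined_by})")
```

The point isn’t this particular structure; it’s that the context lives next to the column in a form both people and tooling can read, rather than only in someone’s head.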
By ‘embedded metadata,’ do you mean things like table and column descriptions?
Yes, as a baseline. But also much more: all the context necessary to understand how to use the data appropriately.
The relational model abides. And nothing is more relational than the “star schema”.
Different horses for different courses. I have not dealt with OBTs directly... I have dealt with extremely wide files/tables prepared for data scientists (and old-school SAS/SPSS statisticians - pre sexy data scientists... ;) ) where inputs are preferred to be binary (true/false).
But overall, I would like to know the use case - is it temporary, build and tear down? - because architecture policy/guidelines should drive controlled chaos. For example, I could not imagine dealing with updates on an OBT, or data over time...
As many of my previous posts mention, I'm a big fan of the star schema data mart EDW, from a lot of success with it - and as mentioned in other posts on this topic, a well metadata-documented star layer can be the foundation for OBT needs. It works well for cube needs, or better yet, for stomping out the need for proprietary cubes in reporting tools. A discussion for another topic: the dead bodies found in legacy cube layers. ;)
I don’t think it works for most businesses let alone AI.
It’s usually used as a lazy get-out clause, in my opinion, mostly by people who don’t understand the benefits of facts and dimensions.
Totally agree. Data modelling is even more important, along with good naming conventions and descriptions - and, not to forget, good-quality data - to make GenAI work.
Else it will be garbage in, garbage out.
Could OBT become the new data swamp, but in tabular form? Keep adding new fields...
On the other hand, you could define structured views on the OBT, keep them in schema configurations, and use them in ETL pipelines as needed. That way, the OBT just becomes staging in a dynamic ELT process.
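A minimal sketch of that pattern, using Python's built-in sqlite3 purely as a stand-in warehouse (table, view, and column names are all hypothetical): the OBT is loaded as staging, and the "structured views" live in a schema configuration that the pipeline applies on top of it.

```python
import sqlite3

# Hypothetical "schema configuration": named views defined over one big table.
# In a real pipeline this might live in version-controlled SQL or YAML files.
OBT_VIEWS = {
    "orders_by_day": """
        CREATE VIEW IF NOT EXISTS orders_by_day AS
        SELECT order_date, SUM(net_revenue) AS revenue, COUNT(*) AS orders
        FROM orders_obt
        GROUP BY order_date
    """,
    "customer_summary": """
        CREATE VIEW IF NOT EXISTS customer_summary AS
        SELECT customer_id, COUNT(*) AS orders, SUM(net_revenue) AS lifetime_revenue
        FROM orders_obt
        GROUP BY customer_id
    """,
}

def apply_views(conn: sqlite3.Connection) -> None:
    """Apply the configured views on top of the staging OBT."""
    for ddl in OBT_VIEWS.values():
        conn.execute(ddl)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE orders_obt (
            order_id INTEGER, customer_id INTEGER,
            order_date TEXT, net_revenue REAL
        )
    """)
    conn.executemany(
        "INSERT INTO orders_obt VALUES (?, ?, ?, ?)",
        [(1, 10, "2024-01-01", 25.0), (2, 10, "2024-01-02", 40.0)],
    )
    apply_views(conn)
    print(conn.execute("SELECT * FROM customer_summary").fetchall())
```

The design choice being illustrated: the views, not the raw OBT, become the stable surface that downstream consumers query, while the OBT itself stays a disposable staging layer.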
Appreciate you doling out THE HARD TRUTH like always! I've always been a guy who focuses on the fundamentals; it's nice to hear that sentiment echoed by an expert like you.
I see OBT as a derived table representing a specific grain, cobbled together from a well-designed star schema. Although it appears to live independently in the end user's mind, Data Architects ought to keep the big picture (the star schema plus ... constellations of stars) in mind to serve longer-term use cases.
The OBT is a pre-joined store of info at a single grain of data (typically the lowest grain, but it can be any level)... For BI tools to act on multiple levels using data from a single OBT requires lots of internal transformations, including an awareness of the dimensional hierarchies. This modeling metadata still needs to be provided even for OBT; it doesn't go away when moving from star schema to OBT, unless the star schema's "works at any/all levels" ease-of-use story is heavily diluted into one suited to a specific combination of levels.
BI actions occur at all grains of info and can be messier for users, whereas AI typically works at the decision level, which usually corresponds to a lower-level grain and may be more suitable for OBT. Not only are the inputs to AI quasi-OBT, but the outputs of AI should also be modeled as OBT extensions - or, perhaps more appropriately, as star schema extensions at the appropriate grain - so that both the inputs to decision-making and the outputs (predictions/decisions/models) are available side by side to suit any data analytics use case, whether AI or BI.
You formed your opinion from this: « AI depends upon solid data ».
That's your mistake, and so you are wrong. Look at the question more deeply.
Oh sorry man, I didn't see your comment about disagreeing with her.