I can at least see an argument that OBT might be sufficient if the database has embedded metadata that is extremely descriptive for AI to use in finding and properly querying that big table. But that doesn’t solve the problem that OBT analytics databases have the same metrics repeated over and over again across tables, making AI discoverability just a speed run of the disaster that is human discoverability in such systems.
In the end I think OBT has an important role to play, and that role is ‘we need this shit now!!’
But when you need it to be accurate, performant, understandable, and repeatable, you need to model for those outcomes.
This is exactly what we've done in our few successful GenAI scenarios. Metadata is king
Totally agree with the metadata angle here. The only version of OBT that might survive GenAI isn’t just “one big table”; it’s “one big table with deeply structured metadata.” That means every column has context: who defined it, how it was derived, what assumptions it encodes, what else it touches, lineage, ownership, friction, everything. Wide tables aren’t the problem; wide tables with shallow metadata are, in my opinion. Even a thoughtfully modeled system needs metadata to stay trustworthy.
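To make the “deeply structured metadata” idea concrete, here is a minimal sketch in Python of what per-column context on a wide table might look like if it were carried alongside the data. All column names, fields, and owners are hypothetical, purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMetadata:
    """Context a human (or an AI agent) needs before trusting a column."""
    name: str
    description: str
    derived_from: list       # upstream columns/tables (lineage)
    defined_by: str          # owning team or person
    assumptions: list = field(default_factory=list)
    grain: str = ""          # the level at which the value is valid

# Hypothetical metadata for two columns on a wide "orders_obt" table.
ORDERS_OBT_METADATA = {
    "net_revenue": ColumnMetadata(
        name="net_revenue",
        description="Order revenue after discounts, before tax and refunds.",
        derived_from=["orders.gross_amount", "orders.discount_amount"],
        defined_by="finance-analytics",
        assumptions=["Refunds are handled downstream, not here."],
        grain="one row per order line",
    ),
    "is_repeat_customer": ColumnMetadata(
        name="is_repeat_customer",
        description="True if the customer had any prior completed order.",
        derived_from=["orders.customer_id", "orders.order_date"],
        defined_by="growth-team",
        assumptions=["Guest checkouts are never counted as repeat."],
        grain="evaluated as of the order date",
    ),
}

if __name__ == "__main__":
    for col, meta in ORDERS_OBT_METADATA.items():
        print(f"{col}: {meta.description} (owner: {meta.defined_by})")
```

The point isn’t this particular structure; it’s that the context lives next to the column in a form both people and tooling can read, rather than only in someone’s head.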
By ‘embedded metadata,’ do you mean things like table and column descriptions?
Yes, as a baseline. But also much more: all the context necessary to understand how to use the data appropriately.
The relational model abides. And nothing is more relational than the “star schema”.
Different horses for different courses. I have not dealt with OBTs directly... I have dealt with extremely wide files/tables prepared for data scientists (and old-school SAS/SPSS statisticians - pre sexy data scientists... ;) ) where inputs are preferred to be binary (true/false).
But overall, I would like to know the use case - is it temporary, build and tear down? - because architecture policy/guidelines should drive controlled chaos. For example, I could not imagine dealing with updates on an OBT, or data over time...
As many of my previous posts mention, I'm a big fan of the star schema data mart EDW, from a lot of success with it - and as mentioned in other posts on this topic, a well metadata-documented star layer can be the foundation for OBT needs. It works well for cube needs, or better yet, for stomping out the need for proprietary cubes in reporting tools. A discussion for another topic: the dead bodies found in legacy cube layers. ;)
I don’t think it works for most businesses let alone AI.
It’s usually used as a lazy get-out clause, in my opinion, mostly by people who don’t understand the benefits of facts and dimensions.
Totally agree. Data modelling is even more important, along with good naming conventions and descriptions - and, not to forget, good-quality data - to make GenAI work.
Else it will be garbage in, garbage out.
Could OBT become the new data swamp, but in tabular form? Keep adding new fields...
On the other hand, you could define structured views on the OBT, keep them in schema configurations, and use them in ETL pipelines as needed. That way, the OBT just becomes staging in a dynamic ELT process.
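A minimal sketch of that pattern, using Python's built-in sqlite3 purely as a stand-in warehouse (table, view, and column names are all hypothetical): the OBT is loaded as staging, and the "structured views" live in a schema configuration that the pipeline applies on top of it.

```python
import sqlite3

# Hypothetical "schema configuration": named views defined over one big table.
# In a real pipeline this might live in version-controlled SQL or YAML files.
OBT_VIEWS = {
    "orders_by_day": """
        CREATE VIEW IF NOT EXISTS orders_by_day AS
        SELECT order_date, SUM(net_revenue) AS revenue, COUNT(*) AS orders
        FROM orders_obt
        GROUP BY order_date
    """,
    "customer_summary": """
        CREATE VIEW IF NOT EXISTS customer_summary AS
        SELECT customer_id, COUNT(*) AS orders, SUM(net_revenue) AS lifetime_revenue
        FROM orders_obt
        GROUP BY customer_id
    """,
}

def apply_views(conn: sqlite3.Connection) -> None:
    """Apply the configured views on top of the staging OBT."""
    for ddl in OBT_VIEWS.values():
        conn.execute(ddl)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE orders_obt (
            order_id INTEGER, customer_id INTEGER,
            order_date TEXT, net_revenue REAL
        )
    """)
    conn.executemany(
        "INSERT INTO orders_obt VALUES (?, ?, ?, ?)",
        [(1, 10, "2024-01-01", 25.0), (2, 10, "2024-01-02", 40.0)],
    )
    apply_views(conn)
    print(conn.execute("SELECT * FROM customer_summary").fetchall())
```

The design choice being illustrated: the views, not the raw OBT, become the stable surface that downstream consumers query, while the OBT itself stays a disposable staging layer.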
Appreciate you doling out THE HARD TRUTH like always! I've always been a guy who focuses on the fundamentals; it's nice to hear that sentiment echoed by an expert like you.
I see OBT as a derived table representing a specific grain, cobbled together from a well-designed star schema. Although it appears to live independently in the end user's mind, Data Architects ought to keep the big picture (the star schema plus ... constellations of stars) in mind to serve longer-term use cases.
The OBT is a pre-joined store of info at a single grain of data (typically the lowest grain, but it can be any level)... For BI tools to act on multiple levels using data from a single OBT requires lots of internal transformations, including an awareness of the dimensional hierarchies. This modeling metadata still needs to be provided even for OBT; it doesn't go away when moving from star schema to OBT, unless the star schema's "works at any/all levels" ease-of-use story is heavily diluted into one suited to a specific combination of levels.
BI actions occur at all grains of info and can be messier for users, whereas AI typically works at the decision level, which usually corresponds to a lower-level grain and may be more suitable for OBT. Not only are the inputs to AI quasi-OBT, but the outputs of AI should also be modeled as OBT extensions - or, perhaps more appropriately, as star schema extensions at the appropriate grain - so that both the inputs to decision-making and the outputs (predictions/decisions/models) are available side by side to suit any data analytics use case, whether AI or BI.
You formed your opinion from this: « AI depends upon solid data ».
That's your mistake, and so you are wrong. Look at the question more deeply.
Oh sorry man, I didn't see your comment about disagreeing with her.