My friend Cindi Howson posted this the other day. Read it.
tl;dr - data modeling is dead and One Big Table (OBT) is sufficient for GenAI to work.
In my experience, OBT has very mixed results for BI. The meaning of the data - the point of data modeling - is often lost in the process of throwing a bunch of data into a giant table. It’s kind of like putting literally everything you have in your fridge into a blender - eggs, cold pizza, stale beer, a few moldy carrots, a bagel, milk, pudding, etc (this isn’t my fridge, btw) - and calling it “dinner”. You might find that appetizing, but most would find this mix of “stuff" unbearable. It’s just a gross mess.
Same with most OBT. Even for humans, I’ve mostly seen OBT grow from a convenience (“joins are bad”) to an inconvenience to an incomprehensible WTF (wide, tall, full).
As I always ask audiences during my talks - who will blindly throw an LLM on top of their data warehouse today, sight unseen? All I get are squirms and bad looks. I asked this at a popular Modern Data Stack conference, where OBT is very popular. Same response.
The fact is, most companies are still hobbling along with BI, let alone AI. Data modeling matters more than ever. AI depends upon solid data.
I’ve done a lot of production ML/AI. The training data for the ML model is literally a single table, often hundreds, thousands, or millions of columns wide. Call it OBT. But that’s NOT the same OBT in the BI context. These are very different tables. I’m not sure if this head of data is confused with OBT vs matrices? And I’m also unsure if this person understands how deep learning, transformers, GenAI/RAG, etc work under the hood? There’s a lot more than meets the eye.
Whoever this head of data in the post - I’d love to know why you think data modeling is dead and why OBT will make AI work. Message me.
What do you think? Is OBT sufficient for AI to work?
I can at least see an argument that OBT might be sufficient if the database has embedded metadata that is extremely descriptive for AI to use in finding and properly querying that BT. But that doesn’t solve the problem that OBT analytics databases have the same metrics repeated over and over again across tables, making AI discoverability just a speed run of the disaster that is human discoverability in such systems.
In the end I think OBT has an important role to play, and that role is ‘we need this shit now!!’
But when you need it to be accurate, performant, understandable and repeatable you need to model for those outcomes.
The relational model abides. And nothing is more relational
than the “star schema”.