I want to get your thoughts on this. Throughout my travels, I’ve noticed that data modeling is more prevalent in certain geographies than others.
For example, data modeling seems alive and well in Western Europe. In America, it’s very mixed, and I tend to find companies and teams don’t prioritize it as much as in Western Europe.
I also notice data modeling varies by type of company. Mature companies seem to favor data modeling more than smaller companies.
Just curious if I’m a sample size of 1 and missing something, or if others notice this as well?
With respect to data modeling, what are you seeing in your geography and/or company?
I wouldn't call it geography-based, I would call it culture-based. There is the fast food, "Bias for Thoughtless Action" culture that looks only at what's good in the 6 month horizon. Who cares about 2 years? We'll be in a different place by then! Then there are people looking to build long-term value. These two orgs might be right across the street from each other.
Of course, this is true unless regulatory/compliance standards step in. If they will need to pay for today's mistakes in 7 years, then they are forced to think/plan ahead.
I thought a lot about the culture part. How would you describe various data modeling cultures? Very interesting idea you're bringing up.
Thanks, Joe. You're very kind. I can see why you have such a popular podcast. :)
As with all cultures, there are the religious fanatics who are happy to fight over ideologies (Inmon vs. Kimball, for example), and then there are the practical ones who look at what's needed for the applications the data needs to support.
What do you think? You'd have much deeper insights and data on this than I do. :)
I have deffo seen the same thing as you.
Anecdotally, Europe seems to have much wider use of Data Vault modeling compared to the USA.
Which always intrigued me, given Data Vault was invented in the USA.
I asked Hans Hultgren that very question on my podcast episode with him:
https://agiledata.io/podcast/agiledata-podcast/the-patterns-of-data-vault-with-hans-hultgren/
I asked:
" If I look globally I would say that US is star, schema, or Kimball centric, the majority of the work done in the US is around dimensional. If I look at Europe, it’s heavily ensemble or data vault centric, Is that true?
Hans replied:
"Yeah. I gotta say surprised that is still true and I think that you’re right to observe that"
"Europe and Nordics pretty much lead that charge as far as a adopting these techniques. Definitely, Netherlands Sweden, probably top those charts and then all surrounding areas in Western Europe and Nordics seem to be doing quite a bit of it.
Matches almost perfectly with what I'm seeing. I joke that Data Vault is the David Hasselhoff of the data world - made in America, popular in Europe.
Don't ruin my Hasselhoff childhood.
Nah Data Vault is George Clooney, damn sexy.
In my experience, I think LatAm is more aligned with the US, but it depends on how universities and market trends cover these topics: I have friends in universities and in industry who have never heard of ensemble data modeling, a few have heard of Data Vault, but more know the dimensional and normalized approaches.
My perspective is that they know about the classic Kimball and Inmon data modeling approaches, but not about OBT or the unified star schema, for example.
I wonder if the UK aligns more with the US? I haven't ever come across much DV usage in Blighty. I've only really had exposure to Kimball, but we have a few European customers who seem interested in pursuing DV.
The general principles of data modeling should be the same regardless of geography and company. Procedures can differ, but the theory behind them shouldn't.
Agreed...and I wish this happened
I think the realisation that there's a need to data model can be a maturity thing. Also, for young companies, does the constant reshaping of their products and services make modelling too early a futile exercise, as they're effectively building on sand? I'm hypothesising, though. My career in data has always been as a hire into an existing data team, so the recognition that that skillset is required has already happened.
From what I've seen, young and immature companies are focused on delivering features and product. One could argue, especially if the company is tech focused, that data modeling is super critical at this early stage. It's also like arguing that a 5-year-old should start saving for retirement. The tech and data debt is far lower if modeling is done earlier, but that also takes precious time away from building the biz. So, hard to say. Tradeoffs all the way down.
Does it really take precious time away? How is "precious" quantified? On a short-term or long-term basis? It doesn't have to take too much time if skilled people are hired.
I think that the common denominator is culture, which could be correlated w/ both company size and geography.
Data teams, to me, are naturally behind. Let's take a startup as an example. At what point do they make their first data hire? Likely, that individual comes aboard well after data "exists", and right at the point where the business decides it's worthwhile to start getting value out of the data (e.g. analytics).
... hopefully that first hire is a Data Engineer, but that's an entirely different can of worms...
This Data Team Of One will have their plate full right from the get-go, and likely have pressure to deliver insights yesterday. It seems like Data Modelling is commonly the first casualty of prioritizing velocity over accuracy / scalability. And it works! You can deliver insights at velocity, but what does "query driven data modelling" look like at scale?
I'll add, though, that one can make legitimate arguments for prioritizing velocity at a startup: let's say Data Team Of One estimates that it'll take 6-9 months to build a robust data model. Does the company even have that runway in their budget / funding?
So to come full circle:
- small company = smaller + more-behind data team = more likely to cut corners (i.e. data modelling) to deliver insights based on demands / velocity
- larger company = larger + less-behind data team (if they've caught up) = more likely to have realized at some point that query-driven data modelling isn't sustainable, and to have prioritized building a scalable data model
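To illustrate why "query-driven" modelling stops scaling: each stakeholder request re-embeds business rules inline, while even a thin modelled layer gives those rules a single home. A toy sketch, with all names and data hypothetical:

```python
# "Query-driven" style: each stakeholder request becomes its own hard-coded
# extract, so rules like revenue recognition get copy-pasted per query.
raw_orders = [
    {"order_id": 1, "customer": "acme",   "amount": 120.0, "status": "paid"},
    {"order_id": 2, "customer": "acme",   "amount": 80.0,  "status": "refunded"},
    {"order_id": 3, "customer": "globex", "amount": 200.0, "status": "paid"},
]

# Request A: revenue per customer, with the business rule written inline.
revenue_a = {}
for o in raw_orders:
    if o["status"] == "paid":  # revenue-recognition rule, copy #1 of many
        revenue_a[o["customer"]] = revenue_a.get(o["customer"], 0.0) + o["amount"]

# Modelled alternative: define the rule once in a shared layer and reuse it
# for every downstream query, so a rule change happens in exactly one place.
def recognized_orders(orders):
    """The single home for the revenue-recognition rule."""
    return [o for o in orders if o["status"] == "paid"]

revenue_b = {}
for o in recognized_orders(raw_orders):
    revenue_b[o["customer"]] = revenue_b.get(o["customer"], 0.0) + o["amount"]

assert revenue_a == revenue_b == {"acme": 120.0, "globex": 200.0}
```

With three orders the duplication is harmless; with fifty dashboards each carrying its own copy of the rule, a change to revenue recognition becomes a fifty-file hunt, which is the unsustainability being described above.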
The geography aspect is interesting... the only thing that I could infer here would be that perhaps there are more startups in NA? So perhaps more of these smaller companies?
Very good insights, Ben. I'll need to run the numbers on startups across countries, but my hunch is you're spot on. Anecdotally, even large companies in the US are struggling with data modeling in ways that European companies would laugh at. This might also be due to mature US companies being held to account on a quarterly basis, whereas Euro companies tend to move slower and have very tight labor laws. In the US, if you're not moving fast, you get fired. In the EU, good luck firing anyone. Thanks!
Data modeling varies geographically and by company type due to regulatory, cultural, and operational differences. Geographically, compliance with data protection laws like GDPR and regional business practices shapes models. Culturally, variations in language, address formats, and date representations impact design. Operationally, diverse workflows influence how data is structured. The scale and complexity of a company, whether global or local, affect the granularity of data models. Technology infrastructure availability and strategic business goals further shape data modeling decisions. In essence, data modeling adapts to the legal, cultural, operational, and strategic contexts unique to each geography and company type.
Agreed. The regulation part is interesting. Where I live (the US), we are implicitly subject to GDPR and the EU AI Act because ignoring it means potential penalties. Compliance and regulation cut across borders very easily, especially for bigger companies.
Great thoughts.
Working with companies in the Nordics where I live, I usually feel that there is more time to think (sometimes even too much 😄) and invest time into proper data modeling.
I also wonder, how does data modeling differ "within" companies around the world? For example, between platform/product/application teams and data teams. How is it similar?
I have worked with a couple of teams in Europe and South America that, intentionally or not, created a robust conceptual data model during initial product discovery/delivery without a data person, which could then be extended for other purposes, features, and applications.
Not saying this is how it should be, but compared to other cases I spent much less time trying to figure out how data from different source systems should be integrated and how the data lake / warehouse should be designed.
In general, I tend to find information about data modeling either for transactional or analytical use cases, but not so much on how data modeling is done well across these systems.
Would you say that the data layer in the source systems influences the data modeling in the analytical system? For example, do you see a difference in techniques depending on whether internal systems are built on RDBMS or NoSQL, monolith or microservices, or whether a company grows through acquisition or organically (i.e. heterogeneous vs homogeneous sources)? I usually find the analytical system heavily influenced by the operational source systems, but my hypothesis is very biased by the few examples I have practical experience of.
I've been thinking about this, sort of a Conway's Law for data modeling. Do the architecture and systems influence data models? I'm starting to think it inevitably does. And those systems are reflections of how companies communicate, per Conway's Law.
This is an interesting point. One very specific example: I bet every European data person has had to deal with SAP. I'm not sure if it's that prevalent in the States? SAP has its own, extremely complicated internal data model, where all the tables are named after 5- or 6-letter abbreviations of German words, and there are literally thousands of them... Dealing with that basically forces the data engineer (or analyst) to remodel the data in some way at least, because there's no way you can use it directly. Even SAP's own "data access layers" are insanely complex.
SAP's popular in the US too. If you're big enough, you're using SAP, Oracle, or Microsoft for your ERP.
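To make the SAP point concrete, here's a minimal sketch of the kind of renaming/staging layer those cryptic tables tend to force. VBAK, VBELN, ERDAT, and KUNNR are real SAP names (the sales-document header table and a few of its columns), but the mapping function and the friendly names are purely illustrative:

```python
# Mapping from SAP's German-abbreviation column names to names a warehouse
# user can actually read. Columns not in the map are deliberately dropped:
# most of a raw SAP extract is noise to the analytics layer.
SAP_TO_FRIENDLY = {
    "VBELN": "sales_document_id",  # Verkaufsbeleg-Nummer
    "ERDAT": "created_date",       # Erstellungsdatum
    "KUNNR": "customer_id",        # Kundennummer
}

def stage_vbak(row: dict) -> dict:
    """Rename a raw VBAK row into warehouse-friendly names, dropping the rest."""
    return {new: row[old] for old, new in SAP_TO_FRIENDLY.items() if old in row}

raw = {"VBELN": "0000012345", "ERDAT": "20240115", "KUNNR": "ACME01", "VKORG": "1000"}
print(stage_vbak(raw))
# {'sales_document_id': '0000012345', 'created_date': '20240115', 'customer_id': 'ACME01'}
```

In practice this mapping lives in a staging layer (dbt models, views, etc.), and building it is exactly the "forced remodeling" described above: you cannot skip it, because nobody downstream can work with VBELN and ERDAT directly.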
I definitely see this a lot - the sources are dominating the early versions of the model and it often then grows organically from there.
☝️ regarding type of company
The enterprise conceptual data model should follow the Target Operating Model: use the same terms and concepts identified in the TOM. That way the data model can be measured and validated against the TOM. Unfortunately, not many organisations develop a TOM. Which is crazy!!! That's the role of Business Architects. Then again, not many organisations have Business Architects.
I work at a local municipality, so "government" in California.
Our small, and not that mature, data team does emphasize data modeling. Mostly because I (the lead engineer) and my manager (previously the lead DBA on the infrastructure side) emphasize it.
Not sure where my experience fits, but figured I’d share 🤷♂️
Interesting. If you're young and immature as an organization, you could probably punt on data modeling. Why are you emphasizing it?
The bureaucracy of government roles. We fear being audited, whether by leadership or by external reviews. And we struggle to retain talent, so having a standard approach enables easier maintenance and onboarding.
For instance, all my builds could be virtually automated, dbt-style, if we were a Python shop (I had it done in Hadoop until we were audited). All my database objects have standards for backend objects, views, procedures, and delivery to the presentation layer. We're now using SSDT database projects, SSIS, and CI/CD through Azure (Microsoft shop).
Some legacy databases were “built fast” by people no longer on our team, tons of data with inconsistent pipeline logic, data locations, cross database dependencies that aren’t always clear. And deployment usually meant the developer running stuff manually in prod, now very inconsistent with what’s in test 😂
Now yes, this involves more than just data modeling, and I think we could instill development practices without concern for the modeling efforts, but since they grew out of my standard Kimball architecture, it does seem that’s probably the real reason for the emphasis! 😖
How about Australia? What are your thoughts about AUS? :)
Australia has a history of “punching above its weight” in data modeling (I hope the phrase works, means performing better than expected). Way back in the annals of history Aussies like Clive Finkelstein and Graeme Simsion pioneered the idea of “model-driven development”. I would say in the 80s and 90s pretty well every major corporate in Australia was putting effort into data modeling.
I’m not the data vault expert but I see a fair bit of it around with corporates, perhaps 50% or more of corporates have done data vault at some point. Digital businesses and the mid market sector are more aligned to dimensional modeling as far as I have seen. Startups generally don’t have the resources, expertise or time to do any data modeling, other than building ML models off flat tables or document stores (it’s a form of implied data modeling).
There are a few really good data modellers in Australia - you know John Giles of course. I don’t know Shane Gibson but judging by his output I’d say he knows a thing or two.
I've seen similar. I think Australia is a strange Galapagos Island for data modeling. More innovative than the US, I think.
And let's not forget New Zealand (where Shane lives I believe) - very good modelers in Kiwiland as well!
Yup, New Zealand and Australia are a hotbed of Data Vault models compared to some other countries.
Yeah, did not mean to imply Shane was anything but a proud Kiwi.
I have worked for a handful of companies that are a patchwork of acquisitions and mergers. You face very different struggles with data modeling when you have to reconcile all of the similar-but-different applications and data platforms piled on top of each other. There are also a lot of politics involved in these situations, based on which of the original companies "won" the merger... I have seen the scenario a few times where a better data model lost to something inferior, or to the anarchy of a "why bother with modeling" approach. I have not done much research to verify this, but I suspect the EU is more interventionist when it comes to mergers than the US, which has been a free-for-all for the past decade or so.
Just to add to your observations, in my many years of database development across many US companies, I have seen very little to no emphasis on data modeling. Personally, I've worked for both large and small companies and haven't noticed a difference in regard to data modeling emphasis.
We definitely see a lot of customers do proper data modeling across Europe. Data Vault especially is very popular at the moment.
Some people mentioned regulation here. At least in my experience, I can't see an impact on data models due to GDPR in the EU. Yes, we make sure that on entry specific data gets filtered out, anonymised, or tokenized, but this does not change how the whole model works.
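As a sketch of that "handle it on entry" approach (all names here are hypothetical, and a real pipeline would pull the key from a secrets manager): deterministic, keyed tokenization replaces the raw identifier before it lands, while keeping joins and keys intact, which is exactly why the downstream model doesn't change.

```python
import hashlib
import hmac

# Hypothetical ingestion-time tokenizer. The key would come from a secrets
# manager in practice; hard-coding it here keeps the sketch self-contained.
SECRET_KEY = b"rotate-me-regularly"

def tokenize(value: str) -> str:
    """Deterministic keyed token: same input always yields the same token,
    so the column can still be used as a join key downstream."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

record = {"email": "ada@example.com", "order_id": 42, "amount": 99.5}
safe = {**record, "email": tokenize(record["email"])}

assert safe["order_id"] == 42                        # non-PII untouched
assert safe["email"] != record["email"]              # raw PII never lands
assert safe["email"] == tokenize("ada@example.com")  # deterministic: joins still work
```

The design choice worth noting: because tokenization is deterministic per key, every fact and dimension referencing that email still lines up, which is the commenter's point that GDPR changes the ingestion step rather than the shape of the model.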
My impression is that this is highly correlated with attention to regulatory standards
I see your point. I'd argue that regulations cut across borders (GDPR affects US companies), and data modeling is unarguably weaker in the US, compared with Europe.
What are you seeing where you live?
In the absence of a local regulation, a lot of discussions about which standard to adhere to.
Well, apparently I need to do a better job at my pre-reads, because reading this article makes me realize I don't have a good personal definition of data modeling. Where is the line between having and using a database, and "doing data modeling"?
... And it's the title of an article just a few posts forward 👍
In my experience, it really comes down to the company context and the skills in place. If you have a complex system landscape (like the SAP example mentioned above) or you are working with complex products/services (e.g. banking), you must model to break down and externalize complexity. In simpler environments you may keep everything in mind / in code, etc. for a while. On the other hand, I also think that everyone does modelling (likely without ERD diagrams), even in a simple Excel file, but oftentimes they don't have the skillset or awareness to be effective at it (externalizing by drawing), causing cognitive overload and stress in the long term without understanding the root cause.