Semantics is an incredibly popular topic right now. As I mentioned in the prior post about meaning, the worlds of data and knowledge were traditionally separate areas of concern. This is no longer the case. Data modelers need a basic understanding of semantics and of the tools that make the expression of semantics possible.
This is a section from the larger chapter on meaning, which will also include the basics of taxonomies, ontologies, and metadata as they apply to data modeling. Of course, each of these topics is worthy of its own book, and I’m by no means an expert in these areas. But I include them here simply to pique your interest, in the hope that you’ll explore them in more detail (check out Jessica Talisman, Kurt Cagle, Juha Korpela, Juan Sequeda, etc., for more in-depth coverage).
Since I’m currently traveling, I expect the draft chapter on meaning will be published later this week, or early next week.
Please leave a comment if you feel I’ve missed something important. And if there are grammatical or similar errors, please leave a note here.
As always, thanks for your support on this book and my other projects. It means a lot.
Joe
Semantics is the heart of data modeling. While traditional modeling has focused on technical structure, it often overlooks a key component: shared understanding. Creating a common interpretation of data is how we solve communication bottlenecks between people and, increasingly, between people and machines. With the rise of AI as both a consumer and generator of data, semantics is no longer a second-class citizen. Semantics ensures that the real-world meaning of data and the relationships among different data elements are consistently captured and understood by both people and machines.
Semantics introduces a common vocabulary. For instance, instead of just having a “cust_id” field, a semantic model would define what a “customer” is, what their attributes are (e.g., name, address), and how they relate to other concepts like “orders” and “products.” This clarity is crucial for business users who need to understand the data without getting bogged down in technical jargon.
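To make the contrast with a bare “cust_id” field concrete, here is a minimal sketch of what such a semantic definition might look like in code. The class names, the docstring definition of a customer, and the `products_purchased` helper are all assumptions made for illustration, not a prescribed model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    name: str


@dataclass
class Order:
    order_id: str
    products: List[Product] = field(default_factory=list)


@dataclass
class Customer:
    """A party that buys, or may buy, products from the business.

    This definition is hypothetical; the real one comes from the business.
    """
    cust_id: str
    name: str
    address: str
    orders: List[Order] = field(default_factory=list)

    def products_purchased(self) -> List[str]:
        # Follow the customer -> orders -> products relationships.
        names = {product.name for order in self.orders for product in order.products}
        return sorted(names)


customer = Customer(
    cust_id="C-1",
    name="Ada",
    address="1 Main St",
    orders=[Order("O-1", [Product("Widget"), Product("Gadget")])],
)
```

The point is less the code than what it forces you to write down: a definition, the attributes, and the relationships to other concepts.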
As you learn about semantics, you might run into terms like controlled vocabulary and thesaurus. A controlled vocabulary is a pre-defined, authorized list of terms used to ensure that data is labeled and categorized consistently. You’d use a controlled vocabulary to control synonyms and reduce ambiguity. For instance, “Data Engineer” might be the designated term, with synonyms such as “ETL Developer” or “Data Platform Engineer” mapped to it. We further address ambiguity by clarifying terms with multiple meanings. The term “platform” is a common annoyance in engineering circles because it means different things to different people, so we might replace it with a more specific term such as “cloud platform” or “data platform.”
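A controlled vocabulary can be sketched as a simple lookup. The example below is a minimal illustration that assumes the vocabulary lives in plain Python dictionaries; the names `CONTROLLED_VOCABULARY`, `AMBIGUOUS_TERMS`, and `normalize` are hypothetical, and a real implementation would likely sit in a catalog or governance tool.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
CONTROLLED_VOCABULARY = {
    "Data Engineer": ["ETL Developer", "Data Platform Engineer"],
}

# Ambiguous terms are rejected until the user picks a more specific one.
AMBIGUOUS_TERMS = {
    "platform": ["cloud platform", "data platform"],
}


def build_lookup(vocabulary):
    """Invert the vocabulary so every synonym points to its preferred term."""
    lookup = {}
    for preferred, synonyms in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for synonym in synonyms:
            lookup[synonym.lower()] = preferred
    return lookup


LOOKUP = build_lookup(CONTROLLED_VOCABULARY)


def normalize(label):
    """Return the preferred term for a label, or raise if it is ambiguous or unknown."""
    key = label.strip().lower()
    if key in AMBIGUOUS_TERMS:
        options = ", ".join(AMBIGUOUS_TERMS[key])
        raise ValueError(f"'{label}' is ambiguous; use one of: {options}")
    if key not in LOOKUP:
        raise KeyError(f"'{label}' is not in the controlled vocabulary")
    return LOOKUP[key]
```

Calling `normalize("ETL Developer")` returns the preferred term, while `normalize("platform")` fails loudly instead of guessing which platform was meant.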
A thesaurus is a more advanced type of controlled vocabulary. It doesn’t just list terms, but also defines the semantic relationships between them. A term might have a broad or narrow hierarchical relationship, such as Data (broad term) -> Data Modeling (narrow term). Or, terms might be related to each other, like Data Modeling (Related Term) <-> Data Governance. These concepts are not hierarchically linked. Instead, they are closely related and are often discussed together. Finally, terms might be equivalent, where one term is used in place of another, such as MLOps (Use) -> ML Operations (Used For). This connects synonyms to the single preferred term.
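The three relationship types can be sketched as a small data structure. This is an illustration only: the `Entry` and `Thesaurus` classes are hypothetical, and it assumes that in the equivalence example “MLOps” is the preferred term and “ML Operations” is the synonym that points to it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Entry:
    term: str
    broader: Set[str] = field(default_factory=set)   # hierarchical: parents
    narrower: Set[str] = field(default_factory=set)  # hierarchical: children
    related: Set[str] = field(default_factory=set)   # associative, non-hierarchical
    use: Optional[str] = None                        # preferred term to use instead
    used_for: Set[str] = field(default_factory=set)  # synonyms this term stands in for


class Thesaurus:
    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def _entry(self, term: str) -> Entry:
        # Get the entry for a term, creating it on first use.
        return self.entries.setdefault(term, Entry(term))

    def add_hierarchy(self, broader: str, narrower: str) -> None:
        self._entry(broader).narrower.add(narrower)
        self._entry(narrower).broader.add(broader)

    def add_related(self, a: str, b: str) -> None:
        # Related terms are symmetric, so record the link in both directions.
        self._entry(a).related.add(b)
        self._entry(b).related.add(a)

    def add_equivalent(self, preferred: str, non_preferred: str) -> None:
        self._entry(preferred).used_for.add(non_preferred)
        self._entry(non_preferred).use = preferred

    def preferred(self, term: str) -> str:
        entry = self.entries[term]
        return entry.use or entry.term


thesaurus = Thesaurus()
thesaurus.add_hierarchy("Data", "Data Modeling")
thesaurus.add_related("Data Modeling", "Data Governance")
thesaurus.add_equivalent(preferred="MLOps", non_preferred="ML Operations")
```

With this in place, `thesaurus.preferred("ML Operations")` resolves to the preferred term, and each entry carries its broader, narrower, and related terms.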
This becomes more complicated when organizations have multiple systems of record. For example, one system might use “client” while another uses “customer.” Are these the same thing? This is a direct application of the tools we just discussed: a controlled vocabulary defines “customer” as the single preferred term, and a thesaurus maps “client” to it as an equivalent concept. This exercise of mapping varying definitions to a single one might seem familiar if you come from a traditional data modeling background. And that’s a good thing. We’re trying to remove ambiguity and create a shared vocabulary of what something means.
Disambiguating “client” and “customer” is hard enough in the context of mapping database keys like “client_id” and “customer_id”, and their associated attributes. This gets even harder when we move from SQL queries to natural language prompts. Extend this example to an LLM that might get a prompt like “show me the top clients for region X” and “show me the top customers for region X.” Same region, but are the client and customer the same thing? This is where correctly mapped semantics matter more than ever. As you’ll see, there’s a lot of work being done right now to give semantics structure so AI can locate the best data to use, whether structured or unstructured.
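As a rough sketch of the idea, the snippet below rewrites a prompt into preferred terms before it reaches a query layer or an LLM. It assumes the business has already agreed that “client” and “customer” are equivalent; the system names `crm` and `billing`, the `KEY_MAP` and `PREFERRED_TERMS` tables, and the `canonicalize` function are all hypothetical.

```python
import re

# Hypothetical equivalence, assuming the business has confirmed that
# "client" and "customer" refer to the same concept.
PREFERRED_TERMS = {"client": "customer"}

# Hypothetical physical keys in two systems of record, mapped to one semantic key.
KEY_MAP = {
    ("crm", "client_id"): "customer_id",
    ("billing", "customer_id"): "customer_id",
}


def canonicalize(prompt, preferred_terms=PREFERRED_TERMS):
    """Replace non-preferred words (singular or plural) with the preferred term.

    Replacements are emitted in lowercase; identifiers such as client_id are
    left alone because the pattern only matches whole words.
    """
    for variant, preferred in preferred_terms.items():
        pattern = re.compile(rf"\b{re.escape(variant)}(s?)\b", re.IGNORECASE)
        prompt = pattern.sub(lambda match: preferred + match.group(1), prompt)
    return prompt


a = canonicalize("show me the top clients for region X")
b = canonicalize("show me the top customers for region X")
```

After canonicalization both prompts are identical, so whatever answers them sees one concept instead of two. Simple word substitution like this is only a starting point; real disambiguation depends on the mapped definitions behind the terms.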
Beyond defining core business concepts like “customer,” semantics also clarifies the meaning of individual attributes. This is often accomplished through rich metadata that travels with the data itself. A semantic model might include metadata (which we’ll learn more about later in this chapter) as labels to help capture the data model’s essence and meaning. In practice, this could mean tagging a data field with its units or context (e.g., the location and timestamp of an image) or specifying that “Status” is an enum type with allowed values defined by business rules. By doing so, the data model carries the business context. It’s not just raw data, but data (and data about data) with meaning.
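Here is one way attribute-level metadata might look in code. It is a sketch under stated assumptions: the `order_total` and `status` attributes, the `USD` unit, and the three status values are invented for illustration, and real allowed values would come from business rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Type


class Status(Enum):
    # Hypothetical allowed values; the real list is defined by the business.
    OPEN = "open"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


@dataclass(frozen=True)
class AttributeMetadata:
    description: str
    unit: Optional[str] = None                   # e.g. a currency or measurement unit
    allowed_values: Optional[Type[Enum]] = None  # enum of permitted values, if any


# Metadata that travels with the data: one entry per attribute.
ATTRIBUTES = {
    "order_total": AttributeMetadata("Total amount charged for the order", unit="USD"),
    "status": AttributeMetadata("Current state of the order", allowed_values=Status),
}


def is_valid(attribute, value):
    """Check a value against the attribute's allowed values, if any are defined."""
    allowed = ATTRIBUTES[attribute].allowed_values
    if allowed is None:
        return True
    return value in {member.value for member in allowed}
```

A value like `"shipped"` passes the check for `status`, while a value outside the enum is rejected, so the business rule lives alongside the data definition rather than in someone’s head.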
Let’s next look at organizing concepts into a structure called a taxonomy.
(to be continued…)


Hi Joe. I equate the semantic model to the combination of the LDM and PDM content. I know that in BI reporting tools such as OBI, Power BI, and many others, the term semantic model comes up as well. As an example, a star schema’s LDM and PDM metadata content can be imported into, or integrated between, a data modeling tool and the reporting tool to reduce time spent re-developing all the attribute names, relationships, etc. Would love to talk more on this!
Thank you, super helpful to me, as a Marketer! 🫶