Example - Building a Traditional Data Model
Using the levels to build a traditional conceptual, logical, and physical data model
Here’s a draft example1 of building a traditional data model. It’s worth noting that in this example, I’m making this more “real world” than other data modeling examples I’ve seen. Most examples assume you’re starting greenfield and building the data model from the ground up. Practitioners find this rarely happens. The reality is most organizations have a data system that’s already running. In these cases, data modeling is an act of understanding what already exists. No matter if you’re forward or reverse engineering, the conceptual, logical, and physical layers of data modeling apply.
As always, comment where you wish. If you find errors, please submit them here.
Thanks,
Joe
Did you know that according to Gartner, 80% of data product related projects fail to meet expectations. One root cause is lack of common understanding between IT and Business. Ellie.ai can help you collaborate when building conceptual and logical business and data models to have a solid foundation for any modern data stack.
Read how Ellie can help create models that reflect reality
Thanks to Ellie.ai for sponsoring this post
We looked at the traditional levels of data modeling. Now let’s look at an example of building a traditional data model.
Designing the Conceptual Model – The Business View
Conceptual modeling is about collaborating with stakeholders to understand their world and use these interactions to create a high-level data model. Imagine you’ve just joined our fictional e-commerce company as a data engineer. There’s no official conceptual model in place (it might exist in people’s heads), and you’ve been tasked with building a data model, from conceptual to physical. Your goal might be to use this data model to understand how to ingest, transform, and serve data for analytics or ML/AI. You might wonder, “Didn’t we just look at building a Data Town Plan, which assumes there’s no existing town?” Yes, we did.
While many conceptual data modeling examples assume you’re starting from scratch and building a greenfield data model, this is far less common in reality. Most of the time, you’re joining a company with existing application systems without a corresponding conceptual or logical model. The reality is that most data models happen organically without any planning. The results are what you might expect. Data quality is a mess, and poor data models impact database performance. In my experience, the exception is when a data model is created from the start, before implementing the physical database and loading it with data. Here, we're starting with a pre-existing implementation of a data model. The same rules apply to conceptual modeling work, whether you’re doing reverse engineering or forward engineering. Your goal is to translate the language of the business to a big picture model that represents this.
Let’s use an extremely simple example where I interview a business stakeholder, Dana, the head of E-commerce Operations. I’m trying to understand a business process, and my goal is to translate this into a conceptual model.
Me: Dana, to get us started, I’d like to understand how you think about the core parts of e-commerce operations. Who or what are the most important things you deal with?
Dana: Sure. Our order workflow is obviously critical to our ability to function as an online company. We have customers, of course. They place orders, which contain one or more products. We also track payments for those orders.
Me: Great. So just to repeat back: we’re working with customers, orders, products, and payments. How are those related?
Dana: A customer can place many orders. Each order usually includes multiple products. And every order has at least one payment—most often a single credit card transaction, but sometimes customers split payments.
Me: Thanks Dana. So if I draw this out as a high-level model, I show this:
Me: In other words, a Customer places an Order, an Order contains one or more Products, and an Order has at least one Payment. Does that capture it?
Dana: Yes, that makes sense. That’s how our order process operates at a very high level.
Me: Thanks Dana. That’s all I need for now.
The above diagram is a very basic conceptual model. It’s not fancy, and that’s by design. Based on your interpretation of this conversation, you might draw something different. That’s fine. This is where the CDM shines in its interactions with the business - you can show it to a non-technical business person, and they’ll understand what you’re describing. Also, there are no attributes, keys, or cardinality annotations, though some approaches to CDM include these. In our example, we’ll add those in the logical layer. It’s worth noting that while our example does not contain cardinality information, it can be included in the conceptual model if it helps improve your and your business stakeholders’ understanding of the model. For now, we have a high-level model that captures the concepts of the order workflow that Dana and you discussed.
Starting with the CDM, we’ll model the high-level business concepts of Customers, Orders, Products, and Payment in a technology-agnostic way. At this point, we’re not concerned with designing tables or fields. We’re simply capturing the nouns and verbs that might be core to what our stakeholder finds important to tell us. When meeting with stakeholders, you could draw this on a single piece of paper, a whiteboard, or a diagramming tool. For the diagram above, you could also sketch this as a basic node-link diagram: Customer → Order → Product and Payment. The implementation is secondary to capturing the correct information and how things flow and relate.
Using regular language, the nouns are the entities, and the verbs are the relationships. Let’s discuss this a bit more. In this example, we have the following entities: Customer, Order, Product, and Payment. Let’s look at the relationships between these entities. A Customer places an Order, which contains Products. An Order has a Payment associated with it.
For clarity, here’s a bulleted summary.
Entities:
Customer
Order
Product
Payment
Relationships:
Customer places an Order
Order contains Product(s)
Order has Payment(s)
Customer makes a Payment
This raises an interesting question you’ll likely encounter when modeling data. During your interview with Dana, she mentions the word “customer.” What is a customer? Is it someone who’s ever purchased a product? Or someone who has purchased it in the last 90 days? You might be surprised that a customer can mean many things to many people. This is where data modeling meets semantics. The same entity label can represent different concepts to different departments, and aligning on these definitions is just as important as drawing the model. Someone in e-commerce operations will have an idea of a customer that completely differs from someone in marketing or logistics. It’s important to ask follow-up questions to clarify nuances, especially if you hear a term used across multiple business units or domains. Compare and contrast what you hear with others in your organization (and possibly outside of it). Get as much information as possible to understand the business from many angles.
The (Intermediate) Logical Model – The Intermediate Design
We have a conceptual understanding of the order process. Let’s next go deeper and build the logical model. Here, you’ll expand on your conceptual model and add attributes, keys, and cardinalities in a technology-agnostic way.
Building on our mock interview with Dana, let’s talk to Mike, a technical stakeholder and senior backend engineer who manages the current production RDBMS database. Although Mike did not explicitly design a conceptual model, he thankfully remembered a bit of database normalization from his database class in university. We’ll cover normalization in more detail later in this book, but for now, know that normalization avoids redundant data, which can create inconsistencies.
You: Mike, let’s move one layer down. I want to take these concepts and start adding more structure. Can you help me understand what attributes each of these entities typically has?
Mike: Sure. The customers table has customer_id, first_name, last_name, email, and created_at.
You: So, customer_id would be the primary key here?
Mike: Yes, it’s an integer. All primary keys are integers in the database.
You: Great. For Orders?
Mike: The Orders table has order_id, customer_id, order_date, and status.
You: And how are products stored? Do you link them directly in the order?
Mike: No, products themselves are in the Products table: product_id, name, sku, price, and category. There’s also an OrderItems table with order_id and product_id, plus a quantity associated with an order.
You: Perfect. So that’s our join table for the many-to-many relationship between Orders and Products?
Mike: Exactly.
You: What about Payments?
Mike: That’s a separate Payments table. It includes payment_id, order_id, customer_id, amount, payment_method, and payment_date.
You: Great. I’ll draft a logical model where the primary and foreign keys are defined, each entity has core attributes, and the relationships between entities are shown. Also, I’ll keep it platform-agnostic for now. Thanks, Mike.
Mike: Sounds good. Best of luck.
Okay, so we have some details of what’s in the application database. Let’s take the Entities from our conceptual model and add Mike's attributes. We’ll also add the OrderItems entity.
Here are the entities and their attributes
Customer - customer_id, first_name, last_name, email, and created_at
Order - order_id, customer_id, order_date, and status
Product - product_id, name, sku, price, and category
OrderItems - order_id and product_id, quantity
Payment - payment_id, order_id, amount, payment_method, and payment_date.
Let’s look at the relationships between entities and their keys. Keys are a way of joining entities together, as we discussed earlier. Let’s map out the keys and how the entities relate to each other, using the verbs from the conceptual model as a guide for the relationships.
[Customer] (customer_id PK) --- (places) ---> [Order] (order_id, customer_id FK)
[Order] (order_id PK) --- (contains) ---> [OrderItems] (order_id FK, product_id FK, Quantity)
[OrderItems] (order_id FK, product_id FK, Quantity) --- (contains) ---> [Product] (product_id PK)
[Order] (order_id PK) --- (has) ---> [Payment](order_id FK)
[Customer] (customer_id PK) --- (makes) ---> [Payment] (customer_id FK)
Let’s now create a logical model that adds more detail to the CDM, but is still technology agnostic. We’ll include the entities and their attributes, primary and foreign keys, and the cardinality of the relationships (one-to-many, many-to-many, etc.).
This logical model provides a good blueprint for implementation in any RDBMS. You can extend this model to semi-structured data (see our alt-relational modeling chapters for ideas).
Now that we’ve built the logical model, it’s time to expand on this at the physical modeling layer.
The Physical Data Model
You’ve captured the entities, attribute names, keys, and relationships in a logical model. It’s time to bring these ideas into the real world with a physical implementation. Remember that we’re working with a physically implemented database. Let’s talk again with Mike, the backend engineer, to clarify the physical RDBMS to understand how it’s designed and to give us information on creating a data model diagram.
You: Now that we’ve established the logical model, let’s discuss its actual implementation. Mike, can you walk me through the key technical decisions in the current database?
Mike: Sure. It’s a PostgreSQL database. We use integers for all primary keys. Email fields are VARCHAR(255), and we index order_date on the Order table because that’s frequently queried.
You: Got it. Are all foreign key relationships enforced in the database?
Mike: Yes.
You: Good to know. And are any fields optional?
Mike: Yeah, payment_method is nullable because we log the order before the payment is confirmed.
You: OK. So in the physical model, I’ll specify stuff like this:
Data types (VARCHAR(255), DECIMAL(10,2), etc.)
Nullability
Indexes
Any deviations from the logical model
Notes about constraints or platform behavior
Does that sound right?
Mike: That’s spot on. Oh, one more thing. We also have a created_at and updated_at timestamp on every table. Except OrderItems.
You: In our previous chat, you didn’t mention that every table has created_at and updated_at. But good call. I’ll make a note of that in the schema documentation.
Mike: Sorry, I forgot. But yes, the created_at and updated_at fields are in both tables.
You: No problem, Mike. Thanks for the clarification. I’ll make a note. And thanks for your time. This has been very helpful.
Mike: Sounds good. While you’re here, should I just give you read access to GitHub where we keep our PostgreSQL database schema?
You: Oh wow, that would be amazing. Thanks Mike.
Mike: Excellent. Holler if you have any other questions.
Your discussion with Mike yielded some good information. In your meeting notes, you mention the database (PostgreSQL) and the need to fill in the actual data types, indexes, nullable fields, and constraints. Even better, you now have access to the SQL data definition language (DDL). Score!
Let’s have a look at the DDL that Mike provided to you.
-- Customers Table
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Orders Table
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
order_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(50) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Products Table
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
sku VARCHAR(100) UNIQUE NOT NULL,
price DECIMAL(10, 2) NOT NULL,
category VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- OrderItems Table (Junction Table)
CREATE TABLE order_items (
order_id INTEGER NOT NULL REFERENCES orders(order_id),
product_id INTEGER NOT NULL REFERENCES products(product_id),
quantity INTEGER NOT NULL CHECK (quantity > 0),
PRIMARY KEY (order_id, product_id)
);
-- Payments Table
CREATE TABLE payments (
payment_id SERIAL PRIMARY KEY,
order_id INTEGER NOT NULL REFERENCES orders(order_id),
customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
amount DECIMAL(10, 2) NOT NULL,
payment_method VARCHAR(50), -- Nullable until confirmed
payment_date TIMESTAMP NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Indexes for performance
CREATE INDEX idx_orders_order_date ON orders(order_date);
CREATE INDEX idx_payments_order_id ON payments(order_id);
Here’s the physical database diagram generated by a PostgreSQL editor.
Again, you’ll create the DDL and physical model diagram in a greenfield situation. Thankfully, Mike had the DDL handy, and the data model looks decent.
You might wonder about starting with the DDL and physical database diagram and working backwards to the conceptual model. Yes, you can. The physical data model didn’t just magically appear. It was created by someone or a group of people who were hopefully trying to map business concepts to a data model. Sometimes an intentional data model exists, but often the physical diagram might be all you have. If this is the case, use this diagram and work backwards, reverse-engineering back to the conceptual model. In all cases, consult your business and technical stakeholders to answer questions and clarify shared understanding of what the data model represents.
Now you have a basic idea of how to apply the levels of data modeling in the traditional sense. Because these steps are well documented in countless database design books and articles, I gave you a high-level overview because this topic has received deep treatment in countless places. To understand more, please refer to the Further Reading section at the end of this chapter.
While applying the levels of data modeling to a traditional use case like a relational database design (or data warehouse) is a good foundation, the world of data consists mostly of unstructured data. In the next section, we will look at how to apply the levels to unstructured data.
Updated May 13, 2025. Thanks to Remco for pointing out the missing customer-payment relationship, which is now added to this section.
Thanks all. I'm not sure if this will be the final version of this section. The challenges for me are:
1) this topic has been written about countless times. I feel like the reader needs a cursory view of the traditional treatment of levels, and provide them references to arguably better resources if they want to pursue this topic in more depth. Entire books have been written on this topic. Also, I feel like these levels were useful in an era that might not exist any longer. The degree to which these levels are used varies a lot, and in some cases, are nonexistent today.
2) The levels chapter will likely end up being 50-60 pages when it's done. That's going to require only keeping what's practical and necessary for the reader to equip them to better model data.
Great comments here and lots to think about.
Intresting starting point and nicely follows the CDM-LDM-PDM approach as been preached by many (especially on the IT siude of the spectrum). What I am missingh is the true Business aproach. I think that Dana woiuld tell you that a customer orders products and needs to pay for that order. In her mind that would sound like two different relationships to start with: Customer orders products and Customer pays for order. In the same discussion with Dana you can explain that there need to be a separate Order-Product relationship to avoid repeating Customer-Order for every product (Order is the event/trigger here).
The other relationship, imo, would be Customer-Order-Payment. These are all on the same grain and does right to what hte business is about - a Customer pays the order (Payment is the event / trigger).
If you start with the separation of relationship it might be that you are loosing the buusinees in understanding the model once you go to LDM and PDM.
But these were just my 2 cents, I will always try to be as true to modeling with business as long as I can.