17 Comments
Sep 11 · Liked by Joe Reis

I think one massive gap in modeling is how to model "other stuff".

Techniques for OLTP are pretty well documented (normalization).

Techniques for cubes/warehouses are pretty well documented (star schema, dimensions, etc.).

We could use a lot more for Master Data Management.

We could use a lot more for "everything else": data mart structures for analytics, etc.

Sep 11 · Liked by Joe Reis

Is this going to be a book?

Sep 16 · Liked by Joe Reis

I like the idea of building blocks for data models. If I'm following correctly, depending on the starting point, I could see 1 and 2 being reversed: at the inception of data, no data exists yet. So the initial data is created by asking "What am I trying to represent?" Some data model gets built, intentionally or unintentionally, via CRUD or log data (a table, a file, etc.), and that becomes "What form of data am I working with?" You start with a user story and develop entities and relationships before even thinking about physical structures. Then the entities and attributes drive the form of the data for storage.

I guess from a "data team" (downstream) perspective, the data already exists in some form ("What form of data am I working with?"), and now my user story drives a data mart, an analytical model, or something else that answers "What am I trying to represent?"

Sep 12 · Liked by Joe Reis

@Joe Reis, Martijn Evers has started a beautiful plot of all data models that positions them in terms of conceptuality, flexibility, and so on. If you want, I could send it to you. It is a work in progress but really good, and I am sure he would like to chat with you about it. Martijn has been in this field for over 30 years.

author

Please do. Thanks!

Sep 10 · Liked by Joe Reis

I'm looking forward to reading more. I'm trying to gather more thoughts on "dimensional modeling," which I believe is primarily about organizing data into facts and dimensions within a data warehouse to make it more accessible for analysis. It's more a form of relational modeling than something fundamentally different.

author

How about analytical data modeling?

Sep 11 · Liked by Joe Reis

Dimensional modeling is probably clearer, for historical reasons. I just wouldn’t contrast it with relational modeling. Maybe compare it to E-R modeling? But “analytical modeling” isn’t bad. It shows the difference between transaction-oriented design to “get the data in” and design for “getting the data out”.

Sep 10 · Liked by Joe Reis

I like this start. This approach makes it possible to add the source and ‘collection context’ (or something like it) to the metadata, as well as an estimated deletion date, etc. These additions will help with GDPR compliance and privacy by design: data minimisation, version control, usage control, etc. That way it will be crystal clear if and when data collected for one purpose is allowed to be used for another, and whether that would be suitable.

Sep 10 · Liked by Joe Reis

I like where you're going with this... Question for you: Is object orientation going to enter into the mix? There is a bit of buzz starting to happen around the creation of hypergraphs that look (to me, anyway) a lot like object classes. Hypergraphs could include, in addition to the ones listed, behaviour, allowable states, and even code.

author

Yep, object orientation will be covered a ton, and so will graphs. Thanks for the hypergraph suggestion. I think we talked about it a while ago, and it makes sense.


Do "applications, analytics, and ML/AI" constitute the three mixed model arts? This lines up with the way we've structured our team at my current company, although "applications" is very close to "streaming / real-time".

To that end, I wonder if "state" might be another attribute you'd want to include. I don't typically think of statefulness in modeling discussions, but having proper timestamps, etc., is critical, especially in an application or ML context.

author

I like this. Need to ponder it. I was going to include temporality. But if you're training an ML classifier, you may not have the notion of time. Good points, Stephen. As always, making me think!

author

I'll also add another form of data: ML and AI artifacts. I'll lump models, embeddings, and model inputs and outputs into this.


The building blocks are different for different models. A high-level ontology here could look like this:

***

Reality

-- ...

Model

-- Physical

-- Mental

-- Digital (this is where we are with data these days)

---- Business model: *Domain*, *Process*, *Agent*

---- Data model

------ Conceptual model: *Concept*, *Relationship*, *Operation*

------ Logical model: *Entity*, *Key*, *Relationship*, *Attribute*

------ Physical model: *Data structure*, *Table*, *Constraint*, *File*...

------ Transformational model (first saw this term in a book by Serge Gershkovich): *Layer*, *DAG*, *Aggregation*...

---- AI model: *AI utility function* (term used by Bill Schmarzo)

------ ML model: *Algorithm*, *Function*, *Metric*...

founding
Sep 10 · edited Sep 10

1. I'm a bit confused about 'the form of data I'm working with'. Isn't the form the end result of modeling? Is that what you had in mind, with the form being the output of modeling and what you represent being the input? If form is also meant as an input, I fail to see it that way.

2. I'd argue against the term 'structured/unstructured datasets'. The data in a dataset is structured or unstructured; those are properties of the data, not of the dataset. Yes, I know, semantics therapy. 8-D

3. I have some questions on the data to be represented, but it's hard to capture them in a comment. I'll ask next time I get a chance.

I'm looking forward to reading the rest! This is a very strong hook, Joe, and I really like the building blocks use!

author

1. Both. Maybe I’ll post a note I wrote a few weeks ago on this. The form can be the end result. It might also be the raw materials you’re working with. If you’re creating an image classifier, images are your raw materials for input.

2. I will probably rename tabular to structured or use them synonymously. Reinventing the wheel isn’t my goal.

3. Let’s chat about this.

Thanks a ton for this, Ramona.
