Data Model Smells

Mar 27

Your data stinks

17 Comments

Ran across a data mart with table and column names that have spaces in the physical table! Nothing like adding quotes on all SQL. Some tools handle it fine, many do not. Smelly tables = smelly code.

Expand full comment

Reply (2)

Michał I.

Mar 30

Funny enough I have only seen that in SQL Server databases. Never in Oracle, PostgreSQL. Is it something about the MS stack developers?

Expand full comment

Donald Parish

Mar 27

Spaces aren’t too much to ask for. Easiest biggest keyboard button out there. I agree can be a pain, but if a user viewed column, then best to make it look good.

Expand full comment

Reply (2)

Maury

Mar 28

Don't disagree, a column could be better presentable depending on the end product and user. If a requirement, there are more elegant ways for handling vs in the physical table. For example, views can be automated with the more friendly names, if spaces are a deal breaker. Data modeling tools can spit a view DDL version out with logical names. If tool is not a SQL script can create the view DDL as well selecting from the DB catalog. Unfortunately, unintentional data modeling does not take advantage of the data dictionary 'Comment' or 'Remark' descriptions built into most every database, where this information can be used for better column names or hover help type functionality, sucked into data catalogs and AI. As well, many good reporting tools have capabilities to read this metadata and include in the reporting model.

At the end of the day... who ever is building the report, will probably rename the column yet again! :-)

@joereis - Can't recall from previous chapters, and searched a bit in this stack, the built-in comments via DDL for columns and tables are so under used - be nice to see an example in your book in the metadata section or other section expressing best practices. In my dojo, data dictionary was required to get your MMA Data Modeling "belt" ;)

Expand full comment

Christian S

Mar 30

I don’t really see how using an underscore is harder to read than a space.

Expand full comment

Johnny Winter

Mar 27

Centipede fact tables https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/centipede-fact-table/

Expand full comment

Reply (2)

Matt Kelliher-Gibson

Mar 27

I saw someone try to do that behind a Power BI dashboard years ago. When all was needed was to sit it on top of one table and sum a few columns.

Expand full comment

Joe Reis

Mar 27

Holy hell…

Expand full comment

Reply (1)

Johnny Winter

Mar 27

The other horror story I have currently is a client who has decided to adopt Kimball 145 pattern for every table https://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/

Expand full comment

Reply (1)

Johnny Winter

Mar 27

These are both real life, current engagement headaches

Expand full comment

Christian Steinert

Mar 27

Doing an append of data for all time after each fiscal quarter and marking the historical records with the corresponding Quarter and Year flag fields instead of properly handling SCDs with type 2. Not scalable or user friendly whatsoever in a BI tool.

Expand full comment

Maury

Mar 27

I ran across a single table that probably breaks all the rules mentioned in your post. One of the main tables is in commercially available purchased software from a $4B Corporation. Granted they buy software via acquisition... so they didn't create it themselves. But... If I was doing a house inspection, and I noticed a poor foundation, I would question the purchase price. Same with looking at the tables, much like the foundation, business model of the software.

Said Operational table:

200 Columns

Columns are Upper Case, Pascal Case, Snake Case

No Class Words on many columns, if Class word exists, could be as the prefix or suffix

Has both "Flag" and "Indicator" columns as Y/N.

Has 10 columns with the same column name and suffix number going from 01 to 10.

40 Columns are various UDF expansion columns

And this system also has had performance problems... imagine that.

Data Archeology.

My theory is by the column name you should be able to have a good idea of what is in the column, without having to look at the data values. i.e. ends in Amount, it is a dollar value. Sometimes will use this table in an interview and have the candidate expound on anything that is not good practice.

Expand full comment

Matt Kelliher-Gibson

Mar 27

Another kind of data smell is when dealing with dashboards and analysis, complaining "I need more memory." I had that at one company, when I asked how much memory his laptop had he replied "128GB." I responded "that's not a memory problem, that's a you problem."

He had over 50 queries for his dashboards. Every time he needed to add a column, it was a new query.

Expand full comment

Adrian

Mar 29

Somebody was advising to program and document the code as if the code reviewer was a madman with an ax in his hand following at each important delivery. Unfortunately, the world we live in is full of surprises, multifolded constraints, and many other issues.

I met also well-known platforms with questionable design and code practices. The strangest design was a data model in which all fields were transposed over 4-5 columns with keys denoting what each piece of information was about. So, if an entity had 30 attributes, 30 records were created. Even if such a design has some advantages, the limits will in time overcome the advantages.

Expand full comment

Patrick Cuba

Mar 28

POSIWID

Expand full comment

Donald Parish

Mar 27

I’m guilty as charged on the first count of data smells.

Expand full comment

Otiniel Cahungo

Mar 27Edited

Through years I’ve been working with a bad data model that gives a lot of problems such as inconsistency. The work to avoid duplicate or even to eliminate them is very painful. Unfortunately the data warehouse system has been compromised for years, and we’re trying to fix it.

Expand full comment

Practical Data Modeling

Data Model Smells