17 Comments
User's avatar
Maury's avatar

Ran across a data mart with table and column names that have spaces in the physical table! Nothing like adding quotes on all SQL. Some tools handle it fine, many do not. Smelly tables = smelly code.

Expand full comment
Michał I.'s avatar

Funny enough I have only seen that in SQL Server databases. Never in Oracle, PostgreSQL. Is it something about the MS stack developers?

Expand full comment
Donald Parish's avatar

Spaces aren’t too much to ask for. Easiest biggest keyboard button out there. I agree can be a pain, but if a user viewed column, then best to make it look good.

Expand full comment
Maury's avatar

Don't disagree, a column could be better presentable depending on the end product and user. If a requirement, there are more elegant ways for handling vs in the physical table. For example, views can be automated with the more friendly names, if spaces are a deal breaker. Data modeling tools can spit a view DDL version out with logical names. If tool is not a SQL script can create the view DDL as well selecting from the DB catalog. Unfortunately, unintentional data modeling does not take advantage of the data dictionary 'Comment' or 'Remark' descriptions built into most every database, where this information can be used for better column names or hover help type functionality, sucked into data catalogs and AI. As well, many good reporting tools have capabilities to read this metadata and include in the reporting model.

At the end of the day... who ever is building the report, will probably rename the column yet again! :-)

@joereis - Can't recall from previous chapters, and searched a bit in this stack, the built-in comments via DDL for columns and tables are so under used - be nice to see an example in your book in the metadata section or other section expressing best practices. In my dojo, data dictionary was required to get your MMA Data Modeling "belt" ;)

Expand full comment
Christian S's avatar

I don’t really see how using an underscore is harder to read than a space.

Expand full comment
Matt Kelliher-Gibson's avatar

I saw someone try to do that behind a Power BI dashboard years ago. When all was needed was to sit it on top of one table and sum a few columns.

Expand full comment
Joe Reis's avatar

Holy hell…

Expand full comment
Johnny Winter's avatar

The other horror story I have currently is a client who has decided to adopt Kimball 145 pattern for every table https://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/

Expand full comment
Johnny Winter's avatar

These are both real life, current engagement headaches

Expand full comment
Christian Steinert's avatar

Doing an append of data for all time after each fiscal quarter and marking the historical records with the corresponding Quarter and Year flag fields instead of properly handling SCDs with type 2. Not scalable or user friendly whatsoever in a BI tool.

Expand full comment
Maury's avatar

I ran across a single table that probably breaks all the rules mentioned in your post. One of the main tables is in commercially available purchased software from a $4B Corporation. Granted they buy software via acquisition... so they didn't create it themselves. But... If I was doing a house inspection, and I noticed a poor foundation, I would question the purchase price. Same with looking at the tables, much like the foundation, business model of the software.

Said Operational table:

200 Columns

Columns are Upper Case, Pascal Case, Snake Case

No Class Words on many columns, if Class word exists, could be as the prefix or suffix

Has both "Flag" and "Indicator" columns as Y/N.

Has 10 columns with the same column name and suffix number going from 01 to 10.

40 Columns are various UDF expansion columns

And this system also has had performance problems... imagine that.

Data Archeology.

My theory is by the column name you should be able to have a good idea of what is in the column, without having to look at the data values. i.e. ends in Amount, it is a dollar value. Sometimes will use this table in an interview and have the candidate expound on anything that is not good practice.

Expand full comment
Matt Kelliher-Gibson's avatar

Another kind of data smell is when dealing with dashboards and analysis, complaining "I need more memory." I had that at one company, when I asked how much memory his laptop had he replied "128GB." I responded "that's not a memory problem, that's a you problem."

He had over 50 queries for his dashboards. Every time he needed to add a column, it was a new query.

Expand full comment
Adrian's avatar

Somebody was advising to program and document the code as if the code reviewer was a madman with an ax in his hand following at each important delivery. Unfortunately, the world we live in is full of surprises, multifolded constraints, and many other issues.

I met also well-known platforms with questionable design and code practices. The strangest design was a data model in which all fields were transposed over 4-5 columns with keys denoting what each piece of information was about. So, if an entity had 30 attributes, 30 records were created. Even if such a design has some advantages, the limits will in time overcome the advantages.

Expand full comment
Patrick Cuba's avatar

POSIWID

Expand full comment
Donald Parish's avatar

I’m guilty as charged on the first count of data smells.

Expand full comment
Otiniel Cahungo's avatar

Through years I’ve been working with a bad data model that gives a lot of problems such as inconsistency. The work to avoid duplicate or even to eliminate them is very painful. Unfortunately the data warehouse system has been compromised for years, and we’re trying to fix it.

Expand full comment