Ran across a data mart with table and column names that have spaces in the physical table! Nothing like adding quotes on all SQL. Some tools handle it fine, many do not. Smelly tables = smelly code.
Spaces aren’t too much to ask for. Easiest biggest keyboard button out there. I agree can be a pain, but if a user viewed column, then best to make it look good.
Don't disagree, a column could be better presentable depending on the end product and user. If a requirement, there are more elegant ways for handling vs in the physical table. For example, views can be automated with the more friendly names, if spaces are a deal breaker. Data modeling tools can spit a view DDL version out with logical names. If tool is not a SQL script can create the view DDL as well selecting from the DB catalog. Unfortunately, unintentional data modeling does not take advantage of the data dictionary 'Comment' or 'Remark' descriptions built into most every database, where this information can be used for better column names or hover help type functionality, sucked into data catalogs and AI. As well, many good reporting tools have capabilities to read this metadata and include in the reporting model.
At the end of the day... who ever is building the report, will probably rename the column yet again! :-)
@joereis - Can't recall from previous chapters, and searched a bit in this stack, the built-in comments via DDL for columns and tables are so under used - be nice to see an example in your book in the metadata section or other section expressing best practices. In my dojo, data dictionary was required to get your MMA Data Modeling "belt" ;)
Doing an append of data for all time after each fiscal quarter and marking the historical records with the corresponding Quarter and Year flag fields instead of properly handling SCDs with type 2. Not scalable or user friendly whatsoever in a BI tool.
I ran across a single table that probably breaks all the rules mentioned in your post. One of the main tables is in commercially available purchased software from a $4B Corporation. Granted they buy software via acquisition... so they didn't create it themselves. But... If I was doing a house inspection, and I noticed a poor foundation, I would question the purchase price. Same with looking at the tables, much like the foundation, business model of the software.
Said Operational table:
200 Columns
Columns are Upper Case, Pascal Case, Snake Case
No Class Words on many columns, if Class word exists, could be as the prefix or suffix
Has both "Flag" and "Indicator" columns as Y/N.
Has 10 columns with the same column name and suffix number going from 01 to 10.
40 Columns are various UDF expansion columns
And this system also has had performance problems... imagine that.
Data Archeology.
My theory is by the column name you should be able to have a good idea of what is in the column, without having to look at the data values. i.e. ends in Amount, it is a dollar value. Sometimes will use this table in an interview and have the candidate expound on anything that is not good practice.
Another kind of data smell is when dealing with dashboards and analysis, complaining "I need more memory." I had that at one company, when I asked how much memory his laptop had he replied "128GB." I responded "that's not a memory problem, that's a you problem."
He had over 50 queries for his dashboards. Every time he needed to add a column, it was a new query.
Somebody was advising to program and document the code as if the code reviewer was a madman with an ax in his hand following at each important delivery. Unfortunately, the world we live in is full of surprises, multifolded constraints, and many other issues.
I met also well-known platforms with questionable design and code practices. The strangest design was a data model in which all fields were transposed over 4-5 columns with keys denoting what each piece of information was about. So, if an entity had 30 attributes, 30 records were created. Even if such a design has some advantages, the limits will in time overcome the advantages.
Through years I’ve been working with a bad data model that gives a lot of problems such as inconsistency. The work to avoid duplicate or even to eliminate them is very painful. Unfortunately the data warehouse system has been compromised for years, and we’re trying to fix it.
Ran across a data mart with table and column names that have spaces in the physical table! Nothing like adding quotes on all SQL. Some tools handle it fine, many do not. Smelly tables = smelly code.
Funny enough I have only seen that in SQL Server databases. Never in Oracle, PostgreSQL. Is it something about the MS stack developers?
Spaces aren’t too much to ask for. Easiest biggest keyboard button out there. I agree can be a pain, but if a user viewed column, then best to make it look good.
Don't disagree, a column could be better presentable depending on the end product and user. If a requirement, there are more elegant ways for handling vs in the physical table. For example, views can be automated with the more friendly names, if spaces are a deal breaker. Data modeling tools can spit a view DDL version out with logical names. If tool is not a SQL script can create the view DDL as well selecting from the DB catalog. Unfortunately, unintentional data modeling does not take advantage of the data dictionary 'Comment' or 'Remark' descriptions built into most every database, where this information can be used for better column names or hover help type functionality, sucked into data catalogs and AI. As well, many good reporting tools have capabilities to read this metadata and include in the reporting model.
At the end of the day... who ever is building the report, will probably rename the column yet again! :-)
@joereis - Can't recall from previous chapters, and searched a bit in this stack, the built-in comments via DDL for columns and tables are so under used - be nice to see an example in your book in the metadata section or other section expressing best practices. In my dojo, data dictionary was required to get your MMA Data Modeling "belt" ;)
I don’t really see how using an underscore is harder to read than a space.
Centipede fact tables https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/centipede-fact-table/
I saw someone try to do that behind a Power BI dashboard years ago. When all was needed was to sit it on top of one table and sum a few columns.
Holy hell…
The other horror story I have currently is a client who has decided to adopt Kimball 145 pattern for every table https://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/
These are both real life, current engagement headaches
Doing an append of data for all time after each fiscal quarter and marking the historical records with the corresponding Quarter and Year flag fields instead of properly handling SCDs with type 2. Not scalable or user friendly whatsoever in a BI tool.
I ran across a single table that probably breaks all the rules mentioned in your post. One of the main tables is in commercially available purchased software from a $4B Corporation. Granted they buy software via acquisition... so they didn't create it themselves. But... If I was doing a house inspection, and I noticed a poor foundation, I would question the purchase price. Same with looking at the tables, much like the foundation, business model of the software.
Said Operational table:
200 Columns
Columns are Upper Case, Pascal Case, Snake Case
No Class Words on many columns, if Class word exists, could be as the prefix or suffix
Has both "Flag" and "Indicator" columns as Y/N.
Has 10 columns with the same column name and suffix number going from 01 to 10.
40 Columns are various UDF expansion columns
And this system also has had performance problems... imagine that.
Data Archeology.
My theory is by the column name you should be able to have a good idea of what is in the column, without having to look at the data values. i.e. ends in Amount, it is a dollar value. Sometimes will use this table in an interview and have the candidate expound on anything that is not good practice.
Another kind of data smell is when dealing with dashboards and analysis, complaining "I need more memory." I had that at one company, when I asked how much memory his laptop had he replied "128GB." I responded "that's not a memory problem, that's a you problem."
He had over 50 queries for his dashboards. Every time he needed to add a column, it was a new query.
Somebody was advising to program and document the code as if the code reviewer was a madman with an ax in his hand following at each important delivery. Unfortunately, the world we live in is full of surprises, multifolded constraints, and many other issues.
I met also well-known platforms with questionable design and code practices. The strangest design was a data model in which all fields were transposed over 4-5 columns with keys denoting what each piece of information was about. So, if an entity had 30 attributes, 30 records were created. Even if such a design has some advantages, the limits will in time overcome the advantages.
POSIWID
I’m guilty as charged on the first count of data smells.
Through years I’ve been working with a bad data model that gives a lot of problems such as inconsistency. The work to avoid duplicate or even to eliminate them is very painful. Unfortunately the data warehouse system has been compromised for years, and we’re trying to fix it.