Exercise List: Designing The Database Schema
Now that we have our design / graphical prototype down on paper, we have a clear understanding of all the big-picture data points that are going to be required for this application. Taking that, we can flesh our our data schema. In the past, people have told me that when you approach an object oriented application, you don't think in terms of database schema, you think in terms of objects and responsibilities? That's all good, but we are rocking some procedural code to start with and my mind simply hasn't been trained to work that way yet. As such, I design my database first.
The database schema is going to be fairly simple as our application has very litte functionality. The bulk of the information will be stored in the exercise table and it's join to the joint actions. Let's take a look:
NOTE: Since these database tables are part of my Kinky Solutions site database, I am starting them all with "el_" so that none of them will ever overlap with my site tables.
el_exercise
- id INT (primary key)
- name VARCHAR( 100 )
- description TEXT
- contraindications TEXT
- alternate_names TEXT
- is_deleted TINYINT
- date_updated
- date_created
el_exercise_joint_jn
- exercise_id INT (foreign key)
- joint_id INT (foreign key)
- joint_action_id INT (foreign key)
- movement_plane_id INT (foreign key)
- movement_symmetry_id INT (foreign key)
NOTE: The entire row will act as the primary key.
el_joint
- id INT (primary key)
- name VARCHAR( 20 )
- sort TINYINT
el_joint_action
- id INT (primary key)
- name VARCHAR( 20 )
- sort TINYINT
el_movement_plane
- id INT (primary key)
- name VARCHAR( 20 )
- sort TINYINT
el_movement_symmetry
- id INT (primary key)
- name VARCHAR( 20 )
- sort TINYINT
That's all there is to it. The sort columns are for display purposes so that we don't have to list things out alphabetically. For example, on joints, I like to display them in a top-to-bottom listing, which is not alphabetical. I would also like to put the most commong joint actions first. I don't know if this is the best way to handle it, but at least we have the flexibility built into the data schema.
Reader Comments
I really prefer to use primary keys in the fashion [table_name]_id like you did for the as foreign keys. Just ID is confusing and means you have to alias pk's in every joined query.
@Zac,
I used to do it that way as well, in fact, I used to create my tables in a way that all the fields started with [table_name]_. We had a company meeting to discuss naming conventions and we decide to go this way. At first, I was resistant, but looking back, I am loving it.
In the long run, you don't really have many naming conflicts. All you have to do is alias your tables:
SELECT
t.id
FROM
table t
It's hardly any extra typing, and in fact, I am in the habit of aliasing my table names even when I am using a single table query. The only time you have to alias a column name itself is when you need to return more than one table primary key. However, in 90% of those cases, I can generally return the foreign_key column which is named using your convention, so it's all good.
I second Zac's comment; I've also recently started prefixing foreign key fields with "frn_" - for example, if referring to joint_id from another table, I would name the column frn_joint_id. Just makes it clearer you're referring to another table...
@Seb,
In my database naming convention, 95% of the time, if a column ends in "_id", then it is referring to a foreign key column (the primary key ID column in another table). Rarely, it refers to a code-based variable, but that is usually because I am too lazy to make a property table to link to.
I confess to being a [tablename]_id weirdo as well.
I think one of the biggest advantage of naming primary keys with distinct names is there's no confusion as to what the data contains. When I first started, I used to use just "id" for a primary key names, but I quickly found that one large projects that naming convention became really confusing.
Using a variable of table name + id has simplified life for me. I always know exactly where my primary keys originate from now. On projects where you have lots of tables it's extremely helpful. One of the codebases I currently work on has almost 200 tables. Using just the name "id" wouldn't cut it for me.
@Dan,
I agree and I disagree. I think as long as you keep the naming convention - ID = primary key - then on small queries, there will never be any confusion. Then, on larger queries, where you don't have a base-set of data (meaning, there is no clear originating table whose ID would be the base ID of each record), then you can simply alias the columns for clarification. So, I agree that on larger, multi-join queries, aliasing can help.
When I first starting naming things this way, I was very hesitant. But, over the last two years, I haven't looked back. It's been a joy to have shorter column names, and much less redundant.
So how do you refer to the ID named pk in terms of urls and form fields?
I'm on the 'use the name of the table on the primary key' camp.
It just makes much more sense to me if I'm dealing with a table Categories to have the primary key called CategoryId, down at the Products table I'll have my ProductId and a foreign key called >CategoryId.
Makes it really easy to understand, not just for me, but for future developers that see DB design after I'm long gone.
It's the first time I see somebody using the name ID on every table. it would be really confusing for me to try to understand the whole schema if I had to do it that way.
While on the subject, what tools do you guys use for DB Design? to create the ERD and then generate the DDL?
@Zac,
I just pass ID around in the URL and FORM just as you would other table column names. For example:
edit.cfm?id=4
The way it gets passed really has nothing to do with the column name, as far as I can see.
Hi Ben,
Don't take it hard, its really poor db design.
1: name is reserve word,
2: each column should explicitly express itself. like intID, vcName , nvcName, dtTimeCreate, dtTimeUpdated, bActive, txtDescription., ntxtDescription.
3: table name should be in your case etblJoint, etblUser
4: view name like this evwJointL, evwUserC
for more info have a look http://dublincore.org/
Regards
@Sana,
I am not sure how your design differs from mine. It looks like the difference is merely naming conventions. And as far as each column referencing the data type - I love Hungarian notation as much as the next person, but I think enough people hate it for that to be an official database standard.
Why would to specify the difference between a View and Table in terms of naming prefix? From the end user's perspective, aren't they the same thing?
The point is that we both have a naming standard that we stick to.
i assume everything is a table and therefore don't prefix tables, i preview views with V_ or MV_ for a materialised view.
i use mostly oracle (which ignores case) and avoid camel casing and use underscores for formatting, which i carry over in to my CF
naming things in a database is a real form :) a good name will infer it's type ie LAST_UPDATED
@Zac,
I am with you on the good naming = data type inference. For example, all my date/time fields start with "date_", example:
date_created
date_updated
date_registered
date_last_login
All of my boolean type fields start with a boolean qualifier:
is_active
has_contract
was_kissed
The rest of my fields are usually text fields (Text, varchar, char, etc) and just rely on names that describe their meaning:
description
content
formatted_content
name
So far, it has been working very well. I am rarely ever confused as to what is in a field and I have an extremely high signal-to-noise ratio for my SQL code.
@Sana,
I don't know what DB you use, but NAME is not a reserved keyword or a future reserved keyword in Microsoft SQL Server. However, you can simply put [] around words that are considered keywords if you need to use them.
Hi Ben,
I'm with you on the naming conventions. I used to use preface all table attributes with the table name, but have opted for a style similar to yours.
My only suggestion, would be to avoid the TEXT fields in favour of long(ish) varchar fields. That is, if you plan to retrieving large data sets including the text fields. The extra lookup time on text fields severely impacts query performance for large data sets.
@Paul,
Good idea. My only concern with that non-Text field is that I then have to worry about the length of data being submitted. Really, I am just being lazy. I mean, I have Description, which will be short and Contraindications, which will be even short most likely.
@Sana,
I was just thinking - one of the benefits of an object oriented style of coding is that my eventually, my code will not have to worry about the underlying database structure. I am NOT saying that that is an excuse to have a bad database structure; I am just saying that that will be one of the underlying perks of this project.
I use TableName_ID for the primary key, and ForeignTableName_ID for all my foreign keys.
I do this because I like the looks of:
... WHERE Widget.Container_ID = Container.Container_ID
It may be longer to type, but I've found that when looking back over code that I wrote a while ago, I can figure out what is happening faster (because I don't really have to think at all).
I used to name everything with hungarian notation and other funky prefixes (prefices?), and while they contain all the information I need - it takes longer for my brain to process - especially when I'm able to look at the database diagram seperately.
Lately I'm using the DBDesigner tool from FabForce.net. There's an option when reverse engineering a Database that allows you to build References and Foreign Keys based on the name of your columns. I find this to be a very important feature because it allows me or any other developer to have a graphical understanding of the database diagram without doing a lot of digging.
If I were to use ID as the primary key on every table it would be really hard for me to come up with an ERD explaining the DB model. I guess that if everything is done correctly and Foreign Keys are formally defined then it really doesn't matter what name you use. However how many times have we inherited a system where nothing works the way is supposed to.
Personally I'm sticking to TableName_id for primary and foreign keys.
@ Matt Voorman
I also like the idea of reading this
WHERE Widget.Container_ID = Container.Container_ID
Hi Ben
Let have a look about using standard DB design
[strong]Benefits:[/strong]
1: Our Naming Convention, never worried about that column name or table name will assume as a serve word.
2: As OOP approach, you will be more aware of what type of data-type you are dealing with.
(i) was_kissed its boolean [No]
(ii) was_fu*k*d its boolean [yes]
data-type makes more sense when come to display. we could easily handle in cf yesnoformat. table columns are not just for you, but its for your team.
3: Never conflict with Javascript. event, name, element, document etc
4: DB approach is more elegant than the front end. in our experience that 80% of heavy duty task are totally rely on DB. To generate 200MX xml file in CF. Then oh now its has problems, use java.io for file writing, ohhh no use java.io buffer, ohhh no use java buffer stream to write file, oh now we have to control buffer size in kb. then close the buffer because its sucking ram.
I did read your blog post related parsing CSV file using regular expression, to be honest try mssql server to parse in tmp-table then get the data back in nano seconds.
name is reserve word, I am working on oracle and sql-server, try my best not to have any reserve word at-all. skipping [] reserved word never be nice practice and its just cause more frustration.
5: Cleary differentiate your db variables, html variables, CF variables
6: <cfset to = 30> is this looks right, yes it does work but not good interms of implementation. why not it should be
<cfset nTo = 30 /> <cfset nFrom = 1 />
n = numbers
s = strings
d = dates
b = boolean
st = Structure
ar = arrays
obj = objects
qry = query
x = xml
wx = wddx
7: Its for more than personal choice, which deliver more readable and easily maintainable codes
8: no offense:
9: Have a nice evening :) :)
10: tomorrow rest of discussion :
@Sana,
If I may respond to your points.
1: Our Naming Convention, never worried about that column name or table name will assume as a serve word.
This is a valid point. By using your naming convention, you never have to worry about conflicts like this. However, I am not sure that this, on its own, could outweigh the extra amount of code (enough to outweight the what, 1% of the time that your names do conflict and you have to use [] notation).
2: As OOP approach, you will be more aware of what type of data-type you are dealing with. data-type makes more sense when come to display. we could easily handle in cf yesnoformat. table columns are not just for you, but its for your team.
If someone knows what *kind* of data they have in the column, there should be a natural inference of the data type. Usually, knowing the exact data type is irrelevent. For example, if a boolean value is stored as a TINYINT, a BIT, an INT, from an output stand point, there is no difference - 0 is false and NOT 0 is true. Also, for string, it doesn't matter if its TEXT, Varchar, NVarchar, Char, its all just text and can be treated as such on the output.
I could understand if you were capturing that general data type, such as Numeric Of String, but going into more detail than that, I think you gather data that doesn't need to be gathered.
3: Never conflict with Javascript. event, name, element, document etc
It would also never conflict with the way I name my cat... which is also just as relavent. There is no reason you should ever consider overlap with another language??? I am not sure what you are getting at.
4....
I am not sure what your point here was exactly. It sounds like you are comparing Java to SQL database design? I did not follow.
As for the CSV parsing, I agree, doing it in a database import would be much faster. I am offering a non-database option. There are probably lots of ways. I am not saying one is better than the other. But is possible that even though faster options might exist, some people would still opt for a slower, yet fairly fast UDF to just drop in.
But, I am not sure how this ties into naming conventions???
5: Cleary differentiate your db variables, html variables, CF variables
Again, just as with the Javascript point above, this really shouldn't ever be considered. Database calls will only be in the context of queries or with record sets - and, as long as you scope everything properly, I am not sure where any confusion would come from???
6....
That is how I name my non-scoped variables. However, I am phasing that out a bit and moving more towards scoping most of my variables (in which case I don't use hungarian notation).
But currently, I actually do err this way in CF code, but not in SQL code. It is an incosistency of mine that I am looking to improve.
7: Its for more than personal choice, which deliver more readable and easily maintainable codes
I don't think its any one naming standard that makes the code more readable. I think it's the use of Consistent naming standards in within your code that make it readable.
That being said, readability is HUGELY a personal preference. Most people dig on me for putting so much white space in my code, but that is the only way Ican read it. If I were to look at someone else's code that has very little white, it can hardly concentrate on it at all.
That being said, it is my personal preference that this:
el_joint.description
is MUCH more readable than this:
etblJoint.txtDescription
I find that with your option there are so many more characters that I simply have to filter out to get at the meat of the code. Of course, part of that is just what you are used to.
All in all, I still don't see any benefit that is unique to the way either of us do things (although mine does require significantly less characters). I really think that at the end of the day, the key is CONSISTENCY no matter what you do. That, and agreeing internally upon a standard.
And also, no offense taken and I hope no offense given. Just a good conversation going here. I have never been formally trained in DB design, so maybe that is a bad thing, but maybe that is a good thing - a fresh pair of eyes and such.
The notation that Sana is referring to reminds me very strongly of the old C-style Hungarian notation, in which all variables are prefixed with their data type. That methodology was somewhat useful for strongly typed languaged. Yes, SQL is strongly typed, but ColdFusion is not (at least not at it's top layer). In the end ColdFusion ends up auto-casting, not lending any real credibility to the name given it. Just cause it starts with an 'n' doesn't mean that you get a number from that variable.
In the end it does boil down to preference eh? :)
@Sana,
I just found a great link on the TeraTech blog to Pete Frietag's blog for a tool that has reserved words:
http://www.petefreitag.com/tools/sql_reserved_words_checker/?word=name
Looks like "name" is not a reserved word anywhere??
@Ben,
Glad to hear this, we have submitted a request to oracle why [name] is problematic.
Anyway I still really determined that our naming convention are far more better than yours. Thats why we don't fall in these frustrated situations:
:) :)
http://www.teratech.com/blog/index.cfm/2007/10/25/SQL-Reserved-word-or-not-that-is-the-question
http://www.petefreitag.com/tools/sql_reserved_words_checker/?word=element
As I mentioned earlier you could have word like, element etc. You will never be sure; is that reserved word or not. why not follow some standards.
ok I think lets run a poll on your blog, which metadata declaration is more better in understanding and flawless.
:) :)
@Sana,
Ha ha, I think if we ran a poll and 99 people entered, we get 99 different responses :)
Nice article,I'll collection this usefull article.Thanks much.
I think it makes perfect sense to start with the database first - it's like trying to create copy for a website before you even know what kind of design you want. You need to know something of the limititations before you can just go ahead and do the writing.
Similarly, you need to start with the database - I mean, that's where all the info's going. So, it makes sense to me that you are doing things the way you are doing them.