Can someone please reply and let us know if these statements are true/false? References to articles, help sections etc would be most welcome!
1. Table variable is not efficient for large record sets
2. Nested Selects will (always) cause hash joins
3. SQLServer 6.5 joins should be removed.
4. Select * is not best practice.
I have read nothing that says table variables are less efficient than temporary tables for any given size - I know that it gets stored into tempdb if memory becomes a problem.
I believe nested selects are optimized like any other part of the query, and therefore may be hash/merge/loop joined as necessary.
"SQL Server 6.5 joins" - what are these? I don't know... I've been given the list.
Are these valid exceptions for using select *?
- when the result produced should automatically include all columns from a source table
- when testing for exists (I read it's more efficient than "exists (select pk_col from table)")
Thanks!
Richard
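A minimal sketch of the EXISTS case Richard asks about (table and column names are invented for illustration). Inside EXISTS the optimizer only probes for row existence and never actually fetches the column list, so the two forms perform the same:

```sql
-- dbo.Orders is a hypothetical table used only for illustration.
IF EXISTS (SELECT * FROM dbo.Orders WHERE CustomerID = 42)
    PRINT 'customer has orders';

-- Equivalent in cost; the column list inside EXISTS is ignored.
IF EXISTS (SELECT OrderID FROM dbo.Orders WHERE CustomerID = 42)
    PRINT 'customer has orders';
```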
Select * is plainly a bad practice, because it means you have no control over what you asked for, especially if the schema changes.
Once you get to about 5000 records, temp tables are more efficient than table variables.
What do you mean by SQL Server 6.5 joins ? Nested loop joins ? Yes, they can be very inefficient (in the worst case they examine table1RowCount * table2RowCount row combinations)
I knew the first one, got the next two answers from google pretty quickly, but couldn't find an answer to whether nested selects are always hash joins, sorry.|||Thanks cgraus, that's good. You didn't answer the additional comments re "select *" though.
Ppl, I still can't find references that say 5000 records is the pivotal point where temp tables become better than table variables - in fact I did find one, but it was unqualified opinion.
The official sites and KBs don't state any such limits.
Help with nested selects and hash joins, anyone?
|||1. Table variable is not efficient for large record sets
Yes, temporary tables or permanent tables are more efficient than table variables for large record sets. Table variables do not cause recompilations, so no auto statistics are created on their columns. In addition, queries containing table variables cannot use a parallel execution plan, which can severely affect performance. One case where you would use a table variable is to avoid recompilation of SPs, but this needs to be done with care since it can affect the performance of the queries. Below are some links that will help. The KB article contains the recompilation thresholds for temporary tables, which provide a guideline as to when to switch to them from table variables to get better plans.
http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
http://support.microsoft.com/default.aspx?scid=kb;en-us;195565
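As a rough sketch of the difference described above (table names are invented; the actual crossover point depends on the data and server):

```sql
-- Table variable: no auto statistics and no recompilation,
-- so the optimizer guesses cardinality (often badly for large sets).
DECLARE @work TABLE (id INT PRIMARY KEY, val VARCHAR(50));
INSERT INTO @work (id, val)
SELECT id, val FROM dbo.BigSource;       -- hypothetical source table

-- Temporary table: gets auto statistics and can trigger recompilation,
-- usually yielding better plans once the row count grows.
CREATE TABLE #work (id INT PRIMARY KEY, val VARCHAR(50));
INSERT INTO #work (id, val)
SELECT id, val FROM dbo.BigSource;
CREATE INDEX ix_work_val ON #work (val); -- extra indexes are also possible
```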
2. Nested Selects will (always) cause hash joins
No. This depends on the indexes, data among other things.
3. SQLServer 6.5 joins should be removed.
Not sure what this refers to. If it refers to the old *= and =* style of outer joins, which has been in SQL Server since the beginning, then it is true: that syntax is deprecated as of SQL Server 2005. Other than that, the statement is plain wrong since there is no such thing as SQL Server 6.5 joins.
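For reference, the old-style outer join and its ANSI replacement look like this (table and column names are invented):

```sql
-- Legacy SQL Server / Sybase outer join syntax, deprecated in SQL Server 2005:
SELECT c.name, o.total
FROM customers c, orders o
WHERE c.id *= o.customer_id;

-- Equivalent ANSI syntax:
SELECT c.name, o.total
FROM customers c
LEFT OUTER JOIN orders o ON c.id = o.customer_id;
```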
4. Select * is not best practice.
It depends. It is not a good practice to use SELECT * in view definitions or in resultsets that are returned directly to the client. The main reason is that the metadata of the resultset can change due to schema changes, and clients may not be able to handle it correctly. On the other hand, you can use SELECT * for ad-hoc queries, or in sub-queries with EXISTS/NOT EXISTS, or in derived tables.
|||But is referring to the specific fields in the select statement more efficient than a blanket select?
|||I am not sure what the context of your question is. Generally, you should only request data that you want to use. So specifying all columns when they are unneeded, or using SELECT * for results sent directly to the client, is a bad thing. It can hurt performance, for example: consider the case of a text column in a table.
|||I am asking about a situation where I require all fields returned - I have seen it mentioned in certain places that specifying the fields is more efficient. Indeed, I ran a performance optimizer from Microsoft a while ago and it marked my select *'s as obsolete ("may be dropped in future versions" etc.).
|||As I said, it is always wise to specify the fields that you want in the SELECT list. Client applications may not be able to handle new columns in the result set if you use SELECT *, for example. It helps a little bit in terms of performance, but this is more of a maintenance/management issue. I am not sure what performance optimizer you are talking about, but SELECT * is not obsolete syntax. It is part of the ANSI SQL standard and very much valid.
|||We had a company standard (ten years ago) which almost mandated the use of Select *. It was for Sybase, not Ms-Sql.
We had pairs of caller-callee stored procs. The caller would create a temp table, and then call the callee, which would work on the data in the temp table. Nice modular isolation, but it only worked if both procedures used exactly the same definition for the temp table. There was also the need to compile the callee, passing it the same temp table as it would be when called in real life.
So, the concept of "template table" was born. This was sitting in the schema, without any rows, and was used exclusively for
select * into #temp from template_table
We used this statement in the caller, to create the table. We also used it in the compilation script for the callee. Over time, we added new columns to the temp table, but the mechanism remained valid.
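A sketch of the template-table pattern described above (procedure and table names are invented):

```sql
-- Template table: schema only, never holds any rows.
CREATE TABLE template_work (id INT, amount MONEY);
GO

CREATE PROCEDURE caller_proc AS
    -- Template is empty, so this copies structure only.
    SELECT * INTO #work FROM template_work;
    INSERT INTO #work (id, amount) VALUES (1, 10.00);
    EXEC callee_proc;   -- callee reads/writes #work
GO
```

Because both the caller and the callee's compilation script create #work from the same template, adding a column to template_work keeps the two definitions in sync.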
I wonder whether such an old-time technique is applicable to MS-Sql. On one hand, I suspect the modularity requirement is still present. On the other hand, there could be better ways to pass work tables around - not sure.
Andrew|||From SQL 7.0 onwards, you can reference a temporary table created within an SP in any of the SPs called from it, or in trigger code, without having to pass it around. This is due to the deferred name resolution feature. So you don't need the template_table approach.|||
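A minimal sketch of that visibility rule (procedure names invented): the inner procedure references #work even though it is created by the outer one.

```sql
CREATE PROCEDURE inner_proc AS
    -- #work does not exist at CREATE time; deferred name resolution
    -- lets this compile and bind only at execution.
    SELECT COUNT(*) AS n FROM #work;
GO

CREATE PROCEDURE outer_proc AS
    CREATE TABLE #work (id INT);
    INSERT INTO #work (id) VALUES (1);
    EXEC inner_proc;   -- sees outer_proc's #work
GO
```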
I noticed the Deferred Name Resolution as I'm now writing stored procs for Ms-Sql. It's an interesting feature, but could be a mixed blessing:
create procedure test_proc
as
select my_db = db_name(), my_name = suser_sname()
into #tmp
select no_such_column from #tmp        -- resolved only at run time
select no_such_col from no_such_table  -- no error at CREATE PROCEDURE either
go
The above compiles without any error, which is not quite an optimal situation. I believe compile-time checking should be as tight as possible.
There was a traditional argument for putting business rules into stored procs, as opposed to having them in client-code. The reasoning was that stored procs offered better compile-time checking. Now, instead of improving the checks at the client-side, we have a situation where the stored procs are becoming almost as "loose" as client code.
|||Yes, deferred name resolution has its own pros and cons. If you want to see stricter checks during creation of modules, you should send a request to sqlwish@microsoft.com for such a feature. I have seen such requests in the past, and the more the merrier.