(This is part of the Study Guide series, 70-457)
Microsoft’s Measured Skill description: This objective may include but is not limited to: understand statistics; read query plans; plan guides; DMVs; hints; statistics IO; dynamic vs. parameterized queries; describe the different join types (HASH, MERGE, LOOP) and describe the scenarios in which they would be used
What I see:
· understand statistics
· query hints
· statistics IO
· join types (HASH, MERGE, LOOP)
Statistics are the way that SQL Server records and uses the data distribution for tables and indexes. They allow the query optimizer to choose an appropriate plan based off of row count, histograms, or page density. Fresh statistics are necessary for the process to make the best possible decision, but stale statistics can fool SQL Server into thinking it has found the best plan, when in fact it is a sub-optimal plan. For a great read and more information on statistics, see this Idera post by Donabel Santos on Understanding SQL Server Statistics.
Query hints are a way to tell the optimizer what to do, regardless of what the optimizer might have done originally. A few popular query hints are KEEP PLAN, MAXDOP, OPTIMIZE FOR, and RECOMPILE. For instance, MAXDOP will override the configured instance max degree of parallelism. RECOMPILE will cause SQL Server to discard the query execution plan after the query has completed as opposed to persistently storing it for later use. Please see BOL for a full list of Query Hints and corresponding explanations. All of these query hints are probably fair game on the exam, so a cursory knowledge of what they do will benefit you.
The set statement, SET STATISTICS IO, is used to output statistics regarding disk activity for the executed T-SQL queries. To see a working example of this, execution the below T-SQL and view the Messages window to see the disk/cache activity:
set statistics io on;
set statistics io off;
This gives us information such as scan count, logical reads (from the data cache/memory), physical reads (from disk), read-ahead reads (read from disk to cache for future page reads), and the LOB equivalents to the aforementioned statistics. This is a great way to see if a query or a subset of queries is hitting the disk too often.
There are three particular join types the optimizer can choose to utilize:
Hash Join – this join takes the smaller of the two sets to join and makes a hash table and fits that in the memory grant. Then it takes the other set and probes by computing a hash value for each row and comparing it to the hash table. To see this join in action, utilize the following query (notice a relatively small table that can easily fit into memory as a hashed table):
from HumanResources.Department d
inner join HumanResources.EmployeeDepartmentHistory edh
on d.DepartmentID = edh.DepartmentID;
The execution plan should look like the following:
Merge Join – this join goes through the inputted rows only once, and this can show performance gains through sorted data:
from Person.Person p
inner join Person.BusinessEntity be
on p.BusinessEntityID = be.BusinessEntityID;
The execution plan will resemble the following:
Loop Join – this join does just as its name states: one of the data sets will have every row of data iterated for each row of the other data set:
from Person.Person p
inner join Person.BusinessEntityAddress bea
on p.BusinessEntityID = bea.BusinessEntityID
where bea.AddressTypeID = 5;
The execution plan for the above query:
If there are any comments, questions, issues, or suggestions please feel free to leave a comment below or email me at firstname.lastname@example.org.