时间:2021-07-01 10:21:17 帮助过:15人阅读
If you are a developer writing applications that use SQL Server and you are wondering what exactly happens when you ‘run’ a query from your application, I hope this article will help you write better database code and will help you get started when you have to investigate performance problems.
SQL Server is a client-server platform. The only way to interact with the back-end database is by sending requests that contain commands for the database. The protocol used to communicate between your application and the database is called TDS (Tabular Data Sream) and is described on MSDN in the Technical Document [MS-TDS]: Tabular Data Stream Protocol. The application can use one of the several client-side implementations of the protocol: the CLR managed SqlClient, OleDB, ODBC, JDBC, PHP Driver for SQL Server or the open source FreeTDS implementation. The gist of it is that when your application whats the database to do anything it will send a request over the TDS protocol. The request itself can take several forms:
After a complete TDS request reaches the database engine SQL Server will create a task to handle the request. The list of requests in the server can be queried from sys.dm_exec_requests.
The above mentioned task created to handle the request will represent the request from beginning till completion. For example if the request is a SQL Batch type request the task will represent the entire batch, not individual statements. Individual statements inside the SQL Batch will not create new tasks. Certain individual statements inside the batch may execute with parallelism (often referred to as DOP, Degree Of Parallelism) and in their case the task will spawn new sub-tasks for executing in parallel. If the request returns a result the batch is complete when the result is completely consumed by the client (eg. when you dispose the SqlDataReader). You can see the list of tasks in the server by querying sys.dm_os_tasks.
When a new request reaches the server and the task is created to handle that request, in PENDING state. At this stage the server has no idea yet what the request actually is. The task has to start executing first, and for this the engine must assign a worker to it.
Workers are the thread pool of SQL Server. A number of workers is created initially at server start up and more can be created on-demand up to the configured max worker threads. Only workers execute code. Workers are waiting for PENDING tasks to become available (from requests coming into the server) and then each worker takes exactly one task and executes it. The worker is busy (occupied) until the task finishes completely. Tasks that are PENDING when there are no more available workers will have to wait until one of the executing (running) task completes and the worker that executed that task becomes available to execute another pending task. For a SQL batch request the worker that picks up that task will execute the entire SQL batch (every statement). This should settle the often asked question whether statements in a SQL batch (=> request => task => worker) can execute in parallel: no, as they are executed on a single thread (=> worker) then each statement must complete before the next one starts. For statements that internally use parallelism (DOP > 1) and create sub-tasks, each sub-task goes through exactly the same cycle: it is created as PENDING and a worker must pick it up and execute it (a different worker from the SQL batch worker, that is by definition occupied!). The lists and state of workers inside SQL Server can be seen by querying sys.dm_os_workers.
Once a task started executing a request the first thing it needs to do is to understand the content of the request. At this stage SQL Server will behave much like an interpreted language VM: the T-SQL text inside the request will be parsed and an abstract syntax tree will be created to represent the request. The entire request (batch) is parsed and compiled. If an error occurs at this stage, the requests terminates with a compilation error (the request is then complete, the task is done and the worker is free to pick up another pending task). SQL, and T-SQL, is a high end declarative language with extremely complex statements (think SELECT with several JOINs). Compilation of T-SQL batches does not result in executable code similar to native CPU instructions and not even similar to CLI instructions or JVM bytecode, but instead results primarily in data access plans (or query plans). These plans describe the way to open the tables and indexes, search and locate the rows of interest, and do any data manipulation as requested in the SQL batch. For instance a query plan will describe an access path like ‘open index idx1 on table t, locate the row with the key ‘k‘ and return the columns a and b‘. As a side note: a common mistake done by developers is trying to come up with a single T-SQL query that cover many alternatives, usually by using clever expressions in the WHERE clause, often having many OR alternatives (eg.(COLUMN = @parameter OR @parameter IS NULL). For developers trying to keep things DRY and avoiding repetition are good practices, for SQL queries they are plain bad. The compilation has to come up with an access path that work for any value of the input parameters and the result is most often sub-optimal. I cannot close this side without urging you to read Dynamic Search Conditions in T-SQL if you want to learn more about this subject.
Speaking of choosing an optimal data access path, this is the next stage in the lifetime of the request: optimization. In SQL, and in T-SQL, optimization means choosing the best the data access path from all the possible alternatives. Consider that if you have a simple query with join between two tables and each table has an additional index there are already 4 possible ways to access the data and the number of possibilities grows exponentially as the query complexity increases and more alternative access paths are available (basically, more indexes). Add to this that the JOIN can be done using various strategies (nested loop, hash, merge) and you’ll see why optimization is such an important concept in SQL. SQL Server uses a cost based optimizer, meaning that it will consider all (or at least many) of the possible alternatives, try to make an educated guess about the cost of each alternative, and then choose the one with the lowest cost. Cost is calculated primarily by considering the size of data that would have to be read by each alternative. In order to come up with these costs SQL Server needs to know the size of each table and the distribution of column values, which is are available from the statistics associated with the data. Other factors considered are the CPU consumption and the memory required for each plan alternative. Using formulas tuned over many years of experience all these factors are synthesized into a single cost value for each alternative and then the alternative with the lowest cost is chosen as the query plan to be used.
Exploring all these alternatives can be time consuming and this is why once a query plan is created is also cached for future reuse. Future similar requests can skip the optimization phase if they can find an already compiled and optimized query plan in the SQL Server internal cache. For a lengthier discussion see Execution Plan Caching and Reuse.
Once a query plan is chosen by the Optimizer the request can start executing. The query plan gets translated into an actual execution tree. Each node in this tree is an operator. All operators implement an abstract interface with 3 methods: open(), next(), close(). The execution loop consists in calling open() on the operator that is at the root of the tree, then calling next() repeatedly until it returns false, and finally calling close(). The operator at the root of the tree will in turn call the same operation on each of its children operators, and these in turn call the same operations on their child operators and so on. At the leaf the trees there are usually physical access operators that actually retrieve data from tables and indexes. At intermediate levels there are operators that implement various data operations like filtering data, performing JOINs or sorting the rows. Queries that use parallelism use a special operator called an Exchange operator. The Exchange operator launches multiple threads (tasks => workers) into execution and asks each thread to execute a sub-tree of the query plan. It then aggregates the output from these operators, using a typical multiple-producers-one-consumer pattern. An excellent description of this execution model can be found in the Volcano-An Extensible and Parallel Query Evaluation System.
This execution model applies not only to queries, but also to data modification (insert, delete, update). There are operators that handle inserting a row, operators that handle deleting a row and operators that handle updating a row. Some requests create trivial plans (eg. a INSERT INTO ... VALUES ...) while other creates extremely complex plans, but the execution is identical for all of the and occurs just as I described: the execution tree is iterated callingnext() until its done.
Some operators are very simple, consider for example the TOP(N) operator: when next() is called on it, all it just has to call next() on its children and keep a count. After N times being called, it simply return false w/o calling the children anymore, thus terminating the iteration of that particular sub-tree.
Other operators have more complex behavior, consider what a nested loop operator has to do: it needs to keep track of the loop iteration position on both the outer child and inner child, call next() on the outer child, rewind the inner child and call next() on the inner child until the join predicate is satisfied (see Nested Loop Joins for a more thorough discussion).
Certain operators have a stop-and-go behavior, meaning that they cannot produce any output until they consumed all the input from their own children operators. Examples of such operators is SORT: the very first call tonext() does not return until all the rows created by the children operators are retrieved and sorted.
An operator like HASH JOIN will be both complex and stop-and-go behavior: to build the hash table it has to callnext() on the build side child until that operator returns false. It then calls next() on the probe side child operator until a match is found in the hash table, then return. Subsequent calls continue to call next() on the probe side child operator and return on hash table match, until the probe side child operator next() returns false (see Hash Join for a more thorough discussion).
Results are returned back to the client program as the execution proceeds. As rows ‘bubble’ up the execution tree, the top operator is usually tasked with writing these rows into network buffers and sending them to back to the client. The result is not created first into some intermediate storage (memory or disk) and then sent back to the client, instead it is sent back as is being created (as the query executes). Sending the result back to the client is, of course, subject to the network flow control protocol. If the client is not actively consuming the result (eg. by callingSqlDataReader.Read()) then eventually the flow control will have to block the sending side (the query that is being executed) and this in turn will suspend the execution of the query. The query resumes and produces more results (continue iterating the execution plan) as soon as the network flow control relieves the required network resources.
An interesting case is OUTPUT parameters associated with the request. To return the output value back to the client the value has to be inserted into the network stream of data that flows from the query execution back to the client. The value can only be written back to the client at the end of execution, as the request finishes. This is why output parameter values can only be checked after all results were consumed.
At this moment I feel is necessary to introduce the way data is organized in SQL Server, because understanding of the Data Access topic depends on understanding the data organization. Data in SQL Server can be organized one of three ways:
With SQL Server 2012 there is another mode of organizing a data, namely Nonclustered Columnstores and in the next version of SQL Server there will also be a Clustered Columnstore mode. If you’re interested in them read the articles linked.
At the leaf extremities of the execution tree there are operators that implement the access to the data. The operators will return an actual row of data from a table (or from an index) when the next() method is called on them. There are three possible data access operators:
Strictly speaking all the operators used to insert, delete or update rows are also data access operators. TheInserted Scan and the Deleted Scanoperators do access data to iterate over the trigger inserted anddeleted pseudo-tables. Log Row Scan is a more esoteric data access operator (it reads rows from the log, not from the data tables). But going into this level of detail to explain how these work would derail the purpose of this very article.
Frequently you will see the concept of Range Scan being mentioned. This refers to a Seek operator that uses a seek to quickly locate a row by key and then it continues by iterating the rows from that position forward, sometimes up to a specific second key value. The Seek operator is performing a scan within the range defined by the start and end keys, hence the term Range Scan being used.
If we look back at how query execution occurs we can now understand how the data access operators drive the entire query plan iteration. When the next() method is called on the operator at the root of the query tree the call percolates down from parent operator to child operator until it reaches a data access operators. These operators implement next() by actually reading the data from the underlying source (heap or B-Tree) and returning the row read. They remember the position of the row returned and subsequent call to next() reads the next row and return this next row. Data access operators do not have more child operators, they sit at the leaves of the query plan tree. Operators higher on the tree implement functionality like filtering row, joining tables, sorting results, computing aggregates and so on atop the data returned by the data access operators.
The data access operators always read data from a cache, never from disk. This cache is called the Buffer Pool. If data is not present in the cache the data access operator must request it from the disk (issue a disk IO read) and wait until is fetched into the cache. Data in the cache (in the Buffer Pool) is shared between all queries so once fetched subsequent data access operators that have to read the same data will benefit from finding the data in the cache. SQL Server will read as much data into this cache as is possible, growing the process allocated private memory until all the memory on the system is allocated to SQL Server (subject to a max limit configured by changing the max server memory). The Buffer Pool cache, as well as the IO read/write requests do not handle individual rows but instead they always operate on a 8Kb page.
Lets consider how a data access operator like a Scan would read data from an unordered heap:
For comparison, here is how a data access operator would operate on a sorted B-Tree:
From the description of how scan operators operate you can see that every time they finish reading all the rows on a page they have to ‘fix’ the page into the buffer pool and this potentially can stall the operator as it has to wait for the page to be fetched from disk into memory. If the operator has to stall at every page, performance plummets. The solution is to read ahead pages that are not referenced by the operator now, but will be referenced soon. SQL Server does this and issues asynchronous read ahead requests for pages that will be required by the scan operator before the operator actually reaches that page to read the rows. Luckily by the time the operator reaches that page the page is already in the buffer pool and ‘fixing’ the page is nearly instantaneous. For more details see Reading Pages, Sequential Read Ahead. There exists also a special kind of read ahead for reading random pages for a nested loop, see Random Prefetching.
For a detailed discussion about latches read the Diagnosing and Resolving Latch Contention on SQL Serverwhitepaper.
Before going on to how data writes are execute, is necessary to give a brief description to the mechanisms that exists in place to ensure that always correct data is read from a page. Under multi-threading is always necessary to ensure that readers do not read an incomplete write. All programmers are familiar with the primitive structures used to enforce such protection: mutexes, semaphores, critical sections. In database nomenclature though the established term is a latch. Latches are data structures that protect resources for concurrent access (eg. each page in the buffer pool has a latch to protect it). Latches support multiple acquire modes: shared, exclusive, update, keep, destroy. The following table shows the latch mode compatibility:
KP | SH | UP | EX | DT | |
---|---|---|---|---|---|
KP | Y | Y | Y | Y | N |
SH | Y | Y | Y | N | N |
UP | Y | Y | N | N | N |
EX | Y | N | N | N | N |
DT | N | N | N | N | N |
Whenever a query operator needs to access a page (eg. to compare a key, or to read a row) it must acquire the page latch in SH mode. Multiple operators can read the same page concurrently as multiple threads can acquire the page latch in SH mode. Any operator that needs to modify the page must acquire the page latch in EX mode. Only one operator can modify the page at a time, an no other operators can read the same page until the modification is complete. These two types of requests show up in the wait stats was PAGELATCH_SH andPAGELATCH_EX wait types. The Diagnosing and Resolving Latch Contention on SQL Server whitepaper has more details, including on how the page latch also protects IO requests (so that the page is only read once from disk even if multiple concurrent threads need to fetch it from disk) and also how high end multi-CPU systems usesuperlatches to avoid processor cache invalidation.
It is important not to confuse the physical page protection offered by latches with the logical protection of locking. Locks can be controlled by the client (eg. by choosing a transaction isolation level) but latches are always required.
Operators that modify (write) data are very similar to the read operators presented above. When next() is called on a write operator it has to locate the row (or, for an insert, locate the position to insert a new row) and then do the actual modification.Then, when next() is called again, the do the modification on the next row. Delete and update operators are usually driven by other read operators that locate the row to be deleted or updated and these read operators pass to the write operators a bookmark that locates exactly the row to be modified. The actual inserting, deleting and modifying data though is a bit more convoluted than reading. SQL Server uses Write Ahead Logging which mean that each and every modification done to the data has to be first described in the log. Roughly speaking all writes occur in the following sequence:
A detailed description of this process can be found at SQL Server 2000 I/O Basics.
There is a special type of write that occurs using a different sequence: a minimally logged write. Only operations that insert new data can do minimally logged operations such as INSERT and append of a blob field using the.WRITE(@value, NULL,…) syntax of UPDATE. Certain conditions must be met before a minimally logged operation can occur. See Operations That Can Be Minimally Logged and also read The Data Loading Performance Guide. The sequence of operations done in a minimally logged operation is, roughly, the following:
Minimally logged operations are still fully transactional, consistent and durable. Up to the last moment the operation can be aborted (rolled back) or the server can crash and the database is left in a consistent state (after recovery). Just as with the fully logged operations, a detailed description of this process can be found at SQL Server 2000 I/O Basics.
Not all T-SQL statements are executed as an iteration of operators in a execution plan tree. The typical example are DDL statements, like CREATE TABLE. To understand how DDL statements work, is important to understand that SQL Server stores all the metadata about any object in the database in internal system tables.An insert into the system tables that describes the existing tables will result in a new table being recognized by SQL Server. An insert into the system table that describe columns will add a column to a table. Deleting a row in these tables it means the table object or a column of a table gets dropped. Everything inside the database is described by these system tables, there are about 80 system tables that cover objects, procedures, functions, schemas, users, logins, certificates, views, partitions, permissions, databases, files, literally every SQL Server concept. Therefore what DDL statements have to do is just to maintain these system tables. A statement like CREATE TABLE has to insert a row in the system table describing objects (tables are just one of the possible object types) and some rows in the system tables describing columns and voila, your new table was ‘created’ (Keep in mind I’m simplifying a great deal). While DDL statements do not use directly the operators that I mentioned above, they do use the same code to effectively access the system tables (read rows, write rows). When a DDL statement executes it does not callnext() on an Seek operator but instead is using directly the code that the said Seek operator would had used to locate the row it desires in the system table. The DDL statement accomplishes its work by inserting, deleting or updating rows in these system tables. A few DDL statements have additionally do some operations outside these system tables, eg. they have to create or delete files on disk for the database, or they have to connect to Windows Clustering API to configure Availability Groups. And some DDL statements have to manipulate the data tables internally, eg. to populate the default values of a new column or validate that the existing data conforms to a newly added check constraint.
After we sift out the query statements (including DML) and the DDL we’re left with a few special statements. BACKUP and RESTORE operates by bypassing almost everything we discussed so far. From a 10000ft view what BACKUP and RESTORE ultimately do is just a glorified copy from one file to another. BACKUP reads from the data and/or log files and writes into the backup file. RESTORE reads from the backup file and writes into the data and/or log files. They do need to do some housekeeping of system tables, but the bulk of work they simply read from a file and write into another. In doing so they bypass the data cache (the buffer pull). As for DBCC statements, pretty much each one does something different. For an example of how DBCC CHECKDB works I recommend reading the series of articles CHECKDB from every angle.
Developers that interact with databases have two main pain points: troubleshooting performance and troubleshooting data loss issues. This article will not help you much with the later, if you do not have a backup, nothing will save your, sorry! But I hope that understanding how things work will shed some light into the performance troubleshooting problems. Once you understand that your client is sending requests to the server and the server is creating a task for for each request, the performance conundrum can be simplified a great deal: at any moment your task is either executing (consuming CPU cycles) or is waiting. And every time it waits, and I mean everytime, the wait information (what was waited on and how long) will be collected by SQL Server internal wait info statistics. There is an excellent methodology on how to leverage these wait info collected statistics to troubleshoot performance bottlenecks: the Waits and Queues whitepaper. If you follow one link from the many I referenced in this article, it better be this last link.
This article, along with any associated source code and files, is licensed under The Apache License, Version 2.0
bugcollect.com | |
United States |
Understanding how SQL Server executes a query
标签:rail bubble server apache latch track yourself profile users