当前位置：Gxlcms > 数据库问题 > SQL基础笔记

SQL基础笔记

时间：2021-07-01 10:21:17 帮助过：9人阅读

Codecademy的课程以SQLite编写，笔记中改成了MySQL语句。 I. Learn SQL 1. Manipulation - Create, edit, delete data 1.4 Create 创建数据库或数据库中的表

CREATE TABLE celebs 
    (
    id INTEGER, 
    name TEXT, 
    age INTEGER 
    ); # 第一列id，数据类型整数；第二列name，数据类型文本；第三列age，数据类型整数

1.5 Insert 向表中插入行

INSERT INTO celebs ( id, name, age)
    VALUES ( 1, ‘Alan Mathison Turing‘, 42); # 在celebs表最下方插入数据：id列为1，name列为Alan Mathion Turing，age列为42

1.6 Select 选取数据

SELECT 
    *
FROM
    celebs; # 显示celebs表所有数据

1.7 Update 修改数据

UPDATE celebs 
SET 
    age = 22
WHERE
    id = 1; # 将celebs表中id=1的行的age改为22

1.8 Alert 更改表结构或数据类型

ALERT TABLE celebs
ADD COLUMN twitter_handle TEXT; # 在celebs表增加twitter_handle列

ALERT TABLE ‘test‘.‘data‘
CHANGE COLUMN ‘Mobile‘ ‘Mobile‘ BLOB NULL DEFAULT NULL; # 将表test.data的Mobile列的数据类型改为BLOB，该列数据默认为NULL

1.9 DELETE 删除行

DELETE FROM celebs 
WHERE
    twitter_handle IS NULL; # 删除表celebs中twitter_handle为NULL的行

2. Queries - Retrieve data 2.3 Select Distinct 返回唯一不同的值

SELECT DISTINCT
    genre
FROM
    movies; # 查询movies表中genre列的所有不重复值

2.4 Where 规定选择的标准

SELECT 
    *
FROM
    movies
WHERE
    imdb_rating > 8; # 查询movies表中imdb_rating大于8的行

= equals != not equals > greater than < less than >= greater than or equal to <= less than or equal to 2.5 Like I 在 WHERE 子句中搜索列中的指定模式

SELECT 
    *
FROM
    movies
WHERE
    name LIKE ‘ Se_en‘;

2.6 Like II

SELECT 
    *
FROM
    movies
WHERE
    name LIKE ‘a%‘;

SELECT 
    *
FROM
    movies
WHERE
    name LIKE ‘%man%‘;

NB 通配符 ‘_‘ substitutes any individual character ‘%‘ matches zero or more missing characters ‘[charlist]%‘ any individual character in string: WHERE city LIKE ‘[ALN]%‘ 以“A"或”L“或”N“开头的城市 ‘[!charlist]%‘ any individual character not in string: WHERE city LIKE ‘[!ALN]%‘ 不以“A"或”L“或”N“开头的城市 2.7 Between 在 WHERE 子句中使用，选取介于两个值之间的数据范围 The BETWEEN operator is used to filter the result set within a certain range. The values can be numbers, text or dates.

SELECT 
    *
FROM
    movies
WHERE
    name BETWEEN ‘A‘ AND ‘J‘; # 查询movies中name以A至J开头的所有行

NB: names that begin with letter "A" up to but not including "J". 不同的数据库对 BETWEEN...AND 操作符的处理方式是有差异的，有开区间、闭区间，也有半开半闭区间。

SELECT 
    *
FROM
    movies
WHERE
    year BETWEEN 1990 AND 2000; # 查询movies中year在1990至2000年间的行

NB: years between 1990 up to and including 2000 2.8 And 且运算符 AND is an operator that combines two conditions. Both conditions must be true for the row to be included in the result set.

SELECT 
    *
FROM
    movies
WHERE
    year BETWEEN 1990 AND 2000
        AND genre = ‘comedy‘; # 查询movies中year在1990至2000间，且genre为comedy的行

2.9 Or 或运算符 OR is used to combine more than one condition in WHERE clause. It evaluates each condition separately and if any of the conditions are true than the row is added to the result set. OR is an operator that filters the result set to only include rows where either condition is true.

SELECT 
    *
FROM
    movies
WHERE
    genre = ‘comedy‘ OR year < 1980; # 查询movies中genre为comedy，或year小于1980的行

2.10 Order By 对结果集进行排序

SELECT 
    *
FROM
    movies
ORDER BY imdb_rating DESC; # 查询movies中的行，结果以imdb_rating降序排列

DESC sorts the result by a particular column in descending order (high to low or Z - A). ASC ascending order (low to high or A - Z). 2.11 Limit 规定返回的记录的数目 LIMIT is a clause that lets you specify the maximum number of rows the result set will have.

SELECT 
    *
FROM
    movies
ORDER BY imdb_rating ASC
LIMIT 3;  # 查询movies中的行，结果以imdb_rating升序排列，仅返回前3行

MS SQL Server中使用SELECT TOP 3，Oracle中使用WHERE ROWNUM <= 5（？） 3. Aggregate Function 3.2 Count 返回匹配指定条件的行数 COUNT( ) is a function that takes the name of a column as an argument and counts the number of rows where the column is not NULL.

SELECT 
    COUNT(*)
FROM
    fake_apps
WHERE 
    price = 0; # 返回fake_apps中price=0的行数

3.3 Group By 合计函数

SELECT 
    price, COUNT(*)
FROM
    fake_apps
WHERE
    downloads > 2000
GROUP BY price; # 查询fake_apps表中downloads大于2000的行，将结果集根据price分组，返回price和行数

Here, our aggregate function is COUNT( ) and we are passing price as an argument（参数） to GROUP BY. SQL will count the total number of apps for each price in the table. It is usually helpful to SELECT the column you pass as an argument to GROUP BY. Here we SELECT price and COUNT(*). 3.4 Sum 返回数值列的总数（总额） SUM is a function that takes the name of a column as an argument and returns the sum of all the values in that column.

SELECT 
    category, SUM(downloads)
FROM
    fake_apps
GROUP BY category;

3.5 Max 返回一列中的最大值（NULL 值不包括在计算中） MAX( ) is a function that takes the name of a column as an argument and returns the largest value in that column.

SELECT 
    name, category, MAX(downloads)
FROM
    fake_apps
GROUP BY category;

3.6 Min 返回一列中的最小值（NULL 值不包括在计算中） MIN( ) is a function that takes the name of a column as an argument and returns the smallest value in that column.

SELECT 
    name, category, MIN(downloads)
FROM
    fake_apps
GROUP BY category;

3.7 Average 返回数值列的平均值（NULL 值不包括在计算中）

SELECT 
    price, AVG(downloads)
FROM
    fake_apps
GROUP BY price;

3.8 Round 把数值字段舍入为指定的小数位数 ROUND( ) is a function that takes a column name and an integer as an argument. It rounds the values in the column to the number of decimal places specified by the integer.

SELECT 
    price, ROUND(AVG(downloads), 2)
FROM
    fake_apps
GROUP BY price;

4. Multiple Tables 4.2 Primary Key 主键 A primary key serves as a unique identifier for each row or record in a given table. The primary key is literally an "id" value for a record. We could use this value to connect the table to other tables.

CREATE TABLE  artists
    (
    id INTEGER PRIMARY KET,
    name TEXT
    );

NB By specifying that the "id" column is the "PRIMARY KEY", SQL make sure that: 1. None of the values in this column are "NULL"; 2. Each value in this column is unique. A table can not have more than one "PRIMARY KEY" column. 4.3 Foreign Key 外键

SELECT 
    *
FROM
    albums
WHERE
    artist_id = 3;

A foreign key is a column that contains the primary key of another table in the database. We use foreign keys and primary keys to connect rows in two different tables. One table‘s foreign key holds the value of another table‘s primary key. Unlike primary keys, foreign keys do not need to be unique and can be NULL. Here, artist_id is a foreign key in the "albums" table. The relationship between the "artists" table and the "albums" table is the "id" value of the artists. 4.4 Cross Join 用于生成两张表的笛卡尔集

SELECT 
    albums.name, albums.year, artists.name
FROM
    albums,
    artists;

One way to query multiple tables is to write a SELECT statement with multiple table names seperated by a comma. This is also known as a "cross join". When querying more than one table, column names need to be specified by table_name.column_name. Unfortunately, the result of this cross join is not very useful. It combines every row of the "artists" table with every row of the "albums" table. It would be more useful to only combine the rows where the album was created by the artist. 4.5 Inner Join 内连接：在表中存在至少一个匹配时，INNER JOIN 关键字返回行

SELECT 
    *
FROM
    albums
        JOIN
    artists ON albums.artist_id = artists.id; # INNER JOIN等价于JOIN，写JOIN默认为INNER JOIN

In SQL, joins are used to combine rows from two or more tables. The most common type of join in SQL is an inner join. An inner join will combine rows from different tables if the join condition is true. 1. SELECT *: specifies the columns our result set will have. Here * refers to every column in both tables; 2. FROM albums: specifies first table we are querying; 3. JOIN artists ON: specifies the type of join as well as the second table; 4. albums.artist_id = artists.id: is the join condition that describes how the two tables are related to each other. Here, SQL uses the foreign key column "artist_id" in the "albums" table to match it with exactly one row in the "artists" table with the same value in the "id" column. It will only match one row in the "artists" table because "id" is the PRIMARY KEY of "artists". 4.6 Left Outer Join 左外连接：即使右表中没有匹配，也从左表返回所有的行

SELECT 
    *
FROM
    albums
        LEFT JOIN
    artists ON albums.artist_id = artists.id;

Outer joins also combine rows from two or more tables, but unlike inner joins, they do not require the join condition to be met. Instead, every row in the left table is returned in the result set, and if the join condition is not met, the NULL values are used to fill in the columns from the right table. RIGHT JOIN 右外链接：即使左表中没有匹配，也从右表返回所有的行 FULL JOIN 全链接：只要其中一个表中存在匹配，就返回行 4.7 Aliases 为列名称和表名称指定别名 AS is a keyword in SQL that allows you to rename a column or table using an alias. The new name can be anything you want as long as you put it inside of single quotes.

SELECT 
    albums.name AS ‘Album‘,
    albums.year,
    artists.name AS ‘Artist‘
FROM
    albums
        JOIN
    artists ON albums.artist_id = artists.id
WHERE
    albums.year > 1980;

NB The columns have not been renamed in either table. The aliases only appear in the result set. II. SQL: Table Transformation 1. Subqueries 子查询 1.2 Non-Correlated Subqueries I 不相关子查询

SELECT 
    *
FROM
    flights
WHERE
    origin IN (SELECT 
            code
        FROM
            airports
        WHERE
            elevation > 2000);

1.4 Non-Correlated Subqueries III

SELECT 
    a.dep_month,
    a.dep_day_of_week,
    AVG(a.flight_count) AS average_flights
FROM
    (SELECT 
        dep_month,
            dep_day_of_week,
            dep_date,
            COUNT(*) AS flight_count
    FROM
        flights
    GROUP BY 1 , 2 , 3) a
WHERE
    a.dep_day_of_week = ‘Friday‘
GROUP BY 1 , 2
ORDER BY 1 , 2; # 返回每个月中，每个星期五的平均航班数量

结构 [outer query] FROM [inner query] a WHERE GROUP BY ORDER BY NB "a": With the inner query, we create a virtual table. In the outer query, we can refer to the inner query as "a". "1,2,3" in inner query: refer to the first, second and third columns selected for display DBMS SELECT dep_month, (1) dep_day_of_week, (2) dep_date, (3) COUNT(*) AS flight_count (4) FROM flights

SELECT 
    a.dep_month,
    a.dep_day_of_week,
    AVG(a.flight_distance) AS average_distance
FROM
    (SELECT 
        dep_month,
    dep_day_of_week,
    dep_date,
    SUM(distance) AS flight_distance
    FROM
        flights
    GROUP BY 1 , 2 , 3) a
GROUP BY 1 , 2
ORDER BY 1 , 2; # 返回每个月中，每个周一、周二……至周日的平均飞行距离

1.5 Correlated Subqueries I 相关子查询 NB In a correlated subquery, the subquery can not be run independently of the outer query. The order of operations is important in a correlated subquery: 1. A row is processed in the outer query; 2. Then, for that particular row in the outer query, the subquery is executed. This means that for each row processed by the outer query, the subquery will also be processed for that row.

SELECT 
    id
FROM
    flights AS f
WHERE
    distance > (SELECT 
            AVG(distance)
        FROM
            flights
        WHERE
            carrier = f.carrier); # the list of all flights whose distance is above average for their carrier

1.6 Correlated Subqueries II In the above query, the inner query has to be reexecuted for each flight. Correlated subqueries may appear elsewhere besides the WHERE clause, they can also appear in the SELECT.

SELECT 
    carrier,
    id,
    (SELECT 
            COUNT(*)
        FROM
            flights f
        WHERE
            f.id < flights.id
                AND f.carrier = flights.carrier) + 1 AS flight_sequence_number
FROM
    flights; # 结果集为航空公司，航班id以及序号。相同航空公司的航班，id越大则序号越大

相关子查询中，对于外查询执行的每一行，子查询都会为这一行执行一次。在这段代码中，每当外查询提取一行数据中的carrier和id，子查询就会COUNT表中有多少行的carrier与外查询中的行的carrier相同，且id小于外查询中的行，并在COUNT结果上+1，这一结果列别名为flight_sequence_number。于是，id越大的航班，序号就越大。如果将"<"改为">"，则id越大的航班，序号越小。 2. Set Operation 2.2 Union 并集 (only distinct values) Sometimes, we need to merge two tables together and then query the merged result. There are two ways of doing this: 1) Merge the rows, called a join. 2) Merge the columns, called a union.

SELECT 
    item_name
FROM
    legacy_products 
UNION SELECT 
    item_name
FROM
    new_products;

Each SELECT statement within the UNION must have the same number of columns with similar data types. The columns in each SELECT statement must be in the same order. By default, the UNION operator selects only distinct values. 2.3 Union All 并集 (allows duplicate values)

SELECT 
    AVG(sale_price) 
FROM
    (SELECT 
        id, sale_price
    FROM
        order_items UNION ALL SELECT 
        id, sale_price
    FROM
        order_items_historic) AS a;

2.4 Intersect 交集 Microsoft SQL Server‘s INTERSECT returns any distinct values that are returned by both the query on the left and right sides of the INTERSECT operand.

SELECT category FROM new_products
INTERSECT
SELECT category FROM legacy_products;

NB MySQL不滋瓷INTERSECT，但可以用INNER JOIN+DISTINCT或WHERE...IN+DISTINCT或WHERE EXISTS实现：

SELECT DISTINCT
    category
FROM
    new_products
        INNER JOIN
    legacy_products USING (category);

或

SELECT DISTINCT
    category
FROM
    new_products
WHERE
    category IN (SELECT 
            category
        FROM
            legacy_products);

http://stackoverflow.com/questions/2621382/alternative-to-intersect-in-mysql 网上很多通过UNION ALL 实现的办法（如下）是错误的，可能会返回仅在一个表中出现且COUNT(*) > 1的值：

SELECT 
    category, COUNT(*)
FROM
    (SELECT 
        category
    FROM
        new_products UNION ALL SELECT 
        category
    FROM
        legacy_products) a
GROUP BY category
HAVING COUNT(*) > 1;

2.5 Except (MS SQL Server) / Minus (Oracle) 差集

SELECT category FROM legacy_products
EXCEPT # 在Oracle中为MINUS
SELECT category FROM new_products;

NB MySQL不滋瓷差集，但可以用WHERE...IS NULL+DISTINCT或WHERE...NOT IN+DISTINCT或WHERE EXISTS实现：

SELECT DISTINCT
    category
FROM
    legacy_products
        LEFT JOIN
    new_products USING (category)
WHERE
    new_products.category IS NULL;

或

SELECT DISTINCT
    category
FROM
    legacy_products
WHERE
    category NOT IN (SELECT 
            category
        FROM
            new_products);

3. Conditional Aggregates 3.2 NULL use IS NULL or IS NOT NULL in the WHERE clause to test whether a value is or is not null.

SELECT 
    COUNT(*)
FROM
    flights
WHERE
    arr_time IS NOT NULL
        AND destination = ‘ATL‘;

3.3 CASE WHEN "if, then, else"

SELECT
    CASE
        WHEN elevation < 250 THEN ‘Low‘
        WHEN elevation BETWEEN 250 AND 1749 THEN ‘Medium‘
        WHEN elevation >= 1750 THEN ‘High‘
        ELSE ‘Unknown‘
    END AS elevation_tier
    , COUNT(*)
FROM airports
GROUP BY 1;

END is required to terminate the statement, but ELSE is optional. If ELSE is not included, the result will be NULL. 3.4 COUNT(CASE WHEN) count the number of low elevation airports by state where low elevation is defined as less than 1000 ft.

SELECT 
    state,
    COUNT(CASE
        WHEN elevation < 1000 THEN 1
        ELSE NULL
    END) AS count_low_elevaton_airports
FROM
    airports
GROUP BY state;

3.5 SUM(CASE WHEN) sum the total flight distance and compare that to the sum of flight distance from a particular airline (in this case, Delta) by origin airport.

SELECT 
    origin,
    SUM(distance) AS total_flight_distance,
    SUM(CASE
        WHEN carrier = ‘DL‘ THEN distance
        ELSE 0
    END) AS total_delta_flight_distance
FROM
    flights
GROUP BY origin;

3.6 Combining aggregates find out the percentage of flight distance that is from Delta by origin airport.

SELECT 
    origin,
    100.0 * (SUM(CASE
        WHEN carrier = ‘DL‘ THEN distance
        ELSE 0
    END) / SUM(distance)) AS percentage_flight_distance_from_delta
FROM
    flights
GROUP BY origin;

3.7 Combining aggregates II Find the percentage of high elevation airports (elevation >= 2000) by state from the airports table.

SELECT 
    state,
    100.0 * COUNT(CASE
        WHEN elevation >= 2000 THEN 1
        ELSE NULL
    END) / COUNT(elevation) AS percentage_high_elevation_airports
FROM
    airports
GROUP BY 1;

或

SELECT 
    state,
    100.0 * SUM(CASE
        WHEN elevation >= 2000 THEN 1
        ELSE 0
    END) / COUNT(elevation) AS percentage_high_elevation_airports
FROM
    airports
GROUP BY 1;

4. Date, Number and String Functions MySQL Date 函数： https://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html NOW() 返回当前的日期和时间 CURDATE() 返回当前的日期 CURTIME() 返回当前的时间 DATE() 提取日期或日期/时间表达式的日期部分

SQL基础笔记

人气教程排行