Mosh Mysql学习笔记4

4 汇总统计

4.1 聚合函数

小结

聚合函数:输入一系列值并聚合为一个结果的函数

实例

USE sql_invoicing; ​ SELECT MAX(invoice_date) AS latest_date, -- SELECT选择的不仅可以是列,也可以是数字、列间表达式、列的聚合函数 MIN(invoice_total) lowest, -- AS 可省略 AVG(invoice_total) average, SUM(invoice_total * 1.1) total, COUNT(*) total_records, COUNT(invoice_total) number_of_invoices, -- 和上一个相等 COUNT(payment_date) number_of_payments, -- 【聚合函数会忽略空值】,得到的支付数少于发票数 COUNT(DISTINCT client_id) number_of_distinct_clients -- DISTINCT client_id 筛掉了该列的重复值,再COUNT计数,会得到不同顾客数 FROM invoices WHERE invoice_date > '2019-07-01' -- 想只统计下半年的结果

练习

目标:

date_rangetotal_salestotal_paymentswhat_we_expect (the difference)
1st_half_of_2019
2nd_half_of_2019
Total
USE sql_invoicing; ​ SELECT '1st_half_of_2019' AS date_range, SUM(invoice_total) AS total_sales, SUM(payment_total) AS total_payments, SUM(invoice_total - payment_total) AS what_we_expect FROM invoices WHERE invoice_date BETWEEN '2019-01-01' AND '2019-06-30' ​ UNION ​ SELECT '2st_half_of_2019' AS date_range, SUM(invoice_total) AS total_sales, SUM(payment_total) AS total_payments, SUM(invoice_total - payment_total) AS what_we_expect FROM invoices WHERE invoice_date BETWEEN '2019-07-01' AND '2019-12-31' ​ UNION ​ SELECT 'Total' AS date_range, SUM(invoice_total) AS total_sales, SUM(payment_total) AS total_payments, SUM(invoice_total - payment_total) AS what_we_expect FROM invoices WHERE invoice_date BETWEEN '2019-01-01' AND '2019-12-31'

4.2 GROUP BY子句

小结

按一列或多列分组,注意语句的位置

案例1:按一个字段分组

在发票记录表中按不同顾客分组统计下半年总销售额并降序排列

USE sql_invoicing; ​ SELECT client_id, SUM(invoice_total) AS total_sales FROM invoices WHERE invoice_date >= '2019-07-01' -- 筛选,过滤器 GROUP BY client_id -- 分组 ORDER BY total_sales DESC

注:SELECT子句里出现的非聚合字段,必须全部写到GROUP BY后面;只有聚合函数(SUM/COUNT/MAX/MIN/AVG)不用进 GROUP BY,即GROUP BY子句选择的字段是client_id是满足规则的。

-- 错误!invoice_total没做聚合,又没放进GROUP BY SELECT client_id, invoice_total, SUM(invoice_total) FROM invoices GROUP BY client_id

这里invoice_total既不是聚合,又不在 GROUP BY,直接报错(新版 MySQL 默认严格模式,不允许)。

另外注意ORDER BY后面不能用invoice_total排序,分组后每行是客户总销售额,没有单独的 invoice_total 明细。

案例2:按多个字段分组

算各州各城市的总销售额

USE sql_invoicing; ​ SELECT state, city, SUM(invoice_total) AS total_sales FROM invoices JOIN clients USING (client_id) GROUP BY state, city ORDER BY state

这里GROUP BY后按正常理解其实去掉state语义上是没问题,但由于MySQL 默认严格模式,还是把字段都加上比较好。

练习

在 payments 表中,按日期和支付方式分组统计总付款额

USE sql_invoicing; ​ SELECT date, pm.name AS payment_method, SUM(amount) AS total_payments FROM payments p JOIN payment_methods pm ON p.payment_method = pm.payment_method_id GROUP BY date, payment_method -- 用的是 SELECT 里的列别名 ORDER BY date

4.3 HAVING子句

小结

HAVING 和 WHERE 都是是条件筛选语句,条件的写法相通,数学、比较(包括特殊比较)、逻辑运算都可以用(如 AND、REGEXP 等等)

两者本质区别:

  • WHERE 是对 FROM JOIN 里原表中的列进行 事前筛选,所以WHERE可以对没选择的列进行筛选,但必须用原表列名而不能用SELECT中确定的列别名

  • 相反 HAVING …… 对 SELECT …… 查询后(通常是分组并聚合查询后)的结果列进行 事后筛选,若SELECT里起了别名的字段则必须用别名进行筛选,且不能对SELECT里未选择的字段进行筛选。唯一特殊情况是,当HAVING筛选的是聚合函数时,该聚合函数可以不在SELECT里显性出现

案例

筛选出总发票金额大于500且总发票数量大于5的顾客

USE sql_invoicing; ​ SELECT client_id, SUM(invoice_total) AS total_sales, COUNT(*/invoice_total/invoice_date) AS number_of_invoices FROM invoices GROUP BY client_id HAVING total_sales > 500 AND number_of_invoices > 5 -- 均为 SELECT 里的列别名

若写:WHERE total_sales > 500 AND number_of_invoices > 5,会报错:Error Code: 1054. Unknown column 'total_sales' in 'where clause'

练习

在 sql_store 数据库(有顾客表、订单表、订单项目表等)中,找出在 'VA' 州且消费总额超过100美元的顾客(这是一个面试级的问题,还很常见)

思路:

  1. 需要的信息在顾客表、订单表、订单项目表三张表中,先将三张表合并

  2. WHERE 事前筛选 'VA' 州的

  3. 按顾客分组,并选取所需的列并聚合得到每位顾客的付款总额

  4. HAVING 事后筛选超过 100美元 的

USE sql_store; ​ SELECT c.customer_id, c.first_name, c.last_name, SUM(oi.quantity * oi.unit_price) AS total_sales FROM customers c JOIN orders o USING (customer_id) -- 别忘了括号,特容易忘 JOIN order_items oi USING (order_id) WHERE state = 'VA' GROUP BY c.customer_id, c.first_name, c.last_name HAVING total_sales > 100

4.4 ROLLUP运算符

GROUP BY …… WITH ROLL UP自动汇总型分组,若是多字段分组的话汇总也会是多层次的,注意这是MySQL扩展语法,不是SQL标准语法

案例

分组查询各客户的发票总额以及所有人的总发票额

USE sql_invoicing; SELECT client_id, SUM(invoice_total) FROM invoices GROUP BY client_id WITH ROLLUP

多字段分组 例1:分组查询各州、市的总销售额(发票总额)以及州层次和全国层次的两个层次的汇总额

SELECT state, city, SUM(invoice_total) AS total_sales FROM invoices JOIN clients USING (client_id) GROUP BY state, city WITH ROLLUP

多字段分组 例2:分组查询特定日期特定付款方式的总支付额以及单日汇总和整体汇总

USE sql_invoicing; SELECT date, pm.name AS payment_method, SUM(amount) AS total_payments FROM payments p JOIN payment_methods pm ON p.payment_method = pm.payment_method_id GROUP BY date, pm.name WITH ROLLUP

练习

分组计算各个付款方式的总付款 并汇总

SELECT pm.name AS payment_method, SUM(amount) AS total FROM payments p JOIN payment_methods pm ON p.payment_method = pm.payment_method_id GROUP BY pm.name WITH ROLLUP

总结

各子句执行先后顺序(GROUP BY 和其他子句的关系)

SQL 书写顺序:SELECT → FROM → WHERE → GROUP BY → HAVING → ORDER BY → LIMIT

真实底层执行顺序(关键!决定各子句能做什么)

  1. FROM:找到要查询的表invoices

  2. WHERE

    :过滤原始行,只保留

    invoice_date >= '2019-07-01'

    的数据

    • 作用对象:原始明细行,不能用聚合结果过滤(不能写WHERE SUM(xxx)

  3. GROUP BY:用过滤完的数据,按client_id分组,生成各组聚合值(SUM 等)

  4. HAVING

    :对分组后的聚合结果过滤(比如

    HAVING SUM(invoice_total) > 1000

    • 作用对象:分组后的聚合结果,只能过滤聚合后的数据

  5. SELECT:最后挑选要展示的列(分组字段 + 聚合函数)

  6. ORDER BY:对最终结果集排序

  7. LIMIT:截取指定行数