SQL以及数据库相关
SQL以及数据库相关
1. 数据库
1.1. 常见的数据库系统
市面上有多种常见的数据库系统,每种数据库系统适用于不同的应用场景。根据数据存储结构、应用场景和技术架构,可以将数据库系统大致分为几类:
| 类别 | 常见数据库系统 | 主要应用场景 |
|---|---|---|
| 关系型数据库 | MySQL、PostgreSQL、Oracle、SQL Server | 事务型系统、数据一致性要求高 |
| 键值数据库 | Redis、DynamoDB | 缓存、会话管理、配置管理 |
| 文档数据库 | MongoDB、Couchbase | 内容管理、Web 应用 |
| 列式数据库 | Cassandra、HBase | 时间序列数据、分析数据 |
| 图数据库 | Neo4j、Amazon Neptune | 社交网络、推荐系统 |
| NewSQL 数据库 | CockroachDB、TiDB、Spanner | 高并发、分布式事务处理 |
| 时序数据库 | InfluxDB、TimescaleDB | 监控数据、金融数据 |
| 对象数据库 | Db4o、ObjectDB | CAD/CAM 系统、嵌入式系统 |
| 嵌入式数据库 | SQLite、Berkeley DB | 移动应用、物联网设备 |
| 分布式文件存储与数据库 | HDFS、Bigtable、Elasticsearch | 大数据分析、非结构化数据 |
| 多模型数据库 | ArangoDB、Azure Cosmos DB | 内容管理、物联网、社交网络 |
市面上有数百种数据库系统,每种数据库的设计思想和技术特点各异,应根据具体业务需求选择合适的数据库类型。
1.2. 常见的sql语言
除了标准 SQL,许多数据库系统还支持自定义的扩展 SQL 或独特的查询语言,以适应不同的数据库架构、存储结构和应用场景需求。以下是一些常见的扩展 SQL 和其他可以用于不同数据库系统的查询语言:
| 查询语言 | 常见数据库系统 | 特点及应用场景 |
|---|---|---|
| T-SQL | SQL Server | 适合事务处理和复杂业务逻辑 |
| PL/SQL | Oracle | 支持复杂业务逻辑,常用于企业应用 |
| HiveQL | Hive | 面向大数据批处理和数据仓库 |
| CQL | Cassandra | 列式数据库查询,适合时间序列数据 |
| Cypher | Neo4j | 图查询语言,适合社交网络和推荐系统 |
| SPARQL | RDF、知识图谱数据库 | 语义网和知识图谱数据查询 |
| MQL | MongoDB | 文档数据库查询,适合半结构化数据 |
| Druid SQL | Druid | 时间序列数据库分析和聚合 |
| KQL | Azure Data Explorer | 日志和时间序列数据分析 |
| Elasticsearch DSL | Elasticsearch | 全文搜索和实时数据分析 |
| InfluxQL/Flux | InfluxDB | 时间序列数据分析,适合物联网和监控 |
| PartiQL | DynamoDB | SQL 风格的 NoSQL 数据查询,适合低延迟数据操作 |
| ClickHouse SQL | ClickHouse | 列式存储查询,适合实时分析和大规模数据处理 |
| BigQuery SQL | Google BigQuery | 大数据分析和聚合查询,适合数据仓库 |
| Presto SQL | Presto | 跨数据源查询,大数据分析 |
| FQL | FaunaDB | 分布 |
1.3. Mysql分区
为什么要分区?
-
分区可以在一个表中存储比单个磁盘或文件系统分区上的数据更多的数据,因为我们可以将分区表存储在不同物理磁盘上
-
对已过期或者不需要保存的数据,可以通过删除与这些数据有关的分区来快速删除数据,他的效率远比delete高;
-
优化查询,在where子句中包含分区条件时,可以只扫描必要的一个或者多个分区来提高查询效率;例如下面语句:
SELECT * FROM t PARTITION(p0,p1) WHERE c <5sql语句表示仅选择与WHERE条件匹配的分区p0和p1中的那些行。在这种情况下,MySQL不检查表t的任何其他分区;
-
涉及聚合函数SUM()、COUNT()的查询时,可以容易的在每个分区上并行处理,例如在执行下面这条语句:
SELECT salesperson_id,COUNT(orders)as order_total FROM sales GROUP BY salesperson_id会在每个分区上都同时运行查询;
-
凭借在多个磁盘上传播数据,实现更高的查询吞吐量。
MySQL中的分区是什么?为什么要分区?有什么好处?怎么进行分区?
1.4. 特别注意
order by 子句应该在where之后。
2. SQL语言的四大组成部分
聊聊SQL语句中 DDL 、DML 、DQL 、DCL 分别是什么
SQL(结构化查询语言)是数据库的核心操作语言,根据功能可分为DDL、DML、DQL和DCL四类。下面我将详细解释这四者的概念,并通过具体示例说明它们的用法。
2.1. DDL (Data Definition Language) - 数据定义语言
概念:用于定义和管理数据库对象(如表、视图、索引等)的结构。
主要命令及示例:
-- 创建表
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
emp_name VARCHAR(50) NOT NULL,
hire_date DATE,
salary DECIMAL(10,2)
);
-- 修改表结构
ALTER TABLE employees ADD COLUMN department VARCHAR(30);
-- 删除表
DROP TABLE employees;
-- 创建索引
CREATE INDEX idx_emp_name ON employees(emp_name);
特点:
- 操作的是数据库结构而非数据
- 执行后通常自动提交
- 需要较高权限
2.2. DML (Data Manipulation Language) - 数据操纵语言
概念:用于对数据库表中的数据进行增删改操作。
主要命令及示例:
-- 插入数据
INSERT INTO employees (emp_id, emp_name, hire_date, salary)
VALUES (101, '张三', '2020-05-15', 8500.00);
-- 更新数据
UPDATE employees
SET salary = 9000.00
WHERE emp_id = 101;
-- 删除数据
DELETE FROM employees
WHERE emp_id = 101;
-- 合并数据(MERGE/UPSERT)
MERGE INTO employees e
USING new_employees n
ON (e.emp_id = n.emp_id)
WHEN MATCHED THEN
UPDATE SET e.salary = n.salary
WHEN NOT MATCHED THEN
INSERT (emp_id, emp_name, salary)
VALUES (n.emp_id, n.emp_name, n.salary);
特点:
- 操作的是表中的数据
- 需要显式提交或回滚
- 会影响数据完整性
2.3. DQL (Data Query Language) - 数据查询语言
概念:用于从数据库中查询数据。
主要命令及示例:
-- 基本查询
SELECT emp_name, salary FROM employees;
-- 带条件的查询
SELECT emp_name, hire_date
FROM employees
WHERE salary > 8000
AND hire_date > '2020-01-01';
-- 分组统计
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 7000;
-- 多表连接查询
SELECT e.emp_name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;
-- 子查询
SELECT emp_name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
特点:
- 只查询数据,不修改数据
- 可以包含复杂逻辑
- 是SQL中使用最频繁的命令
2.4. DCL (Data Control Language) - 数据控制语言
概念:用于控制数据库访问权限和安全级别。
主要命令及示例:
-- 授予权限
GRANT SELECT, INSERT ON employees TO user1;
-- 授予所有权限
GRANT ALL PRIVILEGES ON employees TO admin_user;
-- 撤销权限
REVOKE INSERT ON employees FROM user1;
-- 拒绝权限(SQL Server特有)
DENY DELETE ON employees TO user2;
特点:
- 管理数据库安全性和权限
- 通常由数据库管理员执行
- 可以控制到表、视图等对象的细粒度权限
2.5. 实际应用场景示例
假设我们正在开发一个人力资源管理系统:
- DDL:创建员工表、部门表等基础结构
- DML:添加新员工记录、更新员工薪资、删除离职员工
- DQL:查询各部门平均薪资、查找高绩效员工、生成报表
- DCL:设置HR专员只能查看员工数据但不能修改薪资、财务人员可以更新薪资但不能删除记录
这四类SQL命令共同构成了完整的数据库操作语言体系,在实际开发中需要根据需求灵活运用。
'''
3. dataworks
MaxCompute是数据仓库,负责存储数据或对数据进行一系列的开发和运算。DataWorks为MaxCompute提供了可视化开发和节点流程管理等功能,结合MaxCompute完成一站式数据开发、数据治理等。pyODPS是maxcompute的接口,通过pyodps可以使用python脚本使用maxcompute。
使用python从dataworks上拉取数据并转为dataframe,不使用to_pandas()
import datetime
from odps import ODPS
odps = ODPS(access_id='xxxxxxxxxxxxx', secret_access_key='xxxxxxxxxxxxxx',
project='xxxxxxxxxxxxx', endpoint='xxxxxxxxxxxxxxxxxxxxxxxx')
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
import pandas as pd
import time
day=7
def read_sql_query(odps, sql):
try:
readers_ = odps.execute_sql(sql).open_reader(tunnel=True, limit=False)
col_name = [i.name for i in readers_.schema]
data = pd.concat(
[pd.DataFrame({column: [value] for column, value in zip(col_name, reader.values)}) for reader in readers_])
except Exception as e:
data = pd.DataFrame({'info': [e]})
print(e)
finally:
return data
sql2="""SELECT en_zhy_date, end_airport, planwarehousename,
SUM(sum_cbm) AS sum_cbm1,
SUM(sum_cnt) AS sum_cnt1
FROM airport_cbm_base_1126_table
where end_airport = '"""+end_airport+"""'
GROUP BY en_zhy_date, end_airport, planwarehousename"""
reader = read_sql_query(odps=odps, sql=sql2)
# 拉取出来的reader是dataframe
4. sql刷题
选择哪一个join。inner join表示交集、outer join表示并集、left join表示以左边字段为主键,左连接。左边字段有的,右边字段没有的,则为null。right join为右链接。
在SQL查询中,left join是将右表的数据连到左表上,但并不会将右表的字段直接合并到左表。因此在查询中仍要通过表名或者别名来明确引用。
GROUP BY过滤数据时 混淆 WHERE和HAVING——我见过很多人犯了这个错误。
CROSS JOIN(也称为“交叉联接”或“笛卡尔积”)将第一个表中的每一行与第二个表中的每一行组合起来。
sql查询顺序:
from join
where
group by
having
order by
limit
sql查询使用where条件时,字段中的NULL要是有 is null。字段中的字符串要使用 = 'Y'.
4.1. select
4.1.1. Example:
Write a solution to find the ids of products that are both low fat and recyclable.
Return the result table in any order.
The result format is in the following example.
Input:
Products table:
+-------------+----------+------------+
| product_id | low_fats | recyclable |
+-------------+----------+------------+
| 0 | Y | N |
| 1 | Y | Y |
| 2 | N | Y |
| 3 | Y | Y |
| 4 | N | N |
+-------------+----------+------------+
Output:
+-------------+
| product_id |
+-------------+
| 1 |
| 3 |
+-------------+
select product_id from Products
where low_fats = 'Y' and recyclable = 'Y';
4.1.2. Example:
Find the names of the customer that are not referred by the customer with id = 2.
Return the result table in any order.
The result format is in the following example.
Input:
Customer table:
+----+------+------------+
| id | name | referee_id |
+----+------+------------+
| 1 | Will | null |
| 2 | Jane | null |
| 3 | Alex | 2 |
| 4 | Bill | null |
| 5 | Zack | 1 |
| 6 | Mark | 2 |
+----+------+------------+
Output:
+------+
| name |
+------+
| Will |
| Jane |
| Bill |
| Zack |
+------+
select name from Customer
where referee_id != 2 or referee_id IS null;
4.1.3. Example:
A country is big if:
- it has an area of at least three million (i.e.,
3000000 km2), or - it has a population of at least twenty-five million (i.e.,
25000000).
Write a solution to find the name, population, and area of the big countries.
Return the result table in any order.
The result format is in the following example.
Input:
World table:
+-------------+-----------+---------+------------+--------------+
| name | continent | area | population | gdp |
+-------------+-----------+---------+------------+--------------+
| Afghanistan | Asia | 652230 | 25500100 | 20343000000 |
| Albania | Europe | 28748 | 2831741 | 12960000000 |
| Algeria | Africa | 2381741 | 37100000 | 188681000000 |
| Andorra | Europe | 468 | 78115 | 3712000000 |
| Angola | Africa | 1246700 | 20609294 | 100990000000 |
+-------------+-----------+---------+------------+--------------+
Output:
+-------------+------------+---------+
| name | population | area |
+-------------+------------+---------+
| Afghanistan | 25500100 | 652230 |
| Algeria | 37100000 | 2381741 |
+-------------+------------+---------+
select name, population, area from World
where area >= 3000000 or population >= 25000000;
4.1.4. Example:
编写一个解决方案来查找所有查看过至少一篇自己文章的作者。返回按 ID 升序排序的结果表。
The result format is in the following example.
Input:
Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date |
+------------+-----------+-----------+------------+
| 1 | 3 | 5 | 2019-08-01 |
| 1 | 3 | 6 | 2019-08-02 |
| 2 | 7 | 7 | 2019-08-01 |
| 2 | 7 | 6 | 2019-08-02 |
| 4 | 7 | 1 | 2019-07-22 |
| 3 | 4 | 4 | 2019-07-21 |
| 3 | 4 | 4 | 2019-07-21 |
+------------+-----------+-----------+------------+
Output:
+------+
| id |
+------+
| 4 |
| 7 |
+------+
select DISTINCT author_id
as id
from Views
WHERE author_id = viewer_id
order by author_id ASC;
4.1.5. Example
编写解决方案来查找无效推文的 ID。如果推文内容中使用的字符数严格大于 15,则该推文无效。
Return the result table in any order.
The result format is in the following example.
Input:
Tweets table:
+----------+-----------------------------------+
| tweet_id | content |
+----------+-----------------------------------+
| 1 | Let us Code |
| 2 | More than fifteen chars are here! |
+----------+-----------------------------------+
Output:
+----------+
| tweet_id |
+----------+
| 2 |
+----------+
select tweet_id from Tweets
where LENGTH(content) > 15;
4.2. basic join
4.2.1. examples
编写一个解决方案来显示每个用户的唯一 ID,如果用户没有唯一 ID,则仅显示 null。以任何顺序返回结果表。
The result format is in the following example.
Input:
Employees table:
+----+----------+
| id | name |
+----+----------+
| 1 | Alice |
| 7 | Bob |
| 11 | Meir |
| 90 | Winston |
| 3 | Jonathan |
+----+----------+
EmployeeUNI table:
+----+-----------+
| id | unique_id |
+----+-----------+
| 3 | 1 |
| 11 | 2 |
| 90 | 3 |
+----+-----------+
Output:
+-----------+----------+
| unique_id | name |
+-----------+----------+
| null | Alice |
| null | Bob |
| 2 | Meir |
| 3 | Winston |
| 1 | Jonathan |
+-----------+----------+
SELECT e.unique_id, emp.name
FROM Employees emp
LEFT JOIN EmployeeUNI e
ON emp.id = e.id;
4.2.2. examples
编写一个解决方案来报告 Sales 表中每个 sale_id 的 product_name、year 和 price。以任意顺序返回结果表。
The result format is in the following example.
Example 1:
Input:
Sales table:
+---------+------------+------+----------+-------+
| sale_id | product_id | year | quantity | price |
+---------+------------+------+----------+-------+
| 1 | 100 | 2008 | 10 | 5000 |
| 2 | 100 | 2009 | 12 | 5000 |
| 7 | 200 | 2011 | 15 | 9000 |
+---------+------------+------+----------+-------+
Product table:
+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 100 | Nokia |
| 200 | Apple |
| 300 | Samsung |
+------------+--------------+
Output:
+--------------+-------+-------+
| product_name | year | price |
+--------------+-------+-------+
| Nokia | 2008 | 5000 |
| Nokia | 2009 | 5000 |
| Apple | 2011 | 9000 |
+--------------+-------+-------+
select p.product_name, s.year, s.price
from Sales s
left join Product p
on s.product_id = p.product_id;
4.2.3. example
Write a solution to find the IDs of the users who visited without making any transactions and the number of times they made these types of visits.
Return the result table sorted in any order.
The result format is in the following example.
Input:
Visits
+----------+-------------+
| visit_id | customer_id |
+----------+-------------+
| 1 | 23 |
| 2 | 9 |
| 4 | 30 |
| 5 | 54 |
| 6 | 96 |
| 7 | 54 |
| 8 | 54 |
+----------+-------------+
Transactions
+----------------+----------+--------+
| transaction_id | visit_id | amount |
+----------------+----------+--------+
| 2 | 5 | 310 |
| 3 | 5 | 300 |
| 9 | 5 | 200 |
| 12 | 1 | 910 |
| 13 | 2 | 970 |
+----------------+----------+--------+
Output:
+-------------+----------------+
| customer_id | count_no_trans |
+-------------+----------------+
| 54 | 2 |
| 30 | 1 |
| 96 | 1 |
+-------------+----------------+
select v.customer_id, COUNT(v.visit_id) AS count_no_trans
from Visits v
left join Transactions t
on v.visit_id = t.visit_id
where transaction_id is null
group by v.customer_id;
4.2.4. examples
Write a solution to find all dates' id with higher temperatures compared to its previous dates (yesterday).
Return the result table in any order.
The result format is in the following example.
Input:
Weather table:
+----+------------+-------------+
| id | recordDate | temperature |
+----+------------+-------------+
| 1 | 2015-01-01 | 10 |
| 2 | 2015-01-02 | 25 |
| 3 | 2015-01-03 | 20 |
| 4 | 2015-01-04 | 30 |
+----+------------+-------------+
Output:
+----+
| id |
+----+
| 2 |
| 4 |
+----+
SELECT w1.id
FROM Weather w1
JOIN Weather w2
ON DATEDIFF(w1.recordDate, w2.recordDate) = 1
WHERE w1.temperature > w2.temperature;
4.2.5. examples
There is a factory website that has several machines each running the same number of processes. Write a solution to find the average time each machine takes to complete a process.
The time to complete a process is the 'end' timestamp minus the 'start' timestamp. The average time is calculated by the total time to complete every process on the machine divided by the number of processes that were run.
The resulting table should have the machine_id along with the average time as processing_time, which should be rounded to 3 decimal places.
Return the result table in any order.
The result format is in the following example.
Input:
Activity table:
+------------+------------+---------------+-----------+
| machine_id | process_id | activity_type | timestamp |
+------------+------------+---------------+-----------+
| 0 | 0 | start | 0.712 |
| 0 | 0 | end | 1.520 |
| 0 | 1 | start | 3.140 |
| 0 | 1 | end | 4.120 |
| 1 | 0 | start | 0.550 |
| 1 | 0 | end | 1.550 |
| 1 | 1 | start | 0.430 |
| 1 | 1 | end | 1.420 |
| 2 | 0 | start | 4.100 |
| 2 | 0 | end | 4.512 |
| 2 | 1 | start | 2.500 |
| 2 | 1 | end | 5.000 |
+------------+------------+---------------+-----------+
Output:
+------------+-----------------+
| machine_id | processing_time |
+------------+-----------------+
| 0 | 0.894 |
| 1 | 0.995 |
| 2 | 1.456 |
+------------+-----------------+
select A.machine_id,
round(AVG(end_time - start_time), 3) as processing_time
from(
SELECT
machine_id,
process_id,
MAX(CASE WHEN activity_type = 'start' THEN timestamp END) AS start_time,
MAX(CASE WHEN activity_type = 'end' THEN timestamp END) AS end_time
FROM Activity
GROUP BY machine_id, process_id
) as A
GROUP BY machine_id;
4.2.6. examples
Write a solution to report the name and bonus amount of each employee with a bonus less than 1000.
Return the result table in any order.
The result format is in the following example.
Input:
Employee table:
+-------+--------+------------+--------+
| empId | name | supervisor | salary |
+-------+--------+------------+--------+
| 3 | Brad | null | 4000 |
| 1 | John | 3 | 1000 |
| 2 | Dan | 3 | 2000 |
| 4 | Thomas | 3 | 4000 |
+-------+--------+------------+--------+
Bonus table:
+-------+-------+
| empId | bonus |
+-------+-------+
| 2 | 500 |
| 4 | 2000 |
+-------+-------+
Output:
+------+-------+
| name | bonus |
+------+-------+
| Brad | null |
| John | null |
| Dan | 500 |
+------+-------+
select name, bonus
from Employee e
left join bonus b
on e.empId = b.empId
where bonus < 1000 or bonus is null;
4.2.7. example
Write a solution to find the number of times each student attended each exam.
Return the result table ordered by student_id and subject_name.
The result format is in the following example.
Input:
Students table:
+------------+--------------+
| student_id | student_name |
+------------+--------------+
| 1 | Alice |
| 2 | Bob |
| 13 | John |
| 6 | Alex |
+------------+--------------+
Subjects table:
+--------------+
| subject_name |
+--------------+
| Math |
| Physics |
| Programming |
+--------------+
Examinations table:
+------------+--------------+
| student_id | subject_name |
+------------+--------------+
| 1 | Math |
| 1 | Physics |
| 1 | Programming |
| 2 | Programming |
| 1 | Physics |
| 1 | Math |
| 13 | Math |
| 13 | Programming |
| 13 | Physics |
| 2 | Math |
| 1 | Math |
+------------+--------------+
Output:
+------------+--------------+--------------+----------------+
| student_id | student_name | subject_name | attended_exams |
+------------+--------------+--------------+----------------+
| 1 | Alice | Math | 3 |
| 1 | Alice | Physics | 2 |
| 1 | Alice | Programming | 1 |
| 2 | Bob | Math | 1 |
| 2 | Bob | Physics | 0 |
| 2 | Bob | Programming | 1 |
| 6 | Alex | Math | 0 |
| 6 | Alex | Physics | 0 |
| 6 | Alex | Programming | 0 |
| 13 | John | Math | 1 |
| 13 | John | Physics | 1 |
| 13 | John | Programming | 1 |
+------------+--------------+--------------+----------------+
select A.student_id, A.student_name, A.subject_name, COALESCE(B.attended_exams, 0) AS attended_exams
from(
SELECT
s.student_id,
s.student_name,
sub.subject_name
FROM
Students s
CROSS JOIN
Subjects sub
) AS A
LEFT JOIN
(
select e.student_id, e.subject_name, count(*) as attended_exams
from Examinations e
group by student_id, subject_name
) as B
ON
A.student_id = B.student_id
AND A.subject_name = B.subject_name
ORDER BY
A.student_id, A.subject_name;
4.2.8. Example
Write a solution to find managers with at least five direct reports.
Return the result table in any order.
The result format is in the following example.
Input:
Employee table:
+-----+-------+------------+-----------+
| id | name | department | managerId |
+-----+-------+------------+-----------+
| 101 | John | A | null |
| 102 | Dan | A | 101 |
| 103 | James | A | 101 |
| 104 | Amy | A | 101 |
| 105 | Anne | A | 101 |
| 106 | Ron | B | 101 |
+-----+-------+------------+-----------+
Output:
+------+
| name |
+------+
| John |
+------+
SELECT E.name
FROM Employee E
JOIN (
SELECT managerId
FROM Employee
WHERE managerId IS NOT NULL
GROUP BY managerId
HAVING COUNT(*) >= 5
) AS B
ON E.id = B.managerId;
4.2.9. examples
The confirmation rate of a user is the number of 'confirmed' messages divided by the total number of requested confirmation messages. The confirmation rate of a user that did not request any confirmation messages is 0. Round the confirmation rate to two decimal places.
Write a solution to find the confirmation rate of each user.
Return the result table in any order.
The result format is in the following example.
Input:
Signups table:
+---------+---------------------+
| user_id | time_stamp |
+---------+---------------------+
| 3 | 2020-03-21 10:16:13 |
| 7 | 2020-01-04 13:57:59 |
| 2 | 2020-07-29 23:09:44 |
| 6 | 2020-12-09 10:39:37 |
+---------+---------------------+
Confirmations table:
+---------+---------------------+-----------+
| user_id | time_stamp | action |
+---------+---------------------+-----------+
| 3 | 2021-01-06 03:30:46 | timeout |
| 3 | 2021-07-14 14:00:00 | timeout |
| 7 | 2021-06-12 11:57:29 | confirmed |
| 7 | 2021-06-13 12:58:28 | confirmed |
| 7 | 2021-06-14 13:59:27 | confirmed |
| 2 | 2021-01-22 00:00:00 | confirmed |
| 2 | 2021-02-28 23:59:59 | timeout |
+---------+---------------------+-----------+
Output:
+---------+-------------------+
| user_id | confirmation_rate |
+---------+-------------------+
| 6 | 0.00 |
| 3 | 0.00 |
| 7 | 1.00 |
| 2 | 0.50 |
+---------+-------------------+
SELECT s.user_id,
ROUND(
CASE WHEN COUNT(c.user_id) = 0
THEN 0
ELSE 1.0 * SUM(CASE WHEN c.action = 'confirmed' THEN 1 ELSE 0 END) / COUNT(c.user_id)
END, 2) AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c ON s.user_id = c.user_id
GROUP BY s.user_id;
4.3. basic aggregate function(基础聚合函数)
4.3.1.
Write a solution to report the movies with an odd-numbered ID and a description that is not "boring".
Return the result table ordered by rating in descending order.
The result format is in the following example.
Input:
Cinema table:
+----+------------+-------------+--------+
| id | movie | description | rating |
+----+------------+-------------+--------+
| 1 | War | great 3D | 8.9 |
| 2 | Science | fiction | 8.5 |
| 3 | irish | boring | 6.2 |
| 4 | Ice song | Fantacy | 8.6 |
| 5 | House card | Interesting | 9.1 |
+----+------------+-------------+--------+
Output:
+----+------------+-------------+--------+
| id | movie | description | rating |
+----+------------+-------------+--------+
| 5 | House card | Interesting | 9.1 |
| 1 | War | great 3D | 8.9 |
+----+------------+-------------+--------+
select id, movie, description, rating
from Cinema
where id / 2 % 1 and description != "boring"
order by rating desc
4.3.2.
Write a solution to find the average selling price for each product. average_price should be rounded to 2 decimal places. If a product does not have any sold units, its average selling price is assumed to be 0.
Return the result table in any order.
The result format is in the following example.
Input:
Prices table:
+------------+------------+------------+--------+
| product_id | start_date | end_date | price |
+------------+------------+------------+--------+
| 1 | 2019-02-17 | 2019-02-28 | 5 |
| 1 | 2019-03-01 | 2019-03-22 | 20 |
| 2 | 2019-02-01 | 2019-02-20 | 15 |
| 2 | 2019-02-21 | 2019-03-31 | 30 |
+------------+------------+------------+--------+
UnitsSold table:
+------------+---------------+-------+
| product_id | purchase_date | units |
+------------+---------------+-------+
| 1 | 2019-02-25 | 100 |
| 1 | 2019-03-01 | 15 |
| 2 | 2019-02-10 | 200 |
| 2 | 2019-03-22 | 30 |
+------------+---------------+-------+
Output:
+------------+---------------+
| product_id | average_price |
+------------+---------------+
| 1 | 6.96 |
| 2 | 16.96 |
+------------+---------------+
select p.product_id, coalesce(round(sum(p.price * u.units) / sum(u.units), 2), 0) as average_price
from Prices p
left join UnitsSold u
on p.product_id = u.product_id
and u.purchase_date between p.start_date and p.end_date
group by p.product_id
order by p.product_id
4.3.3.
编写 SQL 查询,报告每个项目的员工平均工作年限,四舍五入到 2 位数字。以任意顺序返回它和结果表。
The query result format is in the following example.
Input:
Project table:
+-------------+-------------+
| project_id | employee_id |
+-------------+-------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 4 |
+-------------+-------------+
Employee table:
+-------------+--------+------------------+
| employee_id | name | experience_years |
+-------------+--------+------------------+
| 1 | Khaled | 3 |
| 2 | Ali | 2 |
| 3 | John | 1 |
| 4 | Doe | 2 |
+-------------+--------+------------------+
Output:
+-------------+---------------+
| project_id | average_years |
+-------------+---------------+
| 1 | 2.00 |
| 2 | 2.50 |
+-------------+---------------+
select p.project_id, round(avg(e.experience_years), 2) as average_years
from Project p
join Employee e
on p.employee_id = e.employee_id
group by p.project_id;
4.3.4.
编写解决方案以找出每个竞赛中注册的用户百分比,四舍五入到小数点后两位。按百分比降序排列结果表。如果出现平局,则按 contest_id 升序排列。
The result format is in the following example.
Input:
Users table:
+---------+-----------+
| user_id | user_name |
+---------+-----------+
| 6 | Alice |
| 2 | Bob |
| 7 | Alex |
+---------+-----------+
Register table:
+------------+---------+
| contest_id | user_id |
+------------+---------+
| 215 | 6 |
| 209 | 2 |
| 208 | 2 |
| 210 | 6 |
| 208 | 6 |
| 209 | 7 |
| 209 | 6 |
| 215 | 7 |
| 208 | 7 |
| 210 | 2 |
| 207 | 2 |
| 210 | 7 |
+------------+---------+
Output:
+------------+------------+
| contest_id | percentage |
+------------+------------+
| 208 | 100.0 |
| 209 | 100.0 |
| 210 | 100.0 |
| 215 | 66.67 |
| 207 | 33.33 |
+------------+------------+
SELECT r.contest_id,
ROUND(COUNT(DISTINCT r.user_id) * 100.0 / (SELECT COUNT(DISTINCT user_id) FROM Users), 2) AS percentage
FROM Register r
GROUP BY r.contest_id
ORDER BY percentage DESC, r.contest_id ASC;
4.3.5.
我们将查询质量定义为:
查询评级与其位置的比率的平均值。
我们还将不良查询百分比定义为:
评级低于 3 的所有查询的百分比。
编写解决方案以查找每个查询名称、质量和不良查询百分比。
质量和不良查询百分比都应四舍五入到小数点后 2 位。
以任意顺序返回结果表。
The result format is in the following example.
Input:
Queries table:
+------------+-------------------+----------+--------+
| query_name | result | position | rating |
+------------+-------------------+----------+--------+
| Dog | Golden Retriever | 1 | 5 |
| Dog | German Shepherd | 2 | 5 |
| Dog | Mule | 200 | 1 |
| Cat | Shirazi | 5 | 2 |
| Cat | Siamese | 3 | 3 |
| Cat | Sphynx | 7 | 4 |
+------------+-------------------+----------+--------+
Output:
+------------+---------+-----------------------+
| query_name | quality | poor_query_percentage |
+------------+---------+-----------------------+
| Dog | 2.50 | 33.33 |
| Cat | 0.66 | 33.33 |
+------------+---------+-----------------------+
SELECT
query_name,
ROUND(SUM(rating / position) / COUNT(query_name), 2) AS quality,
ROUND(SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS poor_query_percentage
FROM Queries q
GROUP BY query_name;
4.3.6.
Write an SQL query to find for each month and country, the number of transactions and their total amount, the number of approved transactions and their total amount.
Return the result table in any order.
The query result format is in the following example.
Example 1:
Input:
Transactions table:
+------+---------+----------+--------+------------+
| id | country | state | amount | trans_date |
+------+---------+----------+--------+------------+
| 121 | US | approved | 1000 | 2018-12-18 |
| 122 | US | declined | 2000 | 2018-12-19 |
| 123 | US | approved | 2000 | 2019-01-01 |
| 124 | DE | approved | 2000 | 2019-01-07 |
+------+---------+----------+--------+------------+
Output:
+----------+---------+-------------+----------------+--------------------+-----------------------+
| month | country | trans_count | approved_count | trans_total_amount | approved_total_amount |
+----------+---------+-------------+----------------+--------------------+-----------------------+
| 2018-12 | US | 2 | 1 | 3000 | 1000 |
| 2019-01 | US | 1 | 1 | 2000 | 2000 |
| 2019-01 | DE | 1 | 1 | 2000 | 2000 |
+----------+---------+-------------+----------------+--------------------+-----------------------+
select year_month as month, country, trans_count, approved_count
SELECT
id,
country,
state,
amount,
trans_date,
DATE_FORMAT(trans_date, '%Y-%m') AS year_month
FROM
your_table_name;
4.3.7.
If the customer's preferred delivery date is the same as the order date, then the order is called immediate; otherwise, it is called scheduled.
The first order of a customer is the order with the earliest order date that the customer made. It is guaranteed that a customer has precisely one first order.
Write a solution to find the percentage of immediate orders in the first orders of all customers, rounded to 2 decimal places.
The result format is in the following example.
Example 1:
Input:
Delivery table:
+-------------+-------------+------------+-----------------------------+
| delivery_id | customer_id | order_date | customer_pref_delivery_date |
+-------------+-------------+------------+-----------------------------+
| 1 | 1 | 2019-08-01 | 2019-08-02 |
| 2 | 2 | 2019-08-02 | 2019-08-02 |
| 3 | 1 | 2019-08-11 | 2019-08-12 |
| 4 | 3 | 2019-08-24 | 2019-08-24 |
| 5 | 3 | 2019-08-21 | 2019-08-22 |
| 6 | 2 | 2019-08-11 | 2019-08-13 |
| 7 | 4 | 2019-08-09 | 2019-08-09 |
+-------------+-------------+------------+-----------------------------+
Output:
+----------------------+
| immediate_percentage |
+----------------------+
| 50.00 |
+----------------------+
4.3.8.
编写一个解决方案来报告在首次登录后的第二天再次登录的玩家比例,四舍五入到小数点后 2 位。换句话说,您需要计算从首次登录日期开始连续至少两天登录的玩家数量,然后将该数字除以玩家总数。
The result format is in the following example.
Input:
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Output:
+-----------+
| fraction |
+-----------+
| 0.33 |
+-----------+
WITH FirstLogin AS (
SELECT player_id, MIN(event_date) AS first_login_date
FROM Activity
GROUP BY player_id
),
NextDayLogin AS (
SELECT f.player_id
FROM FirstLogin f
JOIN Activity a
ON f.player_id = a.player_id
WHERE a.event_date = DATE_ADD(f.first_login_date, INTERVAL 1 DAY)
)
SELECT ROUND(COUNT(DISTINCT n.player_id) / COUNT(DISTINCT f.player_id), 2) AS fraction
FROM FirstLogin f
LEFT JOIN NextDayLogin n
ON f.player_id = n.player_id;
4.4. Sorting and Grouping(排序和分组)
4.5. Advanced Select and Joins(高级选择和连接)
4.6. Subqueries(子查询)
4.7. Advanced String Functions / Regex / Clause
编写一个解决方案,显示每个用户的唯一 ID。如果用户没有唯一 ID,则显示 null。
返回结果表格可以按任何顺序。
结果格式如下所示:
示例 1:
输入:
Employees 表:
+----+----------+
| id | name |
+----+----------+
| 1 | Alice |
| 7 | Bob |
| 11 | Meir |
| 90 | Winston |
| 3 | Jonathan |
+----+----------+
EmployeeUNI 表:
+----+-----------+
| id | unique_id |
+----+-----------+
| 3 | 1 |
| 11 | 2 |
| 90 | 3 |
+----+-----------+
输出:
+-----------+----------+
| unique_id | name |
+-----------+----------+
| null | Alice |
| null | Bob |
| 2 | Meir |
| 3 | Winston |
| 1 | Jonathan |
+-----------+----------+
解释:
Alice 和 Bob 没有唯一 ID,我们将显示 null。
Meir 的唯一 ID 是 2。
Winston 的唯一 ID 是 3。
Jonathan 的唯一 ID 是 1。
SELECT e.unique_id, emp.name
FROM Employees emp
LEFT JOIN EmployeeUNI e
ON emp.id = e.id;

浙公网安备 33010602011771号