Loading

SQL以及数据库相关

SQL以及数据库相关

1. 数据库

1.1. 常见的数据库系统

​ 市面上有多种常见的数据库系统,每种数据库系统适用于不同的应用场景。根据数据存储结构、应用场景和技术架构,可以将数据库系统大致分为几类:

类别 常见数据库系统 主要应用场景
关系型数据库 MySQL、PostgreSQL、Oracle、SQL Server 事务型系统、数据一致性要求高
键值数据库 Redis、DynamoDB 缓存、会话管理、配置管理
文档数据库 MongoDB、Couchbase 内容管理、Web 应用
列式数据库 Cassandra、HBase 时间序列数据、分析数据
图数据库 Neo4j、Amazon Neptune 社交网络、推荐系统
NewSQL 数据库 CockroachDB、TiDB、Spanner 高并发、分布式事务处理
时序数据库 InfluxDB、TimescaleDB 监控数据、金融数据
对象数据库 Db4o、ObjectDB CAD/CAM 系统、嵌入式系统
嵌入式数据库 SQLite、Berkeley DB 移动应用、物联网设备
分布式文件存储与数据库 HDFS、Bigtable、Elasticsearch 大数据分析、非结构化数据
多模型数据库 ArangoDB、Azure Cosmos DB 内容管理、物联网、社交网络

​ 市面上有数百种数据库系统,每种数据库的设计思想和技术特点各异,应根据具体业务需求选择合适的数据库类型。

1.2. 常见的sql语言

​ 除了标准 SQL,许多数据库系统还支持自定义的扩展 SQL 或独特的查询语言,以适应不同的数据库架构、存储结构和应用场景需求。以下是一些常见的扩展 SQL 和其他可以用于不同数据库系统的查询语言:

查询语言 常见数据库系统 特点及应用场景
T-SQL SQL Server 适合事务处理和复杂业务逻辑
PL/SQL Oracle 支持复杂业务逻辑,常用于企业应用
HiveQL Hive 面向大数据批处理和数据仓库
CQL Cassandra 列式数据库查询,适合时间序列数据
Cypher Neo4j 图查询语言,适合社交网络和推荐系统
SPARQL RDF、知识图谱数据库 语义网和知识图谱数据查询
MQL MongoDB 文档数据库查询,适合半结构化数据
Druid SQL Druid 时间序列数据库分析和聚合
KQL Azure Data Explorer 日志和时间序列数据分析
Elasticsearch DSL Elasticsearch 全文搜索和实时数据分析
InfluxQL/Flux InfluxDB 时间序列数据分析,适合物联网和监控
PartiQL DynamoDB SQL 风格的 NoSQL 数据查询,适合低延迟数据操作
ClickHouse SQL ClickHouse 列式存储查询,适合实时分析和大规模数据处理
BigQuery SQL Google BigQuery 大数据分析和聚合查询,适合数据仓库
Presto SQL Presto 跨数据源查询,大数据分析
FQL FaunaDB 分布

1.3. Mysql分区

​ 为什么要分区?

  1. 分区可以在一个表中存储比单个磁盘或文件系统分区上的数据更多的数据,因为我们可以将分区表存储在不同物理磁盘上

  2. 对已过期或者不需要保存的数据,可以通过删除与这些数据有关的分区来快速删除数据,他的效率远比delete高;

  3. 优化查询,在where子句中包含分区条件时,可以只扫描必要的一个或者多个分区来提高查询效率;例如下面语句:

    SELECT * 
    FROM t 
    PARTITION(p0,p1)
    WHERE c <5
    

    sql语句表示仅选择与WHERE条件匹配的分区p0和p1中的那些行。在这种情况下,MySQL不检查表t的任何其他分区;

  4. 涉及聚合函数SUM()、COUNT()的查询时,可以容易的在每个分区上并行处理,例如在执行下面这条语句:

    SELECT salesperson_id,COUNT(orders)as order_total 
    FROM sales 
    GROUP BY salesperson_id
    

    会在每个分区上都同时运行查询;

  5. 凭借在多个磁盘上传播数据,实现更高的查询吞吐量。

MySQL中的分区是什么?为什么要分区?有什么好处?怎么进行分区?

mysql分区概念

1.4. 特别注意

order by 子句应该在where之后。

2. SQL语言的四大组成部分

聊聊SQL语句中 DDL 、DML 、DQL 、DCL 分别是什么

SQL(结构化查询语言)是数据库的核心操作语言,根据功能可分为DDL、DML、DQL和DCL四类。下面我将详细解释这四者的概念,并通过具体示例说明它们的用法。

2.1. DDL (Data Definition Language) - 数据定义语言

概念:用于定义和管理数据库对象(如表、视图、索引等)的结构。

主要命令及示例:

-- 创建表
CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    emp_name VARCHAR(50) NOT NULL,
    hire_date DATE,
    salary DECIMAL(10,2)
);

-- 修改表结构
ALTER TABLE employees ADD COLUMN department VARCHAR(30);

-- 删除表
DROP TABLE employees;

-- 创建索引
CREATE INDEX idx_emp_name ON employees(emp_name);

特点:

  • 操作的是数据库结构而非数据
  • 执行后通常自动提交
  • 需要较高权限

2.2. DML (Data Manipulation Language) - 数据操纵语言

概念:用于对数据库表中的数据进行增删改操作。

主要命令及示例:

-- 插入数据
INSERT INTO employees (emp_id, emp_name, hire_date, salary)
VALUES (101, '张三', '2020-05-15', 8500.00);

-- 更新数据
UPDATE employees 
SET salary = 9000.00 
WHERE emp_id = 101;

-- 删除数据
DELETE FROM employees 
WHERE emp_id = 101;

-- 合并数据(MERGE/UPSERT)
MERGE INTO employees e
USING new_employees n
ON (e.emp_id = n.emp_id)
WHEN MATCHED THEN
    UPDATE SET e.salary = n.salary
WHEN NOT MATCHED THEN
    INSERT (emp_id, emp_name, salary)
    VALUES (n.emp_id, n.emp_name, n.salary);

特点:

  • 操作的是表中的数据
  • 需要显式提交或回滚
  • 会影响数据完整性

2.3. DQL (Data Query Language) - 数据查询语言

概念:用于从数据库中查询数据。

主要命令及示例:

-- 基本查询
SELECT emp_name, salary FROM employees;

-- 带条件的查询
SELECT emp_name, hire_date 
FROM employees 
WHERE salary > 8000 
AND hire_date > '2020-01-01';

-- 分组统计
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 7000;

-- 多表连接查询
SELECT e.emp_name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;

-- 子查询
SELECT emp_name 
FROM employees 
WHERE salary > (SELECT AVG(salary) FROM employees);

特点:

  • 只查询数据,不修改数据
  • 可以包含复杂逻辑
  • 是SQL中使用最频繁的命令

2.4. DCL (Data Control Language) - 数据控制语言

概念:用于控制数据库访问权限和安全级别。

主要命令及示例:

-- 授予权限
GRANT SELECT, INSERT ON employees TO user1;

-- 授予所有权限
GRANT ALL PRIVILEGES ON employees TO admin_user;

-- 撤销权限
REVOKE INSERT ON employees FROM user1;

-- 拒绝权限(SQL Server特有)
DENY DELETE ON employees TO user2;

特点:

  • 管理数据库安全性和权限
  • 通常由数据库管理员执行
  • 可以控制到表、视图等对象的细粒度权限

2.5. 实际应用场景示例

假设我们正在开发一个人力资源管理系统:

  1. DDL:创建员工表、部门表等基础结构
  2. DML:添加新员工记录、更新员工薪资、删除离职员工
  3. DQL:查询各部门平均薪资、查找高绩效员工、生成报表
  4. DCL:设置HR专员只能查看员工数据但不能修改薪资、财务人员可以更新薪资但不能删除记录

这四类SQL命令共同构成了完整的数据库操作语言体系,在实际开发中需要根据需求灵活运用。
'''

3. dataworks

​ MaxCompute是数据仓库,负责存储数据或对数据进行一系列的开发和运算。DataWorks为MaxCompute提供了可视化开发和节点流程管理等功能,结合MaxCompute完成一站式数据开发、数据治理等。pyODPS是maxcompute的接口,通过pyodps可以使用python脚本使用maxcompute。

pyodps教程

使用python从dataworks上拉取数据并转为dataframe,不使用to_pandas()

import datetime
from odps import ODPS
odps = ODPS(access_id='xxxxxxxxxxxxx', secret_access_key='xxxxxxxxxxxxxx',
                project='xxxxxxxxxxxxx', endpoint='xxxxxxxxxxxxxxxxxxxxxxxx')
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
import pandas as pd
import time
day=7
def read_sql_query(odps, sql):
    try:
        readers_ = odps.execute_sql(sql).open_reader(tunnel=True, limit=False)
        col_name = [i.name for i in readers_.schema]
        data = pd.concat(
            [pd.DataFrame({column: [value] for column, value in zip(col_name, reader.values)}) for reader in readers_])
    except Exception as e:
        data = pd.DataFrame({'info': [e]})
        print(e)
    finally:
        return data

sql2="""SELECT en_zhy_date, end_airport, planwarehousename, 
        SUM(sum_cbm) AS sum_cbm1, 
        SUM(sum_cnt) AS sum_cnt1
        FROM airport_cbm_base_1126_table
        where end_airport  = '"""+end_airport+"""'
        GROUP BY en_zhy_date, end_airport, planwarehousename"""

reader = read_sql_query(odps=odps, sql=sql2)
# 拉取出来的reader是dataframe

4. sql刷题

sql学习书籍

刷题网站

选择哪一个join。inner join表示交集、outer join表示并集、left join表示以左边字段为主键,左连接。左边字段有的,右边字段没有的,则为null。right join为右链接。

在SQL查询中,left join是将右表的数据连到左表上,但并不会将右表的字段直接合并到左表。因此在查询中仍要通过表名或者别名来明确引用。

GROUP BY过滤数据时 混淆 WHERE和HAVING——我见过很多人犯了这个错误。

CROSS JOIN(也称为“交叉联接”或“笛卡尔积”)将第一个表中的每一行与第二个表中的每一行组合起来。

sql查询顺序:

from join

where

group by

having

order by

limit

sql查询使用where条件时,字段中的NULL要是有 is null。字段中的字符串要使用 = 'Y'.

4.1. select

4.1.1. Example:

Write a solution to find the ids of products that are both low fat and recyclable.

Return the result table in any order.

The result format is in the following example.

Input: 
Products table:
+-------------+----------+------------+
| product_id  | low_fats | recyclable |
+-------------+----------+------------+
| 0           | Y        | N          |
| 1           | Y        | Y          |
| 2           | N        | Y          |
| 3           | Y        | Y          |
| 4           | N        | N          |
+-------------+----------+------------+
Output: 
+-------------+
| product_id  |
+-------------+
| 1           |
| 3           |
+-------------+

select product_id from Products
where low_fats = 'Y' and recyclable = 'Y';

4.1.2. Example:

Find the names of the customer that are not referred by the customer with id = 2.

Return the result table in any order.

The result format is in the following example.

Input: 
Customer table:
+----+------+------------+
| id | name | referee_id |
+----+------+------------+
| 1  | Will | null       |
| 2  | Jane | null       |
| 3  | Alex | 2          |
| 4  | Bill | null       |
| 5  | Zack | 1          |
| 6  | Mark | 2          |
+----+------+------------+
Output: 
+------+
| name |
+------+
| Will |
| Jane |
| Bill |
| Zack |
+------+
select name from Customer
where referee_id != 2 or referee_id IS null;

4.1.3. Example:

A country is big if:

  • it has an area of at least three million (i.e., 3000000 km2), or
  • it has a population of at least twenty-five million (i.e., 25000000).

Write a solution to find the name, population, and area of the big countries.

Return the result table in any order.

The result format is in the following example.

Input: 
World table:
+-------------+-----------+---------+------------+--------------+
| name        | continent | area    | population | gdp          |
+-------------+-----------+---------+------------+--------------+
| Afghanistan | Asia      | 652230  | 25500100   | 20343000000  |
| Albania     | Europe    | 28748   | 2831741    | 12960000000  |
| Algeria     | Africa    | 2381741 | 37100000   | 188681000000 |
| Andorra     | Europe    | 468     | 78115      | 3712000000   |
| Angola      | Africa    | 1246700 | 20609294   | 100990000000 |
+-------------+-----------+---------+------------+--------------+
Output: 
+-------------+------------+---------+
| name        | population | area    |
+-------------+------------+---------+
| Afghanistan | 25500100   | 652230  |
| Algeria     | 37100000   | 2381741 |
+-------------+------------+---------+
select name, population, area from World
where area >= 3000000  or population >= 25000000;

4.1.4. Example:

编写一个解决方案来查找所有查看过至少一篇自己文章的作者。返回按 ID 升序排序的结果表。

The result format is in the following example.

Input: 
Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date  |
+------------+-----------+-----------+------------+
| 1          | 3         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |
+------------+-----------+-----------+------------+
Output: 
+------+
| id   |
+------+
| 4    |
| 7    |
+------+

select DISTINCT author_id 
as id 
from Views
WHERE author_id = viewer_id
order by author_id ASC;

4.1.5. Example

编写解决方案来查找无效推文的 ID。如果推文内容中使用的字符数严格大于 15,则该推文无效。

Return the result table in any order.

The result format is in the following example.

Input: 
Tweets table:
+----------+-----------------------------------+
| tweet_id | content                           |
+----------+-----------------------------------+
| 1        | Let us Code                       |
| 2        | More than fifteen chars are here! |
+----------+-----------------------------------+
Output: 
+----------+
| tweet_id |
+----------+
| 2        |
+----------+



select tweet_id from Tweets 
where LENGTH(content) > 15;

4.2. basic join

4.2.1. examples

编写一个解决方案来显示每个用户的唯一 ID,如果用户没有唯一 ID,则仅显示 null。以任何顺序返回结果表。

The result format is in the following example.

Input: 
Employees table:
+----+----------+
| id | name     |
+----+----------+
| 1  | Alice    |
| 7  | Bob      |
| 11 | Meir     |
| 90 | Winston  |
| 3  | Jonathan |
+----+----------+
EmployeeUNI table:
+----+-----------+
| id | unique_id |
+----+-----------+
| 3  | 1         |
| 11 | 2         |
| 90 | 3         |
+----+-----------+
Output: 
+-----------+----------+
| unique_id | name     |
+-----------+----------+
| null      | Alice    |
| null      | Bob      |
| 2         | Meir     |
| 3         | Winston  |
| 1         | Jonathan |
+-----------+----------+

SELECT e.unique_id, emp.name
FROM Employees emp
LEFT JOIN EmployeeUNI e
ON emp.id = e.id;

4.2.2. examples

编写一个解决方案来报告 Sales 表中每个 sale_id 的 product_name、year 和 price。以任意顺序返回结果表。

The result format is in the following example.

Example 1:

Input: 
Sales table:
+---------+------------+------+----------+-------+
| sale_id | product_id | year | quantity | price |
+---------+------------+------+----------+-------+ 
| 1       | 100        | 2008 | 10       | 5000  |
| 2       | 100        | 2009 | 12       | 5000  |
| 7       | 200        | 2011 | 15       | 9000  |
+---------+------------+------+----------+-------+
Product table:
+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 100        | Nokia        |
| 200        | Apple        |
| 300        | Samsung      |
+------------+--------------+
Output: 
+--------------+-------+-------+
| product_name | year  | price |
+--------------+-------+-------+
| Nokia        | 2008  | 5000  |
| Nokia        | 2009  | 5000  |
| Apple        | 2011  | 9000  |
+--------------+-------+-------+


select p.product_name, s.year, s.price 
from Sales s
left join Product p
on s.product_id = p.product_id;

4.2.3. example

​ Write a solution to find the IDs of the users who visited without making any transactions and the number of times they made these types of visits.

Return the result table sorted in any order.

The result format is in the following example.

Input: 
Visits
+----------+-------------+
| visit_id | customer_id |
+----------+-------------+
| 1        | 23          |
| 2        | 9           |
| 4        | 30          |
| 5        | 54          |
| 6        | 96          |
| 7        | 54          |
| 8        | 54          |
+----------+-------------+
Transactions
+----------------+----------+--------+
| transaction_id | visit_id | amount |
+----------------+----------+--------+
| 2              | 5        | 310    |
| 3              | 5        | 300    |
| 9              | 5        | 200    |
| 12             | 1        | 910    |
| 13             | 2        | 970    |
+----------------+----------+--------+
Output: 
+-------------+----------------+
| customer_id | count_no_trans |
+-------------+----------------+
| 54          | 2              |
| 30          | 1              |
| 96          | 1              |
+-------------+----------------+


select v.customer_id, COUNT(v.visit_id) AS count_no_trans 
from Visits v
left join Transactions t
on v.visit_id = t.visit_id 
where transaction_id is null
group by v.customer_id;

4.2.4. examples

Write a solution to find all dates' id with higher temperatures compared to its previous dates (yesterday).

Return the result table in any order.

The result format is in the following example.

Input: 
Weather table:
+----+------------+-------------+
| id | recordDate | temperature |
+----+------------+-------------+
| 1  | 2015-01-01 | 10          |
| 2  | 2015-01-02 | 25          |
| 3  | 2015-01-03 | 20          |
| 4  | 2015-01-04 | 30          |
+----+------------+-------------+
Output: 
+----+
| id |
+----+
| 2  |
| 4  |
+----+

SELECT w1.id
FROM Weather w1
JOIN Weather w2
    ON DATEDIFF(w1.recordDate, w2.recordDate) = 1
WHERE w1.temperature > w2.temperature;

4.2.5. examples

There is a factory website that has several machines each running the same number of processes. Write a solution to find the average time each machine takes to complete a process.

The time to complete a process is the 'end' timestamp minus the 'start' timestamp. The average time is calculated by the total time to complete every process on the machine divided by the number of processes that were run.

The resulting table should have the machine_id along with the average time as processing_time, which should be rounded to 3 decimal places.

Return the result table in any order.

The result format is in the following example.

Input: 
Activity table:
+------------+------------+---------------+-----------+
| machine_id | process_id | activity_type | timestamp |
+------------+------------+---------------+-----------+
| 0          | 0          | start         | 0.712     |
| 0          | 0          | end           | 1.520     |
| 0          | 1          | start         | 3.140     |
| 0          | 1          | end           | 4.120     |
| 1          | 0          | start         | 0.550     |
| 1          | 0          | end           | 1.550     |
| 1          | 1          | start         | 0.430     |
| 1          | 1          | end           | 1.420     |
| 2          | 0          | start         | 4.100     |
| 2          | 0          | end           | 4.512     |
| 2          | 1          | start         | 2.500     |
| 2          | 1          | end           | 5.000     |
+------------+------------+---------------+-----------+
Output: 
+------------+-----------------+
| machine_id | processing_time |
+------------+-----------------+
| 0          | 0.894           |
| 1          | 0.995           |
| 2          | 1.456           |
+------------+-----------------+

select A.machine_id, 
round(AVG(end_time - start_time), 3) as processing_time 
from(
SELECT 
        machine_id,
        process_id,
        MAX(CASE WHEN activity_type = 'start' THEN timestamp END) AS start_time,
        MAX(CASE WHEN activity_type = 'end' THEN timestamp END) AS end_time
    FROM Activity
    GROUP BY machine_id, process_id
) as A
GROUP BY machine_id;

4.2.6. examples

Write a solution to report the name and bonus amount of each employee with a bonus less than 1000.

Return the result table in any order.

The result format is in the following example.

Input: 
Employee table:
+-------+--------+------------+--------+
| empId | name   | supervisor | salary |
+-------+--------+------------+--------+
| 3     | Brad   | null       | 4000   |
| 1     | John   | 3          | 1000   |
| 2     | Dan    | 3          | 2000   |
| 4     | Thomas | 3          | 4000   |
+-------+--------+------------+--------+
Bonus table:
+-------+-------+
| empId | bonus |
+-------+-------+
| 2     | 500   |
| 4     | 2000  |
+-------+-------+
Output: 
+------+-------+
| name | bonus |
+------+-------+
| Brad | null  |
| John | null  |
| Dan  | 500   |
+------+-------+

select name, bonus
from Employee e
left join bonus b
on e.empId = b.empId
where bonus < 1000 or bonus is null;

4.2.7. example

Write a solution to find the number of times each student attended each exam.

Return the result table ordered by student_id and subject_name.

The result format is in the following example.

Input: 
Students table:
+------------+--------------+
| student_id | student_name |
+------------+--------------+
| 1          | Alice        |
| 2          | Bob          |
| 13         | John         |
| 6          | Alex         |
+------------+--------------+
Subjects table:
+--------------+
| subject_name |
+--------------+
| Math         |
| Physics      |
| Programming  |
+--------------+
Examinations table:
+------------+--------------+
| student_id | subject_name |
+------------+--------------+
| 1          | Math         |
| 1          | Physics      |
| 1          | Programming  |
| 2          | Programming  |
| 1          | Physics      |
| 1          | Math         |
| 13         | Math         |
| 13         | Programming  |
| 13         | Physics      |
| 2          | Math         |
| 1          | Math         |
+------------+--------------+
Output: 
+------------+--------------+--------------+----------------+
| student_id | student_name | subject_name | attended_exams |
+------------+--------------+--------------+----------------+
| 1          | Alice        | Math         | 3              |
| 1          | Alice        | Physics      | 2              |
| 1          | Alice        | Programming  | 1              |
| 2          | Bob          | Math         | 1              |
| 2          | Bob          | Physics      | 0              |
| 2          | Bob          | Programming  | 1              |
| 6          | Alex         | Math         | 0              |
| 6          | Alex         | Physics      | 0              |
| 6          | Alex         | Programming  | 0              |
| 13         | John         | Math         | 1              |
| 13         | John         | Physics      | 1              |
| 13         | John         | Programming  | 1              |
+------------+--------------+--------------+----------------+


select A.student_id, A.student_name, A.subject_name, COALESCE(B.attended_exams, 0) AS attended_exams
from(
SELECT 
    s.student_id, 
    s.student_name, 
    sub.subject_name
FROM 
    Students s
CROSS JOIN 
    Subjects sub
) AS A
LEFT JOIN 
(
select e.student_id, e.subject_name, count(*) as attended_exams 
from Examinations e
group by student_id, subject_name
) as B
ON 
    A.student_id = B.student_id 
    AND A.subject_name = B.subject_name
ORDER BY 
    A.student_id, A.subject_name;

4.2.8. Example

Write a solution to find managers with at least five direct reports.

Return the result table in any order.

The result format is in the following example.

Input: 
Employee table:
+-----+-------+------------+-----------+
| id  | name  | department | managerId |
+-----+-------+------------+-----------+
| 101 | John  | A          | null      |
| 102 | Dan   | A          | 101       |
| 103 | James | A          | 101       |
| 104 | Amy   | A          | 101       |
| 105 | Anne  | A          | 101       |
| 106 | Ron   | B          | 101       |
+-----+-------+------------+-----------+
Output: 
+------+
| name |
+------+
| John |
+------+

SELECT E.name
FROM Employee E
JOIN (
    SELECT managerId
    FROM Employee
    WHERE managerId IS NOT NULL
    GROUP BY managerId
    HAVING COUNT(*) >= 5
) AS B
ON E.id = B.managerId;

4.2.9. examples

The confirmation rate of a user is the number of 'confirmed' messages divided by the total number of requested confirmation messages. The confirmation rate of a user that did not request any confirmation messages is 0. Round the confirmation rate to two decimal places.

Write a solution to find the confirmation rate of each user.

Return the result table in any order.

The result format is in the following example.

Input: 
Signups table:
+---------+---------------------+
| user_id | time_stamp          |
+---------+---------------------+
| 3       | 2020-03-21 10:16:13 |
| 7       | 2020-01-04 13:57:59 |
| 2       | 2020-07-29 23:09:44 |
| 6       | 2020-12-09 10:39:37 |
+---------+---------------------+
Confirmations table:
+---------+---------------------+-----------+
| user_id | time_stamp          | action    |
+---------+---------------------+-----------+
| 3       | 2021-01-06 03:30:46 | timeout   |
| 3       | 2021-07-14 14:00:00 | timeout   |
| 7       | 2021-06-12 11:57:29 | confirmed |
| 7       | 2021-06-13 12:58:28 | confirmed |
| 7       | 2021-06-14 13:59:27 | confirmed |
| 2       | 2021-01-22 00:00:00 | confirmed |
| 2       | 2021-02-28 23:59:59 | timeout   |
+---------+---------------------+-----------+
Output: 
+---------+-------------------+
| user_id | confirmation_rate |
+---------+-------------------+
| 6       | 0.00              |
| 3       | 0.00              |
| 7       | 1.00              |
| 2       | 0.50              |
+---------+-------------------+

SELECT s.user_id,
       ROUND(
         CASE WHEN COUNT(c.user_id) = 0 
              THEN 0 
              ELSE 1.0 * SUM(CASE WHEN c.action = 'confirmed' THEN 1 ELSE 0 END) / COUNT(c.user_id)
         END, 2) AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c ON s.user_id = c.user_id
GROUP BY s.user_id;

4.3. basic aggregate function(基础聚合函数)

4.3.1.

Write a solution to report the movies with an odd-numbered ID and a description that is not "boring".

Return the result table ordered by rating in descending order.

The result format is in the following example.


Input: 
Cinema table:
+----+------------+-------------+--------+
| id | movie      | description | rating |
+----+------------+-------------+--------+
| 1  | War        | great 3D    | 8.9    |
| 2  | Science    | fiction     | 8.5    |
| 3  | irish      | boring      | 6.2    |
| 4  | Ice song   | Fantacy     | 8.6    |
| 5  | House card | Interesting | 9.1    |
+----+------------+-------------+--------+
Output: 
+----+------------+-------------+--------+
| id | movie      | description | rating |
+----+------------+-------------+--------+
| 5  | House card | Interesting | 9.1    |
| 1  | War        | great 3D    | 8.9    |
+----+------------+-------------+--------+

select id, movie, description, rating 
from Cinema
where id / 2 % 1 and description != "boring"
order by rating desc

4.3.2.

Write a solution to find the average selling price for each product. average_price should be rounded to 2 decimal places. If a product does not have any sold units, its average selling price is assumed to be 0.

Return the result table in any order.

The result format is in the following example.

 
Input: 
Prices table:
+------------+------------+------------+--------+
| product_id | start_date | end_date   | price  |
+------------+------------+------------+--------+
| 1          | 2019-02-17 | 2019-02-28 | 5      |
| 1          | 2019-03-01 | 2019-03-22 | 20     |
| 2          | 2019-02-01 | 2019-02-20 | 15     |
| 2          | 2019-02-21 | 2019-03-31 | 30     |
+------------+------------+------------+--------+
UnitsSold table:
+------------+---------------+-------+
| product_id | purchase_date | units |
+------------+---------------+-------+
| 1          | 2019-02-25    | 100   |
| 1          | 2019-03-01    | 15    |
| 2          | 2019-02-10    | 200   |
| 2          | 2019-03-22    | 30    |
+------------+---------------+-------+
Output: 
+------------+---------------+
| product_id | average_price |
+------------+---------------+
| 1          | 6.96          |
| 2          | 16.96         |
+------------+---------------+


select p.product_id, coalesce(round(sum(p.price * u.units) / sum(u.units), 2), 0) as average_price 
from Prices p
left join UnitsSold u
on p.product_id = u.product_id 
and u.purchase_date between p.start_date  and p.end_date   
group by p.product_id
order by p.product_id

4.3.3.

编写 SQL 查询,报告每个项目的员工平均工作年限,四舍五入到 2 位数字。以任意顺序返回它和结果表。

The query result format is in the following example.

Input: 
Project table:
+-------------+-------------+
| project_id  | employee_id |
+-------------+-------------+
| 1           | 1           |
| 1           | 2           |
| 1           | 3           |
| 2           | 1           |
| 2           | 4           |
+-------------+-------------+
Employee table:
+-------------+--------+------------------+
| employee_id | name   | experience_years |
+-------------+--------+------------------+
| 1           | Khaled | 3                |
| 2           | Ali    | 2                |
| 3           | John   | 1                |
| 4           | Doe    | 2                |
+-------------+--------+------------------+
Output: 
+-------------+---------------+
| project_id  | average_years |
+-------------+---------------+
| 1           | 2.00          |
| 2           | 2.50          |
+-------------+---------------+


select p.project_id, round(avg(e.experience_years), 2) as average_years 
from Project p
join Employee e
on p.employee_id  = e.employee_id 
group by p.project_id;

4.3.4.

	编写解决方案以找出每个竞赛中注册的用户百分比,四舍五入到小数点后两位。按百分比降序排列结果表。如果出现平局,则按 contest_id 升序排列。

The result format is in the following example.

Input: 
Users table:
+---------+-----------+
| user_id | user_name |
+---------+-----------+
| 6       | Alice     |
| 2       | Bob       |
| 7       | Alex      |
+---------+-----------+
Register table:
+------------+---------+
| contest_id | user_id |
+------------+---------+
| 215        | 6       |
| 209        | 2       |
| 208        | 2       |
| 210        | 6       |
| 208        | 6       |
| 209        | 7       |
| 209        | 6       |
| 215        | 7       |
| 208        | 7       |
| 210        | 2       |
| 207        | 2       |
| 210        | 7       |
+------------+---------+
Output: 
+------------+------------+
| contest_id | percentage |
+------------+------------+
| 208        | 100.0      |
| 209        | 100.0      |
| 210        | 100.0      |
| 215        | 66.67      |
| 207        | 33.33      |
+------------+------------+

SELECT r.contest_id,
       ROUND(COUNT(DISTINCT r.user_id) * 100.0 / (SELECT COUNT(DISTINCT user_id) FROM Users), 2) AS percentage
FROM Register r
GROUP BY r.contest_id
ORDER BY percentage DESC, r.contest_id ASC;

4.3.5.

我们将查询质量定义为:

查询评级与其位置的比率的平均值。
我们还将不良查询百分比定义为:
评级低于 3 的所有查询的百分比。
编写解决方案以查找每个查询名称、质量和不良查询百分比。
质量和不良查询百分比都应四舍五入到小数点后 2 位。
以任意顺序返回结果表。

The result format is in the following example.

Input: 
Queries table:
+------------+-------------------+----------+--------+
| query_name | result            | position | rating |
+------------+-------------------+----------+--------+
| Dog        | Golden Retriever  | 1        | 5      |
| Dog        | German Shepherd   | 2        | 5      |
| Dog        | Mule              | 200      | 1      |
| Cat        | Shirazi           | 5        | 2      |
| Cat        | Siamese           | 3        | 3      |
| Cat        | Sphynx            | 7        | 4      |
+------------+-------------------+----------+--------+
Output: 
+------------+---------+-----------------------+
| query_name | quality | poor_query_percentage |
+------------+---------+-----------------------+
| Dog        | 2.50    | 33.33                 |
| Cat        | 0.66    | 33.33                 |
+------------+---------+-----------------------+

SELECT 
  query_name, 
  ROUND(SUM(rating  / position) / COUNT(query_name), 2) AS quality,
  ROUND(SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS poor_query_percentage
FROM Queries q
GROUP BY query_name;

4.3.6.

Write an SQL query to find for each month and country, the number of transactions and their total amount, the number of approved transactions and their total amount.

Return the result table in any order.

The query result format is in the following example.


Example 1:

Input: 
Transactions table:
+------+---------+----------+--------+------------+
| id   | country | state    | amount | trans_date |
+------+---------+----------+--------+------------+
| 121  | US      | approved | 1000   | 2018-12-18 |
| 122  | US      | declined | 2000   | 2018-12-19 |
| 123  | US      | approved | 2000   | 2019-01-01 |
| 124  | DE      | approved | 2000   | 2019-01-07 |
+------+---------+----------+--------+------------+
Output: 
+----------+---------+-------------+----------------+--------------------+-----------------------+
| month    | country | trans_count | approved_count | trans_total_amount | approved_total_amount |
+----------+---------+-------------+----------------+--------------------+-----------------------+
| 2018-12  | US      | 2           | 1              | 3000               | 1000                  |
| 2019-01  | US      | 1           | 1              | 2000               | 2000                  |
| 2019-01  | DE      | 1           | 1              | 2000               | 2000                  |
+----------+---------+-------------+----------------+--------------------+-----------------------+


select year_month as month, country, trans_count, approved_count 
SELECT 
    id,
    country,
    state,
    amount,
    trans_date,
    DATE_FORMAT(trans_date, '%Y-%m') AS year_month
FROM 
    your_table_name;

4.3.7.

If the customer's preferred delivery date is the same as the order date, then the order is called immediate; otherwise, it is called scheduled.

The first order of a customer is the order with the earliest order date that the customer made. It is guaranteed that a customer has precisely one first order.

Write a solution to find the percentage of immediate orders in the first orders of all customers, rounded to 2 decimal places.

The result format is in the following example.

 

Example 1:

Input: 
Delivery table:
+-------------+-------------+------------+-----------------------------+
| delivery_id | customer_id | order_date | customer_pref_delivery_date |
+-------------+-------------+------------+-----------------------------+
| 1           | 1           | 2019-08-01 | 2019-08-02                  |
| 2           | 2           | 2019-08-02 | 2019-08-02                  |
| 3           | 1           | 2019-08-11 | 2019-08-12                  |
| 4           | 3           | 2019-08-24 | 2019-08-24                  |
| 5           | 3           | 2019-08-21 | 2019-08-22                  |
| 6           | 2           | 2019-08-11 | 2019-08-13                  |
| 7           | 4           | 2019-08-09 | 2019-08-09                  |
+-------------+-------------+------------+-----------------------------+
Output: 
+----------------------+
| immediate_percentage |
+----------------------+
| 50.00                |
+----------------------+


4.3.8.

	编写一个解决方案来报告在首次登录后的第二天再次登录的玩家比例,四舍五入到小数点后 2 位。换句话说,您需要计算从首次登录日期开始连续至少两天登录的玩家数量,然后将该数字除以玩家总数。

The result format is in the following example.

Input: 
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+
Output: 
+-----------+
| fraction  |
+-----------+
| 0.33      |
+-----------+

 WITH FirstLogin AS (
    SELECT player_id, MIN(event_date) AS first_login_date
    FROM Activity
    GROUP BY player_id
),
NextDayLogin AS (
    SELECT f.player_id
    FROM FirstLogin f
    JOIN Activity a
        ON f.player_id = a.player_id
    WHERE a.event_date = DATE_ADD(f.first_login_date, INTERVAL 1 DAY)
)
SELECT ROUND(COUNT(DISTINCT n.player_id) / COUNT(DISTINCT f.player_id), 2) AS fraction
FROM FirstLogin f
LEFT JOIN NextDayLogin n
    ON f.player_id = n.player_id;

4.4. Sorting and Grouping(排序和分组)

4.5. Advanced Select and Joins(高级选择和连接)

4.6. Subqueries(子查询)

4.7. Advanced String Functions / Regex / Clause

编写一个解决方案,显示每个用户的唯一 ID。如果用户没有唯一 ID,则显示 null

返回结果表格可以按任何顺序。

结果格式如下所示:

示例 1:

输入:
Employees 表:

+----+----------+
| id | name     |
+----+----------+
| 1  | Alice    |
| 7  | Bob      |
| 11 | Meir     |
| 90 | Winston  |
| 3  | Jonathan |
+----+----------+

EmployeeUNI 表: 

+----+-----------+
| id | unique_id |
+----+-----------+
| 3  | 1         |
| 11 | 2         |
| 90 | 3         |
+----+-----------+

输出:

+-----------+----------+
| unique_id | name     |
+-----------+----------+
| null      | Alice    |
| null      | Bob      |
| 2         | Meir     |
| 3         | Winston  |
| 1         | Jonathan |
+-----------+----------+

解释:
Alice 和 Bob 没有唯一 ID,我们将显示 null
Meir 的唯一 ID 是 2。
Winston 的唯一 ID 是 3。
Jonathan 的唯一 ID 是 1。

SELECT e.unique_id, emp.name
FROM Employees emp
LEFT JOIN EmployeeUNI e
ON emp.id = e.id;
posted @ 2025-06-26 17:54  system_T  阅读(35)  评论(0)    收藏  举报