HBase Individual Filter Syntax
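This was previously uploaded to CSDN, so I did not bother reprocessing the watermarks.
HBase Assignment 2
Launch Hadoop and HBase
Before working with filters, start the required environment: Hadoop and HBase. HBase stores its data on HDFS and relies on ZooKeeper for coordination. Use the jps command to check whether the daemons started successfully.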
start-all.sh
start-hbase.sh
hbase shell

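Question: I keep using jps to check whether the environment started successfully. What exactly is jps?
Answer: jps (Java Virtual Machine Process Status Tool) is a command-line tool shipped with the JDK. It lists the Java processes (JVMs) running on the machine that the current user is allowed to see. By default each output line shows the process ID (PID) followed by the short name of the main class or the JAR file name.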
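As a rough sketch of how the jps check could be automated, the snippet below parses jps-style output and reports which expected daemons are missing. The sample output and the list of required daemon names are assumptions for a pseudo-distributed setup in which HBase manages its own ZooKeeper; on other setups the names will differ.

```python
def missing_daemons(jps_output, required):
    """Return the required daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        # Each jps line looks like "<pid> <name>"; skip anything else.
        if len(parts) == 2 and parts[0].isdigit():
            running.add(parts[1].strip())
    return [name for name in required if name not in running]


# Hypothetical jps output; real PIDs and daemons will differ.
SAMPLE = """\
2401 NameNode
2533 DataNode
3120 HMaster
3255 HRegionServer
3410 Jps
"""

# Assumed daemon names, not taken from the original post.
REQUIRED = ["NameNode", "DataNode", "HQuorumPeer", "HMaster", "HRegionServer"]

print(missing_daemons(SAMPLE, REQUIRED))
```

In this made-up sample the ZooKeeper process is absent, so it is the only name reported.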
Launch HBase Shell
Start the HBase Shell. All of the filter operations below are demonstrated in it.
hbase shell
Data Preparation
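Prepare some data for testing the filters.
Create the user_info table with four column families: basic, extend, time_sign, and time_data.
basic: basic user information (name, age, gender, id_card, email)
extend: contact details (wechat, phone)
time_sign: sign-in and sign-out times (in, out)
time_data: operation times (edit, block)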
create 'user_info', 'basic', 'extend', 'time_sign', 'time_data'

Insert the test data.
put 'user_info', 'u781078001', 'basic:name', 'Alice'
put 'user_info', 'u781078001', 'basic:age', '18'
put 'user_info', 'u781078001', 'basic:gender', 'female'
put 'user_info', 'u781078001', 'basic:id_card', '80'
put 'user_info', 'u781078001', 'basic:email', 'alice@alc.com'
put 'user_info', 'u781078001', 'extend:wechat', '9090sa0745a64ghwsrga6'
put 'user_info', 'u781078001', 'extend:phone', '123-456-7890'
put 'user_info', 'u781078001', 'time_sign:in', '2025-01-01 15:00'
put 'user_info', 'u781078001', 'time_sign:out', '2025-01-01 17:50'
put 'user_info', 'u781078001', 'time_data:edit', '2024-01-01 18:50'
put 'user_info', 'u781078001', 'time_data:block', '2024-12-01 11:30'
put 'user_info', 'u781078002', 'basic:name', 'Bob'
put 'user_info', 'u781078002', 'basic:gender', 'male'
put 'user_info', 'u781078002', 'basic:id_card', '90'
put 'user_info', 'u781078002', 'basic:email', 'bob@example.com'
put 'user_info', 'u781078002', 'extend:wechat', '1234567890'
put 'user_info', 'u781078002', 'extend:phone', '987-654-3210'
put 'user_info', 'u781078003', 'basic:name', 'Charlie'
put 'user_info', 'u781078003', 'basic:age', '30'
put 'user_info', 'u781078003', 'basic:id_card', '85'
put 'user_info', 'u781078003', 'basic:email', 'charlie@example.com'
put 'user_info', 'u781078003', 'extend:wechat', '0987654321'
put 'user_info', 'u781078003', 'extend:phone', '111-222-3333'
put 'user_info', 'u781078003', 'time_sign:in', '2023-01-01 15:00'
put 'user_info', 'u781078003', 'time_sign:out', '2023-01-01 17:50'
put 'user_info', 'u781078003', 'time_data:edit', '2023-01-01 18:50'
put 'user_info', 'u781078003', 'time_data:block', '2023-12-01 11:30'
put 'user_info', 'u781078004', 'basic:name', 'Diana'
put 'user_info', 'u781078004', 'basic:age', '22'
put 'user_info', 'u781078004', 'basic:gender', 'female'
put 'user_info', 'u781078004', 'basic:email', 'diana@outlook.com'
put 'user_info', 'u781078004', 'extend:phone', '444-555-6666'
put 'user_info', 'u781078005', 'basic:name', 'Edward'
put 'user_info', 'u781078005', 'basic:age', '28'
put 'user_info', 'u781078005', 'basic:gender', 'male'
put 'user_info', 'u781078005', 'basic:email', 'edward@icloud.com'
put 'user_info', 'u781078005', 'extend:wechat', 'edward_8765'
put 'user_info', 'u781078005', 'extend:phone', '777-888-9999'
put 'user_info', 'u781078005', 'time_sign:in', '2022-01-01 15:00'
put 'user_info', 'u781078005', 'time_data:block', '2022-12-01 11:30'
put 'user_info', 'u781078006', 'basic:age', '24'
put 'user_info', 'u781078006', 'basic:gender', 'female'
put 'user_info', 'u781078006', 'basic:id_card', '55'
put 'user_info', 'u781078006', 'basic:email', 'fiona@yahoo.com'
put 'user_info', 'u781078006', 'extend:wechat', 'fiona_abcde'
put 'user_info', 'u781078006', 'extend:phone', '333-444-5555'
put 'user_info', 'u781078007', 'basic:name', 'George'
put 'user_info', 'u781078007', 'basic:gender', 'male'
put 'user_info', 'u781078007', 'basic:id_card', '40'
put 'user_info', 'u781078007', 'basic:email', 'george@hotmail.com'
put 'user_info', 'u781078007', 'extend:wechat', 'george_xyz12'
put 'user_info', 'u781078007', 'extend:phone', '222-333-4444'
put 'user_info', 'u781078008', 'basic:name', 'Hannah'
put 'user_info', 'u781078008', 'basic:age', '27'
put 'user_info', 'u781078008', 'basic:gender', 'female'
put 'user_info', 'u781078008', 'basic:id_card', '30'
put 'user_info', 'u781078008', 'extend:wechat', 'hannah_54321'
put 'user_info', 'u781078008', 'extend:phone', '111-333-5555'
put 'user_info', 'u781078008', 'time_data:edit', '2012-01-01 18:50'
put 'user_info', 'u781078008', 'time_data:block', '2021-12-01 11:30'
put 'user_info', 'u781078009', 'basic:name', 'Ian'
put 'user_info', 'u781078009', 'basic:age', '31'
put 'user_info', 'u781078009', 'basic:gender', 'male'
put 'user_info', 'u781078009', 'basic:id_card', '25'
put 'user_info', 'u781078009', 'basic:email', 'ian@live.com'
put 'user_info', 'u781078009', 'extend:wechat', 'ian_wechat007'
put 'user_info', 'u781078009', 'extend:phone', '666-777-8888'
put 'user_info', 'u781078010', 'basic:name', 'Jane'
put 'user_info', 'u781078010', 'basic:age', '29'
put 'user_info', 'u781078010', 'basic:gender', 'female'
put 'user_info', 'u781078010', 'basic:id_card', '15'
put 'user_info', 'u781078010', 'basic:email', 'jane@outlook.com'
put 'user_info', 'u781078010', 'extend:wechat', 'jane_smith123'
put 'user_info', 'u781078011', 'basic:name', 'Jack'
put 'user_info', 'u781078011', 'basic:age', '33'
put 'user_info', 'u781078011', 'basic:gender', 'male'
put 'user_info', 'u781078011', 'basic:id_card', '35'
put 'user_info', 'u781078011', 'basic:email', 'jack@gmail.com'
put 'user_info', 'u781078011', 'extend:wechat', 'jack_wechat'
put 'user_info', 'u781078011', 'extend:phone', '555-666-7777'
put 'user_info', 'u781078012', 'basic:name', 'Kelly'
put 'user_info', 'u781078012', 'basic:age', '26'
put 'user_info', 'u781078012', 'basic:gender', 'female'
put 'user_info', 'u781078012', 'basic:id_card', '45'
put 'user_info', 'u781078012', 'basic:email', 'kelly@yahoo.com'
put 'user_info', 'u781078012', 'extend:wechat', 'kelly_123'
put 'user_info', 'u781078012', 'extend:phone', '888-999-0000'
put 'user_info', 'u781078013', 'basic:name', 'Liam'
put 'user_info', 'u781078013', 'extend:phone', '777-888-9999'
put 'user_info', 'u781078014', 'basic:name', 'Mia'
put 'user_info', 'u781078014', 'basic:age', '24'
put 'user_info', 'u781078014', 'basic:gender', 'female'
put 'user_info', 'u781078014', 'basic:id_card', '40'
put 'user_info', 'u781078014', 'basic:email', 'mia@gmail.com'
put 'user_info', 'u781078014', 'extend:wechat', 'mia_123'
put 'user_info', 'u781078014', 'extend:phone', '666-777-8888'
put 'user_info', 'u781078015', 'basic:name', 'Noah'
put 'user_info', 'u781078015', 'basic:age', '30'
put 'user_info', 'u781078015', 'basic:id_card', '55'
put 'user_info', 'u781078015', 'extend:phone', '999-888-7777'

The data set is small, so scan user_info to look at all of it.
scan 'user_info'

Individual Filter Syntax Demo
The original Individual Filter Syntax reference is available at https://hbase.apache.org/book.html#thrift.filter_language.

KeyOnlyFilter
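KeyOnlyFilter returns only the key part of each cell (row key, column family, qualifier, timestamp) and leaves the value empty. It is a quick way to see which rows and columns exist.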
scan 'user_info', FILTER=>"KeyOnlyFilter()"

FirstKeyOnlyFilter
FirstKeyOnlyFilter returns only the first cell (key-value) of each row.
scan 'user_info', FILTER=>"FirstKeyOnlyFilter()"

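How should this output be read? To make it easier to follow, sketch a simple conceptual table (the data is not actually stored this way, since HBase is not a relational database) and compare it with the scan output. The difference then becomes clear.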
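The difference between KeyOnlyFilter and FirstKeyOnlyFilter can also be modelled in a few lines of Python. This is only a sketch of the semantics, not HBase code: cells are simplified to (row, family, qualifier, value) tuples, timestamps are left out, and only a few cells from the test data are used.

```python
# A few cells in HBase sort order: (row, family, qualifier, value).
CELLS = [
    ("u781078013", "basic", "name", "Liam"),
    ("u781078013", "extend", "phone", "777-888-9999"),
    ("u781078014", "basic", "age", "24"),
    ("u781078014", "basic", "name", "Mia"),
]


def key_only(cells):
    """Model of KeyOnlyFilter: keep every cell but blank out its value."""
    return [(row, fam, qual, "") for row, fam, qual, _ in cells]


def first_key_only(cells):
    """Model of FirstKeyOnlyFilter: keep only the first cell of each row."""
    seen, out = set(), []
    for cell in cells:
        if cell[0] not in seen:
            seen.add(cell[0])
            out.append(cell)
    return out


print(key_only(CELLS))
print(first_key_only(CELLS))
```

KeyOnlyFilter keeps all four cells without values, while FirstKeyOnlyFilter keeps one cell per row with its value.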

PrefixFilter
PrefixFilter filters by row key prefix.
scan 'user_info', FILTER=>"PrefixFilter('u781078001')"

This one is easy to read: the prefix u781078001 matches exactly one row.

scan 'user_info', FILTER=>"PrefixFilter('u78107801')"

The prefix u78107801 matches rows u781078010 through u781078015.
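The two PrefixFilter results can be checked with a small Python model. It is a sketch of the matching rule only; the row keys are rebuilt from the test data and the function name is made up for illustration.

```python
# The fifteen row keys from the test data: u781078001 .. u781078015.
ROW_KEYS = [f"u7810780{n:02d}" for n in range(1, 16)]


def prefix_filter(row_keys, prefix):
    """Model of PrefixFilter: keep row keys that start with the prefix."""
    return [key for key in row_keys if key.startswith(prefix)]


print(prefix_filter(ROW_KEYS, "u781078001"))
print(prefix_filter(ROW_KEYS, "u78107801"))
```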

ColumnPrefixFilter
ColumnPrefixFilter filters by column name (qualifier) prefix.
scan 'user_info', FILTER=>"ColumnPrefixFilter('name')"


scan 'user_info', FILTER=>"ColumnPrefixFilter('edit')"


MultipleColumnPrefixFilter
MultipleColumnPrefixFilter matches several column name prefixes at once.
scan 'user_info', FILTER=>"MultipleColumnPrefixFilter('name', 'edit')"
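Both column prefix filters apply the same rule to the qualifier, regardless of column family. The following Python sketch models that rule on the qualifiers used in the test data; the helper name is hypothetical.

```python
# Qualifiers used in the test data, across all four column families.
QUALIFIERS = ["age", "email", "gender", "id_card", "name",
              "phone", "wechat", "block", "edit", "in", "out"]


def multiple_column_prefix(qualifiers, *prefixes):
    """Keep qualifiers that start with any of the given prefixes.

    With a single prefix this behaves like ColumnPrefixFilter.
    """
    return [q for q in qualifiers if q.startswith(prefixes)]


print(multiple_column_prefix(QUALIFIERS, "name"))
print(multiple_column_prefix(QUALIFIERS, "name", "edit"))
print(multiple_column_prefix(QUALIFIERS, "e"))
```

A shorter prefix such as 'e' would match both email and edit.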

ColumnCountGetFilter
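ColumnCountGetFilter returns at most the first N columns of a row. It is meant for Get operations and is not suitable for Scan.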
get 'user_info', 'u781078001', FILTER=>"ColumnCountGetFilter(3)"


PageFilter
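PageFilter limits the number of rows returned. The limit is applied on each region server separately, so a scan that spans several regions can return more rows than the page size.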
scan 'user_info', FILTER=>"PageFilter(2)"


ColumnPaginationFilter
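ColumnPaginationFilter pages through the columns of each row. Its parameters are (limit, offset): the number of columns to return per row, and the number of columns to skip first.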
scan 'user_info', FILTER=>"ColumnPaginationFilter(2, 1)"
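To see what ColumnCountGetFilter(3) and ColumnPaginationFilter(2, 1) select, it helps to list the columns of one row in sorted order. The sketch below does this for row u781078001, assuming columns are visited in order of column family and then qualifier; it models the slicing only and is not HBase code.

```python
# Columns of row u781078001, sorted by family and then qualifier.
ROW_001 = [
    "basic:age", "basic:email", "basic:gender", "basic:id_card", "basic:name",
    "extend:phone", "extend:wechat",
    "time_data:block", "time_data:edit",
    "time_sign:in", "time_sign:out",
]


def column_count_get(columns, limit):
    """Model of ColumnCountGetFilter: the first `limit` columns of the row."""
    return columns[:limit]


def column_pagination(columns, limit, offset):
    """Model of ColumnPaginationFilter: skip `offset`, then take `limit`."""
    return columns[offset:offset + limit]


print(column_count_get(ROW_001, 3))
print(column_pagination(ROW_001, 2, 1))
```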


InclusiveStopFilter
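InclusiveStopFilter takes the row key at which the scan stops and includes that row in the results, whereas a plain STOPROW is exclusive by default. It is usually combined with STARTROW. In the command below, the exclusive STOPROW 'u781078004' would already end the scan after u781078003, so the filter alone gives the same result.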
scan 'user_info', STARTROW=>'u781078002', STOPROW=>'u781078004', FILTER=>"InclusiveStopFilter('u781078003')"
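The difference between an exclusive stop row and InclusiveStopFilter can be sketched in Python as follows. The function and its parameters are illustrative only; the row keys are the ones from the test data.

```python
ROW_KEYS = [f"u7810780{n:02d}" for n in range(1, 16)]


def scan_rows(row_keys, start_row, stop_row, inclusive_stop=False):
    """Return keys from start_row up to stop_row.

    inclusive_stop=False models a plain STOPROW (exclusive);
    inclusive_stop=True models InclusiveStopFilter(stop_row).
    """
    out = []
    for key in sorted(row_keys):
        if key < start_row:
            continue
        if key > stop_row or (key == stop_row and not inclusive_stop):
            break
        out.append(key)
    return out


print(scan_rows(ROW_KEYS, "u781078002", "u781078003"))
print(scan_rows(ROW_KEYS, "u781078002", "u781078003", inclusive_stop=True))
```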

TimestampsFilter
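TimestampsFilter returns only the cells whose timestamp exactly equals one of the listed timestamps. The arguments are a list of timestamps, not a range.
Cell timestamps are in epoch milliseconds; any online converter will turn them into a readable date. The two values below come from this particular table, so substitute timestamps that appear in your own scan output.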


scan 'user_info', FILTER=>"TimestampsFilter(1742050891456, 1742050892034)"
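Because TimestampsFilter matches exact values, a cell whose timestamp lies between the two arguments is not returned. The short Python model below illustrates this; the cells and their timestamps are hypothetical, chosen only so that one of them falls between the two listed values.

```python
# Hypothetical cells as (column, timestamp in epoch milliseconds).
CELLS = [
    ("basic:name", 1742050891456),
    ("basic:age", 1742050891800),   # between the two listed timestamps
    ("basic:gender", 1742050892034),
]


def timestamps_filter(cells, *timestamps):
    """Model of TimestampsFilter: exact membership in the listed timestamps."""
    wanted = set(timestamps)
    return [cell for cell in cells if cell[1] in wanted]


print(timestamps_filter(CELLS, 1742050891456, 1742050892034))
```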

RowFilter
RowFilter filters row keys using a comparison operator and a comparator; several comparators are supported.
scan 'user_info', FILTER=>"RowFilter(=,'substring:001')"

FamilyFilter
FamilyFilter filters by column family name using a comparison operator and a comparator.
scan 'user_info', FILTER=>"FamilyFilter(=, 'substring:time')"

QualifierFilter
QualifierFilter filters column names (qualifiers) using a comparison operator and a comparator.
scan 'user_info', FILTER=>"QualifierFilter(=, 'binary:block')"

ValueFilter
ValueFilter filters on cell values; several comparators are supported.
scan 'user_info', FILTER=>"ValueFilter(=, 'substring:Alice')"
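RowFilter, FamilyFilter, QualifierFilter and ValueFilter share the same 'type:value' comparator strings and differ only in which part of the cell they test. The Python sketch below models three comparator types (binary, binaryprefix, substring) under simplifying assumptions; the function name is made up and regexstring is left out. Note that the substring comparator ignores case, so 'substring:Alice' also matches the email value alice@alc.com.

```python
import operator

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def matches(op, comparator, candidate):
    """Apply a 'type:value' comparator string to a candidate string."""
    kind, _, arg = comparator.partition(":")
    raw, target = candidate.encode(), arg.encode()
    if kind == "binary":
        # Byte-wise (lexicographic) comparison of the whole value.
        return OPS[op](raw, target)
    if kind == "binaryprefix":
        # Compare only the leading bytes, up to the length of the argument.
        return OPS[op](raw[:len(target)], target)
    if kind == "substring":
        # Case-insensitive containment; only = and != make sense here.
        if op not in ("=", "!="):
            raise ValueError("substring supports only = and !=")
        found = arg.lower() in candidate.lower()
        return found if op == "=" else not found
    raise ValueError(f"unsupported comparator type: {kind}")


print(matches("=", "substring:001", "u781078001"))
print(matches("=", "substring:time", "time_sign"))
print(matches("=", "binary:block", "block"))
print(matches("=", "substring:Alice", "alice@alc.com"))
```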

DependentColumnFilter
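DependentColumnFilter takes a reference column (family and qualifier) and returns the cells in each row whose timestamp matches that of the reference column; rows without the reference column are dropped. The optional arguments are a boolean that removes the reference column itself from the results, followed by a comparison operator and comparator applied to the reference column's value. Cells written by separate put calls normally carry different timestamps, so with this test data the scans usually return only the reference column.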
scan 'user_info', FILTER=>"DependentColumnFilter('basic', 'name')"

scan 'user_info', FILTER=>"DependentColumnFilter('basic', 'email')"

scan 'user_info', FILTER=>"DependentColumnFilter('basic', 'name', false, =, 'substring:Alice')"

scan 'user_info', FILTER=>"DependentColumnFilter('basic', 'age', false, >, 'binary:20')"
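One caveat about 'binary:20' with the > operator: the binary comparator compares bytes lexicographically, not numerically. With the two-digit ages in this table the result happens to agree with numeric order, but that is a coincidence of equal string lengths. The following Python sketch, which models only the byte comparison, shows where the two orders diverge; the values '9' and '100' are hypothetical and do not appear in the test data.

```python
def binary_greater(value, threshold):
    """Byte-wise 'greater than', as a binary comparator would see it."""
    return value.encode() > threshold.encode()


for age in ["18", "30", "9", "100"]:
    print(age, binary_greater(age, "20"), int(age) > 20)
```

A common workaround is to store numbers zero-padded to a fixed width so that byte order and numeric order agree.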

SingleColumnValueFilter
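SingleColumnValueFilter tests the value of one column and returns the whole row when the test passes. The two trailing booleans are filterIfColumnMissing (drop rows that lack the column) and the latest-version flag (test only the latest version of the cell). The first command below returns only rows whose basic:age is 30; the second also returns rows that have no basic:age column.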
scan 'user_info', FILTER=>"SingleColumnValueFilter('basic', 'age', =, 'binary:30', true, true)"

scan 'user_info', FILTER=>"SingleColumnValueFilter('basic', 'age', =, 'binary:30', false, true)"
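The effect of the filterIfColumnMissing flag can be checked against the test data with a small Python model. It is a sketch of the row selection only: the dictionary holds each row's basic:age value (None where the column was never written), and the function name is hypothetical.

```python
# basic:age per row in the test data; None means the column is missing.
AGES = {
    "u781078001": "18", "u781078002": None, "u781078003": "30",
    "u781078004": "22", "u781078005": "28", "u781078006": "24",
    "u781078007": None, "u781078008": "27", "u781078009": "31",
    "u781078010": "29", "u781078011": "33", "u781078012": "26",
    "u781078013": None, "u781078014": "24", "u781078015": "30",
}


def single_column_value(rows, expected, filter_if_missing):
    """Return row keys that pass an equality test on one column."""
    out = []
    for key, value in sorted(rows.items()):
        if value is None:
            # Rows without the column pass unless filter_if_missing is set.
            if not filter_if_missing:
                out.append(key)
        elif value == expected:
            out.append(key)
    return out


print(single_column_value(AGES, "30", filter_if_missing=True))
print(single_column_value(AGES, "30", filter_if_missing=False))
```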

SingleColumnValueExcludeFilter
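SingleColumnValueExcludeFilter takes the same arguments as SingleColumnValueFilter, but the tested column itself is left out of the returned rows.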
scan 'user_info', FILTER=>"SingleColumnValueExcludeFilter('basic', 'name', =, 'binary:Alice', true, true)"

scan 'user_info', FILTER=>"SingleColumnValueExcludeFilter('basic', 'name', =, 'binary:Alice', false, true)"

ColumnRangeFilter
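ColumnRangeFilter filters by a range of column names (qualifiers); the boolean after each bound controls whether that bound is included. The comparison is lexicographic and applies to every column family, so time_data:block and time_data:edit also fall between age and email.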
scan 'user_info', FILTER=>"ColumnRangeFilter('age', true, 'email', true)"

scan 'user_info', FILTER=>"ColumnRangeFilter('age', true, 'email', false)"
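The two ColumnRangeFilter commands can be modelled in Python as a lexicographic range check on the qualifiers. This is only a sketch with a made-up helper name; it uses the qualifiers from the test data and ignores column families, as the filter does.

```python
QUALIFIERS = ["age", "email", "gender", "id_card", "name",
              "phone", "wechat", "block", "edit", "in", "out"]


def column_range(qualifiers, min_col, min_inclusive, max_col, max_inclusive):
    """Keep qualifiers that sort between min_col and max_col."""
    out = []
    for q in sorted(qualifiers):
        above = q > min_col or (min_inclusive and q == min_col)
        below = q < max_col or (max_inclusive and q == max_col)
        if above and below:
            out.append(q)
    return out


print(column_range(QUALIFIERS, "age", True, "email", True))
print(column_range(QUALIFIERS, "age", True, "email", False))
```

The only difference between the two calls is whether email itself is returned.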

Clear Data
disable 'user_info'
is_enabled 'user_info'
drop 'user_info'
