HBase Individual Filter Syntax

HBase Assignment 2

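Launch Hadoop and HBase

Before working with filters, start the required environment: Hadoop, HBase, and so on. HBase relies on HDFS to store its data and on ZooKeeper for coordination. You can check whether everything started successfully with the jps command.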

start-all.sh
start-hbase.sh
hbase shell

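Question: I keep using jps to check whether the environment started. What exactly is jps?
Answer: jps (Java Virtual Machine Process Status Tool) is a command-line tool shipped with the JDK. It lists the Java processes currently running on the machine that the current user can see. Running jps prints each Java process's process ID (PID) and its main class name (or JAR file name).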
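
The check described above can be scripted. The following is a minimal sketch, not part of the original setup: the helper name missing_daemons, the sample output, and the REQUIRED set are all assumptions for a pseudo-distributed installation, and the actual daemon list depends on how your cluster is configured.

```python
# Hypothetical helper: checks captured `jps` output for expected daemons.
# REQUIRED and the sample text are assumptions; adjust them to your cluster.
REQUIRED = {"NameNode", "DataNode", "HMaster", "HRegionServer"}


def missing_daemons(jps_output, required=REQUIRED):
    """Return the required daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # each line looks like "<pid> <main class or jar>"
            running.add(parts[1])
    return sorted(set(required) - running)


# Made-up sample; in practice you would capture the output of `jps`.
sample = """\
2305 NameNode
2466 DataNode
3120 HMaster
3391 Jps
"""

print(missing_daemons(sample))
```

An empty list means every expected daemon was found in the output.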

Launch HBase Shell

Start the HBase Shell. All the filter operations that follow are demonstrated in the shell.

hbase shell

Data Preparation

Prepare some data for testing the filters.

Create the user_info table with four column families: basic, extend, time_sign, and time_data.

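  • basic: basic user information (name, age, gender; the test data also stores id_card and email here)
  • extend: contact details (wechat, phone)
  • time_sign: sign-in and sign-out times (in, out)
  • time_data: operation times (edit, block)

create 'user_info', 'basic', 'extend', 'time_sign', 'time_data'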

Insert the test data:

put 'user_info', 'u781078001', 'basic:name','Alice'
put 'user_info', 'u781078001', 'basic:age','18'
put 'user_info', 'u781078001', 'basic:gender','female'
put 'user_info', 'u781078001', 'basic:id_card','80'
put 'user_info', 'u781078001', 'basic:email', 'alice@alc.com'
put 'user_info', 'u781078001', 'extend:wechat','9090sa0745a64ghwsrga6'
put 'user_info', 'u781078001', 'extend:phone', '123-456-7890'
put 'user_info', 'u781078001', 'time_sign:in', '2025-01-01 15:00'
put 'user_info', 'u781078001', 'time_sign:out', '2025-01-01 17:50'
put 'user_info', 'u781078001', 'time_data:edit', '2024-01-01 18:50'
put 'user_info', 'u781078001', 'time_data:block', '2024-12-01 11:30'

put 'user_info', 'u781078002', 'basic:name', 'Bob'
put 'user_info', 'u781078002', 'basic:gender', 'male'
put 'user_info', 'u781078002', 'basic:id_card', '90'
put 'user_info', 'u781078002', 'basic:email', 'bob@example.com'
put 'user_info', 'u781078002', 'extend:wechat', '1234567890'
put 'user_info', 'u781078002', 'extend:phone', '987-654-3210'

put 'user_info', 'u781078003', 'basic:name', 'Charlie'
put 'user_info', 'u781078003', 'basic:age', '30'
put 'user_info', 'u781078003', 'basic:id_card', '85'
put 'user_info', 'u781078003', 'basic:email', 'charlie@example.com'
put 'user_info', 'u781078003', 'extend:wechat', '0987654321'
put 'user_info', 'u781078003', 'extend:phone', '111-222-3333'
put 'user_info', 'u781078003', 'time_sign:in', '2023-01-01 15:00'
put 'user_info', 'u781078003', 'time_sign:out', '2023-01-01 17:50'
put 'user_info', 'u781078003', 'time_data:edit', '2023-01-01 18:50'
put 'user_info', 'u781078003', 'time_data:block', '2023-12-01 11:30'

put 'user_info', 'u781078004', 'basic:name', 'Diana'
put 'user_info', 'u781078004', 'basic:age', '22'
put 'user_info', 'u781078004', 'basic:gender', 'female'
put 'user_info', 'u781078004', 'basic:email', 'diana@outlook.com'
put 'user_info', 'u781078004', 'extend:phone', '444-555-6666'

put 'user_info', 'u781078005', 'basic:name', 'Edward'
put 'user_info', 'u781078005', 'basic:age', '28'
put 'user_info', 'u781078005', 'basic:gender', 'male'
put 'user_info', 'u781078005', 'basic:email', 'edward@icloud.com'
put 'user_info', 'u781078005', 'extend:wechat', 'edward_8765'
put 'user_info', 'u781078005', 'extend:phone', '777-888-9999'
put 'user_info', 'u781078005', 'time_sign:in', '2022-01-01 15:00'
put 'user_info', 'u781078005', 'time_data:block', '2022-12-01 11:30'

put 'user_info', 'u781078006', 'basic:age', '24'
put 'user_info', 'u781078006', 'basic:gender', 'female'
put 'user_info', 'u781078006', 'basic:id_card', '55'
put 'user_info', 'u781078006', 'basic:email', 'fiona@yahoo.com'
put 'user_info', 'u781078006', 'extend:wechat', 'fiona_abcde'
put 'user_info', 'u781078006', 'extend:phone', '333-444-5555'

put 'user_info', 'u781078007', 'basic:name', 'George'
put 'user_info', 'u781078007', 'basic:gender', 'male'
put 'user_info', 'u781078007', 'basic:id_card', '40'
put 'user_info', 'u781078007', 'basic:email', 'george@hotmail.com'
put 'user_info', 'u781078007', 'extend:wechat', 'george_xyz12'
put 'user_info', 'u781078007', 'extend:phone', '222-333-4444'

put 'user_info', 'u781078008', 'basic:name', 'Hannah'
put 'user_info', 'u781078008', 'basic:age', '27'
put 'user_info', 'u781078008', 'basic:gender', 'female'
put 'user_info', 'u781078008', 'basic:id_card', '30'
put 'user_info', 'u781078008', 'extend:wechat', 'hannah_54321'
put 'user_info', 'u781078008', 'extend:phone', '111-333-5555'
put 'user_info', 'u781078008', 'time_data:edit', '2012-01-01 18:50'
put 'user_info', 'u781078008', 'time_data:block', '2021-12-01 11:30'

put 'user_info', 'u781078009', 'basic:name', 'Ian'
put 'user_info', 'u781078009', 'basic:age', '31'
put 'user_info', 'u781078009', 'basic:gender', 'male'
put 'user_info', 'u781078009', 'basic:id_card', '25'
put 'user_info', 'u781078009', 'basic:email', 'ian@live.com'
put 'user_info', 'u781078009', 'extend:wechat', 'ian_wechat007'
put 'user_info', 'u781078009', 'extend:phone', '666-777-8888'

put 'user_info', 'u781078010', 'basic:name', 'Jane'
put 'user_info', 'u781078010', 'basic:age', '29'
put 'user_info', 'u781078010', 'basic:gender', 'female'
put 'user_info', 'u781078010', 'basic:id_card', '15'
put 'user_info', 'u781078010', 'basic:email', 'jane@outlook.com'
put 'user_info', 'u781078010', 'extend:wechat', 'jane_smith123'

put 'user_info', 'u781078011', 'basic:name', 'Jack'
put 'user_info', 'u781078011', 'basic:age', '33'
put 'user_info', 'u781078011', 'basic:gender', 'male'
put 'user_info', 'u781078011', 'basic:id_card', '35'
put 'user_info', 'u781078011', 'basic:email', 'jack@gmail.com'
put 'user_info', 'u781078011', 'extend:wechat', 'jack_wechat'
put 'user_info', 'u781078011', 'extend:phone', '555-666-7777'

put 'user_info', 'u781078012', 'basic:name', 'Kelly'
put 'user_info', 'u781078012', 'basic:age', '26'
put 'user_info', 'u781078012', 'basic:gender', 'female'
put 'user_info', 'u781078012', 'basic:id_card', '45'
put 'user_info', 'u781078012', 'basic:email', 'kelly@yahoo.com'
put 'user_info', 'u781078012', 'extend:wechat', 'kelly_123'
put 'user_info', 'u781078012', 'extend:phone', '888-999-0000'

put 'user_info', 'u781078013', 'basic:name', 'Liam'
put 'user_info', 'u781078013', 'extend:phone', '777-888-9999'

put 'user_info', 'u781078014', 'basic:name', 'Mia'
put 'user_info', 'u781078014', 'basic:age', '24'
put 'user_info', 'u781078014', 'basic:gender', 'female'
put 'user_info', 'u781078014', 'basic:id_card', '40'
put 'user_info', 'u781078014', 'basic:email', 'mia@gmail.com'
put 'user_info', 'u781078014', 'extend:wechat', 'mia_123'
put 'user_info', 'u781078014', 'extend:phone', '666-777-8888'

put 'user_info', 'u781078015', 'basic:name', 'Noah'
put 'user_info', 'u781078015', 'basic:age', '30'
put 'user_info', 'u781078015', 'basic:id_card', '55'
put 'user_info', 'u781078015', 'extend:phone', '999-888-7777'

The data set is small, so scan user_info to see everything in it:

scan 'user_info'

Individual Filter Syntax Demo

The original Individual Filter Syntax reference is available at https://hbase.apache.org/book.html#thrift.filter_language.

KeyOnlyFilter

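KeyOnlyFilter returns only the key part of each cell (row key, column family, qualifier, timestamp) and leaves the value empty. It is a quick way to see which row keys and columns exist.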

scan 'user_info', FILTER=>"KeyOnlyFilter()"

FirstKeyOnlyFilter

FirstKeyOnlyFilter returns only the first cell (key-value) of each row.

scan 'user_info', FILTER=>"FirstKeyOnlyFilter()"

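How should this output be read? It helps to picture the data as a simple conceptual table, with one row per row key and one column per family:qualifier pair. HBase does not actually store data this way, since it is not a relational database, but comparing the filter output against that picture makes it clear what each filter keeps.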
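
To make the difference between the two filters concrete, here is a plain-Python model of their behaviour. It is only a sketch of the semantics on a toy in-memory table, not HBase code; the function names and the reduced data set are assumptions made for illustration.

```python
# Toy in-memory table: row key -> {(family, qualifier): value}.
# A reduced, illustrative subset of user_info.
table = {
    "u781078001": {("basic", "name"): "Alice", ("basic", "age"): "18",
                   ("extend", "phone"): "123-456-7890"},
    "u781078002": {("basic", "name"): "Bob",
                   ("extend", "phone"): "987-654-3210"},
}


def scan(table):
    """Return (row, family, qualifier, value) tuples in sorted order."""
    cells = []
    for row in sorted(table):
        for family, qualifier in sorted(table[row]):
            cells.append((row, family, qualifier, table[row][(family, qualifier)]))
    return cells


def key_only(cells):
    """Model of KeyOnlyFilter: keep every cell but blank out its value."""
    return [(r, f, q, "") for r, f, q, _ in cells]


def first_key_only(cells):
    """Model of FirstKeyOnlyFilter: keep only the first cell of each row."""
    seen, out = set(), []
    for r, f, q, v in cells:
        if r not in seen:
            seen.add(r)
            out.append((r, f, q, v))
    return out


print(key_only(scan(table)))
print(first_key_only(scan(table)))
```

Because columns are sorted, the first cell of u781078001 in this model is basic:age rather than basic:name.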

PrefixFilter

PrefixFilter filters rows by row-key prefix.

scan 'user_info', FILTER=>"PrefixFilter('u781078001')"

One way to read this: the prefix u781078001 matches exactly one row.

scan 'user_info', FILTER=>"PrefixFilter('u78107801')"

The prefix u78107801 matches rows u781078010 through u781078015.
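
The two results above follow directly from string prefix matching on the row key. A minimal Python sketch of that logic, assuming the fifteen row keys from the test data (the helper name prefix_filter is hypothetical):

```python
# Row keys u781078001 .. u781078015, as inserted in the test data.
row_keys = [f"u7810780{i:02d}" for i in range(1, 16)]


def prefix_filter(keys, prefix):
    """Model of PrefixFilter: keep row keys that start with the prefix."""
    return [k for k in keys if k.startswith(prefix)]


print(prefix_filter(row_keys, "u781078001"))
print(prefix_filter(row_keys, "u78107801"))
```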

ColumnPrefixFilter

ColumnPrefixFilter filters by column name (qualifier) prefix.

scan 'user_info', FILTER=>"ColumnPrefixFilter('name')"

scan 'user_info', FILTER=>"ColumnPrefixFilter('edit')"

MultipleColumnPrefixFilter

MultipleColumnPrefixFilter matches several column-name prefixes at once.

scan 'user_info', FILTER=>"MultipleColumnPrefixFilter('name', 'edit')"
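
Both column-prefix filters look only at the qualifier, so a prefix can match columns in more than one column family. The sketch below models that in plain Python; the column list is a reduced subset of the test data and the function name is an assumption.

```python
# (family, qualifier) pairs taken from one row of the test data.
columns = [
    ("basic", "age"), ("basic", "email"), ("basic", "name"),
    ("time_data", "block"), ("time_data", "edit"),
]


def multiple_column_prefix(columns, *prefixes):
    """Keep columns whose qualifier starts with any of the prefixes."""
    return [(f, q) for f, q in columns
            if any(q.startswith(p) for p in prefixes)]


# One prefix behaves like ColumnPrefixFilter.
print(multiple_column_prefix(columns, "name"))
# Several prefixes behave like MultipleColumnPrefixFilter.
print(multiple_column_prefix(columns, "name", "edit"))
# A short prefix can hit different families: basic:email and time_data:edit.
print(multiple_column_prefix(columns, "e"))
```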

ColumnCountGetFilter

ColumnCountGetFilter returns only the first N columns of a row (intended for Get operations only).

get 'user_info', 'u781078001', FILTER=>"ColumnCountGetFilter(3)"

PageFilter

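PageFilter limits the number of rows returned. The limit is applied separately on each region server, so a scan that spans several of them can return more rows than the page size.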

scan 'user_info', FILTER=>"PageFilter(2)"
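
On a small single-region table like this one, PageFilter(2) simply returns the first two rows. The HBase documentation notes that the filter is applied separately on different region servers, so the total can exceed the page size on a larger table. The following Python model illustrates that caveat; the split of rows across two servers is invented for the example.

```python
def page_filter_scan(servers, page_size):
    """Each server stops after page_size rows; the results are concatenated."""
    results = []
    for server_rows in servers:
        results.extend(sorted(server_rows)[:page_size])
    return results


# Hypothetical layouts: one server versus two servers holding the rows.
single = [["u781078001", "u781078002", "u781078003"]]
split = [["u781078001", "u781078002", "u781078003"],
         ["u781078010", "u781078011", "u781078012"]]

print(page_filter_scan(single, 2))      # two rows
print(page_filter_scan(split, 2))       # four rows: two per server
print(page_filter_scan(split, 2)[:2])   # the client enforces the real limit
```

If an exact page size matters, the client still has to truncate the result itself.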

ColumnPaginationFilter

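ColumnPaginationFilter pages through the columns of each row. Its arguments are (columns per page, number of columns to skip).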

scan 'user_info', FILTER=>"ColumnPaginationFilter(2, 1)"
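
The (limit, offset) arguments can be pictured as a slice over the sorted columns of a row. The sketch below models this in Python for the basic family of row u781078001 only; restricting the model to one family is a simplifying assumption, and the function name is hypothetical.

```python
# Qualifiers of basic:* for row u781078001, as inserted in the test data.
qualifiers = ["name", "age", "gender", "id_card", "email"]


def column_pagination(qualifiers, limit, offset):
    """Skip `offset` columns in sorted order, then return up to `limit`."""
    ordered = sorted(qualifiers)  # HBase returns columns in sorted order
    return ordered[offset:offset + limit]


print(column_pagination(qualifiers, 2, 1))
```

Sorted, the qualifiers are age, email, gender, id_card, name, so skipping one and taking two yields email and gender.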

InclusiveStopFilter

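InclusiveStopFilter stops the scan at the given row and includes that row in the result, whereas a plain STOPROW is excluded by default. The example combines it with STARTROW and STOPROW.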

scan 'user_info', STARTROW=>'u781078002', STOPROW=>'u781078004', FILTER=>"InclusiveStopFilter('u781078003')"
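
The difference between an exclusive stop row and an inclusive one is easy to see in a small model. This is a plain-Python sketch of the range logic only; the function name and the five-row key list are assumptions for illustration.

```python
keys = [f"u7810780{i:02d}" for i in range(1, 6)]  # u781078001 .. u781078005


def scan_range(keys, start, stop, inclusive_stop=False):
    """Return keys from start (inclusive) up to stop (exclusive by default)."""
    out = []
    for k in sorted(keys):
        if k < start:
            continue
        if k > stop or (k == stop and not inclusive_stop):
            break
        out.append(k)
    return out


# Plain STARTROW/STOPROW: the stop row u781078004 is left out.
print(scan_range(keys, "u781078002", "u781078004"))
# Inclusive stop at u781078003: the stop row itself is returned.
print(scan_range(keys, "u781078002", "u781078003", inclusive_stop=True))
```

In this model both calls return u781078002 and u781078003.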

TimestampsFilter

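TimestampsFilter takes a list of timestamps and returns only the cells whose timestamp exactly equals one of them. It does not select a range; for a range, the scan's TIMERANGE option is the usual choice.

Timestamps are in epoch milliseconds. Any online converter will translate them to and from a readable date.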

scan 'user_info', FILTER=>"TimestampsFilter(1742050891456, 1742050892034)"
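
Note that the two arguments are treated as a list of exact timestamps, not as the ends of an interval. The Python sketch below contrasts exact matching with a half-open time range; the cell timestamps are made up around the values used in the command, and both function names are hypothetical.

```python
# (column, timestamp in epoch milliseconds); the middle value is invented.
cells = [
    ("basic:name", 1742050891456),
    ("basic:age", 1742050891700),
    ("basic:email", 1742050892034),
]


def timestamps_filter(cells, *timestamps):
    """Model of TimestampsFilter: keep cells whose timestamp is in the list."""
    wanted = set(timestamps)
    return [c for c in cells if c[1] in wanted]


def time_range(cells, min_ts, max_ts):
    """Model of a time range: min_ts inclusive, max_ts exclusive."""
    return [c for c in cells if min_ts <= c[1] < max_ts]


print(timestamps_filter(cells, 1742050891456, 1742050892034))
print(time_range(cells, 1742050891456, 1742050892034))
```

The exact-match version skips basic:age, while the range version includes it and drops the cell sitting on the upper bound.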

RowFilter

RowFilter filters on the row key using a comparison operator and a comparator. Several comparators are supported.

scan 'user_info', FILTER=>"RowFilter(=,'substring:001')"

FamilyFilter

FamilyFilter filters by column family name using a comparison operator and a comparator.

scan 'user_info', FILTER=>"FamilyFilter(=, 'substring:time')"

QualifierFilter

QualifierFilter filters on the column name (qualifier) using a comparison operator and a comparator.

scan 'user_info',FILTER=>"QualifierFilter(=,'binary:block')"

ValueFilter

ValueFilter filters on cell values. Several comparators are supported.

scan 'user_info', FILTER=>"ValueFilter(=, 'substring:Alice')"
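
RowFilter, FamilyFilter, QualifierFilter, and ValueFilter share the same operator-plus-comparator syntax and differ only in which part of the cell they test. The following Python sketch models the two comparators used above: binary compares bytes lexicographically, and substring is a case-insensitive containment test that only makes sense with = and !=. The function name matches is an assumption, and other comparators such as binaryprefix and regexstring are left out.

```python
import operator

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def matches(op, comparator, value):
    """Evaluate `value op comparator` for 'binary:' and 'substring:'."""
    kind, _, arg = comparator.partition(":")
    if kind == "substring":
        found = arg.lower() in value.lower()  # case-insensitive containment
        if op == "=":
            return found
        if op == "!=":
            return not found
        raise ValueError("substring supports only = and !=")
    if kind == "binary":
        return OPS[op](value.encode(), arg.encode())  # byte-wise comparison
    raise ValueError(f"comparator not modelled: {kind}")


print(matches("=", "substring:001", "u781078001"))  # RowFilter example
print(matches("=", "substring:time", "time_sign"))  # FamilyFilter example
print(matches("=", "binary:block", "block"))        # QualifierFilter example
print(matches("=", "substring:Alice", "Alice"))     # ValueFilter example
```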

DependentColumnFilter

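DependentColumnFilter uses a reference column as the condition for filtering the other columns. It returns the cells of a row whose timestamp matches that of the reference column, optionally after also testing the reference column's value. Rows that lack the reference column are dropped entirely. The optional third argument controls whether the reference column itself is dropped from the result.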

scan 'user_info', FILTER=>"DependentColumnFilter('basic', 'name')"

scan 'user_info', FILTER=>"DependentColumnFilter('basic', 'email')"

scan 'user_info', FILTER=>"DependentColumnFilter('basic', 'name', false, =, 'substring:Alice')"

scan 'user_info', FILTER=>"DependentColumnFilter('basic', 'age', false, >, 'binary:20')"
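
One caveat about the last command: the binary comparator compares raw bytes, not numbers. The ages in the test data are all two-digit strings, so > 'binary:20' behaves as expected, but that would stop being true with one-digit or three-digit values. A short Python sketch of the pitfall, using made-up age values:

```python
ages = ["18", "30", "9", "100"]  # illustrative values, stored as strings

# What a byte-wise comparison against '20' keeps.
lexicographic = [a for a in ages if a.encode() > b"20"]

# What a numeric comparison would keep.
numeric = [a for a in ages if int(a) > 20]

print(lexicographic)
print(numeric)
```

Byte-wise, "9" sorts after "20" and "100" sorts before it, which is the opposite of the numeric order. Zero-padding stored values to a fixed width is one common way to keep the two orders consistent.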

SingleColumnValueFilter

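SingleColumnValueFilter filters rows by the value of a given column; a matching row is returned with all of its columns. The last two arguments are filterIfMissing (whether to drop rows that do not have the column) and latestVersionOnly (whether to test only the latest version of the column).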

scan 'user_info', FILTER=>"SingleColumnValueFilter('basic', 'age', =, 'binary:30', true, true)"

scan 'user_info', FILTER=>"SingleColumnValueFilter('basic', 'age', =, 'binary:30', false, true)"
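
The two commands differ only in the fifth argument, which decides what happens to rows that have no basic:age column at all. The Python model below sketches that behaviour for an equality test; the three-row table is a reduced subset of the test data and the function name is an assumption.

```python
rows = {
    "u781078002": {"basic:name": "Bob"},                        # no basic:age
    "u781078003": {"basic:name": "Charlie", "basic:age": "30"},
    "u781078004": {"basic:name": "Diana", "basic:age": "22"},
}


def single_column_value(rows, column, value, filter_if_missing):
    """Return row keys kept by an equality test on one column."""
    kept = []
    for key in sorted(rows):
        cells = rows[key]
        if column not in cells:
            if not filter_if_missing:
                kept.append(key)   # rows without the column pass through
        elif cells[column] == value:
            kept.append(key)
    return kept


print(single_column_value(rows, "basic:age", "30", True))
print(single_column_value(rows, "basic:age", "30", False))
```

With filter_if_missing set to False, Bob's row is returned even though it has no age.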

SingleColumnValueExcludeFilter

SingleColumnValueExcludeFilter is similar to SingleColumnValueFilter, but it leaves the tested column out of the returned rows.

scan 'user_info', FILTER=>"SingleColumnValueExcludeFilter('basic', 'name', =, 'binary:Alice', true, true)"

scan 'user_info', FILTER=>"SingleColumnValueExcludeFilter('basic', 'name', =, 'binary:Alice', false, true)"
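
As a rough model, the exclude variant selects rows the same way and then removes the tested column from each row it returns. The Python sketch below uses a reduced subset of the test data; the function name is hypothetical.

```python
rows = {
    "u781078001": {"basic:name": "Alice", "basic:age": "18"},
    "u781078002": {"basic:name": "Bob", "basic:gender": "male"},
    "u781078006": {"basic:age": "24"},  # no basic:name in the test data
}


def single_column_value_exclude(rows, column, value, filter_if_missing):
    """Keep matching rows, but drop the tested column from the result."""
    result = {}
    for key in sorted(rows):
        cells = rows[key]
        if column in cells:
            keep = cells[column] == value
        else:
            keep = not filter_if_missing
        if keep:
            result[key] = {c: v for c, v in cells.items() if c != column}
    return result


print(single_column_value_exclude(rows, "basic:name", "Alice", True))
print(single_column_value_exclude(rows, "basic:name", "Alice", False))
```

Alice's row comes back without basic:name, and with filter_if_missing set to False the nameless row u781078006 is returned as well.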

ColumnRangeFilter

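ColumnRangeFilter filters by a range of column names (qualifiers). Its arguments are (min column, include min, max column, include max), so the two booleans control whether each boundary is included.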

scan 'user_info', FILTER=>"ColumnRangeFilter('age', true, 'email', true)"

scan 'user_info', FILTER=>"ColumnRangeFilter('age', true, 'email', false)"
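
The range is compared against the qualifier only, so columns from other families whose names sort between the bounds are returned too. In this data set that means time_data:block and time_data:edit fall inside the range from age to email. A plain-Python sketch of the logic, with a hypothetical function name and a reduced column list:

```python
columns = [
    ("basic", "age"), ("basic", "email"), ("basic", "gender"),
    ("basic", "name"), ("time_data", "block"), ("time_data", "edit"),
]


def column_range(columns, min_col, min_inclusive, max_col, max_inclusive):
    """Keep columns whose qualifier lies between min_col and max_col."""
    out = []
    for family, qualifier in columns:
        above = qualifier > min_col or (min_inclusive and qualifier == min_col)
        below = qualifier < max_col or (max_inclusive and qualifier == max_col)
        if above and below:
            out.append((family, qualifier))
    return out


print(column_range(columns, "age", True, "email", True))
print(column_range(columns, "age", True, "email", False))
```

Setting the last argument to False removes basic:email from the result and leaves the rest unchanged.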

Clear Data

disable 'user_info'
is_enabled 'user_info'
drop 'user_info'
posted @ 2025-04-08 18:47  Charlie_Byte