
MATCH_RECOGNIZE is a powerful ANSI SQL standard feature (introduced in SQL:2016) used for Complex Event Processing (CEP). It allows you to search for patterns across multiple rows of data, similar to how Regular Expressions work for strings, but applied to rows in a table.

It is most commonly used in streaming engines like Apache Flink, Oracle Database, and Snowflake.


How it Works: The Structure

The clause follows a specific sequence to define how rows should be grouped, ordered, and matched.

SELECT *
FROM Ticker
MATCH_RECOGNIZE (
    PARTITION BY symbol          -- Group data (e.g., by stock ticker)
    ORDER BY tstamp              -- Sort data (usually by time)
    MEASURES                     -- Define the output columns
        FIRST(START_ROW.tstamp) AS start_tstamp,
        LAST(PRICE_DOWN.tstamp) AS bottom_tstamp,
        LAST(PRICE_UP.tstamp) AS end_tstamp
    ONE ROW PER MATCH            -- Output summary or all rows
    AFTER MATCH SKIP TO LAST PRICE_UP -- Where to start the next search
    PATTERN (START_ROW PRICE_DOWN+ PRICE_UP+) -- The Regex pattern
    DEFINE                       -- Define what the variables mean
        PRICE_DOWN AS price < PREV(price),
        PRICE_UP AS price > PREV(price)
)

Core Components Explained

  • PARTITION BY & ORDER BY: You cannot find a pattern if the data is jumbled. This organizes the stream into logical partitions and ensures they are chronologically sorted.

  • PATTERN: This is where you define the sequence using Regex-style quantifiers (a combined example is sketched after this list):

    • A B: A followed by B.

    • A+: One or more A.

    • A*: Zero or more A.

    • A?: Zero or one A.

  • DEFINE: This is the logic for your variables. In the example above, PRICE_UP is only met if the current row's price is higher than the PREV (previous) row's price.

  • MEASURES: Since a match spans multiple rows, MEASURES allows you to calculate aggregate values for the entire match (like the start price vs. end price).
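For instance, the quantifiers can be combined in a single pattern. The following hypothetical query (reusing the Ticker table from the example above; the FLAT variable is invented for illustration) matches one starting row, optionally one flat row, one or more falling rows, and one or more rising rows:

SELECT *
FROM Ticker
MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY tstamp
    MEASURES LAST(UP.tstamp) AS last_up_tstamp
    ONE ROW PER MATCH
    -- one START_ROW, FLAT zero or one time, DOWN one or more times, UP one or more times
    PATTERN (START_ROW FLAT? DOWN+ UP+)
    DEFINE
        FLAT AS price = PREV(price),
        DOWN AS price < PREV(price),
        UP   AS price > PREV(price)
);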


Common Use Cases

  • V-Shape Recovery: A decrease in price followed by an immediate increase.

  • Double Bottom: Price drops, rises slightly, drops again to the same level, then rises.

  • Fraud Detection: A "Login" event followed by "Change Password" and "Large Withdrawal" within a short time.

  • Sensor Jitter: A temperature sensor fluctuating above a threshold more than 3 times.

Why use this over Standard SQL?

Without MATCH_RECOGNIZE, you would have to use complex Self-Joins or Window Functions (like LAG and LEAD) which become unreadable and perform poorly as the pattern length increases. MATCH_RECOGNIZE is processed in a single pass over the data, making it highly efficient for real-time streaming.
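For comparison, here is a rough sketch (assuming the same Ticker table) of how just a two-step "down tick then up tick" check looks with window functions; every additional step in the pattern adds another LAG column and predicate:

SELECT symbol, tstamp, price
FROM (
    SELECT symbol, tstamp, price,
           LAG(price, 2) OVER (PARTITION BY symbol ORDER BY tstamp) AS price_2_back,
           LAG(price, 1) OVER (PARTITION BY symbol ORDER BY tstamp) AS price_1_back
    FROM Ticker
) t
WHERE price_1_back < price_2_back   -- the previous tick fell
  AND price > price_1_back;         -- and the current tick rose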

Would you like me to create a specific SQL example for a use case like fraud detection or website clickstream analysis?

 

Yes, I would. Please create a specific SQL example for each of the use cases you listed in Common Use Cases.

 

Here are specific ANSI SQL MATCH_RECOGNIZE examples for each of the use cases mentioned. These examples use syntax compatible with Apache Flink SQL and Oracle.


1. V-Shape Recovery (Market Analysis)

This pattern detects a "dip" in price followed by an immediate recovery. We define the "Bottom" of the V as the point where the price stops falling and starts rising.

SELECT *
FROM StockTicks
MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY rowtime
    MEASURES
        START_ROW.price AS start_price,
        BOTTOM.price AS bottom_price,
        LAST(UP.price) AS recovery_price
    ONE ROW PER MATCH
    PATTERN (START_ROW DOWN+ BOTTOM UP+)
    DEFINE
        DOWN AS price < PREV(price),
        BOTTOM AS price < PREV(price),
        UP AS price > PREV(price)
);

2. Double Bottom (Technical Trading)

A "Double Bottom" looks like a "W". The price drops, hits a floor, rises, drops back to that same floor, and then breaks out upward.

 
SELECT *
FROM StockTicks
MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY rowtime
    MEASURES
        FIRST(B1.price) AS first_low,
        FIRST(B2.price) AS second_low
    ONE ROW PER MATCH
    PATTERN (START_ROW DOWN+ B1 UP+ B2 UP+)  -- the second decline is modeled as the single down-tick B2 (a simplification)
    DEFINE
        DOWN AS price < PREV(price),
        B1 AS price < PREV(price),
        UP AS price > PREV(price),
        -- The second bottom must be within 1% of the first bottom
        B2 AS price < PREV(price) 
              AND ABS(price - FIRST(B1.price)) < (FIRST(B1.price) * 0.01)
);

3. Fraud Detection (Security)

Detecting a high-risk sequence: A user logs in, immediately changes their password, and then initiates a withdrawal larger than $10,000, all within a short timeframe.

SELECT *
FROM AccountEvents
MATCH_RECOGNIZE (
    PARTITION BY account_id
    ORDER BY event_time
    MEASURES
        A.event_time AS login_time,
        C.amount AS withdrawal_amount
    ONE ROW PER MATCH
    PATTERN (A B C) 
    DEFINE
        A AS event_type = 'LOGIN',
        B AS event_type = 'PASSWORD_CHANGE',
        C AS event_type = 'WITHDRAWAL' AND amount > 10000
);
-- Note: In Flink, you would typically add WITHIN INTERVAL '1' HOUR to the PATTERN to bound the match in time

4. Sensor Jitter (IoT/Manufacturing)

This pattern identifies a malfunctioning sensor that is "jittering"—fluctuating rapidly above a certain threshold (e.g., 100°C) more than 3 times in a row.

SELECT *
FROM SensorReadings
MATCH_RECOGNIZE (
    PARTITION BY sensor_id
    ORDER BY ts
    MEASURES
        FIRST(JITTER.ts) AS jitter_start,
        LAST(JITTER.ts) AS jitter_end,
        COUNT(JITTER.temp) AS spike_count
    ONE ROW PER MATCH
    PATTERN (JITTER{3,}) -- Matches 3 or more consecutive JITTER events
    DEFINE
        -- Jitter is defined as a reading > 100 that is also 10 degrees 
        -- different from the previous reading
        JITTER AS temp > 100 AND ABS(temp - PREV(temp)) > 10
);

Key Takeaways for Writing Patterns

  • Quantifiers: Notice {3,} in the Sensor Jitter example? That means "at least 3." You can also use + (1 or more) or * (0 or more).

  • Strict Contiguity: By default, MATCH_RECOGNIZE looks for rows that are strictly next to each other in the sorted order (a relaxed variant is sketched after this list).

  • Logical Offsets: PREV() is your best friend here. It allows you to compare the current row to the one right before it to detect trends (rising/falling).
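Because of that strict-contiguity default, the fraud pattern (A B C) above only fires if the three events arrive back-to-back with nothing in between. Here is a hedged sketch of one way to tolerate unrelated events, using invented "filler" variables with reluctant quantifiers against the same AccountEvents table:

SELECT *
FROM AccountEvents
MATCH_RECOGNIZE (
    PARTITION BY account_id
    ORDER BY event_time
    MEASURES
        A.event_time AS login_time,
        C.amount AS withdrawal_amount
    ONE ROW PER MATCH
    -- FILLER1/FILLER2 absorb unrelated events between the key steps
    PATTERN (A FILLER1*? B FILLER2*? C)
    DEFINE
        A AS event_type = 'LOGIN',
        FILLER1 AS event_type <> 'PASSWORD_CHANGE',
        B AS event_type = 'PASSWORD_CHANGE',
        FILLER2 AS event_type <> 'WITHDRAWAL',
        C AS event_type = 'WITHDRAWAL' AND amount > 10000
);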

Would you like to see how to handle "Late Data" in these SQL queries using Flink's Watermarks?

 

Yes, I would.

 

In Apache Flink, combining MATCH_RECOGNIZE with watermarks is essential because the SQL engine needs to know when it can "give up" waiting for a pattern to complete and move on.

When you use event-time (the timestamp inside your data), Flink uses the Watermark as a signal that no more old data will arrive. This allows the system to clean up memory and finalize matches.

How Watermarks Impact the Query

Without a watermark, a pattern like "Login followed by Logout" might stay in Flink's memory forever if the "Logout" event never arrives. To prevent this, we use a WITHIN clause or windowing logic.

Example: Fraud Detection with Watermarks

Here is how you would write a robust Flink SQL query that handles late data and out-of-order events using a 1-hour time constraint.

-- 1. Create the table with a Watermark strategy
CREATE TABLE AccountEvents (
    account_id INT,
    event_time TIMESTAMP(3),
    event_type STRING,
    amount DOUBLE,
    -- Extract the timestamp and allow 5 seconds of 'lateness'
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    ...
);

-- 2. Run the MATCH_RECOGNIZE query
SELECT *
FROM AccountEvents
MATCH_RECOGNIZE (
    PARTITION BY account_id
    ORDER BY event_time
    MEASURES
        A.event_time AS login_time,
        C.event_time AS withdrawal_time
    ONE ROW PER MATCH
    -- We add a time constraint to the pattern itself
    PATTERN (A B C) WITHIN INTERVAL '1' HOUR 
    DEFINE
        A AS event_type = 'LOGIN',
        B AS event_type = 'PASSWORD_CHANGE',
        C AS event_type = 'WITHDRAWAL' AND amount > 10000
);

What happens behind the scenes?

  1. Buffering: As rows arrive, Flink buffers them in state, sorted by event_time.

  2. Watermark Arrival: When a Watermark for 12:00:00 arrives, Flink assumes that all data with timestamps up to 12:00:00 has arrived.

  3. Pattern Matching: Flink attempts to match the sequence A -> B -> C.

  4. Pruning: Because of the WITHIN INTERVAL '1' HOUR clause, if an A (Login) happened at 10:00:00 and the Watermark passes 11:00:00 without a C occurring, Flink automatically drops that partial match from memory to save space.

Handling "Late" Data

If a record arrives after the watermark has already passed its timestamp (e.g., a record from 12:00:00 arrives when the watermark is at 12:00:10), Flink considers it "Late Data".

  • In SQL, late data is typically dropped by default to maintain correctness.

  • If you need to keep it, you would need to increase the "out-of-orderness" interval in your WATERMARK definition (changing '5' SECOND to something larger).
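For example, a hedged tweak to the WATERMARK line in the CREATE TABLE statement above widens the grace period from 5 seconds to 1 minute, keeping events that arrive up to a minute late eligible for matching, at the cost of delaying results by the same amount:

-- wait up to 1 minute for out-of-order events before advancing the watermark
WATERMARK FOR event_time AS event_time - INTERVAL '1' MINUTE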

Would you like to see how to capture these "timed-out" partial matches (e.g., catching everyone who logged in but never logged out) using anti-joins or similar logic?

 

Yes, I would.

 

In Flink SQL, capturing "timed-out" or "incomplete" matches is a classic requirement—for example, identifying users who started a checkout process but never completed it.

Standard MATCH_RECOGNIZE only emits a row when the entire pattern is satisfied. However, you can use proactive pattern timeouts or specific logic to find "non-events."

1. Finding "Abandoned" Patterns (The Timeout Logic)

To find someone who performed Action A but not Action B within a timeframe, you usually define a pattern that looks for A followed by a period of "nothingness," or you combine the WITHIN clause with outer-join logic.

However, since MATCH_RECOGNIZE is designed for positive matches, the most common pure-SQL approach is to match the part of the journey that did happen and filter out the completed sessions afterwards.

SELECT *
FROM WebTraffic
MATCH_RECOGNIZE (
    PARTITION BY session_id
    ORDER BY rowtime
    MEASURES
        A.ts AS start_time,
        LAST(B.ts) AS last_action_time
    ONE ROW PER MATCH
    -- We look for Start, then any number of clicks, 
    -- but we want to know if it DIDN'T end in 'PURCHASE'
    PATTERN (A B* ) WITHIN INTERVAL '30' MINUTE
    DEFINE
        A AS status = 'START_CHECKOUT',
        B AS status = 'CLICK'
) AS T
-- We then filter for those that never hit the 'PURCHASE' status 
-- in a separate logic or by refining the pattern variables.

2. Using AFTER MATCH SKIP to Handle Overlaps

One of the most complex parts of late data and pattern matching is deciding where to start looking again after a match (or a partial match) is found.

  • AFTER MATCH SKIP TO NEXT ROW: The most thorough. It checks every possible starting point.

  • AFTER MATCH SKIP PAST LAST ROW: Most efficient. It prevents the same event from being part of two different matches.
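As a hedged illustration using the V-shape query from earlier (same StockTicks table), the clause slots in between the output options and the PATTERN:

SELECT *
FROM StockTicks
MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY rowtime
    MEASURES
        START_ROW.price AS start_price,
        LAST(UP.price) AS recovery_price
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW   -- resume the search after the matched rows
    PATTERN (START_ROW DOWN+ UP+)
    DEFINE
        DOWN AS price < PREV(price),
        UP AS price > PREV(price)
);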

3. Capturing Timed-out Matches (The "Timeout" Workaround)

Currently, ANSI SQL MATCH_RECOGNIZE doesn't have a native "ON TIMEOUT" clause in all implementations. To capture people who timed out, Flink developers often drop down to the DataStream CEP API (a PatternProcessFunction with timeout handling) or use a trick in SQL involving Interval Joins.

If you stay in SQL, here is the pattern for "A followed by NOT B":

-- Using an Anti-Join to find missing events (The 'Timeout' equivalent)
SELECT 
    starts.user_id, 
    starts.rowtime as start_time
FROM CheckoutStarts starts
LEFT JOIN CheckoutCompletes completes
    ON starts.user_id = completes.user_id
    -- This is the 'Time Window' where we expect the match
    AND completes.rowtime BETWEEN starts.rowtime 
        AND starts.rowtime + INTERVAL '1' HOUR
WHERE completes.user_id IS NULL; -- Only return those that NEVER matched

4. How Watermarks Trigger the "End"

Whether you use MATCH_RECOGNIZE or an Interval Join, the Watermark is the physical trigger.

  1. Event A arrives at 10:00 AM. Flink stores it in state.

  2. The WITHIN INTERVAL '1' HOUR or BETWEEN clause tells Flink to wait until 11:00 AM.

  3. Flink's clock (the Watermark) reaches 11:00:01 AM.

  4. Flink looks at its state, sees Event A has no matching Event B, and—because the time has expired—it either drops it or (in the case of a Left Anti-Join) emits it as an abandonment.


Summary: Watermarks vs. Patterns

  • PATTERN ... WITHIN: The watermark defines when to clear "partial matches" from memory.

  • ORDER BY event_time: Watermarks ensure rows are sorted before matching.

  • State size: Watermarks allow Flink to prune state, preventing "Out of Memory" errors.

  • Lateness: The watermark determines whether a late event is ignored or the pattern is "re-evaluated" (rare in SQL).

Would you like to see the DataStream API (CEP) version of this? It is much more flexible than SQL for capturing "Timed Out" matches specifically because it has an OutputTag for timed-out data.

 

Yes, I would.

 

In Flink, Python support for the DataStream API is provided via PyFlink. The FlinkCEP library itself, however, is only officially exposed in Java/Scala; recent PyFlink releases do not ship a CEP module. The snippet below therefore mirrors the Java PatternProcessFunction API in Python-style pseudocode to illustrate how timed-out matches are captured.

1. The PyFlink CEP Implementation

To capture timeouts, you define a pattern and pass a PatternProcessFunction to the process method. Timed-out partial matches are handled by a timeout hook (shown here as on_timeout; in the Java API this is the TimedOutPartialMatchHandler interface) and routed to a side output.

# NOTE: illustrative pseudocode - the CEP classes below mirror the Java API and
# are not available as a PyFlink module in recent releases.
from pyflink.common import Duration
from pyflink.datastream import StreamExecutionEnvironment, OutputTag
from pyflink.datastream.cep import CEP, Pattern, PatternProcessFunction  # hypothetical module

# 1. Setup Environment
env = StreamExecutionEnvironment.get_execution_environment()

# 2. Define the Side Output Tag for Timed-out data
timeout_tag = OutputTag("abandoned-cart")

# 3. Create a Pattern: Start Checkout -> Purchase within 30 mins
pattern = Pattern.begin("start").where(lambda e: e['type'] == 'START_CHECKOUT') \
    .followed_by("complete").where(lambda e: e['type'] == 'PURCHASE') \
    .within(Duration.of_minutes(30))

# 4. Define the Processing Logic
class MyPatternProcessFunction(PatternProcessFunction):
    def process_match(self, match, ctx):
        # This executes for successful completions
        start_event = match['start'][0]
        return f"User {start_event['user_id']} completed purchase."

    def on_timeout(self, match, ctx):
        # This executes when the Watermark exceeds the 30-minute window
        start_event = match['start'][0]
        # Emit to side output
        ctx.output(timeout_tag, f"ALERT: User {start_event['user_id']} abandoned cart!")

# 5. Apply Pattern to Stream
# (Assume 'ds' is a DataStream with a Watermark Strategy already applied)
pattern_stream = CEP.pattern(ds, pattern)
result_stream = pattern_stream.process(MyPatternProcessFunction())

# 6. Extract the Timeout Stream
abandoned_stream = result_stream.get_side_output(timeout_tag)

result_stream.print()
abandoned_stream.print()

env.execute("CEP Timeout Example")

2. How the Watermark triggers the Python Code

The relationship between your source events and this code is strictly governed by the Watermark Strategy you apply at the source.

  1. Event Ingestion: A START_CHECKOUT event for User 101 arrives with a timestamp of 10:00:00. Flink puts this into its internal "Priority Queue" state.

  2. The Window: Because of .within(Duration.of_minutes(30)), Flink sets an internal timer for 10:30:00.

  3. Watermark Progress: Your source keeps emitting events. As it does, it periodically emits watermarks (e.g., WM: 10:15:00).

  4. The Timeout Trigger: Only when a Watermark arrives with a value >= 10:30:00 will Flink's CEP operator conclude that User 101's "PURCHASE" event is never coming.

  5. Execution: Flink then executes the on_timeout Python function and clears User 101 from memory.


3. Key Differences in Python

  • Performance: PyFlink CEP runs on the JVM, but your on_timeout logic executes in a Python process. For extremely high-throughput patterns, the "Context Switching" between JVM and Python can be a bottleneck.

  • Type Information: You must ensure your DataStream has clear type information (like Types.ROW or Types.MAP) so the Python environment knows how to deserialize the match maps.
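For example, a minimal sketch of declaring an explicit row schema for AccountEvents-style records (raw_events is assumed to be a list of (account_id, event_type, amount, ts) tuples, and env an existing StreamExecutionEnvironment):

from pyflink.common import Types

# An explicit ROW type lets the Python workers (de)serialize fields by name
event_type_info = Types.ROW_NAMED(
    ['account_id', 'event_type', 'amount', 'ts'],
    [Types.INT(), Types.STRING(), Types.DOUBLE(), Types.LONG()]
)

# raw_events: assumed list of (account_id, event_type, amount, ts) tuples
ds = env.from_collection(raw_events, type_info=event_type_info)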

Summary: Timeout Handling

  • Start Match: Store START_CHECKOUT in state (triggered by the arrival of a source event).

  • End Match: Call process_match (triggered by a PURCHASE arriving within the 30 minutes).

  • Trigger Timeout: Call on_timeout (triggered by a Watermark passing 10:30:00).

  • State Cleanup: Delete the events from memory (triggered by completion or timeout).

Would you like me to show you how to configure the Watermark Strategy in Python to specifically allow for "Late Data" before these timeouts trigger?

 

Yes, I would.

 

In Python, configuring the Watermark strategy is the "knob" that controls how patient Flink is before it triggers those timeouts. If your network is jittery, you want to allow for some Out-of-Orderness so that a "Purchase" event arriving a few seconds late doesn't get ignored.

Here is how you configure the Watermark Strategy and the Source in PyFlink to balance latency and correctness.

1. Configuring the Watermark Strategy in Python

You apply the strategy to your source stream. The for_bounded_out_of_orderness method is the most common way to handle "reasonably late" data.

from pyflink.common import Duration
from pyflink.common.watermark_strategy import WatermarkStrategy, TimestampAssigner
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()


# PyFlink expects a TimestampAssigner subclass (not a bare lambda)
class EventTimestampAssigner(TimestampAssigner):
    def extract_timestamp(self, value, record_timestamp):
        return value['ts']   # epoch milliseconds


# 1. Define the Strategy: Allow data to be up to 10 seconds late
# This means Flink's "Internal Clock" will always be (Max Timestamp - 10s)
watermark_strategy = WatermarkStrategy \
    .for_bounded_out_of_orderness(Duration.of_seconds(10)) \
    .with_timestamp_assigner(EventTimestampAssigner())

# 2. Apply it to your stream
ds = env.from_collection(
    collection=[
        {'user_id': 1, 'type': 'START_CHECKOUT', 'ts': 1672567200000},  # 10:00:00 UTC
        {'user_id': 1, 'type': 'PURCHASE', 'ts': 1672567205000}         # 10:00:05 UTC
    ]
    # type_info omitted here: the dicts are pickled as-is; declare an explicit schema for real sources
).assign_timestamps_and_watermarks(watermark_strategy)

2. The Impact on "Timed Out" Matches

The amount of "out-of-orderness" you allow directly delays your timeout notifications.

  • If Delay = 0 seconds: As soon as an event from 10:30:01 arrives, the on_timeout for a 10:00:00 event triggers immediately.

  • If Delay = 10 seconds: Flink waits until an event from 10:30:11 arrives. This gives "late" Purchase events an extra 10 seconds to arrive and still count as a successful match instead of a timeout.


3. Visualizing the "Late Data" Flow

When you combine a 30-minute CEP window with a 10-second Watermark delay, the logic looks like this:

  • Event at 10:00:00 -> max timestamp 10:00:00, watermark 09:59:50: "START_CHECKOUT" stored in state.

  • Event at 10:30:05 -> max timestamp 10:30:05, watermark 10:29:55: no timeout yet (the watermark hasn't reached 10:30:00).

  • Late "PURCHASE" at 10:30:08 -> max timestamp 10:30:08, watermark 10:29:58: SUCCESS! The match is found because the timeout hadn't triggered yet.

  • Event at 10:30:11 -> max timestamp 10:30:11, watermark 10:30:01: TIMEOUT triggered for any still-unmatched events from 10:00:00.

4. Handling "Extremely Late" Data (Side Outputs)

If an event arrives after the Watermark has already passed its timestamp, it is considered "Droppable Late Data." In the DataStream API, you can capture these as well so they aren't lost forever:

# In standard windowing (non-CEP), you can use 'side_output_late_data'
# For CEP, you generally increase the BoundedOutOfOrderness to cover your expected lag.
watermark_strategy = WatermarkStrategy \
    .for_bounded_out_of_orderness(Duration.of_minutes(1)) # Be very patient (1 min)

Summary of the Lifecycle

  1. Event Time: The timestamp etched into your data.

  2. Watermark: The "Truth" of what time it currently is in the stream.

  3. OutOfOrderness: The "Grace Period" you give late events.

  4. CEP within(): The "Business Rule" for how long a process should take.

Would you like to see how to handle "Idle Sources"—where the watermark stops moving because one Kafka partition has no new data?

 

Yes, I would.

 

In a distributed system like Flink, your data is often split across multiple partitions (e.g., Kafka partitions). Flink tracks the watermark for each partition individually and uses the minimum watermark of all partitions to advance the global clock.

If one partition stops receiving data (becomes idle), its watermark stops advancing. Because Flink waits for the slowest partition, the entire job's watermark will get "stuck," and your on_timeout functions will never trigger.

1. The "Stuck Watermark" Problem

Imagine you have two Kafka partitions. Partition A is busy, but Partition B has no events for 10 minutes.

  • Partition A: Watermark = 10:45 AM

  • Partition B: Watermark = 10:00 AM (Last seen event)

  • Global Watermark: 10:00 AM (The minimum)

Even though it is 10:45 AM, any 30-minute pattern that started at 10:10 AM will not time out, because the global clock is stuck at 10:00 AM.
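In code terms the rule is just a minimum; here is a toy illustration of the calculation (conceptual only, not Flink API):

# Conceptual only: the operator's event-time clock is the minimum watermark
# across all of its input partitions.
partition_watermarks = {
    'partition_A': 10 * 3600 + 45 * 60,   # 10:45, as seconds since midnight
    'partition_B': 10 * 3600 +  0 * 60,   # 10:00, stalled
}
global_watermark = min(partition_watermarks.values())   # 10:00 -> the whole job is stuck here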


2. The Python Solution: with_idleness

To fix this, you tell Flink: "If you don't hear from a partition for X amount of time, ignore it when calculating the global watermark."

from pyflink.common import WatermarkStrategy, Duration

# Configure the strategy with an Idleness timeout
# (EventTimestampAssigner is the assigner class defined in the earlier snippet)
watermark_strategy = WatermarkStrategy \
    .for_bounded_out_of_orderness(Duration.of_seconds(10)) \
    .with_timestamp_assigner(EventTimestampAssigner()) \
    .with_idleness(Duration.of_minutes(1))  # <--- The Fix

# Apply to your source
ds = env.from_source(source, watermark_strategy, "Kafka Source")

3. How Idleness Restores the Timeout Logic

Once you add .with_idleness(Duration.of_minutes(1)):

  1. Partition B stops sending data at 10:00 AM.

  2. After 1 minute of silence (10:01 AM wall-clock time), Flink marks Partition B as Idle.

  3. Flink now calculates the Global Watermark using only active partitions (Partition A).

  4. The Global Watermark jumps to 10:45 AM.

  5. All pending on_timeout calls for patterns that started between 10:00 AM and 10:15 AM (whose 30-minute deadlines now fall at or before the new 10:45 AM watermark) trigger immediately.


Summary of Watermark "Knobs"

  • for_bounded_out_of_orderness: Handles late data; delays the timeout trigger by the configured offset.

  • within(Duration): The business-logic duration; defines when a pattern is considered "failed."

  • with_idleness: Handles silent partitions; ensures timeouts trigger even if some data sources are empty.

One Final Tip for Python Developers

When working with PyFlink and Watermarks, always ensure your timestamps are in Milliseconds (Unix epoch). If your Python logic sends seconds (e.g., 1672531200), Flink will interpret that as a date in the year 1970, and your watermarks will behave very strangely!
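A minimal sketch of that conversion (the field name ts_seconds is hypothetical):

import time

# Convert a seconds-precision epoch to the millisecond precision Flink expects
event = {'user_id': 1, 'type': 'PURCHASE', 'ts_seconds': 1672567205}
event['ts'] = event['ts_seconds'] * 1000   # 1672567205000

# Or, when stamping "now":
now_ms = int(time.time() * 1000)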

 

 
