Flink -- Keyed State

    /* <pre>{@code
     * DataStream<MyType> stream = ...;
     * KeyedStream<MyType> keyedStream = stream.keyBy("id");
     *
     * keyedStream.map(new RichMapFunction<MyType, Tuple2<MyType, Long>>() {
     *
     *     private ValueState<Long> state;
     *
     *     public void open(Configuration cfg) {
     *         state = getRuntimeContext().getState(
     *                 new ValueStateDescriptor<Long>("count", LongSerializer.INSTANCE, 0L));
     *     }
     *
     *     public Tuple2<MyType, Long> map(MyType value) {
     *         long count = state.value() + 1;
     *         state.update(count);
     *         return new Tuple2<>(value, count);
     *     }
     * });
     * }</pre>
     */

 

Keyed state must be initialized before it can be used. Taking ValueState as the example:

state = getRuntimeContext().getState(new ValueStateDescriptor<Long>("count", LongSerializer.INSTANCE, 0L));

 

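1. Every state needs an identifier, a ValueStateDescriptor, which carries a unique name, a Class, and a default value (the example above passes a TypeSerializer in place of the Class; the descriptor has a constructor for each)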

public ValueStateDescriptor(String name, Class<T> typeClass, T defaultValue)

 

2. getState registers the keyed state with the state backend:

StreamingRuntimeContext
    public <T> ValueState<T> getState(ValueStateDescriptor<T> stateProperties) {
        KeyedStateStore keyedStateStore = checkPreconditionsAndGetKeyedStateStore(stateProperties);
        stateProperties.initializeSerializerUnlessSet(getExecutionConfig());
        return keyedStateStore.getState(stateProperties);
    }

 

This calls keyedStateStore.getState(stateProperties).

KeyedStateStore is really just a wrapper around KeyedStateBackend:

public class DefaultKeyedStateStore implements KeyedStateStore {

    private final KeyedStateBackend<?> keyedStateBackend;
    private final ExecutionConfig executionConfig;

    @Override
    public <T> ValueState<T> getState(ValueStateDescriptor<T> stateProperties) {
        try {
            stateProperties.initializeSerializerUnlessSet(executionConfig);
            return getPartitionedState(stateProperties);
        } catch (Exception e) {
            throw new RuntimeException("Error while getting state", e);
        }
    }

In the end the call lands on keyedStateBackend:

   private <S extends State> S getPartitionedState(StateDescriptor<S, ?> stateDescriptor) throws Exception {
        return keyedStateBackend.getPartitionedState(
                VoidNamespace.INSTANCE,
                VoidNamespaceSerializer.INSTANCE,
                stateDescriptor);
    }

 

AbstractKeyedStateBackend
   public <N, S extends State> S getPartitionedState(
            final N namespace,
            final TypeSerializer<N> namespaceSerializer,
            final StateDescriptor<S, ?> stateDescriptor) throws Exception {

        final S state = getOrCreateKeyedState(namespaceSerializer, stateDescriptor);
        final InternalKvState<N> kvState = (InternalKvState<N>) state;

        return state;
    }

 

getOrCreateKeyedState

    public <N, S extends State, V> S getOrCreateKeyedState(
            final TypeSerializer<N> namespaceSerializer,
            StateDescriptor<S, V> stateDescriptor) throws Exception {

        InternalKvState<?> existing = keyValueStatesByName.get(stateDescriptor.getName());
        if (existing != null) {
            @SuppressWarnings("unchecked")
            S typedState = (S) existing;
            return typedState;  // already present in keyValueStatesByName: return it directly
        }

        // create a new blank key/value state
        S state = stateDescriptor.bind(new StateBinder() {
            @Override
            public <T> ValueState<T> createValueState(ValueStateDescriptor<T> stateDesc) throws Exception {
                return AbstractKeyedStateBackend.this.createValueState(namespaceSerializer, stateDesc);
            }
        });

        InternalKvState<N> kvState = (InternalKvState<N>) state;
        keyValueStatesByName.put(stateDescriptor.getName(), kvState); // register the newly created state in keyValueStatesByName
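
The get-or-create pattern in the excerpt above can be tried out in isolation. The following is only a minimal sketch in plain Java, not Flink code: StateRegistrySketch, its factory parameter, and the created counter are hypothetical names introduced for illustration, and a plain HashMap stands in for keyValueStatesByName.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the backend's by-name registry (not a Flink class).
class StateRegistrySketch {
    // Plays the role of keyValueStatesByName: one state object per descriptor name.
    private final Map<String, Object> statesByName = new HashMap<>();

    // Counts how many times the factory actually ran (illustration only).
    int created = 0;

    @SuppressWarnings("unchecked")
    <S> S getOrCreate(String name, Function<String, S> factory) {
        Object existing = statesByName.get(name);
        if (existing != null) {
            return (S) existing;             // already registered: reuse it
        }
        S state = factory.apply(name);       // stands in for stateDescriptor.bind(...)
        created++;
        statesByName.put(name, state);       // register under the descriptor name
        return state;
    }

    public static void main(String[] args) {
        StateRegistrySketch registry = new StateRegistrySketch();
        long[] first = registry.getOrCreate("count", n -> new long[1]);
        long[] second = registry.getOrCreate("count", n -> new long[1]);
        System.out.println("same instance: " + (first == second));
        System.out.println("factory runs: " + registry.created);
    }
}
```

In this sketch, asking for the same name twice hands back the same object, which mirrors why the name in the descriptor has to be unique.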

 

3. Reading and writing a ValueState: value, update

 

Here is how ValueState is defined:

HeapValueState
public class HeapValueState<K, N, V>
        extends AbstractHeapState<K, N, V, ValueState<V>, ValueStateDescriptor<V>>
        implements InternalValueState<N, V> {

    /**
     * Creates a new key/value state for the given hash map of key/value pairs.
     *
     * @param stateDesc The state identifier for the state. This contains name
     *                           and can create a default state value.
     * @param stateTable The state table to use in this key/value state. May contain initial state.
     */
    public HeapValueState(
            ValueStateDescriptor<V> stateDesc,
            StateTable<K, N, V> stateTable,
            TypeSerializer<K> keySerializer,
            TypeSerializer<N> namespaceSerializer) {
        super(stateDesc, stateTable, keySerializer, namespaceSerializer);
    }

    @Override
    public V value() {
        final V result = stateTable.get(currentNamespace);

        if (result == null) {
            return stateDesc.getDefaultValue();
        }

        return result;
    }

    @Override
    public void update(V value) {

        if (value == null) {
            clear();
            return;
        }

        stateTable.put(currentNamespace, value);
    }
}
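
Two behaviours in value() and update() above are easy to miss: a missing entry falls back to the descriptor's default value, and updating with null clears the entry. Below is a minimal sketch of just those two rules in plain Java. DefaultingValueState is a hypothetical class, not part of Flink, and it assumes a single fixed key and namespace so that a one-slot map can stand in for the state table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified value state for one fixed key and namespace.
class DefaultingValueState<V> {
    private static final String SLOT = "current"; // stands in for (key, namespace)

    private final Map<String, V> table = new HashMap<>();
    private final V defaultValue;

    DefaultingValueState(V defaultValue) {
        this.defaultValue = defaultValue;
    }

    // Mirrors value(): fall back to the default when nothing is stored.
    V value() {
        V result = table.get(SLOT);
        return result == null ? defaultValue : result;
    }

    // Mirrors update(): a null update is treated as clear().
    void update(V value) {
        if (value == null) {
            clear();
            return;
        }
        table.put(SLOT, value);
    }

    void clear() {
        table.remove(SLOT);
    }

    public static void main(String[] args) {
        DefaultingValueState<Long> state = new DefaultingValueState<>(0L);
        System.out.println(state.value()); // nothing stored yet, so the default is returned
        state.update(5L);
        System.out.println(state.value());
        state.update(null);                // clears the entry
        System.out.println(state.value()); // back to the default
    }
}
```

This fallback is what lets the map function at the top of the post call state.value() + 1 on the first record for a key without a null check.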

 

Both operations go through the StateTable:

CopyOnWriteStateTable
    @Override
    public S get(N namespace) {
        return get(keyContext.getCurrentKey(), namespace);
    }

    @Override
    public boolean containsKey(N namespace) {
        return containsKey(keyContext.getCurrentKey(), namespace);
    }

    @Override
    public void put(N namespace, S state) {
        put(keyContext.getCurrentKey(), namespace, state);
    }

As you can see, the table does not just record a single value; it records the relationship among key, namespace, and value.

The key is obtained from keyContext.getCurrentKey().

 

keyContext is the KeyedStateBackend itself.

During StreamInputProcessor.processInput, the call

streamOperator.setKeyContextElement1(record);

sets the current key on the KeyedStateBackend.

 

This is why every operation on state is isolated per key.
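
The whole mechanism can be modelled in a few lines of plain Java. This is a sketch under simplifying assumptions, not Flink's implementation: SketchKeyContext, SketchStateTable, and KeyIsolationSketch are hypothetical names, a nested HashMap replaces CopyOnWriteStateTable, and the string "ns" stands in for VoidNamespace.INSTANCE.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the backend's role as key context: it only tracks the current key.
class SketchKeyContext<K> {
    private K currentKey;

    void setCurrentKey(K key) { this.currentKey = key; }

    K getCurrentKey() { return currentKey; }
}

// Hypothetical, simplified state table: namespace -> (key -> value).
class SketchStateTable<K, N, V> {
    private final SketchKeyContext<K> keyContext;
    private final Map<N, Map<K, V>> entries = new HashMap<>();

    SketchStateTable(SketchKeyContext<K> keyContext) {
        this.keyContext = keyContext;
    }

    // Callers pass only the namespace; the key comes from the key context.
    V get(N namespace) {
        Map<K, V> byKey = entries.get(namespace);
        return byKey == null ? null : byKey.get(keyContext.getCurrentKey());
    }

    void put(N namespace, V value) {
        entries.computeIfAbsent(namespace, n -> new HashMap<>())
               .put(keyContext.getCurrentKey(), value);
    }
}

class KeyIsolationSketch {
    // Counts records per key, like the map function at the top of the post.
    static Map<String, Long> countPerKey(String... recordKeys) {
        SketchKeyContext<String> ctx = new SketchKeyContext<>();
        SketchStateTable<String, String, Long> table = new SketchStateTable<>(ctx);
        Map<String, Long> latestCount = new HashMap<>();
        for (String key : recordKeys) {
            ctx.setCurrentKey(key);            // the effect of setKeyContextElement1(record)
            Long current = table.get("ns");    // the "user code" never mentions the key
            long next = (current == null ? 0L : current) + 1;
            table.put("ns", next);
            latestCount.put(key, next);
        }
        return latestCount;
    }

    public static void main(String[] args) {
        System.out.println(countPerKey("a", "b", "a"));
    }
}
```

The loop body reads and writes through the same table with the same namespace for every record, yet each key ends up with its own count, because the only thing that changes between records is the current key held by the context.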

posted on 2017-09-28 16:52  fxjwind