Apache Ignite Series (8): Troubleshooting Roundup

1,java.lang.ClassNotFoundException Unknown pair

1. Try turning on storeKeepBinary (setStoreKeepBinary(true)) in the cache settings, as shown below; note the last line:

if (persistence) {
    // Configuring Cassandra's persistence
    DataSource dataSource = new DataSource();
    // ...here go the rest of your settings as they appear now...
    configuration.setWriteBehindEnabled(true);

    configuration.setStoreKeepBinary(true);
}

This setting forces Ignite to avoid binary deserialization when working with underlying cache store.

2. The exception can also be reproduced when loadCache() puts something into the cache that is not exactly the expected Item:

private void loadCache(IgniteCache<Integer, Item> cache, /* Ignite.binary() */ IgniteBinary binary) {
    // Note the absence of package name here:
    BinaryObjectBuilder builder = binary.builder("Item");
    builder.setField("name", "a");
    builder.setField("brand", "B");
    builder.setField("type", "c");
    builder.setField("manufacturer", "D");
    builder.setField("description", "e");
    builder.setField("itemId", 1);
    // The snippet is cut off here in the source. Presumably the built object is then
    // written to the cache in binary form (an assumption, not the original code):
    cache.withKeepBinary().put(1, builder.build());
}

References:

http://apache-ignite-users.70518.x6.nabble.com/ClassNotFoundException-with-affinity-run-td5359.html

https://stackoverflow.com/questions/44781672/apache-ignite-java-lang-classnotfoundexception-unknown-pair#

https://stackoverflow.com/questions/47502111/apache-ignite-ignitecheckedexception-unknown-pair#

2,java.lang.IndexOutOfBoundsException + Failed to wait for completion of partition map exchange

Exception details:

2018-06-06 14:24:02.932 ERROR 17364 --- [ange-worker-#42] .c.d.d.p.GridDhtPartitionsExchangeFuture : Failed to reinitialize local partitions (preloading will be stopped): 
	...
java.lang.IndexOutOfBoundsException: index 678
	...	org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279) [ignite-core-2.3.0.jar:2.3.0]
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [ignite-core-2.3.0.jar:2.3.0]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_73]

2018-06-06 14:24:02.932  INFO 17364 --- [ange-worker-#42] .c.d.d.p.GridDhtPartitionsExchangeFuture : Finish exchange future [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], resVer=null, err=java.lang.IndexOutOfBoundsException: index 678]
2018-06-06 14:24:02.941 ERROR 17364 --- [ange-worker-#42] .i.p.c.GridCachePartitionExchangeManager : Failed to wait for completion of partition map exchange (preloading will not start): GridDhtPartitionsExchangeFuture 
...
org.apache.ignite.IgniteCheckedException: index 678
	at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7252) ~[ignite-core-2.3.0.jar:2.3.0]
	....
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279) ~[ignite-core-2.3.0.jar:2.3.0]
	... 2 common frames omitted

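The cause is as follows:

If a cache is defined in REPLICATED mode with persistence enabled, is later changed to PARTITIONED mode, and data is then loaded into it, this error is thrown on subsequent restarts.

For example: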

default-config.xml

        <property name="cacheConfiguration">
            <list>
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                  ...
                    <property name="name" value="Test"/>
                    <property name="atomicityMode" value="ATOMIC"/>
                    <property name="cacheMode" value="REPLICATED"/>
                  ...
                </bean>
            </list>
        </property>
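
Application code:

        // The cache is destroyed and re-created by name only, so the default cache mode (PARTITIONED) applies
        ignite.destroyCache("Test");
        IgniteCache<Long, CommRate> cache = ignite.getOrCreateCache("Test");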

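On restart, the configuration in default-config.xml takes effect first, and it no longer matches the persisted cache, which triggers the problem.

The fix is to avoid changing the cache mode while persistence is enabled, or to avoid predefining the cache in the configuration file.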

I can't reproduce your case. But the issue could occur if you had a REPLICATED cache and after some time changed it to PARTITIONED and for example call to getOrCreateCache keeping old cache name.

References:

http://apache-ignite-users.70518.x6.nabble.com/Weird-index-out-bound-Exception-td14905.html

3,Failed to find SQL table for type xxxx

The data was imported incorrectly. Destroy the cache and import the data again.

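4, Ignite messaging delivers duplicate messages, and the number of duplicates grows with each execution

Duplicate messages that multiply with every execution are caused by registering the same listener more than once.

For a given topic, remoteListen and localListen should each be called only once. Every repeated call adds another listener, so it looks as if each message were sent as many times as the code has run.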

    private final AtomicBoolean rmtMsgInit = new AtomicBoolean(false);
    private final AtomicBoolean localMsgInit = new AtomicBoolean(false);

    @RequestMapping("/msgTest")
    public @ResponseBody
    String orderedMsg(HttpServletRequest request, HttpServletResponse response) {
        /***************************remote message****************************/
        IgniteMessaging rmtMsg = ignite.message(ignite.cluster().forRemotes());

        // Register the listener for a topic only once; otherwise duplicate messages
        // arrive and their number grows with every call. compareAndSet makes the
        // check-and-mark step atomic across concurrent requests.
        if (rmtMsgInit.compareAndSet(false, true)) {
            rmtMsg.remoteListen("MyOrderdTopic", (nodeId, msg) -> {
                System.out.println("Received ordered message [msg=" + msg + ", from=" + nodeId + "]");
                return true;
            });
        }

        rmtMsg.send("MyOrderdTopic", UUID.randomUUID().toString());
//        for (int i=0; i < 10; i++) {
//            rmtMsg.sendOrdered("MyOrderdTopic", Integer.toString(i), 0);
//            rmtMsg.send("MyOrderdTopic", Integer.toString(i));
//        }


        /***************************local message****************************/
        IgniteMessaging localMsg = ignite.message(ignite.cluster().forLocal());

        // Same rule as above: register the local listener only once.
        if (localMsgInit.compareAndSet(false, true)) {
            localMsg.localListen("localTopic", (nodeId, msg) -> {
                System.out.println(String.format("Received local message [msg=%s, from=%s]", msg, nodeId));
                return true;
            });
        }

        localMsg.send("localTopic", UUID.randomUUID().toString());

        return "executed!";
    }
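
The effect is easy to reproduce without a cluster. The sketch below uses a toy in-memory bus (ToyBus is a made-up class for illustration, not part of Ignite) to show how re-registering a listener on every request multiplies deliveries, and how a compareAndSet guard keeps the count at one per message:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Toy message bus (hypothetical, not an Ignite class): every listener sees every message. */
final class ToyBus {
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    void listen(Consumer<String> listener) {
        listeners.add(listener);
    }

    void send(String msg) {
        listeners.forEach(l -> l.accept(msg));
    }
}

final class OnceOnlyListenerDemo {
    /** Registers a new listener on every request, then sends one message. */
    static int deliveriesWithoutGuard(int requests) {
        ToyBus bus = new ToyBus();
        AtomicInteger received = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            bus.listen(msg -> received.incrementAndGet());
            bus.send("msg-" + i);
        }
        return received.get();
    }

    /** Registers the listener only on the first request, then sends one message. */
    static int deliveriesWithGuard(int requests) {
        ToyBus bus = new ToyBus();
        AtomicInteger received = new AtomicInteger();
        AtomicBoolean registered = new AtomicBoolean(false);
        for (int i = 0; i < requests; i++) {
            if (registered.compareAndSet(false, true)) {
                bus.listen(msg -> received.incrementAndGet());
            }
            bus.send("msg-" + i);
        }
        return received.get();
    }

    public static void main(String[] args) {
        System.out.println(deliveriesWithoutGuard(3)); // 1 + 2 + 3 = 6 deliveries
        System.out.println(deliveriesWithGuard(3));    // 3 deliveries
    }
}
```

With three requests the unguarded version delivers 1 + 2 + 3 messages, which is the "grows with each execution" pattern described above.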

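5, No console output for Ignite remote operations

When an operation is executed through ignite.cluster().forRemotes(), the code may run on other nodes, so its logs and output are printed on those nodes and may not appear in the calling program's console.
For example: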

    IgniteMessaging rmtMsg = ignite.message(ignite.cluster().forRemotes());

    rmtMsg.remoteListen("MyOrderdTopic", (nodeId, msg) -> {
        System.out.println("Received ordered message [msg=" + msg +", from=" + nodeId + "]");
        return true;
    });

To see the output in the calling program, use the local variants instead:
IgniteMessaging.localListen
ignite.events().localListen

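6, Ignite persistence uses too much disk space

This is related to the WAL (write-ahead log) mechanism.

Add the following configuration to tune checkpointing and limit the WAL history that is retained: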

        <!-- Redefining maximum memory size for the cluster node usage. -->
        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">

		...

                <!--Checkpointing frequency which is a minimal interval when the dirty pages will be written to the Persistent Store.-->
                <property name="checkpointFrequency" value="180000"/>

                <!-- Number of threads for checkpointing.-->
                <property name="checkpointThreads" value="4"/>

                <!-- Number of checkpoints to be kept in WAL after checkpoint is finished.-->
                <property name="walHistorySize" value="20"/>

		...
            </bean>
        </property>

7,java.lang.ClassCastException org.cord.xxx cannot be cast to org.cord.xxx

java.lang.ClassCastException org.cord.ignite.data.domain.Student cannot be cast to org.cord.ignite.data.domain.Student

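This exception appeared when reading from an Ignite cache. The classes are clearly the same, yet the returned cache object could not be assigned: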

        IgniteCache<Long, Student> cache = ignite.cache(CacheKeyConstant.STUDENT);
        Student student = cache.get(1L);

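So instanceof was used to investigate:

cache.get(1L) instanceof Student returns false

This shows that the object returned by Ignite is not an instance of Student, even though the debugger shows identical fields. The only remaining explanation is that the Student class of the returned object and the Student class at the receiving site were loaded by different class loaders.

Checking the class loaders of both: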

cache.get(1L).getClass().getClassLoader()
=> AppClassLoader

Student.class.getClassLoader()
=> RestartClassLoader

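Indeed, the two classes have different class loaders. After some searching, RestartClassLoader turned out to be the class loader used by the spring-boot-devtools hot-reload plugin. With the cause identified the fix is simple: remove the spring-boot-devtools dependency.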
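
The same symptom can be reproduced in plain Java. The following is only a sketch under stated assumptions: IsolatingLoader is a made-up loader that plays the role of the second class loader, and the nested Student class is a stand-in for the real domain class. It assumes the class file is readable as a resource, which holds for classes compiled to disk or packed in a jar.

```java
import java.io.IOException;
import java.io.InputStream;

final class ClassLoaderMismatchDemo {
    /** Stand-in for a domain class such as Student (hypothetical). */
    public static class Student { }

    /** Defines its own copy of one class instead of delegating to the parent loader. */
    static final class IsolatingLoader extends ClassLoader {
        private final String target;

        IsolatingLoader(ClassLoader parent, String target) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Read the same class file the parent uses, but define it in this loader.
                    String resource = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(resource)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                return c;
            }
        }
    }

    /** Creates a Student instance whose class was defined by a second loader. */
    static Object studentFromOtherLoader() {
        try {
            String name = Student.class.getName();
            ClassLoader other = new IsolatingLoader(Student.class.getClassLoader(), name);
            return other.loadClass(name).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object value = studentFromOtherLoader();
        System.out.println(value.getClass().getName().equals(Student.class.getName())); // true
        System.out.println(value instanceof Student);                                   // false
        System.out.println(value.getClass().getClassLoader());
        System.out.println(Student.class.getClassLoader());
    }
}
```

Same class name, same bytes, but two Class objects. Casting value to Student would fail with the "Student cannot be cast to Student" message from this section.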

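8, Garbled Chinese text when querying with SqlFieldsQuery under Ignite persistence

Without persistence everything works. With persistence enabled, results returned as objects by SqlQuery are not garbled (they go through deserialization), but results from SqlFieldsQuery are.

Persistence writes in-memory data to disk, which suggests a file-encoding problem. Printing System.getProperty("file.encoding") on every node showed that the persistent node was using gb18030. After setting file.encoding=UTF-8 and re-importing the data, queries no longer returned garbled text.

Setting the environment variable JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8 is enough.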
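
For reference, a minimal sketch of why a charset mismatch garbles text. It is a general illustration only: where exactly Ignite consults file.encoding is not covered here, and ISO-8859-1 is used as a stand-in for the mismatching charset because every JVM is guaranteed to ship it (the node in question used gb18030).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {
    /** Encodes text with one charset and decodes the bytes with another. */
    static String roundTrip(String text, Charset writeWith, Charset readWith) {
        byte[] stored = text.getBytes(writeWith);
        return new String(stored, readWith);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so the source file
        // encoding cannot interfere with the demo.
        String original = "\u5b66\u751f";

        // What this JVM uses when no charset is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());

        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(same));  // true: same charset on both sides
        System.out.println(original.equals(mixed)); // false: bytes decoded with the wrong charset
    }
}
```

Running the first println on every node is a quick way to spot the node whose default charset differs.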

9,[Ignite2.7] java.lang.IllegalAccessError: tried to access field org.h2.util.LocalDateTimeUtils.LOCAL_DATE from class org.apache.ignite.internal.processors.query.h2.H2DatabaseType

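This is caused by an H2 version incompatibility. Exclude the transitive H2 dependency and declare an H2 version that works with Ignite 2.7 (1.4.197 in the example below):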

      <dependency>
          <groupId>org.apache.ignite</groupId>
          <artifactId>ignite-indexing</artifactId>
          <version>${ignite.version}</version>
          <exclusions>
              <exclusion>
                  <groupId>com.h2database</groupId>
                  <artifactId>h2</artifactId>
              </exclusion>
          </exclusions>
      </dependency>
      <dependency>
          <groupId>com.h2database</groupId>
          <artifactId>h2</artifactId>
          <version>1.4.197</version>
      </dependency>

10, Failed to serialize object...Failed to write field...Failed to marshal object with optimized marshaller: a distributed computation cannot be propagated to other nodes

The detailed error output is as follows:

o.a.i.i.m.d.GridDeploymentLocalStore     : Class locally deployed: class org.cord.ignite.controller.ComputeTestController
2018-12-20 21:13:05.398 ERROR 16668 --- [nio-8080-exec-1] o.a.i.internal.binary.BinaryContext      : Failed to serialize object [typeName=o.a.i.i.worker.WorkersRegistry]
org.apache.ignite.binary.BinaryObjectException: Failed to write field [name=registeredWorkers]  at org.apache.ignite.internal.binary.BinaryFieldAccessor.write(BinaryFieldAccessor.java:164) [ignite-core-2.7.0.jar:2.7.0]
    ...
Caused by: org.apache.ignite.binary.BinaryObjectException: Failed to marshal object with optimized marshaller: {...}
Caused by: org.apache.ignite.IgniteCheckedException: Failed to serialize object: {...}
Caused by: java.io.IOException: Failed to serialize object [typeName=java.util.concurrent.ConcurrentHashMap]
Caused by: java.io.IOException: java.io.IOException: Failed to serialize object 
...
Caused by: java.io.IOException: Failed to serialize object [typeName=java.util.ArrayDeque]
Caused by: java.io.IOException: java.lang.NullPointerException
...

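If the class that submits a distributed computation contains certain injected beans, propagating the computation fails, as in the following example: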

...
@Autowired
private IgniteConfiguration igniteCfg;

String broadcastTest() {
    IgniteCompute compute = ignite.compute();
    compute.broadcast(() -> System.out.println("Hello Node: " + ignite.cluster().localNode().id()));
    return "all executed.";
}

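Such beans cannot be propagated. The lambda reads the ignite field, so it captures the enclosing object, and presumably every field of that object, including igniteCfg, then has to be marshalled along with the closure. In classes that submit distributed computations it is therefore best to inject nothing but the Ignite instance. For more complex scenarios, consider the service grid.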
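
The capture behaviour can be seen with plain Java serialization, which is used here only as a stand-in for Ignite's marshaller (an assumption, since the two differ in detail). In the sketch below, Controller and SerializableTask are made-up names. A lambda that reads a field drags the whole enclosing object along, while a lambda that only uses a local copy does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class LambdaCaptureDemo {
    /** Serializable task type, similar in spirit to a closure sent to other nodes. */
    interface SerializableTask extends Runnable, Serializable { }

    /** Stand-in for a controller with injected beans (hypothetical). */
    static final class Controller {
        private final Object injectedBean = new Object(); // not serializable
        private final String greeting;

        Controller(String greeting) {
            this.greeting = greeting;
        }

        /** Reads a field inside the lambda, so the lambda captures `this`. */
        SerializableTask capturingThis() {
            return () -> System.out.println(greeting);
        }

        /** Copies the value to a local first, so only the String is captured. */
        SerializableTask capturingLocal() {
            String text = greeting;
            return () -> System.out.println(text);
        }
    }

    /** Returns true if the object can be written with Java serialization. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Controller controller = new Controller("Hello Node");
        System.out.println(canSerialize(controller.capturingThis()));  // false
        System.out.println(canSerialize(controller.capturingLocal())); // true
    }
}
```

Copying what the closure needs into local variables before building the lambda keeps the enclosing bean out of the serialized closure.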

11,WARNING: Exception during batch send on streamed connection close; java.sql.BatchUpdateException: class org.apache.ignite.IgniteCheckedException: Data streamer has been closed

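During batch inserts over Ignite JDBC, this error tends to occur if streaming is switched on repeatedly or if the stream is not in ordered mode. Solution: enable streaming when the JDBC connection is created, and when switching streaming on, use ordered mode: SET STREAMING ON ORDERED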

String url = "jdbc:ignite:thin://127.0.0.1/";
String[] sqls = new String[]{};
Properties properties = new Properties();
properties.setProperty(IgniteJdbcDriver.PROP_STREAMING, "true");
properties.setProperty(IgniteJdbcDriver.PROP_STREAMING_ALLOW_OVERWRITE, "true");
try (Connection conn = DriverManager.getConnection(url, properties)){
    Statement statement = conn.createStatement();
    for (String sql : sqls) {
        statement.addBatch(sql);
    }
    statement.executeBatch();
}

References: https://issues.apache.org/jira/browse/IGNITE-10991

http://apache-ignite-users.70518.x6.nabble.com/Data-streamer-has-been-closed-td26521.html


12,java.lang.IllegalArgumentException: Ouch! Argument is invalid: timeout cannot be negative: -2

If a timeout parameter is set so large that it overflows, this exception is thrown at startup. For example, with settings like these:

            igniteCfg.setFailureDetectionTimeout(Integer.MAX_VALUE);
            igniteCfg.setNetworkTimeout(Long.MAX_VALUE);
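
How a huge timeout can turn into -2 is easy to show with plain arithmetic. The sketch below doubles the value, which is only an assumed example of such a computation; which calculation Ignite actually performs internally is not something these notes confirm.

```java
final class TimeoutOverflowDemo {
    /** Plain multiplication: silently wraps around on overflow. */
    static long doubledUnchecked(long timeoutMs) {
        return timeoutMs * 2;
    }

    /** Math.multiplyExact throws ArithmeticException instead of wrapping. */
    static boolean doublingOverflows(long timeoutMs) {
        try {
            Math.multiplyExact(timeoutMs, 2L);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(doubledUnchecked(60_000L));         // 120000
        System.out.println(doubledUnchecked(Long.MAX_VALUE));  // -2
        System.out.println(doublingOverflows(Long.MAX_VALUE)); // true
        System.out.println(doublingOverflows(60_000L));        // false
    }
}
```

A large but finite timeout, such as a few minutes or hours in milliseconds, leaves plenty of headroom for any such derived value.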

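13, How to assign a table created via DDL to a cluster group

The WITH clause has a TEMPLATE parameter. It can simply name REPLICATED or PARTITIONED, or it can refer to a CacheConfiguration instance, so a DDL table can be tied to a cache defined in XML and thereby assigned to a cluster group. By default, however, adding a CacheConfiguration creates a cache. Appending an asterisk (*) to the cache name prevents this: no cache is created, and DDL can then reference the configuration as a template. Example: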

        <property name="cacheConfiguration">
            <list>
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="student*"/>
                    <property name="cacheMode" value="REPLICATED"/>
                    <property name="nodeFilter"> <!--配置节点过滤器-->
                        <bean class="org.cord.ignite.initial.DataNodeFilter"/>
                    </property>
                </bean>
            </list>
        </property>

CREATE TABLE IF NOT EXISTS PUBLIC.STUDENT (
 STUDID INTEGER,
 NAME VARCHAR,
 EMAIL VARCHAR,
 dob Date,
 PRIMARY KEY (STUDID, NAME))
WITH "template=student,atomicity=ATOMIC,cache_name=student";

14, Failed to communicate with Ignite cluster

The thin client (IgniteJdbcThinDriver) is not thread-safe. To run SQL queries concurrently through the thin client, create a separate Connection for each thread.

Reference: https://stackoverflow.com/questions/49792329/failed-to-communicate-with-ignite-cluster-while-trying-to-execute-multiple-queri
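
One common way to get "one connection per thread" is a ThreadLocal. The sketch below is generic and uses plain Object instances in place of real connections so that it runs without a cluster. PerThread is a made-up helper; with a real driver the factory would open the JDBC connection, for example via DriverManager.getConnection with the thin URL used elsewhere in these notes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class PerThreadResourceDemo {
    /** Hands every thread its own lazily created resource (hypothetical helper). */
    static final class PerThread<T> {
        private final ThreadLocal<T> holder;

        PerThread(Supplier<T> factory) {
            this.holder = ThreadLocal.withInitial(factory);
        }

        T get() {
            return holder.get();
        }
    }

    /** Starts the given number of threads and counts how many distinct resources they saw. */
    static int distinctResources(int threads) {
        PerThread<Object> perThread = new PerThread<>(Object::new);
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                Object first = perThread.get();
                Object second = perThread.get();
                // Within one thread the same resource is returned every time.
                if (first == second) {
                    seen.add(first);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctResources(4)); // 4: one resource per thread
    }
}
```

With real connections, each thread should also close its own connection when it is done. A connection pool is the usual alternative when threads are short-lived.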


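15, Join queries in DBeaver miss some of the matching rows

DBeaver connects as a thin client. If any of the joined caches is in PARTITIONED mode, the join query needs distributed joins to be enabled, which is done by adding distributedJoins=true to the connection URL. Example: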

jdbc:ignite:thin://127.0.0.1:10800;distributedJoins=true

16,WARN [H2TreeIndex] Indexed columns of a row cannot be fully inlined into index what may lead to slowdown due to additional data page reads, increase index inline size if needed

How is the inlineSize of the primary key specified?

H2TreeIndex.computeInlineSize(List<InlineIndexHelper> inlineIdxs, int cfgInlineSize)

↓

int confSize = cctx.config().getSqlIndexMaxInlineSize()

private int sqlIdxMaxInlineSize = DFLT_SQL_INDEX_MAX_INLINE_SIZE = -1;

IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT=10

In other words, if no inline size is specified when the index is created, the default is 10.

How recommendedInlineSize is computed:
H2Tree.inlineSizeRecomendation(SearchRow row)
InlineIndexHelper.inlineSizeOf(Value val)
InlineIndexHelper.InlineIndexHelper(String colName, int type, int colIdx, int sortType, CompareMode compareMode)

Computing inlineSize with Python:

import os
import cx_Oracle as oracle
os.environ["NLS_LANG"] = ".UTF8"
db = oracle.connect('cord/123456@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=1520)))(CONNECT_DATA=(SID=orcl)))')
cursor = db.cursor()

query_index_name="select index_name from ALL_INDEXES where table_name='%s' and index_type='NORMAL' and uniqueness='NONUNIQUE'"
query_index_column="select column_name from all_ind_columns where table_name='%s' and index_name='%s'"
query_index_column_type="select data_type,data_length from all_tab_columns where table_name='%s' and column_name='%s'"

def inlineSizeOf(data_type, data_length):
    if data_type == 'VARCHAR2':
        return data_length + 3
    if data_type == 'DATE':
        return 16+1
    if data_type == 'NUMBER':
        return 8+1
    return -1

def computeInlineSize(tableName):
    table=tableName.upper()
    retmap = {}
    ### Look up the index names
    ret = cursor.execute(query_index_name % table).fetchall()
    if len(ret) == 0:
        print("table[%s] not find any normal index" % table)
        return
    ### For each index, look up its columns
    for indexNames in ret:
        # print(indexNames[0])
        indexName = indexNames[0]
        result = cursor.execute(query_index_column % (table, indexName)).fetchall()
        if len(result) == 0:
            print("table[%s] index[%s] not find any column" % (table, indexName))
            continue
        inlineSize=0
        ### For each column, look up its type and add up the inline size
        for columns in result:
            column=columns[0]
            type_ret = cursor.execute(query_index_column_type % (table, column)).fetchall()
            # Check the type lookup itself, not the column list
            if len(type_ret) == 0:
                print("table[%s] index[%s] column[%s] not find any info" % (table, indexName, column))
                continue
            data_type = type_ret[0][0]
            data_length = type_ret[0][1]
            temp = inlineSizeOf(data_type, data_length)
            if temp == -1:
                print("table[%s] index[%s] column[%s] type[%s] unknown" % (table, indexName, column, data_type))
                # Skip unknown types instead of adding -1 to the total
                continue
            inlineSize += temp
        retmap[indexName] = inlineSize
    print(retmap)

if __name__ == '__main__':
    computeInlineSize('PERSON')

17,class org.apache.ignite.spi.IgniteSpiException: Node with the same ID was found in node IDs history or existing node in topology has the same ID (fix configuration and restart local node)

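If this exception occurs with the Community Edition (8.7.x, pure in-memory mode) and a misconfigured cluster discovery has been ruled out, the most likely cause is that a slow startup makes the heartbeat time out. Increasing the heartbeat timeout (failureDetectionTimeout) resolves the problem: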

        <!-- Failure detection timeout used by discovery and communication subsystems -->
        <property name="failureDetectionTimeout" value="60000"/>

Posted 2018-12-03 22:44 by 堕落门徒