H2 database研究

最近自己一直在优化组里的一个java written的daemon。比较有意思的一件事情是，之前开发的哥们用了一个叫H2的open source in-memory database，负责存储程序运行时的数据并且提供快速查询的可行性，可惜当数据量相当大的时候（10个左右的字段，10左右的索引，大约1million数据行），H2的插入性能就变得很糟糕，考虑到之前都没有接触过这种小巧的in-memory database，多点深入的理解总是好的。

H2的总特点：open source, written in Java(有空可以去看看source code), in-memory, two modes "embedded mode and server mode".

关于embedded mode, 貌似是说只要把h2.jar放到class path中，然后使用H2的JDBC driver设置好h2 db的connection string，程序启动时就会创建一个本地的数据库文件，相当于这个数据库是和程序嵌入在一起的，使用同一个JVM。对于H2的server mode，其用途是允许remote connect(JDBC或者ODBC via TCP/IP)。在server mode的情形中，服务器端需要启动H2的server程序，然后等待不同程序的连接。而H2的Driver则可以通过链接字符串中的前缀来分辨不同的mode http://h2database.com/html/features.html#database_url，在使用server mode的时候需要在服务器端启动数据库server程序，否则会抛出connection refuse的异常。考虑到server mode相对较为繁琐，H2引入了一种新的mix mode，只要在连接字符串中显示说明AUTO_SERVER = true，这样在有多进程连接的情况时第一个进程自动启动server mode，非常方便。

关于多线程和多连接，cache：

Multiple Connections to the Same Database: Client/Server

If you want to access the same database at the same time from different processes or computers, you need to use the client / server mode. In this case, one process acts as the server, and the other processes (that could reside on other computers as well) connect to the server via TCP/IP (or SSL/TLS over TCP/IP for improved security).

这段话的意思如果不同进程同时连接H2，那么需要使用server mode，其中一个进程会成为server端（估计是run在同一个JVM的进程），其他进程需要通过远程协议登陆。

Multithreading Support

This database is multithreading-safe. That means, if an application is multi-threaded, it does not need to worry about synchronizing access to the database. Internally, most requests to the same database are synchronized. That means an application can use multiple threads that access the same database at the same time, however if one thread executes a long running query, the other threads need to wait.

An application should normally use one connection per thread. This database synchronizes access to the same connection, but other databases may not do this.

数据库是线程安全的，因为大部分的内部请求是串行操作的，如果一个线程的堵塞会影响其他线程的请求。当然数据库内部也可以启用多线程模式，允许内部数据库操作并行运行，不过需要自行调试其性能。

Cache Settings

The database keeps most frequently used data in the main memory. The amount of memory used for caching can be changed using the setting CACHE_SIZE. This setting can be set in the database connection URL (jdbc:h2:~/test;CACHE_SIZE=131072), or it can be changed at runtime using SET CACHE_SIZE size. The size of the cache, as represented by CACHE_SIZE is measured in KB, with each KB being 1024 bytes. This setting has no effect for in-memory databases. For persistent databases, the setting is stored in the database and re-used when the database is opened the next time. However, when opening an existing database, the cache size is set to at most half the amount of memory available for the virtual machine (Runtime.getRuntime().maxMemory()), even if the cache size setting stored in the database is larger; however the setting stored in the database is kept. Setting the cache size in the database URL or explicitly using SET CACHE_SIZE overrides this value (even if larger than the physical memory). To get the current used maximum cache size, use the query SELECT * FROM INFORMATION_SCHEMA.SETTINGS WHERE NAME = 'info.CACHE_MAX_SIZE'

An experimental scan-resistant cache algorithm "Two Queue" (2Q) is available. To enable it, append;CACHE_TYPE=TQ to the database URL. The cache might not actually improve performance. If you plan to use it, please run your own test cases first.

Also included is an experimental second level soft reference cache. Rows in this cache are only garbage collected on low memory. By default the second level cache is disabled. To enable it, use the prefix SOFT_. Example:jdbc:h2:~/test;CACHE_TYPE=SOFT_LRU. The cache might not actually improve performance. If you plan to use it, please run your own test cases first.

To get information about page reads and writes, and the current caching algorithm in use, call SELECT * FROM INFORMATION_SCHEMA.SETTINGS. The number of pages read / written is listed.

对于in-memory mode，cache size没有任何作用（因为所有的row已经全部在数据库中）。对于persistence mode，cache size决定了停留在main memory中的数据量。131072->128MB。

posted on 2012-11-09 16:34 梁霄阅读(1329) 评论(0) 收藏举报

刷新页面返回顶部

工作的那些事儿，小梁的blog

H2 database研究

导航

公告