Document Conversion with LibreOffice Online and JODConverter

1 Introduction

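In today's digital office environment, document format conversion and online collaboration have become key to productivity. LibreOffice Online is a web-based, open-source office suite that provides online document preview and format conversion. This makes it well suited to enterprise applications and cloud platforms that need built-in document processing. It handles common office formats such as DOCX, PPTX and PDF, and exposes an HTTP-based remote interface that developers can call programmatically to automate processing.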

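Note, however, that it is not a good fit for high-concurrency workloads or for scenarios with demanding real-time collaboration requirements. Its features and performance still lag behind commercial offerings such as Microsoft Office Online or Google Docs. Its core strengths are that it is free and open source, can be self-hosted, and is highly customizable. That makes it particularly attractive to users with strict data-privacy requirements or who need deep integration.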

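This article shows how to deploy LibreOffice Online quickly with Docker and how to perform remote document conversion from Java with JODConverter. It covers the complete workflow, from environment setup and configuration tuning to actual invocation, and gives developers a stable, workable solution.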

2 Test Environment

The following environment is recommended for deployment and testing to ensure compatibility and stability:

  • Java 8+
  • Maven 3.3+
  • Docker 24

3 Implementation Steps

3.1 Docker Compose Configuration

Create a docker-compose.yml file with the following content:

version: '2'
services:
  libreoffice:
    image: libreoffice:7.1.0
    volumes:
      - ./volumes/config/loolwsd.xml:/etc/loolwsd/loolwsd.xml
    ports:
      - "9080:9980"
    container_name: libreoffice
    environment:
      username: root
      password: your_password
      DONT_GEN_SSL_CERT: 1
    cap_add:
      - MKNOD
    restart: unless-stopped

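Configuration notes:

This Docker Compose file deploys the LibreOffice Online service. It specifies the image version, mounts the configuration file into the container, and maps host port 9080 to port 9980 inside the container. It also sets several environment variables:

  • username and password are the admin console credentials.
  • DONT_GEN_SSL_CERT set to 1 skips generating an SSL certificate, which is only appropriate for internal test environments.

cap_add: MKNOD grants the container the capability to create special device nodes, which LibreOffice Online needs in order to work properly.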

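3.2 LibreOffice Configuration

Create a loolwsd.xml file with the following content. The compose file above mounts it from ./volumes/config/loolwsd.xml.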

<config>
  <!-- Note: 'default' attributes are used to document a setting's default value as well as to use as fallback. -->
    <!-- Note: When adding a new entry, a default must be set in WSD in case the entry is missing upon deployment. -->
    <allowed_languages desc="List of supported languages of Writing Aids (spell checker, grammar checker, thesaurus, hyphenation) on this instance. Allowing too many has negative effect on startup performance." default="de_DE en_GB en_US es_ES fr_FR it nl pt_BR pt_PT ru">de_DE en_GB en_US es_ES fr_FR it nl pt_BR pt_PT ru zh_CN</allowed_languages>
    <sys_template_path desc="Path to a template tree with shared libraries etc to be used as source for chroot jails for child processes." type="path" relative="true" default="systemplate"></sys_template_path>
    <child_root_path desc="Path to the directory under which the chroot jails for the child processes will be created. Should be on the same file system as systemplate and lotemplate. Must be an empty directory." type="path" relative="true" default="jails"></child_root_path>
    <mount_jail_tree desc="Controls whether the systemplate and lotemplate contents are mounted or not, which is much faster than the default of linking/copying each file." type="bool" default="true"></mount_jail_tree>
    <server_name desc="External hostname:port of the server running loolwsd. If empty, it's derived from the request (please set it if this doesn't work). Must be specified when behind a reverse-proxy or when the hostname is not reachable directly." type="string" default=""></server_name>
    <file_server_root_path desc="Path to the directory that should be considered root for the file server. This should be the directory containing loleaflet." type="path" relative="true" default="loleaflet/../"></file_server_root_path>
    <memproportion desc="The maximum percentage of system memory consumed by all of the LibreOffice Online Personal, after which we start cleaning up idle documents" type="double" default="80.0"></memproportion>
    <num_prespawn_children desc="Number of child processes to keep started in advance and waiting for new clients." type="uint" default="1">1</num_prespawn_children>
        <per_document desc="Document-specific settings, including LO Core settings.">
      <max_concurrency desc="The maximum number of threads to use while processing a document." type="uint" default="4">4</max_concurrency>
      <batch_priority desc="A (lower) priority for use by batch eg. convert-to processes to avoid starving interactive ones" type="uint" default="5">5</batch_priority>
      <document_signing_url desc="The endpoint URL of signing server, if empty the document signing is disabled" type="string" default=""></document_signing_url>
      <redlining_as_comments desc="If true show red-lines as comments" type="bool" default="false">false</redlining_as_comments>
      <idle_timeout_secs desc="The maximum number of seconds before unloading an idle document. Defaults to 1 hour." type="uint" default="3600">3600</idle_timeout_secs>
        <!-- Idle save and auto save are checked every 30 seconds -->
          <!-- They are disabled when the value is zero or negative. -->
          <idlesave_duration_secs desc="The number of idle seconds after which document, if modified, should be saved. Defaults to 30 seconds." type="int" default="30">30</idlesave_duration_secs>
          <autosave_duration_secs desc="The number of seconds after which document, if modified, should be saved. Defaults to 5 minutes." type="int" default="300">300</autosave_duration_secs>
          <always_save_on_exit desc="On exiting the last editor, always perform the save, even if the document is not modified." type="bool" default="false">false</always_save_on_exit>
          <limit_virt_mem_mb desc="The maximum virtual memory allowed to each document process. 0 for unlimited." type="uint">0</limit_virt_mem_mb>
          <limit_stack_mem_kb desc="The maximum stack size allowed to each document process. 0 for unlimited." type="uint">8000</limit_stack_mem_kb>
          <limit_file_size_mb desc="The maximum file size allowed to each document process to write. 0 for unlimited." type="uint">0</limit_file_size_mb>
          <limit_num_open_files desc="The maximum number of files allowed to each document process to open. 0 for unlimited." type="uint">0</limit_num_open_files>
          <limit_load_secs desc="Maximum number of seconds to wait for a document load to succeed. 0 for unlimited." type="uint" default="100">600</limit_load_secs>
          <limit_convert_secs desc="Maximum number of seconds to wait for a document conversion to succeed. 0 for unlimited." type="uint" default="100">600</limit_convert_secs>
              <cleanup desc="Checks for resource consuming (bad) documents and kills associated kit process. A document is considered resource consuming (bad) if is in idle state for idle_time_secs period and memory usage passed limit_dirty_mem_mb or CPU usage passed limit_cpu_per" enable="false">
            <cleanup_interval_ms desc="Interval between two checks" type="uint" default="10000">10000</cleanup_interval_ms>
            <bad_behavior_period_secs desc="Minimum time period for a document to be in bad state before associated kit process is killed. If in this period the condition for bad document is not met once then this period is reset" type="uint" default="60">60</bad_behavior_period_secs>
            <idle_time_secs desc="Minimum idle time for a document to be candidate for bad state" type="uint" default="300">300</idle_time_secs>
            <limit_dirty_mem_mb desc="Minimum memory usage for a document to be candidate for bad state" type="uint" default="3072">3072</limit_dirty_mem_mb>
            <limit_cpu_per desc="Minimum CPU usage for a document to be candidate for bad state" type="uint" default="85">85</limit_cpu_per>
            </cleanup>
          </per_document>
            <per_view desc="View-specific settings.">
          <out_of_focus_timeout_secs desc="The maximum number of seconds before dimming and stopping updates when the browser tab is no longer in focus. Defaults to 120 seconds." type="uint" default="120">120</out_of_focus_timeout_secs>
          <idle_timeout_secs desc="The maximum number of seconds before dimming and stopping updates when the user is no longer active (even if the browser is in focus). Defaults to 15 minutes." type="uint" default="900">900</idle_timeout_secs>
          </per_view>
        <loleaflet_html desc="Allows UI customization by replacing the single endpoint of loleaflet.html" type="string" default="loleaflet.html">loleaflet.html</loleaflet_html>
          <logging>
          <color type="bool">true</color>
          <level type="string" desc="Can be 0-8, or none (turns off logging), fatal, critical, error, warning, notice, information, debug, trace" default="warning">warning</level>
          <protocol type="bool" desc="Enable minimal client-site JS protocol logging from the start">false</protocol>
            <!-- lokit_sal_log example: Log WebDAV-related messages, that is interesting for debugging Insert - Image operation: "+TIMESTAMP+INFO.ucb.ucp.webdav+WARN.ucb.ucp.webdav"
            See also: https://docs.libreoffice.org/sal/html/sal_log.html -->
          <lokit_sal_log type="string" desc="Fine tune log messages from LOKit. Default is to suppress log messages from LOKit." default="-INFO-WARN">-INFO-WARN</lokit_sal_log>
              <file enable="false">
            <property name="path" desc="Log file path.">/var/log/loolwsd.log</property>
            <property name="rotation" desc="Log file rotation strategy. See Poco FileChannel.">never</property>
            <property name="archive" desc="Append either timestamp or number to the archived log filename.">timestamp</property>
            <property name="compress" desc="Enable/disable log file compression.">true</property>
            <property name="purgeAge" desc="The maximum age of log files to preserve. See Poco FileChannel.">10 days</property>
            <property name="purgeCount" desc="The maximum number of log archives to preserve. Use 'none' to disable purging. See Poco FileChannel.">10</property>
            <property name="rotateOnOpen" desc="Enable/disable log file rotation on opening.">true</property>
            <property name="flush" desc="Enable/disable flushing after logging each line. May harm performance. Note that without flushing after each line, the log lines from the different processes will not appear in chronological order.">false</property>
            </file>
            <anonymize>
            <anonymize_user_data type="bool" desc="Enable to anonymize/obfuscate of user-data in logs. If default is true, it was forced at compile-time and cannot be disabled." default="false">false</anonymize_user_data>
            <anonymization_salt type="uint" desc="The salt used to anonymize/obfuscate user-data in logs. Use a secret 64-bit random number." default="82589933">82589933</anonymization_salt>
            </anonymize>
          </logging>
        <loleaflet_logging desc="Logging in the browser console" default="false">false</loleaflet_logging>
            <trace desc="Dump commands and notifications for replay. When 'snapshot' is true, the source file is copied to the path first." enable="false">
          <path desc="Output path to hold trace file and docs. Use '%' for timestamp to avoid overwriting. For example: /some/path/to/looltrace-%.gz" compress="true" snapshot="false"></path>
            <filter>
            <message desc="Regex pattern of messages to exclude"></message>
            </filter>
            <outgoing>
            <record desc="Whether or not to record outgoing messages" default="false">false</record>
            </outgoing>
          </trace>
            <net desc="Network settings">
            <!-- On systems where localhost resolves to IPv6 [::1] address first, when net.proto is all and net.listen is loopback, loolwsd unexpectedly listens on [::1] only.
            You need to change net.proto to IPv4, if you want to use 127.0.0.1. -->
          <proto type="string" default="all" desc="Protocol to use IPv4, IPv6 or all for both">all</proto>
          <listen type="string" default="any" desc="Listen address that loolwsd binds to. Can be 'any' or 'loopback'.">any</listen>
          <service_root type="path" default="" desc="Prefix all the pages, websockets, etc. with this path."></service_root>
          <proxy_prefix type="bool" default="false" desc="Enable a ProxyPrefix to be passed int through which to redirect requests"></proxy_prefix>
              <post_allow desc="Allow/deny client IP address for POST(REST)." allow="true">
            <host desc="The IPv4 private 192.168 block as plain IPv4 dotted decimal addresses.">192\.168\.[0-9]{1,3}\.[0-9]{1,3}</host>
            <host desc="Ditto, but as IPv4-mapped IPv6 addresses">::ffff:192\.168\.[0-9]{1,3}\.[0-9]{1,3}</host>
            <host desc="The IPv4 loopback (localhost) address.">127\.0\.0\.1</host>
            <host desc="Ditto, but as IPv4-mapped IPv6 address">::ffff:127\.0\.0\.1</host>
            <host desc="The IPv6 loopback (localhost) address.">::1</host>
            <host desc="The IPv4 private 172.17.0.0/16 subnet (Docker).">172\.17\.[0-9]{1,3}\.[0-9]{1,3}</host>
            <host desc="Ditto, but as IPv4-mapped IPv6 addresses">::ffff:172\.17\.[0-9]{1,3}\.[0-9]{1,3}</host>
            <host desc="Regex pattern of hostname to allow or deny." allow="true">.*</host>
            </post_allow>
          <frame_ancestors desc="Specify who is allowed to embed the LO Online iframe (loolwsd and WOPI host are always allowed). Separate multiple hosts by space."></frame_ancestors>
          <connection_timeout_secs desc="Specifies the connection, send, recv timeout in seconds for connections initiated by loolwsd (such as WOPI connections)." type="int" default="30"></connection_timeout_secs>
          </net>
            <ssl desc="SSL settings">
          <enable type="bool" desc="Controls whether SSL encryption between browser and loolwsd is enabled (do not disable for production deployment). If default is false, must first be compiled with SSL support to enable." default="true">false</enable>
          <termination desc="Connection via proxy where loolwsd acts as working via https, but actually uses http." type="bool" default="true">false</termination>
          <cert_file_path desc="Path to the cert file" relative="false">/etc/loolwsd/cert.pem</cert_file_path>
          <key_file_path desc="Path to the key file" relative="false">/etc/loolwsd/key.pem</key_file_path>
          <ca_file_path desc="Path to the ca file" relative="false">/etc/loolwsd/ca-chain.cert.pem</ca_file_path>
          <cipher_list desc="List of OpenSSL ciphers to accept" default="ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH"></cipher_list>
              <hpkp desc="Enable HTTP Public key pinning" enable="false" report_only="false">
            <max_age desc="HPKP's max-age directive - time in seconds browser should remember the pins" enable="true">1000</max_age>
            <report_uri desc="HPKP's report-uri directive - pin validation failure are reported at this URL" enable="false"></report_uri>
                <pins desc="Base64 encoded SPKI fingerprints of keys to be pinned">
              <pin></pin>
              </pins>
            </hpkp>
          </ssl>
            <security desc="Altering these defaults potentially opens you to significant risk">
          <seccomp desc="Should we use the seccomp system call filtering." type="bool" default="true">true</seccomp>
          <capabilities desc="Should we require capabilities to isolate processes into chroot jails" type="bool" default="true">true</capabilities>
          </security>
          <watermark>
          <opacity desc="Opacity of on-screen watermark from 0.0 to 1.0" type="double" default="0.2"></opacity>
          <text desc="Watermark text to be displayed on the document if entered" type="string"></text>
          </watermark>
          <welcome>
          <enable type="bool" desc="Controls whether the welcome screen should be shown to the users on new install and updates." default="false">false</enable>
          <enable_button type="bool" desc="Controls whether the welcome screen should have an explanatory button instead of an X button to close the dialog." default="false">false</enable_button>
          <path desc="Path to 'welcome-$lang.html' files served on first start or when the version changes. When empty, defaults to the Release notes." type="path" relative="true" default="loleaflet/welcome"></path>
          </welcome>
          <user_interface>
          <mode type="string" desc="Controls the user interface style (classic|notebookbar)" default="classic">classic</mode>
          </user_interface>
            <storage desc="Backend storage">
            <filesystem allow="false" />
              <wopi desc="Allow/deny wopi storage. Mutually exclusive with webdav." allow="true">
            <host desc="Regex pattern of hostname to allow or deny." allow="true"></host>
            <host desc="Regex pattern of hostname to allow or deny." allow="true">10\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}</host>
            <host desc="Regex pattern of hostname to allow or deny." allow="true">172\.1[6789]\.[0-9]{1,3}\.[0-9]{1,3}</host>
            <host desc="Regex pattern of hostname to allow or deny." allow="true">172\.2[0-9]\.[0-9]{1,3}\.[0-9]{1,3}</host>
            <host desc="Regex pattern of hostname to allow or deny." allow="true">172\.3[01]\.[0-9]{1,3}\.[0-9]{1,3}</host>
            <host desc="Regex pattern of hostname to allow or deny." allow="true">192\.168\.[0-9]{1,3}\.[0-9]{1,3}</host>
            <host desc="Regex pattern of hostname to allow or deny." allow="false">192\.168\.1\.1</host>
            <max_file_size desc="Maximum document size in bytes to load. 0 for unlimited." type="uint">0</max_file_size>
            <reuse_cookies desc="When enabled, cookies from the browser will be captured and set on WOPI requests." type="bool" default="false">false</reuse_cookies>
                <locking desc="Locking settings">
              <refresh desc="How frequently we should re-acquire a lock with the storage server, in seconds (default 15 mins) or 0 for no refresh" type="int" default="900">900</refresh>
              </locking>
            </wopi>
              <webdav desc="Allow/deny webdav storage. Mutually exclusive with wopi." allow="false">
            <host desc="Hostname to allow" allow="false"></host>
            </webdav>
              <ssl desc="SSL settings">
            <as_scheme type="bool" default="true" desc="When set we exclusively use the WOPI URI's scheme to enable SSL for storage">true</as_scheme>
            <enable type="bool" desc="If as_scheme is false or not set, this can be set to force SSL encryption between storage and loolwsd. When empty this defaults to following the ssl.enable setting"></enable>
            <cert_file_path desc="Path to the cert file" relative="false"></cert_file_path>
            <key_file_path desc="Path to the key file" relative="false"></key_file_path>
            <ca_file_path desc="Path to the ca file. If this is not empty, then SSL verification will be strict, otherwise cert of storage (WOPI-like host) will not be verified." relative="false"></ca_file_path>
            <cipher_list desc="List of OpenSSL ciphers to accept. If empty the defaults are used. These can be overriden only if absolutely needed."></cipher_list>
            </ssl>
          </storage>
        <tile_cache_persistent desc="Should the tiles persist between two editing sessions of the given document?" type="bool" default="true">true</tile_cache_persistent>
            <admin_console desc="Web admin console settings.">
          <enable desc="Enable the admin console functionality" type="bool" default="true">true</enable>
          <enable_pam desc="Enable admin user authentication with PAM" type="bool" default="false">false</enable_pam>
          <username desc="The username of the admin console. Ignored if PAM is enabled.">root</username>
          <password desc="The password of the admin console. Deprecated on most platforms. Instead, use PAM or loolconfig to set up a secure password.">xekieSklwWpoexs884s</password>
          </admin_console>
            <monitors desc="Addresses of servers we connect to on start for monitoring">
          </monitors>
        </config>

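Configuration notes:

This XML file customizes LibreOffice Online's behavior, including logging, networking, memory limits and document-processing timeouts. The key settings in this example are:

  • allowed_languages adds zh_CN to the supported writing-aid languages.
  • sys_template_path defines the system template path.
  • per_document caps processing threads (max_concurrency) and raises limit_load_secs and limit_convert_secs to 600 seconds.
  • net/post_allow and storage/wopi are the access whitelists for REST clients and WOPI hosts.
  • ssl/enable is set to false, matching DONT_GEN_SSL_CERT in the compose file.

These settings directly affect the stability and security of the service. Note that the last post_allow entry, .*, accepts POST requests from any host; narrow or remove it for production use.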
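The post_allow entries are regular expressions matched against the client address, so it can help to test a candidate address against the whitelist before deploying. The following is a minimal JDK-only sketch; the class name PostAllowCheck is hypothetical. It also rests on three assumptions about loolwsd's matching rules, not documented behavior:

  • each pattern must match the whole address;
  • allow="false" entries are checked first;
  • a host without an allow attribute counts as allowed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PostAllowCheck {

    /** Returns true if clientAddress would pass net/post_allow under the assumed matching rules. */
    static boolean isPostAllowed(String loolwsdXml, String clientAddress) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(loolwsdXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse loolwsd.xml", e);
        }
        NodeList sections = doc.getElementsByTagName("post_allow");
        if (sections.getLength() == 0) {
            return false; // no whitelist section found
        }
        List<Pattern> allow = new ArrayList<>();
        List<Pattern> deny = new ArrayList<>();
        NodeList hosts = ((Element) sections.item(0)).getElementsByTagName("host");
        for (int i = 0; i < hosts.getLength(); i++) {
            Element host = (Element) hosts.item(i);
            String regex = host.getTextContent().trim();
            if (regex.isEmpty()) {
                continue; // empty entries match nothing
            }
            // Assumption: a missing allow attribute means allow="true"
            if ("false".equals(host.getAttribute("allow"))) {
                deny.add(Pattern.compile(regex));
            } else {
                allow.add(Pattern.compile(regex));
            }
        }
        // Assumption: deny entries win, and a pattern must match the whole address
        for (Pattern p : deny) {
            if (p.matcher(clientAddress).matches()) {
                return false;
            }
        }
        for (Pattern p : allow) {
            if (p.matcher(clientAddress).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Default path taken from the docker-compose.yml volume mapping
        String path = args.length > 0 ? args[0] : "./volumes/config/loolwsd.xml";
        String xml = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        for (String ip : new String[] {"127.0.0.1", "172.17.0.1", "8.8.8.8"}) {
            System.out.println(ip + " -> " + (isPostAllowed(xml, ip) ? "allowed" : "denied"));
        }
    }
}
```

Run against the full loolwsd.xml above, every address should be reported as allowed because of the final .* entry. That is a quick way to confirm the whitelist is effectively open.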

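3.3 Start the Container

Place loolwsd.xml at ./volumes/config/loolwsd.xml next to docker-compose.yml. Then start the container with docker compose up -d.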
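Once the container is up, a quick programmatic health check can confirm that loolwsd is answering on the mapped port. The sketch below uses only java.net.HttpURLConnection to send a GET to /hosting/discovery, the WOPI discovery endpoint we expect loolwsd to serve. The following are our own assumptions:

  • the class name;
  • the 5-second timeout;
  • treating HTTP 200 as healthy.

your_ip is the same placeholder used elsewhere in this article.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class DiscoveryProbe {

    /** Joins the base URL and the discovery path, tolerating a trailing slash. */
    static String discoveryUrl(String baseUrl) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return base + "/hosting/discovery";
    }

    /** Returns the HTTP status code of GET {base}/hosting/discovery. */
    static int probe(String baseUrl, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(discoveryUrl(baseUrl)).openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // "your_ip" is a placeholder; pass the real base URL as the first argument
        String base = args.length > 0 ? args[0] : "http://your_ip:9080/";
        int status = probe(base, 5000);
        System.out.println(status == HttpURLConnection.HTTP_OK
                ? "LibreOffice Online is up"
                : "Unexpected status: " + status);
    }
}
```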

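3.4 Admin Console

Open http://your_ip:9080/loleaflet/dist/admin/admin.html and log in with the admin username and password configured above (username and password in docker-compose.yml). The admin console lets you monitor the service status and manage document sessions in real time.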


4 Testing

4.1 Add Dependencies

Add the following dependencies to your project's pom.xml:

<dependency>
    <groupId>org.jodconverter</groupId>
    <artifactId>jodconverter-remote</artifactId>
    <version>4.4.6</version>
</dependency>
<dependency>
    <groupId>org.jodconverter</groupId>
    <artifactId>jodconverter-core</artifactId>
    <version>4.4.6</version>
</dependency>

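Dependency notes:

  • jodconverter-remote encapsulates remote communication with the LibreOffice Online service and provides a concise API for document conversion.
  • jodconverter-core is the core module, containing the basic conversion logic and interface definitions.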

4.2 Usage Example

Create a RemoteDemo.java file with the following content:

import org.jodconverter.remote.RemoteConverter;
import org.jodconverter.remote.office.RemoteOfficeManager;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

public class RemoteDemo {

    public static void main(String[] args) throws Exception {
        // Configure the remote LibreOffice Online service (all timeouts in milliseconds)
        final RemoteOfficeManager officeManager = RemoteOfficeManager.builder()
                .urlConnection("http://your_ip:9080/")
                .taskExecutionTimeout(6000 * 1000L) // 6000 s
                .socketTimeout(6000 * 1000L)        // 6000 s
                .connectTimeout(3 * 1000L)          // 3 s
                .build();
        // Start the office manager
        officeManager.start();
        try {
            // Create the converter
            RemoteConverter converter = RemoteConverter.make(officeManager);
            // Source and target files with a Chinese filename
            File inputFile = new File("./workspace/中文.docx");
            File outputFile = new File("./workspace/中文.pdf");
            // Passing an InputStream instead of a File works around problems
            // with non-ASCII (e.g. Chinese) source paths in online conversion
            try (InputStream in = new FileInputStream(inputFile)) {
                converter.convert(in).to(outputFile).execute();
            }
            System.out.println("Conversion succeeded!");
        } finally {
            // Always stop the manager, even if the conversion fails
            officeManager.stop();
        }
    }
}

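Code notes:

This example shows how to use JODConverter to call LibreOffice Online remotely for document format conversion. RemoteOfficeManager establishes the connection to the service. It also sets the connect timeout, socket timeout and task execution timeout, all in milliseconds, so that large documents and slow networks can be handled. Passing a FileInputStream as the source, rather than the File itself, works around the garbled-name problem that Chinese (non-ASCII) source paths otherwise cause in online conversion.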
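Because the RemoteDemo timeouts are given in milliseconds, it is worth converting them to readable units and comparing them with the server-side limit in loolwsd.xml. The small JDK-only sketch below does that arithmetic:

  • 6000 * 1000L ms is 100 minutes.
  • limit_convert_secs is 600 seconds, or 10 minutes.

Assuming limit_convert_secs bounds each conversion, as its desc attribute says, the server would abandon a long conversion well before the client gives up. For very large documents, both values would need raising together. The class and constant names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

public class TimeoutBudget {
    // Values copied from RemoteDemo and loolwsd.xml
    static final long TASK_EXECUTION_TIMEOUT_MS = 6000 * 1000L; // taskExecutionTimeout
    static final long CONNECT_TIMEOUT_MS = 3 * 1000L;          // connectTimeout
    static final long LIMIT_CONVERT_SECS = 600;                // per_document/limit_convert_secs

    /** True when the client would keep waiting after the server-side conversion limit has expired. */
    static boolean clientOutlastsServer(long clientTimeoutMs, long serverLimitSecs) {
        return clientTimeoutMs > TimeUnit.SECONDS.toMillis(serverLimitSecs);
    }

    public static void main(String[] args) {
        System.out.println("task execution timeout: "
                + TimeUnit.MILLISECONDS.toMinutes(TASK_EXECUTION_TIMEOUT_MS) + " min");
        System.out.println("connect timeout: "
                + TimeUnit.MILLISECONDS.toSeconds(CONNECT_TIMEOUT_MS) + " s");
        System.out.println("server limit_convert_secs: "
                + TimeUnit.SECONDS.toMinutes(LIMIT_CONVERT_SECS) + " min");
        if (clientOutlastsServer(TASK_EXECUTION_TIMEOUT_MS, LIMIT_CONVERT_SECS)) {
            System.out.println("The server-side limit is reached first; raising only the client timeout will not help.");
        }
    }
}
```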

5 Summary

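With the steps above, you can quickly set up a document conversion service based on LibreOffice Online and integrate it into Java applications with JODConverter. The solution offers good control and extensibility. It is especially suitable for intranet environments with data-security requirements and for customized projects.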


posted @ 2025-09-11 16:57  yjbjingcha