SciTech-BigDataAIML-NLP-TensorFlow-KerasNLP-Preprocessing: tensorflow.keras.preprocessing.{text, sequence}

KerasNLP API reference

The easiest way to get started processing text in TensorFlow is to use KerasNLP, a natural language processing library that provides modular components with state-of-the-art preset weights and architectures. You can use KerasNLP components out-of-the-box or customize them as needed. KerasNLP emphasizes in-graph computation for all workflows, so you can expect easy productionization using the TensorFlow ecosystem.

To install KerasNLP, see Installation.

TensorFlow's "new" text-processing API is documented here:
https://tensorflow.google.cn/text/api_overview?hl=en
https://tensorflow.google.cn/text/api_docs/python/text/Splitter?hl=en

The "old" API below has been superseded by the "new" one:
tensorflow.keras.preprocessing.{text, sequence}
from tensorflow.keras.preprocessing.text import Tokenizer as txtTok
from tensorflow.keras.preprocessing.sequence import pad_sequences as seqPad

As the imports above show, TensorFlow's keras.preprocessing package contains
several subpackages, such as text and sequence, each used to preprocess data of the corresponding type.

The underlying principles of NLP preprocessing, however, are the same in either API. Broadly:
text -- Tokenizer --> token sequence
sequence -- padding/truncating --> RaggedTensor or regular (rectangular) matrix
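
The two steps above can be sketched in plain Python, without TensorFlow. This is only an illustration of the idea, not the Keras implementation: the helper names `fit_vocab`, `texts_to_sequences`, and `pad_sequences` are hypothetical stand-ins that loosely mirror the behaviour of `Tokenizer` and `pad_sequences`. It assumes whitespace tokenization, lowercasing, id 0 reserved for padding, and "pre" as the default side for both padding and truncating.

```python
from collections import Counter


def fit_vocab(texts):
    """Map each word to an integer id, most frequent word first.

    Id 0 is reserved for padding, so word ids start at 1 (an assumption
    of this sketch, chosen to resemble the legacy Keras Tokenizer).
    """
    counts = Counter(word for t in texts for word in t.lower().split())
    # most_common() orders by count, descending; ties keep first-seen order.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_sequences(texts, vocab):
    """Step 1: text -> token sequence (variable length, i.e. ragged)."""
    return [[vocab[w] for w in t.lower().split() if w in vocab] for t in texts]


def pad_sequences(seqs, maxlen, padding="pre", truncating="pre", value=0):
    """Step 2: ragged sequences -> rectangular matrix of width maxlen."""
    out = []
    for seq in seqs:
        if len(seq) > maxlen:
            # "pre" drops tokens from the front, "post" from the back.
            seq = seq[len(seq) - maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        out.append(fill + seq if padding == "pre" else seq + fill)
    return out


texts = ["the cat sat", "the cat sat on the mat", "dogs"]
vocab = fit_vocab(texts)
ragged = texts_to_sequences(texts, vocab)      # rows of different lengths
matrix = pad_sequences(ragged, maxlen=4)       # every row has length 4
print(ragged)
print(matrix)
```

The intermediate `ragged` list corresponds to what a RaggedTensor would hold, while `matrix` is the rectangular form most models expect as input.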

posted @ 2024-04-11 09:01 abaelhe
