SciTech-BigDataAIML-NLP-TensorFlow-KerasNLP-Preprocessing: tensorflow.keras.preprocessing.{text, sequence}
The easiest way to get started processing text in TensorFlow is to use KerasNLP, a natural language processing library that provides modular components with state-of-the-art preset weights and architectures. You can use KerasNLP components out-of-the-box or customize them as needed. KerasNLP emphasizes in-graph computation for all workflows, so you can expect easy productionization using the TensorFlow ecosystem.
To install KerasNLP, see Installation.
The "new" TensorFlow text APIs are documented here:
https://tensorflow.google.cn/text/api_overview?hl=en
https://tensorflow.google.cn/text/api_docs/python/text/Splitter?hl=en
The "legacy" APIs below have been superseded by the "new" ones:
tensorflow.keras.preprocessing.{text, sequence}
from tensorflow.keras.preprocessing.text import Tokenizer as txtTok
from tensorflow.keras.preprocessing.sequence import pad_sequences as seqPad
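As the example above shows:
TensorFlow's keras.preprocessing package
contains subpackages for several data types, such as text and sequence, each suited to preprocessing the corresponding kind of data.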
The underlying NLP principles, however, are general. Roughly:
text -Tokenizer-> token sequence
sequence -padding/truncating-> RaggedTensor or a regular (rectangular) matrix
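The two steps above can be sketched in plain Python, without TensorFlow, to show the general idea. This is only an illustration: the helper names fit_vocab, texts_to_sequences, and pad_or_truncate are made up for this sketch and are not Keras functions, tokenization is a naive whitespace split, and padding/truncation is applied at the end of each sequence (Keras pad_sequences defaults to the front).

```python
from collections import Counter


def fit_vocab(texts):
    """Assign an integer id to each word, most frequent first.

    Id 0 is reserved for padding, so word ids start at 1.
    """
    counts = Counter(word for t in texts for word in t.lower().split())
    # Sort by descending frequency; break ties alphabetically for determinism.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ordered)}


def texts_to_sequences(texts, vocab):
    """Turn each text into a list of word ids (unknown words are dropped)."""
    return [[vocab[w] for w in t.lower().split() if w in vocab] for t in texts]


def pad_or_truncate(seqs, maxlen, pad_value=0):
    """Cut each sequence to maxlen, then pad short ones at the end."""
    out = []
    for s in seqs:
        s = s[:maxlen]
        out.append(s + [pad_value] * (maxlen - len(s)))
    return out


texts = ["the cat sat", "the dog sat on the mat"]
vocab = fit_vocab(texts)

# Step 1: text -> token sequences. Lengths differ, so this is "ragged".
ragged = texts_to_sequences(texts, vocab)

# Step 2: padding/truncating -> every row has the same length (rectangular).
dense = pad_or_truncate(ragged, maxlen=4)

print(ragged)
print(dense)
```

The ragged list corresponds to what a RaggedTensor would hold, and the padded list corresponds to the regular matrix that fixed-shape model inputs expect.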
