How to Configure TensorFlow Serving in Kubernetes



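About TensorFlow Serving

[Figure: TensorFlow Serving architecture diagram]

For more of the basic concepts behind TensorFlow Serving, see the official documentation; even the best translation is no match for the original.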

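Here are the points I consider most important:

TensorFlow Serving uses a Model Version Policy to configure serving multiple versions of multiple models at the same time;

By default it loads only the latest version of a model;

It supports automatic discovery and loading of models based on the file system;

Request-handling latency is low;

It is stateless and scales horizontally;

Different model versions can be A/B tested;

It can scan and load TensorFlow models from the local file system;

It can scan and load TensorFlow models from HDFS;

It provides a gRPC interface for clients to call.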

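TensorFlow Serving Configuration

I combed through the entire official TensorFlow Serving documentation and still could not find a complete example of how to write a model config, which was frustrating. The project moves so fast that it is normal for the documentation to lag behind, so the only option left was to read the code.

In the main function under model_servers, the complete set of tensorflow_model_server options and their descriptions is as follows: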

tensorflow_serving/model_servers/main.cc#L314
int main(int argc, char** argv) {
  // Excerpt: the flag variables (port, enable_batching, ...) are declared
  // earlier in main(), and the rest of the function is omitted.
  std::vector<tensorflow::Flag> flag_list = {
      tensorflow::Flag("port", &port, "port to listen on"),
      tensorflow::Flag("enable_batching", &enable_batching, "enable batching"),
      tensorflow::Flag("batching_parameters_file", &batching_parameters_file,
                       "If non-empty, read an ascii BatchingParameters "
                       "protobuf from the supplied file name and use the "
                       "contained values instead of the defaults."),
      tensorflow::Flag("model_config_file", &model_config_file,
                       "If non-empty, read an ascii ModelServerConfig "
                       "protobuf from the supplied file name, and serve the "
                       "models in that file. This config file can be used to "
                       "specify multiple models to serve and other advanced "
                       "parameters including non-default version policy. (If "
                       "used, --model_name, --model_base_path are ignored.)"),
      tensorflow::Flag("model_name", &model_name,
                       "name of model (ignored "
                       "if --model_config_file flag is set"),
      tensorflow::Flag("model_base_path", &model_base_path,
                       "path to export (ignored if --model_config_file flag "
                       "is set, otherwise required)"),
      tensorflow::Flag("file_system_poll_wait_seconds",
                       &file_system_poll_wait_seconds,
                       "interval in seconds between each poll of the file "
                       "system for new model version"),
      tensorflow::Flag("tensorflow_session_parallelism",
                       &tensorflow_session_parallelism,
                       "Number of threads to use for running a "
                       "Tensorflow session. Auto-configured by default. "
                       "Note that this option is ignored if "
                       "--platform_config_file is non-empty."),
      tensorflow::Flag("platform_config_file", &platform_config_file,
                       "If non-empty, read an ascii PlatformConfigMap protobuf "
                       "from the supplied file name, and use that platform "
                       "config instead of the Tensorflow platform. (If used, "
                       "--enable_batching is ignored.)")};
}

So everything related to model version configuration goes into the file passed with --model_config_file. Here is the complete structure of a model config:

tensorflow_serving/config/model_server_config.proto#L55
// Common configuration for loading a model being served.
message ModelConfig {
 // Name of the model.
 string name = 1;
 // Base path to the model, excluding the version directory.
 // E.g. for a model at /foo/bar/my_model/123, where 123 is the version, the
 // base path is /foo/bar/my_model.
 //
 // (This can be changed once a model is in serving, *if* the underlying data
 // remains the same. Otherwise there are no guarantees about whether the old
 // or new data will be used for model versions currently loaded.)
 string base_path = 2;
 // Type of model.
 // TODO(b/31336131): DEPRECATED. Please use "model_platform" instead.
 ModelType model_type = 3 [deprecated = true];
 // Type of model (e.g. "tensorflow").
 //
 // (This cannot be changed once a model is in serving.)
 string model_platform = 4;
 reserved 5;
 // Version policy for the model indicating how many versions of the model to
 // be served at the same time.
 // The default option is to serve only the latest version of the model.
 //
 // (This can be changed once a model is in serving.)
 FileSystemStoragePathSourceConfig.ServableVersionPolicy model_version_policy = 7;
 // Configures logging requests and responses, to the model.
 //
 // (This can be changed once a model is in serving.)
 LoggingConfig logging_config = 6;
}

There it is: model_version_policy is the setting we were looking for. It is defined as follows:

tensorflow_serving/sources/storage_path/file_system_storage_path_source.proto
message ServableVersionPolicy {
  // Serve the latest versions (i.e. the ones with the highest version
  // numbers), among those found on disk.
  //
  // This is the default policy, with the default number of versions as 1.
  message Latest {
    // Number of latest versions to serve. (The default is 1.)
    uint32 num_versions = 1;
  }
  // Serve all versions found on disk.
  message All {}
  // Serve a specific version (or set of versions).
  //
  // This policy is useful for rolling back to a specific version, or for
  // canarying a specific version while still serving a separate stable
  // version.
  message Specific {
    // The version numbers to serve.
    repeated int64 versions = 1;
  }
}

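So model_version_policy currently supports three options:

all: {} loads every model version that is discovered;

latest: {num_versions: n} loads only the newest n versions, and is the default option;

specific: {versions: m} loads only the specified versions, and is typically used for testing;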
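To make the three policies concrete, here is a minimal Python sketch of which version numbers each policy would pick from the version directories found under a base path. The function select_versions and its dict-shaped policy argument are hypothetical names used only for illustration. They mimic the behavior described in the proto comments above and are not TensorFlow Serving's actual code.

```python
def select_versions(found, policy):
    """Return the version numbers a policy would serve, highest first.

    found  -- version numbers discovered under the model base path
    policy -- hypothetical dict mirroring ServableVersionPolicy, e.g.
              {"latest": {"num_versions": 2}}, {"all": {}},
              {"specific": {"versions": [1]}}
    """
    ordered = sorted(set(found), reverse=True)
    if "all" in policy:
        return ordered
    if "specific" in policy:
        wanted = set(policy["specific"]["versions"])
        return [v for v in ordered if v in wanted]
    # Default policy: latest, with num_versions defaulting to 1.
    n = policy.get("latest", {}).get("num_versions", 1)
    return ordered[:n]


if __name__ == "__main__":
    on_disk = [1, 2, 5, 7]
    print(select_versions(on_disk, {}))
    print(select_versions(on_disk, {"latest": {"num_versions": 2}}))
    print(select_versions(on_disk, {"all": {}}))
    print(select_versions(on_disk, {"specific": {"versions": [1, 3]}}))
```

With versions 1, 2, 5 and 7 on disk, an empty policy selects only 7, while specific with versions 1 and 3 selects only 1, because version 3 does not exist on disk.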

So, when starting the server with tensorflow_model_server --port=9000 --model_config_file=<file>, a complete model_config_file can follow this format:

model_config_list: {
  config: {
    name: "mnist",
    base_path: "/tmp/monitored/mnist_model",
    model_platform: "tensorflow",
    model_version_policy: {
      all: {}
    }
  }
  config: {
    name: "inception",
    base_path: "/tmp/monitored/inception_model",
    model_platform: "tensorflow",
    model_version_policy: {
      latest: {
        num_versions: 2
      }
    }
  }
  config: {
    name: "mxnet",
    base_path: "/tmp/monitored/mxnet_model",
    model_platform: "tensorflow",
    model_version_policy: {
      specific: {
        versions: 1
      }
    }
  }
}
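Since the file is plain protobuf text, it can also be generated rather than written by hand. The sketch below is one possible way to do that in Python. The helper render_model_config and its tuple-based input are assumptions made for this example and are not part of TensorFlow Serving. It also assumes every model uses model_platform "tensorflow", as in the config above.

```python
def render_model_config(models):
    """Render a ModelServerConfig in protobuf text format.

    models -- list of (name, base_path, policy_text) tuples, where
              policy_text is the body of model_version_policy,
              for example "all: {}" or "latest: { num_versions: 2 }".
    """
    lines = ["model_config_list: {"]
    for name, base_path, policy_text in models:
        lines += [
            "  config: {",
            f'    name: "{name}",',
            f'    base_path: "{base_path}",',
            '    model_platform: "tensorflow",',
            f"    model_version_policy: {{ {policy_text} }}",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_model_config([
        ("mnist", "/tmp/monitored/mnist_model", "all: {}"),
        ("inception", "/tmp/monitored/inception_model",
         "latest: { num_versions: 2 }"),
        ("mxnet", "/tmp/monitored/mxnet_model", "specific: { versions: 1 }"),
    ])
    print(text)
```

The printed text can be saved to a file and passed to the server with --model_config_file.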

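Building TensorFlow Serving

Building and installing TensorFlow Serving is already explained fairly clearly in the setup document on GitHub. I only want to stress one point here, and it is a very important one, which the document states as follows: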

Optimized build
It's possible to compile using some platform specific instruction sets (e.g. AVX) that can significantly improve performance. Wherever you see 'bazel build' in the documentation, you can add the flags -c opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 (or some subset of these flags). For example:
bazel build -c opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 tensorflow_serving/...
Note: These instruction sets are not available on all machines, especially with older processors, so it may not work with all flags. You can try some subset of them, or revert to just the basic '-c opt' which is guaranteed to work on all machines.

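This matters a lot. At first we built without the corresponding copt options, and testing showed that the resulting tensorflow_model_server performed very poorly (at least, it could not meet our requirements): concurrent client requests to TensorFlow Serving had high latency, with essentially every request taking more than 100ms. After adding these copt options and running the same concurrency test against the same model, 99.987% of requests completed within 50ms. The difference is dramatic.

As for whether to use --copt=-O2 or -O3 and what each means, see the GCC optimization options documentation. I will not discuss it here (because I do not understand it either...).

So should you always build with exactly the copt options the official document gives? No. It depends on the CPU of the server that will run TensorFlow Serving. Check /proc/cpuinfo to see which instruction sets the CPU supports, and choose the copt options accordingly.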
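As a rough illustration of that check, the following Python sketch reads the flags line of /proc/cpuinfo and keeps only the copt options from the quoted command that the CPU reports. The names FLAG_TO_COPT, supported_copts and bazel_command are made up for this example. The mapping assumes an x86 Linux host, where SSE4.1 and SSE4.2 appear as sse4_1 and sse4_2. It should be run on the machine that will serve the model, not merely on the build machine.

```python
# Map /proc/cpuinfo flag names to the copt options quoted above.
# Linux reports SSE4.1 and SSE4.2 as "sse4_1" and "sse4_2".
FLAG_TO_COPT = {
    "sse4_1": "-msse4.1",
    "sse4_2": "-msse4.2",
    "avx": "-mavx",
    "avx2": "-mavx2",
    "fma": "-mfma",
}


def supported_copts(cpuinfo_text):
    """Return the --copt options whose instruction set is in the flags line."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break  # every core normally lists the same flags
    return ["--copt=" + opt for name, opt in FLAG_TO_COPT.items()
            if name in flags]


def bazel_command(cpuinfo_text):
    """Build the bazel command line using only the supported options."""
    copts = supported_copts(cpuinfo_text) + ["--copt=-O3"]
    return "bazel build -c opt " + " ".join(copts) + " tensorflow_serving/..."


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(bazel_command(f.read()))
    except FileNotFoundError:
        print("/proc/cpuinfo not found (not a Linux host)")
```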

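Usage Notes

Because TensorFlow Serving can serve multiple versions of multiple models at the same time, clients should specify the model and version they want in each gRPC call whenever possible. Different versions correspond to different models, so the predictions they return can differ considerably.

When copying a trained model into the model base path, pack it into a tar archive first, copy the archive to the base path, and extract it there. Models are large and copying takes time, so the exported model files may already be in place while the corresponding meta file has not yet arrived. If TensorFlow Serving starts loading the model at that moment and cannot find the meta file, the server fails to load the model and stops trying to load that version again.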
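One way to narrow that window further on a local file system is to extract the archive into a staging directory outside the base path and then rename it into place, so the server never sees a half-written version directory. This goes one step beyond the tar-and-extract advice above and is only a sketch: publish_model is a hypothetical helper, and it assumes the archive holds the contents of a single version directory and that the staging directory is on the same file system as the base path (otherwise the rename is not atomic).

```python
import os
import shutil
import tarfile
import tempfile


def publish_model(tar_path, base_path, version):
    """Extract tar_path beside base_path, then rename it to base_path/version."""
    final_dir = os.path.join(base_path, str(version))
    if os.path.exists(final_dir):
        raise FileExistsError(final_dir)
    os.makedirs(base_path, exist_ok=True)
    # Stage in the parent directory, which the server does not poll.
    parent = os.path.dirname(os.path.abspath(base_path))
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        with tarfile.open(tar_path) as tar:
            tar.extractall(staging)
        # mkdtemp creates the directory with mode 0700; relax it so a
        # server running as another user can read the model.
        os.chmod(staging, 0o755)
        # The version directory appears in one step, fully populated.
        os.rename(staging, final_dir)
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return final_dir
```

For example, publish_model("/tmp/mnist_v3.tar", "/tmp/monitored/mnist_model", 3) would make version 3 visible only after every file has been extracted. The archive path here is invented for the example.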

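If you are using protobuf version 3.2.0, note that TensorFlow Serving can only load models no larger than 64MB. You can check the protobuf version with the command pip list | grep proto. My environment uses 3.5.0.post1 and does not have this problem, so keep an eye on yours. See issue 582 for details.

Officially, changing model_config_list dynamically through the gRPC interface is supported, but in practice you have to develop a custom resource for it, which means it does not work out of the box. Keep an eye on issue 380.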

TensorFlow Serving on Kubernetes

Deploy TensorFlow Serving to Kubernetes as a Deployment. Here is the corresponding Deployment YAML:

# On Kubernetes 1.16 and later, use apiVersion: apps/v1 and add spec.selector.matchLabels.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: tensorflow-serving
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: "tensorflow-serving"
    spec:
      restartPolicy: Always
      imagePullSecrets:
      - name: harborsecret
      containers:
      - name: tensorflow-serving
        image: registry.vivo.xyz:4443/bigdata_release/tensorflow_serving1.3.0:v0.5
        command: ["/bin/sh", "-c", "export CLASSPATH=.:/usr/lib/jvm/java-1.8.0/lib/tools.jar:$(/usr/lib/hadoop-2.6.1/bin/hadoop classpath --glob); /root/tensorflow_model_server --port=8900 --model_name=test_model --model_base_path=hdfs://xx.xx.xx.xx:zz/data/serving_model"]
        ports:
        - containerPort: 8900
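The Deployment above only starts the pods. For gRPC clients inside the cluster to reach port 8900 under a stable name, a Service is usually placed in front of them. The Python sketch below prints such a Service manifest as JSON, which kubectl accepts just like YAML. The Service name and the function grpc_service are assumptions made for this example; the selector reuses the app: tensorflow-serving label from the Deployment.

```python
import json


def grpc_service(name="tensorflow-serving", app_label="tensorflow-serving",
                 port=8900):
    """Build a v1 Service manifest that forwards TCP traffic to the pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # Must match the pod labels set in the Deployment template.
            "selector": {"app": app_label},
            "ports": [{
                "name": "grpc",
                "protocol": "TCP",
                "port": port,        # port exposed by the Service
                "targetPort": port,  # containerPort of tensorflow_model_server
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(grpc_service(), indent=2))
```

The output can be piped to kubectl apply -f - or saved to a file first.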
