{"id":582,"date":"2020-04-01T13:58:26","date_gmt":"2020-04-01T05:58:26","guid":{"rendered":"https:\/\/www.myway5.com\/?p=582"},"modified":"2023-07-14T13:14:21","modified_gmt":"2023-07-14T05:14:21","slug":"gpu-manager-startup","status":"publish","type":"post","link":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/","title":{"rendered":"[Gaia Scheduler] gpu-manager startup flow analysis"},"content":{"rendered":"<h2>Overview<\/h2>\n<p>Gaia Scheduler is Tencent's open-source solution for GPU virtualization in Kubernetes clusters. It allocates virtualized GPU resources to containers and enforces limits on them. Its biggest advantages are that it needs no special hardware support and that its performance overhead is small. The paper describing it is here: <a class=\"wp-editor-md-post-content-link\" href=\"https:\/\/ieeexplore.ieee.org\/document\/8672301\">Gaia Scheduler: A Kubernetes-Based Scheduler Framework<\/a>. If you want to understand the project, reading the paper first is strongly recommended.<\/p>\n<p>Gaia Scheduler can be divided into 4 components:<\/p>\n<ul>\n<li>\n<p>GPU Manager: registers with kubelet as a device plugin. It registers two devices, vcore and vmemory, which are exposed as two resources, <code>tencent.com\/vcuda-core<\/code> and <code>tencent.com\/vcuda-memory<\/code>, used to request and limit GPU compute and GPU memory respectively.<\/p>\n<\/li>\n<li>\n<p>GPU Scheduler: the scheduler here is not the Kubernetes scheduler. When GPU Manager receives an Allocate call from kubelet, it has to mount devices into the container. To pick the best GPUs to mount, a dedicated scheduler makes the decision based on the node's current GPU topology and resource usage.<\/p>\n<\/li>\n<li>\n<p>vGPU Manager: the component that actually manages containers. It monitors container state, passes configuration, communicates with the vGPU Library inside the container, and reclaims resources after a container dies.<\/p>\n<\/li>\n<li>\n<p>vGPU Library: although it does not involve much code, it is the most important part of Gaia Scheduler, because it is the core of the GPU virtualization. It intercepts CUDA API calls by overriding LD_LIBRARY_PATH in the container and by providing a custom <code>libcuda-control.so<\/code>.<\/p>\n<\/li>\n<\/ul>\n<p>Gaia Scheduler mainly consists of three projects: <a class=\"wp-editor-md-post-content-link\" href=\"https:\/\/github.com\/tkestack\/gpu-manager\">gpu-manager<\/a>, <a class=\"wp-editor-md-post-content-link\" href=\"https:\/\/github.com\/tkestack\/vcuda-controller\">vcuda-controller<\/a> and <a class=\"wp-editor-md-post-content-link\" href=\"https:\/\/github.com\/tkestack\/gpu-admission\">gpu-admission<\/a>. To avoid confusion: gpu-manager is the main implementation of Gaia Scheduler and contains the 4 components above. vcuda-controller is the vGPU Library and is already packaged into the gpu-manager project. gpu-manager works together with the gpu-admission project to do the GPU Scheduler's job. The rest of this article focuses on the gpu-manager project.<\/p>\n<h2>Startup flow analysis<\/h2>\n<p>gpu-manager is implemented mainly as a Kubernetes device plugin. It defines two kinds of devices: <code>vcuda-core<\/code> and <code>vcuda-memory<\/code>. Applications request them through the resource fields of a pod, and kube-scheduler then schedules the pod according to the resources available on each node. You should therefore also be familiar with how Kubernetes device plugins are developed. An earlier article covers that topic: <a class=\"wp-editor-md-post-content-link\" href=\"https:\/\/www.myway5.com\/index.php\/2020\/03\/24\/kubernetes-device-plugin\/\">Kubernetes development notes: implementing a device plugin<\/a>.<\/p>\n<h3>Startup parameters<\/h3>\n<p>Starting from a project's startup parameters is a quick way to get to know it:<\/p>\n<ul>\n<li>driver: the GPU driver. The current default is nvidia, so the project is clearly designed to be extended to other kinds of GPU.<\/li>\n<li>extra-config: extra configuration. It is not yet clear what is special about this parameter.<\/li>\n<li>volume-config: the volume here refers to the locations of certain dynamic libraries and executables, that is, the libraries whose calls gpu-manager needs to intercept.<\/li>\n<li>docker-endpoint: mounted into the container and used to communicate with docker. The default location is <code>unix:\/\/\/\/var\/run\/docker.sock<\/code><\/li>\n<li>query-port: the query port of the statistics service<\/li>\n<li>query-addr: the listen address of the statistics service<\/li>\n<li>kubeconfig: the configuration file used for authorization<\/li>\n<li>standalone: a parameter whose purpose is not yet clear<\/li>\n<li>sample-period: gpu-manager queries the usage of the gpu devices, and this parameter sets the sampling period<\/li>\n<li>node-labels: labels to apply to the node automatically<\/li>\n<li>hostname-override: at runtime gpu-manager only cares about the pods on its own node, which it identifies mainly by hostname<\/li>\n<li>virtual-manager-path: gpu-manager creates a unique directory for every pod that needs virtual gpu resources, and those directories live under this path.<\/li>\n<li>device-plugin-path: the default Kubernetes device plugin directory<\/li>\n<li>checkpoint-path: gpu-manager writes checkpoints that it uses as a cache<\/li>\n<li>share-mode: the defining feature of gpu-manager is splitting one physical gpu into several virtual gpus, which is what share mode means<\/li>\n<li>allocation-check-period: how often to check the state of pods that were allocated virtual gpu resources, so that resources are reclaimed promptly<\/li>\n<li>incluster-mode: whether gpu-manager runs inside the cluster<\/li>\n<\/ul>\n<h3>Service startup<\/h3>\n<p>The recommended way to deploy gpu-manager is as a Kubernetes DaemonSet with a node selector that schedules it onto the designated nodes. gpu-manager then starts on each of those nodes.<\/p>\n<pre><code class=\"language-go line-numbers\">srv := server.NewManager(cfg)\ngo srv.Run()\n<\/code><\/pre>\n<p>Let us look at how this srv is implemented, starting with its struct:<\/p>\n<pre><code class=\"language-go line-numbers\">type managerImpl struct {\n    config *config.Config\n\n    allocator      allocFactory.GPUTopoService     \/\/ allocates gpus to containers\n    displayer      *display.Display                \/\/ reports gpu usage\n    virtualManager *vitrual_manager.VirtualManager \/\/ manages vgpus\n\n    bundleServer map[string]ResourceServer\n    srv          *grpc.Server\n}\n<\/code><\/pre>\n<p>config holds all of the parameters listed above, so we will not look at it in detail.<\/p>\n<p>allocator assigns concrete device resources to a container once it has been scheduled onto the node. It probes the gpu topology of the node and then allocates resources with the scheme that gives the best performance and the least fragmentation.<\/p>\n<p>displayer outputs gpu usage so that it is easy to inspect.<\/p>\n<p>virtualManager takes care of vgpus after they have been allocated.<\/p>\n<p>bundleServer contains vcore and vmemory. As mentioned above, these two resources are registered as device plugins, so each of them needs to start a grpc server.<\/p>\n<p>srv: the grpc server that the gpu display server is registered on.<\/p>\n<p>Next we can look at what the <code>srv.Run()<\/code> method actually does. To give a rough picture of the whole flow first, here is a summary of its steps:<\/p>\n<ul>\n<li>Start volumeManager, which mirrors (hard-links or copies) all executables and libraries related to the nvidia gpu (including cuda) on the node into \/etc\/gpu-manager\/vdriver, and replaces the key libraries with vcuda-control to intercept cuda calls.<\/li>\n<li>watchdog creates a pod cache and watches pods. All later pod-related operations are served from here.<\/li>\n<li>watchdog labels the node<\/li>\n<li>Start virtualManager<\/li>\n<li>Discover the gpu topology.<\/li>\n<li>Initialize the resource allocator<\/li>\n<li>Set up the grpc services for vcuda, vmemory and display<\/li>\n<li>Start the metrics http service, mainly for prometheus<\/li>\n<li>Start the vcuda and vmemory grpc services<\/li>\n<li>Start the display grpc service<\/li>\n<\/ul>\n<p>Next we go through how each step is done. Only the important parts are covered here.<\/p>\n<h4>Starting volumeManager<\/h4>\n<pre><code class=\"language-go line-numbers\">func (vm *VolumeManager) Run() (err error) {\n    \/\/ ldcache is the dynamic linker's cache of shared libraries\n    cache, err := ldcache.Open()\n    if err != nil {\n        return err\n    }\n    defer func() {\n        if e := cache.Close(); err == nil {\n            err = e\n        }\n    }()\n    vols := make(VolumeMap)\n    for _, cfg := range vm.Config {\n        vol := &amp;Volume{\n            Path: path.Join(cfg.BasePath, cfg.Name),\n        }\n\n        if cfg.Name == \"nvidia\" {\n            \/\/ location of the nvidia libraries\n            types.DriverLibraryPath = filepath.Join(cfg.BasePath, cfg.Name)\n        } else {\n            \/\/ location of the origin libraries\n            types.DriverOriginLibraryPath = filepath.Join(cfg.BasePath, cfg.Name)\n        }\n\n        for t, c := range cfg.Components {\n            switch t {\n            case \"binaries\":\n                \/\/ use which to locate the executables\n                bins, err := which(c...)\n                if err != nil {\n                    return err\n                }\n                \/\/ record where they actually live\n                vol.dirs = append(vol.dirs, volumeDir{binDir, bins})\n            case \"libraries\":\n                \/\/ libraries are looked up in ldcache\n                libs32, libs64 := cache.Lookup(c...)\n                \/\/ record the library locations\n                vol.dirs = append(vol.dirs, volumeDir{lib32Dir, libs32}, volumeDir{lib64Dir, libs64})\n            }\n            vols[cfg.Name] = vol\n        }\n    }\n    \/\/ once the needed files have been located, mirror them\n    if err := vm.mirror(vols); err != nil {\n        return err\n    }\n    return nil\n}\n<\/code><\/pre>\n<p>The first half of this code looks up the specified dynamic libraries and executables. These files are listed in the volume.conf configuration file, which is passed in as a parameter. Dynamic libraries are looked up through ldcache, and executables are located with <code>which<\/code>. The location of each file found is recorded. The next step is to run <code>mirror<\/code> on the files that were found.<\/p>\n<pre><code class=\"language-go line-numbers\">func (vm *VolumeManager) mirror(vols VolumeMap) error {\n    \/\/ nvidia and origin\n    for driver, vol := range vols {\n        if exist, _ := vol.exist(); !exist {\n            \/\/ the path here is under \/etc\/gpu-manager\/vdriver\n            if err := os.MkdirAll(vol.Path, 0755); err != nil {\n                return err\n            }\n        }\n        for _, d := range vol.dirs {\n            vpath := path.Join(vol.Path, d.name)\n            \/\/ create bin, lib and lib64\n            if err := os.MkdirAll(vpath, 0755); err != nil {\n                return err\n            }\n\n            \/\/ For each file matching the volume components (blacklist excluded), create a hardlink\/copy\n            \/\/ of it inside the volume directory. We also need to create soname symlinks similar to what\n            \/\/ ldconfig does since our volume will only show up at runtime.\n            for _, f := range d.files {\n                glog.V(2).Infof(\"Mirror %s to %s\", f, vpath)\n                if err := vm.mirrorFiles(driver, vpath, f); err != nil {\n                    return err\n                }\n\n                if strings.HasPrefix(path.Base(f), \"libcuda.so\") {\n                    driverStr := strings.SplitN(strings.TrimPrefix(path.Base(f), \"libcuda.so.\"), \".\", 2)\n                    types.DriverVersionMajor, _ = strconv.Atoi(driverStr[0]) \/\/ driver version number\n                    types.DriverVersionMinor, _ = strconv.Atoi(driverStr[1])\n                    glog.V(2).Infof(\"Driver version: %d.%d\", types.DriverVersionMajor, types.DriverVersionMinor)\n                }\n\n                if strings.HasPrefix(path.Base(f), \"libcuda-control.so\") {\n                    vm.cudaControlFile = f\n                }\n            }\n        }\n    }\n\n    vCudaFileFn := func(soFile string) error {\n        if err := os.Remove(soFile); err != nil {\n            if !os.IsNotExist(err) {\n                return err\n            }\n        }\n        if err := clone(vm.cudaControlFile, soFile); err != nil {\n            return err\n        }\n\n        glog.V(2).Infof(\"Vcuda %s to %s\", vm.cudaControlFile, soFile)\n\n        l := strings.TrimRight(soFile, \".0123456789\")\n        if err := os.Remove(l); err != nil {\n            if !os.IsNotExist(err) {\n                return err\n            }\n        }\n        if err := clone(vm.cudaControlFile, l); err != nil {\n            return err\n        }\n        glog.V(2).Infof(\"Vcuda %s to %s\", vm.cudaControlFile, l)\n        return nil\n    }\n\n    if vm.share &amp;&amp; len(vm.cudaControlFile) &gt; 0 {\n        if len(vm.cudaSoname) &gt; 0 {\n            for _, f := range vm.cudaSoname {\n                if err := vCudaFileFn(f); err != nil {\n                    return err\n                }\n            }\n        }\n\n        if len(vm.mlSoName) &gt; 0 {\n            for _, f := range vm.mlSoName {\n                if err := vCudaFileFn(f); err != nil {\n                    return err\n                }\n            }\n        }\n    }\n\n    return nil\n}\n<\/code><\/pre>\n<p>This code first calls <code>mirrorFiles<\/code> on every library and executable found above, and along the way records the version number of <code>libcuda.so<\/code> and the location of <code>libcuda-control.so<\/code>. Note that <code>libcuda-control<\/code> is the library built by the <code>vcuda-controller<\/code> project to intercept <code>cuda<\/code> calls.<\/p>\n<p>It then clones <code>cudaControlFile<\/code> over every library path recorded in <code>cudaSoname<\/code> and <code>mlSoName<\/code>. The clone method first tries a hard link and falls back to copying the file if that fails. <code>cudaControlFile<\/code> here is the <code>libcuda-control.so<\/code> mentioned above, and <code>cudaSoname<\/code> and <code>mlSoName<\/code> contain all the libraries whose calls need to be intercepted. This is how every <code>cuda<\/code> call ends up intercepted. All that is left is to look at the <code>mirrorFiles<\/code> method.<\/p>\n<pre><code class=\"language-go line-numbers\">\/\/ driver is \"nvidia\" or \"origin\" from the configuration file\n\/\/ vpath is the mirror destination, under \/etc\/gpu-manager\/vdriver\nfunc (vm *VolumeManager) mirrorFiles(driver, vpath string, file string) error {\n    \/\/ In computing, the Executable and Linkable Format (ELF, formerly named Extensible Linking Format), is a common standard file format for executable files, object code, shared libraries, and core dumps\n    obj, err := elf.Open(file)\n    if err != nil {\n        return err\n    }\n    defer obj.Close()\n\n    \/\/ blacklist mechanism; its exact purpose is not yet clear, it is related to the nvidia driver\n    ok, err := blacklisted(file, obj)\n    if err != nil {\n        return err\n    }\n    if ok {\n        return nil\n    }\n    l := path.Join(vpath, path.Base(file))\n    \/\/ whether or not it exists, first try to remove the copy inside gpu-manager\n    if err := removeFile(l); err != nil {\n        return err\n    }\n    \/\/ clone prefers a hard link and otherwise copies the file to the target location\n    if err := clone(file, l); err != nil {\n        return err\n    }\n    \/\/ read the soname of this library from the elf\n    soname, err := obj.DynString(elf.DT_SONAME)\n    if err != nil {\n        return err\n    }\n    if len(soname) &gt; 0 {\n        \/\/ build a path from the soname\n        l = path.Join(vpath, soname[0])\n        \/\/ if the file name differs from its soname, link the soname to the file (presumably as a symlink)\n        if err := linkIfNotSameName(path.Base(file), l); err != nil &amp;&amp; !os.IsExist(err) {\n            return err\n        }\n\n        \/\/ XXX Many applications (wrongly) assume that libcuda.so exists (e.g. with dlopen)\n        \/\/ Hardcode the libcuda symlink for the time being.\n        if strings.Contains(driver, \"nvidia\") {\n            \/\/ Why remove the libcuda.so and libnvidia-ml.so symlinks here?\n            \/\/ gpu calls go through these two libraries, and the symlinks point at the real libraries. Once removed, they are replaced with the intercepting library\n            \/\/ Remove libcuda symbol link\n            if vm.share &amp;&amp; driver == \"nvidia\" &amp;&amp; strings.HasPrefix(soname[0], \"libcuda.so\") {\n                os.Remove(l)\n                vm.cudaSoname[l] = l\n            }\n\n            \/\/ Remove libnvidia-ml symbol link\n            if vm.share &amp;&amp; driver == \"nvidia\" &amp;&amp; strings.HasPrefix(soname[0], \"libnvidia-ml.so\") {\n                os.Remove(l)\n                vm.mlSoName[l] = l\n            }\n\n            \/\/ XXX GLVND requires this symlink for indirect GLX support\n            \/\/ It won't be needed once we have an indirect GLX vendor neutral library.\n            if strings.HasPrefix(soname[0], \"libGLX_nvidia\") {\n                l = strings.Replace(l, \"GLX_nvidia\", \"GLX_indirect\", 1)\n                if err := linkIfNotSameName(path.Base(file), l); err != nil &amp;&amp; !os.IsExist(err) {\n                    return err\n                }\n            }\n        }\n    }\n\n    return nil\n}\n<\/code><\/pre>\n<p>In this code, <code>blacklisted<\/code> first filters out libraries that need no handling, and then the library or executable is cloned into <code>\/etc\/gpu-manager\/vdriver<\/code>. <code>\/etc\/gpu-manager\/vdriver<\/code> contains two folders: <code>nvidia<\/code>, which holds the libraries we have intercepted, and <code>origin<\/code>, which holds the original, untouched libraries. In addition, the libcuda.so and libnvidia-ml.so links are removed so that the real libraries can no longer be reached, and these files are later replaced with our intercepting library.<\/p>\n<p>That concludes the analysis of volumeManager.<\/p>\n<h4>gpu topology discovery<\/h4>\n<p>The gpu topology is used later to choose the best scheme when allocating resources. Tencent has also shared material on this topic (<a class=\"wp-editor-md-post-content-link\" href=\"http:\/\/dl.zhangluya.com\/Qcon\/qconbj2019\/%E8%85%BE%E8%AE%AF%E5%9F%BA%E4%BA%8E%20Kubernetes%20%E7%9A%84%E4%BC%81%E4%B8%9A%E7%BA%A7%E5%AE%B9%E5%99%A8%E4%BA%91%E5%AE%9E%E8%B7%B5-%E7%BD%97%E9%9F%A9%E6%A2%85.pdf\">Tencent's enterprise container cloud practice based on Kubernetes<\/a>):<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/04\/Screenshot-from-2020-04-01-13-18-35.png\" alt=\"gpu topology\" \/><\/p>\n<p>This part is not needed to understand the overall mechanism, so we skip it for now.<\/p>\n<h4>Initializing the resource allocator<\/h4>\n<pre><code class=\"language-go line-numbers\">\/\/ pick the allocator constructor that matches the driver\ninitAllocator := allocFactory.NewFuncForName(m.config.Driver)\nif initAllocator == nil {\n    return fmt.Errorf(\"can not find allocator for %s\", m.config.Driver)\n}\n\nm.allocator = initAllocator(m.config, tree, client)\n<\/code><\/pre>\n<p>The method behind initAllocator here is:<\/p>\n<pre><code class=\"language-go line-numbers\">\/\/NewNvidiaTopoAllocator returns a new NvidiaTopoAllocator\nfunc NewNvidiaTopoAllocator(config *config.Config, tree device.GPUTree, k8sClient kubernetes.Interface) allocator.GPUTopoService {\n    runtimeRequestTimeout := metav1.Duration{Duration: 2 * time.Minute}\n    imagePullProgressDeadline := metav1.Duration{Duration: 1 * time.Minute}\n    dockerClientConfig := &amp;dockershim.ClientConfig{\n        DockerEndpoint:            config.DockerEndpoint,\n        RuntimeRequestTimeout:     runtimeRequestTimeout.Duration,\n        ImagePullProgressDeadline: imagePullProgressDeadline.Duration,\n    }\n\n    _tree, _ := tree.(*nvtree.NvidiaTree)\n    cm, err := checkpoint.NewManager(config.CheckpointPath, checkpointFileName)\n    if err != nil {\n        glog.Fatalf(\"Failed to create checkpoint manager due to %s\", err.Error())\n    }\n    alloc := &amp;NvidiaTopoAllocator{\n        tree:              _tree,\n        config:            config,\n        evaluators:        make(map[string]Evaluator),\n        dockerClient:      dockershim.NewDockerClientFromConfig(dockerClientConfig),\n        allocatedPod:      cache.NewAllocateCache(),\n        k8sClient:         k8sClient,\n        queue:             workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter()),\n        stopChan:          make(chan struct{}),\n        checkpointManager: cm,\n    }\n\n    \/\/ Load kernel module if it's not loaded\n    alloc.loadModule()\n\n    \/\/ Initialize evaluator\n    alloc.initEvaluator(_tree)\n\n    \/\/ Read extra config if it's given\n    alloc.loadExtraConfig(config.ExtraConfigPath)\n\n    \/\/ Process allocation results in another goroutine\n    go wait.Until(alloc.runProcessResult, time.Second, alloc.stopChan)\n\n    \/\/ Recover\n    alloc.recoverInUsed()\n\n    \/\/ Check allocation in another goroutine periodically\n    go alloc.checkAllocationPeriodically(alloc.stopChan)\n\n    return alloc\n}\n<\/code><\/pre>\n<p>The allocator calls <code>loadModule()<\/code> to load the nvidia kernel module.<\/p>\n<p>It calls <code>initEvaluator(_tree)<\/code> to initialize the evaluators, where <code>_tree<\/code> is the gpu topology that was discovered.<\/p>\n<p>It calls <code>loadExtraConfig(config.ExtraConfigPath)<\/code> to load the extra configuration file passed in at startup.<\/p>\n<p><code>go wait.Until(alloc.runProcessResult, time.Second, alloc.stopChan)<\/code> starts a new goroutine that processes allocation results.<\/p>\n<p><code>recoverInUsed()<\/code> restores the gpu allocation state. For example, after gpu-manager restarts, the previous allocation results are lost, yet many containers on the node may still be using gpus. This method finds the containers that are still alive on the node, calls <code>InspectContainer<\/code> through the docker endpoint to get the device ids each container holds, and then marks those devices as in use by that container.<\/p>\n<p><code>go alloc.checkAllocationPeriodically(alloc.stopChan)<\/code> starts a new goroutine that periodically checks resource allocation. For pods in the Failed or Pending state, it uses the error message to decide whether they should be deleted. If such a pod is owned by a controller such as a deployment, it tries to delete the pod so that the controller recreates and reschedules it, which brings the pod back to a normal running state.<\/p>\n<h4>Starting the services<\/h4>\n<p>The vcuda and vmemory grpc services are part of the device plugin mechanism. The metrics service is there for prometheus to scrape in order to monitor the node. The display service prints information about the gpu topology.<\/p>\n<h3>Device plugin registration<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/03\/device-plugins.svg\" alt=\"Device plugin\" \/><\/p>\n<p>This is the sequence diagram of device plugin registration. gpu-manager registers itself as follows:<\/p>\n<pre><code class=\"language-go line-numbers\">func (m *managerImpl) RegisterToKubelet() error {\n    socketFile := filepath.Join(m.config.DevicePluginPath, types.KubeletSocket)\n    dialOptions := []grpc.DialOption{grpc.WithInsecure(), grpc.WithDialer(utils.UnixDial), grpc.WithBlock(), grpc.WithTimeout(time.Second * 5)}\n\n    conn, err := grpc.Dial(socketFile, dialOptions...)\n    if err != nil {\n        return err\n    }\n    defer conn.Close()\n\n    client := pluginapi.NewRegistrationClient(conn)\n\n    for _, srv := range m.bundleServer {\n        req := &amp;pluginapi.RegisterRequest{\n            Version:      pluginapi.Version,\n            Endpoint:     path.Base(srv.SocketName()),\n            ResourceName: srv.ResourceName(),\n            Options:      &amp;pluginapi.DevicePluginOptions{PreStartRequired: true},\n        }\n\n        glog.V(2).Infof(\"Register to kubelet with endpoint %s\", req.Endpoint)\n        _, err = client.Register(context.Background(), req)\n        if err != nil {\n            return err\n        }\n    }\n\n    return nil\n}\n<\/code><\/pre>\n<p>vcuda and vmemory are each registered here. The Allocate methods of both point to the same method, which lives in <code>service\/allocator\/nvidia\/allocator.go<\/code>.<\/p>\n<p>That concludes the startup flow of gpu-manager. From here on, gpu-manager's job is to wait for grpc calls from kubelet and, when a container is scheduled onto the node, allocate devices, mount the necessary directories, and so on. The details are covered in the next article.<\/p>\n<p>Finally, here is a simple mind map to help with understanding:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/04\/Screenshot-from-2020-04-01-13-57-48.png\" alt=\"gpu-manager-arch\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u6982\u8ff0 Gaia scheduler \u662f\u817e\u8baf\u5f00\u6e90\u7684\u5728 Kubernetes \u96c6\u7fa4\u4e2d\u505a GPU \u865a\u62df\u5316\u7684\u65b9\u6848\uff0c\u5b9e\u73b0 &hellip; <a href=\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/\" class=\"more-link\">\u7ee7\u7eed\u9605\u8bfb<span class=\"screen-reader-text\">[Gaia Scheduler] gpu-manager 
\u542f\u52a8\u6d41\u7a0b\u5206\u6790<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[89],"tags":[115,116,114,104],"class_list":["post-582","post","type-post","status-publish","format-standard","hentry","category-k8s","tag-gaia-scheduler","tag-gpu-virtual","tag-gpu-manager","tag-k8s"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>[Gaia Scheduler] gpu-manager \u542f\u52a8\u6d41\u7a0b\u5206\u6790 - \u4e00\u53ea\u5b89\u9759\u7684\u732b<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"[Gaia Scheduler] gpu-manager \u542f\u52a8\u6d41\u7a0b\u5206\u6790 - \u4e00\u53ea\u5b89\u9759\u7684\u732b\" \/>\n<meta property=\"og:description\" content=\"\u6982\u8ff0 Gaia scheduler \u662f\u817e\u8baf\u5f00\u6e90\u7684\u5728 Kubernetes \u96c6\u7fa4\u4e2d\u505a GPU \u865a\u62df\u5316\u7684\u65b9\u6848\uff0c\u5b9e\u73b0 &hellip; \u7ee7\u7eed\u9605\u8bfb[Gaia Scheduler] gpu-manager \u542f\u52a8\u6d41\u7a0b\u5206\u6790\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/\" \/>\n<meta property=\"og:site_name\" content=\"\u4e00\u53ea\u5b89\u9759\u7684\u732b\" \/>\n<meta property=\"article:published_time\" content=\"2020-04-01T05:58:26+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-07-14T05:14:21+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/04\/Screenshot-from-2020-04-01-13-18-35.png\" 
\/>\n<meta name=\"author\" content=\"jiangpengfei\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u4f5c\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"jiangpengfei\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 \u5206\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/\"},\"author\":{\"name\":\"jiangpengfei\",\"@id\":\"https:\/\/www.myway5.com\/#\/schema\/person\/b19267e8b106343431e163ec96950685\"},\"headline\":\"[Gaia Scheduler] gpu-manager \u542f\u52a8\u6d41\u7a0b\u5206\u6790\",\"datePublished\":\"2020-04-01T05:58:26+00:00\",\"dateModified\":\"2023-07-14T05:14:21+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/\"},\"wordCount\":238,\"commentCount\":4,\"publisher\":{\"@id\":\"https:\/\/www.myway5.com\/#\/schema\/person\/b19267e8b106343431e163ec96950685\"},\"image\":{\"@id\":\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/04\/Screenshot-from-2020-04-01-13-18-35.png\",\"keywords\":[\"gaia scheduler\",\"gpu 
\u865a\u62df\u5316\",\"gpu-manager\",\"k8s\"],\"articleSection\":[\"k8s\"],\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/\",\"url\":\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/\",\"name\":\"[Gaia Scheduler] gpu-manager \u542f\u52a8\u6d41\u7a0b\u5206\u6790 - \u4e00\u53ea\u5b89\u9759\u7684\u732b\",\"isPartOf\":{\"@id\":\"https:\/\/www.myway5.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/04\/Screenshot-from-2020-04-01-13-18-35.png\",\"datePublished\":\"2020-04-01T05:58:26+00:00\",\"dateModified\":\"2023-07-14T05:14:21+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#primaryimage\",\"url\":\"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/04\/Screenshot-from-2020-04-01-13-18-35.png\",\"contentUrl\":\"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/04\/Screenshot-from-2020-04-01-13-18-35.png\",\"width\":1888,\"height\":926},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u9996\u9875\",\"item\
":\"https:\/\/www.myway5.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"[Gaia Scheduler] gpu-manager \u542f\u52a8\u6d41\u7a0b\u5206\u6790\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.myway5.com\/#website\",\"url\":\"https:\/\/www.myway5.com\/\",\"name\":\"\u4e00\u53ea\u5b89\u9759\u7684\u732b\",\"description\":\"\u60f3\u5565\u5462\",\"publisher\":{\"@id\":\"https:\/\/www.myway5.com\/#\/schema\/person\/b19267e8b106343431e163ec96950685\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.myway5.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"zh-Hans\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/www.myway5.com\/#\/schema\/person\/b19267e8b106343431e163ec96950685\",\"name\":\"jiangpengfei\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/www.myway5.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f8c7de757f6e0247412bcfd31b7c2271?s=96&d=monsterid&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f8c7de757f6e0247412bcfd31b7c2271?s=96&d=monsterid&r=g\",\"caption\":\"jiangpengfei\"},\"logo\":{\"@id\":\"https:\/\/www.myway5.com\/#\/schema\/person\/image\/\"},\"url\":\"https:\/\/www.myway5.com\/index.php\/author\/joyme\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"[Gaia Scheduler] gpu-manager \u542f\u52a8\u6d41\u7a0b\u5206\u6790 - \u4e00\u53ea\u5b89\u9759\u7684\u732b","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/","og_locale":"zh_CN","og_type":"article","og_title":"[Gaia Scheduler] gpu-manager \u542f\u52a8\u6d41\u7a0b\u5206\u6790 - \u4e00\u53ea\u5b89\u9759\u7684\u732b","og_description":"\u6982\u8ff0 Gaia scheduler \u662f\u817e\u8baf\u5f00\u6e90\u7684\u5728 Kubernetes \u96c6\u7fa4\u4e2d\u505a GPU \u865a\u62df\u5316\u7684\u65b9\u6848\uff0c\u5b9e\u73b0 &hellip; \u7ee7\u7eed\u9605\u8bfb[Gaia Scheduler] gpu-manager \u542f\u52a8\u6d41\u7a0b\u5206\u6790","og_url":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/","og_site_name":"\u4e00\u53ea\u5b89\u9759\u7684\u732b","article_published_time":"2020-04-01T05:58:26+00:00","article_modified_time":"2023-07-14T05:14:21+00:00","og_image":[{"url":"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/04\/Screenshot-from-2020-04-01-13-18-35.png","type":"","width":"","height":""}],"author":"jiangpengfei","twitter_card":"summary_large_image","twitter_misc":{"\u4f5c\u8005":"jiangpengfei","\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4":"8 \u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#article","isPartOf":{"@id":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/"},"author":{"name":"jiangpengfei","@id":"https:\/\/www.myway5.com\/#\/schema\/person\/b19267e8b106343431e163ec96950685"},"headline":"[Gaia Scheduler] gpu-manager 
\u542f\u52a8\u6d41\u7a0b\u5206\u6790","datePublished":"2020-04-01T05:58:26+00:00","dateModified":"2023-07-14T05:14:21+00:00","mainEntityOfPage":{"@id":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/"},"wordCount":238,"commentCount":4,"publisher":{"@id":"https:\/\/www.myway5.com\/#\/schema\/person\/b19267e8b106343431e163ec96950685"},"image":{"@id":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#primaryimage"},"thumbnailUrl":"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/04\/Screenshot-from-2020-04-01-13-18-35.png","keywords":["gaia scheduler","gpu \u865a\u62df\u5316","gpu-manager","k8s"],"articleSection":["k8s"],"inLanguage":"zh-Hans","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/","url":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/","name":"[Gaia Scheduler] gpu-manager \u542f\u52a8\u6d41\u7a0b\u5206\u6790 - 
\u4e00\u53ea\u5b89\u9759\u7684\u732b","isPartOf":{"@id":"https:\/\/www.myway5.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#primaryimage"},"image":{"@id":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#primaryimage"},"thumbnailUrl":"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/04\/Screenshot-from-2020-04-01-13-18-35.png","datePublished":"2020-04-01T05:58:26+00:00","dateModified":"2023-07-14T05:14:21+00:00","breadcrumb":{"@id":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/"]}]},{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#primaryimage","url":"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/04\/Screenshot-from-2020-04-01-13-18-35.png","contentUrl":"https:\/\/www.myway5.com\/wp-content\/uploads\/2020\/04\/Screenshot-from-2020-04-01-13-18-35.png","width":1888,"height":926},{"@type":"BreadcrumbList","@id":"https:\/\/www.myway5.com\/index.php\/2020\/04\/01\/gpu-manager-startup\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u9996\u9875","item":"https:\/\/www.myway5.com\/"},{"@type":"ListItem","position":2,"name":"[Gaia Scheduler] gpu-manager 
\u542f\u52a8\u6d41\u7a0b\u5206\u6790"}]},{"@type":"WebSite","@id":"https:\/\/www.myway5.com\/#website","url":"https:\/\/www.myway5.com\/","name":"\u4e00\u53ea\u5b89\u9759\u7684\u732b","description":"\u60f3\u5565\u5462","publisher":{"@id":"https:\/\/www.myway5.com\/#\/schema\/person\/b19267e8b106343431e163ec96950685"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.myway5.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"zh-Hans"},{"@type":["Person","Organization"],"@id":"https:\/\/www.myway5.com\/#\/schema\/person\/b19267e8b106343431e163ec96950685","name":"jiangpengfei","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/www.myway5.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f8c7de757f6e0247412bcfd31b7c2271?s=96&d=monsterid&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f8c7de757f6e0247412bcfd31b7c2271?s=96&d=monsterid&r=g","caption":"jiangpengfei"},"logo":{"@id":"https:\/\/www.myway5.com\/#\/schema\/person\/image\/"},"url":"https:\/\/www.myway5.com\/index.php\/author\/joyme\/"}]}},"views":11663,"_links":{"self":[{"href":"https:\/\/www.myway5.com\/index.php\/wp-json\/wp\/v2\/posts\/582","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.myway5.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.myway5.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.myway5.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.myway5.com\/index.php\/wp-json\/wp\/v2\/comments?post=582"}],"version-history":[{"count":4,"href":"https:\/\/www.myway5.com\/index.php\/wp-json\/wp\/v2\/posts\/582\/revisions"}],"predecessor-version":[{"id":1619,"href":"https:\/\/www.myway5.com\/index.php\/wp-json\/wp\/v2\/posts\/582\/revisions\/1619"}],"wp:a
ttachment":[{"href":"https:\/\/www.myway5.com\/index.php\/wp-json\/wp\/v2\/media?parent=582"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.myway5.com\/index.php\/wp-json\/wp\/v2\/categories?post=582"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.myway5.com\/index.php\/wp-json\/wp\/v2\/tags?post=582"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}