Which k8s component actually creates a container's namespaces

intro

Anyone who has worked with docker has heard that docker is built on three basic features: namespaces, cgroups and unionfs.

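Namespaces are an essential part of the Linux kernel's support for containers. In other words, creating a container must at some point invoke a system call (syscall) that creates new namespaces.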

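Because of the many standards and components introduced by docker, k8s and others, this most basic operation can end up buried deep in a huge amount of code. Still, at the lowest level some specific module has to actually perform this seemingly trivial but very important task.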

So which component directly creates the namespaces a container needs?

OCI Config

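The Open Container Initiative runtime specification (runtime-spec) has a section dedicated to Linux namespaces that defines what each configuration means; conversely, to get a particular behavior you must write the configuration in the corresponding form.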

In summary:

  1. type not listed

The container inherits the caller's (the runtime's) namespace of that type (If a namespace type is not specified in the namespaces array, the container MUST inherit the runtime namespace of that type.).

  2. path empty

A new namespace is created (If path is not specified, the runtime MUST create a new container namespace of type type.).

  3. path non-empty

The new process joins the namespace specified by path (The runtime MUST place the container process in the namespace associated with that path.).

path (string, OPTIONAL) - namespace file. This value MUST be an absolute path in the runtime mount namespace. The runtime MUST place the container process in the namespace associated with that path. The runtime MUST generate an error if path is not associated with a namespace of type type.

If path is not specified, the runtime MUST create a new container namespace of type type.

If a namespace type is not specified in the namespaces array, the container MUST inherit the runtime namespace of that type. If a namespaces field contains duplicated namespaces with same type, the runtime MUST generate an error.

Example
"namespaces": [
    {
        "type": "pid",
        "path": "/proc/1234/ns/pid"
    },
    {
        "type": "network",
        "path": "/var/run/netns/neta"
    },
    {
        "type": "mount"
    },
    {
        "type": "ipc"
    },
    {
        "type": "uts"
    },
    {
        "type": "user"
    },
    {
        "type": "cgroup"
    },
    {
        "type": "time"
    }
]
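The three rules can be sketched as a small Go program that decodes a namespaces array like the example and reports what a runtime is expected to do for each type. `linuxNamespace` and `plan` are hypothetical helpers, not the runtime-spec Go bindings; the sketch only covers the eight types used in the example and does not check that a path really refers to a namespace of the right type.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// linuxNamespace mirrors one entry of the OCI "namespaces" array (hypothetical helper type).
type linuxNamespace struct {
	Type string `json:"type"`
	Path string `json:"path,omitempty"`
}

// allTypes lists the namespace types used in the runtime-spec example.
var allTypes = []string{"pid", "network", "mount", "ipc", "uts", "user", "cgroup", "time"}

// plan returns, per namespace type, the action the spec requires:
// "join <path>", "create" or "inherit". Duplicated types are an error.
func plan(nss []linuxNamespace) (map[string]string, error) {
	actions := make(map[string]string, len(allTypes))
	for _, ns := range nss {
		if _, dup := actions[ns.Type]; dup {
			return nil, fmt.Errorf("duplicated namespace type %q", ns.Type)
		}
		if ns.Path != "" {
			actions[ns.Type] = "join " + ns.Path // path set: join that namespace
		} else {
			actions[ns.Type] = "create" // no path: create a new namespace
		}
	}
	for _, t := range allTypes {
		if _, ok := actions[t]; !ok {
			actions[t] = "inherit" // type not listed: inherit the runtime's namespace
		}
	}
	return actions, nil
}

func main() {
	raw := `[{"type":"pid","path":"/proc/1234/ns/pid"},{"type":"network"},{"type":"mount"}]`
	var nss []linuxNamespace
	if err := json.Unmarshal([]byte(raw), &nss); err != nil {
		panic(err)
	}
	actions, err := plan(nss)
	if err != nil {
		panic(err)
	}
	types := make([]string, 0, len(actions))
	for t := range actions {
		types = append(types, t)
	}
	sort.Strings(types)
	for _, t := range types {
		fmt.Printf("%-8s %s\n", t, actions[t])
	}
}
```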

The cri-api proto definition distinguishes between the pod sandbox level (pod-level sandbox) and the container level (new container in specified PodSandbox):

// Runtime service defines the public APIs for remote container runtimes
service RuntimeService {
///...
    // RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
    // the sandbox is in the ready state on success.
    rpc RunPodSandbox(RunPodSandboxRequest) returns (RunPodSandboxResponse) {}
///...
    // CreateContainer creates a new container in specified PodSandbox
    rpc CreateContainer(CreateContainerRequest) returns (CreateContainerResponse) {}
///...
}

Creating a pod

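When a pod is created, the spec of the sandbox container is assembled mainly in sandboxContainerSpec. One detail matters here: if a namespace is requested at NODE level (for example nsOptions.GetNetwork() == runtime.NamespaceMode_NODE), the corresponding entry is removed from the namespaces array (customopts.WithoutNamespace(runtimespec.NetworkNamespace); for host networking the UTS entry is dropped as well). This relies on the rule "If a namespace type is not specified in the namespaces array, the container MUST inherit the runtime namespace of that type": since the caller (the runtime) is itself in the node's namespace, the new container ends up in the node's namespace too, exactly as the OCI config spec describes. When the network is not NODE level, the network entry is instead given Path: nsPath, so the sandbox joins the existing network namespace at nsPath.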

///@file: containerd\internal\cri\server\podsandbox\sandbox_run_linux.go
func (c *Controller) sandboxContainerSpec(id string, config *runtime.PodSandboxConfig,
	imageConfig *imagespec.ImageConfig, nsPath string, runtimePodAnnotations []string) (_ *runtimespec.Spec, retErr error) {
	// Creates a spec Generator with the default spec.
	// TODO(random-liu): [P1] Compare the default settings with docker and containerd default.
///...
	// Set namespace options.
	var (
		securityContext = config.GetLinux().GetSecurityContext()
		nsOptions       = securityContext.GetNamespaceOptions()
	)
	if nsOptions.GetNetwork() == runtime.NamespaceMode_NODE {
		specOpts = append(specOpts, customopts.WithoutNamespace(runtimespec.NetworkNamespace))
		specOpts = append(specOpts, customopts.WithoutNamespace(runtimespec.UTSNamespace))
	} else {
		specOpts = append(specOpts, oci.WithLinuxNamespace(
			runtimespec.LinuxNamespace{
				Type: runtimespec.NetworkNamespace,
				Path: nsPath,
			}))
	}
	if nsOptions.GetPid() == runtime.NamespaceMode_NODE {
		specOpts = append(specOpts, customopts.WithoutNamespace(runtimespec.PIDNamespace))
	}
	if nsOptions.GetIpc() == runtime.NamespaceMode_NODE {
		specOpts = append(specOpts, customopts.WithoutNamespace(runtimespec.IPCNamespace))
	}
///...
	return c.runtimeSpec(id, "", specOpts...)
}
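The NODE-mode handling above can be sketched as a pure function over the namespace list: start from defaults without paths, drop an entry when the pod shares the node's namespace, and otherwise give the network entry a path. `podModes`, `without`, `withNamespace` and `sandboxNamespaces` are hypothetical simplifications, not containerd's API; `withNamespace` assumes oci.WithLinuxNamespace replaces an existing entry of the same type or appends a new one, and the netns path in main is made up.

```go
package main

import "fmt"

type linuxNamespace struct {
	Type string
	Path string
}

// podModes records which namespaces the pod shares with the node
// (hypothetical stand-in for CRI NamespaceOption with NamespaceMode_NODE).
type podModes struct {
	HostNetwork, HostPID, HostIPC bool
}

// without removes every entry of the given type (like customopts.WithoutNamespace).
func without(nss []linuxNamespace, typ string) []linuxNamespace {
	out := make([]linuxNamespace, 0, len(nss))
	for _, ns := range nss {
		if ns.Type != typ {
			out = append(out, ns)
		}
	}
	return out
}

// withNamespace replaces the entry of the same type, or appends it if absent.
func withNamespace(nss []linuxNamespace, ns linuxNamespace) []linuxNamespace {
	for i := range nss {
		if nss[i].Type == ns.Type {
			nss[i] = ns
			return nss
		}
	}
	return append(nss, ns)
}

func sandboxNamespaces(m podModes, netnsPath string) []linuxNamespace {
	// Defaults: every type without a path, i.e. "create a new namespace".
	nss := []linuxNamespace{{Type: "pid"}, {Type: "ipc"}, {Type: "uts"}, {Type: "mount"}, {Type: "network"}}
	if m.HostNetwork {
		nss = without(nss, "network") // omitted => inherited from the node
		nss = without(nss, "uts")
	} else {
		nss = withNamespace(nss, linuxNamespace{Type: "network", Path: netnsPath})
	}
	if m.HostPID {
		nss = without(nss, "pid")
	}
	if m.HostIPC {
		nss = without(nss, "ipc")
	}
	return nss
}

func main() {
	fmt.Println(sandboxNamespaces(podModes{HostNetwork: true}, ""))
	fmt.Println(sandboxNamespaces(podModes{}, "/var/run/netns/cni-example"))
}
```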

sandboxContainerSpec==>>runtimeSpec

///@file: containerd\internal\cri\server\podsandbox\helpers.go
// runtimeSpec returns a default runtime spec used in cri-containerd.
func (c *Controller) runtimeSpec(id string, baseSpecFile string, opts ...oci.SpecOpts) (*runtimespec.Spec, error) {
	// GenerateSpec needs namespace.
	ctx := ctrdutil.NamespacedContext()
	container := &containers.Container{ID: id}
///...
	spec, err := oci.GenerateSpec(ctx, nil, container, opts...)
	if err != nil {
		return nil, fmt.Errorf("failed to generate spec: %w", err)
	}

	return spec, nil
}

runtimeSpec==>>GenerateSpec

// GenerateSpec will generate a default spec from the provided image
// for use as a containerd container
func GenerateSpec(ctx context.Context, client Client, c *containers.Container, opts ...SpecOpts) (*Spec, error) {
	return GenerateSpecWithPlatform(ctx, client, platforms.DefaultString(), c, opts...)
}
// GenerateSpecWithPlatform will generate a default spec from the provided image
// for use as a containerd container in the platform requested.
func GenerateSpecWithPlatform(ctx context.Context, client Client, platform string, c *containers.Container, opts ...SpecOpts) (*Spec, error) {
	var s Spec
	if err := generateDefaultSpecWithPlatform(ctx, platform, c.ID, &s); err != nil {
		return nil, err
	}

	return &s, ApplyOpts(ctx, client, c, &s, opts...)
}

The default entries carry only a Type and no path, which is the case "If path is not specified, the runtime MUST create a new container namespace of type type.":
runtimeSpec>>GenerateSpec>>generateDefaultSpecWithPlatform==>>defaultUnixNamespaces

///@file: containerd\pkg\oci\spec.go
func defaultUnixNamespaces() []specs.LinuxNamespace {
	return []specs.LinuxNamespace{
		{
			Type: specs.PIDNamespace,
		},
		{
			Type: specs.IPCNamespace,
		},
		{
			Type: specs.UTSNamespace,
		},
		{
			Type: specs.MountNamespace,
		},
		{
			Type: specs.NetworkNamespace,
		},
	}
}

Creating a container

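When a container is created, containerd looks up the sandbox by the given pod ID, obtains the pid of the sandbox process via c.sandboxService.SandboxStatus, and passes it down (CreateContainer ==>> buildContainerSpec ==>> buildLinuxSpec) so that the namespaces of that process are written into the new container's config.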

///@file: containerd\internal\cri\server\container_create.go
func (c *criService) buildLinuxSpec(
	id string,
	sandboxID string,
	sandboxPid uint32,
	containerName string,
	imageName string,
	config *runtime.ContainerConfig,
	sandboxConfig *runtime.PodSandboxConfig,
	imageConfig *imagespec.ImageConfig,
	extraMounts []*runtime.Mount,
	ociRuntime criconfig.Runtime,
	runtimeHandler *runtime.RuntimeHandler,
) (_ []oci.SpecOpts, retErr error) {
///...
	specOpts = append(specOpts,
		customopts.WithOOMScoreAdj(config, c.config.RestrictOOMScoreAdj),
		customopts.WithPodNamespaces(securityContext, sandboxPid, targetPid, uids, gids),
		customopts.WithSupplementalGroups(supplementalGroups),
	)
///...
}

///@file: containerd\internal\cri\server\container_create.go
// CreateContainer creates a new container in the given PodSandbox.
func (c *criService) CreateContainer(ctx context.Context, r *runtime.CreateContainerRequest) (_ *runtime.CreateContainerResponse, retErr error) {
	config := r.GetConfig()
	log.G(ctx).Debugf("Container config %+v", config)
	sandboxConfig := r.GetSandboxConfig()
	sandbox, err := c.sandboxStore.Get(r.GetPodSandboxId())
	if err != nil {
		return nil, fmt.Errorf("failed to find sandbox id %q: %w", r.GetPodSandboxId(), err)
	}

	cstatus, err := c.sandboxService.SandboxStatus(ctx, sandbox.Sandboxer, sandbox.ID, false)
	if err != nil {
		return nil, fmt.Errorf("failed to get controller status: %w", err)
	}

	var (
		sandboxID  = cstatus.SandboxID
		sandboxPid = cstatus.Pid
	)
///...

	spec, err := c.buildContainerSpec(
		platform,
		id,
		sandboxID,
		sandboxPid,
		sandbox.NetNSPath,
		containerName,
		containerdImage.Name(),
		config,
		sandboxConfig,
		&image.ImageSpec.Config,
		volumeMounts,
		ociRuntime,
		runtimeHandler,
	)
///...
}

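In config.json, the container is told to use the same network (GetNetworkNamespace(sandboxPid)), IPC (GetIPCNamespace(sandboxPid)) and UTS (GetUTSNamespace(sandboxPid)) namespaces as the pod. The PID namespace (of targetPid) is joined only when the PID mode is not CONTAINER, and a user namespace path is added only in POD mode: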

///@file:containerd\internal\cri\opts\spec_opts.go
// WithPodNamespaces sets the pod namespaces for the container
func WithPodNamespaces(config *runtime.LinuxContainerSecurityContext, sandboxPid uint32, targetPid uint32, uids, gids []runtimespec.LinuxIDMapping) oci.SpecOpts {
	namespaces := config.GetNamespaceOptions()

	opts := []oci.SpecOpts{
		oci.WithLinuxNamespace(runtimespec.LinuxNamespace{Type: runtimespec.NetworkNamespace, Path: GetNetworkNamespace(sandboxPid)}),
		oci.WithLinuxNamespace(runtimespec.LinuxNamespace{Type: runtimespec.IPCNamespace, Path: GetIPCNamespace(sandboxPid)}),
		oci.WithLinuxNamespace(runtimespec.LinuxNamespace{Type: runtimespec.UTSNamespace, Path: GetUTSNamespace(sandboxPid)}),
	}
	if namespaces.GetPid() != runtime.NamespaceMode_CONTAINER {
		opts = append(opts, oci.WithLinuxNamespace(runtimespec.LinuxNamespace{Type: runtimespec.PIDNamespace, Path: GetPIDNamespace(targetPid)}))
	}

	if namespaces.GetUsernsOptions() != nil {
		switch namespaces.GetUsernsOptions().GetMode() {
		case runtime.NamespaceMode_NODE:
			// Nothing to do. Not adding userns field uses the node userns.
		case runtime.NamespaceMode_POD:
			opts = append(opts, oci.WithLinuxNamespace(runtimespec.LinuxNamespace{Type: runtimespec.UserNamespace, Path: GetUserNamespace(sandboxPid)}))
			opts = append(opts, oci.WithUserNamespace(uids, gids))
		}
	}

	return oci.Compose(opts...)
}
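The paths written by WithPodNamespaces point at the namespaces of the sandbox (pause) process. A sketch, assuming the GetXXXNamespace helpers (not shown in the excerpt) return the kernel's /proc/<pid>/ns/<kind> links; `nsPath` and `containerNamespaces` are hypothetical names, and the pid 1554 used in main is the pause process from the listing later in this post.

```go
package main

import "fmt"

type linuxNamespace struct{ Type, Path string }

// nsPath returns the /proc link of a process's namespace (assumed path format).
func nsPath(pid uint32, kind string) string {
	return fmt.Sprintf("/proc/%d/ns/%s", pid, kind)
}

// containerNamespaces lists the entries a container gets in order to join its pod:
// network, IPC and UTS always come from the sandbox process; the PID namespace
// is joined only when the pod does not ask for a private (CONTAINER-mode) one.
func containerNamespaces(sandboxPid, targetPid uint32, privatePID bool) []linuxNamespace {
	nss := []linuxNamespace{
		{Type: "network", Path: nsPath(sandboxPid, "net")},
		{Type: "ipc", Path: nsPath(sandboxPid, "ipc")},
		{Type: "uts", Path: nsPath(sandboxPid, "uts")},
	}
	if !privatePID {
		nss = append(nss, linuxNamespace{Type: "pid", Path: nsPath(targetPid, "pid")})
	}
	return nss
}

func main() {
	for _, ns := range containerNamespaces(1554, 1554, true) {
		fmt.Printf("%-8s %s\n", ns.Type, ns.Path)
	}
}
```

Entries with a path fall under the "join" rule of the OCI spec; the mount namespace is not touched here, so it keeps the path-less default.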

host/resolv

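When preparing the sandbox's files, containerd also writes /etc/hostname and the name-resolution file /etc/resolv.conf (plus /etc/hosts and /dev/shm, per the function comment below). /etc/resolv.conf is what makes DNS resolution of k8s Service names work inside the pod.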

///@file: containerd\internal\cri\server\podsandbox\sandbox_run_linux.go
// setupSandboxFiles sets up necessary sandbox files including /dev/shm, /etc/hosts,
// /etc/resolv.conf and /etc/hostname.
func (c *Controller) setupSandboxFiles(id string, config *runtime.PodSandboxConfig) error {
	sandboxEtcHostname := c.getSandboxHostname(id)
	hostname := config.GetHostname()
	if hostname == "" {
		var err error
		hostname, err = c.os.Hostname()
		if err != nil {
			return fmt.Errorf("failed to get hostname: %w", err)
		}
	}
	if err := c.os.WriteFile(sandboxEtcHostname, []byte(hostname+"\n"), 0644); err != nil {
		return fmt.Errorf("failed to write hostname to %q: %w", sandboxEtcHostname, err)
	}

	// TODO(random-liu): Consider whether we should maintain /etc/hosts and /etc/resolv.conf in kubelet.
	sandboxEtcHosts := c.getSandboxHosts(id)
	if err := c.os.CopyFile(etcHosts, sandboxEtcHosts, 0644); err != nil {
		return fmt.Errorf("failed to generate sandbox hosts file %q: %w", sandboxEtcHosts, err)
	}

	// Set DNS options. Maintain a resolv.conf for the sandbox.
	resolvPath := c.getResolvPath(id)

	if dnsConfig := config.GetDnsConfig(); dnsConfig != nil {
		resolvContent, err := parseDNSOptions(dnsConfig.Servers, dnsConfig.Searches, dnsConfig.Options)
		if err != nil {
			return fmt.Errorf("failed to parse sandbox DNSConfig %+v: %w", dnsConfig, err)
		}
		if err := c.os.WriteFile(resolvPath, []byte(resolvContent), 0644); err != nil {
			return fmt.Errorf("failed to write resolv content to %q: %w", resolvPath, err)
		}
	} else {
		// The DnsConfig was nil - we interpret that to mean "use the global
		// default", which is dubious but backwards-compatible.
		if err := c.os.CopyFile(resolvConfPath, resolvPath, 0644); err != nil {
			return fmt.Errorf("failed to copy host's resolv.conf to %q: %w", resolvPath, err)
		}
	}

	// Setup sandbox /dev/shm.
	if config.GetLinux().GetSecurityContext().GetNamespaceOptions().GetIpc() == runtime.NamespaceMode_NODE {
		if _, err := c.os.Stat(devShm); err != nil {
			return fmt.Errorf("host %q is not available for host ipc: %w", devShm, err)
		}
	} else {
		sandboxDevShm := c.getSandboxDevShm(id)
		if err := c.os.MkdirAll(sandboxDevShm, 0700); err != nil {
			return fmt.Errorf("failed to create sandbox shm: %w", err)
		}
		shmproperty := fmt.Sprintf("mode=1777,size=%d", defaultShmSize)
		if err := c.os.Mount("shm", sandboxDevShm, "tmpfs", uintptr(unix.MS_NOEXEC|unix.MS_NOSUID|unix.MS_NODEV), shmproperty); err != nil {
			return fmt.Errorf("failed to mount sandbox shm: %w", err)
		}
	}

	return nil
}
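parseDNSOptions turns the CRI DNS settings into resolv.conf text. A sketch of that step using the standard resolv.conf syntax (`search`, `nameserver` and `options` lines); `resolvConf` is a hypothetical helper whose exact output may differ from containerd's, and the cluster DNS address and search domains in main are illustrative values only.

```go
package main

import (
	"fmt"
	"strings"
)

// resolvConf renders DNS settings in standard resolv.conf syntax:
// one "search" line, one "nameserver" line per server, one "options" line.
func resolvConf(servers, searches, options []string) string {
	var b strings.Builder
	if len(searches) > 0 {
		b.WriteString("search " + strings.Join(searches, " ") + "\n")
	}
	for _, s := range servers {
		b.WriteString("nameserver " + s + "\n")
	}
	if len(options) > 0 {
		b.WriteString("options " + strings.Join(options, " ") + "\n")
	}
	return b.String()
}

func main() {
	fmt.Print(resolvConf(
		[]string{"10.96.0.10"},
		[]string{"default.svc.cluster.local", "svc.cluster.local", "cluster.local"},
		[]string{"ndots:5"},
	))
}
```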

runc

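Creating the container process. In newParentProcess, runc goes through newInitProcess for a container's init process (p.Init) and through newSetnsProcess otherwise (runc exec):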

///@file: runc\libcontainer\container_linux.go
func (c *Container) newParentProcess(p *Process) (parentProcess, error) {
	comm, err := newProcessComm()
	if err != nil {
		return nil, err
	}
	///...
	if p.Init {
		// We only set up fifoFd if we're not doing a `runc exec`. The historic
		// reason for this is that previously we would pass a dirfd that allowed
		// for container rootfs escape (and not doing it in `runc exec` avoided
		// that problem), but we no longer do that. However, there's no need to do
		// this for `runc exec` so we just keep it this way to be safe.
		if err := c.includeExecFifo(cmd); err != nil {
			return nil, fmt.Errorf("unable to setup exec fifo: %w", err)
		}
		return c.newInitProcess(p, cmd, comm)
	}
	return c.newSetnsProcess(p, cmd, comm)
}

func (c *Container) newInitProcess(p *Process, cmd *exec.Cmd, comm *processComm) (*initProcess, error) {
	cmd.Env = append(cmd.Env, "_LIBCONTAINER_INITTYPE="+string(initStandard))
	nsMaps := make(map[configs.NamespaceType]string)
	for _, ns := range c.config.Namespaces {
		if ns.Path != "" {
			nsMaps[ns.Type] = ns.Path
		}
	}
	data, err := c.bootstrapData(c.config.Namespaces.CloneFlags(), nsMaps)
	if err != nil {
		return nil, err
	}

	init := &initProcess{
		cmd:             cmd,
		comm:            comm,
		manager:         c.cgroupManager,
		intelRdtManager: c.intelRdtManager,
		config:          c.newInitConfig(p),
		container:       c,
		process:         p,
		bootstrapData:   data,
	}
	c.initProcess = init
	return init, nil
}

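When creating the init process, runc splits the namespace configuration in two: entries with a path are collected into nsMaps to be joined, and the remaining entries are converted by CloneFlags into the flag bits understood by clone/unshare, i.e. the namespaces that will be newly created: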

///@file: runc\libcontainer\configs\namespaces_syscall.go
var namespaceInfo = map[NamespaceType]int{
	NEWNET:    unix.CLONE_NEWNET,
	NEWNS:     unix.CLONE_NEWNS,
	NEWUSER:   unix.CLONE_NEWUSER,
	NEWIPC:    unix.CLONE_NEWIPC,
	NEWUTS:    unix.CLONE_NEWUTS,
	NEWPID:    unix.CLONE_NEWPID,
	NEWCGROUP: unix.CLONE_NEWCGROUP,
	NEWTIME:   unix.CLONE_NEWTIME,
}

// CloneFlags parses the container's Namespaces options to set the correct
// flags on clone, unshare. This function returns flags only for new namespaces.
func (n *Namespaces) CloneFlags() uintptr {
	var flag int
	for _, v := range *n {
		if v.Path != "" {
			continue
		}
		flag |= namespaceInfo[v.Type]
	}
	return uintptr(flag)
}
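The split performed by newInitProcess and CloneFlags can be sketched as follows. The flag values are the Linux `CLONE_NEW*` constants from <linux/sched.h>, redefined locally so the sketch builds on any OS; `split` is a hypothetical helper keyed by OCI type names rather than runc's `configs.NamespaceType`, and the paths in main are made up.

```go
package main

import "fmt"

// Linux clone flags from <linux/sched.h>, defined locally for portability.
const (
	cloneNewTime   = 0x00000080
	cloneNewNS     = 0x00020000
	cloneNewCgroup = 0x02000000
	cloneNewUTS    = 0x04000000
	cloneNewIPC    = 0x08000000
	cloneNewUser   = 0x10000000
	cloneNewPID    = 0x20000000
	cloneNewNet    = 0x40000000
)

var flagOf = map[string]uintptr{
	"mount": cloneNewNS, "uts": cloneNewUTS, "ipc": cloneNewIPC, "user": cloneNewUser,
	"pid": cloneNewPID, "network": cloneNewNet, "cgroup": cloneNewCgroup, "time": cloneNewTime,
}

type linuxNamespace struct{ Type, Path string }

// split: entries without a path become clone/unshare flags (new namespaces);
// entries with a path go into a map of namespaces to join with setns.
func split(nss []linuxNamespace) (uintptr, map[string]string) {
	var flags uintptr
	paths := make(map[string]string)
	for _, ns := range nss {
		if ns.Path != "" {
			paths[ns.Type] = ns.Path
			continue
		}
		flags |= flagOf[ns.Type]
	}
	return flags, paths
}

func main() {
	flags, paths := split([]linuxNamespace{
		{Type: "network", Path: "/proc/1554/ns/net"},
		{Type: "ipc", Path: "/proc/1554/ns/ipc"},
		{Type: "mount"},
		{Type: "pid"},
	})
	fmt.Printf("clone flags: %#x\njoin: %v\n", flags, paths)
}
```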

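Namespaces that have a path (nsMaps for the init process above, and the container's existing namespaces in the newSetnsProcess case used by runc exec) are joined in the C part of runc with setns: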

///@file: runc\libcontainer\nsenter\nsexec.c
void join_namespaces(char *nslist)
{
///...
	/*
	 * The ordering in which we join namespaces is important. We should
	 * always join the user namespace *first*. This is all guaranteed
	 * from the container_linux.go side of this, so we're just going to
	 * follow the order given to us.
	 */

	for (i = 0; i < num; i++) {
		struct namespace_t *ns = &namespaces[i];
		int flag = nsflag(ns->type);

		write_log(DEBUG, "setns(%#x) into %s namespace (with path %s)", flag, ns->type, ns->path);
		if (setns(ns->fd, flag) < 0)
			bail("failed to setns into %s namespace", ns->type);

		close(ns->fd);
	}

	free(namespaces);
}
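The ordering rule in the comment can be illustrated with a tiny helper that moves the user namespace to the front and keeps the others in their given order. `joinOrder` is hypothetical: in runc the order is produced on the Go side (container_linux.go), and this sketch reproduces only the user-first rule, not runc's complete ordering.

```go
package main

import (
	"fmt"
	"sort"
)

type linuxNamespace struct{ Type, Path string }

// joinOrder returns a copy with the user namespace first; the stable sort
// keeps every other namespace in its original relative order.
func joinOrder(nss []linuxNamespace) []linuxNamespace {
	out := append([]linuxNamespace(nil), nss...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Type == "user" && out[j].Type != "user"
	})
	return out
}

func main() {
	for _, ns := range joinOrder([]linuxNamespace{
		{Type: "ipc", Path: "/proc/1554/ns/ipc"},
		{Type: "network", Path: "/proc/1554/ns/net"},
		{Type: "user", Path: "/proc/1554/ns/user"},
	}) {
		fmt.Println(ns.Type, ns.Path)
	}
}
```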

Question: is containerd-shim necessarily in the same namespaces as the pods it spawns?

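A little reasoning on the above answers it. A pod must be able to use NODE-level namespaces, and runc implements NODE-level namespaces by inheriting them from its parent process (the shim), so the shim must be in the node's namespaces. A spawned pod usually does not use the node's namespaces, so it is usually not in the same namespaces as the shim.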

As the listing below shows, systemd (PID 1), containerd and the shim all share the same namespaces, including the cgroup namespace.

tsecer@harry: ps aux | grep "containerd\|shim"
root        1148  2.9  2.9 1938484 58984 ?       Ssl  11:56   0:16 /usr/local/bin/containerd
root        1432  1.3  4.4 2123084 87964 ?       Ssl  11:57   0:07 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.10
root        1493  0.0  0.4 1233248 9536 ?        Sl   11:57   0:00 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id 15b8ac56216ddc99a0ac7dc517c3f8591b27cbaf5cb7599397a95a2b9befc225 -address /run/containerd/containerd.sock
root        1513  0.0  0.4 1233504 9716 ?        Sl   11:57   0:00 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id f0ab11932060d1e0c17f2f390f3ecca5b7b6ebb5c70b3441dfafed3603435e49 -address /run/containerd/containerd.sock
laborant    4077  0.0  0.0   4092  1876 pts/0    S+   12:06   0:00 grep --color=auto containerd\|shim
tsecer@harry: sudo ls -l  /proc/1148/ns
total 0
lrwxrwxrwx 1 root root 0 May 28 12:02 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 May 28 12:02 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 root root 0 May 28 12:02 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 root root 0 May 28 12:02 net -> 'net:[4026532056]'
lrwxrwxrwx 1 root root 0 May 28 11:56 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 May 28 12:04 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 May 28 12:02 time -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 May 28 12:04 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 May 28 12:02 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 May 28 12:02 uts -> 'uts:[4026531838]'
tsecer@harry: sudo ls -l  /proc/1493/ns
total 0
lrwxrwxrwx 1 root root 0 May 28 12:02 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 May 28 12:02 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 root root 0 May 28 12:02 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 root root 0 May 28 12:02 net -> 'net:[4026532056]'
lrwxrwxrwx 1 root root 0 May 28 12:02 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 May 28 12:04 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 May 28 12:02 time -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 May 28 12:04 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 May 28 12:02 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 May 28 12:02 uts -> 'uts:[4026531838]'
tsecer@harry: sudo ls -l  /proc/1/ns
total 0
lrwxrwxrwx 1 root root 0 May 28 12:02 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 May 28 12:02 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 root root 0 May 28 12:02 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 root root 0 May 28 12:02 net -> 'net:[4026532056]'
lrwxrwxrwx 1 root root 0 May 28 11:57 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 May 28 12:07 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 May 28 12:02 time -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 May 28 12:07 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 May 28 12:02 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 May 28 12:02 uts -> 'uts:[4026531838]'
tsecer@harry: 

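The containers spawned by the shim, however, are in namespaces different from the shim's. Containers under the same shim share the pod-level namespaces: in the listing, flanneld (1843) and pause (1554) have the same ipc, net and uts, while their mnt, pid and cgroup differ (their net and uts match the shim's, i.e. this flannel pod uses the node's network). In other words, the namespaces part ways when the shim spawns its containers, not when containerd spawns the shim.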

  ├─containerd-shim,1513 -namespace k8s.io -id f0ab11932060d1e0c17f2f390f3ecca5b7b6ebb5c70b3441dfafed3603435e49 -address /run/containerd/containerd.sock
  │   ├─flanneld,1843 --ip-masq --kube-subnet-mgr
  │   │   ├─{flanneld},1856
  │   │   ├─{flanneld},1857
  │   │   ├─{flanneld},1858
  │   │   ├─{flanneld},1859
  │   │   ├─{flanneld},1860
  │   │   ├─{flanneld},1861
  │   │   ├─{flanneld},1862
  │   │   └─{flanneld},1904
  │   ├─pause,1554
  │   ├─{containerd-shim},1514
  │   ├─{containerd-shim},1515
  │   ├─{containerd-shim},1516
  │   ├─{containerd-shim},1517
  │   ├─{containerd-shim},1518
  │   ├─{containerd-shim},1519
  │   ├─{containerd-shim},1520
  │   ├─{containerd-shim},1521
  │   ├─{containerd-shim},1568
  │   ├─{containerd-shim},1569
  │   └─{containerd-shim},1770
tsecer@harry: sudo ls -l /proc/1513/ns/
total 0
lrwxrwxrwx 1 root root 0 May 28 12:02 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 May 28 12:02 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 root root 0 May 28 12:02 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 root root 0 May 28 12:02 net -> 'net:[4026532056]'
lrwxrwxrwx 1 root root 0 May 28 12:02 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 May 28 12:13 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 May 28 12:02 time -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 May 28 12:13 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 May 28 12:02 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 May 28 12:02 uts -> 'uts:[4026531838]'
tsecer@harry: sudo ls -l /proc/1843/ns/
total 0
lrwxrwxrwx 1 root root 0 May 28 12:02 cgroup -> 'cgroup:[4026532209]'
lrwxrwxrwx 1 root root 0 May 28 12:02 ipc -> 'ipc:[4026532203]'
lrwxrwxrwx 1 root root 0 May 28 11:57 mnt -> 'mnt:[4026532207]'
lrwxrwxrwx 1 root root 0 May 28 12:02 net -> 'net:[4026532056]'
lrwxrwxrwx 1 root root 0 May 28 12:02 pid -> 'pid:[4026532208]'
lrwxrwxrwx 1 root root 0 May 28 12:13 pid_for_children -> 'pid:[4026532208]'
lrwxrwxrwx 1 root root 0 May 28 12:02 time -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 May 28 12:13 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 May 28 12:02 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 May 28 12:02 uts -> 'uts:[4026531838]'
tsecer@harry: sudo ls -l /proc/1554/ns/
total 0
lrwxrwxrwx 1 65535 65535 0 May 28 12:02 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 65535 65535 0 May 28 11:57 ipc -> 'ipc:[4026532203]'
lrwxrwxrwx 1 65535 65535 0 May 28 11:57 mnt -> 'mnt:[4026532202]'
lrwxrwxrwx 1 65535 65535 0 May 28 11:57 net -> 'net:[4026532056]'
lrwxrwxrwx 1 65535 65535 0 May 28 12:02 pid -> 'pid:[4026532204]'
lrwxrwxrwx 1 65535 65535 0 May 28 12:13 pid_for_children -> 'pid:[4026532204]'
lrwxrwxrwx 1 65535 65535 0 May 28 12:02 time -> 'time:[4026531834]'
lrwxrwxrwx 1 65535 65535 0 May 28 12:13 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 65535 65535 0 May 28 12:02 user -> 'user:[4026531837]'
lrwxrwxrwx 1 65535 65535 0 May 28 11:57 uts -> 'uts:[4026531838]'
tsecer@harry: 
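To repeat this comparison without reading the ls output by eye, a small Go program can read the /proc/<pid>/ns links of two processes and list the kinds that differ. `readNamespaces` and `differing` are hypothetical helper names; reading another process's links usually needs root, which is why the listing above uses sudo.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// readNamespaces reads the /proc/<pid>/ns/* links, e.g. "net" -> "net:[4026532056]".
func readNamespaces(pid string) (map[string]string, error) {
	dir := filepath.Join("/proc", pid, "ns")
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	links := make(map[string]string, len(entries))
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		links[e.Name()] = target
	}
	return links, nil
}

// differing returns, sorted, the namespace kinds present in both maps whose links differ.
func differing(a, b map[string]string) []string {
	var kinds []string
	for k, v := range a {
		if w, ok := b[k]; ok && v != w {
			kinds = append(kinds, k)
		}
	}
	sort.Strings(kinds)
	return kinds
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: nsdiff <pid> <pid>")
		return
	}
	a, err := readNamespaces(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	b, err := readNamespaces(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("different:", differing(a, b))
}
```

For example, running it as root with the shim (1513) and flanneld (1843) pids should report the kinds whose inode numbers differ in the listing above.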

outro

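In terms of implementation, pods and containers use namespaces in much the same way. The pod sandbox mostly gets newly created namespaces (If path is not specified, the runtime MUST create a new container namespace of type type.), except for NODE-level ones, which are simply omitted and therefore inherited; a container, in contrast, declares in its config the paths of the pod's namespaces it should join (The runtime MUST place the container process in the namespace associated with that path.).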

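A pod is mainly an intermediate concept introduced for scheduling: it makes it easy to group a set of containers and pin them to one node, giving deployment a transaction-like property (either all succeed or all fail). At the namespace level the work is actually fairly simple. This fits the usual layered design: the underlying primitives stay simple and extensible, while the variable logic is encapsulated in the business layer.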

posted on 2025-05-28 21:30  tsecer
