Failed to query remote execution capabilities: connection timed out

Asked by: M80 | Asked: 9/30/2021 | Last edited by: Ken White, M80 | Updated: 4/2/2022 | Views: 1182

Q:

I'm trying to run remote build execution on our k8s cluster using bazel buildfarm memory workers.

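Following buildfarm's architecture requirements, I have set up a server pod, a worker pod, and a Redis cluster, along with a k8s Service and Ingress so that I can send builds to it remotely.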

However, when I try to run a build against it, I get the following:

eito@fuji:~/MyRepo$ bazel --client_debug run //tools:ipython3 --config=rbe
[INFO 11:03:07.374 src/main/cpp/option_processor.cc:407] Looking for the following rc files: /etc/bazel.bazelrc,/home/eito/MyRepo/.bazelrc,/home/eito/.bazelrc
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:56] Parsing the RcFile /home/eito/MyRepo/.bazelrc
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:56] Parsing the RcFile user.bazelrc
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:129] Skipped optional import of user.bazelrc, the specified rc file either does not exist or is not readable.
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:56] Parsing the RcFile /home/eito/.bazelrc
[INFO 11:03:07.374 src/main/cpp/blaze.cc:1626] Debug logging requested, sending all client log statements to stderr
[INFO 11:03:07.374 src/main/cpp/blaze.cc:1509] Acquired the client lock, waited 0 milliseconds
[INFO 11:03:07.377 src/main/cpp/blaze.cc:1697] Trying to connect to server (timeout: 30 secs)...
[INFO 11:03:07.385 src/main/cpp/blaze.cc:1264] Connected (server pid=113490).
[INFO 11:03:07.385 src/main/cpp/blaze.cc:1974] Releasing client lock, let the server manage concurrent requests.
INFO: Invocation ID: c97091ec-e335-4882-8107-c9084d4453ff
ERROR: Failed to query remote execution capabilities: connection timed out: buildfarm.dev.azr.internal.mydomain.com/172.33.33.99:8980
[INFO 11:03:37.613 src/main/cpp/blaze.cc:2093] failure_detail: message: "Failed to query remote execution capabilities: connection timed out: buildfarm.dev.azr.internal.mydomain.com/172.33.33.99:8980"
remote_execution {
  code: CAPABILITIES_QUERY_FAILURE
}
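Since the error is a plain connection timeout, it may help to separate network reachability from anything buildfarm-specific. Below is a minimal sketch of a TCP probe using only Python's standard library; the helper name `can_connect` is my own, not part of bazel or buildfarm.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

Calling `can_connect("buildfarm.dev.azr.internal.mydomain.com", 8980)` from the machine that runs bazel would show whether anything accepts TCP connections on that address and port. A `False` result points at routing, firewalling, or the port not being exposed, rather than at the buildfarm configuration itself.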

My worker Deployment and Service look like this (the server is very similar, just with a different image and a different configmap mounted):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aks-buildfarm-worker
  namespace: infrastructure--buildfarm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aks-buildfarm
  template:
    metadata:
      labels:
        app: aks-buildfarm
        role: app
    spec:
      containers:
      - name: buildfarm-worker
        image: mydomain.azurecr.io/buildfarm-memory-worker:v8
        volumeMounts:
          - mountPath: "/config"
            name: buildfarm-worker-config
        ports:
        - containerPort: 8980
          protocol: TCP
        resources:
          limits:
            memory: 256Mi
            cpu: "300m"
      volumes:
      - name: buildfarm-worker-config
        configMap:
          name: buildfarm-worker-config
---
apiVersion: v1
kind: Service
metadata:
  name: aks-buildfarm
  namespace: infrastructure--buildfarm
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      name: grpc
      port: 8980
      targetPort: 8980
  selector:
    app: aks-buildfarm
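One thing worth checking in the manifests above: the Service selects on `app: aks-buildfarm` only, and the worker pods carry that label. The sketch below models equality-based selector matching in plain Python to show what that implies. The server pod's labels are not shown in the question, so `server_labels_assumed` is an assumption, and the `component` label is a hypothetical name used only for illustration.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Equality-based matching: every selector key/value must be present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Taken from the Service and the worker Deployment in the question.
service_selector = {"app": "aks-buildfarm"}
worker_labels = {"app": "aks-buildfarm", "role": "app"}

# ASSUMPTION: the server Deployment is "very similar", so it may use the same labels.
server_labels_assumed = {"app": "aks-buildfarm", "role": "app"}

# HYPOTHETICAL: a narrower selector using an extra label that tells the two apart.
narrow_selector = {"app": "aks-buildfarm", "component": "server"}

print(selector_matches(service_selector, worker_labels))          # worker is selected
print(selector_matches(service_selector, server_labels_assumed))  # so is the server, if labels are shared
print(selector_matches(narrow_selector, worker_labels))           # worker is excluded by the narrower selector
```

If the server and worker pods really do share the same labels, the `aks-buildfarm` Service would spread client connections across both, which is probably not intended. This is separate from the timeout itself, but it is cheap to rule out.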


I'm mostly using the following example configs as configmaps on k8s: https://github.com/bazelbuild/bazel-buildfarm/blob/main/examples/shard-server.config.example https://github.com/bazelbuild/bazel-buildfarm/blob/main/examples/worker.config.example

The only difference is that in the worker config I replaced every localhost:8980 with "aks-buildfarm-server.infrastructure--buildfarm.svc.cluster.local", since they are in the same k8s cluster and can communicate through that name.

My Ingress looks like this:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  namespace: infrastructure--buildfarm
  name: buildfarm-ingress
  annotations:
    kubernetes.io/ingress.class: nginx-internal
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/use-regex: "true"
    cert-manager.io/cluster-issuer: selfsigned-cluster-issuer
spec:
  rules:
  - host: buildfarm.dev.azr.internal.mydomain.com
    http:
      paths:
      - backend:
          serviceName: aks-buildfarm
          servicePort: 8980
        path: /(.*)

My .bazelrc file looks like this:

build:rbe --remote_executor=grpcs://buildfarm.dev.azr.internal.mydomain.com:8980
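To make explicit what the client will dial with this setting, here is a small sketch that splits the `--remote_executor` value into its parts using the standard library. The function name `describe_endpoint` is hypothetical; the `grpcs` scheme means bazel connects over TLS.

```python
from urllib.parse import urlsplit


def describe_endpoint(remote_executor: str) -> dict:
    """Split a --remote_executor value into scheme, host, port, and whether TLS is used."""
    parts = urlsplit(remote_executor)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "tls": parts.scheme == "grpcs",
    }


endpoint = describe_endpoint("grpcs://buildfarm.dev.azr.internal.mydomain.com:8980")
print(endpoint)
```

So bazel opens a TLS connection to port 8980 on whatever that hostname resolves to. If the name resolves to the nginx ingress controller (an assumption about this cluster), it is worth confirming that the controller actually listens on 8980, since ingress controllers commonly expose only 80 and 443. A mismatch there would be consistent with a connection timeout.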

kubernetes bazel remote-execution buildfarm



A:

1 vote | Tom Zayouna | 4/2/2022 | #1

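You need to use the shard worker configuration from here: https://github.com/bazelbuild/bazel-buildfarm/blob/main/examples/shard-worker.config.example. You also need a running Redis instance or cluster, because the two-way communication between the server and the workers goes through Redis.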