Asked by: M80 Asked: 9/30/2021 Last edited by: Ken White, M80 Updated: 4/2/2022 Views: 1182
Failed to query remote execution capabilities: connection timed out
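Q:
I am trying to run remote build execution on our k8s cluster using bazel buildfarm memory workers.
Following buildfarm's architecture requirements, I have set up the server pod, the worker pod, and a redis cluster, along with a k8s service and an ingress so that I can send builds remotely.
However, when I try to run a build, I get the following: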
eito@fuji:~/MyRepo$ bazel --client_debug run //tools:ipython3 --config=rbe
[INFO 11:03:07.374 src/main/cpp/option_processor.cc:407] Looking for the following rc files: /etc/bazel.bazelrc,/home/eito/MyRepo/.bazelrc,/home/eito/.bazelrc
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:56] Parsing the RcFile /home/eito/MyRepo/.bazelrc
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:56] Parsing the RcFile user.bazelrc
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:129] Skipped optional import of user.bazelrc, the specified rc file either does not exist or is not readable.
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:56] Parsing the RcFile /home/eito/.bazelrc
[INFO 11:03:07.374 src/main/cpp/blaze.cc:1626] Debug logging requested, sending all client log statements to stderr
[INFO 11:03:07.374 src/main/cpp/blaze.cc:1509] Acquired the client lock, waited 0 milliseconds
[INFO 11:03:07.377 src/main/cpp/blaze.cc:1697] Trying to connect to server (timeout: 30 secs)...
[INFO 11:03:07.385 src/main/cpp/blaze.cc:1264] Connected (server pid=113490).
[INFO 11:03:07.385 src/main/cpp/blaze.cc:1974] Releasing client lock, let the server manage concurrent requests.
INFO: Invocation ID: c97091ec-e335-4882-8107-c9084d4453ff
ERROR: Failed to query remote execution capabilities: connection timed out: buildfarm.dev.azr.internal.mydomain.com/172.33.33.99:8980
[INFO 11:03:37.613 src/main/cpp/blaze.cc:2093] failure_detail: message: "Failed to query remote execution capabilities: connection timed out: buildfarm.dev.azr.internal.mydomain.com/172.33.33.99:8980"
remote_execution {
code: CAPABILITIES_QUERY_FAILURE
}
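A timeout (as opposed to a refusal) generally suggests that nothing answered at that address and port at all, for example because the packets were dropped or the port is not exposed on that IP. A minimal sketch for checking this from the client machine, assuming Python 3 is available there. `probe_tcp` is a hypothetical helper written for illustration; it is not part of Bazel or buildfarm.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Try a plain TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    This only tests TCP reachability; it does not speak gRPC or TLS.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer within the deadline: packets dropped or port not exposed.
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar problems.
        return f"error: {exc}"


# Example call with the host and port from the error message above:
# print(probe_tcp("buildfarm.dev.azr.internal.mydomain.com", 8980))
```

If the probe also reports "timeout", the problem is likely in the network path to that port rather than in Bazel itself.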
My worker deployment and service look like this (the server is very similar, it just uses a different image and mounts a different configmap):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aks-buildfarm-worker
  namespace: infrastructure--buildfarm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aks-buildfarm
  template:
    metadata:
      labels:
        app: aks-buildfarm
        role: app
    spec:
      containers:
        - name: buildfarm-worker
          image: mydomain.azurecr.io/buildfarm-memory-worker:v8
          volumeMounts:
            - mountPath: "/config"
              name: buildfarm-worker-config
          ports:
            - containerPort: 8980
              protocol: TCP
          resources:
            limits:
              memory: 256Mi
              cpu: "300m"
      volumes:
        - name: buildfarm-worker-config
          configMap:
            name: buildfarm-worker-config
---
apiVersion: v1
kind: Service
metadata:
  name: aks-buildfarm
  namespace: infrastructure--buildfarm
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      name: grpc
      port: 8980
      targetPort: 8980
  selector:
    app: aks-buildfarm
I mostly used the following example configs as configmaps on k8s:
https://github.com/bazelbuild/bazel-buildfarm/blob/main/examples/shard-server.config.example
https://github.com/bazelbuild/bazel-buildfarm/blob/main/examples/worker.config.example
The only difference is that in the worker config I replaced every localhost:8980 with "aks-buildfarm-server.infrastructure--buildfarm.svc.cluster.local", since they are in the same k8s cluster and can communicate through that name.
My ingress looks like this:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  namespace: infrastructure--buildfarm
  name: buildfarm-ingress
  annotations:
    kubernetes.io/ingress.class: nginx-internal
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/use-regex: "true"
    cert-manager.io/cluster-issuer: selfsigned-cluster-issuer
spec:
  rules:
    - host: buildfarm.dev.azr.internal.mydomain.com
      http:
        paths:
          - backend:
              serviceName: aks-buildfarm
              servicePort: 8980
            path: /(.*)
My .bazelrc file looks like this:
build:rbe --remote_executor=grpcs://buildfarm.dev.azr.internal.mydomain.com:8980
A:
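You need to use the shard worker config from here: https://github.com/bazelbuild/bazel-buildfarm/blob/main/examples/shard-worker.config.example
You also need a running redis instance or cluster, because the two-way communication between the server and the workers is triggered through redis.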
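To confirm that a redis endpoint is actually reachable from where the server and workers run, something like the following sketch may help. It assumes an unauthenticated redis listening on the default port 6379 and relies on redis answering an inline PING with +PONG. `redis_ping` and the host name in the example call are hypothetical and not taken from the question.

```python
import socket


def redis_ping(host: str, port: int = 6379, timeout: float = 3.0) -> bool:
    """Send an inline PING over a raw socket and check for a +PONG reply.

    Returns False on any connection problem or unexpected reply, for
    example when the server demands authentication first.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        return False
    return reply.startswith(b"+PONG")


# Example call; the service name below is a placeholder:
# print(redis_ping("redis.infrastructure--buildfarm.svc.cluster.local"))
```

A False result only says that this simple check failed; a redis that requires a password or TLS would need a proper client library instead.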