AmazonEC2ContainerServiceforEC2Role does not get attached to role via Terraform

Asked by: shybovycha Asked: 11/16/2023 Updated: 11/17/2023 Views: 48

Q:

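I am trying to launch an EC2 instance that can access a private ECR.

I have an AWS user (role) with the permissions needed to launch instances and perform most of the operations required to set up the infrastructure:

  • the AmazonEC2ContainerServiceforEC2Role policy attached
  • ecs:* and ecr:* actions allowed
  • the necessary iam:... actions (such as GetPolicy, AttachPolicy, etc.; see below)

But every time I create this infrastructure, my service fails to start.

When I SSH into the EC2 machine, I can see that the ecs agent has not started: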

$ sudo systemctl status ecs

● ecs.service - Amazon Elastic Container Service - container agent
     Loaded: loaded (/usr/lib/systemd/system/ecs.service; enabled; preset: disabled)
     Active: active (running) since Thu 2023-11-16 05:59:28 UTC; 20min ago

...

[WARN] ECS Agent failed to start, retrying in 15.18926843s

/var/logs/ecs/ecs-agent.log shows more information:

level=error time=2023-11-16T05:59:29Z msg="Error getting ECS instance credentials from default chain: NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors"

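Obviously, no Docker containers are running on the EC2 instance.

After tweaking and running Amazon's self-diagnosis automation (it does not work out of the box, so I had to fix the script to produce reasonable exception messages and to pick up AWS credentials on the EC2 instance itself), I see the following output: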

# download the `attachment.zip`; install `pip` and `unzip`; `pip install boto3`; put AWS credentials in `~/.aws/credentials`; edit `lambda_function.py` to provide EC2 instance_id and AWS region

$ python3 lambda_function.py

There is no container instance profile or correct trust relationship for the EC2Instance('i-02fcb0ded7fd5c8f2').
Before you launch container instances and register them to a cluster, you must create an IAM role for your container instances to use.
Amazon ECS provides the AmazonEC2ContainerServiceforEC2Role managed IAM policy which contains the permissions needed to use the full Amazon ECS feature set.
This managed policy can be attached to an IAM role and associated with your container instances.
See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance_IAM_role.html

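The user I run terraform apply with has all the permissions needed to provision the infrastructure.

The user I am trying to run the EC2 instance (and the ECS service) as has the AmazonEC2ContainerServiceforEC2Role policy attached via the AWS console (by an admin). It also has all the ECS permissions (such as registering in a cluster, etc., via ecs:* and ec2:*).

I tried attaching the policy to the role, creating an IAM instance profile, and passing it to aws_instance, but to no avail.

The Infrastructure tab on the ECS cluster page does not show any EC2 instances.

What am I doing wrong? Why is the policy not attached, or rather, why can't the EC2 instance even register itself in the cluster?

Note: the reason I did not go down the FARGATE path is that I did not want to mess with subnets and CIDRs; I want to keep this infrastructure as simple as possible.

My Terraform script looks like this: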

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.16"
    }
  }

  required_version = ">= 1.2.0"
}

provider "aws" {
  region = "us-east-1"
}

# ----

data "aws_iam_role" "my_ecs_role" {
  name = "ecsTaskExecutionRole"
}

data "aws_iam_policy" "my_ecs_service_policy" {
  arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

resource "aws_iam_instance_profile" "my_ecs_profile" {
  name = "my_ecs_profile"
  role = data.aws_iam_role.my_ecs_role.name
}

resource "aws_iam_role_policy_attachment" "ec2_instance" {
  role       = data.aws_iam_role.my_ecs_role.name
  policy_arn = data.aws_iam_policy.my_ecs_service_policy.arn
}

# ----

resource "aws_security_group" "allow_all_sg" {
  name   = "allow-all-sg"
  vpc_id = data.aws_vpc.my_vpc.id

  ingress {
    cidr_blocks = ["0.0.0.0/0"]
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
  }

  ingress {
    cidr_blocks = ["0.0.0.0/0"]
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

variable "public_ssh_key" {
  type = string
}

resource "aws_key_pair" "dev_ssh" {
  key_name   = "dev_ssh"
  public_key = var.public_ssh_key
}

# ----

data "aws_vpc" "my_vpc" {
  default = true
}

# ----

data "aws_ecr_repository" "my_repository" {
  name = "relational-my-internal"
}

# ----

resource "aws_ecs_cluster" "my_cluster" {
  name = "my-cluster"
}

resource "aws_ecs_task_definition" "deploy_my_task" {
  family = "deploy_my_task"
  container_definitions = jsonencode([
    {
      name      = "deploy_my_task",
      image     = "${data.aws_ecr_repository.my_repository.repository_url}:latest",
      essential = true,
      memory    = 1024,
      portMappings = [
        {
          containerPort = 8080,
          hostPort      = 8080
        }
      ]
    }
  ])

  requires_compatibilities = ["EC2"]
  network_mode             = "host"
  execution_role_arn       = data.aws_iam_role.my_ecs_role.arn
  memory                   = 1024
  cpu                      = 512
}

resource "aws_ecs_service" "my_service" {
  name            = "my_service"
  cluster         = aws_ecs_cluster.my_cluster.id
  task_definition = aws_ecs_task_definition.deploy_my_task.arn
  desired_count   = 1
  launch_type     = "EC2"
}

# ----

resource "aws_instance" "my" {
  ami               = "ami-0906a0047c1ebc062"
  instance_type     = "m6in.large"
  availability_zone = "us-east-1b" # m6in.large is not available in all AZs

  key_name               = aws_key_pair.dev_ssh.key_name
  vpc_security_group_ids = [aws_security_group.allow_all_sg.id]

  associate_public_ip_address = true

  monitoring              = true
  disable_api_termination = false
  ebs_optimized           = true

  iam_instance_profile = aws_iam_instance_profile.my_ecs_profile.name

  user_data = <<USER_DATA
#!/bin/bash
sudo echo 'ECS_CLUSTER=${aws_ecs_cluster.my_cluster.name}' >> /etc/ecs/ecs.config
  USER_DATA
}
amazon-web-services amazon-ec2 terraform

Comments

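0 upvotes Arpit Jain 11/16/2023
The EC2 instance should have an IAM role with the AmazonEC2ContainerServiceforEC2Role policy attached.
0 upvotes Marko E 11/16/2023
Why are you using a data source for the IAM role? Does it already exist, or do you want to create it?
0 upvotes shybovycha 11/16/2023
@MarkoE in my organisation user roles are managed by an admin, so preferably one exists already
1 upvote Marko E 11/16/2023
Do you know whether it exists or not? If it does not, an admin should create it. Otherwise, the data source will return nothing, which would explain the error you are seeing.
1 upvote Mark B 11/16/2023
@MarkoE if the role did not exist, Terraform would throw an error when running terraform apply. Terraform data sources never "return nothing"; they either return the thing or throw an error.
1 upvote Mark B 11/16/2023
I don't understand this: "the reason I did not go down the FARGATE path is that I did not want to mess with subnets and CIDRs; I want to keep this infrastructure as simple as possible." Your EC2 instance has the same "subnets and CIDRs" that a Fargate task would have. EC2 and Fargate have the same VPC requirements. If anything, trying to manage your own EC2 instance and attach it to an ECS cluster is much more complicated than using Fargate.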

A:

0 upvotes shybovycha 11/17/2023 #1

Note: this does not really answer the original question, but it solves the problem anyway.
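For anyone staying on the EC2 launch type: the diagnostic output in the question complains about a missing "container instance profile or correct trust relationship". One possible fix, sketched below, is a dedicated instance role that EC2 itself is allowed to assume. This assumes (it is not confirmed anywhere in the question) that ecsTaskExecutionRole cannot be assumed by ec2.amazonaws.com, and that you are permitted to create IAM roles. All resource names here are hypothetical.

```hcl
# Sketch only: names such as "my-ecs-instance-role" are hypothetical.

# Trust policy that lets the EC2 service assume the role.
data "aws_iam_policy_document" "ec2_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

# Role used by the container instance (not by the task).
resource "aws_iam_role" "ecs_instance_role" {
  name               = "my-ecs-instance-role"
  assume_role_policy = data.aws_iam_policy_document.ec2_assume_role.json
}

# AWS managed policy that lets the ECS agent register with a cluster.
resource "aws_iam_role_policy_attachment" "ecs_instance_policy" {
  role       = aws_iam_role.ecs_instance_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

# Instance profile that wraps the role so it can be given to aws_instance.
resource "aws_iam_instance_profile" "ecs_instance_profile" {
  name = "my-ecs-instance-profile"
  role = aws_iam_role.ecs_instance_role.name
}
```

The instance would then reference it with iam_instance_profile = aws_iam_instance_profile.ecs_instance_profile.name. If roles are admin-managed, the same trust statement would have to be added by the admin to whichever existing role is used.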


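Instead of using the EC2 launch type, I switched to FARGATE (as Mark B pointed out in the comments). Previously, my subnet problems were caused by trying to define a new subnet, whereas I could have relied on the default subnet: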

resource "aws_default_subnet" "my_default_subnet" {
  availability_zone = "us-east-1a"
}

resource "aws_ecs_task_definition" "deploy_my_task" {
  family = "deploy-my-task"
  container_definitions = jsonencode([
    {
      name      = "my-container",
      image     = "${data.aws_ecr_repository.my_repository.repository_url}:${var.my_version}",
      essential = true,
      memory    = 1024,
      portMappings = [
        {
          containerPort = 8080,
          hostPort      = 8080
        }
      ]
    }
  ])

  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  execution_role_arn       = data.aws_iam_role.my_ecs_role.arn
  memory                   = 1024
  cpu                      = 512
}

resource "aws_ecs_service" "my_service" {
  name            = "my-service"
  cluster         = aws_ecs_cluster.my_cluster.id
  task_definition = aws_ecs_task_definition.deploy_my_task.arn
  desired_count   = 1
  depends_on      = [aws_ecs_service.mongodb]

  launch_type      = "FARGATE"
  platform_version = "1.4.0"

  network_configuration {
    subnets          = [aws_default_subnet.my_default_subnet.id]
    security_groups  = [aws_security_group.my_sg.id]
    assign_public_ip = true
  }
}

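This way, no separate EC2 instance provisioning is needed, so the infrastructure plan does become simpler.

But this comes at the cost of having to allow command execution for the task and to use the AWS CLI to operate on the instance (or rather, on the Docker container on the instance).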
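Assuming that "command execution" here means ECS Exec, a rough sketch of the extra pieces might look like the following. The role and policy names are hypothetical; the four ssmmessages actions are, as far as I know, the ones ECS Exec needs on the task role.

```hcl
# Sketch only: "my-task-role" and "ecs-exec" are hypothetical names.

# Trust policy that lets ECS tasks assume the task role.
data "aws_iam_policy_document" "ecs_tasks_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "my_task_role" {
  name               = "my-task-role"
  assume_role_policy = data.aws_iam_policy_document.ecs_tasks_assume_role.json
}

# Permissions the container needs to open an ECS Exec session.
data "aws_iam_policy_document" "ecs_exec" {
  statement {
    actions = [
      "ssmmessages:CreateControlChannel",
      "ssmmessages:CreateDataChannel",
      "ssmmessages:OpenControlChannel",
      "ssmmessages:OpenDataChannel",
    ]
    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "ecs_exec" {
  name   = "ecs-exec"
  role   = aws_iam_role.my_task_role.id
  policy = data.aws_iam_policy_document.ecs_exec.json
}

# Two attributes would then be added to the existing resources:
#   aws_ecs_task_definition.deploy_my_task: task_role_arn          = aws_iam_role.my_task_role.arn
#   aws_ecs_service.my_service:             enable_execute_command = true
#
# A shell in the container could then be opened with something like
# (the task ID is a placeholder):
#   aws ecs execute-command --cluster my-cluster --task <task-id> \
#     --container my-container --interactive --command "/bin/sh"
```

The CLI call also needs the Session Manager plugin installed on the machine running it.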

Note: pulling a Docker image from a public registry (DockerHub) requires a public IP. It may also require opening port 443 in the security group.
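The service above references aws_security_group.my_sg without showing it. A minimal sketch of what it might contain, assuming the application listens on 8080 as in the task definition and that outbound HTTPS is all the image pull needs. The name my-sg and the wide-open 0.0.0.0/0 ranges are assumptions for illustration, not the configuration actually used.

```hcl
# Sketch only: illustrative rules, tighten the CIDR ranges for real use.
resource "aws_security_group" "my_sg" {
  name   = "my-sg"
  vpc_id = data.aws_vpc.my_vpc.id

  # Inbound traffic to the application port from the task definition.
  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Outbound HTTPS so the task can pull images from the registry.
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

If the container talks to other services (a database, for example), the egress rules would need to allow those ports too.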