Infrastructure Repository Overview

Core AWS Infrastructure as Code

The infrastructure repository contains Path2Response’s AWS Cloud Development Kit (CDK) infrastructure code for managing core AWS resources across multiple environments. It provides centralized configuration for VPCs, EMR security, DNS architecture, and developer instance provisioning.

Purpose

The infrastructure repository serves as the foundation for P2R’s cloud infrastructure:

  • VPC & Network Configuration - Centralized VPC definitions and subnet mappings across all AWS accounts
  • EMR Security - Security configurations enforcing IMDSv2 for EMR clusters
  • DNS Management - Public and private DNS zone architecture using Route53
  • Developer Instances - Automated provisioning of EC2 instances for data scientists and developers
  • Global Constants - Shared account IDs, region mappings, EFS volume configurations, and SSH key pairs

Target Users: Infrastructure Engineers, DevOps, Data Scientists, Developers


Architecture

Repository Structure

/infrastructure/
├── package.json             # Root package configuration (v335.0.0-SNAPSHOT)
├── .gitignore               # Git ignore patterns
├── README.md                # Repository documentation
└── cdk/                     # AWS CDK infrastructure stacks
    ├── dns/                 # DNS zone management
    │   ├── public-zones/    # Public Route53 hosted zones
    │   ├── private-zones/   # Private Route53 hosted zones
    │   ├── resolver-rules/  # DNS resolver rules
    │   ├── shared-resources/# Common DNS components
    │   └── docs/            # DNS architecture documentation
    ├── emr/                 # EMR security configuration stack
    │   ├── bin/             # CDK app entry point
    │   ├── lib/             # Stack implementation
    │   └── test/            # Unit tests
    ├── global/              # Shared configuration namespace
    │   ├── account.ts       # AWS account definitions
    │   ├── vpc.ts           # VPC and subnet mappings
    │   ├── efs.ts           # EFS volume configurations
    │   └── index.ts         # Module exports
    └── shared-services/     # Cross-cutting infrastructure
        └── dev-instances/   # Developer instance provisioning
            ├── bin/         # CDK app entry point
            ├── lib/         # Stack and constructs
            │   ├── launch-templates/  # EC2 launch template generation
            │   ├── home-dirs/         # EFS home directory setup
            │   ├── network/           # Network constructs
            │   ├── instance-management/ # Instance lifecycle
            │   └── utils/             # Utility functions
            ├── docs/        # Architecture documentation
            └── scripts/     # Deployment scripts

Technology Stack

Layer          Technologies
-------------  ----------------------------------
IaC Framework  AWS CDK 2.233 (TypeScript)
Language       TypeScript 5.9
Runtime        Node.js 20, 22, or 24
Testing        Jest
Compression    lzma-native (for userdata scripts)
Templating     Handlebars

AWS Account Structure

The infrastructure supports a multi-account AWS Organization with Control Tower:

Environment  Account ID    Profile   Purpose
-----------  ------------  --------  ------------------------------------
p2r (Root)   448838825215  p2r_root  Path2Response main account, DNS root
prd          531556151531  p2r_prd   Production workloads
rc           881797796941  p2r_rc    Release Candidate testing
stg          135821922267  p2r_stg   Staging environment
dev          190585684037  p2r_dev   Development environment

All accounts operate in the us-east-1 region.
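
As a rough illustration of how the account table could be expressed in code, the sketch below maps each environment to its account ID and CLI profile. The account IDs, profiles, and region come from the table above; the record shape, the single-argument signature of getCdkEnvironment(), and the deployCommand() helper are assumptions, not the repository's actual code.

```typescript
// Sketch only: record shape and helper signatures are assumptions.
type EnvName = "p2r" | "prd" | "rc" | "stg" | "dev";

interface AccountInfo {
  accountId: string;
  profile: string;
}

// All accounts operate in a single region.
const REGION = "us-east-1";

const accounts: Record<EnvName, AccountInfo> = {
  p2r: { accountId: "448838825215", profile: "p2r_root" },
  prd: { accountId: "531556151531", profile: "p2r_prd" },
  rc: { accountId: "881797796941", profile: "p2r_rc" },
  stg: { accountId: "135821922267", profile: "p2r_stg" },
  dev: { accountId: "190585684037", profile: "p2r_dev" },
};

// Returns the { account, region } pair that a CDK stack accepts as its `env` prop.
function getCdkEnvironment(env: EnvName): { account: string; region: string } {
  return { account: accounts[env].accountId, region: REGION };
}

// Hypothetical helper: the CLI invocation that targets one environment.
function deployCommand(env: EnvName): string {
  return `cdk deploy --profile ${accounts[env].profile}`;
}

console.log(getCdkEnvironment("dev"));
console.log(deployCommand("prd"));
```

Keeping the profile next to the account ID makes it harder to deploy a stack with a profile that belongs to a different account.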


Core Components

1. Global Configuration (/cdk/global/)

Centralized TypeScript modules providing shared constants and lookup functions used across all CDK stacks.

Account Configuration (account.ts):

  • AWS account IDs for each environment
  • Region mappings
  • getCdkEnvironment() helper for stack deployment

VPC Configuration (vpc.ts):

  • Complete VPC and subnet definitions by environment
  • Three VPC purposes: ControlTower, General, Emr
  • Functions: findVpcIdByPurpose(), findVpcIdByName(), getKeyName()
  • Region abbreviation mappings (e.g., us-east-1 -> use1)

EFS Configuration (efs.ts):

  • Data volume IDs by environment (General, Ingest, IngestPrd)
  • Home volume IDs by architecture (X86, ARM)
  • Security group mappings per VPC
  • Functions: lookupEfsDataVolumeId(), lookupEfsHomeVolumeId()

Example Usage:

import { accounts, getCdkEnvironment } from '../global/account';
import { findVpcIdByPurpose, VpcPurpose } from '../global/vpc';
import { lookupEfsHomeVolumeId } from '../global/efs';

// Resolve the General-purpose VPC in the dev account
const vpcId = findVpcIdByPurpose('dev', 'us-east-1', VpcPurpose.General);
// Resolve the X86 home-directory EFS volume in the P2R account
const efsId = lookupEfsHomeVolumeId('p2r', 'us-east-1', 'X86');

2. EMR Stack (/cdk/emr/)

Deploys EMR security configuration enforcing IMDSv2 (Instance Metadata Service v2) across all EMR clusters.

Stack: EmrStack

Key Configuration:

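// Inside the EmrStack constructor (assumes `emr` is imported from aws-cdk-lib/aws-emr)
new emr.CfnSecurityConfiguration(this, 'EmrSecurityConfiguration', {
  name: 'p2r-emr-security-config',
  securityConfiguration: {
    "InstanceMetadataServiceConfiguration": {
      // Require IMDSv2 (token-based) metadata requests
      "MinimumInstanceMetadataServiceVersion": 2,
      // Token responses may travel at most one network hop
      "HttpPutResponseHopLimit": 1
    }
  }
});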

Deployment:

cd cdk/emr
npm install
cdk deploy --profile p2r_dev  # Deploy to development
cdk deploy --profile p2r_prd  # Deploy to production

Purpose: Security hardening - prevents SSRF attacks by requiring IMDSv2 tokens for metadata access.
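
To make the token requirement concrete, the sketch below builds the two HTTP requests of an IMDSv2 exchange without sending them. The endpoint and header names are the documented EC2 metadata service ones; the helper names and the HttpRequestSpec shape are illustrative only. A typical SSRF flaw lets an attacker trigger a plain GET, but not a PUT with a custom header, which is why the first step is hard to forge.

```typescript
// Sketch of the two-step IMDSv2 exchange; helper names are hypothetical.
interface HttpRequestSpec {
  method: "GET" | "PUT";
  url: string;
  headers: Record<string, string>;
}

const IMDS_BASE = "http://169.254.169.254";

// Step 1: request a session token with a PUT. With a hop limit of 1, the
// response is not expected to survive an extra network hop such as a proxy.
function tokenRequest(ttlSeconds: number): HttpRequestSpec {
  const isWholeNumber = Math.floor(ttlSeconds) === ttlSeconds;
  if (!isWholeNumber || ttlSeconds < 1 || ttlSeconds > 21600) {
    throw new RangeError("TTL must be an integer between 1 and 21600 seconds");
  }
  return {
    method: "PUT",
    url: `${IMDS_BASE}/latest/api/token`,
    headers: { "X-aws-ec2-metadata-token-ttl-seconds": String(ttlSeconds) },
  };
}

// Step 2: present the token on every metadata read.
function metadataRequest(token: string, path: string): HttpRequestSpec {
  return {
    method: "GET",
    url: `${IMDS_BASE}/latest/meta-data/${path}`,
    headers: { "X-aws-ec2-metadata-token": token },
  };
}

console.log(tokenRequest(21600));
console.log(metadataRequest("<token from step 1>", "instance-id"));
```

The request descriptions can be handed to any HTTP client; they only resolve when run on an EC2 or EMR instance.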


3. DNS Infrastructure (/cdk/dns/)

Manages both public and private DNS using AWS Route53, with a mirrored domain structure across environments.

DNS Architecture

Public DNS Hierarchy:

path2response.com (Root - P2R Account)
├── dev.path2response.com (Development Account)
├── stg.path2response.com (Staging Account)
├── rc.path2response.com (RC Account)
├── prd.path2response.com (Production Account)
└── common.path2response.com (P2R Account - shared services)

Private DNS Hierarchy:

path2response.internal (Each workload account)
├── dev.path2response.internal
├── stg.path2response.internal
├── rc.path2response.internal
└── prd.path2response.internal

Key Features

  • Decentralized Management: Each AWS account manages its own subdomain
  • Cross-Account Resolution: DNS resolver rules enable cross-account name resolution
  • User-Friendly Production URLs: foo.path2response.com CNAMEs to foo.prd.path2response.com
  • Mirrored Structure: Public and private DNS use consistent naming conventions
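
The naming convention above can be summarized in a few lines. This is a minimal sketch, assuming nothing beyond the hierarchies shown; the function names are hypothetical and do not come from the repository.

```typescript
// Sketch: zone names derived from the public and private hierarchies above.
type WorkloadEnv = "dev" | "stg" | "rc" | "prd";

const PUBLIC_ROOT = "path2response.com";
const PRIVATE_ROOT = "path2response.internal";

// Public zone delegated to one workload account, e.g. dev.path2response.com.
function publicZone(env: WorkloadEnv): string {
  return `${env}.${PUBLIC_ROOT}`;
}

// Private zone mirrors the public name under the .internal root.
function privateZone(env: WorkloadEnv): string {
  return `${env}.${PRIVATE_ROOT}`;
}

// User-friendly production record: <service>.path2response.com is a CNAME
// pointing at <service>.prd.path2response.com.
function productionAlias(service: string): { record: string; target: string } {
  return {
    record: `${service}.${PUBLIC_ROOT}`,
    target: `${service}.${publicZone("prd")}`,
  };
}

console.log(publicZone("dev"));
console.log(privateZone("stg"));
console.log(productionAlias("foo"));
```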

Directory Organization

Directory          Purpose
-----------------  ------------------------------------------
public-zones/      Public Route53 hosted zones per environment
private-zones/     Private Route53 hosted zones per environment
resolver-rules/    Inbound/outbound DNS resolver rules
shared-resources/  Common constructs and utilities
docs/              Architecture documentation

4. Developer Instances Stack (/cdk/shared-services/dev-instances/)

Automated provisioning system for EC2 developer instances with persistent home directories and pre-configured environments.

Features

  • Multiple Instance Types: 12 launch templates covering burstable, general-use, memory-optimized, and compute-optimized instances
  • Architecture Support: Both X86 (AMD/Intel) and ARM (Graviton) instances
  • Persistent Home Directories: EFS-mounted /home directories that survive instance termination
  • Data Lake Access: Environment-specific EFS volumes mounted at /mnt/data
  • Pre-configured Tools: Conda, Node.js, Python, and operations scripts
  • Idle Detection: Automatic notification after 4 hours of idle time
  • SSM Integration: AWS Systems Manager for patching and management
  • Slack Notifications: Integration with Slack for instance events

Launch Template Types

Template Family             Description
--------------------------  -----------------------------------------
Burstable_X86_AMD           T3a instances for intermittent CPU needs
Burstable_X86_Intel         T3 instances for intermittent CPU needs
Burstable_Arm               T4g ARM instances for cost-efficient bursting
GeneralUse_X86_AMD          M7a balanced workloads
GeneralUse_X86_Intel        M7i balanced workloads
GeneralUse_Arm              M7g ARM balanced workloads
MemoryOptimized_X86_AMD     R7a for large in-memory datasets
MemoryOptimized_X86_Intel   R7i for large in-memory datasets
MemoryOptimized_Arm         R7g ARM for memory-intensive apps
ComputeOptimized_X86_AMD    C7a for compute-bound applications
ComputeOptimized_X86_Intel  C7i for compute-bound applications
ComputeOptimized_Arm        C6g ARM for batch processing

Template Variants

Each template family has three variants:

  1. DataScience (dev02-style): General EFS data mount + EFS home
  2. Ingest (dev01-style): Ingest EFS data mount + EFS home
  3. Ingest-NoEfsHome: Ingest EFS data mount, no persistent home
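
The family names follow a class-plus-architecture pattern, and each variant differs only in which EFS volumes it mounts. The sketch below enumerates both. The family and variant names come from this section; the combined "family/variant" label and the VariantMounts shape are assumptions, and whether the stack really creates every combination is not stated here.

```typescript
// Sketch: enumerate template families and variants; combined labels are hypothetical.
const instanceClasses = ["Burstable", "GeneralUse", "MemoryOptimized", "ComputeOptimized"] as const;
const cpuTargets = ["X86_AMD", "X86_Intel", "Arm"] as const;
const variantNames = ["DataScience", "Ingest", "Ingest-NoEfsHome"] as const;

type Variant = (typeof variantNames)[number];

interface VariantMounts {
  dataVolume: "General" | "Ingest";
  persistentHome: boolean;
}

// What each variant mounts, as described in the list above.
const variantMounts: Record<Variant, VariantMounts> = {
  DataScience: { dataVolume: "General", persistentHome: true },
  Ingest: { dataVolume: "Ingest", persistentHome: true },
  "Ingest-NoEfsHome": { dataVolume: "Ingest", persistentHome: false },
};

// 4 instance classes x 3 CPU targets = 12 families.
function listFamilies(): string[] {
  const out: string[] = [];
  for (const cls of instanceClasses) {
    for (const cpu of cpuTargets) {
      out.push(`${cls}_${cpu}`);
    }
  }
  return out;
}

// Crossing every family with every variant would give 36 combinations.
function listCombinations(): string[] {
  const out: string[] = [];
  for (const family of listFamilies()) {
    for (const variant of variantNames) {
      out.push(`${family}/${variant}`);
    }
  }
  return out;
}

console.log(listFamilies().length);
console.log(listCombinations().length);
```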

Stack Resources

Resource                     Purpose
---------------------------  -----------------------------------------
Security Groups              Control access from VPC CIDR blocks
IAM Role + Instance Profile  SSM, CloudWatch, S3, EFS permissions
S3 Config Bucket             Stores startup scripts and configuration
SNS Topic                    Slack integration for notifications
SSM Parameters               Compressed startup scripts for each template
Launch Templates             Pre-configured EC2 launch specifications

Userdata Script Pipeline

Startup scripts are processed through a Handlebars templating system and compressed with LZMA:

  1. Base userdata - Core instance setup (swap, EFS mounts, user creation)
  2. Transformers - Additional configuration layers:
    • withCondaInstaller - Miniforge/Conda setup
    • withNodeInstall - Node.js installation
    • withSophosGateway - Network routing through Sophos firewall
    • withOperationsRepo - Clone operations repository
    • withGeneralEfsDataMount / withIngestEfsDataMount - Data volume mounting
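
The layering described above can be pictured as function composition: each transformer receives the script built so far and returns an extended copy. The sketch below is a simplified stand-in, not the repository's implementation. The transformer names match the list above, but their bodies are placeholder echo lines, the section() factory is hypothetical, and Handlebars rendering and LZMA compression are left out.

```typescript
// A transformer takes the script lines so far and returns an extended copy.
type Transformer = (lines: readonly string[]) => string[];

// Hypothetical factory: appends a labeled section. Real transformers would
// emit actual shell commands rendered from templates.
function section(label: string, body: string[]): Transformer {
  return (lines) => lines.concat([`# --- ${label} ---`], body);
}

// Placeholder bodies only; the real install and mount commands are not shown here.
const withCondaInstaller = section("conda", ["echo 'install Miniforge/Conda here'"]);
const withNodeInstall = section("node", ["echo 'install Node.js here'"]);
const withGeneralEfsDataMount = section("efs-data", [
  "echo 'mount the General EFS data volume at /mnt/data here'",
]);

// Apply the transformers in order on top of the base userdata.
function buildUserdata(base: readonly string[], transformers: Transformer[]): string {
  let lines: string[] = base.slice();
  for (const transform of transformers) {
    lines = transform(lines);
  }
  return lines.join("\n");
}

const baseUserdata = ["#!/bin/bash", "echo 'base setup: swap, EFS mounts, user creation'"];
const userdataScript = buildUserdata(baseUserdata, [
  withCondaInstaller,
  withNodeInstall,
  withGeneralEfsDataMount,
]);

console.log(userdataScript);
```

Because every transformer has the same signature, a template variant can be described as nothing more than a list of transformers, which keeps the variants easy to compare.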

Deployment

cd cdk/shared-services/dev-instances
npm install
npm run build
npx cdk deploy --profile p2r_root

VPC Architecture

Development Environment

VPC Name              CIDR           Purpose            Subnets
--------------------  -------------  -----------------  ------------------------------------
aws-controltower-VPC  172.31.0.0/16  Control Tower      3 public, 3 private (us-east-1a/b/c)
Vpc4General           10.129.0.0/16  General workloads  3 public, 3 private (us-east-1a/b/f)
Vpc4Emr               172.19.0.0/16  EMR clusters       1 public, 1 private (us-east-1a)

P2R (Root) Account

VPC Name              CIDR            Purpose       Subnets
--------------------  --------------  ------------  ----------------------------------------
P2R- SAF Created VPC  192.168.0.0/16  General       1 public, 1 private (us-east-1a)
Vpc4Emr               192.19.0.0/16   EMR clusters  5 public, 5 private (us-east-1a/b/c/d/f)
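
A purpose-based lookup over these tables might look like the sketch below. Names, CIDRs, and purposes are taken from the two tables above and the use1 abbreviation from the Global Configuration section; the record shape and the findVpcByPurpose() and abbreviateRegion() helpers are assumptions. The repository's findVpcIdByPurpose() returns a VPC ID, which is not reproduced here because VPC IDs do not appear in this document.

```typescript
// Sketch only: record shape and helper names are assumptions.
const VpcPurpose = {
  ControlTower: "ControlTower",
  General: "General",
  Emr: "Emr",
} as const;
type VpcPurpose = (typeof VpcPurpose)[keyof typeof VpcPurpose];

interface VpcRecord {
  vpcName: string;
  cidr: string;
  purpose: VpcPurpose;
}

const vpcsByEnv: Record<string, VpcRecord[]> = {
  dev: [
    { vpcName: "aws-controltower-VPC", cidr: "172.31.0.0/16", purpose: VpcPurpose.ControlTower },
    { vpcName: "Vpc4General", cidr: "10.129.0.0/16", purpose: VpcPurpose.General },
    { vpcName: "Vpc4Emr", cidr: "172.19.0.0/16", purpose: VpcPurpose.Emr },
  ],
  p2r: [
    { vpcName: "P2R- SAF Created VPC", cidr: "192.168.0.0/16", purpose: VpcPurpose.General },
    { vpcName: "Vpc4Emr", cidr: "192.19.0.0/16", purpose: VpcPurpose.Emr },
  ],
};

const regionAbbreviations: Record<string, string> = { "us-east-1": "use1" };

// Returns the first VPC with the requested purpose, or undefined if none exists.
function findVpcByPurpose(env: string, purpose: VpcPurpose): VpcRecord | undefined {
  const candidates = vpcsByEnv[env] || [];
  for (const vpc of candidates) {
    if (vpc.purpose === purpose) {
      return vpc;
    }
  }
  return undefined;
}

// Maps a full region name to its short form, e.g. us-east-1 -> use1.
function abbreviateRegion(region: string): string {
  const short = regionAbbreviations[region];
  if (!short) {
    throw new Error(`No abbreviation defined for region ${region}`);
  }
  return short;
}

console.log(findVpcByPurpose("dev", VpcPurpose.General));
console.log(abbreviateRegion("us-east-1"));
```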

EFS Volumes

Data Volumes (by Environment)

Environment  Purpose    Volume ID
-----------  ---------  -----------------------
dev          General    fs-03a174cc36df5c6fd
stg          General    fs-0c4d5253c243ba1bf
rc           General    fs-041218514437fec08
prd          General    fs-00aa2dafd53a02602
p2r          General    fs-0c4d5253c243ba1bf
p2r          Ingest     fs-b027b950
p2r          IngestPrd  fs-ab36c148

Home Volumes (P2R Account)

Architecture  Volume ID             Security Group
------------  --------------------  --------------------
X86           fs-06c9b837b86da16d9  sg-03c95a564763eda59
ARM           fs-0ec542736419440c8  sg-03c95a564763eda59
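
The two tables above translate naturally into lookup maps. The following is a minimal sketch, assuming the volume IDs listed here are current. The object names efsDataVolumes and efsHomeVolumes and the three-argument lookupEfsHomeVolumeId() call appear elsewhere in this document; the record shapes and the (environment, purpose) signature of lookupEfsDataVolumeId() are assumptions.

```typescript
// Sketch only: shapes and the data-volume lookup signature are assumptions.
type Architecture = "X86" | "ARM";

const efsDataVolumes: Record<string, Record<string, string>> = {
  dev: { General: "fs-03a174cc36df5c6fd" },
  stg: { General: "fs-0c4d5253c243ba1bf" },
  rc: { General: "fs-041218514437fec08" },
  prd: { General: "fs-00aa2dafd53a02602" },
  p2r: {
    General: "fs-0c4d5253c243ba1bf",
    Ingest: "fs-b027b950",
    IngestPrd: "fs-ab36c148",
  },
};

// Home volumes are listed for the P2R account in us-east-1 only.
const efsHomeVolumes: Record<Architecture, { volumeId: string; securityGroupId: string }> = {
  X86: { volumeId: "fs-06c9b837b86da16d9", securityGroupId: "sg-03c95a564763eda59" },
  ARM: { volumeId: "fs-0ec542736419440c8", securityGroupId: "sg-03c95a564763eda59" },
};

// Fails loudly rather than returning undefined for an unknown combination.
function lookupEfsDataVolumeId(env: string, purpose: string): string {
  const byPurpose = efsDataVolumes[env];
  const volumeId = byPurpose ? byPurpose[purpose] : undefined;
  if (!volumeId) {
    throw new Error(`No ${purpose} data volume defined for environment ${env}`);
  }
  return volumeId;
}

function lookupEfsHomeVolumeId(env: string, region: string, arch: Architecture): string {
  if (env !== "p2r" || region !== "us-east-1") {
    throw new Error(`No home volumes defined for ${env} in ${region}`);
  }
  return efsHomeVolumes[arch].volumeId;
}

console.log(lookupEfsDataVolumeId("dev", "General"));
console.log(lookupEfsHomeVolumeId("p2r", "us-east-1", "X86"));
```

Throwing on a missing entry surfaces a misconfigured environment at synthesis time instead of producing a launch template with an empty mount target.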

Development Workflows

Deploying EMR Security Config

cd cdk/emr
npm install
npm run build

# Deploy to specific environment
cdk deploy --profile p2r_dev
cdk deploy --profile p2r_prd

# Preview changes
cdk diff --profile p2r_dev

Deploying Developer Instances Stack

cd cdk/shared-services/dev-instances
npm install
npm run build

# Synthesize CloudFormation template
npm run synth

# Preview changes
npm run diff

# Deploy
npm run deploy

Adding a New VPC

  1. Add VPC definition to /cdk/global/vpc.ts in the vpcs object
  2. Include all subnets with CIDR, availability zone, and scope (public/private)
  3. Add corresponding key pair to keypairs object if needed
  4. Update any dependent stacks
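
Before adding an entry, it can help to check that every subnet actually falls inside the VPC's CIDR block. The sketch below shows one way to do that. The VpcDef and SubnetDef shapes are assumptions about what vpc.ts might hold, and the subnet CIDRs in the example are made up for illustration; only the Vpc4General name and its 10.129.0.0/16 block come from this document.

```typescript
// Sketch only: definition shapes and example subnets are hypothetical.
interface SubnetDef {
  cidr: string;
  availabilityZone: string;
  scope: "public" | "private";
}

interface VpcDef {
  vpcName: string;
  cidr: string;
  subnets: SubnetDef[];
}

// Parses "a.b.c.d/n" into a numeric base address and prefix length.
function parseCidr(cidr: string): { base: number; prefix: number } {
  const [ip, prefixText] = cidr.split("/");
  const octets = ip.split(".").map(Number);
  const prefix = Number(prefixText);
  const badOctet = octets.some((o) => !(o >= 0 && o <= 255) || Math.floor(o) !== o);
  const badPrefix = !(prefix >= 0 && prefix <= 32) || Math.floor(prefix) !== prefix;
  if (octets.length !== 4 || badOctet || badPrefix) {
    throw new Error(`Invalid CIDR: ${cidr}`);
  }
  // Multiplication instead of bit shifts avoids 32-bit sign overflow.
  const base = octets[0] * 16777216 + octets[1] * 65536 + octets[2] * 256 + octets[3];
  return { base, prefix };
}

// True when the inner block lies entirely within the outer block.
function cidrContains(outer: string, inner: string): boolean {
  const o = parseCidr(outer);
  const i = parseCidr(inner);
  if (i.prefix < o.prefix) {
    return false;
  }
  const blockSize = Math.pow(2, 32 - o.prefix);
  return Math.floor(o.base / blockSize) === Math.floor(i.base / blockSize);
}

// Returns one message per subnet that falls outside the VPC block.
function validateVpc(vpc: VpcDef): string[] {
  const problems: string[] = [];
  for (const subnet of vpc.subnets) {
    if (!cidrContains(vpc.cidr, subnet.cidr)) {
      problems.push(`${subnet.cidr} is outside ${vpc.cidr} (${vpc.vpcName})`);
    }
  }
  return problems;
}

const exampleVpc: VpcDef = {
  vpcName: "Vpc4General",
  cidr: "10.129.0.0/16",
  subnets: [
    { cidr: "10.129.0.0/20", availabilityZone: "us-east-1a", scope: "public" },
    { cidr: "10.129.16.0/20", availabilityZone: "us-east-1b", scope: "private" },
  ],
};

console.log(validateVpc(exampleVpc));
```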

Adding a New EFS Volume

  1. Add volume configuration to /cdk/global/efs.ts
  2. For data volumes: add to efsDataVolumes object
  3. For home volumes: add to efsHomeVolumes with security group mappings
  4. Update launch templates if volume should be auto-mounted

Key Integrations

System             Integration
-----------------  ------------------------------------------
AWS Control Tower  Multi-account organization structure
AWS SSM            Instance patching and parameter storage
AWS Quick Setup    Patch policy management
Slack              Instance notifications (via Chatbot + SNS)
Sophos Firewall    Network gateway routing
Operations Repo    Scripts deployed to instances


Important Notes

  • Multi-Account Deployment - Use appropriate AWS CLI profile for each environment
  • IMDSv2 Required - All instances enforce IMDSv2 for security
  • EFS Persistence - Home directories survive instance termination
  • Network Routing - The P2R account uses a Sophos gateway in the public subnet for NAT
  • Idle Detection - Instances notify after 4 hours idle (development environment only)
  • Ubuntu 22.04 - All launch templates use Ubuntu Jammy (2023-12-07 AMI)
  • Operations Repo - Cloned and zipped during CDK synthesis for deployment

Source: infrastructure repository (README.md, cdk/global/, cdk/emr/, cdk/dns/, cdk/shared-services/dev-instances/)

Documentation created: 2026-01-24