Problem
Managing cloud infrastructure assets across multiple environments is challenging:
- Visibility Gap: Hard to track what’s running where and who owns it
- Cost Leakage: Unused resources sit idle, burning money
- Compliance Risk: No central view of security configurations
- Manual Tracking: Spreadsheets get outdated within hours
Solution
Asset Manager is an automated infrastructure asset tracking system that:
- Discovers cloud resources across multiple accounts/regions
- Tracks ownership, cost, and configuration
- Identifies unused or misconfigured assets
- Provides APIs for programmatic access
- Integrates with Terraform for IaC workflows
Architecture
```
┌─────────────────┐
│  Asset Manager  │
│    (FastAPI)    │
└────────┬────────┘
         │
    ┌────┴────┐
    │         │
┌───▼───┐ ┌───▼───┐
│  AWS  │ │  GCP  │  ... (Cloud Providers)
│  API  │ │  API  │
└───────┘ └───────┘
    │         │
    └────┬────┘
         │
    ┌────▼────┐
    │   DB    │  (Asset Inventory)
    └─────────┘
```
Key Components:
- Discovery Engine: Scans cloud accounts for resources
- Asset Store: PostgreSQL database for inventory
- Cost Tracker: Aggregates spend by resource/team
- API Layer: REST endpoints for querying/updating assets
- Terraform Integration: Sync IaC state with actual resources
Technical Implementation
Resource Discovery
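```python
from datetime import date, datetime, timedelta, timezone

import boto3


class AssetDiscovery:
    def discover_aws_assets(self, account_id: str, regions: list[str]) -> list[dict]:
        # Uses the default credential chain; scanning another account would
        # need a boto3 Session built from role credentials for account_id.
        assets = []
        for region in regions:
            client = boto3.client('resourcegroupstaggingapi', region_name=region)
            # Get all tagged resources (the Tagging API only returns resources
            # that are, or once were, tagged)
            paginator = client.get_paginator('get_resources')
            for page in paginator.paginate():
                for resource in page['ResourceTagMappingList']:
                    asset = {
                        'arn': resource['ResourceARN'],
                        'type': self.parse_resource_type(resource['ResourceARN']),
                        'region': region,
                        'account_id': account_id,
                        'tags': {tag['Key']: tag['Value'] for tag in resource.get('Tags', [])},
                        'discovered_at': datetime.now(timezone.utc),
                    }
                    assets.append(asset)
        return assets

    @staticmethod
    def parse_resource_type(arn: str) -> str:
        # ARN layout: arn:partition:service:region:account-id:resource,
        # so the service name (ec2, s3, rds, ...) is the third field
        return arn.split(':')[2]

    def enrich_with_cost_data(self, assets: list[dict]) -> list[dict]:
        """Fetch last month's cost from AWS Cost Explorer"""
        ce_client = boto3.client('ce', region_name='us-east-1')
        # Previous full calendar month; Cost Explorer treats End as exclusive
        end = date.today().replace(day=1)
        start = (end - timedelta(days=1)).replace(day=1)
        for asset in assets:
            # Assumes resources carry a 'ResourceId' tag that has been
            # activated as a cost allocation tag; skip those that don't
            resource_id = asset['tags'].get('ResourceId')
            if resource_id is None:
                continue
            cost_data = ce_client.get_cost_and_usage(
                TimePeriod={'Start': start.isoformat(), 'End': end.isoformat()},
                Granularity='MONTHLY',
                Filter={'Tags': {'Key': 'ResourceId', 'Values': [resource_id]}},
                Metrics=['UnblendedCost'],
            )
            total = cost_data['ResultsByTime'][0]['Total']['UnblendedCost']
            asset['monthly_cost'] = float(total['Amount'])
        return assets
```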
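Cost enrichment issues one Cost Explorer request per asset, and Cost Explorer API calls are billed per request. A minimal sketch of a single grouped request instead, under the same assumption of a `ResourceId` cost allocation tag. `previous_month`, `cost_request` and `costs_by_tag_value` are hypothetical helpers, and the parser relies on Cost Explorer reporting tag groups as `TagKey$TagValue` keys (an empty value meaning untagged spend).

```python
from datetime import date, timedelta
from typing import Dict, Tuple


def previous_month(today: date) -> Tuple[str, str]:
    """(start, end) ISO dates for the previous full month; end is exclusive."""
    end = today.replace(day=1)
    start = (end - timedelta(days=1)).replace(day=1)
    return start.isoformat(), end.isoformat()


def cost_request(today: date, tag_key: str = "ResourceId") -> dict:
    """Keyword arguments for one get_cost_and_usage call grouped by a tag."""
    start, end = previous_month(today)
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }


def costs_by_tag_value(response: dict) -> Dict[str, float]:
    """Sum UnblendedCost per tag value from a grouped response."""
    costs: Dict[str, float] = {}
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            # Keys look like "ResourceId$i-0abc"; keep the part after "$"
            _, _, value = group["Keys"][0].partition("$")
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            costs[value] = costs.get(value, 0.0) + amount
    return costs
```

Typical use would be `costs_by_tag_value(boto3.client("ce").get_cost_and_usage(**cost_request(date.today())))`, then a dictionary lookup per asset. Large inventories return paged results via `NextPageToken`, which this sketch does not follow.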
Asset API Endpoints
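```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

from fastapi import Depends, FastAPI, HTTPException, Query
from sqlalchemy.orm import Session

# Hypothetical project modules: the Asset ORM model, its Pydantic schema,
# a session dependency, and a Terraform plan wrapper.
from asset_manager.models import Asset
from asset_manager.schemas import AssetOut
from asset_manager.db import get_db
from asset_manager import terraform

app = FastAPI()


# Plain def handlers: FastAPI runs them in a threadpool, so the
# blocking SQLAlchemy session does not stall the event loop.
@app.get("/assets", response_model=List[AssetOut])
def list_assets(
    asset_type: Optional[str] = Query(None, alias="type"),  # avoid shadowing type()
    owner: Optional[str] = None,
    region: Optional[str] = None,
    unused: Optional[bool] = None,
    db: Session = Depends(get_db),
):
    """List assets with optional filters"""
    query = db.query(Asset)
    if asset_type:
        query = query.filter(Asset.type == asset_type)
    if owner:
        # tags is a PostgreSQL JSONB column; .astext compares the value as text
        query = query.filter(Asset.tags['Owner'].astext == owner)
    if region:
        query = query.filter(Asset.region == region)
    if unused:
        cutoff = datetime.now(timezone.utc) - timedelta(days=30)
        query = query.filter(Asset.last_accessed_at < cutoff)
    return query.all()


@app.post("/assets/{asset_id}/retire")
def retire_asset(asset_id: str, reason: str, db: Session = Depends(get_db)):
    """Mark asset for retirement (generates Terraform destroy plan)"""
    asset = db.get(Asset, asset_id)
    if asset is None:
        raise HTTPException(status_code=404, detail="Asset not found")
    if not asset.terraform_address:
        # Unmanaged resources have no Terraform address to target
        raise HTTPException(status_code=409, detail="Asset is not managed by Terraform")
    # Create retirement plan
    plan = terraform.plan_destroy(resources=[asset.terraform_address])
    # Update asset status
    asset.status = "pending_retirement"
    asset.retirement_reason = reason
    db.commit()
    return {"plan": plan, "asset_id": asset_id, "status": asset.status}
```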
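One subtlety in the `unused` filter: in SQL, `last_accessed_at < cutoff` is never true when `last_accessed_at` is NULL, so resources that were never observed in use drop out of the idle list. A minimal in-memory sketch of a rule that keeps them, using a hypothetical `AssetRecord` stand-in for the ORM model:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class AssetRecord:
    """Hypothetical in-memory stand-in for an Asset row."""
    arn: str
    last_accessed_at: Optional[datetime]  # UTC-aware, or None if never seen in use


def find_unused(assets: Iterable[AssetRecord], now: datetime, idle_days: int = 30,
                include_never_seen: bool = True) -> List[AssetRecord]:
    """Return assets idle for more than idle_days, optionally including never-seen ones."""
    cutoff = now - timedelta(days=idle_days)
    result = []
    for asset in assets:
        if asset.last_accessed_at is None:
            # SQL's `NULL < cutoff` would silently drop these
            if include_never_seen:
                result.append(asset)
        elif asset.last_accessed_at < cutoff:
            result.append(asset)
    return result
```

In the SQLAlchemy query, the equivalent condition would be `or_(Asset.last_accessed_at.is_(None), Asset.last_accessed_at < cutoff)`.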
Terraform Integration
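```python
import json

from sqlalchemy.orm import Session

from asset_manager.models import Asset  # hypothetical module path


class TerraformSync:
    def __init__(self, db: Session):
        self.db = db

    def sync_with_state(self, state_file: str) -> dict:
        """Compare Terraform state with discovered assets"""
        with open(state_file) as f:
            tf_state = json.load(f)
        # State v4 nests cloud IDs under resources[].instances[].attributes;
        # data sources (mode == "data") are not managed and are skipped.
        tf_ids = set()
        for r in tf_state.get('resources', []):
            if r.get('mode') != 'managed':
                continue
            for inst in r.get('instances', []):
                attrs = inst.get('attributes', {})
                # Prefer the ARN so keys line up with ARN-keyed discovery
                key = attrs.get('arn') or attrs.get('id')
                if key:
                    tf_ids.add(key)
        discovered = self.db.query(Asset).all()
        # cloud_id is assumed to hold the ARN recorded during discovery
        discovered_ids = {asset.cloud_id for asset in discovered}
        # Find drift: resources in TF but not discovered
        missing = tf_ids - discovered_ids
        # Find drift: discovered but not in TF (manual changes)
        unmanaged = discovered_ids - tf_ids
        return {
            'managed': len(tf_ids),
            'discovered': len(discovered),
            'missing': sorted(missing),
            'unmanaged': sorted(unmanaged),
        }
```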
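The retire endpoint calls a `terraform.plan_destroy` helper that the write-up does not show. One possible shape, assuming it shells out to the Terraform CLI in the directory that holds the configuration; the function names, the `workdir` parameter and the plan file name are hypothetical.

```python
import subprocess
from typing import List, Sequence


def plan_destroy_command(resources: Sequence[str], plan_file: str = "retire.tfplan") -> List[str]:
    """Build a `terraform plan -destroy` command limited to the given addresses."""
    if not resources:
        raise ValueError("at least one resource address is required")
    cmd = ["terraform", "plan", "-destroy", "-input=false", "-no-color", f"-out={plan_file}"]
    # -target limits the plan to these addresses; a destroy plan also
    # includes anything that depends on them
    cmd += [f"-target={address}" for address in resources]
    return cmd


def plan_destroy(resources: Sequence[str], workdir: str = ".", plan_file: str = "retire.tfplan") -> str:
    """Run the targeted destroy plan in workdir and return its text output."""
    result = subprocess.run(
        plan_destroy_command(resources, plan_file),
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if terraform exits non-zero
    )
    return result.stdout
```

Saving the plan to a file lets a reviewer apply exactly what was approved with `terraform apply retire.tfplan`, rather than re-planning at apply time.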
Features
1. Multi-Cloud Discovery
- AWS: EC2, RDS, S3, Lambda, ECS, etc.
- GCP: Compute, Storage, Cloud Functions (planned)
- Azure: VMs, Databases (planned)
2. Cost Tracking
- Monthly cost per resource
- Aggregate by team/project/environment
- Identify cost anomalies
- Predict upcoming spend
3. Ownership Tracking
- Tag-based ownership mapping
- Team assignment and alerts
- Slack notifications for high-cost resources
4. Unused Resource Detection
- Last accessed time tracking
- Idle instance identification
- Automated recommendations for cleanup
5. Compliance Reporting
- Security group audits
- Untagged resource reports
- Public access detection
- Encryption status
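Cost aggregation (feature 2) and untagged-resource reports (feature 5) both reduce to grouping discovered assets by tag. A minimal sketch over the asset dicts produced by discovery, assuming `monthly_cost` has already been filled in by cost enrichment. `cost_by_tag` and `untagged_report` are hypothetical names; rolling up by project or environment would just mean passing a different tag key.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def cost_by_tag(assets: Iterable[dict], tag_key: str = "Owner") -> Dict[str, float]:
    """Sum monthly_cost per value of tag_key; assets without the tag go under "(untagged)"."""
    totals: Dict[str, float] = defaultdict(float)
    for asset in assets:
        group = asset.get("tags", {}).get(tag_key, "(untagged)")
        totals[group] += asset.get("monthly_cost", 0.0)
    return dict(totals)


def untagged_report(assets: Iterable[dict], required: Sequence[str] = ("Owner",)) -> List[str]:
    """ARNs of assets missing any of the required tag keys."""
    return [
        asset["arn"]
        for asset in assets
        if any(key not in asset.get("tags", {}) for key in required)
    ]
```

Keeping untagged spend as its own bucket makes the size of the attribution gap visible instead of silently dropping it from team totals.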
Use Cases
Cost Optimization
Scenario: Find all EC2 instances unused in the last 30 days
```shell
curl "https://api.example.com/assets?type=ec2&unused=true"
```
Result: Identified 15 idle instances costing $2,400/month → scheduled for termination
Security Audit
Scenario: List all publicly accessible S3 buckets
```python
assets = api.list_assets(
    type="s3",
    filter=lambda a: a.config.get("public_access") is True
)
```
Result: Found 3 public buckets, alerted owners, restricted access
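For the public-bucket audit, one signal is the bucket's S3 Block Public Access configuration, which boto3 returns from `get_public_access_block`. A conservative sketch that flags any bucket where not all four settings are enabled; it is a triage heuristic, not a full evaluation of bucket policies, ACLs or account-level settings, and the function name is hypothetical.

```python
from typing import Optional

# The four settings in an S3 PublicAccessBlockConfiguration
PUBLIC_ACCESS_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def is_public_access_blocked(config: Optional[dict]) -> bool:
    """True only when every Block Public Access setting is enabled.

    `config` is the PublicAccessBlockConfiguration dict; None means the
    bucket has no configuration, which is treated as not blocked.
    """
    if not config:
        return False
    return all(config.get(flag, False) for flag in PUBLIC_ACCESS_FLAGS)
```

When a bucket has no configuration at all, `get_public_access_block` raises a `ClientError` with code `NoSuchPublicAccessBlockConfiguration`; mapping that to `None` marks the bucket for review.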
Terraform Drift Detection
Scenario: Detect manual changes not tracked in Terraform
```shell
asset-manager sync --state-file terraform.tfstate
```
Result: Found 8 manually created resources → imported into Terraform or deleted
Results & Impact
Cost Savings:
- 💰 $15K/month saved by identifying and removing unused resources
- 📊 40% visibility improvement across cloud infrastructure
- ⏱️ 80% reduction in time spent tracking assets manually
Operational Excellence:
- ✅ Complete inventory of all cloud resources
- 🔍 Real-time drift detection between code and reality
- 🛡️ Automated compliance reporting
- 📈 Cost trending and forecasting
Team Productivity:
- Before: 4 hours/week manually tracking spreadsheets
- After: 15 minutes/week reviewing automated reports
- Developer Satisfaction: Significantly improved
Technical Stack
Backend:
- FastAPI for REST APIs
- SQLAlchemy ORM + PostgreSQL
- Celery for async discovery jobs
- Redis for task queue
Cloud SDKs:
- Boto3 (AWS SDK for Python)
- Google Cloud Python Client
- Azure SDK (planned)
Infrastructure:
- Docker for containerization
- Kubernetes for orchestration
- Terraform for IaC
- GitHub Actions for CI/CD
Monitoring:
- Prometheus metrics
- Grafana dashboards
- PagerDuty for alerts
Challenges & Solutions
Challenge 1: Scale
Problem: Scanning 1000+ resources across 20 regions takes hours
Solution:
- Implemented parallel scanning with rate limiting
- Cached results with incremental updates
- Added region-based batching
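The parallel, rate-limited scan could look like the sketch below, assuming one worker per region and a caller-supplied `scan_region(region)` function (for example, a wrapper around the tagging API paginator). `RateLimiter` and `scan_regions` are hypothetical names, and this limiter spaces out region starts rather than every individual API call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Allows at most `rate` wait() returns per second, shared across threads."""

    def __init__(self, rate: float):
        self.interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_time = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free slot under the lock, then sleep outside it
        with self.lock:
            now = time.monotonic()
            slot = max(now, self.next_time)
            self.next_time = slot + self.interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def scan_regions(regions: Iterable[str], scan_region: Callable[[str], T],
                 max_workers: int = 8, calls_per_second: float = 5.0) -> Dict[str, T]:
    """Run scan_region for every region in parallel; returns {region: result}."""
    limiter = RateLimiter(calls_per_second)

    def run(region: str):
        limiter.wait()
        return region, scan_region(region)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, regions))
```

Throttling each page request as well would mean passing the limiter into `scan_region` and calling `wait()` before every API call.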
Challenge 2: Cost Attribution
Problem: AWS Cost Explorer data can lag by up to 24 hours
Solution:
- Use resource tags for instant attribution
- Fallback to last known costs
- Daily sync for historical accuracy
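The fallback to last known costs could be kept in a small cache that the daily sync refreshes. A sketch with hypothetical names; the seven-day cut-off after which a stale figure is reported as unknown is an assumption, not something the project specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple


@dataclass
class CostSnapshot:
    amount: float
    as_of: datetime  # when this figure was recorded


class CostCache:
    """Keeps the last known monthly cost per resource (hypothetical helper)."""

    def __init__(self, max_age: timedelta = timedelta(days=7)):
        self.max_age = max_age  # assumption: older figures count as unknown
        self._last: Dict[str, CostSnapshot] = {}

    def record(self, resource_id: str, amount: float, as_of: datetime) -> None:
        self._last[resource_id] = CostSnapshot(amount, as_of)

    def resolve(self, resource_id: str, fresh: Optional[float],
                now: datetime) -> Tuple[Optional[float], str]:
        """Prefer fresh Cost Explorer data, else a recent last known value."""
        if fresh is not None:
            self.record(resource_id, fresh, now)
            return fresh, "cost_explorer"
        snap = self._last.get(resource_id)
        if snap is not None and now - snap.as_of <= self.max_age:
            return snap.amount, "last_known"
        return None, "unknown"
```

Returning the source label alongside the number lets reports mark figures that came from the cache rather than from Cost Explorer.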
Challenge 3: Terraform State
Problem: Multiple state files across teams
Solution:
- Terraform Cloud integration for remote state
- State aggregation service
- Per-team state file discovery
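State aggregation across team state files can be sketched as one index from cloud ID to the state files and resource addresses that claim it. This assumes Terraform's JSON state format version 4 and that the states have already been fetched (for example from remote backends). The function names are hypothetical, and addresses omit `count`/`for_each` index keys for brevity.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def managed_ids(state: dict) -> Dict[str, str]:
    """Map cloud ID (ARN if present, else id) to resource address for one v4 state."""
    ids: Dict[str, str] = {}
    for r in state.get("resources", []):
        if r.get("mode") != "managed":
            continue  # data sources are read-only lookups, not owned resources
        prefix = f"{r['module']}." if r.get("module") else ""
        for inst in r.get("instances", []):
            attrs = inst.get("attributes", {})
            key = attrs.get("arn") or attrs.get("id")
            if key:
                ids[key] = f"{prefix}{r['type']}.{r['name']}"
    return ids


def aggregate_states(states: Dict[str, dict]) -> Dict[str, List[Tuple[str, str]]]:
    """Merge states keyed by source label into {cloud_id: [(source, address), ...]}."""
    index: Dict[str, List[Tuple[str, str]]] = {}
    for source, state in states.items():
        for cloud_id, address in managed_ids(state).items():
            index.setdefault(cloud_id, []).append((source, address))
    return index


def load_states(paths: Iterable[str]) -> Dict[str, dict]:
    """Read local state files, keyed by path."""
    return {str(p): json.loads(Path(p).read_text()) for p in paths}
```

Entries with more than one claimant, `{k: v for k, v in index.items() if len(v) > 1}`, point to the same resource being managed from two state files.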
Key Learnings
- Tag Everything: Tagging is critical for ownership and cost tracking
- Automate Discovery: Manual tracking doesn’t scale
- Real-time Alerts: Catch issues before they become expensive
- API-First: Programmatic access > manual dashboards
- Cost Allocation: Show teams their spend → drives accountability
Future Roadmap
- Multi-cloud support (GCP, Azure)
- ML-based cost forecasting
- Automated remediation (auto-stop idle instances)
- Integration with CMDB systems
- Resource lifecycle policies
- Slack bot for queries (“Show me my team’s resources”)
Repository: github.com/vayux/asset-manager
Tech Stack: Python · FastAPI · PostgreSQL · AWS · Terraform · Docker · Kubernetes
VayuX Technologies: Infrastructure automation and observability tools