I like the paradigm of breaking your whole infrastructure into microservices, so I thought it would be fun to create an app using Python and go fully serverless on AWS.
Check out the full code here --> https://github.com/bobocuillere/Serverless-AWS-Project
The “serverless” approach lets you concentrate on what truly matters: creating the application logic and delivering features. With AWS’s serverless services, you get automatic scaling, high availability, and a pay-as-you-go billing model, all while AWS handles the heavy lifting behind the scenes.
For my survey web application—built entirely from scratch—I wanted an environment where I could iterate quickly and scale effortlessly.
Achieving this meant two things:
By the end of this article, you’ll see how these AWS components fit together to form a coherent, secure, and scalable serverless application.
The entire goal is to have a browser-based survey frontend that never talks directly to a server you manage. Instead, it relies on AWS-managed services and logic running in AWS Lambda functions—both of which scale and operate without you ever provisioning a single VM.
We’ll look at the full request flow, authentication steps, data storage logic, and how secure isolation is maintained. We’ll also consider how each piece is wired together with Infrastructure as Code (IaC) and how the environment remains consistent across deployments.
Amazon Cognito (User Authentication & Identity Management)
The frontend attaches the user's JWT to each API call in the Authorization header. Backend Lambdas verify the token's signature against Cognito's JWKS (JSON Web Key Set) endpoints, ensuring requests are from authenticated users and haven't been tampered with.
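In Python, that verification might look roughly like the sketch below. This is an illustration only: the region, user pool ID, and app client ID are placeholders, and it assumes an ID token (Cognito access tokens carry the client ID in a different claim).
# Illustrative sketch only: region, pool ID, and client ID are placeholders.
import jwt  # PyJWT, installed with its optional "crypto" dependency

REGION = "eu-central-1"                  # assumed region
USER_POOL_ID = "eu-central-1_EXAMPLE"    # placeholder user pool ID
APP_CLIENT_ID = "example-app-client-id"  # placeholder app client ID

ISSUER = f"https://cognito-idp.{REGION}.amazonaws.com/{USER_POOL_ID}"
jwks_client = jwt.PyJWKClient(f"{ISSUER}/.well-known/jwks.json")

def verify_id_token(token: str) -> dict:
    # Fetch the public key matching the token's "kid" header, then check
    # the signature, expiry, issuer, and audience in one call.
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience=APP_CLIENT_ID,
        issuer=ISSUER,
    )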
Amazon S3 (Frontend Hosting & Private Assets)
Public S3 Bucket (Frontend):
Private S3 Bucket (Protected Assets): only an authorized backend Lambda, granted s3:GetObject on this bucket, can retrieve these files. The browser never sees a direct link to these private files. For example, if a user wants to retrieve a special survey template, the frontend Lambda sends an API request. If authorized, it fetches the needed object from the private bucket and returns the data to the user. The user never sees a public URL to that file, preserving confidentiality.
Amazon API Gateway
POST /flask/login for authenticating the user and retrieving a JWT.
POST /survey/create for creating a new survey.
GET /survey/responses?survey_id=XYZ to fetch existing responses.
AWS Lambda Functions (Backend and Frontend Logic in Python)
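To make the Lambda side concrete, here is a hedged sketch of what a handler behind POST /survey/create could look like. The table name, environment variables, and attribute names are assumptions for illustration, not the project's actual code.
# Hypothetical handler for POST /survey/create; resource names are placeholders.
import json
import os
import uuid

import boto3

dynamodb = boto3.resource("dynamodb")
sns = boto3.client("sns")
table = dynamodb.Table(os.environ.get("SURVEYS_TABLE", "surveys"))

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    survey = {
        "survey_id": str(uuid.uuid4()),
        "title": body.get("title", "Untitled survey"),
        "questions": body.get("questions", []),
    }
    table.put_item(Item=survey)  # persist the new survey

    # Optionally notify subscribers through SNS that a survey was created.
    topic_arn = os.environ.get("SNS_TOPIC_ARN")
    if topic_arn:
        sns.publish(TopicArn=topic_arn, Message=f"New survey: {survey['survey_id']}")

    return {
        "statusCode": 201,
        "body": json.dumps({"survey_id": survey["survey_id"]}),
    }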
Amazon DynamoDB (Primary Data Store)
Amazon SNS (Event Notifications)
Let’s walk through the entire flow of the survey application’s architecture from start to finish, showing exactly how each part interacts with the others. This step-by-step approach will help clarify the roles of the frontend Lambda, backend Lambda, and all the AWS services in between.
1 - User Opens the Survey App:
2 - User Authenticates with Cognito:
3 - User Creates or Manages a Survey:
4 - Backend Lambda Interacts with DynamoDB:
5 - User Fetches Survey Responses (Another Example Flow):
6 - Working with Private Assets (On the private bucket):
7 - Notifications via SNS:
Throughout development, I encountered a few issues. Here are some highlights and how I solved them:
CORS Errors on the Frontend: the API Gateway responses had to include Access-Control-Allow-Origin and the other CORS headers. This involved setting method.response.header.Access-Control-Allow-Origin to '*' and ensuring the OPTIONS method was configured. Re-applying Terraform resolved the issue.
Invalid JWT Token Errors in Lambda: updating the verification code to read the token's kid, retrieve the right public key, and verify signatures fixed the problem.
DynamoDB Throttling Under Heavy Load:
Permission Denied for Private S3 Bucket Access: the fix was granting s3:GetObject on the private bucket's ARN to the Lambda execution role. After a quick apply, the function could access files properly.
Building a serverless survey application on AWS—and automating every aspect of its infrastructure with Terraform—has shown just how far cloud computing and DevOps practices have come. Instead of setting up servers, manually creating users in an identity service, or worrying about scaling databases, we focused on writing clear configuration files and straightforward application code.
In essence, this approach transforms the way you build and run applications. It takes you from a world where operations can be slow, error-prone, and costly, to one where agility, reliability, and cost-efficiency are the natural byproducts of well-chosen architectural patterns and tools. Whether you’re working on a small hobby project, a startup’s MVP, or a complex enterprise system, the principles and workflows described here will help you embrace modern cloud-native development with confidence.
Welcome back to my Kubernetes Networking series! In the first article, we covered the fundamentals of Kubernetes networking, including the basic components and the overall networking model. Now, we'll take a look into how Pods and Services communicate within a Kubernetes cluster.
Series Outline:
Let's briefly recap the key principles of the Kubernetes networking model:
These principles simplify application development by abstracting away the underlying network complexities.
When a Pod is created, it is assigned an IP address that allows it to communicate with other network entities in the cluster.
Example:
If your cluster Pod CIDR is 10.244.0.0/16
, Node 1 might be assigned 10.244.1.0/24
, and Node 2 10.244.2.0/24
. Pods on Node 1 get IPs like 10.244.1.5
, 10.244.1.6
, and so on.
Containers within the same Pod share a network namespace and can talk to each other over localhost.
Pods communicate with each other using their IP addresses over the cluster network.
Same Node Communication: Pods on the same Node communicate through the Node's local bridge (e.g., cbr0).
Cross-Node Communication:
Key Points:
Services provide stable endpoints to access a set of Pods.
apiVersion: v1
kind: Service
metadata:
name: my-service
spec:
selector:
app: my-app
ports:
- protocol: TCP
port: 80
targetPort: 8080
The Service gets a stable virtual IP (for example, 10.96.0.1), and other Pods in the cluster can reach it simply by using my-service as the hostname.
apiVersion: v1
kind: Service
metadata:
name: my-nodeport-service
spec:
type: NodePort
selector:
app: my-app
ports:
- protocol: TCP
port: 80
targetPort: 8080
nodePort: 31000
The application is reachable at <NodeIP>:31000 from outside the cluster.
apiVersion: v1
kind: Service
metadata:
name: my-loadbalancer-service
spec:
type: LoadBalancer
selector:
app: my-app
ports:
- protocol: TCP
port: 80
targetPort: 8080
A headless service is a type of Kubernetes Service that does not allocate a ClusterIP. Instead, it allows direct access to the individual Pods' IPs. This is useful for applications that require direct Pod access, such as databases or stateful applications where each Pod needs to be addressed individually.
You create one by setting clusterIP: None in the Service definition.
apiVersion: v1
kind: Service
metadata:
name: my-headless-service
spec:
clusterIP: None
selector:
app: my-db
ports:
- port: 5432
targetPort: 5432
DNS queries for service-name.namespace.svc.cluster.local return the IPs of the individual Pods rather than a single ClusterIP.
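As a quick illustration, the small Python sketch below resolves the headless Service defined above; it assumes you run it inside a Pod in the same namespace, so cluster DNS can answer the query.
# Run inside a Pod in the same namespace as my-headless-service.
import socket

# Each resolved address corresponds to one backing Pod, because a headless
# Service has no ClusterIP standing in front of the Pods.
infos = socket.getaddrinfo("my-headless-service", 5432, proto=socket.IPPROTO_TCP)
for ip in sorted({info[4][0] for info in infos}):
    print(ip)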
kube-proxy is a network proxy that runs on each Node and reflects the Services defined in Kubernetes.
It runs on each node of a Kubernetes cluster, watches Service and Endpoints (and EndpointSlices) objects, and updates the routing rules on its host node accordingly so that traffic sent to a Service reaches the right Pods.
Userspace Mode (Legacy):
iptables Mode (Default): uses iptables rules to route traffic directly in kernel space.
IPVS Mode:
Session affinity ensures that requests from a client are directed to the same Pod.
You enable it by setting sessionAffinity: ClientIP in the Service spec; the session timeout is configured via service.spec.sessionAffinityConfig.clientIP.timeoutSeconds.
apiVersion: v1
kind: Service
metadata:
name: my-affinity-service
spec:
selector:
app: my-app
ports:
- protocol: TCP
port: 80
targetPort: 8080
sessionAffinity: ClientIP
sessionAffinityConfig:
clientIP:
timeoutSeconds: 10800
Let's create a Deployment and expose it with a Service.
Step 1: Create a Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.17
ports:
- containerPort: 80
Step 2: Expose the Deployment
apiVersion: v1
kind: Service
metadata:
name: nginx-service
spec:
type: ClusterIP
selector:
app: nginx
ports:
- protocol: TCP
port: 80
targetPort: 80
Accessing the Service:
curl http://nginx-service
curl http://nginx-service.default.svc.cluster.local
Headless Services can be used in combination with StatefulSets.
apiVersion: v1
kind: Service
metadata:
name: mysql
spec:
clusterIP: None
selector:
app: mysql
ports:
- port: 3306
targetPort: 3306
StatefulSet Pods get stable, per-Pod DNS names, such as:
mysql-0.mysql.default.svc.cluster.local
Understanding Pod and Service networking at a deeper level helps you design and troubleshoot applications more effectively in Kubernetes.
Key Takeaways:
Pod Networking:
Service Types:
Traffic Routing: kube-proxy handles traffic routing using iptables or IPVS.
Service Discovery:
You're now ready to build robust, scalable applications on Kubernetes :).
In the Next Article:
We'll explore Network Security with Policies and Ingress Controllers, where we'll look at securing your cluster's network communication and managing external access to your services.
Welcome to this new series of articles on Kubernetes networking. My goal is to give you everything you need so you never feel lost when it comes to networking in Kubernetes.
We’ll explore the basics of Kubernetes networking, including the networking model, core components, and common networking solutions. This will give you a solid foundation for understanding the more advanced topics we'll cover in later articles.
This will be a 5-part series:
Networking in Kubernetes revolves around a unique and robust model that ensures efficient communication between various components. Understanding the key aspects of this model is crucial to deploying and operating Kubernetes effectively.
Kubernetes networking is essential for application communication within a cluster. The networking model is designed to solve several challenges that arise when deploying and managing containerized applications:
To maintain seamless communication, Kubernetes imposes several key requirements on the network infrastructure:
1. Pod-to-Pod Communication:
Kubernetes networking encompasses several layers of communication, including:
We've explored the Kubernetes networking model, including an overview of how it works, the core requirements, and how networking functions at different layers within the cluster. Understanding these concepts is essential for anyone looking to deploy and manage applications on Kubernetes.
In this section, we'll talk in more detail about the fundamental networking components of Kubernetes. We'll cover Pods, Services, Network Policies, and DNS in Kubernetes, all of which play a key role in Kubernetes networking.
Pods are the smallest deployable units in Kubernetes, typically representing one or more containers that share the same context. Let’s go into the details:
Pods are designed to encapsulate application containers, storage resources, a unique network identity (IP address), and other configurations:
How do Pods communicate with each other?
Services are an abstraction that defines a logical set of Pods and a policy by which to access them. They are a crucial part of Kubernetes networking because they provide stable endpoints for applications and manage load balancing:
Network Policies are a Kubernetes resource that controls the traffic allowed to and from Pods:
DNS is a fundamental part of Kubernetes networking, enabling name resolution for Pods and Services:
service-name.namespace.svc.cluster.local
In this section, we've covered the basic networking components in Kubernetes, including Pods, Services, Network Policies, and DNS. Each of these components plays a crucial role in Kubernetes networking, enabling communication within the cluster and with external networks.
Kubernetes networking relies on various tools and plugins to handle the complexities of network communication within and across clusters. In this section, we'll cover some of the most common networking solutions available in Kubernetes and how to choose the right Container Network Interface (CNI) for your cluster.
Kubernetes networking is modular, allowing different solutions to be plugged in based on your needs. These solutions implement the CNI specification, providing networking functionality for Pods and Services. Here, we will explore some of the most common and widely used solutions:
Each of these solutions has its strengths and weaknesses, catering to different needs and use cases.
Choosing the right CNI for your Kubernetes cluster depends on several factors, including the size of your cluster, your networking requirements, and the features you need. Here’s how you can decide which CNI is right for you:
In this section, we've explored the common networking solutions available for Kubernetes and how to choose the right CNI for your cluster. Each solution caters to specific use cases, ranging from simple networking to advanced security and performance.
The choice of CNI depends on factors like cluster size, networking features, security, performance, complexity, and integration.
The goal of this Kubernetes Networking series is to equip you with a comprehensive understanding of how networking functions within Kubernetes. This first article provided a foundational understanding of the Kubernetes networking model, core components, and essential networking tools.
Key Takeaways:
Monitoring is the continuous observation of system metrics, logs, and operations to ensure everything functions as expected. Effective monitoring can preemptively alert you to potential issues before they escalate into major problems. It's about gaining visibility into your IT environment's performance, availability, and overall health, enabling you to make informed decisions.
Together, Prometheus and Grafana form a powerful duo. Prometheus collects and stores the data, while Grafana brings that data to life through visualization.
In this article, you'll learn about their basics and advanced features, how they complement each other, and best practices for using them.
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It excels in gathering numerical data over time, making it ideal for monitoring the performance of systems and applications. Its philosophy is centered on reliability and simplicity.
Prometheus collects and stores its metrics as time series data, metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.
Prometheus excels in environments where you need to track the performance and health of IT systems and applications. It's particularly well-suited for:
Real-life example: Consider a scenario where you have a Kubernetes-based microservices architecture. Prometheus can dynamically discover new service instances, collect metrics, and help you visualize the overall health of your system. If a service goes down, Prometheus can still function independently, allowing you to diagnose issues even if parts of your infrastructure are compromised.
However, Prometheus might not be the best fit when:
Real-life example: If you're running an e-commerce platform and need to bill customers for each API request, relying on Prometheus alone might lead to inaccuracies because it's designed to monitor trends and patterns, not to track individual transactions with 100% precision. In this case, you'd want a system that logs each transaction in detail for billing, while still using Prometheus for overall system monitoring and alerting.
Prometheus is particularly good when you need reliability and can tolerate slight imprecision in favor of overall trends and diagnostics.
Prometheus includes several components that work together to provide a comprehensive monitoring solution:
Prometheus Server: The core component where data retrieval, storage, and processing occur. It consists of:
Pushgateway: For supporting short-lived jobs that cannot be scraped, the Pushgateway acts as an intermediary, allowing these ephemeral jobs to push metrics. The Prometheus server then scrapes the aggregated data from the Pushgateway.
Jobs/Exporters: These are external entities or agents that expose the metrics of your target systems (e.g., databases, servers, applications) in a format that Prometheus can retrieve. They are either part of the target system or stand-alone exporters that translate existing metrics into the appropriate format.
Service Discovery: Prometheus supports automatic discovery of targets in dynamic environments like Kubernetes, as well as static configuration, which simplifies the management of target endpoints that Prometheus needs to monitor.
Alertmanager: Handles the alerts sent by the Prometheus server. It manages the routing, deduplication, grouping, and silencing of alert notifications. It can notify end-users through various methods, such as email, PagerDuty, webhooks, etc.
Prometheus Web UI and Grafana: The Web UI is built into the Prometheus server and provides basic visualizations and a way to execute PromQL queries directly. Grafana is a more advanced visualization tool that connects to Prometheus as a data source and allows for the creation of rich dashboards.
API Clients: These are the tools or libraries that can interact with the Prometheus HTTP API for further processing, custom visualization, or integration with other systems.
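As a hedged example of such an API client, the sketch below queries Prometheus's HTTP API with the requests library; the URL assumes a locally reachable server.
# Illustrative API client; adjust the URL to your Prometheus server.
import requests

PROMETHEUS_URL = "http://localhost:9090"

def instant_query(expr: str) -> list:
    # /api/v1/query returns the current value of the expression.
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": expr}, timeout=10)
    resp.raise_for_status()
    payload = resp.json()
    if payload.get("status") != "success":
        raise RuntimeError(f"Query failed: {payload}")
    return payload["data"]["result"]

if __name__ == "__main__":
    for series in instant_query("up"):
        print(series["metric"].get("instance"), series["value"][1])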
Prometheus is designed with a set of core features that make it an efficient tool for monitoring and alerting. These features are centered around a multi-dimensional data model, a powerful query language, and a flexible data collection approach.
These core features, when leveraged together, provide a powerful platform for monitoring at scale, capable of handling the complex and dynamic nature of modern IT infrastructure.
Instrumenting your applications is about embedding monitoring code within them so that Prometheus can collect relevant metrics. It's like giving your applications a voice, allowing them to report on their health and behavior.
To instrument an application:
Expose an HTTP endpoint, typically /metrics, which is a page that displays metrics in a format Prometheus understands.
Assume you have a Python web application and want to expose metrics for Prometheus to scrape. You would use the Prometheus Python client to define and expose a simple metric, like the number of requests received.
Here's an example using Flask:
from flask import Flask, Response
from prometheus_client import Counter, generate_latest
# Create a Flask application
app = Flask(__name__)
# Define a Prometheus counter metric
REQUEST_COUNTER = Counter('app_requests_total', 'Total number of requests')
@app.route('/')
def index():
# Increment the counter
REQUEST_COUNTER.inc()
return 'Hello, World!'
@app.route('/metrics')
def metrics():
# Expose the metrics
return Response(generate_latest(), mimetype='text/plain')
if __name__ == '__main__':
app.run(host='0.0.0.0')
This snippet shows a simple web server with two endpoints: the root (/
) that increments a counter every time it's accessed, and the /metrics
endpoint that exposes the metrics.
For Prometheus to scrape metrics from the instrumented application, you need to add the application as a target in Prometheus's configuration file (prometheus.yml
). Here's a simple example:
global:
scrape_interval: 15s # By default, scrape targets every 15 seconds.
scrape_configs:
- job_name: 'python_application'
static_configs:
- targets: ['localhost:5000']
This configuration tells Prometheus to scrape our Python application (which we're running locally on port 5000) every 15 seconds.
Service discovery in Prometheus automates the process of finding and monitoring targets. It ensures Prometheus always knows what to monitor.
Prometheus supports several service discovery mechanisms:
This means if a new instance of your application is up, Prometheus will automatically start monitoring it without manual intervention.
scrape_configs:
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
This code would go in your prometheus.yml
file and tells Prometheus to discover all pods in a Kubernetes cluster.
Alertmanager handles alerts sent by the Prometheus server and is responsible for deduplicating, grouping, and routing them to the correct receiver, such as email or PagerDuty.
Here's an example alertmanager.yml
configuration file for Alertmanager:
global:
resolve_timeout: 5m
route:
group_by: ['alertname', 'instance']
group_wait: 10s
group_interval: 10m
repeat_interval: 1h
receiver: 'email-notifications'
receivers:
- name: 'email-notifications'
email_configs:
- to: 'your-email@example.com'
from: 'alertmanager@example.com'
smarthost: 'smtp.example.com:587'
auth_username: 'alertmanager@example.com'
auth_identity: 'alertmanager@example.com'
auth_password: 'password'
This Alertmanager configuration sets up email notifications as the method of alerting. It groups alerts by the alertname and instance labels, waiting 10 seconds before sending the first notification for a group; further notifications follow once the group interval or repeat interval has passed.
By instrumenting your applications, leveraging service discovery, and configuring Alertmanager, Prometheus becomes a vigilant guardian of your infrastructure, always on the lookout for anomalies and equipped to notify you the moment something needs attention.
Grafana is an open-source analytics and interactive visualization web application. It provides charts, graphs, and alerts when connected to supported data sources, including Prometheus. Essentially, Grafana allows you to turn your time-series database data into beautiful graphs and visualizations.
Grafana is known for its powerful and elegant dashboards. It is feature-rich and widely used for its:
While Grafana is recognized for its dashboarding capabilities, it also offers a suite of advanced features that enable more detailed data analysis and manipulation. Here are some of these features :
Explore is an ad-hoc query workspace in Grafana, designed for iterative and interactive data exploration. It is particularly useful for:
Transformations in Grafana allow you to manipulate the data returned from a query before it's visualized. This feature is crucial when you want to:
The State Timeline panel is one of Grafana's visualizations that displays discrete state changes over time. This is beneficial for:
Grafana's alerting feature allows you to define alert rules for the visualizations. You can:
With templating and variables :
Creating dashboards that are both informative and clear is not easy. Here are some best practices:
Consistent naming conventions are crucial. They ensure that metrics are easily identifiable, understandable, and maintainable. For instance, a metric name like http_requests_total
is clear and indicates that it’s a counter metric tallying the total number of HTTP requests.
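Continuing the earlier Flask example, a small sketch of this convention with the Prometheus Python client might look like the following; the label names are illustrative choices, not a prescribed set.
# Naming convention in practice: unit and meaning in the name, dimensions in labels.
from prometheus_client import Counter

HTTP_REQUESTS_TOTAL = Counter(
    "http_requests_total",
    "Total number of HTTP requests",
    ["method", "path", "status"],
)

# Somewhere in the request-handling code:
HTTP_REQUESTS_TOTAL.labels(method="GET", path="/", status="200").inc()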
For efficient PromQL queries:
Functions like count_over_time can be resource-intensive. Use them judiciously.
For security in Prometheus and Grafana:
By integrating Prometheus with Grafana following these best practices, you can create a monitoring environment for your systems that is secure, efficient, and user-friendly. This combination can be a powerful asset in any infrastructure, enabling teams to detect and address issues proactively.
Grafana makes sense of all the numbers Prometheus collects by turning them into easy-to-read dashboards and graphs. This makes it easier for you to see what's going on and make smart decisions.
However, Prometheus isn't great for everything. If you need to track every tiny detail for things like billing, it's not the best choice. It's better for looking at overall trends and issues.
IAM is a crucial part of managing security in Amazon Web Services (AWS).
But what is it?
At its core, IAM is all about who can do what in your AWS account. Think of it as the gatekeeper. It lets you decide who is allowed to enter your AWS cloud space and what they can do once they're in. In AWS, “who” could be a person, a service, or even an application, and “what they can do” ranges from reading data from a database to launching a new virtual server.
In this guide, we'll explore the key aspects of IAM, from the basic concepts like users and permissions to more advanced features like role boundaries and trust policies. So, let's get started on this journey to master AWS IAM.
Let's break down some key IAM terms and concepts:
Let's use an example.
Note that in a standard IAM policy, “Authentication” is not a field since it's a process rather than a policy attribute.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::123456789012:user/Sophnel"
},
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::example-bucket/*"
}
]
}
In this policy:
Principal: "AWS": "arn:aws:iam::123456789012:user/Sophnel" (the IAM user this statement applies to).
Action: "s3:GetObject", which allows the principal to read objects from the specified S3 bucket.
Resource: "arn:aws:s3:::example-bucket/*", meaning the policy applies to all objects (/*) in the example-bucket S3 bucket.
Effect: "Allow", so the statement grants the permission rather than denying it.
Authentication: as noted above, this is a process rather than a policy field; the principal must already be authenticated before the policy is evaluated.
Optional and Required Fields
Remember, the structure and fields of an IAM policy can vary depending on its use (e.g., attached to a user/role or used as a resource-based policy). In user/role policies, the principal is usually the user/role itself and not explicitly stated. In resource-based policies (like those used in S3 buckets), the principal must be specified.
Understanding these terms helps to grasp how IAM works. It's all about managing who (principal) can do what (actions) with which items (resources) securely (authentication and authorization).
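If you want to sanity-check how a policy like the one above evaluates before attaching it to anything, IAM exposes a policy simulator through its API. The sketch below is illustrative: it drops the Principal element (identity-based policies simulated this way don't carry one) and assumes your credentials are allowed to call the simulator.
# Dry-run the example policy with the IAM policy simulator.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}

iam = boto3.client("iam")
result = iam.simulate_custom_policy(
    PolicyInputList=[json.dumps(policy)],
    ActionNames=["s3:GetObject", "s3:PutObject"],
    ResourceArns=["arn:aws:s3:::example-bucket/report.csv"],
)
for evaluation in result["EvaluationResults"]:
    # Expect "allowed" for GetObject and "implicitDeny" for PutObject.
    print(evaluation["EvalActionName"], "->", evaluation["EvalDecision"])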
In the following sections, we'll learn about each of these aspects and see how they come together to form the backbone of AWS security.
In this section, we will learn more about the core components of IAM.
Least Privilege Principle
This principle means giving someone only the permissions they need to do their job – nothing more. It’s like giving a key card that only opens the doors someone needs to access. It reduces the risk of someone accidentally (or intentionally) doing something harmful.
This foundational understanding of IAM will set the stage for exploring more advanced topics.
In this section, we go deeper into some advanced IAM concepts that play a critical role in managing access and security in AWS.
Service Roles are special types of IAM roles designed specifically for AWS services to interact with other services. They are like permissions given to AWS services to perform specific tasks on your behalf.
For example, you might have an AWS Lambda function that needs to access your S3 buckets . You create a service role for Lambda that gives it the necessary permissions to read and write to your S3 buckets.
These roles are crucial for automating tasks within AWS and for enabling different AWS services to work together seamlessly and securely.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "s3:*",
"Resource": "arn:aws:s3:::MyExampleBucket/*"
}
]
}
The role this policy is attached to (for example, MyLambdaFunction) will be allowed to perform any action on MyExampleBucket.
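For illustration, a Lambda function running under that service role could call S3 with boto3 and no explicit credentials, along these lines; the handler logic and object keys are hypothetical (and note that real bucket names must be lowercase, unlike the example name reused here).
# Hypothetical handler relying on the service role's s3:* permission.
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "MyExampleBucket"  # mirrors the policy example above

def lambda_handler(event, context):
    key = event.get("key", "example.json")

    # Lambda injects the service role's temporary credentials automatically.
    obj = s3.get_object(Bucket=BUCKET, Key=key)
    data = json.loads(obj["Body"].read())

    data["processed"] = True
    s3.put_object(
        Bucket=BUCKET,
        Key=f"processed/{key}",
        Body=json.dumps(data).encode("utf-8"),
    )
    return {"statusCode": 200, "body": json.dumps({"processed_key": f"processed/{key}"})}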
2. IAM Boundaries
IAM Boundaries are an advanced feature in AWS IAM that helps further tighten security. These boundaries are essentially guidelines or limits you set to control the maximum permissions that IAM roles and users can have.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::MyExampleBucket/*"
}
]
}
In this case, DevRole can only read from and write to MyExampleBucket, regardless of what other permissions it is granted.
3. Trust Policies
In AWS, a trust policy is attached to a role and defines which entities (like an AWS service or another AWS account) are allowed to assume the role. It's like saying, “This ID badge can only be used by these specific people or roles.”
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Here, EC2 instances can assume the MyEC2Function role.
4. IAM Policy with Conditions and Context Keys
IAM policy conditions are additional specifications you can include in a policy to control when it applies. Think of them like extra rules in a game that apply only under certain circumstances.
Context keys are the specific elements within a request that a condition checks against. They are the details that trigger the conditions.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::MyExampleBucket/*",
"Condition": {
"IpAddress": {"aws:SourceIp": "192.168.100.0/24"},
"DateGreaterThan": {"aws:CurrentTime": "2023-01-01T09:00:00Z"},
"DateLessThan": {"aws:CurrentTime": "2023-01-01T17:00:00Z"}
}
}
]
}
This policy grants the roles, users, or groups it is attached to access to MyExampleBucket, but only from the specified IP range and during the specified hours.
5. Cross-Account Access Management
In larger organizations, you often have multiple AWS accounts. Cross-account access management is about allowing users from one AWS account to access resources in another.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {"AWS": "arn:aws:iam::123456789012:root"},
"Action": "sts:AssumeRole"
}
]
}
In this example, users from Account B (ID 123456789012
) can assume the CrossAccountRole
in Account A.
6. IAM Role Chaining
Role chaining occurs when an IAM role assumes another role. It's like passing a baton in a relay race; one role hands over permissions to another.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::123456789012:role/SecondRole"
}
]
}
Here, InitialRole
can pass its permissions to SecondRole
for further actions.
Let's say you have an IAM user named DevUser
who needs to perform actions that require different permissions from time to time, perhaps for deploying a service. Instead of giving DevUser
a broad range of permissions, you give them permission to assume SecondRole
, which has the necessary permissions for that deployment task.
When DevUser
runs a deployment script, they first assume SecondRole
using the AssumeRole
action. AWS STS then provides DevUser
with temporary credentials. With these credentials, DevUser
temporarily has the permissions of SecondRole
and can carry out the deployment.
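In code, that assume-then-act step might look roughly like this with boto3 and STS; the session name and the follow-up S3 call are illustrative.
# Sketch of DevUser's deployment script assuming SecondRole.
import boto3

sts = boto3.client("sts")
assumed = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/SecondRole",
    RoleSessionName="devuser-deployment",
    DurationSeconds=3600,  # temporary credentials valid for one hour
)
creds = assumed["Credentials"]

# A client built from these credentials acts with SecondRole's permissions.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print([b["Name"] for b in s3.list_buckets()["Buckets"]])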
This follows the principle of least privilege by only granting permissions as needed for specific tasks, which is a security best practice. It also provides an audit trail because actions taken with the assumed role can be traced back to the original entity that assumed the role, ensuring accountability and easier troubleshooting in complex environments.
By incorporating these advanced IAM concepts into your AWS strategy, you'll be better equipped to handle complex security and access requirements. These features allow for greater flexibility, precision, and control in managing access to your AWS resources, ensuring that your cloud environment remains both robust and secure.
The schema shows the relationships between different IAM components. The goal is to provide an easy way of understanding how they work with each other.
IAM Users:
IAM Groups:
IAM Roles:
To conclude, AWS Identity and Access Management (IAM) is an essential component for managing security and access within the AWS ecosystem.
Key takeaways include:
Additionally, advanced concepts like service roles, cross-account access management, and role chaining offer sophisticated methods to handle complex access requirements and ensure seamless, secure operations across multiple services and accounts.
As a DevOps engineer, I am usually involved with pipelines, automation, and cloud services. However, I've always been curious about the other side of the tech world, which is application development. So, I thought, why not mix things up a bit? That's how I found myself building a Python financial app, complete with a REST API.
This blog post documents the journey of developing and deploying my mock financial application from scratch, from coding the initial app to its deployment on AWS using Docker, Kubernetes (EKS), Terraform, and Ansible. And guess what? I've automated the whole process - every single bit of it!
If you're itching to see how it all came together, check out my GitHub repository for all the details.
In this article, we'll learn:
In our project, we integrate cloud-based services with external tools and continuous integration/deployment (CI/CD) using GitHub Actions to create a robust and scalable application.
Step 1: Code Commit to Deployment
Step 2: Building and Storing the Docker Image
Step 3: Infrastructure Provisioning
Step 4: Securing Secrets and Database Connectivity
Step 5: User Interaction with the Application
Step 6: Interaction Between Components
I structured the application with the following main components:
Building the REST API was interesting from a learning perspective, as it was my first time coding one. I structured endpoints to manage user authentication, accounts, and transactions.
I coded the logic to manage different transaction types such as deposits and withdrawals, ensuring accurate balance updates and transaction validations (a sketch of such an endpoint follows the list below).
2. Account Operations:
- Creation of financial accounts.
- Viewing account details including balance and created date.
3. Transaction Management:
- Performing deposit and withdrawal transactions.
- Viewing a list of transactions for each account.
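As mentioned above, here is a simplified, hypothetical sketch of what such a transaction endpoint can look like in Flask; it uses an in-memory dict instead of the real PostgreSQL models to stay short.
# Simplified sketch of a deposit/withdrawal endpoint; not the project's real code.
from flask import Flask, jsonify, request

app = Flask(__name__)

# The real app persists accounts in PostgreSQL; a dict keeps the sketch short.
accounts = {1: {"balance": 100.0}}

@app.route("/accounts/<int:account_id>/transactions", methods=["POST"])
def create_transaction(account_id):
    account = accounts.get(account_id)
    if account is None:
        return jsonify({"error": "account not found"}), 404

    payload = request.get_json(force=True)
    amount = float(payload.get("amount", 0))
    kind = payload.get("type", "deposit")

    if kind == "withdrawal" and amount > account["balance"]:
        return jsonify({"error": "insufficient funds"}), 400

    account["balance"] += amount if kind == "deposit" else -amount
    return jsonify({"type": kind, "amount": amount, "balance": account["balance"]}), 201

if __name__ == "__main__":
    app.run(port=5000)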
Docker is a platform for developing, shipping, and running applications inside containers. Containers are lightweight, standalone, and executable software packages that include everything needed to run an application: code, runtime, system tools, system libraries, and settings.
The main goal was to ensure our application scales efficiently and runs consistently across different environments.
I'll share how I approached this task, the key questions that guided my decisions, and the rationale behind the choices I made.
A Dockerfile is a text document containing commands to assemble a Docker image. The Docker image is a lightweight, standalone, executable package that includes everything needed to run your application.
Selecting the Base Image:
FROM python:3.10.2-slim: a slim version of Python was chosen as the base image for its balance between size and functionality. It provided just the necessary components required to run our Flask application without the overhead of a full-fledged OS.
Installing PostgreSQL Client:
The decision to install the PostgreSQL client (postgresql-client) was made to support our wait-for-postgres.sh script. This was a crucial part of ensuring that the Flask application only starts after the database is ready to accept connections.
Optimizing for Docker Cache:
By copying the requirements.txt file initially and installing dependencies, we leveraged Docker’s caching mechanism. This meant faster builds during development, as unchanged dependencies wouldn't need to be reinstalled each time.
Setting Up the Application:
Copying the application code into the /app directory and setting it as the working directory established a clear and organized structure within the container.
Exposing Ports and Setting Environment Variables:
EXPOSE 5000 and ENV FLASK_APP=app.py: these commands made our application accessible on port 5000 and specified the entry point for our Flask app.
Implementing the Wait Script:
Including wait-for-postgres.sh was a decision to handle dependencies between services, particularly ensuring the Flask app doesn’t start before the database is ready.
The docker-compose.yml file played a role in defining and linking multiple services (the Flask app and PostgreSQL database).
Environment variables such as DATABASE_URI were configured to dynamically construct the database connection string, ensuring flexibility and ease of configuration.
Database Configuration:
Dedicated environment variables (POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD) allowed for an isolated and controlled database environment. This separation is crucial in a microservices-oriented architecture.
Reflecting on this dockerization process, several key lessons stand out:
The wait-for-postgres.sh script was a practical solution to a common problem in containerized environments – managing service dependencies.
After successfully containerizing the application, the next phase was moving it to AWS and creating a monitoring infrastructure with Prometheus and Grafana.
To automate the cloud infrastructure setup, I used Terraform.
You can find below the architecture.
.
├── backend.tf
├── fintech-monitoring.pem
├── main.tf
├── modules
│ ├── ec2
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ └── variables.tf
│ ├── eks
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ └── variables.tf
│ ├── rds
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ └── variables.tf
│ ├── security_groups
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ └── variables.tf
│ └── vpc
│ ├── main.tf
│ ├── outputs.tf
│ └── variables.tf
├── outputs.tf
├── provider.tf
├── terraform.tfvars
└── variables.tf
I structured my Terraform configuration into distinct modules, each focusing on different aspects of the AWS infrastructure. This modular design enhanced readability, reusability, and maintainability.
To further automate the monitoring configuration, I used Ansible. It played a crucial role in automating repetitive tasks, ensuring that the environment was configured correctly.
I wrote Python and Bash scripts to automate various aspects of the infrastructure and monitoring setup. These scripts were designed to complement the Terraform and Ansible configurations, ensuring a seamless and automated workflow. Here's a detailed overview of each script and its purpose in the project:
update_inventory.py
import json
import boto3
import yaml
# Load vars.yml
with open('vars.yml') as file:
vars_data = yaml.safe_load(file)
aws_region = vars_data['aws_region']
def get_instance_ip(instance_name):
ec2 = boto3.client('ec2', region_name=aws_region)
response = ec2.describe_instances(
Filters=[
{'Name': 'tag:Name', 'Values': [instance_name]},
{'Name': 'instance-state-name', 'Values': ['running']}
]
)
for reservation in response['Reservations']:
for instance in reservation['Instances']:
ip_address = instance.get('PublicIpAddress')
print(f"IP for {instance_name}: {ip_address}")
return ip_address
print(f"No running instance found for {instance_name}")
return None
def update_inventory():
grafana_ip = get_instance_ip('Grafana-Server')
prometheus_ip = get_instance_ip('Prometheus-Server')
inventory_content = f'''
all:
children:
grafana:
hosts:
{grafana_ip}:
prometheus:
hosts:
{prometheus_ip}:
'''
with open('./inventory.yml', 'w') as file:
file.write(inventory_content.strip())
with open('./roles/grafana/templates/grafana.ini.j2', 'r') as file:
lines = file.readlines()
with open('./roles/grafana/templates/grafana.ini.j2', 'w') as file:
for line in lines:
if line.strip().startswith('domain'):
file.write(f'domain = {grafana_ip}\n')
else:
file.write(line)
def update_env_file(grafana_ip, prometheus_ip):
env_content = f'''
export GRAFANA_URL='https://{grafana_ip}:3000'
export GRAFANA_ADMIN_USER='admin'
export GRAFANA_ADMIN_PASSWORD='admin'
export PROMETHEUS_URL='https://{prometheus_ip}:9090'
'''
with open('.env', 'w') as file:
file.write(env_content.strip())
if __name__ == '__main__':
update_inventory()
grafana_ip = get_instance_ip('Grafana-Server')
prometheus_ip = get_instance_ip('Prometheus-Server')
update_env_file(grafana_ip, prometheus_ip)
generate_grafana_api_key.py
import requests
import os
import boto3
import json
import yaml
from dotenv import load_dotenv
import subprocess
import time
load_dotenv() # This loads the variables from .env into the environment
# Load vars.yml
with open('vars.yml') as file:
vars_data = yaml.safe_load(file)
aws_region = vars_data['aws_region']
def get_terraform_output(output_name):
command = f" cd ../terraform/ && terraform output -raw {output_name}"
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
stdout, stderr = process.communicate()
if stderr:
print("Error fetching Terraform output:", stderr.decode())
return None
return stdout.decode().strip()
# Function to generate Grafana API key
def generate_grafana_api_key(grafana_url, admin_user, admin_password):
headers = {
"Content-Type": "application/json",
}
timestamp = int(time.time())
payload = {
"name": f"terraform-api-key-{timestamp}",
"role": "Admin"
}
response = requests.post(f"{grafana_url}/api/auth/keys", headers=headers, json=payload, auth=(admin_user, admin_password))
    if response.status_code == 200:
        print("API key generated successfully.")
        return response.json()['key']
else:
print(f"Response status code: {response.status_code}")
print(f"Response body: {response.text}")
raise Exception("Failed to generate Grafana API key")
# Function to update AWS Secrets Manager
def update_secret(secret_id, new_grafana_api_key):
client = boto3.client('secretsmanager', region_name=aws_region)
secret_dict = json.loads(client.get_secret_value(SecretId=secret_id)['SecretString'])
secret_dict['grafana_api_key'] = new_grafana_api_key
client.put_secret_value(SecretId=secret_id, SecretString=json.dumps(secret_dict))
# Debugging step: Check if the secret is really updated on AWS
updated_secret_dict = json.loads(client.get_secret_value(SecretId=secret_id)['SecretString'])
if updated_secret_dict['grafana_api_key'] == new_grafana_api_key:
print("Secret successfully updated on AWS.")
else:
print("Failed to update secret on AWS.")
if __name__ == "__main__":
grafana_url = os.environ.get('GRAFANA_URL')
admin_user = os.environ.get('GRAFANA_ADMIN_USER')
admin_password = os.environ.get('GRAFANA_ADMIN_PASSWORD')
secret_id = get_terraform_output("rds_secret_arn") # From the terraform output
api_key = generate_grafana_api_key(grafana_url, admin_user, admin_password)
update_secret(secret_id, api_key)
add_grafana_dashboard.py
import requests
import boto3
import json
import os
import yaml
import subprocess
from dotenv import load_dotenv
load_dotenv() # This loads the variables from .env into the environment
# Load vars.yml
with open('vars.yml') as file:
vars_data = yaml.safe_load(file)
aws_region = vars_data['aws_region']
def get_terraform_output(output_name):
command = f"cd ../terraform && terraform output -raw {output_name}"
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
stdout, stderr = process.communicate()
if stderr:
print("Error fetching Terraform output:", stderr.decode())
return None
return stdout.decode().strip()
def get_grafana_api_key(secret_id):
client = boto3.client('secretsmanager', region_name=get_terraform_output("aws_region"))
secret = json.loads(client.get_secret_value(SecretId=secret_id)['SecretString'])
return secret['grafana_api_key']
def add_prometheus_data_source(grafana_url, api_key, prometheus_url):
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
# Check if data source exists
get_response = requests.get(f"{grafana_url}/api/datasources/name/Prometheus", headers=headers)
if get_response.status_code == 200:
# Data source exists, update it
data_source_id = get_response.json()['id']
data_source_config = get_response.json()
data_source_config['url'] = prometheus_url
update_response = requests.put(
f"{grafana_url}/api/datasources/{data_source_id}",
headers=headers,
json=data_source_config
)
if update_response.status_code == 200:
print("Prometheus data source updated successfully.")
else:
print(f"Failed to update Prometheus data source: {update_response.content}")
else:
# Data source does not exist, create it
data_source_config = {
"name": "Prometheus",
"type": "prometheus",
"access": "proxy",
"url": prometheus_url,
"isDefault": True
}
create_response = requests.post(f"{grafana_url}/api/datasources", headers=headers, json=data_source_config)
if create_response.status_code == 200:
print("New Prometheus data source added successfully.")
else:
print(f"Failed to add as a new data source: {create_response.content}")
def add_dashboard(grafana_url, api_key, dashboard_json):
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
response = requests.post(f"{grafana_url}/api/dashboards/db", headers=headers, json=dashboard_json)
if response.status_code == 200:
print("Dashboard added successfully.")
else:
print(f"Failed to add dashboard: {response.content}")
if __name__ == "__main__":
grafana_url = os.environ.get('GRAFANA_URL')
secret_id = get_terraform_output("rds_secret_arn") # From the terraform output
dashboard_json = {
"dashboard": {
"id": None,
"title": "Simple Prometheus Dashboard",
"timezone": "browser",
"panels": [
{
"type": "graph",
"title": "Up Time Series",
"targets": [
{
"expr": "up",
"format": "time_series",
"intervalFactor": 2,
"refId": "A"
}
],
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 0
}
}
]
}
}
api_key = get_grafana_api_key(secret_id)
prometheus_url = os.environ.get('PROMETHEUS_URL')
add_prometheus_data_source(grafana_url, api_key, prometheus_url)
add_dashboard(grafana_url, api_key, dashboard_json)
wrapper-rds-k8s.sh
#!/bin/bash
cd ../terraform
SECRET_ARN=$(terraform output -raw rds_secret_arn)
REGION=$(terraform output -raw aws_region)
# Fetch secrets
DB_CREDENTIALS=$(aws secretsmanager get-secret-value --secret-id $SECRET_ARN --region $REGION --query 'SecretString' --output text)
DB_USERNAME=$(echo $DB_CREDENTIALS | jq -r .username)
DB_PASSWORD=$(echo $DB_CREDENTIALS | jq -r .password)
DB_ENDPOINT=$(terraform output -raw rds_instance_endpoint)
DB_NAME=$(terraform output -raw rds_db_name)
cd -
# Create Kubernetes secret manifest
cat <<EOF > db-credentials.yaml
apiVersion: v1
kind: Secret
metadata:
name: fintech-db-secret
type: Opaque
data:
username: $(echo -n $DB_USERNAME | base64)
password: $(echo -n $DB_PASSWORD | base64)
EOF
# Create Kubernetes ConfigMap manifest for database configuration
cat <<EOF > db-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: fintech-db-config
data:
db_endpoint: $DB_ENDPOINT
db_name: $DB_NAME
EOF
# Apply the Kubernetes manifests
.
├── configmap.yaml
├── db-config.yaml
├── db-credentials.yaml
├── db-service.yaml
├── deployment.yaml
├── fintech-ingress.yaml
├── secret.yaml
├── service.yaml
└── wrapper-rds-k8s.sh
Kubernetes, or K8s, is a powerful system for automating the deployment, scaling, and management of containerized applications.
Kubernetes Objects Creation: We created various Kubernetes objects, each serving a specific role in the application deployment:
ConfigMap (configmap.yaml): Used to store non-confidential configuration data, like database connection strings.
Secrets (db-credentials.yaml, secret.yaml): Managed sensitive data like database passwords, ensuring they're stored securely and accessible only to the relevant components.
Deployment (deployment.yaml): Defined the desired state of the application, including the Docker image to use, the number of replicas, and other specifications.
Service (service.yaml): Provided a stable interface to access the application pods.
Ingress (fintech-ingress.yaml): Managed external access to the application, routing traffic to the appropriate services.
Database ConfigMap and Service (db-config.yaml, db-service.yaml): Managed the database configuration and service, ensuring a decoupled architecture where the application and database are managed independently.
The fintech-ingress.yaml file defines rules for external access to our application, including URL routing and SSL termination.
To access the app on AWS, get the address of the ingress by running
kubectl get ingress
By leveraging them alongside Terraform and Ansible, I was able to create a highly efficient, automated, and error-resistant environment. This not only saved time but also enhanced the reliability and consistency of the infrastructure and monitoring setup.
This pipeline was designed to automate the build, test, deploy, and monitoring processes, ensuring a smooth and efficient workflow. Here's an in-depth look at the choices I made :
# ---- Preparation Stage ----
preparation:
runs-on: ubuntu-latest
steps:
- name: Checkout Code
uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: eu-central-1
# ---- Terraform Provisioning Stage ----
terraform-provisioning:
needs: preparation
runs-on: ubuntu-latest
steps:
- name: Checkout Code
uses: actions/checkout@v4
- name: AWS Configure Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: eu-central-1
- name: Set up Terraform
uses: hashicorp/setup-terraform@v3
- name: Terraform Init and Apply
run: |
cd ./terraform
terraform init
terraform apply -auto-approve
The init and apply commands were used to initialize the working directory containing the Terraform configurations and to apply the changes required to reach the desired state of the configuration.
# ---- Build Stage ----
build:
needs: terraform-provisioning
runs-on: ubuntu-latest
steps:
- name: Checkout Code
uses: actions/checkout@v4
- name: AWS Configure Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: eu-central-1
- name: Build Docker Image
run: docker build -t fintech-app-repo:${{ github.sha }} ./src
- name: Save Docker Image
run: |
docker save fintech-app-repo:${{ github.sha }} > fintech-app.tar
- name: Upload Docker Image Artifact
uses: actions/upload-artifact@v4
with:
name: docker-image
path: fintech-app.tar
# ---- Publish Stage ----
publish:
needs: build
runs-on: ubuntu-latest
steps:
- name: AWS Configure Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: eu-central-1
- name: Login to Amazon ECR
uses: aws-actions/amazon-ecr-login@v2
- uses: actions/download-artifact@v4
with:
name: docker-image
path: .
- name: Load Docker Image
run: docker load < fintech-app.tar
- uses: aws-actions/amazon-ecr-login@v2
- name: Push Docker Image to Amazon ECR
run: |
docker tag fintech-app-repo:${{ github.sha }} ${{ secrets.ECR_REGISTRY }}:${{ github.sha }}
docker push ${{ secrets.ECR_REGISTRY }}:${{ github.sha }}
# ---- Deployment Stage ----
deployment:
needs: publish
runs-on: ubuntu-latest
steps:
- name: Checkout Code
uses: actions/checkout@v4
- name: AWS Configure Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: eu-central-1
- name: Retrieve and Set up Kubernetes Config
run: |
cd ./terraform
terraform init
eval "$(terraform output -raw configure_kubectl)"
- name: Install eksctl
run: |
ARCH=amd64
PLATFORM=$(uname -s)_$ARCH
curl -sLO "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$PLATFORM.tar.gz"
tar -xzf eksctl_$PLATFORM.tar.gz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
- name: Check and Add IAM User to EKS Cluster
env:
CLUSTER_NAME: fintech-eks-cluster # Replace with your actual cluster name
USER_ARN: ${{ secrets.USER_ARN }}
run: |
# Check if the user is already mapped to the EKS cluster
if eksctl get iamidentitymapping --cluster "$CLUSTER_NAME" --arn "$USER_ARN" | grep -q "$USER_ARN"; then
echo "User ARN $USER_ARN is already mapped to the EKS cluster"
else
# Add the user to the EKS cluster
eksctl create iamidentitymapping --cluster "$CLUSTER_NAME" --arn "$USER_ARN" --username wsl2 --group system:masters
echo "User ARN $USER_ARN added to the EKS cluster"
fi
- name: run k8s script
run: |
cd ./k8s/
chmod +x ./wrapper-rds-k8s.sh
./wrapper-rds-k8s.sh
- name: Update Kubernetes Deployment Image Tag
run: |
sed -i "s|image:.*|image: ${{ secrets.ECR_REGISTRY }}:${{ github.sha }}|" ./k8s/deployment.yaml
- name: Apply Kubernetes Ingress
run: |
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.2/deploy/static/provider/aws/deploy.yaml
sleep 25
- name: Apply Kubernetes Manifests
run: |
kubectl apply -f ./k8s/
sleep 30
- name: Check Pods Status
run: kubectl get pods -o wide
- name: Get Ingress Address
run: kubectl get ingress -o wide
We map the pipeline's IAM user into the EKS cluster with eksctl, enhancing our cluster's access and security management (I created a specific IAM user for the CI/CD pipeline, but I also wanted to interact with the cluster from my local machine, which uses another IAM user).
A wrapper script (wrapper-rds-k8s.sh) is then used to manage Kubernetes resources and settings, showcasing our ability to automate complex Kubernetes tasks.
# ---- Monitoring Setup Stage ----
monitoring-setup:
needs: deployment
runs-on: ubuntu-latest
steps:
- name: Checkout Code
uses: actions/checkout@v4
- name: AWS Configure Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: eu-central-1
- name: install with pip the following packages
run: |
pip3 install boto3
pip3 install requests
pip3 install python-dotenv
- name: Update Inventory with Latest IP Addresses
run: |
cd ./terraform
terraform init
cd ../ansible
python3 update_inventory.py
- name: Create PEM Key File
run: |
cd ./ansible
echo -e "${{ secrets.PEM_KEY }}" > ../terraform/fintech-monitoring.pem
chmod 400 ../terraform/fintech-monitoring.pem
- name: ansible playbook for the monitoring
run: |
cd ./ansible
ansible-playbook playbook.yml -vv
- name: Generate Grafana API Key and Update AWS Secret
run: |
cd ./ansible
python3 generate_grafana_api_key.py
- name: Add Dashboard to Grafana
run: |
cd ./ansible
python3 add_grafana_dashboard.py
As I conclude this project, it's important to reflect on the lessons learned and the personal and professional growth that came with it.
Understanding the Full Spectrum: From coding in Python to orchestrating containers with Kubernetes, every step was a puzzle piece, contributing to a bigger picture.
The Power of Automation: One of the key takeaways from this experience is the incredible power and efficiency of automation. Whether it was using Terraform for infrastructure setup, Ansible for configuration, or GitHub Actions for CI/CD, automating repetitive and complex tasks not only saved time but also reduced the scope for errors.
Collaboration and Community: The role of community resources and collaboration was invaluable, whether it was seeking help from online forums, GitHub repositories, or directly from friends.
Check the code on GITHUB --> https://github.com/bobocuillere/HA-web-application-workshop-AWS
Nowadays, using Terraform or any Infrastructure as Code service is vital, especially in the cloud computing world where everything is "disposable": you can create an entire infrastructure in minutes and make it available globally, without waiting weeks as you would have before.
Terraform comes into play by allowing you to rapidly provision and manage that infrastructure.
This blog post transforms the "AWS Highly Available Web Application" workshop entirely into Terraform. You can find the workshop by following this link.
(https://ha-webapp.workshop.aws/introduction/overview.html).
Why? Because I don't want to spend 4 hours each time I want to build it, and more importantly, it provides versioning, meaning I can quickly review, change, or even destroy the infrastructure.
Prerequisites:
So let's start by explaining the architecture!
The workshop doesn't use Route 53 or a CDN.
We will start with number 3.
The goal of the workshop is to build a highly available web application using regional services and Availability Zones.
We have six subnets in total (three in each of the two AZs).
Every part is redundant and resilient to failure.
Find the terraform code by following this link :GitHub for the project
So how does the code work?
We have 11 files in total:
Start by changing your AWS_PROFILE name in the TFVARS.
Your profile name is found at %USERPROFILE%\.aws\credentials (Windows) and ~/.aws/credentials (Linux & Mac).
The variable named "linux_ami" is a RHEL 8 image.
The AMI has to be changed if you use another region other than us-east-1.
You can change any other variable to suit yourself but let's keep it vanilla as AWS put it in the workshop.
Everything is set up to launch terraform.
Open a PowerShell shell and type:
terraform init
and
terraform apply
The entire workshop will be created (resources, security groups, configurations, etc.).
When it's finished, you'll click on the LoadBalancer_DNS_OUTPUT to access the website.
I have modified the script in the workshop because it didn't work (many dependencies and packages failed), so it's just the default Apache homepage.
You will have to change the bash script if you want more than that; my goal was simply to transform the workshop into Terraform.
Quick recap: An Active Directory with ADFS was configured.
We created a user named Jean, added him to two previously created groups.
Now it's time to configure AWS.
PART 1 is available by clicking HERE.
Once finished, create two IAM roles by choosing the option SAML and the Identity provider (SAML provider) we made earlier.
We shall call our roles AWS-PROD-ADMIN and AWS-PROD-DEV.
Putting AWS before the name is for a good reason I'll explain later (see error #3 in the Bugs and Errors section below).
These names look familiar because they are the same as our 2 AD groups created at the beginning (PART 1)
Summary: In layman's terms, we gave authorization for our ADFS (the Identity Provider here) to communicate with and be used in AWS. We created two roles that our future users will get their permissions from, which are linked with the on-premises AD groups.
AWS now trusts this Identity Provider (ADFS); we also have to do it the other way around, meaning ADFS needs to trust AWS.
AWS is now a trusted relying party in ADFS.
The claim rules are elements needed by AWS like (NameId, RoleSessionName, and Roles) that ADFS doesn't provide by default.
Adding NameId
Adding RoleSessionName
Claim rule name: RoleSessionName
Attribute store: Active Directory
LDAP Attribute: E-Mail-Addresses
Outgoing Claim Type: https://aws.amazon.com/SAML/Attributes/RoleSessionName
Now we only need to add the Role attributes to finish, but I'll explain what will be happening.
The next two claims are custom made. The first one will get all the groups of the authenticated user (when authenticating on the ADFS page), while the second rule will do the transformation into the roles claim by matching the role's names.
Hope it's clear!
Adding Role Attributes
c:[Type == "https://schemas.microsoft.com/ws/2008/06/identity/claims/windowsaccountname", Issuer == "AD AUTHORITY"]
=> add(store = "Active Directory", types = ("https://temp/variable"), query = ";tokenGroups;{0}", param = c.Value);
c:[Type == "https://temp/variable", Value =~ "(?i)^AWS-"] => issue(Type = "https://aws.amazon.com/SAML/Attributes/Role", Value = RegExReplace(c.Value, "AWS-", "arn:aws:iam::123456789012:saml-provider/Federation-Demo,arn:aws:iam::123456789012:role/"));
Make sure you add the correct ARN for the identity provider (green) which is here:
Same for the role's ARN (yellow)
Open a Powershell Administrator shell and enter :
Get-AdfsProperties
Check that the option EnableIdpInitiatedSignOnPage is True.
To set it to True:
Set-Adfsproperties -enableIdPInitiatedSignonPage $true
Once done, you can restart the service:
Restart-Service adfssrv
Some pages are in French since I'm French :).
Source is my domain's name.
Error #1: Specified provider doesn't exist.
This error is due to the provider's ARN in the issuance's Claims being faulty.
Solution: Check if everything is correctly entered for the Identity Provider we created on AWS at the beginning.
Error #2: Error: RoleSessionName is required in AuthnResponse
Error due to the AD user not having some necessary attributes, which are:
Solution: Checking that each of those attributes is correctly filled.
Some of those options need you to activate the Advanced Features of the Active Directory.
Error #3: Not authorized to perform sts:AssumeRoleWithSAML
This is a matching name error of the Group's name and the Role's name on AWS you're trying to connect with.
Remember what I've told you in the Configuring AWS section about why we put AWS before the name? This is it!
The Roles we created in AWS are the following, and we're interested in the highlighted role below :
I purposely created an AD group without the AWS prefix (PROD-TEST) at the beginning to show you this.
The two names do not match.
Solution: Make them match by either changing the name of your AD group to add AWS (example: AWS-845125856994-AWS-PROD-TEST) or adding an AWS tag in the Role's issuance claim here:
I'll be glad to answer any questions or help if needed by contacting me.
Most enterprises using the cloud want to federate their existing user base, meaning creating an SSO (Single Sign-On) environment that authorizes, with specific rights, what a person can do in the AWS cloud.
This post will describe how to use enterprise federation, the integration of ADFS and AWS.
Prerequisites:
1 - User will connect to their ADFS portal
2 - ADFS will check the user access and authenticate against the AD.
3 - A response is received as a SAML assertion with group membership information.
4 - The ARNs are dynamically built using the AD group membership information for the IAM roles, while the user attributes (distinguishedName, mail, sAMAccountName) are used for the AWS account IDs.
Finally, ADFS will send a signed assertion to AWS STS.
5 - Temporary credentials are given by STS AssumeRoleWithSAML.
6 - The user is authenticated and given access to the AWS Management Console.
Before configuring ADFS, you'll need to have a working Active Directory.
Before installing our ADFS role, we created a user named Jean, whom we added to two groups.
After installing the role, configuring and setting up the environment is easy by keeping the default settings.
Launch the ADFS management page by searching AD FS.
Click to Configure the WIZARD :
If you encounter this error, it means we have to set up the service account created earlier.
setspn -a host/localhost adfsaws
If the command succeeds, you should see something like that:
Configure AWS, which will be in PART 2 -->CLICK HERE