// systems shipped

Built.
Not described.

Production systems with real benchmarks, real architecture decisions, and real things that went wrong along the way.

01
Distributed Systems · Serverless · SAGA Pattern
CloudFlow — Distributed SAGA Order Processing
Python · AWS Step Functions · Lambda · DynamoDB · SQS · CDK · LocalStack · X-Ray
1,100+ Req/min
<120ms P99 Latency
30+ Tests
SAGA Pattern
Auto Compensation
// architecture flow
API GW → Order λ → SQS → Reserve λ → Payment λ → Confirm λ → DynamoDB · X-Ray tracing
On failure → compensating transactions auto-rollback
load_test.py — LocalStack
# 50 concurrent threads
$ python load_test.py --threads 50 --duration 60
Requests sent: 68,412
Success rate: 99.97%
Req/min peak: 1,147
P50 latency: 34ms
P99 latency: 118ms
Compensation: 100% triggered

Why SAGA choreography

Chose choreography over orchestration: each Lambda reacts to SQS events independently. No central brain. Better fault isolation and service autonomy. Compensating transactions handle rollback without any central coordinator involved.
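A minimal sketch of the choreography idea (event names, fields, and failure conditions here are hypothetical, not the production handlers): each handler consumes an event and, on failure, emits the compensation steps that undo earlier work instead of asking a coordinator.

```python
def reserve_handler(event):
    """Reserve inventory; on failure, name the compensations to run."""
    if event["qty"] > event["stock"]:  # hypothetical failure condition
        return {"type": "ReserveFailed", "compensate": ["CancelOrder"]}
    return {"type": "InventoryReserved", "order_id": event["order_id"]}

def payment_handler(event):
    """Charge the card; a decline rolls back every step taken so far."""
    if event.get("card_declined"):
        return {"type": "PaymentFailed",
                "compensate": ["ReleaseInventory", "CancelOrder"]}
    return {"type": "PaymentCaptured", "order_id": event["order_id"]}
```

Each handler knows only its own compensation list, so a new step can be added to the SAGA without touching any central state machine.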

Hardest engineering problems

Idempotency under duplicate SQS delivery. Partial failures mid-SAGA. Circuit breaker calibration to prevent cascade failures without over-tripping on transient errors. Getting X-Ray to trace across Lambda boundaries without noise.
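The idempotency problem above can be sketched as a dedupe check keyed on the SQS message ID; here an in-memory set stands in for the DynamoDB conditional write the real system would use.

```python
processed = set()  # stands in for a DynamoDB table with a conditional PutItem

def handle_once(message_id, payload, handler):
    """Run handler at most once per message despite at-least-once delivery."""
    if message_id in processed:
        return "duplicate-skipped"   # safe no-op on SQS redelivery
    processed.add(message_id)        # atomic conditional write in production
    return handler(payload)
```

In production the set membership test and insert must be a single atomic operation (a conditional `PutItem`), otherwise two concurrent deliveries can still race past the check.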

Why a recruiter should care: SAGA choreography is how Uber, Netflix, and Amazon handle long-lived distributed transactions in production. Building it without a framework — understanding every moving part — demonstrates the systems thinking that separates senior engineers from developers who just call managed services.
02
Analytics · Serverless · Lambda Architecture
CloudPulse — Real-Time Analytics Platform
Python · Kinesis · S3 · Athena · DynamoDB · Terraform · React · Cognito · Glue
Dual-Path Lambda Arch
Free AWS Tier
100% Terraform IaC
JWT Cognito Auth
// lambda architecture
Speed Layer: Kinesis → Lambda → DynamoDB (TTL)
Batch Layer: SQS → S3 / Glue → Athena SQL
// cost + scale profile
Monthly cost: $0 (Free Tier)
IaC coverage: 100%
Manual setup: 0 clicks
Data freshness: <30s

Architecture decision

Lambda Architecture separates real-time and batch cleanly. Speed layer handles low-latency dashboards via Kinesis → Lambda → DynamoDB. Batch layer stores full event history in S3 for Athena SQL queries — both paths run independently.

Interesting detail

DynamoDB 24-hour TTL auto-cleans stale data without Lambda or cron. Glue Data Catalog makes the S3 data queryable via standard SQL in Athena. Cognito JWT auth means the API stays stateless and horizontally scalable.
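The TTL mechanic is just an epoch-seconds attribute on each item; a sketch of shaping a speed-layer record (attribute names here are illustrative, not the actual table schema):

```python
import time

def to_speed_record(event, ttl_hours=24):
    """Shape a speed-layer item; DynamoDB deletes it once expires_at passes."""
    return {
        "pk": event["metric"],                 # partition key: metric name
        "sk": str(event["ts"]),                # sort key: event timestamp
        "value": event["value"],
        "expires_at": int(time.time()) + ttl_hours * 3600,  # TTL attribute
    }
```

DynamoDB's TTL feature then reaps expired items in the background at no cost, which is what keeps the speed layer bounded without any cleanup Lambda.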

Why it matters: Lambda Architecture is used by Netflix, LinkedIn, and Twitter for analytics at scale. Running this within AWS Free Tier shows infrastructure cost awareness — a trait most junior engineers lack.
03
Security · CIS Benchmark · Automation
CSPM — Cloud Security Posture Management
Python · Lambda · EventBridge · SNS · S3 · CloudWatch · Terraform · GitHub Actions
CIS Benchmark v1.5
<5s Alert latency
Hourly EventBridge scan
Auto Remediation
4+ AWS services scanned
cspm_scan.py — scan output
$ python cspm_scan.py --env prod
[IAM] Scanning policies... done
[CRITICAL] Root account MFA disabled
[HIGH] 3 overly permissive policies found
[S3] Checking bucket ACLs...
[HIGH] 2 public buckets detected
[SNS] Alert dispatched in 3.2s
[AUTO] Remediating safe violations...
[DONE] Report saved → s3://audit/2026-05
// scan pipeline
EventBridge → Scanner λ → IAM / S3 / EC2 checks → SNS alert → S3 audit log
Auto-remediate safe violations · block critical

What it detects

CIS Benchmark v1.5 controls across IAM (root MFA, key rotation, overpermissive policies), S3 (public access, versioning, logging), EC2 (security groups, encryption), and CloudTrail (logging enabled, multi-region).
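One of the IAM checks can be sketched as a pure function over a policy document. This is a simplified version of the overpermissive-policy rule; the real scanner would pull policy documents via boto3 first.

```python
def is_overly_permissive(policy_doc):
    """Flag Allow statements granting Action "*" on Resource "*"."""
    for stmt in policy_doc.get("Statement", []):
        actions = stmt.get("Action", [])
        if isinstance(actions, str):      # IAM allows string or list here
            actions = [actions]
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if stmt.get("Effect") == "Allow" and "*" in actions and "*" in resources:
            return True
    return False
```

Keeping each control as a pure predicate over fetched data makes the checks unit-testable without touching AWS, which is what lets CI run the scanner suite on every push.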

Why this stack

EventBridge for scheduling over CloudWatch Events — better audit trail. SNS over SES for alerts — topic-based, scales to multiple subscribers. All infrastructure in Terraform — CI auto-deploys scanner updates, zero manual provisioning.

Why it matters: CSPM is a $9B market. Wiz raised $1B doing this at enterprise scale. Building it from scratch shows security engineering depth, not just cloud operations. Most engineers know how to deploy — fewer know how to secure what they deploy.
04
Research · Formal Methods · arXiv
AgriFuture India — Formal Verification Research
TLA+ · BFS Model Checker (Python) · Formal Verification · Distributed Marketplace
Novel Research finding
BFS State exhaustion
TLA+ Formal spec
arXiv Published
tlc_agrifuture_final.py
# BFS model checker — TAR preservation
$ python tlc_agrifuture_final.py
States explored: 14,872
Violations found: 1 CRITICAL
TAR violation in partial revocation path:
state[42] → race condition window open
Finding: atomic revocation required
// verification methodology
TLA+ Spec → BFS Checker → 14,872 states explored → Finding → arXiv preprint
Novel: TAR preservation requires atomic revocation

The finding

TAR (Trust and Access Revocation) preservation in distributed marketplaces requires atomic revocation. Partial revocation — revoking access step-by-step — creates a race condition window where unauthorized access is possible. Proven by BFS exhaustive state space exploration.

Why it's real research

Not an implementation. A correctness theorem with a formal proof. Built a custom BFS model checker in Python to exhaustively explore the protocol state space. Found the violation at state 42 of 14,872. Published on arXiv with TLA+-style spec and full methodology.
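The core of such a checker is plain BFS over reachable states. A toy version (the two-grant model and grant names are illustrative, not the published spec) shows how partial revocation surfaces as a counterexample:

```python
from collections import deque

def bfs_check(initial, transitions, invariant):
    """Exhaustively explore states; return the first invariant violation."""
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state                     # counterexample found
        for nxt in transitions(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None                              # invariant holds everywhere

# Toy model: partial revocation drops one grant per step.
FULL = frozenset({"db", "api"})
atomic = lambda s: s in (FULL, frozenset())        # all-or-nothing access
partial_revoke = lambda s: [s - {g} for g in s]

violation = bfs_check(FULL, partial_revoke, atomic)
# violation is a half-revoked state: the open race window
```

Because BFS visits states in order of distance from the initial state, the counterexample it returns is also a shortest trace to the violation, which makes the finding easy to read off.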

Why it matters: TLA+ is used by Amazon engineers to verify DynamoDB and S3 protocols. Microsoft uses it for distributed systems in Azure. Most CS graduates have never written a formal spec. This is a differentiator that can't be faked.
05
RAG / AI · LLMs · Vector Search
Wikipedia Smart Search — RAG QA System
Python · RAG · Vector Search · Embeddings · LLMs · Semantic Retrieval
RAG Architecture
Vector Semantic Search
Zero-Hallucination design
No Framework wrappers
rag_pipeline.py
# Query: "SAGA pattern in distributed systems"
$ python query.py --q "SAGA pattern"
Embedding query... done (34ms)
Vector search top-5... done (12ms)
Chunks retrieved: 5
Context window: 3,847 tokens
LLM inference... done (1.2s)
Answer: grounded response with citations
// RAG pipeline
Wikipedia → Chunk + Embed → Vector Index
Query Embed → Top-k chunks → LLM → Answer
Grounded response — no hallucination

Built without wrappers

No LangChain. No LlamaIndex. Raw embedding API calls, manual vector indexing, custom retrieval pipeline, hand-written prompt templates. Understanding the internals means debugging when things break — which they do, in production.
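A minimal sketch of the framework-free retrieval step (a bag-of-words embedder stands in here for the real embedding API calls; the similarity and top-k logic are the same shape):

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding; the real pipeline calls an embedding API."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, chunks, k=2):
    """Manual vector search: rank chunks by similarity to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Swapping `embed` for a real embedding model and the sorted scan for an ANN index is the whole jump to production scale; the retrieval contract stays identical.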

Production relevance

Every cloud team is building RAG pipelines over internal runbooks, documentation, and knowledge bases. AWS Q Business, Google Vertex AI Search, and Azure AI Search all use this pattern. Understanding it at the implementation level is increasingly required.
