Portfolio Project

AWS Serverless Intake Automation Platform

A home health startup was drowning in records and intake documentation with no scalable path forward. I designed this serverless pipeline on AWS as a baseline for replacing that manual process with an event-driven architecture. The design is projected to reduce document processing time by 70 percent and intake cycle time by 60 percent. As a bonus, it requires no servers to manage, no polling loops, and near-zero idle cost.

Amazon S3 · AWS Lambda · Amazon Textract · Amazon SNS · DynamoDB · AWS IAM · Node.js 22 · CloudWatch
01 Overview

What This Project Demonstrates

This architecture reflects real-world patterns used in document intake systems, insurance claim processing, healthcare form digitization, and legal document management.

Real-world applications

  • Document intake systems
  • Insurance claim processing
  • Healthcare form digitization
  • Legal document management

Skills Demonstrated

Serverless Compute & Event-Driven Design
Every component in this pipeline reacts to an event. Nothing runs unless something happens. That was a deliberate choice rooted in how real document intake systems behave in practice. When a clinic is slammed with patients, the last thing staff need is a system running up cost in the background while it waits for work to arrive.
Async Job Orchestration via SNS
I chose SNS notification over polling because polling is fragile in production. Textract tells the pipeline when it is finished rather than the pipeline repeatedly asking. That one decision removes an entire category of timeout and race-condition risk from the architecture.
IAM Least-Privilege Security
In a HIPAA-aligned environment you do not get to be casual about permissions. Each Lambda has its own role scoped to exactly what it needs and nothing more. That is not just a best practice; it is the difference between a compliant architecture and a liability.
Cloud-Native Service Integration
S3, Lambda, Textract, SNS, and DynamoDB are not just tools listed on a resume. In this pipeline they work together as a single cohesive system where each service hands off cleanly to the next. That kind of integration is what separates a proof of concept from something an engineering team could actually build on.
02 Architecture

How the Pipeline Works

As a former patient registration coordinator, I know firsthand what it is like for a clinic to get slammed with patients and endless paperwork: faxed intake forms piling up, paper records that need to be filed, and a scanner that never moves fast enough. Those hours spent manually processing documents are not just an IT problem; they are a patient care problem. Every minute staff spend filing is a minute they are not spending on patients.

This pipeline was built with that reality in mind.
[Architecture diagram: Amazon S3 (client-intake-forms-bucket) → PUT trigger → Lambda #1 (StartTextractJob) → start job → Textract (async OCR) → job complete → SNS topic (TextractJobComplete) → notify → Lambda #2 (ProcessTextractResults) → store text → DynamoDB (ClientIntakeForms). Both Lambda functions write to CloudWatch Logs (/aws/lambda/StartTextractJob and /aws/lambda/ProcessTextractResults): execution logs, Job IDs, extracted line counts, and errors.]

The pipeline is fully event-driven. Nothing runs unless something happens. A PDF upload to S3 fires a PUT event that invokes Lambda #1, which starts an asynchronous Textract job and passes the TextractJobComplete SNS topic as its notification destination. When Textract finishes, it publishes a completion message to SNS, which triggers Lambda #2. Lambda #2 paginates through all Textract result blocks using NextToken, then writes the extracted text, Job ID, and timestamp to DynamoDB. Both Lambda functions emit structured logs to dedicated CloudWatch log groups, capturing execution details, Textract Job IDs, extracted line counts, and any errors, providing full observability across the pipeline with zero additional infrastructure.
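One way to make those log entries easy to query is to emit a single JSON object per event. The helper below is a hypothetical sketch, not part of the deployed functions: the name `buildLogLine` and the field names `stage`, `jobId`, `fileName`, and `lineCount` are assumptions chosen for illustration.

```javascript
// Hypothetical helper: builds one JSON log line per pipeline event.
// Field names are illustrative assumptions, not taken from the deployed code.
function buildLogLine(stage, message, fields = {}) {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    stage,      // e.g. "StartTextractJob" or "ProcessTextractResults"
    message,
    ...fields   // e.g. { jobId, fileName, lineCount }
  });
}

// Example: what Lambda #2 might log after extracting text (sample values).
const line = buildLogLine("ProcessTextractResults", "Extraction complete", {
  jobId: "example-job-id",
  fileName: "intake-form.pdf",
  lineCount: 42
});
console.log(line);
```

Because each entry is valid JSON, a log query tool can filter on fields such as `jobId` instead of pattern-matching free text.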

Step 01
PDF Lands in S3
A document gets uploaded to the S3 bucket. That single upload kicks off everything. No one has to touch it again.
Step 02
Lambda #1 Triggers
S3 fires a PUT event and Lambda #1 starts an asynchronous Textract job, hands it an SNS topic to report back to when finished, and exits. Its job is done in seconds.
Step 03
Textract Processes
Textract reads the document. The pipeline is not waiting on it. Nothing is polling. The rest of the system sits completely idle and costs nothing until Textract signals it is done.
Step 04
SNS Publishes Completion
Textract publishes a completion message to SNS. That message is the only thing that moves the pipeline forward. If the job fails, the message carries a non-success status, and Lambda #2 logs the failure instead of writing a partial record.
Step 05
Lambda #2 Retrieves Results
SNS triggers Lambda #2 which pages through every block of extracted text until nothing is left. No content gets dropped on longer documents the way a tired registration coordinator absolutely would at hour six of a visit spike.
Step 06
Stored in DynamoDB
Extracted text, Job ID, and timestamp land in DynamoDB keyed by file name. Structured, searchable, and ready for whatever system needs it next. No scanner. No filing cabinet. No lost fax.
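Steps 04 through 06 can be sketched without any AWS calls by looking at the data alone. The sample event below imitates the shape of an SNS-wrapped Textract completion message; all values are made up, and the helper names `parseCompletion` and `toItem` are hypothetical.

```javascript
// Sample event shaped like what Lambda #2 receives from SNS. Values are invented.
const sampleEvent = {
  Records: [{
    Sns: {
      Message: JSON.stringify({
        JobId: "example-job-id",
        Status: "SUCCEEDED",
        API: "StartDocumentTextDetection",
        JobTag: "intake-form.pdf",
        Timestamp: 1700000000000
      })
    }
  }]
};

// Hypothetical helper: the SNS Message field is a JSON string, so parse it first.
function parseCompletion(event) {
  const msg = JSON.parse(event.Records[0].Sns.Message);
  return { jobId: msg.JobId, status: msg.Status, fileName: msg.JobTag };
}

// Hypothetical helper: builds the DynamoDB item in the low-level attribute format.
function toItem({ jobId, fileName }, extractedText, now = new Date()) {
  return {
    FileName: { S: fileName },
    ExtractedText: { S: extractedText },
    JobId: { S: jobId },
    ProcessedAt: { S: now.toISOString() }
  };
}

const completion = parseCompletion(sampleEvent);
const item = toItem(completion, "Name: Jane Doe\nDOB: 01/01/1980");
console.log(item.FileName.S, item.JobId.S);
```

Keeping the parsing and item-building steps as pure functions makes them testable without a live AWS account.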
03 Source Code

The Lambda Functions

The two functions that power the pipeline. No servers, no polling, no hardcoded credentials. All resource references are stored as environment variables.

Lambda #1 · StartTextractJob · JavaScript · Node.js 22
import {
  TextractClient,
  StartDocumentTextDetectionCommand
} from "@aws-sdk/client-textract";

const textract = new TextractClient({});

export const handler = async (event) => {
  console.log("Lambda #1 triggered:", JSON.stringify(event));

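  // Read the first record of the S3 event.
  const record = event.Records[0];
  const bucket = record.s3.bucket.name;
  // S3 URL-encodes object keys in event payloads and writes spaces as "+", so decode before use.
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));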

  const params = {
    DocumentLocation: {
      S3Object: {
        Bucket: bucket,
        Name: key
      }
    },
    NotificationChannel: {
      SNSTopicArn: process.env.SNS_TOPIC_ARN,
      RoleArn: process.env.TEXTRACT_ROLE_ARN
    },
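    // JobTag accepts at most 64 characters from [a-zA-Z0-9_.\-:], so sanitize the key;
    // keys containing "/" or spaces would otherwise be rejected by Textract.
    JobTag: key.replace(/[^a-zA-Z0-9_.\-:]/g, "_").slice(0, 64)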
  };

  try {
    const response = await textract.send(
      new StartDocumentTextDetectionCommand(params)
    );
    console.log("Textract job started. JobId:", response.JobId);
    return { statusCode: 200, body: "Job started successfully" };
  } catch (err) {
    console.error("Error starting Textract job:", err);
    throw err;
  }
};
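The handler above reads only `event.Records[0]`. Since `Records` is an array, a more defensive variant could walk every entry. The sketch below shows just the extraction step with a hypothetical helper named `extractDocuments`; it makes no AWS calls, and the sample keys are invented.

```javascript
// Hypothetical helper: returns one { bucket, key } pair per S3 record in the event.
function extractDocuments(event) {
  return (event.Records || []).map((record) => {
    const bucket = record.s3.bucket.name;
    // Keys arrive URL-encoded with spaces written as "+".
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    return { bucket, key };
  });
}

// Sample event with two records (invented values).
const sampleS3Event = {
  Records: [
    { s3: { bucket: { name: "client-intake-forms-bucket" }, object: { key: "scans/intake+form+%231.pdf" } } },
    { s3: { bucket: { name: "client-intake-forms-bucket" }, object: { key: "scans/referral.pdf" } } }
  ]
};

const documents = extractDocuments(sampleS3Event);
console.log(documents.map((d) => d.key));
```

A handler built this way would start one Textract job per document, for example by mapping over `documents` and awaiting the results together.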
Lambda #2 · ProcessTextractResults · JavaScript · Node.js 22
import {
  TextractClient,
  GetDocumentTextDetectionCommand
} from "@aws-sdk/client-textract";

import {
  DynamoDBClient,
  PutItemCommand
} from "@aws-sdk/client-dynamodb";

const textract = new TextractClient({});
const dynamodb = new DynamoDBClient({});

const getAllBlocks = async (jobId) => {
  let blocks = [];
  let nextToken = null;

  do {
    const params = { JobId: jobId };
    if (nextToken) params.NextToken = nextToken;

    const response = await textract.send(
      new GetDocumentTextDetectionCommand(params)
    );

    blocks = blocks.concat(response.Blocks || []);
    nextToken = response.NextToken || null;
  } while (nextToken);

  return blocks;
};

export const handler = async (event) => {
  console.log("Lambda #2 triggered:", JSON.stringify(event));

  const snsMessage = JSON.parse(event.Records[0].Sns.Message);
  const jobId = snsMessage.JobId;
  const status = snsMessage.Status;
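  // Prefer the S3 object name reported in the completion message; fall back to the JobTag.
  const fileName = snsMessage.DocumentLocation?.S3ObjectName ?? snsMessage.JobTag;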

  console.log(`Job ID: ${jobId} | Status: ${status} | File: ${fileName}`);

  if (status !== "SUCCEEDED") {
    console.error("Textract job did not succeed. Status:", status);
    return { statusCode: 400, body: "Textract job failed" };
  }

  try {
    const blocks = await getAllBlocks(jobId);

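    // Keep only LINE blocks; WORD blocks would repeat the same text.
    const lineTexts = blocks
      .filter(b => b.BlockType === "LINE")
      .map(b => b.Text);
    const lines = lineTexts.join("\n");

    // Count from the array: splitting an empty string would report 1 line.
    console.log(`Extracted ${lineTexts.length} lines of text`);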

    await dynamodb.send(new PutItemCommand({
      TableName: process.env.DYNAMODB_TABLE,
      Item: {
        FileName: { S: fileName },
        ExtractedText: { S: lines },
        JobId: { S: jobId },
        ProcessedAt: { S: new Date().toISOString() }
      }
    }));

    console.log("Successfully saved to DynamoDB");
    return { statusCode: 200, body: "Processing complete" };
  } catch (err) {
    console.error("Error processing results:", err);
    throw err;
  }
};
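The `PutItemCommand` call above writes the whole extracted text into one attribute. DynamoDB caps a single item at roughly 400 KB, so a very long document could be rejected. The sketch below is one possible pre-write check, not part of the deployed function: the helper names and the 1 KB headroom are assumptions, and the size estimate is an approximation that counts only attribute names and string values.

```javascript
// Approximate DynamoDB item size limit in bytes.
const ITEM_LIMIT_BYTES = 400 * 1024;

// Hypothetical helper: estimates size for a flat map of attribute name -> string value.
function estimateItemBytes(attributes) {
  return Object.entries(attributes).reduce(
    (total, [name, value]) =>
      total + Buffer.byteLength(name, "utf8") + Buffer.byteLength(value, "utf8"),
    0
  );
}

// Hypothetical helper: leaves some headroom because the estimate is approximate.
function fitsInOneItem(attributes, headroomBytes = 1024) {
  return estimateItemBytes(attributes) + headroomBytes <= ITEM_LIMIT_BYTES;
}

const smallRecord = {
  FileName: "intake-form.pdf",
  ExtractedText: "Name: Jane Doe",
  JobId: "example-job-id",
  ProcessedAt: "2024-01-01T00:00:00.000Z"
};
const largeRecord = { ...smallRecord, ExtractedText: "x".repeat(500 * 1024) };

console.log(fitsInOneItem(smallRecord)); // true
console.log(fitsInOneItem(largeRecord)); // false
```

If the check fails, one option is to store the full text in S3 and keep only a pointer in the table; that choice is outside what this project implements.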
04 Proof of Work

Implementation Walkthrough

Screenshots from the live AWS environment showing each stage of the pipeline working end-to-end.

S3 Document Upload
Amazon S3 · Entry Point
Documents uploaded into S3 automatically trigger the serverless processing workflow.
Lambda Trigger Workflow
AWS Lambda · Trigger
Lambda function connected to S3 events to start asynchronous Textract processing.
SNS Notification Trigger
Amazon SNS · Async Notification
SNS notifications trigger downstream processing once OCR extraction is complete.
CloudWatch Monitoring
CloudWatch · Monitoring
CloudWatch logs showing successful Lambda execution and Textract job processing.
DynamoDB Processed Output
DynamoDB · Output
Processed intake data successfully stored in DynamoDB after OCR extraction and workflow completion.
Extracted Record Detail
DynamoDB · Record Detail
The full extracted text from a client intake form, including name, DOB, address, and reason for visit, stored in a single record keyed by file name.
05 Best Practices Applied

Architecture Decisions

Asynchronous Processing
The synchronous Textract API was not a viable option here. Client intake forms vary in length, and the synchronous operations accept only single-page documents. Using StartDocumentTextDetection means the pipeline handles multi-page PDFs without modification, up to the size and page limits Textract documents for asynchronous jobs.
Event-Driven Design
Every component in this pipeline reacts to something that already happened. Nothing polls, nothing waits, and nothing runs unless a real event triggers it. That pattern keeps idle cost near zero and reflects how production document processing systems actually behave at scale.
Least-Privilege IAM
Each Lambda function has its own role scoped to only the permissions it actually needs to do its job. This was not optional. In a HIPAA-aligned environment a shared role with broad permissions is a compliance risk, not just a best practice violation. Scoping at the function level means a misconfiguration in one component cannot cascade permissions across the pipeline.
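To make the scoping concrete, the sketch below builds two illustrative IAM policy documents as plain JavaScript objects. It is not the project's actual policy set: the function names, the account ID, and the region are placeholders, logging permissions are omitted, and the exact actions a real deployment needs should be confirmed against the AWS documentation.

```javascript
// Hypothetical policy for Lambda #1: read uploaded objects and start Textract jobs.
function startJobPolicy({ bucketName }) {
  return {
    Version: "2012-10-17",
    Statement: [
      { Effect: "Allow", Action: ["s3:GetObject"], Resource: `arn:aws:s3:::${bucketName}/*` },
      { Effect: "Allow", Action: ["textract:StartDocumentTextDetection"], Resource: "*" }
    ]
  };
}

// Hypothetical policy for Lambda #2: fetch Textract results and write to one table.
function processResultsPolicy({ tableArn }) {
  return {
    Version: "2012-10-17",
    Statement: [
      { Effect: "Allow", Action: ["textract:GetDocumentTextDetection"], Resource: "*" },
      { Effect: "Allow", Action: ["dynamodb:PutItem"], Resource: tableArn }
    ]
  };
}

// Flattens a policy into its list of allowed actions.
const listActions = (policy) => policy.Statement.flatMap((s) => s.Action);

const lambda1Policy = startJobPolicy({ bucketName: "client-intake-forms-bucket" });
const lambda2Policy = processResultsPolicy({
  // Placeholder account ID and region.
  tableArn: "arn:aws:dynamodb:us-east-1:123456789012:table/ClientIntakeForms"
});

console.log(listActions(lambda1Policy));
console.log(listActions(lambda2Policy));
```

The point of the split is visible in the output: neither role can perform the other's actions, so Lambda #1 cannot write to the table and Lambda #2 cannot read the bucket.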
Pagination Handling
Textract returns results in pages and stops giving you data if you stop asking for it. Lambda #2 loops through every page using NextToken before writing to DynamoDB. Skipping that step would mean silently dropping text from longer documents with no error and no indication anything went wrong.
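The pagination loop can be exercised without calling Textract by passing the page-fetching step in as a function. This is a sketch under that assumption: `collectAllBlocks`, `fakePages`, and `fakeFetchPage` are hypothetical names, and the stub pages hold invented data.

```javascript
// Collects blocks from every page; fetchPage(token) resolves to { Blocks, NextToken }.
async function collectAllBlocks(fetchPage) {
  const blocks = [];
  let nextToken;
  do {
    const page = await fetchPage(nextToken);
    blocks.push(...(page.Blocks || []));
    nextToken = page.NextToken;
  } while (nextToken);
  return blocks;
}

// Stub standing in for Textract: three pages chained by token (invented data).
const fakePages = {
  start: {
    Blocks: [{ BlockType: "PAGE" }, { BlockType: "LINE", Text: "Name: Jane Doe" }],
    NextToken: "t1"
  },
  t1: { Blocks: [{ BlockType: "LINE", Text: "DOB: 01/01/1980" }], NextToken: "t2" },
  t2: { Blocks: [{ BlockType: "LINE", Text: "Reason for visit: checkup" }] }
};
const fakeFetchPage = async (token) => fakePages[token || "start"];

collectAllBlocks(fakeFetchPage).then((blocks) => {
  const lines = blocks.filter((b) => b.BlockType === "LINE").map((b) => b.Text);
  console.log(lines.length); // 3
});
```

A version that stopped after the first page would report one line here instead of three, which is exactly the silent truncation described above.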
Environment Variables
Every ARN and resource name is stored as an environment variable. Nothing is hardcoded. That decision makes the project portable across environments and means resource names can change without touching the function code.
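A missing variable otherwise surfaces only when an AWS call fails, so one option is to validate configuration up front. The sketch below uses the variable names from the two functions (`SNS_TOPIC_ARN`, `TEXTRACT_ROLE_ARN`, `DYNAMODB_TABLE`); the `loadConfig` helper itself is hypothetical, and the ARNs are placeholders.

```javascript
// Hypothetical helper: returns the required settings or throws a message naming what is missing.
function loadConfig(env, requiredNames) {
  const missing = requiredNames.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return Object.fromEntries(requiredNames.map((name) => [name, env[name]]));
}

// Sample environment with placeholder values; a real function would pass process.env.
const sampleEnv = {
  SNS_TOPIC_ARN: "arn:aws:sns:us-east-1:123456789012:TextractJobComplete",
  TEXTRACT_ROLE_ARN: "arn:aws:iam::123456789012:role/TextractSNSRole"
};

const lambda1Config = loadConfig(sampleEnv, ["SNS_TOPIC_ARN", "TEXTRACT_ROLE_ARN"]);
console.log(Object.keys(lambda1Config));
```

Calling `loadConfig(process.env, [...])` at module load would make a misconfigured deployment fail on the first invocation with a clear message.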
Single Responsibility
Each component does one thing. Lambda #1 starts the job. Lambda #2 retrieves the results. SNS carries the signal between them. Keeping responsibilities separated means any component can be updated, replaced, or debugged without touching the rest of the pipeline.
06 Program Delivery Layer

More Than a Technical Build

This project was structured as a formal cloud migration feasibility program, not just a technical build. I produced architecture decision records documenting the tradeoffs behind each service choice, a risk assessment covering HIPAA alignment and data handling, and a phased implementation roadmap designed to guide engineering adoption across the organization. That structure reflects how I approach cloud delivery: the architecture has to be something a team can actually adopt and operate, not just something that works in isolation.

While researching HIPAA compliance for transitions of this nature, I found that one consideration stands out: the AWS Shared Responsibility Model. AWS secures the underlying cloud infrastructure, but the agency itself is responsible for configuring it securely. That distinction matters significantly in a healthcare environment and is something this engagement is actively working through as part of ongoing planning and research. Building the architecture is only part of the work. Understanding what it takes to operate it safely is the other half.

07 Resources Created

What Was Built

Resource Type    Name                        Purpose
S3 Bucket        client-intake-forms-bucket  Stores uploaded PDFs and serves as the entry point of the pipeline
Lambda Function  StartTextractJob            Triggered by S3 PUT and starts the async Textract job
Lambda Function  ProcessTextractResults      Triggered by SNS and retrieves results then writes to DynamoDB
SNS Topic        TextractJobComplete         Notifies Lambda #2 when Textract finishes
DynamoDB Table   ClientIntakeForms           Stores extracted text keyed by file name
IAM Role         LambdaTextractStartRole     Grants Lambda #1 permission to read S3 and start Textract
IAM Role         TextractSNSRole             Allows Textract to publish completion to SNS
IAM Role         LambdaTextractResultRole    Grants Lambda #2 permission to fetch results and write to DynamoDB