
cass-spark Overview

cass-spark is a CASS (Coding Accuracy Support System) address validation service built on Apache Spark, integrated with Melissa Data for address standardization and with AWS SQS for message-driven processing.

Purpose

cass-spark processes addresses through CASS validation using Melissa Data’s address object library. It standardizes and validates US postal addresses, adding critical postal data elements required for direct mail campaigns:

  • Address Standardization - Corrects and formats addresses to USPS standards
  • DPV (Delivery Point Validation) - Confirms address deliverability
  • ZIP+4 Assignment - Adds 4-digit ZIP extensions for postal discounts
  • Carrier Route - Assigns postal carrier routes for mail sorting
  • FIPS Codes - Adds county FIPS codes for geographic analysis

Architecture

Directory Structure

cass-spark/
├── pom.xml                              # Maven build configuration (v335.0.0-SNAPSHOT)
├── build.sh                             # Assembles runnable distribution
├── README.md                            # Project documentation
└── src/
    ├── main/
    │   ├── scala/com/path2response/coop/
    │   │   ├── cli/
    │   │   │   ├── CassLocal.scala      # Core CASS processing with Spark
    │   │   │   ├── WatchForMessage.scala # SQS message listener/processor
    │   │   │   ├── CheckSQSMessages.scala # SQS message checker utility
    │   │   │   └── ListSQSQueues.scala   # SQS queue listing utility
    │   │   └── util/
    │   │       ├── SQSUtil.scala        # SQS helper functions
    │   │       └── SQSMessage.scala     # SQS message data models
    │   └── scripts/bin/
    │       ├── casslocal.sh             # Spark submit wrapper for CASS
    │       ├── watchForMessage.sh       # Message watcher launcher
    │       ├── checkSQSMessage.sh       # Message check utility
    │       ├── listSQSQueue.sh          # Queue listing utility
    │       └── install.sh               # Installation script
    └── test/scala/com/path2response/coop/cli/
        └── CassLocalSpec.scala          # Unit tests for address processing

Technology Stack

| Component | Version | Notes |
|---|---|---|
| Scala | 2.12.18 | Locked to Spark version |
| Spark | 3.5.2 | Distributed processing engine |
| Java | 1.8 | Compilation target |
| AWS SDK | 2.34.8 | SQS integration |
| Jackson | 2.15.2 | JSON serialization |
| Melissa Data mdAddr | 0.0.1 | CASS processing via JNI |
| ScalaTest | (via spark-testing-base) | Testing framework |

Dependencies

Internal (Path2Response):

  • spark-common (coop-scala v332.0.0) - Shared Spark utilities
  • convert (coop-scala v332.0.0) - Data conversion utilities

External:

  • Melissa Data mdaddr - CASS address object (JNI wrapper)
  • AWS SDK v2 for SQS operations

Core Functionality

Command Line Tools

| Tool | Purpose | Script |
|---|---|---|
| casslocal | Process addresses through CASS using Spark | casslocal.sh |
| watchForMessage | Monitor SQS queue and trigger CASS runs | watchForMessage.sh |
| checkSQSMessage | Check for pending messages on a queue | checkSQSMessage.sh |
| listSQSQueues | List available SQS queues | listSQSQueue.sh |

CASS Processing (CassLocal)

The ProcessCassLocal Spark application:

  1. Input: JSON-formatted address file from S3 (AddressRequest format)
  2. Processing:
    • Loads addresses and sorts by ZIP code for cache efficiency
    • Initializes Melissa Data address object per partition (JNI)
    • Validates each address through CASS
    • Returns standardized address with postal codes
  3. Output: JSON-formatted results with CASS data (MDResponse format)
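The partition-wise pattern in step 2 can be sketched in plain Scala (no Spark, and `AddressObject`/`validate` are illustrative stand-ins for the Melissa Data JNI API, not the real bindings):

```scala
// Illustrative stand-ins; the real Melissa Data mdAddr JNI API differs.
case class AddressRequest(id: String, address1: String, address2: String,
                          city: String, state: String, zip: String)
case class CassResult(id: String, dpv: String)

class AddressObject {
  // In the real service this loads /opt/melissadata/data via JNI;
  // here it simply marks every address deliverable.
  def validate(req: AddressRequest): CassResult = CassResult(req.id, "Y")
}

object CassSketch {
  // Mirrors the CassLocal flow: sort by ZIP for cache locality,
  // then initialize one address object per partition of work.
  def process(requests: Seq[AddressRequest]): Seq[CassResult] = {
    val sorted = requests.sortBy(_.zip)
    sorted.grouped(1000).flatMap { partition =>
      val ao = new AddressObject // one JNI handle per partition
      partition.map(ao.validate)
    }.toSeq
  }
}
```

In the actual Spark job this shape maps onto `mapPartitions`, which is what makes the per-partition JNI initialization worthwhile.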

Address Input Format (AddressRequest):

{
  "id": "unique-identifier",
  "address1": "123 MAIN STREET",
  "address2": "APT 2",
  "city": "DENVER",
  "state": "CO",
  "zip": "80202"
}

CASS Output Format (MDResponse):

{
  "id": "unique-identifier",
  "cass": {
    "address1": "123 MAIN ST",
    "address2": "APT 2",
    "city": "DENVER",
    "state": "CO",
    "zip": "80202",
    "plus4": "1234",
    "dpc": "01",
    "dpcd": "2",
    "dpv": "Y",
    "countryFips": "08031",
    "carrierRoute": "C001",
    "dpvFootNote": "AABB",
    "zipType": " ",
    "recordType": "S"
  }
}

Key CASS Fields:

| Field | Description |
|---|---|
| dpv | Delivery Point Validation (Y/N); an AS01 result code means deliverable |
| plus4 | ZIP+4 extension for postal discounts |
| dpc | Delivery Point Code (barcode component) |
| dpcd | Delivery Point Check Digit |
| carrierRoute | Postal carrier route code |
| countryFips | County FIPS code |
| recordType | Address type code (S = Street, H = Highrise, etc.) |
| zipType | ZIP code type (P = PO Box, U = Unique, M = Military) |
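The dpv determination described above can be illustrated with a small helper (the comma-separated result-code handling is a simplification of how Melissa Data actually reports codes):

```scala
object DpvSketch {
  // Melissa Data reports result codes such as "AS01,AS16";
  // AS01 indicates a fully deliverable address (dpv = "Y").
  def dpvFlag(resultCodes: String): String =
    if (resultCodes.split(",").map(_.trim).contains("AS01")) "Y" else "N"
}
```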

SQS Message Processing (WatchForMessage)

The message watcher provides automated CASS processing:

  1. Monitoring: Polls SQS FIFO queue for S3 event notifications
  2. Triggering: When file appears in toCass/ folder, triggers CASS processing
  3. Processing: Runs casslocal.sh against the input file
  4. Cleanup:
    • Moves results to fromCass/ folder
    • Removes processed input file
    • Deletes SQS message from queue

Configuration Options:

| Option | Default | Description |
|---|---|---|
| --queue-uri | (required) | SQS FIFO queue URL |
| --cass-cmd | casslocal.sh | CASS processing command |
| --in-folder | toCass | Input folder prefix |
| --out-folder | fromCass | Output folder prefix |
| --log-folder | . | Log file directory |
| --wait-time | 20 | SQS poll interval (0-20 seconds) |
| --run-once | false | Exit after one message |
| --dry-run | false | Check messages without processing |
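The watcher's core control flow can be sketched with the SQS calls abstracted out (receive/process/delete are injected so the sketch runs without AWS; the real tool uses the AWS SDK v2 SQS client):

```scala
object WatchSketch {
  // One poll iteration: receive a message, run the CASS command against
  // it, and delete the message only if processing succeeded.
  // Returns the number of messages handled (0 or 1).
  def pollOnce(receive: () => Option[String],
               process: String => Boolean,
               delete: String => Unit): Int =
    receive() match {
      case Some(body) if process(body) =>
        delete(body) // remove from queue only after success
        1
      case _ => 0 // empty poll, or processing failed: leave message queued
    }
}
```

With `--run-once` the tool exits after one handled message; otherwise it keeps calling this loop at the configured `--wait-time` interval.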

Melissa Data Integration

JNI Configuration

The Melissa Data address object is accessed via JNI (Java Native Interface):

Data Files Location: /opt/melissadata/data/

Required files:

  • mdAddr.dat - Main address data
  • mdAddr.lic - License file
  • mdAddr.nat - National data
  • mdAddr.str - Street data

CASS Add-ons (for highest validation level):

  • DPV (Delivery Point Validation)
  • LACSLink (Locatable Address Conversion System)
  • SuiteLink (Suite/apartment number validation)

Spark Configuration:

--conf 'spark.driver.extraLibraryPath=/opt/melissadata/AddrObj'
--conf 'spark.executor.extraLibraryPath=/opt/melissadata/AddrObj'
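A full invocation along the lines of casslocal.sh might look like the following (the main class and JAR path are assumptions for illustration; check the actual script):

```shell
# Hypothetical spark-submit wrapper; class and JAR names are illustrative.
spark-submit \
  --class com.path2response.coop.cli.CassLocal \
  --conf 'spark.driver.extraLibraryPath=/opt/melissadata/AddrObj' \
  --conf 'spark.executor.extraLibraryPath=/opt/melissadata/AddrObj' \
  ~/cass-spark/lib/cass-spark.jar \
  "$@"
```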

License Management

The Melissa Data license can be set via:

  1. Environment variable: MDADDR_LICENSE
  2. Programmatic: ao.SetLicenseString("LICENSE_KEY")

License information logged at startup:

  • License expiration date
  • Build number
  • Database date
  • Database expiration date

Data Flow

CASS Processing Pipeline

┌──────────────────┐
│  Data Tools      │  (cassRunner service monitors for CASS-ready files)
│  cassRunner      │
└────────┬─────────┘
         │ Upload to S3
         ▼
┌──────────────────┐    SQS Notification    ┌──────────────────┐
│   S3 Bucket      │ ───────────────────────│   SQS FIFO       │
│   toCass/        │                        │   Queue          │
└────────┬─────────┘                        └────────┬─────────┘
         │                                           │
         │                                           │ Poll
         │                                           ▼
         │                                  ┌──────────────────┐
         │                                  │ watchForMessage  │
         │                                  │   (listener)     │
         │                                  └────────┬─────────┘
         │                                           │
         │              ┌───────────────────────────┘
         │              │ Trigger
         ▼              ▼
┌──────────────────────────────────────────┐
│              casslocal                    │
│     (Spark job on melissa-test server)   │
│                                          │
│  ┌─────────────┐    ┌─────────────────┐  │
│  │ Spark       │───▶│ Melissa Data    │  │
│  │ Partitions  │    │ mdAddr (JNI)    │  │
│  └─────────────┘    └─────────────────┘  │
└────────────────────────┬─────────────────┘
                         │
                         ▼ Results
┌──────────────────┐
│   S3 Bucket      │
│   fromCass/      │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Data Tools      │  (processes CASS results)
└──────────────────┘

Integration with Data Tools

  1. Data Tools cassRunner monitors files that have completed convert and been marked “Review Complete”
  2. Files are uploaded to S3 toCass/ folder
  3. S3 event triggers SQS notification
  4. cass-spark watchForMessage detects the notification
  5. CASS processing runs via Spark
  6. Results written to fromCass/ folder
  7. Data Tools receives CASS-enriched data for downstream processing
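Step 4 amounts to pulling the object key out of the S3 event notification carried in the SQS message body. A simplified extraction (regex instead of a proper JSON parser, against an abbreviated sample payload) might look like:

```scala
object S3EventSketch {
  // Matches the "key" field of an S3 event notification record.
  private val KeyPattern = "\"key\"\\s*:\\s*\"([^\"]+)\"".r

  // Returns the object key only if it lives under the toCass/ input prefix.
  def cassInputKey(notificationJson: String): Option[String] =
    KeyPattern.findFirstMatchIn(notificationJson)
      .map(_.group(1))
      .filter(_.startsWith("toCass/"))
}
```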

AWS Infrastructure

S3 Buckets

| Bucket / Prefix | Purpose |
|---|---|
| p2r-cass-test-use1 | Test environment bucket |
| toCass/ | Input folder for CASS requests |
| fromCass/ | Output folder for CASS results |

SQS Queues

| Queue | Type | Purpose |
|---|---|---|
| p2r-cass-test-use1-toCass.fifo | FIFO | Input notifications |
| p2r-cass-test-use1-fromCass.fifo | FIFO | Output notifications |

Region: us-east-1

Server

| Host | Purpose |
|---|---|
| melissa-test.path2.response.com | CASS processing server with Melissa Data installation |

Development

Building

# Full build (includes mvn install which triggers build.sh)
mvn clean install

# Output: target/cass-spark.tar.gz

The build process:

  1. Maven compiles Scala code
  2. Creates shaded JAR with dependencies
  3. build.sh assembles distribution package:
    • Copies dependencies to lib/
    • Copies shell scripts to bin/
    • Creates tarball for deployment

Installation

  1. Copy install.sh from the target folder to melissa-test.path2.response.com
  2. Run install.sh to install to ~/cass-spark/

# On melissa-test server
bash install.sh

Running Locally

Prerequisites:

  • Melissa Data installed at /opt/melissadata/
  • Valid Melissa Data license
  • Spark installed and configured
  • AWS credentials configured

Verify SQS access:

aws --region us-east-1 sqs list-queues
aws --region us-east-1 sqs receive-message --queue-url 'https://sqs.us-east-1.amazonaws.com/448838825215/p2r-cass-test-use1-toCass.fifo'

Run message watcher:

bash cass-spark/bin/watchForMessage.sh \
  --cass-cmd ~/cass-spark/bin/casslocal.sh \
  --out-folder fromCass \
  --in-folder toCass \
  --log-folder . \
  --queue-uri 'https://sqs.us-east-1.amazonaws.com/448838825215/p2r-cass-test-use1-toCass.fifo' \
  --run-once

Testing

# Run unit tests
mvn test

# Test address processing logic (CassLocalSpec)
# Tests cover:
# - Empty address handling
# - Simple address processing
# - Two-line address combination
# - DPV Y/N determination
# - Invalid character removal
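The two-line combination and invalid-character behaviors listed above can be illustrated with a plain-Scala sketch (the real logic lives in CassLocal; these helper names and the accepted character set are invented for illustration):

```scala
object AddressPrepSketch {
  // Combine two address lines into one, dropping an empty second line.
  def combineLines(a1: String, a2: String): String =
    Seq(a1, a2).map(_.trim).filter(_.nonEmpty).mkString(" ")

  // Strip characters outside a set CASS processing accepts
  // (the character set here is a simplification for the sketch).
  def stripInvalid(s: String): String =
    s.filter(c => c.isLetterOrDigit || " #/-.".contains(c))
}
```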

Debugging

Set environment variable for verbose output:

export CASS_DEBUG=true

This enables detailed logging of:

  • Input address parsing
  • Melissa Data API calls
  • Result code interpretation

Related Documentation

| Document | Location | Description |
|---|---|---|
| CASS Processing (Data Tools) | data-tools-overview.md | cassRunner service |
| coop-scala Extract | coop-scala-overview.md | NCOA address extraction |
| Path2Acquisition Flow | path2acquisition-flow.md | Campaign data flow |
| Glossary | glossary.md | CASS and postal terms |

Key Integration Points

coop-scala

  • Uses spark-common and convert modules from coop-scala
  • Shares version numbering with coop-scala (currently v332.0.0 dependencies)
  • extract-ncoa module in coop-scala handles NCOA-specific extraction

Data Tools

  • cassRunner service triggers CASS processing via S3/SQS
  • Coordinates with convert workflow (post-Review Complete)
  • Receives CASS results for downstream processing

NCOA (National Change of Address)

CASS is a prerequisite for NCOA processing:

  1. Addresses must be CASS-certified before NCOA submission
  2. NCOA updates movers’ addresses using USPS change-of-address data
  3. coop-scala extract-ncoa module handles NCOA-specific workflows

Source: README.md, pom.xml, src/main/scala/, src/main/scripts/bin/

Repository: cass-spark (Bitbucket)

Documentation created: 2026-01-24