# cass-spark Overview

cass-spark is a CASS (Coding Accuracy Support System) address validation service built on Apache Spark, integrated with Melissa Data for address standardization and with AWS SQS for message-driven processing.
## Purpose
cass-spark processes addresses through CASS validation using Melissa Data’s address object library. It standardizes and validates US postal addresses, adding critical postal data elements required for direct mail campaigns:
- Address Standardization - Corrects and formats addresses to USPS standards
- DPV (Delivery Point Validation) - Confirms address deliverability
- ZIP+4 Assignment - Adds 4-digit ZIP extensions for postal discounts
- Carrier Route - Assigns postal carrier routes for mail sorting
- FIPS Codes - Adds county FIPS codes for geographic analysis
## Architecture

### Directory Structure

```
cass-spark/
├── pom.xml        # Maven build configuration (v335.0.0-SNAPSHOT)
├── build.sh       # Assembles runnable distribution
├── README.md      # Project documentation
└── src/
    ├── main/
    │   ├── scala/com/path2response/coop/
    │   │   ├── cli/
    │   │   │   ├── CassLocal.scala          # Core CASS processing with Spark
    │   │   │   ├── WatchForMessage.scala    # SQS message listener/processor
    │   │   │   ├── CheckSQSMessages.scala   # SQS message checker utility
    │   │   │   └── ListSQSQueues.scala      # SQS queue listing utility
    │   │   └── util/
    │   │       ├── SQSUtil.scala            # SQS helper functions
    │   │       └── SQSMessage.scala         # SQS message data models
    │   └── scripts/bin/
    │       ├── casslocal.sh          # Spark submit wrapper for CASS
    │       ├── watchForMessage.sh    # Message watcher launcher
    │       ├── checkSQSMessage.sh    # Message check utility
    │       ├── listSQSQueue.sh       # Queue listing utility
    │       └── install.sh            # Installation script
    └── test/scala/com/path2response/coop/cli/
        └── CassLocalSpec.scala       # Unit tests for address processing
```
### Technology Stack
| Component | Version | Notes |
|---|---|---|
| Scala | 2.12.18 | Locked to Spark version |
| Spark | 3.5.2 | Distributed processing engine |
| Java | 1.8 | Compilation target |
| AWS SDK | 2.34.8 | SQS integration |
| Jackson | 2.15.2 | JSON serialization |
| Melissa Data mdAddr | 0.0.1 | CASS processing via JNI |
| ScalaTest | (via spark-testing-base) | Testing framework |
### Dependencies

Internal (Path2Response):

- `spark-common` (coop-scala v332.0.0) - Shared Spark utilities
- `convert` (coop-scala v332.0.0) - Data conversion utilities

External:

- Melissa Data `mdaddr` - CASS address object (JNI wrapper)
- AWS SDK v2 for SQS operations
## Core Functionality

### Command Line Tools
| Tool | Purpose | Script |
|---|---|---|
| casslocal | Process addresses through CASS using Spark | casslocal.sh |
| watchForMessage | Monitor SQS queue and trigger CASS runs | watchForMessage.sh |
| checkSQSMessage | Check for pending messages on a queue | checkSQSMessage.sh |
| listSQSQueues | List available SQS queues | listSQSQueue.sh |
### CASS Processing (CassLocal)

The ProcessCassLocal Spark application:

- Input: JSON-formatted address file from S3 (`AddressRequest` format)
- Processing:
  - Loads addresses and sorts them by ZIP code for cache efficiency
  - Initializes one Melissa Data address object per partition (JNI)
  - Validates each address through CASS
  - Returns the standardized address with postal codes
- Output: JSON-formatted results with CASS data (`MDResponse` format)
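The per-partition initialization is the performance-critical detail: the JNI address object is expensive to construct, so it is created once per Spark partition and reused for every record in that partition. A minimal sketch of the pattern, where `AddrObj` and its `verify` method are illustrative stand-ins for the real mdAddr JNI wrapper (whose exact API is not documented here), and `AddressRequest` mirrors the input format shown below:

```scala
import org.apache.spark.rdd.RDD

// Stand-in for the Melissa Data mdAddr JNI wrapper; the trait and its
// `verify` method are assumptions for illustration, not the real API.
trait AddrObj {
  def SetLicenseString(key: String): Unit
  def verify(a: AddressRequest): Map[String, String]
}

case class AddressRequest(id: String, address1: String, address2: String,
                          city: String, state: String, zip: String)

object CassSketch {
  def runCass(requests: RDD[AddressRequest],
              mkAddrObj: () => AddrObj): RDD[(String, Map[String, String])] =
    requests
      .sortBy(_.zip)                 // ZIP order keeps Melissa's data-file cache warm
      .mapPartitions { addresses =>
        // One JNI object per partition, not per record: initialization is expensive.
        val ao = mkAddrObj()
        ao.SetLicenseString(sys.env.getOrElse("MDADDR_LICENSE", ""))
        addresses.map(a => a.id -> ao.verify(a))
      }
}
```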
Address Input Format (`AddressRequest`):

```json
{
"id": "unique-identifier",
"address1": "123 MAIN STREET",
"address2": "APT 2",
"city": "DENVER",
"state": "CO",
"zip": "80202"
}
```

CASS Output Format (`MDResponse`):

```json
{
"id": "unique-identifier",
"cass": {
"address1": "123 MAIN ST",
"address2": "APT 2",
"city": "DENVER",
"state": "CO",
"zip": "80202",
"plus4": "1234",
"dpc": "01",
"dpcd": "2",
"dpv": "Y",
"countryFips": "08031",
"carrierRoute": "C001",
"dpvFootNote": "AABB",
"zipType": " ",
"recordType": "S"
}
}
```

Key CASS Fields:
| Field | Description |
|---|---|
| `dpv` | Delivery Point Validation (Y/N) - AS01 result code means deliverable |
| `plus4` | ZIP+4 extension for postal discounts |
| `dpc` | Delivery Point Code (barcode component) |
| `dpcd` | Delivery Point Check Digit |
| `carrierRoute` | Postal carrier route code |
| `countryFips` | County FIPS code |
| `recordType` | Address type code (S=Street, H=Highrise, etc.) |
| `zipType` | ZIP code type (P=PO Box, U=Unique, M=Military) |
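These formats map directly onto case classes through the project's Jackson dependency. A minimal round-trip sketch, assuming `jackson-module-scala` is on the classpath (the case class is re-declared here so the snippet stands alone):

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

case class AddressRequest(id: String, address1: String, address2: String,
                          city: String, state: String, zip: String)

object AddressJson {
  // One shared mapper; DefaultScalaModule adds case-class (de)serialization.
  private val mapper = new ObjectMapper().registerModule(DefaultScalaModule)

  def parse(json: String): AddressRequest =
    mapper.readValue(json, classOf[AddressRequest])

  def render(request: AddressRequest): String =
    mapper.writeValueAsString(request)
}
```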
### SQS Message Processing (WatchForMessage)
The message watcher provides automated CASS processing:
- Monitoring: Polls an SQS FIFO queue for S3 event notifications
- Triggering: When a file appears in the `toCass/` folder, triggers CASS processing
- Processing: Runs `casslocal.sh` against the input file
- Cleanup:
  - Moves results to the `fromCass/` folder
  - Removes the processed input file
  - Deletes the SQS message from the queue
Configuration Options:
| Option | Default | Description |
|---|---|---|
| `--queue-uri` | (required) | SQS FIFO queue URL |
| `--cass-cmd` | `casslocal.sh` | CASS processing command |
| `--in-folder` | `toCass` | Input folder prefix |
| `--out-folder` | `fromCass` | Output folder prefix |
| `--log-folder` | `.` | Log file directory |
| `--wait-time` | 20 | SQS poll interval (0-20 seconds) |
| `--run-once` | false | Exit after one message |
| `--dry-run` | false | Check messages without processing |
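Internally the watcher amounts to the standard SQS long-poll / process / delete cycle. A one-shot sketch using the AWS SDK v2 (the queue URL is the test queue listed under AWS Infrastructure below; the actual processing step is elided):

```scala
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.sqs.SqsClient
import software.amazon.awssdk.services.sqs.model.{DeleteMessageRequest, ReceiveMessageRequest}

object PollOnce {
  def main(args: Array[String]): Unit = {
    val queueUrl =
      "https://sqs.us-east-1.amazonaws.com/448838825215/p2r-cass-test-use1-toCass.fifo"
    val sqs = SqsClient.builder().region(Region.US_EAST_1).build()

    val response = sqs.receiveMessage(
      ReceiveMessageRequest.builder()
        .queueUrl(queueUrl)
        .waitTimeSeconds(20)        // long poll, mirroring --wait-time
        .maxNumberOfMessages(1)
        .build())

    response.messages().forEach { message =>
      println(s"S3 event notification: ${message.body()}")
      // ... run casslocal.sh against the referenced file, move results ...
      // Delete only after successful processing, as the watcher does.
      sqs.deleteMessage(DeleteMessageRequest.builder()
        .queueUrl(queueUrl)
        .receiptHandle(message.receiptHandle())
        .build())
    }
  }
}
```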
## Melissa Data Integration

### JNI Configuration
The Melissa Data address object is accessed via JNI (Java Native Interface):
Data Files Location: `/opt/melissadata/data/`

Required files:

- `mdAddr.dat` - Main address data
- `mdAddr.lic` - License file
- `mdAddr.nat` - National data
- `mdAddr.str` - Street data

CASS Add-ons (for the highest validation level):
- DPV (Delivery Point Validation)
- LACSLink (Locatable Address Conversion System)
- SuiteLink (Suite/apartment number validation)
Spark Configuration:

```
--conf 'spark.driver.extraLibraryPath=/opt/melissadata/AddrObj'
--conf 'spark.executor.extraLibraryPath=/opt/melissadata/AddrObj'
```
### License Management

The Melissa Data license can be set via:

- Environment variable: `MDADDR_LICENSE`
- Programmatic: `ao.SetLicenseString("LICENSE_KEY")`

License information logged at startup:

- License expiration date
- Build number
- Database date
- Database expiration date
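In code this typically reduces to preferring the environment variable and falling back to an explicit key. A sketch, with `AddrObj` again an illustrative stand-in exposing the `SetLicenseString` call shown above:

```scala
// `AddrObj` is an illustrative stand-in for the real JNI class.
trait AddrObj { def SetLicenseString(key: String): Unit }

object LicenseInit {
  def initLicense(ao: AddrObj, fallbackKey: String): Unit = {
    // Prefer MDADDR_LICENSE from the environment; otherwise use the explicit key.
    val key = sys.env.getOrElse("MDADDR_LICENSE", fallbackKey)
    ao.SetLicenseString(key)
  }
}
```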
## Data Flow

### CASS Processing Pipeline

```
┌──────────────────┐
│ Data Tools │ (cassRunner service monitors for CASS-ready files)
│ cassRunner │
└────────┬─────────┘
│ Upload to S3
▼
┌──────────────────┐ SQS Notification ┌──────────────────┐
│ S3 Bucket │ ───────────────────────│ SQS FIFO │
│ toCass/ │ │ Queue │
└────────┬─────────┘ └────────┬─────────┘
│ │
│ │ Poll
│ ▼
│ ┌──────────────────┐
│ │ watchForMessage │
│ │ (listener) │
│ └────────┬─────────┘
│ │
│ ┌───────────────────────────┘
│ │ Trigger
▼ ▼
┌──────────────────────────────────────────┐
│ casslocal │
│ (Spark job on melissa-test server) │
│ │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ Spark │───▶│ Melissa Data │ │
│ │ Partitions │ │ mdAddr (JNI) │ │
│ └─────────────┘ └─────────────────┘ │
└────────────────────────┬─────────────────┘
│
▼ Results
┌──────────────────┐
│ S3 Bucket │
│ fromCass/ │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Data Tools │ (processes CASS results)
└──────────────────┘
```
### Integration with Data Tools
- Data Tools cassRunner monitors files that have completed convert and been marked “Review Complete”
- Files are uploaded to the S3 `toCass/` folder
- The S3 event triggers an SQS notification
- cass-spark watchForMessage detects the notification
- CASS processing runs via Spark
- Results are written to the `fromCass/` folder
- Data Tools receives CASS-enriched data for downstream processing
## AWS Infrastructure

### S3 Buckets
| Bucket / Folder | Purpose |
|---|---|
| `p2r-cass-test-use1` | Test environment bucket |
| `toCass/` | Input folder for CASS requests |
| `fromCass/` | Output folder for CASS results |
### SQS Queues
| Queue | Type | Purpose |
|---|---|---|
| `p2r-cass-test-use1-toCass.fifo` | FIFO | Input notifications |
| `p2r-cass-test-use1-fromCass.fifo` | FIFO | Output notifications |
Region: us-east-1
### Server
| Host | Purpose |
|---|---|
| melissa-test.path2.response.com | CASS processing server with Melissa Data installation |
## Development

### Building

```bash
# Full build (includes mvn install which triggers build.sh)
mvn clean install

# Output: target/cass-spark.tar.gz
```
The build process:

- Maven compiles the Scala code
- Creates a shaded JAR with dependencies
- `build.sh` assembles the distribution package:
  - Copies dependencies to `lib/`
  - Copies shell scripts to `bin/`
  - Creates a tarball for deployment
### Installation

- Copy `install.sh` from the target folder to melissa-test.path2.response.com
- Run `install.sh` to install to `~/cass-spark/`

```bash
# On melissa-test server
bash install.sh
```
### Running Locally

Prerequisites:

- Melissa Data installed at `/opt/melissadata/`
- Valid Melissa Data license
- Spark installed and configured
- AWS credentials configured
Verify SQS access:

```bash
aws --region us-east-1 sqs list-queues
aws --region us-east-1 sqs receive-message --queue-url 'https://sqs.us-east-1.amazonaws.com/448838825215/p2r-cass-test-use1-toCass.fifo'
```
Run the message watcher:

```bash
bash cass-spark/bin/watchForMessage.sh \
  --cass-cmd ~/cass-spark/bin/casslocal.sh \
  --out-folder fromCass \
  --in-folder toCass \
  --log-folder . \
  --queue-uri 'https://sqs.us-east-1.amazonaws.com/448838825215/p2r-cass-test-use1-toCass.fifo' \
  --run-once
```
### Testing

```bash
# Run unit tests
mvn test
```

The address processing tests (`CassLocalSpec`) cover:

- Empty address handling
- Simple address processing
- Two-line address combination
- DPV Y/N determination
- Invalid character removal
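For flavor, an illustrative spec in the spirit of those cases; `combineLines` and `stripInvalid` are hypothetical helpers, not the real CassLocal methods, and `AnyFunSuite` assumes ScalaTest 3.1+:

```scala
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical address-prep helpers, used only to illustrate the test cases.
object AddressPrep {
  def combineLines(a1: String, a2: String): String =
    Seq(a1, a2).map(_.trim).filter(_.nonEmpty).mkString(" ")

  def stripInvalid(s: String): String =
    s.replaceAll("[^A-Za-z0-9 #/.-]", "")
}

class AddressPrepSpec extends AnyFunSuite {
  test("empty address handling") {
    assert(AddressPrep.combineLines("", "") === "")
  }
  test("two-line address combination") {
    assert(AddressPrep.combineLines("123 MAIN ST", "APT 2") === "123 MAIN ST APT 2")
  }
  test("invalid character removal") {
    assert(AddressPrep.stripInvalid("123 MAIN ST #4@") === "123 MAIN ST #4")
  }
}
```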
### Debugging

Set an environment variable for verbose output:

```bash
export CASS_DEBUG=true
```
This enables detailed logging of:
- Input address parsing
- Melissa Data API calls
- Result code interpretation
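A guard of roughly this shape (a sketch; the real flag handling lives in the cass-spark code) keys the verbose output off that variable:

```scala
object CassDebug {
  // Treat anything other than a case-insensitive "true" as disabled.
  val enabled: Boolean =
    sys.env.get("CASS_DEBUG").exists(_.equalsIgnoreCase("true"))

  def log(msg: => String): Unit =
    if (enabled) println(s"[cass-debug] $msg")
}
```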
## Related Documentation
| Document | Location | Description |
|---|---|---|
| CASS Processing (Data Tools) | data-tools-overview.md | cassRunner service |
| coop-scala Extract | coop-scala-overview.md | NCOA address extraction |
| Path2Acquisition Flow | path2acquisition-flow.md | Campaign data flow |
| Glossary | glossary.md | CASS and postal terms |
## Key Integration Points

### coop-scala

- Uses the `spark-common` and `convert` modules from coop-scala
- Shares version numbering with coop-scala (currently v332.0.0 dependencies)
- The `extract-ncoa` module in coop-scala handles NCOA-specific extraction
### Data Tools

- The `cassRunner` service triggers CASS processing via S3/SQS
- Coordinates with the convert workflow (post-Review Complete)
- Receives CASS results for downstream processing
### NCOA (National Change of Address)

CASS is a prerequisite for NCOA processing:

- Addresses must be CASS-certified before NCOA submission
- NCOA updates movers’ addresses using USPS change-of-address data
- The coop-scala `extract-ncoa` module handles NCOA-specific workflows
Source: README.md, pom.xml, src/main/scala/, src/main/scripts/bin/
Repository: cass-spark (Bitbucket)
Documentation created: 2026-01-24