Overview
The Magnify Amazon S3 integration allows you to publish data files to your own S3 bucket, where Magnify will automatically ingest them on a recurring schedule. This guide will help you set up your S3 bucket, organize your data files, and follow best practices for reliable data ingestion.
Table of Contents
- Prerequisites
- AWS Setup & Security
- File Formats
- Organizing Your S3 Bucket
- File Naming Conventions
- Configuring Streams
- File Path Patterns
- Testing Your Setup
- Additional Resources
Prerequisites
Before setting up the S3 integration in Magnify, you'll need:
- An AWS account with S3 access
- An S3 bucket created in your AWS account
- AWS Access Key ID and Secret Access Key with appropriate permissions
- Data files ready to upload in a supported format
AWS Setup & Security
Creating IAM User and Access Keys
1. Create a dedicated IAM user for Magnify (recommended for security and auditing)
   - Log into AWS Console → IAM → Users → Create User
   - Name it something like magnify-s3-integration
   - Select "Access key - Programmatic access"
2. Generate Access Keys
   - After creating the user, generate an Access Key ID and Secret Access Key
   - Important: Your Secret Access Key is only shown once. Copy and store it securely immediately.
   - You'll enter these credentials when setting up the connection in Magnify
Required IAM Permissions
Attach the following IAM policy to your Magnify IAM user. This follows the principle of least privilege, granting only the permissions needed for Magnify to read your data:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
Replace your-bucket-name with your actual S3 bucket name.
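If you manage several buckets, the policy above can also be generated programmatically rather than edited by hand. A minimal sketch in Python (the function name is illustrative; only the bucket name varies):

```python
import json

def magnify_read_policy(bucket_name: str) -> str:
    """Build the read-only IAM policy JSON for a given S3 bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # bucket-level, for s3:ListBucket
                    f"arn:aws:s3:::{bucket_name}/*",  # object-level, for s3:GetObject
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Note that the bucket ARN and the object ARN (`/*`) are both required, matching the two permission levels described below.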
Important Notes:
- Both bucket-level (s3:ListBucket) and object-level (s3:GetObject) permissions are required
- Do not grant write, delete, or modification permissions to the Magnify IAM user
- This read-only access ensures Magnify cannot modify or delete your data
Security Best Practices
✅ DO:
- Use a dedicated IAM user specifically for Magnify
- Store credentials securely (password manager, secrets management service)
❌ DON'T:
- Use your root AWS account credentials
- Grant broader permissions than necessary (s3:* or full admin access)
File Formats
Magnify supports the CSV file format for S3 ingestion.
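If you generate CSV files yourself, a standard library such as Python's csv module handles quoting and escaping correctly (fields containing commas or quotes are quoted automatically, per RFC 4180). A minimal sketch, with illustrative column names:

```python
import csv
import io

def write_csv(rows: list[dict]) -> str:
    """Serialize rows with identical keys to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()   # first line: column names
    writer.writerows(rows)
    return buf.getvalue()
```

Avoid hand-concatenating CSV strings; unescaped commas or quotes inside field values are a common cause of ingestion errors.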
Organizing Your S3 Bucket
A well-organized S3 bucket structure makes data management easier and helps prevent ingestion errors.
Recommended Directory Structure
your-bucket-name/
└── magnify-ingest/
    ├── customers/
    │   ├── customers_20250101.csv
    │   ├── customers_20250102.csv
    │   └── customers_20250103.csv
    ├── orders/
    │   ├── orders_20250101.csv
    │   ├── orders_20250102.csv
    │   └── orders_20250103.csv
    └── events/
        ├── events_20250101.csv
        ├── events_20250102.csv
        └── events_20250103.csv
Organization Best Practices
- Use a dedicated root folder for Magnify data (e.g., magnify-ingest/)
- Separate by data type: Create folders for each entity (customers, orders, events)
- Keep consistent structure: Use the same pattern across all data types
File Naming Conventions
Consistent file naming makes troubleshooting easier and helps you track data over time.
Recommended Pattern
{entity_type}_{YYYYMMDD}_{optional_sequence}.csv
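A small helper makes it easy to produce names that follow this pattern consistently from your export jobs. A minimal sketch (the function name is illustrative):

```python
from datetime import date

def make_filename(entity_type: str, file_date: date, sequence: int = None) -> str:
    """Build a filename following {entity_type}_{YYYYMMDD}_{optional_sequence}.csv."""
    stamp = file_date.strftime("%Y%m%d")      # YYYYMMDD, zero-padded
    if sequence is not None:
        return f"{entity_type}_{stamp}_{sequence:03d}.csv"  # e.g. _001, _002
    return f"{entity_type}_{stamp}.csv"
```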
Naming Best Practices
✅ DO:
- Include the date in every filename (YYYYMMDD format recommended)
- Use lowercase letters for consistency
- Use underscores (_) to separate components
- Add sequence numbers for multiple files per day (_001, _002)
- Use descriptive entity names
❌ DON'T:
- Use spaces in filenames (customer data.csv)
- Use special characters (orders#final!.csv)
- Use inconsistent date formats (mix 20250105 and 01-05-2025)
- Create ambiguous names (data.csv, file1.csv)
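The conventions above can be enforced before upload with a simple check. A minimal sketch using a regular expression that mirrors the recommended pattern (lowercase entity name, YYYYMMDD date, optional three-digit sequence):

```python
import re

# Lowercase name, underscore-separated, 8-digit date, optional _NNN sequence.
FILENAME_RE = re.compile(r"^[a-z][a-z0-9_]*_\d{8}(_\d{3})?\.csv$")

def is_valid_filename(name: str) -> bool:
    """Return True if the filename follows the recommended naming pattern."""
    return FILENAME_RE.fullmatch(name) is not None
```

Running this over a batch of files before uploading catches spaces, special characters, and undated names early.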
File Size Recommendations
- Optimal: 50 MB - 500 MB per file
- Maximum: 1 GB
Configuring Streams
In Magnify, a stream represents a logical grouping of files that share the same data structure (schema). When setting up your S3 connection in the Magnify UI, you'll configure one or more streams.
What is a Stream?
Each stream needs:
- Name: A descriptive identifier (e.g., customers, orders, page_views)
- File Pattern: A glob pattern that matches files belonging to this stream
- Format: The file format (CSV)
Stream Configuration Guidelines
- One schema per stream: All files in a stream must have identical column structures
- Separate different data types: Create separate streams for customers, orders, events, etc.
- Use descriptive names: Choose names that clearly indicate what data the stream contains
- At least one stream required: Every S3 connection must have at least one configured stream
- Use a single bucket: All files for a connection must be placed in the same bucket, since each connection reads from only one bucket
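The "one schema per stream" rule can be checked locally before files are uploaded, by comparing header rows across a stream's files. A minimal sketch using the standard csv module (file paths are illustrative):

```python
import csv

def headers_match(paths: list[str]) -> bool:
    """Return True if every CSV file in the list has an identical header row."""
    headers = []
    for path in paths:
        with open(path, newline="") as f:
            headers.append(next(csv.reader(f)))  # first row = column names
    return all(h == headers[0] for h in headers)
```

Files whose headers differ (renamed, reordered, or added columns) belong in a separate stream or need to be fixed before upload.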
File Path Patterns (Globs)
When configuring streams in Magnify, you'll specify file path patterns using glob syntax. Globs are wildcard patterns that match multiple files.
Example Configuration
When setting up in Magnify UI, you might configure:
Stream 1: Customers
- Name: customers
- File Pattern: magnify-ingest/customers/*.csv
- Format: CSV
Stream 2: Orders
- Name: orders
- File Pattern: magnify-ingest/orders/*.csv
- Format: CSV
Stream 3: Events
- Name: page_views
- File Pattern: magnify-ingest/events/*.csv
- Format: CSV
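You can preview which stream a given S3 key would fall into before configuring anything, using any glob matcher. A minimal sketch with Python's fnmatch and the example patterns above (note that fnmatch's `*` also crosses `/`, so keep patterns anchored to a specific folder as shown):

```python
from fnmatch import fnmatch

# Stream name -> file pattern, as configured in the Magnify UI above.
STREAMS = {
    "customers": "magnify-ingest/customers/*.csv",
    "orders": "magnify-ingest/orders/*.csv",
    "page_views": "magnify-ingest/events/*.csv",
}

def match_stream(key: str):
    """Return the name of the first stream whose pattern matches the S3 key."""
    for name, pattern in STREAMS.items():
        if fnmatch(key, pattern):
            return name
    return None  # key would not be ingested by any stream
```

A key that matches no pattern is silently ignored, so running a listing of your bucket through a check like this helps confirm every file you expect to ingest is covered.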
Testing Your Setup
Before going live with your S3 integration, follow these steps to test:
1. Verify IAM Permissions
Test that your IAM user can access the bucket:
aws s3 ls s3://your-bucket-name/magnify-ingest/ --profile magnify-user
If this command succeeds, your permissions are correctly configured.
Additional Resources
- AWS S3 Documentation: https://docs.aws.amazon.com/s3/
- AWS IAM Best Practices: https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html
- CSV Format Specification: https://tools.ietf.org/html/rfc4180
- Glob Pattern Tester: https://globster.xyz/