Overview
The Magnify Amazon S3 integration allows you to publish data files to your own S3 bucket, where Magnify will automatically ingest them on a recurring schedule. This guide will help you set up your S3 bucket, organize your data files, and follow best practices for reliable data ingestion.
Table of Contents
- Prerequisites
- AWS Setup & Security
- File Formats
- Organizing Your S3 Bucket
- File Naming Conventions
- Configuring Streams
- File Path Patterns
- Testing Your Setup
- Additional Resources
Prerequisites
Before setting up the S3 integration in Magnify, you'll need:
- An AWS account with S3 access
- An S3 bucket created in your AWS account
- AWS Access Key ID and Secret Access Key with appropriate permissions
- Data files ready to upload in a supported format
AWS Setup & Security
Creating IAM User and Access Keys
1. Create a dedicated IAM user for Magnify (recommended for security and auditing)
   - Log into AWS Console → IAM → Users → Create User
   - Name it something like magnify-s3-integration
   - Select "Access key - Programmatic access"
2. Generate Access Keys
   - After creating the user, generate an Access Key ID and Secret Access Key
   - Important: Your Secret Access Key is only shown once. Copy and store it securely immediately.
   - You'll enter these credentials when setting up the connection in Magnify
Required IAM Permissions
Attach the following IAM policy to your Magnify IAM user. This follows the principle of least privilege, granting only the permissions needed for Magnify to read your data:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
Replace your-bucket-name with your actual S3 bucket name.
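If you manage several buckets, the policy above can also be generated programmatically rather than edited by hand. A minimal sketch in Python (the function name is illustrative; only the bucket name varies):

```python
import json

def magnify_read_policy(bucket_name: str) -> str:
    """Build the read-only IAM policy JSON for a given S3 bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # bucket-level, for s3:ListBucket
                    f"arn:aws:s3:::{bucket_name}/*",  # object-level, for s3:GetObject
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Note that the bucket ARN and the object ARN (`/*`) are both required, matching the two permission levels described below.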
Important Notes:
- Both bucket-level (s3:ListBucket) and object-level (s3:GetObject) permissions are required
- Do not grant write, delete, or modification permissions to the Magnify IAM user
- This read-only access ensures Magnify cannot modify or delete your data
Security Best Practices
✅ DO:
- Use a dedicated IAM user specifically for Magnify
- Store credentials securely (password manager, secrets management service)
❌ DON'T:
- Use your root AWS account credentials
- Grant broader permissions than necessary (s3:* or full admin access)
File Formats
Magnify supports the CSV file format for S3 ingestion.
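If you generate CSV files yourself, a standard library such as Python's csv module handles quoting and escaping correctly (fields containing commas or quotes are quoted automatically, per RFC 4180). A minimal sketch, with illustrative column names:

```python
import csv
import io

def write_csv(rows: list[dict]) -> str:
    """Serialize rows with identical keys to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()   # first line: column names
    writer.writerows(rows)
    return buf.getvalue()
```

Avoid hand-concatenating CSV strings; unescaped commas or quotes inside field values are a common cause of ingestion errors.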
Organizing Your S3 Bucket
A well-organized S3 bucket structure makes data management easier and helps prevent ingestion errors.
Recommended Directory Structure
your-bucket-name/
└── magnify-ingest/
    ├── customers/
    │   ├── customers_20250101.csv
    │   ├── customers_20250102.csv
    │   └── customers_20250103.csv
    ├── orders/
    │   ├── orders_20250101.csv
    │   ├── orders_20250102.csv
    │   └── orders_20250103.csv
    └── events/
        ├── events_20250101.csv
        ├── events_20250102.csv
        └── events_20250103.csv
Organization Best Practices
- Use a dedicated root folder for Magnify data (e.g., magnify-ingest/)
- Separate by data type: Create folders for each entity (customers, orders, events)
- Keep consistent structure: Use the same pattern across all data types
File Naming Conventions
Consistent file naming makes troubleshooting easier and helps you track data over time.
Recommended Pattern
{entity_type}_{YYYYMMDD}_{optional_sequence}.csv
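A small helper makes it easy to produce names that follow this pattern consistently from your export jobs. A minimal sketch (the function name is illustrative):

```python
from datetime import date

def make_filename(entity_type: str, file_date: date, sequence: int = None) -> str:
    """Build a filename following {entity_type}_{YYYYMMDD}_{optional_sequence}.csv."""
    stamp = file_date.strftime("%Y%m%d")      # YYYYMMDD, zero-padded
    if sequence is not None:
        return f"{entity_type}_{stamp}_{sequence:03d}.csv"  # e.g. _001, _002
    return f"{entity_type}_{stamp}.csv"
```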
Naming Best Practices
✅ DO:
- Include the date in every filename (YYYYMMDD format recommended)
- Use lowercase letters for consistency
- Use underscores (_) to separate components
- Add sequence numbers for multiple files per day (_001, _002)
- Use descriptive entity names
❌ DON'T:
- Use spaces in filenames (customer data.csv)
- Use special characters (orders#final!.csv)
- Use inconsistent date formats (mix 20250105 and 01-05-2025)
- Create ambiguous names (data.csv, file1.csv)
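The conventions above can be enforced before upload with a simple check. A minimal sketch using a regular expression that mirrors the recommended pattern (lowercase entity name, YYYYMMDD date, optional three-digit sequence):

```python
import re

# Lowercase name, underscore-separated, 8-digit date, optional _NNN sequence.
FILENAME_RE = re.compile(r"^[a-z][a-z0-9_]*_\d{8}(_\d{3})?\.csv$")

def is_valid_filename(name: str) -> bool:
    """Return True if the filename follows the recommended naming pattern."""
    return FILENAME_RE.fullmatch(name) is not None
```

Running this over a batch of files before uploading catches spaces, special characters, and undated names early.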
File Size Recommendations
- Optimal: 50 MB - 500 MB per file
- Maximum: 1 GB
Configuring Streams
In Magnify, a stream represents a logical grouping of files that share the same data structure (schema). When setting up your S3 connection in the Magnify UI, you'll configure one or more streams.
What is a Stream?
Each stream needs:
- Name: A descriptive identifier (e.g., customers, orders, page_views)
- File Pattern: A glob pattern that matches files belonging to this stream
- Format: The file format (CSV)
Stream Configuration Guidelines
- One schema per stream: All files in a stream must have identical column structures
- Separate different data types: Create separate streams for customers, orders, events, etc.
- Use descriptive names: Choose names that clearly indicate what data the stream contains
- At least one stream required: Every S3 connection must have at least one configured stream
- Use a single bucket: All files for a connection must be placed in the same bucket, since each connection reads from only one bucket
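The "one schema per stream" rule can be checked locally before files are uploaded, by comparing header rows across a stream's files. A minimal sketch using the standard csv module (file paths are illustrative):

```python
import csv

def headers_match(paths: list[str]) -> bool:
    """Return True if every CSV file in the list has an identical header row."""
    headers = []
    for path in paths:
        with open(path, newline="") as f:
            headers.append(next(csv.reader(f)))  # first row = column names
    return all(h == headers[0] for h in headers)
```

Files whose headers differ (renamed, reordered, or added columns) belong in a separate stream or need to be fixed before upload.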
File Path Patterns (Globs)
When configuring streams in Magnify, you'll specify file path patterns using glob syntax. Globs are wildcard patterns that match multiple files.
Example Configuration
When setting up in Magnify UI, you might configure:
Stream 1: Customers
- Name: customers
- File Pattern: magnify-ingest/customers/*.csv
- Format: CSV
Stream 2: Orders
- Name: orders
- File Pattern: magnify-ingest/orders/*.csv
- Format: CSV
Stream 3: Events
- Name: page_views
- File Pattern: magnify-ingest/events/*.csv
- Format: CSV
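You can preview which stream a given S3 key would fall into before configuring anything, using any glob matcher. A minimal sketch with Python's fnmatch and the example patterns above (note that fnmatch's `*` also crosses `/`, so keep patterns anchored to a specific folder as shown):

```python
from fnmatch import fnmatch

# Stream name -> file pattern, as configured in the Magnify UI above.
STREAMS = {
    "customers": "magnify-ingest/customers/*.csv",
    "orders": "magnify-ingest/orders/*.csv",
    "page_views": "magnify-ingest/events/*.csv",
}

def match_stream(key: str):
    """Return the name of the first stream whose pattern matches the S3 key."""
    for name, pattern in STREAMS.items():
        if fnmatch(key, pattern):
            return name
    return None  # key would not be ingested by any stream
```

A key that matches no pattern is silently ignored, so running a listing of your bucket through a check like this helps confirm every file you expect to ingest is covered.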
Testing Your Setup
Before going live with your S3 integration, follow these steps to test:
1. Verify IAM Permissions
Test that your IAM user can access the bucket:
aws s3 ls s3://your-bucket-name/magnify-ingest/ --profile magnify-user
If this command succeeds, your permissions are correctly configured.
Additional Resources
- AWS S3 Documentation: https://docs.aws.amazon.com/s3/
- AWS IAM Best Practices: https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html
- CSV Format Specification: https://tools.ietf.org/html/rfc4180
- Glob Pattern Tester: https://globster.xyz/