This chapter will teach you everything about storing data in Azure, starting from absolute basics. We’ll explain what storage actually is, why different types exist, and how to choose and use them effectively.
Storage = Where you save your data permanently
When you write a document, take a photo, or save data from an app, it needs to be stored SOMEWHERE. That “somewhere” is storage.
Key Difference from Memory (RAM):
Memory (RAM): Temporary. Lost when computer turns off. Fast.
Storage (Disk): Permanent. Survives computer restart. Slower than RAM.
Analogy:
RAM = Your desk (work in progress, cleared at end of day)
Storage = Your filing cabinet (permanent records, survives overnight)
Every application needs to store data:
Example 1: Blog Website
What needs storage:
- Blog posts (text, titles, dates)
- Images you upload
- User comments
- User profile pictures
- Website code itself
Without storage: Every time you restart the website, everything is GONE.
Example 2: E-Commerce Site
What needs storage:
- Product catalog (names, prices, descriptions)
- Product images
- Customer orders
- User accounts
- Payment history
- Inventory levels
Without storage: You'd lose all customer orders every time the server restarts!
Problems:
1. Upfront cost ($500-2000 per drive)
2. Limited capacity (once full, must buy more drives)
3. No redundancy (drive fails = data lost)
4. Your responsibility:
   - Physical security (theft, fire)
   - Backups (manual, time-consuming)
   - Hardware failures (drive breaks, you replace it)
Real Example:
Company stores customer data
- Buy 10 hard drives ($10,000)
- Store in office closet
- Fire destroys building
- ALL DATA LOST
- Company goes bankrupt
Losses like this were a real risk for any business that kept its only copies on-site before cloud storage.
1. No upfront cost (pay per GB per month)
2. Practically unlimited capacity (need more? Just use more)
3. Built-in redundancy (Azure keeps 3+ copies automatically)
4. Microsoft's responsibility:
   - Physical security (datacenters with guards)
   - Hardware maintenance (replace failed drives)
   - Automatic replication (built-in)
5. Global availability (access from anywhere)
Cost Comparison:
Traditional (1 TB storage):
- Buy 2 TB hard drive: $100
- Backup drive: $100
- Total upfront: $200
- Risk: Still lose data if both drives fail
Azure (1 TB storage):
- $18-50/month depending on tier
- $216-600/year
- Zero upfront cost
- Microsoft keeps 3-6 copies automatically
- Risk: Virtually zero (99.999999999% durability)
For most businesses: Azure is cheaper AND safer
The Core Principle: “Different Data, Different Needs”
Analogy: Your Home Storage
You don’t store everything the same way at home:
Important documents → Fireproof safe
Books → Bookshelf
Clothes → Closet on hangers
Photos → Photo album or digital cloud
Food → Refrigerator or pantry
Why different storage? Because they have different:
Access patterns (how often you need them)
Size (books vs documents)
Value (important documents vs old magazines)
Azure Storage Works the Same Way:
Different Types of Data:
1. Large Files (images, videos, backups)
   → Use: Blob Storage
   → Why: Optimized for large objects
2. Shared Files (documents, configs)
   → Use: Azure Files
   → Why: Can mount like a network drive
3. VM Hard Drives
   → Use: Managed Disks
   → Why: High performance, attached to VMs
4. Application Messages (job queues)
   → Use: Queue Storage
   → Why: Reliable message passing
5. Structured Data (user profiles, logs)
   → Use: Table Storage or Databases
   → Why: Query and search capabilities
Photo Sharing App Architecture:
1. User Photos (original uploads)
   Storage: Blob Storage
   Why: Large files (images), need to serve billions
   Tier: Hot (frequently viewed)
   Cost: $18/month per TB
2. Thumbnails (small preview images)
   Storage: Blob Storage
   Why: Millions of small files
   Tier: Hot (shown on every page)
   Cost: $18/month per TB
3. Old Photos (not viewed in 6 months)
   Storage: Blob Storage
   Why: Same as original, but moved to cheaper tier
   Tier: Archive (rarely accessed)
   Cost: $1/month per TB (18x cheaper!)
4. Application Code & Static Files
   Storage: Blob Storage
   Why: HTML, CSS, JavaScript files
   Tier: Hot
   Size: Usually <1 GB
5. User Metadata (username, email, likes)
   Storage: Azure SQL Database or Cosmos DB
   Why: Need to query (find user by email)
   Note: NOT just storage, this is a database
6. Background Job Queue (resize images)
   Storage: Queue Storage
   Why: Reliable message passing between servers
   Cost: $0.0004 per 10K operations (basically free)
Total Storage Cost Breakdown:
- 100 TB original photos: $1,800/month
- 10 TB thumbnails: $180/month
- 500 TB archived photos: $500/month
- App code: $0.18/month (tiny)
- Queue: ~$1/month
Total: ~$2,481/month for a massive-scale app
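The cost breakdown above is just size multiplied by a per-TB price, summed over workloads. The following is a minimal sketch of that arithmetic, assuming the illustrative prices quoted in this chapter (not current list prices). The names `PRICE_PER_TB`, `WORKLOADS`, and `monthly_cost` are hypothetical, and the 0.01 TB figure for app code is back-calculated from the $0.18/month line.

```python
# Prices per TB per month as quoted in this chapter (illustrative assumptions).
PRICE_PER_TB = {"hot": 18.0, "archive": 1.0}

# (label, size in TB, tier) for each blob workload in the photo app.
WORKLOADS = [
    ("original photos", 100, "hot"),
    ("thumbnails", 10, "hot"),
    ("archived photos", 500, "archive"),
    ("app code", 0.01, "hot"),  # assumption: implied by the $0.18/month figure
]

QUEUE_FLAT_ESTIMATE = 1.0  # rough monthly queue cost taken from the text


def monthly_cost(workloads, prices, flat_extras=0.0):
    """Sum size * price-per-TB over all workloads, plus any flat extras."""
    blob_cost = sum(size_tb * prices[tier] for _, size_tb, tier in workloads)
    return blob_cost + flat_extras


total = monthly_cost(WORKLOADS, PRICE_PER_TB, QUEUE_FLAT_ESTIMATE)
print(f"Estimated total: ${total:,.2f}/month")
```

Changing a tier or a size in `WORKLOADS` shows how quickly the total moves, which is the point of separating hot and archived photos in the first place.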
Question 1: What kind of data?
├─ Large files (images, videos, documents)
│  └─> Blob Storage
│
├─ Need to mount as drive (like Z: or /mnt)
│  └─> Azure Files
│
├─ VM hard drive
│  └─> Managed Disks
│
├─ Application messages/jobs
│  └─> Queue Storage
│
└─ Structured data with queries
   └─> Table Storage or Database
Question 2: How often accessed?
├─ Constantly (website images)
│  └─> Hot Tier ($18/TB/month)
│
├─ Sometimes (monthly reports)
│  └─> Cool Tier ($10/TB/month)
│
└─ Rarely (old backups, compliance)
   └─> Archive Tier ($1/TB/month)
Question 3: How important is the data?
├─ Critical (lose it = business ends)
│  └─> GRS or GZRS (6 copies, multiple regions)
│
├─ Important (lose it = bad, but recoverable)
│  └─> ZRS (3 copies, 3 availability zones)
│
└─ Okay to lose (dev/test, temporary)
   └─> LRS (3 copies, same datacenter)
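The three questions above can be read as three independent lookups. Here is a small sketch of that idea; the category keys (`"large_files"`, `"constant"`, `"critical"`, and so on) and the `recommend` function are hypothetical labels invented for illustration, not Azure terminology. Note that access tiers only apply to blob data, so the tier answer is only meaningful when the service is Blob Storage.

```python
# Lookup tables mirroring the three questions in the decision tree.
SERVICE_BY_DATA = {
    "large_files": "Blob Storage",
    "mounted_share": "Azure Files",
    "vm_disk": "Managed Disks",
    "messages": "Queue Storage",
    "structured": "Table Storage or Database",
}
TIER_BY_ACCESS = {
    "constant": "Hot",
    "sometimes": "Cool",
    "rarely": "Archive",
}
REDUNDANCY_BY_IMPORTANCE = {
    "critical": "GRS or GZRS",
    "important": "ZRS",
    "disposable": "LRS",
}


def recommend(data_kind, access, importance):
    """Return (service, access tier, redundancy) for the given answers."""
    return (
        SERVICE_BY_DATA[data_kind],
        TIER_BY_ACCESS[access],
        REDUNDANCY_BY_IMPORTANCE[importance],
    )


# Example: old compliance backups that the business cannot afford to lose.
print(recommend("large_files", "rarely", "critical"))
```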
Before diving into specific services, let’s define essential terms:
Blob (Binary Large Object)
Just a fancy name for “file.” Any file (image, video, document, zip file, anything) is a “blob” in Azure.
Why the weird name? Historical computer science term. Just think “blob = file.”
Container
A folder that holds blobs. Like a folder on your computer.
Example: Container named “profile-pictures” contains all user profile picture blobs.
Storage Account
The top-level resource that contains all your storage (blobs, files, queues, tables).
Analogy: Like your “Documents” folder that contains many subfolders.
Access Tier
How “hot” or “cold” your data is (how often it’s accessed). Hotter = more expensive storage, cheaper access. Colder = cheaper storage, more expensive access.
Analogy: Storing winter clothes in the attic (archive) vs. keeping everyday clothes in your closet (hot).
Replication
How many copies Azure keeps and where.
LRS: 3 copies in one building
ZRS: 3 copies in 3 buildings (same city)
GRS: 3 copies here + 3 copies in a second region hundreds of miles away
Redundancy vs. Backup
Redundancy: Multiple copies to protect against hardware failure (automatic)
Backup: Point-in-time copies to protect against human error (you configure)
Example: Delete a file by accident
Redundancy: Doesn’t help (all copies deleted)
Backup: Can restore from yesterday’s backup
[!WARNING]
Gotcha: Changing Access Tiers
Moving data from Hot to Cool is cheap (it is billed as write operations), but moving data from Cool back to Hot incurs read and data retrieval charges, plus an “Early Deletion” fee if the blob spent fewer than 30 days in Cool. Don’t use Archive tier for backups you might need to restore instantly: it can take up to 15 hours to “rehydrate” data from Archive at standard priority ($0.02/GB), or about 1 hour at high priority ($0.10/GB). A real-world example: a company archived their database backups to save money, then spent 12 hours waiting to restore from Archive during an outage, costing $180,000 in downtime. The fix is simple: keep your most recent backup in Cool tier (accessible in milliseconds) and only archive backups older than 30 days.
[!TIP]
Jargon Alert: Replication
LRS (Locally Redundant): 3 copies in one building (Good enough for non-critical dev).
GRS (Geo-Redundant): 3 copies here + 3 copies in a different region (Essential for Disaster Recovery).
The CAP Theorem in Storage: Consistency vs. Availability
When choosing a replication strategy, you are making a fundamental architectural choice.
Local (LRS/ZRS): Provides CP (Consistency + Partition Tolerance). Because the 3 copies are written synchronously, you are guaranteed to read the latest data, but if the datacenter (LRS) or all 3 zones (ZRS) go down, the storage is unavailable.
Global (GRS/GZRS): Provides AP (Availability + Partition Tolerance) across regions.
The primary region is updated synchronously (3 copies).
The secondary region (hundreds of miles away) is updated asynchronously.
The Trade-off: In a “failover” scenario to the secondary region, you might lose the last few seconds/minutes of data. The size of that window is the RPO (Recovery Point Objective).
[!IMPORTANT]
Pro Tip: RA-GRS (Read-Access GRS)
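Standard GRS is “Passive”: you can’t read from the secondary region unless a failover occurs. RA-GRS gives you a read-only endpoint in the secondary region at all times. Use this to keep reads available during a primary outage, or to handle traffic spikes by offloading read requests to the paired region. Because replication is asynchronous, reads from the secondary can lag slightly behind the primary.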
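With RA-GRS, the read-only endpoint is reached by appending `-secondary` to the account name in the host. The sketch below only builds the two URLs; the helper name `blob_endpoints` is hypothetical, `mystorageaccount` is the placeholder account used elsewhere in this chapter, and the public-cloud `blob.core.windows.net` suffix is assumed.

```python
def blob_endpoints(account_name: str) -> dict:
    """Build primary and RA-GRS secondary blob endpoints for an account.

    Assumes the public Azure cloud DNS suffix.
    """
    suffix = "blob.core.windows.net"
    return {
        "primary": f"https://{account_name}.{suffix}",
        # Read-only; only resolves usefully when RA-GRS or RA-GZRS is enabled.
        "secondary": f"https://{account_name}-secondary.{suffix}",
    }


endpoints = blob_endpoints("mystorageaccount")
print(endpoints["primary"])
print(endpoints["secondary"])
```

A read path could try the primary first and fall back to the secondary on failure, keeping in mind that the secondary may serve slightly stale data.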
A “Stamp” is a cluster of roughly 10-20 racks of storage servers. Each rack has its own power and network.
When you create a storage account, it is assigned to a Stamp.
LRS (Local Replication) ensures your data is written to three different disks on three different racks within that single stamp. Even if a whole rack’s power supply fails, your data is safe.
Real-World Analogy: A Storage Stamp is like a bank vault with three separate lockboxes in three different rooms. When you deposit a document, the bank makes three copies and places each in a different room with independent locks, power, and climate control. If one room floods, your document survives in the other two. Upgrading to GRS is like having three more copies in a second bank in another city: even a city-wide disaster cannot destroy all copies.
Azure doesn’t just store files as names. It uses a Partition Key system.
Every blob belongs to a partition.
Azure’s Front-End Layer looks at the requested blob name, determines which Partition Server owns it, and routes the request there.
Pro Tip: If you name your blobs with a sequential prefix (like 2024-01-01-log1, 2024-01-01-log2), they might all end up on the same Partition Server, causing a “Hot Partition” bottleneck. Using a random prefix or hash helps distribute the load across the entire stamp.
The ACK: Your app only receives a “Success” message when the data is safely written to the physical disks of all 3 replicas. This is why Azure Storage is Strongly Consistent.
Use cases:
- Documents (PDF, Word, Excel)
- Media (images, audio, video)
- Logs and telemetry
- Backups
Characteristics:
- Up to 190.7 TiB per blob
- Composed of blocks (up to 50,000 blocks of up to 4,000 MiB each)
- Can update individual blocks
- Most common blob type (95% of use cases)
Use cases:
- Azure VM disks (VHD files)
- Database files
- Random access scenarios
Characteristics:
- Up to 8 TB per blob
- Optimized for frequent read/write
- 512-byte page alignment
- Higher cost than block blobs
Most users never directly use page blobs (Azure manages them for VM disks).
Optimized for append operations
Use cases:
- Logging and auditing
- Time-series data
- IoT telemetry
Characteristics:
- Up to 195 GB per blob
- Append-only (no update/delete of blocks)
- Efficient for log aggregation
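The maximum sizes quoted for block and append blobs follow from a block-count limit multiplied by a maximum block size. A quick sketch of that arithmetic, assuming the published limits of 50,000 blocks per blob, 4,000 MiB per block for block blobs, and 4 MiB per block for append blobs (worth re-checking against current Azure documentation):

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_BLOCKS = 50_000            # assumed per-blob block limit
BLOCK_BLOB_BLOCK = 4000 * MIB  # assumed max block size for block blobs
APPEND_BLOB_BLOCK = 4 * MIB    # assumed max block size for append blobs

block_blob_max = MAX_BLOCKS * BLOCK_BLOB_BLOCK
append_blob_max = MAX_BLOCKS * APPEND_BLOB_BLOCK

print(f"Block blob max:  {block_blob_max / TIB:.1f} TiB")
print(f"Append blob max: {append_blob_max / GIB:.1f} GiB")
```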
Automatically transition blobs between tiers to optimize costs. This is one of the highest-ROI configurations in all of Azure: a single JSON policy can save thousands of dollars per month by moving stale data to cheaper tiers without any application changes.
Real-World Analogy: Think of lifecycle management like your closet strategy. Current season clothes stay in the closet (Hot). Last season’s clothes go to the attic (Cool). Clothes from 5 years ago go into long-term storage (Archive). Clothes you will never wear again get donated (Deleted). You do not manually move every item; you set rules and follow them on a schedule.
{
  "rules": [
    {
      // This rule automatically manages backup blobs through their lifecycle.
      // Without this policy, old backups sit in Hot tier forever, costing
      // $18/TB/month when they could be in Archive at $1/TB/month.
      "name": "MoveToArchive",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          // prefixMatch targets only the "backups/" folder -- you probably
          // do NOT want to archive your active website images or user uploads.
          // Always scope lifecycle rules to specific prefixes.
          "prefixMatch": ["backups/"]
        },
        "actions": {
          "baseBlob": {
            // After 30 days: Move from Hot ($18/TB) to Cool ($10/TB).
            // Saves 44% on storage, but reads cost 25x more -- fine for backups
            // that are rarely accessed after the first month.
            "tierToCool": { "daysAfterModificationGreaterThan": 30 },
            // After 90 days: Move from Cool ($10/TB) to Archive ($1/TB).
            // Archive is 94% cheaper but takes up to 15 hours to access.
            // Only use for data you can afford to wait hours to retrieve.
            "tierToArchive": { "daysAfterModificationGreaterThan": 90 },
            // After 365 days: Delete entirely.
            // Set this based on your compliance requirements -- some industries
            // (healthcare, finance) require 7-year retention.
            "delete": { "daysAfterModificationGreaterThan": 365 }
          }
        }
      }
    }
  ]
}
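To reason about what the policy above does to a given blob, it can help to model the thresholds locally. This is a hypothetical simulation, not how Azure evaluates policies internally: `tier_for_age` is an invented helper, it assumes the blob starts in Hot, and it treats each `daysAfterModificationGreaterThan` value as a strict "greater than" comparison.

```python
def tier_for_age(days_since_modified: int) -> str:
    """Return where a blob under the sample policy would end up.

    Mirrors the three thresholds in the rule: >30 days Cool,
    >90 days Archive, >365 days deleted.
    """
    if days_since_modified > 365:
        return "deleted"
    if days_since_modified > 90:
        return "archive"
    if days_since_modified > 30:
        return "cool"
    return "hot"


for age in (7, 31, 120, 400):
    print(f"{age:>3} days -> {tier_for_age(age)}")
```

Running a list of real blob ages through a function like this before enabling the policy gives a rough preview of how much data would move or be deleted.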
Cost Impact Example:
Without lifecycle management (1 year of daily backups, 10 GB each):
365 backups x 10 GB = 3.65 TB in Hot tier = $65.70/month
With lifecycle management:
30 backups in Hot (0.3 TB) = $5.40/month
60 backups in Cool (0.6 TB) = $6.00/month
275 backups in Archive (2.75 TB) = $2.75/month
Total: $14.15/month (78% savings!)
Annual savings: ~$619
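The comparison above can be recomputed in a few lines, which makes it easy to plug in your own backup size or retention counts. This sketch assumes the chapter's illustrative prices per TB and treats 1 TB as 1,000 GB, as the text does; `monthly_cost` and the other names are hypothetical.

```python
PRICE_PER_TB = {"hot": 18.0, "cool": 10.0, "archive": 1.0}  # $/TB/month, from the text
BACKUP_TB = 0.01  # one 10 GB daily backup, using 1 TB = 1000 GB


def monthly_cost(backups_per_tier):
    """Monthly storage cost for a {tier: number_of_backups} mapping."""
    return sum(
        count * BACKUP_TB * PRICE_PER_TB[tier]
        for tier, count in backups_per_tier.items()
    )


baseline = monthly_cost({"hot": 365})
tiered = monthly_cost({"hot": 30, "cool": 60, "archive": 275})
savings_pct = 100 * (1 - tiered / baseline)
annual_savings = 12 * (baseline - tiered)

print(f"All Hot: ${baseline:.2f}/month")
print(f"Tiered:  ${tiered:.2f}/month")
print(f"Savings: {savings_pct:.0f}% (${annual_savings:.0f}/year)")
```

The sketch covers storage charges only; tier-change operations and any retrieval fees would come on top.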
Common Pitfall: Setting Archive tier on data you need to access quickly. Rehydrating data from Archive takes up to 15 hours (standard priority) and costs $0.02/GB. If you accidentally archive 1 TB of data you need tomorrow, that is a $20 rehydration fee plus a day of waiting. Use Cool tier for data you might need within hours.
# Enable versioning
az storage account blob-service-properties update \
  --account-name mystorageaccount \
  --enable-versioning true
# Every modification creates new version
# Old versions accessible by version ID
# Protects against accidental overwrites
Use case: Track document changes, audit trail
Soft Delete
Recoverable deletion
# Enable soft delete (7 days)
az storage account blob-service-properties update \
  --account-name mystorageaccount \
  --enable-delete-retention true \
  --delete-retention-days 7
# Deleted blobs retained for 7 days
# Can undelete within retention period
# Use managed identity or user identity
# RBAC roles: Storage Blob Data Reader/Contributor
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee <identity-id> \
  --scope /subscriptions/.../storageAccounts/mystorageaccount
Encryption at Rest:
Default: Microsoft-managed keys (automatic)
✅ No configuration needed
✅ Free
❌ Microsoft controls keys
Customer-managed keys (CMK):
✅ You control key rotation
✅ Audit key access
⚠️ Requires Azure Key Vault
Performance: Up to 60 MB/s throughput, 1,000 IOPS
Pricing: $0.06/GB/month
Minimum: None
Use for: General-purpose file shares, dev/test
Performance: Up to 10,000 IOPS, 200 MB/s per TB provisioned
Pricing: $0.20/GB/month (provisioned, not consumed)
Minimum: 100 GB
Use for: High-performance, low-latency workloads
# Get storage account key
$storageKey = (az storage account keys list `
  --account-name mystorageaccount `
  --query "[0].value" -o tsv)

# Mount file share
net use Z: \\mystorageaccount.file.core.windows.net\myshare `
  /user:AZURE\mystorageaccount $storageKey

# Persistent mount (survives reboot)
cmdkey /add:mystorageaccount.file.core.windows.net `
  /user:AZURE\mystorageaccount /pass:$storageKey
# Add to Windows startup
# Install cifs-utils
sudo apt-get install cifs-utils

# Create mount point
sudo mkdir /mnt/myshare

# Create credentials file
sudo bash -c 'echo "username=mystorageaccount" >> /etc/smbcredentials'
sudo bash -c 'echo "password=STORAGE_KEY" >> /etc/smbcredentials'
sudo chmod 600 /etc/smbcredentials

# Mount share
sudo mount -t cifs //mystorageaccount.file.core.windows.net/myshare /mnt/myshare -o credentials=/etc/smbcredentials,dir_mode=0777,file_mode=0777

# Add to /etc/fstab for persistent mount
//mystorageaccount.file.core.windows.net/myshare /mnt/myshare cifs credentials=/etc/smbcredentials,dir_mode=0777,file_mode=0777 0 0
❌ Bad (hotspot on same partition):
/images/2024/01/01/image1.jpg
/images/2024/01/01/image2.jpg
/images/2024/01/01/image3.jpg
✅ Good (distributed across partitions):
/ab/images/2024/01/01/image1.jpg
/cd/images/2024/01/01/image2.jpg
/ef/images/2024/01/01/image3.jpg
Use first 2 characters of hash as prefix
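One way to produce such a prefix is to hash the original blob name and keep the first two hex characters. The following is a minimal sketch; `prefixed_blob_name` is a hypothetical helper and SHA-256 is an arbitrary choice (any stable hash works). Because the prefix is derived from the name itself, readers can recompute it later without a lookup table.

```python
import hashlib


def prefixed_blob_name(name: str, prefix_len: int = 2) -> str:
    """Prepend the first `prefix_len` hex chars of the name's SHA-256 hash."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{name}"


for original in (
    "images/2024/01/01/image1.jpg",
    "images/2024/01/01/image2.jpg",
    "images/2024/01/01/image3.jpg",
):
    print(prefixed_blob_name(original))
```

The trade-off is that listing blobs "by date" no longer works with a single prefix query, so this pattern suits workloads that look blobs up by exact name.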
3. Parallel Uploads
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient(account_url="...", credential="...")
blob_client = blob_service.get_blob_client("mycontainer", "largefile.zip")

# Upload in parallel (default: 4 MB chunks)
with open("largefile.zip", "rb") as data:
    blob_client.upload_blob(
        data,
        max_concurrency=10,  # up to 10 parallel chunk uploads
        overwrite=True,
    )
# Can be several times faster for large files; the actual gain depends on bandwidth and file size
4. Use Appropriate Tier
Hot Tier: Frequently accessed (< 30 days)
Cool Tier: Infrequently accessed (30-90 days)
Archive Tier: Rarely accessed (180+ days)
Example:
- Website images: Hot
- 60-day backup: Cool
- 1-year compliance archive: Archive
Savings: Roughly 94% on storage for Archive vs Hot at the $1 vs $18 per TB prices used in this chapter
Backup Strategy for 100 GB database:
1. Daily Backups (7 days):
   - Store in Cool tier
   - Cost: 7 × 100 GB × $0.01 = $7/month
2. Weekly Backups (4 weeks):
   - Store in Cool tier
   - Cost: 4 × 100 GB × $0.01 = $4/month
3. Monthly Backups (12 months):
   - Store in Archive tier
   - Cost: 12 × 100 GB × $0.001 = $1.20/month
Total: $12.20/month (vs about $41 if all 23 backups sat in Hot at $0.018/GB)
Lifecycle Policy:
- Daily: Cool immediately
- After 30 days: Move to Archive
- After 365 days: Delete
Incremental Backups:
- Only changed data backed up daily
- Reduces storage by 80-90%
- Final cost: ~$2-3/month
Q4: Optimize storage for a media streaming app
Architecture:
1. Video Storage:
   - Original files: Archive tier (rarely accessed)
   - Transcoded versions: Hot tier (frequently streamed)
   - Use CDN (Azure Front Door) for global distribution
2. Thumbnail Storage:
   - Hot tier (displayed on every page)
   - Small files, high access frequency
   - CDN caching (90%+ cache hit rate)
3. User Uploads:
   - Start in Hot tier (recently uploaded, often viewed)
   - After 30 days: Move to Cool (older content)
   - After 180 days: Move to Archive (historical)
4. CDN Configuration:
   - Cache thumbnails: 7 days
   - Cache videos: 1 day (balance freshness and cost)
   - Purge cache on content update
5. Performance:
   - Parallel video chunk uploads (10 concurrent)
   - Adaptive bitrate streaming (HLS/DASH)
   - Geo-replication (LRS → GRS for critical content)
Cost Savings:
- Without optimization: $5,000/month
- With optimization: $1,200/month (76% reduction)
Optimizations:
- Lifecycle policies: $2,000 savings
- CDN (reduced blob reads): $1,500 savings
- Incremental backups: $300 savings
Q5: Implement global data replication with conflict resolution
Global Replication Strategy:
1. Use GRS (Geo-Redundant Storage):
   - Primary: East US
   - Secondary: West US (readable only with RA-GRS)
   - Failover on regional outage (customer-initiated, or Microsoft-initiated in a major disaster)
2. Or use multi-region architecture:
   - Storage accounts in each region
   - Azure Traffic Manager routes to closest
   - Application-level replication
3. Conflict Resolution:
   Option A: Last Write Wins (LWW)
   - Use blob versioning
   - Latest timestamp wins
   - Simple, may lose data
   Option B: Application-Level Merge
   - Store conflict versions separately
   - Manual resolution
   - Complex, but no data loss
4. Implementation:
   ```python
   from azure.storage.blob import BlobServiceClient

   # Primary and secondary storage accounts (URLs elided; pass a credential in practice)
   primary = BlobServiceClient(account_url="https://storage-eastus...")
   secondary = BlobServiceClient(account_url="https://storage-westus...")

   def upload_with_replication(blob_name, data):
       # `data` should be bytes: a file-like stream would already be
       # exhausted by the time the second upload reads it.

       # Upload to primary
       primary_blob = primary.get_blob_client("mycontainer", blob_name)
       primary_blob.upload_blob(data, overwrite=True)

       # Replicate to secondary
       secondary_blob = secondary.get_blob_client("mycontainer", blob_name)
       secondary_blob.upload_blob(data, overwrite=True)

       # Or use change feed for async replication
   ```
5. Monitoring:
   - Alert on replication lag > 15 minutes
   - Monitor blob change feed
   - Test failover monthly
Q6: Secure storage with zero-trust architecture
Zero-Trust Storage Security:
1. Network Isolation:
   - Disable public access entirely
   - Use Private Endpoints only
   - Traffic never leaves Azure backbone
2. Identity-Based Access:
   - No shared keys (disable storage account keys)
   - Azure AD authentication only
   - RBAC with least privilege
   - Managed identities for applications
3. Encryption:
   - Customer-managed keys (Azure Key Vault)
   - Key rotation every 90 days
   - Separate keys per container
4. Data Protection:
   - Blob versioning enabled
   - Soft delete (30 days)
   - Immutable storage (WORM compliance)
   - Legal hold for litigation
5. Monitoring:
   - Storage Analytics logs → Log Analytics
   - Alert on:
     - Anonymous access attempts
     - Failed authentication
     - Key access from Key Vault
     - Blob deletion
6. Implementation:
   ```bash
   # 1. Disable public access
   az storage account update \
     --name mystorageaccount \
     --public-network-access Disabled

   # 2. Create private endpoint
   #    (<resource-group> and the connection name are placeholders)
   az network private-endpoint create \
     --name pe-storage \
     --resource-group <resource-group> \
     --vnet-name vnet-prod \
     --subnet snet-data \
     --private-connection-resource-id /subscriptions/.../storageAccounts/mystorageaccount \
     --group-id blob \
     --connection-name pe-storage-connection

   # 3. Disable shared key access
   az storage account update \
     --name mystorageaccount \
     --allow-shared-key-access false

   # 4. Enable customer-managed keys
   az storage account update \
     --name mystorageaccount \
     --encryption-key-source Microsoft.Keyvault \
     --encryption-key-vault https://myvault.vault.azure.net \
     --encryption-key-name storage-key

   # 5. Enable versioning and soft delete
   az storage account blob-service-properties update \
     --account-name mystorageaccount \
     --enable-versioning true \
     --enable-delete-retention true \
     --delete-retention-days 30

   # 6. Configure immutable storage (WORM)
   az storage container immutability-policy create \
     --account-name mystorageaccount \
     --container-name compliance \
     --period 2555  # 7 years in days
   ```
This is the #1 support ticket. If your app can’t access a blob:
Client IP: Is your Storage Account Firewall blocking the code’s IP? Check if you have a Private Endpoint but the code is trying to use the Public Endpoint.
SAS Token Expiry: If using SAS tokens, check the clock! Is the system time on your server out of sync with Azure?
RBAC Propagation: Did you just grant the “Storage Blob Data Contributor” role? RBAC changes can take up to 10 minutes to propagate.
Storage Limit: Standard accounts have a limit of 5 PB. If you hit this, you need a second account.
Egress Limits: Standard accounts are limited to roughly 50 Gbps of outbound traffic. If you are serving massive videos to millions of users, you must use a Content Delivery Network (CDN) to offload the traffic.
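For the SAS expiry check in the list above, the expiry time is carried in the token itself as the `se` (signed expiry) query parameter, so it can be inspected without calling Azure. This is a rough sketch: the URL and signature below are made up, `sas_expiry` and `is_expired` are hypothetical helpers, and the parser assumes the common `YYYY-MM-DDTHH:MM:SSZ` form (other ISO 8601 variants would need extra handling).

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def sas_expiry(url: str) -> datetime:
    """Return the expiry time from the `se` parameter of a SAS URL (UTC)."""
    params = parse_qs(urlparse(url).query)
    raw = params["se"][0]  # parse_qs has already URL-decoded the value
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_expired(url: str, now=None) -> bool:
    """Compare the token expiry against `now` (defaults to the local clock in UTC)."""
    now = now or datetime.now(timezone.utc)
    return now >= sas_expiry(url)


# Hypothetical SAS URL for illustration only; the signature is not real.
example_url = (
    "https://mystorageaccount.blob.core.windows.net/mycontainer/report.pdf"
    "?sv=2022-11-02&se=2024-01-01T00%3A00%3A00Z&sr=b&sp=r&sig=REDACTED"
)
print("Expires:", sas_expiry(example_url).isoformat())
print("Expired now?", is_expired(example_url))
```

If `is_expired` disagrees with what Azure reports, clock drift on the server is a likely suspect, since the check uses the local system time.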
[!TIP]
Pro Tool: Storage Explorer
Don’t rely solely on the Azure Portal. Use Azure Storage Explorer (desktop app). It provides much better visibility into hidden metadata, lease statuses, and large-scale migrations.
Your company stores 500 TB in Blob Storage Hot tier at $9,000/month. The CFO wants to cut costs by 60%. Design a lifecycle management strategy.
Strong Candidate Answer:
Analyze access patterns first: Enable Storage Analytics metrics. In my experience, 70-80% of stored data has not been accessed in 90+ days.
Lifecycle management policy: Move blobs to Cool after 30 days of no access ($5/TB vs $18/TB), Archive after 90 days ($1/TB). For 500 TB, a typical split: 100 TB Hot ($1,800), 200 TB Cool ($1,000), 200 TB Archive ($200) = $3,000/month, a 67% reduction.
Archive tier gotcha: Rehydration takes 1-15 hours. If compliance requires 4-hour access, use Cool instead. Archive also has a 180-day early deletion penalty.
Hidden cost: blob versioning. Teams enable versioning but forget every version consumes storage at the current tier. I have seen 500 TB of real data with 1,500 TB of versions at Hot pricing. Delete old versions after 30 days.
Follow-up: Legal says some data must be retained 7 years and be immutable. How does this affect your strategy?
Use immutable blob storage with time-based retention (7 years). Immutable blobs can still change access tiers, so move them to Archive on day 1. Cost for 200 TB compliance data: $200/month for 7 years = $16,800 total, versus $43,200/year in Hot tier (200 TB × $18 × 12 months).
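The figures in this answer are easy to sanity-check in an interview setting. Below is a short sketch of the arithmetic, using the per-TB prices quoted in the answer itself ($18 Hot, $5 Cool, $1 Archive); these are the candidate's illustrative numbers rather than current list prices, and `monthly_cost` is a hypothetical helper.

```python
PRICES = {"hot": 18, "cool": 5, "archive": 1}  # $/TB/month, as quoted in the answer


def monthly_cost(split_tb, prices=PRICES):
    """Monthly storage cost for a {tier: terabytes} split."""
    return sum(tb * prices[tier] for tier, tb in split_tb.items())


current = monthly_cost({"hot": 500})
proposed = monthly_cost({"hot": 100, "cool": 200, "archive": 200})
reduction_pct = 100 * (1 - proposed / current)

# Follow-up: 200 TB of immutable compliance data kept for 7 years.
compliance_tb = 200
archive_7yr = compliance_tb * PRICES["archive"] * 12 * 7
hot_per_year = compliance_tb * PRICES["hot"] * 12

print(f"Current:  ${current:,}/month")
print(f"Proposed: ${proposed:,}/month ({reduction_pct:.0f}% reduction)")
print(f"Compliance data in Archive, 7 years: ${archive_7yr:,}")
print(f"Same data in Hot, per year:          ${hot_per_year:,}")
```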
Explain LRS, ZRS, GRS, and RA-GRS. A startup asks which replication to use for their only database backup.
Strong Candidate Answer:
LRS: 3 copies in one datacenter. Does NOT survive datacenter-level disasters.
ZRS: 3 copies across 3 Availability Zones. Survives datacenter failure, not regional disasters.
GRS: 3 local copies + 3 copies in a paired region hundreds of miles away. Secondary is NOT readable until a failover (customer-initiated or Microsoft-initiated) occurs.
RA-GRS: Same as GRS but secondary is always readable via -secondary endpoint.
For the startup’s only backup: GRS minimum. If the backup is LRS and the datacenter fails catastrophically, both database and backup are lost. Cost difference for 100 GB: $0.80/month (GRS $2/month vs LRS $1.20/month). Losing your entire database to save $0.80/month is indefensible.
Async replication caveat: GRS has ~15 minute RPO. Combine with Azure SQL geo-replication (5-second RPO) for the live database.
Follow-up: They back up nightly. If the primary region fails at 3 PM, they lose the day’s data regardless of replication. Acceptable?
No. Replication protects against infrastructure failure, not data loss between backups. Use Azure SQL auto-failover groups (RPO ~5 seconds) for the live database, plus nightly GRS backups for logical corruption protection (accidental table drops). These solve different problems and both are needed.
A developer uploads 10 million small files (1 KB each) to Blob Storage and it takes 8 hours. Total data is 10 GB. Why is it slow?
Strong Candidate Answer:
The bottleneck is per-request overhead, not bandwidth. Each upload is a REST API call: TLS handshake, HTTP headers, auth validation, partition lookup, 3-replica write. At 50 ms per request sequentially, 10M files = roughly 139 hours. To finish in 8 hours, the developer must already be running about 17 requests concurrently.
Fix 1 — Increase parallelism to 200-500 concurrent connections. Storage accounts support 20,000 requests/second.
Fix 2 — Use AzCopy. Optimized for bulk transfers with automatic parallelism and retry. Typically completes this in 1-2 hours.
Fix 3 — Partition key distribution. If blobs share a prefix (“data/2024/01/…”), they create a partition hotspot. Prefix with random hash to distribute across partitions.
Fix 4 — For initial bulk loads of 100M+ files, use Azure Data Box (physical device shipped to Azure). Faster than any network upload.
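The sequential-versus-concurrent estimate in this answer is a one-line formula: total time is request count times latency divided by concurrency. Here is a small sketch, assuming the 50 ms per-request latency stated in the answer and ignoring throttling or bandwidth limits; `total_hours` is a hypothetical helper.

```python
def total_hours(num_requests: int, latency_s: float, concurrency: int = 1) -> float:
    """Estimated wall-clock hours, assuming perfectly parallel requests."""
    return num_requests * latency_s / concurrency / 3600


NUM_FILES = 10_000_000
LATENCY_S = 0.050  # 50 ms per request, as assumed in the answer

sequential = total_hours(NUM_FILES, LATENCY_S)
implied_concurrency = sequential / 8  # concurrency needed to finish in 8 hours
at_300 = total_hours(NUM_FILES, LATENCY_S, concurrency=300)

print(f"Sequential: {sequential:.1f} hours")
print(f"Concurrency implied by an 8-hour run: {implied_concurrency:.1f}")
print(f"At 300 concurrent requests: {at_300 * 60:.0f} minutes")
```

At 300 concurrent requests the model works out to roughly 6,000 requests per second, which stays below the 20,000 requests/second account limit mentioned in Fix 1.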
Follow-up: After optimization, you get 503 Server Busy errors at peak concurrency. What is happening?
You are hitting the storage account’s ~20,000 requests/second limit. Combined with application traffic on the same account, the total exceeds the ceiling. Fix: use a separate storage account for bulk uploads, or implement exponential backoff with jitter. For sustained high throughput, use Premium Block Blob storage accounts.
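Exponential backoff with jitter, mentioned in the answer above, can be sketched in a few lines. This is an illustrative, library-agnostic version using the "full jitter" variant (a random delay between zero and an exponentially growing cap). `ServerBusy`, `backoff_delays`, and `call_with_backoff` are hypothetical names; the Azure SDKs ship their own configurable retry policies, which are usually preferable to hand-rolled loops.

```python
import random
import time


class ServerBusy(Exception):
    """Hypothetical stand-in for an HTTP 503 Server Busy response."""


def backoff_delays(max_retries, base=0.5, cap=30.0, rng=random.random):
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**attempt))."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_backoff(operation, max_retries=5, sleep=time.sleep, **backoff_kwargs):
    """Call `operation`, retrying on ServerBusy with jittered exponential delays."""
    for delay in backoff_delays(max_retries, **backoff_kwargs):
        try:
            return operation()
        except ServerBusy:
            sleep(delay)
    # Final attempt after all retries are used up; let any exception propagate.
    return operation()


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_upload():
        # Simulated upload that is throttled twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ServerBusy()
        return "uploaded"

    print(call_with_backoff(flaky_upload, base=0.01))
```

The jitter matters as much as the exponential growth: without it, hundreds of throttled workers retry at the same instants and hit the limit again in lockstep.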