📁 File Storage – NFS, SMB, EFS, Azure Files & Shared File Systems
File storage is one of the three fundamental storage paradigms in computing, alongside block storage and object storage. It organizes data in a hierarchical directory structure: the familiar tree of folders and files that every developer interacts with daily. Unlike block storage, which deals with raw disk volumes, or object storage, which uses flat key-value namespaces, file storage presents data through a POSIX-compatible or SMB-compatible file system interface that applications can read, write, and traverse using standard OS-level calls.
In distributed systems, file storage becomes especially important when multiple compute nodes need shared access to the same set of files simultaneously. This guide covers everything you need to know: protocols, cloud services, performance characteristics, and when to pick file storage over the alternatives.
🧩 What Is File Storage?
File storage systems manage data as a hierarchy of directories (folders) and files. Each file has metadata: a name, permissions, timestamps, and a path that uniquely identifies it within the tree. The file system itself handles the translation between this logical hierarchy and the physical blocks on disk.
Key characteristics of file storage include:
- Hierarchical namespace: Data is organized into nested directories, making it intuitive for humans and applications.
- File-level access: Operations happen at file granularity – open, read, write, close, rename, delete.
- Locking and concurrency: File systems support advisory or mandatory locks, enabling safe concurrent access.
- Metadata-rich: Each file carries ownership, permissions, timestamps (`ctime`, `mtime`, `atime`), and extended attributes.
- Mountable: Network file systems can be mounted on client machines and accessed as if they were local directories.
This model is a natural fit for workloads like content management systems, shared home directories, media processing pipelines, and legacy application migration to the cloud.
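The hierarchical, metadata-rich model described above can be seen with nothing more than the standard library. The sketch below builds a tiny directory tree in a temporary location, inspects per-file metadata via `stat`, and traverses the hierarchy (the paths used are throwaway examples):

```python
import stat
import tempfile
from pathlib import Path

# Build a small directory tree to illustrate the hierarchical model.
root = Path(tempfile.mkdtemp()) / "projects"
(root / "docs").mkdir(parents=True)
(root / "docs" / "readme.txt").write_text("hello")

# Every file carries metadata that the file system maintains for us.
info = (root / "docs" / "readme.txt").stat()
print("size bytes:", info.st_size)
print("mode:", stat.filemode(info.st_mode))  # e.g. -rw-r--r--
print("mtime:", info.st_mtime)               # last-modification timestamp

# Traversing the hierarchy uses ordinary OS-level calls.
for path in sorted(root.rglob("*")):
    print(path.relative_to(root))
```

The same calls work unchanged whether `root` lives on a local disk or on a mounted network file system, which is exactly why file storage suits legacy migrations.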
⚖️ NFS vs SMB/CIFS – Protocol Comparison
Two dominant protocols power networked file storage: NFS (Network File System) and SMB/CIFS (Server Message Block / Common Internet File System). Your choice depends on the OS ecosystem, performance requirements, and feature needs.
| Feature | NFS (v3 / v4 / v4.1) | SMB / CIFS (SMB 2.x / 3.x) |
|---|---|---|
| Origin | Sun Microsystems (1984) | IBM / Microsoft (1983+) |
| Primary OS | Linux / Unix / macOS | Windows (also Linux via Samba) |
| Transport | TCP (v4+), UDP (v3) | TCP (port 445) |
| Authentication | Kerberos, AUTH_SYS (UID/GID) | NTLM, Kerberos, Active Directory |
| Encryption | Optional (Kerberos-based in v4) | Built-in (SMB 3.0+ AES-128/256) |
| Locking | Advisory via NLM (v3), lease-based byte-range locks (v4) | Opportunistic locks (oplocks) and leases |
| Performance | Lower overhead, higher throughput for Linux | Optimized for Windows; multichannel in SMB 3.0 |
| Statefulness | Stateless (v3), stateful (v4) | Stateful |
| Best For | Linux container workloads, HPC, CI/CD | Windows enterprise, Active Directory environments |
For microservices architectures running on Linux, NFS v4.1 with pNFS (parallel NFS) extensions provides excellent performance. For Windows-centric enterprises, SMB 3.x with multichannel and encryption is the clear choice.
☁️ Cloud File Storage Services
Every major cloud provider offers a managed file storage service that abstracts away the underlying infrastructure, giving you an elastic, highly available shared file system.
AWS Elastic File System (EFS)
Amazon EFS is a fully managed, serverless NFS file system. It automatically grows and shrinks as you add and remove files, with no provisioning required. EFS supports NFSv4.1 and integrates natively with EC2, ECS, EKS, and Lambda.
- Storage classes: Standard, Infrequent Access (IA), Archive – with lifecycle policies to auto-tier cold data.
- Throughput modes: Bursting (scales with size), Provisioned (fixed), and Elastic (auto-scales with workload).
- Replication: Cross-region replication for disaster recovery.
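Lifecycle tiering can be configured programmatically. The sketch below, assuming boto3 and valid AWS credentials, builds the `LifecyclePolicies` payload that `put_lifecycle_configuration` expects; the file system ID is a placeholder:

```python
def lifecycle_policies(ia_after_days: int = 30, archive_after_days: int = 90) -> list:
    """Build the LifecyclePolicies payload for the EFS API."""
    return [
        {"TransitionToIA": f"AFTER_{ia_after_days}_DAYS"},
        {"TransitionToArchive": f"AFTER_{archive_after_days}_DAYS"},
    ]

def enable_tiering(file_system_id: str) -> None:
    """Apply the tiering policy (requires AWS credentials in the environment)."""
    import boto3  # imported here so the helper above stays dependency-free

    efs = boto3.client("efs")
    efs.put_lifecycle_configuration(
        FileSystemId=file_system_id,
        LifecyclePolicies=lifecycle_policies(),
    )

# enable_tiering("fs-0abc1234def56789")  # placeholder ID; do not run as-is
```

The valid `AFTER_N_DAYS` values are fixed by the service, so check the current EFS API reference before relying on a specific interval.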
Azure Files
Azure Files provides fully managed file shares accessible via SMB 3.x, NFS 4.1, and the Azure Files REST API. It integrates with Azure Active Directory for identity-based access control.
- Tiers: Premium (SSD-backed), Transaction Optimized, Hot, Cool.
- Azure File Sync: Caches Azure file shares on on-premises Windows Servers for hybrid scenarios.
- Snapshots: Share-level snapshots for point-in-time recovery.
Google Cloud Filestore
Filestore offers managed NFS file servers for GCP workloads. It provides predictable performance with dedicated storage instances.
- Tiers: Basic HDD, Basic SSD, High Scale SSD, Enterprise.
- Integration: Works with GKE, Compute Engine, and Cloud Run.
- Backups: Incremental backups stored independently from the source instance.
| Feature | AWS EFS | Azure Files | Google Filestore |
|---|---|---|---|
| Protocol | NFS 4.1 | SMB 3.x, NFS 4.1 | NFS 3.0 |
| Max Size | Petabyte-scale (elastic) | 100 TiB per share | Up to 100 TiB (Enterprise) |
| Provisioning | Serverless / elastic | Provisioned capacity | Provisioned instances |
| Encryption | At rest (KMS) + in transit | At rest (SSE) + in transit (SMB 3.0) | At rest (CMEK) + in transit |
📊 File vs Block vs Object – When to Use What
Choosing the right storage type is a critical system design decision. Each paradigm excels at different workloads.
| Criteria | File Storage | Block Storage | Object Storage |
|---|---|---|---|
| Access Pattern | Shared, concurrent reads/writes | Single host, low-latency random I/O | HTTP-based, write-once-read-many |
| Data Model | Hierarchical (directories + files) | Raw blocks (no file system) | Flat namespace (key + metadata) |
| Latency | Sub-millisecond to low milliseconds | Sub-millisecond | Tens of milliseconds |
| Scalability | Hundreds of TBs | Up to 64 TiB per volume | Virtually unlimited |
| Best For | Shared data, CMS, media, home dirs | Databases, boot volumes, VMs | Backups, data lakes, static assets |
Use file storage when: multiple clients need to read/write the same files concurrently, your application expects a POSIX file system interface, or you are migrating a legacy on-premises application that relies on shared network drives.
Use block storage when: you need the lowest possible latency for a database engine or a single application that needs raw disk performance.
Use object storage when: you are storing massive amounts of unstructured data (images, videos, backups, logs) and access is primarily via HTTP APIs. See our object storage deep dive for more.
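The three rules of thumb above can be condensed into a toy decision helper. It is illustrative only; real decisions also weigh cost, latency budgets, and operational constraints:

```python
def suggest_storage(shared_posix_access: bool = False,
                    single_host_low_latency: bool = False,
                    http_unstructured: bool = False) -> str:
    """Map the workload traits discussed above to a storage paradigm."""
    if shared_posix_access:
        return "file"    # concurrent shared access with POSIX semantics
    if single_host_low_latency:
        return "block"   # databases, boot volumes, raw disk performance
    if http_unstructured:
        return "object"  # backups, data lakes, static assets over HTTP
    raise ValueError("describe the workload with at least one flag")

print(suggest_storage(shared_posix_access=True))  # file
```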
📈 Performance – Throughput, IOPS & Latency
File storage performance is governed by three key metrics. Understanding them is essential for capacity planning. Use the SWEHelper Capacity Planner to estimate your requirements.
- Throughput (MB/s): The rate at which data can be read or written sequentially. Critical for large file workloads like video rendering or big data analytics. Cloud services like EFS Elastic mode can deliver up to 10+ GB/s of read throughput.
- IOPS (I/O Operations Per Second): The number of discrete read/write operations per second. Matters for workloads with many small files – think web servers serving thousands of static assets or build systems compiling source trees.
- Latency: The time for a single I/O operation to complete. NFS over a local network typically sees 0.5–2 ms latency. Cross-AZ or cross-region access adds network RTT on top.
Performance tuning tips for network file systems:
- Use the `nconnect` mount option on Linux to open multiple TCP connections to the NFS server.
- Set appropriate `rsize` and `wsize` (read/write buffer sizes) – 1 MB is recommended for most cloud file systems.
- Enable `async` mounts for write-heavy workloads where durability is handled at the application level.
- For SMB, enable multichannel to aggregate bandwidth across multiple network interfaces.
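The throughput and IOPS metrics above translate directly into back-of-envelope capacity planning. The sketch below uses made-up workload numbers; the three-ops-per-file assumption is a rough illustrative default, not a measured figure:

```python
def required_throughput_mbps(files_per_sec: float, avg_file_mb: float) -> float:
    """Sequential throughput needed to sustain a given file workload."""
    return files_per_sec * avg_file_mb

def required_iops(files_per_sec: float, ops_per_file: int = 3) -> float:
    """Rough IOPS estimate: each file touch costs open/IO/close-style ops."""
    return files_per_sec * ops_per_file

# Example: a build farm reading 200 small files/sec at 0.5 MB each.
print(required_throughput_mbps(200, 0.5))  # 100.0 MB/s
print(required_iops(200))                  # 600 ops/sec
```

Compare the results against your file system's published limits, and remember that metadata-heavy workloads (many tiny files) are usually IOPS-bound long before they are throughput-bound.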
💻 Mounting File Shares – Examples
Here is how to mount an NFS file share on a Linux client. This is the most common setup for cloud-based file storage:
```bash
# Install NFS client utilities
sudo apt-get update && sudo apt-get install -y nfs-common

# Create the mount point
sudo mkdir -p /mnt/shared-data

# Mount an AWS EFS file system
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport \
  fs-0abc1234def56789.efs.us-east-1.amazonaws.com:/ \
  /mnt/shared-data

# Verify the mount
df -h /mnt/shared-data

# Add to /etc/fstab for persistence across reboots
echo "fs-0abc1234def56789.efs.us-east-1.amazonaws.com:/ /mnt/shared-data nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0" | sudo tee -a /etc/fstab
```
Mounting an Azure Files SMB share on Linux:
```bash
# Install CIFS utilities
sudo apt-get install -y cifs-utils

# Store credentials securely
sudo mkdir -p /etc/smbcredentials
sudo bash -c 'cat > /etc/smbcredentials/myaccount.cred << EOF
username=myStorageAccountName
password=myStorageAccountKey
EOF'
sudo chmod 600 /etc/smbcredentials/myaccount.cred

# Mount the share
sudo mkdir -p /mnt/azure-files
sudo mount -t cifs \
  //myStorageAccountName.file.core.windows.net/myshare \
  /mnt/azure-files \
  -o vers=3.0,credentials=/etc/smbcredentials/myaccount.cred,dir_mode=0777,file_mode=0777,serverino
```
🧪 Programmatic Access – Python Examples
Once a file share is mounted, any application can access it using standard file I/O. Here is a Python example that reads and writes to a mounted NFS share:
```python
import json
from pathlib import Path

SHARED_DIR = Path("/mnt/shared-data/app-config")

def write_config(config: dict, filename: str) -> None:
    """Write a JSON config file to the shared NFS mount."""
    SHARED_DIR.mkdir(parents=True, exist_ok=True)
    config_path = SHARED_DIR / filename
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    print(f"Config written to {config_path}")

def read_config(filename: str) -> dict:
    """Read a JSON config file from the shared NFS mount."""
    config_path = SHARED_DIR / filename
    with open(config_path, "r") as f:
        return json.load(f)

def list_shared_files(directory: str = "") -> list:
    """List all files in the shared directory tree."""
    target = SHARED_DIR / directory
    return [
        str(p.relative_to(SHARED_DIR))
        for p in target.rglob("*")
        if p.is_file()
    ]

# Usage
write_config({"db_host": "10.0.1.50", "db_port": 5432}, "db.json")
config = read_config("db.json")
print(f"DB Host: {config['db_host']}")
```
For Azure Files, you can also access shares programmatically using the Azure SDK without mounting:
```python
from azure.storage.fileshare import ShareFileClient

file_client = ShareFileClient.from_connection_string(
    conn_str="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;",
    share_name="myshare",
    file_path="reports/quarterly.pdf"
)

# Download a file
with open("quarterly.pdf", "wb") as f:
    data = file_client.download_file()
    data.readinto(f)

# Upload a file
with open("updated-report.pdf", "rb") as source:
    file_client.upload_file(source)
```
🏗️ Shared File Systems in Distributed Architectures
Shared file systems are fundamental to several architectural patterns in distributed systems:
- Kubernetes Persistent Volumes: EFS, Azure Files, and Filestore can back `ReadWriteMany` (RWX) persistent volume claims, allowing multiple pods across nodes to share state. This is critical for container orchestration workloads.
- CI/CD Artifact Sharing: Build pipelines can write artifacts to a shared NFS mount, enabling downstream stages or parallel jobs to pick them up without S3 round-trips.
- Machine Learning Training: Distributed training jobs on GPU clusters need shared access to training datasets. File storage eliminates the need to copy datasets to each node.
- Content Management: Web applications serving user-uploaded content from multiple app servers can use a shared file system as the content store behind a load balancer.
- Legacy Application Lift-and-Shift: Applications that depend on local file system semantics can migrate to the cloud by pointing to a managed NFS share with no code changes.
When designing shared file systems into your architecture, consider using the SWEHelper System Design Calculator to estimate throughput and IOPS requirements based on your expected workload.
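As a sketch of the Kubernetes pattern, an RWX claim backed by EFS might look like the manifest below. It assumes the AWS EFS CSI driver is installed; the claim name and the `efs-sc` StorageClass are hypothetical placeholders:

```yaml
# Hypothetical RWX PersistentVolumeClaim backed by the AWS EFS CSI driver.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany          # mountable by multiple pods on multiple nodes
  storageClassName: efs-sc   # assumed StorageClass for the EFS CSI driver
  resources:
    requests:
      storage: 100Gi         # EFS is elastic, so this value is largely nominal
```

Any pod that mounts this claim sees the same directory tree, which is what makes the CI/CD, ML training, and content-management patterns above work.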
💡 Best Practices
- Choose the right protocol: Use NFS for Linux workloads, SMB for Windows or hybrid environments. Do not force a protocol mismatch.
- Enable encryption in transit: Always use NFS over TLS or SMB 3.0+ encryption when data crosses network boundaries.
- Use lifecycle policies: Move infrequently accessed files to cheaper tiers (EFS IA, Azure Cool) to reduce costs by up to 92%.
- Monitor performance: Set up CloudWatch (AWS), Azure Monitor, or Cloud Monitoring alerts for throughput and IOPS. Use the SWEHelper Latency Calculator to model expected latencies.
- Plan for bursting: Understand your cloud provider's burst credits model. EFS in Bursting mode earns credits during idle periods and spends them during peaks.
- Secure access: Use VPC security groups, private endpoints, and IAM policies to restrict who can mount and access file shares.
- Test failover: For mission-critical data, test cross-AZ and cross-region failover scenarios. Ensure your `/etc/fstab` entries use the `_netdev` flag so mounts wait for networking.
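The burst-credit planning mentioned above is simple arithmetic. The sketch assumes the published ~50 KiB/s-per-GiB baseline for EFS Bursting mode; verify the current figures against the AWS documentation before relying on them:

```python
def efs_bursting_baseline_mibps(stored_gib: float) -> float:
    """Baseline throughput in Bursting mode: ~50 KiB/s per GiB stored
    (published EFS figure; check current AWS docs)."""
    return stored_gib * 50 / 1024  # KiB/s per GiB -> total MiB/s

def hours_of_burst(credit_balance_gib: float, burst_mibps: float,
                   baseline_mibps: float) -> float:
    """How long a credit balance sustains throughput above baseline."""
    drain_mibps = burst_mibps - baseline_mibps  # credits spent per second
    credit_mib = credit_balance_gib * 1024
    return credit_mib / drain_mibps / 3600

# Example: 1 TiB stored gives a 50 MiB/s baseline.
print(efs_bursting_baseline_mibps(1024))
```

If a workload's sustained demand exceeds the baseline for long stretches, Provisioned or Elastic throughput mode is usually the better fit than relying on credits.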
❓ Frequently Asked Questions
Q: When should I choose file storage over object storage?
Choose file storage when your application needs POSIX file system semantics: shared concurrent access, file locking, random reads/writes within a file, or a directory hierarchy that the application traverses. Object storage is better for write-once-read-many patterns with HTTP-based access, like storing images, backups, or data lake files. If multiple servers need to read and write the same files simultaneously with low latency, file storage is the right choice.
Q: Can I use EFS or Azure Files with Kubernetes?
Yes. Both AWS EFS and Azure Files provide CSI (Container Storage Interface) drivers that integrate with Kubernetes. EFS supports ReadWriteMany access mode, allowing multiple pods across different nodes to mount the same volume. Azure Files supports both SMB and NFS-backed persistent volumes. Google Filestore similarly integrates with GKE via its Filestore CSI driver.
Q: How does NFS v4 differ from NFS v3?
NFS v4 is a stateful protocol that runs over a single TCP port (2049), making it firewall-friendly. It introduces built-in support for strong authentication (Kerberos), lease-based locking with file delegation, compound operations to reduce round trips, and ACL-based permissions. NFS v3 is stateless, uses multiple ports (including the portmapper), and relies on a separate NLM (Network Lock Manager) for locking, which is more fragile in network-partitioned environments.
Q: What are the cost implications of cloud file storage?
Cloud file storage is generally more expensive per GB than object storage but cheaper than provisioned block storage. For example, AWS EFS Standard costs around $0.30/GB-month versus S3 Standard at $0.023/GB-month. The key to cost optimization is tiered storage: enable lifecycle policies to move cold data to Infrequent Access or Archive tiers, which can reduce costs to $0.008–$0.016/GB-month. Always monitor your usage patterns and adjust tiers accordingly.
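The prices quoted above make the tiering math concrete. The sketch below uses those example list prices (us-east-1-style figures that change over time; check current pricing pages) to compare a flat deployment against one with a lifecycle policy:

```python
# Illustrative monthly cost comparison using the example prices in the text.
PRICES_PER_GB_MONTH = {
    "efs_standard": 0.30,
    "efs_ia": 0.016,
    "s3_standard": 0.023,
}

def monthly_cost(gb: float, tier: str) -> float:
    """Flat monthly cost for gb of data in a single tier."""
    return round(gb * PRICES_PER_GB_MONTH[tier], 2)

def tiered_efs_cost(gb_total: float, cold_fraction: float) -> float:
    """Cost when a lifecycle policy keeps cold_fraction of the data in IA."""
    hot = gb_total * (1 - cold_fraction)
    cold = gb_total * cold_fraction
    return round(monthly_cost(hot, "efs_standard") + monthly_cost(cold, "efs_ia"), 2)

print(monthly_cost(1000, "efs_standard"))  # 300.0
print(tiered_efs_cost(1000, 0.8))          # 80% cold: 60.0 + 12.8 = 72.8
```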
Q: How do I handle file locking in a shared NFS environment?
NFS v4 provides built-in lease-based locking. The server grants lock leases that clients must periodically renew. If a client fails to renew (e.g., due to a crash), the server reclaims the lock after the lease period expires. For application-level coordination, consider using advisory locks via fcntl() or flock() system calls. In highly concurrent environments, you may want to implement distributed locking at the application layer using tools like etcd or ZooKeeper rather than relying solely on file-level locks.
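Advisory locking is easy to demonstrate locally. The sketch below takes an exclusive `flock` on one open of a file and shows that a second, independent open cannot acquire it. It runs against a local temp file; on an NFS mount, modern Linux clients map `flock` onto NFS byte-range locks, so behavior over the network depends on the client and server versions:

```python
import fcntl
import tempfile

# Two independent opens of the same file stand in for two clients:
# advisory flock locks conflict across separate open file descriptions.
path = tempfile.NamedTemporaryFile(delete=False).name

holder = open(path, "w")
fcntl.flock(holder, fcntl.LOCK_EX)  # first "client" takes the lock

contender = open(path, "w")
try:
    fcntl.flock(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
    acquired = True
except BlockingIOError:
    acquired = False  # the lock is held elsewhere

print("second locker acquired:", acquired)

fcntl.flock(holder, fcntl.LOCK_UN)  # release; a retry now succeeds
fcntl.flock(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
holder.close()
contender.close()
```

Because these locks are advisory, every cooperating process must actually call `flock`/`fcntl`; a process that ignores them can still write the file, which is one reason distributed coordination often moves up to etcd or ZooKeeper.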