📁 File Storage – NFS, SMB, EFS, Azure Files & Shared File Systems
File storage is one of the three fundamental storage paradigms in computing, alongside block storage and object storage. It organizes data in a hierarchical directory structure: the familiar tree of folders and files that every developer interacts with daily. Unlike block storage, which deals with raw disk volumes, or object storage, which uses flat key-value namespaces, file storage presents data through a POSIX-compatible or SMB-compatible file system interface that applications can read, write, and traverse using standard OS-level calls.
In distributed systems, file storage becomes especially important when multiple compute nodes need shared access to the same set of files simultaneously. This guide covers everything you need to know: protocols, cloud services, performance characteristics, and when to pick file storage over the alternatives.
🧩 What Is File Storage?
File storage systems manage data as a hierarchy of directories (folders) and files. Each file has metadata: a name, permissions, timestamps, and a path that uniquely identifies it within the tree. The file system itself handles the translation between this logical hierarchy and the physical blocks on disk.
Key characteristics of file storage include:
- Hierarchical namespace: Data is organized into nested directories, making it intuitive for humans and applications.
- File-level access: Operations happen at file granularity – open, read, write, close, rename, delete.
- Locking and concurrency: File systems support advisory or mandatory locks, enabling safe concurrent access.
- Metadata-rich: Each file carries ownership, permissions, timestamps (`ctime`, `mtime`, `atime`), and extended attributes.
- Mountable: Network file systems can be mounted on client machines and accessed as if they were local directories.
This model is a natural fit for workloads like content management systems, shared home directories, media processing pipelines, and legacy application migration to the cloud.
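The hierarchical, metadata-rich model described above can be seen with nothing more than the standard library. The sketch below builds a tiny directory tree in a temporary location, inspects per-file metadata via `stat`, and traverses the hierarchy (the paths used are throwaway examples):

```python
import stat
import tempfile
from pathlib import Path

# Build a small directory tree to illustrate the hierarchical model.
root = Path(tempfile.mkdtemp()) / "projects"
(root / "docs").mkdir(parents=True)
(root / "docs" / "readme.txt").write_text("hello")

# Every file carries metadata that the file system maintains for us.
info = (root / "docs" / "readme.txt").stat()
print("size bytes:", info.st_size)
print("mode:", stat.filemode(info.st_mode))  # e.g. -rw-r--r--
print("mtime:", info.st_mtime)               # last-modification timestamp

# Traversing the hierarchy uses ordinary OS-level calls.
for path in sorted(root.rglob("*")):
    print(path.relative_to(root))
```

The same calls work unchanged whether `root` lives on a local disk or on a mounted network file system, which is exactly why file storage suits legacy migrations.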
⚖️ NFS vs SMB/CIFS – Protocol Comparison
Two dominant protocols power networked file storage: NFS (Network File System) and SMB/CIFS (Server Message Block / Common Internet File System). Your choice depends on the OS ecosystem, performance requirements, and feature needs.
| Feature | NFS (v3 / v4 / v4.1) | SMB / CIFS (SMB 2.x / 3.x) |
|---|---|---|
| Origin | Sun Microsystems (1984) | IBM / Microsoft (1983+) |
| Primary OS | Linux / Unix / macOS | Windows (also Linux via Samba) |
| Transport | TCP (v4+), UDP (v3) | TCP (port 445) |
| Authentication | Kerberos, AUTH_SYS (UID/GID) | NTLM, Kerberos, Active Directory |
| Encryption | Optional (Kerberos-based in v4) | Built-in (SMB 3.0+ AES-128/256) |
| Locking | Advisory via NLM (v3), lease-based byte-range locks (v4) | Opportunistic locks (oplocks) and leases |
| Performance | Lower overhead, higher throughput for Linux | Optimized for Windows; multichannel in SMB 3.0 |
| Statefulness | Stateless (v3), stateful (v4) | Stateful |
| Best For | Linux container workloads, HPC, CI/CD | Windows enterprise, Active Directory environments |
For microservices architectures running on Linux, NFS v4.1 with pNFS (parallel NFS) extensions provides excellent performance. For Windows-centric enterprises, SMB 3.x with multichannel and encryption is the clear choice.
☁️ Cloud File Storage Services
Every major cloud provider offers a managed file storage service that abstracts away the underlying infrastructure, giving you an elastic, highly available shared file system.
AWS Elastic File System (EFS)
Amazon EFS is a fully managed, serverless NFS file system. It automatically grows and shrinks as you add and remove files, with no provisioning required. EFS supports NFSv4.1 and integrates natively with EC2, ECS, EKS, and Lambda.
- Storage classes: Standard, Infrequent Access (IA), Archive – with lifecycle policies to auto-tier cold data.
- Throughput modes: Bursting (scales with size), Provisioned (fixed), and Elastic (auto-scales with workload).
- Replication: Cross-region replication for disaster recovery.
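Lifecycle tiering can be configured programmatically. The sketch below, assuming boto3 and valid AWS credentials, builds the `LifecyclePolicies` payload that `put_lifecycle_configuration` expects; the file system ID is a placeholder:

```python
def lifecycle_policies(ia_after_days: int = 30, archive_after_days: int = 90) -> list:
    """Build the LifecyclePolicies payload for the EFS API."""
    return [
        {"TransitionToIA": f"AFTER_{ia_after_days}_DAYS"},
        {"TransitionToArchive": f"AFTER_{archive_after_days}_DAYS"},
    ]

def enable_tiering(file_system_id: str) -> None:
    """Apply the tiering policy (requires AWS credentials in the environment)."""
    import boto3  # imported here so the helper above stays dependency-free

    efs = boto3.client("efs")
    efs.put_lifecycle_configuration(
        FileSystemId=file_system_id,
        LifecyclePolicies=lifecycle_policies(),
    )

# enable_tiering("fs-0abc1234def56789")  # placeholder ID; do not run as-is
```

The valid `AFTER_N_DAYS` values are fixed by the service, so check the current EFS API reference before relying on a specific interval.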
Azure Files
Azure Files provides fully managed file shares accessible via SMB 3.x, NFS 4.1, and the Azure Files REST API. It integrates with Azure Active Directory for identity-based access control.
- Tiers: Premium (SSD-backed), Transaction Optimized, Hot, Cool.
- Azure File Sync: Caches Azure file shares on on-premises Windows Servers for hybrid scenarios.
- Snapshots: Share-level snapshots for point-in-time recovery.
Google Cloud Filestore
Filestore offers managed NFS file servers for GCP workloads. It provides predictable performance with dedicated storage instances.
- Tiers: Basic HDD, Basic SSD, High Scale SSD, Enterprise.
- Integration: Works with GKE, Compute Engine, and Cloud Run.
- Backups: Incremental backups stored independently from the source instance.
| Feature | AWS EFS | Azure Files | Google Filestore |
|---|---|---|---|
| Protocol | NFS 4.1 | SMB 3.x, NFS 4.1 | NFS 3.0 |
| Max Size | Petabyte-scale (elastic) | 100 TiB per share | Up to 100 TiB (Enterprise) |
| Provisioning | Serverless / elastic | Provisioned capacity | Provisioned instances |
| Encryption | At rest (KMS) + in transit | At rest (SSE) + in transit (SMB 3.0) | At rest (CMEK) + in transit |
📊 File vs Block vs Object – When to Use What
Choosing the right storage type is a critical system design decision. Each paradigm excels at different workloads.
| Criteria | File Storage | Block Storage | Object Storage |
|---|---|---|---|
| Access Pattern | Shared, concurrent reads/writes | Single host, low-latency random I/O | HTTP-based, write-once-read-many |
| Data Model | Hierarchical (directories + files) | Raw blocks (no file system) | Flat namespace (key + metadata) |
| Latency | Sub-millisecond to low milliseconds | Sub-millisecond | Tens of milliseconds |
| Scalability | Hundreds of TBs | Up to 64 TiB per volume | Virtually unlimited |
| Best For | Shared data, CMS, media, home dirs | Databases, boot volumes, VMs | Backups, data lakes, static assets |
Use file storage when: multiple clients need to read/write the same files concurrently, your application expects a POSIX file system interface, or you are migrating a legacy on-premises application that relies on shared network drives.
Use block storage when: you need the lowest possible latency for a database engine or a single application that needs raw disk performance.
Use object storage when: you are storing massive amounts of unstructured data (images, videos, backups, logs) and access is primarily via HTTP APIs. See our object storage deep dive for more.
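The three rules of thumb above can be condensed into a toy decision helper. It is illustrative only; real decisions also weigh cost, latency budgets, and operational constraints:

```python
def suggest_storage(shared_posix_access: bool = False,
                    single_host_low_latency: bool = False,
                    http_unstructured: bool = False) -> str:
    """Map the workload traits discussed above to a storage paradigm."""
    if shared_posix_access:
        return "file"    # concurrent shared access with POSIX semantics
    if single_host_low_latency:
        return "block"   # databases, boot volumes, raw disk performance
    if http_unstructured:
        return "object"  # backups, data lakes, static assets over HTTP
    raise ValueError("describe the workload with at least one flag")

print(suggest_storage(shared_posix_access=True))  # file
```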
📈 Performance – Throughput, IOPS & Latency
File storage performance is governed by three key metrics. Understanding them is essential for capacity planning. Use the SWEHelper Capacity Planner to estimate your requirements.
- Throughput (MB/s): The rate at which data can be read or written sequentially. Critical for large file workloads like video rendering or big data analytics. Cloud services like EFS Elastic mode can deliver up to 10+ GB/s of read throughput.
- IOPS (I/O Operations Per Second): The number of discrete read/write operations per second. Matters for workloads with many small files – think web servers serving thousands of static assets or build systems compiling source trees.
- Latency: The time for a single I/O operation to complete. NFS over a local network typically sees 0.5–2 ms latency. Cross-AZ or cross-region access adds network RTT on top.
Performance tuning tips for network file systems:
- Use the `nconnect` mount option on Linux to open multiple TCP connections to the NFS server.
- Set appropriate `rsize` and `wsize` (read/write buffer sizes) – 1 MB is recommended for most cloud file systems.
- Enable `async` mounts for write-heavy workloads where durability is handled at the application level.
- For SMB, enable multichannel to aggregate bandwidth across multiple network interfaces.
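The throughput and IOPS metrics above translate directly into back-of-envelope capacity planning. The sketch below uses made-up workload numbers; the three-ops-per-file assumption is a rough illustrative default, not a measured figure:

```python
def required_throughput_mbps(files_per_sec: float, avg_file_mb: float) -> float:
    """Sequential throughput needed to sustain a given file workload."""
    return files_per_sec * avg_file_mb

def required_iops(files_per_sec: float, ops_per_file: int = 3) -> float:
    """Rough IOPS estimate: each file touch costs open/IO/close-style ops."""
    return files_per_sec * ops_per_file

# Example: a build farm reading 200 small files/sec at 0.5 MB each.
print(required_throughput_mbps(200, 0.5))  # 100.0 MB/s
print(required_iops(200))                  # 600 ops/sec
```

Compare the results against your file system's published limits, and remember that metadata-heavy workloads (many tiny files) are usually IOPS-bound long before they are throughput-bound.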
💻 Mounting File Shares – Examples
Here is how to mount an NFS file share on a Linux client. This is the most common setup for cloud-based file storage:
```bash
# Install NFS client utilities
sudo apt-get update && sudo apt-get install -y nfs-common

# Create the mount point
sudo mkdir -p /mnt/shared-data

# Mount an AWS EFS file system
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport \
  fs-0abc1234def56789.efs.us-east-1.amazonaws.com:/ \
  /mnt/shared-data

# Verify the mount
df -h /mnt/shared-data

# Add to /etc/fstab for persistence across reboots
echo "fs-0abc1234def56789.efs.us-east-1.amazonaws.com:/ /mnt/shared-data nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0" | sudo tee -a /etc/fstab
```
Mounting an Azure Files SMB share on Linux:
```bash
# Install CIFS utilities
sudo apt-get install -y cifs-utils

# Store credentials securely
sudo mkdir -p /etc/smbcredentials
sudo bash -c 'cat > /etc/smbcredentials/myaccount.cred << EOF
username=myStorageAccountName
password=myStorageAccountKey
EOF'
sudo chmod 600 /etc/smbcredentials/myaccount.cred

# Mount the share
sudo mkdir -p /mnt/azure-files
sudo mount -t cifs \
  //myStorageAccountName.file.core.windows.net/myshare \
  /mnt/azure-files \
  -o vers=3.0,credentials=/etc/smbcredentials/myaccount.cred,dir_mode=0777,file_mode=0777,serverino
```
🧪 Programmatic Access – Python Examples
Once a file share is mounted, any application can access it using standard file I/O. Here is a Python example that reads and writes to a mounted NFS share:
```python
import json
from pathlib import Path

SHARED_DIR = Path("/mnt/shared-data/app-config")

def write_config(config: dict, filename: str) -> None:
    """Write a JSON config file to the shared NFS mount."""
    SHARED_DIR.mkdir(parents=True, exist_ok=True)
    config_path = SHARED_DIR / filename
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    print(f"Config written to {config_path}")

def read_config(filename: str) -> dict:
    """Read a JSON config file from the shared NFS mount."""
    config_path = SHARED_DIR / filename
    with open(config_path, "r") as f:
        return json.load(f)

def list_shared_files(directory: str = "") -> list:
    """List all files in the shared directory tree."""
    target = SHARED_DIR / directory
    return [
        str(p.relative_to(SHARED_DIR))
        for p in target.rglob("*")
        if p.is_file()
    ]

# Usage
write_config({"db_host": "10.0.1.50", "db_port": 5432}, "db.json")
config = read_config("db.json")
print(f"DB Host: {config['db_host']}")
```
For Azure Files, you can also access shares programmatically using the Azure SDK without mounting:
```python
from azure.storage.fileshare import ShareFileClient

file_client = ShareFileClient.from_connection_string(
    conn_str="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;",
    share_name="myshare",
    file_path="reports/quarterly.pdf"
)

# Download a file
with open("quarterly.pdf", "wb") as f:
    data = file_client.download_file()
    data.readinto(f)

# Upload a file
with open("updated-report.pdf", "rb") as source:
    file_client.upload_file(source)
```
🏗️ Shared File Systems in Distributed Architectures
Shared file systems are fundamental to several architectural patterns in distributed systems:
- Kubernetes Persistent Volumes: EFS, Azure Files, and Filestore can back `ReadWriteMany` (RWX) persistent volume claims, allowing multiple pods across nodes to share state. This is critical for container orchestration workloads.
- CI/CD Artifact Sharing: Build pipelines can write artifacts to a shared NFS mount, enabling downstream stages or parallel jobs to pick them up without S3 round-trips.
- Machine Learning Training: Distributed training jobs on GPU clusters need shared access to training datasets. File storage eliminates the need to copy datasets to each node.
- Content Management: Web applications serving user-uploaded content from multiple app servers can use a shared file system as the content store behind a load balancer.
- Legacy Application Lift-and-Shift: Applications that depend on local file system semantics can migrate to the cloud by pointing to a managed NFS share with no code changes.
When designing shared file systems into your architecture, consider using the SWEHelper System Design Calculator to estimate throughput and IOPS requirements based on your expected workload.
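As a sketch of the Kubernetes pattern, an RWX claim backed by EFS might look like the manifest below. It assumes the AWS EFS CSI driver is installed; the claim name and the `efs-sc` StorageClass are hypothetical placeholders:

```yaml
# Hypothetical RWX PersistentVolumeClaim backed by the AWS EFS CSI driver.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany          # mountable by multiple pods on multiple nodes
  storageClassName: efs-sc   # assumed StorageClass for the EFS CSI driver
  resources:
    requests:
      storage: 100Gi         # EFS is elastic, so this value is largely nominal
```

Any pod that mounts this claim sees the same directory tree, which is what makes the CI/CD, ML training, and content-management patterns above work.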
💡 Best Practices
- Choose the right protocol: Use NFS for Linux workloads, SMB for Windows or hybrid environments. Do not force a protocol mismatch.
- Enable encryption in transit: Always use NFS over TLS or SMB 3.0+ encryption when data crosses network boundaries.
- Use lifecycle policies: Move infrequently accessed files to cheaper tiers (EFS IA, Azure Cool) to reduce costs by up to 92%.
- Monitor performance: Set up CloudWatch (AWS), Azure Monitor, or Cloud Monitoring alerts for throughput and IOPS. Use the SWEHelper Latency Calculator to model expected latencies.
- Plan for bursting: Understand your cloud provider's burst credits model. EFS in Bursting mode earns credits during idle periods and spends them during peaks.
- Secure access: Use VPC security groups, private endpoints, and IAM policies to restrict who can mount and access file shares.
- Test failover: For mission-critical data, test cross-AZ and cross-region failover scenarios. Ensure your `/etc/fstab` entries use the `_netdev` flag so mounts wait for networking.
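The burst-credit planning mentioned above is simple arithmetic. The sketch assumes the published ~50 KiB/s-per-GiB baseline for EFS Bursting mode; verify the current figures against the AWS documentation before relying on them:

```python
def efs_bursting_baseline_mibps(stored_gib: float) -> float:
    """Baseline throughput in Bursting mode: ~50 KiB/s per GiB stored
    (published EFS figure; check current AWS docs)."""
    return stored_gib * 50 / 1024  # KiB/s per GiB -> total MiB/s

def hours_of_burst(credit_balance_gib: float, burst_mibps: float,
                   baseline_mibps: float) -> float:
    """How long a credit balance sustains throughput above baseline."""
    drain_mibps = burst_mibps - baseline_mibps  # credits spent per second
    credit_mib = credit_balance_gib * 1024
    return credit_mib / drain_mibps / 3600

# Example: 1 TiB stored gives a 50 MiB/s baseline.
print(efs_bursting_baseline_mibps(1024))
```

If a workload's sustained demand exceeds the baseline for long stretches, Provisioned or Elastic throughput mode is usually the better fit than relying on credits.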
❓ Frequently Asked Questions
Q: When should I choose file storage over object storage?
Choose file storage when your application needs POSIX file system semantics: shared concurrent access, file locking, random reads/writes within a file, or a directory hierarchy that the application traverses. Object storage is better for write-once-read-many patterns with HTTP-based access, like storing images, backups, or data lake files. If multiple servers need to read and write the same files simultaneously with low latency, file storage is the right choice.
Q: Can I use EFS or Azure Files with Kubernetes?
Yes. Both AWS EFS and Azure Files provide CSI (Container Storage Interface) drivers that integrate with Kubernetes. EFS supports ReadWriteMany access mode, allowing multiple pods across different nodes to mount the same volume. Azure Files supports both SMB and NFS-backed persistent volumes. Google Filestore similarly integrates with GKE via its Filestore CSI driver.
Q: How does NFS v4 differ from NFS v3?
NFS v4 is a stateful protocol that runs over a single TCP port (2049), making it firewall-friendly. It introduces built-in support for strong authentication (Kerberos), lease-based locking with file delegation, compound operations to reduce round trips, and ACL-based permissions. NFS v3 is stateless, uses multiple ports (including the portmapper), and relies on a separate NLM (Network Lock Manager) for locking, which is more fragile in network-partitioned environments.
Q: What are the cost implications of cloud file storage?
Cloud file storage is generally more expensive per GB than object storage but cheaper than provisioned block storage. For example, AWS EFS Standard costs around $0.30/GB-month versus S3 Standard at $0.023/GB-month. The key to cost optimization is tiered storage: enable lifecycle policies to move cold data to Infrequent Access or Archive tiers, which can reduce costs to $0.008–$0.016/GB-month. Always monitor your usage patterns and adjust tiers accordingly.
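The prices quoted above make the tiering math concrete. The sketch below uses those example list prices (us-east-1-style figures that change over time; check current pricing pages) to compare a flat deployment against one with a lifecycle policy:

```python
# Illustrative monthly cost comparison using the example prices in the text.
PRICES_PER_GB_MONTH = {
    "efs_standard": 0.30,
    "efs_ia": 0.016,
    "s3_standard": 0.023,
}

def monthly_cost(gb: float, tier: str) -> float:
    """Flat monthly cost for gb of data in a single tier."""
    return round(gb * PRICES_PER_GB_MONTH[tier], 2)

def tiered_efs_cost(gb_total: float, cold_fraction: float) -> float:
    """Cost when a lifecycle policy keeps cold_fraction of the data in IA."""
    hot = gb_total * (1 - cold_fraction)
    cold = gb_total * cold_fraction
    return round(monthly_cost(hot, "efs_standard") + monthly_cost(cold, "efs_ia"), 2)

print(monthly_cost(1000, "efs_standard"))  # 300.0
print(tiered_efs_cost(1000, 0.8))          # 80% cold: 60.0 + 12.8 = 72.8
```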
Q: How do I handle file locking in a shared NFS environment?
NFS v4 provides built-in lease-based locking. The server grants lock leases that clients must periodically renew. If a client fails to renew (e.g., due to a crash), the server reclaims the lock after the lease period expires. For application-level coordination, consider using advisory locks via fcntl() or flock() system calls. In highly concurrent environments, you may want to implement distributed locking at the application layer using tools like etcd or ZooKeeper rather than relying solely on file-level locks.
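Advisory locking is easy to demonstrate locally. The sketch below takes an exclusive `flock` on one open of a file and shows that a second, independent open cannot acquire it. It runs against a local temp file; on an NFS mount, modern Linux clients map `flock` onto NFS byte-range locks, so behavior over the network depends on the client and server versions:

```python
import fcntl
import tempfile

# Two independent opens of the same file stand in for two clients:
# advisory flock locks conflict across separate open file descriptions.
path = tempfile.NamedTemporaryFile(delete=False).name

holder = open(path, "w")
fcntl.flock(holder, fcntl.LOCK_EX)  # first "client" takes the lock

contender = open(path, "w")
try:
    fcntl.flock(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
    acquired = True
except BlockingIOError:
    acquired = False  # the lock is held elsewhere

print("second locker acquired:", acquired)

fcntl.flock(holder, fcntl.LOCK_UN)  # release; a retry now succeeds
fcntl.flock(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
holder.close()
contender.close()
```

Because these locks are advisory, every cooperating process must actually call `flock`/`fcntl`; a process that ignores them can still write the file, which is one reason distributed coordination often moves up to etcd or ZooKeeper.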