Distributed File Systems — From Zero
If you've never worked with distributed systems before, start here. This page assumes you know what a file is and nothing else.
What Is a File System?
A file system is the software that manages files on your computer. When you save a document, the file system decides where on the disk to put the bytes, remembers the filename, and lets you retrieve it later. You interact with it through paths like /home/user/Documents/report.pdf.
Every operating system has one. It's the layer between applications and the raw disk.
What Makes It "Distributed"?
A normal file system manages files on one disk connected to one computer. A distributed file system manages files across multiple disks connected to multiple computers (called nodes).
Single-node file system:
┌─────────────────┐
│ Application │
│ ↓ │
│ File System │
│ ↓ │
│ Disk │
└─────────────────┘
Distributed file system:
┌──────────────────────────────────────┐
│ Application │
│ ↓ │
│ Distributed File System │
│ ↙ ↓ ↘ │
│ Node 0 Node 1 Node 2 │
│ ↓ ↓ ↓ │
│ Disk 0 Disk 1 Disk 2 │
└──────────────────────────────────────┘
The application doesn't know (or care) that the files are spread across three machines. It calls "save this file" and the distributed system handles the rest.
Why Would You Want This?
1. Fault Tolerance
If you store important data on one computer and that computer's hard drive dies, the data is gone. With three replicas, you can lose one or even two nodes and still have the data. The probability of all three failing simultaneously is vanishingly small.
2. Availability
If a single file server goes down for maintenance, all clients are blocked. With three replicas, clients can be redirected to any available node.
3. No Single Point of Failure
In a single-server system, the server IS the failure point. If it crashes, everything stops. A distributed system can lose nodes without going offline.
4. Data Locality
In a geographically distributed system, users in Asia can read from a replica in Asia, while users in Europe read from a replica in Europe — reducing latency. (Our project runs on localhost, but the principle is the same.)
The Problem: Consistency
If you write "hello" to Node 0, but Node 1 is temporarily unreachable, what happens?
- Node 0 has "hello"
- Node 1 still has the old data (or nothing)
Now a client reads from Node 1 and gets stale data. This is inconsistency — the replicas don't agree on the state of the file.
How We Solve It
This project uses Totally Ordered Multicast (TO-Multicast) to ensure every write reaches every node, and all nodes apply writes in the exact same order. The result: strong consistency — all nodes agree on the file system state at all times.
We also use Raft leader election to designate one node as the "leader" for reads, ensuring clients always get the latest committed data.
Our System's Specifics
| Property | Value |
|---|---|
| Nodes | 3 (Raft minimum for majority consensus) |
| Replication model | Active replication (all nodes actively serve) |
| Consistency model | Strong consistency (via TO-Multicast) |
| Write protocol | Broadcast to all, wait for ACKs, deliver in order |
| Read protocol | Direct read from leader or any replica |
| Consensus | Raft leader election |
Next: Understand Java RMI
The next page explains Java RMI — the networking technology that lets our nodes call methods on each other as if they were local objects. → Java RMI In Depth