Distributed File Systems — From Zero

If you've never worked with distributed systems before, start here. This page assumes you know what a file is and nothing else.

What Is a File System?

A file system is the software that manages files on your computer. When you save a document, the file system decides where on the disk to put the bytes, remembers the filename, and lets you retrieve it later. You interact with it through paths like /home/user/Documents/report.pdf.

Every operating system has one. It's the layer between applications and the raw disk.

What Makes It "Distributed"?

A normal file system manages files on one disk connected to one computer. A distributed file system manages files across multiple disks connected to multiple computers (called nodes).

Single-node file system:
┌─────────────────┐
│  Application    │
│       ↓         │
│  File System    │
│       ↓         │
│  Disk           │
└─────────────────┘

Distributed file system:
┌──────────────────────────────────────┐
│           Application               │
│              ↓                      │
│     Distributed File System         │
│     ↙         ↓         ↘          │
│  Node 0    Node 1    Node 2        │
│   ↓         ↓         ↓            │
│  Disk 0   Disk 1   Disk 2          │
└──────────────────────────────────────┘

The application doesn't know (or care) that the files are spread across three machines. It calls "save this file" and the distributed system handles the rest.

Why Would You Want This?

1. Fault Tolerance

If you store important data on one computer and that computer's hard drive dies, the data is gone. With three replicas, you can lose one or even two nodes and still have the data. The probability of all three failing simultaneously is vanishingly small.

2. Availability

If a single file server goes down for maintenance, all clients are blocked. With three replicas, clients can be redirected to any available node.

3. No Single Point of Failure

In a single-server system, the server IS the failure point. If it crashes, everything stops. A distributed system can lose nodes without going offline.

4. Data Locality

In a geographically distributed system, users in Asia can read from a replica in Asia, while users in Europe read from a replica in Europe — reducing latency. (Our project runs on localhost, but the principle is the same.)

The Problem: Consistency

If you write "hello" to Node 0, but Node 1 is temporarily unreachable, what happens?

Node 0 has "hello"
Node 1 still has the old data (or nothing)

Now a client reads from Node 1 and gets stale data. This is inconsistency — the replicas don't agree on the state of the file.

How We Solve It

This project uses Totally Ordered Multicast (TO-Multicast) to ensure every write reaches every node, and all nodes apply writes in the exact same order. The result: strong consistency — all nodes agree on the file system state at all times.

We also use Raft leader election to designate one node as the "leader" for reads, ensuring clients always get the latest committed data.

Our System's Specifics

Property	Value
Nodes	3 (Raft minimum for majority consensus)
Replication model	Active replication (all nodes actively serve)
Consistency model	Strong consistency (via TO-Multicast)
Write protocol	Broadcast to all, wait for ACKs, deliver in order
Read protocol	Direct read from leader or any replica
Consensus	Raft leader election

Next: Understand Java RMI

The next page explains Java RMI — the networking technology that lets our nodes call methods on each other as if they were local objects. → Java RMI In Depth

What Is a File System?​

What Makes It "Distributed"?​

Why Would You Want This?​

1. Fault Tolerance​

2. Availability​

3. No Single Point of Failure​

4. Data Locality​

The Problem: Consistency​

How We Solve It​

Our System's Specifics​

Next: Understand Java RMI​