Your computer is a distributed system
Most computers are essentially distributed systems internally,
and provide their more familiar programming interface as an abstraction on top of that.
Piercing through these abstractions can yield greater performance,
but most programmers do not need to do so.
This is something unique:
an abstraction that hides the distributed nature of a system
and actually succeeds.
Many have observed that today's computers are distributed systems
implementing the abstraction of the shared-memory multiprocessor.
The wording in the title is from the 2009 paper
"Your computer is already a distributed system. Why isn't your OS?".
Some points from that and other sources:
- In a typical computer,
there are many components running concurrently,
e.g. graphics cards, storage devices, and network cards,
all running their own programs ("firmware") on their own cores,
independent of the main CPU
and communicating over an internal network (e.g. a PCI bus).
- Components can appear, disappear, and fail independently,
up to and including CPU cores,
and other components have to deal with this.
Much like on larger-scale unreliable networks,
timeouts are used to detect communication failures
(see the sketch after this list).
- Components can be malicious and send adversarial inputs;
e.g., a malicious USB or Thunderbolt device
can try to exploit vulnerabilities in other hardware components in the system,
sometimes successfully.
The system should be robust to such attacks,
just as with larger-scale distributed systems.
- These components are all updated on their own schedule.
Often the computer's owner is not allowed
to push their own updates to these components.
- The latency between different components of a computer is highly relevant;
e.g., the latency of communication between the CPU and main memory or storage devices
is huge compared to communication within the CPU.
- To reduce these latencies, we put caches in the middle,
just as we use caches and CDNs in larger distributed systems.
- Those caches themselves are sophisticated distributed systems,
implemented with message passing between individual cores.
Providing cache consistency has a latency cost,
just as it does in larger distributed systems.
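To make the timeout point above concrete,
here is a minimal sketch in Go,
modeling two independently running components as goroutines
that exchange messages over a channel;
the request format and the 100ms deadline are illustrative assumptions,
not real hardware values.

```go
// A minimal sketch of timeout-based failure detection between two
// concurrently running "components", modeled here as goroutines
// exchanging messages over channels.
package main

import (
	"errors"
	"fmt"
	"time"
)

// queryDevice sends a request to a device goroutine and waits for a reply,
// treating silence past the deadline as a failure -- much as a driver
// treats a device that stops answering on the bus.
func queryDevice(requests chan<- string, replies <-chan string, deadline time.Duration) (string, error) {
	requests <- "READ sector 42"
	select {
	case r := <-replies:
		return r, nil
	case <-time.After(deadline):
		return "", errors.New("device did not respond: assuming it failed or was removed")
	}
}

func main() {
	requests := make(chan string)
	replies := make(chan string)

	// The "device": answers the first request, then silently disappears,
	// like hardware that is unplugged or whose firmware hangs.
	go func() {
		<-requests
		replies <- "sector 42 contents"
		<-requests // second request is received but never answered
	}()

	for i := 0; i < 2; i++ {
		reply, err := queryDevice(requests, replies, 100*time.Millisecond)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("reply:", reply)
	}
}
```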
As the shared-memory multiprocessor is actually a distributed system underneath,
we can, if necessary, reason explicitly about that underlying system
and use the same techniques that larger-scale distributed systems use:
- Performing operations closer to the data,
e.g. through compute elements embedded into storage devices and network interfaces,
is faster.
- Offloading heavy computations to a "remote service" can be faster,
both because the service can run with better resources
and because the service will have better locality.
- Communicating directly between devices
rather than going through the centralized main memory
reduces load on the CPU and improves latency.
- Using message passing instead of shared memory is faster,
for the same reasons larger distributed systems don't use distributed shared memory
(a sketch contrasting the two styles follows this list).
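As promised above, here is a minimal sketch contrasting the two styles within a single process:
a counter updated through shared memory (a mutex-protected variable)
versus the same counter owned by one goroutine and updated via message passing (a channel).
The counter workload and worker counts are illustrative assumptions;
this shows the structural difference, not a benchmark.

```go
// Shared-memory style versus message-passing style for the same task.
package main

import (
	"fmt"
	"sync"
)

// sharedMemoryStyle: every goroutine updates the same variable under a lock,
// so its cache line is shuttled back and forth between cores.
func sharedMemoryStyle(workers, increments int) int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	counter := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < increments; i++ {
				mu.Lock()
				counter++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return counter
}

// messagePassingStyle: one goroutine owns the counter; others send it
// messages, so the mutable state stays in one place.
func messagePassingStyle(workers, increments int) int {
	msgs := make(chan int, 1024)
	done := make(chan int)
	go func() { // the owner of the counter
		counter := 0
		for delta := range msgs {
			counter += delta
		}
		done <- counter
	}()
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < increments; i++ {
				msgs <- 1
			}
		}()
	}
	wg.Wait()
	close(msgs)
	return <-done
}

func main() {
	fmt.Println(sharedMemoryStyle(4, 10000))   // 40000
	fmt.Println(messagePassingStyle(4, 10000)) // 40000
}
```

The message-passing version keeps the counter's state with a single owner,
which mirrors the locality argument made above.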
But most programs do not do such things;
the abstraction of the shared-memory multiprocessor
is sufficient for them.
That this abstraction is successful is surprising.
In distributed systems,
it is often supposed that you cannot abstract away
the issues of distributed programming.
One classic and representative quote:
It is our contention that
a large number of things may now go wrong
due to the fact that
RPC tries to make remote procedure calls look exactly like local ones,
but is unable to do it perfectly.
Following this,
most approaches to large-scale distributed programming today
make the "distributed" part explicit:
They expose (and require programmers to deal with)
network failures, latency, insecure networks, partial failure, etc.
It's notoriously hard for programmers to deal with these issues.
So it is surprising to find that essentially all programs are written
in an environment that abstracts away its own distributed nature:
the shared-memory multiprocessor, as discussed above.
Such programs are filled with RPCs that appear to be local operations.
Accesses to memory and disk are synchronous RPCs to relatively distant hardware,
which block the execution of the program while waiting for a response.
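For example, a file read in Go looks like an ordinary local call,
yet it is effectively a synchronous RPC to a storage device
that blocks the calling goroutine until the reply arrives
(the file path here is just a placeholder):

```go
// A blocking read that looks local but is really a round trip to
// relatively distant hardware.
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	start := time.Now()
	data, err := os.ReadFile("/etc/hostname") // blocks while the "remote" device responds
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Printf("read %d bytes in %v\n", len(data), time.Since(start))
}
```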
But for most programs, this is completely fine.
This suggests that it may be possible to
abstract away the distributed nature
of larger-scale systems.
Perhaps eventually we can use the same abstractions for distributed systems of any size,
from individual computers to globe-spanning networks.