The internet isn’t a magical place; it’s a carefully orchestrated series of agreements between machines, and understanding those agreements is how you stop being a user and start being an engineer.

Let’s watch a simple DNS lookup happen, not as a concept, but as packets flowing.

Imagine you type www.example.com into your browser. Your machine doesn’t know where www.example.com lives. It needs an IP address.

dig www.example.com +trace

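This command shows you the entire delegation path. In everyday use a recursive resolver walks this path on your behalf; with +trace, dig performs each step itself so you can watch. It starts with a request to a root DNS server. The root server doesn’t know the IP for www.example.com, but it knows who is responsible for .com domains. It hands back a referral: the names of the .com Top-Level Domain (TLD) name servers, usually along with their IP addresses.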

Your machine then asks the .com TLD server. This server also doesn’t have the final answer, but it knows which authoritative name server is responsible for the example.com zone. It gives you that IP.

Finally, your machine asks the authoritative name server for example.com. This server does have the record, and it returns the IP address for www.example.com. Now your browser can start making HTTP requests to that IP.
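The walk from root to TLD to authoritative server can be sketched as a loop that follows referrals. This is a toy model, not a real resolver: the server labels (root, tld-com, auth-example), the ZONES table, and the 192.0.2.10 address are all made up for illustration, and no packets are sent.

```python
# Toy delegation data. Every server label and address here is hypothetical.
ROOT = "root"
ZONES = {
    "root": {"referrals": {"com.": "tld-com"}},
    "tld-com": {"referrals": {"example.com.": "auth-example"}},
    "auth-example": {"records": {"www.example.com.": "192.0.2.10"}},
}


def resolve(name, server=ROOT):
    """Follow referrals from the root until a server holds the record.

    Returns (address, list of servers asked, in order).
    """
    path = []
    while True:
        path.append(server)
        zone = ZONES[server]
        records = zone.get("records", {})
        if name in records:
            return records[name], path
        # No answer here: look for a delegated zone that covers the name.
        for suffix, next_server in zone.get("referrals", {}).items():
            if name == suffix or name.endswith("." + suffix):
                server = next_server
                break
        else:
            raise LookupError(f"{server} has no answer or referral for {name}")


address, path = resolve("www.example.com.")
print(address, "via", " -> ".join(path))
```

Each iteration of the loop corresponds to one of the three queries described above.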

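This entire process relies on a hierarchy of trust and information. The root servers are the ultimate source of truth for which servers handle each TLD, the TLD servers are the source of truth for which name servers a zone like example.com is delegated to, and the zone’s authoritative servers are the source of truth for the records inside it.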

The dig command, with +trace, is your window into this delegation. You see the IP addresses of the servers your machine is querying at each step. You can also see the Time-To-Live (TTL) value for each DNS record. This TTL tells your machine (and intermediate DNS resolvers) how long to cache the record before asking again. A low TTL means changes propagate quickly but can increase DNS query load. A high TTL means stability and reduced load, but updates take longer.
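To make the caching behaviour concrete, here is a minimal sketch of a TTL-respecting cache. The TTLCache class and its clock parameter are illustrative names, not part of any resolver library; real resolvers also handle negative caching and TTL limits, which this skips.

```python
import time


class TTLCache:
    """Keep each record only for as long as its TTL allows."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # name -> (value, expiry timestamp)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None          # never cached: caller must query upstream
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]
            return None          # expired: caller must query upstream again
        return value
```

With a 300-second TTL, a change made at the authoritative server can take up to five minutes to reach clients that cached the old record; with a 5-second TTL it is nearly immediate, at the cost of far more upstream queries.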

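When you see a DNS error, it’s often because a server in this chain is unreachable, misconfigured, or serving stale records. For instance, if every .com TLD server were unreachable, you wouldn’t be able to resolve any .com name that isn’t already cached, even if your authoritative server for example.com is perfectly healthy. In practice each level is served by many redundant servers, so a failure at the zone’s own name servers is far more common.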

The most surprising thing about TCP/IP is how much state TCP keeps in order to maintain reliable connections over an unreliable network. It’s not just sending packets; it’s a conversation with acknowledgments, sequence numbers, and retransmissions.

The three-way handshake is the fundamental dance. SYN, SYN-ACK, ACK. This ensures both sides are ready and agree on initial sequence numbers. If any part of that handshake fails, the connection never establishes.
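The sequence-number bookkeeping in the handshake can be sketched as follows. This only models the numbers exchanged, under the standard rule that a SYN consumes one sequence number; the function name and tuple layout are illustrative, and no real packets are involved.

```python
import random


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the (flags, seq, ack) triples exchanged in a TCP handshake."""
    # Each side picks its own initial sequence number (ISN).
    client_isn = random.getrandbits(32) if client_isn is None else client_isn
    server_isn = random.getrandbits(32) if server_isn is None else server_isn
    mod = 2 ** 32  # sequence numbers are 32 bits wide and wrap around

    syn = ("SYN", client_isn, None)
    # The server acknowledges the client's SYN by asking for ISN + 1.
    syn_ack = ("SYN-ACK", server_isn, (client_isn + 1) % mod)
    # The client acknowledges the server's SYN the same way.
    ack = ("ACK", (client_isn + 1) % mod, (server_isn + 1) % mod)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

After the third segment both sides know the other’s starting sequence number, which is what lets them detect loss and reorder data later.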

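Consider netcat for testing. You can set up a listener on one end (this is the traditional netcat syntax; the OpenBSD variant takes the port without -p, as in nc -l 12345):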

nc -l -p 12345

And then try to connect from another:

nc <server_ip> 12345

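If the connection fails, your nc client will either report "connection refused" or hang until it times out. The two mean different things. A refusal means the host was reachable but answered the SYN with a reset, usually because nothing is listening on that port. A timeout means no reply came back at all, which usually points to a firewall silently dropping packets or a host that is down. Either way, the handshake never completed.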
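The same check can be scripted. The sketch below, using only Python’s standard socket module, attempts a TCP connection and classifies the result; the probe function name and its return strings are illustrative choices.

```python
import socket


def probe(host, port, timeout=2.0):
    """Attempt a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"        # handshake completed
    except ConnectionRefusedError:
        return "refused"         # a reset came back: reachable, but nothing listening
    except socket.timeout:
        return "timeout"         # no reply at all: often a firewall dropping packets
    except OSError as exc:
        return f"error: {exc}"   # e.g. no route to host, name resolution failure


if __name__ == "__main__":
    print(probe("127.0.0.1", 12345))
```

Run against the nc listener from above, this should report "open" while the listener is running.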

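Firewalls are the gatekeepers. They operate on rules, typically checking IP addresses and port numbers. A common mistake is misjudging how a rule handles the two directions of traffic. A stateful firewall such as ufw tracks connections, so allowing port 80 inbound also lets the replies for those connections back out. A stateless packet filter does not: there you need a separate outbound rule for the replies, which leave from source port 80 toward the client’s ephemeral port, not toward port 80.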

# Example: Allow SSH inbound on port 22
ufw allow 22/tcp

# Example: Allow HTTP inbound on port 80
ufw allow 80/tcp

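When troubleshooting connectivity, always check the firewall on both the client and server, and any intermediate firewalls. On Linux, ufw status verbose shows the rules ufw manages, iptables-save dumps the full iptables ruleset, and nft list ruleset does the same on nftables-based systems.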

The difference between TCP and UDP is critical. TCP is like a registered letter: reliable and ordered, with lost segments retransmitted until they are acknowledged or the connection gives up. UDP is like a postcard: fast, with no delivery or ordering guarantees. You use TCP for most web browsing (HTTP/S, though HTTP/3 runs over QUIC on UDP), SSH, and file transfers. You use UDP for most DNS queries, real-time voice and video, and online gaming where speed trumps perfect delivery.

A common point of confusion is that some services use UDP but appear to need reliability. DNS, for example, uses UDP for speed. If a DNS query is lost, the client simply times out and tries again. The application layer handles the "reliability" by retrying. DNS also falls back to TCP when a response is too large to fit in a UDP reply.
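Application-level retry over UDP can be sketched in a few lines. This is a generic request/response helper, not a DNS client: the udp_request name, the 4096-byte receive buffer, and the retry defaults are all illustrative assumptions.

```python
import socket


def udp_request(payload, addr, timeout=1.0, retries=3):
    """Send a datagram and wait for a reply, resending if nothing arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(retries):
            sock.sendto(payload, addr)
            try:
                reply, _peer = sock.recvfrom(4096)
                return reply
            except socket.timeout:
                continue  # UDP gives no loss signal, so silence means "try again"
    raise TimeoutError(f"no reply from {addr} after {retries} attempts")
```

The transport never reports that a datagram was lost; the only signal is the absence of a reply within the timeout, which is why the retry logic has to live in the application.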

The concept of routing is how packets find their way across the internet. Routers don’t know the full path; they just know the "next hop" for a given destination network based on their routing tables. traceroute (or mtr) is your tool for visualizing this path.

mtr www.example.com

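This command shows you each router hop your packets take and the latency to each. Read it with care: routers often deprioritize or rate-limit the replies that mtr relies on, so loss or latency at a single intermediate hop that does not carry through to the later hops is usually harmless. A jump that persists from one hop all the way to the destination is a much stronger indicator of a problem at or near that hop.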
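The "next hop" decision each router makes is a longest-prefix match against its routing table. Here is a minimal sketch using Python’s standard ipaddress module; the ROUTES table and every address in it are hypothetical, and real routers use far faster data structures than a linear scan.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop). Values are illustrative.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "203.0.113.1"),        # default route
    (ipaddress.ip_network("198.51.100.0/24"), "10.0.0.2"),
    (ipaddress.ip_network("198.51.100.128/25"), "10.0.0.3"),   # more specific
]


def next_hop(destination):
    """Pick the next hop from the most specific route containing the address."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    if not matches:
        return None
    # Longest prefix wins: a /25 beats a /24, which beats the /0 default.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(next_hop("198.51.100.200"))  # matches all three routes; the /25 wins
print(next_hop("192.0.2.1"))       # only the default route matches
```

Each router along the path repeats this lookup independently, which is why no single device needs to know the whole route.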

The most overlooked aspect of network troubleshooting is stateful versus stateless firewalls and load balancers. A stateless firewall just looks at each packet individually. A stateful firewall keeps track of connections. If you have a stateful firewall that’s lost track of a connection (e.g., due to a reboot or crash), it might drop legitimate return traffic, even if your inbound rules are correct. Similarly, load balancers can have sticky sessions that rely on client-side cookies or IP addresses; if those get corrupted or changed, the load balancer might send traffic to the wrong backend.
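The effect of lost connection state can be shown with a deliberately simplified model. The StatefulFilter class below is hypothetical and tracks only the four-tuple of each accepted inbound flow; real firewalls also track TCP state, timeouts, and outbound-initiated connections.

```python
class StatefulFilter:
    """Toy stateful firewall: replies pass only if their flow is tracked."""

    def __init__(self, allowed_inbound_ports):
        self.allowed_inbound_ports = set(allowed_inbound_ports)
        self.connections = set()  # (client_ip, client_port, server_ip, server_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Accept inbound traffic to an allowed port and remember the flow."""
        if dst_port in self.allowed_inbound_ports:
            self.connections.add((src_ip, src_port, dst_ip, dst_port))
            return True
        return False

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Accept outbound traffic only if it is the reverse of a tracked flow."""
        return (dst_ip, dst_port, src_ip, src_port) in self.connections

    def reboot(self):
        """Simulate a crash or restart that wipes the connection table."""
        self.connections.clear()


fw = StatefulFilter(allowed_inbound_ports={80})
fw.inbound("198.51.100.7", 51000, "192.0.2.10", 80)
print(fw.outbound("192.0.2.10", 80, "198.51.100.7", 51000))  # reply is tracked
fw.reboot()
print(fw.outbound("192.0.2.10", 80, "198.51.100.7", 51000))  # state is gone
```

The inbound rule never changed between the two checks; only the connection table did, which is why this class of failure is easy to miss when reading the ruleset alone.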

The next logical step after mastering these fundamentals is understanding how these protocols behave under load and how to instrument them for performance monitoring.
