The behind-the-scenes process of browsing

PilarPinto
9 min read · Apr 12, 2020

Every day millions of people use the internet to reach their favourite pages, just by typing a name into the browser and pressing enter. Over the last decades this process has become automatic. We take it for granted and simply expect it to always work. But the reality is that a lot happens behind the scenes, more than we might imagine.

To approach this topic, think of the process as an onion. We are going to peel it layer by layer until we discover how the onion is put together. The first layer is the name of the website you want to visit. But how does the machine behind the scenes understand that you want to go to bears.com and not to birds.com? This is where DNS comes to play an important role.

DNS, or Domain Name System, is a distributed database system that translates a name understandable by humans into the IP address (Internet Protocol address) of a site. This system is a fundamental part of the Internet infrastructure. It was born out of the need to manage the large number of computers connected to the internet. Nowadays it is very common for one person to own two or more devices, and across the world population that adds up to gigantic numbers. One way to identify each of these devices is its IP address. It works like the address of a house, where the house is the computer.

But then you would have to know the number of that address to visit the house you want to go to. The bad news is that in this world an address (IPv4) is made of four numbers between 0 and 255, up to 12 digits in total, and human memory is not good at remembering so many numbers. Add to that the roughly 1.7 billion sites on the internet today. At best you could remember the addresses of 5 of your favourite sites. So, one way to solve this is a system that translates IPs into terms people can understand. This is how the names we type in a URL were born.
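To make the "address" idea a little more concrete, here is a minimal Python sketch using the standard ipaddress module. The address 192.0.2.10 is an assumption for illustration only: it comes from a range reserved for documentation and is not the real address of any site mentioned here.

```python
import ipaddress

# 192.0.2.10 is a documentation-only example address (an assumption).
address = ipaddress.ip_address("192.0.2.10")

# An IPv4 address is written as four numbers from 0 to 255 separated by dots.
octets = [int(part) for part in str(address).split(".")]

# Together, those four numbers form one single 32-bit number.
as_integer = int(address)

print(octets)           # the four parts of the address
print(address.version)  # 4 for IPv4
print(as_integer)       # the same address as one big number
```

Either form is hard to memorize, which is exactly the problem that names solve.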

In this case, you want to access the Holberton School page. We do not know the IP where this page is stored, but we know its URL, which stands for Uniform Resource Locator. Next, we are going to take this URL apart piece by piece. The following image shows how DNS works hierarchically.

The URL, then, reflects the hierarchical structure that DNS follows when it searches through its servers. The search starts in the cache of the client’s operating system and then moves on to the servers of the ISP, the internet service provider. The servers exchange information, and in this way the query goes down the hierarchy until it finds the entry that matches these pieces and thus obtains the IP that corresponds to the site.

The combination of just the domain name and the top-level domain is known as a “root domain”. This is the reddish part of the diagram and the starting point of the DNS hierarchy. It combines the blue area, which corresponds to the domain name, and the green area, which represents the top-level domain, in this case .com.

Top-level domain (TLD) is the formal term for the suffix that appears at the end of a domain name. Some examples of top-level domains are the five shown in the last diagram, but there are over 1,000 possible TLDs from which webmasters can choose. These include things like .book and .clothing, or the ones used by countries, like .co for Colombia. The latter are known as country code top-level domains, or ccTLDs.

Domain names, in turn, are the second level of a domain’s hierarchy (after the top-level domain). Domain names on a specific TLD are purchased from registrars and represent the specific, unique location of a website, like “holbertonschool”. Under .com, there is only one site that you can reach with holbertonschool. If someone else wanted that name, they would have to use another top-level domain, like .edu or .org.

Subdomains are the third level of a domain’s hierarchy and are part of a larger root domain. They are added in front of the root domain (domain name + TLD) and separated from the domain name with a period. In this case the subdomain is www, but you can use any name. For example, in softwarecool.holbertonschool.com the subdomain would be softwarecool. That is a hypothetical case; the two most common subdomain choices are:

www.holbertonschool.com (subdomain: www)

holbertonschool.com (Has no subdomain)
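The pieces described above can be pulled apart with a few lines of Python. This is only a sketch: the function name split_url is made up for this example, and it assumes the TLD is a single label such as "com" (suffixes like "co.uk" would need a public-suffix list instead).

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into protocol, subdomain, domain name and TLD.

    Assumption: the TLD is one label (for example "com").
    """
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    tld = labels[-1]
    domain_name = labels[-2]
    subdomain = ".".join(labels[:-2])  # empty string when there is none
    return {
        "protocol": parts.scheme,
        "subdomain": subdomain,
        "domain_name": domain_name,
        "tld": tld,
        "root_domain": f"{domain_name}.{tld}",
    }


print(split_url("https://www.holbertonschool.com"))
print(split_url("https://holbertonschool.com"))
```

Both calls report the same root domain, holbertonschool.com. Only the subdomain changes.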

There are two types of DNS servers: recursive servers and authoritative servers. The recursive ones carry out the search through all levels of the hierarchy, starting with the cache as mentioned above. The authoritative ones are responsible for answering the recursive servers' requests, each within the level or zone it is in charge of.
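The division of work between the two kinds of servers can be imitated with a toy model. Everything in this sketch is an assumption made for illustration: the server labels, the class name RecursiveResolver and the address 192.0.2.10 are invented, and real DNS works over the network with many more record types and rules.

```python
# Toy data: each "authoritative" server only knows about its own zone.
# All server labels and the 192.0.2.10 address are made up (assumptions).
ROOT_ZONE = {"com": "com-tld-server"}
TLD_ZONES = {"com-tld-server": {"holbertonschool.com": "holberton-name-server"}}
AUTHORITATIVE_ZONES = {
    "holberton-name-server": {
        "www.holbertonschool.com": "192.0.2.10",
        "holbertonschool.com": "192.0.2.10",
    }
}


class RecursiveResolver:
    """Walks the hierarchy on behalf of the client and caches the answers."""

    def __init__(self):
        self.cache = {}
        self.lookups = 0  # how many times the full hierarchy was walked

    def resolve(self, hostname):
        # Step 1: answer from the cache when possible.
        if hostname in self.cache:
            return self.cache[hostname]

        # Step 2: otherwise ask each level of the hierarchy in turn.
        self.lookups += 1
        labels = hostname.split(".")
        tld = labels[-1]
        root_domain = ".".join(labels[-2:])
        tld_server = ROOT_ZONE[tld]                        # root level
        name_server = TLD_ZONES[tld_server][root_domain]   # TLD level
        ip = AUTHORITATIVE_ZONES[name_server][hostname]    # domain level

        # Step 3: remember the answer for next time.
        self.cache[hostname] = ip
        return ip


resolver = RecursiveResolver()
print(resolver.resolve("www.holbertonschool.com"))  # walks the hierarchy
print(resolver.resolve("www.holbertonschool.com"))  # answered from the cache
print(resolver.lookups)
```

The second call never touches the hierarchy, which is why a cache makes repeated visits faster.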

The last part of the URL we will look at is the protocol, which sits at the very beginning of the URL. First, we have to introduce the concept of a protocol: an agreed way of doing something correctly, by simple convention. For example, at formal parties it is correct to dress elegantly and casual clothing is not allowed; that is the protocol. In this case, we are going to focus on the model that organizes the protocols used on the internet, which is TCP/IP.

Transmission Control Protocol/Internet Protocol (TCP/IP) is a model based on layers that is used to interconnect network devices on the internet. TCP defines how applications can create channels of communication across a network. This is done through packets that carry the information in pieces, which are reassembled in the right order at the final destination. IP defines how to address and route each packet to make sure it reaches the right destination. The IP address is the number that says where a specific device is.
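The idea of cutting a message into packets and putting it back together can be sketched in Python. This is a simplified illustration, not how TCP is actually implemented: real TCP numbers bytes rather than whole chunks and also handles acknowledgements and retransmission. The function names and the sample message are assumptions.

```python
import random


def split_into_packets(message, size):
    """Cut a message into (sequence_number, chunk) pairs."""
    return [
        (number, message[start:start + size])
        for number, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets):
    """Put the chunks back in order using their sequence numbers."""
    return "".join(chunk for _, chunk in sorted(packets))


packets = split_into_packets("Hello from the other side of the network", 8)
random.shuffle(packets)      # packets may arrive in any order
print(reassemble(packets))   # the original message, back in order
```

Because every packet carries its sequence number, the order of arrival does not matter.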

This model consists of four layers. The first is the link layer (also called the physical or network access layer), where Ethernet and ARP are located. The second is the network layer, which manages the packets and their correct transmission. For this, it has the IP protocol to point to the right receiver, plus other protocols for handling and reporting transmission errors. The third is the transport layer, which is responsible for maintaining the connection between sender and receiver. It ensures a reliable transmission even when the network is affected by an external factor. The last one is the application layer, which provides applications with standardized data exchange. This layer contains the protocol named in the URL, which can be HTTP, HTTPS or FTP.

HTTP (HyperText Transfer Protocol) handles the communication between a web server and a web browser. HTTPS (Secure HTTP) handles the same communication as HTTP, but securely. FTP (File Transfer Protocol) handles the transmission of files between computers.
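To give an idea of what this communication looks like, here is a small Python sketch of the text a browser sends and the first line a server answers with. The helper names build_get_request and parse_status_line are invented for this example, and a real browser sends many more headers.

```python
def build_get_request(host, path="/"):
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(status_line):
    """Split a response status line such as 'HTTP/1.1 200 OK'."""
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


print(build_get_request("www.holbertonschool.com"))
print(parse_status_line("HTTP/1.1 200 OK"))
```

The request is plain text, which is one reason why sending it without encryption is risky.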

HTTPS was born in part from the need to protect the data being transmitted so that it cannot be intercepted by cybercriminals. So how does HTTP turn into HTTPS? The answer is found in SSL certificates. SSL stands for Secure Sockets Layer, the standard technology for keeping an internet connection secure (its modern successor is called TLS). It uses encryption algorithms to encrypt the data being transmitted. The port that this protocol uses is 443 over TCP. But what is a port? It is like a door to your server, the way information gets in or out of it. For that reason it is important to enable only the ports you will really use, because every other one can be an open door for cybercriminals.
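As a rough illustration of a secure connection on port 443, here is a sketch using Python's standard ssl and socket modules (Python speaks TLS, the successor of SSL). The function names are assumptions made for this example, and open_secure_connection needs network access, so it is only defined here and not called.

```python
import socket
import ssl

HTTPS_PORT = 443  # the default TCP port for HTTPS


def make_secure_context():
    """Create a context that checks the server's certificate and hostname."""
    return ssl.create_default_context()


def open_secure_connection(host, port=HTTPS_PORT):
    """Open an encrypted connection (requires network access when called)."""
    context = make_secure_context()
    raw_socket = socket.create_connection((host, port), timeout=10)
    # Wrapping the plain TCP socket is what adds the encryption layer.
    return context.wrap_socket(raw_socket, server_hostname=host)


context = make_secure_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate must be valid
print(context.check_hostname)                    # name must match the certificate
```

With these defaults, the connection is refused when the certificate is invalid or belongs to a different site.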

Encrypting a message simply means rewriting a message that is written in one language into another that only the final receiver can understand with a specific key. For example, we have the following message.

Hello everyone, I am a very important number with money

Now we replace each letter with the one two positions later in the alphabet. In other words, an a becomes a c, a b becomes a d, and so on.

Jgnnq gxgtaqpg, K co c xgta korqtvcpv pwodgt ykvj oqpga

It is very easy to read once you know that every letter is two positions later in the alphabet. But if you didn’t know the two-letter trick, you would think it doesn’t mean anything. Now imagine more complex encryption algorithms, and the transmission becomes much more secure.
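The two-letter trick can be written in a few lines of Python. This is only a sketch of the toy example above (the function name shift_letters is made up), and it is nowhere near the strength of the algorithms that SSL really uses.

```python
import string


def shift_letters(message, shift):
    """Move every letter `shift` places along the alphabet, wrapping after z."""
    result = []
    for char in message:
        if char in string.ascii_lowercase:
            alphabet = string.ascii_lowercase
        elif char in string.ascii_uppercase:
            alphabet = string.ascii_uppercase
        else:
            result.append(char)  # spaces and punctuation stay as they are
            continue
        result.append(alphabet[(alphabet.index(char) + shift) % 26])
    return "".join(result)


plain = "Hello everyone, I am a very important number with money"
secret = shift_letters(plain, 2)
print(secret)                     # the encrypted message shown above
print(shift_letters(secret, -2))  # shifting back recovers the original
```

Here the number 2 plays the role of the key: whoever knows it can undo the shift.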

Well, now let's see how a web page loads. To start, two servers are involved in building a web page: the web server and the application server. A web server is a computer that stores the files that make up a website, e.g. HTML documents, images, CSS style sheets and JavaScript files. This is known as the static part of a web page. It is the part seen on the user side: the colours, the styles, everything that makes a web page pleasing to the eye.

An application server, on the other hand, is a type of server that handles dynamic content: it updates the stored files before they are sent through the HTTP server. It usually works with a database. So, the application server is in charge of the logic, the interaction between the user and the displayed content.
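The difference between static and dynamic content can be sketched like this in Python. The paths, the template and the function handle_request are all invented for this example; real web and application servers are separate programs and far more complete.

```python
from string import Template

# Static content: stored once and sent as-is (the web server's job).
STATIC_FILES = {"/style.css": "body { color: navy; }"}

# Dynamic content: a template filled in for each request
# (the application server's job).
GREETING_PAGE = Template("<h1>Welcome back, $name!</h1>")


def handle_request(path, user_name=None):
    """Return static files unchanged and build dynamic pages on demand."""
    if path in STATIC_FILES:
        return STATIC_FILES[path]
    if path == "/greeting":
        return GREETING_PAGE.substitute(name=user_name)
    return "404 Not Found"


print(handle_request("/style.css"))
print(handle_request("/greeting", user_name="Pilar"))
```

The style sheet is identical for everybody, while the greeting changes with each user.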

A database is a collection of information organized in such a way that it can easily be accessed, managed and updated. It is normally structured in tables and can be accessed using SQL statements. It is simply where the information relevant to the website is stored. For example, the Holberton form asks for your name and email to request information. It is likely that there is a table called “form” with name and email columns, filled with the names and emails of people who want to be contacted. This table can be connected to other tables, and in this way you get a relational database.
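A table like the hypothetical “form” one could look as follows, using Python's built-in sqlite3 module and an in-memory database. The column types and the two sample rows are assumptions; nothing here reflects how the real Holberton site stores its data.

```python
import sqlite3

# An in-memory database that disappears when the program ends.
connection = sqlite3.connect(":memory:")

# The hypothetical "form" table with its two columns.
connection.execute("CREATE TABLE form (name TEXT NOT NULL, email TEXT NOT NULL)")

# Made-up sample rows; the "?" placeholders keep the values separate from the SQL.
connection.executemany(
    "INSERT INTO form (name, email) VALUES (?, ?)",
    [("Ada", "ada@example.com"), ("Linus", "linus@example.com")],
)

# Read the stored information back with a SQL statement.
rows = connection.execute("SELECT name, email FROM form ORDER BY name").fetchall()
print(rows)
connection.close()
```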

The dynamic and static parts can live on a single server, which then provides both services. But with two or more servers, the flow of transmissions increases and so does the load on the network, making it necessary to use a load balancer, which is basically in charge of directing the traffic and allowing a more efficient flow. It is like a traffic light that keeps the chaos of traffic under control.
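One simple way a load balancer can direct traffic is to take turns between servers, a strategy known as round robin. The sketch below assumes that strategy and uses made-up server names; real load balancers offer several strategies and also check whether each server is healthy.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in turn, starting over after the last one."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick_server(self):
        return next(self._servers)


# Hypothetical server names, used only for illustration.
balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
assignments = [balancer.pick_server() for _ in range(5)]
print(assignments)
```

Five requests are spread over three servers, so no single server receives all the traffic.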

But in addition to a traffic light, you need a police officer, and this is the firewall. It is a system that filters the data packets travelling on the network. The firewall can authorize or deny a connection, or even discard a connection request without notifying the sender (drop). This is done for security reasons, making it harder for a malicious packet to infiltrate a device, including our own computer. For a server that works with secure transmission, the firewall has to have port 443/TCP open to let HTTPS traffic through, because the firewall blocks any transmission that arrives at a port that is not enabled or authorized.
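The port rule described above can be imitated with a tiny Python sketch. The rule set and the function filter_packet are assumptions for illustration; a real firewall is configured in the operating system or on dedicated hardware and looks at much more than the port.

```python
# Only HTTPS traffic (port 443 over TCP) is allowed in this toy rule set.
ALLOWED_PORTS = {("tcp", 443)}


def filter_packet(protocol, port):
    """Accept traffic for an allowed port and silently drop everything else."""
    if (protocol, port) in ALLOWED_PORTS:
        return "accept"
    return "drop"


print(filter_packet("tcp", 443))  # HTTPS traffic is let through
print(filter_packet("tcp", 23))   # a port that was never enabled
```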

So simply typing holbertonschool.com in the browser and pressing enter involves translating the name into an IP through DNS, then using the protocols of the TCP/IP model across its layers, and then the work of the different servers that bring back the information that builds the whole website. All of this travels through efficient traffic thanks to load balancers, with reliable security thanks to firewalls and SSL encryption. And it all happens in less time than it takes to blink. This is how the behind-the-scenes process of browsing works.

References

https://moz.com/learn/seo/domain
https://pandorafms.com/blog/dns-monitoring/
https://developer.mozilla.org/es/docs/Learn/Common_questions/Que_es_un_servidor_WEB
https://www.websecurity.digicert.com/es/es/security-topics/what-is-ssl-tls-https
https://www.tecnologia-informatica.com/que-es-firewall-como-funciona-tipos-firewall/