Today, when using any online service, we have grown used to looking for the lock icon next to the URL. This icon is a graphical indicator that the service should be secure or at least have a certain level of security.
It indicates the use of HTTPS, which is accepted as a secure means of communication. But what in this data flow is actually secured? Is any of the exchanged information left unencrypted and open to eavesdropping? Are HTTPS URLs encrypted as well?
In this tutorial, we are going to find out.
The struggle for digital security and privacy is almost as old as the technology itself. In its first days, the World Wide Web, like most Internet services, was designed for simplicity and openness. Little care was given to the possibility of third parties eavesdropping on communications.
HTTP, like older application-layer protocols such as FTP, Telnet, and SMTP, used clear-text connections. That means anyone on the path between the two endpoints could capture the data stream and see every bit of information that flowed through it.
To address this issue, HTTP was extended with cryptographic methods under the name HTTPS, first using SSL and later TLS.
2. Why Encrypt?
Encryption is a method of encoding messages so that only the sender and the receiver can understand their meaning. It was in use long before the digital revolution. Nowadays, encryption is one of the methods used to achieve some of the key objectives of information security:
- Confidentiality – protection against unintended information disclosure
- Integrity – the communication cannot be adulterated, i.e., it is possible to detect if the message was changed, clipped, or amended
- Availability – the data is available whenever needed
- Non-repudiation/auditing – it should be impossible for the sender to deny authorship
- Authentication – the identity of any parties involved in the communication can be asserted without doubt
- Authorization – data is accessible only to authorized parties
Modern encryption methods rely on adequate keys known only to the endpoints. Without the keys, decrypting or altering the communication should be very hard or, better yet, practically impossible.
The difficulty of tampering with the communication can be raised by increasing key sizes or using stronger encryption algorithms.
We are not going to delve into the specifics of encryption in this tutorial, as they were already covered in another tutorial. We can also check the tutorials on the basics of RSA and AES. These are among the most common asymmetric (RSA) and symmetric (AES) algorithms available nowadays, and both are used in HTTPS standards.
HTTPS enables the use of cryptographic extensions in HTTP message exchanges. It is possible to use certificates to authenticate both the server (the usual scenario) and the clients (so that only specific authorized clients can exchange data with the server).
First, let’s review what a URL is and how it is defined: a Uniform Resource Locator (URL) designates how to locate a resource on the web. For the HTTPS standard, it is defined as:
The user, password, resource path and name, and the parameters are application-specific and optional. The string “hostname.domain_name” is also used to build the Server Name Indication (SNI).
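To make the components listed above concrete, here is a minimal sketch that splits a URL with Python's standard `urllib.parse` module. The URL itself, including its user, password, port, path, and parameters, is a made-up example and does not come from the tutorial.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL, used only for illustration.
url = "https://user:secret@www.example.com:8443/images/logo.png?size=large&theme=dark"

parts = urlsplit(url)

print(parts.scheme)            # protocol: https
print(parts.username)          # optional user
print(parts.password)          # optional password
print(parts.hostname)          # hostname.domain_name, the only part reused for the SNI
print(parts.port)              # optional port
print(parts.path)              # resource path and name
print(parse_qs(parts.query))   # optional parameters, parsed into a dict of lists
```

Only `parts.hostname` is needed before the encrypted channel exists; everything else is consumed by the application after the TLS session is set up.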
Besides being placed inline in the URL, parameters can also be sent inside the HTTP message itself, for instance in the body of a POST request that begins like this:
POST /test HTTP/1.1
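As a rough illustration of parameters travelling inside the message rather than in the URL, the sketch below assembles a form-encoded POST request as text. The helper name `build_post_request`, the host, and the field names are assumptions made for this example; real clients add more headers.

```python
from urllib.parse import urlencode


def build_post_request(host: str, path: str, params: dict) -> str:
    """Return a minimal HTTP/1.1 POST message with form-encoded parameters in the body."""
    body = urlencode(params)  # e.g. "field1=value1&field2=value2"
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('ascii'))}",
        "",      # blank line separates the headers from the body
        body,
    ]
    return "\r\n".join(lines)


# Hypothetical host and fields, for illustration only.
request = build_post_request(
    "www.example.com", "/test", {"field1": "value1", "field2": "value2"}
)
print(request)
```

Over HTTPS, this whole message, request line included, is sent only after the TLS keys are in place.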
That review is important so we can determine which parts of the URL, if any, are not encrypted by HTTPS.
In a nutshell, a very simplified way of describing a typical HTTPS session is this:
The first part of the communication, the DNS query, is not actually part of the HTTPS connection. It is necessary to translate the server hostname into the corresponding IP address. That is one of the exchanges that can disclose the server hostname. The other one occurs on the actual HTTPS connection.
After the client knows the server’s IP address, it opens an HTTPS connection to the server. Then, in the ClientHello message, it sends the name of the server it intends to contact.
For example, let’s try to download the Baeldung logo from our website.
As we can see when using Wireshark to capture the data exchange, the server name is disclosed in cleartext. However, that is the only part of the URL sent at this point.
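The same observation can be reproduced without a packet capture. The sketch below is not the Wireshark procedure from the tutorial; it uses Python's `ssl.MemoryBIO` to generate a ClientHello in memory, with no network access, and then searches the raw bytes. The host name and the path are placeholders chosen for this example, and the helper name `client_hello_bytes` is hypothetical.

```python
import ssl


def client_hello_bytes(server_name: str) -> bytes:
    """Start a TLS handshake in memory and return the first bytes the client would send."""
    context = ssl.create_default_context()
    incoming = ssl.MemoryBIO()   # data "from the server" (stays empty here)
    outgoing = ssl.MemoryBIO()   # data the client wants to send
    tls = context.wrap_bio(incoming, outgoing, server_hostname=server_name)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and now waits for the server.
        pass
    return outgoing.read()


# Placeholder values, for illustration only.
hello = client_hello_bytes("www.example.com")
resource_path = b"/images/logo.png"

print(hello[0] == 0x16)              # TLS record type 22 means "handshake"
print(b"www.example.com" in hello)   # the server name is readable as plain bytes
print(resource_path in hello)        # the path is not part of the ClientHello at all
```

The path and parameters are absent simply because the HTTP request has not been built yet at this stage; it is sent only after the handshake completes.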
After the ClientHello and ServerHello exchange, the endpoints negotiate the keys to use during the application data transfers. Once the keys are established, all data flow is encrypted, including the HTTP request lines that carry the method (GET, POST, PUT, HEAD, DELETE, TRACE, PATCH) and the requested URI.
As we can see, the full URL cannot be read directly from the data capture without access to the secret keys (the server’s private key or, with forward-secret key exchange, the session keys).
4. So, Are HTTPS URLs Encrypted?
Yes, the full URL string is hidden, as is all further communication, including the application-specific parameters.
However, the Server Name Indication (SNI), which is formed from the hostname and domain name part of the URL, is sent in clear text during the first part of the TLS negotiation.
The SNI tells the server which virtual server the connection is addressed to. With this information, the server can choose the certificate that correctly identifies it, along with the corresponding private key.
Besides the cleartext SNI, anyone who has the server's private key (or somehow manages to guess it) may be able to decrypt the data flow, at least when the key exchange does not provide forward secrecy. Wireshark has specific settings for adding private keys to decrypt HTTPS traffic.
For TLS v1.3, the current and recommended version, there is a proposal to encrypt the server name as well (Encrypted SNI, later reworked as Encrypted Client Hello). However, at the time of writing, the standard was not yet approved, so we could not expect major browsers to support it by default. The example above used Google Chrome for Windows, version 95.0.4638.69 (64-bit).
We can check our browser's specifics with the Cloudflare browser security check. It also verifies support for secure DNS (encrypted DNS queries) and DNSSEC (authenticated DNS).
5. What About Proxies?
Web proxies are gateways that intercept the traffic between two endpoints. They can be used for many reasons:
- to provide centralized shared caching and minimize bandwidth usage
- to create security or logical frontiers between network segments
- to filter web traffic on corporate or countrywide networks (see the Great Firewall of China, for instance) to enforce security and site access policies
Proxies can be explicitly configured or hidden/transparent (the user does not need to configure a network proxy).
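In both cases, they can at least learn the server name: an explicitly configured proxy receives the HTTP CONNECT request, which names the target server, while a transparent proxy can still read the server name from the ClientHello.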
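To show how little of the URL an explicit proxy learns from the CONNECT method, the following sketch builds the request a client would send to the proxy before starting TLS. The helper name `connect_request` and the URL are assumptions for illustration; real clients usually add headers such as `User-Agent` or `Proxy-Authorization`.

```python
from urllib.parse import urlsplit


def connect_request(url: str) -> str:
    """Build the cleartext CONNECT message sent to an explicit proxy for an HTTPS URL."""
    parts = urlsplit(url)
    port = parts.port or 443                   # default HTTPS port when none is given
    authority = f"{parts.hostname}:{port}"     # only host and port are exposed
    return f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n"


# Hypothetical URL, for illustration only.
message = connect_request("https://www.example.com/images/logo.png?size=large")
print(message)
```

The path and the parameters never appear in this message; they travel later, inside the encrypted tunnel that the proxy relays.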
In the worst case, proxies can impersonate the servers, creating certificates as needed to fake the servers' identities to the client, in a man-in-the-middle architecture.
In that case, for the browser not to flag the site as insecure, the client must have previously accepted the proxy's Certification Authority (CA) as a trusted root or intermediate authority. On corporate networks, this can be done through a group policy, without any explicit user approval.
The user can check whether that is the case by inspecting the certification authority of the browsed site's certificate and seeing whether it is a regular public root CA or a corporation- or country-owned one.
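The same check can be scripted. The sketch below is one possible approach, not a definitive detector: it reads the issuer from the dictionary returned by Python's `SSLSocket.getpeercert()` and compares the organization with a list of names we consider suspicious. The helper names, the sample certificate, and the issuer names are all hypothetical. Note also that Python may use a different trust store than the browser, so a proxy CA trusted by the browser can make `fetch_certificate` fail with a verification error instead.

```python
import socket
import ssl


def fetch_certificate(host: str, port: int = 443) -> dict:
    """Connect to the host and return its validated certificate (needs network access)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def issuer_fields(cert: dict) -> dict:
    """Flatten the nested issuer tuples of getpeercert() into a simple dict."""
    return {key: value for rdn in cert.get("issuer", ()) for key, value in rdn}


def looks_intercepted(cert: dict, suspicious_issuers: set) -> bool:
    """Return True when the issuing organization is on our own watch list."""
    return issuer_fields(cert).get("organizationName", "") in suspicious_issuers


# Hypothetical certificate data in the shape produced by getpeercert().
sample_cert = {
    "issuer": (
        (("countryName", "US"),),
        (("organizationName", "Example Corp Proxy CA"),),
        (("commonName", "Example Corp Inspection CA"),),
    )
}

print(issuer_fields(sample_cert)["commonName"])
print(looks_intercepted(sample_cert, {"Example Corp Proxy CA"}))
```

With network access, `looks_intercepted(fetch_certificate("www.example.com"), ...)` would apply the same check to a live connection.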
As we saw in this tutorial, even if the full URL is encrypted on HTTPS connections, we can assume that the server name is not. Both the DNS and HTTPS/TLS can disclose the server name.
Moreover, there are techniques that enable full eavesdropping of the data flow, specifically when someone has access to the private keys of one of the endpoints or has somehow managed to insert itself into the data flow by faking an endpoint's identity.