It has been a while since I wrote an article about the HTTP protocol. I recently noticed that HTTP proxy is still sitting in the topic pool of the "Practical HTTP" series, so let's talk about HTTP proxies today. The foundation of the HTTP protocol is the request and response message, and each message is composed of a message header and a message entity (body). Most usage scenarios of the HTTP protocol are implemented by setting different request and response headers. Since the topic is proxies, let's raise two questions as the main thread and explain HTTP proxies by answering them. First, how does a packet capture tool capture HTTP traffic? Second, why do HTTPS requests and responses still pass through a packet capture tool normally even when its certificate has not been installed?
The HTTP proxy discussed here is more of an abstract concept, and what matters most is the principle behind it.
2. HTTP Proxy
2.1 What is an HTTP proxy?
When it comes to HTTP proxies, what client developers know best is packet capture: when using tools such as Fiddler or Charles, you set a proxy on the mobile phone so that network problems are easier to troubleshoot. This is just one of the many usage scenarios of a proxy. An HTTP proxy (Web proxy) is an entity that sits in the middle of the network and can provide various functions. Without an HTTP proxy, the client must interact directly with the terminal server. With an HTTP proxy, the client communicates with the proxy, and the proxy interacts with the server on behalf of the client.
The HTTP proxy is one of the easiest HTTP concepts to understand, because it is so close to everyday life, which is full of agency services. For example, if you and your girlfriend are going to travel abroad, some countries without visa exemptions require you to apply for a visa in advance. You are not familiar with the process and naturally assume it is complicated. In that case you can let a travel agency apply for the visa on your behalf. You only need to prepare the materials on the list the agency gives you, and you get the visa with little effort. In this process you save time, and the travel agency makes a little money from you.
A proxy service is a middleman who completes the transaction processing on behalf of the client. It takes over the client's affairs and interacts with the server on behalf of the client. The proxy service is an abstract intermediate entity that can exist at various intermediate points in the network, such as browsers, routers, proxy servers, and reverse proxies of Web servers.
2.2 Classification of HTTP Proxy
Let's start with the most familiar packet capture tools, such as Fiddler and Charles. They are so well packaged that we can use them with simple configuration even if we do not understand the details of HTTP proxies at all. While using them you will notice two scenarios. For HTTP requests, the details of the request and response messages are displayed directly. For HTTPS requests, if the certificate has not been imported, the request still reaches the server and the data comes back normally, but the message details are not displayed.
Without importing the certificate we cannot see the details of an HTTPS request, but the requests and responses themselves are not affected. These two behaviors correspond to two different kinds of HTTP proxy. The first is the ordinary proxy. It is defined by the HTTP/1.1 specification (RFC 2616 and its later revisions). This proxy plays the role of a "middleman": to the client it is a server, and to the real server it is a client. It is responsible for passing HTTP messages between the two ends. The second is the tunnel proxy. This is a tunneling proxy built on the TCP protocol. It sets up communication through the CONNECT method of the HTTP protocol, and in this way proxies any TCP-based application layer protocol over HTTP.
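The two kinds of proxy can usually be told apart from the first line of the request the proxy receives. The following is a minimal sketch, not taken from any particular tool: it assumes the common behavior that a client configured with a proxy sends the full http:// URL for plain HTTP requests and a CONNECT request with host:port for everything it wants tunneled. The function name classify_proxy_request is hypothetical.

```python
def classify_proxy_request(request_line: str) -> str:
    """Classify a request line as seen by a proxy.

    Returns "tunnel" for CONNECT, "ordinary" for an absolute http:// target,
    and "direct" for an origin-form target such as "/index.html".
    """
    parts = request_line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {request_line!r}")
    method, target, _version = parts

    if method == "CONNECT":
        # CONNECT carries only "host:port"; the proxy is asked to open a tunnel.
        return "tunnel"
    if target.lower().startswith("http://"):
        # Absolute URI: the proxy must parse it and issue its own request.
        return "ordinary"
    # A bare path looks like a request sent straight to a server.
    return "direct"


if __name__ == "__main__":
    for line in (
        "GET http://example.com/index.html HTTP/1.1",
        "CONNECT example.com:443 HTTP/1.1",
    ):
        print(line, "->", classify_proxy_request(line))
```

With a packet capture tool in between, plain HTTP pages would fall into the first branch and HTTPS pages into the second, which matches the two behaviors described above.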
Next we will explain these two types of proxy separately.
2.3 Ordinary Proxy
An ordinary proxy is not hard to understand. It is an intermediate entity in the network, located between the client and the server, playing the role of a "middleman" and passing messages back and forth between the two ends. This "middleman" holds the client in its left hand and the server in its right hand. When it receives a request message from the client, it has to handle the request and the connection state correctly, send a new request to the server, and, after receiving the response, wrap the result in a response message and return it to the client.
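The "send a new request to the server" step can be illustrated with a small sketch. It assumes the client sent the proxy a request whose target is an absolute http:// URL, and shows how the proxy might derive the server address and the request it forwards. The function name rewrite_for_server is hypothetical, and the sketch leaves out everything a real proxy also has to do (connection management, hop-by-hop headers, error handling).

```python
from urllib.parse import urlsplit


def rewrite_for_server(raw_request: bytes):
    """Turn a proxy-style request into (host, port, request to forward).

    Only the request line is changed: the absolute URL is replaced by the
    path (plus query string), which is what the server itself expects.
    """
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ")

    url = urlsplit(target)
    if url.scheme != "http" or not url.hostname:
        raise ValueError("expected an absolute http:// URL as request target")

    path = url.path or "/"
    if url.query:
        path += "?" + url.query
    lines[0] = f"{method} {path} {version}"

    forwarded = "\r\n".join(lines).encode("iso-8859-1") + sep + body
    return url.hostname, url.port or 80, forwarded


if __name__ == "__main__":
    request = (
        b"GET http://example.com:8080/a/b?x=1 HTTP/1.1\r\n"
        b"Host: example.com:8080\r\n"
        b"\r\n"
    )
    host, port, forwarded = rewrite_for_server(request)
    print(host, port)
    print(forwarded.decode("iso-8859-1"))
```

Because the proxy parses and rebuilds the message, it can read or modify every byte of it, which is exactly why this mode is convenient for packet capture.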
With an ordinary proxy, neither end necessarily knows that the "middleman" exists. For example, when we visit website A, we actually send the request to the proxy server, the proxy server then sends a request to website A, and the response finally comes back to us through the proxy server. From our perspective, we sent a normal request to a website and received the correct data. In this process the client believes the proxy server is the server of website A, and the server of website A believes the proxy server is a real user.
In other words, a proxy server acting as a "middleman" can conceal its own existence. A "rule-abiding" proxy server that wants to pass the client's IP on to the server can do so through a custom header such as X-Forwarded-For. HTTP is a loose protocol: when the server receives this header it cannot verify its authenticity. The value may be real, or it may have been forged by the proxy server. Therefore a server has to be extra careful when it takes an IP address from an HTTP header.
The ordinary proxy is easy to understand, but it also has drawbacks. It only works for the HTTP protocol, and all request and response data is visible to the proxy "middleman", which can modify it at will. This brings various data security risks.
When it comes to network data security, the first thing that comes to mind is HTTPS, and the certificate verification mechanism of HTTPS is the nemesis of middleman hijacking. Strictly speaking, a man-in-the-middle attack does not work under HTTPS unless there is a human error: the certificate is not strictly verified, or the certificate's private key has leaked. In ordinary HTTPS requests the server does not verify a client certificate, so the middleman can complete a TLS handshake with the server as if it were a client. However, since the middleman does not hold the private key of the server's certificate, it cannot impersonate the server in a TLS handshake with the client, and the client's request fails.
Compare this with the workflow of a packet capture tool. If you want to use Charles (or Fiddler) to capture HTTPS traffic, you need to install the Charles CA certificate on the mobile phone and make the device trust it before the capture works. In that case the ordinary proxy mode is used. On the other hand, if the Charles certificate is not installed on the phone, requests and responses are not affected. Charles simply cannot decrypt the HTTPS data. How is this done? This requires a tunnel proxy.
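Returning to the forwarded-IP header mentioned earlier in this section, the sketch below shows one cautious way to handle it. It is an illustration under stated assumptions, not a complete policy: the header name X-Forwarded-For is the widely used convention, the TRUSTED_PROXIES set and both function names are hypothetical, and the server is assumed to trust the header only when the TCP peer is a proxy it operates.

```python
FORWARD_HEADER = "X-Forwarded-For"

# Hypothetical addresses of proxies that this server operates and trusts.
TRUSTED_PROXIES = {"10.0.0.5"}


def add_client_ip(headers: dict, client_ip: str) -> dict:
    """Proxy side: append the connecting client's IP to the header."""
    out = dict(headers)
    previous = out.get(FORWARD_HEADER)
    out[FORWARD_HEADER] = f"{previous}, {client_ip}" if previous else client_ip
    return out


def resolve_client_ip(peer_ip: str, headers: dict) -> str:
    """Server side: decide which IP to treat as the client's."""
    value = headers.get(FORWARD_HEADER)
    if peer_ip not in TRUSTED_PROXIES or not value:
        # Unknown sender: anything in the header may be forged, so ignore it.
        return peer_ip
    # The last entry was appended by our own proxy; earlier entries
    # arrived from the outside and cannot be verified.
    return value.split(",")[-1].strip()


if __name__ == "__main__":
    forged = {FORWARD_HEADER: "1.2.3.4"}          # sent by a dishonest client
    via_proxy = add_client_ip(forged, "203.0.113.7")
    print(resolve_client_ip("10.0.0.5", via_proxy))
    print(resolve_client_ip("198.51.100.9", forged))
```

The design choice here is to rely on the TCP peer address, which cannot be set through an HTTP header, and to use the header only as a hint from a proxy the server already trusts.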
2.4 Tunnel Proxy
A tunnel proxy, also known as a Web tunnel, can send non-HTTP traffic over an HTTP connection, so that data of other protocols can be piggybacked on HTTP. A tunnel is established with the HTTP CONNECT method. CONNECT was not originally part of the core HTTP/1.1 specification, but it was a widely used extension. It was not until the revised version of HTTP/1.1 released in 2014 that CONNECT and tunnel proxies were clearly described.
What is the workflow of an HTTP tunnel proxy? In a normal HTTP request, the header part ends with two consecutive CRLF sequences (\r\n\r\n). Anything after that is the content part, also known as the request or response body. If there is a body, a Content-Length header is typically added to mark its length, and the receiver (the server) reads the data according to this length.
A CONNECT request has no content part, only the request line and headers. They are used only by the proxy server and are not passed on to the terminal server. Once the header part of the request ends (two consecutive CRLF sequences), all data that follows is treated as data to be forwarded to the terminal server. The proxy forwards it as is, without interpreting it and without relying on any declared content length, until the TCP read channel from the client is closed.
After the proxy server has established a connection with the terminal server, it returns a 200 Connection Established response to the client to indicate that the connection to the terminal server has been set up successfully. Once the header part of this 200 Connection Established response ends (two consecutive CRLF sequences), all data that follows is data returned by the remote server. In the same way, the proxy server forwards the terminal server's data directly to the client until the TCP read channel of the terminal server is closed.
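The workflow above can be sketched in a few lines. This is a simplified illustration rather than a complete proxy: parse_connect separates the CONNECT header from any bytes that already belong to the tunnel, CONNECT_OK is the response sent once the upstream connection exists, and relay copies bytes one way until the read side closes. All three names are hypothetical, and a real tunnel would run relay in both directions and handle errors and timeouts.

```python
import socket

# Response the proxy sends after it has connected to the terminal server.
CONNECT_OK = b"HTTP/1.1 200 Connection Established\r\n\r\n"


def parse_connect(buffer: bytes):
    """Return (host, port, leftover) from the first bytes a client sent.

    The header ends at the first blank line; everything after it is opaque
    tunnel data that must be forwarded to the terminal server unchanged.
    """
    head, sep, leftover = buffer.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header section is not complete yet")
    request_line = head.split(b"\r\n", 1)[0].decode("ascii")
    method, target, _version = request_line.split(" ")
    if method != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, _, port = target.rpartition(":")
    return host, int(port), leftover


def relay(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst untouched until src's read side closes."""
    while True:
        chunk = src.recv(4096)
        if not chunk:
            break
        dst.sendall(chunk)
    try:
        # Tell the other end that no more data will come in this direction.
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass


if __name__ == "__main__":
    first_bytes = (
        b"CONNECT example.com:443 HTTP/1.1\r\n"
        b"Host: example.com:443\r\n"
        b"\r\n"
    )
    print(parse_connect(first_bytes))
```

In a full proxy, the handler would call parse_connect on the client's first bytes, open a TCP connection to the returned host and port, send CONNECT_OK to the client, and then start one relay per direction, for example in two threads.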
Once you understand the workflow of an HTTP tunnel, you can see that the CONNECT method asks the tunnel gateway to create a TCP connection to an arbitrary destination server and port, and to blindly forward the subsequent data between the client and the server. With a tunnel proxy, the proxy server no longer acts as a middleman and no longer needs to rewrite the browser's request. Instead, it forwards the data between the browser and the terminal server as is, so the browser can perform the TLS handshake directly with the terminal server and transmit encrypted data.
2.5 How Charles captures HTTPS after the certificate is imported
Charles is a packet capture tool. When its certificate has not been imported on the mobile phone, it uses the tunnel proxy mode so that data still gets through. Once the certificate is imported, Charles switches to the ordinary proxy mode, and we can inspect the HTTPS traffic. The principle, briefly, is this. After the certificate is imported, the mobile phone trusts the certificates that Charles forges for the sites it requests, and Charles in turn establishes a proper TLS connection with the real server. At this point Charles acts as a "middleman", and the TLS traffic on both sides can be decrypted.
3. Summary
At this point we have a clear picture of how HTTP proxies work. The concept is abstract, but it is also easy to understand. Simply put, HTTP proxies fall into two categories: ordinary proxies and tunnel proxies. An ordinary proxy exists as a "middleman". The client sends a plaintext request to the proxy server, and the proxy server, after receiving it, sends a plaintext request to the terminal server. Throughout this process the data travels in plaintext and the middleman can rewrite it. This is the basis of the well-known man-in-the-middle attack, and it shows how unsafe plaintext proxying is. That leads to the tunnel proxy, which supports HTTPS. Here the proxy server no longer acts as a middleman and cannot rewrite the client's request. Once the tunnel is established, it simply forwards the client's data to the terminal server through it.