The File Transfer Protocol (FTP) is almost as old as the Internet itself. It was first published in the early 1970s, before the current TCP/IP suite existed. The current standard is RFC 959, published in 1985. FTP is an application-layer protocol in the OSI model. In 2021, most web browsers began dropping their native support for it. Even so, FTP URLs are still easy to find embedded in many websites. In this tutorial, we’ll discuss how FTP operates, detailing the difference between the active and passive operating modes.
FTP was designed to allow easy file transfer and remote file management in a multivendor distributed environment. At that time, there were still many incompatible proprietary hardware and software architectures. Because of its simplicity, it has been one of the standards for unattended batch file transfer routines in regular datacenter operations worldwide. Accordingly, FTP offers a choice of data representations (binary, ASCII, and EBCDIC), data transfer modes (stream, block, and compressed, the last being quite limited), and operating modes (active and passive). It even allows a client to command a transfer between two different servers, or to execute specific routines on the server. These options are usually available only in full-fledged FTP clients. The standard FTP URL has the following syntax:
ftp://[user[:password]@]host[:port]/url-path
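To illustrate how the parts of an FTP URL map to connection parameters, here is a minimal sketch using Python’s standard `urllib.parse`. The helper name `parse_ftp_url` and the fallback values (user `anonymous`, empty password, port 21) are assumptions made for this example, not something the protocol requires of every client.

```python
from urllib.parse import urlsplit, unquote


def parse_ftp_url(url):
    """Split an ftp:// URL into the pieces a client needs to connect.

    Hypothetical helper: the defaults below (anonymous login, port 21)
    are common conventions assumed for this sketch.
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url!r}")
    return {
        "user": unquote(parts.username) if parts.username else "anonymous",
        "password": unquote(parts.password) if parts.password else "",
        "host": parts.hostname,
        "port": parts.port or 21,  # 21 is the default control port
        "path": unquote(parts.path) or "/",
    }
```

For example, `parse_ftp_url("ftp://alice:secret@ftp.example.com:2121/pub/file.txt")` yields the user `alice`, the host `ftp.example.com`, the port `2121`, and the path `/pub/file.txt`.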
FTP operates using multiple connections:
- Control Connection: the first to be opened, by default on TCP port 21, where the client sends commands to the server
- Data Connections: each data transfer, including directory listing, opens its own Data Connection, which is closed after the stream finishes.
So, the main difference between active and passive modes is which side opens the data connections and which side listens for them. In active mode, the server opens each data connection, calling back the client from TCP port 20 by default. In passive mode, by contrast, all connections are opened from the client to the server.
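In RFC 959, the listening endpoint of a data connection is announced over the control connection: the client sends a `PORT` command in active mode, and the server answers a `PASV` command with a `227` reply in passive mode. Both carry the address as six decimal numbers, `h1,h2,h3,h4,p1,p2`, where the port is `p1 * 256 + p2`. The sketch below shows that encoding; the function names are hypothetical and only IPv4 is handled.

```python
import re


def encode_port_argument(ip, port):
    """Build the argument of a PORT command (active mode): h1,h2,h3,h4,p1,p2."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # The 16-bit port is split into its high and low bytes.
    return ",".join(str(n) for n in octets + [port >> 8, port & 0xFF])


def decode_pasv_reply(reply):
    """Extract (ip, port) from a 227 reply to PASV (passive mode)."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a valid PASV reply: {reply!r}")
    numbers = [int(n) for n in match.groups()]
    ip = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return ip, port
```

In active mode, the client would send `PORT` followed by the encoded string and then wait for the server to connect. In passive mode, the client would decode the server’s reply and open the connection itself.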
3. Active vs. Passive Modes Connection Flows
Passive mode was added to the specification later, at almost the same time that Internet host administrators came to understand the need for network firewalls (check our firewall intro tutorial) and proper network segmentation. At that time, configuring a firewall to allow the callback connections of active mode was quite tricky. It requires full packet inspection with port triggering, a firewall rule that opens one port after valid traffic is detected on another, so that the server can actually reach the client’s listening callback port. Passive mode simplifies this: the client’s firewall no longer has to open an inbound port based on what is negotiated within the control connection. So, with passive mode, only the server-side firewall needs to fully inspect the traffic in order to open the ports required for data transfer. Incidentally, this is one advantage of using HTTP or HTTPS to transfer files: there’s no need for multiple data connections or firewall deep-inspection tricks.

Most clients use active mode by default (web browsers are the notable exception). The graph below shows the two FTP operating modes:
4. What (or How) Should We Use?
Well, if we can, we should avoid FTP altogether. Why? First of all, FTP traffic is cleartext, which means that the username and password, as well as the transferred data, are sent without any encryption. Anyone positioned along the data path could capture not only the files but also the credentials needed to initiate new transfers. To protect both data and credentials, we should use encrypted alternatives such as SFTP, SCP, or HTTPS, or at least run FTP over a VPN or an SSH tunnel. For bulk transfers, another sound option is rsync over SSH, a VPN, or SSHFS. If we have many small files, FTP’s design of one data connection per transfer imposes extra overhead that rsync would avoid.

Aside from that, when a connection is lost, endpoints usually keep the partially transferred files. So, it’s up to us to check whether each file transfer completed successfully. The most commonly used methods to do this are:
- sending files with trailers, headers, or separate metadata files
- checking the resulting file sizes
- checking file hashes
- transferring files to temporary folders or filenames, and renaming them only after the transfer is known to have succeeded
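Several of these checks can be combined on the receiving side. The following sketch assumes the sender publishes the expected size and SHA-256 digest through some side channel, such as a separate metadata file. The function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def finalize_transfer(temp_path, final_path, expected_size, expected_sha256):
    """Rename temp_path to final_path only if size and hash both match.

    Returns True on success; on a mismatch the temporary file is left
    in place and False is returned.
    """
    if os.path.getsize(temp_path) != expected_size:
        return False
    if sha256_of(temp_path) != expected_sha256:
        return False
    os.replace(temp_path, final_path)  # atomic when both paths share a filesystem
    return True
```

Checking the size first is a cheap way to reject truncated files before paying for the hash computation.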
One easy way to add and check integrity information is to compress the files as ZIP archives, which store a CRC-32 checksum for each entry, or to use our preferred compression scheme (as long as it has file-checking capabilities, of course). In any case, if we must use FTP, passive mode is usually easier to pass through firewalls and should be used by default. Even so, the firewall should be aware that it is handling FTP connections in order to apply the correct filtering mode. That’s especially true if we don’t use the standard TCP ports for the control and data connections.
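As an example of selecting the operating mode explicitly in a scripted client, the sketch below uses Python’s standard `ftplib`, whose `set_pasv()` method switches between passive and active data connections. The helper names, and the host and credentials passed to them, are placeholders for this example. Note that plain `ftplib.FTP` sends everything in cleartext, as discussed above.

```python
from ftplib import FTP


def make_client(passive=True):
    """Return an unconnected FTP client with the data-connection mode set."""
    client = FTP()            # no host given, so nothing is connected yet
    client.set_pasv(passive)  # True: passive mode, False: active mode
    return client


def list_directory(host, user, password, port=21, passive=True):
    """Log in and return the file names in the server's current directory."""
    client = make_client(passive)
    client.connect(host, port)  # control connection; a non-standard port can be given
    try:
        client.login(user, password)
        return client.nlst()    # the listing travels over its own data connection
    finally:
        client.close()
```

Calling `list_directory("ftp.example.com", "alice", "secret")` would use passive mode, while passing `passive=False` makes the server call back the client, which a client-side firewall may block.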
5. Conclusion
In this tutorial, we’ve shown FTP’s active and passive operating modes. While FTP is slowly joining the list of legacy Internet protocols, it’s still very much in use. Understanding its modes of operation helps us choose the right data transfer architecture in most situations. FTP’s usefulness is most tangible when we deal with legacy computers and operating systems. It excels in heterogeneous environments, or when we want to automate moving data between servers, thanks to the ease of scripting the default command-line clients.