Resources

• libnet: http://www.perl.com/CPAN/modules/by-module/Net
• libnet FAQ: http://www.pobox.com/~gbarr/libnet/
• RFC 959: http://www.yahoo.com/Computers_and_Internet/Standards/RFCs/
• "Being An FTP Client", Perl Cookbook, O'Reilly, Recipe 18.2
• "Net::FTP", Perl In A Nutshell, O'Reilly
• "Win32::Internet FTP Functions", Perl In A Nutshell, O'Reilly

This article is the result of my own personal adventures in maintaining a rapidly growing web site via FTP, without the benefit of a telnet shell on my server. If you have FTP access to your web server's file tree, there are four reasons why mirroring with FTP may be preferable to HTTP:
This article will demonstrate how to recursively traverse an FTP site using the Net::FTP module bundled with Graham Barr's libnet distribution on the CPAN. For the pedantically inclined, further background information regarding the FTP protocol is available in RFC 959.
You may find yourself in the unenviable position of trying to maintain a remote file tree without shell access to the system where your file tree resides. Your file tree might contain a web site, an FTP site, or other data.
Many ISPs do not provide shell accounts, either for security reasons or because the host operating system has no concept of a remote login shell (such as on MacOS or Windows). If you take the login shell out of the equation and wish to automate the process of moving data between file trees on your local machine and your server, a scriptable client becomes a necessity. Fortunately, the Net::FTP module provides an implementation of the FTP protocol so that you can write FTP scripts in your favorite scripting language. Here are some off-the-shelf approaches to tackling this problem:
Each of these tools has its own strengths and weaknesses, and a corresponding place in your toolbox. As my web site has grown over the last couple of years, I have found myself moving individual files and directories with either command-line FTP, or one of the graphical clients mentioned above.
The cornerstone of the Perl philosophy is that "There's always more than one way to do it." I propose the following corollary: "but it's always more fun to do it your way". This article will show you how. Here is an amusing anecdote illustrating why I think it's more fun to write your own software:
An old friend of mine works for one of the big car companies, designing electric cars. One day he described the basic architecture of an electric car, saying "Well, you have some batteries, a motor, a transmission, some software..." I interrupted, "Hold it right there! I write software for a living, and believe me, I don't want ANY software in MY car... at least not any software that I haven't personally written and tested!"
When I stumbled across Net::FTP by accident one day, I began developing a small but effective mirroring program of my own. I had been avoiding the larger mirroring packages, since I find them to be too (how to say this delicately?) "feature-rich" for my taste.
If you have shell access, mirroring a file tree is trivial:
% cd ~/filetree
% tar cvf - . | gzip > ../filetree.tar.gz

% cd ..
% ftp someisp.net
Connected to someisp.net
220 someisp.net FTPServer (Version wu-2.4.2) ready.
Name (someisp.net:gerard): gerard
331 Password required for gerard.
Password:
230 User gerard logged in.
Remote system type is UNIX.
ftp> cd /home/html/users/gerard
250 CWD command successful.
ftp> bin
200 Type set to I.
ftp> put filetree.tar.gz
local: filetree.tar.gz remote: filetree.tar.gz
200 PORT command successful.
150 Opening BINARY mode data connection for filetree.tar.gz.
226 Transfer complete.
333546 bytes sent in 0.0175 secs (1.9e+04 Kbytes/sec)
ftp> bye
221-You have transferred 333546 bytes in 1 files.
221-Total traffic for this session was 333977 bytes in 1 transfers.
221-Thank you for using the FTP service on lanois.
221 Goodbye.

% telnet someisp.net
Trying 127.0.0.1...
Connected to someisp.net.
Escape character is '^]'.
Red Hat Linux release 6.0 (Hedwig)
Kernel 2.2.5-15 on an i686
login: gerard
Password:
Last login: Mon Oct 4 21:53:57 on tty1
%

% cd /home/html/users/gerard

% gunzip < filetree.tar.gz | tar xvf -

% exit
Connection closed by foreign host.
%
In the reverse direction:
% telnet someisp.net

% cd /home/html/users/gerard
% tar cvf - . | gzip > filetreemirror.tar.gz

% exit
Connection closed by foreign host.
%

% cd ~
% mkdir filetreemirror
% ftp someisp.net
...
ftp> get filetreemirror.tar.gz
...
ftp> bye
...
%

% gunzip < filetreemirror.tar.gz | tar xvf -
For these two simple cases, an automated Perl client is probably overkill. But take the shell account out of the equation, and you'll find yourself engaging in some very long conversations with your FTP server.
Although the documentation for Net::FTP says that only a subset of RFC 959 is implemented, you will find that the implementation provided by Net::FTP is sufficiently robust for a wide variety of uses. The real power of Net::FTP stems from the power of the Perl programming language itself.
The Net::FTP module is contained in the libnet distribution, available from your favorite CPAN mirror in the directory modules/by-module/Net. The filename will be of the form libnet-X.YYYY.tar.gz. As of this writing, the most recent version is 1.0607, dated a long time ago: 22-Aug-1998.
There is also a virtually identical FTP capability in the Win32::Internet extension module, although Net::FTP works equally well in both the Unix and Windows environments.
Here is a short example illustrating how to download a single file; I occasionally use this to download my web server's access log. It is a simple example, but demonstrates all the major steps involved in scripting an FTP session with Net::FTP.
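use Net::FTP;

# Connect to the server. The package name is case-sensitive: Net::FTP.
my $ftp = Net::FTP->new("someisp.net") or die "ERROR: Net::FTP->new failed\n";
# Log in anonymously, giving an e-mail address as the password.
$ftp->login("anonymous", "g_lanois\@yahoo.com") or die "ERROR: login failed\n";
# Change to the remote directory that holds the logs.
$ftp->cwd("/pub/outgoing/logs") or die "ERROR: cwd failed\n";
# Download the file into the local working directory.
$ftp->get("access_log") or die "ERROR: get failed\n";
# Close the connection.
$ftp->quit;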
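The same steps can be wrapped in a subroutine that reports why a step failed. What follows is only a sketch: fetch_file() and its named arguments are hypothetical, while new(), login(), cwd(), binary(), get(), quit() and message() are Net::FTP methods. It assumes that a failed connection leaves its reason in $@ and that message() returns the server's last response.

```perl
use strict;
use warnings;
use Net::FTP;

# Hypothetical helper: download one file and include the server's own
# response text in any error. Returns the local file name on success.
sub fetch_file {
    my (%arg) = @_;

    # When the connection fails, Net::FTP->new leaves the reason in $@.
    my $ftp = Net::FTP->new($arg{host})
        or die "ERROR: cannot connect to $arg{host}: $@\n";

    $ftp->login($arg{user}, $arg{password})
        or die "ERROR: login failed: ", $ftp->message;

    $ftp->cwd($arg{dir})
        or die "ERROR: cwd $arg{dir} failed: ", $ftp->message;

    # Binary mode avoids line-ending translation; it is the safe choice
    # for tarballs, images, and compressed logs.
    $ftp->binary;

    my $local = $ftp->get($arg{file})
        or die "ERROR: get $arg{file} failed: ", $ftp->message;

    $ftp->quit;
    return $local;
}

# Example call, using the placeholder host and path from the article:
# fetch_file(host => "someisp.net", user => "anonymous",
#            password => 'g_lanois@yahoo.com',
#            dir => "/pub/outgoing/logs", file => "access_log");
```

Nothing is contacted until fetch_file() is actually called, so the subroutine can live in a script alongside other code.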
Let's quickly review Perl's recursion capability. Recursion barely gets a mention in the perlsub documentation: "Subroutines may be called recursively." This just means that a subroutine can call itself.
Here is a short example which shows how useful this can be. The factorial of a number n is the product of all the integers between 1 and n. The factorial() subroutine below is recursive: it computes the factorial of $n as $n multiplied by factorial($n - 1).
sub factorial {
    my $n = shift;
    # Stop at 1 (or below) so that factorial(0) also terminates.
    return ($n <= 1) ? 1 : $n * factorial($n - 1);
}
The conceptual model of a file tree is an example of what graph theoreticians call a directed acyclic graph. Recursion is the tool of choice when describing algorithms which traverse the nodes of a file tree.
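To make the connection between recursion and trees concrete, here is a small sketch that walks a toy tree held entirely in memory. The data layout (directories as hash references, files as plain strings) and the name list_tree() are assumptions made for illustration only; no FTP is involved yet.

```perl
use strict;
use warnings;

# A toy file tree: directories are hash references, files are plain strings.
my %tree = (
    'index.html' => 'file',
    'images'     => {
        'logo.gif' => 'file',
        'icons'    => { 'new.gif' => 'file' },
    },
);

# Return one line per node, indented by depth. The subroutine calls
# itself once for every subdirectory it meets.
sub list_tree {
    my ($node, $depth) = @_;
    my @lines;
    for my $name (sort keys %$node) {
        my $child = $node->{$name};
        if (ref $child eq 'HASH') {
            push @lines, ('  ' x $depth) . "$name/";
            push @lines, list_tree($child, $depth + 1);
        }
        else {
            push @lines, ('  ' x $depth) . $name;
        }
    }
    return @lines;
}

print "$_\n" for list_tree(\%tree, 0);
```

Each call handles exactly one directory and leaves everything beneath it to a deeper call, which is the shape the remote traversal takes as well.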
On the local machine, if we wanted to crawl a file tree recursively, we would use the finddepth() subroutine from the File::Find module. (See Recipes 9.7 and 9.8 in the Perl Cookbook). However, there is no way to perform a finddepth() on a remote file tree via the FTP protocol.
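For comparison, this is roughly what the local case looks like. The sketch builds a throwaway tree with File::Temp (the file names are made up) and then lets finddepth() report each item; note that a directory is reported only after everything inside it.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(mkpath);
use File::Temp qw(tempdir);

# Build a small throwaway tree so the example is self-contained.
my $root = tempdir(CLEANUP => 1);
mkpath("$root/images/icons");
for my $file ("index.html", "images/logo.gif", "images/icons/new.gif") {
    open(my $fh, '>', "$root/$file") or die "cannot create $file: $!";
    close($fh);
}

# finddepth() visits a directory's contents before the directory itself.
# $File::Find::name holds the full path of the item being visited.
my @visited;
finddepth(sub { push @visited, $File::Find::name }, $root);

print "$_\n" for @visited;
```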
Before we tackle the problem of mirroring a remote file tree, let's first develop the technology to crawl the tree. Our approach combines recursion with Net::FTP calls to perform a find()-like recursive traversal of the remote tree. Here is a snippet of pseudocode:
sub crawl_tree {
    Get a list of all directories and files in the current directory;
    for (each item in the list) {
        if (item is a directory) {
            Save the current FTP remote working directory;
            Change into the directory called "item";
            crawl_tree();
            Restore the remote working directory to what it was before;
        }
    }
}
crawl_ftp (shown in Listing 1) is a Perl program which traverses a remote file tree, listing the directories and files it finds along the way.
I discovered several interesting issues when developing this script. Any script which uses Net::FTP needs to check for and handle these conditions:
• The columns in the listing are separated by whitespace.
• The last column contains the file name.
• Directory items in the listing begin with d.
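Those three assumptions translate almost directly into code. The helper below is a sketch, not taken from the article's listing: parse_listing_line() is a hypothetical name, and the sample lines are invented to resemble a Unix-style long listing. File names containing spaces, and symbolic links shown as "name -> target", would break the last-column assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line of a Unix-style long listing into
# (name, isdir). Returns an empty list for lines that are not entries.
sub parse_listing_line {
    my ($line) = @_;
    return if $line =~ /^total\s+\d+/;       # summary line some servers send
    my @cols = split ' ', $line;             # columns separated by whitespace
    return unless @cols;                     # blank line
    my $name = $cols[-1];                    # last column holds the file name
    return if $name eq '.' or $name eq '..'; # never descend into these
    my $isdir = ($line =~ /^d/) ? 1 : 0;     # directory entries begin with d
    return ($name, $isdir);
}

for my $line (
    "total 8",
    "drwxr-xr-x   2 gerard   users        4096 Oct  4 21:53 images",
    "-rw-r--r--   1 gerard   users        1270 Oct  4 21:50 index.html",
) {
    my ($name, $isdir) = parse_listing_line($line) or next;
    print $isdir ? "dir:  $name\n" : "file: $name\n";
}
```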
The crawl_ftp program shown here produces a nicely-indented listing of the remote file tree.
It would be far more useful to generalize the crawl_tree() subroutine, using the same subroutine reference callback mechanism employed by File::Find's find() and finddepth(). The perlref documentation brushes lightly over the concept of subroutine references, mentioning it in detail only in the context of anonymous subroutines. In our case, it allows us to package our tree crawling technology into a Perl module.
The next listing gives a modified version, with crawl_tree() renamed to ftp_finddepth() and generalized through the use of a subroutine reference.
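For readers without the listing at hand, the idea can be sketched as follows. This is not the article's ftp_finddepth(): the name ftp_walk(), the callback arguments ($name, $isdir, $depth), and the choice to report a directory before descending into it are all assumptions. Only dir(), pwd(), cwd() and message() are Net::FTP methods, and the listing is assumed to be Unix-style.

```perl
use strict;
use warnings;

# Hypothetical sketch. $ftp is a connected, logged-in Net::FTP object;
# $callback is a subroutine reference invoked as
# $callback->($name, $isdir, $depth) for every item found.
sub ftp_walk {
    my ($ftp, $callback, $depth) = @_;
    $depth = 0 unless defined $depth;

    for my $line ($ftp->dir) {
        next if $line =~ /^total\s+\d+/;      # skip the summary line
        my @cols = split ' ', $line;
        next unless @cols;
        my $name = $cols[-1];                 # assumes no spaces in names
        next if $name eq '.' or $name eq '..';
        my $isdir = ($line =~ /^d/) ? 1 : 0;

        $callback->($name, $isdir, $depth);

        if ($isdir) {
            my $saved = $ftp->pwd;            # remember where we are
            if ($ftp->cwd($name)) {           # can fail, e.g. on permissions
                ftp_walk($ftp, $callback, $depth + 1);
                $ftp->cwd($saved)
                    or die "cannot return to $saved: " . $ftp->message;
            }
            else {
                warn "skipping $name: " . $ftp->message;
            }
        }
    }
}

# A callback that prints an indented listing of the remote tree:
my $print_item = sub {
    my ($name, $isdir, $depth) = @_;
    print '  ' x $depth, $name, ($isdir ? '/' : ''), "\n";
};
# ftp_walk($ftp, $print_item);   # $ftp must already be logged in
```

Because the traversal knows nothing about what the callback does, the same subroutine can drive a listing, a download, or a permissions check.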
crawl_ftp2

The first step is to create a module for the general purpose ftp_finddepth() technology we just developed. Then we can write a downloading application that uses the module to traverse the remote file tree's directory structure, transferring any files it finds along the way.
Writing an application to download a file tree is just a simple matter of writing a process_item() callback that mirrors the directory tree and retrieves files, depending on what ftp_finddepth() passed it.
If process_item() is called with a directory (as indicated by the $isdir parameter), we want to create a directory in the local filesystem. If process_item() is called with a file, we issue an FTP get() request to download the file.
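A callback along those lines might look like the sketch below. The argument list is an assumption, since the real signature is not shown here: it takes the FTP object, the item name, the $isdir flag, and the local directory that corresponds to the current remote directory. get() with a second argument is the Net::FTP form that names the local file.

```perl
use strict;
use warnings;
use File::Path qw(mkpath);
use File::Spec;

# Hypothetical callback: mirror one remote item into $local_dir.
# Returns the local path that was created or downloaded.
sub process_item {
    my ($ftp, $name, $isdir, $local_dir) = @_;
    my $local_path = File::Spec->catfile($local_dir, $name);

    if ($isdir) {
        # Create the matching local directory unless it is already there.
        mkpath($local_path) unless -d $local_path;
        return $local_path;
    }

    # get(REMOTE, LOCAL) fetches from the current remote directory.
    $ftp->get($name, $local_path)
        or die "ERROR: get $name failed: " . $ftp->message;
    return $local_path;
}
```

The directory branch has to run before any file inside that directory is fetched, so the traversal must report a directory before descending into it.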
Uploading a file is exactly the same as downloading, except you call the Net::FTP put() subroutine instead of get().
You would think that using File::Find's find() or finddepth() would be the way to iterate over the local file tree. There is one small problem with this approach: find() and finddepth() report the full path name of the local directories they find. We only want relative local path names of each directory, so that we can duplicate the relative file subtree on the remote system.
We can get by without a mkpath()-like capability on the remote system, since we can mirror the local directory to the remote site on the fly as we descend the local tree. We will keep track of our relative location in the local file tree by pushing each directory we descend into onto the back of a Perl array.
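The bookkeeping can be sketched like this. walk_local() is a hypothetical name and not the article's own subroutine; it keeps the directories descended into so far in @path, so the callback only ever sees paths relative to the starting point. Joining with "/" is an assumption that suits FTP paths and also works for local paths in Perl on Unix and Windows.

```perl
use strict;
use warnings;

# Hypothetical sketch: walk the local tree rooted at $root, calling
# $callback->($relative_path, $isdir) for every item found.
sub walk_local {
    my ($root, $callback, @path) = @_;
    my $dir = join('/', $root, @path);

    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);

    for my $name (@names) {
        my $full  = "$dir/$name";
        my $isdir = ((-d $full) && !(-l $full)) ? 1 : 0;  # do not follow links
        $callback->(join('/', @path, $name), $isdir);
        if ($isdir) {
            push @path, $name;                 # descend
            walk_local($root, $callback, @path);
            pop @path;                         # come back up
        }
    }
}

# walk_local("/home/gerard/filetree", sub { print "$_[0]\n" });
```

An upload callback can hand the relative path of each directory to mkdir() on the server and the relative path of each file to put().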
So, leaving File::Find's find() and finddepth() behind, we'll develop our own finddepth(). Longtime users of Perl might remember the old example program called down distributed with Perl 4. Our version, called finddepth_gl() (shown below) performs a similar function -- but more portably, since it doesn't involve invoking a Unix command via the Perl system function.
Beware that Net::FTP's mkdir() will return failure if the directory already exists.
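One way to cope with this, sketched below under the assumption that a successful cwd() is a reliable sign that the directory exists, is to probe before creating. ensure_remote_dir() is a hypothetical helper; pwd(), cwd() and mkdir() are Net::FTP methods.

```perl
use strict;
use warnings;

# Hypothetical helper: make sure $dir exists on the server. Probing with
# cwd() first means an existing directory is not reported as an error.
sub ensure_remote_dir {
    my ($ftp, $dir) = @_;
    my $saved = $ftp->pwd;

    if ($ftp->cwd($dir)) {
        # The directory is already there; go back to where we started.
        $ftp->cwd($saved) or die "cannot return to $saved\n";
        return 1;
    }

    # Not there (or not accessible): try to create it.
    return $ftp->mkdir($dir) ? 1 : 0;
}
```

A return value of 0 then means a genuine failure, such as missing write permission, and not merely a directory left over from an earlier run.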
The ability to automate FTP operations relieves a great deal of tedium from having to manually push and pull files to and from your remote file tree. This is particularly useful for periodic and repetitive tasks such as log file retrieval, or unattended updating of an otherwise static web site.
The mirroring applications given above only scratch the surface of what is possible, given a generalized and recursive FTP site traversal mechanism. This gives you the ability to grind over your entire remote file tree. In the case of a web site, this is particularly helpful for rooting out missing or orphaned files. Another application is to automatically check and fix the permissions on all the files in your remote tree. Do you remember the last time you had to do that by hand?