Tom Christiansen's object-oriented Perl tutorial (the perltoot documentation bundled with Perl) describes a way of making the data stored in an object inaccessible to the outside world using closures. It is a technique that deserves to be better known, so in this article I will elaborate on Tom's description and add a little background.
First the object-oriented background, and a warning. The three great object-oriented virtues might be correctness, maintainability, and re-usability. (I'll leave it to you to work out how these relate to the three great virtues of a programmer.) They don't necessarily include efficiency, and the techniques I'm going to describe are not the fastest. If that bothers you, don't use them.
Those of us who like to use objects and classes do so, in part, in order to manage the complexity of medium to large size programs by breaking them into pieces and restricting the way in which separate pieces interact with one another. An extra benefit of this is that other people can then take some of our pieces and use them in their programs, too. The basic idea is to organize a program as a set of objects that know various things about their own state and can perform certain actions in response to messages from other objects asking them to do so. There is no other way the object's state can be changed - in anthropomorphic terms, the object takes responsibility for organizing its own internal state, and doesn't allow any other object or function to change it directly.
The set of actions an object can perform - the set of messages it responds to - depends on the sort of object it is, so objects can be grouped into classes according to the actions they perform. A program can be designed by thinking about the sorts of things that it models. These sorts of things are then characterized by their behavior, and this leads to a specification of some classes. A particular program performs a task by creating some objects belonging to these classes, which then carries out the desired computation by exchanging messages that cause them to carry out actions.
The well-known object-oriented programming languages such as C++, Java, and Oberon implement objects as structures (or records) that hold the data values representing the object's state, and ensure that these values can only be altered by the functions - or methods, as they are more often called in the context of object-oriented programming - which implement the actions characteristic of the class to which the object belongs. The methods provide a well-defined interface to the object. The user of the object can only affect it by calling methods (sending it messages, if you prefer), never by directly altering the values stored in the object structure. This means that the user does not need to know how the values are stored, because only the external effect of the method matters. (In some languages, it is possible to write down a set of equations, or axioms, that define the effect of the methods purely in terms of each other, without reference to any internal representation at all. You can't do this in C++ or Java, though. Or in Perl.)
This data hiding has several advantages. It leaves the implementor of the class free to change the representation at any time. As long as the interface is left in place, and the methods provide the same external behavior (satisfy the same set of axioms) the user of the class will be none the wiser. Thus, if there is some compelling reason to change the representation of a class, the changes are entirely confined to the definition of the class itself. An example would be a class for storing dates, which provides methods for doing simple arithmetic on dates (adding ninety days, and so on). If a hapless programmer decided to store the year as two decimal digits, then when the truth finally dawned, the necessary changes would be confined to the date class and its methods, not scattered around every program that manipulated dates.
Another advantage concerns debugging. If a program misbehaves and preliminary investigations show that some object has an unexpected value that is causing the problem, you know that the only place you need to look is in the methods of the class to which the offending object belongs, because that is the only code that can alter the value. If any function at all is able to alter the values stored in an object, then all functions are potentially suspect (especially if you use global variables).
Perl has facilities for defining classes and creating objects belonging to them (see the online documentation or a good Perl book for more details), but, unlike the mainstream object-oriented languages it provides no special linguistic support for data hiding. Usually, Perl programmers rely on documentation, a bit of convention, and common sense.
Hence, we need a class TimeSetting, with methods hrs_up(), hrs_down(), mins_up(), mins_down(), and value(). We also need a constructor, which I'll call new(), to make TimeSetting objects. I'll give it an initial pair of integer values as arguments. (It's trivial to test whether these are supplied and set a default otherwise.) Having decided on those methods, I know how to use TimeSetting objects even though I haven't yet decided how to store their values, or how to implement the methods.
Conceptually, there are two sorts of methods: object methods and class methods. Object methods are called with an object as their first argument, and typically use the data stored in that particular object. To distinguish method calls from ordinary subroutine calls, a different syntax is employed. If $t is a TimeSetting object, then I would increment the hours component like this:
$t->hrs_up();
The method is sometimes said to be called through the object reference that is the value of $t. This reference will be passed to the method as its first (in this case, its only) argument. For constructors, this doesn't work, because there may be no object to call the constructor through. Instead, the class name is used, once again being passed as an implicit argument to the method. Although the same syntax can be used, it is customary to use a more readable alternative, called indirect object syntax, where the class name is written between the method name and any arguments, rather like the filehandle is specified with print() in print MYFILE "string". I could create a TimeSetting and assign its reference to $t like this:
my $t = new TimeSetting 23, 58;
The object to which $t refers is a TimeSetting holding two minutes to midnight.
Now consider the implementation. I'll store the pair of values in a hash, using HOURS and MINUTES as keys, so the hash looks like a structure, in a way that is probably familiar to you. All the methods live in a package called TimeSetting, stored in TimeSetting.pm, because that's how Perl classes work. The constructor initializes the hash (it should check its arguments are in range, but I have omitted the test to keep the code uncluttered) and returns a blessed reference to it, which will serve as the object. If you have never worked with Perl's object-oriented features before, this may be a little obscure, but it's quite simple. Blessing a reference marks the thing it refers to as belonging to a class. This means that methods can be called through the reference using the object-oriented method call syntax, and their names will be looked up in the class.
Apart from the constructor, the other methods are straightforward. Each shifts its first argument, which will be the reference through which the method was called; that is, the reference to the hash containing the object's data. This is then dereferenced, and the value-changing methods do some arithmetic on one of the values, while value() formats them with sprintf().
package TimeSetting; sub new { my $class = shift; my ($h, $m) = @_; return bless { HOURS => $h, MINUTES => $m }, $class; } sub hrs_up { my $this = shift; ++$this->{HOURS} < 24 or $this->{HOURS} = 0; } sub hrs_down { my $this = shift; --$this->{HOURS} >= 0 or $this->{HOURS} = 23; } sub mins_up { my $this = shift; ++$this->{MINUTES} < 60 or $this->{MINUTES} = 0; } sub mins_down { my $this = shift; --$this->{MINUTES} >= 0 or $this->{MINUTES} = 59; } sub value { my $this = shift; return sprintf("%02d:%02d", $this->{HOURS}, $this->{MINUTES}); } 1;
TimeSetting is certainly a class, and, provided you play by the rules, all the benefits I've advertised will follow from its use. For example, I could change the stored representation of the time to a number of minutes past midnight, without requiring any program that uses TimeSettings to be changed. Any weird values such as 199:88 can only be generated by the methods in this class, so, should such a value get thrown out during debugging, I would know where to look. (Remember, I’ve elided some range checking code from the constructor.) Or would I? Only, as I said, if everyone plays by the rules. I should have produced some documentation explaining how to use TimeSetting objects, and I could then reasonably expect people to use them correctly. But Perl cannot offer any guarantees: the blessed reference is still a reference to a hash, as well as being blessed, and it can still be dereferenced and indexed. This is legal, if unwise:
my $t = new TimeSetting 23, 58; $t->{HOURS} = 199; $t->{MINUTES} = 88;
Usually, in the Perl world, we say that anybody who does something like that either deserves anything they get or knows what they're doing and should not be constrained by repressive language rules. I have a lot of sympathy for this point of view, but I also have some sympathy for the notion that if a programming language can help us avoid mistakes, we should let it. Remember that object-oriented techniques are only really relevant in large programs, or when we are re-using software components. It is then conceivable that deep in a chain of subroutine calls, someone might mistake a reference to a TimeSetting for some other sort of reference, and use it in such a way as to violate the TimeSetting axioms. And then there is deliberate interference. In the Perl community, we usually dismiss as paranoia worries about people interfering with our code, but there are occasions when such worries are justified. The typical scenario is some programmer feeling that it is necessary for them to use an object in a way its designer didn't intend, but which still seems to be consistent with its semantics. For example, somebody might want to extract the hours from a TimeSetting. Even if they do this in a respectable way by deriving their own class and adding the new method there, their code will still break if a new version of the TimeSetting class is produced that uses the seconds-after-midnight representation. For a widely-distributed class, this could mean a maintenance nightmare for the original programmer.
For circumstances where it might matter if the representation of an object is accessed other than through its methods, there is a better way of storing data in objects in Perl. It is based on closures. Not everybody is familiar with closures, so before showing how they can protect data, I will briefly review their more conventional use.
sub { body }
Much like pointers to functions in C, references to subroutines effectively allow us to assign a subroutine to a scalar variable, store it in an array or hash, or pass it as an argument to or return it as the result of another subroutine. Perl's references to subroutines are more powerful than C's function pointers, because of the way free variables are treated. A free variable is one that is not local (where I use the word 'local' in its conventional, lexically scoped, sense, which has little to do with Perl's local() function). Since a reference to an anonymous subroutine can be created in any context in which an expression can be used, it may access the local variables of an enclosing block. Consider, for example, the following:
sub multiplier { my $x = shift; return sub { return $x * shift; }; }
The $x used inside the anonymous subroutine is the local variable $x of multiplier(). It continues to refer to this variable, even after the anonymous subroutine has been returned as the result, and even when it is subsequently used. If, for example, I call multiplier() like this:
$doubler = multiplier(2);
then $x will be initialized to 2 within multiplier(). Hence, the value returned by multiplier is a reference to a subroutine that multiplies its argument by 2 - a doubler, in fact, so that &$doubler(5) is 10. On the other hand,
$quadrupler = multiplier(4);
sets $quadrupler to a subroutine that multiplies its argument by four. These two subroutines do not interfere with each other. The free variable is bound dynamically at the time the anonymous subroutine is created, so each value returned by multiplier() has its own $x, set to the value of multiplier()'s argument.
If you are used to C's model of function calls, this may seem very strange - C functions can't access free variables, and the locals of a function cease to exist once control returns from the function. However, once you grant that it makes sense to refer to free variables (and a long tradition of programming languages sanctions the practice) then it also makes sense for a subroutine that is returned in this way to go on referring to them. Obviously, arranging that everything works properly is a slight headache for the language implementors, but it can be done. The value of an anonymous subroutine is a thing called a closure, which not only holds the code to be executed, but also contains the environment in which the closure was created - that is, the free variables.
Closures have many uses. In Perl, one of them is the topic of this article: protecting the data of objects from outside interference. The key observation is that the free variables of a closure created inside a subroutine are the locals of that subroutine, and these are out of scope, and therefore truly inaccessible, anywhere else. In the example above, the name $x cannot be used at all outside multiplier() to refer to the variable initialized at the beginning of that subroutine. There are only two ways to access the value of a subroutine's local variables from outside the subroutine: one is to create a reference to them and pass that out of the subroutine, which is inviting interference; the other is to pass out a closure which accesses them, which means that access is only provided through the code of that closure. And this is just what we want with objects: access to the data should only be possible through methods.
There is more than one way to organize the details, but the strategy is to store the object's data in local variables of the constructor, and use closures, created inside the constructor and called from the methods, to perform the actions of the class. To make the closures available for subsequent use, I'll store them in a hash, and bless a reference to that hash to return as the object. My constructor looks like this:
sub new { my $class = shift; my ($h, $m) = @_; my %methods; $methods{HRS_UP} = sub { ++$h < 24 or $h = 0 }; $methods{HRS_DOWN} = sub { - $h >= 0 or $h = 23 }; $methods{MINS_UP} = sub { ++$m < 60 or $m = 0 }; $methods{MINS_DOWN} = sub { - $m >= 0 or $m = 59 }; $methods{VALUE} = sub { return sprintf("%02d:%02d", $h, $m) }; return bless \%methods, $class; } and the methods now look very simple: sub hrs_up { my $this = shift; &{$this->{HRS_UP}}(); }
...and so on. This class can be used in exactly the same way as the previous implementation by programs that only use its methods to access objects. Programs that de-reference the object and change its data directly will no longer work.
The scheme is not bulletproof. It is certainly the case that the variables $h and $m can only be changed using the methods of this class, which is what I wanted. But like any object reference, one returned by this constructor can be de-referenced, which leads to some entertaining possibilities:
my $t = new TimeSetting 10, 30; my $hu = $t->{HRS_UP}; &$hu();
$t->value() now returns 11:30. We could call this a feature, like a with statement, but updating an object without the object is not really in the spirit of the game. At this point, I would be inclined to revert to the Perl establishment's approach, and say that anyone who does this wants their head examined, but it is possible to guard against it. The built-in function caller returns, among other things, the name of the package from which a subroutine call originated, so the closures could check this to see that they had only been called from TimeSetting. Unfortunately, this check can only be made at runtime.
It's also possible for someone to assign new closures to elements of the hash inside a TimeSetting object or add new elements. Again, we could call this a feature. Doing so would not compromise the object's data, but it would, for all intents and purposes, change its class dynamically. There are programming languages in which you can do this - JavaScript, for example. I can't think of a sensible reason to do so (I can't even think of a silly reason) but I am aware that some people program in a style radically different from mine, so I'm prepared to believe that it might be a useful feature. You can, however, prevent it by using a variation on the implementation just given. (It is essentially this variation that is described in the perltoot documentation, although there only get/set methods are being used, so the methods can be collapsed into a single piece of code.) Instead of building a separate closure for each method, just build one that selects a different branch to execute depending on a key passed as an argument, and return a blessed reference to the closure as the object.
sub new { my $class = shift; my ($h, $m) = @_; my $methods = sub { my $key = shift; if ($key eq HRS_UP) { ++$h < 24 or $h = 0 } elsif ($key eq HRS_DOWN) { - $h >= 0 or $h = 23 } elsif ($key eq MINS_UP) { ++$m < 60 or $m = 0 } elsif ($key eq MINS_DOWN) { - $m >= 0 or $m = 59 } elsif ($key eq VALUE) { return sprintf("%02d:%02d", $h, $m) } }; return bless $methods, $class; } sub hrs_up { my $this = shift; &$this(HRS_UP); }
...and so on.
package NetResource; sub new { my $class = shift; my ($host, $path) = @_; my %methods; $methods{GET_HOST} = sub { return $host }; $methods{GET_PATH} = sub { return $path }; return bless \%methods, $class; } sub get_host { my $this = shift; &{$this->{GET_HOST}}(); } sub get_path { my $this = shift; &{$this->{GET_PATH}}(); } 1;
I won't let you change the data once a NetResource object has been constructed, so there are no methods to assign new values to $host and $path.
A HTTPResource class can be derived, which adds a connect() method:
package HTTPResource; use NetResource; @ISA = qw(NetResource); sub new { my $class = shift; my $this = SUPER::new $class @_; $this->{CONNECT} = sub { # make the connection using $this->get_host() and # $this->get_path() to find the components of the URL }; return $this; } sub connect { my $this = shift; &{$this->{CONNECT}}(); } 1;
Here, you gain nothing from the use of closures, because there is no private data in the derived class. For an FTP resource, we might want to store a username and password, which makes the FTPResource class more interesting:
package FTPResource; use NetResource; @ISA = qw(NetResource); sub new { my $class = shift; my $this = SUPER::new $class @_; my ($user_id, $password) = ($_[2], $_[3]); $this->{CONNECT} = sub { # make the connection using $this->get_host() and # $this->get_path() to find the components of the URL # and $user_id and $password for identification }; return $this; } sub connect { my $this = shift; &{$this->{CONNECT}}(); } 1;
I can use these classes in this manner:
my $web_page = new HTTPResource vwww.macavon.demon.co.uk', 'index.html'; # use methods from the base class print "host: ", $web_page->get_host(), "\n"; print "path: ", $web_page->get_path(), "\n"; # use a method from the derived class $web_page->connect(); my $cpan = new FTPResource 'ftp.demon.co.uk', 'pub/mirrors/perl/CPAN', 'Groucho', 'swordfish'; # use a method from the other derived class $cpan->connect();
You can even do something very C++-like, and move the connect() method up into the base class, like this:
package NetResource; sub new { my $class = shift; my ($host, $path) = @_; my %methods; # $methods{GET_HOST}, $methods{GET_PATH} # as before $methods{CONNECT} = sub { die "Can't connect to an abstract net resource\n"; }; return bless \%methods, $class; } sub connect { my $this = shift; &{$this->{CONNECT}}(); } # Other methods as before 1;
The derived classes no longer need to define a connect() method - calls to it through HTTPResource and FTPResource objects will find NetResource::connect() via @ISA - but they must still assign their closures to the CONNECT element of %methods. This saves you a little typing, but doesn't really gain you much else, in the absence of statically typed objects in Perl. (If you go along with the idea that Perl is a post-modern language, you could take a leaf out of the architects' book, and say that you are making an 'ironic reference' to pure virtual functions in C++.)
One thing that may seem problematical to a conventional object-oriented Perl programmer is that methods in derived classes cannot access the data in the base class. This is deliberate. There is little point in protecting this data if all that a programmer needs to do to gain access to it is derive a new class. Derived classes should only be able to access data in their base class through its methods, just like any other code. This means that more thought has to go into the design of base classes, but object-oriented design is not easy if it is to be done properly.
The more robust variation of the data hiding scheme, using a closure as the object itself, is less amenable to inheritance, since the closure returned by new() cannot be extended. Instead, an object would have to store a reference to its parent closure, and pass on method calls it could not handle to that. This begins to duplicate the built-in inheritance mechanism @ISA. For this reason, despite its potential for oddness, I prefer the original implementation.
Note: Although the perltoot documentation is the immediate source of the data hiding scheme described in this article, it is not the first example of traditional scope rules being applied in a similar way. In particular, Malcolm Atkinson and Ron Morrison have described using the 'first class' functions and lexical scoping of their programming language PS-Algol to provide an effective separation of a class interface from its implementation.
__END__
The theory, proposed by the Chapmanian scholar Hauptmann, that Perl: The Programmer's Companion is an elaborate forgery, written by an embittered Hebridean piper known as MacBean of Acharacle, is now widely discredited.