A little while back I posted How to Write C++ Classes, which discussed ways to separate the interface from the implementation when building C++ and Java classes. There is more which could be said, of course, and here is more (not to say everything that could be said :). This post covers three topics: local types, constant correctness, and public data members. Each illustrates that good languages like C++ and Java enable good programming style, but don't necessarily enforce it.
Often a class may define "local types", types which pertain to the parameters and returns of the class methods. How and where should these be defined? Consider a class cA which provides an API for encapsulating access to something. This class in turn uses other classes internally for its implementation, and these are hidden from users of the cA class. For example, cB provides encapsulation of some internal object. Good so far - anyone can use cA and be blissfully unaware that cB is used "under the covers". Now say that cA provides a method to return a value, and that value is actually provided internally by cB. Let's say cB defines an enum type for the possible values. How and where should this type be defined?
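The last possibility is strongly preferred. It preserves the separation between interface and implementation, and has no maintenance downside. The takeaway: define local types so that users of a class never need to see the classes used in its implementation.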
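To make the scenario concrete, here is a minimal sketch of one arrangement that keeps cB out of cA's public interface: the enum is declared inside cA, and cB, an implementation detail, uses cA's type instead of defining its own. The names eState, getState, setBusy and m_pB are hypothetical, and the single file stands in for what would normally be separate header and implementation files.

```cpp
#include <memory>

// ---- What would normally be cA.h, the public interface ----
class cB;  // forward declaration only: users of cA never need cB's definition

class cA {
public:
    // The local type belongs to cA's interface, so callers can use it
    // without including anything about cB.
    enum eState { stateIdle, stateBusy, stateError };

    cA();
    ~cA();
    eState getState() const;
    void setBusy();

private:
    std::unique_ptr<cB> m_pB;  // hidden implementation object
};

// ---- What would normally be cB.h and cA.cpp, implementation only ----
class cB {
public:
    // cB reuses cA's type rather than defining its own enum.
    cA::eState state = cA::stateIdle;
};

cA::cA() : m_pB(new cB()) {}
cA::~cA() = default;  // cB is complete here, as std::unique_ptr requires
cA::eState cA::getState() const { return m_pB->state; }
void cA::setBusy() { m_pB->state = cA::stateBusy; }
```

Code that sees only cA's declaration can write `cA::eState s = a.getState();` without ever knowing that cB exists.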
When designing class interfaces, it is helpful to impose "constant correctness". This doesn't mean the code is constantly correct and has no errors :) What it means is that the const attribute of types is used whenever appropriate in parameters and return values. This attribute tells the compiler "this thing may not be modified". The most common occurrence is with C strings and other buffers, normally typed as char*. If you have a string or buffer which should not be modified, you should tell the compiler by defining it as const char*. Here's an example:
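A minimal sketch of a const char* parameter; countSpaces is a hypothetical helper, not taken from the original post. Because the parameter is const char*, the function cannot write through it, and callers may pass string literals, which standard C++ (since C++11) does not allow to bind to a plain char*.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts the spaces in a caller's string.
// The const char* parameter tells callers (and the compiler) that
// this function will not modify their buffer.
std::size_t countSpaces(const char* psz)
{
    assert(psz != nullptr);
    std::size_t n = 0;
    // Advancing the pointer is fine; only the characters are const.
    for (; *psz != '\0'; ++psz) {
        if (*psz == ' ')
            ++n;
        // *psz = '_';  // would not compile: psz points to const char
    }
    return n;
}
```

A char* argument converts to const char* implicitly, so callers holding modifiable buffers can still use the function; only the reverse direction would need a cast.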
This not only prevents poor coding (like modifying a caller's parameter) but it also prevents genuine errors. Consider the following:
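A portable sketch of this kind of glue method, assuming std::string in place of MFC's CString (c_str() plays roughly the role of CString's conversion to const char*). cA and cB follow the text; getName and name are hypothetical. The const char* version points into storage owned by the object and needs no cast at all; the comments show the two ways it commonly goes wrong.

```cpp
#include <string>

// Hypothetical inner class; std::string stands in for MFC's CString.
class cB {
public:
    explicit cB(const std::string& name) : m_name(name) {}

    // Returns a reference to the member, so no temporary string is created.
    const std::string& name() const { return m_name; }

private:
    std::string m_name;
};

class cA {
public:
    explicit cA(const std::string& name) : m_b(name) {}

    // Const-correct glue: the pointer refers to cB's own buffer, which
    // stays valid for as long as this cA exists (cB has no setter), and
    // the const return type stops callers from writing through it.
    const char* getName() const { return m_b.name().c_str(); }

    // Two ways to get this wrong (left as comments):
    //   char* getName() { return const_cast<char*>(m_b.name().c_str()); }
    //     compiles, but hands the caller a writable pointer into cB's data.
    //   If name() returned std::string by value, c_str() would point into
    //   a temporary destroyed at the end of the return statement.

private:
    cB m_b;
};
```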
This is a simple "glue" method which passes back a string value. Not only is the double cast in the "old bad way" ugly, but it has a bug! The inner cast (const char*) causes the CString object to return a pointer to its internal representation of the string. The outer cast (char*) merely strips the const; it copies nothing, and the compiler accepts it without complaint. The trouble is what the pointer points into: if the CString is a temporary (for example, one returned by value from the inner object), it is destroyed at the end of the statement, taking its buffer with it. So by passing back that pointer, you are passing back garbage to the caller. It might work, but it might not. And it might work for a while, until the heap is modified, and then stop working. Quite subtle and ugly. You could fix this in the old bad way with an indirect cast, like this:
This is so ugly you just know it has to be wrong (W=UH). This doesn't have the bug of creating a temporary string, but it makes the internal object's data accessible to, and modifiable by, the caller: not good. The const char* return type wins easily. Not only is it bug-free and clean, but by defining the method with a const return type, you are telling the caller they cannot modify the value. And the compiler will enforce this, which is what you want. The takeaway: use const in parameters and return values whenever the data should not be modified, and let the compiler enforce it.
One final thought about C++ classes. This is a bit heretical, but please bear with me. It is standard C++ dogma that one should never, ever expose class data in the public interface; instead, one should always provide get and put methods to access the data. Well, yeah, but... I claim there are times when it is better to make data members public. There are three cases where this is justifiable and efficient.
First, the class is tiny and many objects are instantiated to form a larger structure. Examples include nodes in a tree, links in a chain, or entries in a table. In these cases the additional overhead of having "get" and "put" methods for each datum is not justifiable; it is far easier simply to access the members. Consider the following example, incrementing a counter in the left leaf of a tree node:
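A minimal sketch of the tree-node case, with hypothetical names: cNode exposes its members, cNodeAccessors wraps the same data in get and put methods, and each bumpLeft overload increments the count in the node's left child.

```cpp
#include <cassert>

// Public data: a tiny node, many of which make up a tree.
struct cNode {
    cNode* left  = nullptr;
    cNode* right = nullptr;
    int    count = 0;
};

// The same node with get and put methods around each datum.
class cNodeAccessors {
public:
    cNodeAccessors* getLeft() const       { return m_left; }
    void putLeft(cNodeAccessors* pLeft)   { m_left = pLeft; }
    cNodeAccessors* getRight() const      { return m_right; }
    void putRight(cNodeAccessors* pRight) { m_right = pRight; }
    int  getCount() const                 { return m_count; }
    void putCount(int count)              { m_count = count; }

private:
    cNodeAccessors* m_left  = nullptr;
    cNodeAccessors* m_right = nullptr;
    int             m_count = 0;
};

// Increment the counter in the left child: public-data version.
void bumpLeft(cNode* pNode)
{
    assert(pNode != nullptr && pNode->left != nullptr);
    pNode->left->count++;
}

// The same operation through the accessors.
void bumpLeft(cNodeAccessors* pNode)
{
    assert(pNode != nullptr && pNode->getLeft() != nullptr);
    pNode->getLeft()->putCount(pNode->getLeft()->getCount() + 1);
}
```

The accessor version needs three calls and a repeated traversal to express what the public-data version says in one statement.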
Just looking at this example, it is obvious which way is better. W=UH and all that.
Second, the class is used to encapsulate something inside another object. Typically such "internal" classes are not publicly used; the primary reason for the class is encapsulation and modularization. In such cases there is often a lot of data inside the inner object which is used by the outer object. Why impose the complexity and overhead of "get" and "put" methods? Separation of interface from storage isn't important, because both objects are likely to be updated and compiled together. The code will read better and execute faster if the outer object has direct access to the inner object's data. You will also spend less time maintaining "glue" between the classes.
Third, the class contains internal objects which are exposed in its interface. Consider a class cA which contains one or more instances of another class cB, and say that users of cA want to invoke methods on the cB objects inside it. There are two possibilities:
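A minimal sketch of the two possibilities, with hypothetical names: cAPure keeps its cB private and passes each call through, while cACool exposes the contained cB as a public member. Here cB is a small text holder invented for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical inner class with an interface of its own.
class cB {
public:
    void append(const std::string& s) { m_text += s; }
    std::size_t length() const        { return m_text.size(); }
    void clear()                      { m_text.clear(); }

private:
    std::string m_text;
};

// The "pure" way: the cB is private, so cA must pass each call through.
class cAPure {
public:
    void appendTitle(const std::string& s) { m_title.append(s); }
    std::size_t titleLength() const        { return m_title.length(); }
    void clearTitle()                      { m_title.clear(); }
    // ...and one more glue method for every cB method added later.

private:
    cB m_title;
};

// The "cool" way: the contained cB is public, so its whole interface,
// including anything added to cB later, is available directly.
class cACool {
public:
    cB title;
};
```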
Again, just looking at the code gives you a strong clue which way is preferred. In the former case you end up with a bunch of "glue" in cA which passes calls through to cB. This glue adds noise but no value. And if cB changes in some way, the pure way requires new glue in cA to make the changes visible, whereas in the cool way the interface to cB is directly available. The takeaway: in these three cases (tiny classes, internal classes, and contained objects), public data members can be better than wrapping every datum in get and put methods.
There is even more which could be said, of course (and you know me - I'll probably end up saying it); if you have thoughts, comments, suggestions, etc. please let me know!