Code Style#

The legate source code “style” is heavily based on the Google style guide and is rigorously enforced through the use of clang-format and other automatic linting tools.

While seemingly trivial at first, a unified code style helps to significantly increase the readability of a code-base. If all code is written in a similar style and flow, then a reader may begin to form implicit assumptions (which should hold), and does not need to adapt to new forms. In short, the point of a common style is not to enforce one that is “pretty” or “elegant”. It is first and foremost to improve the understand-ability of the code.

However, there are some areas where automatic enforcement is not yet sufficient, or not yet possible. These are discussed below. In all other cases the following rules apply:

  1. Some particular format is mandated by automated style-checkers.

    Unless there are extenuating circumstances (such as a bug in the style-checker, or extreme loss in readability), the word of the checker is law. You should not attempt to override the checker via e.g. // clang-format off unless it has been specifically approved by a maintainer.

  2. No particular format is mandated by automated style-checkers.

    The author should try to stay consistent with existing practice. More often than not, a similar construct exists somewhere else in the library, which the author should apply to their own situation. The point of this rule is to make future refactoring easier to do. Instead of needing to special-case future checkers for every bespoke solution (and potentially miss cases), they can be tailored to just a handful of cases.


Functions#

  1. Functions should be short and sweet. They should do one thing, and do it very well. A general rule is that if a particular function exceeds 50 lines, or 3 levels of nesting then it should be broken up into smaller pieces.

  2. Functions that are used only in the current translation unit should be moved inside anonymous namespaces.

  3. Functions should not take in-out parameters if this can be avoided. If possible, if a function produces some effect, it should return it as a value.

  4. Do not use raw pointer arguments for input parameters. Use either smart pointers, references, or, when necessary, std::reference_wrapper. Similarly, never use nullptr to indicate lack of value for a pointer-like type. If a pointer argument is optional to the caller, then wrap it in a std::optional instead. Legate assumes all pointers to be non-NULL.

  5. Functions taking out-parameters (if it cannot be avoided), should take them as a pointer, not by reference. Out-parameters must come last in the function definition. For example:

    // GOOD
    void foo(const T& in, T value, T* out_1, T* out_2);
    
    // BAD: out-param taken by reference
    void foo(T& out);
    
    // BAD: out-param must come last
    void foo(T* out, const T& in);
    
    // BAD: all out-params must come last
    void foo(const T& in, T* out, T value, T* out_2);
    

    This is to help readability at the call-site:

    SomeType f;
    
    foo(f); // Does foo() modify f? Who knows?
    foo(&f); // OK, foo() potentially modifies f
    
  6. Functions that only need to take a view of linear memory or containers (e.g. those taking a const std::vector&), should always take that view as a legate::Span<const T>. For example:

    // GOOD
    void foo(Span<const int> values);
    
    // BAD
    void foo(const std::vector<int>& values);
    

    Likewise, functions that take a const std::string& should instead take a std::string_view. Both Span and std::string_view should always be taken by value.

  7. For functions in header files, declarations and definitions must always be separate. If the function is a template (or otherwise very very small), the definition should go in the corresponding .inl. If not, the definition should go in the .cc. This also applies for any member functions of classes.

    This does not apply to translation unit-local functions defined in anonymous namespaces. These may be defined and declared in the same place.

    Note

    “Small” in this case refers to the code-gen, not the size of the source code itself. Anything that is reasonably expected to be completely optimized away is considered “small”. This is usually either:

    • Constructors, where everything is std::move()-ed (which usually end up compiling away to a bunch of pointer shuffling).

    • Getters returning some kind of reference (which end up compiling away to just returning a pointer).

    Things that are not considered “small” are:

    • Functions that throw exceptions.

    • Functions that allocate memory.

    • Functions that call other non-small functions.

    To elaborate on the rationale for this rule: the goal of defining things in an .inl is to facilitate compiler optimizations. In the snippet below:

    struct Foo {
      int get_bar() const { return this->bar_; }
    
      int bar_;
    };
    
    int foo(const Foo& f)
    {
      return f->get_bar();
    }
    

    The compiler should be able to see get_bar(), because with it, it sees that the code effectively reduces to:

    int foo(const Foo& f)
    {
      return f.bar_;
    }
    

    If, however, after unwrapping the various functions, the compiler still has to emit an indirect function call (to a function defined in the .cc), then there is no point in having it be in the .inl.

    In this case, it is better to have it defined in the .cc, where the compiler can inline the other function call, resulting in more efficient code overall.

Variables#

  1. Declarations and code should be separated by a single empty line. Separating declarations and logic helps to improve readability of the code. For example:

    // GOOD
    std::size_t SIZE = 10;
    std::vector<int> y;
    
    y.reserve(SIZE);
    
    std::size_t vec_size = y.size();
    
    // BAD: no empty lines before or after declarations
    std::size_t SIZE = 10;
    std::vector<int> y;
    y.reserve(SIZE);
    std::size_t vec_size = y.size();
    
  2. Use auto whenever you already name a type. For example, when using static_cast() (or the other casts), or when initializing variables. Additionally, use auto when the resulting type would be a large template or other such construct whose type may detract from the readability of the code. For example:

    // GOOD
    auto* x = static_cast<int*>(y);
    auto object = get_complex_type_object();
    auto sv = std::string_view{};
    
    // BAD: we already know it will be an int* from the cast
    int* x = static_cast<int*>(x);
    
    // BAD: the type of the variable both is very complex, and needlessly pollutes the code
    std::unordered_map<std::unique_ptr<SomeType, CustomDeleter>, std::deque<Foo, CustomDeleter>> object
      = get_complex_type_object();
    
    // BAD: we already know what type it is based on the constructor name
    std::string_view sv = std::string_view{}
    
  3. When using auto with pointers, they should be matched with auto*.

  4. When using auto with references, they should be matched with auto&.

  5. If it is unclear whether the return value of a function returns by const reference, value, or r-value reference, the type should be matched with auto&&.

  6. Declare variables as const whenever possible.

  7. Variables should be named readably. Unfortunately, there is no hard and fast rule to follow for this, and generally speaking it relies on a programmer’s good judgment, however a few hard rules are:

    1. Never use Hungarian typing. The compiler knows what type it is. You also know what type it is, because the function is short enough to fit on your screen (as per the short functions rule). There is no need to encode this in the name of the variable as well.

    2. Never use acronyms. You might know what “dcv” (dimension-less color vector) means now, but nobody else does, and neither will you in 4 week’s time.

    3. Use commonly understood names for common things. For example, for-loop indices should almost always be i, j, or k. If something returns an iterator, it should probably be called it, not e.g. pos, idx, or finder.

    4. Don’t use verbose names. iterator_into_vector is no more informative than e.g. it. temporary_vector is no better (arguably, it is worse) than just tmp.

Misc#

  1. Use of LEGATE_CHECK() and LEGATE_ABORT():

    LEGATE_CHECK() and LEGATE_ABORT() will abort the program if the operand evaluates to false. As such they are to be used only when catching library mistakes. They are semantically equivalent to the standard abort() macro, in that they enforce foundational pre-conditions or post-conditions which must never be violated in bug-free library code.

    For instance, they must never be used to check user-supplied values. In these cases, legate should throw an exception that can be caught and handled by the user. For example:

    void foo(int dim)
    {
      if (dim < 0) {
        // The user has given us a bad value, so we should handle this by exception.
        throw an_exception{...};
      }
    
      dim = detail::fixup_dim(dim);
      // The internal function returned a non-positive dim. This can only happen if
      // there exists a bug within legate itself, in which case the library is probably
      // in an inconsistent state and the program cannot continue.
      LEGATE_CHECK(dim > 0);
    }