Code Style#
The legate source code “style” is heavily based on the Google style guide and is rigorously enforced through the use of
clang-format
and other automatic linting tools.
While seemingly trivial at first, a unified code style helps to significantly increase the readability of a code-base. If all code is written in a similar style and flow, then a reader may begin to form implicit assumptions (which should hold), and does not need to adapt to new forms. In short, the point of a common style is not to enforce one that is “pretty” or “elegant”. It is first and foremost to improve the understand-ability of the code.
However, there are some areas where automatic enforcement is not yet sufficient, or not yet possible. These are discussed below. In all other cases the following rules apply:
Some particular format is mandated by automated style-checkers.
Unless there are extenuating circumstances (such as a bug in the style-checker, or extreme loss in readability), the word of the checker is law. You should not attempt to override the checker via e.g.
// clang-format off
unless it has been specifically approved by a maintainer.No particular format is mandated by automated style-checkers.
The author should try to stay consistent with existing practice. More often than not, a similar construct exists somewhere else in the library, which the author should apply to their own situation. The point of this rule is to make future refactoring easier to do. Instead of needing to special-case future checkers for every bespoke solution (and potentially miss cases), they can be tailored to just a handful of cases.
Functions#
Functions should be short and sweet. They should do one thing, and do it very well. A general rule is that if a particular function exceeds 50 lines, or 3 levels of nesting then it should be broken up into smaller pieces.
Functions that are used only in the current translation unit should be moved inside anonymous namespaces.
Functions should not take in-out parameters if this can be avoided. If possible, if a function produces some effect, it should return it as a value.
Do not use raw pointer arguments for input parameters. Use either smart pointers, references, or, when necessary,
std::reference_wrapper
. Similarly, never usenullptr
to indicate lack of value for a pointer-like type. If a pointer argument is optional to the caller, then wrap it in astd::optional
instead. Legate assumes all pointers to be non-NULL.Functions taking out-parameters (if it cannot be avoided), should take them as a pointer, not by reference. Out-parameters must come last in the function definition. For example:
// GOOD void foo(const T& in, T value, T* out_1, T* out_2); // BAD: out-param taken by reference void foo(T& out); // BAD: out-param must come last void foo(T* out, const T& in); // BAD: all out-params must come last void foo(const T& in, T* out, T value, T* out_2);
This is to help readability at the call-site:
SomeType f; foo(f); // Does foo() modify f? Who knows? foo(&f); // OK, foo() potentially modifies f
Functions that only need to take a view of linear memory or containers (e.g. those taking a
const std::vector&
), should always take that view as alegate::Span<const T>
. For example:// GOOD void foo(Span<const int> values); // BAD void foo(const std::vector<int>& values);
Likewise, functions that take a
const std::string&
should instead take astd::string_view
. BothSpan
andstd::string_view
should always be taken by value.For functions in header files, declarations and definitions must always be separate. If the function is a template (or otherwise very very small), the definition should go in the corresponding
.inl
. If not, the definition should go in the.cc
. This also applies for any member functions of classes.This does not apply to translation unit-local functions defined in anonymous namespaces. These may be defined and declared in the same place.
Note
“Small” in this case refers to the code-gen, not the size of the source code itself. Anything that is reasonably expected to be completely optimized away is considered “small”. This is usually either:
Constructors, where everything is
std::move()
-ed (which usually end up compiling away to a bunch of pointer shuffling).Getters returning some kind of reference (which end up compiling away to just returning a pointer).
Things that are not considered “small” are:
Functions that throw exceptions.
Functions that allocate memory.
Functions that call other non-small functions.
To elaborate on the rationale for this rule: the goal of defining things in an
.inl
is to facilitate compiler optimizations. In the snippet below:struct Foo { int get_bar() const { return this->bar_; } int bar_; }; int foo(const Foo& f) { return f->get_bar(); }
The compiler should be able to see
get_bar()
, because with it, it sees that the code effectively reduces to:int foo(const Foo& f) { return f.bar_; }
If, however, after unwrapping the various functions, the compiler still has to emit an indirect function call (to a function defined in the
.cc
), then there is no point in having it be in the.inl
.In this case, it is better to have it defined in the
.cc
, where the compiler can inline the other function call, resulting in more efficient code overall.
Variables#
Declarations and code should be separated by a single empty line. Separating declarations and logic helps to improve readability of the code. For example:
// GOOD std::size_t SIZE = 10; std::vector<int> y; y.reserve(SIZE); std::size_t vec_size = y.size(); // BAD: no empty lines before or after declarations std::size_t SIZE = 10; std::vector<int> y; y.reserve(SIZE); std::size_t vec_size = y.size();
Use
auto
whenever you already name a type. For example, when usingstatic_cast()
(or the other casts), or when initializing variables. Additionally, useauto
when the resulting type would be a large template or other such construct whose type may detract from the readability of the code. For example:// GOOD auto* x = static_cast<int*>(y); auto object = get_complex_type_object(); auto sv = std::string_view{}; // BAD: we already know it will be an int* from the cast int* x = static_cast<int*>(x); // BAD: the type of the variable both is very complex, and needlessly pollutes the code std::unordered_map<std::unique_ptr<SomeType, CustomDeleter>, std::deque<Foo, CustomDeleter>> object = get_complex_type_object(); // BAD: we already know what type it is based on the constructor name std::string_view sv = std::string_view{}
When using
auto
with pointers, they should be matched withauto*
.When using
auto
with references, they should be matched withauto&
.If it is unclear whether the return value of a function returns by
const
reference, value, or r-value reference, the type should be matched withauto&&
.Declare variables as
const
whenever possible.Variables should be named readably. Unfortunately, there is no hard and fast rule to follow for this, and generally speaking it relies on a programmer’s good judgment, however a few hard rules are:
Never use Hungarian typing. The compiler knows what type it is. You also know what type it is, because the function is short enough to fit on your screen (as per the short functions rule). There is no need to encode this in the name of the variable as well.
Never use acronyms. You might know what “
dcv
” (dimension-less color vector) means now, but nobody else does, and neither will you in 4 week’s time.Use commonly understood names for common things. For example,
for
-loop indices should almost always bei
,j
, ork
. If something returns an iterator, it should probably be calledit
, not e.g.pos
,idx
, orfinder
.Don’t use verbose names.
iterator_into_vector
is no more informative than e.g.it
.temporary_vector
is no better (arguably, it is worse) than justtmp
.
Misc#
Use of
LEGATE_CHECK()
andLEGATE_ABORT()
:LEGATE_CHECK()
andLEGATE_ABORT()
will abort the program if the operand evaluates to false. As such they are to be used only when catching library mistakes. They are semantically equivalent to the standardabort()
macro, in that they enforce foundational pre-conditions or post-conditions which must never be violated in bug-free library code.For instance, they must never be used to check user-supplied values. In these cases, legate should throw an exception that can be caught and handled by the user. For example:
void foo(int dim) { if (dim < 0) { // The user has given us a bad value, so we should handle this by exception. throw an_exception{...}; } dim = detail::fixup_dim(dim); // The internal function returned a non-positive dim. This can only happen if // there exists a bug within legate itself, in which case the library is probably // in an inconsistent state and the program cannot continue. LEGATE_CHECK(dim > 0); }