Roger Linters

The package roger provides the tools to analyse the coding style and documentation of R scripts for the automated grading system Roger the Omni Grader.

The tools are actually linters that analyse the code and documentation and report whether or not they respect good coding practices.

This page lists the linters included in roger and provides for each one a description and usage examples.

In a Nutshell

Using the linters first involves parsing the script file with the function getSourceData. The resulting object is then passed to as many linters as you wish.

srcData <- getSourceData("script.R")
assignment_style(srcData)
commas_style(srcData)
...

All linters in roger return TRUE if the code respects good coding practices, or else FALSE and a message indicating the nature of the error and the faulty lines.

The utility function all_style eases the process by running all available style linters at once.

srcData <- getSourceData("script.R")
all_style(srcData)
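As a minimal end-to-end sketch of the workflow above, the following writes a small throwaway script to a temporary file and lints it. The file contents and variable names are illustrative. The block assumes roger is installed and skips the linting step otherwise. Setting keep.source is an assumption about what roger needs in non-interactive sessions, not something stated on this page.

```r
## Create a small throwaway script containing one "=" assignment.
script <- tempfile(fileext = ".R")
writeLines(c("x = 2", "y <- c(x, 43)"), script)

## Assumption: roger is installed; otherwise this part is skipped.
if (requireNamespace("roger", quietly = TRUE)) {
    ## Assumption: the parse data requires keep.source = TRUE,
    ## which is off by default in non-interactive sessions.
    old <- options(keep.source = TRUE)
    srcData <- roger::getSourceData(script)
    ## Per the description above, this should report line 1.
    print(roger::assignment_style(srcData))
    options(old)
}
```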

Style Linters

`assignment_style`

Check that the left assign symbol (<-) is used to assign values to objects instead of the equals sign (=).

Problematic code

x = 2
z = c(x = 42, y = 43)

Correct code

x <- 2
z <- c(x = 42, y = 43)

Rationale

The correct and safe way to assign values to objects in R is with the assignment operator <- (or, in rare instances, ->). Furthermore, <- can be used anywhere, whereas = is only allowed at the top level or as one of the subexpressions in a braced list of expressions.

The second line of the problematic code shows the ambiguity that arises when using = for assignment: the first instance is indeed an assignment operator, whereas the other two are used to name arguments.

Just use <- for assignment. R-aware editors provide shortcuts to type the symbol.
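The two roles of = are visible in R's own parse data. The following is a minimal sketch using only base R (utils::getParseData). It is not roger's implementation, and the variable names are illustrative.

```r
## Parse the problematic line, keeping the source references
## that getParseData() needs.
code <- "z = c(x = 42, y = 43)"
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

## Keep the three "=" tokens, in order of appearance on the line.
eq <- pd[pd$terminal & pd$text == "=", c("col1", "token")]
eq <- eq[order(eq$col1), ]
print(eq, row.names = FALSE)
```

The parser labels the first = as EQ_ASSIGN (an assignment) and the other two as EQ_SUB (argument names), even though they look identical in the source.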

`close_brace_style`

Check that the closing braces are positioned according to standard bracing style rules.

Problematic code

foo <- function(x, y) {
    if (x > 2)
        {   z <- 3
            y <- y + z}
    x^2 + y^3 + z^4
    }

Correct code

foo <- function(x, y)
{
    if (x > 2)
    {
        z <- 3
        y <- y + z
    }
    x^2 + y^3 + z^4
}

Rationale

See open_brace_style.

`close_bracket_style`

Check that spacing around closing square brackets is valid.

Problematic code

z[1 ]
z[1,]

Correct code

z[1]
z[1, ]

Rationale

Closing brackets should not be immediately preceded by a space, unless that space is after a comma (as required by commas_style).

`close_parenthesis_style`

Check that spacing around closing parentheses is valid.

Problematic code

1 + (x + y )
for (i in 1:10 )

Correct code

1 + (x + y)
for (i in 1:10)

Rationale

Typeset parentheses in code as you would in prose. Case in point: a closing parenthesis should not be immediately preceded by a space.

`commas_style`

Check that commas are never preceded by a space and always followed by one, unless the comma ends the line.

Problematic code

z <- c(42,43)
z <- c(42 ,43)

Correct code

z <- c(42, 43)
z <- c(42,
       43)

Rationale

Spaces after commas allow code to breathe. You put spaces after commas — but never before — in prose, right? Well, just do the same in code. Spaces are cheap.
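The rule can be sketched with base R alone. The helper check_commas below is hypothetical and is not roger's implementation. It locates comma tokens through the parse data, so commas inside strings are ignored, and then inspects the characters on either side of each one.

```r
## Hypothetical helper: TRUE if every comma in `lines` has no space
## before it and is followed by a space or by the end of the line.
check_commas <- function(lines) {
    pd <- utils::getParseData(parse(text = lines, keep.source = TRUE))
    commas <- pd[pd$terminal & pd$text == ",", ]
    src <- lines[commas$line1]
    before <- substr(src, commas$col1 - 1, commas$col1 - 1)
    after <- substr(src, commas$col1 + 1, commas$col1 + 1)
    ## substr() returns "" past the end of the line.
    all(before != " " & (after == " " | after == ""))
}

check_commas("z <- c(42,43)")
check_commas(c("z <- c(42,", "       43)"))
```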

`comments_style`

Check that comment delimiters and the text of comments, when there is any, are separated by at least one space.

Problematic code

##comment
##---comment

Correct code

##
## comment
##--- comment

Rationale

Comments are just easier to read when set apart from their delimiters (one or many symbols #, possibly followed by other punctuation symbols). Spaces are cheap in comments too.

`left_parenthesis_style`

Check that spacing around left (or opening) parentheses is valid.

Problematic code

1 +(x + y)
1 + ( (x + y) + z)
for(i in 1:10)
sqrt (4)
z <- function (x) x^2

Correct code

1 + (x + y)
1 + ((x + y) + z)
for (i in 1:10)
sqrt(4)
z <- function(x) x^2

Rationale

Typeset parentheses in code as you would in prose. Case in point: a left parenthesis should always be preceded by a space, except in function calls or at the start of sub-expressions.

See open_parenthesis_style for rules regarding the space after an opening parenthesis.

`line_length_style`

Check that the length of code and comment lines does not exceed a given limit in number of characters.

Problematic code

## Some very long line of comment that should be split on multiple lines to avoid horizontal scrolling.
x <- c(x^2 + y^3 + z^4 + 3 * x * y - 6 * y * z^2 + x * z^3, x^3 + y^2 + z^3 + 3 * x * y)

Correct code

## Some very long line of comment that should be split on multiple lines
## to avoid horizontal scrolling.
x <- c(x^2 + y^3 + z^4 + 3 * x * y - 6 * y * z^2 + x * z^3,
       x^3 + y^2 + z^3 + 3 * x * y)

Rationale

Limiting line length to 70-80 characters greatly improves readability. You have a large screen? Use the space to display windows side by side, not to write long lines of code or comments.
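A line length check needs nothing more than nchar. In this minimal sketch, the limit of 80 characters and the sample lines are assumptions chosen for illustration. The limit roger actually applies is not stated on this page.

```r
## Assumed limit, at the upper end of the 70-80 range given above.
LIMIT <- 80

## Three sample lines: exactly at the limit, one over, and short.
lines <- c(strrep("#", LIMIT), strrep("#", LIMIT + 1), "x <- 42")

## Indices of the lines that exceed the limit.
too_long <- which(nchar(lines, type = "chars") > LIMIT)
print(too_long)
```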

`nomagic_style`

Check the absence of magic numbers in code.

Problematic code

fooBar <- 2^32
runif(123)
x[3] * 7 + 2

Correct code

FOOBAR <- 2^32
SIZE <- 42
runif(SIZE)
BAR <- 7
x[3] * BAR + 2

Rationale

Magic numbers are unnamed or insufficiently documented numerical constants in code. Magic numbers make programs hard to read, understand and debug. For example, in the expression y <- x + 42, the constant 42 is a magic number.

A magic number should be assigned to an appropriately named variable. In roger, the assignment of the value of a “simple” expression (see ?nomagic_style for details) to a variable all in uppercase is recognized as the assignment of a magic number. This explains why fooBar <- 2^32 is not valid, but FOOBAR <- 2^32 is.

Furthermore, the common constants -1, 0, 1, 2 and 100, and numbers used as the only expression in indexing are not considered magic. In the expression x[3] * 7 + 2, only 7 is a magic number.
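The exemption for common constants can be sketched with base R parse data. This simplified, hypothetical helper only filters out the constants listed above. It deliberately ignores the uppercase-assignment and indexing exemptions, so it is not equivalent to nomagic_style.

```r
## Constants that are never considered magic, as listed above.
ALLOWED <- c(-1, 0, 1, 2, 100)

## Hypothetical helper: numeric constants in `code` outside ALLOWED.
## Simplification: constants that as.numeric() cannot read, such as
## 1L, are skipped.
magic_candidates <- function(code) {
    pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
    consts <- pd$text[pd$token == "NUM_CONST"]
    values <- suppressWarnings(as.numeric(consts))
    values[!is.na(values) & !(values %in% ALLOWED)]
}

magic_candidates("y <- x + 42")
magic_candidates("y <- x + 2")
```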

`open_brace_style`

Check that the opening braces are positioned according to standard bracing style rules.

Problematic code

foo <- function(x, y) 
{   if (x > 2) 
      {   z <- 3
        y <- y + z
    }
    x^2 + y^3 + z^4
}

Correct code

foo <- function(x, y)
{
    if (x > 2)
    {
        z <- 3
        y <- y + z
    }
    x^2 + y^3 + z^4
}

Rationale

Bracing and indent styles are a minefield subject to holy wars. Still, using a standard and consistent style remains important as it greatly improves readability of code.

roger supports two bracing styles dubbed “R” and “1TBS”. The R bracing style — also known as Allman, BSD or C++ style — has opening and closing braces on their own lines, left aligned with their corresponding statement. This is the style used in the examples.

The 1TBS style — also widely known as K&R style — has the opening brace on the same line as its corresponding statement, separated by a space. The closing brace appears on its own line, left aligned with the statement:

foo <- function(x, y) {
    if (x > 2) {
        z <- 3
        y <- y + z
    }
    x^2 + y^3 + z^4
}

`open_brace_unique_style`

Check that only one bracing style is used throughout the script.

Problematic code

foo <- function(x, y) {
    if (x > 2) {
        z <- 3
        y <- y + z
    }
    x^2 + y^3 + z^4
}

bar <- function(x)
{
    x^2
}

Correct code

foo <- function(x, y)
{
    if (x > 2)
    {
        z <- 3
        y <- y + z
    }
    x^2 + y^3 + z^4
}

bar <- function(x)
{
    x^2
}

Or, alternatively, with the 1TBS style throughout:

foo <- function(x, y) {
    if (x > 2) {
        z <- 3
        y <- y + z
    }
    x^2 + y^3 + z^4
}

bar <- function(x) {
    x^2
}

Rationale

Using a standard bracing and indent style is good, but being consistent throughout a script (or project) is even better. This linter checks that either the R or the 1TBS style is used in the script, but not both.

See open_brace_style for additional details.
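One way to picture the consistency check is to classify every opening brace by what precedes it on its line. The helper brace_styles below is a hypothetical base R sketch, not roger's implementation. It treats a brace alone at the start of its line as R style and any other brace as 1TBS.

```r
## Hypothetical helper: the style of each opening brace in `lines`.
brace_styles <- function(lines) {
    pd <- utils::getParseData(parse(text = lines, keep.source = TRUE))
    open <- pd[pd$terminal & pd$text == "{", ]
    ## Text preceding the brace on its own line.
    before <- substr(lines[open$line1], 1, open$col1 - 1)
    ifelse(grepl("^[ \t]*$", before), "R", "1TBS")
}

## Mixed styles, as in the problematic code above.
mixed <- c("foo <- function(x) {",
           "    x^2",
           "}",
           "bar <- function(x)",
           "{",
           "    x^2",
           "}")
styles <- brace_styles(mixed)
is_unique <- length(unique(styles)) == 1
print(is_unique)
```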

`open_bracket_style`

Check that spacing around opening square brackets is valid.

Problematic code

z[ 1]
z[ , 1]

Correct code

z[1]
z[, 1]

Rationale

Similar to parentheses, opening brackets should not be immediately followed by a space.

`open_parenthesis_style`

Check that spacing around opening parentheses is valid.

Problematic code

1 + ( x + y)
for (  i in 1:10)

Correct code

1 + (x + y)
for (i in 1:10)

Rationale

Typeset parentheses in code as you would in prose. Case in point: an opening parenthesis should not be immediately followed by a space.

See also left_parenthesis_style for rules regarding the space in front of an opening parenthesis.

`ops_spaces_style`

Check that spacing around infix and unary operators is valid.

Problematic code

x+y
x<-y+ 3
x<- -2
2+!x

Correct code

x + y
x <- y + 3
x <- -2
2 + !x

Rationale

Spaces around operators are like spaces after commas: they allow the code to breathe and they improve readability.

Infix binary operators should have a space (or a line break) on both sides. As for unary operators, they should be immediately followed by their argument.

Note that the assignment operator <- is an infix binary operator.
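Beyond readability, spacing around <- can change the meaning of an expression. The short base R illustration below uses arbitrary values.

```r
x <- 5

## With spaces, this compares x with -2.
is_less <- x < -2

## Without spaces, the same characters assign 2 to x.
x<-2

print(is_less)
print(x)
```

The comparison leaves x untouched, whereas the unspaced form silently overwrites it.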

`trailing_blank_lines_style`

Check that a script file does not contain trailing blank lines.

Problematic code

-----                <- start of file marker
x + y


-----                <- end of file marker

Correct code

-----
x + y
-----

Rationale

No, superfluous empty lines at the end of a script file do no harm. But they do make for an untidy file. Just get rid of them. Good text editors can automagically do it for you.

`trailing_whitespace_style`

Check that a script file does not contain whitespace at the end of lines.

Problematic code

x + y              | <- end of line marker

Correct code

x + y|

Rationale

Just like trailing blank lines, trailing whitespace has no real impact on code; it just makes for untidy scripts. Remove unnecessary whitespace (space or tabulation) at the end of lines. Your text editor may automagically do it for you upon saving a file. Check this option and never look back.
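Both trailing checks are easy to approximate with base R. The sketch below writes an illustrative script to a temporary file, then looks for whitespace at the end of lines and for a blank last line. It is not roger's implementation.

```r
## Illustrative script: trailing spaces on line 1, two blank
## lines at the end of the file.
script <- tempfile(fileext = ".R")
writeLines(c("x + y   ", "z <- 2", "", ""), script)
lines <- readLines(script)

## Lines ending with spaces or tabs.
trailing_ws <- which(grepl("[ \t]+$", lines))

## TRUE if the last line of the file is blank.
trailing_blank <- length(lines) > 0 &&
    grepl("^[ \t]*$", lines[length(lines)])

print(trailing_ws)
print(trailing_blank)
```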

`unneeded_concatenation_style`

Check that function c is used with more than one argument.

Problematic code

x <- c()
y <- c(42)

Correct code

x <- numeric(0)
y <- 42

Rationale

In R, the role of the function c is to combine objects. It does not make sense to combine nothing or a single object, right? Then never use c with no or only one argument.

If you mean to create an empty vector, use numeric(0), logical(0) or character(0), depending on the type needed.
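A few base R checks make the point concrete. The variable names are illustrative.

```r
## c() around a single value adds nothing.
same <- identical(c(42), 42)

## c() with no argument returns NULL, not a typed empty vector.
is_null <- is.null(c())

## A typed empty vector states the intent explicitly.
x <- numeric(0)

print(c(same, is_null, is.numeric(x), length(x) == 0))
```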

Documentation Linters

Check that the functions in a script file are properly documented in comments, and that certain mandatory sections are present.

any_comments checks that the file contains non-empty comments.

any_doc checks that the file contains at least some documentation.

signature_doc checks that the signature (or usage information) of every function is present in the documentation.

section_doc checks that the documentation contains a section title matching a regular expression pattern for every function definition in the file (or at least as many such titles as there are function definitions).

formals_doc checks that the description of every formal argument is present in the documentation.

Problematic code

foo <- function(x, y = 2)
    x + y

Correct code

###
### foo(x, y = 2)
###
##  Adding two vectors
##
##  Arguments
##
##  x: a vector
##  y: another vector
##
##  Value
##
##  Sum of the two vectors.
##
##  Examples
##
##  foo(1:5)
##
foo <- function(x, y = 2)
    x + y

Rationale

For R scripts that are not part of a package with a proper help page, every top-level function should be preceded by a block of documentation in comments. The documentation should normally at least contain:

- the signature, or usage information, of the function (its name followed by all the arguments with their default values, if any);
- a short description of the function (completing the sentence “This function allows to…”);
- the list of all formal arguments with their meaning and admissible values, when pertinent;
- the value returned by the function;
- one or more examples of usage, depending on the complexity of the function.

This expected documentation format is not unlike R help pages.
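To give an idea of what such checks involve, the following base R sketch verifies that the comments of a small script contain the signature of foo and a description of each of its formal arguments. The script contents and the matching rules are illustrative assumptions and do not reproduce how signature_doc or formals_doc work.

```r
## Illustrative script: a documentation block followed by the function.
script <- c("###",
            "### foo(x, y = 2)",
            "###",
            "##  Adding two vectors",
            "##",
            "##  Arguments",
            "##",
            "##  x: a vector",
            "##  y: another vector",
            "##",
            "foo <- function(x, y = 2)",
            "    x + y")

comments <- grep("^#", script, value = TRUE)

## Evaluate the script in a scratch environment to inspect foo.
env <- new.env()
eval(parse(text = script), envir = env)
args <- names(formals(env$foo))

## Is the signature present, and is every argument described?
has_signature <- any(grepl("foo(x, y = 2)", comments, fixed = TRUE))
documented <- vapply(args, function(a)
    any(grepl(paste0("^#+\\s+", a, ":"), comments)), logical(1))

print(has_signature)
print(documented)
```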