Major R language update brings big changes

R 4.0.0 brings numerous and significant changes to syntax, strings, reference counting, grid units, and more

Version 4.0.0 of the R language for statistical computing has been released, with changes to the syntax of the language as well as features pertaining to error-checking and long vectors.

The upgrade was published on April 24. Source code for R 4.0.0 is accessible at cran.r-project.org. A GNU project, R has gathered steam with the rise of data science and machine learning, currently ranking 10th in the Tiobe Index of language popularity and seventh in the PyPL Popularity of Programming Language index.

Changes and features introduced in R 4.0.0 include:

  • A new syntax is offered for specifying _raw_ character constants, similar to the one used in C++: r"(...)" defines a literal string, where ... is any character sequence not containing )". This makes it easier to write strings containing backslashes or both single and double quotes.
  • The language now uses a stringsAsFactors = FALSE default, and thus by default no longer converts strings to factors in calls to data.frame() and read.table(). Many packages relied on the previous behavior and will need updating.
  • The S3 generic function plot() now is in package base rather than package graphics, because it is reasonable to have methods that do not use the graphics package. The generic currently is re-exported from the graphics namespace to allow packages importing it from there to keep working, but this could change in the future. Packages that define S4 generics for plot() should be re-installed, and package code using such generics from other packages must ensure they are imported rather than relying on their being found on the search path.
  • Matrix objects now also inherit from class array, so S3 methods for class array now are dispatched for matrix objects.
  • Reference counting now is used instead of the NAMED mechanism for determining when objects can be safely mutated in base C code. This reduces the need to copy in some cases and should allow future optimizations. It also is expected to help make internal code easier to maintain.
  • assertError() and assertWarning() in package tools now can check for specific error or warning classes via the new optional second argument classes.
  • DF2formula(), the utility for the data frame method of formula(), now works without parsing and explicit evaluation.
  • Long vectors now are supported as the seq argument of a for() loop.
  • data.matrix() now converts character columns to factors and from these to integers.
  • package.skeleton() now explicitly lists all exports in the NAMESPACE file.
  • The internal implementation of grid units has changed. The only visible effects at the user level should be a slightly different print format for some units, faster performance for unit operations, and two new functions, unitType() and unit.psum().
  • Printing the result of methods(..) now uses a new format() method.
  • Packages must be re-installed under the new version of R.
  • This version of R is built against the PCRE2 library for Perl-like regular expressions if available.
  • R now has the beginnings of support for C++20.
  • Time needed to start a homogeneous PSOCK cluster on localhost with many nodes has been significantly reduced.
  • There also are a number of removals of deprecated features. For example, the make macro F77_VISIBILITY has been removed and replaced with F_VISIBILITY; deprecated support for specifying C++98 for package installation has been removed; and many defunct functions have been removed from the base and methods packages.
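
The raw string syntax is easiest to appreciate side by side with the older escaped form. The following is a minimal sketch, assuming R 4.0.0 or later; the file path and the quoted sentence are made up for illustration.

```r
# Raw string literal (R >= 4.0.0): backslashes need no escaping.
# The path below is hypothetical and used only for illustration.
win_path <- r"(C:\Users\demo\data.csv)"

# The same string written with conventional escapes.
escaped <- "C:\\Users\\demo\\data.csv"

# Both literals produce the same character vector.
identical(win_path, escaped)

# Single and double quotes can both appear unescaped inside r"( ... )".
quote_mix <- r"(She said "it's fine")"
print(quote_mix)
```

Only the way the literal is written changes; the resulting object is an ordinary character vector.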
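
The change to the stringsAsFactors default and the data.matrix() conversion can be seen in a short session. This is an illustrative sketch, assuming R 4.0.0 or later; the data frames and column names are invented for the example.

```r
# Under R >= 4.0.0, character columns stay character by default.
df <- data.frame(name = c("a", "b"), score = c(1.5, 2.5))
class(df$name)

# Code that relied on the old behavior can request it explicitly.
df_old <- data.frame(name = c("a", "b"), stringsAsFactors = TRUE)
class(df_old$name)

# data.matrix() converts a character column to a factor, then to the
# factor's integer codes (levels are sorted, so "a" is 1 and "b" is 2).
dm <- data.matrix(data.frame(x = c("b", "a", "b")))
dm
```

Packages that depend on factor columns may prefer passing stringsAsFactors = TRUE explicitly, or converting the relevant columns with factor(), rather than relying on any default.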
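
The effect of S3 dispatch on class array for matrix objects can be sketched with a toy generic. The generic describe() and its methods are hypothetical names defined here only for illustration, and the behavior shown assumes R 4.0.0 or later.

```r
# In R >= 4.0.0 a matrix also inherits from class "array".
m <- diag(2)
class(m)

# Hypothetical S3 generic with an array method and a fallback.
describe <- function(x, ...) UseMethod("describe")
describe.array <- function(x, ...) {
  paste("array with", length(dim(x)), "dimensions")
}
describe.default <- function(x, ...) "not an array"

# No describe.matrix() exists, so dispatch falls through to describe.array().
describe(m)

# A plain integer vector has no dim attribute and uses the default method.
describe(1:3)
```

Code that tests class(x) == "matrix" directly may misbehave now that class() returns two values for a matrix; inherits(x, "matrix") or is.matrix(x) is a safer check.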
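
The class-aware checks in assertError() might be used along these lines. This is a sketch, assuming R 4.0.0 or later; the function fail_custom() and the condition class "myError" are hypothetical and exist only for this example.

```r
library(tools)

# Hypothetical function that signals an error carrying a custom class.
fail_custom <- function() {
  stop(errorCondition("custom failure", class = "myError"))
}

# Passes silently: the expression signals an error of class "myError".
matched <- tryCatch(
  {
    assertError(fail_custom(), classes = "myError")
    TRUE
  },
  error = function(e) FALSE
)

# A plain stop() does not carry class "myError", so assertError() itself
# signals an error, which is caught here.
mismatch <- tryCatch(
  {
    assertError(stop("plain error"), classes = "myError")
    "matched"
  },
  error = function(e) "not matched"
)

print(matched)
print(mismatch)
```

Checking for a specific class lets a test distinguish the failure it expects from an unrelated error raised by the same expression.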

Copyright © 2020 IDG Communications, Inc.