Package Philosophy

The core advantage of rio is that it makes assumptions that the user is probably willing to make. Eight of these are important:

rio uses the file extension of a file name to determine what kind of file it is. This is the same logic used by Windows OS, for example, in determining what application is associated with a given file type. By removing the need to manually match a file type (which a beginner may not recognize) to a particular import or export function, rio allows almost all common data formats to be read with the same function. And if a file extension is incorrect, users can force a particular import method by specifying the format argument.

rio uses data.table::fread() for text-delimited files to automatically determine the file format regardless of the extension. So, a CSV that is actually tab-separated will still be correctly imported. It’s also crazy fast.

rio, wherever possible, does not import character strings as factors.

rio supports web-based imports natively, including from SSL (HTTPS) URLs, from shortened URLs, from URLs that lack proper extensions, and from (public) Google Documents Spreadsheets.

rio imports from from single-file .zip and .tar archives automatically, without the need to explicitly decompress them. Export to compressed directories is also supported.

rio wraps a variety of faster, more stream-lined I/O packages than those provided by base R or the foreign package. It uses data.table for delimited formats, haven for SAS, Stata, and SPSS files, smarter and faster fixed-width file import and export routines, and readxl and openxlsx for reading and writing Excel workbooks.

rio stores metadata from rich file formats (SPSS, Stata, etc.) in variable-level attributes in a consistent form regardless of file type or underlying import function. These attributes are identified as:

label: a description of variable
labels: a vector mapping numeric values to character strings those values represent
format: a character string describing the variable storage type in the original file

The gather_attrs() function makes it easy to move variable-level attributes to the data frame level (and spread_attrs() reverses that gathering process). These can be useful, especially, during file conversion to more easily modify attributes that are handled differently across file formats. As an example, the following idiom can be used to trim SPSS value labels to the 32-character maximum allowed by Stata:

dat <- gather_attrs(rio::import("data.sav"))
attr(dat, "labels") <- lapply(attributes(dat)$labels, function(x) {
    if (!is.null(x)) {
        names(x) <- substring(names(x), 1, 32)
    }
    x
})
export(spread_attrs(dat), "data.dta")

In addition, two functions (added in v0.5.5) provide easy ways to create character and factor variables from these “labels” attributes. characterize() converts a single variable or all variables in a data frame that have “labels” attributes into character vectors based on the mapping of values to value labels. factorize() does the same but returns factor variables. This can be especially helpful for converting these rich file formats into open formats (e.g., export(characterize(import("file.dta")), "file.csv").

rio imports and exports files based on an internal S3 class infrastructure. This means that other packages can contain extensions to rio by registering S3 methods. These methods should take the form .import.rio_X() and .export.rio_X(), where X is the file extension of a file type. An example is provided in the rio.db package.