vignettes/v08-findingandformatting.Rmd
v08-findingandformatting.Rmd
This vignette explains how to find parts of a pivot table - either one or more data groups (i.e. row/column headings) or one or more cells in the body of the pivot table.
This is often useful to retrieve either a specific value/values, or to change the appearance of specific headings/cells - similar to the conditional formatting capabilities of many off-the-shelf tools.
Many of the examples in this vignette use the
setStyling()
method to format data groups or cells. This
method is described in the Styling
vignette.
A couple of the examples in this vignette use the
mapStyling()
method to format cells based on their value.
This method is also described in the Styling vignette.
This is a long vignette. Some readers may prefer to jump to the summary at the end for a more concise description of the functions and frequently used parameters.
The following pivot table is used as the basis of the examples in the rest of this vignette:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()
The findRowDataGroups()
and
findColumnDataGroups()
functions are used to find data
groups (i.e. row and/or column headings) that match specific criteria.
The functions return a list of data group objects.
These functions can operate in two different ways, specified by the
matchMode
argument.
matchMode="simple"
is used when matching only one
variable, e.g. TrainCategory=“Express Passenger”.
matchMode="combinations"
is used when matching for
combinations of variables, e.g. TrainCategory=“Express Passenger” and
PowerType=“DMU”, which would return the “DMU” data group underneath
“Express Passenger” (but not the “DMU” data group underneath “Ordinary
Passenger”). Examples of each follow below.
These functions also accept the following arguments:
variableNames
- a character vector specifying the
name/names of the variables to find1.variableValues
- a list specifying the variable names
and values to find.
totals
- a word that specifies how totals are matched
(overrides the finer settings above) - must be one of:
include
to match total and non-total data groups.exclude
to match only non-total data groups.only
to match only total data groups.calculationNames
- a character vector specifying the
name/names of the calculations to find.atLevels
- an integer or numeric vector specifying the
level numbers to restrict the search to.minChildCount
and maxChildCount
- an
integer or numeric number to restrict the search based on the number of
child groups.emptyGroups
- a character value specifying whether
empty groups are to be matched - must be one of:
include
to match empty and non-empty data groups.exclude
to match only non-empty data groups.only
to match only empty data groups.outlineGroups
- a character value specifying whether
outline groups are to be matched - must be one of:
include
to match outline and normal data groups.exclude
to match only normal data groups.only
to match only outline data groups.outlineLinkedGroupExists
- a logical value specifying
whether the group linked to the outline group either must exist
(TRUE
) or must not exist (FALSE
).includeDescendantGroups
- a logical value specifying
whether only the top-most matching data group for each row/column is
returned, or whether the descendant groups are also included.rowNumbers
- an integer vector specifying row numbers
to restrict the data groups that are matched (for
findRowDataGroups()
).columnNumbers
- an integer vector specifying column
numbers to restrict the data groups that are matched (for
findColumnDataGroups()
).cells
- a PivotCell
object or a list of
PivotCell
objects to restrict the data groups that are
matched.Several examples follow below. In each of the examples the data groups that have been found are highlighted in yellow by specifying a different style.
The examples in this section use column data groups, but all are equally applicable to row data groups.
matchMode="simple"
The following examples illustrate how the “simple” matching mode works. “simple” match mode is the default.
Find all of the data groups for the “TrainCategory” variable:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
groups <- pt$findColumnDataGroups(variableNames="TrainCategory")
pt$setStyling(groups=groups, declarations=list("background-color"="#FFFF00"))
pt$renderPivot()
Find all of the data groups for the “PowerType” variable with the values “DMU” and “HST”:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
groups <- pt$findColumnDataGroups(variableValues=list("PowerType"=c("DMU", "HST")))
pt$setStyling(groups=groups, declarations=list("background-color"="#FFFF00"))
pt$renderPivot()
Exclude totals:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
groups <- pt$findColumnDataGroups(variableNames="TrainCategory", totals="exclude")
pt$setStyling(groups=groups, declarations=list("background-color"="#FFFF00"))
pt$renderPivot()
Find only totals:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
groups <- pt$findColumnDataGroups(variableNames="TrainCategory", totals="only")
pt$setStyling(groups=groups, declarations=list("background-color"="#FFFF00"))
pt$renderPivot()
Find all of the data groups for the “TrainCategory” variable with the value “Ordinary Passenger”, including the descendant data groups:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
groups <- pt$findColumnDataGroups(
variableValues=list("TrainCategory"="Ordinary Passenger"),
includeDescendantGroup=TRUE)
pt$setStyling(groups=groups, declarations=list("background-color"="#FFFF00"))
pt$renderPivot()
To select the right-most/bottom total data group:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
groups <- pt$findColumnDataGroups(
variableValues=list("TrainCategory"="**"),
includeDescendantGroup=TRUE)
pt$setStyling(groups=groups, declarations=list("background-color"="#FFFF00"))
pt$renderPivot()
matchMode="combinations"
The following examples illustrate how the “combinations” matching mode works.
The key concept to understand here is that the filtering criteria
(i.e. the variableName(s) and variableValues) set for a data group also
apply to all descendant data groups. For example, in the example pivot
table above, the “DMU” under “Express Passenger” effectively means
WHERE ("TrainCategory"="Express Passenger") AND ("PowerType"="DMU")
.
Find all of the data groups that have filter criteria specified for both the “TrainCategory” and “Power Type” variables:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
groups <- pt$findColumnDataGroups(matchMode="combinations",
variableNames=c("TrainCategory", "PowerType"))
pt$setStyling(groups=groups, declarations=list("background-color"="#00FFFF"))
pt$renderPivot()
In the example above, the first row of headings relates only to the “TrainCategory” variable. The second row of headings relates both to the “PowerType” variable and the “TrainCategory” variable.
Find all of the data groups for the “PowerType” variable with the values “DMU” and “HST” for the “TrainCategory” of “Express Passenger”:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
groups <- pt$findColumnDataGroups(matchMode="combinations",
variableValues=list("TrainCategory"="Express Passenger", "PowerType"=c("DMU", "HST")))
pt$setStyling(groups=groups, declarations=list("background-color"="#00FFFF"))
pt$renderPivot()
In the above example, the highlighted “DMU” and “HST” data groups are subject to the “Express Passenger” filtering since they are underneath that data group.
The “combinations” match mode effectively AND’s the criteria together, i.e. the data groups must match both “TrainCategory”=“Express Passenger” AND “PowerType”=(“DMU” OR “HST”).
The “simple” match mode, by contrast, effectively OR’s the criteria together, i.e. the data groups must match either “TrainCategory”=“Express Passenger” OR “PowerType”=(“DMU” OR “HST”). Changing the match mode back to simple (but otherwise leaving the previous example unchanged):
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
groups <- pt$findColumnDataGroups(
variableValues=list("TrainCategory"="Express Passenger", "PowerType"=c("DMU", "HST")))
pt$setStyling(groups=groups, declarations=list("background-color"="#00FFFF"))
pt$renderPivot()
Another example - finding all of the “PowerType” groups under “Express Passenger”:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
groups <- pt$findColumnDataGroups(matchMode="combinations", variableNames="PowerType",
variableValues=list("TrainCategory"="Express Passenger"))
pt$setStyling(groups=groups, declarations=list("background-color"="#00FFFF"))
pt$renderPivot()
To select the sub-total data group under “Express Passenger”:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
groups <- pt$findColumnDataGroups(matchMode="combinations",
variableValues=list("TrainCategory"="Express Passenger", "PowerType"="**"))
pt$setStyling(groups=groups, declarations=list("background-color"="#00FFFF"))
pt$renderPivot()
The following example illustrates the use of the
rowNumbers
, columnNumbers
and
cells
arguments to restrict the data group search:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$addRowDataGroups("Status")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
# find the data groups that are related to column 2
grps <- pt$findColumnDataGroups(columnNumbers=2)
pt$setStyling(groups=grps, declarations=list("background-color"="yellow"))
# find the DMU data groups that are related to columns 5 to 8
grps <- pt$findColumnDataGroups(columnNumbers=5:8, variableValues=list("PowerType"="DMU"))
pt$setStyling(groups=grps, declarations=list("background-color"="orange"))
# find a totals data group that is related to the cell at (3, 7)
cells <- pt$getCell(3, 7)
pt$setStyling(cells=cells, declarations=list("background-color"="pink"))
grps <- pt$findColumnDataGroups(variableValues=list("PowerType"="**"), cells=cells)
pt$setStyling(groups=grps, declarations=list("background-color"="pink"))
# find the data groups at the second level in the hierarchy that are related to row 3
grps <- pt$findRowDataGroups(atLevel=2, rowNumbers=3)
pt$setStyling(groups=grps, declarations=list("background-color"="lightcyan"))
# find the data groups related to rows 5 to 10 with Status "A" or "R"
grps <- pt$findRowDataGroups(rowNumbers=5:10, variableValues=list("Status"=c("A", "R")))
pt$setStyling(groups=grps, declarations=list("background-color"="plum"))
# find the data groups related to row 11:13 or cells in the range (12, 4) to (15, 5)
cells <- pt$getCells(rowNumbers=12:15, columnNumbers=4:5, matchMode="combinations")
pt$setStyling(cells=cells, declarations=list("background-color"="palegreen"))
grps <- pt$findRowDataGroups(rowNumbers=11:13, cells=cells)
pt$setStyling(groups=grps, declarations=list("background-color"="palegreen"))
pt$renderPivot()
The getCells()
function is used to retrieve one or more
cells in the body of the pivot table using row/column numbers or
row/column data groups. This function has the following parameters:
specifyCellsAsList
- a logical value, usage and default
value described below.rowNumbers
- an integer vector specifying row
numbers.columnNumbers
- an integer vector specifying column
numbers.cellCoordinates
- a list of two-element vectors that
specify the coordinates of cells to retrieve. Ignored when
specifyCellsAsList=FALSE
.excludeEmptyCells
- a logical value that specifies
whether cells with no value should be excluded, default
TRUE
.groups
- a PivotDataGroup
object or a list
of PivotDataGroup
objects on either the rows or columns
axes. The cells to be retrieved must be related to at least one of these
groups.rowGroups
- a PivotDataGroup
object or a
list of PivotDataGroup
objects on the rows axis. The cells
to be retrieved must be related to at least one of these row
groups.columnGroups
- a PivotDataGroup
object or
a list of PivotDataGroup
objects on the columns axis. The
cells to be retrieved must be related to at least one of these column
groups.
rowGroups
and columnGroups
are
specified, then the cells to be retrieved must be related to at least
one of the specified row groups and one of the specified column
groups.matchMode
- either “simple” or “combinations”.
rowNumbers=1
and
columnNumbers=2
will match all cells in row 1 and all cells
in column 2.rowNumbers=1
and
columnNumbers=2
will match only the cell single at location
(1, 2).rowNumbers
, columnNumbers
,
rowGroups
and columnGroups
are affected by the
match mode. All other arguments are not.The row/column number arguments can be specified in two different
ways depending on the value of the specifyCellsAsList
argument - the default value is TRUE
.
The getCells()
function returns a list of cell
objects.
specifyCellsAsList=TRUE
To get cells when specifyCellsAsList=TRUE
(the default)
and matchMode="simple"
(the default):
rowNumbers
argument and leaving the
columnNumbers
argument set to the default value of NULL,
orcolumnNumbers
argument and leaving the
rowNumbers
argument set to the default value of NULL,
orcellCoordinates
argument as a list of vectors of length 2,
where each element in the list is the row and column number of one cell,
e.g. list(c(1, 2), c(3, 4))
specifies two cells, the first
located at row 1, column 2 and the second located at row 3, column
4.Retrieving the 2nd row, 4th column, 5th row-7th column cell and 4th row-8th column cell:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$getCells(specifyCellsAsList=TRUE,
rowNumbers=2, columnNumbers=4,
cellCoordinates=list(c(5, 7), c(4, 8)))
pt$setStyling(cells=cells, declarations=list("background-color"="#FFCC66"))
pt$renderPivot()
specifyCellsAsList=FALSE
To get cells when specifyCellsAsList=FALSE
(not the
default) and matchMode="simple"
(the default):
rowNumbers
argument and leaving the
columnNumbers
argument set to the default value of NULL,
orcolumnNumbers
argument and leaving the
rowNumbers
argument set to the default value of NULL,
orrowNumbers
and columnNumbers
arguments, orNA
, e.g. to retrieve whole rows,
specify the row numbers as the rowNumbers
but set the
corresponding elements in the columnNumbers
vector to
NA
.Retrieving the 2nd row, 4th column, 5th row-7th column cell and 4th row-8th column cell:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$getCells(specifyCellsAsList=FALSE, rowNumbers=c(2, NA, 5, 4), columnNumbers=c(NA, 4, 7, 8))
pt$setStyling(cells=cells, declarations=list("background-color"="#FFCC66"))
pt$renderPivot()
matchMode="combinations"
matchMode="combinations"
means that each cell must match
the row criteria AND the column criteria. (Contrast
this to matchMode="simple"
where each cell must match
either the row criteria OR the column criteria).
The example below illustrates the difference the
matchMode
makes:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
# "simple" match mode on rowNumbers=2 and columnNumbers=2:3
# result: all cells in both rows and columns are matched,
# i.e. each cell only needs to match one condition
# (row number = 2) OR (column number = 2 or 3)
cells <- pt$getCells(rowNumbers=2, columnNumbers=2:3, matchMode="simple")
pt$setStyling(cells=cells, declarations=list("background-color"="#FFCC66"))
# "combinations" match mode on rowNumbers=4:5 and columnNumbers=5:7
# result: does not match entire rows nor entire columns,
# i.e. each cell must match both conditions
# (row number = 4 or 5) AND (column number = 5, 6 or 7)
cells <- pt$getCells(rowNumbers=4:5, columnNumbers=5:7, matchMode="combinations")
pt$setStyling(cells=cells, declarations=list("background-color"="#FFCC66"))
pt$renderPivot()
The groups
, rowGroups
and
columnGroups
arguments provide a simple way to retrieve the
cells related to one or more data groups.
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$addRowDataGroups("Status")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
rgrps <- pt$leafRowGroups
cgrps <- pt$leafColumnGroups
# get the cells under the first two column data groups
cells <- pt$getCells(groups=cgrps[1:2])
pt$setStyling(cells=cells, declarations=list("background-color"="yellow"))
# get the cells related to the data groups in rows 2 and 4
cells <- pt$getCells(groups=list(rgrps[[2]], rgrps[[4]]))
pt$setStyling(cells=cells, declarations=list("background-color"="lightblue"))
# get the cells related to the data groups in columns 4 and 5
cells <- pt$getCells(columnGroups=cgrps[4:5])
pt$setStyling(cells=cells, declarations=list("background-color"="lightgreen"))
# get the cells related to the data groups in rows 6 and 8
cells <- pt$getCells(rowGroups=list(rgrps[[6]], rgrps[[8]]))
pt$setStyling(cells=cells, declarations=list("background-color"="pink"))
# get the cells related to the data groups in rows 11, 13 and 15, columns 7 and 8
cells <- pt$getCells(rowGroups=list(rgrps[[11]], rgrps[[13]], rgrps[[15]]),
columnGroups=list(cgrps[[7]], cgrps[[8]]),
matchMode="combinations")
pt$setStyling(cells=cells, declarations=list("background-color"="orange"))
pt$renderPivot()
The findCells()
function is used to search for cells
within the body of the pivot table matching one or more criteria. The
function returns a list of cell objects. This function has the following
parameters:
variableNames
- a character vector specifying the
variable name/names to find2.variableValues
- a list specifying the variable names
and values to find.
totals
- a word that specifies how totals are matched
(overrides the finer settings above) - must be one of:
include
to match total and non-total cells.exclude
to match only non-total cells.only
to match only total cells.calculationNames
- a character vector specifying the
name/names of the calculations to find.minValue
- a numerical value specifying a minimum value
threshold when matching cells.maxValue
- a numerical value specifying a maximum value
threshold when matching cells.exactValues
- a vector or list specifying a set of
allowed values when matching cells.valueRanges
- a vector specifying one or more value
range expressions to use when matching cells. The expression(s) can be
any of the following forms:
"v==12"
,
where v represents the cell value."10<=v<15"
.
"10<=v && v<15"
.
"is.na(v)"
.includeNA
- specify TRUE to include NA in the matched
cells, FALSE to exclude NA values.includeNull
- specify TRUE to include cells with no
value, FALSE to exclude cells with no value.emptyCells
- a word that specifies how empty cells are
matched - must be one of:
include
to match empty and non-empty cells.exclude
to match only non-empty cells.only
to match only empty cells.outlineCells
- a word that specifies how outline cells
are matched - must be one of:
include
to match outline and non-outline cells.exclude
to match only non-outline cells.only
to match only outline cells.If multiple variable names and values are specified, then
findCells()
searches for cells that match all of the
criteria - i.e. the equivalent of the combinations
match
method described above.
In addition, the following parameters can also be used with
pt$findCells()
:
rowNumbers
- an integer vector specifying row
numbers.columnNumbers
- an integer vector specifying column
numbers.cellCoordinates
- a list of two-element vectors that
specify the coordinates of cells to retrieve. Ignored when
specifyCellsAsList=FALSE
.groups
- a PivotDataGroup
object or a list
of PivotDataGroup
objects on either the rows or columns
axes. The cells to be retrieved must be related to at least one of these
groups.rowGroups
- a PivotDataGroup
object or a
list of PivotDataGroup
objects on the rows axis. The cells
to be retrieved must be related to at least one of these row
groups.columnGroups
- a PivotDataGroup
object or
a list of PivotDataGroup
objects on the columns axis. The
cells to be retrieved must be related to at least one of these column
groups.
rowGroups
and columnGroups
are
specified, then the cells to be retrieved must be related to at least
one of the specified row groups and one of the specified column
groups.cells
- a PivotCell
object or a list of
PivotCell
objects to restrict the cells that are
matched.rowColumnMatchMode
- either “simple” or “combinations”.
rowNumbers=1
and
columnNumbers=2
will match all cells in row 1 and all cells
in column 2.rowNumbers=1
and
columnNumbers=2
will match only the cell single at location
(1, 2).rowNumbers
, columnNumbers
,
rowGroups
and columnGroups
are affected by the
match mode. All other arguments are not.Broadly speaking, these additional parameters function identically to
the parameters of the same name in the pt$getCells()
method.
The following parameters control how the cells matching the criteria specified above are returned:
lowN
- an integer value specifying that only N cells
containing the lowest values should be returned, e.g.
lowN=1
will find the cell containing the
minimum value.lowN=5
will find the five cells containing
the lowest values, sorted into ascending order (lowest value
first).highN
- an integer value specifying that only N cells
containing the highest values should be returned, e.g.
highN=1
will find the cell containing the
maximum value.highN=5
will find the five cells containing
the highest values, sorted into descending order (highest value
first).Note that lowN
and highN
apply to all cell
values in the pivot table, e.g. including totals. Use the other
arguments described above to exclude the types of cell you are not
interested in, e.g. specifying totals="exclude"
to stop
totals being included.
Several examples of the above are given below.
Finding cells that reference the “PowerType” variable:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(variableNames="PowerType")
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
pt$renderPivot()
All of the cells above reference the “PowerType” variable. For the
findCells()
function, the variableNames
argument is only really used when a pivot table is constructed that has
a custom layout.
Finding cells that reference the “DMU” and “HST” values for the “PowerType” variable:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(variableValues=list("PowerType"=c("DMU", "HST")))
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
pt$renderPivot()
Finding cells that reference the “DMU” and “HST” values for the “PowerType” variable and reference the “London Midland” value for the “TOC” variable:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(variableValues=list("PowerType"=c("DMU", "HST"), "TOC"="London Midland"))
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
pt$renderPivot()
Finding only totals cells that reference the “PowerType” variable:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(variableNames="PowerType", totals="only")
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
pt$renderPivot()
In the example, probably more total cells have been matched than expected.
To explicitly match only the total columns for the “PowerType” variable, specify two asterixes as the variable value:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(variableValues=list("PowerType"="**"))
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
pt$renderPivot()
To explicitly match only the sub-total columns for the “PowerType” variable (i.e. excluding the far right TrainCategory total column), use the following:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(variableValues=list("TrainCategory"="!*", "PowerType"="**"))
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
pt$renderPivot()
To find the grand total cell:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(variableValues=list("TrainCategory"="**", "PowerType"="**", "TOC"="**"))
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
pt$renderPivot()
The findCells()
and getCells()
functions
can be used to help conditionally format the cells of a pivot table
based on the cell values (i.e. calculation values).
For example, to highlight in red those cells in the basic example pivot table that have a value between 30000 and 50000:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(minValue=30000, maxValue=50000, includeNull=FALSE, includeNA=FALSE)
pt$setStyling(cells=cells, declarations=list("background-color"="#FFC7CE", "color"="#9C0006"))
pt$renderPivot()
Another example: analysing the average arrival delay (in minutes) in the morning peak time, for the top 20 origin stations, broken down by the hour of the day (i.e. 5am, 6am, 7am, etc.). The data for these examples is derived as follows:
# calculate arrival delay information
library(dplyr)
library(lubridate)
stations <- mutate(trainstations, CrsCodeChr=as.character(CrsCode))
topOrigins <- bhmtrains %>%
mutate(OriginChr=as.character(Origin)) %>%
filter(Origin != "BHM") %>%
group_by(OriginChr) %>%
summarise(TotalTrains = n()) %>%
ungroup() %>%
top_n(20, TotalTrains)
trains <- bhmtrains %>%
mutate(OriginChr=as.character(Origin), DestinationChr=as.character(Destination)) %>%
inner_join(topOrigins, by=c("OriginChr"="OriginChr")) %>%
inner_join(stations, by=c("OriginChr"="CrsCodeChr")) %>%
inner_join(stations, by=c("DestinationChr"="CrsCodeChr")) %>%
select(TOC, TrainCategory, PowerType, Origin=StationName.x,
GbttArrival, ActualArrival, GbttDeparture, ActualDeparture) %>%
mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
GbttHourOfDay=hour(GbttDateTime),
ArrivalDeltaMins=difftime(ActualArrival, GbttArrival, units="mins"),
ArrivalDelayMins=ifelse(ArrivalDeltaMins<0, 0, ArrivalDeltaMins)) %>%
filter(GbttHourOfDay %in% c(5, 6, 7, 8, 9, 10)) %>%
select(TOC, TrainCategory, PowerType, Origin, GbttHourOfDay, ArrivalDelayMins)
The first few rows in the trains
data frame look
like:
Assume we wish to format the average arrival delay as follows:
There are two approaches we can take - either use
mapStyling()
, or use findCells()
and
setStyling()
. mapStyling()
is described in the
Styling vignette.
Using mapStyling()
with mapType="range"
is
the more concise approach:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("GbttHourOfDay")
pt$addRowDataGroups("Origin")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()", visible=FALSE)
pt$defineCalculation(calculationName="TotalDelayMins",
summariseExpression="sum(ArrivalDelayMins, na.rm=TRUE)", visible=FALSE)
pt$defineCalculation(calculationName="AvgDelayMins", type="calculation",
basedOn=c("TotalDelayMins", "TotalTrains"),
calculationExpression="values$TotalDelayMins/values$TotalTrains",
format="%.1f")
pt$evaluatePivot()
# apply the background-color styling
pt$mapStyling(cells=pt$allCells, styleProperty="background-color",
valueType="color", mapType="range",
mappings=list(0, "#C6EFCE", 2, "#FFEB9C", 4, "#FFC7CE"))
# apply the color styling
pt$mapStyling(cells=pt$allCells, styleProperty="color",
valueType="color", mapType="range",
mappings=list(0, "#006100", 2, "#9C5700", 4, "#9C0006"))
pt$renderPivot()
Alternatively, using findCells()
and
setStyling()
:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("GbttHourOfDay")
pt$addRowDataGroups("Origin")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()", visible=FALSE)
pt$defineCalculation(calculationName="TotalDelayMins",
summariseExpression="sum(ArrivalDelayMins, na.rm=TRUE)", visible=FALSE)
pt$defineCalculation(calculationName="AvgDelayMins", type="calculation",
basedOn=c("TotalDelayMins", "TotalTrains"),
calculationExpression="values$TotalDelayMins/values$TotalTrains",
format="%.1f")
pt$evaluatePivot()
# apply the green style for an average arrival delay of between 0 and 2 minutes
cells <- pt$findCells(minValue=0, maxValue=2, includeNull=FALSE, includeNA=FALSE)
pt$setStyling(cells=cells, declarations=list("background-color"="#C6EFCE", "color"="#006100"))
# apply the yellow style for an average arrival delay of between 2 and 4 minutes
cells <- pt$findCells(minValue=2, maxValue=4, includeNull=FALSE, includeNA=FALSE)
pt$setStyling(cells=cells, declarations=list("background-color"="#FFEB9C", "color"="#9C5700"))
# apply the red style for an average arrival delay of 4 minutes or greater
cells <- pt$findCells(minValue=4, includeNull=FALSE, includeNA=FALSE)
pt$setStyling(cells=cells, declarations=list("background-color"="#FFC7CE", "color"="#9C0006"))
pt$renderPivot()
The examples above uses three sets of colours. It is also possible to apply these colours as a continuous colour gradient, e.g. based on the example above an average arrival delay of 3 minutes would be formatted using a mixed yellow-green colour. Again, this can be done following the same two approaches.
Using mapStyling()
with
mapType="continuous"
is much more concise:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("GbttHourOfDay")
pt$addRowDataGroups("Origin")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()", visible=FALSE)
pt$defineCalculation(calculationName="TotalDelayMins",
summariseExpression="sum(ArrivalDelayMins, na.rm=TRUE)", visible=FALSE)
pt$defineCalculation(calculationName="AvgDelayMins", type="calculation",
basedOn=c("TotalDelayMins", "TotalTrains"),
calculationExpression="values$TotalDelayMins/values$TotalTrains",
format="%.1f")
pt$evaluatePivot()
# apply the background-color styling
pt$mapStyling(cells=pt$allCells, styleProperty="background-color",
valueType="color", mapType="continuous",
mappings=list(0, "#C6EFCE", 2, "#FFEB9C", 4, "#FFC7CE"))
# apply the color styling
pt$mapStyling(cells=pt$allCells, styleProperty="color",
valueType="color", mapType="continuous",
mappings=list(0, "#006100", 2, "#9C5700", 4, "#9C0006"))
pt$renderPivot()
Using findCells()
and setStyling()
requires
some additional helper functions to be defined, to assist with
calculating a colour in a colour gradient:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("GbttHourOfDay")
pt$addRowDataGroups("Origin")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()", visible=FALSE)
pt$defineCalculation(calculationName="TotalDelayMins",
summariseExpression="sum(ArrivalDelayMins, na.rm=TRUE)", visible=FALSE)
pt$defineCalculation(calculationName="AvgDelayMins", type="calculation",
basedOn=c("TotalDelayMins", "TotalTrains"),
calculationExpression="values$TotalDelayMins/values$TotalTrains",
format="%.1f")
pt$evaluatePivot()
# colour gradient helper functions
scaleNumber <- function(n1, n2, vMin, vMax, value) {
if(n1==n2) return(n1)
v <- value
if(v < vMin) v <- vMin
if(v > vMax) v <- vMax
if(n1<n2) {
return(n1+((v-vMin)/(vMax-vMin)*(n2-n1)))
}
else {
return(n1-((v-vMin)/(vMax-vMin)*(n1-n2)))
}
}
scale2Colours <- function(clr1, clr2, vMin, vMax, value) {
r <- round(scaleNumber(clr1$r, clr2$r, vMin, vMax, value))
g <- round(scaleNumber(clr1$g, clr2$g, vMin, vMax, value))
b <- round(scaleNumber(clr1$b, clr2$b, vMin, vMax, value))
return(paste0("#",
format(as.hexmode(r), width=2),
format(as.hexmode(g), width=2),
format(as.hexmode(b), width=2)))
}
scale3Colours <- function(clr1, clr2, clr3, vMin, vMid, vMax, value) {
if(value <= vMid) return(scale2Colours(clr1, clr2, vMin, vMid, value))
else return(scale2Colours(clr2, clr3, vMid, vMax, value))
}
hexToClr <- function(hexclr) {
clr <- list()
clr$r <- strtoi(paste0("0x", substr(hexclr, 2, 3)))
clr$g <- strtoi(paste0("0x", substr(hexclr, 4, 5)))
clr$b <- strtoi(paste0("0x", substr(hexclr, 6, 7)))
return(clr)
}
# colour constants
textClrGreen <- hexToClr("#006100")
textClrYellow <- hexToClr("#9C5700")
textClrRed <- hexToClr("#9C0006")
backClrGreen <- hexToClr("#C6EFCE")
backClrYellow <- hexToClr("#FFEB9C")
backClrRed <- hexToClr("#FFC7CE")
# specify some conditional formatting, calculating the appropriate text colour
# and back colour for each cell.
cells <- pt$findCells(includeNull=FALSE, includeNA=FALSE)
formatCell <- function(cell) {
value <- cell$rawValue
textClr <- scale3Colours(textClrGreen, textClrYellow, textClrRed, 0.5, 2, 4, value)
backClr <- scale3Colours(backClrGreen, backClrYellow, backClrRed, 0.5, 2, 4, value)
pt$setStyling(cells=cell, declarations=list("background-color"=backClr, "color"=textClr))
}
invisible(lapply(cells, formatCell))
pt$renderPivot()
The lowN
and highN
arguments allow the
cells containing the minimum/maximum values or the top/bottom N values
to be easily found.
These arguments generally need to be combined with other arguments to find the cells you are interested in as shown in the examples below.
By default, the totals are included, so just specifying
highN=1
will probably locate the grand total cell:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(highN=1)
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
pt$renderPivot()
To exclude the grand total cell, specify
totals="exclude"
:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(highN=1, totals="exclude")
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
pt$renderPivot()
The variable names and values for the cell containing the maximum value can be retrieved as a list, e.g. using the example above:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(highN=1, totals="exclude")
print(cells[[1]]$rowColFilters$filteredValues)
## $TOC
## [1] "London Midland"
##
## $TrainCategory
## [1] "Ordinary Passenger"
##
## $PowerType
## [1] "EMU"
Follow the same method as above but specify highN=3
:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(highN=3, totals="exclude")
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
pt$renderPivot()
To highlight the maximum value for a specific data group, then
specify the group value when calling pt$findCells()
,
e.g. to find the maximum value for “Express Passenger”:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(highN=1,
variableValues=list("TrainCategory"="Express Passenger"),
totals="exclude")
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
pt$renderPivot()
Combinations of values can be specified to select exactly where in the pivot table to find the maximum value, e.g. to find the maximum value for the “Ordinary Passenger” and “DMU” combination:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
cells <- pt$findCells(highN=1,
variableValues=list("TrainCategory"="Ordinary Passenger",
"PowerType"="DMU"),
totals="exclude")
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
pt$renderPivot()
To highlight the maximum value for each column, call
pt$findCells()
specifying the column number:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
for(c in 1:pt$columnCount) {
cells <- pt$findCells(highN=1, columnNumbers=c, totals="exclude")
pt$setStyling(cells=cells, declarations=list("background-color"="#00FF00"))
}
pt$renderPivot()
The example below illustrates using additional criteria (row/column numbers and row/column data groups) to restrict the cells that are searched:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$addRowDataGroups("Status")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
# get the data groups on each axis
rgrps <- pt$leafRowGroups
cgrps <- pt$leafColumnGroups
# search the first two columns for cells that aren't empty
cells <- pt$findCells(groups=cgrps[1:2], emptyCells="exclude")
pt$setStyling(cells=cells, declarations=list("background-color"="yellow"))
# search the sixth and eighth rows for cells with values greater than 10
cells <- pt$findCells(groups=list(rgrps[[6]], rgrps[[8]]), minValue=10)
pt$setStyling(cells=cells, declarations=list("background-color"="lightgreen"))
# search the cells in rows 2 to 4 in columns 7 to 8 to find non-empty cells
cells <- pt$findCells(rowNumbers=2:4, columnGroups=cgrps[7:8],
emptyCells="exclude",
rowColumnMatchMode="combinations")
pt$setStyling(cells=cells, declarations=list("background-color"="pink"))
# search the cells in rows 11 to 17 in columns 4, 5 and 7
# highlight non-empty cells in this range with value 10000 or greater
rgrps <- rgrps[11:17]
cgrps <- list(cgrps[[4]], cgrps[[5]], cgrps[[7]])
cells <- pt$findCells(rowGroups=rgrps, columnGroups=cgrps,
rowColumnMatchMode="combinations")
pt$setStyling(cells=cells, declarations=list("background-color"="blanchedalmond"))
cells <- pt$findCells(rowGroups=rgrps, columnGroups=cgrps,
rowColumnMatchMode="combinations",
minValue=10000, emptyCells="exclude")
pt$setStyling(cells=cells, declarations=list("background-color"="orange"))
# render the pivot
pt$renderPivot()
findRowDataGroups()
and
findColumnDataGroups()
functions are used to find data
groups (i.e. row and/or column headings) that match specific criteria.
variableValues
is the most commonly used parameter and
specifies the variable values (matchMode="simple"
) or
combination of values (matchMode="combinations"
) to
find.pt$findColumnDataGroups(variableValues=list("PowerType"=c("DMU", "HST")))
returns all data groups where PowerType is either DMU or HST.getCells()
function is used to retrieve one or more
cells by row/column number in the body of the pivot table.
getCells()
returns a list of cell objects.pt$getCells(specifyCellsAsList=TRUE, rowNumbers=2, columnNumbers=4, cellCoordinates=list(c(5, 7)))
returns all cells in the second row, all cells in the fourth column and
the cell in the seventh column on the fifth row.findCells()
function is used to search for cells
within the body of the pivot table matching one or more criteria.
findCells()
returns a list of cell objects.variableValues
is a commonly used parameter and
specifies the variable values to find.minValue
, maxValue
and
exactValues
parameters.lowN
and highN
parameters can be used
to find the cell or cells with the minimum value(s) and maximum
value(s).calculationNames
parameter is used to only match those
cells relating to specific calculations.pt$findCells(variableValues=list("PowerType"=c("DMU", "HST")), minValue=500, maxValue=5000)
finds all cells where PowerType is either DMU or HST and the cell value
is between 500 and 5000 inclusive.