vignettes/vA2-appendix.Rmd
vA2-appendix.Rmd
PivotFilters
objectPivotFilterOverrides
objectThis vignette describes how to construct more advanced calculations, such as percentages of totals, cumulative sums, running differences, rolling averages, and ratios/multiples.
Before reading this vignette, it may be helpful to read the Calculations and Cell Context vignettes.
Many of the examples in this vignette leave working columns
(i.e. intermediate values not of interest) visible in the pivot tables.
This is done deliberately to aid understanding. These calculations can
be made invisible by specifying visible=FALSE
as an
additional parameter as part of
pt$defineCalculation(...)
.
A filter is simply the name of a variable and one or more allowed values of that variable (e.g. TrainCategory=“Express Passenger”).
Every cell in a pivot table has a set of associated filters that specify what data is included in the calculation of the value of that cell, as explained in the Cell Context vignette.
By changing these filters it is possible to perform different sorts of calculations.
The filters associated with each cell can be made visible in the
pivot table by specifying
pt$renderPivot(includeWorkingData=TRUE)
as shown in the
example below.
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot(includeWorkingData=TRUE)
There are two possible ways to change the filters that are used when calculating values.
PivotFilters
object
When defining a calculation, it is possible to specify a
PivotFilters
object as part of the calculation. These
additional filters are applied on top of the filters already related to
that cell, effectively further restricting the subset of data that is
covered by the calculation.
This approach has been supported since v0.3.0 of the
pivottabler
package. A weakness of this method is that it
can only be used to further restrict the data - it cannot be used to
refer to data outside the scope of the current cell.
As an example of this method, the “Weekend Trains” calculation below specifies an extra filter that means only rows where the WeekdayOrWeekend variable equals “Weekend” are included in the “Weekend Trains” count.
library(dplyr)
library(lubridate)
library(pivottabler)
# get the date of each train and whether that date is a weekday or weekend
trains <- bhmtrains %>%
mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
DayNumber=wday(GbttDateTime),
WeekdayOrWeekend=ifelse(DayNumber %in% c(1,7), "Weekend", "Weekday"))
# render the pivot table
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
weekendFilter <- PivotFilters$new(pt, variableName="WeekdayOrWeekend", values="Weekend")
pt$defineCalculation(calculationName="WeekendTrains", summariseExpression="n()",
filters=weekendFilter, noDataValue=0,
caption="Weekend Trains")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()",
noDataValue=0, caption="Total Trains")
pt$defineCalculation(calculationName="WeekendTrainsPercentage",
type="calculation", basedOn=c("WeekendTrains", "TotalTrains"),
format="%.1f %%", caption="% at Weekend",
calculationExpression="ifelse(values$TotalTrains>0, values$WeekendTrains/values$TotalTrains*100, NA)")
pt$renderPivot()
PivotFilterOverrides
object
Newly added in v1.0.0 of the pivottabler
package is the
ability to specify a PivotFilterOverrides
object as part of
a calculation definition. This provides a more flexible and more
powerful way of controlling the filters used when calculating a cell
value. In particular, using a PivotFilterOverrides
object
it is possible to add new filters or change/remove existing filters -
this enables many new calculation possibilities. Many of the examples
below use this approach.
A few different options can be specified when creating a
PivotFilterOverrides
object:
removeAllFilters=TRUE|FALSE
specifies whether to clear
all existing filters for the calculation (default value is
FALSE
)keepOnlyFiltersFor=variableNames
specifies a character
vector of variable names to retain the filter criteria for. Filter
criteria for all other variables will be cleared.removeFiltersFor=variableNames
specifies a character
vector of variable names for which the filter criteria will be
cleared.overrideFunction=function
specifies a custom R function
which will be invoked for each cell to modify the filters before the
calculation is carried out.filter=filter, action="intersect"|"replace"|"union"
specifies a PivotFilter
object containing filter criteria
which will be combined with the current set of filters using the
specified combine method (explained below).variableName=variableName, type="ALL"|"VALUES"|"NONE", values=values, action="intersect"|"replace"|"union"
specifies a new filter condition which will be combined with the current
set of filters using the specified combine method (explained
below).When we talk about combining filters, we mean combining the lists of allowed values in each filter. Three different methods of combining filter criteria are supported:
The tables below provide some examples of filter overrides applied to hypothetical GbttDate and TOC variables:
Existing Criteria | Override Criteria | Combine | Resulting Criteria |
---|---|---|---|
Gbtt = 1st, 2nd, 3rd May 2018 | Gbtt = 2nd May 2018 | intersect | Gbtt = 2nd May 2018 |
Gbtt = 1st, 2nd, 3rd May 2018 | Gbtt = 2nd May 2018 | replace | Gbtt = 2nd May 2018 |
Gbtt = 1st, 2nd, 3rd May 2018 | Gbtt = 2nd May 2018 | union | Gbtt = 1st, 2nd, 3rd May 2018 |
Gbtt = 1st, 2nd, 3rd May 2018 | Gbtt = 9th May 2018 | intersect | NONE (i.e. no matching data) |
Gbtt = 1st, 2nd, 3rd May 2018 | Gbtt = 9th May 2018 | replace | Gbtt = 9th May 2018 |
Gbtt = 1st, 2nd, 3rd May 2018 | Gbtt = 9th May 2018 | union | Gbtt = 1st, 2nd, 3rd, 9th May 2018 |
Filters for different variables are combined completely independently. Where no filter criteria exists for a particular variable, then there is an implicit filter that allows ALL values, i.e. in the tables above and below, in the Existing Criteria column, there is an implicit “TOC = ALL”.
Existing Criteria | Override | Combine | Resulting Criteria |
---|---|---|---|
Gbtt = 1st, 2nd, 3rd May 2018 | TOC = c2c | intersect | Gbtt = 1st, 2nd, 3rd May 2018, TOC=c2c |
Gbtt = 1st, 2nd, 3rd May 2018 | TOC = c2c | replace | Gbtt = 1st, 2nd, 3rd May 2018, TOC=c2c |
Gbtt = 1st, 2nd, 3rd May 2018 | TOC = c2c | union | Gbtt = 1st, 2nd, 3rd May 2018, TOC=ALL |
The first example immediately below provides more explanation of the
approach of overriding the filters. The other examples have less
explanation, but the same approach of displaying the working data using
pt$renderPivot(includeWorkingData=TRUE)
can be used to
display helpful filter details with these examples too.
Objective:
To calculate the percentage of each train operating company’s (TOC’s) trains that are either of category express passenger or ordinary passenger:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()",
caption="Count")
filterOverrides <- PivotFilterOverrides$new(pt, keepOnlyFiltersFor="TOC")
pt$defineCalculation(calculationName="TOCTotalTrains", filters=filterOverrides,
summariseExpression="n()", caption="TOC Total")
pt$defineCalculation(calculationName="PercentageOfTOCTrains", type="calculation",
basedOn=c("CountTrains", "TOCTotalTrains"),
calculationExpression="values$CountTrains/values$TOCTotalTrains*100",
format="%.1f %%", caption="% of TOC")
pt$renderPivot()
In the example above, the “TOC Total” value is the same everywhere in each row. This is because the filter criteria has been overriden to remove the “TrainCategory” filter for this calculation. This can be seen by displaying the context:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()",
caption="Count")
filterOverrides <- PivotFilterOverrides$new(pt, keepOnlyFiltersFor="TOC")
pt$defineCalculation(calculationName="TOCTotalTrains", filters=filterOverrides,
summariseExpression="n()", caption="TOC Total")
pt$renderPivot(includeWorkingData=TRUE)
Objective:
The approach is similar to the previous example, except the filter override removes all of the existing filters:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count")
filterOverrides <- PivotFilterOverrides$new(pt, removeAllFilters=TRUE)
pt$defineCalculation(calculationName="GrandTotalTrains", filters=filterOverrides,
summariseExpression="n()", caption="Grand Total")
pt$defineCalculation(calculationName="PercentageOfAllTrains", type="calculation",
basedOn=c("CountTrains", "GrandTotalTrains"),
calculationExpression="values$CountTrains/values$GrandTotalTrains*100",
format="%.1f %%", caption="% of All")
pt$renderPivot()
Objective:
We wish to find the multiples of Cross Country Express Passenger:
A “CrossCountryExpress” calculation is defined with a set of filters on TrainCategory=“Express Passenger” and TOC=“CrossCountry” that replace the existing filters on these variables:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count")
filterOverrides <- PivotFilterOverrides$new(pt, removeAllFilters=TRUE)
filterOverrides$add(variableName="TrainCategory", values="Express Passenger", action="replace")
filterOverrides$add(variableName="TOC", values="CrossCountry", action="replace")
pt$defineCalculation(calculationName="CrossCountryExpress", filters=filterOverrides,
summariseExpression="n()", caption="CrossCountry Express Trains")
pt$defineCalculation(calculationName="MultipleOfCCExpressTrains", type="calculation",
basedOn=c("CountTrains", "CrossCountryExpress"),
calculationExpression="values$CountTrains/values$CrossCountryExpress",
format="%.2f", caption="Multiple of CC Express")
pt$renderPivot()
Objective:
We wish to find the percentage of trains with PowerType=“DMU”.
A “CountDMU” calculation is defined that applies an additional filter on PowerType=“DMU”, i.e. that is combined using action=intersect:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
filterDMU <- PivotFilter$new(pt, variableName="PowerType", values="DMU")
filterOverrides <- PivotFilterOverrides$new(pt, filter=filterDMU, action="intersect")
pt$defineCalculation(calculationName="CountDMU", filters=filterOverrides,
summariseExpression="n()", caption="DMU")
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count")
pt$defineCalculation(calculationName="PercentageDMU", type="calculation",
basedOn=c("CountTrains", "CountDMU"),
calculationExpression="values$CountDMU/values$CountTrains*100",
format="%.1f %%", caption="% DMU")
pt$renderPivot()
While the above examples have practical use, the following couple of examples are more for illustrative purposes to show what happens when the filter overrides are specified using the same variable as in either the row or column headings. Again, these examples follow the rules described in the table above.
In the following examples, the TOC variable makes up the row headings
and a filter override is specified where
TOC="London Midland"
. This filter override is combined in
three different ways with the existing filters from the row
headings.
intersect:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
filterTOC <- PivotFilter$new(pt, variableName="TOC", values="London Midland")
filterOverrides <- PivotFilterOverrides$new(pt, filter=filterTOC, action="intersect")
pt$defineCalculation(calculationName="CountLM", filters=filterOverrides,
summariseExpression="n()", caption="Test Count")
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count")
pt$renderPivot()
In the above example the filter override in the “Test Count”
calculation intersects the TOC criteria in every row with
TOC="London Midland"
. The intersection of
TOC="London Midland"
with anything other than
TOC="London Midland"
results in an empty set, so only the
“London Midland” row has any matching data.
Another way of looking at this is to say that it is impossible for any row in the source data frame to simultaneously have two different values for TOC, e.g. in a single data frame row, TOC cannot equal both “Arriva Trains Wales” and “London Midland”. Therefore, the only row where the “Test Count” calculation has a value is the “London Midland” row.
replace:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
filterTOC <- PivotFilter$new(pt, variableName="TOC", values="London Midland")
filterOverrides <- PivotFilterOverrides$new(pt, filter=filterTOC, action="replace")
pt$defineCalculation(calculationName="CountLM", filters=filterOverrides,
summariseExpression="n()", caption="Test Count")
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count")
pt$renderPivot()
In the above example the filter override in the “Test Count”
calculation replaces the TOC criteria in every row with
TOC="London Midland"
.
union:
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
filterTOC <- PivotFilter$new(pt, variableName="TOC", values="London Midland")
filterOverrides <- PivotFilterOverrides$new(pt, filter=filterTOC, action="union")
pt$defineCalculation(calculationName="CountLM", filters=filterOverrides,
summariseExpression="n()", caption="Test Count")
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count")
pt$renderPivot()
In the above example the filter override in the “Test Count”
calculation unions the TOC criteria in every row with
TOC="London Midland"
. So the London Midland count is
included in every row for “Test Count”, e.g. the value of “Test Count”
for “Arriva Trains Wales” includes the values for both “Arriva Trains
Wales” and “London Midland”, i.e. 3079 + 14487 = 17566.
The above examples provide different basic ways of modifying filters. For relative / more dynamic filters, a custom filter override function can be used. The package will invoke this function once per cell. Within the function it is possible to examine the existing filters and selectively change them on a cell by cell basis.
When invoked the custom function is provided with a reference to the existing filters and a reference to the current cell. In the majority of cases, working with the filters only will be sufficient. The reference to the cell is provided only for those rare cases where additional information is needed about the cell (e.g. to determine the exact cell location, whether it is a total cell, etc). No changes should be made to the cell, only to the filters.
Objective:
We wish to show the change in the number of trains compared to the previous day for the first seven days of 2017.
A custom filter override function is used to change the filter from the current date (i.e. the date in the current row) to the previous date:
library(dplyr)
trains <- bhmtrains %>%
mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
GbttDate=as.Date(GbttDateTime))
januaryDates <- seq(as.Date("2017-01-01"), as.Date("2017-01-07"), by="days")
# comparison to yesterday
# date filter function to return yesterday
getYesterdayDateFilter <- function(pt, filters, cell) {
# get the date filter
filter <- filters$getFilter("GbttDate")
if(is.null(filter)||(filter$type=="ALL")||(length(filter$values)>1)) {
# there is no filter on GbttDate in this cell
# i.e. we are in one of the total cells that covers all dates,
# so the concept of yesterday has no meaning, so block all dates
newFilter <- PivotFilter$new(pt, variableName="GbttDate", type="NONE")
filters$setFilter(newFilter, action="replace")
}
else {
# get the date value and subtract one day
date <- filter$values
date <- date - 1
filter$values <- date
}
}
# build the pivot
library(pivottabler)
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("GbttDate", fromData=FALSE,
explicitListOfValues=as.list(januaryDates), visualTotals=TRUE)
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()",
caption="Current Day Count")
filterOverrides <- PivotFilterOverrides$new(pt, overrideFunction=getYesterdayDateFilter)
pt$defineCalculation(calculationName="CountPreviousDayTrains", filters=filterOverrides,
summariseExpression="n()", caption="Previous Day Count")
pt$defineCalculation(calculationName="Daily Change", type="calculation",
basedOn=c("CountTrains", "CountPreviousDayTrains"),
calculationExpression="values$CountTrains-values$CountPreviousDayTrains",
caption="Daily Change")
pt$renderPivot()
The above example uses visualTotals=TRUE
, since the data
frame contains other dates outside of the range 1st to 7th January. If
visual totals is not enabled, then the column totals include data for
all dates, i.e. the column totals in the pivot table would be greater
than the sum of the values in the pivot table column.
Objective:
We wish to show a three-day rolling average of train count for the first seven days of 2017.
A custom filter override function is used to change the filter to include the previous, current and following date:
library(dplyr)
trains <- bhmtrains %>%
mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
GbttDate=as.Date(GbttDateTime))
januaryDates <- seq(as.Date("2017-01-01"), as.Date("2017-01-07"), by="days")
# three-day rolling average
# date filter function to a three day range of dates
getThreeDayFilter <- function(pt, filters, cell) {
# get the date filter
filter <- filters$getFilter("GbttDate")
if(is.null(filter)||(filter$type=="ALL")||(length(filter$values)>1)) {
# there is no filter on GbttDate in this cell
# i.e. we are in one of the total cells that covers all dates,
# so the concept of previous/next day has no meaning, so block all dates
newFilter <- PivotFilter$new(pt, variableName="GbttDate", type="NONE")
filters$setFilter(newFilter, action="replace")
}
else {
# get the date value and create three day filter
date <- filter$values
newDates <- seq(date-1, date+1, by="days")
filter$values <- newDates
}
}
# build the pivot
library(pivottabler)
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("GbttDate", fromData=FALSE,
explicitListOfValues=as.list(januaryDates), visualTotals=TRUE)
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()",
caption="Current Day Count")
filterOverrides <- PivotFilterOverrides$new(pt, overrideFunction=getThreeDayFilter)
pt$defineCalculation(calculationName="ThreeDayCount", filters=filterOverrides,
summariseExpression="n()", caption="Three Day Total")
pt$defineCalculation(calculationName="ThreeDayAverage", type="calculation",
basedOn="ThreeDayCount",
calculationExpression="values$ThreeDayCount/3",
format="%.1f", caption="Three Day Rolling Average")
pt$renderPivot()
The above example also uses visualTotals=TRUE
for the
same reasons as described in the previous example.
Another variation of the above that is useful when full data for the rolling average is not available for the first and last days:
library(dplyr)
library(dplyr)
# here the trains data frame does not contain data for 31st Dec and 8th Jan,
# so the rolling average for 1st Jan and 7th Jan will be incomplete.
trains <- bhmtrains %>%
mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
GbttDate=as.Date(GbttDateTime)) %>%
filter((as.Date("2017-01-01") <= GbttDate) & (GbttDate <= as.Date("2017-01-07")))
# three-day rolling average
# date filter function to a three day range of dates
getThreeDayFilter <- function(pt, filters, cell) {
# get the date filter
filter <- filters$getFilter("GbttDate")
if(is.null(filter)||(filter$type=="ALL")||(length(filter$values)>1)) {
# there is no filter on GbttDate in this cell
# i.e. we are in one of the total cells that covers all dates,
# so the concept of previous/next day has no meaning, so block all dates
newFilter <- PivotFilter$new(pt, variableName="GbttDate", type="NONE")
filters$setFilter(newFilter, action="replace")
}
else {
# get the date value and create three day filter
date <- filter$values
newDates <- seq(date-1, date+1, by="days")
filter$values <- newDates
}
}
# build the pivot
library(pivottabler)
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("GbttDate")
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()",
caption="Current Day Count")
filterOverrides <- PivotFilterOverrides$new(pt, overrideFunction=getThreeDayFilter)
pt$defineCalculation(calculationName="DaysWithDataCount", filters=filterOverrides,
summariseExpression="n_distinct(GbttDate)", caption="Days With Data")
pt$defineCalculation(calculationName="ThreeDayCount", filters=filterOverrides,
summariseExpression="n()", caption="Three Day Total")
pt$defineCalculation(calculationName="ThreeDayAverage", type="calculation",
basedOn=c("DaysWithDataCount", "ThreeDayCount"),
calculationExpression="ifelse(values$DaysWithDataCount==3, values$ThreeDayCount/3, NA)",
format="%.1f", caption="Three Day Rolling Average")
pt$renderPivot()
Objective:
We wish to show a cumulative sum of the number of trains for the first seven days of 2017.
A custom filter override function is used to change the filter to include dates between the 1st January 2017 and the current date:
library(dplyr)
trains <- bhmtrains %>%
mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
GbttDate=as.Date(GbttDateTime)) %>%
filter((as.Date("2017-01-01") <= GbttDate)&(GbttDate <= as.Date("2017-01-07")))
januaryDates <- seq(as.Date("2017-01-01"), as.Date("2017-01-07"), by="days")
# date filter function to all dates since 1st jan
getCumulativeFilter <- function(pt, filters, cell) {
# get the date filter
filter <- filters$getFilter("GbttDate")
if(is.null(filter)||(filter$type=="ALL")||(length(filter$values)>1)) {
# there is no filter on GbttDate in this cell
# i.e. we are in one of the total cells that covers all dates,
# can allow this to just be the total
}
else {
# get the date value and modify the filter
date <- filter$values
newDates <- seq(as.Date("2017-01-01"), date, by="days")
filter$values <- newDates
}
}
# build the pivot
library(pivottabler)
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("GbttDate", fromData=FALSE,
explicitListOfValues=as.list(januaryDates))
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()",
caption="Current Day Count")
filterOverrides <- PivotFilterOverrides$new(pt, overrideFunction=getCumulativeFilter)
pt$defineCalculation(calculationName="CumulativeCount", filters=filterOverrides,
summariseExpression="n()", caption="Cumulative Count")
pt$renderPivot()