Title: | Fits the Bradley-Terry Model to Potentially Large and Sparse Networks of Comparison Data |
---|---|
Description: | Facilities are provided for fitting the simple, unstructured Bradley-Terry model to networks of binary comparisons. The implemented methods are designed to scale well to large, potentially sparse, networks. A fairly high degree of scalability is achieved through the use of EM and MM algorithms, which are relatively undemanding in terms of memory usage (relative to some other commonly used methods such as iterative weighted least squares, for example). Both maximum likelihood and Bayesian MAP estimation methods are implemented. The package provides various standard methods for a newly defined 'btfit' model class, such as the extraction and summarisation of model parameters and the simulation of new datasets from a fitted model. Tools are also provided for reshaping data into the newly defined "btdata" class, and for analysing the comparison network, prior to fitting the Bradley-Terry model. This package complements, rather than replaces, the existing 'BradleyTerry2' package. (BradleyTerry2 has rather different aims, which are mainly the specification and fitting of "structured" Bradley-Terry models in which the strength parameters depend on covariates.) |
Authors: | Ella Kaye [aut, cre], David Firth [aut] |
Maintainer: | Ella Kaye <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0.9300 |
Built: | 2024-11-22 14:24:55 UTC |
Source: | https://github.com/EllaKaye/BradleyTerryScalable |
A package for fitting the Bradley-Terry model to (potentially) large and sparse data sets.
Fit the Bradley-Terry model using the EM or MM algorithm
BT_EM(W, a, b, maxit = 5000L, epsilon = 0.001)
W |
a K*K square matrix of class "dgCMatrix" |
a |
the shape parameter of the gamma prior |
b |
the rate parameter of the gamma prior |
maxit |
the maximum number of iterations |
epsilon |
controls the convergence criteria |
A list containing a K*1 matrix with the pi estimate, the N matrix, the number of iterations, and whether the algorithm converged.
Creates a btdata object, primarily for use in the btfit function.
btdata(x, return_graph = FALSE)

## S3 method for class 'btdata'
summary(object, ...)
x |
The data, which is either a three- or four-column data frame, a directed igraph object, a square matrix or a square contingency table. See Details. |
return_graph |
Logical. If TRUE, an igraph object representing the comparison graph will be returned. |
object |
An object of class "btdata", typically the result |
... |
Other arguments |
The x argument to btdata can be one of four types:
A matrix (either a base matrix or a class from the Matrix package), dimension $K$ by $K$, where $K$ is the number of items. The $i,j$-th element is $w_{ij}$, the number of times item $i$ has beaten item $j$. Ties can be accounted for by assigning half a win (i.e. 0.5) to each item.
A contingency table of class table, similar to the matrix described in the above point.
An igraph, representing the comparison graph, with the items as nodes. For the edges:
If the graph is unweighted, a directed edge from node $i$ to node $j$ for every time item $i$ has beaten item $j$.
If the graph is weighted, then one edge from node $i$ to node $j$ if item $i$ has beaten item $j$ at least once, with the weight attribute of that edge set to the number of times $i$ has beaten $j$.
If x is a data frame, it must have three or four columns:
3-column data frame: The first column contains the name of the winning item, the second column contains the name of the losing item and the third column contains the number of times that the winner has beaten the loser. Multiple entries for the same pair of items are handled correctly. If x is a three-column data frame, but the third column gives a code for who won, rather than a count, see codes_to_counts.
4-column data frame: The first column contains the name of item 1, the second column contains the name of item 2, the third column contains the number of times that item 1 has beaten item 2 and the fourth column contains the number of times item 2 has beaten item 1. Multiple entries for the same pair of items are handled correctly. This kind of data frame is also the output of codes_to_counts.
In either of these cases, the data can be aggregated, or there can be one row per comparison.
Ties can be accounted for by assigning half a win (i.e. 0.5) to each item.
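To make the relationship between the data frame and matrix inputs concrete, the following base R sketch builds a square wins matrix from a three-column data frame. The data and the names results, items and W are hypothetical, and this is only an illustration of the layout described above, not the code that btdata itself runs.

```r
# Hypothetical aggregated results: winner, loser, number of wins.
# The drawn game between A and C is recorded as half a win for each side.
results <- data.frame(
  winner = c("A", "B", "A", "C", "C"),
  loser  = c("B", "A", "C", "A", "B"),
  wins   = c(2, 1, 0.5, 0.5, 3),
  stringsAsFactors = FALSE
)

# One row and one column per item, so that the matrix is K by K.
items <- sort(unique(c(results$winner, results$loser)))
K <- length(items)
W <- matrix(0, K, K, dimnames = list(winner = items, loser = items))

# Accumulate, so that several rows for the same ordered pair are summed.
for (r in seq_len(nrow(results))) {
  i <- results$winner[r]
  j <- results$loser[r]
  W[i, j] <- W[i, j] + results$wins[r]
}
W
```

Row names are winners and column names are losers, so W["C", "B"] holds the number of times C has beaten B.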
summary.btdata shows the number of items, the density of the wins matrix and whether the underlying comparison graph is fully connected. If it is not fully connected, summary.btdata will additionally show the number of fully-connected components and a table giving the frequency of components of different sizes. For more details on the comparison graph, and how its structure affects how the Bradley-Terry model is fitted, see btfit and the vignette: https://ellakaye.github.io/BradleyTerryScalable/articles/BradleyTerryScalable.html.
An object of class "btdata", which is a list containing:
wins |
A |
components |
A list of the fully-connected components. |
graph |
The comparison graph of the data (if return_graph = TRUE). See Details. |
Ella Kaye
codes_to_counts
select_components
citations_btdata <- btdata(BradleyTerryScalable::citations)
summary(citations_btdata)
toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D"))
toy_btdata <- btdata(toy_df_4col)
summary(toy_btdata)
btfit fits the Bradley-Terry model on (potentially) large and sparse datasets.
btfit(
  btdata,
  a,
  MAP_by_component = FALSE,
  subset = NULL,
  maxit = 10000,
  epsilon = 0.001
)
btdata |
An object of class "btdata", typically the result ob of ob <- btdata(..). See |
a |
Must be >= 1. When |
MAP_by_component |
Logical. Only considered if a > 1. Then, if FALSE, the MAP estimate will be found on the full dataset. If TRUE, the MAP estimate will be found separately for each fully-connected component. |
subset |
A condition for selecting a subset of the components. This can either be a character vector of names of the components, a single predicate function (that takes a component as its argument), or a logical vector of the same length as the number of components. |
maxit |
The maximum number of iterations for the algorithm. If returning |
epsilon |
Determines when the algorithm is deemed to have converged. (See Details.) |
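Let there be $K$ items, let $\pi_k$ be the Bradley-Terry strength parameter of item $k$, for $k = 1, \ldots, K$ and let $\pi$ be the vector of all the $\pi_k$. Let $w_{ij}$ be the number of times item $i$ wins against item $j$, let $n_{ij} = w_{ij} + w_{ji}$ be the number of times they play, with $w_{ii} = 0$ by convention and let $W_i = \sum_{j=1}^K w_{ij}$. Then the Bradley-Terry model states that the probability of item $i$ beating item $j$, $p_{ij}$, is:
$$p_{ij} = \frac{\pi_i}{\pi_i + \pi_j}.$$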
The comparison graph, $G_W$, has the $K$ players as the nodes and a directed edge from node $i$ to node $j$ whenever item $i$ has beaten item $j$ at least once. The MLE of the Bradley-Terry model exists and is finite if and only if the comparison graph is fully-connected (i.e. if there is a directed path from node $i$ to node $j$ for all items $i$ and $j$).
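The existence condition above can be checked directly on a wins matrix. The following base R sketch tests whether every item can reach every other item along directed edges; the function name is_fully_connected and the two small matrices are hypothetical, and the package may well determine components differently (for example via igraph).

```r
# Returns TRUE when every item can reach every other item along directed edges.
is_fully_connected <- function(W) {
  K <- nrow(W)
  # Adjacency matrix of the comparison graph: edge i -> j when w_ij > 0.
  # Self-loops are added so that powers of A accumulate all shorter paths.
  A <- 1 * ((W > 0) | (diag(K) == 1))
  R <- A
  for (step in seq_len(K - 1)) {
    R <- 1 * ((R %*% A) > 0)   # paths of length at most step + 1
  }
  all(R == 1)
}

# A beats B, B beats C, C beats A: a directed cycle, so fully connected.
W_cycle <- matrix(c(0, 1, 0,
                    0, 0, 1,
                    1, 0, 0), 3, 3, byrow = TRUE)
# Only A beats B: no path back from B to A, and C is isolated.
W_split <- matrix(c(0, 1, 0,
                    0, 0, 0,
                    0, 0, 0), 3, 3, byrow = TRUE)

is_fully_connected(W_cycle)
is_fully_connected(W_split)
```

This dense-matrix approach is only practical for small examples; it is meant to illustrate the definition rather than to scale.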
Assuming that the comparison graph of the data is fully-connected, the MLE of the Bradley-Terry model can be found using the MM-algorithm (Hunter, 2004).
If the comparison graph of the data is not fully-connected, there are two principled options for fitting the Bradley-Terry model. One is to find the MLE within each fully-connected component. The other is to find the Bayesian MAP estimate, as suggested by Caron & Doucet (2012), where a gamma prior is placed on each $\pi_i$, and the product of these is taken as a prior on $\pi$. The MAP estimate can then be found with an EM-algorithm. When $a = 1$ and $b = 0$, the EM and MM-algorithms are equivalent and the MAP estimate and MLE are identical. The rate parameter of the Gamma prior, $b$, is not likelihood identifiable. When $a > 1$, $b$ is set to $aK - 1$, where $K$ is the number of items in the component; this choice of $b$ minimises the number of iterations needed for the algorithm to converge.
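The likelihood equations give
$$a - 1 + W_i = b\pi_i + \sum_{j \neq i} \frac{n_{ij}\pi_i}{\pi_i + \pi_j},$$
for $i = 1, \ldots, K$. For the algorithm to have converged, we want $\pi$ to be such that the LHS and RHS of this equation are close for all $i$. Therefore, we set the convergence criteria as
$$\left|\frac{a - 1 + W_i}{b\pi_i + \sum_{j \neq i} \frac{n_{ij}\pi_i}{\pi_i + \pi_j}} - 1\right| < \epsilon,$$
for all $i$.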
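As an illustration of the iteration and stopping rule described above, here is a minimal base R sketch for a single fully-connected component. It assumes the standard fixed-point update obtained by rearranging the likelihood equations, and it rescales the strengths to sum to one when b = 0; the function name bt_mm_sketch, that rescaling and the example matrix are assumptions of this sketch, not a description of the package's compiled implementation.

```r
bt_mm_sketch <- function(W, a = 1, b = 0, maxit = 10000, epsilon = 1e-3) {
  K <- nrow(W)
  N <- W + t(W)                     # n_ij: number of comparisons between i and j
  diag(N) <- 0
  W_i <- rowSums(W) - diag(W)       # total wins of item i, ignoring the diagonal
  pi_hat <- rep(1 / K, K)
  converged <- FALSE
  iter <- 0L
  for (iter in seq_len(maxit)) {
    # Fixed-point update: pi_i <- (a - 1 + W_i) / (b + sum_j n_ij / (pi_i + pi_j))
    pi_hat <- (a - 1 + W_i) / (b + rowSums(N / outer(pi_hat, pi_hat, "+")))
    # Assumption: with b = 0 the scale is arbitrary, so fix it at sum(pi) = 1.
    if (b == 0) pi_hat <- pi_hat / sum(pi_hat)
    # Right-hand side of the likelihood equations at the current estimate.
    rhs <- b * pi_hat + rowSums(N * pi_hat / outer(pi_hat, pi_hat, "+"))
    if (all(abs((a - 1 + W_i) / rhs - 1) < epsilon)) {
      converged <- TRUE
      break
    }
  }
  list(pi = pi_hat, iters = iter, converged = converged)
}

# Hypothetical wins matrix for three items, all of which have beaten each other.
W <- matrix(c(0, 3, 1,
              2, 0, 4,
              5, 1, 0), 3, 3, byrow = TRUE)
fit <- bt_mm_sketch(W)
fit$converged
```

With a = 1 and b = 0 the converged estimate makes each item's expected number of wins agree with its observed number of wins to within the relative tolerance epsilon.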
Since the equations do not typeset well within the R help window, we recommend reading this section online: https://ellakaye.github.io/BradleyTerryScalable/reference/btfit.html.
btfit returns an S3 object of class "btfit". It is a list containing the following components:
call |
The matched call |
pi |
A list of length |
iters |
A vector of length |
converged |
A logical vector of length |
N |
A list of length |
diagonal |
A list of length |
names_dimnames |
The names of the dimnames of the original |
Ella Kaye, David Firth
Caron, F. and Doucet, A. (2012) Efficient Bayesian Inference for Generalized Bradley-Terry Models. Journal of Computational and Graphical Statistics, 21(1), 174-196.
Hunter, D. R. (2004) MM Algorithms for Generalized Bradley-Terry Models. The Annals of Statistics, 32(1), 384-406.
btdata, summary.btfit, coef.btfit, fitted.btfit, btprob, vcov.btfit, simulate.btfit
citations_btdata <- btdata(BradleyTerryScalable::citations)
fit1 <- btfit(citations_btdata, 1)
summary(fit1)
toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D"))
toy_btdata <- btdata(toy_df_4col)
fit2a <- btfit(toy_btdata, 1)
summary(fit2a)
fit2b <- btfit(toy_btdata, 1.1)
summary(fit2b)
fit2c <- btfit(toy_btdata, 1, subset = function(x) length(x) > 3)
summary(fit2c)
Calculates the Bradley-Terry probabilities of each item in a fully-connected component of the comparison graph, $G_W$, winning against every other item in that component (see Details).
btprob(object, subset = NULL, as_df = FALSE)
object |
An object of class "btfit", typically the result |
subset |
A condition for selecting one or more subsets of the components. This can either be a character vector of names of the components (i.e. a subset of |
as_df |
Logical scalar, determining class of output. If |
Consider a set of $K$ items. Let the items be nodes in a graph and let there be a directed edge $(i, j)$ when $i$ has won against $j$ at least once. We call this the comparison graph of the data, and denote it by $G_W$. Assuming that $G_W$ is fully connected, the Bradley-Terry model states that the probability that item $i$ beats item $j$ is
$$p_{ij} = \frac{\pi_i}{\pi_i + \pi_j},$$
where $\pi_i$ and $\pi_j$ are positive-valued parameters representing the skills of items $i$ and $j$, for $1 \leq i, j \leq K$. The function btfit can be used to find the strength parameter $\pi$. It produces a "btfit" object that can then be passed to btprob to obtain the Bradley-Terry probabilities $p_{ij}$.
If $G_W$ is not fully connected, then a penalised strength parameter can be obtained using the method of Caron and Doucet (2012) (see btfit, with a > 1), which allows for a Bradley-Terry probability of any of the $K$ items beating any of the others. Alternatively, the MLE can be found for each fully connected component of $G_W$ (see btfit, with a = 1), and the probability of each item in each component beating any other item in that component can be found.
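For a single component, the probabilities described above follow directly from the strength parameters. A minimal base R sketch, using a hypothetical strength vector rather than output from btfit:

```r
# Hypothetical strength parameters for three items in one component.
strengths <- c(Amy = 0.5, Ben = 0.3, Cyd = 0.2)

# Element [i, j] is strengths[i] / (strengths[i] + strengths[j]),
# i.e. the probability that the row item beats the column item.
p <- outer(strengths, strengths, function(x, y) x / (x + y))

# Assumption of this sketch: self-comparisons are not meaningful, so blank them.
diag(p) <- NA
p
```

Each off-diagonal pair satisfies p[i, j] + p[j, i] = 1, as the model requires.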
If as_df = FALSE, returns a matrix where the $i,j$-th element is the Bradley-Terry probability $p_{ij}$, or, if the comparison graph, $G_W$, is not fully connected and btfit has been run with a = 1, a list of such matrices for each fully-connected component of $G_W$. If as_df = TRUE, returns a five-column data frame, where the first column is the component that the two items are in, the second column is item1, the third column is item2, the fourth column is the Bradley-Terry probability that item 1 beats item 2 and the fifth column is the Bradley-Terry probability that item 2 beats item 1. If the original btdata$wins matrix has named dimnames, these will be the colnames for columns one and two. See Details.
Ella Kaye
Bradley, R. A. and Terry, M. E. (1952). Rank analysis of incomplete block designs: 1. The method of paired comparisons. Biometrika, 39(3/4), 324-345.
Caron, F. and Doucet, A. (2012). Efficient Bayesian Inference for Generalized Bradley-Terry Models. Journal of Computational and Graphical Statistics, 21(1), 174-196.
citations_btdata <- btdata(BradleyTerryScalable::citations)
fit1 <- btfit(citations_btdata, 1)
btprob(fit1)
btprob(fit1, as_df = TRUE)
toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D"))
toy_btdata <- btdata(toy_df_4col)
fit2a <- btfit(toy_btdata, 1)
btprob(fit2a)
btprob(fit2a, as_df = TRUE)
btprob(fit2a, subset = function(x) "Amy" %in% names(x))
fit2b <- btfit(toy_btdata, 1.1)
btprob(fit2b, as_df = TRUE)
Extracted from a larger table in Stigler (1994). Inter-journal citation counts for four journals, "Biometrika", "Comm Statist.", "JASA" and "JRSS-B", as used on p448 of Agresti (2002)
citations
A four by four matrix, where the $i,j$-th element is the number of times journal $i$ has been cited by journal $j$.
In the context of paired comparisons, the 'winner' is the cited journal and the 'loser' is the one doing the citing.
This dataset also appears in the BradleyTerry2 package.
Agresti, A. (2002) Categorical Data Analysis (2nd ed.). New York: Wiley
Stigler, S. (1994) Citation patterns in the journals of statistics and probability. Statistical Science, 9, 384-406.
Convert a three-column data frame in which the third column is a code representing whether the item in column 1 won, lost or (if applicable) drew over/with the item in column 2, to a data frame with counts (suitable for use in btdata).
codes_to_counts(df, codes)
df |
A three-column data frame. Each row represents a comparison between two items. The first and second columns are the names of the first and second items respectively. The third column gives a code for which won. See Details and Examples. |
codes |
A numeric vector or character vector, of length two or three (depending on whether there are ties.) The first and second element gives the codes used if the first or second item won respectively. If there are ties, the third element gives the code used in that case. See Details and Examples. |
This function is needed in the BradleyTerryScalable workflow when the user data is stored in a three-column data frame where each row is a comparison between two items, and where the third column is NOT a count of the number of times the item in the first column beat the item in the second column. Rather, it could be that the third column is a code for which of the two items won (including the possibility of a tie), for example "W1", "W2", "D". Or else, it could be that the third column gives the score only in relation to the first item, e.g. 1 for a win, 0 for a loss or 0.5 for a draw, without the table containing the corresponding record for the second item (i.e. respectively 0 for a loss, 1 for a win and 0.5 for a draw).
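The mapping from codes to counts can be sketched in a few lines of base R. This is an illustration of the idea only: the output column names item1wins and item2wins are assumptions of the sketch, and codes_to_counts itself may validate its input and name its columns differently.

```r
# Hypothetical game-level data with a result code in the third column.
games <- data.frame(
  player1 = c("A", "A", "B", "A"),
  player2 = c("B", "B", "C", "C"),
  code    = c("W1", "W2", "D", "D"),
  stringsAsFactors = FALSE
)
codes <- c("W1", "W2", "D")   # first item won, second item won, draw

# A win scores 1 for the winner and 0 for the loser; a draw scores 0.5 each.
counts <- data.frame(
  player1   = games$player1,
  player2   = games$player2,
  item1wins = ifelse(games$code == codes[1], 1,
                     ifelse(games$code == codes[3], 0.5, 0)),
  item2wins = ifelse(games$code == codes[2], 1,
                     ifelse(games$code == codes[3], 0.5, 0))
)
counts
```

Every row contributes exactly one win in total, split between the two items, which is the four-column layout accepted by btdata.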
A four-column data frame where the first two columns are the names of the first and second item. The third and fourth columns give the wins count for the first and second item respectively: 1 for a win, 0 for a loss, and 0.5 each for a draw. This data frame is in the correct format to be passed to btdata.
Ella Kaye
first <- c("A", "A", "B", "A")
second <- c("B", "B", "C", "C")
df1 <- data.frame(player1 = first, player2 = second, code = c("W1", "W2", "D", "D"))
codes_to_counts(df1, c("W1", "W2", "D"))
df2 <- data.frame(item1 = first, item2 = second, result = c(0, 1, 1, .5))
codes_to_counts(df2, c(1, 0, .5))
df3 <- data.frame(player1 = first, player2 = second, which_won = c(1,2,2,1))
codes_to_counts(df3, c(1,2))
codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D"))
coef method for class "btfit"
## S3 method for class 'btfit'
coef(object, subset = NULL, ref = NULL, as_df = FALSE, ...)
object |
An object of class "btfit", typically the result |
subset |
A condition for selecting one or more subsets of the components. This can either be a character vector of names of the components (i.e. a subset of |
ref |
A reference item. Either a string with the item name, or the number 1, or NULL. If NULL, then the coefficients are constrained such that their mean is zero. If an item name is given, the coefficient estimates are shifted so that the coefficient for the ref item is zero. If there is more than one component, the components that do not include the ref item will be treated as if ref = NULL. If ref = 1, then the first item of each component is made the reference item. |
as_df |
Logical scalar, determining class of output. If TRUE, the function returns a data frame. If FALSE (the default), the function returns a named vector (or list of such vectors). |
... |
other arguments |
Note that the values given in the estimate column of the item_summary element are NOT the same as the values in object$pi. Rather, they are the $\lambda_i$, where $\lambda_i = \log \pi_i$. By default, these are normalised so that mean($\lambda_i$) = 0. However, if ref is not equal to NULL, then the $\lambda_i$ in the component in which ref appears are shifted to $\lambda_i - \lambda_{ref}$, for $i = 1, \ldots, K_c$, where $K_c$ is the number of items in the component in which ref appears, and $\lambda_{ref}$ is the estimate for the reference item.
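The two parameterisations described above can be illustrated with a hypothetical strength vector in base R. This is a sketch of the arithmetic only; it does not call coef.btfit.

```r
# Hypothetical strength parameters for one component.
strengths <- c(Amy = 0.5, Ben = 0.3, Cyd = 0.2)

# Coefficients are the log-strengths.
lambda <- log(strengths)

# Default (ref = NULL): centre so that the coefficients have mean zero.
lambda_centred <- lambda - mean(lambda)

# With a reference item: shift so that the reference coefficient is zero.
lambda_ref_cyd <- lambda - lambda[["Cyd"]]

lambda_centred
lambda_ref_cyd
```

Both versions differ from log(strengths) only by a constant shift, so differences between items, and hence the Bradley-Terry probabilities, are unchanged.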
If as_df = TRUE, a data frame of estimated coefficients, where the first column is the component the item is in, the second column is the item and the third column is the coefficient. If as_df = FALSE, then a numeric vector is returned if the model is fitted on the full dataset, or else a list of numeric vectors is returned, one for each fully connected component. Within each component, the items are arranged by estimate, in descending order.
Ella Kaye
citations_btdata <- btdata(BradleyTerryScalable::citations)
fit1 <- btfit(citations_btdata, 1)
coef(fit1)
toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D"))
toy_btdata <- btdata(toy_df_4col)
fit2a <- btfit(toy_btdata, 1)
coef(fit2a)
coef(fit2a, subset = function(x) length(x) > 3, as_df = TRUE)
coef(fit2a, subset = function(x) "Amy" %in% names(x))
coef(fit2a, as_df = TRUE)
fit2b <- btfit(toy_btdata, 1.1)
coef(fit2b)
coef(fit2b, ref = "Cyd")
fitted.btfit returns the fitted values from a fitted btfit model object.
## S3 method for class 'btfit'
fitted(object, subset = NULL, as_df = FALSE, ...)
object |
An object of class "btfit", typically the result |
subset |
A condition for selecting one or more subsets of the components. This can either be a character vector of names of the components (i.e. a subset of |
as_df |
Logical scalar, determining class of output. If |
... |
Other arguments |
Consider a set of $K$ items. Let the items be nodes in a graph and let there be a directed edge $(i, j)$ when $i$ has won against $j$ at least once. We call this the comparison graph of the data, and denote it by $G_W$. Assuming that $G_W$ is fully connected, the Bradley-Terry model states that the probability that item $i$ beats item $j$ is
$$p_{ij} = \frac{\pi_i}{\pi_i + \pi_j},$$
where $\pi_i$ and $\pi_j$ are positive-valued parameters representing the skills of items $i$ and $j$, for $1 \leq i, j \leq K$.
The expected, or fitted, values under the Bradley-Terry model are therefore:
$$m_{ij} = n_{ij} p_{ij},$$
where $n_{ij}$ is the number of comparisons between item $i$ and item $j$.
If there are values on the diagonal in the original btdata$wins matrix, then these appear as the values on the diagonal of the fitted matrix. These values do not appear in the data frame if the as_df argument is set to TRUE.
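The fitted values described above, the number of comparisons between two items multiplied by the Bradley-Terry win probability, can be sketched in base R as follows. The wins matrix and strength vector are hypothetical and are not output from btfit; the diagonal is carried over from the wins matrix as described.

```r
items <- c("A", "B", "C")
# Hypothetical wins matrix: W[i, j] is the number of times i has beaten j.
W <- matrix(c(0, 3, 1,
              2, 0, 4,
              5, 1, 0), 3, 3, byrow = TRUE, dimnames = list(items, items))
# Hypothetical strength parameters (not estimated from W).
strengths <- c(A = 0.4, B = 0.35, C = 0.25)

N <- W + t(W)                                        # comparisons per pair
p <- outer(strengths, strengths, function(x, y) x / (x + y))
m <- N * p                                           # expected wins of i over j
diag(m) <- diag(W)                                   # diagonal is passed through
m
```

For every pair, m[i, j] + m[j, i] equals the number of comparisons between the two items, so the fitted values redistribute the observed games between the two sides.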
The function btfit is used to fit the Bradley-Terry model. It produces a "btfit" object that can then be passed to fitted.btfit to obtain the fitted values $m_{ij}$. Note that the Bradley-Terry probabilities $p_{ij}$ can be calculated using btprob.
If $G_W$ is not fully connected, then a penalised strength parameter can be obtained using the method of Caron and Doucet (2012) (see btfit, with a > 1), which allows for a Bradley-Terry probability of any of the $K$ items beating any of the others. Alternatively, the MLE can be found for each fully-connected component of $G_W$ (see btfit, with a = 1), and the probability of each item in each component beating any other item in that component can be found.
If as_df = FALSE and the model has been fit on the full dataset, returns a matrix where the $i,j$-th element is the Bradley-Terry expected value $m_{ij}$ (see Details). Otherwise, a list of such matrices is returned, one for each fully-connected component. If as_df = TRUE, returns a five-column data frame, where the first column is the component that the two items are in, the second column is item1, the third column is item2, the fourth column, fit1, is the expected number of times that item 1 beats item 2 and the fifth column, fit2, is the expected number of times that item 2 beats item 1. If btdata$wins has named dimnames, these will be the colnames for columns one and two. Otherwise these colnames will be item1 and item2. See Details.
Ella Kaye
Bradley, R. A. and Terry, M. E. (1952). Rank analysis of incomplete block designs: 1. The method of paired comparisons. Biometrika, 39(3/4), 324-345.
Caron, F. and Doucet, A. (2012). Efficient Bayesian Inference for Generalized Bradley-Terry Models. Journal of Computational and Graphical Statistics, 21(1), 174-196.
citations_btdata <- btdata(BradleyTerryScalable::citations)
fit1 <- btfit(citations_btdata, 1)
fitted(fit1)
fitted(fit1, as_df = TRUE)
toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D"))
toy_btdata <- btdata(toy_df_4col)
fit2a <- btfit(toy_btdata, 1)
fitted(fit2a)
fitted(fit2a, as_df = TRUE)
fitted(fit2a, subset = function(x) "Amy" %in% names(x))
fit2b <- btfit(toy_btdata, 1.1)
fitted(fit2b, as_df = TRUE)
Subset a btdata object by selecting components from it.
select_components(btdata, subset, return_graph = FALSE)
btdata |
An object of class "btdata", typically the result ob of ob <- btdata(..). See |
subset |
A condition for selecting a subset of the components. This can either be a character vector of names of the components, a single predicate function (that takes a component as its argument), or a logical vector of the same length as the number of components. |
return_graph |
Logical. If TRUE, an igraph object representing the comparison graph of the selected components will be returned. |
A btdata object, which is a list containing:
wins |
A square matrix, where the |
components |
A list of the fully-connected components. The names of the list preserve the names of the original |
graph |
The comparison graph of the selected components (if return_graph = TRUE). |
Ella Kaye
toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D"))
toy_btdata <- btdata(toy_df_4col)
## The following all return the same component
select_components(toy_btdata, "3", return_graph = TRUE)
select_components(toy_btdata, function(x) length(x) == 4)
select_components(toy_btdata, function(x) "Cyd" %in% x)
select_components(toy_btdata, c(FALSE, FALSE, TRUE))
This function simulates one or more pseudo-random datasets from a specified Bradley-Terry model. Counts are simulated from independent binomial distributions, with the binomial probabilities and totals specified through the function arguments.
simulate_BT(
  pi,
  N,
  nsim = 1,
  seed = NULL,
  result_class = c("sparseMatrix", "btdata")
)

## S3 method for class 'btfit'
simulate(
  object,
  nsim = 1,
  seed = NULL,
  result_class = c("sparseMatrix", "btdata"),
  ...
)
pi |
a numeric vector, with all values finite and positive. The vector of item strengths in the Bradley-Terry model. |
N |
a symmetric, numeric matrix with dimensions the same as
|
nsim |
a scalar integer, the number of datasets to be generated. |
seed |
an object specifying if and how the random number generator should be initialized (‘seeded’). For details see |
result_class |
a character vector specifying whether the generated datasets should be of class "sparseMatrix" or of class "btdata". If not specified, the first match among those alternatives is used. |
object |
An object of class "btfit", typically the result of |
... |
Other arguments |
a list of length nsim of simulated datasets. If result_class = "sparseMatrix", the datasets are sparse matrices with the same dimensions as N. If result_class = "btdata" then the datasets are "btdata" objects. See btdata.
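The simulation scheme described for this function, independent binomial counts for each pair of items, can be sketched in base R for one dataset. The strengths and totals are hypothetical, and the sketch returns a dense matrix rather than the sparse or "btdata" results produced by simulate_BT.

```r
set.seed(1)

# Hypothetical strengths and a symmetric matrix of comparison totals.
strengths <- c(a = 0.5, b = 0.3, c = 0.2)
N <- matrix(c(0, 4, 2,
              4, 0, 3,
              2, 3, 0), 3, 3, byrow = TRUE,
            dimnames = list(names(strengths), names(strengths)))

# Win probabilities p_ij = pi_i / (pi_i + pi_j).
p <- outer(strengths, strengths, function(x, y) x / (x + y))

# Draw the wins of i over j for each pair i < j from Binomial(n_ij, p_ij).
up <- upper.tri(N)
W_sim <- matrix(0, nrow(N), ncol(N), dimnames = dimnames(N))
W_sim[up] <- rbinom(sum(up), size = N[up], prob = p[up])

# The remaining n_ij - w_ij games in each pair are wins of j over i.
W_sim <- W_sim + t(N * up - W_sim)
W_sim
```

Only one binomial draw is made per pair, so the simulated wins in each pair always add up to the specified total.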
David Firth
set.seed(1)
n <- 6
N <- matrix(rpois(n ^ 2, lambda = 1), n, n)
N <- N + t(N) ; diag(N) <- 0
p <- exp(rnorm(n)/4)
names(p) <- rownames(N) <- colnames(N) <- letters[1:6]
simulate_BT(p, N, seed = 6)
citations_btdata <- btdata(BradleyTerryScalable::citations)
fit1 <- btfit(citations_btdata, 1)
simulate(fit1, nsim = 2, seed = 1)
toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D"))
toy_btdata <- btdata(toy_df_4col)
fit2 <- btfit(toy_btdata, 1, subset = function(x) "Amy" %in% x)
fit2_sim <- simulate(fit2, nsim = 3, result_class = "btdata")
fit2_sim$sim_1
purrr::map(fit2_sim, "wins")
summary method for class "btfit"
## S3 method for class 'btfit'
summary(object, subset = NULL, ref = NULL, SE = FALSE, ...)
object |
An object of class "btfit", typically the result |
subset |
A condition for selecting one or more subsets of the components. This can either be a character vector of names of the components (i.e. a subset of |
ref |
A reference item. Either a string with the item name, or the number 1, or NULL. If NULL, then the coefficients are constrained such that their mean is zero. If an item name is given, the coefficient estimates are shifted so that the coefficient for the ref item is zero. If there is more than one component, the components that do not include the ref item will be treated as if ref = NULL. If ref = 1, then the first item of each component is made the reference item. |
SE |
Logical. Whether to include the standard error of the estimate in the |
... |
other arguments |
Note that the values given in the estimate column of the item_summary element are NOT the same as the values in object$pi. Rather, they are the $\lambda_i$, where $\lambda_i = \log \pi_i$ (i.e. the coefficients as found by coef.btfit). By default, these are normalised so that mean($\lambda_i$) = 0. However, if ref is not equal to NULL, then the $\lambda_i$ in the component in which ref appears are shifted to $\lambda_i - \lambda_{ref}$, for $i = 1, \ldots, K_c$, where $K_c$ is the number of items in the component in which ref appears, and $\lambda_{ref}$ is the estimate for the reference item.
An S3 object of class "summary.btfit". It is a list containing the following components:
item_summary |
A |
component_summary |
A |
Ella Kaye
citations_btdata <- btdata(BradleyTerryScalable::citations)
fit1 <- btfit(citations_btdata, 1)
summary(fit1)
toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D"))
toy_btdata <- btdata(toy_df_4col)
fit2a <- btfit(toy_btdata, 1)
summary(fit2a)
fit2b <- btfit(toy_btdata, 1.1)
summary(fit2b, SE = TRUE)
fit2c <- btfit(toy_btdata, 1)
summary(fit2c, subset = function(x) "Amy" %in% names(x))
summary(fit2c, subset = function(x) length(x) > 3, ref = "Amy")
BradleyTerryScalable package
A toy data set, where the underlying comparison graph of the players is not fully connected. Each row represents one game.
toy_data
A data frame with 13 rows and 3 variables:
The name of player1
The name of player2
Outcome of the game: "W1" if player1 beats player2, "W2" if player2 beats player1 and "D" if it was a draw.
vcov method for class "btfit"
## S3 method for class 'btfit'
vcov(object, subset = NULL, ref = NULL, ...)
object |
An object of class "btfit", typically the result |
subset |
A condition for selecting one or more subsets of the components. This can either be a character vector of names of the components (i.e. a subset of |
ref |
A reference item. Either a string with the item name, or the number 1, or NULL. If NULL, then the coefficients are constrained such that their mean is zero. If an item name is given, the coefficient estimates are shifted so that the coefficient for the ref item is zero. If there is more than one component, the components that do not include the ref item will be treated as if ref = NULL. If ref = 1, then the first item of each component is made the reference item. |
... |
other arguments |
N.B. this can be slow when there are a large number of items in any component.
A square numeric matrix, which is a non-full-rank variance-covariance matrix for the estimates in coef(object, subset = subset, ref = ref); or a list of such matrices if object has more than one component. The rows and columns of the matrix (or matrices) are arranged in the same order as the object$pi vector(s).
David Firth, Ella Kaye
btfit, coef.btfit, summary.btfit
citations_btdata <- btdata(BradleyTerryScalable::citations)
fit1 <- btfit(citations_btdata, 1)
vcov(fit1)
toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D"))
toy_btdata <- btdata(toy_df_4col)
fit2a <- btfit(toy_btdata, 1)
vcov(fit2a)
vcov(fit2a, subset = function(x) length(x) > 3)
vcov(fit2a, subset = function(x) "Cyd" %in% names(x))
fit2b <- btfit(toy_btdata, 1.1)
vcov(fit2b, ref = "Cyd")