Title: | Dr. Small's Functions |
---|---|
Description: | Functions used in courses taught by Dr. Small at Drew University. |
Authors: | Ellie Small [aut, cre] |
Maintainer: | Ellie Small <[email protected]> |
License: | GPL-3 |
Version: | 1.0.3 |
Built: | 2025-02-01 03:42:40 UTC |
Source: | https://github.com/cran/smallstuff |
Functions used by students in the Data Science master's program at Drew University.
Some functions are used for Statistics using R, such as pop.var (calculates the population variance) and outliers (finds the outliers in a distribution along with their indices); some for Applied Regression Analysis, such as projMatrix (calculates the projection matrix) and systemEq (solves a system of linear equations); some for Machine Learning, such as lmSub (finds the best linear model in subset selection); and some for Networks, such as get_subgraphs (splits a graph into subgraphs).
Ellie Small, [email protected].
Plot the span of a matrix plus any vectors in a 3D plot at one or more angles. A plot is produced for each entry of th.
allspan3D(M, V = NULL, th = c(-90, -45, 0, 45, 90, 135), V2 = NULL, col = NULL)
M | Matrix for which the span should be shown. |
V | Either NULL, a vector of length 3, or a matrix with each column a vector of length 3. |
th | A vector of horizontal angles at which the plot should be shown; one plot is produced per angle. |
V2 | A matrix or vector of the same dimensions as V indicating the starting points of the vectors in V (default is the origin for all). |
col | Vector colors; if entered, must have a value for each vector. |
No return value, called for side effects
M=matrix(c(1,2,4,3,0,2),3)
oldpar <- par(mfrow=c(3,2))
allspan3D(M,cbind(M,M[,1]-M[,2]),V2=matrix(c(rep(0,6),M[,2]),3),col=c(2,2,1))
par(oldpar)
Plot one or more vectors in a 3D plot at one or more angles. A plot is produced for each entry of th.
allvectors3D(V, th = c(0, 30, 60, 90, 120, 150), V2 = NULL, col = NULL)
V | Either a vector of length 3 or a matrix with each column a vector of length 3. |
th | A vector indicating the angles at which the plot should be shown. |
V2 | A matrix or vector of the same dimensions as V indicating the starting points of the vectors in V (default is the origin for all). |
col | Vector colors; if entered, must have a value for each vector. |
No return value, called for side effects
a=c(2,4,8)
b=c(6,0,4)
oldpar <- par(mfrow=c(3,2))
allvectors3D(cbind(a,b,a-b),V2=matrix(c(rep(0,6),b),3))
par(oldpar)
Create an adjacency matrix using the definition, i.e., an entry equals 1 if there is an edge from the vertex in the column to the vertex in the row, and loops (edges from a vertex to itself) are counted twice.
as_adj_def(g, ...)
g | the graph (an igraph object) |
... | additional arguments to be passed to the igraph function |
Adjacency matrix for graph g
g=igraph::graph_from_literal(1-2,2-2:3:3:4,3-4:5:6,5-1:1:1,6-6,simplify=FALSE)
as_adj_def(g)
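The definition can be illustrated without igraph. The sketch below is an assumption, not the package's code: it takes a hypothetical two-column edge list for an undirected multigraph and adds 1 to both (u, v) and (v, u) for every edge, so a loop adds 2 to its diagonal entry. Parallel edges are counted individually here, which is also an assumption. The helper name adj_from_edges is hypothetical.

```r
# Hypothetical helper: adjacency matrix "by definition" for an undirected
# multigraph given as a two-column edge list (not an igraph object).
adj_from_edges <- function(edges, n) {
  A <- matrix(0L, n, n)
  for (i in seq_len(nrow(edges))) {
    u <- edges[i, 1]
    v <- edges[i, 2]
    # Each edge adds 1 to (u, v) and to (v, u); for a loop (u == v) both
    # updates hit the same diagonal entry, so the loop is counted twice.
    A[u, v] <- A[u, v] + 1L
    A[v, u] <- A[v, u] + 1L
  }
  A
}

# The edges of the as_adj_def example, listed by hand
edges <- rbind(c(1, 2), c(2, 2), c(2, 3), c(2, 3), c(2, 4),
               c(3, 4), c(3, 5), c(3, 6),
               c(5, 1), c(5, 1), c(5, 1), c(6, 6))
A <- adj_from_edges(edges, 6)
A
```

With this convention each row sum equals the vertex degree (vertex 2 has degree 6: one edge to 1, a loop, two edges to 3, one edge to 4).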
Confidence interval for a normally distributed sample mean
CI(x = 0, s = 1, n = 1, level = 0.95)
x | sample mean |
s | standard deviation |
n | sample size |
level | confidence level |
vector with two values containing the confidence interval for the sample mean
CI()
CI(150,5,30,.9)
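A minimal sketch of a normal-theory interval, x +/- z * s / sqrt(n), where z is the standard normal quantile for the chosen level. Using a z quantile (rather than a t quantile) is an assumption about what CI computes; ci_sketch is a hypothetical name.

```r
# Hypothetical sketch: normal-theory interval x +/- z * s / sqrt(n).
# Assumption: a standard normal quantile is used; the package may differ.
ci_sketch <- function(x = 0, s = 1, n = 1, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 when level = 0.95
  x + c(-1, 1) * z * s / sqrt(n)
}

ci_sketch()                 # about -1.96 and 1.96
ci_sketch(150, 5, 30, 0.9)  # 150 -/+ qnorm(0.95) * 5 / sqrt(30)
```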
Plot a coordinate system in 2D with the origin in the center.
coord2D(x = 5, y = 5)
x | Distance from the origin to the maximum x-value. |
y | Distance from the origin to the maximum y-value. |
No return value, called for side effects
coord2D()
Plot a coordinate system in 3D with the origin at the bottom left.
coord3D(th = 0, x = 10, y = 10, z = 10)
th | The angle at which the 3D plot should be displayed. |
x | Distance from the origin to the maximum x-value. |
y | Distance from the origin to the maximum y-value. |
z | Distance from the origin to the maximum z-value. |
A matrix containing the plot coordinates (used when adding features).
coord3D()
Determine whether edges in a graph cross groups or stay within groups. This is similar to the crossing function in igraph, but uses a vector for the split rather than a communities object.
crossing2(split, g)
split | a vector with a value for each vertex in the graph |
g | an igraph object |
A logical vector indicating for each edge if it crosses groups or not. For each edge that crosses, it is TRUE, otherwise it is FALSE.
g=igraph::graph_from_literal(1-2,2-3:4,3-4:5:6,5-1)
split=c("A","A","B","B","A","B")
igraph::V(g);split
igraph::E(g);crossing2(split,g)
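The test itself is a comparison of group labels at the two ends of each edge. The base-R sketch below works on a hypothetical hand-written edge list rather than an igraph object, so its edge order need not match igraph's E(g); with igraph, the endpoint matrix could be obtained with igraph::ends(). The name crossing_sketch is hypothetical.

```r
# Hypothetical sketch: an edge crosses groups when its two endpoints have
# different values in split. edges is a two-column matrix of vertex indices.
crossing_sketch <- function(split, edges) {
  split[edges[, 1]] != split[edges[, 2]]
}

edges <- rbind(c(1, 2), c(2, 3), c(2, 4), c(3, 4), c(3, 5), c(3, 6), c(5, 1))
split <- c("A", "A", "B", "B", "A", "B")
crossing_sketch(split, edges)  # FALSE TRUE TRUE FALSE TRUE FALSE FALSE
```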
Calculate the k-fold cross-validation (CV) error rate for a logistic regression model (fitted with glm), an LDA model, or a QDA model, given the number of folds k.
CVerror(mod, k = nrow(stats::model.frame(mod)))
mod | A logistic regression, LDA, or QDA model |
k | Number of folds; by default LOOCV will be returned |
The k-fold CV error rate if k is entered, otherwise the LOOCV error rate.
mtcars$am=as.factor(mtcars$am)
gmod=glm(am~mpg,binomial,mtcars)
CVerror(gmod)
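k-fold CV splits the rows into k folds, fits the model on the other k - 1 folds, and counts misclassifications on the held-out fold; with k equal to the number of rows this is LOOCV. The sketch below is hypothetical: it refits a glm from a formula instead of taking a fitted model, assigns folds at random with a fixed seed, and classifies with a 0.5 probability cutoff. The package's fold assignment and cutoff may differ.

```r
# Hypothetical sketch of k-fold CV for logistic regression.
cv_logit_sketch <- function(formula, data, k = nrow(data), seed = 1) {
  set.seed(seed)  # makes the random fold assignment reproducible
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  resp <- model.response(model.frame(formula, data))  # factor response
  wrong <- 0
  for (f in seq_len(k)) {
    test <- folds == f
    fit <- glm(formula, family = binomial, data = data[!test, ])
    prob <- predict(fit, newdata = data[test, ], type = "response")
    # Predict the second factor level when the probability exceeds 0.5
    pred <- ifelse(prob > 0.5, levels(resp)[2], levels(resp)[1])
    wrong <- wrong + sum(pred != as.character(resp[test]))
  }
  wrong / nrow(data)
}

mt <- mtcars
mt$am <- as.factor(mt$am)
cv_logit_sketch(am ~ mpg, mt)         # LOOCV (k = n)
cv_logit_sketch(am ~ mpg, mt, k = 5)  # 5-fold CV
```

With k = n every fold holds one row, so the LOOCV result does not depend on the seed.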
Given a dataset with predictors and a vector with responses, a number of neighbors K, and a number of folds k, the k-fold CV error rate for KNN is calculated.
CVerrorknn(pred, resp, K = 1, k = nrow(pred))
pred | A dataset with predictors |
resp | A vector with responses |
K | The number of neighbors to consider when performing KNN |
k | The number of folds |
The k-fold CV error rate if k is entered, otherwise the LOOCV error rate.
mtcars$am=as.factor(mtcars$am)
CVerrorknn(mtcars[,c("mpg","hp")],mtcars$am)
Given a formula, a dataset and a subset, retrieve the dataset that fulfills the formula and subset.
dataSet(formula, data, subset = NULL)
formula | A formula |
data | A dataset |
subset | Either a logical vector or a vector of indices of the rows to be returned. If NULL (default), all rows are returned. |
The dataset in data as a data table with variables as specified in formula and rows as specified by subset.
dataSet(mpg~.-disp,mtcars,10:20)
Calculate Cohen's d for one-sample t-tests, two-sample independent t-tests, or two-sample paired t-tests.
dCohen(x, y = NULL, mu0 = 0, paired = FALSE)
x | vector with (numeric) data |
y | for two-sample tests, a vector with (numeric) data for group 2 |
mu0 | for one-sample tests, the number to test against |
paired | TRUE for a paired two-sample t-test, FALSE for an independent sample t-test |
value of Cohen's d
#one-sample
x=c(1:10,5,6,3:8)
dCohen(x,mu0=7)
#two-sample independent
y=1:15
dCohen(x,y)
#two-sample paired
dCohen(x,1:18,paired=TRUE)
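The three cases correspond to the textbook formulas below. This sketch is an assumption about the conventions used (in particular the pooled standard deviation for independent samples), not the package's code; d_sketch is a hypothetical name.

```r
# Hypothetical sketch using textbook definitions of Cohen's d:
#   one-sample:  (mean(x) - mu0) / sd(x)
#   paired:      mean(x - y) / sd(x - y)
#   independent: (mean(x) - mean(y)) / pooled standard deviation
d_sketch <- function(x, y = NULL, mu0 = 0, paired = FALSE) {
  if (is.null(y)) return((mean(x) - mu0) / sd(x))
  if (paired) {
    d <- x - y
    return(mean(d) / sd(d))
  }
  n1 <- length(x)
  n2 <- length(y)
  sp <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / sp
}

d_sketch(c(1, 2, 3), mu0 = 0)                    # 2
d_sketch(c(1, 2, 3), c(2, 3, 4))                 # -1
d_sketch(c(1, 2, 3), c(2, 4, 6), paired = TRUE)  # -2
```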
Split a graph into subgraphs using the values in a vector to indicate which vertices belong together.
get_subgraphs(g, split)
g | the graph (an igraph object) |
split | a vector with a value for each vertex in the graph |
A list of graphs, where each graph is a subgraph of g containing the vertices with the same value in split.
g=igraph::graph_from_literal(1-2,2-3:4,3-4:5:6,5-1)
split=c("A","A","B","B","A","B")
igraph::V(g);split
igraph::V(get_subgraphs(g,split)[[1]])
igraph::V(get_subgraphs(g,split)[[2]])
Add graph attributes to a graph from a data frame where each column represents an attribute. Note that only the first row of the data frame is used.
graph_attr_from_df(g, df)
g | the graph (an igraph object) to which the graph attributes should be added |
df | data frame, or an object that can be converted to a data frame, where the first row contains a graph attribute in each column |
Graph g with the graph attributes in df added.
g=igraph::graph_from_literal(1-2,2-3:4,3-4:5:6,5-1)
df=data.frame(name="Test Graph",descr="A graph")
graph_attr_from_df(g,df)
Replace missing values in a vector with the result of a function (by default the mean) applied to that vector.
impNA(x, fn = mean, ...)
x | A numeric vector |
fn | A function to apply to all values in the vector |
... | Additional arguments to be passed to fn |
Vector x with all missing values replaced
v1=c(2,5,3,NA,2,4,1,NA)
#Replace values with the mean
impNA(v1,na.rm=TRUE)
#Replace values with the minimum
impNA(v1,min,na.rm=TRUE)
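A minimal sketch of the imputation step, assuming fn is called once on the whole vector with the extra arguments passed through; impute_sketch is a hypothetical name. Without na.rm = TRUE, mean() and min() return NA, so the missing values would stay missing.

```r
# Hypothetical sketch: compute fn on the vector (passing ... through, e.g.
# na.rm = TRUE) and write the result into the missing positions.
impute_sketch <- function(x, fn = mean, ...) {
  x[is.na(x)] <- fn(x, ...)
  x
}

v1 <- c(2, 5, 3, NA, 2, 4, 1, NA)
impute_sketch(v1, na.rm = TRUE)       # NAs become 17/6, about 2.83
impute_sketch(v1, min, na.rm = TRUE)  # NAs become 1
```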
Determine if numbers in a vector are integers (not just of integer type)
isInt(x, inf = TRUE)
x | integer or numeric type vector |
inf | logical; whether an infinite value should be considered an integer (default TRUE) |
TRUE for each value in x that is an integer, FALSE otherwise
isInt(c(3,3.23,Inf))
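Base R's is.integer() only checks the storage type, so is.integer(3) is FALSE because 3 is stored as a double. The hypothetical sketch below checks the value instead: a number is whole when it equals its rounded value, and infinite values are then set by the inf flag. It illustrates the idea and is not necessarily how isInt is written.

```r
# Hypothetical sketch of a value-based integer check.
is_int_sketch <- function(x, inf = TRUE) {
  res <- x == round(x)          # TRUE for whole numbers (Inf == Inf is TRUE)
  res[is.infinite(x)] <- inf    # infinite values follow the inf flag
  res
}

is.integer(3)                                # FALSE: 3 is a double
is_int_sketch(c(3, 3.23, Inf))               # TRUE FALSE TRUE
is_int_sketch(c(3, 3.23, Inf), inf = FALSE)  # TRUE FALSE FALSE
```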
Calculate the cross product as defined in linear algebra; note that this differs from R's crossprod function, which computes t(x) %*% y.
laCrossProd(x, y)
x | vector of length 3. |
y | vector of length 3. |
Cross product of x and y.
x=c(1,2,1)
y=1:3
laCrossProd(x,y)
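The linear-algebra cross product of two length-3 vectors is the component formula below; the result is orthogonal to both inputs. The sketch contrasts it with base R's crossprod(), which returns the 1x1 matrix t(x) %*% y (the dot product for two vectors). cross3 is a hypothetical name, not the package's implementation.

```r
# Hypothetical sketch of the linear-algebra cross product.
cross3 <- function(x, y) {
  c(x[2] * y[3] - x[3] * y[2],
    x[3] * y[1] - x[1] * y[3],
    x[1] * y[2] - x[2] * y[1])
}

x <- c(1, 2, 1)
y <- 1:3
cross3(x, y)     # 4 -2 0
crossprod(x, y)  # 1x1 matrix holding the dot product, 8
```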
Plot a line in a 3D plot through a set of points
lines3D(pl, x, y, z, ...)
pl | Matrix containing the current plot coordinates. |
x | Vector with x-coordinates. |
y | Vector with y-coordinates. |
z | Vector with z-coordinates. |
... | additional graphical parameters (see lines()). |
No return value, called for side effects
pl=coord3D(30)
lines3D(pl,0:10,0:10,rep(0,11))
lines3D(pl,0:10,0:10,c(0,2,1,3:8,7,5),col=2)
Plot the partial regression plot for one of the predictors of a linear model
lmPartReg(mod, pred, ...)
mod | A linear model object (obtained via the lm function) |
pred | The name (in quotes) of the predictor for which the plot should be produced |
... | Any other arguments to be passed to the plot |
A partial regression plot for pred in the linear model mod
lmod=lm(mpg~.,mtcars)
lmPartReg(lmod,"wt")
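A partial regression (added-variable) plot can be sketched in base R: regress the response and the chosen predictor separately on all other predictors, then plot one set of residuals against the other. The slope of that scatter equals the predictor's coefficient in the full model (the Frisch-Waugh-Lovell theorem). This is an illustration of the technique, not necessarily how lmPartReg draws its plot.

```r
# Sketch of a partial regression plot for "wt" in lm(mpg ~ ., mtcars).
lmod <- lm(mpg ~ ., mtcars)
others <- setdiff(names(mtcars), c("mpg", "wt"))
ry <- resid(lm(reformulate(others, "mpg"), mtcars))  # mpg adjusted for others
rx <- resid(lm(reformulate(others, "wt"), mtcars))   # wt adjusted for others
plot(rx, ry, xlab = "wt residuals", ylab = "mpg residuals")
abline(lm(ry ~ rx))
coef(lm(ry ~ rx))[2]  # same value as coef(lmod)["wt"]
```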
Produces the best linear model for a specific number of predictors in a subset selection.
lmSub(object, d)
object | An object of type "regsubsets" |
d | Number of data predictors |
The best linear model with d predictors
subs=leaps::regsubsets(mpg~.,mtcars)
summary(lmSub(subs,3))
Calculate the testing error rate for a dataset on a logistic regression model (or the training error rate if no dataset is entered), and a results table with responses versus predicted responses.
logistErrorRate(gmod, nw = NULL, p = 0.5)
gmod | A logistic regression model |
nw | A dataset for which a testing error rate should be calculated using the model |
p | Probability (default .5) above which the observation is assigned to the second level of the response. |
List with the training error rate if nw is NULL, the testing error rate otherwise, and a results table with responses versus predicted responses.
gmod=glm(state~.,binomial,Puromycin)
logistErrorRate(gmod)
Find the outliers in a vector of values.
outliers(x)
x | vector |
A list with a variable idx containing the indices of the outliers and a variable values containing the values of the outliers.
x=c(100,30:40,101,25:28)
outliers(x)
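One common way to flag outliers is the 1.5 * IQR rule: values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. The sketch below assumes that rule with quantile()'s default quantile type; the package's exact rule is not stated here and may differ. Base R's boxplot.stats(x)$out applies a similar rule based on Tukey's hinges. outliers_sketch is a hypothetical name.

```r
# Hypothetical sketch assuming the 1.5 * IQR rule.
outliers_sketch <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  idx <- which(x < q[1] - fence | x > q[2] + fence)
  list(idx = idx, values = x[idx])
}

x <- c(100, 30:40, 101, 25:28)
outliers_sketch(x)  # idx 1 and 13, values 100 and 101
```

For this x, Q1 = 30 and Q3 = 38, so the fences are 18 and 50.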
Plot one or more colors
plotCol(col)
col | vector with colors |
A plot showing the colors in col
plotCol("maroon")
Calculate the standard deviation of a numeric vector if the data constitutes the whole population. Note that missing values are excluded.
pop.sd(x)
x | numeric vector |
The population standard deviation of the entries in x
pop.sd(c(1:6,NA,7:10))
Calculate the variance of a numeric vector if the data constitutes the whole population. Note that missing values are excluded.
pop.var(x)
x | numeric vector |
The population variance of the entries in x
pop.var(c(1:6,NA,7:10))
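The population variance divides the sum of squared deviations by n, whereas base R's var() divides by n - 1, so one equals the other times (n - 1) / n; the population standard deviation is its square root. A minimal sketch, dropping missing values first as the descriptions above state; pop_var_sketch and pop_sd_sketch are hypothetical names.

```r
# Hypothetical sketches of the population variance and standard deviation.
pop_var_sketch <- function(x) {
  x <- x[!is.na(x)]          # missing values are excluded
  mean((x - mean(x))^2)      # divide by n, not n - 1
}
pop_sd_sketch <- function(x) sqrt(pop_var_sketch(x))

x <- c(1:6, NA, 7:10)
pop_var_sketch(x)                     # 8.25
n <- sum(!is.na(x))
var(x, na.rm = TRUE) * (n - 1) / n    # same value, rescaled from var()
pop_sd_sketch(x)                      # about 2.87
```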
Predict responses for the best model in a subset selection with a specific number of predictors.
## S3 method for class 'regsubsets'
predict(object, d, newdata, ...)
object | An object of type "regsubsets" |
d | Number of data predictors |
newdata | Dataset for which to predict responses |
... | Additional arguments |
A set of predicted responses for newdata
subs=leaps::regsubsets(mpg~.,mtcars,subset=1:25)
predict(subs,3L,mtcars[26:32,])
Calculate the projection matrix for a full-rank matrix X whose number of rows is greater than or equal to its number of columns.
projMatrix(X)
X | n x p matrix; must be full-rank and have n >= p |
Projection matrix of X.
projMatrix(matrix(c(3,4,-1,2,1,1),3))
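For a full-rank n x p matrix X with n >= p, the projection (hat) matrix is P = X (X'X)^(-1) X'. The sketch below computes it directly and checks its defining properties: P is idempotent, symmetric, leaves the columns of X unchanged, and has trace p. proj_sketch is a hypothetical name; the package may compute P differently.

```r
# Hypothetical sketch of the projection matrix P = X (X'X)^(-1) X'.
proj_sketch <- function(X) {
  X %*% solve(crossprod(X)) %*% t(X)  # crossprod(X) is t(X) %*% X
}

X <- matrix(c(3, 4, -1, 2, 1, 1), 3)
P <- proj_sketch(X)
all.equal(P %*% P, P)  # idempotent
all.equal(P, t(P))     # symmetric
all.equal(P %*% X, X)  # projects the columns of X onto themselves
sum(diag(P))           # trace equals the number of columns, 2
```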
Plot a line through the first and third quartiles of a half-normal plot.
qqlineHalf(x)
x | numeric vector |
No return value, called for side effects
z=rnorm(100)
faraway::halfnorm(z)
qqlineHalf(z)
Simple function using Rcpp
rcpp_hello_world()
## Not run:
rcpp_hello_world()
## End(Not run)
Plot the ROC curve for logistic regression, LDA, or QDA models.
ROCcurve(mod, nw = NULL)
mod | A logistic regression, LDA, or QDA model |
nw | A dataset for which a testing ROC curve should be plotted using the model |
A plot with the ROC curve will be produced, nothing is returned.
gmod=glm(state~.,binomial,Puromycin)
ROCcurve(gmod)
Plot the ROC curve for a KNN model. Note that it can only be used when the response is dichotomous.
ROCknn(mod, response)
mod | The output of the knn function, run with prob=TRUE |
response | A vector with responses for the testing dataset used to run the knn function. |
A plot with the ROC curve will be produced, nothing is returned.
yhat=class::knn(Puromycin[,c("conc","rate")],Puromycin[,c("conc","rate")], Puromycin$state,10,prob=TRUE)
ROCknn(yhat,Puromycin$state)
Round to the nearest number with the indicated number of digits. NOTE: Unlike the base round function, it rounds a 5 up to the higher number rather than to the nearest even number.
round2(x, digits = 0)
x | number to be rounded |
digits | number of digits to round to |
Number rounded to the number of digits indicated
round2(2.5)
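Base R's round() rounds halves to the nearest even number, so round(2.5) is 2. A "round half up" rule can be sketched by scaling, adding 0.5, and taking the floor. Rounding negative numbers through their absolute value (so -2.5 becomes -3) is an assumption, as is the name round_half_up. Because decimals such as 1.005 are not stored exactly in binary, values that look like an exact .5 can still round differently from a hand calculation.

```r
# Hypothetical sketch of "round half up" to a given number of digits.
round_half_up <- function(x, digits = 0) {
  m <- 10^digits
  sign(x) * floor(abs(x) * m + 0.5) / m  # negatives handled via abs(x)
}

round(c(0.5, 1.5, 2.5))          # 0 2 2: base R rounds halves to even
round_half_up(c(0.5, 1.5, 2.5))  # 1 2 3
round_half_up(1.25, 1)           # 1.3
```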
Display a perspective plot showing the plane that is the span of a matrix
span3D(M, th = 0, ph = 15)
M | Matrix for which the span should be shown. |
th | A vector indicating the horizontal angle at which the plot should be shown. |
ph | A vector indicating the vertical angle at which the plot should be shown. |
A matrix containing the plot coordinates (used when adding features).
span3D(matrix(c(1,0,0,1,1,1),3))
Solve a system of equations if it has a unique solution; output an error message otherwise
systemEq(A, y)
A | matrix A in Ax=y |
y | output vector in Ax=y |
the unique solution x to Ax=y
systemEq(matrix(c(1:3,2,4,4),3),c(3,6,7))
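Ax = y has a unique solution exactly when A has full column rank (no free variables) and appending y to A does not increase the rank (the system is consistent). The sketch below checks both conditions with qr() and then solves with qr.solve(); it is one possible approach, not necessarily the package's. system_sketch is a hypothetical name.

```r
# Hypothetical sketch: solve Ax = y only when the solution is unique.
system_sketch <- function(A, y) {
  rA <- qr(A)$rank
  if (rA < ncol(A)) stop("no unique solution: A does not have full column rank")
  if (qr(cbind(A, y))$rank > rA) stop("no solution: the system is inconsistent")
  qr.solve(A, y)
}

A <- matrix(c(1:3, 2, 4, 4), 3)
system_sketch(A, c(3, 6, 7))  # 1 1
```

Changing y to c(3, 7, 7) makes the system inconsistent (the second entry of Ax is always twice the first), so the sketch stops with an error.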
Add a Vector to a 2D Coordinate System
vector2D(v, fr = c(0, 0), col = 2)
v | A vector with 2 entries. |
fr | Vector containing the point at which the vector should start (defaults to the origin). |
col | Color of the vector (defaults to red). |
No return value, called for side effects
a=c(2,4)
b=c(0,3)
coord2D()
vector2D(a)
vector2D(b)
vector2D(a-b,b,"blue")
Add a Vector to a 3D Coordinate System
vector3D(pl, v, fr = rep(0, 3), col = "red")
pl | Matrix containing the current plot coordinates. |
v | A vector with 3 entries. |
fr | The point at which the vector should start (defaults to the origin). |
col | Color of the vector (defaults to red). |
No return value, called for side effects
a=c(2,4,8)
b=c(6,0,4)
pl=coord3D()
vector3D(pl,a)
vector3D(pl,b)
vector3D(pl,a-b,b,3)
Obtain the weight distribution of a graph, indicating for each strength from zero to the maximum strength of any vertex, the proportion of vertices with such a strength. This assumes positive integer weights.
weight_distribution(g, cumulative = FALSE, ...)
g | the graph (an igraph object) |
cumulative | logical; whether the cumulative weight distribution should be calculated (default FALSE) |
... | additional parameters to be passed to the igraph function |
A vector with the weighted degree distribution for the graph g.
g=igraph::graph_from_literal(1-2,2-3:4,3-4:5:6,5-1)
igraph::E(g)$weight=c(1,2,1,4,2,1,1)
table(igraph::strength(g))/6
weight_distribution(g)
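A vertex's strength is the sum of the weights of its incident edges (igraph::strength() computes this for an igraph object). The base-R sketch below uses a hypothetical weighted edge list instead of an igraph object and, like the description above, assumes positive integer weights. It returns, for each strength 0, 1, ..., max, the proportion of vertices with that strength; the cumulative option is not sketched. weight_dist_sketch is a hypothetical name.

```r
# Hypothetical sketch of a weight (strength) distribution.
weight_dist_sketch <- function(edges, w, n) {
  s <- vapply(seq_len(n),
              function(v) sum(w[edges[, 1] == v]) + sum(w[edges[, 2] == v]),
              numeric(1))
  # tabulate() counts the values 1, 2, ...; shifting by 1 lets strength 0 count
  tabulate(s + 1, nbins = max(s) + 1) / n
}

edges <- rbind(c(1, 2), c(2, 3), c(3, 1), c(3, 4))
w <- c(1, 2, 1, 3)
weight_dist_sketch(edges, w, 4)  # strengths 2, 3, 6, 3 give 0 0 0.25 0.5 0 0 0.25
```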
Calculate percentages of values in a matrix or table with respect to the row or column totals.
withinPC(X, rows = TRUE, rnd = 1)
X | matrix or table |
rows | TRUE (default) to calculate by rows, or FALSE to calculate by columns |
rnd | number of digits to round the result to |
A matrix or table with percentages
(X=matrix(c(1:12),3))
withinPC(X)
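Row (or column) percentages divide each entry by its row (or column) total and scale by 100. The sketch below uses base R's prop.table() for the division step and then rounds; it is an illustration, not necessarily how withinPC is implemented. within_pc_sketch is a hypothetical name.

```r
# Hypothetical sketch of within-row or within-column percentages.
within_pc_sketch <- function(X, rows = TRUE, rnd = 1) {
  round(100 * prop.table(X, if (rows) 1 else 2), rnd)
}

X <- matrix(1:12, 3)
within_pc_sketch(X)                # each row sums to (about) 100
within_pc_sketch(X, rows = FALSE)  # each column sums to (about) 100
```

Because of rounding, the displayed percentages may not add up to exactly 100.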