Data: Systematic Review Protocol

A. Get published Review Protocols from Campbell Collaboration.

On campbellcollaboration.org webpage, click “Campbell systematic Reviews journal”
The link takes you to the Wiley Online Library and you will need a library account to browse the contents. Click “Campbell Article Types”
Select “Protocol” from the list of types.
Total 248 results which can be further narrowed by selecting “Campbell subject Categories”.

B. Get published review protocols from Cochrane Reviews.

https://www.cochranelibrary.com/
Search ‘mathematical’ in the Title Abstract Keyword and get 1 result under the Cochrance Protocols tab.

C. Layout of the Protocol

  • Background
    • The problem, condition or issue
    • Intervention
    • How the intervention might work
    • Why it is important to do the review
    • Products of this systematic review
  • Objectives
  • Methodology
    • Criteria for including and excluding studies
      • Types of study designs
      • Time and language
      • Types of participants
      • Types of interventions
      • Duration of follow-up
      • Types of settings
    • Search strategy
    • Search terms
    • Description of methods used in primary research
    • Criteria for determination of independent findings
    • Details of study coding categories
    • Statistical procedures and conventions
    • Studies with multiple groups
    • Unit of analysis issues
    • investigation of heterogeneity
    • Sensitivity analysis
    • missing data and author queries
    • Treatment of qualitative research
  • Reference
  • Review Authors
  • Roles and responsibilities
  • Funding
  • Potential conflict of interest
  • Preliminary timeframe
  • Author declaration

Statistics: Matrix Calculation in Excel

A. Matrix Functions in Excel

  • MMULT(): returns the matrix product of 2 arrays. The result is an array with the same number of rows as array1 and the same number of columns as array2. Matrix A multiply Matrix B doesn’t equal to Matrix B multiply Matrix A. The order matters. The multiplication can be done as long as the number of columns of the first matrix equals to the number of rows in the second matrix. To make matrix smaller is to multiply a wider matrix to a long matrix, which returns the matrix with the shorter side of both matrices; and to make a matrix big is to multiple a longer matrix to a wider matrix, which returns the matrix with the longer side of both matrices.
  • MUNIT(): returns the identity matrix for the specified dimension.
  • MINVERSE(): returns the inverse matrix of a given matrix. The product of a matrix and its inverse matrix is the identity matrix. The inverse of a matrix will exist only if the determinant is not zero. The inverse of 2×2 [a, b; c, d] matrix can be calculated as:
    • Calculate the determinant;
    • take the inverse of the determinant;
    • multiple the inverse of the determinant to the [d, -b; -c, a] matrix.
  • MDETERM(): returns the determinant of the square matrix. If matrix determinant is 0, then matrix is sigular. For a 2×2 matrix, the determinant is equal to (ad-bc).

B. Excel Examples

  • Define matrix
Define X matrix
Define Y matrix
  • Transpose Matrix (X’)
Transpose X to X’ using array formula
  • Matrix multiplication
Multiplication of X’ and X
Multiplication of X’ and Y
  • Inverse Matrix (RULE: multiplication of a matrix and its inverse matrix equals to the identity matrix)
Inverse matrix of X’X
  • Combination of Matrices Operations
(X’X)^(-10)X’Y

C. Understand the matrix operation in solving regression equations

  • The matrix multiplication of X’X is to take the sum of product for each combination of all the Xs.
  • The inverse of X’X is converting the X’X into weights for Xs.
  • The matrix multiplication of X’Y is to take the sum of product for each combination of X and Y. Visually, it is like expand the space of X matrix to Y scale.
  • The (X’X)^(-1)X’Y is the estimation of the intercept and coefficient(s).

D. Reference

Data: Library Database Search Syntax for Systematic Review

A. Research Question:

A systematic review study is planned with the purpose of investigating whether current educational programs are effective for developing problem solving in early childhood education.

B. Terms and Definitions

  • Educational Programs
  • Problem Solving
  • Early Childhood

A. Reference for Systematic Review

A. Library Platforms and Databases

  • EBSCO
    • CINAHL
  • OVID
    • MedLine
    • EMBASE
    • PsychINFO
  • ProQuest
    • ERIC
    • PsychINFO

Same platform has same interface, nut the subject headings are different. Same database on different platforms has the same keywords for titles and abstracts, but different heading and different truncation and proximity syntactic rules.

Statistics: Calculate Effect Size for Meta-Analysis

A. Web-based Effect Size Calculator

B. Examples

  • T: Treatment Group
  • C: Control Group
  • T(n): n for Treatment Group
  • C(n): n for Control Group
  • p: p-value
  • std: standard deviation
DataES
1.
Mean SES / Group/ n / std
127.8 / T / 25 / 10.4
132.3 / C / 30 / 132.3
Standardized Mean Difference (d)
Means and Standard Deviation
d= -0.4466
2.
t/ T(n) / C(n)
1.68 / 10 / 12
Standardized Mean Difference (d)
T-Test, Unequal Sample Size
d= 0.7193
3.
T(n) / C(n) / p
10 / 12 / .037
Standardized Mean Difference (d)
T-Test P-Value, Unequal Sample Size
d= 0.9569
4.
r = .27 for binary variable and continuous variable
2*0.27/ sqrt(1-0.27^2)
ES=0.560829
formula from D.W. Wilson’s slides
5.
Group / Mean test scroe / n
1 / 55.38 / 13
2 / 59.40 / 18
3 / 75.14 / 37
4 / 88 / 22
F(3,86) = 7.05, for meta-analysis only interested group 1 and 2, std not reported.
Standardized Mean Difference (d)
F-Test, 3 or more Groups
d= -0.1658
6.
2 x 2 table
n / group / % not improved / % improved
42 / T / 32% / 68%
29 / C / 37% / 63%
Standardized Mean Difference (d)
Frequency Distribution (Proportions)
d= 0.1057

7. frequency table for T and C group, 60 cases for each group (no report of the means and stds)
Degrees of Condition / T(n) / C(n)
0 / 15 / 20
1 / 15 / 20
2 / 15 / 10
3 / 15 / 10
Standardized Mean Difference (d)
Frequency Distribution
d = 0.305
8. regression analysis with nonequivalent comparison group design
covariates: employment status, marital status, age etc.
treatment: intervention / probation only
unstandardized regression coefficient: -.523
std for DV, severity of physical abuse: s=9.23
sample size (intervention / probation only) : n1= 125 / n2=254
Standardized Mean Difference (d)
Unstandardized Regression coefficient
Covariates adjusted ES(d)= -0.0568
=(125-1)*9.23^2+ (254-1)*9.23^2 =32117.72
= 124+254-2 = 377
=32117.72 / 377 = 85.1929
=sqrt(85.1929)= 9.23 = S_pooled
ES=-.523 /9.23 = -.056

C. Web-based Calculator Output

Example 1: Means and Standard Deviation
Example 2: T-Test, Unequal Sample Size
Example 3: T-Test P-Value, Unequal Sample Size
Example 5: F-Test, 3 or more Groups
Example 6, Frequency distribution (Proportions)
Example 7, Frequency distribution
Example 8, Unstandardized Regression Coefficient

D. Citation

Wilson, D. B. (date of version). Meta-analysis macros for SAS, SPSS, and Stata. Retrieved November 3, 2019, from http://mason.gmu.edu/~dwilsonb/ma.html

SAS: Normality Test

A. Reference

B. Data

data BlockDesign;
input block a b y @@;
datalines;
1 1 1 56 1 1 2 41
1 2 1 50 1 2 2 36
1 3 1 39 1 3 2 35
2 1 1 30 2 1 2 25
2 2 1 36 2 2 2 28
2 3 1 33 2 3 2 30
3 1 1 32 3 1 2 24
3 2 1 31 3 2 2 27
3 3 1 15 3 3 2 19
4 1 1 30 4 1 2 25
4 2 1 35 4 2 2 30
4 3 1 17 4 3 2 18
;
run;

C. Normality Check

  • histogram
  • normal
  • plot
  • qqplot
%macro uni (data, var);
Title "Check Nomality and Distribution - &var. in  &data.";
proc univariate data = &data. ;
var &var. ;
histogram / cfill = gray;
run;
proc univariate data = &data. normal plot;
var &var.;
qqplot &var. / normal (mu=est sigma =est);
run;
%mend uni;
%uni(blockdesign, y);

D. Output

The p-value for Shapiro-Wilk test is 0.27, therefore we can not reject the null hypothesis that y is normally distributed.
Q-Q Plot

SAS: Proc Reg – Collinearity Diagnostics

A. Reference

B. Purpose

  • Examine whether predictors are highly collinear which can casuse problems in estimating the regression coefficients.
  • As the degree of multicollinearity increases, the coefficient estimates become unstable and the standard errors for the coefficients can be wildly inflated.

C. SAS code

proc reg data = cars;
model msrp = enginesize cylinders horsepower  /  tol vif collinoint;
run;
quit;

D. Notes

  • proc reg can not deal with categorical variable directly, therefore you need to create dummy variable yourself for the categorical variable.
  • tol: tolerance, the percent of variance in the predictor that cannot be accounted for by other predictors. Regress the predictor variable on the rest of the predictor variable and compute the R square. 1 minus the R square equals tolerance for the predictor.
  • vif: variance inflation factor. It is the inverse function of tolerance. Measures how much the variance of the estimated regression coefficient is “inflated” by the existence of correlation among the predictor variables in the model. A vif of 1 means no inflation at all. Exceeding 4 warrants further investigation Greater than 10 vif means serious multicollinearity and requires correction.
  • collinoint: produce intercept adjusted collinearity diagnostic. This table decomposes the correlation matrix in to linear combination of variables. The variance of each of these linear combinations is called an eigenvalue. Collinearity is assumed by finding 2 or more variables that have large proportions of variance (.50 or more) that correspond to large condition indices. A large condition index, 10 or more is and indication of instability.

E. SAS Output

F. Interpretation

  • Engine Size and cylinders have greater than 5 VIF.
  • The higher condition index is 5.41 with 83.7% and 90.1% of variances from for Engine Size and Cylinders. Since 5.4 is less than 10, therefore there is no multicollinearity.
  • Total eigenvalue accumulates to 3 because there are 3 predictors.

SAS: Proc Corr – Different Correlations between variables

A. Pearson Correlation: Between continuous variable and continuous variable.

%macro corr (in, year);
Title "Check Correlation - &in. group";
proc corr data= &in._&year. 
          outp=corr_&in._&year.
          plot=(matrix  scatterplot);
      var exper salary ;
run;
%mend corr;
  • Variables Information
  • Simple Statistics (N, Mean, Std Dev, Sum, Min, Max, Label
  • Pearson Correlations (Correlation Coefficients, Prob)
  • Scatter Plots
  • Scatter Plots Matrix

B. Polyserial Correlation: Between continuous variable and categorical variable.

%macro corr (in, year);
ods output polyserialCorr=corr_polys_&in._&year. ;
Title "Check Polyserial Correlation - &in. group";
proc corr data= &in._&year.  pearson polyserial;
	with gender minority rank;
      var  exper salary;
run;
%mend corr;
  • Variables Information
  • Simple Statistics
  • Person Correlations
  • PolyserialCorr (Wald Test, LR Test)

C. Polychoric Correlation: Between categorical variable and categorical variable.

%macro corr (in, year);
ods output measures=plcorr_&in._&year. (where=(statistic="Polychoric Correlation"));
Title "Check Polychoric Correlation - &in. group";
proc freq data= &in._&year.;
tables gender*rank minority*rank / plcorr;
run;
%mend corr;
  • Cross-Tabular Freq Table
  • Measures of Association (Polychoric Correlation)

D. Partial Correlation: Pearson correlation between variables while controlling other variables. Following example checks the correlation between salary and gender, controlling for rank and experience.

%macro partial (in, year);
Title "Check Partial Correlation for basesal and gender - &in. group";
proc corr data=&in._&year.  plots=scatter(alpha=.20 .30);
   var salary gender ; /* treat gender as continuous */
   partial rank exper;
 run;
%mend partial;

Data: Git Bash Installation for Windows

A. Reference: Software Carpentry Unix Shell Tutorial

B. Steps

  • Download git (git-2.23.0-64-bit.exe) for windows from gitforwindows.org
  • Install git for windows to c:\program files\git
  • Select Components -> Next
    • Additional icons: On the Desktop
    • Windows Explorer integration: Git Bash Here; Git GUI Here
    • Git LFS (Large File support)
    • Associate .git* configuration files with the default text editor
    • Associate .sh files to be run with Bash
  • Choosing the default editor used by Git: Use Notepad++ as Git’s default editor -> Next
  • Adjusting your PATH environment: Use Git and optional Unix tools from the Command Prompt -> Next
  • Choosing the SSH executable: Use OpenSSH -> Next
  • Choosing HTTPs transport backend: Use the OpenSSL library -> Next
  • Configuring the line ending conversions -> Checkout Windows-style, commit Unix-style line ending -> Next
  • Configuring the terminal emulator to use with Git Bash: Use Windows’ default console window -> Next
  • Configuring extra options: Enable file system caching; Enable Git Credential Manager; -> Next -> Install
  • Set home environment: cmd -> cd users\username\documents ->set HOME “%USERPROFILE%” -> exit

C. Note

  • Windows operating system do not automatically have a Unix Shell program installed. The post shows how to install an emulator included in Git for Windows, which provides the access to both Bash shell commands as well as Git.

D. Help

Data: Export Email to Excel

Reference: https://support.office.com/en-us/article/back-up-your-email-e5845b0b-1aeb-424f-924c-aa1c33b18833

Steps:

  • open Outlook
  • Move the emails that you want to export to a folder
  • Go to the folder
  • File -> Open & Export -> Import and Export
  • Choose an action to perform: ->Export to a file -> Next
  • Create a file of type: -> Comma Separated Values
  • Select folder to export from: (The folder that you were in are highlighted) -> Next
  • Save exported file as: -> Browse -> select folder and create file name -> Next
  • The following actions will be performed: Export “Email messages” from folder: xxxxxxxx -> Map Custom Fields -> Finish
    • Subject
    • Body
    • From: (Name)
    • From: (Address)
    • From: (Type)
    • To: (Name)
    • To: (Address)
    • To: (Type)
    • BCC: (Name)
    • BCC: (Address)
    • BillingInformation
    • Categories
    • Importance
    • Mileage
    • Sensitivity

Tableau: Load Census Shape Data

A. Download data from web

  • 2016 Census Boundary File
  • select ArcGIS (.shp)
  • select census subdivisions and continue to download
  • unzip file to local directory

B. Tableau

  • Data-> New Data Source ->Spatial File-> Open the .shp file
  • Open a new sheet, drag Geometry from Measures to the canvas
  • drag Ccsname over the Colour Property in Marks card
Coloured by census consolidated subdivision name

C. Attribute

  • FID: specific to ArcGIS
  • Shape: specific to ArcGIS
  • CCSUID: Uniquely identifies a census consolidated subdivision (2-digit province/territory + 2-digit census division code +3-digit census consolidated subdivision code)
  • CSDUID: Uniquely identifies a census subdivision (2-digit province/territory + 2-digit census division code +3-digit census subdivision code)
  • CCSNAME: Census consolidated subdivision name.
  • CSDNAME: Census subdivision name.
  • PRUID: Uniquely identifies a province or territory.
  • PRNAME: Province or territory name.
  • CDUID: Uniquely identifies a census division.
  • CDNAME: Census division name.
  • CDTYPE: Census division type.
  • CSDTYPE: Census subdivision type.
  • ERUID: Uniquely identifies an economic region ( 2-digit province/territory + 2-digit economic region code
  • ERNAME: Economic region name.
  • SACCODE: 3-digit statistical area classification code
  • SACTYPE: The statistical area classification groups census subdivisions according to whether they are a component of a census metropolitan area, a census agglomeration, a census metropolitan influenced zone or the territories.
  • CMAUID: Uniquely identifies a census metropolitan area/census agglomeration.
  • CMAPUID: Uniquely identifies the provincial/territorial part of a census metropolitan area/census agglomeration.
  • CMANAME: Census metropolitan area/census agglomeration name.
  • CMATYPE: Identify whether the unit is a census metropolitan area, a tracted census agglomeration or a non-tracted census agglomeration.