SAS STAT: getting output statistics (1)

SAS STAT package has many procedures that can be used to conduct specific statistical analysis with structured data sets.  I found the following use of precedures quite common and there are usually a number of statistics needed to be output for further compilation.  The caveat is to sort the data set in the desired order, so the null hypothesis is in the upper left cell (1,1) in the two-way frequency table, which normally showes the results of study group with a respone equal to 1. This way the output statistics can be interpreted more straight forward.

A. How to get output from statistics procedures?

There are different ways for outputing statistics, eg. ODS OUTPUT, OUTPUT statement or OUT=;

a. use PROC MEANS for general statistics like n, mean, min, max std;

  • use maxdec= option to adjust the decimal places needed.
  • output out=filename: N, MIN, MAX, MEAN, STD, SUMWGT

b. use PROC TTEST for two-sample independent T-test; use CLASS Variable to identify and differentiate the groups for study cases and control cases;

  • ods output
    • ttests=filename: tValue, DF, (pick Satterthwaite method which assume unequal variance);
    • statistics=filename: n, mean, LowerCLMean, UpperCLMean, StdDev, StdErr for ‘0’,’1′, Diff (1-2);
    • equality=filename: fValue, probf (equality of variances)

c. use PROC FREQ for two-sample independent T-test

  • tables statement, out option
    • out= filename: default only provide the TABLES variable, frequency count, percent of total frequency.  To include percent of row frequency  and percent of column frequency, you need to add OUTPCT in TABLES statement.
  • output statement out option
    • Output  relrisk  out = filename;
  • ods output
    • ChiSq=filename: Chi-Square, Prob
    • PdiffTest=filename: Proportion Difference Test, Wald is the default method, Proportion Difference, Z Value, One-sided Prob and two-sided prob.
    • RelativeRisks=filename: relative risk estimate, case-control (odds ratio), column 1 risk, column 2 risk, 95% L/U Confidence Limit

Overall, the ODS OUTPUT is the most versatile and powerful method to obtain statisics results from these procedure.  Check “ODS Table Names” under each procedure in the SAS STAT User Guide.  The SAS procedure assigns a name to each table that it creates.  Use ODS Table Name= filename to assign your own table name to be saved in the work library.  In order to have the output ods table, you also need to check if you have included the corresponding option(s) in the specific statement for the program generate the statistics.

Sample code for Two-sample Proportion Test (include all 3 methods of output):

%macro prop ( in =, var=, out=, weight=);
proc sort data=∈
by descending study descending &var.;
run;
ods graphics on;
ods output ChiSq = &out.chi_&var. PdiffTest=&out.pdiff_&var. RelativeRisks=&out.rr_&var.;
proc freq data=&in. order =data ;
format study grpfmt. &var. rspFmt.;
weight &weight.;
tables study*&var. / chisq measures riskdiff(equal) outpct out=&out._&var.
plots= (freqplot(twoway=groupvertical scale =percent));
output relrisk out=&out._or_&var.;
title "Proportion Test:Case - Control study of variable &var. for &out.";
run;
ods output close;
ods graphics off;
%mend prop;
%prop ( in = g1pair, var= isRetainYr1, out =g1prop, weight = weight1);

*”The Satterthwaite approximation of the standard errors differs from the Pooled method in that it does not assume that the variances of the two samples are equal. This means that if the variances are equal, the Satterthwaite approximation should give us exactly the same answer as the Pooled method.” (Reference: https://wolfweb.unr.edu/~ldyer/classes/396/PSE.pdf)

SAS STAT: Weight Statement

Weight statement can be used to exclude observation from the statistical procedure. If Weight statement is not specified, each obs has a default weight of 1.

PROC MEANS

0: Counts the obs in number of observations
<0: convert to 0 and treat like 0
missing: exclude

PROC FREQ

0: ignore obs unless specify the ZEROS option, which include obswith 0 weights.
missing: exclude

PROC TTEST

0: exclude
<0: exlude
missing: exclude

Basically, assigning a missing value to the weight field to exclude the observation and assigning 1 to the weight field to include the observation in the analysis.

SAS: Sampling using PROC SURVEYSELECT

STEP1: SEPARATE CASES AND CONTROL DATASETS WITH ALL THE VARIABLES OF INTEREST

STEP2: ESTABLISH INDEX:

%let index1 = year||entry_lvl||regist||faculty||isRetainYr0;
%let index2 = year||entry_lvl||regist||faculty||isRetainYr1;
%let index3 = year||entry_lvl||regist||faculty||isRetainYr2;
%let index4 = year||entry_lvl||regist||faculty||isRetainYr3;
  • Index are developed based on the characters you want to control. In the example, we want to control the year, the study level, the registration status (full-time or part-time) and the faculty when the student originally entering into the program;  the retention status of the student.

STEP3: GET FREQUENCY DISTRIBUTION FOR CASES AND CONTROL DATASETS BY INDICES

%macro match (sur1=, sur2=);
data cases&sur1.(keep =id index&sur2. );
set cases (keep= id &match.);
index&sur2. = compress(&&index&sur2) ;
run;
proc freq data = cases&sur1. noprint;
tables index&sur2./list missing out =casescnt&sur1. (keep = index&sur2. count rename=(count=casescnt));
run;
%mend match;
%match (sur1=y1, sur2=1);
%match (sur1=y2, sur2=2);
%match (sur1=y3, sur2=3);
%match (sur1=y4, sur2=4);
%macro match (sur1=, sur2=);
data pool&sur1.(keep =id index&sur2. );
set pool (keep= id &match.);
index&sur2. = compress(&&index&sur2) ;
run;
proc freq data = pool&sur1. noprint;
tables index&sur2./list missing out =poolcnt&sur1. (keep = index&sur2. count rename=(count=poolcnt));
run;
%mend match;
%match (sur1=y1, sur2=1);
%match (sur1=y2, sur2=2);
%match (sur1=y3, sur2=3);
%match (sur1=y4, sur2=4);

STEP4: MERGE CASES AND CONTROL BY INDEX AND CHECK THE CASES THAT HAVE NO POSSIBLE MATCHING CONROL CASES BY INDEX. IF THERE ARE MANY STUDY CASES NOT ABLE TO FIND MATCHING CONTROL CASES (eg. size of control is smaller than the sized of study cases), THEN WE NEED TO CHANGE THE INDEX.

%macro match (sur1=, sur2=);
data count&sur1.;
merge casescnt&sur1. (in =a ) poolcnt&sur1. (in =b);
by index&sur2.;
if casescnt >0;
if a and not b then poolcnt = 0;
nsize = min(casescnt, poolcnt);
run;
data count&sur1. notindex&sur1.;
set count&sur1.;
if nsize = 0 then output notindex&sur1.;
else output count&sur1.;
run;
%mend match;
%match (sur1=y1, sur2=1);
%match (sur1=y2, sur2=2);
%match (sur1=y3, sur2=3);
%match (sur1=y4, sur2=4);

STEP5: LIMIT THE CONTROL CASES WHERE THE INDICES EQUAL TO THOSE OF THE STUDY CASES. SAMPLING BASED ON THE SIZE OF THE STUDY CASES BY EACH STRATA.

%macro match (sur1= , sur2= );
proc sql;
create table pool&sur2. as
select *
from pool&sur1.
where index&sur2. in (select index&sur2. from count&sur1.);
proc sort data = pool&sur2.;
by index&sur2.;
run;
proc surveyselect data = pool&sur2.
sampsize = count&sur1.
method = srs
seed = 300001
out=selected&sur1.;
strata index&sur2.;
run;
%mend match;
%match (sur1=y1, sur2=1);
%match (sur1=y2, sur2=2);
%match (sur1=y3, sur2=3);
%match (sur1=y4, sur2=4);

*change the seed number to get different random samples.  Use the same seed number to achieve reproducible results.

 

 

SAS: Update values with the last record and keep other attributes of the first record by group

The list contains course grades for students and the year students took the course.  It is possible that students can retake the course to improve their gpa.  We want a list that the last records of the students are showing all the grades that they took in the past and if a course was taken multiple times then the most recenct grade of that course.  We also want the last records of the student to show the initial year that the students were taking these course.

The following marco used a temp variable to hold the grade of the course and retain the value for the same student and if the course is retaken, the temp variable will be assigned with new grade, and if the course grade is empty, then the value retained in the  temp variable will be assigned to the course grade.

%macro lst (in =, out=, crs=);
data &out.;
drop temp ;
set &in.;
by id;
length temp $2 firstcrsyr $4 ;
retain temp firstcrsyr ;
if first.id then do; temp = '' ; firstcrsyr = yr; end;
/* Assign TEMP when courseX is non-missing /
if &crs. ne '' then temp=&crs.;
/ When X is missing, assign the retained value of TEMP into courseX */
else if &crs. eq '' then &crs.=temp;
run;
%mend lst;
%lst (in = list1, out = list2, crs= course1);
%lst (in = list2, out = list2, crs= course2);
%lst (in = list2, out = list2, crs= course3);

SAS: Cumulative Count by Study and Control Group

This example is used to set up pairing id (PAIRID) for study group and control group. Variable INDEX is the matching character. Variable STUDY identify the group membership (0 = control group; 1= study group).  Variable ID is the unique identifier. In the SAMPLE dataset, number of membership are equal for both the control group and study group at each strata.

data pair;
set sample;
by study index;
count =1;
if first.study then pairid = 0;
pairid +count;
drop count;
run;

*Not using retain statement.

SAS: Proc SQL Where … In to subset

SQL where … in can be used to subsetting dataset which meet contain condition and the condition is in a separate dataset.

Table ONE is a large dataset and contains variable ID and other related information. Table TWO contains unique ID that of interest.  Table THREE is the subset of table ONE which match the ID value with table TWO.

proc sql;
create table THREE as
select *
from ONE
where ID in (select ID from TWO);

 

Data: Curl Installation and batch download PDF files from a website

cURL Installation

  1. download carcet.pem from  https://curl.haxx.se/docs/caextract.html
  2. download curl from https://curl.haxx.se/download.html (I use a windows system, so I download the binary file for windows 64 bit)
  3. extract curl to c:\curl and put carcet.pem under c:\curl\bin folder
  4. add environment variable curl with the path of curl.exe

Curl is useful for downloading files from the website.  The basic command is the following:

curl -O url

This command needs be run under the c:\curl\bin folder. The files are downloaded to C:\Users\username\

Purpose: download SAS 9.3 user guide pdf files from https://support.sas.com/documentation/onlinedoc/stat/930/

  1. In window cmd, under c:\curl\bin, run curl -o index https://support.sas.com/documentation/onlinedoc/stat/930/
    • generate the index file which contains the wrapped source html code of the webpage
  2.  open git bash, run following
      1. cd c
      2. cd curl
      3. cd bin
      4. grep -i pdf  index > list
  3. list contains the href=”*.pdf”. Use Excel text to column to get only the name of the pdf files.
  4. open list in notepad++ and at the bottom of the window, it shows “Windows (CR LF)”,  right click and select “Unix (LF)”.  This will solve the error “curl: (3) Illegal characters found in URL”
  5. Start a new bash file in notepad++ with the following code
    1. echo “Start!”
      url=https://support.sas.com/documentation/onlinedoc/stat/930/
      while read query
      do
      curl -O “$url${query}”
      echo $url${query}
      done < list
    2. save as echo
  6. in git bash, navigate to where the echo file is and run following
    1. bash echo

Check SAS license and components installed

proc setinit; run;

Proc setinit provides the components list with the expiration dates.

—Base SAS Software
—SAS/STAT
—SAS/GRAPH
—SAS/ETS
—SAS/FSP
—SAS/OR
—SAS/AF
—SAS/IML
—SAS/QC
—SAS/SHARE
—SAS/LAB
—SAS/ASSIST
—SAS/CONNECT
—SAS/INSIGHT
—SAS/EIS
—SAS/SHARE*NET
—MDDB Server common products
—SAS Integration Technologies
—SAS/Secure Windows
—SAS Enterprise Guide
—SAS Bridge for ESRI
—OR OPT
—OR PRS
—OR IVS
—OR LSO
—SAS/ACCESS Interface to DB2
—SAS/ACCESS Interface to Oracle
—SAS/ACCESS Interface to Sybase
—SAS/ACCESS Interface to PC Files
—SAS/ACCESS Interface to ODBC
—SAS/ACCESS Interface to OLE DB
—SAS/ACCESS Interface to Teradata
—SAS/ACCESS Interface to MySQL
—SAS/IML Studio
—SAS Workspace Server for Local Access
—SAS/ACCESS Interface to Netezza
—SAS/ACCESS Interface to Aster nCluster
—SAS/ACCESS Interface to Greenplum
—SAS/ACCESS Interface to Sybase IQ
—DataFlux Trans DB Driver
—SAS Framework Data Server
—Reserved for Dataflux

proc product_status; run; 

Get the version of the product.

SAS Proc Upload and Download

SAS server processes dataset at a much faster rate than the local desktop.  To send the datasets from the WORK library of the desktop to the WORK library of the server,  and get the processed datasets from WORK library of the server to the WORK library of the desktop, we need to use PROC UPLOAD and PROC DOWNLOAD.

proc update data =localdata out =serverdata; run;
…code…
proc download data=serverdata out =localdata; run;