SAS: Missing Value Imputation Procedures

Unfortunately that some of my earlier posted pages are missing from the WordPress site and I am not sure how it happened.

reference: https://www.youtube.com/watch?v=JQHAA_AGxrE
reference: https://www.youtube.com/watch?v=V2wBBB0wNRc

It is very common when conducting survey, the results are incomplete or having missing values. Most of the statistics procedures will drop the observation with missing values which will some time render the survey unable to draw conclusion due to not enough sample size. SAS provides some procedures to impute those missing values so the observation can be used for analysis.

/* --- Identify Missing Value --- */
proc format;
 value $missfmt ' ' = 'Missing' other = 'Not Missing';
 value missfmt .  = 'Missing' other = 'Not Missing';
run;

proc freq ;
format _char_ $missfmt.;
format _numeric_ missfmt.;
tables _char_ / missing nocum nopercent;
tables _numeric_ /missing nocum nopercent;
run;
/* --- Multiple Impuatation Method --- */
/* --- Step 1. Generate Imputed Samples --- */
proc mi data = data nimpute=20 seed = 135782 
    out = ImputedSample;
var v1-v3;
run;
quit;

/* --- Step 2. Fit the Generated Imputed Samples --- */
proc reg data=  ImputedSample outest=Estimates covout;
model v1 = v2-v3;
by _Imputation_;
run;
quit;

/* --- Step 3: Use Proc Mianalyze   --- */
proc mianalyze data = Estimates;
    modeleffects Intercept v2 v3;
run;
quit;
/* --- Direct maximum Likelyhood --- */
/* --- Step 1. Proc Calis --- */
proc calis data = d2 method = fiml;
path v1 <--- v2 v3;
run;
quit;

Numeric missing value in the SAS table some time shows as . (dot)and other times as _ (underscore). When the value of the variable are used to create categorical variable, please taking the null value into consideration. Sometimes the condition ‘ne .’ would not exclude null values but ‘gt .’ can. The following macro is used to identify the significance of the p value. There are some null value in the p-value variable

%macro sig (data =,stat=, var =);
%let i=1;
%do %while (%scan(&data., &i, ' ') ne );
%let dsn=%scan(&data., &i, ' ');
%put &dsn.;
data &dsn._&stat.;
set &dsn._&stat. ;
format sig&var. $4. prob&var.  pvalue6.4;
if Prob&var.=0.01  then sig&var.="*";
if Prob&var.=0.001   then sig&var.="**";
if Prob&var.=0.0001  then sig&var.="***";
if Prob&var.<0.0001 and Prob&var. gt .  then sig&var.="****"; /* exclude null value */
if Prob&var.<0.0001 and Prob&var. ne .  then sig&var.="****"; /* ne . doesn't work. **** shows for null p-value/
run;
%let i = %eval (&i +1);
%end;
%mend sig;
%sig (data =group1 group2, stat= anova, var= f);

SAS: Batch Macro Variable Assignment by Class and Application in Dynamic Labeling

Set up “Class” table for Proc Tabulation.  “Class” table can force missing value in the table so the layout of the combination of class variables will not change due to 0 counts of the frequency of the variables.

data class;
do rank= 1, 2, 3;
do area = '0', '1';
do gender = 'F', 'M';
output;
end;
end;
end;
run;

Output Proc Tabulate summary to data set using “Class” layout. “Layout” data set contains frequency counts by rank by area and by gender.

proc tabulate  data = data out = layout missing classdata = class;
class rank area gender ;
var count;
table rank*(area="" all), (gender all)*count*(sum="");
run;

Add customary titles by rank by area in the “Layout” data set. Assign counts and titles to macro variables.

data layout;
set layout;
length varname $11 sum $2 title1 $60 title2 $60 vartitle1 $12 vartitle2 $12;
if count_sum = . then count_sum = 0;
sum = put(count_sum, 2.);
drop _TYPE_ _PAGE_ _TABLE_;
if gender = '' then gender = 'N';
if rank = 1 then title1 = "2017 Assistant Professors/Lecturers";
if rank=2 then title1 = "2017 Associate Professors/Lecturers";
if rank =3 then title1 = "2017 Fall Professors/Senior Lecturers";
if area = 0 then title2 = "General Arts"; 
if area = 1 then title2 = "Engineering";
if area = '' then do;
title2 = "All Areas of Specialization";
varname ="rank"||trim(left(rank_cd))||gender;
vartitle1 = "rank"||trim(left(rank_cd))||'t1';
vartitle2 = "rank"||trim(left(rank_cd))||'t2';
end;
else do;
varname = "rank"||trim(left(rank_cd))||"area"||trim(left(area))||gender;
vartitle1 = "rank"||trim(left(rank_cd))||"area"||trim(left(area))||'t1';
vartitle2 = "rank"||trim(left(rank_cd))||"area"||trim(left(area))||'t2';
end;
call symput(varname, sum);
call symput (vartitle1, title1);
call symput (vartitle2, title2);
run;
options symbolgen;

Application
Print the regression line by rank by area with corresponding titles and insert counts information of total observation and by gender on the respective chart. The “Inset” statement will translate the macro variables into format like N=75(46F/29M) and put into a text box inside of the axes of the plot.

ods pdf file="c:\test.pdf";
%macro plots;
%do j = 1 %to 3;
%do i = 0 %to 1;
title1 height=14pt "&&rank&j.area&i.t1";
title2 height=12pt "&&rank&j.area&i.t2";
proc sgplot data=glm_out (where=(rank=&j. and area = "&i.")) ;
scatter x=experience y=salary / group=gender grouporder=ascending  name='plot' markerattrs=(symbol=circlefilled) ;
series x=experience y=yhat / name='predict' legendlabel='ln(Predicted Sal)' lineattrs=(color=blue ) transparency = 0.5 ;
inset "N=&&rank&j.area&i.N (&&rank&j.area&i.f.F/&&rank&j.area&i.m.M)"  / position = bottomright  BORDER;
run;
quit;
%end;
%end;
%mend plots;
%plots
Ods pdf close;

SAS: Plot Regression Line with 2 Standard Deviation

Get GLM output in data set

proc glm data=data noprint;
class rank_cd area;
model salary= Experience rank area / predicted cli; * regress SALARY against the 3 predictor variables ;
output out = glm_out Predicted = yhat R=resid lcl=lcl lclm=lclm ucl=ucl uclm=uclm rstudent=rstd student=stu dffits = infl stdr =error;
run;

Calcuate sample standard deviation using GLM output data set

proc univariate data=glm_out;
var resid;
output out = univar_out STD = sample_std_devn;
run;

Assign sample standard deviation to macro variable

data std;
set univar_out;
call symput('sstd',sample_std_devn);
run;
%put &sstd.;

Update GLM output data set with 2 std information

data glm_out;
set glm_out;
ustd_2 = yhat + 2*&sstd.;
lstd_2 = yhat - 2*&sstd.;
run;

Plot regression line with 2 standard deviation lines (upper/lower)

proc sgplot data=glm_out (where = (rank = 1 and area =1);
scatter x=experience y=salary / group=gender grouporder=ascending name='plot' markerattrs=(symbol=circlefilled) ;
series x=experience y=yhat / name='predict' legendlabel='ln(Predicted Sal)' lineattrs=(color=blue ) transparency = 0.5 ;
series x=experience y=ustd_2 / name='upper' legendlabel='2 Standard Deviation' lineattrs=(color = lightblue) transparency = 0.5;
series x=experience y=lstd_2 / name='lower' legendlabel='2 Standard Deviation' lineattrs=(color = lightblue) transparency = 0.5;
run;
quit;

SAS: Configuration

SAS configuration file sasv9.cfg can be edited to change the SAS system options at SAS initialization. Certain system options can only be configured through the .cfg file, for example, -memsize and -altlog. -memsize defines the ram size used for SAS session; -altlog defines the file directory for copy of the sas log. In the configuration file, the syntax for system option is ‘-option’ which is different from when they are used in sas programs.

The Default Configuration file
!SASROOT = C:\Program Files\SASHome\x86\SASFoundation\9.3
Default folder (where all plots and results are saved) = C:\Program Files\SASHome\x86\SASFoundation\9.3

Sometimes there are two sasv9.cfg files in separate SAS folders, but one file will be pointing to the other sasv9.cfg which has the configuration setup.
Configuration location = C:\Program Files\SASHome\x86\SASFoundation\9.3\sasv9.cfg
-config “C:\Program Files\SASHome\x86\SASFoundation\9.3\nls\en\sasv9.cfg”
Default configuration location = C:\Program Files\SASHome\x86\SASFoundation\9.3\nls\en\sasv9.cfg

Example 1: change memsize

-MEMSIZE 4G

Example 2: save a copy of the SAS log from the session to user-defined directory

-ALTLOG  "C:\users\ussername\documents\log\%Y-%m-%d_%H-%M.saslog"

SAS: Output Table to Excel Using Tagsets.excelxp

Frequently we need to output SAS results to Excel.  With tagsets.excelxp, we will have the flexibility to create multiple sheets; define worksheet name(s) and column width; contrl titles and footnotes on each sheet.

Here is the basic tageset.excel for the Excel workbook.

ods listing close;
ods tagsets.excelxp path = "" file = "" style =
options (embedded_titles='' embedded_footnotes='' orientation='' sheet_name ='' pages_fitwidth='' pages_fitheight='' autofit_height=''
width_fudge="" absolute_column_width= "");
ods tagsets.excelxp close;

If the individual worksheet has different setting, put the following  statement before the corresponding SAS procedure that generates the content for the worksheet.

ods tagsets.excelxp options (sheet_name='RespHome');

Use sheet_name and sheet_interval to generate multiple sheets by groups defined in the procedure in the same Excel workbook.

... sheet_name='#byval1' sheet_interval="bygroup" ...

sample code:

ods listing close;
ods tagsets.excelxp path = "C:\SAS\" file = "sum.xls" style = journal
options (embedded_titles='yes' embedded_footnotes='yes' orientation='portrait' pages_fitwidth='1' pages_fitheight='1' autofit_height='yes'
width_fudge=".0625" absolute_column_width= "100");
title1 'Title1';
title2 "Title2";
footnote "Prepared by First Last @dept on %sysfunc(date(),worddate.)";
ods tagsets.excelxp options (sheet_name='homeresp');
proc tabulate data=count;
class home /order= formatted ;
class reponsible;
var count;
table (home="" all="Total"), (responsible="Responsible Faculty" all="Total")*(count="")*(sum="" )*f=10.3/ box = "Home Faculty" misstext = '0.000';
run;
ods tagsets.excelxp options (sheet_name='RespHome');
proc tabulate data=count;
class responsible /order= formatted ;
class home ;
var count;
table (responsible="" all="Total"), (home="Home Faculty" all="Total")*(count="")*(sum="" )*f=10.3/ box = "Resonsible Faculty" misstext = '0.000';
run;
ods tagsets.excelxp close;

SAS Proc Tabulate: Order Class Variable

The following block of code provides a cross-tabulation by student’s home faculty and responsible faculty who students took course from.  The special treatment in the code is the two class statements used for setting order for the home faculty class variable based on format of the home faculty.  The second class statement where the responsible class variable is defined doesn’t have the “/order” option, therefore the order for responsible variable will be based on internal value.

proc tabulate data=count;
class home /order= formatted ;
class responsible;
var ffte;
table (home="Home Faculty" all), (responsible="Responsible Faculty" all)*(count="")*(sum="");
run;

SAS: Data Set Compilation through Remote Submit and Download

Sometime it is better to submit the block of codes to the server for execution when the data sets are large and multiple years of data needed to be  compiled.  The remote calculation can also free up the calculation capacity on your own PC, so you can work on something else while waiting for the server.

The following codes are used to:

  • loop through 5 reporting periods each year;
  • retrieve specific variables that meet the conditions;
  • compile data set for each year;
  • stack data sets for multiple years and generate final data set with all years records for download;
  • delete data set for each year in the workspace on the server

The caveats for remote submit are:

  • codes need to be thoroughly validated locally before submit to the server
  • use “nodetails” and “nolist” to omit output report

To remote submit the code, you first select the code block,  then right click and then select “Remote Submit…”.

%let varlist1 = var1 var2 var3 var4 var5 var6 var7 ;
proc datasets library = work nodetails nolist;
delete opsis;
run;
%macro opsis (strtyear, stopyear);
%do yr= &strtyear %to &stopyear;
%put &yr.;
%let nextyr = %eval(&yr.+1);
%let charyr = %substr(&yr.,3,2);
%let charnextyr = %substr(&nextyr.,3,2);
%put &charyr.;
DATA opsis_&charyr.;
SET opsis.uecJun&charyr. (keep=&varlist1.)
opsis.uecJul&charyr. (keep=&varlist1.)
opsis.uecNov&charyr. (keep=&varlist1.)
opsis.uecFeb&charnextyr. (keep=&varlist1.)
opsis.uecMar&charnextyr. (keep=&varlist1.);
where var1 ne 'XX' and var2 ne '' and var3= 'XXX';
proc append base=opsis data =opsis_&charyr.;
run;
%end;
%mend opsis;
%opsis (2009, 2017);
proc download data = opsis out=opsis;run;
proc datasets library=work nodetails;
delete opsis_09 opsis_10-opsis_17;
run;

SAS: Proc Freq for all Character Variables

Before processing the large data set, it is a good practice to check the levels of the categorical varaible and make sure records are assigned to the right category and missing categories are treated.

Use _character_ in the freqency procedure to apply the procedure to all character variables.  Be careful not to include any character variables that are numeric in nature, eg id variable in the data set.  Using “nlevels” option to provide a summary table of number of the levels, missing levels, non-missing levels for each categorical variable.  Then check the frequency table for the specific categorical variable for the values of the levels.

proc freq data= table1 nlevels;
tables _character_ / nocum nopercent out=freqcnt;
run;
Number of Variable Levels
Variable Label Levels Missing Levels Nonmissing Levels
gender Gender Code 4 0 4
Var2 label2 12 1 11
Gender Code
gender Frequency
F 495320
M 402399
U 3475
{NUL} 1

SAS: Sample Data Sets

SAShelp and SASuser data library contain many data sets for the base SAS usages.  But for components other than base SAS, such as SAS/EST and SAS/STAT, the examples used in the user manual are not directly in the SAShelp and SASuser libraries.  Need to use the “Help” menu in SAS to access the datasets that are referred in the these user manuals.

  1. Help Menu
  2. SAS Help and Documentation
  3. Contents Tab
  4. Learning to Use SAS
  5. Sample SAS Programs
  6. Relevant Compontent (eg: SAS/STAT, SAS ETS)
  7. Samples

VBScript: Open and Process Excel Files in a Folder and Save in another Folder

In cmd, run “cscript test.vbs” under the folder where test.vbs is saved.

Contents of test.vbs

Set objFSO = CreateObject("Scripting.FileSystemObject")
sFolder = "X:\Work\test1\"
oFolder = "X:\Work\test1\result\"
Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = True
objExcel.DisplayAlerts = False
For Each objFile In objFSO.GetFolder(sFolder).Files
Set objWorkbook = objExcel.Workbooks.Open(sFolder & objFile.Name)
objWorkbook.sheets("delete").Select
objWorkbook.sheets("delete").Delete
objWorkbook.SaveAs oFolder & objFile.Name
objWorkbook.Close
objExcel.Quit
Set objWorkbook = Nothing
Next
Set objFSO = Nothing
Set objExcel = Nothing;