PowerBI and R Integration – Setup and Visualization Example

Reference: https://docs.microsoft.com/en-us/power-bi/desktop-r-visuals

Notes for setup:

  • In PowerBI: File -> Options and Settings -> Options -> R scripting
  • The libraries required to run R visuals in PowerBI needs to be installed in the R library. The R codes in PowerBI can’t process install.packages(” “) function.
  • Go to c:\program Files\R\R-x.x.x\library folder to check if you already have the required R package installed.
  • In the Options for “R scripting”, click “Change temporary storage location” to point to c:\program Files\R\R-x.x.x\library folder.
Set up R in PowerBI

Note for the PowerBI and R Integration Example:

  • Use “Get Data” to get raw dataset into PoweBI, then the variables will show up in the “Fields” panel.
  • Click “R” in the “Visualizations” panel, then a grey visual placeholder will show up in the body section of the page and the “R script editor” will show up at the bottom of the page.
  • Drag or select variables in the “Fields” panel, the variables will be added to the “Values” section on the “Visualizations” panel, and the variables will be automatically incorporated in a dataframe called “dataset”. You don’t need to do additional code to create the dataframe, and don’t remove the “#” sign from line 1 to line 7 of the code.
  • Starting on line 7, create your own r script.
  • Start with require(” “) and library() to attached the required packaged for this R visual.
  • Filters are interactive with the R visual.
  • Multiple R visuals can be created on the same page and each visual has its own R codes.

Data Visualization – Sankey Diagram in PowerBI

I did the sankey diagram in PowerBI a while ago for a faculty data request regarding students flow into and out of selected interdisciplinary majors.
Both PowerBI and Tableau have apps for building sankey diagram, but personally I think sankey app for PowerBI is more user friendly.
Here is the link to the mock-up Sankey diagram for a fake program. You can filter by major, entry status, year level and you can build more programs and cohorts to it.
Sankey Example

SAS: Proc SQL Group By generate duplicate results and fix

I need to get a list of sum of award amount by year by student id and by award. The following codes generated duplicated results and a warning message.

Problem:

proc sql;
create table want as
select year, id, award, funding_type, type, sum(amount) as sum, 1 as count
from have
where (funding_type eq "A" and type = "Internal") or (funding_type eq "B")
group by year, funding_type, sisid, award;quit

“The query requires remerging summary statistics back with the original data. “

Fix: “In SAS SQL, in a query with a group by clause that includes extraneous columns on the select statement (i.e. columns not part of the group by and not derived from an aggregating function), SAS “remerges” the summary statistics back to the original data (with a note to that effect).”
Reference:
https://stackoverflow.com/questions/25538392/sas-proc-sql-returning-duplicate-values-of-group-by-order-by-variables

Because “type” variable is in the select statement but not in the group by statement, SAS needs to remerging the summary statistics back with the original data to get information on the “type” variable. In the following codes, I added “type” to the group by statement and the warning message was gone.

proc sql;
create table want as
select year, id, award, funding_type, type, sum(amount) as sum, 1 as count
from funding
where (funding_type eq "A" and type = "Internal") or (funding_type eq "B")
group by year, funding_type, type, id, award;
quit;

Proc SQL: Update with other table

Update statement in Data step:

Syntax:

Data master;
update master  transaction;
by key1 key2;
run;

Limitation:

  • The master data set should be sorted by the by group with no duplicated grouping id.
  • The transaction data set should be sorted by the by group with no duplicated grouping id.
  • Transaction data set should have just the column that contains the updated information.

Update using Proc SQL Update statement:

Syntax:

proc sql;
update table1 as t1
set major=(select major from table2 as t2 where t1.id = t2.id and t1.year=t2.year)
where exists (select 1 from table2 t2 where t1.id = t2.id and t1.year=t2.year);
quit;

Limitation:

  • Can’t use join statement. Can only use nested Select statement.
  • Code is a little complicated to comprehend.

SAS: Identify Lag and Lead Status by Row

For character variable.

data want;
set have;
by id;
set  have (firstobs =2 keep = index0 rename= (index0= index1))
     have (obs=1 drop = _all_ );
indexlag1 = ifc( first.id, '', lag1(index0));
index1 = ifc( last.id, '', index1);
run;

For numeric variable.

data want;
set have;
by id;
set  have (firstobs =2 keep = index0 rename= (index0= index1))
     have (obs=1 drop = _all_ );
indexlag1 = ifn( first.id, (.), lag1(index0));
index1 = ifn( last.id, (.) , index1);
run;

SAS: Identify Row Level Changes and Tagging

Sort the dataset first and identify the change using lag(var).
The example below shows if the student cumulative credits drops from previous term, the count of degree will increase by 1.

/* identify the row observation that the change took place */
data data2;
set data1;
by sisid term; 
if first.sisid then deg =1;
else if sisid = lag1(sisid) and cumm_cr < lag1(cumm_cr) then deg = 2;
run;
/* tagging the observations associated with the change */
data data3 ;
set data2;
by sisid;
retain temp;
if first.sisid then do; temp = 0; end;
if deg ne . then temp = deg;
else if deg eq . then deg= temp;
run;

Updated to

data data2;
set data1;
by sisid term; 
retain deg;
if first.sisid then deg =1;
else if sisid = lag1(sisid) and cumm_cr < lag1(cumm_cr) then deg = deg +1;
run;

Access VBA to Customize Report

In Microsoft Access, VBA code can be used to customize Access reports. Various subroutines can be embedded in the components of the report using “Private Sub”.
The following example applies to the format function/procedure of the Categoryfooter section of the report, so the notes(label) can be customized, showing or hiding the note contents fore specific group.
View of the report design:

tab012

The components in the report are objects that be used for VBA coding. The name of the components can be checked through the property window. The name of the components needs to match the object names in the VBA code window.

View of the project explorer:
tab010

View of the object and procedure selector in the code window:
tab011

In order for the VBA code to be effective, the code section are uniquely named as object_procedure. When you move the cursor from section to section in the code window, the Object and Procedure selector at the top of the code window will always reflect the object and procedure of the section where your cursor is.

Private Sub Categroyfooter_Format(Cancel As Integer, FormatCount As Integer)
If Me.level1.Value = "A" Then
    Me.CategoryFooter.Visible = True
    Me.catAfootnote.Visible = True
    Me.catAfootnote.Caption = "*Note:For 18/19, the admission information are as of the reporting date. "
ElseIf Me.level1.Value = "C" Then
    Me.CategoryFooter.Visible = True
    Me.catAfootnote.Visible = True
    Me.catAfootnote.Caption = "*Note:For 18/19, the enrolment FTEs are as of the reporting date."
Else
    Me.catAfootnote.Visible = False
    Me.CategoryFooter.Visible = False
    End If
End Sub

Tableau: Ratios

Below is a summary table by census topic and sub-topic vs. by geographic area. I want to show instead of the sum, the percentage.
tab007
Select Analysis from the menu. In the drop down menu, select Create Calculated Field…. Create “Percent of Total” with the following formula.
tab008
Drop the “Percent of Total” from Measures to the Text icon in Marks.
tab009

Tableau: Hierarchical Filter

  1. Organize data into hierarchy by drag dimension to other dimension. In the example, the Topic is a hierarchy structure, with L1Topic>L2Topic>Item.
    tab001
  2. Hierarchy structure can be used for Conditional filter. In the following examples, when the L1Topic is selected for Mobility or Labour main topic respectively, the L2Topic automatically shows only the associated sub-topics.
    tab003tab002
  3. Sort field by value of another field. Before sorting the L2Topic and Item fields, the rows are in alphabetical order as default.
    tab005
    Select sort for the field and choose sort by field, select Item ID as field name and choose sum as aggregation method.
    tab004
    The L2Topic and Item are now ordered by Item ID which doesn’t have to show in the result table.
    tab006

4. Hierarchical Filter Configuration

  • In the filter, select only relevant values for the hierarchical filter to take effect.
  • When data source are linked, the “Only Relevant Values”, “All Values in Hierarchy”, and “All Values in Database” may disappear, then you will not be able to use the hierarchical filter.