PowerBI and R Integration – Data Manipulation using R script

Reference: Power BI Introduction: Working with R Scripts in Power BI Desktop — Part 3, by Robert Sheldon.

In previous post I was focusing on using R script to develop visuals in PowerBI directly. In that case, the R scripts that are integrated with the visuals don’t generate datasets for PowerBI, and the R scripts are isolated for each visuals.

The PowerBI and R scripts can also be integrated to get and transform dataset. The caveat is the R scripts can only generate and transform one dataset at a time through Power Query Editor.

Get Data:

  • File / Home Ribbon > Get Data > More … > Other > R Script > Connect
  • The “cars” dataset appears in the “Model” view.
3 examples: cars (simple R script in Source step to GET DATA); cars_mean (R script in Source step and Run R script step); cars_co (more complex R script in Source step)
  • Click “Edit Queries” on the Home Ribbon to get into the “Power Query Editor”
  • Select “Car” from the “Queries” list
  • In the “Query Settings” the “Applied Steps” will show up in the sequece
  • To edit the original R script that gets the dataset, click the * besides the “Source” step (first step) and the “R script” box will appear. If the “Query Settings” is now showing, click “View” ribbon and “Query Settings”.
Source step in Power Query Editor for table cars.

Transform Data:

  • To create another dataset that contains the mean distance by speed group.
  • Repeat the same GET DATA steps above and name the dataset “cars_mean”
  • Get into “Power Query Editor”, select “cars_mean” table.
  • Click “Run R Script” on the “Transform” ribbon and enter the R codes that are showing below.
  • Unless in the sources step, you have to reference the dataset as “dataset” rather than the name of the table, eg “cars_mean”.
Run R script step in the Power Query Editor for table cars_mean
cars_mean table in the Data View.

More Complicated R script for Source step:

  • In the Source step, you can refer the dataset with its name. In this example, it is cars.
  • The R script needs to return a table. If the script only produce a value, you need data.frame() function to convert it to a table.
  • Make sure the required R package have been installed in the R library.
  • Use library() function to attach all the required libraries.
R script in the Source step for table cars_co in Power Query Editor
cars_co in Data View

Advanced Editor:

PowerBI and R Integration – Setup and Visualization Example

Reference: https://docs.microsoft.com/en-us/power-bi/desktop-r-visuals

Notes for setup:

  • In PowerBI: File -> Options and Settings -> Options -> R scripting
  • The libraries required to run R visuals in PowerBI needs to be installed in the R library. The R codes in PowerBI can’t process install.packages(” “) function.
  • Go to c:\program Files\R\R-x.x.x\library folder to check if you already have the required R package installed.
  • In the Options for “R scripting”, click “Change temporary storage location” to point to c:\program Files\R\R-x.x.x\library folder.
Set up R in PowerBI

Note for the PowerBI and R Integration Example:

  • Use “Get Data” to get raw dataset into PoweBI, then the variables will show up in the “Fields” panel.
  • Click “R” in the “Visualizations” panel, then a grey visual placeholder will show up in the body section of the page and the “R script editor” will show up at the bottom of the page.
  • Drag or select variables in the “Fields” panel, the variables will be added to the “Values” section on the “Visualizations” panel, and the variables will be automatically incorporated in a dataframe called “dataset”. You don’t need to do additional code to create the dataframe, and don’t remove the “#” sign from line 1 to line 7 of the code.
  • Starting on line 7, create your own r script.
  • Start with require(” “) and library() to attached the required packaged for this R visual.
  • Filters are interactive with the R visual.
  • Multiple R visuals can be created on the same page and each visual has its own R codes.

Data Visualization – Sankey Diagram in PowerBI

I did the sankey diagram in PowerBI a while ago for a faculty data request regarding students flow into and out of selected interdisciplinary majors.
Both PowerBI and Tableau have apps for building sankey diagram, but personally I think sankey app for PowerBI is more user friendly.
Here is the link to the mock-up Sankey diagram for a fake program. You can filter by major, entry status, year level and you can build more programs and cohorts to it.
Sankey Example

SAS: Proc SQL Group By generate duplicate results and fix

I need to get a list of sum of award amount by year by student id and by award. The following codes generated duplicated results and a warning message.

Problem:

proc sql;
create table want as
select year, id, award, funding_type, type, sum(amount) as sum, 1 as count
from have
where (funding_type eq "A" and type = "Internal") or (funding_type eq "B")
group by year, funding_type, sisid, award;quit

“The query requires remerging summary statistics back with the original data. “

Fix: “In SAS SQL, in a query with a group by clause that includes extraneous columns on the select statement (i.e. columns not part of the group by and not derived from an aggregating function), SAS “remerges” the summary statistics back to the original data (with a note to that effect).”
Reference:
https://stackoverflow.com/questions/25538392/sas-proc-sql-returning-duplicate-values-of-group-by-order-by-variables

Because “type” variable is in the select statement but not in the group by statement, SAS needs to remerging the summary statistics back with the original data to get information on the “type” variable. In the following codes, I added “type” to the group by statement and the warning message was gone.

proc sql;
create table want as
select year, id, award, funding_type, type, sum(amount) as sum, 1 as count
from funding
where (funding_type eq "A" and type = "Internal") or (funding_type eq "B")
group by year, funding_type, type, id, award;
quit;

Proc SQL: Update with other table

Update statement in Data step:

Syntax:

Data master;
update master  transaction;
by key1 key2;
run;

Limitation:

  • The master data set should be sorted by the by group with no duplicated grouping id.
  • The transaction data set should be sorted by the by group with no duplicated grouping id.
  • Transaction data set should have just the column that contains the updated information.

Update using Proc SQL Update statement:

Syntax:

proc sql;
update table1 as t1
set major=(select major from table2 as t2 where t1.id = t2.id and t1.year=t2.year)
where exists (select 1 from table2 t2 where t1.id = t2.id and t1.year=t2.year);
quit;

Limitation:

  • Can’t use join statement. Can only use nested Select statement.
  • Code is a little complicated to comprehend.

SAS: Identify Lag and Lead Status by Row

For character variable.

data want;
set have;
by id;
set  have (firstobs =2 keep = index0 rename= (index0= index1))
     have (obs=1 drop = _all_ );
indexlag1 = ifc( first.id, '', lag1(index0));
index1 = ifc( last.id, '', index1);
run;

For numeric variable.

data want;
set have;
by id;
set  have (firstobs =2 keep = index0 rename= (index0= index1))
     have (obs=1 drop = _all_ );
indexlag1 = ifn( first.id, (.), lag1(index0));
index1 = ifn( last.id, (.) , index1);
run;

SAS: Identify Row Level Changes and Tagging

Sort the dataset first and identify the change using lag(var).
The example below shows if the student cumulative credits drops from previous term, the count of degree will increase by 1.

/* identify the row observation that the change took place */
data data2;
set data1;
by sisid term; 
if first.sisid then deg =1;
else if sisid = lag1(sisid) and cumm_cr < lag1(cumm_cr) then deg = 2;
run;
/* tagging the observations associated with the change */
data data3 ;
set data2;
by sisid;
retain temp;
if first.sisid then do; temp = 0; end;
if deg ne . then temp = deg;
else if deg eq . then deg= temp;
run;

Updated to

data data2;
set data1;
by sisid term; 
retain deg;
if first.sisid then deg =1;
else if sisid = lag1(sisid) and cumm_cr < lag1(cumm_cr) then deg = deg +1;
run;

Access VBA to Customize Report

In Microsoft Access, VBA code can be used to customize Access reports. Various subroutines can be embedded in the components of the report using “Private Sub”.
The following example applies to the format function/procedure of the Categoryfooter section of the report, so the notes(label) can be customized, showing or hiding the note contents fore specific group.
View of the report design:

tab012

The components in the report are objects that be used for VBA coding. The name of the components can be checked through the property window. The name of the components needs to match the object names in the VBA code window.

View of the project explorer:
tab010

View of the object and procedure selector in the code window:
tab011

In order for the VBA code to be effective, the code section are uniquely named as object_procedure. When you move the cursor from section to section in the code window, the Object and Procedure selector at the top of the code window will always reflect the object and procedure of the section where your cursor is.

Private Sub Categroyfooter_Format(Cancel As Integer, FormatCount As Integer)
If Me.level1.Value = "A" Then
    Me.CategoryFooter.Visible = True
    Me.catAfootnote.Visible = True
    Me.catAfootnote.Caption = "*Note:For 18/19, the admission information are as of the reporting date. "
ElseIf Me.level1.Value = "C" Then
    Me.CategoryFooter.Visible = True
    Me.catAfootnote.Visible = True
    Me.catAfootnote.Caption = "*Note:For 18/19, the enrolment FTEs are as of the reporting date."
Else
    Me.catAfootnote.Visible = False
    Me.CategoryFooter.Visible = False
    End If
End Sub

Tableau: Ratios

Below is a summary table by census topic and sub-topic vs. by geographic area. I want to show instead of the sum, the percentage.
tab007
Select Analysis from the menu. In the drop down menu, select Create Calculated Field…. Create “Percent of Total” with the following formula.
tab008
Drop the “Percent of Total” from Measures to the Text icon in Marks.
tab009