Shiny - Data Visualization in R
When Microsoft Access and Excel were the main tools for business data storage and analysis, business people created their data visualizations directly in those tools. They used the Name Manager to capture inputs, set up data validation to reject impossible values, combined IF() and VLOOKUP() formulas to manipulate and merge datasets, and wrote VBA macros to automate the whole flow. This process was straightforward, user-friendly and easy to revise. However, now that big data is becoming popular and interactions among data sources are far more complicated than before, Excel has lost much of its advantage. As a result, more and more data scientists and business analysts are turning to R, an open-source statistical programming language, for data manipulation, thanks to its powerful, cutting-edge toolkits and robust, vibrant community. Once the data is cleaned and the models are built in R, people want to create their visualizations in R as well. This is where the shiny package starts to shine.
Structure of a Shiny Application
In traditional R programming, all code can live in a single script, but a Shiny app needs two additional scripts:
• A user-interface script
• A server script
The user-interface script must be named “ui.R” and it controls the appearance of the Shiny app’s inputs and outputs. On one hand, Shiny apps are interactive and get their inputs from users; for example, users can control the inputs through sliders or dropdown lists. The “ui.R” script determines where those sliders and dropdown lists are located, what their labels and choices are, and which input variables their values are assigned to. Besides the sidebar inputs, there is a title command that determines the title displayed in the app. On the other hand, “ui.R” determines which graphs are displayed in the main panel. However, the main-panel command only lists the names of the output objects. For the specifics, such as the plotting functions or the data behind them, you have to look at the server script, “server.R”, since the user-interface script only deals with the interaction with users.
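As a sketch, a user-interface script of this kind can look like the following. The input name “ceiling”, the labels and the slider range here are illustrative, not necessarily those in the downloadable files:

```r
# ui.R -- layout only: a title, a sidebar slider and a main-panel plot
library(shiny)

shinyUI(fluidPage(
  titlePanel("Capped Loss Distribution"),        # title shown in the app
  sidebarLayout(
    sidebarPanel(
      # the slider's value is available in server.R as input$ceiling
      sliderInput("ceiling", label = "Loss ceiling:",
                  min = 0, max = 3000, value = 3000)
    ),
    mainPanel(
      # only names the output object; server.R supplies its content
      plotOutput("histogram")
    )
  )
))
```

Note that “ui.R” contains no plotting code at all; it only declares where things go and which names connect them to the server script.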
The server script contains the code that the computer runs to produce what the app displays; hence the “server.R” file does the real computation for the data visualization and connects the inputs to the outputs. The inputs can be imported from Excel or files in other formats, or they can come from the user-interface script. Either way, they may need a few steps of calculation and/or transformation before display. The functions that generate the displays, whether a histogram, a scatter plot or even a map, live in this script.
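A matching server script, in the same hedged spirit (the file name comes from the text below, but the column name “loss” and the exact code in the download may differ):

```r
# server.R -- the real computation: reacts to inputs, renders outputs
library(shiny)

shinyServer(function(input, output) {
  # read the raw data once; the csv sits in the app's working directory
  losses <- read.csv("incurred losses.csv")$loss

  output$histogram <- renderPlot({
    # cap every loss at the ceiling chosen by the ui.R slider
    capped <- pmin(losses, input$ceiling)
    hist(capped, col = "green", main = "Capped Incurred Losses",
         xlab = "Loss amount")
  })
})
```

The name “histogram” on the left of `output$` must match the name passed to `plotOutput()` in “ui.R”; that pairing is the whole link between the two scripts.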
Running an App
Every Shiny visualization is launched through a function called runApp(). Its one and only argument is a folder path: the working directory that stores the files for the app. The file names of the user-interface and server scripts are not arguments of runApp(), so they must be named “ui.R” and “server.R” for runApp() to recognize them. This also means that each working directory pairs with exactly one app, because you cannot save two “ui.R” or two “server.R” scripts in one folder, nor pass two paths to one runApp() call. The official Shiny website compares runApp() to read.csv() and read.table(), but searching for an analogue myself, I just see it as a simple function whose argument must be a folder path.
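A minimal launcher script might look like this; the path is only an example and must be replaced with your own:

```r
library(shiny)

# the single argument is the folder that holds ui.R and server.R;
# substitute your own working directory for this example path
my_app <- "C:/Users/yourname/Documents/Capped Losses"
runApp(my_app)   # blocks the R session until the app is closed
```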
The commands are heavily nested and can get complicated. Therefore, I highly suggest adopting existing, well-written code at first and writing your own only after you are very familiar with this package. The instructions for the two examples below are very detailed. The other reason I chose these examples is that they can be used to solve actuarial problems. The first example is an interactive histogram where the user can change the ceiling on losses and watch the histogram of the capped loss distribution change accordingly. The second displays deductibles and severities on maps by county. The code is as easy to understand as the examples from the official website, http://shiny.rstudio.com/, but the scenarios narrow down to questions from actuarial analysis. That makes these examples a good alternative entry point for actuaries learning the shiny package.
First, download the compressed folder called “Capped Losses” from the bottom-right of this page. Then decompress it and save the files “Capped Losses.R”, “server.R” and “ui.R” in their own working directory; I call mine “Capped Losses”.
Assume a catastrophe happened in the past policy year and we have a list of incurred losses in the file “incurred losses.csv”. As you can see in the first couple of rows, the distribution is right-skewed because of the catastrophe. If we want to soften the impact of that catastrophe, we can cap the incurred losses and multiply the capped losses by a factor. To decide which ceiling gives a less skewed distribution, we can run the following R code for data visualization. First, open the “Capped Losses.R” file in R or RStudio.
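The capping itself is one line of base R. A self-contained sketch with made-up numbers (the real figures live in “incurred losses.csv”, and the factor here is hypothetical):

```r
# illustrative incurred losses, including one catastrophe-driven outlier
losses  <- c(100, 250, 400, 900, 3000)
ceiling <- 1000
factor  <- 1.05   # hypothetical factor applied after capping

capped   <- pmin(losses, ceiling)   # every loss above the ceiling becomes the ceiling
adjusted <- capped * factor
capped   # 100 250 400 900 1000
```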
The next step is to replace the assignment of “my_app” with your own file path instead of mine. The path is the working directory where you just saved the downloaded files. When you run the code as shown above, the following histogram appears.
So far the ceiling defaults to the maximum of the incurred losses, so the losses are effectively uncapped. Within the working directory, the “ui.R” file controls the sidebar, which is a slider for the ceiling. When you move the widget, the ceiling stored in the “input” list used by “server.R” changes accordingly. This change affects the histogram on the right, because the ceiling is the max argument in “server.R”’s histogram call (see the assignment of the object “breaks”). You can play around with the widget and try values smaller than $3,000. For example, with the ceiling set to $1,000, the distribution is much less skewed and might be a better base for the catastrophe factor. At the extreme, set the ceiling to $0: the histogram turns green all over, which is no surprise, because every loss is now $0 and the density at or below $0 is 1.
Again at the bottom-right of this blog page there is another compressed folder, called “Deductibles in Map”. When you decompress it, you will see four files, “Deductibles in Map.R”, “server.R”, “ui.R” and “sev vs ded.csv”, as well as a subfolder called “census-app”. The four files contain the code and internal data for the second app, and the subfolder contains the code that draws maps by county.
Assume that each county has its own deductible and that severities are calculated by dividing the sum of paid losses by the sum of claim counts for each county. When we increase a deductible, we expect the severity in the same county to decrease, because incurred losses are reduced by a larger deductible before being paid. The following R code produces a data visualization to test this hypothesis.
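The severity calculation and the hypothesis check can be sketched in a few lines of base R. The county names and figures below are made up, since the real numbers live in “sev vs ded.csv”:

```r
# illustrative claim-level data: one deductible per county
claims <- data.frame(
  county     = c("A", "A", "B", "B", "C"),
  deductible = c(250, 250, 500, 500, 1000),
  paid_loss  = c(1000, 2000, 600, 900, 400),
  claim_cnt  = c(2, 3, 2, 3, 2)
)

# severity per county = sum of paid losses / sum of claim counts
by_county <- aggregate(cbind(paid_loss, claim_cnt) ~ county + deductible,
                       data = claims, FUN = sum)
by_county$severity <- by_county$paid_loss / by_county$claim_cnt

# a negative correlation supports the hypothesis
cor(by_county$deductible, by_county$severity)
```

With these toy numbers the county severities come out to 600, 300 and 200 against deductibles of 250, 500 and 1,000, so the correlation is negative, which is the pattern the hypothesis predicts.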
The internal data “sev vs ded.csv” holds a list of severities alongside their corresponding deductibles. Besides checking the correlation between severities and deductibles, you might also want to map both and see the relationship visually through color. For that I borrow the well-written source file “helpers.R”, which defines a function called “percent_map” to handle the county data set. According to the comments in “helpers.R”, it may not work correctly with other data sets whose row order does not exactly match the order in which the maps package plots counties, but it gives you the idea of how internal data is plotted geographically.
This example is a little more complicated than the previous one, but the idea is the same. In “ui.R”, you define an input from a dropdown list. Inside the selectInput() function, you can set the label, the possible choices and the default selection, “Percent White” in this case. The input is called “cor” for later reference. In “server.R”, the text value of “cor” is used to calculate the “cor” component of the “output” list, which is then displayed by “ui.R”.
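The dropdown piece can be sketched like this; the choices listed are illustrative, merely mirroring the census example this app borrows from:

```r
# in ui.R: a dropdown whose chosen value is read in server.R as input$cor
selectInput("cor", label = "Choose a variable to correlate:",
            choices = c("Percent White", "Severity", "Deductible"),
            selected = "Percent White")

# in server.R, the corresponding output component is computed from it,
# roughly: output$cor <- renderText({ ... })   # body elided
```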
The only thing you need to do is open the “Deductibles in Map.R” file in RStudio, replace the assignment of “my_app” with your own working directory path, and run the three lines of code. The following app then shows up like magic.
The first map is colored by county severity, the higher the severity the darker the color, while the second map is colored by county deductible, again darker for larger values. We can clearly see some large deductibles in counties of Florida, Nevada and Wyoming; correspondingly, their severities appear lower, a lighter green than in the other states. Besides the visualization, we also get numbers that reflect the relationship: when you change the dropdown list, the correlation shown between the maps changes as well. Though their absolute values differ, the correlations are all negative and below -0.5, hence significantly negative.
There is more to explore in the Shiny gallery, http://shiny.rstudio.com/gallery/, and the user showcase, https://www.rstudio.com/products/shiny/shiny-user-showcase/. In summary, each app should have its own working directory, which contains a user-interface script, “ui.R”, a server script, “server.R”, and an R file that passes that directory to runApp(). There can be more files or folders for input data as additional sources. The user-interface and server scripts must be free of bugs for runApp() to run.
Are you excited about the beautiful visualizations the shiny package can generate? If so, enjoy the journey, and don't forget to post your new work in return!