Portfolio returns with R & flexdashboard

Oct 20, 2017 R dashboards finance

Yahoo Finance recently changed their API for downloading historical stock prices. As result, the Excel dashboard I had built to track my portfolio stopped working. Yahoo now requires an established browser session and the associated cookie to download data. Loathe to reacquaint myself with VBA, I decided to take the opportunity to build a new dashboard with R. Thankfully, the R community had already addressed the cookie issue. The tidyquant package by Matt Dancho and Davis Vaughan (available on CRAN) simplifies the query process and features integration with the tidyverse (a fundamental element of my typical workflow in R). The rest of the heavy lifting was done with the flexdashboard packge from RStudio.

The dashboard uses a vertical scrolling layout with two stacked boxes on the right and a sidebar on the left. In the sidebar, input controls allow the user to enter the ticker symbol, purchase date, number of shares, and purchase price for up to nine securities (though any number could be included). On the right side of the dashboard, the top box contains a table that summarizes the performance of each individual security, while the bottom box displays a simple line graph of the portfolio’s performance over time. A user-selected benchmark is displayed for comparison.

To load the required libraries for the dashboard, the code chunk is executed with the ‘setup’ flag. Based on my limited experience using the flexdashboard package, it seems to treat each layout container like a separate environment. Variables defined in one container are not available in others. Anything loaded in the ‘setup’ code chunk, however, is available to every container on the dashboard. The relevant flag is included here (commented out) to illustrate correct usage.

#```{r setup}

Summary Results Table

The upper-right container on the dashboard presents a table of summary financial metrics, one row for each portfolio position. To efficiently query the Yahoo Finance API for this data, the user supplied inputs need to be transformed into vectors that can be passed as arguments to a function. The Shiny function, reactiveValuesToList(), converts all user-supplied inputs into a single list. The names of the individual list items can then be used to categorize and sort the four groups of inputs: ticker symbols, purchase dates, number of shares, and purchase price. The following code is an example of this process for the ticker symbols.

input_list <- reactiveValuesToList(input)
#gather all the ticker symbol inputs into a single list
input_tickers <- input_list[grep('ticker', names(input_list))]

#filter out blank values (use !is.na) for numeric inputs)
input_tickers <- input_tickers[input_tickers != '']

#order the list by ticker name (e.g. ticker1, ticker2, etc)
input_tickers <- input_tickers[order(names(input_tickers))]

#convert list to vector
input_tickers <- unlist(input_tickers)

Using tq_get() from the tidyquant package, the newly created input vectors can now be used to query the Yahoo Finance API for historical stock price data. A simple function is defined (return_func) that combines this API call with some basic financial summary metric calculations. The pmap_df() function from the purrr package is then used to iterate return_func() over the four vectors of inputs (ticker symbol, purchase price, purchase date, and number of shares), producing a tidy table of summary results.

return_func <- function(ticker, shares, price, pdate){
  tmp <- tq_get(ticker, get = "stock.prices", from = pdate) %>%
    filter(date == min(date) | date == max(date)) %>%
    arrange(date) %>%
    summarise(ticker = ticker, shares = shares,
              current_price = close[2],
              #if purchase price = 0, the closing price on the date of purchase is used
              purchase_price = ifelse(y == 0, close[1], y),
              initial_value = shares*purchase_price,
              current_value = shares*close[2],
              #adjusted initial value that accounts for dividends and splits
              initial_value_adj = shares*(adjusted[1] - (close[1] - purchase_price)),
              #total return generated by this portolfio position
              total_return = current_value - initial_value_adj,
              #the return generated by the appreciation (or depreciation) in price
              price_return = current_value - initial_value,
              #the return from dividends (and occasionally splits)
              div_return = total_return - price_return,
              #number of days between opening first position and today
              delta_days = as.numeric(max(date) - min(date)))
return_dat <- pmap_df(list(input_tickers, input_shares, input_prices, input_dates), return_func)

The numeric values in the return_dat table require some formatting before being displayed on a dashboard. Percentages and dollar values, for instance, require the addition of character symbols to make reading easier. The dollar() function from the scales package is used to convert to dollar values. The simple function defined below is used to format numeric values with two decimal places and a percent(%) sign.

percent_func <- function(x, digits = 1, format = 'f') {
  paste0(formatC(100 * x, format = format, digits = digits), "%")

The data set is now grouped by portfolio position (ticker symbol), the financial measures are summarized, and the values are formatted for display.

return_dat_disp <- return_dat %>%
  group_by(ticker) %>%
  #this additional summarise aggregates multiple purchases of the same security
  summarise(total_shares = sum(shares),
      current_price = min(current_price),
      current_value = sum(current_value),
      initial_value = sum(initial_value),
      initial_value_adj = sum(initial_value_adj),
      total_return = sum(total_return), 
      price_return = sum(price_return),
      div_return = sum(div_return),
      delta_days = sum(delta_days*shares)/sum(shares)) %>%
  ungroup() %>%
  mutate(weight = percent_func(current_value/sum(current_value)),
      `purchase price` = dollar(initial_value/total_shares),
      `current price` = dollar(current_price),
      `current value` = dollar(round(current_value, digits = 0)),
      `total return` = dollar(round(total_return, digits = 0)),
      `% total return` = percent_func(total_return/initial_value),
      `% price return` = percent_func(price_return/initial_value),
      `% div return` = percent_func(div_return/initial_value),
      #AR = annualized return
      `% div AR` = percent_func((1 + (div_return/initial_value))^(365/delta_days) - 1),
      `% total AR` = percent_func((1 + (total_return/initial_value))^(365/delta_days) - 1)) %>%
  select(ticker, weight, `purchase price`, `current price`, `current value`, `total return`,
      `% total return`, `% price return`, `% div return`, `% total AR`, `% div AR`)

To complete the table of summary measures, grand totals are calculated. The resulting table is then joined to the table of position-level returns and displayed in the upper-right layout container via Shiny’s renderTable() function. A few additional arguments passed to that function provide some final formatting. Values are centered (align = ‘c’), rows are striped (striped = TRUE) and highlighting is turned on (hover = TRUE). Check out the Rmd file in the github repo for this project to see the exact implementation of the renderTable() function.

sum_return_dat <- return_dat %>%
  summarise(sum_current_value = sum(current_value),
      sum_total_return = sum(total_return),
      sum_pct_total_return = sum(total_return)/sum(initial_value),
      sum_pct_price_return = sum(price_return)/sum(initial_value),
      sum_pct_div_return = sum(div_return)/sum(initial_value),
      sum_pct_div_return_ann = (1 + sum(div_return)/sum(initial_value))^(365/max(delta_days)) - 1,
      sum_pct_total_return_ann = (1 + sum(total_return)/sum(initial_value))^(365/max(delta_days)) - 1) %>%
  mutate(ticker = 'TOTAL', weight = '100%',
      `purchase price` = '',
      `current price` = '',
      `current value` = dollar(round(sum_current_value, digits = 0)), 
      `total return` = dollar(round(sum_total_return, digits = 0)),
      `% total return` = percent_func(sum_pct_total_return),
      `% price return` = percent_func(sum_pct_price_return),
      `% div return` = percent_func(sum_pct_div_return),
      `% div AR` = percent_func(sum_pct_div_return_ann),
      `% total AR` = percent_func(sum_pct_total_return_ann)) %>%
  select(ticker, weight, `purchase price`, `current price`, `current value`, `total return`,
      `% total return`, `% price return`, `% div return`, `% total AR`, `% div AR`)
display_tbl <- bind_rows(return_dat_disp, sum_return_dat)

Line Chart of Portfolio Performance vs Benchmark

To compare the portfolio’s aggregate performance to a benchmark, a line chart is generated in the bottom-right layout container. The benchmark is user-selected and requires an additional input control. I made many attempts to fit this input control in the container with the chart, but the results were always less than ideal. The input drop-down was too big, the chart got squished, or the input control wasn’t visible at all. The flexdashboard package certainly simplifies the dashboard creation process in R, but it doesn’t allow for the precise control of many elements of the dashboard’s fomatting. Speed and simplicity come at a cost. In my opinion, the cost is worth the benefit in this case. The inability to fit and control the formatting of an input control inside a layout container that already contains a chart is a small price to pay for flexdashboard’s ease of use and flexibility.

Fortunately, the problem of the additional input control has a simple remedy. Put it in a really narrow layout container between the summary results table and the line chart. The following code uses some css to achieve the desired formatting (i.e. as narrow as possible).

div(style="display: inline-block;vertical-align:top; width: 100px; height: 20px",paste('Benchmark:'))
div(style="display: inline-block;vertical-align:top; width: 100px; height: 20px",
    selectInput('benchmark_id', NULL, choices =  c('SPY', 'SHY', 'IWM', 'BND', 'VT'), selectize = FALSE, size = 1))

Because flexdashboard appears to confine objects to the containers in which they are defined, the four vectors of Yahoo API inputs need to be reconstructed. The code is identical to that used above. A new function, plot_return_func(), accepts the four vectors (ticker symbol, purchase date & price, and number of shares), queries Yahoo for the historical data, and then calculates the total return for each position by day.

  plot_return_func <- function(ticker, shares, price, pdate){
    tmp <- tq_get(ticker, get = "stock.prices", from = pdate) %>%
      mutate(ticker = ticker, shares = shares, purchase_price = price, flag = 'portfolio') %>%
      arrange(date) %>%
      mutate(purchase_price = ifelse(purchase_price == 0, close[1], purchase_price),
             initial_value = shares*purchase_price,
             initial_adjustment = close[1] - adjusted[1],
             adjustment = initial_adjustment - (close - adjusted),
             total_return = shares*(close + adjustment) - initial_value,
             pct_total_return = total_return/initial_value)
  plot_dat <- pmap_df(list(input_tickers, input_shares, input_prices, input_dates), plot_return_func)

To ensure a fair comparison of the porfolilio’s performance to that of the benchmark, the benchmark must be adjusted when new investments are made. If a new position is established or an existing position receives additional contributions, a matching investment is applied to the benchmark. To accomplish this, changes in the portfolio’s initial value (the purchase price multiplied by the number of shares purchased) are summarised by date. The delta values are then applied as new contributions to the benchmark position on the dates when the portfolio is adjusted.

#summarise the portflio's initial value by date
initial_values <- plot_dat %>%
  group_by(date) %>%
  summarise(initial_value = sum(initial_value))

bench_shares <- tq_get(input$benchmark_id, get = "stock.prices", from = min(input_dates)) %>%
  #join the daily sum of the portfolio's investments to the benchmark's price data
  inner_join(initial_values, by = 'date') %>%
  #reduce data set to include only the dates in which new investments are made
  filter(date %in% as.Date(input_dates, origin = '1970-01-01')) %>%
  group_by(date) %>%
  #calculate the number of new benchmark shares purchased on each date
  mutate(total_shares = initial_value/close,
         delta_shares = ifelse(is.na(total_shares - lag(total_shares)), total_shares, 
                          total_shares - lag(total_shares))) %>%
  #extract the numbers of new benchmark shares as a numeric vector

Similar to plot_return_func(), defined above, bench_return_func() is used to query Yahoo Finance for the benchmark price data and calculate the total returns by day. The variable extracted above (delta_shares) is used as the ‘number of shares’ input vector. The date inputs remain the same, though the unique() function is used to return only the distinct values. This eliminates unnecessary looping if multiple portfolio investments are made on the same day. The ticker input vector is simply the user-supplied value repeated for the length of the dates vector.

bench_return_func <- function(ticker, shares, pdate){
  tmp <- tq_get(ticker, get = "stock.prices", from = pdate) %>%
    arrange(date) %>%
    mutate(ticker = ticker, shares = shares, flag = 'benchmark',
           purchase_price = close[1],
           initial_value = (shares*purchase_price),
           initial_adjustment = close[1] - adjusted[1],
           adjustment = initial_adjustment - (close - adjusted),
           total_return = shares*(close + adjustment) - initial_value,
           pct_total_return = total_return/initial_value) %>%

#create vector of distinct dates when additional investments were made
input_dates <- unique(input_dates)
bench_dat <- pmap_df(list(rep(input$benchmark_id, length(input_dates)), bench_shares, input_dates),

plot_dat <- bind_rows(plot_dat, bench_dat)

The last step is to create the line chart that will be displayed in the lower-right layout container. This is a pretty straightforward ggplot() call, though one final transformational step is required. The data is grouped by flag (‘portfolio’ or ‘benchmark’) and summarised. Some additional formatting is applied via a ggplot theme (I like the simplicity of ‘theme-tufte’), and the rest is basic text formatting for the titles.

plot_dat %>%
  group_by(flag, date) %>%
  summarize(total_return = sum(total_return),
            initial_value = sum(initial_value),
            pct_total_return = total_return/initial_value) %>%
  ggplot(aes(x = date, y = pct_total_return, color = flag)) + geom_line() +
  ggtitle('Portfolio total return vs benchmark') +
  theme_tufte(base_family = 'Tahoma') +
  theme(panel.grid.major = element_line(size = 0.1, color = "grey"),
        axis.text.y = element_text(size = 10),
        axis.text.x = element_text(size = 10),
        axis.title.x = element_blank(), axis.title.y=element_blank(),
        plot.title = element_text(hjust = 0.5, face = 'bold', size = 18),
        legend.position = 'left', legend.title = element_blank(),
        legend.text = element_text(size = 12)) +
  scale_y_continuous(labels = scales::percent, position = 'right')

To view the working dashboard, check out the test version hosted on shinyapps. Instructions and the complete R markdown file for creating the dashboard can be found in the github repo for this project. To track your own portfolio, just copy the Rmd file and adjust the default input values to match your portfolio positions. If you find any issues, please let me know via github, email, or in the comments below. I am especially interested if anyone can comment on my assumption regarding the lack of object sharing between layout containers. Having to query the Yahoo API twice (once for each container) is a drag on performance that I would like to amelioriate if possible.