The Promise of Learning Analytics: Micro-level analytics

The promise of learning analytics is a broad concept. Here, I would like to use network and word correlation analysis to demonstrate micro-level analytics, which involves finer-grained process data for individual learners, and to answer a simple question: “what kinds of learning are we really able to track with LA?” In this context, micro-level analytics is used as a technology of epistemology, and entails collaborative efforts among educators and learners.

Generally speaking, a well-designed course encourages both individual accomplishment and group knowledge construction. The core of the rich learning data that we can harness lies in conversations and interactions. In the following, we demonstrate an example of micro-level learning analytics. We employed a community detection method to identify groups of nodes that were more closely connected to each other than to the rest of the network, and used a natural language processing approach to examine the quality of learning conversations.

Networks often have clusters or communities of nodes that are more densely connected to each other than to the rest of the network. A community detection algorithm identifies subsets of the network that are more connected within than to the rest of the network. Let’s harness the learner interaction data in a course to identify peripheral communities, if there are any. Each node represents a student in the class, and the edges indicate interactions, weighted by how often students responded to one another.
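As a rough illustration of what “more connected within than to the rest of the network” means, the sketch below scores a given grouping with modularity, a quantity that many community detection methods try to maximize. It is a base R sketch and not the method used for the diagrams in this post; the function name `modularity_score` and the six-student toy network are assumptions made for the example.

```r
# Modularity of a partition of an undirected network.
# adj: symmetric adjacency matrix (entries = number of interactions)
# membership: community label for each node, in the same order as adj
modularity_score <- function(adj, membership) {
  two_m <- sum(adj)                      # every edge is counted twice
  degree <- rowSums(adj)
  expected <- outer(degree, degree) / two_m
  same_group <- outer(membership, membership, "==")
  sum((adj - expected)[same_group]) / two_m
}

# Hypothetical toy network: two triangles joined by one edge (S3 - S4)
students <- paste0("S", 1:6)
adj <- matrix(0, 6, 6, dimnames = list(students, students))
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 6))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1                   # make the matrix symmetric

two_groups <- modularity_score(adj, c(1, 1, 1, 2, 2, 2))
one_group  <- modularity_score(adj, rep(1, 6))
```

Splitting the toy network into its two triangles scores higher than lumping everyone into one group, which is the sense in which a detected community is “more connected within.”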







For the sake of this demonstration, we applied two different methods for community detection. Although a few nodes were assigned to different clusters by the two methods, the peripheral subset (S1 and S11) pops out consistently in both diagrams. The diagram evolves as the dynamics of learner interaction shift. With this information, faculty can easily tell whether there are ‘isolated’ groups or peripheral nodes. If there are any, they can further explore the possible factors that contribute to such a pattern, and make data-informed interventions if needed.

An even better way to understand the content of each cluster is to combine it with text analysis. To get a better understanding of the numerous relationships that exist, we can use a network graph to depict word correlations. Let’s take a look at networks of words where the correlation is fairly high (> .70). The first graph, derived from the entire class, shows a few clusters of words that appear together more frequently than others. For instance, one cluster shows that education, human, resources, and a few other terms are more likely to appear together than not. This type of graph provides a great starting point for finding content relationships within text. The second and third network graphs represent word relationships derived from the conversations contributed by S1 and S11, respectively.
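To make the word correlation step concrete, here is a minimal base R sketch. It assumes each post is treated as one document, marks whether each word appears in each post, and keeps word pairs whose correlation exceeds .70. The function name `word_pairs` and the four toy posts are hypothetical, and the original graphs may have been built with a different text-mining toolkit.

```r
# Find pairs of words that tend to appear in the same posts.
word_pairs <- function(posts, threshold = 0.70) {
  # Split each post into its set of distinct lowercase words
  tokens <- lapply(strsplit(tolower(posts), "[^a-z]+"),
                   function(w) unique(w[nzchar(w)]))
  vocab <- sort(unique(unlist(tokens)))
  # Rows = posts, columns = words, 1 if the word occurs in the post
  dtm <- t(vapply(tokens, function(w) as.numeric(vocab %in% w),
                  numeric(length(vocab))))
  colnames(dtm) <- vocab
  # Words present in every post (or none) have no variance; drop them
  dtm <- dtm[, apply(dtm, 2, sd) > 0, drop = FALSE]
  r <- cor(dtm)
  idx <- which(upper.tri(r) & r > threshold, arr.ind = TRUE)
  data.frame(word1 = colnames(r)[idx[, "row"]],
             word2 = colnames(r)[idx[, "col"]],
             correlation = r[idx],
             stringsAsFactors = FALSE)
}

# Hypothetical discussion posts
posts <- c("education human resources",
           "human resources education policy",
           "network analysis",
           "network analysis policy")
pairs <- word_pairs(posts)
```

Each row of `pairs` can serve as an edge in a word network graph, with the correlation as the edge weight.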







Now, back to the topic, the Promise of Learning Analytics: we must not ignore the human factor in algorithms. In order for educators to provide proper interventions, and for learners to follow guidance and achieve desirable actions/behaviors, both educators and learners must be part of the process. They need to be trained and equipped with clear information as to what types of learning data were harnessed, and how the results were derived.


Text Mining: Word Relationships.

The Promise of Learning Analytics. (2014, June 13).

Buckingham Shum, S. (2012). Learning analytics: Policy Brief. Moscow: UNESCO Institute for Information Technologies in Education.

Tailored dashboards to suit instructional needs

Under the assumption that instructors may use online discussion with different focuses to meet instructional needs, we tailor diagrams to effectively present the core elements of the same data and help faculty make informed interventions to engage students in discussion activities. For instance, some instructors would like to promote discussion interactions among students. They may choose not to provide answers/feedback on individuals’ postings, and instead encourage or specifically require students to read and comment on peers’ threads. Other instructors tend to use online discussion as a means for students to share their written reflections on a topic/concept, and then provide feedback on individual students’ work; all communications are intentionally designed to be shared with the entire class.

In the following, we will demonstrate how we can tailor the visual representation of the same data to help an instructor address a particular pedagogical concern.

Scenario One:

The instructor posted a discussion topic each week. The requirement for complete participation was twofold: 1) post an initial thread; 2) comment on at least one peer’s posting. In order to effectively facilitate each discussion topic, the instructor would like to see: 1) how students interacted with one another; 2) who did not provide any feedback on peers’ postings; 3) whether there were popular threads that received many replies; 4) whether the amount of student interaction differed each week.

This diagram shows that S11 posted an initial thread and received one reply, but did not interact with the rest of the class. Depending on the nature of the discussion topic and the circumstances, the instructor can use this information to decide whether it is necessary to ‘nudge’ or send a ‘reminder’ to S11.


This diagram suggests that S8 was the most active student in this discussion topic. S8 posted a well-written initial thread, which received a number of replies, and also provided feedback to peers’ postings.
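The first three questions in this scenario can also be answered with simple counts. The sketch below assumes a reply log with one row per reply (who replied to whose thread); the column names `from` and `to`, the roster, and the data are hypothetical.

```r
# Hypothetical reply log: one row per reply, `from` replied to `to`'s thread
roster  <- c("S1", "S2", "S3", "S4")
replies <- data.frame(from = c("S1", "S3", "S2"),
                      to   = c("S2", "S2", "S1"),
                      stringsAsFactors = FALSE)

# Count replies given and received, keeping students with zero activity
given    <- table(factor(replies$from, levels = roster))
received <- table(factor(replies$to,   levels = roster))

summary_tbl <- data.frame(student  = roster,
                          given    = as.integer(given),
                          received = as.integer(received),
                          stringsAsFactors = FALSE)

# Who gave no feedback to peers, and whose thread drew the most replies?
no_feedback_given <- summary_tbl$student[summary_tbl$given == 0]
most_replied_to   <- summary_tbl$student[which.max(summary_tbl$received)]
```

Using the roster as the factor levels is a deliberate choice: students who never replied still appear in the table with a count of zero instead of silently dropping out.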


Boxplots indicate that there were differences in the number of interactions between Discussion 2 and Discussion 1, and between Discussion 4 and Discussion 1.

  • The top and bottom lines of the rectangle are the 3rd and 1st quartiles (Q3 and Q1), respectively. The length of the rectangle from top to bottom is the interquartile range (IQR).
  • The line in the middle of the rectangle is the median (or the 2nd quartile, Q2).
  • The top whisker denotes the maximum value or the 3rd quartile plus 1.5 times the interquartile range (Q3 + 1.5*IQR), whichever is smaller.
  • The bottom whisker denotes either the minimum value or the 1st quartile minus 1.5 times the interquartile range (Q1 – 1.5*IQR), whichever is larger.
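The boxplot quantities described above can be computed directly. This is a minimal sketch that follows the definitions in the text; the function name `box_summary` and the interaction counts are hypothetical.

```r
# Quartiles, IQR and whisker limits as defined in the text
box_summary <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  c(q1 = q[1], median = q[2], q3 = q[3], iqr = iqr,
    top_whisker    = min(max(x), q[3] + 1.5 * iqr),
    bottom_whisker = max(min(x), q[1] - 1.5 * iqr))
}

# Hypothetical number of interactions per student in one discussion
interactions <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)
s <- box_summary(interactions)
```

Here the value 30 lies above the top whisker limit, so it would be drawn as an outlier. Note that R’s own `boxplot()` ends each whisker at the most extreme data point inside these limits, so the drawn whisker can be shorter than the limit computed here.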

Scenario Two:

The instructor used online discussion as a platform for students to share their written reflections with the class. The instructor requires students to post their essays to the discussion with no expectation of student-to-student interaction, only instructor-to-student interaction. In order to effectively facilitate the activity, the instructor would like to know: 1) who has not submitted his/her work; 2) which submissions have not received a reply and still need feedback.

The diagram shows that the instructor provided feedback on most students’ assignments. S20 and S26 did not post their work. S31 submitted work, but his/her submission did not receive a comment. S5 replied to a peer’s thread, but did not receive comments from the instructor.
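The two questions in this scenario reduce to set differences. A minimal sketch, assuming we have the class roster, the list of students who posted, and the list of students the instructor replied to (all three vectors below are hypothetical):

```r
# Hypothetical inputs for one reflection assignment
roster                <- c("S1", "S20", "S26", "S31")
submitted             <- c("S1", "S31")
instructor_replied_to <- c("S1")

# 1) Who has not submitted his/her work?
not_submitted <- setdiff(roster, submitted)

# 2) Which submissions still need feedback from the instructor?
awaiting_feedback <- setdiff(submitted, instructor_replied_to)
```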

The matrix is another visual presentation of the same data.


Shiny applications for learning analytics

Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage or embed them in R Markdown documents or build dashboards. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions. Shiny combines the computational power of R with the interactivity of the modern web.

Using Shiny to share R-based analytics online can bring the core elements of a dashboard from prototyping to production in just a few hours.

The back-end architecture (designed by Will Cowen at Dartmouth)

Data harvesting, analyzing and application deployment

Step 1: harvesting and storing the data

Query the API endpoints and ingest the results into a SQL database.

MySQL is a very popular relational database that is similar to SQLite but is more powerful. MySQL databases can either be hosted locally (on the same machine as the Shiny app) or online using a hosting service.

You can use the RMySQL package to interact with MySQL from R. Since MySQL databases can be hosted on remote servers, the command to connect to the server involves more parameters, but the rest of the saving/loading code is identical to the SQLite approach. To connect to a MySQL database, you need to provide the following parameters: host, port, dbname, user, password.

Setup: You need to create a MySQL database (either locally or using a web service that hosts MySQL databases) and a table that will store the responses.

Step 2: Loading the data

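library(RMySQL)

# Assumes `databaseName` and `table` are defined elsewhere in the app, and
# that connection settings are stored in options(mysql = list(...)).
loadData <- function() {
  # Connect to the database
  db <- dbConnect(MySQL(), dbname = databaseName, host = options()$mysql$host,
      port = options()$mysql$port, user = options()$mysql$user,
      password = options()$mysql$password)
  # Construct the fetching query
  query <- sprintf("SELECT * FROM %s", table)
  # Submit the fetch query and disconnect
  data <- dbGetQuery(db, query)
  dbDisconnect(db)
  data
}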

Step 3: Analyzing and visualizing the data using R

Step 4: Deploying your Shiny app on the web



Using Google Bubble Chart to visualize data with 4 dimensions

Over the past year, we have extensively explored learning data associated with student discussion participation, the duration of online course access, and quiz performance. But we have struggled to provide instructors with visualizations that clearly represent 4 or more dimensions of student learning data. The descriptive graphs generated in our custom analytics app are somewhat segmented, and not cohesive enough to allow our instructors to examine all aspects of student activities with a single click.

Inspired by the power of Google Charts with R, we built a Shiny app using R that merges student activity and performance data, and produces bubble charts that visualize a learning data set with 4 dimensions. The first two dimensions are visualized as coordinates, the 3rd as color and the 4th as size.

  1. x-axis denotes the duration of activity (in seconds) that students spent in an LMS course
  2. y-axis represents quiz performance or running total score (in percentage)
  3. color represents groups
  4. radius of a bubble corresponds to the amount of discussion participation (in order to compare the two directions of discussion activity, providing feedback to peers and receiving comments from peers, we added another dimension to the chart)

Chart one: the radius of a bubble corresponds to the number of comments received by a student.

Chart two: the radius of a bubble corresponds to the number of comments a student provided to his/her peers.
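The merging step and the four-dimension mapping can be sketched in base R. This example swaps in base R’s `symbols()` for Google Charts so that it runs without extra packages; it is not the code behind our app, and the data frames, column names, and the helper `draw_bubbles` are hypothetical.

```r
# Hypothetical per-student tables, keyed by student id
activity   <- data.frame(id = c("S1", "S2", "S3"),
                         seconds = c(18000, 90000, 176400))
quiz       <- data.frame(id = c("S1", "S2", "S3"),
                         score_pct = c(88, 92, 61))
discussion <- data.frame(id = c("S1", "S2", "S3"),
                         group = c("A", "B", "A"),
                         comments_received = c(4, 9, 1))

# Merge activity, performance and discussion data into one row per student
bubbles <- Reduce(function(a, b) merge(a, b, by = "id"),
                  list(activity, quiz, discussion))

# x = activity time, y = quiz score, color = group, size = comments received
draw_bubbles <- function(d) {
  groups <- unique(as.character(d$group))
  group_colors <- setNames(rainbow(length(groups)), groups)
  symbols(d$seconds, d$score_pct,
          circles = sqrt(d$comments_received),  # bubble area tracks the count
          inches = 0.3,
          bg = group_colors[as.character(d$group)],
          xlab = "Activity time (seconds)", ylab = "Quiz score (%)")
}
```

Calling `draw_bubbles(bubbles)` draws the chart. To produce the second chart, replace `comments_received` with a column that counts the comments each student provided.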


An example for translating LMS access data into actionable information

When we employ a solid approach to data analysis, the results derived from LMS access data can bring actionable insights and help instructors identify what top-performing and at-risk students do differently. By comparing the results between the two groups, instructors can potentially determine where at-risk students struggle, and tailor course materials to effectively help students prepare for exams.

In this blog, we will show an example of how content access analytics can inform the efficacy of course materials in relation to students’ performance. Furthermore, we will share ideas about how to leverage the results to make data-informed changes.

First, we are interested in learning whether there is a correlation between time spent in an LMS and performance. So we gathered LMS access data for a course and produced a scatter plot that shows the relationship between students’ activity time in a course and their performance on quizzes. Although the scatter plot does not suggest a strong relationship between course activity_time and quiz performance, it does reveal two ‘unique’ data points. One (marked as Student5) spent the least amount of time (about 5 hours) in the course compared to the rest of the class, yet did reasonably well on quizzes. The other (marked as Student57) appeared to be quite active in the course (49 hours), but did not do as well as his/her peers. Student22 seems to fit the ‘ideal’ or ‘predictable’ model: if you spend time and study hard, you will perform well. To keep this blog focused, we will only drill into file access activities for Student5 and Student57.

We know that many factors might have contributed to this scenario, but could comparing content access patterns between the two students shed some light?

This histogram shows how often students viewed a given course content. The x-axis represents the number of times a given content was clicked/viewed, and the y-axis corresponds to the frequency of each view count. The blue dot indicates the average number of times a given student accessed the content, and the black dot shows the mean of the entire class. In this case, on average, the number of times that Student5 accessed course files is less than the class mean.

We are interested in learning which files Student5 and Student57 reviewed/accessed most. Were their top-accessed files also frequently viewed by the entire class? Did certain course materials effectively prepare students for quizzes?

Let’s take a look at the list of files that Student5 and Student57 viewed most frequently. CourseMaterial206 and CourseMaterial108 show up as the most-accessed files for Student5 and Student57, respectively. In addition, in terms of file access, Student5 and Student57 appear to have different preferences, because their lists of top-reviewed files are quite different.

Now let’s take a look at the pattern of access to the two files by the entire class. We used a Sankey diagram to visualize students’ access to a course material. Return_visit refers to revisiting a course material after an initial access or, more specifically, accessing the same material again after the day of the initial visit. New_visit indicates that a student accessed a material but has not returned to the same material after the day of the initial access.
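The return_visit / new_visit rule can be written down in a few lines. This sketch assumes an access log with one row per click and a date for each click; the column names, dates, and the function `classify_visits` are hypothetical.

```r
# Hypothetical access log: one row per click on a course material
access_log <- data.frame(
  student  = c("Student5", "Student5", "Student57", "Student22", "Student22"),
  material = c("CourseMaterial206", "CourseMaterial206", "CourseMaterial108",
               "CourseMaterial206", "CourseMaterial206"),
  date     = as.Date(c("2017-01-10", "2017-01-15", "2017-01-11",
                       "2017-01-12", "2017-01-12")),
  stringsAsFactors = FALSE
)

# return_visit: the student accessed the material on a day after the day of
# the first access; new_visit: all accesses fall on the first day.
classify_visits <- function(log) {
  log$day <- as.numeric(log$date)
  span <- aggregate(day ~ student + material, data = log,
                    FUN = function(d) max(d) - min(d))
  span$visit_type <- ifelse(span$day > 0, "return_visit", "new_visit")
  span[, c("student", "material", "visit_type")]
}

visits <- classify_visits(access_log)
```

In the toy log, two clicks on the same day still count as a new_visit, because the definition depends on the day of access and not on the number of clicks.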

CourseMaterial206 shows up as the most frequently accessed file for Student5. Interestingly, we notice that all CourseMaterial206 ‘visitors’ are return ‘visitors,’ which means that students who had accessed CourseMaterial206 all came back at various points and reviewed the material again. In comparison, CourseMaterial108, which Student57 accessed most frequently, is less ‘popular’ than CourseMaterial206. The majority of students who had accessed CourseMaterial108 did not revisit the material after the initial visit. The data analysis and visualizations provide good insights, but also lead to more questions: Why did Student57 access CourseMaterial108 more times than the rest of the class? Why didn’t Student57 review CourseMaterial206 like many of his/her peers did? Had Student57 been struggling with CourseMaterial108? Could additional help (intervention) or materials be provided to students like Student57?


Leveraging R Shiny as a scalable approach to analyze LMS course data

R is a free software environment for statistical computing and graphics. Shiny is an open source R package that provides an elegant and powerful web framework for building web applications straight from R.

As learning management systems (LMS) become more widely and deeply adopted to support teaching and learning, a substantial amount of data about how students participate in learning activities is available. How can we analyze the data and translate it into a useful form? How can we make the LMS data accessible to faculty to inform the efficacy of the instruction and the quality of students’ learning experience? To support the effort of exploring LMS data to address teaching and learning related questions, we leveraged R Shiny and developed a number of analytical applications that graphically analyze LMS data using R.

The following examples demonstrate three Shiny applications that analyze and visualize three common types of LMS (Canvas) learning data, which can be harvested using Canvas APIs:

  • Quiz submission data
  1. example Shiny app with sample quiz data
  2. results interpretation and application: Using quiz submission data to inform quiz design
  • Discussion interaction data
  1. example Shiny app with sample discussion interaction data
  2. application one: Using social network analysis to model online interactions
  3. application two: Role modeling in online discussion forums
  • LMS access data
  1. example Shiny app with sample LMS access data
  2. data interpretation and application: LMS course content access analytics


Exploration of student-generated educational data in LMS

The types of educational activity data captured by an LMS that can be harnessed and translated into actionable knowledge:

  • Click stream data
  • Page views and content access
  • Discussion participation
  • Assignment and quiz submissions

Google Analytics for student’s click stream data:

Data solution 1: Nodes are points through which traffic flows. A connection represents the path from one node to another, and the volume of traffic along that path. An exit indicates where users left the flow. In Events view, exits don’t necessarily indicate exits from your site; exits only show that a traffic segment didn’t trigger another Event. Use the Behavior Flow report to investigate how engaged users are with your content and to identify potential content issues. The Behavior Flow can answer questions like:

  • Did students go right from homepage to assignments/quizzes without additional navigation?
  • Is there an event that is always triggered first? Does it lead students to more events or more pages?
  • Are there paths through a course site that are more popular than others, and if so, are those the paths that you want students to follow?

Behavior Flow: Like all flow reports, the Behavior Flow report displays nodes, connections and exits, which represent the flow of traffic in a course site.

Data solution 2: Funnel Visualization: how do students funnel through to a destination page in your course site?

Funnel Visualization: The funnel visualization shows the stream of visitors who follow specific paths of a website and thus interact with it in order to reach a website goal.

The sample data for the example funnel visualization was gathered from a Canvas (LMS) course site, with the goal set to the Modules navigation menu. 843 users accessed the course homepage during a certain period of time. Of those 843 users, 31 percent went from the homepage directly to the course module page (destination), (581-177)/843 = 48% navigated to a different page of the course, and 177 (21%) exited the course.
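The percentages above can be reproduced with a few lines of arithmetic. The sketch assumes that 581 is the number of homepage users who did not go directly to the Modules page; that reading is inferred from the figures and is not stated in the report itself.

```r
homepage_users <- 843   # users who accessed the course homepage
not_direct     <- 581   # assumed: users who did not go straight to Modules
exited         <- 177   # users who exited the course

direct_to_goal <- homepage_users - not_direct   # went directly to Modules
other_page     <- not_direct - exited           # navigated to a different page

# Share of homepage users on each path, in whole percent
shares <- round(100 * c(direct = direct_to_goal,
                        other  = other_page,
                        exit   = exited) / homepage_users)
```

Under that assumption, the three shares come out to 31, 48 and 21 percent, matching the figures in the funnel.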

The funnel conversion rate (59.20%) indicates the percentage of visits that included at least one page view for the first step before at least one page view of the goal page. Page views can occur non sequentially for a funnel match. We can look at each step of the funnel, analyze the number of users to the first step versus the number of users to the second step. Wherever we lost a drastic number of people, we can go back to that page and optimize it to increase that conversion rate percentage.

Social Network Analysis for discussion interaction data:

  • How actively do students interact with each other on online discussion forums?
    • identify the students who are actively engaged in discussions by providing many comments on peers’ postings.
    • identify the students whose initial discussion thread became so popular that it received quite a number of replies.
  • Does the quantity and/or richness of discussion posts vary across topics?
  • Does the community structure of discussion interactions represent subgroups of students who have common interests in reality?
  • Do discussion interaction patterns represent or reflect students’ participation in class activities?
  • Does role modeling using centrality metrics represent the level of influence of a student in reality?

Histogram and scatter plot for quiz submission data (quiz performance and correlation between quizzes):


  • How well did an individual student do in comparison to the entire class?
  • What was the overall performance on a quiz?
  • Is there a relationship between quiz performance and content access, or overall activities in an LMS?



Role Modeling in Online Discussion Forums

As LMSs become more widely adopted in fully online, hybrid, and blended courses, their asynchronous discussion platforms are often used as the channel for information exchange and peer-to-peer support. For F2F courses that leverage online discussion forums as a complement to classroom communications, or as a flipped-classroom tool that facilitates active learning, asynchronous discussion activities correlate with higher engagement in courses and better performance overall. Under this notion, insights into roles in discussion forums can contribute to improved design and facilitation of asynchronous discussions.

In light of the research conducted in the field of role mining for social networks (Abnar, Takaffoli, Rabbany, & Zaiane, 2014), we limit our focus to the roles which have been identified in social contexts, and we re-define them in the context of asynchronous discussions.

We developed a Shiny application using social network methods, centrality and power analysis, to analyze and visualize online discussion interactions. Degree and closeness centrality scores are used to identify leaders and periphery/outermosts, and mediators yield a high betweenness centrality score. The graphs shared below were produced in the application.
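To show what the three centrality scores measure, here is a small base R sketch for an unweighted, undirected, connected network. It is not the code behind our Shiny application; the function `centrality_scores` and the toy network are hypothetical. Closeness is computed as 1 divided by the sum of shortest-path distances to all other nodes, and betweenness counts, for every pair of other nodes, the fraction of shortest paths that pass through the node.

```r
# adj: symmetric 0/1 adjacency matrix with a zero diagonal
centrality_scores <- function(adj) {
  n <- nrow(adj)
  dist  <- matrix(Inf, n, n); diag(dist) <- 0
  sigma <- diag(n)            # number of shortest paths between each pair
  walks <- diag(n)
  for (k in seq_len(n - 1)) {
    walks <- walks %*% adj    # walks[i, j] = number of walks of length k
    newly <- walks > 0 & is.infinite(dist)
    dist[newly]  <- k         # first k with a walk = shortest distance
    sigma[newly] <- walks[newly]
  }
  betweenness <- numeric(n)
  for (v in seq_len(n)) for (s in seq_len(n)) for (t in seq_len(n)) {
    if (s < t && s != v && t != v && is.finite(dist[s, t]) &&
        dist[s, v] + dist[v, t] == dist[s, t]) {
      betweenness[v] <- betweenness[v] + sigma[s, v] * sigma[v, t] / sigma[s, t]
    }
  }
  data.frame(node = rownames(adj), degree = rowSums(adj > 0),
             closeness = 1 / rowSums(dist), betweenness = betweenness,
             row.names = NULL, stringsAsFactors = FALSE)
}

# Hypothetical network: two triangles bridged by the edge S3 - S4
students <- paste0("S", 1:6)
adj <- matrix(0, 6, 6, dimnames = list(students, students))
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 6))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1
scores <- centrality_scores(adj)
```

In the toy network, S3 and S4 are the only nodes with non-zero betweenness, which is the signature of a mediator connecting two groups.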

Graph 1: each node represents an individual, the color corresponds to a group/community.

Roles derived from asynchronous discussion activities

  • Leaders: the most active individuals in online discussion forums, i.e., those who post well-thought-out threads that welcome peers’ comments and, meanwhile, provide feedback on peers’ postings.
  • Peripheries/Outermosts: the least active individuals in an online discussion forum, who posted few threads, received no responses from peers, and replied to few peers’ postings.
  • Mediators: the individuals who connect different groups in a network.
  • Outsiders: the individuals who had minimum participation in a discussion, i.e., posted one thread to a discussion topic.


When asynchronous discussions are structured and designed to promote deep learning through collaborations, such as seeking information from peers, suggesting alternative solutions and providing answers/feedback, it would be desirable to help participants move from the periphery of the information exchange network to the core. When an online discussion forum with a well-defined topic or prompt is used primarily for students to post responses to the topic, instructors can incorporate incentives into the discussion forums to motivate learners to participate in discussions in a constructive manner (Hecking, Chounta & Hoppe, 2017).


Abnar, A., Takaffoli, M., Rabbany, R., & Zaiane, O. (2014). SSRM: Structural Social Role Mining for Dynamic Social Networks. 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.

Hecking, T., Chounta, I., & Hoppe, H. U. (2017). Role modeling in MOOC discussion forums. Journal of Learning Analytics, 4(1), 85-116.

Leveraging Canvas quiz submission data to inform quiz design

Quizzes are often used as an assessment tool to evaluate student understanding of course content. Practice quizzes, an informative self-assessment, have also been utilized to help students study for final exams. This self-assessment capability through the use of the practice tests enhances the usage level of course materials.

If course instructors use a quizzing assessment strategy in Canvas, we can gather quiz submission data and use it to analyze the effectiveness of quiz questions. By analyzing the quiz submission data, course instructors are able to verify whether a quiz is effective in helping students grasp course content, and whether the quizzes produce meaningful data about students’ performance and understanding of course materials.

In this blog, we will introduce a self-service tool that leverages the quiz submission data to inform student learning and the efficacy of quiz designs in helping students master course materials. If a quiz is specifically designed for students to study for a high-stakes exam or a formative assessment, we can use a scatter plot (with a smooth regression line) to see whether there is a correlation between student performance on the practice test and on the final exam. If faculty implement pre and post tests to evaluate the efficacy of instruction in helping students grasp course content, we can use a density plot to display the distribution of score percentage (kept_score/points_possible) for the pre and post quizzes.

Canvas’s built-in Student Analysis tool allows course instructors to download quiz submission data for one quiz at a time, and examine student performance. However, it is cumbersome if course instructors would like to gather submission data for all quizzes in a course.

Course instructors can install a userscript that gathers the submission data for all quizzes in a course:

  1. Install a browser add-on: Greasemonkey for Firefox or Tampermonkey for Chrome/Safari. Please skip this step if you have already installed the add-on previously.
  2. Install the Get Quiz Submission Data userscript.
  3. Log in to Canvas, go to a course, navigate to the Quizzes page, scroll down to the bottom of the page and click on the “Get Quiz Submission Data” button.
  4. Save the data in ‘Comma Separated’ csv file format on your local computer; you may name it ‘quiz.csv’.
  5. Open the Shiny app, load the quiz.csv file into the app, and a series of visualizations of the submission data will be created for you.
    • The plot shows each student’s score percentage side by side with the mean and median score percentage for the class, which allows the course instructor to easily see where a student stands in relation to the entire class.
    • If a quiz is specifically designed for students to practice for a high-stakes exam, we can use a scatter plot (with a smooth regression line) to see whether there is a correlation between student performance on the quiz and on the exam.
    • If faculty would like to use pre and post tests to evaluate the effectiveness of an instructional strategy in helping students grasp course content, we can use a density plot to display the distributions of time_spent and score percentage (kept_score/points_possible) for the pre and post quizzes.
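The score percentage and the class mean and median shown in the first plot can be computed as follows. This is a minimal sketch on made-up rows; `kept_score` and `points_possible` are the fields named above, while the `student` column and the values are hypothetical.

```r
# Hypothetical rows in the shape of the exported quiz.csv
quiz <- data.frame(
  student         = c("S1", "S2", "S3", "S4"),
  kept_score      = c(18, 12, 20, 15),
  points_possible = c(20, 20, 20, 20),
  stringsAsFactors = FALSE
)

# Score percentage = kept_score / points_possible
quiz$score_pct <- 100 * quiz$kept_score / quiz$points_possible

# Class-level reference points for comparison
class_mean   <- mean(quiz$score_pct)
class_median <- median(quiz$score_pct)

# How far each student sits above or below the class mean
quiz$vs_mean <- quiz$score_pct - class_mean
```

With real data, the same columns could be read from the exported file with `read.csv("quiz.csv")` before computing the percentages.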




Learner Content Access Analytics

If you are interested in exploring learner content access data to inform your course design, you are in the right place. This blog is geared toward helping instructors and course designers assess the efficacy of a course design: How many students returned to access course content after a course ended, and how often? Which format/type of content was viewed most? How often did learners access course content while the course was in session?

In this blog, we will demonstrate self-service tools that allow course instructors to answer questions about how students interact with Canvas. We will show you how to download student access report data for a Canvas course using a user script, and upload the data file to a Shiny app that visualizes student engagement in the Canvas course.

The Shiny app produces a number of visualizations for student content access activities over time. The information provides course designers/instructors with insights about the efficacy of a content design. For instance, if you embedded a number of files in a page hoping students review them, it is helpful to know whether students accessed the page, which files in the page students were more likely to view, and which files they rarely clicked on.

A user script is a script that runs in a web browser to add some functionality to a web page. The user script we are going to use adds a ‘get user page views’ tab to a Canvas course People page. To enable a user script, you first need to install a user script manager. For Firefox, the best choice is Greasemonkey. For Chrome: Tampermonkey. Once you’ve installed a user script manager, click on the Student Usage Report Data userscript, and click on the Install button. The script is then installed and will run in any Canvas course site it applies to.

Quick installation of a userscript that downloads the access report data for an entire course

  1. Install a browser add-on: Greasemonkey for Firefox or Tampermonkey for Chrome/Safari.
  2. Install the Student Usage Report Data userscript.
  3. Log in to Canvas, go to a course, click on the ‘People’ course menu and navigate to the People page. (If you don’t see the tab after you have successfully installed the user script, please refresh the People page.)
  4. Click on the ‘Get User Page Views’ tab, and click on ‘Start’ to begin the data extraction process.
  5. After the page views info for every student is extracted, you will be prompted with a dialogue box asking you to either save or open the file.
  6. Open the file in Excel, and save it as a ‘Comma Delimited’ file on your local computer.

Loading the data file to a Shiny app that analyzes and visualizes the data

  1. Click on the link to open the Content Access Analysis app.
  2. Click on the Browse button to upload the student usage report csv file to the app, and the visualizations for students content access will be created for you.
    • Category refers to the content type: announcements, assignments, collaborations, conferences, external_urls, files, grades, home, modules, quizzes, roster, topics, and wiki.
    • Title is the name of a specific content item that you defined, such as a file name, a page title, an assignment title, a quiz title, etc.
    • The time series plot visualizes student content access by first and last access date.
    • The primary reason for referencing the Last Access Date is to examine whether or not students access course content after a course has ended, and whether there is a pattern as to when they are more likely to revisit a course site after the course ends.
    • In addition, we added a date range control widget to the timeseries plot, which allows course instructors to analyze course access between a date range. For instance, course instructors can select a date range to see whether students revisited course materials after a course has ended, or whether students leverage course materials to prepare for an exam right around the exam date.


  1. If you get the error message “Error: replacement has 0 rows, data has #####” after you load the access_report csv file into the Shiny app, the error is the result of a mismatch in headers (column names). Please open the csv file in Excel and make sure the data file includes the following headers, with no spaces in any header: UserID, DisplayName, Category, Class, Title, Views, Participations, LastAccess, FirstAccess
  2. If you get an error message for a time series plot like this, “Error: ‘to’ cannot be NA, NaN or infinite”, please open the csv file in Excel, and save it as a ‘Comma delimited’ csv. Reload the data file into the Shiny app, and the time series plot should show up properly.
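The header check from the first troubleshooting item can also be done in R before uploading. This is a minimal sketch; the helper `check_headers` and the toy export are hypothetical, while the required header names are the ones listed above.

```r
required_headers <- c("UserID", "DisplayName", "Category", "Class", "Title",
                      "Views", "Participations", "LastAccess", "FirstAccess")

# Report required headers that are missing and headers that contain spaces
check_headers <- function(df, required = required_headers) {
  cleaned <- gsub("\\s+", "", names(df))   # header names with spaces removed
  list(missing    = setdiff(required, cleaned),
       has_spaces = names(df)[names(df) != cleaned])
}

# Hypothetical export: one header contains a space and one column is missing
export <- data.frame(matrix(nrow = 0, ncol = 8))
names(export) <- c("UserID", "Display Name", "Category", "Class", "Title",
                   "Views", "Participations", "LastAccess")
result <- check_headers(export)
```

When reading a real file, `read.csv("access_report.csv", check.names = FALSE)` keeps the headers exactly as they appear in the file, so that spaces are reported instead of being silently converted to dots. The file name here is only an example.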