Our Chief Software Architect Romain Doumenc takes a look at the inner workings of Engage EHS Insights dashboards:
A key benefit of using integrated health and safety software is that all data is available in one place for analysis. To that end, the Engage platform has, since the beginning, provided a (sometimes minimalist) reporting solution for extracting and analyzing H&S data.
As we set out to build the new generation of the Engage platform, we knew there was still a wealth of high-quality data we could expose to our users, so we started designing a new analysis system that would offer better insight from the system data. We called these "Insights".
The initial results are visible throughout the system, each time you click on an “Insights” tab within a module, and the initial feedback has been very positive: we see strong usage across the customer base, with customers using it daily to better understand the H&S landscape within their organizations.
Structured approach to data analysis systems
On the surface, a reporting system is simply a few graphical displays over data stored in a database. But there are many potential issues around how we visualise data, ranging from inefficient visual representations to downright misleading graphs — for a good example of the latter, I highly recommend this article from A. Cotegreave (whom I had the pleasure to meet in a previous life, Hi Andy!):
For our reporting system (codename “Sassy”, as we wanted it to be lively, smart, likeable, and, why not, a bit talkative!), we focused on three key aspects to provide maximum value to our users: using dashboards, maximizing the density of information, and minimizing surprise.
Dashboards: The first aspect, dashboards, seems (and in fairness, is) a rather common approach to data representation, but a few points about the process are nonetheless worth mentioning. Multiple graphical representations are built from a single data set, each one showing the data according to a specific set of factors. That data set is carefully built to provide accurate information (for example, by excluding double counting) and to hide data based on system-defined permissions.
Then, global, consistent filters are available to select a more specific subset of the data; this is often called “slice and dice” in data analysis lingo. This is an exploratory process: using the multidimensional information provided by the graphs, see which factors seem to have the most influence, then use the filters to focus on a specific point of interest, use all the graphs again to form a more informed opinion, and so forth.
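To make the "one data set, many graphs, global filters" idea concrete, here is a minimal Python sketch. It is not how Sassy is implemented: the records, field names, and helper functions (apply_filters, count_by, dashboard) are all made up for illustration.

```python
from collections import Counter

# Hypothetical incident records; the field names are illustrative only.
INCIDENTS = [
    {"site": "Site A", "severity": "minor", "year": 2023},
    {"site": "Site A", "severity": "major", "year": 2023},
    {"site": "Site B", "severity": "minor", "year": 2023},
    {"site": "Site B", "severity": "minor", "year": 2022},
    {"site": "Site A", "severity": "minor", "year": 2022},
]


def apply_filters(rows, filters):
    """Keep only the rows that match every (field, value) pair in the global filters."""
    return [r for r in rows if all(r[f] == v for f, v in filters.items())]


def count_by(rows, field):
    """One 'graph': the number of rows for each value of a single factor."""
    return dict(Counter(r[field] for r in rows))


def dashboard(rows, filters):
    """Derive every graph from the same filtered data set, so they stay consistent."""
    subset = apply_filters(rows, filters)
    return {
        "by_site": count_by(subset, "site"),
        "by_severity": count_by(subset, "severity"),
        "by_year": count_by(subset, "year"),
    }


overview = dashboard(INCIDENTS, {})                  # no filter: the full picture
focused = dashboard(INCIDENTS, {"site": "Site A"})   # "slice" on a single site
```

Because every graph is recomputed from the same filtered subset, narrowing one filter updates all the views at once, which is what makes the exploratory loop described above possible.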
Example of our Risk Insights dashboard
High Information Density: The second aspect is a requirement of the world our customers operate in: there are always multiple factors to consider, multiple ways of understanding the data. So we are extremely careful about making sure each pixel on the screen is put to good use (I guess we are just cheap like that).
You will surely notice that most of our graphs provide at least two exploratory factors for each key metric: the number of trainings by site and by compliance status, the number of incidents over time and compared to the previous year, the number of observations per site and per category, etc. This requires a careful approach, as we need to ensure that we do not make the data less legible by doing so, and experience has shown that it takes at least three to four rounds of design reviews before we are happy with the graphs we present on the dashboards.
Example of our Observations Insights dashboard
Minimum Surprise: The third aspect creates a friendlier environment for customers who are familiar with the world of H&S, but maybe new to the methods and practices of data analysis. For example, each dashboard includes a list of numbers at the top: actions created, incidents with injuries, …
Those numbers (sometimes referred to as key performance indicators, or KPIs for short) have a very precise definition and are widely understood among H&S professionals. We also use common visual representations such as the incident triangles (representing the number of incidents per severity).
Numbers at the top of the Incidents dashboard
Interestingly, this property often conflicts with the second aspect. Indeed, said incident triangles sparked a lively discussion between the H&S and the data analysis people at Effective, which resulted in a combined widget that displays the information as triangles but also as the more conventional bar charts you can see in the Incidents dashboard.
Technical challenges in implementing an analytics solution over the web
Fair warning: we are about to take a more technical look “under the hood” of Sassy. If you're not so technical, or you just don't care, just skip the next bit!
The field of analytics solutions is so wide that we can’t easily list all the possible approaches; suffice it to say that navigating that particular sea requires a very good compass.
Ours was simple: make sure our users get a great interactive experience while understanding their data, and keep the operational budget under control (this is not a financial statement, but instead relates to our capacity to operate the system with high availability).
We rejected many powerful approaches (focused on big data, too closely tied to a specific ecosystem, …) and instead settled on a simple, somewhat original approach: cache the data in memory server-side, and use a lightweight visual representation client to build the dashboard components.
Let’s follow the data through all the parts that make this possible.
- The first step is data extraction from the main production database. We do this using the (vastly underused) capacity of the PostgreSQL database engine to connect remotely to other data sources, then use complex SQL queries to clean up the data when needed, and build a “star schema” — basically, a huge table containing the data (fact table) and multiple auxiliary tables (dimension tables).
This is currently done on an automated, daily schedule. As a side effect, the data is available at this stage to our Enterprise customers through the remote data source (RDS) access; let us know if you would like to learn more about this advanced analytics capability!
- The second step occurs when a user requests a dashboard in the system: a lightweight cache is started, reads the full data set accessible to the user from the database, then builds an in-memory representation of the data suitable for fast filtering. From the cached data, the cache builds the data set sent to the client, which includes all partial aggregations (sums, …) needed for the display.
Sending a data set ready for use is a good idea in the sense that it minimizes the network bandwidth required (no need to send all data), and the time to render (no need to aggregate it on the client); but the consequence is that all filtering actions need an extra round-trip to the server.
Performance measurements have shown that this is actually not detrimental at all: median response time is around 10 ms, and the full redraw process is just a hair short of the 1/60 of a second (about 16.7 ms) often touted as the delay beyond which lag becomes noticeable.
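For readers who think in code, the second step can be pictured roughly as follows. This is only a sketch under stated assumptions, not the actual Sassy cache: the tables (FACTS, SITES, CATEGORIES), the DashboardCache class, and the choice of an inverted index for fast filtering are all hypothetical stand-ins for the real engine.

```python
from collections import defaultdict

# Hypothetical star schema: one fact table plus two dimension lookup tables.
SITES = {1: "Site A", 2: "Site B"}
CATEGORIES = {10: "Housekeeping", 11: "PPE"}
FACTS = [
    {"site_id": 1, "category_id": 10, "count": 3},
    {"site_id": 1, "category_id": 11, "count": 1},
    {"site_id": 2, "category_id": 10, "count": 2},
    {"site_id": 2, "category_id": 11, "count": 4},
]


class DashboardCache:
    """Per-user, in-memory copy of the fact rows, indexed for fast filtering."""

    def __init__(self, facts, allowed_site_ids):
        # Load only the rows this user is permitted to see (assumed rule: by site).
        self.rows = [f for f in facts if f["site_id"] in allowed_site_ids]
        # Inverted index: (column, value) -> positions of the matching rows.
        self.index = defaultdict(set)
        for pos, row in enumerate(self.rows):
            for column in ("site_id", "category_id"):
                self.index[(column, row[column])].add(pos)

    def _select(self, filters):
        """Intersect the index entries of every active filter."""
        selected = set(range(len(self.rows)))
        for column, value in filters.items():
            selected &= self.index.get((column, value), set())
        return selected

    def data_set(self, filters=None):
        """Build the ready-to-draw data set: partial sums only, no raw rows."""
        by_site = defaultdict(int)
        by_category = defaultdict(int)
        for pos in self._select(filters or {}):
            row = self.rows[pos]
            by_site[SITES[row["site_id"]]] += row["count"]
            by_category[CATEGORIES[row["category_id"]]] += row["count"]
        return {
            "total": sum(by_site.values()),
            "by_site": dict(by_site),
            "by_category": dict(by_category),
        }


cache = DashboardCache(FACTS, allowed_site_ids={1, 2})
full = cache.data_set()                       # initial dashboard load
ppe_only = cache.data_set({"category_id": 11})  # one filter click, one round-trip
```

Each filter change calls data_set again on the server, which is the extra round-trip mentioned above; in exchange, the client only ever receives a handful of pre-computed sums rather than the raw rows.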
What’s next for Sassy?
We are now reaching the end of this short introduction to our analytics engine, so let's take a look at what's planned on our roadmap.
Our first direction for improvement is the liveness of the data; we are fully aware that the daily refresh is sometimes too coarse. Efforts are under way to remove the technical limitations that lie behind this choice, and I hope to be able to say more about these soon.
The second set of enhancements will make the analytics data more pervasive throughout the system: adding visual cues in the right locations has demonstrated effects on the quality of the results and the experience of using the platform; and we also want that information to be available on mobile.
On a more technical note, the Engage team is eager to contribute more to open source initiatives, and we will make our cache and aggregation engine available under a liberal license.
Thank you for following along. I hope that Sassy is helping you in your daily H&S duties and, why not, that you like it as much as we do.
Written by Romain Doumenc, Chief Software Architect at Effective Software