SCImago Graphica: a new tool for exploring and visually communicating data

Despite the increasing number of data visualization authoring systems in recent years, it remains a challenge to simultaneously achieve high expressive power and ease of use in a single tool. In this paper we present SCImago Graphica, a no-code tool which allows the creation of complex visualizations by simple drag-and-drop interactions. Users bind the data variables to the different encoding channels, and specify the settings of each binding, from which the tool generates the interactive graphical display. Due to its efficiency of use, SCImago Graphica is not only suitable for visually communicating data, but also for exploratory data analysis. We evaluate the expressiveness and ease of use of SCImago Graphica through various examples of chart construction and a catalog of visualizations. The results show that SCImago Graphica makes it possible to create a wide variety of data visualizations quickly and easily.


Introduction
In the age of big data, data visualization tools are essential to explore, understand and make sense of data. The most widely used tools for visualizing data are spreadsheet applications. In these applications the user selects the data set to be displayed, chooses which type of chart to use from a gallery, and then customizes some basic aspects of the chart's appearance. Although it is a very easy-to-use interactive model, it is not without its problems. The main drawback is that it is not possible to create charts other than those in the chart gallery, which are often basic. In addition, these applications sometimes include chart types that are widely considered bad practice in data visualization, such as 3D charts.
A different way of visualizing data is through textual programming, which offers authors complete control over the visual appearance and interactive behaviour of the chart, but in addition to requiring programming skills, creating a chart demands a large amount of time and effort.
It is common to describe visualization authoring systems along two opposing dimensions: their expressiveness and their ease of use, sometimes called accessibility (Bostock; Heer, 2009; Qin et al., 2020). Expressiveness refers to the flexibility of customization and the variety of visual outputs that can be created, while ease of use refers to ease of learning and efficiency of use. Therefore, as we have seen, spreadsheet software and textual programming of visualizations represent the two opposite extremes: a system that is easy to use but not very expressive, versus one that is very expressive but hard to use.
Over the last few years, a plethora of visualization systems (web apps, desktop applications, programming toolkits...) have emerged, occupying different spaces in the expressiveness/ease-of-use continuum. In this paper we present SCImago Graphica, a professional visualization authoring tool that aims to combine a high level of expressiveness with an easy-to-use interface. In addition, SCImago Graphica has been designed to enable both visual communication of data and exploratory analysis.

Related work
Spreadsheets are not the only tools based on chart typologies. Recently, modern visualization web apps have emerged that allow users to create charts with very little effort (just upload or paste the data, select a chart type, and customize it). Datawrapper 1 is a very popular tool among media outlets, with which it is possible to create and publish aesthetically pleasing, responsive charts and maps online in just a few steps. Another noteworthy tool is RAWGraphs (Mauri et al., 2017), which offers a wide gallery of visualizations, some quite sophisticated. However, as these are template-based systems, their expressiveness is limited to the chart templates in their catalogue.
Chart typologies are a restrictive and oversimplified way of thinking about graphics and graphing software, which is why Wilkinson (1999) proposes his famous Grammar of Graphics (GoG), a mathematical theory of statistical and scientific graphics. Wilkinson's GoG describes the fundamental principles underlying the composition of any graphic and the correct coordination of its components. The impact of GoG on data visualization software is undisputed, having inspired most advanced visualization systems.
Based on Wilkinson's GoG, Wickham (2010) proposes a layered grammar of graphics and its open-source implementation ggplot2, a popular package for the statistical language R. Implementations of low-level grammars such as ggplot2, but also Protovis (Bostock; Heer, 2009), D3 (Bostock; Ogievetsky; Heer, 2010) or Vega (Satyanarayan et al., 2016), have revolutionized the creation of statistical graphics using code, reducing the time and effort required without sacrificing expressiveness. More recently, high-level grammars such as Vega-Lite (Satyanarayan et al., 2017) or ECharts (Li et al., 2018) have been proposed, simplifying the specification of charts by being less verbose, but inevitably at the expense of some expressiveness. While all these grammar implementations have made the creation of interactive visualizations more accessible to professionals beyond engineering, specifying visualizations via imperative or declarative programming is clearly more intricate and tedious than doing so via an interactive tool.
Tableau (formerly Polaris) (Stolte; Tang; Hanrahan, 2002) is an interactive visualization authoring tool, inspired directly by Wilkinson's GoG, which has been a great commercial success. To create a chart in Tableau, the user only needs to bind the data attributes to be displayed with the visual encodings to be used (color, size, position…) by drag and drop. Lyra (Satyanarayan; Heer, 2014; Zong et al., 2020) is a tool built on top of Vega that, like Tableau, uses drag and drop to map data to visual properties. While Tableau is more oriented to exploratory analysis, Lyra offers more control over the design of the charts.
Other recent no-code visualization tools, such as iVisDesigner (Ren; Höllerer; Yuan, 2014), Charticulator (Ren; Lee; Brehmer, 2019) or Data Illustrator (Liu et al., 2018), use interaction techniques analogous to those of vector design tools, achieving expressive power comparable to programming-based systems. These tools are conceived for designing visualizations, providing the author with greater flexibility and control over the layout or marks of the chart; but their learning curve and interactive complexity make them unsuitable for data exploration.
The concept behind SCImago Graphica is closer to grammar-based tools, like Tableau, than to vector data-design tools, but with important differences in order to achieve greater expressiveness without sacrificing efficiency and ease of use.

Design
In this section we describe the fundamental components of SCImago Graphica.

Data source
Regardless of the format of the data source (CSV, Excel file...), as in other grammar-based tools like Tableau, data must be organized as Tidy Data (Wickham, 2014). In this way of organizing data, each variable must have its own column, each row represents an observation, and each cell must contain a single value. In addition, the type of each variable (number, string, date-time, or country) must be specified with the data source. The country variable type enables the creation of data maps using the name or ISO code of each country (more granular geographic types will be added in the future).
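As an illustration of the Tidy Data organization described above, the following minimal Python sketch reshapes a "wide" table (one column per year) into tidy rows, and declares a type for each variable. The function name, the sample values and the `variable_types` dictionary are hypothetical and only serve the example; they are not part of SCImago Graphica.

```python
def to_tidy(wide_rows, id_column, variable_name, value_name):
    """Reshape wide rows (one column per period) into tidy rows.

    Each output row is one observation: one identifier, one period, one value.
    """
    tidy = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row
            tidy.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy


# Illustrative values only (not real data).
wide = [
    {"country": "ESP", "2019": 10, "2020": 12},
    {"country": "PRT", "2019": 7, "2020": 9},
]
tidy = to_tidy(wide, "country", "year", "documents")

# Hypothetical type declaration, using the four types named in the text.
variable_types = {"country": "country", "year": "date-time", "documents": "number"}
```

After reshaping, each of the four rows in `tidy` holds a single observation, and every column corresponds to exactly one variable.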
One of the main differences between SCImago Graphica and other grammar-based tools is that it can work with network data as input. In network data, one subset of the data describes the nodes and their attributes, and another subset describes the relationships or links between nodes and their attributes. Although SCImago Graphica uses its own CSV-based format for network data, the desktop application includes parsers for the most common formats: GML (Graph Modelling Language), GraphML and GEXF (Graph Exchange XML Format).

Grammar specification
In SCImago Graphica, the grammatical specification of each chart employs a JSON representation. Each definition has three sections: General, Grammar and Annotations (see Figure 1). It is not mandatory to define all of them in every chart, only the needed ones.
In the General section all overall properties of the chart are defined, such as the type of mark (see Table 1), the margins, the color palette, the form of edges in the case of graphs (see Figure 2), among many other options.
The Grammar section is where data variables are mapped to encoding properties. This mapping is defined as an array of objects (where the order matters), each of which binds an encoding property to a variable. Each variable-property binding object can also include specific settings on how the mapping between them should be done: aggregation function, scale, sorting, interactive filters, etc.
The Grammar section is not only used to encode data variables through visual encoding channels in the sense of Bertin's retinal variables (Bertin, 1983), such as position, size, color, opacity (alpha) and shapes. It is also used to encode variables by means of labels or tooltips; to define filtering rules; or to specify which categorical variables dictate the different symbols to be displayed, or how these symbols should be subdivided.
Finally, the Annotations section is used to define textual annotations attached to all those symbols in the chart that match the specified criteria.
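To make the three-section structure more concrete, the sketch below builds a specification as a Python dictionary and serializes it to JSON. All key names ("general", "grammar", "annotations", "channel", "variable", "aggregation", "match", "text") are assumptions made for illustration, since the actual property names of the SCImago Graphica format are not listed here; only the overall shape (general properties, an ordered array of variable-property bindings, and criteria-based annotations) follows the description above.

```python
import json

# Hypothetical key names; only the three-section shape follows the text.
spec = {
    "general": {
        "mark": "bar",
        "margins": {"top": 20, "right": 20, "bottom": 40, "left": 60},
    },
    # An array, because the order of the bindings matters.
    "grammar": [
        {"channel": "x", "variable": "IncomeGroup"},
        {"channel": "y", "variable": "People fully vaccinated", "aggregation": "mean"},
        {"channel": "color", "variable": "IncomeGroup"},
    ],
    # Annotations attach to every symbol matching the given criteria.
    "annotations": [
        {"match": {"IncomeGroup": "High income"}, "text": "Example note"},
    ],
}

serialized = json.dumps(spec, indent=2)
restored = json.loads(serialized)

# JSON arrays preserve order, so the binding order survives a round trip.
channels = [binding["channel"] for binding in restored["grammar"]]
```

Using an array rather than an object for the bindings is what allows the same channel to appear more than once and the order to carry meaning.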
The grammar specification in SCImago Graphica has certain similarities to Vega-Lite's unit plots specification (Satyanarayan et al., 2017), as both are simple and readable. The most notable difference is that in SCImago Graphica more

Visualization generation engine
The generation engine is the core of SCImago Graphica, the component that, from a data source and a JSON grammar specification, renders an interactive and responsive visualization. It is written from scratch in JavaScript, allowing it to run in the browser, on the server and even as part of a standalone desktop application (see 3.4). By default, the output is drawn using SVG (Scalable Vector Graphics), but it can also display interactive charts on Canvas/WebGL in combination with the PixiJS library 2 .
The first task that the generation engine handles is the grammar check. For example, it checks the compatibility between the type of data variables and the encoding channels used, or if an encoding channel has been bound to more variables than it supports.
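The two checks mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the channel names, the compatibility table `ALLOWED_TYPES` and the limits in `MAX_BINDINGS` are hypothetical, and the real engine's rules may differ.

```python
# Hypothetical compatibility table: which variable types each channel accepts.
ALLOWED_TYPES = {
    "x": {"number", "string", "date-time"},
    "y": {"number", "string", "date-time"},
    "size": {"number"},
    "color": {"number", "string"},
}

# Hypothetical limits; None means "no limit" (several variables may share an axis).
MAX_BINDINGS = {"x": None, "y": None, "size": 1, "color": 1}


def check_grammar(bindings, variable_types):
    """Return a list of error messages; an empty list means the grammar is valid."""
    errors = []
    counts = {}
    for binding in bindings:
        channel, variable = binding["channel"], binding["variable"]
        counts[channel] = counts.get(channel, 0) + 1
        var_type = variable_types.get(variable)
        if var_type is None:
            errors.append(f"unknown variable: {variable}")
        elif var_type not in ALLOWED_TYPES.get(channel, set()):
            errors.append(f"{variable} ({var_type}) cannot be bound to {channel}")
    for channel, count in counts.items():
        limit = MAX_BINDINGS.get(channel, 1)
        if limit is not None and count > limit:
            errors.append(f"{channel} accepts at most {limit} variable(s), got {count}")
    return errors


types = {"Region": "string", "GDP": "number", "Population": "number"}
type_errors = check_grammar([{"channel": "size", "variable": "Region"}], types)
count_errors = check_grammar(
    [{"channel": "size", "variable": "GDP"},
     {"channel": "size", "variable": "Population"}],
    types,
)
```

Collecting all errors instead of stopping at the first one lets an interface report every problem with a specification at once.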
The engine also performs all data transformation tasks (aggregations, data binning, filtering...) as well as statistical computations derived from the grammar specification, such as clustering (using an algorithm based on Clauset, Newman and Moore, 2004), regression analysis, or the calculation of network metrics and statistical measures.
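As an example of the kind of data transformation involved, the following sketch groups rows by a categorical variable and applies an aggregation function to a quantitative one. The function name and the set of supported aggregations are assumptions for illustration, not the engine's actual interface.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical set of aggregation functions, keyed by name.
AGGREGATIONS = {"sum": sum, "mean": mean, "count": len, "min": min, "max": max}


def aggregate(rows, group_by, value, how):
    """Group rows by one variable and aggregate another within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[value])
    function = AGGREGATIONS[how]
    return {key: function(values) for key, values in groups.items()}


# Illustrative values only.
rows = [
    {"group": "a", "value": 1},
    {"group": "a", "value": 3},
    {"group": "b", "value": 5},
]
means = aggregate(rows, "group", "value", "mean")
counts = aggregate(rows, "group", "value", "count")
```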
Another key task of the generation engine is the computation of the chart layout, which depends on multiple factors. First, it is conditioned by the type of mark chosen; dots and bars, for example, are not visually positioned and arranged in the same way. Second, it is determined by the combination of variables that have been bound to positional visual properties (X-axis, Y-axis and small multiples). As the examples in Figure 3 show, SCImago Graphica is very flexible in how these combinations can be made. A third factor also conditions the position of each symbol on the chart: the layout algorithm chosen. The default algorithm simply aligns the symbols in successive rows, and the "Avoid overlap" algorithm moves the symbols iteratively until none of them overlaps with the others. The range of options increases considerably when working with network data (Figure 4):
-Force Directed, based on Fruchterman and Reingold (1991);
-Force Directed+Distances, a variation of the Force Directed algorithm that in addition uses the shortest path distance between nodes in computing the forces of repulsion between them;
-Kamada and Kawai's algorithm for undirected graphs (Kamada; Kawai, 1989);
-LinLog, an algorithm that uses the energy model of Noack (2007), which is particularly useful for depicting clustering
[Table residue; listed values: "none", "filled disk", "disk", "bar", "line".]
The "line_type" property allows the values "none", "curve" and "orthogonal".
Layout algorithms can be combined with fixed positions, connecting the X or Y position to a quantitative axis, as shown in Figure 5.
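The combination of a fixed position with an overlap-avoiding layout can be sketched as below. This is not the algorithm used by SCImago Graphica, whose details are not given here; it is a simple pairwise relaxation, assuming circular symbols, in which the X position stays bound to the quantitative axis and only the Y position is adjusted until no two symbols overlap.

```python
import math


def avoid_overlap_y(symbols, max_iterations=1000, eps=1e-9):
    """Push overlapping circles apart along Y only; X stays fixed.

    symbols: list of dicts with fixed "x", initial "y" and radius "r".
    Returns the adjusted list of Y positions.
    """
    ys = [symbol["y"] for symbol in symbols]
    for _ in range(max_iterations):
        moved = False
        for i in range(len(symbols)):
            for j in range(i + 1, len(symbols)):
                min_dist = symbols[i]["r"] + symbols[j]["r"]
                dx = symbols[j]["x"] - symbols[i]["x"]
                if abs(dx) >= min_dist:
                    continue  # already separated horizontally
                # Vertical separation needed for the circles to just touch.
                needed_dy = math.sqrt(min_dist ** 2 - dx ** 2)
                dy = ys[j] - ys[i]
                gap = needed_dy - abs(dy)
                if gap > eps:
                    direction = 1.0 if dy >= 0 else -1.0
                    ys[i] -= direction * gap / 2
                    ys[j] += direction * gap / 2
                    moved = True
        if not moved:
            break  # a full pass without movement: no overlaps remain
    return ys


# Three symbols of radius 1 that start at exactly the same position.
symbols = [{"x": 0.0, "y": 0.0, "r": 1.0} for _ in range(3)]
new_ys = avoid_overlap_y(symbols)
```

Because each pass only resolves overlaps pair by pair, separating one pair may create an overlap with a third symbol; the loop therefore repeats until a whole pass moves nothing, or until `max_iterations` is reached.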
The generated visualizations, in addition to being responsive (dynamically adapting to the available width and height), can be interactive. Interactions that can be specified in SCImago Graphica include tooltips, zoom and panning, hyperlinks, highlighting when hovering, and interactive filtering.

User interface
To make data exploration and visualization simple and effortless with SCImago Graphica, a standalone desktop application was built. The Electron framework 4 was used, which allowed the fast development of a cross-platform application (Windows, MacOS and Linux) using JavaScript, HTML, and CSS.
As Grammel, Tory and Storey (2010) note, one of the common barriers in the data visualization process is the choice of which data variables answer the goals and questions that the user is trying to address. For this reason, after loading the dataset, a small chart with the distribution of each variable is displayed in the header of each column of the data table (Figure 6), helping the user to become familiar with the data.
The application enables the creation of multiple visualizations from the same data source, organized in tabs (Figure 7.1). The interactive model for composing a chart is similar to other grammar-based tools: the user binds data variables (Figure 7.2) to encoding shelves (Figure 7.3) by drag-and-drop. By clicking on the variables dropped on each shelf, specific properties (aggregation function, format, legend visibility, etc.) can be modified. The visualization is displayed in the central area (Figure 7.4), in which the user can add annotations, edit titles, or resize the chart in an interactive way. Lastly, the right sidebar (Figure 7.5) allows the user to define all the general properties of the visualization.
Figure 5. A) Example of layout by positional properties. B) Example of layout by positional properties combined with overlap avoiding algorithm. The Y position of each symbol is given by the categorical variable "IncomeGroup" and by the layout algorithm application, whereas X position is exclusively given by the quantitative variable "People fully vaccinated".
The designed interface not only supports quick design and composition of charts, but also exploration and understanding of the data. The user can easily bind variables to encoding channels, switch variables from one encoding shelf to another, or undo and redo actions.
All charts created with the application can be exported to PNG, SVG or to the HTML+JavaScript code needed to be published online as an interactive and responsive data visualization. The generated code, in addition to the required libraries and styles, includes the JSON grammar specification of the chart.

Evaluation
As stated in the objectives of this work, our proposed tool seeks to achieve both a high degree of expressiveness and ease of use, qualities that are evaluated below.

Ease of use
Although a comparative study may seem the most objective way to compare the ease of use of one visualization authoring tool with that of others, such studies have serious limitations (Ren et al., 2018). Visualization authoring tools are complex systems, differing in their design philosophy, interaction models, underlying technology, features supported or target audience. By comparing completion times or counting interactions, for example, conclusions can be drawn only about the specific task evaluated, never about the overall usability of the tools.
For this reason, in this paper we have opted to analyze some of the key differences of SCImago Graphica with regard
The following examples in Figure 8 are discussed below:

A) Histograms
To analyze the distribution of a quantitative variable, one of the most widely used charts is the histogram. To create a histogram in SCImago Graphica, a user only has to perform three actions: select 'bar' as the mark type, bind the quantitative variable to the X-axis, and bind the variable "graphica.com.frequency" to the Y-axis. This last variable is calculated by the application and refers to the count or number of occurrences. In this example the user does not need to specify that data binning should be performed, or what the bin width should be, as these operations are performed automatically by the tool.
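The automatic binning step can be illustrated with the sketch below. The rule SCImago Graphica uses to choose the bin width is not stated here, so the example assumes Sturges' rule (number of bins = ceil(log2 n) + 1) purely for illustration; the function name is also hypothetical.

```python
import math


def histogram(values, bins=None):
    """Bin a non-empty list of numbers into equal-width bins and count occurrences.

    Returns a list of (bin_start, bin_end, count) tuples.
    """
    n = len(values)
    if bins is None:
        bins = math.ceil(math.log2(n)) + 1  # Sturges' rule (an assumption)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for value in values:
        # The maximum value falls on the upper edge, so clamp it into the last bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


# Illustrative values only.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 10]
result = histogram(values)
frequencies = [count for _, _, count in result]
```

The `frequencies` list plays the role of the computed frequency variable bound to the Y-axis, while the bin boundaries supply the positions on the X-axis.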

B) Small multiple
In small multiple displays, data are presented in the form of small charts of the same type that share the same scale. This way of breaking down the data display facilitates quick visual comparisons among categories or time periods. Despite its benefits, grammar-based tools usually require the user to select one categorical variable for the rows and another for the columns; these variables may not exist in the dataset, and must then be created expressly to achieve the desired layout. In SCImago Graphica, by contrast, it is enough to drop a categorical or date variable on the "Small Multiple" shelf, and the tool will subdivide the available space to fit the different small charts. This simple way of splitting the data displays is particularly useful for exploratory data analysis.

C) Parallel coordinate
Parallel coordinate plots are of huge value for multivariate data analysis because of their ability to reveal correlations, similarities, or anomalies in the data. For this reason, it is surprising that the most popular data visualization tools offer no simple way to create them. In SCImago Graphica, the user only needs to bind all the variables to be visualized to the same axis (X-axis or Y-axis) and select the line as the mark type.
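The geometry behind a parallel coordinate plot can be sketched as follows: each variable gets its own axis, each value is rescaled to the range of its variable, and each observation becomes one polyline crossing all the axes. This is a generic min-max formulation given as an assumption; it does not describe how SCImago Graphica scales the axes internally.

```python
def parallel_coordinates(rows, variables):
    """Return one polyline per row as a list of (axis_index, position) points.

    Positions are min-max normalized to [0, 1] independently for each variable.
    """
    ranges = {}
    for variable in variables:
        column = [row[variable] for row in rows]
        ranges[variable] = (min(column), max(column))

    lines = []
    for row in rows:
        points = []
        for axis, variable in enumerate(variables):
            lo, hi = ranges[variable]
            # A constant variable has no spread; place it mid-axis.
            position = 0.5 if hi == lo else (row[variable] - lo) / (hi - lo)
            points.append((axis, position))
        lines.append(points)
    return lines


# Illustrative values only.
rows = [
    {"a": 0, "b": 10},
    {"a": 5, "b": 20},
    {"a": 10, "b": 30},
]
lines = parallel_coordinates(rows, ["a", "b"])
```

Normalizing each variable separately is what allows variables with very different units or magnitudes to be compared on adjacent axes.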
Although the examples analyzed do not allow an overall assessment of the usability of SCImago Graphica, they do pro-

Expressiveness
The scope of possible design configurations that a grammar-based tool allows cannot be captured through user studies, so Ren et al. (2018) recommend, as a way of evaluating the expressiveness of visualization tools, providing a gallery of varied examples of charts created with them. As the same authors point out, one of the benefits of providing these galleries is that they can serve as a means of comparing expressiveness between different tools.
We provide a Data Viz Catalog, still in progress, which can be accessed online 5. The examples in the online catalog are classified by function (comparisons, distribution, correlation, part-to-whole, evolution, map and network), which makes it easier to find those that enable the desired type of analysis. As can be seen (Figure 9), the examples range from basic and frequent chart types, such as bar charts, to more complex ones, such as contiguous cartograms. Given that SCImago Graphica is a grammar-based tool, not a template-based tool, the examples in the catalog represent only a small fraction of all its combinatorial possibilities.
As our Data Viz Catalog shows, although SCImago Graphica does not reach the level of expressiveness of low-level programming grammars, it is more expressive in several respects than other interactive grammar-based tools. Its capacity to visualize network data and the flexibility of its layouts are among its most outstanding features.

Discussion and future work
We have presented a novel data visualization authoring tool, which can be used for both visual data communication and exploratory analysis. The expressive power of the tool has been demonstrated through a catalog of data visualizations, while its ease of use has been inspected through several examples of chart building. Obviously, the usability of the tool requires further inquiry. Although the examples reviewed illustrate the tool's capability to create advanced charts with few interactions, both user studies and analysis of usage in real-world scenarios will be necessary to further understand other usability attributes.
While drag-and-drop visual mapping makes it possible to create data visualizations easily and efficiently, it can be a barrier for novice users (Grammel; Tory; Storey, 2010). The solution in these cases may be the integration of a chart recommendation feature. 'Show me' (Mackinlay; Hanrahan; Stolte, 2007) suggests chart types compatible with the variables the user selects to display. Keshif (Yalçın; Elmqvist; Bederson, 2018) automatically generates exploration-oriented dashboards with the least user effort. Voyager (Wongsuphasawat et al., 2015), on the other hand, is a faceted browsing system of recommended charts chosen on the basis of perceptual and statistical measures.
We believe that an important function that should also be addressed by these chart recommendation systems is to shorten the learning curve of grammar-based tools. Therefore, the future recommendation system to be integrated in SCImago Graphica will not only be designed to simplify the creation of charts for novices, but also to help them learn how to reach these solutions through visual mapping.