# 10.1: Simulation of Systems with a Large Number of Variables

We are finally moving into the modeling and analysis of complex systems. The number of variables involved in a model will jump drastically, from just a few to tens of thousands! What happens if you have so many dynamical components, and moreover, if those components interact with each other in nontrivial ways? This is the core question of complex systems. Key concepts of complex systems, such as emergence and self-organization, all stem from the fact that a system is made of a massive number of interacting components, which allows us to study its properties at various scales and how those properties are linked across scales.

Modeling and simulating systems made of a large number of variables pose some practical challenges. First, we need to know how to specify the dynamical states of so many variables and their interaction pathways, and how those interactions affect the states of the variables over time. If you have empirical data for all of these aspects, lucky you— you could just use them to build a fairly detailed model (which might not be so useful without proper abstraction, by the way). However, such detailed information may not be readily available, and if that is the case, you have to come up with some reasonable assumptions to make your modeling effort feasible. The modeling frameworks we will discuss in the following chapters (cellular automata, continuous field models, network models, and agent-based models) are, in some sense, the fruit of researchers’ collective effort to establish “best practices” for modeling complex systems, especially when detailed information is lacking (at least at the time those frameworks were developed). It is therefore important for you to know the explicit and implicit assumptions and limitations of each modeling framework, and how you can go beyond them to develop your own modeling framework in both critical and creative ways.

Another practical challenge in complex systems modeling and simulation is the visualization of simulation results. For systems made of a few variables, there are straightforward ways to visualize their dynamical behaviors, such as simple time series plots, phase space plots, cobweb plots, etc., which we discussed in the earlier chapters. When the number of variables is far greater, however, the same approaches won’t work. You can’t discern thousands of time series plots, nor can you draw a phase space of one thousand dimensions. A typical way to address this difficulty is to define and use a metric of some global characteristic of the system, such as the average state of the system, and then plot its behavior. This is a reasonable approach by all means, but it loses a lot of information about the system’s actual state.
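As a minimal sketch of this metric-based approach (the dynamics, the number of variables, and the noise model below are invented purely for illustration), one can reduce a thousand-variable simulation to a single global time series:

```python
import random

def simulate_global_metric(n_vars=1000, steps=50, seed=42):
    """Track one global characteristic (the average state) of a
    many-variable system instead of n_vars individual time series."""
    rng = random.Random(seed)
    state = [rng.uniform(-1.0, 1.0) for _ in range(n_vars)]
    metric = []
    for _ in range(steps):
        # toy dynamics: each variable takes a small random step
        state = [x + rng.gauss(0.0, 0.01) for x in state]
        metric.append(sum(state) / n_vars)  # the global metric
    return metric

series = simulate_global_metric()
```

Plotting `series` gives one legible curve; the price, as noted above, is that two very different microscopic configurations can map to the same average.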

An alternative approach is to visualize the system’s state at each time point in detail, and then animate it over time, so that you can see the behavior of the system without losing information about the details of its states. This approach is particularly effective if the simulation is interactive, i.e., if the simulation results are visualized on the fly as you operate the simulator. In fact, most complex systems simulation tools (e.g., NetLogo, Repast) adopt such interactive simulation as their default mode of operation. It is a great way to explore the system’s behaviors and become “experienced” with various dynamics of complex systems.

## Confirmed! We Live in a Simulation

Ever since the philosopher Nick Bostrom proposed in the Philosophical Quarterly that the universe and everything in it might be a simulation, there has been intense public speculation and debate about the nature of reality. Such public intellectuals as Tesla leader and prolific Twitter gadfly Elon Musk have opined about the statistical inevitability of our world being little more than cascading green code. Recent papers have built on the original hypothesis to further refine its statistical bounds, arguing that the chance that we live in a simulation may be 50–50.

The claims have been afforded some credence by repetition by luminaries no less esteemed than Neil deGrasse Tyson, the director of the Hayden Planetarium and America’s favorite science popularizer. Yet there have been skeptics. Physicist Frank Wilczek has argued that there’s too much wasted complexity in our universe for it to be simulated. Building complexity requires energy and time. Why would a conscious, intelligent designer of realities waste so many resources making our world more complex than it needs to be? It’s a hypothetical question, but one that may still need answering. Others, such as physicist and science communicator Sabine Hossenfelder, have argued that the question is not scientific anyway. Since the simulation hypothesis does not arrive at a falsifiable prediction, we can’t really test or disprove it, and hence it’s not worth seriously investigating.

However, all these discussions and studies of the simulation hypothesis have, I believe, missed a key element of scientific inquiry: plain old empirical assessment and data collection. To understand if we live in a simulation, we need to start by looking at the fact that we already have computers running all kinds of simulations for lower-level “intelligences” or algorithms. For easy visualization, we can imagine these intelligences as any nonplayer characters in any video game that we play, but in essence any algorithm operating on any computing machine would qualify for our thought experiment. We don’t need the intelligence to be conscious, and we don’t need it to even be very complex, because the evidence we are looking for is “experienced” by all computer programs, simple or complex, running on all machines, slow or fast.

All computing hardware leaves an artifact of its existence within the world of the simulation it is running. This artifact is the processor speed. If for a moment we imagine that we are a software program running on a computing machine, the only and inevitable artifact of the hardware supporting us, within our world, would be the processor speed. All other laws we would experience would be the laws of the simulation or the software we are a part of. If we were a Sim or a Grand Theft Auto character these would be the laws of the game. But anything we do would also be constrained by the processor speed no matter the laws of the game. No matter how complete the simulation is, the processor speed would intervene in the operations of the simulation.

In computing systems, of course, this intervention of the processing speed into the world of the algorithm being executed happens even at the most fundamental level. Even at the most fundamental level of simple operations such as addition or subtraction, the processing speed dictates a physical reality onto the operation that is detached from the simulated reality of the operation itself.

Here’s a simple example. A 64-bit processor would perform a subtraction between, say, 7,862,345 and 6,347,111 in the same amount of time as it would take to perform a subtraction between two and one (granted that all the numbers are defined as the same variable type). In the simulated reality, seven million is a very large number, and one is a comparatively very small number. In the physical world of the processor, the difference in scale between these two numbers is irrelevant. Both subtractions in our example constitute one operation and would take the same time. Here we can now clearly see the difference between a “simulated” or abstract world of programmed mathematics and a “real” or physical world of microprocessor operations.
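A quick sketch of the point (the timing part is only indicative; a Python interpreter adds overhead on top of the single hardware instruction, so no exact numbers should be read into it):

```python
import timeit

# In the simulated world of the program, these operands differ vastly
# in magnitude; for 64-bit hardware, each subtraction is one
# fixed-width operation either way.
big_result = 7_862_345 - 6_347_111
small_result = 2 - 1

# Indicative timings only: both measure the same kind of single
# operation, so they come out in the same order of magnitude.
t_big = timeit.timeit("7_862_345 - 6_347_111", number=1_000_000)
t_small = timeit.timeit("2 - 1", number=1_000_000)
```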

Within the abstract world of programmed mathematics, the processing speed of operations per second will be observed, felt, experienced, and noted as an artifact of the underlying physical computing machinery. This artifact will appear as an additional component of any operation that is unaffected by the operation in the simulated reality. The value of this additional component is simply the time taken to perform one operation on variables up to a maximum limit, which is the memory container size for the variable. So, to oversimplify, in an eight-bit computer this limit would be 256. The value of this additional component is the same for all numbers up to the maximum limit. The additional hardware component will thus be irrelevant for any operations within the simulated reality except when it is discovered as the maximum container size. The observer within the simulation has no frame for quantifying the processor speed except when it presents itself as an upper limit.
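The container-size limit is easy to make concrete. The sketch below uses modular arithmetic to stand in for an eight-bit register (the choice of eight bits mirrors the oversimplified example above):

```python
CONTAINER_BITS = 8
N_VALUES = 2 ** CONTAINER_BITS   # 256 distinct representable values
MAX_UNSIGNED = N_VALUES - 1      # 255 is the largest unsigned value

# Inside the simulated arithmetic, the container is invisible until
# the limit is reached: one step past the maximum wraps around.
wrapped = (MAX_UNSIGNED + 1) % N_VALUES
```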

If we live in a simulation, then our universe should also have such an artifact. We can now begin to articulate some properties of this artifact that would help us in our search for such an artifact in our universe.

• The artifact is an additional component of every operation that is unaffected by the magnitude of the variables being operated upon, and it is irrelevant within the simulated reality until a maximum variable size is observed.
• The artifact presents itself in the simulated world as an upper limit.
• The artifact cannot be explained by the underlying mechanistic laws of the simulated universe. It has to be accepted as an assumption or “given” within the operating laws of the simulated universe.
• The effect of the artifact or the anomaly is absolute. No exceptions.

Now that we have some defining features of the artifact, it becomes clear what the artifact manifests itself as within our universe: the speed of light.

Space is to our universe what numbers are to the simulated reality in any computer. Matter moving through space can simply be seen as operations happening on the variable space. If matter is moving at, say, 1,000 miles per second, then 1,000 miles worth of space is being transformed by a function, or operated upon, every second. If there were some hardware running the simulation called “space” of which matter, energy, you, me, everything is a part, then one telltale sign of the artifact of the hardware within the simulated reality “space” would be a maximum limit on the container size for space on which one operation can be performed. Such a limit would appear in our universe as a maximum speed.

This maximum speed is the speed of light. We don’t know what hardware is running the simulation of our universe or what properties it has, but one thing we can say now is that the memory container size for the variable space would be about 300,000 kilometers if the processor performed one operation per second.

This helps us arrive at an interesting observation about the nature of space in our universe. If we are in a simulation, as it appears, then space is an abstract property written in code. It is not real. It is analogous to the numbers seven million and one in our example, just different abstract representations on the same size memory block. Up, down, forward, backward, 10 miles, a million miles, these are just symbols. The speed of anything moving through space (and therefore changing space or performing an operation on space) represents the extent of the causal impact of any operation on the variable “space.” This causal impact cannot extend beyond about 300,000 km, given that the universe computer performs one operation per second.
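This can be caricatured in a few lines of code. The toy update rule below is not physics, just an illustration of a per-tick cap on how much “space” one operation may transform; the number 300_000 echoes the roughly 300,000 km/s figure above but is otherwise arbitrary:

```python
def step(position, velocity, max_per_tick=300_000):
    """Advance a toy entity by one tick, clamping the applied motion
    to the maximum amount of 'space' one operation may transform."""
    applied = max(-max_per_tick, min(velocity, max_per_tick))
    return position + applied

slow = step(0, 1_000)       # well under the cap: moves 1,000 units
fast = step(0, 5_000_000)   # over the cap: clamped to 300,000 units
```

Whatever other rules the simulation imposes, no entity governed by `step` can outrun the cap, which is exactly how the essay argues a hardware limit would surface inside the simulated world.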

We can see now that the speed of light meets all the criteria of a hardware artifact identified in our observation of our own computer builds. It remains the same irrespective of observer (simulated) speed, it is observed as a maximum limit, it is unexplainable by the physics of the universe, and it is absolute. The speed of light is a hardware artifact showing we live in a simulated universe.

But this is not the only indication that we live in a simulation. Perhaps the most pertinent indication has been hiding right in front of our eyes. Or rather behind them. To understand what this critical indication is, we need to go back to our empirical study of simulations we know of. Imagine a character in a role-playing game (RPG), say a Sim or the player character in Grand Theft Auto. The algorithm that represents the character and the algorithm that represents the game environment in which the character operates are intertwined at many levels. But even if we assume that the character and the environment are separate, the character does not need a visual projection of its point of view in order to interact with the environment.

The algorithms take into account some of the environmental variables and some of the character’s state variables to project and determine the behavior of both the environment and the character. The visual projection, or what we see on the screen, is for our benefit. It is a subjective projection of some of the variables within the program so that we can experience the sensation of being in the game. The audiovisual projection of the game is an integrated subjective interface for the benefit of us, essentially someone controlling the simulation. The integrated subjective interface has no other reason to exist except to serve us. A similar thought experiment can be run with movies. Movies often go into the point of view of characters and try to show us things from their perspective. Whether or not a particular movie scene does that, what’s projected on the screen and the speakers—the integrated experience of the film—has no purpose for the characters in the film. It is entirely for our benefit.

Pretty much since the dawn of philosophy we have been asking the question: Why do we need consciousness? What purpose does it serve? Well, the purpose is easy to extrapolate once we concede the simulation hypothesis. Consciousness is an integrated (combining five senses) subjective interface between the self and the rest of the universe. The only reasonable explanation for its existence is that it is there to be an “experience.” That’s its primary raison d’être. Parts of it may or may not provide any kind of evolutionary advantage or other utility. But the sum total of it exists as an experience and hence must have the primary function of being an experience. An experience by itself as a whole is too energy-expensive and information-restrictive to have evolved as an evolutionary advantage. The simplest explanation for the existence of an experience or qualia is that it exists for the purpose of being an experience.

There is nothing in philosophy or science, no postulates, theories or laws, that would predict the emergence of this experience we call consciousness. Natural laws do not call for its existence, and it certainly does not seem to offer us any evolutionary advantages. There can only be two explanations for its existence. First is that there are evolutionary forces at work that we don’t know of or haven’t theorized yet that select for the emergence of the experience called consciousness. The second is that the experience is a function we serve, a product that we create, an experience we generate as human beings. Who do we create this product for? How do they receive the output of the qualia-generating algorithms that we are? We don’t know. But one thing’s for sure, we do create it. We know it exists. That’s the only thing we can be certain about. And that we don’t have a dominant theory to explain why we need it.

So here we are generating this product called consciousness that we apparently don&rsquot have a use for, that is an experience and hence must serve as an experience. The only logical next step is to surmise that this product serves someone else.

Now, one criticism that can be raised against this line of thinking is that, unlike the RPG characters in, say, Grand Theft Auto, we actually experience the qualia ourselves. If this is a product for someone else, then why are we experiencing it? Well, the fact is the characters in Grand Theft Auto also experience some part of the qualia of their existence. The experience of the characters is very different from the experience of the player of the game, but between the empty character and the player there is a gray area where parts of the player and parts of the character combine into some type of consciousness.

The players feel some of the disappointments and joys that are designed for the character to feel. The character experiences the consequences of the player’s behavior. This is a very rudimentary connection between the player and the character, but already with virtual reality devices we are seeing the boundaries blur. When we are riding a roller coaster as a character in, say, an Oculus VR device, we feel the gravity.

Where is that gravity coming from? It exists somewhere in the space between the character that is riding the roller coaster and our minds occupying the “mind” of the character. It can certainly be imagined that in the future this in-between space would be wider. It is certainly possible that as we experience the world and generate qualia, we are experiencing some teeny tiny part of the qualia ourselves, while maybe a more information-rich version of the qualia is being projected to some other mind for whose benefit the experience of consciousness first came into existence.

So, there you have it. The simplest explanation for the existence of consciousness is that it is an experience being created, by our bodies, but not for us. We are qualia-generating machines. Like characters in Grand Theft Auto, we exist to create integrated audiovisual outputs. Also, as with characters in Grand Theft Auto, our product most likely is for the benefit of someone experiencing our lives through us.

What are the implications of this monumental find? Well, first of all we can’t question Elon Musk again. Ever. Secondly, we must not forget what the simulation hypothesis really is. It is the ultimate conspiracy theory. The mother of all conspiracy theories, the one that says that everything, with the exception of nothing, is fake and a conspiracy designed to fool our senses. All our worst fears about powerful forces at play controlling our lives unbeknownst to us have now come true. And yet this absolute powerlessness, this perfect deceit, offers us no way out in its reveal. All we can do is come to terms with the reality of the simulation and make of it what we can.

## Introduction

Uncertainty in LCA is pervasive, and it is widely acknowledged that uncertainty analyses should be carried out in LCA to grant a more rigorous status to the conclusions of a study (ISO 2006, JRC-IES 2010). The most popular approach for doing an uncertainty analysis in LCA is the Monte Carlo approach (Lloyd and Ries 2007), partly because it has been implemented in many of the major software programs for LCA, typically as the only way for carrying out uncertainty analysis (for instance, in SimaPro, GaBi, Brightway2, and in openLCA).

The Monte Carlo method is a sampling-based method, in which the calculation is repeated a number of times, in order to estimate the probability distribution of the result (see, e.g., Helton et al. 2006; Burmaster and Anderson 1994). This distribution is then typically used to inform decision-makers about characteristics such as the mean value, the standard deviation, or quantiles (such as the 2.5 and 97.5 percentiles). In LCA, the results are typically inventory results (e.g., emissions of pollutants) or characterization/normalization results (e.g., climate change, human health, etc.). In comparative LCA, such distributions form the basis of paired comparisons and hypothesis tests (Mendoza Beltran et al. 2018). Many programs and studies offer or present visual aids for interpreting the results, including histograms and boxplots (Helton et al. 2006; McCleese and LaPuma 2002).

A disadvantage of the Monte Carlo method is that it can be computationally expensive. Present-day LCA studies can easily include 10,000 or more unit processes, and calculating such a system can take some time. Repeating this calculation for a new configuration then takes the same time, and this is repeated a large number of times. Finally, the stored results must be analyzed in terms of means, standard deviations, p values and visual representations. Altogether, if we use the symbol Nrun to refer to the number of Monte Carlo runs, the symbol Tcal for the CPU time needed to do one LCA calculation, and Tana for the time needed to process the Monte Carlo results, the total time needed, Ttot, is simply

Ttot = Nrun × Tcal + Tana

Usually, Tcal > Tana, and certainly Nrun × Tcal ≫ Tana, so that we can write

Ttot ≈ Nrun × Tcal

and further ignore the aspect of Tana.

The time needed for a Monte Carlo analysis is thus determined by two factors: Tcal, which is typically in the order of seconds or minutes, and Nrun. A normal practitioner has little influence on Tcal, as it is dictated by the combination of algorithm, hardware, and the size of the database. Typically, it is between 1 s and 5 min. (This is a personal guess; there is no literature on comparative timings using a standardized LCA system.) A practitioner has much more influence on the number of Monte Carlo runs, Nrun. So, the trick is often to take Nrun not excessively high, say 100 or 1000. On the other hand, it has been claimed that this number must be large, for instance 10,000 or even 100,000. For instance, Burmaster and Anderson (1994) suggest that “the analyst should run enough iterations (commonly ≥10,000),” and the authoritative Guide to the Expression of Uncertainty in Measurement (BIPM 2008) writes that “a value of M = 10^6 can often be expected to deliver [a result that] is correct to one or two significant decimal digits.” In the LCA literature, we find similar statements, for instance by Hongxiang and Wei (2013) (“more than 2000 simulations should be performed”) and by Xin (2006) (“[it] should run at least 10,000 times”). Such claims also end up in reviewer comments: we recently received the comment “Monte Carlo experiments are normally run 5000 or 10,000 times. In the paper, Monte Carlo experiments are only run 1000 times. Explain why?”. With the pessimistic Tcal = 5 min, using Nrun = 100,000 runs will require almost 1 year. If we take the short calculation time of Tcal = 1 s, we still need more than one full day. And even Brightway2’s (https://brightwaylca.org/) claim of “more than 100 Monte Carlo iterations/second” (of which we do not know if it also applies to today’s huge systems) would take more than 16 min. Such waiting times may be acceptable for Big Science, investigating fundamental questions on the Higgs boson or the human genome. But for a day-by-day LCA consultancy firm, even 1 h is much too long.
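To make the Ttot ≈ Nrun × Tcal trade-off concrete, here is a deliberately tiny Monte Carlo sketch. The lognormal emission factor, its parameters, and the functional unit are invented; a real LCA run would solve a full technosphere system in each iteration, which is exactly why each Tcal is so expensive:

```python
import random
import statistics

def one_lca_run(rng):
    """One toy 'LCA calculation' (one Tcal): an emission factor with
    lognormal uncertainty multiplied by a fixed functional unit."""
    emission_factor = rng.lognormvariate(0.0, 0.2)  # kg CO2-eq per unit
    functional_unit = 10.0
    return emission_factor * functional_unit

def monte_carlo(n_run, seed=1):
    """Repeat the calculation Nrun times; total time grows linearly
    with n_run, which is why its choice matters so much."""
    rng = random.Random(seed)
    results = sorted(one_lca_run(rng) for _ in range(n_run))
    return {
        "mean": statistics.fmean(results),
        "sd": statistics.stdev(results),
        "q2.5": results[int(0.025 * n_run)],
        "q97.5": results[int(0.975 * n_run)],
    }

summary = monte_carlo(n_run=1000)
```

The dictionary returned is the kind of decision-maker summary (mean, standard deviation, 2.5/97.5 percentiles) described above.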

In this study, we investigate the role of Nrun. We will in particular focus on the original purpose of the Monte Carlo technique vis-à-vis its use in LCA, and consider the fact that in LCA, the input probability distributions are often based on small samples, or on pedigree-style rules-of-thumb, as well as the fact that in LCA, we are in most cases interested in making comparative statements (“product A is significantly better than product B”).

The next section discusses the elements of the analysis: the mathematical model and its probabilistic form, the description of probabilistic (“uncertain”) data, the estimation of input data, and the estimation of output results. Section 3 provides two numerical examples. Section 4 finally discusses and concludes.

## Results and discussion

In order to derive an algorithm for the interpretation of SBML models in a differential equation framework, it is first necessary to take a closer look at the mathematical equations implied by this data format. Based on this general description, we will then discuss all necessary steps to deduce an algorithm that takes all special cases for the various levels and versions of SBML into account.

### A formal representation of models in systems biology

The mathematical structure of a reaction network comprises a stoichiometric matrix N, whose rows correspond to the reacting species S⃗ within the system, whereas its columns represent the reactions, i.e., bio-transformations, in which these species participate. The velocities ν⃗ of the reaction channels R⃗ determine the rate of change of the species’ amounts:

dS⃗/dt = N · ν⃗(S⃗, P⃗, t) (1)

The parameter vector P⃗ contains rate constants and other quantities that influence the reactions’ velocities. According to Liebermeister et al. [27, 28], the modulation matrix W is defined as a matrix of size |R⃗| × |S⃗| containing a numerical representation of the type of the regulatory influences of the species on the reactions, e.g., competitive inhibition or physical stimulation. Integrating the homogeneous ordinary differential equation system (1) yields the predicted amounts of the species at each time point of interest within the interval [t0, tT]:

S⃗(t) = S⃗(t0) + ∫[t0,t] N · ν⃗(S⃗, P⃗, s) ds (2)

where t0 ∈ ℝ and t0 < tT. Depending on the units of the species, the same notation can also express the change of the species’ concentrations. In this simple case, solving equation (2) can be done in a straightforward way using many (numerical) differential equation solvers. The nonlinear form of the kinetic equations in the vector function ν⃗ constitutes the major difficulty for this endeavor and is often the reason why an analytical solution of these systems is not possible or very hard to achieve. Generally, however, differential equation systems describing biological networks are inhomogeneous systems with a higher complexity. Solving systems encoded in SBML can be seen as computing the solution of the following equation:

Q⃗(t) = Q⃗(t0) + ∫[t0,t] ( N · ν⃗(Q⃗, s) + g⃗(Q⃗, s) ) ds (3)

with t0 ≡ 0 and tT ∈ ℝ+. The vector Q⃗ of quantities contains the sizes of the compartments C⃗, the amounts (or concentrations) of the reacting species S⃗, and the values of all global model parameters P⃗. It should be noted that these models may also contain local parameters that influence the reactions’ velocities but are not part of the global parameter vector P⃗, and hence also not part of Q⃗.

All vector function terms may involve a delay function, i.e., an expression of the form delay(x, τ) with τ > 0. It is therefore possible to address values of x computed in an earlier integration step, at time t − τ, turning equation (3) into a delay differential equation (DDE). Note that x can be an arbitrarily complex expression.

In the general case of equation (3), not all species’ amounts can be computed by integrating the transformation N · ν⃗: the change of some model quantities may instead be given in the form of rate rules by the function g⃗(Q⃗, t). Species whose amounts are determined by rate rules must not participate in any reaction and hence have only zero-valued entries in the corresponding rows of the stoichiometric matrix N. Thereby, the rate rule function g⃗(Q⃗, t) directly gives the rate of change of these quantities, and returns 0 for all others.
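The way reaction velocities and rate rules combine can be sketched as follows. Forward Euler is used only for brevity (real SBML solvers use adaptive, often stiff-capable integrators), and the toy model S1 → S2 with its rate constant is invented for illustration:

```python
def integrate(q0, nu, N, g, t_end, dt=1e-3):
    """Forward-Euler integration of dQ/dt = N·nu(Q, t) + g(Q, t)."""
    q = list(q0)
    t = 0.0
    while t < t_end:
        v = nu(q, t)  # reaction velocities
        dq = [sum(N[i][j] * v[j] for j in range(len(v))) + g(q, t)[i]
              for i in range(len(q))]
        q = [qi + dt * dqi for qi, dqi in zip(q, dq)]
        t += dt
    return q

# Toy model: S1 -> S2 with mass-action velocity 0.5*[S1]; a third
# quantity V is governed purely by a rate rule (dV/dt = 0.1) and
# takes part in no reaction, so its row in N is all zeros.
N = [[-1.0], [1.0], [0.0]]
nu = lambda q, t: [0.5 * q[0]]
g = lambda q, t: [0.0, 0.0, 0.1]
q_final = integrate([1.0, 0.0, 1.0], nu, N, g, t_end=1.0)
```

Note how the stoichiometric part conserves S1 + S2 while the rate rule drives V independently, mirroring the zero rows / zero returns described above.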

In addition, SBML introduces the concept of events f⃗E(Q⃗, t) and assignment rules r⃗(Q⃗, t). An event can directly manipulate the values of several quantities, for instance, reducing the size of a compartment to a certain portion of its current size as soon as a trigger condition becomes satisfied. An assignment rule also directly influences the absolute value of its subject quantity.

A further concept in SBML is that of algebraic rules, which are equations that must evaluate to zero at all times during the simulation of the model. These rules can be solved to determine the values of quantities whose values are not determined by any other construct. In this way, conservation relations or other complex interrelations can be expressed in a very convenient way. With the help of bipartite matching[29] and a subsequent conversion, it is possible to turn algebraic rules into assignment rules and hence include them in the term r⃗(Q⃗, t). Such a transformation, however, requires symbolic computation and is thus a complicated endeavor.

When the system under study operates at multiple time scales, i.e., it contains a fast and a slow subsystem, a separation of the system is necessary, leading to differential algebraic equations (DAEs). Some species can be declared to operate at the system’s boundaries, assuming a constant pool of their amounts or concentrations. Care must also be taken with respect to the units of the species, because under certain conditions division or multiplication with the sizes of their surrounding compartments becomes necessary in order to ensure the consistent interpretation of the models. For all these reasons, solving equation (3) is much more complicated than computing the solution of the simple equation (2) alone.

From the perspective of software engineering, a strict separation of the interpretation of the model and the numerical treatment of the differential equation system is necessary to ensure that regular numerical methods can be used to solve equation (3). In order to efficiently compute this solution, multiple preprocessing steps are required, such as the conversion of algebraic rules into assignment rules, or avoiding repeated recomputation of intermediate results. The next sections will give a detailed explanation of the necessary steps to solve these systems and how to efficiently perform their numerical integration with standard numerical solvers.

### Initialization

At the beginning of the simulation the values of species, parameters and compartments are set to the initial values given in the model. All rate laws of the reactions, assignment rules, transformed algebraic rules (see below), initial assignments, event assignments, rate rules and function definitions are integrated into a single directed acyclic syntax graph. This graph is thus the result of merging the abstract syntax trees representing all those individual elements. Equivalent elements are only contained once. In comparison to maintaining multiple syntax trees, this solution significantly decreases the computation time needed for the evaluation of syntax graphs during the simulation. Figure 1 gives an example for such a syntax graph.

Example of the creation of an abstract syntax graph for a small model. This figure displays a unified representation of kinetic equations from an example model that consists of the following reactions: R1: F1,6BP ⇌ DHAP + GA3P, R2: DHAP ⇌ GA3P. Both reactions are part of glycolysis. The molecules involved are fructose 1,6-bisphosphate (F1,6BP), dihydroxyacetone phosphate (DHAP), and glyceraldehyde 3-phosphate (GA3P). Using the program SBMLsqueezer[31], the following mass action kinetics have been created: νR1 = k+1 · [F1,6BP] − k−1 · [DHAP] · [GA3P], νR2 = k+2 · [DHAP] − k−2 · [GA3P]. The nodes for [DHAP] and [GA3P] are contained in the syntax graph only once and are connected to more than one multiplication node. This clearly indicates that the syntax graph is not a tree. As can also be seen, the outdegree of nodes in the syntax graph does not have to be binary.
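A common way to obtain this kind of sharing is hash-consing: a constructor that returns an existing vertex whenever a structurally identical one has already been built. The sketch below (node labels and helper names are my own, not the paper’s API) rebuilds the two rate laws from the figure so that [DHAP] and [GA3P] each exist only once:

```python
class Node:
    """A vertex of a directed acyclic syntax graph."""
    def __init__(self, op, children=()):
        self.op = op
        self.children = children

_cache = {}

def make(op, *children):
    """Hash-consing constructor: structurally equivalent expressions
    share one vertex, so merging syntax trees yields a syntax graph."""
    key = (op,) + tuple(id(c) for c in children)
    if key not in _cache:
        _cache[key] = Node(op, children)
    return _cache[key]

dhap, ga3p = make("DHAP"), make("GA3P")
# nu_R1 = k+1*[F1,6BP] - k-1*[DHAP]*[GA3P]  (note the ternary product
# node: outdegree is not restricted to two)
nu_r1 = make("-", make("*", make("k+1"), make("F1,6BP")),
              make("*", make("k-1"), dhap, ga3p))
# nu_R2 = k+2*[DHAP] - k-2*[GA3P]
nu_r2 = make("-", make("*", make("k+2"), dhap),
              make("*", make("k-2"), ga3p))
```

Because `make` deduplicates, evaluating the graph visits each shared subexpression once, which is the computation-time saving described above.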

After the creation of this graph, the initial assignments and the assignment rules (including transformed algebraic rules) are processed and initial values defined by these constructs are computed.

### Solving algebraic rules

The most straightforward approach to dealing with algebraic rules is to convert them to assignment rules, which can in turn be solved directly. In every equation of an algebraic rule, there should be at least one variable whose value is not yet defined through other equations in the model. This variable has to be determined for the purpose of interpreting the algebraic rule. At first, a bipartite graph is generated according to the SBML specifications[19–22]. This graph is used to compute a matching using the algorithm by Hopcroft and Karp[29]. The initial greedy matching is extended with the use of augmenting paths. This process is repeated until no more augmenting paths can be found. By definition, this results in a maximal matching. As stated in the SBML specifications[19–22], if any equation vertex remains unconnected after augmenting the matching as far as possible, the model is considered overdetermined and is thus not a valid SBML model. If this is not the case, the mathematical expression of every algebraic rule is thereafter transformed into an equation with the target variable on its left-hand side, and hence fulfills the definition of an assignment rule. The left-hand side is represented by the respective variable vertex to which the considered algebraic rule has been matched. Figure 2 displays the described algorithm in the form of a flow chart.
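The augmenting-path idea can be sketched as follows. For brevity this uses Kuhn’s simple augmenting-path routine rather than Hopcroft–Karp; both yield a maximum matching, which is all the overdetermination test needs. The tiny rule/variable instances at the bottom are invented:

```python
def match_rules(adj, n_vars):
    """Match each algebraic-rule equation (a key of adj) to one of its
    candidate variables (integers 0..n_vars-1). Returns a dict
    {equation: variable}, or None if some equation stays unmatched,
    i.e., the model is overdetermined."""
    owner = [None] * n_vars  # owner[v] = equation currently matched to v

    def augment(eq, seen):
        for v in adj[eq]:
            if v not in seen:
                seen.add(v)
                # take v if it is free, or re-route its current owner
                if owner[v] is None or augment(owner[v], seen):
                    owner[v] = eq
                    return True
        return False

    for eq in adj:
        if not augment(eq, set()):
            return None  # unconnected equation vertex
    return {eq: v for v, eq in enumerate(owner) if eq is not None}

# Rule 0 could determine variable 0 or 1, rule 1 only variable 1:
solvable = match_rules({0: [0, 1], 1: [1]}, n_vars=2)
# Both rules compete for the single variable 1: overdetermined.
overdetermined = match_rules({0: [1], 1: [1]}, n_vars=2)
```

Each matched pair then becomes an assignment rule with the matched variable on its left-hand side, as in the flow chart of Figure 2.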

Algorithm for transforming algebraic rules to assignment rules. The first step is to decide whether the model is overdetermined by creating a matching between the equations and the variables of the model. For this purpose, an initial greedy matching is computed based on a bipartite graph constructed according to the SBML specifications. To obtain a maximum matching, augmenting paths are determined and the current matching is extended; if no augmenting paths are available anymore, the computed matching is maximum. An unconnected equation vertex indicates an overdetermined model. If the model is not overdetermined, an assignment rule is generated for each algebraic rule. The left-hand side of each rule corresponds to the variable the respective algebraic rule has been matched to.
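The overdetermination check can be sketched as follows. For brevity this uses the simple Kuhn augmenting-path scheme rather than the faster Hopcroft-Karp algorithm referenced in the text; both produce a maximum matching, and the example rules and variable names are hypothetical.

```python
# Sketch of the overdetermination check: match each equation vertex to a
# variable vertex via augmenting paths. An equation left unmatched means
# the model is overdetermined.

def maximum_matching(edges, equations):
    """edges: dict equation -> list of candidate variables."""
    match = {}  # variable -> equation currently matched to it

    def augment(eq, seen):
        for var in edges.get(eq, ()):
            if var in seen:
                continue
            seen.add(var)
            # var is free, or its current equation can be re-matched elsewhere
            if var not in match or augment(match[var], seen):
                match[var] = eq
                return True
        return False

    unmatched = [eq for eq in equations if not augment(eq, set())]
    return match, unmatched

# Hypothetical example: two algebraic rules over variables x and y.
edges = {"rule1": ["x", "y"], "rule2": ["y"]}
match, unmatched = maximum_matching(edges, ["rule1", "rule2"])
print(unmatched)  # [] -> not overdetermined; rule1 determines x, rule2 determines y
```

Each matched pair (rule, variable) then becomes an assignment rule with that variable on the left-hand side.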

### Event handling

An event in SBML is a list of assignments that is executed when a trigger condition switches from false to true. In addition, SBML enables modellers to define a delay that postpones the actual execution of the event’s assignments to a later point in time. With the release of SBML Level 3 Version 1, the processing of events has reached an even higher level of complexity: in earlier versions it was sufficient to determine when an event triggers and when its assignments are to be executed. Level 3 Version 1 adds only a few new language elements, but these have a significant impact on how events are handled: for example, the order in which events are processed used to be at the programmer’s discretion in SBML Level 2, but in Level 3 Version 1 it is given by the event’s priority element. Coordinating the sequence in which events are executed has thus become the crucial part of event handling. Furthermore, an event can be cancelled between the time its trigger is activated and the time the scheduler picks it for execution. Events that can be cancelled after the activation of their triggers are called nonpersistent.

At every time step, the events to be executed are the union of two subsets of the set of all events: on the one hand, events whose triggers have been activated at the current time and which are to be evaluated without delay; on the other hand, events triggered at some earlier time point whose delay ends at the current point in time. For every element of the resulting set, the priority must be evaluated. One event is randomly chosen for execution from all events of highest priority. In principle, all other events could be processed in the same manner, but the assignments of the first event can change the priority or even the trigger condition of the events that have not yet been executed. Therefore, the triggers of nonpersistent events and the priorities of the remaining events have to be evaluated again, and the event that now has the highest priority is chosen next. This process is repeated until no further events are left for execution. Figure 3 shows a slightly simplified algorithm for event processing at a specific point in time: let E be the set of all events in a model, and E_I the set of events whose trigger conditions have already evaluated to true in previous time steps. We refer to elements of E_I as inactive events. We define the set E_A as the subset of E containing events whose trigger condition switches from false to true at the current time step t. At the beginning of the event handling, E_A is empty. We call an event persistent if it can only be removed from E_A once all of its assignments have been evaluated; a nonpersistent event, by contrast, can be removed from E_A when its trigger condition becomes false during the evaluation of other events. The function trig(e) returns 1 or 0 depending on whether the trigger condition of event e ∈ E is satisfied. Similarly, the function persist(e) returns 1 if event e is persistent, and 0 otherwise.

Processing of events: simplified algorithm (handling of delayed events omitted). At each iteration, the trigger conditions of active events e_a ∈ E_A that are not persistent are checked. If the trigger condition of such an event has changed from true (1) to false (0), the event is removed from E_A. The next step comprises the evaluation of the triggers of all events: an event whose trigger changes from false to true is added to the set of active events E_A, and an event whose trigger changes from true to false is removed from the list of inactive events. After the processing of all triggers, the event e of highest priority in the set of active events is chosen for execution by the function choose(E_A). Note that priorities are not always defined, and multiple events may have an identical priority; the function choose(E_A) is therefore more complex than shown in this figure. The selected event is then processed, i.e., all of its assignments are evaluated, and afterwards the triggers of all events in E have to be evaluated again because of possible mutual influences between the events. The algorithm proceeds until the set E_A of active events is empty.
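The selection loop above can be sketched as follows (delays omitted). The dict-based event representation is an assumption of this example, not the data structure used by the library itself.

```python
# Sketch of the event-selection loop: pick the highest-priority active
# event, execute its assignments, then re-evaluate all triggers.
import random

def process_events(events, state):
    """Each event is a dict with functions 'trigger', 'priority' and
    'assign' of the mutable state, plus a bool 'persistent'."""
    prev = {id(e): e["trigger"](state) for e in events}
    active = [e for e in events if prev[id(e)]]
    while active:
        # non-persistent events whose trigger fell back to false are dropped
        active = [e for e in active if e["persistent"] or e["trigger"](state)]
        if not active:
            break
        top = max(e["priority"](state) for e in active)
        chosen = random.choice([e for e in active if e["priority"](state) == top])
        chosen["assign"](state)          # may alter other triggers/priorities
        active.remove(chosen)
        for e in events:                 # false -> true edges re-activate events
            now = e["trigger"](state)
            if now and not prev[id(e)] and e not in active:
                active.append(e)
            prev[id(e)] = now
    return state

# Toy example: e1 (high priority) disables the trigger of the
# non-persistent e2 before e2 gets a chance to execute.
e1 = {"trigger": lambda s: s["x"] >= 1, "priority": lambda s: 2,
      "persistent": True, "assign": lambda s: s.update(x=0, y=1)}
e2 = {"trigger": lambda s: s["x"] >= 1, "priority": lambda s: 1,
      "persistent": False, "assign": lambda s: s.update(z=1)}
final = process_events([e1, e2], {"x": 1, "y": 0, "z": 0})
print(final)  # {'x': 0, 'y': 1, 'z': 0}
```

Note how e2 is never executed (z stays 0): its trigger becomes false while e1 is processed, and nonpersistent events are then removed from the active set, exactly as described in the text.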

The interpretation of events is the most time consuming step of the integration procedure. This is why efficient and clearly organized data structures are required to ensure high performance of the algorithm.

### Time step adaptation considering events and the calculation of derivatives

The precise determination of the time at which events are triggered is crucial to ensure exact results of the numerical integration process. It could, for instance, happen that an event is triggered at a time t_τ that lies between the integration time points t_{τ−1} and t_{τ+1}. When processing the events only at t_{τ−1} and t_{τ+1}, the trigger condition might not evaluate to true at either of these time points. Hence, a numerical integration method with step-size adaptation is required in order to hit the correct time points. Rosenbrock’s method[30] can adapt its step size h when events occur (see Figure 4 for details). For a given time interval [t_{τ−1}, t_{τ+1}] and the current vector Q, Rosenbrock’s method determines the new value of Q at the time point t_{τ−1}+h, with h>0. If the error tolerance cannot be respected, h is reduced and the procedure is repeated.

Refined step-size adaptation for events. For a given time interval, the Rosenbrock solver (KiSAO term 33) always tries to increase time t by the current adaptive step size h and calculates a new vector of quantities Q_next. After a successful step, the events and rules of the model are processed. If this causes a change in Q, h is first decreased and the Rosenbrock solver then calculates another vector Q_next using the adapted step size. The precision of the event processing is therefore determined by the minimum step size h_min. The adapt function is defined by Rosenbrock’s method[30].

After that, the events and the assignment rules are processed at the new time point t_{τ−1}+h. If this causes a change in Q, the adaptive step size is decreased by setting h to h/10 and the calculation is repeated until either the minimum step size is reached or the processing of events and assignment rules no longer changes Q. Hence, the time at which an event takes place is determined precisely.
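The refinement loop just described can be sketched as follows. The `rosenbrock_step` and `process_events_and_rules` functions are placeholders standing in for the real solver step and the event/rule processing; the toy definitions below are assumptions of this illustration.

```python
# Sketch of event-aware step refinement: if processing events/rules at the
# end of a step changes the state vector Q, redo the step with h/10 until
# h_min is reached or nothing changes anymore.

def refined_step(Q, t, h, h_min, rosenbrock_step, process_events_and_rules):
    while True:
        Q_next = rosenbrock_step(Q, t, h)
        Q_after = process_events_and_rules(Q_next, t + h)
        if Q_after == Q_next or h / 10 < h_min:
            return Q_after, t + h        # accept the (possibly refined) step
        h = h / 10                       # an event fired inside the step: refine

# Toy stand-ins: constant dynamics, and an event that fires at t >= 0.5.
def rosenbrock_step(Q, t, h):
    return list(Q)                       # placeholder for the real solver step

def process_events_and_rules(Q, t):
    return [1.0] if t >= 0.5 else Q

Q, t = refined_step([0.0], 0.0, 1.0, 1e-3, rosenbrock_step, process_events_and_rules)
print(Q, t)  # [0.0] 0.1 -> the full step over the event was rejected and refined
```

The first attempt (h = 1.0) crosses the event at t = 0.5, so the step is redone with h = 0.1, which is accepted; subsequent calls would approach the event time with ever finer steps, as the text describes.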

For given values Q at a time point t, the current vector of derivatives Q̇ is calculated as follows. First, the rate rules are processed: Q̇ = g(Q, t). Note that the function g returns 0 in all dimensions in which no rate rule is defined. Second, the velocity ν_i of each reaction channel R_i is computed with the help of the unified syntax graph (e.g., Figure 1). The velocity functions depend on Q at time t. During this second step, the derivatives of all species that participate in the current reaction R_i are updated (see the flowchart in Figure 5).

Calculation of the derivatives at a specific point in time. First, the vector Q̇ holding the derivatives of all quantities is set to the zero vector. Then the rate rules of the model are processed by evaluating the function g(Q, t), which can change Q̇ in some dimensions. After that, the velocity ν_j of every reaction channel R_j is computed. The derivatives of each species (with index i) participating in the currently processed reaction channel R_j are updated in each step by adding the product of the stoichiometry n_ij and the reaction’s velocity ν_j. In this figure, the stoichiometric values n_ij in the matrix N are assumed to be constant for the sake of simplicity; in general they can be variable. Before Level 3, SBML provided StoichiometryMath elements that could be used for a direct computation of the stoichiometry. In Level 3, the StoichiometryMath element has been removed and these values can instead be changed by treating them as the subject of assignment rules. In both cases, the values of n_ij have to be updated in each simulation step.
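The two-step derivative assembly can be sketched in a few lines. The function signature and the toy mass-action model below are illustrative assumptions, not the library's API.

```python
# Sketch of the derivative assembly: rate rules first, then each reaction's
# velocity scaled by the (here constant) stoichiometric coefficients n_ij.

def derivatives(Q, t, rate_rules, reactions, N):
    """Q: dict species -> amount; rate_rules: dict species -> g(Q, t);
    reactions: dict name -> velocity function; N: dict (species, reaction)
    -> stoichiometric coefficient."""
    Qdot = {s: 0.0 for s in Q}               # start from the zero vector
    for s, g in rate_rules.items():          # step 1: rate rules
        Qdot[s] = g(Q, t)
    for rj, velocity in reactions.items():   # step 2: reaction velocities
        v = velocity(Q, t)
        for (si, r), n in N.items():
            if r == rj:
                Qdot[si] += n * v            # add n_ij * nu_j
    return Qdot

# Toy system, reaction R2: DHAP <-> GA3P with a mass-action velocity.
reactions = {"R2": lambda Q, t: 2.0 * Q["DHAP"] - 0.25 * Q["GA3P"]}
N = {("DHAP", "R2"): -1, ("GA3P", "R2"): +1}
Q = {"DHAP": 1.0, "GA3P": 4.0}
print(derivatives(Q, 0.0, {}, reactions, N))
# {'DHAP': -1.0, 'GA3P': 1.0}
```

With ν_R2 = 2.0·1.0 − 0.25·4.0 = 1.0, DHAP (n = −1) loses and GA3P (n = +1) gains at the same rate, matching the stoichiometric update rule in the figure.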

### A reference implementation of the algorithm

The algorithm described above has been implemented in Java™ and included in the Systems Biology Simulation Core Library. Figure 6 displays an overview of the software architecture of this community project, which has been designed with the aim of providing an extensible numerical backend for customized programs for research in computational systems biology. The SBML-solving algorithm is based on the data structures provided by the JSBML project[31]. With the help of wrapper classes, several numerical solvers originating from the Apache Commons Math library[32] were included in the project. In addition, the library provides implementations of the explicit fourth-order Runge-Kutta method, Rosenbrock’s method, and Euler’s method.

Architecture of the Systems Biology Simulation Core Library (simplified). Numerical methods are strictly separated from differential equation systems. The upper part displays the unified type hierarchy of all currently included numerical integration methods. The middle part shows the interfaces defining several special types of the differential equations to be solved by the numerical methods. The class SBMLinterpreter (bottom part) implements all of these interfaces with respect to the information content of a given SBML model. Similarly, an implementation of further data formats can be included into the library.

Due to the strict separation between the numerical differential equation solvers and the definition of the actual differential equation system, it is possible to implement support for other community standards, such as CellML[9].

In order to support the standard Minimum Information About a Simulation Experiment (MIASE)[33], the library also provides an interpreter of Simulation Experiment Description Markup Language (SED-ML) files[26]. These files allow users to store the details of a simulation, including the selection and all settings of the numerical method, hence facilitating the creation of reproducible results. A simulation experiment can also be directly started by passing a SED-ML file to the interpreter in this library. Each solver has a method to directly access its corresponding Kinetic Simulation Algorithm Ontology (KiSAO) term[34] to facilitate the execution of SED-ML files.

Many interfaces, abstract classes, and an exhaustive source code documentation in the form of JavaDoc facilitate the customization of the library. For testing purposes, the library contains a sample program that benchmarks its SBML interpreter against the entire SBML Test Suite version 2.3.2[24].

### Benchmark and application to published models

The reference SBML implementation has successfully passed the SBML Test Suite[24] using the Rosenbrock solver. The results are shown in Table 1. All models together can be simulated within seconds, which means that the simulation of one SBML model takes only milliseconds on average, using regular desktop computers.

The total simulation time for all models in SBML Level 3 Version 1 is significantly higher than for the models in other SBML levels and versions. This can be explained by the fact that the test suite contains some models of this version whose evaluation requires time-consuming processing of a large number of events. In particular, the simulation of model No. 966 of the SBML Test Suite, which is only provided in SBML Level 3 Version 1, takes 20 s because it contains 23 events to be processed. Two events fire every 10⁻² time units within the simulation time period of 1,000 time units and must therefore be evaluated many thousands of times within the specified interval. The evaluation of this model accounts for over 50% of the total simulation time for the models in SBML Level 3 Version 1.

An implementation of an SBML solver that passes the test suite should in principle also be capable of computing the solution of all models from BioModels Database, a resource containing a collection of published and curated models. This online database currently provides neither reference data for the models nor any settings for the numerical computation (such as step size, end time, etc.). However, it offers pre-computed plots of the time courses for the vast majority of models. Therefore, while it cannot be used directly as a benchmark test, it can help to verify that a solver implementation supports all features of many published models and that the algorithm always terminates successfully. The Systems Biology Simulation Core Library solves all curated models from BioModels Database (release 23, October 2012) without raising any errors; see Methods for details. These results suggest the reliability of the simulation algorithm described in this work.

In the following, we select from this repository two models that exhibit diverse features to illustrate the capabilities of this library: BioModels Database model No. 206 by Wolf et al.[35] and BioModels Database model No. 390 by Arnold and Nikoloski[36].

The model by Wolf et al.[35] mimics glycolytic oscillations that have been observed in yeast cells. The model describes how the dynamics propagate through a cellular network comprising eleven reactions that interrelate nine reactive species. Figure 7a displays the simulation results for the intracellular concentrations of 3-phosphoglycerate, ATP, glucose, glyceraldehyde 3-phosphate, and NAD⁺: after an initial phase of approximately 15 s, all metabolites settle into a sustained rhythmic oscillation. Changes in the dynamics of the fluxes through selected reaction channels of this model can be seen in Figure 7b.

Simulation of glycolytic oscillations. This figure displays the results of a simulation computed with the Systems Biology Simulation Core Library based on model No. 206 from BioModels Database[35, 37]. A) Shown are the changes of the concentrations of the most characteristic intracellular metabolites 3-phosphoglycerate, ATP, glucose, glyceraldehyde 3-phosphate (GA3P), and NAD⁺ within yeast cells in the time interval [0, 30] seconds. B) This panel displays a selection of the dynamics of relevant fluxes (D-glucose 6-phosphotransferase, glycerone-phosphate-forming, phosphoglycerate kinase, pyruvate 2-O-phosphotransferase, acetaldehyde-forming, ATP biosynthetic process) that were computed as intermediate results by the algorithm. The computation was performed using the Adams-Moulton solver[38] (KiSAO term 280) with 200 integration steps, 10⁻¹⁰ as absolute error tolerance and 10⁻⁵ as relative error tolerance. Due to the importance of feedback regulation, the selection of an appropriate numerical solver is crucial for this model. Methods without step-size adaptation, such as the fourth-order Runge-Kutta algorithm (KiSAO term 64), may only be able to find a high-quality solution with an appropriate number of integration steps. The simulation results obtained with the algorithm described in this work reproduce the results provided by BioModels Database.

By comparing a large collection of previous models of the Calvin-Benson cycle, Arnold and Nikoloski created a quantitative consensus model that comprises eleven species, six reactions, and one assignment rule[36]. All kinetic equations within this model call specialized function definitions. Figure 8 shows the simulation results for the species ribulose-1,5-bisphosphate, ATP, and ADP within this model. As in the previous test case, the dynamics computed by the Simulation Core Library reproduce the figures provided by BioModels Database.

Simulation of the Calvin-Benson cycle. Another example of the capabilities of the Simulation Core Library has been obtained by solving model No. 390 from BioModels Database[36, 37]. This figure shows the evolution of the concentrations of ribulose 1,5-bisphosphate, a key metabolite for CO₂ fixation in the reaction catalyzed by ribulose-1,5-bisphosphate carboxylase oxygenase (RuBisCO), and the currency metabolites ATP and ADP during the first 35 s of photosynthesis. This model was simulated using Euler’s method (KiSAO term 30) with 200 integration steps.

### Comparison to existing solver implementations for SBML

In order to benchmark our software, we chose similar tools from the SBML software matrix[39] that exhibit the following features:

• The last updated version was released after the final release of the specification for SBML Level 3 Version 1 Core, i.e., October 6, 2010.

• No dependency on commercial products that are not freely available (e.g., MATLAB™ or Mathematica™).

The selected programs are, in alphabetical order: BioUML[40], COPASI[41], iBioSim[42], JSim[43], LibSBMLSim[44], and VCell[45, 46]. Table 2 summarizes the comparison of the most recent versions of all six programs. It should be mentioned that this comparison can only mirror a snapshot of the ongoing development process of all programs at the time of writing. An up-to-date comparison of the capabilities of SBML solvers can be found online[47].

### Limitations and perspective

The modifications made to the Rosenbrock solver enable a precise timing of events during simulation. However, this precise timing can lead to a noticeable increase in run-time when events are triggered at very small intervals, e.g., every 10⁻³ time units. This behavior can, for example, be observed in BioModels Database model No. 408[48] (a model with three events). When the precise timing of events is not of utmost importance, a solver other than Rosenbrock can be chosen. Furthermore, there are plans to improve the run-time behavior of the Rosenbrock solver for the simulation of models containing events.

When dealing with stiff problems, Rosenbrock’s method is a good choice, because it has been designed for stiff ODEs. However, our experiments show that the Rosenbrock solver can be inefficient for non-stiff problems in comparison to other solvers. This issue can lead to an increased run-time for large models such as model No. 235 of BioModels Database, which contains 622 species participating in 778 reactions, distributed across three compartments[49]. In some cases, tuning the relative and absolute tolerances can help, but depending on the system’s structure, Rosenbrock’s method is sometimes stretched to its limits. The Runge-Kutta-Fehlberg method[50] (KiSAO term 86), which is included in iBioSim, also shows an increase in run-time for this model.

The performance of the Runge-Kutta-Fehlberg and Rosenbrock methods shows, however, that simpler ODE solvers can have more difficulty with some biological models than more advanced solvers, such as CVODE from SUNDIALS[51], which can adapt to both non-stiff and stiff problems. The SUNDIALS library, which is incorporated into BioUML, can handle complicated ODE systems significantly better, but since it is not available under the LGPL and no open-source Java version of these solvers is currently obtainable, we disregarded its use.

Algebraic rules constitute an important problem for any implementation of the SBML standard. The unbound variable of each such equation can be identified efficiently[29], whereas the transformation of an algebraic rule into an assignment rule involves symbolic computation and is very difficult to implement; in some cases, such a transformation is not even possible. Alternatively, the current value of the free variable in an algebraic equation could be identified using nested intervals. However, this approach requires significantly more run-time, because the nested intervals would have to be re-computed at every time step, whereas the transformation approach considers every algebraic rule only once (during the initialization).
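The nested-interval fallback mentioned above amounts to bisection. A minimal sketch, assuming the free variable can be bracketed in an interval where the rule's residual changes sign (the bracket and tolerance below are assumptions of this illustration):

```python
# Sketch of the nested-interval approach: when an algebraic rule
# 0 = F(x) cannot be rearranged symbolically, bracket the free variable
# x and bisect at each time step.

def solve_algebraic(F, lo, hi, tol=1e-12):
    f_lo = F(lo)
    if f_lo == 0.0:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (F(mid) < 0.0) == (f_lo < 0.0):
            lo, f_lo = mid, F(mid)   # root lies in the upper half
        else:
            hi = mid                 # root lies in the lower half
    return 0.5 * (lo + hi)

# Example: algebraic rule 0 = x**2 - 2 with x bracketed in [0, 2].
x = solve_algebraic(lambda x: x * x - 2.0, 0.0, 2.0)
print(round(x, 6))  # 1.414214
```

Because this search has to be repeated at every time step, its cost illustrates why the one-time transformation into an assignment rule is preferred when it is possible.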

Since Level 3, SBML entails one further aspect: it is now possible to add features to a model by declaring specialized extension packages. The algorithm discussed in this paper covers the core functionality of SBML. The extension packages are very diverse, ranging from graphical representation[53] to the description of qualitative networks such as Petri nets[54], and many more. It is therefore necessary to derive and implement algorithms for the interpretation of individual SBML packages separately.

The agenda for the further development of the open-source project, the Systems Biology Simulation Core Library, includes the implementation of SBML extension packages, support for CellML, and the incorporation of additional numerical solvers. Contributions from the community are welcome.

### Tutorials - Videos - Webinars - Online Courses for You and Your Team


#### Optimization Tutorial

Solvers, or optimizers, are software tools that help users determine the best way to allocate scarce resources. Examples include allocating money to investments, locating new warehouse facilities, or scheduling hospital operating rooms. In each case, multiple decisions need to be made in the best possible way while simultaneously satisfying a number of requirements (or constraints). The "best" or optimal solution might mean maximizing profits, minimizing costs, or achieving the best possible quality. Here are some representative examples of optimization problems:

Finance/Investment : Cash management, capital budgeting, portfolio optimization.

Manufacturing : Job shop scheduling, blending, cutting stock problems.

#### Simulation/Risk Analysis Tutorial

Quantitative risk analysis is the practice of creating a mathematical model of a project or process that explicitly includes uncertain parameters we cannot control as well as decision variables we can control. Monte Carlo simulation explores thousands of possible scenarios and calculates the impact of the uncertain parameters, and of the decisions we make, on outcomes we care about, such as profit and loss, investment returns, or environmental results. Industries where simulation and risk analysis are heavily used include:

Pharmaceuticals : Modeling R&D and clinical trials

Oil & Gas : Modeling drilling projects

Insurance : Modeling frequency and types of claims
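The Monte Carlo workflow described above can be sketched in a few lines: an uncertain parameter (demand), a decision variable (price), and summary statistics over many sampled scenarios. The demand curve, price, and cost figures below are invented for illustration.

```python
# Minimal Monte Carlo risk-analysis sketch: profit depends on an uncertain
# demand (drawn from a normal distribution) and a chosen price.
import random

random.seed(42)

def profit(price, demand, unit_cost=3.0):
    units_sold = max(0.0, demand - 50.0 * (price - 8.0))  # toy demand curve
    return units_sold * (price - unit_cost)

def simulate(price, trials=10_000):
    outcomes = [profit(price, random.gauss(1000.0, 150.0)) for _ in range(trials)]
    outcomes.sort()
    mean = sum(outcomes) / trials
    p5 = outcomes[int(0.05 * trials)]   # 5th percentile: downside risk
    return mean, p5

mean, p5 = simulate(price=10.0)
print(f"mean profit ~ {mean:.0f}, 5% worst case ~ {p5:.0f}")
```

Rather than a single "expected" outcome, the simulation yields a whole distribution, from which both the average result and tail risks (here the 5th percentile) can be read off.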

#### Data Mining Tutorial

Data mining software tools help users find patterns and hidden relationships in data that can be used to predict behavior and make better business decisions. A machine learning algorithm "trained" on past observations can be used to predict the likelihood of future outcomes, such as customer "churn", or to classify new transactions into categories such as "legitimate" or "suspicious". Other methods can be used to uncover "clusters" of similar observations, or to find associations among different items. Common applications include:

Financial Services : Fraud detection, good vs. bad credit risks

Direct Marketing : Segmentation to improve response rates

Electoral Politics : Identifying "most persuadable" voters

## Introduction

Until recently, many models ignored stochastic effects because of the difficulty of solving them. Today, however, stochastic differential equations (SDEs) play a significant role in many areas of science and industry because of their use in modeling stochastic phenomena, e.g., in finance, population dynamics, biology, medicine, and mechanics. Adding a random element to a deterministic differential equation turns an ordinary differential equation into an SDE. Unfortunately, in many cases analytic solutions are not available for these equations, so numerical methods[1, 2] are required to approximate the solution. Numerical solutions of SDEs are discussed in [3–6], and many numerical experiments are presented in [7]. Some analytical and numerical solutions were proposed in [8]; [9] considered numerical approximations of random periodic solutions for SDEs, and [10] constructed a Milstein scheme with an added error-correction term for solving stiff SDEs.

In this paper we consider the general form of a one-dimensional SDE,

dX(t, ω) = f(t, X(t, ω)) dt + g(t, X(t, ω)) dW(t, ω),   (1)

where f is the drift coefficient, g is the diffusion coefficient, and W(t, ω) is the Wiener process. From now on, let X(t, ω) = X(t) and W(t, ω) = W(t) for simplicity. The Wiener process W(t) satisfies the following three conditions:

1. W(0) = 0 (with probability 1).

2. W(t) − W(s) ~ √(t − s) N(0, 1) for 0 ≤ s < t, where N(0, 1) denotes a standard normal random variable.

3. The increments W(t) − W(s) and W(υ) − W(τ) are independent on distinct time intervals, i.e., for 0 ≤ s < t < τ < υ.

The integral form of (1) is as follows:

X(t) = X(0) + ∫₀ᵗ f(s, X(s)) ds + ∫₀ᵗ g(s, X(s)) dW(s).

If f(t, X(t)) = a₁(t)X(t) + a₂(t) and g(t, X(t)) = b₁(t)X(t) + b₂(t), where a₁, a₂, b₁, b₂ are specified functions of time t or constants, the SDE is linear; otherwise it is nonlinear. In the next section we describe Monte Carlo simulation, the method we use for our experiments. In Section 3 we present the numerical methods for SDEs: we first give a stochastic Taylor expansion and then derive the Euler-Maruyama[11] and Milstein[12] methods from the truncated Ito-Taylor expansion. In Section 4 we consider a nonlinear SDE and solve it with the two methods, namely the EM and Milstein methods. We use MATLAB for our simulations and support our results with graphs and tables. The last section presents our conclusions and suggestions.
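The paper's experiments use MATLAB; the two discretisations can equally be sketched in Python. As a test problem we take the linear SDE dX = λX dt + μX dW (geometric Brownian motion), whose exact solution X(T) = X(0)·exp((λ − μ²/2)T + μW(T)) is known; the parameter values below are arbitrary choices for the illustration.

```python
# Euler-Maruyama and Milstein discretisations of dX = lam*X dt + mu*X dW,
# driven by the same simulated Brownian path.
import math
import random

random.seed(1)
lam, mu, X0, T, n = 2.0, 0.1, 1.0, 1.0, 2 ** 10
h = T / n

x_em, x_mil, W = X0, X0, 0.0
for _ in range(n):
    dW = math.sqrt(h) * random.gauss(0.0, 1.0)   # Brownian increment ~ sqrt(h)*N(0,1)
    W += dW
    # Euler-Maruyama: truncated Ito-Taylor expansion, strong order 0.5
    x_em += lam * x_em * h + mu * x_em * dW
    # Milstein: adds the second-order diffusion term, strong order 1.0
    x_mil += (lam * x_mil * h + mu * x_mil * dW
              + 0.5 * mu ** 2 * x_mil * (dW ** 2 - h))

x_exact = X0 * math.exp((lam - 0.5 * mu ** 2) * T + mu * W)
print(x_em, x_mil, x_exact)
```

Both schemes use the same Brownian increments, so their end values can be compared directly against the exact solution evaluated on the same path; refining h (increasing n) shrinks both errors, with the Milstein correction paying off most when the diffusion coefficient μ is large.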

## Case studies

This section presents several case studies in which serious games and simulation software have been implemented in different educational contexts. More precisely, the case of the University of Cantabria (Spain) is based on a traditional on-campus learning approach, the case of the Universidade Aberta (Portugal) follows a purely online approach, and the case of the Universitat Autonoma de Barcelona (Spain) presents a blended approach. In addition, a fourth case, Trinity College Dublin (Ireland), has been included; it highlights some gender-related aspects to be considered when designing and implementing serious games and simulation software. Given the present gender imbalance in the number of students in STEM studies, this last case is of special interest.

### A case study at the Universidad de Cantabria (Spain)

Free direct feedback given by game players was positive but scarce, limited to a small number of students. The use of this game was discontinued after one academic year, as no noticeable learning results were observed, nor any general interest in the game. The lack of interest was likely related to the fact that the activity was not considered in the assessment of the course. The instructor’s view is that gaming activity should be tutored inside the classroom in order to have any impact on learning; alternatively, it should be integrated with an online learning environment such as Moodle, so that the instructor can track the students’ activity and progress. We also believe the poor results might be attributed to the game mechanics itself, as it is not actually related to the contents being taught, but is merely a container for questions and answers that could be used for any topic.

### A case study at the Universidade Aberta (Portugal)

The Universidade Aberta (UAb, http://portal.uab.pt) is a purely online Portuguese university offering university degrees over the Internet to more than 8,000 students located on different continents. An adapted version of the well-known Moodle (https://moodle.org) e-Learning platform is used in most of its courses. This case study analyses the Advanced Optimisation course in the Doctoral degree in Applied Mathematics and Modelling during the period from the academic year 2015/2016 to 2018/2019 (4 academic years, one course edition per year). All of the 5 to 8 students who take this module every year exhibit a strong background in Mathematics, although not all of them have a good background in programming and/or simulation concepts. Approximately 60% of them are male and 40% female. Given that it is an online course, the students come from different Portuguese-speaking countries, such as Portugal, Brazil, and Angola. These students show a high degree of cultural heterogeneity and, of course, live in different time zones, which explains why an asynchronous learning model is required.

In this course, students have to deal with complex decision-making problems that arise in real-life logistics, transportation, production, telecommunication, and financial systems. Most of these problems are large-scale and include stochastic as well as dynamic components, which represent additional challenges for managers. During the course, students analyse different heuristic-based algorithms, implemented in programming languages such as Java, that can effectively solve these problems in reasonable computing times. Simulation techniques are usually integrated inside the heuristic algorithm in order to deal with the real-life uncertainty that characterises some of these systems (Juan et al. 2015). Also, visual representations of the solutions generated by the algorithms are provided in order to obtain insights into how the different system components, e.g., distribution routes in a multi-depot environment, interact with each other (Fig. 2).

Screen shot of a simulated solution to a logistics distribution problem

According to the different information sources used, that is, students’ scores, students’ opinions in the online forums about their learning process, and the instructors’ view of the learning process, it can be concluded that the use of simulation techniques and the visual representation of the solutions generated by the optimization algorithms were key factors in enriching and extending the students’ existing theoretical background, allowing them to link mathematical formulations and concepts to real-life applications in different fields. Although students in the course enjoy the possibility of learning new solving approaches that can effectively support managers during complex decision-making processes, they also acknowledge the methodological challenges associated with the design and implementation of such algorithms, which usually require interdisciplinary skills in several areas: optimisation concepts, advanced programming skills, and a good understanding of the specific application field (logistics, telecommunication, finance, etc.) as well as of the manager’s utility function.

### A case study at the Universitat Autonoma de Barcelona (Spain)

With over 30,000 registered students in more than 250 degrees (including both undergraduate and graduate programs), the Universitat Autonoma de Barcelona (UAB, http://www.uab.cat) is one of the largest and most prestigious universities in the area of Barcelona, Spain. Traditionally, UAB courses have been taught in a face-to-face modality. However, at present, the UAB also offers some degree programs which follow a blended learning paradigm via the support of online collaborative tools, such as Cisco WebEx (https://www.webex.com). Using this tool, students from any part of the world can follow the classes on-line and share their comments and questions with other students who are physically located inside the class where the instructor is lecturing.

One of these degrees is the UAB MSc in Aeronautical Management, which includes a course on theoretical and applied simulation. This case study analyses this master's course over the period from the academic year 2013/2014 to 2018/2019 (six academic years, with one course edition per year). The number of students per year ranges from 20 to 40, of which 65-70% follow the course face to face, while the remaining 30-35% follow it online from South America. These students come from very different backgrounds, ranging from Aeronautical Management to Industrial Engineering or even Business Administration. About 70% of the students are male and the remaining 30% are female. The course contains a lab in which students are asked to use simulation software, such as Simio (https://www.simio.com) and Cast (https://airport-consultants.com), to model and analyse different scenarios in the context of airport and airline management. For example, students can model a simple baggage handling system and monitor how its performance evolves over time under different configurations (Fig. 3). By varying the components of the baggage handling system and the available resources, students gain insight into how the process works and can make informed decisions about the right number of resources (assistants, vehicles, etc.) to assign during the check-in and transportation stages. Similar analyses can be performed on the security-control point, the boarding process (Mas et al. 2013; Carmona et al. 2014), the aircraft turn-around process (Silverio et al. 2013; San Antonio et al. 2017), the aircraft evacuation process (Estany et al. 2017), etc.
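To give a flavour of the resource-sizing analysis such a lab exercise involves, the sketch below simulates a highly simplified check-in queue and compares the average passenger wait for different numbers of desks. It is a hand-rolled toy model with invented arrival and service rates, not a Simio or Cast model.

```python
import random

def simulate_checkin(n_desks, n_passengers=500, seed=42):
    """Toy check-in queue: passengers arrive one by one with exponential
    inter-arrival times and are served by whichever desk frees up first.
    Returns the average waiting time in minutes. All rates are illustrative."""
    rng = random.Random(seed)
    desk_free_at = [0.0] * n_desks            # time at which each desk becomes free
    clock, total_wait = 0.0, 0.0
    for _ in range(n_passengers):
        clock += rng.expovariate(1.0)         # mean inter-arrival time: 1 minute
        service = rng.expovariate(1.0 / 2.5)  # mean service time: 2.5 minutes
        i = min(range(n_desks), key=lambda d: desk_free_at[d])
        start = max(clock, desk_free_at[i])   # wait if the earliest desk is busy
        total_wait += start - clock
        desk_free_at[i] = start + service
    return total_wait / n_passengers

# With the same workload, adding desks cuts the average wait.
waits = {d: simulate_checkin(d) for d in (3, 4, 6)}
```

Running the same passenger stream against several configurations, as here, is the basic experiment behind deciding how many assistants or vehicles to assign at each stage.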

Fig. 3: Screen shot of a simple Simio model with monitoring graphs

The possibility of using modern simulation software to build their own models of real-life systems allows students to develop their creativity and modelling skills, as well as their understanding of how these systems work and how they can be improved (in terms of key performance indicators) by choosing the right set-up. This is confirmed by students' scores, their opinions in the online forums about their learning process, and the instructors' view of the learning process over several years of working with the described simulation tools and concepts. Moreover, because modern software benefits from the object-oriented paradigm, complex simulation models can be developed simply by dragging and dropping from an extensive library of objects.

### A case study at Trinity College Dublin (Ireland)

Traditionally, the number of female students interested in engineering degrees is low. In the case of Ireland, women constituted only about 12% of the new entrants to engineering courses in the academic year 2017-2018. With the aim of attracting their interest in STEM careers, in recent years Trinity College Dublin (www.tcd.ie) has held six-week summer schools for groups of 10-15 female students of around 16 years of age. Trinity College Dublin, with 17,000 undergraduate and postgraduate students, is ranked within the top 100 world universities in the 2017/2018 QS World University Ranking across all indicators. Within this context, the Department of Mechanical and Manufacturing Engineering developed and ran a tailor-made serious game for three years (2014-2017). The game has the player make a series of decisions required to manufacture hairdryers, e.g., the quantity to be manufactured, the selection of components considering aspects such as materials and suppliers, business strategies, etc. The main goal of the game is to show students specific job roles so they can envisage the type of work they might perform as engineers. The students' feedback, collected by questionnaires, showed that the game was the most enjoyable daily activity in the summer school, and in most cases they continued playing at home. Nevertheless, further information about the capacity of the game to steer students towards engineering careers has not been tracked.

When the serious game was designed, the cooperation and competition mechanisms were thoroughly considered, given the different attitudes of male and female players: women usually feel more comfortable cooperating than competing. In addition, students in purely competitive games do not benefit from the experiences and ideas of their colleagues. Nevertheless, competition is directly linked to the extrinsic motivation caused by a reward, as mentioned in the "Introduction" section. An interesting element of the hairdryer manufacturing game, which might explain its popularity, is a newspaper in which related pieces of news are released in a humorous tone; students can also publish their own communications. This design component has three advantages: it highlights the achievements of some players (related to extrinsic motivation), it creates a feeling of community, and it is fun (both related to intrinsic motivation). In a second stage, a more challenging version of the game, requiring the application of engineering decisions, is expected to be developed and used in the degree courses. The main goal is to keep students from losing interest during the first years of an engineering degree, where the large amount of foundational content has no straightforward application. Nevertheless, the development of the software is subject to time and budget limitations, since this type of activity is not seen as a priority and there are no quantifiable indicators supporting its utility.

### Discussion

Despite being applied to different fields, student populations, and countries, the insights gained in each case seem consistent with the others. In fact, recent literature addressing other case studies presents well-aligned conclusions. For instance, like the cases of the Universidade Aberta and the Universitat Autonoma de Barcelona, Milosz and Milosz (2018) portray a case study in which simulation games are employed to train engineers in logistics-related concepts. Areas such as logistics, transport, and smart cities offer a clear environment where SE practices can be extremely useful in training new generations of decision-makers, who do not necessarily need a strong engineering background. Luna et al. (2018) tested the impact of integrating various learning strategies (simulation, serious games, case studies, and multimedia cases) into the curriculum of a Business Engineering course at the Universidad del Pacifico (Peru), where the use of simulation and serious games is guided by the instructors. They conclude that simulation games facilitate the development of students' analytical thinking, as discussed in the case study of the Universitat Autonoma de Barcelona.

In the field of marine ecology, Ameerbakhsh et al. (2019) used SE games to compare a student-centred (active) training approach with a teacher-led (passive) one. Participants interacted with a simulation game modelling a biomass production system; by properly configuring this model, the goal was to increase the sustainability of the marine environment. The study concluded that the participation of an expert instructor could significantly enrich the students' experience with the simulation model and better guide them during their learning process. These results reinforce the idea that the instructor's support and guidance add value to the simulation-supported training process, as observed in the case study of the Universidad de Cantabria. This conclusion is also supported by Luna et al. (2018), who emphasise that serious games need to provide goal-focused challenges for the users, and that the users should receive informative feedback from both the game and the instructor.

Many works mention the enjoyable learning experience and the increased motivation students feel when simulation and serious games are incorporated into their academic curricula, as highlighted in the case of Trinity College Dublin. Nevertheless, adequate engagement with these tools may require that the effort and time students invest be recognised when they are assessed. These tools also contribute to reducing the gap between theory and practice, which in some STEM areas can be quite noticeable. This feature underpins the work of Reis and Kenett (2017), who present a set of storyboards illustrating the potential of simulation in higher education for training students in a number of statistical methods.

## Design and simulation of assembly line feeding systems in the automotive sector using supermarket, kanbans and tow trains: a general framework

A growing number of manufacturers are adopting the so-called supermarket strategy to supply components to the production system. Supermarkets are decentralized storage areas used as intermediate warehouses for parts required by the production system (typically assembly lines). This feeding system is widely used in the automotive industry, where assembly stations in multiple mixed-model assembly lines are usually refilled through systematic part replenishment driven by kanban systems, using small tow vehicles pulling wagons (tow trains). The aim of this paper is to provide a simple but robust framework for designing the supermarket/feeding system dedicated to complex multiple mixed-model assembly lines. The framework proposes an integrated approach to both the long-term (static analytical model) and short-term (dynamic simulation) problems arising in kanban and supermarket systems dedicated to assembly lines, as well as tow train fleet sizing and management. The methodology is applied to a case study derived from the Italian automotive industry, and the results highlight the strong interrelation between the long- and short-term variables, which can be evaluated only by an integrated approach that considers both the static and dynamic aspects of the problem. The results of this study are then presented and discussed in detail.
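As a rough illustration of the static, long-term side of such a framework, the sketch below applies the classic Toyota-style kanban sizing rule and a naive tow-train fleet lower bound. The formulas are textbook approximations and all parameter values are invented; they are not taken from this paper's model or its case study.

```python
import math

def kanban_count(demand_per_hour, lead_time_hours, container_size, safety=0.10):
    """Classic kanban sizing rule, K = ceil(D * T * (1 + s) / C):
    enough cards to cover demand D over the replenishment lead time T,
    plus a safety factor s, given containers holding C parts each."""
    return math.ceil(demand_per_hour * lead_time_hours * (1 + safety)
                     / container_size)

def tow_train_fleet(containers_per_hour, containers_per_wagon,
                    wagons_per_train, route_time_hours):
    """Naive fleet-size lower bound: trains needed to move the hourly
    container flow, given each train's capacity and route cycle time."""
    capacity_per_train = containers_per_wagon * wagons_per_train / route_time_hours
    return math.ceil(containers_per_hour / capacity_per_train)

# Illustrative parameters: 120 parts/h demand, 30 min lead time, bins of 20.
k = kanban_count(demand_per_hour=120, lead_time_hours=0.5, container_size=20)
# 90 containers/h moved by 3-wagon trains (10 containers/wagon, 15 min loop).
trains = tow_train_fleet(containers_per_hour=90, containers_per_wagon=10,
                         wagons_per_train=3, route_time_hours=0.25)
```

Static estimates like these set the design envelope; the paper's point is that congestion, blocking, and variability in the short term can only be assessed by coupling them with a dynamic simulation.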


