# 8.2.1: Simulating Multi-step Experiments

## Lesson

Let's simulate more complicated events.

Exercise (PageIndex{1}): Notice and Wonder: Ski Business

What do you notice? What do you wonder?  Exercise (PageIndex{2}): Alpine Zoom

Alpine Zoom is a ski business. To make money over spring break, they need it to snow at least 4 out of the 10 days. The weather forecast says there is a (frac{1}{3}) chance it will snow each day during the break.

Use the applet to simulate the weather for 10 days of break to see if Alpine Zoom will make money.

1. Describe a chance experiment that you could use to simulate whether it will snow on the first day of break.
2. How could this chance experiment be used to determine whether Alpine Zoom will make money?
• In each trial, spin the spinner 10 times, and then record the number of 1’s that appeared in the row.
• The applet reports if the Alpine Zoom will make money or not in the last column.
• Click Next to get the spin button back to start the next simulation.
3. Based on your simulations, estimate the probability that Alpine Zoom will make money.

Exercise (PageIndex{3}): Kiran's Game

Kiran invents a game that uses a board with alternating black and white squares. A playing piece starts on a white square and must advance 4 squares to the other side of the board within 5 turns to win the game. For each turn, the player draws a block from a bag containing 2 black blocks and 2 white blocks. If the block color matches the color of the next square on the board, the playing piece moves onto it. If it does not match, the playing piece stays on its current square.

1. Take turns playing the game until each person in your group has played the game twice.
2. Use the results from all the games your group played to estimate the probability of winning Kiran’s game.
3. Do you think your estimate of the probability of winning is a good estimate? How could it be improved?

How would each of these changes, on its own, affect the probability of winning the game?

1. Change the rules so that the playing piece must move 7 spaces within 8 moves.
2. Change the board so that all the spaces are black.
3. Change the blocks in the bag to 3 black blocks and 1 white block.

Exercise (PageIndex{4}): Simulation Nation

Match each situation to a simulation.

Situations:

1. In a small lake, 25% of the fish are female. You capture a fish, record whether it is male or female, and toss the fish back into the lake. If you repeat this process 5 times, what is the probability that at least 3 of the 5 fish are female?
2. Elena makes about 80% of her free throws. Based on her past successes with free throws, what is the probability that she will make exactly 4 out of 5 free throws in her next basketball game?
3. On a game show, a contestant must pick one of three doors. In the first round, the winning door has a vacation. In the second round, the winning door has a car. What is the probability of winning a vacation and a car?
4. Your choir is singing in 4 concerts. You and one of your classmates both learned the solo. Before each concert, there is an equal chance the choir director will select you or the other student to sing the solo. What is the probability that you will be selected to sing the solo in exactly 3 of the 4 concerts?

Simulations:

1. Toss a standard number cube 2 times and record the outcomes. Repeat this process many times and find the proportion of the simulations in which a 1 or 2 appeared both times to estimate the probability.
2. Make a spinner with four equal sections labeled 1, 2, 3, and 4. Spin the spinner 5 times and record the outcomes. Repeat this process many times and find the proportion of the simulations in which a 4 appears 3 or more times to estimate the probability.
3. Toss a fair coin 4 times and record the outcomes. Repeat this process many times, and find the proportion of the simulations in which exactly 3 heads appear to estimate the probability.
4. Place 8 blue chips and 2 red chips in a bag. Shake the bag, select a chip, record its color, and then return the chip to the bag. Repeat the process 4 more times to obtain a simulated outcome. Then repeat this process many times and find the proportion of the simulations in which exactly 4 blues are selected to estimate the probability.

### Summary

The more complex a situation is, the harder it can be to estimate the probability of a particular event happening. Well-designed simulations are a way to estimate a probability in a complex situation, especially when it would be difficult or impossible to determine the probability from reasoning alone.

To design a good simulation, we need to know something about the situation. For example, if we want to estimate the probability that it will rain every day for the next three days, we could look up the weather forecast for the next three days. Here is a table showing a weather forecast:

today (Tuesday)WednesdayThursdayFriday
probability of rain(0.2)(0.4)(0.5)(0.9)
Table (PageIndex{1})

We can set up a simulation to estimate the probability of rain each day with three bags.

• In the first bag, we put 4 slips of paper that say “rain” and 6 that say “no rain.”
• In the second bag, we put 5 slips of paper that say “rain” and 5 that say “no rain.”
• In the third bag, we put 9 slips of paper that say “rain” and 1 that says “no rain.”

Then we can select one slip of paper from each bag and record whether or not there was rain on all three days. If we repeat this experiment many times, we can estimate the probability that there will be rain on all three days by dividing the number of times all three slips said “rain” by the total number of times we performed the simulation.

## Practice

Exercise (PageIndex{5})

Priya’s cat is pregnant with a litter of 5 kittens. Each kitten has a 30% chance of being chocolate brown. Priya wants to know the probability that at least two of the kittens will be chocolate brown.

To simulate this, Priya put 3 white cubes and 7 green cubes in a bag. For each trial, Priya pulled out and returned a cube 5 times. Priya conducted 12 trials.

Here is a table with the results.

trial numberoutcome
1ggggg
2gggwg
3wgwgw
4gwggg
5gggwg
6wwggg
7gwggg
8ggwgw
9wwwgg
10ggggw
11wggwg
12gggwg
Table (PageIndex{2})
1. How many successful trials were there? Describe how you determined if a trial was a success.
2. Based on this simulation, estimate the probability that exactly two kittens will be chocolate brown.
3. Based on this simulation, estimate the probability that at least two kittens will be chocolate brown.
4. Write and answer another question Priya could answer using this simulation.
5. How could Priya increase the accuracy of the simulation?

Exercise (PageIndex{6})

A team has a 75% chance to win each of the 3 games they will play this week. Clare simulates the week of games by putting 4 pieces of paper in a bag, 3 labeled “win” and 1 labeled “lose.” She draws a paper, writes down the result, then replaces the paper and repeats the process two more times. Clare gets the result: win, win, lose. What can Clare do to estimate the probability the team will win at least 2 games?

Exercise (PageIndex{7})

1. List the sample space for selecting a letter a random from the word “PINEAPPLE.”
2. A letter is randomly selected from the word “PINEAPPLE.” Which is more likely, selecting “E” or selecting “P?” Explain your reasoning.

(From Unit 8.1.5)

Exercise (PageIndex{8})

On a graph of side length of a square vs. its perimeter, a few points are plotted.

1. Add at least two more ordered pairs to the graph. 2. Is there a proportional relationship between the perimeter and side length? Explain how you know.

(From Unit 2.4.2)

## 8.2.1: Simulating Multi-step Experiments

Learning Targets

The more complex a situation is, the harder it can be to estimate the probability of a particular event happening. Well-designed simulations are a way to estimate a probability in a complex situation, especially when it would be difficult or impossible to determine the probability from reasoning alone.

To design a good simulation, we need to know something about the situation. For example, if we want to estimate the probability that it will rain every day for the next three days, we could look up the weather forecast for the next three days. Here is a table showing a weather forecast:

We can set up a simulation to estimate the probability of rain each day with three bags.

• In the first bag, we put 4 slips of paper that say “rain” and 6 that say “no rain.”
• In the second bag, we put 5 slips of paper that say “rain” and 5 that say “no rain.”
• In the third bag, we put 9 slips of paper that say “rain” and 1 that says “no rain.”

Then we can select one slip of paper from each bag and record whether or not there was rain on all three days. If we repeat this experiment many times, we can estimate the probability that there will be rain on all three days by dividing the number of times all three slips said “rain” by the total number of times we performed the simulation.

## Lesson 7

In this lesson, students see that compound events can be simulated by using multiple chance experiments. In this case, it is important to communicate precisely what represents one outcome of the simulation (MP6). For example, if we want to know the probability that a family with three children will have at least one girl, we can toss one coin to represent each child and use each set of three coin tosses to represent one family. Therefore, if we toss a coin 30 times, we will have run this simulation only 10 times.

Students continue to consider how a real-world situation can be represented using simulation (MP4).

### Learning Goals

Let’s simulate more complicated events.

### Required Preparation

Print and cut up spinners from the Alpine Zoom blackline master. One spinner for each group of 3 students.

For the Kiran’s Game activity, a paper bag containing 4 snap cubes (2 black and 2 white) is needed for every 3 students.

Other simulation tools (number cubes, bags with colored snap cubes, etc.) should be available.

### Print Formatted Materials

IM 6–8 Math was originally developed by Open Up Resources and authored by Illustrative Mathematics®, and is copyright 2017-2019 by Open Up Resources. It is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). OUR's 6–8 Math Curriculum is available at https://openupresources.org/math-curriculum/.

The second set of English assessments (marked as set "B") are copyright 2019 by Open Up Resources, and are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

Spanish translation of the "B" assessments are copyright 2020 by Illustrative Mathematics, and are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The Illustrative Mathematics name and logo are not subject to the Creative Commons license and may not be used without the prior and express written consent of Illustrative Mathematics.

## Lesson 7

In this lesson, students see that compound events can be simulated by using multiple chance experiments. In this case, it is important to communicate precisely what represents one outcome of the simulation (MP6). For example, if we want to know the probability that a family with three children will have at least one girl, we can toss one coin to represent each child and use each set of three coin tosses to represent one family. Therefore, if we toss a coin 30 times, we will have run this simulation only 10 times.

Students continue to consider how a real-world situation can be represented using simulation (MP4).

### Learning Goals

Let’s simulate more complicated events.

### Required Preparation

Print and cut up spinners from the Alpine Zoom blackline master. One spinner for each group of 3 students.

For the Kiran’s Game activity, a paper bag containing 4 snap cubes (2 black and 2 white) is needed for every 3 students.

Other simulation tools (number cubes, bags with colored snap cubes, etc.) should be available.

### Print Formatted Materials

For access, consult one of our IM Certified Partners.

For access, consult one of our IM Certified Partners.

For access, consult one of our IM Certified Partners.

IM 6–8 Math was originally developed by Open Up Resources and authored by Illustrative Mathematics®, and is copyright 2017-2019 by Open Up Resources. It is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). OUR's 6–8 Math Curriculum is available at https://openupresources.org/math-curriculum/.

The second set of English assessments (marked as set "B") are copyright 2019 by Open Up Resources, and are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

Spanish translation of the "B" assessments are copyright 2020 by Illustrative Mathematics, and are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The Illustrative Mathematics name and logo are not subject to the Creative Commons license and may not be used without the prior and express written consent of Illustrative Mathematics.

##### Problem 1

Priya’s cat is pregnant with a litter of 5 kittens. Each kitten has a 30% chance of being chocolate brown. Priya wants to know the probability that at least two of the kittens will be chocolate brown.

To simulate this, Priya put 3 white cubes and 7 green cubes in a bag. For each trial, Priya pulled out and returned a cube 5 times. Priya conducted 12 trials.

Here is a table with the results.

1) How many successful trials were there?

2) Describe how you determined if a trial was a success.

3) Based on this simulation, estimate the probability that exactly two kittens will be chocolate brown.

4) Based on this simulation, estimate the probability that at least two kittens will be chocolate brown.

5) Write and answer another question Priya could answer using this simulation.

6) How could Priya increase the accuracy of the simulation?

##### Problem 2

7) A team has a 75% chance to win each of the 3 games they will play this week. Clare simulates the week of games by putting 4 pieces of paper in a bag, 3 labeled “win” and 1 labeled “lose.” She draws a paper, writes down the result, then replaces the paper and repeats the process two more times. Clare gets the result: win, win, lose. What can Clare do to estimate the probability the team will win at least 2 games?

## R-bloggers

Posted on December 31, 2018 by Health Economics with R in R bloggers | 0 Comments

### Introduction

Multi-state models are used to model a trajectory through multiple states. Survival models are a special case in which there are two states, alive and dead. Multi-state models are therefore useful in clinical settings because they can be used to predict or simulate disease progression in detail. Putter et al. provide a helpful tutorial.

In this post, we will consider “clock-reset” (i.e., semi-Markov) models rather than “clock-forward” (i.e., Markov) models. In a “clock-reset” model, time refers to time since entering the most recent state, whereas in a “clock-forward” model time refers to time since entering the initial state. When using a “clock-reset” approach, state occupancy probabilities can only be computed in a general fashion via simulation.

The analysis will be restricted to parametric models, which are useful for extrapolating beyond the the time horizon in the existing data. Example probability distributions include the exponential, Weibull, Gompertz, gamma, log-logistic, lognormal, and generalized gamma.

The flexsurv package will be used to estimate the parametric models and the mstate and hesim (admittedly developed by me) packages will be used to simulate the estimated models. We will compare the computational efficiency of different simulation methods.

### An example 6-state model

To illustrate, we will follow Wreede et al. and use a 6-state model for leukemia patients following bone marrow transplantation (see figure below). The six states are (1) Transplant (Tx), (2) Recovery (Rec), (3) Adverse Event (AE), (4) AE and Rec, (5) Relapse (Rel), and (6) Death. The following 12 transitions are possible.

1. Tx to Rec
2. Tx to AE
3. Tx to Rel
4. Tx to Death
5. Rec to AE and Rec
6. Rec to Rel
7. Rec to Death
8. AE to AE and Rec
9. AE to Rel
10. AE to Death
11. AE and Rec to Rel
12. AE and Rec to Death The transitions can be characterized with a 6 x 6 transition matrix, which is a square-matrix where the (i,j) element is a positive integer if a transition from i to j is possible and NA otherwise.

### Estimation

Parametric multi-state models can be fit using flexsurv and both non-parametric and semi-parametric models can be fit with mstate . In our analysis, we will fit a parametric model to the ebmt4 dataset from the mstate package. For additional information on model fitting and multi-state data beyond what is provided in this post, I recommend the mstate and flexsurv articles published in the Journal of Statistical Software.

The ebmt4 dataset is in a “wide” format, which is not suitable for multi-state modeling. Luckily, the mstate package contains a helper function, mstate::msprep() , which can convert data in wide format to a suitable “long” format.

Notice that the data is set up so that there is a time-to-event for each permitted transition from a given state. In the msebmt dataset, this variable is time , which measures the time elapsed (in days) from Tstart to Tstop . The dataset also contains a variable named status , which denotes whether the transition is observed ( status = 1 ) or whether it was censored ( status = 0 ). From a given state, only one transition is observed and all others are censored.

For example, patient 2 began in state 1 (Tx) at time 0 . From state 1, there are 4 possible transitions: transition 2 (Tx to AE) was observed at time 12 while transition 1 (Tx to Rec), transition 3 (Tx to Rel), and transition 4 (Tx to Death) were censored. The patient remained in state 3 until time 29 , when a transition to state 4 (AE and Rec) occurred and the transitions to states 5 (Rel) and 6 (Death) were censored. Patient 2 subsequently remained in state 4 until entering state 5 (Rel) at time 422 , at which time transition 12 (AE and Rec to Death) was censored.

#### Fitting

Separate hazard functions, $lambda_(t|Z)$ are estimated for each possible transition from state $r$ to state $s$ as a function of time $t$ and covariates $Z$. In “clock-reset” models, the hazard function depends on elapsed time since entering state $r$, or simply Tstop - Tstart . The hazard functions can be estimated using a joint model with patient and transition interaction terms or by fitting separate models for each transition.

Here we illustrate by fitting 12 transition-specific Weibull models with a yearly time scale. The shape parameter for each model does not depend on covariates whereas the scale parameter depends on four prognostic factors known at baseline: (1) an indicator for whether the donor is a gender mismatch ( match ), prophylaxis (yes or no) ( proph ), (3) the year of the transplant (1985-1989, 1990-1994, 1995-1998) ( year ), and (4) age at transplant in years ( agecl ).

### Simulation

We first simulate the model using the maximum likelihood estimates of the regression coefficients that is, we assume that there is no parameter uncertainty. Outcomes will be simulated for patient 2 with the covariate profile displayed below.

State occupancy probabilities will be computed from baseline to year 10.

#### Mstate

Multi-state models can be simulated using mstate::mssample() , which simulates state occupancy probabilities from predicted cumulative hazards. flexsurv can be used to predict cumulative hazards for a given patient (i.e., patient 2) given a covariate profile. When predicting the cumulative hazards, it is critical that the time grid (the t argument) is not too coarse. A time step of .01 is used for the time grid, which (after trial and error) was deemed to be sufficiently accurate — simulations with coarser grids differed significantly from simulations based on continuous time using hesim . Finer grids would further increase accuracy but at the cost of slower simulation times.

The mstate::mssample() function works by sampling survival times from each possible transition from the cumulative hazards. More precisely, the cumulative hazards are used to simulate the (discrete) times at which patients transition between health states using the base R function sample() , which is, in turn, used to count the number of patients in each health state at the times specified by the argument tvec .

The function below uses mstate::mssample() to simulate state occupancy probabilities with a “clock-reset” model at the times specified in yr_grid .

#### Hesim

hesim's approach to simulating multi-state models differs from mstate's in a couple of ways. First, if parametric models are estimated, then hesim samples survival times directly from parametric probability distributions (e.g., the Weibull distribution). This increases accuracy (since no discrete time approximation is required) and speed (since it is considerably faster to sample from probability distributions with known functional forms than by sampling from cumulative hazards with sample() ). Second, the simulation code is vectorized (i.e., the simulation code is written in C++ by leveraging Rcpp) across heterogeneous patients and treatment strategies. In other words, hesim can be used to quickly simulate multiple covariate profiles and treatment alternatives.

We set up input data for the simulation by creating a dataset of many identical patients each with the covariate profile of patient 2 and a single treatment strategy (i.e., bone marrow transplantation). An example dataset of 1,000 identical patients is displayed. (Note that hesim uses data.table to increase speed.)

hesim combines the input data with the model fit from flexsurvreg() to set up a disease model—specifically, an individual-level continuous time state transition model (CTSTM). Similar to mstate::mssample , the $sim_stateprobs() function simulates state occupancy probabilities by first simulating a trajectory through the multi-state model for each patient and treatment strategy combination and then counting the number of simulated patients (for each treatment strategy) in each state over time. #### Comparison We simulate state occupancy probabilities using both mstate and hesim and vary the number of simulated patients to examine the impact on precision. As shown in the plot below, the differences in state occupancy probabilities generally become smaller as the number of simulated patients increases. Further, (although not shown) the simulation results from mstate become increasingly close to hesim as the time grid becomes finer. Since 1,000 iterations generate reasonably accurate estimates, we will assess speed by simulating 1,000 patients. The elapsed times (in seconds) suggest that hesim is 671.41 faster than mstate when simulating a parametric multi-state model. ### Probabilistic sensitivity analysis The results presented so far have ignored the impact of parameter uncertainty. In contrast, probabilistic sensitivity analysis (PSA) propagates uncertainty in the parameters to the state occupancy probabilities. In our case, the regression coefficients from the multi-state model are drawn from a suitable probability distribution and the multi-state simulation is run for each draw of the coefficients. There are a number of ways to simulate the distribution of the coefficients including bootstrapping, Bayesian modeling, and asymptotic Monte Carlo approximation. We will take the latter approach by sampling the maximum likelihood estimates from an asymptotic multivariate distribution, which is the fastest option. Although sampling from a multivariate normal distribution is not computationally intensive, repeatedly rerunning the simulation for each draw of the regression coefficients is. In this section, we show how to perform PSA using both mstate and hesim and compare performance. #### Mstate PSA can be performed using mstate::mssample() by looping through a distribution of cumulative hazards for a covariate profile and running mstate::mssample() for each iteration of the loop. The most straightforward way to predict a distribution of cumulative hazards is with the CTSTMs from the hesim package. The$cumhazard() function predicts cumulative hazards by transition number, parameter sample, treatment strategy, patient, and time. In our example, we predict cumulative hazards for a single patient (id = 2) and treatment strategy.

A distribution of simulated state occupancy probabilities for patient 2 can then be simulated by looping over the cumulative hazards for each parameter sample.

#### Hesim

With hesim , the entire analysis is inherently Bayesian so PSA is seamless. When creating a CTSTM from a flexsurvreg object, the user must simply set the argument point_estimate = FALSE and choose the number of samples of the parameters to draw. The distribution of the regression coefficients is then drawn by sampling from the multivariate normal distribution. Furthermore, the PSA is vectorized since the loops over the parameter samples are written in C++.

#### Comparison

In our comparisons, we continue to simulate 1,000 patients. With hesim , we consider both 100 and 1,000 parameter samples with mstate we restrict the analysis to 100 parameter samples because it becomes increasingly slow as the number of parameter samples increases.

hesim is fast, even when performing a PSA. With 1,000 patients, 100 PSA iterations can be simulated in less than a second and 1,000 PSA iterations in approximately 5 seconds. Computational efficiency becomes increasingly important as the computational demands grow: with 1,000 patients and 100 PSA iterations, mstate takes around 34 minutes to run and hesim is 4633.57 times faster.

### Conclusion

In this post, we describe some of the R packages that facilitate multi-state modeling. The flexsurv package is particularly useful for estimating parametric models. When “clock-reset” models are fit, state occupancy probabilities can only be predicted for general multi-state models using simulation. Both the hesim and mstate packages provide functionality for running such simulations. However, when parametric models are fit, hesim is considerably faster and this computational advantage grows with the number of required iterations. The speed advantage may be particularly useful when running a PSA, multiple subgroup analyses, and/or simulations using competing survival distributions.

## Generate the Sample Space Using a Table to Form the Cartesian Product

When there are only two stages in an experiment, a common way to list the possible outcomes is to form a Cartesian Product. This is similar to when you formed sets of ordered pairs in algebra when you graphed on a Cartesian coordinate plane. In algebra, we used a horizontal and vertical axis where the horizontal axis represented the first values, x, and the vertical axis represented the second values, y, in ordered pairs, (x, y). Here, we create a table to aid us in forming the Cartesian Product where the rows represent the possible outcomes for the first experiment (down the side) and the columns represent the possible outcomes for the second experiment (across the top). Like this: To form the Cartesian Product, we list in each interior cell of the table, the ordered pair that results from the outcome listed at the side followed by the outcomes listed at the top. The sample space is the set of outcomes listed in the shaded cells,

Note that we have twelve possible outcomes in this sample space for this two-stage experiment, which follows from the Fundamental Principle of Counting 2 · 6 = 12..

## Simulating the Multi-Stage Environment for Single-Stage Compressor Experiments

Place, JMM, Howard, MA, & Cumpsty, NA. "Simulating the Multi-Stage Environment for Single-Stage Compressor Experiments." Proceedings of the ASME 1995 International Gas Turbine and Aeroengine Congress and Exposition. Volume 1: Turbomachinery. Houston, Texas, USA. June 5–8, 1995. V001T01A046. ASME. https://doi.org/10.1115/95-GT-187

The performance of a single-stage low-speed compressor has been measured both before and after the introduction of certain features of the multi-stage flow environment. The aim is to make the single-stage rig more appropriate for developing design rules for multi-stage compressors. End-wall blockage was generated by teeth on the hub and casing upstream of the rotor. A grid fitted upstream produced free-stream turbulence at rotor inlet typical of multi-stage machines and raised stage efficiency by 1.8% at the design point. The potential field that would be generated by blade rows downstream of an embedded stage was replicated by introducing a pressure loss screen at stage exit. This reduced the stator hub corner separation and increased the rotor pressure rise at flow rates below design, changing the shape of the pressure-rise characteristic markedly. These results highlight the importance of features of the flow environment that are often omitted from single-stage experiments and offer improved understanding of stage aerodynamics.

ISHIKAWA, T. (1982): Prevention against lightning accidents in Japan. Nihon Univ. J. Med. 24: 1–14.

ISHIKAWA, T., MIYAZAWA, T., OHASHI, M., HOSOMI, Y., FUJISHIRO, Y., MUTO, T., KITAGAWA, N., TSURUMI, S. and KINOSHITA, K. (1978): Threshold value of lethal energy of artificial lightning discharge applied on rabbits. J. Toden Hosp. 8: 89–100 (in Japanese).

ISHIKAWA, T., MIYAZAWA, T., OHASHI, M., MUTO, T., KITAGAWA, N., TAKAGI, K., KINOSHITA, K. and TSURUMI, S. (1981): Experimental studies on the effect of artificial respiration after lightning discharge. Res. Exp. Med. (Berl.) 179: 59–68.

KITAGAWA, N. (1972): The nature of lightning incidence on the human bodies: Basis for their protection. Proceedings of the Institute of Electrostatics Japan, 17: 239–241.

KITAGAWA, N., BROOK, M. and WORKMAN, E. J. (1962): Continuing currents in cloud-to-ground lightning discharge. J. Geophysical Research, 67: 637–647.

KITAGAWA, N., KINOSHITA, K. and ISHIKAWA, T. (1972): Discharge experiments using dummies and rabbits simulating lightning strokes on human bodies. Int. J. Biometeor. 17: 239–241.

NAGAI, N., ISHIKAWA, T., OHASHI, M. and KITAGAWA, N. (1982): Study of lethal effects of multiple-stroke flash. Research letters on Atmospheric Electricity, 2: 87–90.

OHASHI, M., HOSOMI, Y., FUJISHIRO, Y. and MUTO, T. (1978): Threshold value of lethal energy of electric discharge to rats. J. Toden Hosp. 8: 71–79 (in Japanese).

OHASHI, M., HOSOMI, Y. and FUJISHIRO, Y. (1981a): Lethal threshold energy of artificial lightning applied on rats: comparison of lethal energy of rats and rabbits. J. Toden Hosp. 10–11: 41–50 (in Japanese).

OHASHI, M., HOSOMI, Y., FUJISHIRO, Y., ISHIKAWA, T., MIYAZAWA, T., KITAGAWA, N., TSURUMI, S., KINOSHITA, K., NAGAI, Y. and TAKAGI, K. (1981b): Experimental studies of resuscitation for rabbit after artificial lightning discharge. J. Toden Hosp. 10–11: 51–61.

OHASHI, M., KIMURA, T. and KIKUCHI, K. (1981c): Autopsies of rabbit subjected by artificial lightning stroke. J. Toden Hosp. 10–11: 63–73 (in Japanese).

## R-bloggers

Posted on December 31, 2018 by Devin Incerti in R bloggers | 0 Comments

### Introduction

Multi-state models are used to model a trajectory through multiple states. Survival models are a special case in which there are two states, alive and dead. Multi-state models are therefore useful in clinical settings because they can be used to predict or simulate disease progression in detail. Putter et al. provide a helpful tutorial.

In this post, we will consider “clock-reset” (i.e., semi-Markov) models rather than “clock-forward” (i.e., Markov) models. In a “clock-reset” model, time refers to time since entering the most recent state, whereas in a “clock-forward” model time refers to time since entering the initial state. When using a “clock-reset” approach, state occupancy probabilities can only be computed in a general fashion via simulation.

The analysis will be restricted to parametric models, which are useful for extrapolating beyond the the time horizon in the existing data. Example probability distributions include the exponential, Weibull, Gompertz, gamma, log-logistic, lognormal, and generalized gamma.

The flexsurv package will be used to estimate the parametric models and the mstate and hesim (admittedly developed by me) packages will be used to simulate the estimated models. We will compare the computational efficiency of different simulation methods.

### An example 6-state model

To illustrate, we will follow Wreede et al. and use a 6-state model for leukemia patients following bone marrow transplantation (see figure below). The six states are (1) Transplant (Tx), (2) Recovery (Rec), (3) Adverse Event (AE), (4) AE and Rec, (5) Relapse (Rel), and (6) Death. The following 12 transitions are possible.

1. Tx to Rec
2. Tx to AE
3. Tx to Rel
4. Tx to Death
5. Rec to AE and Rec
6. Rec to Rel
7. Rec to Death
8. AE to AE and Rec
9. AE to Rel
10. AE to Death
11. AE and Rec to Rel
12. AE and Rec to Death The transitions can be characterized with a 6 x 6 transition matrix, which is a square-matrix where the (i,j) element is a positive integer if a transition from i to j is possible and NA otherwise.

### Estimation

Parametric multi-state models can be fit using flexsurv and both non-parametric and semi-parametric models can be fit with mstate . In our analysis, we will fit a parametric model to the ebmt4 dataset from the mstate package. For additional information on model fitting and multi-state data beyond what is provided in this post, I recommend the mstate and flexsurv articles published in the Journal of Statistical Software.

The ebmt4 dataset is in a “wide” format, which is not suitable for multi-state modeling. Luckily, the mstate package contains a helper function, mstate::msprep() , which can convert data in wide format to a suitable “long” format.

Notice that the data is set up so that there is a time-to-event for each permitted transition from a given state. In the msebmt dataset, this variable is time , which measures the time elapsed (in days) from Tstart to Tstop . The dataset also contains a variable named status , which denotes whether the transition is observed ( status = 1 ) or whether it was censored ( status = 0 ). From a given state, only one transition is observed and all others are censored.

For example, patient 2 began in state 1 (Tx) at time 0 . From state 1, there are 4 possible transitions: transition 2 (Tx to AE) was observed at time 12 while transition 1 (Tx to Rec), transition 3 (Tx to Rel), and transition 4 (Tx to Death) were censored. The patient remained in state 3 until time 29 , when a transition to state 4 (AE and Rec) occurred and the transitions to states 5 (Rel) and 6 (Death) were censored. Patient 2 subsequently remained in state 4 until entering state 5 (Rel) at time 422 , at which time transition 12 (AE and Rec to Death) was censored.

#### Fitting

Separate hazard functions, $lambda_(t|Z)$ are estimated for each possible transition from state $r$ to state $s$ as a function of time $t$ and covariates $Z$. In “clock-reset” models, the hazard function depends on elapsed time since entering state $r$, or simply Tstop - Tstart . The hazard functions can be estimated using a joint model with patient and transition interaction terms or by fitting separate models for each transition.

Here we illustrate by fitting 12 transition-specific Weibull models with a yearly time scale. The shape parameter for each model does not depend on covariates whereas the scale parameter depends on four prognostic factors known at baseline: (1) an indicator for whether the donor is a gender mismatch ( match ), prophylaxis (yes or no) ( proph ), (3) the year of the transplant (1985-1989, 1990-1994, 1995-1998) ( year ), and (4) age at transplant in years ( agecl ).

### Simulation

We first simulate the model using the maximum likelihood estimates of the regression coefficients that is, we assume that there is no parameter uncertainty. Outcomes will be simulated for patient 2 with the covariate profile displayed below.

State occupancy probabilities will be computed from baseline to year 10.

#### Mstate

Multi-state models can be simulated using mstate::mssample() , which simulates state occupancy probabilities from predicted cumulative hazards. flexsurv can be used to predict cumulative hazards for a given patient (i.e., patient 2) given a covariate profile. When predicting the cumulative hazards, it is critical that the time grid (the t argument) is not too coarse. A time step of .01 is used for the time grid, which (after trial and error) was deemed to be sufficiently accurate — simulations with coarser grids differed significantly from simulations based on continuous time using hesim . Finer grids would further increase accuracy but at the cost of slower simulation times.

The mstate::mssample() function works by sampling survival times from each possible transition from the cumulative hazards. More precisely, the cumulative hazards are used to simulate the (discrete) times at which patients transition between health states using the base R function sample() , which is, in turn, used to count the number of patients in each health state at the times specified by the argument tvec .

The function below uses mstate::mssample() to simulate state occupancy probabilities with a “clock-reset” model at the times specified in yr_grid .

#### Hesim

hesim's approach to simulating multi-state models differs from mstate's in a couple of ways. First, if parametric models are estimated, then hesim samples survival times directly from parametric probability distributions (e.g., the Weibull distribution). This increases accuracy (since no discrete time approximation is required) and speed (since it is considerably faster to sample from probability distributions with known functional forms than by sampling from cumulative hazards with sample() ). Second, the simulation code is vectorized (i.e., the simulation code is written in C++ by leveraging Rcpp) across heterogeneous patients and treatment strategies. In other words, hesim can be used to quickly simulate multiple covariate profiles and treatment alternatives.

We set up input data for the simulation by creating a dataset of many identical patients each with the covariate profile of patient 2 and a single treatment strategy (i.e., bone marrow transplantation). An example dataset of 1,000 identical patients is displayed. (Note that hesim uses data.table to increase speed.)

hesim combines the input data with the model fit from flexsurvreg() to set up a disease model—specifically, an individual-level continuous time state transition model (CTSTM). Similar to mstate::mssample , the $sim_stateprobs() function simulates state occupancy probabilities by first simulating a trajectory through the multi-state model for each patient and treatment strategy combination and then counting the number of simulated patients (for each treatment strategy) in each state over time. #### Comparison We simulate state occupancy probabilities using both mstate and hesim and vary the number of simulated patients to examine the impact on precision. As shown in the plot below, the differences in state occupancy probabilities generally become smaller as the number of simulated patients increases. Further, (although not shown) the simulation results from mstate become increasingly close to hesim as the time grid becomes finer. Since 1,000 iterations generate reasonably accurate estimates, we will assess speed by simulating 1,000 patients. The elapsed times (in seconds) suggest that hesim is 671.41 times faster than mstate when simulating a parametric multi-state model. ### Probabilistic sensitivity analysis The results presented so far have ignored the impact of parameter uncertainty. In contrast, probabilistic sensitivity analysis (PSA) propagates uncertainty in the parameters to the state occupancy probabilities. In our case, the regression coefficients from the multi-state model are drawn from a suitable probability distribution and the multi-state simulation is run for each draw of the coefficients. There are a number of ways to simulate the distribution of the coefficients including bootstrapping, Bayesian modeling, and asymptotic Monte Carlo approximation. We will take the latter approach by sampling the maximum likelihood estimates from an asymptotic multivariate distribution, which is the fastest option. Although sampling from a multivariate normal distribution is not computationally intensive, repeatedly rerunning the simulation for each draw of the regression coefficients is. In this section, we show how to perform PSA using both mstate and hesim and compare performance. #### Mstate PSA can be performed using mstate::mssample() by looping through a distribution of cumulative hazards for a covariate profile and running mstate::mssample() for each iteration of the loop. The most straightforward way to predict a distribution of cumulative hazards is with the CTSTMs from the hesim package. The$cumhazard() function predicts cumulative hazards by transition number, parameter sample, treatment strategy, patient, and time. In our example, we predict cumulative hazards for a single patient (id = 2) and treatment strategy.

A distribution of simulated state occupancy probabilities for patient 2 can then be simulated by looping over the cumulative hazards for each parameter sample.

#### Hesim

With hesim , the entire analysis is inherently Bayesian so PSA is seamless. When creating a CTSTM from a flexsurvreg object, the user must simply set the argument point_estimate = FALSE and choose the number of samples of the parameters to draw. The distribution of the regression coefficients is then drawn by sampling from the multivariate normal distribution. Furthermore, the PSA is vectorized since the loops over the parameter samples are written in C++.

#### Comparison

In our comparisons, we continue to simulate 1,000 patients. With hesim , we consider both 100 and 1,000 parameter samples with mstate we restrict the analysis to 100 parameter samples because it becomes increasingly slow as the number of parameter samples increases.

hesim is fast, even when performing a PSA. With 1,000 patients, 100 PSA iterations can be simulated in less than a second and 1,000 PSA iterations in approximately 5 seconds. Computational efficiency becomes increasingly important as the computational demands grow: with 1,000 patients and 100 PSA iterations, mstate takes around 34 minutes to run and hesim is 4633.57 times faster.

### Conclusion

In this post, we describe some of the R packages that facilitate multi-state modeling. The flexsurv package is particularly useful for estimating parametric models. When “clock-reset” models are fit, state occupancy probabilities can only be predicted for general multi-state models using simulation. Both the hesim and mstate packages provide functionality for running such simulations. However, when parametric models are fit, hesim is considerably faster and this computational advantage grows with the number of required iterations. The speed advantage may be particularly useful when running a PSA, multiple subgroup analyses, and/or simulations using competing survival distributions.

## Multi-step forming simulation and experiment of swing arms for torsion beam Numerical simulations of bending, performing, and hydroforming processes of automotive swing arms for torsion beam were conducted using finite element analysis. The process parameters such as bending radius, mandrels, loading paths, and calibration pressure were optimized in the simulation. In addition, closing force was predicted by die stress&ndashstrain analysis using the elastic finite element method. Subsequently, the bending, performing, and hydroforming experiments were performed based on the simulation results. The experimental results indicated that swing arms could be fabricated when the bending radius, calibration pressure, and closing force would be of 120 mm, 150 MPa, and 15,000 kN, respectively. Finally, swing arms with high thickness uniformity were obtained and the thickness distribution between numerical and experimental results was well consistent.

### Journal

The International Journal of Advanced Manufacturing Technology &ndash Springer Journals