**Part 3: Histograms & Density Plots**

A Monte Carlo Simulation (MCS) is a sampling experiment whose aim is to estimate the distribution of a quantity of interest that depends on one or more stochastic input variables.

The development of an MCS involves **three basic steps**:

1. Set up a predictive model clearly identifying the dependent variable (outcome, unknown quantity of interest, function to be evaluated) and the independent variables (random input variables).

2. Identify the probability distributions of the input variables. The standard procedure involves analyzing empirical or historical data and fitting the data to a particular distribution. When such data is not available, select an appropriate probability distribution and choose its parameters wisely.

3. Repeat the simulation a significant number of times obtaining random values of the output or dependent variable. Use statistical methods to calculate some descriptive statistical measures and draw charts such as histograms or density plots of the outcome variable.
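As a toy illustration of the three steps (not the factory model developed below), the following sketch estimates the distribution of total yearly downtime; the fixed failure count and the uniform repair-time distribution are illustrative assumptions, not data from the article:

```python
import numpy as np

# Step 1: model -- yearly downtime = sum of repair times over a year's failures.
# Step 2: input distribution -- repair time drawn uniformly from {24, 48, 72} h
#         (illustrative values; the article fits empirical data instead).
repair_times = [24, 48, 72]
failures_per_year = 10           # hypothetical fixed failure count

# Step 3: replicate many times and summarize the outcome variable.
rng = np.random.default_rng(42)  # seeded for reproducibility
outcomes = [rng.choice(repair_times, size=failures_per_year).sum()
            for _ in range(5000)]

print(round(float(np.mean(outcomes)), 1))  # close to 10 * 48 = 480 hours
```

With enough replications, the sample mean of `outcomes` settles near the expected downtime, which is the point of step 3.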

As indicated in a previous article, we can use a Monte Carlo simulation to quantitatively account for risk in business or operations management tasks.

A practical application of a Monte Carlo Simulation includes manufacturing operations where time between machine failures and their corresponding repair times are random variables.

Assume a big factory where a processing unit runs 24 hours per day, 335 days per year (the remaining 30 days are reserved for personnel vacations and yearly maintenance). Management wants to buy another processing unit to manufacture a new final product. They know that such processing units incur random mechanical failures that require a highly skilled repair team, so every breakdown represents a significant cost.

The management decided to conduct a Monte Carlo Simulation to **numerically assess the risk associated with the new project.**

Historical data of the present processing unit shows that the time between machine failures has the following empirical distribution:

| Hours between Failures | Probability |
| --- | --- |
| 200 | 0.10 |
| 400 | 0.20 |
| 600 | 0.10 |
| 800 | 0.20 |
| 1000 | 0.30 |
| 1200 | 0.05 |
| 1400 | 0.05 |

The time required to repair such a processing unit has the following empirical distribution:

| Time to Repair (hours) | Probability |
| --- | --- |
| 24 | 0.10 |
| 48 | 0.20 |
| 72 | 0.25 |
| 96 | 0.25 |
| 120 | 0.20 |

The following Python code allows us to develop an MCS to numerically assess the risk involved in the new investment:

First, we imported several Python libraries:

*NumPy*: a library for the Python programming language that provides support for creating large multidimensional vectors and matrices, along with a large collection of high-level mathematical functions to operate on them.

*SciPy*: a free and open-source library for Python. Built on NumPy, it contains modules for statistics, linear algebra, optimization, interpolation, signal processing, and other methods commonly used in science and engineering.

*Matplotlib* and *Seaborn* are traditional Python libraries for charting.

*PrettyTable* is a Python library for easily displaying tabular data in a visually appealing ASCII table format.

```python
# @author: darwt

# Import Modules
import numpy as np
from scipy import stats
from scipy.stats import sem
import matplotlib.pyplot as plt
import seaborn as sns
from prettytable import PrettyTable

your_path = 'your path'   # folder where the charts will be saved
```

Now we indicate the operating hours per year, the hourly cost incurred by the repair team, the empirical distribution of the hours of operation between failures, and the empirical distribution of the repair times.

```python
# initialization module
Operating_hours_year = 335 * 24
Cost_LostTime_hour = 55.50
# .........................................
# historical data of hours of operation between failures (HBF)
Hours_between_Failures = [200, 400, 600, 800, 1000, 1200, 1400]
# discrete probabilities for hours of operation between failures
probabilities_HBF = [0.1, 0.2, 0.1, 0.2, 0.3, 0.05, 0.05]
# historical data of hours needed to repair (TTR)
Time_to_Repair = [24, 48, 72, 96, 120]
probabilities_TTR = [0.10, 0.20, 0.25, 0.25, 0.20]
```
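Since *np.random.choice* raises a ValueError when the probability weights do not sum to 1, a quick sanity check on both vectors is worth adding (a suggestion, not part of the original code):

```python
import numpy as np

probabilities_HBF = [0.1, 0.2, 0.1, 0.2, 0.3, 0.05, 0.05]
probabilities_TTR = [0.10, 0.20, 0.25, 0.25, 0.20]

# np.random.choice requires the weights to sum to 1 (within tolerance)
assert np.isclose(sum(probabilities_HBF), 1.0)
assert np.isclose(sum(probabilities_TTR), 1.0)
```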

We set a large number of replications so that the sample mean will be very close to the expected value. We also defined the confidence level for our confidence interval (CI).

```python
Number_of_Replications = 2000
confidence = 0.95   # selected by the analyst

# Lists to be filled during calculations
list_of_HBF, list_of_TTR = [], []
list_Lost_Time, list_cost_TTR = [], []
list_Hours_Prod = []
```

The logic behind the MCS is described in the following lines of code. We have two loops: the outer loop (*for run in range(Number_of_Replications)*) is related to the number of runs or replications; the inner loop (*while acum_time <= Operating_hours_year*) is related to the yearly production and breakdowns. We used *np.random.choice* twice to draw one random value from each empirical distribution according to its corresponding probabilities.

```python
for run in range(Number_of_Replications):
    acum_time, acum_hbf, acum_ttr = 0, 0, 0
    while acum_time <= Operating_hours_year:
        # draw one value from each empirical distribution
        # (with no size argument, np.random.choice returns a scalar)
        HBF = np.random.choice(Hours_between_Failures, p=probabilities_HBF)
        TTR = np.random.choice(Time_to_Repair, p=probabilities_TTR)
        acum_time += (HBF + TTR)
        acum_hbf += HBF
        acum_ttr += TTR

    Total_HBF = round(float(acum_hbf), 4)
    Total_TTR = round(float(acum_ttr), 4)
    Perc_lost_Time = round(float(Total_TTR / Total_HBF * 100), 2)
    Hours_Prod = Operating_hours_year - Total_TTR
    Cost_TTR = Total_TTR * Cost_LostTime_hour

    list_of_HBF.append(Total_HBF)
    list_of_TTR.append(Total_TTR)
    list_Lost_Time.append(Perc_lost_Time)
    list_cost_TTR.append(Cost_TTR)
    list_Hours_Prod.append(Hours_Prod)
```

We used NumPy and SciPy to calculate key descriptive statistical measures, particularly a *Percentile Table* for our principal outcome, *Cost_TTR = Total_TTR * Cost_LostTime_hour*. This output variable represents the cost associated with machine failures and the corresponding repair times; *media* holds the expected value (sample mean) of that cost.

```python
media = round(np.mean(list_cost_TTR), 2)
median = round(np.median(list_cost_TTR), 2)
var = round(np.var(list_cost_TTR), 2)
stand = round(np.std(list_cost_TTR), 2)
std_error = round(sem(list_cost_TTR), 2)
skew = round(stats.skew(list_cost_TTR), 2)
kurt = round(stats.kurtosis(list_cost_TTR), 2)
minimum = round(np.min(list_cost_TTR), 2)
maximum = round(np.max(list_cost_TTR), 2)

dof = Number_of_Replications - 1
t_crit = np.abs(stats.t.ppf((1 - confidence) / 2, dof))
half_width = round(stand * t_crit / np.sqrt(Number_of_Replications), 2)
inf = round(float(media - half_width), 2)
sup = round(float(media + half_width), 2)
```

We used *PrettyTable* for the statistics report and the percentile report:

```python
t = PrettyTable(['Statistic', 'Value'])
t.add_row(['Trials', Number_of_Replications])
t.add_row(['Mean', media])
t.add_row(['Median', median])
t.add_row(['Variance', var])
t.add_row(['Stand Dev', stand])
t.add_row(['Skewness', skew])
t.add_row(['Kurtosis', kurt])
t.add_row(['Half Width', half_width])
t.add_row(['CI inf', inf])
t.add_row(['CI sup', sup])
t.add_row(['Minimum', minimum])
t.add_row(['Maximum', maximum])
print(t)

percentile_levels = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
percents = np.percentile(list_cost_TTR, percentile_levels)
p = PrettyTable(['Percentile (%)', 'Lost Time Cost'])
for level, value in zip(percentile_levels, percents):
    p.add_row([level, value])
print(p)
```

Finally, we used Matplotlib and Seaborn for charting: a histogram that shows the underlying frequency distribution of the expected lost time cost per year and a density plot that shows the probability density function of the percent lost time due to breakdowns and repairs.

```python
# Lost Time Cost Histogram
n_bins = 20
fig, ax = plt.subplots(figsize=(8, 6))
ax.hist(list_cost_TTR, histtype='bar', bins=n_bins, color='c',
        edgecolor='k', alpha=0.65, density=False)  # density=False shows counts
ax.axvline(media, color='g', linestyle='dashed', linewidth=3)
ax.axvline(median, color='r', linestyle='dashed', linewidth=3)
ax.set_title('Frequency Chart')
ax.set_ylabel('Counts')
ax.set_xlabel('U$S')
ax.grid(axis='y')
plt.savefig(your_path + 'Expected Lost Time Cost per Year',
            bbox_inches='tight', dpi=150)
plt.show()

# Percent Lost Time Density Plot
sns.distplot(list_Lost_Time, hist=False, kde=True,
             bins=int(180/5), color='lightblue',
             axlabel='Percent Lost Time (%)',
             hist_kws={'edgecolor': 'black'},
             kde_kws={'shade': True, 'linewidth': 3})
plt.savefig(your_path + 'Percent Lost Time',
            bbox_inches='tight', dpi=150)
plt.show()
```

We made 2,000 replications, appending the results to several lists. Then we calculated key descriptive statistical measures for the cost associated with machine failures and the corresponding repair times. The following table shows the *Statistics Report*:

Table 1 shows little difference between the mean and the median, and the half-width of the confidence interval is almost negligible. The outcome distribution is positively skewed with a negligible degree of kurtosis.

Table 2 shows the *Percentile Report*. We can see that the chance that the lost time cost will be greater than US$ 50,000 is more than 20%.
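The same tail probability can also be read directly from the simulated sample instead of the percentile table. A sketch with stand-in normal data (the real *list_cost_TTR* comes from the simulation above, so the numbers here are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(7)
cost_sample = rng.normal(44000, 8000, size=2000)  # stand-in for list_cost_TTR

threshold = 50000
# fraction of replications whose cost exceeds the threshold
prob_exceed = float(np.mean(cost_sample > threshold))
print(f'P(cost > {threshold}) = {prob_exceed:.1%}')
```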

Figure 1 shows the distribution of our outcome (*Lost Time Cost*) as a histogram with 20 bins. The mean and the median are indicated with green and red vertical lines respectively. As usual, the histogram provides a visual representation of the distribution of our quantity of interest: its location, spread, skewness, and kurtosis. We claim that the expected value (at a 95% confidence level) is around US$ 44,000.

Figure 2 displays a density plot of the output variable *Percent Lost Time.* Remember that the key idea in density plots is to eliminate the jaggedness that characterizes histograms, allowing better visualization of the shape of the distribution.

Clearly, the distribution is unimodal. We claim that, on average, 10% of the operating hours will be lost due to machine breakdowns.

A Monte Carlo Simulation is a computational technique used by decision-makers and project managers to quantitatively assess the impact of risk. An essential step of any simulation study is an appropriate statistical analysis of the output data. Statistical reports and distribution charts (histograms and density plots) are mandatory for assessing the level of exposure to risk.
