Stratified random sampling is a valuable statistical method in which you divide your data into groups called strata and sample each group. It works well for data with homogenous groups with different levels of risk. After forming the strata, you can sample each group differently. This guide will show you two techniques you can use to apply stratified sampling using Google Sheets.
Contents
Ensure Your Data is a Good Fit
For example, you can sample accounts payable invoices entered into an accounting system. You plan to request backup documentation for random invoices to check for fraud. Fraud is more likely to occur outside of regular business hours. Column C
in the image below contains a checkbox yes/no field we can use for stratification. These checkboxes are an excellent choice for the strata, as they will let us separate the population into two groups. The transactions posted during business hours will be low risk, and those that occur outside regular business hours will be high risk.
After you have chosen your stratification criteria, you can create the sample.
Manual Method with Formulas
Step 1: Generate Random Numbers
Create a new column titled ‘Random’ and input the formula =RAND()
in the first cell to generate a random number between 0
and 1
. Copy the formula down to apply it to all cells in the column.
Step 2: Stabilize Random Numbers
To prevent the random numbers from changing every time you perform an action, Copy the entire Random column, then right-click and select Paste special followed by Values only.
Step 3: Sort the Data
Highlight your data, including the header row. Go to the Data menu and select Sort range. Then, choose Advanced range sorting options.
Google Sheets shows the column names from the first row of your data. Remember to check the box indicating that your data has a header row. Now, you can choose the columns of your choice. For the first sort, choose Business Hours?, then Random for the second.
This advanced sort arranges your data by your chosen strata and in random order inside each strata.
Step 4: Select Your Sample
After sorting, your final sample will consist of as many from each stratum as you desire. For example, you could sample the first three low-risk and the first fifteen high-risk items. Remember, since you sorted the list based on random numbers, selecting them in order creates a random selection.
You’ve completed stratified random sampling in Google Sheets…the hard way. Now, let’s look at an easier alternative.
Using the Random Sampler Add-On
The Random Sampler add-on creates stratified samples without formulas or sorting. Let’s examine the steps involved.
Step 1: Install the Add-On
Go to Extensions > Add-ons > Get Add-ons. Search for “Random Sampler” and install the one by Sheets Help. Expand the side panel in your browser and click on the Random Sampler icon. See this video for more installation instructions.
Step 2: Configure Sample
In the Random Sampler sidebar, select the data range of your population and the stratified sampling method.
This add-on asks for the strata column. Enter C
as the location of the Business Hours checkboxes.
Step 3: Choose the Sample Size
After choosing the strata, Random Sampler lets you select the percentage to sample for each strata. Since transactions posted outside of normal business hours are the most risky, we’ll choose to sample those at 30% while only sampling the business hours transactions at 10%.
You can also choose how to show the sample. Use the Highlight in place option to keep data intact and indicate the samples on the same sheet.
Step 4: Create the Sample
Click the Sample button, and the add-on will highlight the random samples.
Choosing a Method
Now you have your sample highlighted and ready to go.
- Use the function method (Method 1) if you’re comfortable with formulas and want more control over the process.
- Use the Random Sampler add-on (Method 2) for a quicker and user-friendly experience, especially for larger datasets.
Choose the method that best suits your needs and data characteristics.
Other Sampling Techniques
Simple – Take simple random samples from a population
Systematic – Select every nth item from a population