Sampling Methods

In this post, we discuss the sampling methods commonly used in research. Imagine you want to know the average height of all students in Multan. You obviously cannot measure every single student: there are thousands of them! So what do you do? You pick a smaller group, measure them, and use those results to draw a conclusion about everyone. That smaller group is called a sample, and the process of selecting it is called sampling.

In statistics, sampling is one of the most fundamental concepts. The method you choose directly affects the accuracy, reliability, and validity of the research. Choosing the wrong sampling method may lead to completely misleading results, no matter how sophisticated your analysis is.

Families of Sampling Methods

There are two broad families of sampling methods (probability and non-probability sampling):

  • Probability Sampling: every member of the population has a known, non-zero chance of being selected.
  • Non-Probability Sampling: selection is based on judgment, convenience, or other non-random criteria.

The following popular sampling methods are explained with examples:

Random Sampling

The sample is chosen purely by chance. Polling randomly generated telephone numbers and drawing names out of a hat are examples of random sampling. Selection can be done with or without replacement:

  • With replacement: A selected unit can be picked again.
  • Without replacement: Once selected, a unit is removed from the pool.
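
The difference between the two modes can be sketched in Python. This is a minimal illustration using only the standard library; the population of 20 student IDs and the seed are made-up values, not data from this post.

```python
import random

# Hypothetical population: 20 student IDs (illustrative data only).
population = list(range(1, 21))

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Without replacement: each unit can appear at most once in the sample.
without_replacement = rng.sample(population, k=5)

# With replacement: the same unit may be drawn more than once.
with_replacement = rng.choices(population, k=5)

print("Without replacement:", without_replacement)
print("With replacement:   ", with_replacement)
```

Here `random.sample` never repeats a unit, while `random.choices` may repeat one, which mirrors the two bullet points above.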

Systematic Sampling

The population is placed on a list, a random starting point is chosen, and then every kth member/element is selected. Choosing a sample of registered voters by taking every 25th voter from the country registration roll and testing every 300th product from an assembly line are examples of systematic sampling.
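
A minimal Python sketch of the every-kth-unit rule follows. The helper name `systematic_sample` and the roll of 1,000 voters are assumptions made for illustration.

```python
import random

def systematic_sample(units, k, rng=None):
    """Pick a random start among the first k positions, then take every k-th unit."""
    rng = rng or random.Random()
    start = rng.randrange(k)   # random starting index in 0..k-1
    return units[start::k]     # every k-th unit from that start

# Hypothetical roll of 1,000 voters; select every 25th one.
voters = [f"voter_{i:04d}" for i in range(1, 1001)]
sample = systematic_sample(voters, k=25, rng=random.Random(1))

print(len(sample))  # 40, because 1000 / 25 = 40 whatever the start is
```

Only the starting point is random; once it is fixed, the rest of the sample is fully determined by the list order.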

Stratified Sampling

The population is divided into groups (strata), usually with meaningful differences between them, and a sample is chosen from each group. For example, choosing 200 men and 200 women for a sample is stratified sampling. Similarly, stratifying the population by income level and then choosing a sample of low-, middle-, and high-income individuals is another example. There are two types:

  • Proportional Stratified Sampling: Sample size from each stratum is proportional to its size in the population.
  • Disproportional Stratified Sampling: Sample sizes are not proportional to stratum sizes; a common choice is an equal sample size from each stratum regardless of stratum size.
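
Proportional allocation can be sketched as below. The function name, the three income strata, and their sizes (600, 300, 100) are hypothetical; the rounding step is a simplification, so in general the stratum sample sizes may not add up exactly to the requested total.

```python
import random

def proportional_stratified_sample(strata, total_n, rng):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Stratum share of the total sample (rounded; an assumption of this sketch).
        n_stratum = round(total_n * len(units) / population_size)
        sample[name] = rng.sample(units, n_stratum)
    return sample

# Hypothetical income strata (illustrative sizes only).
strata = {
    "low": [f"low_{i}" for i in range(600)],
    "middle": [f"mid_{i}" for i in range(300)],
    "high": [f"high_{i}" for i in range(100)],
}

sample = proportional_stratified_sample(strata, total_n=50, rng=random.Random(7))
sizes = {name: len(units) for name, units in sample.items()}
print(sizes)  # {'low': 30, 'middle': 15, 'high': 5}
```

For a disproportional design, one would replace the `n_stratum` line with a fixed size per stratum, for example 10 from each.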

Cluster Sampling

The population is divided into groups (clusters) in a more or less random way, and then a sample is chosen by randomly selecting entire groups. Randomly choosing 10 polling stations in a city and exit polling all voters at those stations is an example of cluster sampling.
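
The polling-station example might look like this in Python. The 40 stations and their voter counts are invented for illustration; the point is that whole clusters are selected and everyone inside a selected cluster is included.

```python
import random

rng = random.Random(3)

# Hypothetical city: 40 polling stations, each with its own list of voters.
stations = {
    f"station_{s:02d}": [f"s{s:02d}_voter_{v}" for v in range(rng.randint(50, 150))]
    for s in range(1, 41)
}

# Randomly select 10 whole clusters (polling stations).
chosen = rng.sample(sorted(stations), k=10)

# Every voter at a chosen station is included in the sample.
sample = [voter for name in chosen for voter in stations[name]]

print(len(chosen), "stations,", len(sample), "voters")
```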

Multistage Sampling

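Multistage sampling combines several sampling methods in successive stages. At each stage, a smaller sampling unit is selected from the units chosen at the previous stage. For example, a national survey might first select districts, then schools within the chosen districts, and finally students within the chosen schools. It is often the most practical approach for large-scale national or international surveys, where a complete list of the final sampling units is rarely available.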
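
A three-stage design can be sketched as follows. The frame of districts, schools, and students is entirely hypothetical, and simple random sampling is assumed at every stage; real surveys often mix methods across stages.

```python
import random

rng = random.Random(11)

# Hypothetical frame: 8 districts -> 6 schools each -> 40 students each.
frame = {
    f"district_{d}": {
        f"d{d}_school_{s}": [f"d{d}_s{s}_student_{i}" for i in range(40)]
        for s in range(6)
    }
    for d in range(8)
}

sample = []
for district in rng.sample(sorted(frame), k=3):          # stage 1: districts
    schools = frame[district]
    for school in rng.sample(sorted(schools), k=2):      # stage 2: schools
        sample.extend(rng.sample(schools[school], k=5))  # stage 3: students

print(len(sample))  # 3 districts x 2 schools x 5 students = 30
```

Note that only the selected districts and schools need a detailed list of their members, which is what makes the approach workable at national scale.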

Convenience Sampling

Individuals are chosen for the sample because they are easy to include. Examples are internet polls and mail-in customer surveys. It is the go-to choice when time and resources are limited, though it is highly prone to bias.

Purposive Sampling (Judgmental Sampling)

The researcher uses their own expert judgment to select participants who best represent the population or who are most relevant to the research question. This is common in qualitative research where specific knowledge or characteristics matter.

Quota Sampling

It is the cousin of stratified sampling. The population is divided into subgroups, and the researcher fills a specific quota from each group, but without random selection. The researcher chooses whoever is convenient within each quota.
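
The quota-filling logic can be sketched in Python. The function name `quota_sample`, the arrival order, and the quotas are illustrative assumptions; "whoever is convenient" is modeled here as "whoever shows up first".

```python
def quota_sample(arrivals, quotas):
    """Accept people in arrival order until each group's quota is full (no randomness)."""
    filled = {group: [] for group in quotas}
    for person, group in arrivals:
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # every quota is met, stop recruiting
    return filled

# Hypothetical stream of passers-by as (person, group) pairs.
arrivals = [
    ("p1", "men"), ("p2", "men"), ("p3", "women"),
    ("p4", "men"), ("p5", "women"), ("p6", "women"),
]

result = quota_sample(arrivals, {"men": 2, "women": 2})
print(result)  # {'men': ['p1', 'p2'], 'women': ['p3', 'p5']}
```

Unlike stratified sampling, nothing here gives each member of a subgroup a known chance of selection, which is why the method is classed as non-probability sampling.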

Snowball Sampling

It is used when the target population is hard to reach or hidden. You start with a small group of known individuals and ask them to refer others who meet the criteria. The sample “snowballs”, growing as each participant recruits more participants.
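
The referral process can be modeled as a breadth-first walk over a referral network. This is only a sketch: the `referrals` dictionary, the seed participant, and the size cap are hypothetical, and real studies cannot know the network in advance.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Start from seed participants and follow referrals until max_size is reached."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        # Each participant refers further people who have not been contacted yet.
        for referred in referrals.get(person, []):
            if referred not in seen:
                seen.add(referred)
                queue.append(referred)
    return sample

# Hypothetical referral network: who would refer whom.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}

sample = snowball_sample(referrals, seeds=["A"], max_size=5)
print(sample)  # ['A', 'B', 'C', 'D', 'E']
```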

The table below compares the sampling methods:

Method | Type | Random? | Best Used When | Generalizability
--- | --- | --- | --- | ---
Simple Random | Probability | ✅ Yes | Complete list available, small/medium population | High
Systematic | Probability | ✅ Partial | Large ordered lists, production/quality control | High
Stratified | Probability | ✅ Yes | Population has distinct subgroups | Very High
Cluster | Probability | ✅ Yes | Population is geographically dispersed | Moderate
Multistage | Probability | ✅ Yes | Very large national/international surveys | Moderate–High
Convenience | Non-Probability | ❌ No | Exploratory/pilot studies, limited resources | Low
Purposive | Non-Probability | ❌ No | Qualitative research, expert knowledge required | Low–Moderate
Quota | Non-Probability | ❌ No | Market research, subgroup representation needed | Moderate
Snowball | Non-Probability | ❌ No | Hidden or hard-to-reach populations | Low

Essential Biostatistics MCQs Regression Epidemiology

Prepare for your biostatistics exams or medical boards with this targeted set of Multiple Choice Questions (Essential Biostatistics MCQs). Covering essential topics like Pearson and Spearman correlations, linear models, and logistic regression, this quiz is designed to test your ability to interpret clinical data, epidemiological odds ratios, and confounding variables. Let us start with the Online Essential Biostatistics MCQs about Correlation, Regression, and Epidemiology now.

Online Essential Biostatistics MCQs Correlation, Regression, and Epidemiology

Online multiple choice questions about Regression Analysis in BioStatistics with Answers

  • Pearson correlation ($r$) ranges from
  • The value of $r=0.9$ indicates
  • Spearman’s correlation is used for
  • In a simple linear regression model, there are
  • In $Y = a + bX$, where $b$ is
  • $R^2$ represents
  • Residuals in regression should be
  • Multiple regression includes
  • The logistic regression outcome is
  • The odds ratio in logistic regression is
  • Which test analyzes a binary outcome with covariates?
  • A researcher is investigating the relationship between age and blood pressure. Which type of correlation is most appropriate?
  • Which statistical test is used to analyze the association between two continuous variables?
  • What does the term ‘odds ratio’ represent in epidemiological studies?
  • Which statistical test is used to compare means between multiple groups and control for confounding variables?
  • What is the purpose of correlation analysis?
  • Which measure quantifies relationship strength and direction?
  • What is a scatter plot used for?
  • What is the null hypothesis in regression?
  • Which test fits logistic regression?

SAS Macros

Discover essential SAS macros concepts in this Q&A guide. Learn how to create and identify macro variables, distinguish %LOCAL vs. %GLOBAL scope, reuse code with %INCLUDE and macros, leverage DATA _NULL_, perform arithmetic on macro variables, call macros inside a data step, understand macro variable length limits, and explore SAS validation tools. Perfect for SAS programmers looking to sharpen their macro skills.

SAS Macros: 10 Key Q&As for Programmers

Describe the ways in which one can create a macro variable in SAS.

There are 5 ways to create macro variables in SAS:

  • %LET statement – defines a macro variable explicitly:
    %let var = value;
  • CALL SYMPUT / CALL SYMPUTX – creates a macro variable from a data step:
    call symput('var', value);
  • INTO clause in PROC SQL – stores query results into macro variables:
    select count(*) into :var from dataset;
  • Macro parameters – defined in a macro definition:
    %macro mymacro(var=);
  • Iterative %DO statement – the loop's index variable is created as a macro variable.

How would you identify a macro variable in SAS?

A macro variable is identified in SAS code by an ampersand (&) preceding its name, e.g., &var. To view existing macro variables and their values, one can use:

  • %PUT _USER_; – lists all user-defined macro variables.
  • %PUT &var; – displays the value of a specific macro variable.

What is the difference between %LOCAL and %GLOBAL in SAS?

%LOCAL creates a macro variable with scope limited to the current macro; it is not accessible outside that macro.
%GLOBAL creates a macro variable accessible anywhere (global scope), even after the macro finishes.

How would you define the end of a macro in SAS?

The end of a macro in SAS is defined using the %MEND statement. Optionally, you can include the macro name after %MEND for clarity (e.g., %MEND mymacro;).

How would you include common code or reuse code to be processed along with your statements?

One can reuse code in SAS by:

  • Using %INCLUDE to insert an external file containing SAS statements.
  • Defining macros (%MACRO / %MEND) to encapsulate reusable logic, then calling them as needed.

Explain DATA _NULL_

DATA _NULL_ is a SAS data step that does not create an output dataset. It is used for executing logic without producing data, such as writing to the log, creating macro variables (via CALL SYMPUT), generating custom reports with PUT statements, or performing calculations that only need to be processed in memory.

How do you add a number to a macro variable in SAS?

To add a number to a macro variable, use %EVAL (for integers) or %SYSEVALF (for floating-point) within a %LET statement:

%let var = 5;
%let var = %eval(&var + 3);

For decimal values:

%let var = %sysevalf(&var + 0.5);

How can we call macros within a data step?

One can call a macro within a data step by:

  • Directly invoking the macro (e.g., %my_macro;): it executes during data step compilation, not for each observation.
  • Using CALL EXECUTE: to execute the macro conditionally or for each observation:
    call execute('%my_macro');
  • Using %SYSFUNC: to call a SAS function within a macro without requiring a full macro invocation.

Note: Macro calls placed directly in the data step are resolved before the data step runs, while CALL EXECUTE resolves during data step execution.

What is the maximum length of the macro variable?

The maximum length of a macro variable value in SAS is 65,534 characters. The macro variable name itself can be up to 32 characters long. Note: In very old versions (SAS 6), the maximum length was 32,767 characters. The 65,534 limit applies to SAS System 8 and later.

What validation tools are used in SAS?

The common validation tools in SAS include:

  • PROC COMPARE: compares datasets for differences.
  • PROC FREQ / PROC MEANS: validates data distributions and summary statistics.
  • Data step debugging: using PUT statements and _ERROR_ variable.
  • Macro debugging options: MPRINT, SYMBOLGEN, MLOGIC to trace macro resolution.
  • SAS Code Analyzer (in SAS Enterprise Guide): checks code for errors and best practices.
  • DS2 and FedSQL: offer validation features in SAS Viya.
  • Validation framework: in SAS Viya for automated model validation.
