Statistical Data: Introduction and Real Life Examples (2020)

Statistical data is information collected for descriptive or inferential statistical analysis. Data is everywhere: anything that has a history or measurable features can be a source of statistical data.

Statistical data can be found in:

  • Any financial or economic data
  • Transactional data (from stores or banks)
  • Surveys and censuses (of unemployment, houses, population, roads, etc.)
  • Medical history
  • Prices of products
  • Production and yield of a crop
  • Personal histories: my history and your history are also statistical data

Data

Data is the plural of datum, which is a single piece of information. The value of the variable (under study) associated with one element of a population or sample is called a datum (or data in a singular sense, or a data point). For example, Mr. Asif entered college at the age of 18 years, his hair is black, he is 5 feet 7 inches tall, and he weighs about 140 pounds; each of these values is a datum. The set of values collected for the variable from each of the elements belonging to the sample is called data (or data in a plural sense). For example, a set of 25 weights collected from 25 students is data.

Types of Data

Data can be classified into two general categories: quantitative data and qualitative data. Quantitative (numerical) data can be further classified as either discrete or continuous. Qualitative data can be further subdivided into nominal, ordinal, and binary data.

Qualitative data represent information that can be classified by some quality, characteristic, or criterion, for example, the color of a car, religion, blood type, and marital status.

When the characteristic being studied is non-numeric, it is called a qualitative variable or an attribute. A qualitative variable is also known as a categorical variable. A categorical variable cannot be measured numerically; the observations falling in each category (group, class) can only be counted. Examples include gender (male or female), general knowledge (poor, moderate, or good), religious affiliation, type of automobile owned, city of birth, and eye color (red, green, blue, etc.). Qualitative variables are often summarized in charts and graphs. Typical questions are: what percent of the total number of cars sold last month were Suzuki? What percent of the population has blue eyes?
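
Counting the observations in each category and turning the counts into percentages can be sketched as follows. This is only an illustration: the list `cars_sold` is a made-up sample, not data from any real source.

```python
from collections import Counter

# Hypothetical sample: the make of each car sold last month (illustrative only).
cars_sold = ["Suzuki", "Toyota", "Suzuki", "Honda",
             "Suzuki", "Toyota", "Honda", "Suzuki"]

# A qualitative variable can only be counted per category.
counts = Counter(cars_sold)
total = sum(counts.values())

# Convert each category count into a percentage of all observations.
percentages = {make: 100 * n / total for make, n in counts.items()}

for make, n in counts.most_common():
    print(f"{make}: {n} cars ({percentages[make]:.1f}%)")
```

The same counts are what a bar chart or pie chart of a qualitative variable would display.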

Quantitative data result from a process that quantifies, such as how much or how many. These quantities are measured on a numerical scale, for example, weight, height, length, and volume.

When the variable studied can be reported numerically, it is called a quantitative variable, for example, the age of the company president, the life of an automobile battery, or the number of children in a family. Quantitative variables are either discrete or continuous.

Note that some data can be classified as either qualitative or quantitative, depending on how it is used. If a number is used only as a label for identification, it is qualitative; otherwise, it is quantitative. For example, if the serial number on a car is used to determine how many cars had been manufactured up to that point, it is a quantitative measure. However, if the number is used only for identification purposes, it is qualitative data.

Binary Data

Binary data has only two possible values or states, such as defective or non-defective, yes or no, and true or false. If both values are equally important, it is binary symmetric data (for example, gender). If the two values are not equally important, it is binary asymmetric data (for example, result: pass or fail; cancer detected: yes or no).

For quantitative data, a count always gives discrete data, for example, the number of leaves on a tree. A measurement of a quantity, on the other hand, is usually continuous, for example, a weight of 160 pounds recorded to the nearest pound. The actual weight could be any value in the interval from, say, 159.5 to 160.5 pounds.
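
The interval behind a rounded measurement can be worked out with a few lines of code. The helper `rounding_interval` is a hypothetical name introduced here for illustration; it assumes the value was rounded to the nearest multiple of `unit`.

```python
def rounding_interval(recorded, unit=1.0):
    """Return the (low, high) range of true values that round to `recorded`.

    Assumes the measurement was rounded to the nearest multiple of `unit`.
    """
    half = unit / 2
    return recorded - half, recorded + half

# A weight recorded as 160 pounds, to the nearest pound.
low, high = rounding_interval(160)
print(f"The true weight lies between {low} and {high} pounds")
```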

The following are some examples of Qualitative Data. Note that the outcomes of all examples of Qualitative Variables are non-numeric.

  • The type of payment (cheque, cash, or credit) used by customers in a store
  • The color of your new cell phone
  • Your eye color
  • The make of the tires on your car
  • The obtained exam grade

The following are some examples of Quantitative Data. Note that the outcomes of all examples of Quantitative Variables are numeric.

  • The age of a customer in a store
  • The length of telephone calls recorded at a switchboard
  • The cost of your new refrigerator
  • The weight of your watch
  • The air pressure in a tire
  • The weight of a shipment of tomatoes
  • The duration of a flight from place A to B
  • The grade point average

Data Transformation (Variable Transformation)

Data transformation is a rescaling of the data using a function or some mathematical operation applied to each observation. When data are very strongly skewed (negatively or positively), we sometimes transform them so that they are easier to model. In other words, if the variable(s) do not fit a normal distribution, one can try a data transformation to meet the assumptions of a parametric statistical test.

The most common data transformation is the log (or natural log) transformation, which is often applied when most of the data values cluster near zero relative to the larger values in the data set and all of the observations are positive.
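
The effect of a log transformation on skewed data can be checked numerically. The sketch below uses a made-up, strictly positive sample and a simple moment-based skewness measure; both the data and the helper name `skewness` are assumptions chosen for illustration.

```python
import math
import statistics

def skewness(values):
    """Moment-based skewness: the mean of the cubed standardized deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical right-skewed sample: most values are small, a few are very large.
data = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]

# Natural log transformation (valid here because every value is positive).
logged = [math.log(v) for v in data]

print(f"skewness before transformation: {skewness(data):.2f}")
print(f"skewness after log transformation: {skewness(logged):.2f}")
```

For this sample the skewness is positive in both cases but clearly smaller after the transformation, which is the behaviour the paragraph above describes.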

Data Transformation Techniques

Variable transformation can also be applied to one or more variables in scatter plots, correlation, and regression analysis to make the relationship between the variables more linear and hence easier to model with a simple method. Transformations other than the log include the square root, the reciprocal, etc.
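
To illustrate how a transformation can straighten a curvilinear relationship, the sketch below compares the Pearson correlation before and after taking logs of the response. The exponential relationship used for `y` is an assumed example, and `pearson_r` is a small helper written here, not a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical curvilinear relationship: y grows exponentially with x.
x = list(range(1, 11))
y = [2.0 * math.exp(0.5 * v) for v in x]

# Transform the response only; log(y) is a straight-line function of x.
log_y = [math.log(v) for v in y]

r_raw = pearson_r(x, y)
r_log = pearson_r(x, log_y)
print(f"correlation of x with y:      {r_raw:.3f}")
print(f"correlation of x with log(y): {r_log:.3f}")
```

A correlation closer to 1 after the transformation indicates that a straight line now describes the relationship better.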

Reciprocal Transformation

The reciprocal transformation, $x$ to $\frac{1}{x}$ or $-\frac{1}{x}$, is a very strong transformation with a drastic effect on the shape of the distribution. It cannot be applied to zero values. Although it can be applied to negative values, it is not useful unless all of the values are positive. The reciprocal $\frac{1}{x}$ reverses the order among values of the same sign (the largest becomes the smallest, etc.), whereas the negative reciprocal $-\frac{1}{x}$ preserves the original order.
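
The effect of the reciprocal on the ordering of values can be seen in a small sketch. The function name `reciprocal` and the sample values are assumptions made for illustration.

```python
def reciprocal(values, negative=False):
    """Apply x -> 1/x (or x -> -1/x when negative=True) to each value."""
    if any(v == 0 for v in values):
        raise ValueError("reciprocal transformation is undefined for zero")
    sign = -1.0 if negative else 1.0
    return [sign / v for v in values]

values = [1, 2, 4, 10]                          # hypothetical, sorted ascending

recip = reciprocal(values)                      # 1/x: largest becomes smallest
neg_recip = reciprocal(values, negative=True)   # -1/x: original order is kept

print(recip)
print(neg_recip)
```

Using $-\frac{1}{x}$ is therefore convenient when the transformed values should still rank the observations in the same way as the raw data.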

Logarithmic Transformation

The logarithmic transformation, $x$ to $\log_{10} x$ (or the natural log, or log base 2), is another strong transformation that affects the shape of the distribution. It is commonly used for reducing right skewness, but cannot be applied to negative or zero values.
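
A minimal sketch of a log transformation that enforces the restriction on zero and negative values is given below. The helper `log_transform` and its `base` argument are hypothetical names introduced here; only the three bases mentioned in the text are supported.

```python
import math

# Map each supported base to the matching function in the math module.
_LOG_FUNCS = {10: math.log10, 2: math.log2, "e": math.log}

def log_transform(values, base=10):
    """Return the logarithm of every value; all values must be positive."""
    if base not in _LOG_FUNCS:
        raise ValueError("base must be 10, 2, or 'e'")
    if any(v <= 0 for v in values):
        raise ValueError("log transformation requires strictly positive values")
    log = _LOG_FUNCS[base]
    return [log(v) for v in values]

print(log_transform([1, 10, 100]))        # base 10
print(log_transform([1, 2, 8], base=2))   # base 2
```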

Square Root Transformation

The square root transformation, $x$ to $x^{\frac{1}{2}}=\sqrt{x}$, has a moderate effect on the distribution shape and is weaker than the logarithm. It can be applied to zero values but not to negative values.
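
One way to see that the square root is a weaker transformation than the logarithm is to apply both to values that are spread over several orders of magnitude. The three values below are an assumed example.

```python
import math

# Hypothetical values spread over several orders of magnitude.
values = [1, 100, 10000]

sqrt_vals = [math.sqrt(v) for v in values]    # moderate compression
log_vals = [math.log10(v) for v in values]    # strong compression

def gaps(seq):
    """Differences between neighbouring values."""
    return [b - a for a, b in zip(seq, seq[1:])]

print("raw gaps: ", gaps(values))
print("sqrt gaps:", gaps(sqrt_vals))
print("log gaps: ", gaps(log_vals))

# The square root accepts zero, unlike the logarithm.
print(math.sqrt(0))
```

After the square root the upper gap is still ten times the lower one, while after the base-10 log the two gaps are equal, so the long right tail is pulled in much more strongly by the logarithm.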

Data Transformation

The purposes of data transformation are to:

  • Convert data from one format or structure to another (like changing a messy spreadsheet into a table).
  • Clean and prepare data for analysis (fixing errors, inconsistencies, and missing values).
  • Standardize data for easier integration and comparison (making sure all your data uses the same units and formats).

Goals of transformation

The goals of transformation may be that

  • one might want to see the data structure differently,
  • one might want to reduce the skew, which assists in modeling,
  • one might want to straighten a nonlinear (curvilinear) relationship in a scatter plot,
  • one might want the data to have approximately equal dispersion, making them easier to handle and interpret.

Many techniques are used in data transformation, including:

  • Cleaning and Filtering: Identifying and removing errors, missing values, and duplicates.
  • Data Normalization: Ensuring data consistency across different fields.
  • Aggregation: Summarizing data by combining similar values.
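
The three techniques listed above can be sketched on a tiny, made-up set of transaction records. The field names (`id`, `store`, `amount`) and the specific cleaning rules are assumptions chosen for illustration, not a prescribed procedure.

```python
from collections import defaultdict

# Hypothetical raw transaction records with typical problems.
raw = [
    {"id": 1, "store": "North",  "amount": "120.50"},
    {"id": 2, "store": "north ", "amount": "80"},      # inconsistent label
    {"id": 3, "store": "South",  "amount": None},      # missing value
    {"id": 4, "store": "South",  "amount": "200"},
    {"id": 4, "store": "South",  "amount": "200"},     # exact duplicate
]

cleaned = []
seen = set()
for rec in raw:
    # Cleaning and filtering: drop records with a missing amount.
    if rec["amount"] is None:
        continue
    # Normalization: consistent store labels and numeric amounts.
    row = (rec["id"], rec["store"].strip().title(), float(rec["amount"]))
    # Cleaning and filtering: drop exact duplicates.
    if row in seen:
        continue
    seen.add(row)
    cleaned.append(row)

# Aggregation: total amount per store.
totals = defaultdict(float)
for _id, store, amount in cleaned:
    totals[store] += amount

print(dict(totals))
```

In practice the rules for what counts as an error or a duplicate depend on the data source, so each step should be adapted to the data at hand.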

Benefits of Data Transformation

The benefits of transformation and data cleaning are:

  • Improved data quality: Fewer errors and inconsistencies lead to more reliable results.
  • Easier analysis: Structured data is easier to work with for data analysts and scientists.
  • Better decision-making: Accurate insights from clean data lead to better choices.

Data transformation is a crucial step in the data pipeline, especially in tasks like data warehousing, data integration, and data wrangling.

FAQs about Data Transformation

  • What is data transformation?
  • When is data transformation done?
  • What is the most common data transformation?
  • What is the reciprocal data transformation?
  • When is reciprocal transformation not useful?
  • What is a logarithmic transformation?
  • When can a logarithmic transformation not be applied to the data?
  • What is the square root transformation?
  • When can the square root transformation not be applied?
  • What is the main purpose of data transformation?
  • What are the goals of transformation?
  • What is data normalization?
  • What is data aggregation?
  • What is cleaning and filtering?

Primary and Secondary Data

Before learning about primary and secondary data, let us first understand the term data in statistics.

Statistics studies facts and figures that can be measured numerically. Numerical measures of the same characteristic are known as observations, and a collection of observations is termed data. Data are collected by individual research workers or by organizations through sample surveys or experiments, keeping in view the objectives of the study. The data collected may be (i) primary data or (ii) secondary data.

Primary and Secondary Data in Statistics

The difference between primary and secondary data in statistics is that primary data is collected firsthand by a researcher (an organization, person, authority, agency, party, etc.) through experiments, surveys, questionnaires, focus groups, interviews, and (required) measurements, while secondary data has already been collected by someone else and is readily available to the public through publications, journals, and newspapers.

Primary Data

Primary data means raw data (data that has not been fabricated or tailored) that has just been collected from the source and has not gone through any kind of statistical treatment, such as sorting and tabulation. The term primary data may sometimes be used to refer to first-hand information.

Sources of Primary Data

The sources of primary data are primary units such as basic experimental units, individuals, and households. (Published data and data collected in the past are, by contrast, called secondary data.) The following methods are usually used to collect data from primary units; the choice of method depends on the nature of the primary unit.

  • Personal Investigation
    The researcher conducts the experiment or survey personally and collects the data from it. The collected data is generally accurate and reliable. This method of collecting primary data is feasible only for small-scale laboratory or field experiments or pilot surveys, and is not practicable for large-scale experiments and surveys because it takes too much time.
  • Through Investigators
    Trained (experienced) investigators are employed to collect the required data. In the case of surveys, they contact the individuals and fill in the questionnaires after asking for the required information. A questionnaire is an inquiry form containing many questions designed to obtain information from the respondents. This method of collecting data is employed by most organizations and gives reasonably accurate information, but it is very costly and may be time-consuming too.
  • Through Questionnaire
    The required information (data) is obtained by sending a questionnaire (in printed or soft form) by mail to the selected individuals (respondents), who fill in the questionnaire and return it to the investigator. This method is relatively cheap compared to the “through investigators” method, but the non-response rate is very high, as most respondents don’t bother to fill in the questionnaire and send it back to the investigator.
  • Through Local Sources
    Local representatives or agents are asked to send the requisite information, which they provide based on their own experience. This method is quick, but it gives rough estimates only.
  • Through Telephone
    The information may be obtained by contacting the individuals by telephone. This method is quick and provides the required information accurately.
  • Through Internet
    With the introduction of information technology, people may be contacted through the Internet, and individuals may be asked to provide pertinent information. Google Survey is widely used as an online method for data collection nowadays. There are many paid online survey services, too.

It is important to go through the primary data and locate any inconsistent observations before it is given a statistical treatment.

Secondary Data

Secondary data is data that has already been collected by someone else and may have been sorted, tabulated, and subjected to statistical treatment. It is processed (fabricated or tailored) data.

Sources of Secondary Data

The secondary data may be available from the following sources:

  • Government Organizations
    Federal and Provincial Bureau of Statistics, Crop Reporting Service-Agriculture Department, Census and Registration Organization, etc.
  • Semi-Government Organizations
    Municipal committees, District Councils, Commercial and Financial Institutions like banks, etc.
  • Teaching and Research Organizations
  • Research Journals and Newspapers
  • Internet
