{"id":21392,"date":"2025-02-27T06:10:36","date_gmt":"2025-02-27T06:10:36","guid":{"rendered":"https:\/\/codegnan.com\/?p=21392"},"modified":"2026-06-25T08:52:29","modified_gmt":"2026-06-25T08:52:29","slug":"data-science-interview-questions","status":"publish","type":"post","link":"https:\/\/codegnan.com\/data-science-interview-questions\/","title":{"rendered":"29 Data Science Interview Questions"},"content":{"rendered":"\r\n<p class=\"wp-block-paragraph\">Getting ready for a data science interview?\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">You\u2019ll face data science interview questions about coding, statistics, machine learning, and real-world problem-solving.\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">In this guide, we\u2019ll break down common data science interview questions and how to answer them.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Whether you&#8217;re a beginner or an expert, these tips will help you ace your next interview!<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\ud83d\udca1 Want to become job-ready for a high-paying data science role?<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Explore our courses:<\/strong><\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li><a href=\"https:\/\/codegnan.com\/data-science-course-training-in-hyderabad\/\">Data science course in Hyderabad<\/a> (classroom training)<\/li>\r\n\r\n\r\n\r\n<li><a href=\"https:\/\/codegnan.com\/data-science-course-training-in-vijayawada\/\">Data science course training in Vijayawada<\/a> (classroom training)<\/li>\r\n\r\n\r\n\r\n<li><a href=\"https:\/\/codegnan.com\/academy\/online-data-analysis-course\/\">Online data analysis course<\/a> (online course)<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h2 id=\"h-beginner-data-science-interview-questions\" class=\"wp-block-heading\"><strong>Beginner data science interview questions<\/strong><\/h2>\r\n\r\n\r\n\r\n<h3 id=\"h-1-what-is-data-science\" class=\"wp-block-heading\"><strong>1. 
What is Data Science?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Data Science is the study of data to find useful insights, patterns, and trends. It combines statistics, programming, and domain knowledge to make better decisions. Businesses use data science for multiple tasks like predicting sales, detecting fraud, and improving customer experience.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Example<\/strong>: A streaming platform like Netflix uses data science to suggest movies based on your past watching habits.<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-image\"><img fetchpriority=\"high\" decoding=\"async\" width=\"866\" height=\"782\" class=\"wp-image-21398\" src=\"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/AD_4nXcpcp2iYTj3jsdj_0cPEEP4Prm65pWXK2DG5dUTY4i43xTKsN-EEcbrY9njXOv-hoVw8fmgi1dWHKkBn9DGDIqhQ7_Or3SY0hKsYRuscVkJbgd0dEZ4g_Bn-L9lAu3t3-sUrwnz.png\" alt=\"\" srcset=\"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/AD_4nXcpcp2iYTj3jsdj_0cPEEP4Prm65pWXK2DG5dUTY4i43xTKsN-EEcbrY9njXOv-hoVw8fmgi1dWHKkBn9DGDIqhQ7_Or3SY0hKsYRuscVkJbgd0dEZ4g_Bn-L9lAu3t3-sUrwnz.png 866w, https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/AD_4nXcpcp2iYTj3jsdj_0cPEEP4Prm65pWXK2DG5dUTY4i43xTKsN-EEcbrY9njXOv-hoVw8fmgi1dWHKkBn9DGDIqhQ7_Or3SY0hKsYRuscVkJbgd0dEZ4g_Bn-L9lAu3t3-sUrwnz-300x271.png 300w, https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/AD_4nXcpcp2iYTj3jsdj_0cPEEP4Prm65pWXK2DG5dUTY4i43xTKsN-EEcbrY9njXOv-hoVw8fmgi1dWHKkBn9DGDIqhQ7_Or3SY0hKsYRuscVkJbgd0dEZ4g_Bn-L9lAu3t3-sUrwnz-768x694.png 768w, https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/AD_4nXcpcp2iYTj3jsdj_0cPEEP4Prm65pWXK2DG5dUTY4i43xTKsN-EEcbrY9njXOv-hoVw8fmgi1dWHKkBn9DGDIqhQ7_Or3SY0hKsYRuscVkJbgd0dEZ4g_Bn-L9lAu3t3-sUrwnz-595xh.png 595w\" sizes=\"(max-width: 866px) 100vw, 866px\" \/><\/figure>\r\n\r\n\r\n\r\n<h3 id=\"h-2-explain-the-difference-between-supervised-and-unsupervised-learning-nbsp\" class=\"wp-block-heading\"><strong>2. 
Explain the difference between Supervised and Unsupervised Learning\u00a0<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Supervised and unsupervised learning differ in whether the training data comes with correct answers:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li><strong>Supervised Learning<\/strong>: The computer learns from labeled data (data with correct answers). It makes predictions based on past examples.<\/li>\r\n\r\n\r\n\r\n<li>Example: A spam filter learns from past emails labeled as &#8220;spam&#8221; or &#8220;not spam.&#8221;<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li><strong>Unsupervised Learning<\/strong>: The computer finds patterns in data without labeled answers, for example by grouping similar items together.<\/li>\r\n\r\n\r\n\r\n<li>Example: Netflix groups similar users based on what they watch and suggests shows they might like.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h3 id=\"h-3-what-is-linear-regression\" class=\"wp-block-heading\"><strong>3. What is Linear Regression?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Linear regression predicts a numeric value by fitting a straight line to the data. The line describes the relationship between an input variable and an output variable: as the input increases, the output tends to increase or decrease at a steady rate.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Imagine you sell ice cream. If the temperature rises, you sell more. 
A straight line can predict future sales based on temperature by plotting a graph with &#8220;Temperature&#8221; on the X-axis and &#8220;Ice Cream Sales&#8221; on the Y-axis.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Mathematically, the formula is:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Sales = (Slope \u00d7 Temperature) + Intercept<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">If the slope is 10, it means with every 1\u00b0C increase in temperature, you will sell 10 more ice creams.<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-image\"><img decoding=\"async\" width=\"428\" height=\"434\" class=\"wp-image-21397\" src=\"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/AD_4nXcFtW38jZ3JsutWWcZ9AnC-lFblE10a6dULcPyZorsJb4E0v8tI-tA8Rycab729NdE0r_M-3sFG-mYDQMuU4fw1cuKgFld3VbWsBY5BT1c_G5Gk7zvTmmFtddskdUdcZj353vJGdQ.png\" alt=\"\" srcset=\"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/AD_4nXcFtW38jZ3JsutWWcZ9AnC-lFblE10a6dULcPyZorsJb4E0v8tI-tA8Rycab729NdE0r_M-3sFG-mYDQMuU4fw1cuKgFld3VbWsBY5BT1c_G5Gk7zvTmmFtddskdUdcZj353vJGdQ.png 428w, https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/AD_4nXcFtW38jZ3JsutWWcZ9AnC-lFblE10a6dULcPyZorsJb4E0v8tI-tA8Rycab729NdE0r_M-3sFG-mYDQMuU4fw1cuKgFld3VbWsBY5BT1c_G5Gk7zvTmmFtddskdUdcZj353vJGdQ-296x300.png 296w\" sizes=\"(max-width: 428px) 100vw, 428px\" \/><\/figure>\r\n\r\n\r\n\r\n<h3 id=\"h-4-describe-confusion-matrix-with-an-example-nbsp\" class=\"wp-block-heading\"><strong>4. Describe confusion matrix with an example.\u00a0<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">A confusion matrix is a table used to measure how well a classification model performs. It compares predicted vs. 
actual results.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">The table has four main parts:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>True Positive (TP): Correctly predicted as positive<\/li>\r\n\r\n\r\n\r\n<li>False Positive (FP): Incorrectly predicted as positive<\/li>\r\n\r\n\r\n\r\n<li>True Negative (TN): Correctly predicted as negative<\/li>\r\n\r\n\r\n\r\n<li>False Negative (FN): Incorrectly predicted as negative<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Example:<\/strong><\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">If a doctor\u2019s AI model predicts if a person has a disease:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>TP: Sick people correctly identified as sick<\/li>\r\n\r\n\r\n\r\n<li>FP: Healthy people wrongly identified as sick<\/li>\r\n\r\n\r\n\r\n<li>TN: Healthy people correctly identified as healthy<\/li>\r\n\r\n\r\n\r\n<li>FN: Sick people wrongly identified as healthy<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<figure class=\"wp-block-image\"><img decoding=\"async\" width=\"507\" height=\"159\" class=\"wp-image-21396\" src=\"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/AD_4nXc5jNyob2W_j1o70RsCQMdUMH80CwIl6kEGEoPrmEyVxmzPJdlNNqRcQIFf0l-rbBiioOUMdSJp7UivOitZJ6p-SsIu5WVo_hhjVH0HbXn8sv_qQor3D3NeSiBJX4gbATkUsGj-Mw.png\" alt=\"\" srcset=\"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/AD_4nXc5jNyob2W_j1o70RsCQMdUMH80CwIl6kEGEoPrmEyVxmzPJdlNNqRcQIFf0l-rbBiioOUMdSJp7UivOitZJ6p-SsIu5WVo_hhjVH0HbXn8sv_qQor3D3NeSiBJX4gbATkUsGj-Mw.png 507w, https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/AD_4nXc5jNyob2W_j1o70RsCQMdUMH80CwIl6kEGEoPrmEyVxmzPJdlNNqRcQIFf0l-rbBiioOUMdSJp7UivOitZJ6p-SsIu5WVo_hhjVH0HbXn8sv_qQor3D3NeSiBJX4gbATkUsGj-Mw-300x94.png 300w\" sizes=\"(max-width: 507px) 100vw, 507px\" \/><\/figure>\r\n\r\n\r\n\r\n<h3 id=\"h-5-what-are-sampling-techniques-nbsp\" class=\"wp-block-heading\"><strong>5. 
What are sampling techniques?\u00a0<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Sampling is the process of selecting a subset of data from a larger dataset. You pick a small group from a large population, analyze it, and use the patterns you find to draw conclusions about the whole. A well-chosen sample saves time and effort while still giving reliable insights.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Types of Sampling:<\/strong><\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li><strong>Random Sampling<\/strong> \u2013 In random sampling, every item has an equal chance of selection. (E.g., lottery draw)<\/li>\r\n\r\n\r\n\r\n<li><strong>Stratified Sampling<\/strong> \u2013 In stratified sampling, the population is divided into groups (strata), then samples are taken from each group. (E.g., selecting students from different grades)<\/li>\r\n\r\n\r\n\r\n<li><strong>Systematic Sampling<\/strong> \u2013 In systematic sampling, every nth item is chosen. (E.g., checking every 10th product in a factory)<\/li>\r\n\r\n\r\n\r\n<li><strong>Cluster Sampling<\/strong> \u2013 In cluster sampling, the population is divided into groups (clusters), and one or more whole groups are selected at random. (E.g., picking random schools in a city for a survey)<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h3 id=\"h-6-define-pruning-in-a-decision-tree-algorithm\" class=\"wp-block-heading\"><strong>6. Define Pruning in a Decision Tree Algorithm<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Pruning is a technique used in decision trees to remove unnecessary branches that make the model too complex. It helps the model generalize by reducing overfitting, which happens when a model fits noise in the training data and then performs poorly on new data.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Example:<\/strong><\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Imagine a tree that predicts if a student will pass an exam. 
If the tree splits into too many branches (e.g., &#8220;Did the student eat breakfast?&#8221; or &#8220;What color is their notebook?&#8221;), it becomes too complex. Pruning removes these unnecessary branches and keeps only the important ones, like &#8220;Did the student study?&#8221;<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-7-what-is-the-difference-between-long-format-data-and-wide-format-data\" class=\"wp-block-heading\"><strong>7. What is the difference between long-format data and wide-format data?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Long-format and wide-format data differ in how the same values are laid out in a table. In long format, each row holds a single observation of a variable, whereas in wide format, repeated observations of the same variable are spread across separate columns.\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Long format works well for charts and analysis because many plotting and modeling tools expect one observation per row. 
The wide format makes reading data easy when comparing values side by side.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">For example,\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Wide-format data representation\u00a0<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td><strong>Student<\/strong><\/td>\r\n<td><strong>January score<\/strong><\/td>\r\n<td><strong>February score<\/strong><\/td>\r\n<td><strong>March score<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Alice<\/td>\r\n<td>85<\/td>\r\n<td>88<\/td>\r\n<td>90<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Bob<\/td>\r\n<td>78<\/td>\r\n<td>80<\/td>\r\n<td>82<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Long-format data representation\u00a0<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td><strong>Student<\/strong><\/td>\r\n<td><strong>Month<\/strong><\/td>\r\n<td><strong>Score<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Alice<\/td>\r\n<td>January\u00a0<\/td>\r\n<td>85<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Alice<\/td>\r\n<td>February<\/td>\r\n<td>88<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Alice<\/td>\r\n<td>March<\/td>\r\n<td>90<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Bob<\/td>\r\n<td>January<\/td>\r\n<td>78<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Bob<\/td>\r\n<td>February<\/td>\r\n<td>80<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Bob<\/td>\r\n<td>March<\/td>\r\n<td>82<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<h3 id=\"h-8-what-is-the-purpose-of-cross-validation\" class=\"wp-block-heading\"><strong>8. What is the purpose of Cross-Validation?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">The purpose of Cross-validation is to check if a model performs well on new, unseen data. 
Instead of testing on the same data used for training, cross-validation splits the data into several parts (folds). The model is trained on all but one fold and tested on the held-out fold, and the process repeats so that each fold serves as the test set once.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">For example, imagine you are training a model to predict if it will rain based on past weather data.\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">If you test it using the same training data, it may look nearly perfect but still fail on new data.\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Cross-validation gives a more reliable estimate of how well the model generalizes, because the score is averaged over several different data splits.<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-9-when-does-bias-occur\" class=\"wp-block-heading\"><strong>9. When does Bias occur?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Bias occurs when a model makes incorrect or overly simple assumptions about data, leading to poor predictions. It happens when the model is too simple and ignores important patterns. A model with high bias is said to underfit.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">A model predicts house prices using only the size of a house but ignores location, number of rooms, and condition. If it assumes &#8220;bigger is always more expensive,&#8221; it will make incorrect predictions because other factors also affect housing prices. That systematic error is bias.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">To reduce bias, we need a more expressive model that considers more relevant features. Useful strategies include feature engineering, choosing a more flexible algorithm, and easing overly strong regularisation. Cross-validation then helps confirm that the change actually improves results on unseen data.<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-10-what-is-the-difference-between-precision-and-recall\" class=\"wp-block-heading\"><strong>10. 
What is the difference between precision and recall?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Precision and recall both measure how well a model handles positive cases, but they answer different questions.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Precision: Of all the cases predicted as positive, how many are actually positive? Precision = TP \/ (TP + FP).<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Recall: Of all the actual positives, how many did the model find? Recall = TP \/ (TP + FN).<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">A model predicts if emails are spam.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">If precision is high, most emails labeled as spam are actually spam.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">If recall is high, the model catches most spam emails, although it may also misclassify some normal emails as spam.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">You can understand this better with a real-world example:\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">If a doctor tests for cancer:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>High precision \u2192 Fewer false positives (wrongly saying someone has cancer).<\/li>\r\n\r\n\r\n\r\n<li>High recall \u2192 Fewer false negatives (missing actual cancer cases).<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>\u2b50 Related data science resources<\/strong><\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li><a href=\"https:\/\/codegnan.com\/data-science-course-syllabus\/\">Data science course syllabus and subjects<\/a><\/li>\r\n\r\n\r\n\r\n<li><a href=\"https:\/\/codegnan.com\/data-science-career-paths\/\">Data science career paths that are in-demand<\/a><\/li>\r\n\r\n\r\n\r\n<li><a href=\"https:\/\/codegnan.com\/data-science-course-fees-and-duration\/\">Data science course fees and duration in India<\/a><\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h2 id=\"h-intermediate-data-science-nbsp\" 
class=\"wp-block-heading\"><strong>Intermediate data science interview questions\u00a0<\/strong><\/h2>\r\n\r\n\r\n\r\n<h3 id=\"h-11-what-is-the-curse-of-dimensionality\" class=\"wp-block-heading\"><strong>11. What is the curse of dimensionality?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">The curse of dimensionality happens when a dataset has too many features (dimensions), making it harder for machine learning models to learn properly. As dimensions increase, data points spread out and become sparse, so distance-based calculations (like in KNN or clustering) become less meaningful. This leads to poor model performance.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Imagine you are searching for a friend in a small park (2D space). It\u2019s easy to find them. But if you search in a giant forest (100D space), it becomes much harder. The same happens with high-dimensional data: models struggle to find meaningful patterns.<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>What are the feature selection methods used to select the right variables?<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">The main feature selection methods for choosing the right variables are:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Filter Methods \u2013 Use statistical tests like correlation or chi-square to remove irrelevant features before any model is trained.<\/li>\r\n\r\n\r\n\r\n<li>Wrapper Methods \u2013 Train models with different feature sets and select the best-performing set (e.g., Recursive Feature Elimination, RFE).<\/li>\r\n\r\n\r\n\r\n<li>Embedded Methods \u2013 Rely on techniques built into the model itself, like Lasso Regression, to eliminate unimportant features during training.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Feature selection reduces the number of input variables, which can improve model accuracy and speed.<\/p>\r\n\r\n\r\n\r\n<p 
class=\"wp-block-paragraph\">For example, suppose we predict house prices with 50 features (like area, number of rooms, the color of the walls, etc.). Using an embedded method such as Lasso, we might find that the &#8220;color of the walls&#8221; has no real effect on the price and remove it, making the model simpler and often more accurate.<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-12-what-is-overfitting-in-machine-learning\" class=\"wp-block-heading\"><strong>12. What is Overfitting in Machine Learning?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Overfitting in machine learning occurs when a model fits the training data too closely, capturing noise instead of the underlying patterns. It performs well on training data but fails on new data. This happens when the model is too complex or the training data is too small.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">You can reduce it by using more training data, applying regularization (L1\/L2), or simplifying the model, for example by pruning a decision tree.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Imagine a student memorizing answers instead of understanding concepts. They score 100% in practice tests but fail real exams with new questions. In machine learning, a deep decision tree with too many branches overfits, memorizing data instead of generalizing.<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-13-what-is-principal-component-analysis-pca\" class=\"wp-block-heading\"><strong>13. What is Principal Component Analysis (PCA)?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Principal Component Analysis (PCA) is a technique to reduce the number of features while keeping the most important information. It transforms data into a smaller set of new, uncorrelated variables (principal components) that capture most of the variation in the original dataset. 
PCA is used in image compression, face recognition, and finance.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">You can think of PCA as cleaning a messy room. If you have too many things lying around, it\u2019s hard to find what you need. But if you organize them neatly, you can still keep the most important ones while saving space.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">For example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Suppose we analyze student performance based on 10 subjects. Instead of looking at all 10 scores, PCA can create two principal components:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>PC1: Overall academic strength<\/li>\r\n\r\n\r\n\r\n<li>PC2: Strength in science vs. arts<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">This reduces complexity while preserving insights.<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>What is the Bias-Variance Tradeoff?<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">The bias-variance tradeoff is the balance between two types of mistakes a machine learning model can make.<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Bias happens when a model is too simple and makes too many assumptions. It doesn\u2019t learn enough from the data and makes errors. This is called <strong>underfitting<\/strong>.<\/li>\r\n\r\n\r\n\r\n<li>Variance happens when a model is too complex and tries to learn every small detail from the data, even the noise. It performs well on training data but poorly on new data. 
This is called <strong>overfitting<\/strong>.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">If a model has high bias, it won\u2019t perform well because it ignores important patterns.\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">If it has high variance, it will struggle with new data.\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">The goal of the Bias-Variance Tradeoff is to find a balance where the model learns well without being too simple or too complex.<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-14-what-is-the-concept-of-ensemble-learning\" class=\"wp-block-heading\"><strong>14. What is the concept of Ensemble Learning?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Ensemble Learning is a technique where multiple machine learning models are combined to improve accuracy. Instead of relying on a single model, we use multiple models (weak learners) and combine their predictions to make a stronger final prediction. This helps reduce errors and improves stability.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Imagine a group of doctors diagnosing a patient. If only one doctor gives an opinion, there is a higher chance of error. 
But if 10 doctors analyze the case and vote on the best diagnosis, the final decision is likely to be more accurate.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Common ensemble methods include:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li><strong>Bagging<\/strong> (Bootstrap Aggregating): Training multiple models on different bootstrap samples of the data and averaging or voting on their predictions (e.g., Random Forest).<\/li>\r\n\r\n\r\n\r\n<li><strong>Boosting<\/strong>: Training models sequentially, where each model corrects the mistakes of the previous one (e.g., AdaBoost, XGBoost).<\/li>\r\n\r\n\r\n\r\n<li><strong>Stacking<\/strong>: Combining multiple models using another model as a final decision-maker.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>How do you handle missing data in a dataset?<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Missing data can lead to biased results and inaccurate predictions. To handle missing values, we follow these steps:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li><strong>Identify Missing Data<\/strong>: Check which columns have missing values using df.isnull().sum() in Python (Pandas).<\/li>\r\n\r\n\r\n\r\n<li><strong>Remove Rows\/Columns<\/strong>: If a column has too many missing values (e.g., 80% missing), we may drop it. If only a few rows have missing values, we may drop those rows instead.<\/li>\r\n\r\n\r\n\r\n<li><strong>Imputation<\/strong> (Filling Missing Values):\r\n<ol class=\"wp-block-list\">\r\n<li>Mean\/Median: For numerical data (e.g., filling missing ages with the average age).<\/li>\r\n\r\n\r\n\r\n<li>Mode: For categorical data (e.g., filling missing city names with the most common city).<\/li>\r\n\r\n\r\n\r\n<li>Forward\/Backward Fill: Filling missing values using previous or next values in time-series data.<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h3 id=\"h-15-what-is-the-difference-between-batch-gradient-descent-and-stochastic-gradient-descent\" class=\"wp-block-heading\"><strong>15. 
What is the difference between batch gradient descent and stochastic gradient descent?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Gradient Descent is an optimization algorithm used to minimize errors in machine learning models. The difference between Batch Gradient Descent (BGD) and Stochastic Gradient Descent (SGD) lies in how they update weights.<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Batch Gradient Descent: Computes the gradient using the entire dataset in each step. It is more stable but slower for large datasets.<\/li>\r\n\r\n\r\n\r\n<li>Stochastic Gradient Descent: Updates weights after processing each individual data point. It is faster but noisier.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h3 id=\"h-16-what-is-the-purpose-of-outlier-detection-in-data-science\" class=\"wp-block-heading\"><strong>16. What is the purpose of outlier detection in data science?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Outliers are extreme values that do not follow the normal pattern of the data. Detecting and handling outliers is crucial because they can distort statistical models and affect machine learning accuracy.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Why Detect Outliers?<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Improve Model Accuracy: Outliers can skew averages and predictions.<\/li>\r\n\r\n\r\n\r\n<li>Detect Data Errors: Outliers may indicate incorrect or corrupted data.<\/li>\r\n\r\n\r\n\r\n<li>Identify Rare Events: Fraud detection systems use outlier detection to catch unusual transactions.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h3 id=\"h-17-how-to-implement-a-real-time-data-processing-pipeline-using-different-tools\" class=\"wp-block-heading\"><strong>17. 
How to implement a real-time data processing pipeline using different tools?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Real-time data processing means analyzing data as it arrives rather than storing it first and then processing it later. Apache Kafka is a popular tool for handling real-time data streams.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Steps to build a real-time pipeline:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Data Source (Producers): Collects data from sensors, websites, or applications.<\/li>\r\n\r\n\r\n\r\n<li>Kafka Broker: Kafka stores and transmits the data in real-time.<\/li>\r\n\r\n\r\n\r\n<li>Consumers (Processing Layer): Reads data and processes it using Apache Spark or Flink.<\/li>\r\n\r\n\r\n\r\n<li>Storage &amp; Visualization: Processed data is stored in a database (e.g., Elasticsearch) and displayed on dashboards (e.g., Grafana).<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h2 id=\"h-advanced-data-science-interview-questions\" class=\"wp-block-heading\"><strong>Advanced data science interview questions<\/strong><\/h2>\r\n\r\n\r\n\r\n<h3 id=\"h-18-mention-the-steps-involved-in-an-analytics-project\" class=\"wp-block-heading\"><strong>18. 
Mention the steps involved in an analytics project<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">An analytics project follows a structured approach:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Understand the Problem \u2013 Define the business objective.<\/li>\r\n\r\n\r\n\r\n<li>Collect Data \u2013 Gather relevant data from different sources.<\/li>\r\n\r\n\r\n\r\n<li>Clean and Prepare Data \u2013 Handle missing values, remove duplicates, and standardize formats.<\/li>\r\n\r\n\r\n\r\n<li>Explore Data (EDA) \u2013 Use statistics and visualizations to find patterns.<\/li>\r\n\r\n\r\n\r\n<li>Feature Engineering \u2013 Create meaningful features that improve model accuracy.\u00a0<\/li>\r\n\r\n\r\n\r\n<li>Select and Train Model \u2013 Choose suitable algorithms and train ML models.<\/li>\r\n\r\n\r\n\r\n<li>Evaluate Model \u2013 Measure the performance of the model using metrics like RMSE (for regression) and F1-score (for classification).<\/li>\r\n\r\n\r\n\r\n<li>Deploy Model \u2013 Integrate the model into a real-world system.<\/li>\r\n\r\n\r\n\r\n<li>Monitor &amp; Improve \u2013 Keep track of the model\u2019s performance and retrain it if needed.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Example<\/strong>:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">An e-commerce company wants to predict product demand. They collect sales data, clean it, analyze seasonal trends, build a forecasting model, test its accuracy, and deploy it for inventory management.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\ud83d\udc49 <a href=\"https:\/\/codegnan.com\/data-science-projects-for-beginners\/\">Data science project ideas for beginners with source code<\/a><\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-19-what-are-eigenvectors-and-eigenvalues\" class=\"wp-block-heading\"><strong>19. 
What are Eigenvectors and Eigenvalues?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">An eigenvector of a matrix is a direction that the matrix only stretches or shrinks without rotating it, and the matching eigenvalue is the factor by which it is scaled. In data science, they help simplify complex datasets by reducing dimensions while preserving essential information.<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Eigenvalues measure how much the transformation stretches or shrinks along a direction.<\/li>\r\n\r\n\r\n\r\n<li>Eigenvectors give the directions along which the transformation acts by pure scaling.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">In Principal Component Analysis (PCA), the eigenvectors of the data\u2019s covariance matrix are the principal components, and each eigenvalue tells how much variance its component captures.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Imagine a dataset with 100 variables (features). PCA helps reduce them to a few principal components by keeping the eigenvectors with the largest eigenvalues, which capture the most variance. This is useful in facial recognition, where eigenfaces represent the most significant facial features.<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-20-explain-the-principles-of-predictive-modeling-nbsp\" class=\"wp-block-heading\"><strong>20. Explain the principles of predictive modeling.\u00a0<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Predictive modeling involves building models that use historical data to predict future outcomes. 
The key principles are:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Understand Business Context \u2013 Define the goal (e.g., predicting customer churn).<\/li>\r\n\r\n\r\n\r\n<li>Choose the Right Algorithm \u2013 Use regression, decision trees, or neural networks.<\/li>\r\n\r\n\r\n\r\n<li>Train on Historical Data \u2013 Learn patterns from past data.<\/li>\r\n\r\n\r\n\r\n<li>Validate with New Data \u2013 Check if the model generalizes well.<\/li>\r\n\r\n\r\n\r\n<li>Measure Performance \u2013 Use RMSE, accuracy, or AUC-ROC for evaluation.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Netflix uses predictive modeling to recommend movies. It analyzes your past viewing history and suggests shows using collaborative filtering.<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-21-describe-different-regularisation-techniques-nbsp\" class=\"wp-block-heading\"><strong>21. Describe different regularisation techniques.\u00a0<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Regularization prevents overfitting by penalizing complex models. The main techniques are:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>L1 Regularization (Lasso) \u2013 Shrinks some feature coefficients to zero, performing feature selection.<\/li>\r\n\r\n\r\n\r\n<li>L2 Regularization (Ridge) \u2013 Distributes penalty across coefficients, reducing their impact without making them zero.<\/li>\r\n\r\n\r\n\r\n<li>Elastic Net \u2013 Combines L1 and L2 for balanced regularization.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h3 id=\"h-22-when-is-resampling-done\" class=\"wp-block-heading\"><strong>22. When is resampling done?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Resampling improves model accuracy by modifying the dataset. 
It is done when:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Handling Imbalanced Data \u2013 Use SMOTE (Synthetic Minority Over-sampling Technique) when one class is underrepresented.<\/li>\r\n\r\n\r\n\r\n<li>Evaluating Models \u2013 Apply Cross-Validation to test model performance on different data subsets.<\/li>\r\n\r\n\r\n\r\n<li>Bootstrapping \u2013 Draw many samples with replacement from limited data to estimate how much a statistic varies.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">A bank has 100,000 loan applications but only 5,000 fraud cases. Instead of training on the imbalanced data as-is, we oversample the fraud cases in the training set so the model is not biased toward the majority class.<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-23-how-can-you-calculate-euclidean-distance-in-python\" class=\"wp-block-heading\"><strong>23. How can you calculate Euclidean Distance in Python?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Euclidean Distance is the straight-line distance between two points in an n-dimensional space. 
In Python, you can calculate it using NumPy or the math.dist() function (available in Python 3.8 and later).<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Using NumPy:<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">import numpy as np<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">point1 = np.array([3, 4])<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">point2 = np.array([6, 8])<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">distance = np.linalg.norm(point1 - point2)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">print(distance) # Output: 5.0<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Using math.dist():<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">import math<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">point1 = (3, 4)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">point2 = (6, 8)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">distance = math.dist(point1, point2)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">print(distance) # Output: 5.0<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-24-how-would-you-detect-bogus-instagram-accounts-used-for-scamming-consumers\" class=\"wp-block-heading\"><strong>24. How would you detect bogus Instagram accounts used for scamming consumers?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Detecting scam Instagram accounts involves analyzing behavior patterns, follower ratios, and content irregularities. 
Here are a few things you can consider to identify them:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Check Profile Activity \u2013 Scammers often have few posts but follow a very large number of accounts.<\/li>\r\n\r\n\r\n\r\n<li>Engagement Metrics \u2013 Real users have natural likes\/comments; scammers use automated bots.<\/li>\r\n\r\n\r\n\r\n<li>Profile Picture &amp; Bio Analysis \u2013 Scammers often use stock images or generic bios.<\/li>\r\n\r\n\r\n\r\n<li>Text &amp; Sentiment Analysis \u2013 NLP can detect fake DMs or phishing messages.<\/li>\r\n\r\n\r\n\r\n<li>Graph Analysis \u2013 Analyzing follower and following connections helps uncover fake networks.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">A scam account may follow 10,000 users but only receive 50 likes per post. Running an anomaly detection model (e.g., Isolation Forest) can flag such accounts.<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Given a list A of objects and another list B which is identical to A except that one element is removed, find that removed element.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Two simple ways to solve this problem are the set difference and the sum difference approaches. Set difference works when the elements are hashable and unique; sum difference works only for numeric lists but also handles duplicates.<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Using set difference:<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">A = [1, 2, 3, 4, 5]<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">B = [1, 2, 4, 5]<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">missing_element = set(A) - set(B)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">print(missing_element) # Output: {3}<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Using sum difference:<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">missing_element = sum(A) - sum(B)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">print(missing_element) # Output: 
3<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">For example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">If you have a list of items you packed for a trip (A = [shoes, jeans, shirt, hat]) and after unpacking, you check (B = [shoes, jeans, shirt]), the missing item (hat) is found using the set difference approach, since the items are not numbers.<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-25-what-is-the-purpose-of-the-mapreduce-framework-in-big-data-processing-nbsp\" class=\"wp-block-heading\"><strong>25. What is the purpose of the MapReduce framework in big data processing?\u00a0<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">MapReduce is a framework for processing large datasets in parallel across multiple nodes. It has two key functions:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Map Step: Splits the data into smaller chunks and processes them in parallel, emitting key-value pairs.<\/li>\r\n\r\n\r\n\r\n<li>Reduce Step: Aggregates the values emitted for each key into meaningful results.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Suppose you have a huge log file containing web traffic data.<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Map Phase: Each server processes a portion of the logs and counts visits per IP.<\/li>\r\n\r\n\r\n\r\n<li>Reduce Phase: The results from all servers are combined to find the total visits per IP.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Code example (a single-machine illustration of the idea, not a distributed job):<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">from collections import Counter<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">logs = [&quot;192.168.1.1&quot;, &quot;192.168.1.2&quot;, &quot;192.168.1.1&quot;]<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">mapped = [(ip, 1) for ip in logs]<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">reduced = Counter(ip for ip, _ in mapped)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">print(dict(reduced)) # Output: {'192.168.1.1': 2, '192.168.1.2': 
1}<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-26-what-are-generative-adversarial-network-gans-and-their-applications-in-data-science\" class=\"wp-block-heading\"><strong>26. What are generative adversarial networks (GANs) and their applications in data science?<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">GANs are a type of neural network architecture in which two models, the Generator and the Discriminator, compete with each other to improve data generation.<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Generator: Creates fake data that tries to mimic real data.<\/li>\r\n\r\n\r\n\r\n<li>Discriminator: Tries to differentiate real from fake data.<\/li>\r\n\r\n\r\n\r\n<li>As training progresses, the Generator gets better at creating realistic outputs.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Their applications include:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Image Generation \u2013 GANs can create AI-generated human faces (e.g., ThisPersonDoesNotExist.com).<\/li>\r\n\r\n\r\n\r\n<li>Super-Resolution \u2013 GANs can enhance low-quality images.<\/li>\r\n\r\n\r\n\r\n<li>Data Augmentation \u2013 GANs can create synthetic medical images for better AI training.<\/li>\r\n\r\n\r\n\r\n<li>Fraud Detection \u2013 GANs help in detecting deepfakes by training models to distinguish real from fake.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h2 id=\"h-reasoning-questions-for-data-science-interview-questions\" class=\"wp-block-heading\"><strong>Reasoning questions for data science interviews<\/strong><\/h2>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Here\u2019s a curated list of data science interview questions designed to test both technical knowledge and reasoning ability, spanning statistics, machine learning, programming, and problem-solving:<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-27-write-a-function-to-calculate-the-euclidean-distance-between-two-points-in-n-dimensional-space-then-optimize-it-for-large-scale-data\" 
class=\"wp-block-heading\"><strong>27. Write a function to calculate the Euclidean distance between two points in n-dimensional space. Then, optimize it for large-scale data.<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">The Euclidean distance between two points A and B in n-dimensional space is calculated using the formula:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">$$d(A, B) = \\sqrt{\\sum_{i=1}^{n} (A_i - B_i)^2}$$<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">For large-scale data, we optimize it using vectorized operations with NumPy instead of looping over dimensions.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Example:<\/strong><strong><br \/><\/strong>Here\u2019s a function to compute Euclidean distance efficiently:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">import numpy as np<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">def euclidean_distance(a, b):<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0return np.linalg.norm(np.array(a) - np.array(b))<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"># Example<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">point1 = [3, 4, 5]<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">point2 = [1, 1, 1]<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">print(euclidean_distance(point1, point2)) # Output: approximately 5.385<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Optimization for Large-Scale Data:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">For datasets with millions of points, we can use NumPy broadcasting or SciPy\u2019s cdist function:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">from scipy.spatial.distance import cdist<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">def batch_euclidean(A, B):<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0return cdist(A, B, metric='euclidean')<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"># Example: Distance between multiple 
points<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">A = np.array([[1, 2], [3, 4], [5, 6]])<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">B = np.array([[0, 0], [1, 1]])<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">print(batch_euclidean(A, B)) # Prints the 3 x 2 distance matrix<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>How would you implement a binary search algorithm? What are its time and space complexities, and in what scenarios is it preferable to linear search?<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Binary search efficiently finds an element in a sorted list by repeatedly dividing the search space in half.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Implementation:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">def binary_search(arr, target):<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0left, right = 0, len(arr) - 1<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0while left &lt;= right:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0mid = left + (right - left) \/\/ 2 # Midpoint; this form avoids overflow in fixed-width-integer languages (not an issue in Python)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0if arr[mid] == target:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0return mid # Target found, return index<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0elif arr[mid] &lt; target:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0left = mid + 1 # Search in the right half<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0else:<\/p>\r\n\r\n\r\n\r\n<p 
class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0right = mid - 1 # Search in the left half<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0return -1 # Target not found<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Time &amp; Space Complexity<\/strong><\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Time Complexity:\r\n<ul class=\"wp-block-list\">\r\n<li><strong>Best case:<\/strong> O(1) (When the target is found at the middle index)<\/li>\r\n\r\n\r\n\r\n<li><strong>Worst\/Average case:<\/strong> O(log n) (Since the list is halved at each step)<\/li>\r\n<\/ul>\r\n<\/li>\r\n\r\n\r\n\r\n<li>Space Complexity:\r\n<ul class=\"wp-block-list\">\r\n<li><strong>Iterative version:<\/strong> O(1) (Uses only a few extra variables)<\/li>\r\n\r\n\r\n\r\n<li><strong>Recursive version:<\/strong> O(log n) (Due to the recursive function call stack)<\/li>\r\n<\/ul>\r\n<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>When is it preferable to linear search?<\/strong><\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Binary search is preferable when the data is already sorted and large, or when many lookups are made on the same sorted data. Use Linear Search instead when:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>The list is unsorted (sorting first takes O(n log n), which is costly for a single lookup).<\/li>\r\n\r\n\r\n\r\n<li>The list is small, where O(n) is not a big issue.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td>Scenario<\/td>\r\n<td>Binary Search<\/td>\r\n<td>Linear Search<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Sorted Data<\/td>\r\n<td>Yes<\/td>\r\n<td>No<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Small Data (few elements)<\/td>\r\n<td>No<\/td>\r\n<td>Yes<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Large Data (millions of elements)<\/td>\r\n<td>Yes<\/td>\r\n<td>No<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Dynamic 
list<\/td>\r\n<td>No<\/td>\r\n<td>Yes<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example Comparison:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Searching for a name in a sorted phonebook? Use Binary Search.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Finding a rare letter in an unsorted paragraph? Use Linear Search.<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-28-given-two-tables-orders-order-id-customer-id-amount-and-customers-customer-id-signup-date-write-a-query-to-find-the-average-order-amount-for-customers-who-signed-up-in-2023\" class=\"wp-block-heading\"><strong>28. Given two tables, orders (order_id, customer_id, amount) and customers (customer_id, signup_date), write a query to find the average order amount for customers who signed up in 2023.<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">We need to join the orders table with the customers table based on customer_id and filter customers who signed up in 2023.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">SQL Query<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">SELECT AVG(o.amount) AS avg_order_amount<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">FROM orders o<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">JOIN customers c ON o.customer_id = c.customer_id<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">WHERE c.signup_date &gt;= '2023-01-01' AND c.signup_date &lt; '2024-01-01';<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Explanation<\/strong><\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>JOIN customers c ON o.customer_id = c.customer_id \u2192 Links both tables<\/li>\r\n\r\n\r\n\r\n<li>WHERE c.signup_date &gt;= '2023-01-01' AND c.signup_date &lt; '2024-01-01' \u2192 Filters customers who signed up in 2023 (the half-open range also covers timestamps on December 31)<\/li>\r\n\r\n\r\n\r\n<li>AVG(o.amount) \u2192 Computes the average order value<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example 
Dataset<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Orders Table<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td><strong>order_id<\/strong><\/td>\r\n<td><strong>customer_id<\/strong><\/td>\r\n<td><strong>amount<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>101<\/td>\r\n<td>1<\/td>\r\n<td>200<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>102<\/td>\r\n<td>2<\/td>\r\n<td>150<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>103<\/td>\r\n<td>3<\/td>\r\n<td>300<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Customers Table<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td>customer_id<\/td>\r\n<td>signup_date<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>1<\/td>\r\n<td>2023-02-10<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>2<\/td>\r\n<td>2022-11-15<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>3<\/td>\r\n<td>2023-06-01<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">For this dataset, the average order amount for 2023 customers would be (200 + 300) \/ 2 = 250.<\/p>\r\n\r\n\r\n\r\n<h3 id=\"h-29-how-would-you-handle-missing-data-in-a-dataset-discuss-pros-cons-of-methods-like-mean-imputation-k-nn-imputation-or-deletion\" class=\"wp-block-heading\"><strong>29. How would you handle missing data in a dataset? Discuss pros\/cons of methods like mean imputation, k-NN imputation, or deletion.<\/strong><\/h3>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">To handle missing data, you can use mean imputation, k-NN imputation, or deletion.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Mean imputation replaces missing values with the column\u2019s average, which is simple but may reduce accuracy. 
k-NN imputation finds similar data points to estimate missing values, making it more precise but slower.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Deletion removes incomplete rows or columns, which is easy but may lose important data. The best method depends on the dataset\u2019s size and completeness.<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td>Method<\/td>\r\n<td>Pros<\/td>\r\n<td>Cons<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Deletion (Drop Rows\/Columns)<\/td>\r\n<td>Works well when only a small share of values is missing<\/td>\r\n<td>Loses valuable information if many rows are dropped<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Mean\/Median Imputation<\/td>\r\n<td>Simple and quick, keeps all data<\/td>\r\n<td>Can distort the data distribution; the mean does not work well when the data has outliers<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>K-NN Imputation<\/td>\r\n<td>More accurate than mean imputation; preserves relationships between features<\/td>\r\n<td>Slow for large datasets<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Example:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Given this dataset:<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td>Age<\/td>\r\n<td>Salary<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>25<\/td>\r\n<td>50000<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>30<\/td>\r\n<td>NaN<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>35<\/td>\r\n<td>80000<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Mean Imputation:<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Replace the missing salary with the mean of the available salaries.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Given dataset:<\/strong><\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table 
class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td><strong>Age<\/strong><\/td>\r\n<td><strong>Salary<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>25<\/td>\r\n<td>50000<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>30<\/td>\r\n<td>NaN<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>35<\/td>\r\n<td>80000<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Implementation<\/strong>:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">import pandas as pd<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">df = pd.DataFrame({'Age': [25, 30, 35], 'Salary': [50000, None, 80000]})<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">df['Salary'] = df['Salary'].fillna(df['Salary'].mean())<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">print(df)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Output<\/strong>:<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td>ID<\/td>\r\n<td>Age<\/td>\r\n<td>Salary<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>1<\/td>\r\n<td>25<\/td>\r\n<td>50000<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>2<\/td>\r\n<td>30<\/td>\r\n<td>65000<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>3<\/td>\r\n<td>35<\/td>\r\n<td>80000<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>K-NN Imputation:<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Fills missing values based on similar data points.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Given Dataset<\/strong><\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td>Age<\/td>\r\n<td>Salary<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>25<\/td>\r\n<td>50000<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>30<\/td>\r\n<td>NaN<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>35<\/td>\r\n<td>80000<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<p 
class=\"wp-block-paragraph\"><strong>Implementation<\/strong><\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">from sklearn.impute import KNNImputer<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">import numpy as np<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">data = np.array([[25, 50000], [30, np.nan], [35, 80000]])<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">imputer = KNNImputer(n_neighbors=2)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">data_imputed = imputer.fit_transform(data)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">print(data_imputed)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Output<\/strong>:<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td>ID<\/td>\r\n<td>Age<\/td>\r\n<td>Salary<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>1<\/td>\r\n<td>25<\/td>\r\n<td>50000<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>2<\/td>\r\n<td>30<\/td>\r\n<td>65000<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>3<\/td>\r\n<td>35<\/td>\r\n<td>80000<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Deletion (Dropping rows or columns):<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Given dataset:<\/strong><\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td>ID<\/td>\r\n<td>Age<\/td>\r\n<td>Salary<\/td>\r\n<td>Department<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>1<\/td>\r\n<td>25<\/td>\r\n<td>50000<\/td>\r\n<td>HR<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>2<\/td>\r\n<td>30<\/td>\r\n<td>NaN<\/td>\r\n<td>IT<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>3<\/td>\r\n<td>35<\/td>\r\n<td>80000<\/td>\r\n<td>NaN<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Implementation:<\/strong><\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">import pandas as pd<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"># Create 
DataFrame<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">df = pd.DataFrame({<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0'ID': [1, 2, 3],<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0'Age': [25, 30, 35],<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0'Salary': [50000, None, 80000],<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">\u00a0\u00a0\u00a0\u00a0'Department': ['HR', 'IT', None]<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">})<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"># Drop rows with missing values<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">df_cleaned = df.dropna()<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">print(df_cleaned)<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>Output:<\/strong><\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-table\">\r\n<table class=\"has-fixed-layout\">\r\n<tbody>\r\n<tr>\r\n<td>ID<\/td>\r\n<td>Age<\/td>\r\n<td>Salary<\/td>\r\n<td>Department<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>1<\/td>\r\n<td>25<\/td>\r\n<td>50000<\/td>\r\n<td>HR<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/figure>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Since the rows with ID 2 and ID 3 each contain a NaN value, they are removed.\u00a0<\/p>\r\n\r\n\r\n\r\n<h2 id=\"h-why-enroll-in-our-data-science-course\" class=\"wp-block-heading\"><strong>Why Enroll in our data science course?<\/strong><\/h2>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Want a high-paying job in tech? 
Our Data Science Course in Hyderabad gives you the skills to work with data, build AI models, and solve real-world problems.\u00a0<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>6 months of expert training\u00a0<\/li>\r\n\r\n\r\n\r\n<li>Hands-on projects (like rain prediction &amp; chatbots)\u00a0<\/li>\r\n\r\n\r\n\r\n<li>300 hours of live classes\u00a0<\/li>\r\n\r\n\r\n\r\n<li>Job placement support\u00a0<\/li>\r\n\r\n\r\n\r\n<li>No prior coding needed Join 2,700+ students who landed great jobs!\u00a0<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Here you can check our recent <a href=\"https:\/\/www.placements.codegnan.com\/\"><strong>student placements at Codegnan<\/strong><\/a>.<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-49526 size-full\" src=\"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/placements-page.png\" alt=\"\" width=\"1912\" height=\"912\" srcset=\"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/placements-page.png 1912w, https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/placements-page-300x143.png 300w, https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/placements-page-1024x488.png 1024w, https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/placements-page-768x366.png 768w, https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/placements-page-1536x733.png 1536w, https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/placements-page-595xh.png 595w\" sizes=\"(max-width: 1912px) 100vw, 1912px\" \/><\/figure>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Whether you&#8217;re a beginner or an IT professional looking to upskill, our flexible learning options and 100% placement assistance make us the perfect choice. 
Secure a high-paying career in data science\u2014<a href=\"https:\/\/codegnan.com\/contact-us\/\"><strong>contact today<\/strong><\/a>.<\/p>\r\n","protected":false},"excerpt":{"rendered":"<p>Getting ready for a data science interview?\u00a0 You\u2019ll face data science interview questions about coding, statistics, machine learning, and real-world problem-solving.\u00a0 In this guide, we\u2019ll break down common data science interview questions and how to answer them. Whether you&#8217;re a beginner or an expert, these tips will help you ace your next interview! \ud83d\udca1 Want [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":21395,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[],"class_list":["post-21392","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.3 (Yoast SEO v27.9) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>29 Data Science Interview Questions - Codegnan<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/codegnan.com\/data-science-interview-questions\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"29 Data Science Interview Questions\" \/>\n<meta property=\"og:description\" content=\"Getting ready for a data science interview?\u00a0 You\u2019ll face data science interview questions about coding, statistics, machine learning, and real-world problem-solving.\u00a0 In this guide, we\u2019ll break down common data science interview questions and how to answer them. Whether you&#8217;re a beginner or an expert, these tips will help you ace your next interview! 
\ud83d\udca1 Want [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/codegnan.com\/data-science-interview-questions\/\" \/>\n<meta property=\"og:site_name\" content=\"Codegnan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/codegnan\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-02-27T06:10:36+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-06-25T08:52:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/Data-Science-Interview-Questions.png\" \/>\n\t<meta property=\"og:image:width\" content=\"900\" \/>\n\t<meta property=\"og:image:height\" content=\"400\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Sairam Uppugundla\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@codegnandotcom\" \/>\n<meta name=\"twitter:site\" content=\"@codegnandotcom\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sairam Uppugundla\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"21 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/\"},\"author\":{\"name\":\"Sairam Uppugundla\",\"@id\":\"https:\\\/\\\/codegnan.com\\\/#\\\/schema\\\/person\\\/510a2ce6cfa80a9688733994fe67da52\"},\"headline\":\"29 Data Science Interview Questions\",\"datePublished\":\"2025-02-27T06:10:36+00:00\",\"dateModified\":\"2026-06-25T08:52:29+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/\"},\"wordCount\":4310,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/codegnan.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/codegnan.com\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/Data-Science-Interview-Questions.png\",\"articleSection\":[\"Data Science\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/\",\"url\":\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/\",\"name\":\"29 Data Science Interview Questions - 
Codegnan\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/codegnan.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/codegnan.com\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/Data-Science-Interview-Questions.png\",\"datePublished\":\"2025-02-27T06:10:36+00:00\",\"dateModified\":\"2026-06-25T08:52:29+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/#primaryimage\",\"url\":\"https:\\\/\\\/codegnan.com\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/Data-Science-Interview-Questions.png\",\"contentUrl\":\"https:\\\/\\\/codegnan.com\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/Data-Science-Interview-Questions.png\",\"width\":900,\"height\":400,\"caption\":\"Data Science Interview Questions\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/codegnan.com\\\/data-science-interview-questions\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/codegnan.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Science\",\"item\":\"https:\\\/\\\/codegnan.com\\\/category\\\/data-science\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"29 Data Science Interview Questions\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/codegnan.com\\\/#website\",\"url\":\"https:\\\/\\\/codegnan.com\\\/\",\"name\":\"Codegnan\",\"description\":\"Where Talent Meets 
Opportunity\",\"publisher\":{\"@id\":\"https:\\\/\\\/codegnan.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/codegnan.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/codegnan.com\\\/#organization\",\"name\":\"Codegnan\",\"url\":\"https:\\\/\\\/codegnan.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/codegnan.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/codegnan.com\\\/wp-content\\\/uploads\\\/2023\\\/05\\\/Codegnan-New-Logo.png\",\"contentUrl\":\"https:\\\/\\\/codegnan.com\\\/wp-content\\\/uploads\\\/2023\\\/05\\\/Codegnan-New-Logo.png\",\"width\":420,\"height\":102,\"caption\":\"Codegnan\"},\"image\":{\"@id\":\"https:\\\/\\\/codegnan.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/codegnan\\\/\",\"https:\\\/\\\/x.com\\\/codegnandotcom\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/codegnan\",\"https:\\\/\\\/www.instagram.com\\\/codegnan\\\/\",\"https:\\\/\\\/t.me\\\/codegnan\",\"https:\\\/\\\/www.youtube.com\\\/@Codegnan\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/codegnan.com\\\/#\\\/schema\\\/person\\\/510a2ce6cfa80a9688733994fe67da52\",\"name\":\"Sairam Uppugundla\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/fb72f4f7eb256ddd452b9939d321540cd244487ff7bb982f98e750e2df959cb6?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/fb72f4f7eb256ddd452b9939d321540cd244487ff7bb982f98e750e2df959cb6?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/fb72f4f7eb256ddd452b9939d321540cd244487ff7bb982f98e750e2df959cb6?s=96&d=mm&r=g\",\"caption\":\"Sairam 
Uppugundla\"},\"description\":\"Sairam Uppugundla is the Founder of Codegnan and a technology educator with expertise in Software Development, Python Programming, Data Science, Artificial Intelligence, Full Stack Development, and emerging IT technologies. He has mentored thousands of engineering students and fresh graduates by focusing on practical learning, real-time projects, coding skills, and industry-ready training. Through Codegnan, he actively helps students build successful careers in technology by sharing insights on career guidance, software industry trends, placement preparation, and in-demand skills required for modern tech jobs in Hyderabad, Vijayawada, and other growing IT hubs.\",\"sameAs\":[\"https:\\\/\\\/codegnan.com\"],\"url\":\"https:\\\/\\\/codegnan.com\\\/author\\\/sairam\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"29 Data Science Interview Questions - Codegnan","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/codegnan.com\/data-science-interview-questions\/","og_locale":"en_US","og_type":"article","og_title":"29 Data Science Interview Questions","og_description":"Getting ready for a data science interview?\u00a0 You\u2019ll face data science interview questions about coding, statistics, machine learning, and real-world problem-solving.\u00a0 In this guide, we\u2019ll break down common data science interview questions and how to answer them. Whether you&#8217;re a beginner or an expert, these tips will help you ace your next interview! 
\ud83d\udca1 Want [&hellip;]","og_url":"https:\/\/codegnan.com\/data-science-interview-questions\/","og_site_name":"Codegnan","article_publisher":"https:\/\/www.facebook.com\/codegnan\/","article_published_time":"2025-02-27T06:10:36+00:00","article_modified_time":"2026-06-25T08:52:29+00:00","og_image":[{"width":900,"height":400,"url":"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/Data-Science-Interview-Questions.png","type":"image\/png"}],"author":"Sairam Uppugundla","twitter_card":"summary_large_image","twitter_creator":"@codegnandotcom","twitter_site":"@codegnandotcom","twitter_misc":{"Written by":"Sairam Uppugundla","Est. reading time":"21 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/codegnan.com\/data-science-interview-questions\/#article","isPartOf":{"@id":"https:\/\/codegnan.com\/data-science-interview-questions\/"},"author":{"name":"Sairam Uppugundla","@id":"https:\/\/codegnan.com\/#\/schema\/person\/510a2ce6cfa80a9688733994fe67da52"},"headline":"29 Data Science Interview Questions","datePublished":"2025-02-27T06:10:36+00:00","dateModified":"2026-06-25T08:52:29+00:00","mainEntityOfPage":{"@id":"https:\/\/codegnan.com\/data-science-interview-questions\/"},"wordCount":4310,"commentCount":0,"publisher":{"@id":"https:\/\/codegnan.com\/#organization"},"image":{"@id":"https:\/\/codegnan.com\/data-science-interview-questions\/#primaryimage"},"thumbnailUrl":"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/Data-Science-Interview-Questions.png","articleSection":["Data Science"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/codegnan.com\/data-science-interview-questions\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/codegnan.com\/data-science-interview-questions\/","url":"https:\/\/codegnan.com\/data-science-interview-questions\/","name":"29 Data Science Interview Questions - 
Codegnan","isPartOf":{"@id":"https:\/\/codegnan.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/codegnan.com\/data-science-interview-questions\/#primaryimage"},"image":{"@id":"https:\/\/codegnan.com\/data-science-interview-questions\/#primaryimage"},"thumbnailUrl":"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/Data-Science-Interview-Questions.png","datePublished":"2025-02-27T06:10:36+00:00","dateModified":"2026-06-25T08:52:29+00:00","breadcrumb":{"@id":"https:\/\/codegnan.com\/data-science-interview-questions\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/codegnan.com\/data-science-interview-questions\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/codegnan.com\/data-science-interview-questions\/#primaryimage","url":"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/Data-Science-Interview-Questions.png","contentUrl":"https:\/\/codegnan.com\/wp-content\/uploads\/2025\/02\/Data-Science-Interview-Questions.png","width":900,"height":400,"caption":"Data Science Interview Questions"},{"@type":"BreadcrumbList","@id":"https:\/\/codegnan.com\/data-science-interview-questions\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/codegnan.com\/"},{"@type":"ListItem","position":2,"name":"Data Science","item":"https:\/\/codegnan.com\/category\/data-science\/"},{"@type":"ListItem","position":3,"name":"29 Data Science Interview Questions"}]},{"@type":"WebSite","@id":"https:\/\/codegnan.com\/#website","url":"https:\/\/codegnan.com\/","name":"Codegnan","description":"Where Talent Meets 
Opportunity","publisher":{"@id":"https:\/\/codegnan.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/codegnan.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/codegnan.com\/#organization","name":"Codegnan","url":"https:\/\/codegnan.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/codegnan.com\/#\/schema\/logo\/image\/","url":"https:\/\/codegnan.com\/wp-content\/uploads\/2023\/05\/Codegnan-New-Logo.png","contentUrl":"https:\/\/codegnan.com\/wp-content\/uploads\/2023\/05\/Codegnan-New-Logo.png","width":420,"height":102,"caption":"Codegnan"},"image":{"@id":"https:\/\/codegnan.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/codegnan\/","https:\/\/x.com\/codegnandotcom","https:\/\/www.linkedin.com\/company\/codegnan","https:\/\/www.instagram.com\/codegnan\/","https:\/\/t.me\/codegnan","https:\/\/www.youtube.com\/@Codegnan"]},{"@type":"Person","@id":"https:\/\/codegnan.com\/#\/schema\/person\/510a2ce6cfa80a9688733994fe67da52","name":"Sairam Uppugundla","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/fb72f4f7eb256ddd452b9939d321540cd244487ff7bb982f98e750e2df959cb6?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/fb72f4f7eb256ddd452b9939d321540cd244487ff7bb982f98e750e2df959cb6?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fb72f4f7eb256ddd452b9939d321540cd244487ff7bb982f98e750e2df959cb6?s=96&d=mm&r=g","caption":"Sairam Uppugundla"},"description":"Sairam Uppugundla is the Founder of Codegnan and a technology educator with expertise in Software Development, Python Programming, Data Science, Artificial Intelligence, Full Stack Development, and emerging IT technologies. 
He has mentored thousands of engineering students and fresh graduates by focusing on practical learning, real-time projects, coding skills, and industry-ready training. Through Codegnan, he actively helps students build successful careers in technology by sharing insights on career guidance, software industry trends, placement preparation, and in-demand skills required for modern tech jobs in Hyderabad, Vijayawada, and other growing IT hubs.","sameAs":["https:\/\/codegnan.com"],"url":"https:\/\/codegnan.com\/author\/sairam\/"}]}},"_links":{"self":[{"href":"https:\/\/codegnan.com\/wp-json\/wp\/v2\/posts\/21392","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/codegnan.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/codegnan.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/codegnan.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/codegnan.com\/wp-json\/wp\/v2\/comments?post=21392"}],"version-history":[{"count":3,"href":"https:\/\/codegnan.com\/wp-json\/wp\/v2\/posts\/21392\/revisions"}],"predecessor-version":[{"id":49527,"href":"https:\/\/codegnan.com\/wp-json\/wp\/v2\/posts\/21392\/revisions\/49527"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/codegnan.com\/wp-json\/wp\/v2\/media\/21395"}],"wp:attachment":[{"href":"https:\/\/codegnan.com\/wp-json\/wp\/v2\/media?parent=21392"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/codegnan.com\/wp-json\/wp\/v2\/categories?post=21392"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/codegnan.com\/wp-json\/wp\/v2\/tags?post=21392"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}