Python Articles
Drop duplicate rows in PySpark DataFrame
PySpark is the Python API for Apache Spark, designed to process large-scale data, in batch or near real time, using distributed computing. Unlike pandas DataFrames, PySpark DataFrames distribute data across a cluster and follow a strict schema for optimized processing. In this article, we'll explore different methods to drop duplicate rows from PySpark DataFrames using the distinct() and dropDuplicates() functions.

Installation

Install PySpark using pip:

    pip install pyspark

Creating a PySpark DataFrame

First, let's create a sample DataFrame with duplicate rows to demonstrate the deduplication methods:

    from pyspark.sql import SparkSession
    import pandas as ...
Drop columns in DataFrame by label Names or by Index Positions
A pandas DataFrame is a two-dimensional data structure for storing tabular data. When working with DataFrames, you often need to remove unwanted columns. This can be done by specifying column names or their index positions using the drop() method. In this tutorial, we'll explore different methods to drop columns from a pandas DataFrame, including dropping by names, index positions, and ranges.

Creating the Sample DataFrame

Let's start by creating a sample DataFrame to work with:

    import pandas as pd
    dataset = {
        "Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
        ...
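
As a rough sketch of the three approaches the teaser names (dropping by name, by index position, and by range), the following uses a small made-up DataFrame. Only the "Employee ID" values come from the excerpt; the other columns, their values, and all variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data: only "Employee ID" appears in the excerpt above.
df = pd.DataFrame({
    "Employee ID": ["CIR45", "CIR12", "CIR18"],
    "Name": ["Asha", "Ben", "Chen"],
    "Age": [31, 27, 45],
    "Salary": [52000, 48000, 61000],
})

# Drop by label name.
by_name = df.drop(columns=["Age"])

# Drop by index position: look up the labels at positions 0 and 2 first.
by_position = df.drop(columns=df.columns[[0, 2]])

# Drop a range of positions (1 up to, but not including, 3).
by_range = df.drop(columns=df.columns[1:3])

print(list(by_name.columns))
print(list(by_position.columns))
print(list(by_range.columns))
```

drop() returns a new DataFrame by default, so df itself still has all four columns after these calls.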
Drop a list of rows from a Pandas DataFrame
The pandas library in Python is widely used for representing data in tabular structures called DataFrames. During data analysis, you often need to remove specific rows from a DataFrame. This article demonstrates three effective methods for dropping multiple rows from a Pandas DataFrame.

Creating a Sample DataFrame

Let's start by creating a DataFrame with student marks data:

    import pandas as pd
    dataset = {
        "Aman": [98, 92, 88, 90, 91],
        "Raj": [78, 62, 90, 71, 45],
        "Saloni": [82, ...
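
The excerpt does not say which three methods the article covers, so the sketch below shows three common ones as an assumption: dropping by index label, by integer position, and by a boolean condition. It reuses only the "Aman" and "Raj" columns from the excerpt; the subject names used as row labels are hypothetical.

```python
import pandas as pd

marks = pd.DataFrame(
    {"Aman": [98, 92, 88, 90, 91], "Raj": [78, 62, 90, 71, 45]},
    # Hypothetical row labels, added so rows can be dropped by name.
    index=["Math", "Physics", "Chemistry", "English", "Biology"],
)

# 1. Drop a list of rows by index label.
dropped_by_label = marks.drop(index=["Physics", "English"])

# 2. Drop a list of rows by integer position (first and last row here).
dropped_by_position = marks.drop(index=marks.index[[0, 4]])

# 3. Drop rows by condition: keep only rows where Raj scored at least 70.
kept_by_mask = marks[marks["Raj"] >= 70]

print(dropped_by_label)
print(dropped_by_position)
print(kept_by_mask)
```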
How to Locate Elements using Selenium Python?
Selenium is a powerful web automation tool that can be used with Python to locate and extract elements from web pages. This is particularly useful for web scraping, testing, and automating browser interactions. In this tutorial, we'll explore different methods to locate HTML elements using Selenium with Python.

Setting Up Selenium

Before locating elements, you need to set up Selenium with a WebDriver. Here's a basic setup:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service
    import time

    # Setup Chrome driver
    driver = webdriver.Chrome()
    driver.get("https://example.com")
    time.sleep(2)

    # Always close ...
How to iterate through a nested List in Python?
A nested list in Python is a list that contains other lists as elements. Iterating through nested lists requires different approaches depending on the structure and your specific needs.

What is a Nested List?

Here are common examples of nested lists:

    # List with mixed data types
    people = [["Alice", 25, ["New York", "NY"]], ["Bob", 30, ["Los Angeles", "CA"]], ["Carol", 28, ["Chicago", "IL"]]]

    # 3-dimensional nested list
    matrix = [ ...
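
To illustrate how the approach depends on the structure, here is a minimal sketch using the people list from the excerpt. It shows tuple unpacking when the shape is known in advance, and a recursive generator when the nesting depth is not. The names summaries, flatten, and flat are introduced here for illustration.

```python
people = [
    ["Alice", 25, ["New York", "NY"]],
    ["Bob", 30, ["Los Angeles", "CA"]],
    ["Carol", 28, ["Chicago", "IL"]],
]

# Known, fixed structure: unpack each level directly in the for statement.
summaries = []
for name, age, (city, state) in people:
    summaries.append(f"{name} ({age}) lives in {city}, {state}")


def flatten(items):
    """Yield every non-list element, descending into lists of any depth."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


# Unknown or irregular depth: recurse until only non-list elements remain.
flat = list(flatten(people))

print(summaries[0])
print(flat[:4])
```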
How to invert the elements of a boolean array in Python?
Boolean array inversion is a common operation when working with data that contains True/False values. Python offers several approaches to invert boolean arrays using NumPy: the np.invert() function, the bitwise operator ~, or np.logical_not().

Using NumPy's invert() Function

The np.invert() function performs a bitwise NOT operation on boolean arrays:

    import numpy as np

    # Create a boolean array
    covid_negative = np.array([True, False, True, False, True])
    print("Original array:", covid_negative)

    # Invert using np.invert()
    covid_positive = np.invert(covid_negative)
    print("Inverted array:", covid_positive)

Output:

    Original array: [ True False  True False  True]
    Inverted array: [False ...
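
The three approaches named above agree on boolean arrays but not on integer arrays, which is worth checking before choosing one. The sketch below reuses covid_negative from the excerpt; the flags array and the other variable names are assumptions added for the comparison.

```python
import numpy as np

covid_negative = np.array([True, False, True, False, True])

# On a boolean array all three approaches give the same result.
with_invert = np.invert(covid_negative)
with_tilde = ~covid_negative
with_logical_not = np.logical_not(covid_negative)

# On an integer array they differ: invert/~ flip every bit of each integer,
# while logical_not treats nonzero as True and returns booleans.
flags = np.array([1, 0, 1])  # hypothetical 0/1 flags stored as integers
bitwise_result = np.invert(flags)
logical_result = np.logical_not(flags)

print(with_invert)
print(bitwise_result)
print(logical_result)
```

If the data might arrive as 0/1 integers rather than booleans, np.logical_not() (or converting with astype(bool) first) is the safer choice.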
How to Make a Bell Curve in Python?
A bell curve (normal distribution) is a fundamental concept in statistics that appears when we plot many random observations. Python's Plotly library provides excellent tools for creating these visualizations. This article demonstrates three practical methods to create bell curves using different datasets.

Understanding Bell Curves

The normal distribution emerges naturally when averaging many observations. For example, rolling two dice and summing their values produces a peaked, roughly bell-shaped pattern: a sum of 7 occurs most frequently, while extreme values (2 or 12) are rare.

Example 1: Bell Curve from Dice Roll Simulation

Let's simulate 2000 dice rolls ...
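
The dice example can be checked without any plotting library. The sketch below is not the article's Plotly code: it swaps in a plain-text histogram built with the standard library, and assumes that each of the 2000 rolls is a pair of dice. The seed value and variable names are arbitrary choices made here.

```python
import random
from collections import Counter

random.seed(0)  # arbitrary fixed seed so repeated runs give the same counts
N_ROLLS = 2000  # assumed to mean 2000 rolls of a pair of dice

# Sum two independent six-sided dice for each roll.
sums = [random.randint(1, 6) + random.randint(1, 6) for _ in range(N_ROLLS)]
counts = Counter(sums)

# Text histogram: one '#' per 10 occurrences of each total.
for total in range(2, 13):
    bar = "#" * (counts[total] // 10)
    print(f"{total:>2} {counts[total]:>4} {bar}")
```

The bars should rise toward 7 and fall away on both sides, which is the peaked shape the text describes.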
What are the limitations of Python?
Python is a popular and widely used programming language known for its simplicity, flexibility, and productivity. It excels in web development, data science, automation, and machine learning. However, like any programming language, Python has certain limitations that developers should consider when choosing it for their projects.

Performance and Speed Limitations

Python is an interpreted language that executes code at runtime through a virtual machine or interpreter. This typically makes CPU-bound code significantly slower than equivalent code in compiled languages like C or C++.

    import time

    # Python's interpreted nature makes operations slower
    start = time.time()
    result = sum(range(1000000))
    end = ...
Positive and negative indices in Python?
Python sequences like lists, tuples, and strings support two types of indexing: positive indexing (starting from 0) and negative indexing (starting from -1). This tutorial explains both approaches with practical examples.

What Are Sequence Indexes?

Indexing allows us to access individual elements in Python sequence data types. There are two types:

- Positive indexing: starts from 0 and increases to n-1 (where n is the total number of elements)
- Negative indexing: starts from -1 (the last element) and moves backwards to -n

List: [10, 20, 30, 40, 50] ...
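
The relationship between the two schemes can be checked directly: for a sequence of length n, the element at positive index i is also at negative index i - n. The sketch below uses the list from the excerpt; the names numbers, n, and pairs are introduced here for illustration.

```python
numbers = [10, 20, 30, 40, 50]
n = len(numbers)

first = numbers[0]    # positive indexing starts at 0
last = numbers[-1]    # negative indexing starts at -1 (the last element)

# Pair each positive index with its negative equivalent and the value there.
pairs = [(i, i - n, numbers[i]) for i in range(n)]
for pos, neg, value in pairs:
    print(f"numbers[{pos}] == numbers[{neg}] == {value}")

# Both schemes have limits: valid indices run from -n to n - 1.
try:
    numbers[n]
except IndexError as exc:
    out_of_range_message = str(exc)
    print("IndexError:", out_of_range_message)
```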
What are the different types of Python data analysis libraries used?
Python has established itself as a leading language for data science, frequently ranking at or near the top of industry surveys. Its success comes from combining an easy-to-learn, object-oriented syntax with specialized libraries for every data science task, from mathematical computations to data visualization.

Core Data Science Libraries

NumPy

NumPy (Numerical Python) forms the foundation of Python's data science ecosystem. It provides efficient arrays and mathematical functions for numerical computing:

    import numpy as np

    # Creating arrays and basic operations
    data = np.array([1, 2, 3, 4, 5])
    print("Array:", data)
    print("Mean:", np.mean(data))
    print("Standard deviation:", np.std(data))
    ...