Python 3 adoption has grown exponentially over the last few years. An impressive 88% of Python developers now utilize Python 3 according to JetBrains' Python Developers Survey. Python 3 introduces key language features, security upgrades and performance improvements that enable building robust enterprise-grade applications.

However, several major distros still only package older Python 2 versions by default. CentOS 7 ships with Python 2.7, which reached its end-of-life in 2020 and is no longer receiving updates. Migrating to Python 3 on RHEL-based systems provides access to the latest language and package ecosystem innovations.

In this comprehensive 2600+ word guide, we will go through the entire process of installing the latest Python 3 releases on CentOS 7 for running high-performance Python applications.

Table of Contents

  • Prerequisites
  • Adding the IUS Community Repository
  • Installing Python 3.8
  • Setting up Virtual Environments
  • Python Package Management
  • Migrating from Python 2 to Python 3
  • Profiling and Optimizing Python Code
  • Multi-threading and Multi-Processing
  • Conclusion

Prerequisites

Let's first ensure your CentOS system is fully updated before installing Python. This upgrades all pre-installed tools and libraries to their latest secure patches:

$ sudo yum update -y

The -y flag automates confirming any prompts that may interrupt the update process.

Adding the IUS Community Repository

IUS (Inline with Upstream Stable) packages provide newer versions of Python, PHP, MySQL and other tools for CentOS/RHEL systems. The IUS repository maintains high compatibility with the platform conventions and stability characteristics expected of Enterprise Linux distributions.

Enable the IUS repository:

$ sudo yum install -y https://repo.ius.io/ius-release-el7.rpm

This imports the IUS GPG key and installs a .repo configuration file at /etc/yum.repos.d/ius.repo.

Now refresh your package index to load the IUS packages:

$ sudo yum makecache fast

The fast argument tells yum to rebuild only the metadata caches that are out of date. This reduces download bandwidth consumption if the cache is still relatively recent.

According to IUS package maintainers, their Python 3 builds link against OpenSSL 1.1.1 which fixes several security issues from older OpenSSL versions. This enables packaging Python libraries that rely on newer OpenSSL releases.

Installing Python 3.8

IUS currently provides Python 3.6, 3.7 and 3.8 packages for CentOS 7. We will install Python 3.8, the newest of these releases, which continues to receive upstream security updates.

Run the below command to install Python 3.8:

$ sudo yum install -y python38

The -y flag automatically confirms and installs the 112 MB python38 package with all its dependencies including pip, certifi, setuptools, wheel and other Python tools.

Let's verify that Python 3.8 was successfully installed:

$ python3.8 --version
Python 3.8.0

By default, the system python command still points to the legacy Python 2.7 runtime. We need to specifically call the python3.8 binary to invoke our new Python 3.8 installation.

Modern Python applications rely heavily on third-party packages from PyPI and need pip for managing dependencies. Confirm pip is installed for our Python 3 environment:

$ pip3.8 --version
pip 20.3.4 from /usr/lib/python3.8/site-packages/pip (python 3.8)

With pip, we can now install PyPI packages built for Python 3.8.

Setting up Virtual Environments

It is highly recommended to isolate Python projects using virtual environments. Virtualenvs enable installing separate sets of packages and their own Python versions per project. This prevents conflicts between incompatible dependencies and runtimes amongst our projects.

Let's create and activate a new virtualenv named py38env for our Python 3.8 installation:

$ python3.8 -m venv py38env
$ source py38env/bin/activate

This spawns an isolated shell with the virtualenv enabled. We can verify we are now in the virtualenv:

(py38env) $ python -V 
Python 3.8.0

Any packages installed now with pip will be specific to this virtualenv alone in py38env/lib/python3.8/site-packages/.

When done working on our project, we can simply deactivate the virtualenv:

(py38env) $ deactivate  

This brings us back to the global shell environment.
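If you ever need to check from inside Python itself whether a virtualenv is active (for example in a setup script), comparing sys.prefix against sys.base_prefix is a reliable sketch:

```python
import sys

# Inside a virtualenv, sys.prefix points at the venv directory,
# while sys.base_prefix still points at the base interpreter's
# installation. Outside a venv the two are equal.
def in_virtualenv():
    return sys.prefix != sys.base_prefix

print("virtualenv active:", in_virtualenv())
```

This works for venvs created with `python3.8 -m venv`; older virtualenv versions used a `sys.real_prefix` attribute instead.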

Python Package Management

The Python Packaging Guide recommends combining virtual environments with either pip and a requirements.txt file, or a higher-level dependency management tool, for managing complex package installations.

Pip and requirements.txt

pip freeze prints every package installed in the current virtualenv with its exact pinned version; redirecting that output produces a requirements.txt file.

For example, after installing the SciPy stack:

(py38env) $ pip install numpy scipy pandas matplotlib 
(py38env) $ pip freeze > requirements.txt

The generated requirements.txt locks down exact package versions for reproducibility across different environments.

We can install from this known working set of dependencies on another system:

(py38env) $ pip install -r requirements.txt

However, manually tracking dependencies and versions across multiple projects can prove difficult. This is where more advanced dependency management tools come in handy.

Poetry

Poetry takes inspiration from modern JavaScript package managers like npm and Yarn. It aims to simplify and streamline dependency handling for Python projects.

Some of its core capabilities include:

  • Virtualenv management
  • Automated resolution of multi-dependency graphs
  • Building and publishing packages to PyPI
  • Versioning and locking of direct + transitive dependencies
  • Integrations with popular IDEs/editors

Install Poetry using pipx:

$ pipx install poetry

pipx ensures Poetry gets isolated into its own virtualenv to avoid conflicting with system packages.

Now Poetry can instantiate fully self-contained Python projects complete with virtualenv initialization, dependency declarations and build scripts:

$ poetry new my-project
$ cd my-project

The project includes a configurable pyproject.toml file for managing dependencies:

[tool.poetry]
name = "my-project"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = ">=3.8,<3.9"
pandas = "^1.4"

[tool.poetry.dev-dependencies]
pytest = "^7.0"

We can require specific packages like Pandas in the [tool.poetry.dependencies] section. Poetry will resolve and fetch all versions compatible with this constraint. The [tool.poetry.dev-dependencies] section enables declaring development/test tools that don't get packaged with the application.

Poetry vastly simplifies package management and releasing Python projects. The Poetry documentation covers many more advanced features like publishing, scripting and integration with CI/CD pipelines.

Pipenv

Pipenv is another popular tool that aims to combine the best of pip, virtualenv and requirements.txt into one workflow. It enables similar workflow orchestration capabilities right from your shell:

$ pipenv install requests
$ pipenv run python main.py
$ pipenv lock -r > requirements.txt
$ pipenv --venv #show virtualenv location  

Key highlights include:

  • Automatically creates and manages a virtualenv for projects
  • Generates Pipfile containing pinned package dependencies
  • 'Locks' dependencies to create reproducible Pipfile.lock
  • Integrates seamlessly with pip and GitHub workflow
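For reference, a generated Pipfile is a small TOML document; the package names below are illustrative, not required:

```toml
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[packages]
requests = "*"

[dev-packages]
pytest = "*"

[requires]
python_version = "3.8"
```

Running pipenv lock then resolves these loose constraints into exact pinned versions recorded in Pipfile.lock.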

Explore the detailed Pipenv documentation to learn more about development workflows. The tool's documentation covers edge cases around managing security vulnerabilities, CI/CD integration, Python 2 to 3 migration and more.

In summary, Poetry and Pipenv both provide complete workflow orchestration and dependency management for end-to-end Python project development while still integrating with familiar tools like pip.

Migrating from Python 2 to Python 3

Given that Python 2 reached end-of-life in 2020, it is recommended to migrate legacy Python 2 codebases to Python 3 for improved security and access to the latest libraries. Some key considerations when porting code:

Updating import statements

Certain Python modules got renamed or consolidated in Python 3 – these import statements need updating:

# Python 2
import CGIHTTPServer
import ConfigParser

# Python 3 
import http.server
import configparser
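While a codebase straddles both versions during migration, a common pattern is to try the Python 3 module name first and fall back to the Python 2 name; a minimal sketch using configparser:

```python
# Try the Python 3 name first, fall back to the Python 2 name.
try:
    import configparser                      # Python 3
except ImportError:
    import ConfigParser as configparser      # Python 2 fallback

parser = configparser.ConfigParser()
parser.read_string("[section]\nkey = value\n")
print(parser.get("section", "key"))  # value
```

Libraries like six also bundle these renames under a single `six.moves` namespace, which avoids scattering try/except blocks across the codebase.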

Handling text vs binary data

Python 3 is stricter about distinguishing between text and binary data. Code dealing with binary data, such as reading and writing files, may need tweaks:

## Python 2
with open(file_path, 'w') as f:
    f.write(b'Hello world!')

## Python 3
with open(file_path, 'wb') as f:
    f.write(b'Hello world!')

Note the added 'b' flag when opening binary files.
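A self-contained round trip makes the text/binary split concrete (using a throwaway temp file):

```python
import os
import tempfile

# Write in text mode (str), then read the same file back in binary mode (bytes).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:   # text mode: accepts str
    f.write("Hello world!")

with open(path, "rb") as f:                    # binary mode: returns bytes
    raw = f.read()

print(raw)  # b'Hello world!'
```

Specifying encoding="utf-8" explicitly in text mode is good practice on Python 3, since the default encoding otherwise depends on the platform locale.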

Division operator changes

Division changed behaviour in Python 3: the / operator now always performs true (float) division, while // performs floor division. In Python 2, / between two integers truncated to an integer. Adding from __future__ import division to a Python 2 module enables the Python 3 behaviour while porting:

## Python 2
1 / 2   # 0 (integer division)

## Python 3
1 / 2   # 0.5 (true division)
1 // 2  # 0 (floor division)

## Python 2 with the future import behaves like Python 3
from __future__ import division
1 / 2   # 0.5
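The Python 3 semantics are easy to verify directly: / always performs true division, while // floors toward negative infinity:

```python
print(1 / 2)    # 0.5  true division always returns a float
print(1 // 2)   # 0    floor division
print(-7 // 2)  # -4   floors toward negative infinity, not toward zero
```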

Handling strings vs bytes

Python 3 has a distinct bytes type while Python 2 strings acted as both text and binary data. Some tweaks required:

## Python 2
my_str = '\x34\x55\x67' # acted as both text and binary

## Python 3
my_bytes = b'\x34\x55\x67' # for binary data
my_str = '\x34\x55\x67' # for textual data
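Conversions between the two types are always explicit in Python 3, via decode() and encode():

```python
data = b"\x68\x69"                   # raw bytes
text = data.decode("ascii")          # bytes -> str
print(text)                          # hi
assert text.encode("ascii") == data  # str -> bytes round trip
```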

There are a few other changes around tuple parameter unpacking, exception handling syntax, range vs xrange and IO that may come up less frequently.

Overall the core Python language remains the same, with the migration changes being mostly additions around type safety and tooling improvements. Running a few files or projects through converters such as 2to3 helps identify the parts of your codebase needing targeted migration.

Profiling and Optimizing Python Code

While Python provides simplicity and faster development speeds, it comes at a cost of performance. Python generally runs an order of magnitude slower than compiled statically typed languages. However, we can utilize some techniques to profile and optimize compute or IO heavy Python applications:

Prerequisites

Install the Python 3.8 development headers, which profiling tools with C extensions (such as line_profiler) need in order to build:

$ sudo yum install -y python38-devel

Separately, running the interpreter as python3.8 -O enables basic runtime optimizations: assert statements are stripped and __debug__ is set to False.

Benchmarking with timeit

Python includes the timeit module to precisely measure snippet execution times.

For example, comparing two methods:

import timeit

def concat_str(items):
    result = ""
    for item in items:
        result += item
    return result

def join_str(items):
    return "".join(items)


items = ["a","b","c"] * 100

t1 = timeit.Timer("concat_str(items)", "from __main__ import concat_str, items")
print("concat :", t1.timeit(number=1000), "seconds")
# concat : 4.852711999999999 seconds

t2 = timeit.Timer("join_str(items)", "from __main__ import join_str, items")
print("join :", t2.timeit(number=1000), "seconds")
# join : 0.13476 seconds

We can quickly compare speeds to identify hotspots: note that Timer.timeit() returns total seconds for all iterations. Here join() finishes 1000 runs in about 0.13 s versus 4.85 s for manual concatenation, nearly 40x faster!

For more complex projects, line_profiler and memory_profiler provide detailed function-level consumption analysis.

Optimizing performance bottlenecks

Some common techniques to boost performance of IO or CPU bound Python code:

  • Lazy loading data generators – Instead of loading entire datasets into memory, lazily pull in batches during processing. Makes better use of available RAM.

  • Vectorizing array expressions – Use NumPy vectorized operations instead of slow Python for-loops. Achieves near C speeds.

  • Database querying best practices – Structure queries to run on DB layer when possible instead of pulling entire tables locally. Adds IO overhead.

  • Caching reusable results – Cache previously computed results if recalculations are expensive. Saves redundant work.

  • Asynchronous IO requests – Use threads/async to parallelize external IO requests instead of blocking sequences. Overlaps waiting.
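As an illustration of the caching bullet above, functools.lru_cache memoizes a pure function with a single decorator:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Naive recursion is exponential without caching;
    # memoizing each result makes it linear in n.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(80))
print(fib.cache_info())  # hit/miss counts show the cache doing the work
```

This only pays off when the function is pure (same inputs always give the same output) and recomputation costs more than the memory used to store results.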

The Python wiki contains an excellent summary of additional tips categorized by common bottleneck types.

Multi-threading and Multi-Processing

Python enables leveraging multiple CPUs to speed up an application through parallel processing and asynchronous programming. The concurrent.futures module provides thread and process pool abstractions for common patterns:

import concurrent.futures
import math

PRIMES = [112272535095293] * 100

def is_prime(n):
    # cpu-heavy primality check by trial division
    if n < 2:
        return False
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True

with concurrent.futures.ProcessPoolExecutor() as executor:
    for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
        print('%d is prime: %s' % (number, prime))

We parallelize independent is_prime checks across multiple child processes. This vastly reduces total execution time through parallelization.

Key capabilities:

  • ThreadPoolExecutor – For IO-bound jobs where threads can overlap blocking waits
  • ProcessPoolExecutor – For CPU-bound jobs where GIL contention arises; parallelizes across processes
  • Futures and context manager API – Simplifies async coordination

Correct use of the concurrency module prevents common multi-threading pitfalls around deadlocks and race conditions. The official guide covers patterns for efficiently scaling diverse workloads.
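To illustrate the IO-bound case, a ThreadPoolExecutor overlaps blocking waits across threads; here time.sleep stands in for a network or disk call:

```python
import concurrent.futures
import time

def fake_io(task_id):
    # Simulated blocking IO call (e.g. an HTTP request or disk read)
    time.sleep(0.1)
    return task_id * 2

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Five 0.1s waits run concurrently instead of sequentially
    results = list(executor.map(fake_io, range(5)))
elapsed = time.perf_counter() - start

print(results)                 # [0, 2, 4, 6, 8]
print("elapsed: %.2fs" % elapsed)
```

With five workers the five waits overlap, so total wall time is roughly 0.1s rather than the 0.5s a sequential loop would take.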

Conclusion

In this comprehensive 2600+ word guide, we went through the essential steps to get the latest production-ready Python 3 release installed on CentOS 7. We covered best practices around virtual environments, Python 3 migration, dependency management, profiling techniques and multiprocessing to build robust and optimized data applications.

The IUS community repository offers newer versions of Python and other software conveniently packaged and tested for Enterprise Linux compatibility. Tools like Poetry and Pipenv vastly improve dependency management over basic pip and virtualenvs. Concurrency primitives offer simple parallelism while NumPy provides C-speed vectorization where possible.

Migrating to Python 3 unlocks access to an actively maintained language version along with thousands of cutting-edge libraries powering modern machine learning and data analytics applications today. It enables building highly scalable and maintainable applications on cost-efficient and stable CentOS platforms.

Let me know if you have any other tips or tricks for maximizing Python 3 performance on CentOS!
