Lecture 4: Modules, Packages & Regular Expressions
1. Introduction to Modules
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py.
Creating a Simple Module
my_math.py
"""A simple math module with basic operations."""

def add(a, b):
    """Return the sum of a and b."""
    return a + b

def subtract(a, b):
    """Return the difference between a and b."""
    return a - b

def multiply(a, b):
    """Return the product of a and b."""
    return a * b

def divide(a, b):
    """Return the quotient of a divided by b."""
    if b == 0:
        raise ValueError("Cannot divide by zero!")
    return a / b

# This code runs only when the module is executed directly
if __name__ == "__main__":
    print("Running my_math module directly")
    print(f"2 + 3 = {add(2, 3)}")
    print(f"5 - 2 = {subtract(5, 2)}")
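The `if __name__ == "__main__"` guard works because Python sets `__name__` differently depending on how a file is loaded. The sketch below illustrates this with a throwaway module written to a temporary directory; the module name `name_demo_mod` and the `ROLE` variable are hypothetical and exist only for this illustration.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Write a tiny module (hypothetical name) that records how it was loaded.
tmp_dir = Path(tempfile.mkdtemp())
module_path = tmp_dir / "name_demo_mod.py"
module_path.write_text("ROLE = 'script' if __name__ == '__main__' else 'module'\n")

# Case 1: a normal import, so __name__ is the module's own name.
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()
imported = importlib.import_module("name_demo_mod")

# Case 2: run the same file as a script, so __name__ is "__main__".
as_script = runpy.run_path(str(module_path), run_name="__main__")

print(imported.ROLE)       # module
print(as_script["ROLE"])   # script
```

This is why the demo prints at the bottom of my_math.py stay silent when another program imports the module.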
Using the Module
# Import the entire module
import my_math
# Use functions with module name prefix
result = my_math.add(5, 3)
print(f"5 + 3 = {result}")
# Import specific functions
from my_math import multiply, divide
# Use directly without module name
product = multiply(4, 6)
print(f"4 * 6 = {product}")
# Import with alias
import my_math as mm
print(f"10 - 4 = {mm.subtract(10, 4)}")
# Import all names (not recommended)
from my_math import *
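All of these import forms load the module once and then bind names to the same objects. A small sketch of that behavior, using the standard `math` module as a stand-in for my_math (an assumption made so the snippet runs without the file above):

```python
import sys

import math
import math as m
from math import sqrt

# An alias is just another name for the same module object.
same_module = m is math

# A name imported with "from ... import" is the same function object.
same_function = sqrt is math.sqrt

# Loaded modules are cached in sys.modules, so repeated imports are cheap.
cached = sys.modules["math"] is math

print(same_module, same_function, cached)  # True True True
```

The choice between the forms is therefore about readability and namespace hygiene, not about what gets loaded.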
Module Search Path
When a module is imported, Python searches for it in the following order:
- The directory containing the input script (or the current directory)
- The list of directories contained in the PYTHONPATH environment variable
- An installation-dependent list of directories configured at Python installation time
You can view the search path with:
import sys
print(sys.path)
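Because `sys.path` is an ordinary list, a program can extend it at runtime. The following sketch assumes a hypothetical module named `path_demo_mod` stored in a temporary directory; it shows the import failing before the directory is on the path and succeeding afterwards.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway directory holding a tiny module (hypothetical name).
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "path_demo_mod.py").write_text(
    "def greet():\n    return 'hello from path_demo_mod'\n"
)

# The directory is not on sys.path yet, so the import should fail.
try:
    importlib.import_module("path_demo_mod")
    found_before = True
except ModuleNotFoundError:
    found_before = False

# Prepending the directory makes the module importable.
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
path_demo_mod = importlib.import_module("path_demo_mod")
message = path_demo_mod.greet()

print(found_before)  # False
print(message)       # hello from path_demo_mod
```

Editing `sys.path` is handy for experiments; for real projects, installing the code or setting PYTHONPATH is usually the cleaner option.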
2. Creating and Using Packages
A package is a way of organizing related modules into a single directory hierarchy.
Package Structure
my_package/
    __init__.py        # Makes the directory a Python package
    module1.py         # Module 1
    module2.py         # Module 2
    subpackage/        # Subpackage
        __init__.py    # Makes the subdirectory a package
        module3.py     # Module in subpackage
Package Initialization
The __init__.py file can be empty or can contain initialization code for the package.
# my_package/__init__.py
"""
This is the my_package package.
It provides useful utilities for various tasks.
"""
# You can define what gets imported with 'from my_package import *'
__all__ = ['module1', 'module2']
# Package-level variables
VERSION = '1.0.0'
# You can also import functions to make them available at the package level
from .module1 import some_function
print("Initializing my_package...")
Using the Package
# Import from the package
import my_package.module1
from my_package import module2
from my_package.subpackage import module3
# Using the imported modules
my_package.module1.some_function()
module2.another_function()
# Import specific functions
from my_package.module1 import specific_function
# Using package-level imports (if defined in __init__.py)
from my_package import some_function
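To try these imports without creating files by hand, the sketch below builds a small package in a temporary directory and imports from it. The package name `demo_pkg` and the function `deep_function` are hypothetical; the layout mirrors the structure shown earlier, trimmed to two modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the package layout on disk (all names here are illustrative).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
sub = pkg / "subpackage"
sub.mkdir(parents=True)

(pkg / "__init__.py").write_text(
    "VERSION = '1.0.0'\n"
    "from .module1 import some_function\n"   # re-export at package level
)
(pkg / "module1.py").write_text(
    "def some_function():\n    return 'module1 says hi'\n"
)
(sub / "__init__.py").write_text("")
(sub / "module3.py").write_text(
    "def deep_function():\n    return 'from the subpackage'\n"
)

# Make the parent directory importable, then import package and submodule.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")
module3 = importlib.import_module("demo_pkg.subpackage.module3")

print(demo_pkg.VERSION)           # 1.0.0
print(demo_pkg.some_function())   # module1 says hi
print(module3.deep_function())    # from the subpackage
```

Importing `demo_pkg` runs its `__init__.py` once, which is what makes `VERSION` and `some_function` available directly on the package.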
3. Standard Library Modules
Commonly Used Standard Modules
| Module | Description | Common Uses |
|---|---|---|
| `sys` | System-specific parameters and functions | Command-line arguments, interpreter settings |
| `os` | Operating system interfaces | File/directory operations, process management |
| `math` | Mathematical functions | Trigonometry, logarithms, constants |
| `datetime` | Date and time handling | Date arithmetic, formatting, timezones |
| `json` | JSON data handling | Reading/writing JSON files, parsing JSON data |
| `re` | Regular expressions | Pattern matching, string searching |
| `collections` | Specialized container datatypes | Counters, named tuples, default dictionaries |
| `itertools` | Functions creating iterators | Efficient looping, combinations, permutations |
Example: Using Standard Modules
import os
import sys
import math
from datetime import datetime
import json
# Using os module
current_dir = os.getcwd()
print(f"Current directory: {current_dir}")
# Using sys module
print(f"Python version: {sys.version}")
print(f"Command line arguments: {sys.argv}")
# Using math module
print(f"Square root of 16: {math.sqrt(16)}")
print(f"Value of pi: {math.pi}")
# Using datetime module
now = datetime.now()
print(f"Current date and time: {now}")
print(f"Formatted date: {now.strftime('%Y-%m-%d %H:%M:%S')}")
# Using json module
data = {
    'name': 'John Doe',
    'age': 30,
    'courses': ['Math', 'Physics', 'Chemistry']
}
# Convert to JSON string
json_str = json.dumps(data, indent=2)
print("JSON data:")
print(json_str)
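The table also lists `collections` and `itertools`, which the example above does not exercise. A brief sketch of their most common tools follows; the word list and the `Point` type are made-up sample data.

```python
from collections import Counter, defaultdict, namedtuple
from itertools import combinations, permutations

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]

# Counter: tally how often each item occurs.
counts = Counter(words)
most_common = counts.most_common(1)

# defaultdict: group words by first letter without checking for missing keys.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# namedtuple: a lightweight record with named fields.
Point = namedtuple("Point", ["x", "y"])
p = Point(2, 3)

# itertools: unordered pairs and ordered arrangements.
courses = ["Math", "Physics", "Chemistry"]
pairs = list(combinations(courses, 2))
orderings = list(permutations(courses, 2))

print(most_common)       # [('apple', 3)]
print(by_letter["b"])    # ['banana', 'banana']
print(p.x + p.y)         # 5
print(len(pairs), len(orderings))  # 3 6
```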
4. Introduction to Regular Expressions
Regular expressions (regex) are a powerful tool for pattern matching and manipulation of strings.
Basic Regex Patterns
| Pattern | Description | Example | Matches |
|---|---|---|---|
| `.` | Matches any character except newline | `a.c` | abc, aac, a1c, etc. |
| `^` | Start of string | `^Hello` | Hello world (but not Say Hello) |
| `$` | End of string | `world$` | Hello world (but not world peace) |
| `*` | 0 or more repetitions | `ab*c` | ac, abc, abbc, abbbc, etc. |
| `+` | 1 or more repetitions | `ab+c` | abc, abbc, abbbc, etc. (not ac) |
| `?` | 0 or 1 repetition | `ab?c` | ac, abc (not abbc) |
| `{m,n}` | m to n repetitions | `a{2,4}b` | aab, aaab, aaaab |
| `\d` | Digit (0-9) | `\d{3}` | 123, 456, 000, etc. |
| `\w` | Word character (a-z, A-Z, 0-9, _) | `\w+` | hello, Python3, user_name |
| `\s` | Whitespace (space, tab, newline) | `\s+` | one or more spaces, tabs, or newlines |
| `[abc]` | Any of a, b, or c | `[aeiou]` | a, e, i, o, u |
| `[^abc]` | Not a, b, or c | `[^0-9]` | a, b, !, @, etc. (not digits) |
| `\|` | OR operator | `cat\|dog` | cat or dog |
| `()` | Grouping | `(ab)+` | ab, abab, ababab, etc. |
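The rows of this table can be checked directly in Python. The sketch below uses `re.fullmatch`, which succeeds only when the whole string matches, so each pattern is tested exactly as the table describes; the list of test strings is an illustrative selection, not an exhaustive one.

```python
import re

# (pattern, candidate string, should the whole string match?)
cases = [
    (r"a.c", "a1c", True),
    (r"ab*c", "ac", True),
    (r"ab+c", "ac", False),       # + needs at least one b
    (r"ab?c", "abbc", False),     # ? allows at most one b
    (r"a{2,4}b", "aaab", True),
    (r"a{2,4}b", "ab", False),    # only one a
    (r"\d{3}", "12a", False),
    (r"\w+", "user_name", True),
    (r"[aeiou]", "e", True),
    (r"[^0-9]", "7", False),
    (r"cat|dog", "dog", True),
    (r"(ab)+", "ababab", True),
]

# Collect any rows where the regex engine disagrees with the table.
mismatches = [
    (pattern, text)
    for pattern, text, expected in cases
    if bool(re.fullmatch(pattern, text)) != expected
]
print(mismatches)  # []

# Anchors are easier to see with re.search, which scans the whole string.
starts_with_hello = bool(re.search(r"^Hello", "Say Hello"))
ends_with_world = bool(re.search(r"world$", "Hello world"))
print(starts_with_hello, ends_with_world)  # False True
```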
5. Using the re Module
Common re Functions
| Function | Description | Example |
|---|---|---|
| `re.search()` | Search for a pattern anywhere in the string | `re.search(r'\d+', 'abc123')` |
| `re.match()` | Match pattern at the beginning of the string | `re.match(r'\d+', '123abc')` |
| `re.findall()` | Find all non-overlapping matches | `re.findall(r'\d+', 'a1b22c333')` |
| `re.finditer()` | Return an iterator yielding match objects | `for m in re.finditer(r'\d+', 'a1b22c333'):` |
| `re.sub()` | Replace occurrences of a pattern | `re.sub(r'\d', '#', 'a1b2c3')` |
| `re.split()` | Split string by pattern | `re.split(r'\s+', 'split on whitespace')` |
| `re.compile()` | Compile a regex pattern for reuse | `pattern = re.compile(r'\d{3}-\d{2}-\d{4}')` |
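To make the differences between these functions concrete, the sketch below runs each table example and captures what it returns. The variable names are illustrative, and the number passed to the compiled pattern is made-up sample data.

```python
import re

# search scans the whole string; match only tries position 0.
found_anywhere = re.search(r"\d+", "abc123").group()
at_start = re.match(r"\d+", "123abc").group()
not_at_start = re.match(r"\d+", "abc123")   # None: string starts with letters

# findall returns strings; finditer yields match objects with positions.
numbers = re.findall(r"\d+", "a1b22c333")
spans = [m.span() for m in re.finditer(r"\d+", "a1b22c333")]

# sub replaces every match; split cuts the string at every match.
masked = re.sub(r"\d", "#", "a1b2c3")
parts = re.split(r"\s+", "split on   whitespace")

# compile builds a reusable pattern object with the same methods.
pattern = re.compile(r"\d{3}-\d{2}-\d{4}")
is_valid = pattern.fullmatch("123-45-6789") is not None

print(found_anywhere, at_start, not_at_start)  # 123 123 None
print(numbers)    # ['1', '22', '333']
print(spans)      # [(1, 2), (3, 5), (6, 9)]
print(masked)     # a#b#c#
print(parts)      # ['split', 'on', 'whitespace']
print(is_valid)   # True
```

Note that `re.search()` and `re.match()` return `None` when nothing matches, so results should be checked before calling `.group()`.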
Regex Examples
import re
# Example 1: Simple search
text = "The quick brown fox jumps over the lazy dog."
match = re.search(r'fox', text)
if match:
    print(f"Found 'fox' at position {match.start()}-{match.end()}")
# Example 2: Find all email addresses
text = "Contact us at support@example.com or sales@company.org"
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(f"Found emails: {emails}")
# Example 3: Extract dates in format YYYY-MM-DD
text = "Events on 2023-10-15 and 2023-11-20"
dates = re.findall(r'\b\d{4}-\d{2}-\d{2}\b', text)
print(f"Found dates: {dates}")
# Example 4: Replace phone numbers
text = "Call 555-123-4567 or 800-555-0000"
masked = re.sub(r'\b\d{3}-\d{3}-\d{4}\b', 'XXX-XXX-XXXX', text)
print(f"Masked text: {masked}")
# Example 5: Using groups for extraction
text = "Name: John Doe, Age: 30, City: New York"
pattern = r'Name: (\w+ \w+), Age: (\d+), City: (\w+ \w+)'
match = re.search(pattern, text)
if match:
    name, age, city = match.groups()
    print(f"Name: {name}, Age: {age}, City: {city}")
# Example 6: Case-insensitive search
text = "Python is awesome! PYTHON is great! python is fun!"
matches = re.findall(r'python', text, re.IGNORECASE)
print(f"Found 'python' {len(matches)} times (case-insensitive)")
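Example 5 retrieves groups by position. Groups can also be given names with `(?P<name>...)`, and the `re.VERBOSE` flag allows whitespace and comments inside a pattern; the log analysis script in the next section relies on both. A minimal sketch, reusing the date text from Example 3 (the group names are chosen for illustration):

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows # comments.
date_regex = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,
)

text = "Events on 2023-10-15 and 2023-11-20"

# Named groups can be read individually or all at once as a dict.
first = date_regex.search(text)
first_parts = first.groupdict()
first_month = first.group("month")

# group() accepts several names and returns a tuple.
all_dates = [m.group("year", "month", "day") for m in date_regex.finditer(text)]

print(first_parts)   # {'year': '2023', 'month': '10', 'day': '15'}
print(first_month)   # 10
print(all_dates)     # [('2023', '10', '15'), ('2023', '11', '20')]
```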
6. Practical Example: Log File Analysis
Let's create a script that analyzes a web server log file using regular expressions.
import re
from collections import defaultdict
from datetime import datetime
def analyze_log_file(log_file):
    """Analyze a web server log file and extract useful information."""
    # Common log format: 127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612
    log_pattern = r'''(?P<remote_addr>\d+\.\d+\.\d+\.\d+)\s+  # IP address
        [^\[\]]+\s+                               # Remote user (ignored)
        \[([^\]]+)\]\s+                           # Timestamp
        "(?:\w+\s+)?(?P<url>[^\s"]+)\s+[^"]*"\s+  # Request URL
        (?P<status>\d{3})\s+                      # Status code
        (?P<size>\d+|-)\s*                        # Response size
    '''
    # Compile the regex with verbose flag for comments
    log_regex = re.compile(log_pattern, re.VERBOSE)

    # Store statistics
    stats = {
        'total_requests': 0,
        'status_codes': defaultdict(int),
        'popular_pages': defaultdict(int),
        'requests_by_hour': defaultdict(int),
        'unique_ips': set(),
        'total_bytes': 0
    }

    # Process each line in the log file
    with open(log_file, 'r') as f:
        for line in f:
            match = log_regex.search(line)
            if not match:
                continue  # Skip lines that are not in the expected format
            stats['total_requests'] += 1

            # Extract data from the match
            ip = match.group('remote_addr')
            timestamp_str = match.group(2)  # Group 2 is the (unnamed) timestamp
            url = match.group('url')
            status = match.group('status')
            size = match.group('size')

            # Update statistics
            stats['status_codes'][status] += 1
            stats['popular_pages'][url] += 1
            stats['unique_ips'].add(ip)

            # Parse timestamp to get hour
            try:
                # Handle different timestamp formats if needed
                timestamp = datetime.strptime(timestamp_str, '%d/%b/%Y:%H:%M:%S %z')
                hour = timestamp.strftime('%Y-%m-%d %H:00')
                stats['requests_by_hour'][hour] += 1
            except ValueError:
                pass  # Ignore timestamp parsing errors

            # Add response size to total ('-' means no body was sent)
            if size != '-':
                stats['total_bytes'] += int(size)

    # Generate report
    print("Log Analysis Report")
    print("=" * 80)
    print(f"Total requests: {stats['total_requests']}")
    print(f"Total data transferred: {stats['total_bytes'] / (1024*1024):.2f} MB")
    print(f"Unique IP addresses: {len(stats['unique_ips'])}")

    print("\nStatus Code Distribution:")
    for code, count in sorted(stats['status_codes'].items()):
        print(f"  {code}: {count} requests")

    print("\nTop 5 Most Popular Pages:")
    for url, count in sorted(stats['popular_pages'].items(),
                             key=lambda x: x[1], reverse=True)[:5]:
        print(f"  {url}: {count} requests")

    print("\nRequests by Hour:")
    for hour, count in sorted(stats['requests_by_hour'].items()):
        print(f"  {hour}: {count} requests")
# Example usage
if __name__ == "__main__":
    log_file = "access.log"  # Replace with your log file path
    try:
        analyze_log_file(log_file)
    except FileNotFoundError:
        print(f"Error: Log file '{log_file}' not found.")
    except Exception as e:
        print(f"An error occurred: {e}")
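To experiment with the parsing logic without a real access.log, the same kind of pattern can be run against a few in-memory lines. The sketch below is a condensed variant rather than the script above: the sample lines are invented, the timestamp group is given a name for readability, and `Counter` replaces `defaultdict(int)`.

```python
import re
from collections import Counter
from datetime import datetime

LOG_REGEX = re.compile(
    r'''(?P<remote_addr>\d+\.\d+\.\d+\.\d+)\s+     # IP address
        [^\[\]]+\s+                                # identity and user fields (ignored)
        \[(?P<timestamp>[^\]]+)\]\s+               # timestamp
        "(?:\w+\s+)?(?P<url>[^\s"]+)\s+[^"]*"\s+   # request URL
        (?P<status>\d{3})\s+                       # status code
        (?P<size>\d+|-)                            # response size, or - for none
    ''',
    re.VERBOSE,
)

# Made-up sample data in Common Log Format, plus one malformed line.
SAMPLE_LINES = [
    '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612',
    '192.168.1.5 - - [10/Oct/2023:13:58:02 +0000] "GET /about HTTP/1.1" 200 1024',
    '127.0.0.1 - - [10/Oct/2023:14:01:10 +0000] "GET /missing HTTP/1.1" 404 -',
    'this line is not a log entry',
]

status_counts = Counter()
requests_by_hour = Counter()
unique_ips = set()
total_bytes = 0
skipped = 0

for line in SAMPLE_LINES:
    match = LOG_REGEX.search(line)
    if not match:
        skipped += 1          # malformed lines are counted, not parsed
        continue
    status_counts[match.group("status")] += 1
    unique_ips.add(match.group("remote_addr"))
    stamp = datetime.strptime(match.group("timestamp"), "%d/%b/%Y:%H:%M:%S %z")
    requests_by_hour[stamp.strftime("%Y-%m-%d %H:00")] += 1
    if match.group("size") != "-":
        total_bytes += int(match.group("size"))

print(dict(status_counts))      # {'200': 2, '404': 1}
print(dict(requests_by_hour))   # {'2023-10-10 13:00': 2, '2023-10-10 14:00': 1}
print(len(unique_ips), total_bytes, skipped)  # 2 1636 1
```

Counting skipped lines, rather than silently ignoring them, is a small design choice that makes it easier to notice when the pattern does not fit a given server's log format.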