Lecture 4: Modules, Packages & Regular Expressions

1. Introduction to Modules

A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py.

Creating a Simple Module

my_math.py

"""A simple math module with basic operations.""" def add(a, b): """Return the sum of a and b.""" return a + b def subtract(a, b): """Return the difference between a and b.""" return a - b def multiply(a, b): """Return the product of a and b.""" return a * b def divide(a, b): """Return the quotient of a divided by b.""" if b == 0: raise ValueError("Cannot divide by zero!") return a / b # This code runs when the module is executed directly if __name__ == "__main__": print("Running my_math module directly") print(f"2 + 3 = {add(2, 3)}") print(f"5 - 2 = {subtract(5, 2)}")

Using the Module

# Import the entire module
import my_math

# Use functions with the module name prefix
result = my_math.add(5, 3)
print(f"5 + 3 = {result}")

# Import specific functions
from my_math import multiply, divide

# Use them directly, without the module name
product = multiply(4, 6)
print(f"4 * 6 = {product}")

# Import with an alias
import my_math as mm
print(f"10 - 4 = {mm.subtract(10, 4)}")

# Import all public names (not recommended: it hides where each name comes from)
from my_math import *

Module Search Path

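When you import a module, Python first reuses it if it has already been imported (imported modules are cached in sys.modules). Next it checks for a built-in module with that name (listed in sys.builtin_module_names). If neither applies, it searches the directories in the list sys.path, in order. sys.path is initialized from:

The directory containing the input script (or the current directory when no script is given, as in the interactive interpreter)
The directories listed in the PYTHONPATH environment variable
An installation-dependent default, which by convention includes the site-packages directory where third-party packages are installed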

You can view the search path with:

import sys
print(sys.path)  # a list of directory strings, searched in order
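Here is a minimal sketch of the search path in action. It writes a throwaway module to a temporary directory; the name `greetings` is hypothetical. The import only succeeds once that directory is on `sys.path`. The sketch calls `importlib.invalidate_caches()` because the file is created after the interpreter has started. Editing `sys.path` at runtime is convenient for experiments, but in a real project, setting PYTHONPATH or installing the package is usually the cleaner option.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A hypothetical module in a directory that is not on sys.path yet.
    (Path(tmp) / "greetings.py").write_text("def hello():\n    return 'hello'\n")

    sys.path.insert(0, tmp)            # searched before every other entry
    try:
        importlib.invalidate_caches()  # let the finders notice the new file
        greetings = importlib.import_module("greetings")
        print(greetings.hello())       # hello
        print(greetings.__file__)      # a path inside the temporary directory
    finally:
        sys.path.remove(tmp)               # undo the sys.path change
        sys.modules.pop("greetings", None)  # drop the cached module
```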

2. Creating and Using Packages

A package is a directory of related modules (and, optionally, subpackages) that you import with dotted names such as my_package.module1.

Package Structure

my_package/
    __init__.py          # Makes the directory a Python package
    module1.py           # Module 1
    module2.py           # Module 2
    subpackage/          # Subpackage
        __init__.py      # Makes subdirectory a package
        module3.py       # Module in subpackage

Package Initialization

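The __init__.py file can be empty or can contain initialization code for the package. It runs the first time the package, or any module inside it, is imported. (Since Python 3.3 a directory without __init__.py can still be imported as a "namespace package", but a regular package with __init__.py is the usual choice.)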

# my_package/__init__.py
"""
This is the my_package package.
It provides useful utilities for various tasks.
"""

# You can define what gets imported with 'from my_package import *'
__all__ = ['module1', 'module2']

# Package-level variables
VERSION = '1.0.0'

# You can also import functions to make them available at the package level
from .module1 import some_function

print("Initializing my_package...")
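`__all__` works the same way in an ordinary module as in a package's __init__.py. It lists the names that `from ... import *` exports. In a package, listing submodule names (as above) makes the star import load those submodules. The sketch below builds a small module in memory with a hypothetical name, `demo_all`, and shows that only the listed name is exported.

```python
import sys
import types

# Build a hypothetical module in memory instead of writing a file.
demo = types.ModuleType("demo_all")
exec(
    "__all__ = ['public_func']\n"
    "def public_func():\n    return 'public'\n"
    "def helper():\n    return 'helper'\n",
    demo.__dict__,
)
sys.modules["demo_all"] = demo  # register it so the import statement finds it

namespace = {}
exec("from demo_all import *", namespace)

print("public_func" in namespace)  # True: listed in __all__
print("helper" in namespace)       # False: not listed, so * skips it

sys.modules.pop("demo_all")  # clean up the registration
```

Without `__all__`, the star import would export every name that does not start with an underscore, so `helper` would be included too.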

Using the Package

# Import from the package
import my_package.module1
from my_package import module2
from my_package.subpackage import module3

# Using the imported modules
my_package.module1.some_function()
module2.another_function()

# Import specific functions
from my_package.module1 import specific_function

# Using package-level imports (if defined in __init__.py)
from my_package import some_function
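The snippets above assume my_package already exists on disk. This self-contained sketch creates that layout in a temporary directory and then runs several of the import forms shown. The file contents are placeholders: `some_function` and `another_function` return fixed strings, and `module3.NAME` is invented for the demo. The print call from __init__.py is left out to keep the output short.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents mirroring the package layout above.
FILES = {
    "my_package/__init__.py": (
        "__all__ = ['module1', 'module2']\n"
        "VERSION = '1.0.0'\n"
        "from .module1 import some_function\n"
    ),
    "my_package/module1.py": "def some_function():\n    return 'module1.some_function'\n",
    "my_package/module2.py": "def another_function():\n    return 'module2.another_function'\n",
    "my_package/subpackage/__init__.py": "",
    "my_package/subpackage/module3.py": "NAME = 'module3'\n",
}

with tempfile.TemporaryDirectory() as root:
    for rel_path, source in FILES.items():
        path = Path(root) / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        # module2 is not imported by __init__.py; "from package import name"
        # falls back to importing the submodule my_package.module2.
        from my_package import module2, some_function, VERSION
        from my_package.subpackage import module3

        print(VERSION)                     # 1.0.0
        print(some_function())             # module1.some_function
        print(module2.another_function())  # module2.another_function
        print(module3.NAME)                # module3
    finally:
        sys.path.remove(root)
        # Forget the temporary package so later code cannot pick it up by accident.
        for name in [m for m in sys.modules if m.split(".")[0] == "my_package"]:
            del sys.modules[name]
```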

3. Standard Library Modules

Commonly Used Standard Modules

Module	Description	Common Uses
`sys`	System-specific parameters and functions	Command-line arguments, interpreter settings
`os`	Operating system interfaces	File/directory operations, process management
`math`	Mathematical functions	Trigonometry, logarithms, constants
`datetime`	Date and time handling	Date arithmetic, formatting, timezones
`json`	JSON data handling	Reading/writing JSON files, parsing JSON data
`re`	Regular expressions	Pattern matching, string searching
`collections`	Specialized container datatypes	Counters, named tuples, default dictionaries
`itertools`	Functions creating iterators	Efficient looping, combinations, permutations
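The example that follows the table does not use `collections` or `itertools`. This short sketch covers the uses listed for them: counters, named tuples, default dictionaries, combinations and permutations. The words and points are made up for the demo.

```python
from collections import Counter, defaultdict, namedtuple
from itertools import combinations, count, islice, permutations

# Counter: tally hashable items.
word_counts = Counter("the cat and the hat".split())
print(word_counts.most_common(1))  # [('the', 2)]

# namedtuple: a lightweight record with named fields.
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
print(p.x, p.y)  # 3 4

# defaultdict: a missing key gets a fresh value from the factory (here, list).
by_length = defaultdict(list)
for word in ["a", "to", "be", "cat"]:
    by_length[len(word)].append(word)
print(dict(by_length))  # {1: ['a'], 2: ['to', 'be'], 3: ['cat']}

# itertools: combinations and permutations without nested loops.
print(list(combinations("ABC", 2)))    # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(len(list(permutations("ABC"))))  # 6
# count() is infinite; islice() takes just the first few values.
print(list(islice(count(10, 5), 3)))   # [10, 15, 20]
```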

Example: Using Standard Modules

import os
import sys
import math
from datetime import datetime
import json

# Using os module
current_dir = os.getcwd()
print(f"Current directory: {current_dir}")

# Using sys module
print(f"Python version: {sys.version}")
print(f"Command line arguments: {sys.argv}")

# Using math module
print(f"Square root of 16: {math.sqrt(16)}")
print(f"Value of pi: {math.pi}")

# Using datetime module
now = datetime.now()
print(f"Current date and time: {now}")
print(f"Formatted date: {now.strftime('%Y-%m-%d %H:%M:%S')}")

# Using json module
data = {
    'name': 'John Doe',
    'age': 30,
    'courses': ['Math', 'Physics', 'Chemistry']
}

# Convert to JSON string
json_str = json.dumps(data, indent=2)
print("JSON data:")
print(json_str)
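The example above only writes JSON and prints the current time. This sketch covers the reverse direction and the "date arithmetic" use from the table. `json.loads()` and `json.load()` parse JSON back into Python objects, with an in-memory `io.StringIO` standing in for a real file. `timedelta` shifts and compares datetimes. Fixed dates are used so the output is reproducible.

```python
import io
import json
from datetime import datetime, timedelta

data = {"name": "John Doe", "age": 30, "courses": ["Math", "Physics", "Chemistry"]}

# dumps()/loads() work with strings.
json_str = json.dumps(data, indent=2)
restored = json.loads(json_str)
print(restored == data)    # True

# dump()/load() work with file objects; StringIO stands in for open(...).
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)             # rewind before reading back
from_file = json.load(buffer)
print(from_file == data)   # True

# Adding a timedelta shifts a datetime; subtracting datetimes gives a timedelta.
start = datetime(2023, 10, 15, 9, 30)
deadline = start + timedelta(days=7, hours=2)
print(deadline.strftime("%Y-%m-%d %H:%M"))        # 2023-10-22 11:30
print((deadline - start).total_seconds() / 3600)  # 170.0
```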

4. Introduction to Regular Expressions

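Regular expressions (regex) are patterns that describe sets of strings. With them you can test whether text matches a pattern, extract the matching pieces, and replace or split text wherever the pattern occurs.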

Basic Regex Patterns

Pattern	Description	Example	Matches
`.`	Matches any character except newline	`a.c`	abc, aac, a1c, etc.
`^`	Start of string	`^Hello`	Hello world (but not Say Hello)
`$`	End of string	`world$`	Hello world (but not world peace)
`*`	0 or more repetitions	`ab*c`	ac, abc, abbc, abbbc, etc.
`+`	1 or more repetitions	`ab+c`	abc, abbc, abbbc, etc. (not ac)
`?`	0 or 1 repetition	`ab?c`	ac, abc (not abbc)
`{m,n}`	m to n repetitions	`a{2,4}b`	aab, aaab, aaaab
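`\d`	Digit (0-9; other Unicode digits also match unless re.ASCII is set)	`\d{3}`	123, 456, 000, etc.
`\w`	Word character (a-z, A-Z, 0-9, _; other Unicode letters and digits also match unless re.ASCII is set)	`\w+`	hello, Python3, user_name
`\s`	Whitespace (space, tab, newline, etc.)	`\s+`	a space, a tab, a newline, or any run of them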
`[abc]`	Any of a, b, or c	`[aeiou]`	a, e, i, o, u
`[^abc]`	Not a, b, or c	`[^0-9]`	a, b, !, @, etc. (not digits)
`|`	OR operator (escape it as `\|` to match a literal pipe)	`cat|dog`	cat or dog
`()`	Grouping	`(ab)+`	ab, abab, ababab, etc.
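One way to check the rows of this table is to run each example pattern against a sample string with `re.fullmatch()`, which succeeds only when the entire string matches. The test strings are chosen for illustration. The anchors `^` and `$` are checked separately with `re.search()`, because `search()` scans the whole string for a match.

```python
import re

# (pattern, test string, should the whole string match?)
cases = [
    (r"a.c", "a1c", True),
    (r"ab*c", "ac", True),
    (r"ab+c", "ac", False),
    (r"ab?c", "abbc", False),
    (r"a{2,4}b", "aaaab", True),
    (r"a{2,4}b", "ab", False),
    (r"\d{3}", "123", True),
    (r"\w+", "user_name", True),
    (r"[^0-9]", "7", False),
    (r"cat|dog", "dog", True),
    (r"(ab)+", "ababab", True),
]

for pattern, text, expected in cases:
    matched = re.fullmatch(pattern, text) is not None
    assert matched == expected, (pattern, text)
    print(f"{pattern!r:12} {text!r:12} {matched}")

# ^ and $ anchor to the start and end of the string.
print(bool(re.search(r"^Hello", "Say Hello")))    # False
print(bool(re.search(r"world$", "world peace")))  # False

# An escaped pipe matches the "|" character itself, not "or".
print(re.findall(r"\|", "a|b|c"))                 # ['|', '|']
```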

5. Using the `re` Module

Common `re` Functions

Function	Description	Example
`re.search()`	Search for a pattern anywhere in the string	`re.search(r'\d+', 'abc123')`
`re.match()`	Match pattern at the beginning of the string	`re.match(r'\d+', '123abc')`
`re.findall()`	Find all non-overlapping matches	`re.findall(r'\d+', 'a1b22c333')`
`re.finditer()`	Return an iterator yielding match objects	`for m in re.finditer(r'\d+', 'a1b22c333'):`
`re.sub()`	Replace occurrences of a pattern	`re.sub(r'\d', '#', 'a1b2c3')`
`re.split()`	Split string by pattern	`re.split(r'\s+', 'split on whitespace')`
`re.compile()`	Compile a regex pattern for reuse	`pattern = re.compile(r'\d{3}-\d{2}-\d{4}')`
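The sketch below runs each function from the table on one made-up sentence. It also uses `re.fullmatch()`, which is not in the table but belongs next to `search()` and `match()`: it succeeds only when the whole string matches. A compiled pattern object has methods that mirror the module-level functions.

```python
import re

text = "Order 66 shipped on 2023-10-15; order 7 is pending."

# search() scans the whole string; match() only tries at position 0.
print(re.search(r"\d+", text).group())  # 66
print(re.match(r"\d+", text))           # None (text starts with "O")
print(re.fullmatch(r"\d+", "2023"))     # <re.Match object; span=(0, 4), match='2023'>

# findall() returns a list of strings; finditer() yields match objects.
print(re.findall(r"\d+", text))         # ['66', '2023', '10', '15', '7']
for m in re.finditer(r"\d+", text):
    print(m.group(), m.span())

# sub() replaces every match; split() cuts the string at every match.
print(re.sub(r"\d", "#", "a1b2c3"))                # a#b#c#
print(re.split(r"\s+", "split   on\twhitespace"))  # ['split', 'on', 'whitespace']

# compile() builds the pattern once so it can be reused.
date_re = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
m = date_re.search(text)
print(m.groups())                       # ('2023', '10', '15')
```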

Regex Examples

import re

# Example 1: Simple search
text = "The quick brown fox jumps over the lazy dog."
match = re.search(r'fox', text)
if match:
    print(f"Found 'fox' at position {match.start()}-{match.end()}")

# Example 2: Find all email addresses
# (The last class is [A-Za-z], not [A-Z|a-z]: inside [...] a | is a literal pipe.)
text = "Contact us at support@example.com or sales@company.org"
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(f"Found emails: {emails}")

# Example 3: Extract dates in format YYYY-MM-DD
text = "Events on 2023-10-15 and 2023-11-20"
dates = re.findall(r'\b\d{4}-\d{2}-\d{2}\b', text)
print(f"Found dates: {dates}")

# Example 4: Replace phone numbers
text = "Call 555-123-4567 or 800-555-0000"
masked = re.sub(r'\b\d{3}-\d{3}-\d{4}\b', 'XXX-XXX-XXXX', text)
print(f"Masked text: {masked}")

# Example 5: Using groups for extraction
text = "Name: John Doe, Age: 30, City: New York"
pattern = r'Name: (\w+ \w+), Age: (\d+), City: (\w+ \w+)'
match = re.search(pattern, text)
if match:
    name, age, city = match.groups()
    print(f"Name: {name}, Age: {age}, City: {city}")

# Example 6: Case-insensitive search
text = "Python is awesome! PYTHON is great! python is fun!"
matches = re.findall(r'python', text, re.IGNORECASE)
print(f"Found 'python' {len(matches)} times (case-insensitive)")
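Example 5 above depends on group positions. Named groups, written `(?P<name>...)`, let you read fields by name, and `groupdict()` returns all of them at once. The log analysis script in the next section relies on this. The sketch below is a variant of Example 5 that uses named groups. Its `[\w ]+` parts are an assumption chosen so that names and cities with more or fewer than two words also match. Captured groups are always strings, so numbers need converting.

```python
import re

text = "Name: John Doe, Age: 30, City: New York"

pattern = re.compile(
    r"Name: (?P<name>[\w ]+), Age: (?P<age>\d+), City: (?P<city>[\w ]+)"
)

match = pattern.search(text)
if match:
    info = match.groupdict()
    print(info)  # {'name': 'John Doe', 'age': '30', 'city': 'New York'}
    # Groups capture text, so convert before doing arithmetic.
    print(match.group("age"), int(match["age"]) + 1)  # 30 31
```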

6. Practical Example: Log File Analysis

Let's create a script that analyzes a web server log file using regular expressions.

import re
from collections import defaultdict
from datetime import datetime

def analyze_log_file(log_file):
    """Analyze a web server log file and extract useful information."""
    # Common log format: 127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612
    log_pattern = r'''
        (?P<remote_addr>\d+\.\d+\.\d+\.\d+)\s+     # IP address
        [^\[\]]+\s+                                # Remote logname and user (ignored)
        \[(?P<timestamp>[^\]]+)\]\s+               # Timestamp
        "(?:\w+\s+)?(?P<url>[^\s"]+)\s+[^"]*"\s+   # Request line: method, URL, protocol
        (?P<status>\d{3})\s+                       # Status code
        (?P<size>\d+|-)\s*                         # Response size ("-" if none)
    '''
    # Compile with re.VERBOSE so whitespace and # comments in the pattern are ignored
    log_regex = re.compile(log_pattern, re.VERBOSE)

    # Store statistics
    stats = {
        'total_requests': 0,
        'status_codes': defaultdict(int),
        'popular_pages': defaultdict(int),
        'requests_by_hour': defaultdict(int),
        'unique_ips': set(),
        'total_bytes': 0,
    }

    # Process each line in the log file
    with open(log_file, 'r') as f:
        for line in f:
            match = log_regex.search(line)
            if not match:
                continue  # Skip lines that are not in the expected format
            stats['total_requests'] += 1

            # Extract data from the named groups
            ip = match.group('remote_addr')
            timestamp_str = match.group('timestamp')
            url = match.group('url')
            status = match.group('status')
            size = match.group('size')

            # Update statistics
            stats['status_codes'][status] += 1
            stats['popular_pages'][url] += 1
            stats['unique_ips'].add(ip)

            # Parse the timestamp to bucket requests by hour
            try:
                timestamp = datetime.strptime(timestamp_str, '%d/%b/%Y:%H:%M:%S %z')
                hour = timestamp.strftime('%Y-%m-%d %H:00')
                stats['requests_by_hour'][hour] += 1
            except ValueError:
                pass  # Ignore timestamp parsing errors

            # Add response size to the total ("-" means no body was sent)
            if size != '-':
                stats['total_bytes'] += int(size)

    # Generate report
    print("Log Analysis Report")
    print("=" * 80)
    print(f"Total requests: {stats['total_requests']}")
    print(f"Total data transferred: {stats['total_bytes'] / (1024 * 1024):.2f} MB")
    print(f"Unique IP addresses: {len(stats['unique_ips'])}")

    print("\nStatus Code Distribution:")
    for code, count in sorted(stats['status_codes'].items()):
        print(f"  {code}: {count} requests")

    print("\nTop 5 Most Popular Pages:")
    top_pages = sorted(stats['popular_pages'].items(), key=lambda x: x[1], reverse=True)[:5]
    for url, count in top_pages:
        print(f"  {url}: {count} requests")

    print("\nRequests by Hour:")
    for hour, count in sorted(stats['requests_by_hour'].items()):
        print(f"  {hour}: {count} requests")

# Example usage
if __name__ == "__main__":
    log_file = "access.log"  # Replace with your log file path
    try:
        analyze_log_file(log_file)
    except FileNotFoundError:
        print(f"Error: Log file '{log_file}' not found.")
    except Exception as e:
        print(f"An error occurred: {e}")
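To try the parsing logic without a real access.log, this sketch applies a one-line version of the log pattern to a few invented sample lines in Common Log Format. It uses the group names that analyze_log_file() reads (remote_addr, url, status, size) plus a named timestamp group. The sample lines are made up, and the last one is deliberately malformed to show that non-matching lines are skipped.

```python
import re
from collections import Counter
from datetime import datetime

# One-line equivalent of a verbose Common Log Format pattern.
LOG_RE = re.compile(
    r'(?P<remote_addr>\d+\.\d+\.\d+\.\d+)\s+[^\[\]]+\s+'
    r'\[(?P<timestamp>[^\]]+)\]\s+'
    r'"(?:\w+\s+)?(?P<url>[^\s"]+)\s+[^"]*"\s+'
    r'(?P<status>\d{3})\s+(?P<size>\d+|-)'
)

# Invented sample lines (not real traffic).
sample_lines = [
    '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612',
    '10.0.0.5 - - [10/Oct/2023:13:57:02 +0000] "GET /about HTTP/1.1" 200 1024',
    '10.0.0.5 - - [10/Oct/2023:14:01:10 +0000] "POST /login HTTP/1.1" 302 -',
    'this line is not a log entry',
]

status_counts = Counter()
hours = Counter()
total_bytes = 0
for line in sample_lines:
    match = LOG_RE.search(line)
    if not match:
        continue  # skip malformed lines
    status_counts[match["status"]] += 1
    ts = datetime.strptime(match["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    hours[ts.strftime("%Y-%m-%d %H:00")] += 1
    if match["size"] != "-":
        total_bytes += int(match["size"])

print(dict(status_counts))  # {'200': 2, '302': 1}
print(dict(hours))          # {'2023-10-10 13:00': 2, '2023-10-10 14:00': 1}
print(total_bytes)          # 1636
```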