This project downloads and analyzes SEC DEF 14A (proxy statement) filings to determine:
- If the annual shareholder meeting is scheduled to be held in-person.
- If the in-person meeting location is in a user-specified target city (and optionally state/region).
The analysis focuses on achieving high precision and recall through careful text processing and dynamic pattern matching using regular expressions.
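One way to picture the dynamic pattern matching is a regular expression built at runtime from the user-supplied city and state. The sketch below is illustrative only: `build_location_pattern` is a hypothetical helper, not necessarily what `meeting_analyzer.py` does.

```python
import re


def build_location_pattern(city, state=None):
    """Compile a case-insensitive pattern for `city`, optionally followed by `state`.

    Hypothetical helper for illustration; the project's real patterns may differ.
    """
    # Escape user input; let any run of whitespace (including line breaks)
    # in the filing stand in for a space inside the city name.
    city_part = r"\s+".join(re.escape(word) for word in city.split())
    pattern = r"(?<!\w)" + city_part
    if state:
        state_part = r"\s+".join(re.escape(word) for word in state.split())
        # Accept "Chicago, IL" as well as "Chicago IL".
        pattern += r"(?:\s*,\s*|\s+)" + state_part
    # Refuse to match inside a longer word ("Bostonian", "California").
    pattern += r"(?!\w)"
    return re.compile(pattern, re.IGNORECASE)


palo_alto = build_location_pattern("Palo Alto", "CA")
print(bool(palo_alto.search("3000 Hanover Street, Palo Alto, CA 94304")))
```

Lookarounds are used instead of `\b` so that names ending in punctuation (for example "St.") still match at their edges.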
- Automated downloading of DEF 14A filings from SEC EDGAR.
- Handles HTML and TXT filing formats.
- Extracts relevant text content using BeautifulSoup.
- Uses sophisticated Regular Expressions to identify meeting format (In-Person, Virtual, Hybrid).
- Accepts command-line arguments for the target city and state/region.
- Dynamically checks for the specified location within the meeting context.
- Provides confidence levels for the analysis (High, Medium, Low).
- Outputs results to a CSV file, including snippets for verification.
- Includes rate limiting for SEC EDGAR.
- Docker support for easy execution.
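As a rough illustration of the meeting-format detection listed above, the following sketch classifies text with two small keyword patterns. The phrase lists, the `classify_meeting_format` name, and the `"Unknown"` fallback are assumptions made for this example. The project's actual expressions are likely more extensive.

```python
import re

# Assumed phrase lists, chosen for illustration only.
VIRTUAL_RE = re.compile(
    r"\b(?:virtual(?:ly|-only)?|live\s+webcast|online\s+meeting|via\s+the\s+internet)\b",
    re.IGNORECASE,
)
IN_PERSON_RE = re.compile(
    r"\b(?:in[\s-]person|will\s+be\s+held\s+at|located\s+at)\b",
    re.IGNORECASE,
)


def classify_meeting_format(text):
    """Return "Hybrid", "Virtual", "In-Person", or "Unknown" for a text span."""
    has_virtual = bool(VIRTUAL_RE.search(text))
    has_in_person = bool(IN_PERSON_RE.search(text))
    if has_virtual and has_in_person:
        return "Hybrid"
    if has_virtual:
        return "Virtual"
    if has_in_person:
        return "In-Person"
    return "Unknown"


print(classify_meeting_format("Stockholders may attend in person or virtually."))
```

A keyword approach like this misreads negations such as "there will be no in-person meeting", which is one reason reviewing the output snippets remains worthwhile.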
- Clone the repository: `git clone ...`
- Install Docker and Docker Compose (if using Docker): follow the official instructions.
- Create a `.env` file: in the project root, create `.env` with your SEC email:

      # .env
      SEC_EMAIL=your_real_email@example.com

- (Optional) Native setup:
  - Create a virtual environment: `python -m venv venv`, then `source venv/bin/activate`
  - Install dependencies: `pip install -r requirements.txt`
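The SEC asks automated EDGAR clients to identify themselves with contact details in the `User-Agent` header and to keep their request rate modest, which is presumably why `SEC_EMAIL` is required. The sketch below shows one way to read the variable and space out requests. The `build_headers` and `Throttle` names, the application name in the header, and the 10 requests-per-second ceiling are assumptions for this example; check the SEC's current guidance for the actual limit.

```python
import os
import time

# Assumed ceiling for illustration; confirm against the SEC's current guidance.
SEC_MAX_REQUESTS_PER_SECOND = 10


def build_headers(env=os.environ):
    """Build request headers that identify the client by its contact email."""
    email = env.get("SEC_EMAIL")
    if not email:
        raise RuntimeError("SEC_EMAIL is not set; add it to .env or export it")
    # "proxy-meeting-analyzer" is a placeholder application name.
    return {"User-Agent": f"proxy-meeting-analyzer {email}"}


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(SEC_MAX_REQUESTS_PER_SECOND)
throttle.wait()  # call once before each request to EDGAR
```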
- Build the image (only needed once or after code changes): `docker compose build`
- Run the analysis: Use `docker compose run` to pass arguments to the script. The service name is `analyzer` (as defined in `docker-compose.yml`). Arguments are passed after the service name.

      # Example: Analyze for meetings in Chicago, IL for default tickers/dates
      docker compose run --rm analyzer --city "Chicago" --state "IL"

      # Example: Analyze for meetings in Austin, TX for specific tickers & dates
      docker compose run --rm analyzer --city "Austin" --state "TX" --tickers GOOGL TSLA --start-date "2023-06-01"

      # Example: City only (less precise)
      docker compose run --rm analyzer --city "Boston"

  - `--rm`: Automatically removes the container after it finishes.
  - `analyzer`: The name of the service in `docker-compose.yml`.
  - `--city "City Name"`: Required. The target city. Use quotes if the name has spaces.
  - `--state "State/Region"`: Optional but recommended. The target state or region (e.g., "CA", "New York", "Illinois").
  - `--tickers TICKER1 TICKER2`: Optional. Override default tickers.
  - `--start-date YYYY-MM-DD`, `--end-date YYYY-MM-DD`: Optional. Override date range.
  - `--output-file filename.csv`: Optional. Specify the output CSV filename.
- Accessing Results: The output CSV is saved in the `output_data` volume. Copy it out with `docker cp` (e.g., `docker cp <container_id>:/app/output/your_file.csv ./`). Use `docker ps -a` to find the container ID if you didn't use `--rm`.
- Ensure Setup: Complete the native setup steps (virtual env, `pip install`).
- Set Environment Variable: Make sure `SEC_EMAIL` is available, either via the `.env` file and `python-dotenv`, or by exporting it in your terminal: `export SEC_EMAIL="your_email@example.com"`.
- Run the script with arguments:

      # Example: Analyze for meetings in Palo Alto, CA
      python main.py --city "Palo Alto" --state "CA"

      # Example: Analyze for Redmond, WA for specific tickers
      python main.py --city "Redmond" --state "WA" --tickers MSFT AMZN

      # Example: Specifying output file
      python main.py --city "New York" --state "NY" --output-file ny_meetings_q1_2024.csv --start-date 2024-01-01 --end-date 2024-03-31

  Refer to the Docker examples above for the available arguments (`--city`, `--state`, `--tickers`, `--start-date`, `--end-date`, `--output-file`).
- Results: The output CSV file will be in the `output/` directory.
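The command-line interface described above could be declared with `argparse` roughly as follows. This is a sketch that uses only the flags named in this README. The `build_parser` name, the help strings, and the `None` defaults are assumptions, and `main.py` may define its arguments differently.

```python
import argparse
from datetime import date


def build_parser():
    """Declare the flags documented in this README (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        description="Analyze DEF 14A filings for in-person meeting locations."
    )
    parser.add_argument("--city", required=True, help="Target city (required).")
    parser.add_argument("--state", default=None, help="Target state or region.")
    parser.add_argument("--tickers", nargs="+", default=None,
                        help="One or more tickers overriding the defaults.")
    # date.fromisoformat rejects anything that is not YYYY-MM-DD.
    parser.add_argument("--start-date", type=date.fromisoformat, default=None)
    parser.add_argument("--end-date", type=date.fromisoformat, default=None)
    parser.add_argument("--output-file", default=None, help="Output CSV filename.")
    return parser


args = build_parser().parse_args(
    ["--city", "Redmond", "--state", "WA", "--tickers", "MSFT", "AMZN"]
)
print(args.city, args.state, args.tickers)
```

Parsing the dates with `date.fromisoformat` makes a malformed `--start-date` fail at startup instead of partway through a download run.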
- Location Specificity: Providing the `--state` significantly improves accuracy by disambiguating common city names (e.g., Springfield).
- Address Formatting: The `physical_location_context_regex` tries to capture common address formats but might miss unconventional ones. Reviewing snippets is important.
- Regex Tuning: Further tuning in `meeting_analyzer.py` might be needed for specific edge cases or less common city/state representations in filings.
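Because reviewing snippets is the main safeguard against regex misses, it helps to see how a snippet might be cut from a filing. The sketch below takes a fixed window of characters around a match and collapses whitespace so the result fits in one CSV cell. The `extract_snippet` name and the 100-character window are assumptions for illustration.

```python
import re


def extract_snippet(text, match, window=100):
    """Return the text around `match`, padded by `window` characters per side."""
    start = max(match.start() - window, 0)
    end = min(match.end() + window, len(text))
    # Collapse line breaks and repeated spaces so the snippet stays on one line.
    return re.sub(r"\s+", " ", text[start:end]).strip()


filing_text = (
    "Notice is hereby given that the Annual Meeting\n"
    "will be held at 1 Main St,   Springfield, IL on May 5."
)
found = re.search(r"Springfield", filing_text)
if found:
    print(extract_snippet(filing_text, found, window=30))
```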