Skip to content

AI-for-Education/lesson-plan-parse-mbsse

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Lesson Plan Parse MBSSE

The Ministry of Basic and Senior Secondary Education (MBSSE) of Sierra Leone publishes a set of lesson plans for Maths and Language Arts on their online knowledge platform. The lesson plans are in pdf format, and cover all grade levels across Primary, Junior Secondary (JSS), and Senior Secondary (SSS).

Here we share python code for parsing the raw text of the lesson plan pdfs into structured json and subsequently cleaning the text so that it is suitable for human consumption. We also share the following files, which are the input to and outputs from this process:

Step 1: Installation

First, clone the repository to some location on your computer (we'll call it INSTALL_DIR, but replace that with the full path to the location on your computer, e.g. C:\my_code_directory):

cd INSTALL_DIR
git clone https://github.com/AI-for-Education/lesson-plan-parse-mbsse.git

This will create the folder INSTALL_DIR/lesson-plan-parse-mbsse, which we will now refer to as REPO_DIR

The package can be installed into an existing python environment with:

cd REPO_DIR
pip install -r requirements.txt

For conda users, a conda environment.yml file is also included, which can be used to create a new python environment and install the package inside it in one step.

cd REPO_DIR
conda env create -f environment.yml

Step 2: Set environment variables

Although most of the steps involved in parsing the lesson plans are rule-based, all cleaning steps use LLMs from OpenAI's API. In order for the scripts to run, you will need to have an OpenAI API key, which you can set by creating a file called .env in the root directory of the repository. This file should contain the line:

OPENAI_API_KEY=your_api_key

where your_api_key is replaced with the value of your OpenAI API key.

Step 3: Parse lesson plans

The script for parsing the lesson plans is REPO_DIR/scripts/parse_mbsseKP_lessonplans.py. You can either run this from the command line with:

python REPO_DIR/scripts/parse_mbsseKP_lessonplans.py

or you can run it interactively in an IDE (like VSCode) to understand each of the steps.

The raw text of the lesson plans will first be downloaded to REPO_DIR/mbsseKP_files_lessonplans.json, and at the end of the process REPO_DIR/mbsseKP_files_lessonplans_parsed.json will be created.

Step 4: Clean parsed lesson plans

The parsed lesson plans produced in Step 3 are fine to use as inputs to LLMs, but for human readability they suffer from a range of formatting errors resulting from the extraction of the raw text from pdf. The process of correcting these kinds of errors, and generally improving formatting overall (including inserting markdown tables, LaTeX formulae), is perfectly suited to an LLM like gpt-4o.

The script for running this cleaning process is REPO_DIR/scripts/clean_mbsseKP_lessonplans.py. The ouput from this process is REPO_DIR/mbsseKP_files_lessonplans_parsed_cleaned.json. This file is created at launch, and updated as the process runs. The full process over all lesson plans can be quite expensive (although prices are descreasing), so the script can be cancelled at any time and this output file supports resuming from where it left off.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages