This repository hosts the instructions for parsing the Wyoming Business Registry database export from the SOS website and importing it into a PostgreSQL database for further use via a Hasura endpoint.
- Docker
- Docker Compose
- PostgreSQL Database
- awk (most unix-based systems have it pre-installed)
All scripts live in the `scripts` folder and should be run from the root directory of this repository.
- Go to the Wyoming Secretary of State Business Center and download the Business Entity Database into the `sample` folder. (A sample can be found inside the `sample` folder. Your file is expected to be called `WY_Business_Data.zip`; you can simply replace the zip inside that folder with your latest download.)
- Extract the contents into the `data` folder. This folder will later be mounted inside a Docker container so the files can be imported using pgfutter. If you are on a Unix-based OS, you can run `scripts/unzip_wy_db.sh` to unzip the contents into the right folder. Your `data` folder will then look like this:
```
data
├── FILING.csv
├── FILING_ANNUAL_REPORT.csv
├── PARTY.csv
└── schema.pdf

0 directories, 4 files
```
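If the helper script is not available on your platform, a minimal sketch of the same step could look like the following (assuming the archive sits at `sample/WY_Business_Data.zip` and `unzip` is installed):

```bash
#!/usr/bin/env bash
# Minimal sketch of the extraction step: unpack the SOS export into data/.
set -euo pipefail

mkdir -p data
unzip -o sample/WY_Business_Data.zip -d data
```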
- The exported CSV files need some cleaning before they can be properly used. Importing them as-is will create a wrong schema and/or break pgfutter. In particular, many entries have the wrong number of columns and/or are parsed poorly. To clean them up, run `scripts/clean_csv_files.sh` (a sketch of such a pass follows this list), which will:
  - Save all lines that do not match the expected number of columns for the table.
  - Remove those lines from the CSV files, writing the remaining rows into a new `$TABLE_CLEANED.csv` file.
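The following is a hypothetical sketch of that cleaning pass using awk; the real `scripts/clean_csv_files.sh` may differ, and the column counts below are placeholders (check `data/schema.pdf` for the real ones):

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the CSV cleaning pass described above.
set -euo pipefail

clean_table() {
  local table="$1" cols="$2"
  # Keep rows with the expected field count; divert the rest for inspection.
  awk -F'|' -v cols="$cols" \
      -v ok="data/${table}_CLEANED.csv" \
      -v bad="data/${table}_REJECTED.csv" \
      'NF == cols { print > ok; next } { print > bad }' \
      "data/${table}.csv"
}

clean_table FILING 30              # placeholder column count
clean_table PARTY 12               # placeholder column count
clean_table FILING_ANNUAL_REPORT 8 # placeholder column count
```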
- Once the `$TABLE_CLEANED.csv` files are in your repository, it's time to spin up our database and pgadmin4 to import the schema and data. Run `docker-compose up db pgadmin4`, and point your browser to `localhost:8889`. Log in with the default credentials `admin@zontle.tech` and `password`. We now need to register our newly created server so we can see what we are doing.
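Before registering the server, you can confirm that Postgres is accepting connections; a quick check, assuming the compose service is named `db`:

```bash
# Should report "accepting connections" once the db container is up
docker-compose exec db pg_isready -U postgres
```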
To view our server inside pgadmin4, you need to create a new server. To do so, right-click on the
Servers option in the sidebar and select Create > Server. You'll need to
provide Docker's host address, which is only known within the Docker network.
Run the following command to get it; the address should look something like `172.20.0.2`:

```
docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -q --filter "ancestor=postgres:11")
```

If you click through Databases > postgres > Schemas, you will only see an empty public one. It's time to import our CSV files as actual database entries. To do so, we'll use pgfutter.
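If the pgfutter binary is not already in your `bin` folder, you can fetch a release; the tag and asset name below are assumptions, so verify them against the pgfutter releases page:

```bash
# Hypothetical fetch of the pgfutter binary into bin/ (release tag and
# asset name are assumptions; check the project's releases page).
mkdir -p bin
curl -L -o bin/pgfutter \
  https://github.com/lukasmartinelli/pgfutter/releases/download/v1.2/pgfutter_linux_amd64
chmod +x bin/pgfutter
```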
- We are now ready to use pgfutter, which will create the schema and import our clean data from the CSV files. We'll attach ourselves to the `db` instance to run it, as the binary is already inside our `bin` folder, mounted as a volume. To do so, run the following commands:
```
docker-compose exec db bash
cd /home
./bin/pgfutter --user postgres --pw password --schema wyoming csv -d "|" ./data/FILING_CLEANED.csv
./bin/pgfutter --user postgres --pw password --schema wyoming csv -d "|" ./data/PARTY_CLEANED.csv
./bin/pgfutter --user postgres --pw password --schema wyoming csv -d "|" ./data/FILING_ANNUAL_REPORT_CLEANED.csv
```

(Each pgfutter call might take some time, depending on your computer.)
After successfully running these commands, you will see all the entries in new tables under the `wyoming` schema. You can then query and inspect the data directly from pgadmin4. It's time to spin up our API!
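To sanity-check the import from the command line, you can count rows; a minimal check, assuming pgfutter's default file-name-derived table naming (`filing_cleaned`, etc.):

```bash
# Count imported rows; the table name assumes pgfutter's default
# file-name-derived naming inside the wyoming schema.
docker-compose exec db psql -U postgres -c \
  "SELECT COUNT(*) FROM wyoming.filing_cleaned;"
```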
- To spin up our API, we only need to run `docker-compose up graphql-engine`, which will start a Hasura instance connected to our database. You can now go to `http://localhost:8080/console` and play with the actual API.
To expose the data, go to Data, select wyoming under Current Postgres Schema, and then use Import All tables to expose your database via Hasura's GraphQL Engine. You can now start querying the Wyoming Business Registry as an open API.
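As a quick smoke test, you can hit the GraphQL endpoint directly; the root field name below is an assumption based on Hasura's default `<schema>_<table>` naming for non-public schemas:

```bash
# Hypothetical smoke test against the Hasura endpoint; the root field
# wyoming_filing_cleaned assumes Hasura's default schema_table naming.
curl -s http://localhost:8080/v1/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query": "{ wyoming_filing_cleaned(limit: 1) { __typename } }"}'
```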
The current process was set up manually for the Wyoming 2020 Hackathon. The database is updated roughly every 2 weeks, so these steps need to be re-run on a bi-weekly basis. However, a GitHub Action could automate this by running `scripts/import.sh` on a cron-job basis via GitHub's `on.schedule` trigger, with the database dump provided at an external URL (e.g., AWS S3, Google Cloud Storage).
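A sketch of what such a `scripts/import.sh` could chain together, assuming the dump has been mirrored to an external URL (see the caution below before automating the download from the SOS website itself):

```bash
#!/usr/bin/env bash
# Hypothetical end-to-end import chaining the manual steps above.
# DUMP_URL is a placeholder for an external mirror of the dump.
set -euo pipefail

DUMP_URL="${DUMP_URL:?set DUMP_URL to an external mirror of the dump}"

curl -L -o sample/WY_Business_Data.zip "$DUMP_URL"
./scripts/unzip_wy_db.sh
./scripts/clean_csv_files.sh

# Re-runs may need the previously imported tables dropped first.
for table in FILING PARTY FILING_ANNUAL_REPORT; do
  docker-compose exec -T db /home/bin/pgfutter \
    --user postgres --pw password --schema wyoming \
    csv -d "|" "/home/data/${table}_CLEANED.csv"
done
```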
Please reach out to the Secretary of State IT department (SOSAdminServices@wyo.gov) before creating an automated import script against their database, as mocking the download query might cause a disruption of service, be mistaken for a DDoS attack, and/or get GitHub's IP range blocked.