Working with Research Datasets
Paramus provides direct access to curated chemical and materials science datasets. You can install, query, and cross-reference datasets without leaving your research environment.

Available Research Domains
| Domain | Datasets | What You Get |
|---|---|---|
| Polymer Science | RadonPy, PI1M, OpenMacromolecularGenome, VipEA, OMG-Property-Database, PolyIE | More than 1M polymer structures with physical properties from MD simulations |
| Computational Chemistry | QM9, QM9S, MSR-ACC-TAE25 | 134k small molecules with DFT-level energies, HOMO/LUMO, dipole moments |
| Inorganic / Crystallography | COD, a-Si-24, Anionic-Solvation-Dataset | Crystal structures, amorphous silicon configurations, solvation data |
| Organic / Solubility | BigSolDB | 112,465 experimental solubility records across multiple solvents |

Installing a Dataset
Select a dataset tile and click Install. Paramus downloads the data files from their source (Zenodo, GitHub) and prepares them for querying. Original files are never modified — normalized copies and a search index are created alongside them.

Querying by Chemical Properties
Ask questions in natural language through the chat. Paramus translates your request into the right query automatically.
Find soluble compounds in ethanol at room temperature:
“Show me compounds with LogS above -2 in ethanol between 20 and 30 degrees Celsius from BigSolDB”
Screen polymers by glass transition temperature:
“Which polymers in RadonPy have a Tg above 400 K and a density below 1.2 g/cm³?”
Look up molecular properties by structure:
“Get the HOMO-LUMO gap and dipole moment for all molecules containing a carbonyl group in QM9”
SMILES columns are automatically canonicalized using RDKit, so c1ccccc1 and C1=CC=CC=C1 both find benzene.
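
To see why both spellings resolve to the same molecule, the canonicalization step can be reproduced directly with RDKit. This is a minimal sketch of the idea, not Paramus's internal code; the helper name `canonical_smiles` is an assumption made for illustration.

```python
from rdkit import Chem


def canonical_smiles(smiles: str) -> str:
    """Return RDKit's canonical SMILES for the given input string."""
    mol = Chem.MolFromSmiles(smiles)  # returns None if the SMILES cannot be parsed
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)  # canonical output is the default


aromatic = canonical_smiles("c1ccccc1")
kekule = canonical_smiles("C1=CC=CC=C1")
print(aromatic == kekule)  # both spellings of benzene yield the same string
```

Comparing canonical strings rather than raw input is what lets a lookup succeed regardless of how the structure was originally written.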

Query Methods
| Method | Use Case |
|---|---|
| dataset.query | Filter by structure, property ranges, solvents, conditions |
| dataset.query_schema | Inspect available columns, types, and value ranges |
| dataset.query_remote | Query a dataset without downloading it first |
| dataset.list | See all installed datasets |
| dataset.get | Get metadata and file listing for a dataset |

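
The kind of filtering that dataset.query is described as performing can be pictured as a range filter over rows. The sketch below is not the Paramus API: the function `filter_rows`, the field names (`solvent`, `temperature_c`, `log_s`), and the rows themselves are made-up placeholders chosen only to mirror the ethanol solubility request shown earlier.

```python
# Made-up placeholder rows; field names are assumptions, not a real dataset schema.
ROWS = [
    {"compound": "compound_A", "solvent": "ethanol", "temperature_c": 25.0, "log_s": -1.2},
    {"compound": "compound_B", "solvent": "ethanol", "temperature_c": 25.0, "log_s": -3.4},
    {"compound": "compound_C", "solvent": "acetone", "temperature_c": 25.0, "log_s": -0.5},
    {"compound": "compound_D", "solvent": "ethanol", "temperature_c": 45.0, "log_s": -1.0},
]


def filter_rows(rows, solvent, min_log_s, temp_range_c):
    """Keep rows for one solvent with LogS above a threshold inside a temperature window."""
    low, high = temp_range_c
    return [
        row for row in rows
        if row["solvent"] == solvent
        and row["log_s"] > min_log_s
        and low <= row["temperature_c"] <= high
    ]


matches = filter_rows(ROWS, solvent="ethanol", min_log_s=-2.0, temp_range_c=(20.0, 30.0))
print([row["compound"] for row in matches])  # ['compound_A']
```

Only `compound_A` passes all three conditions: the others fail on LogS, solvent, or temperature respectively.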
Supported File Formats
Paramus handles common research data formats out of the box:
| Format | Extensions |
|---|---|
| Tabular | .csv, .json, .jsonl, .xlsx, .xls, .parquet, .feather |
| Scientific | .h5, .hdf5, .mat, .npy, .npz |
| Serialized | .pkl, .pickle |
| Archives | .tar, .tar.gz, .tar.bz2, .zip (auto-extracted) |
Use dataset.unfold to convert between formats (e.g. Parquet to CSV).
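
How dataset.unfold performs a conversion is not described here. As a rough illustration of what a tabular format conversion involves, the sketch below turns JSON Lines into CSV using only the Python standard library (JSONL rather than Parquet, so that it runs without extra packages). The function name `jsonl_to_csv` is hypothetical.

```python
import csv
import io
import json


def jsonl_to_csv(jsonl_text: str) -> str:
    """Convert JSON Lines text into CSV text, using the union of keys as columns."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    # Preserve first-seen column order across all records.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys are filled with restval
    return buffer.getvalue()


sample = (
    '{"smiles": "c1ccccc1", "name": "benzene"}\n'
    '{"smiles": "CCO", "name": "ethanol", "note": "solvent"}\n'
)
print(jsonl_to_csv(sample))
```

Records that lack a column are written with an empty cell, so rows with differing keys still end up in one rectangular table.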

Semantic Knowledge Graphs
Beyond tabular datasets, three RDF knowledge graphs capture domain-specific research context:
| Knowledge Graph | Focus |
|---|---|
| Polymer Chemistry R&D | Polymer synthesis, characterization, and property prediction |
| Medicinal Chemistry (Molidustat) | HIF-PHD inhibitor research, SAR relationships |
| Germanium Extraction R&D | Hydrometallurgical processing, extraction optimization |
These are managed separately via semantic.list, semantic.switch, and semantic.info.

Dataset Metadata
Each dataset card follows the Croissant 1.0 + Schema.org standard, capturing provenance, licensing, and citation:
{
  "@type": "Dataset",
  "name": "BigSolDB",
  "dataOrigin": "experimental",
  "measurementTechnique": "Various experimental methods",
  "license": "CC-BY-4.0",
  "citation": {
    "name": "BigSolDB: Solubility Dataset of Compounds in Organic Solvents",
    "identifier": "10.1038/s41597-023-02..."
  }
}
This ensures every query result can be traced back to its original publication and data source.
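
As one way to picture that traceability, the sketch below reads the card shown above with Python's standard `json` module and builds a one-line provenance string. The helper `provenance_line` and its output format are assumptions for illustration, not part of Paramus; the identifier is truncated in the source and is kept exactly as written.

```python
import json

# Dataset card as shown above; the truncated identifier is left as-is.
CARD_JSON = """
{
  "@type": "Dataset",
  "name": "BigSolDB",
  "dataOrigin": "experimental",
  "measurementTechnique": "Various experimental methods",
  "license": "CC-BY-4.0",
  "citation": {
    "name": "BigSolDB: Solubility Dataset of Compounds in Organic Solvents",
    "identifier": "10.1038/s41597-023-02..."
  }
}
"""


def provenance_line(card: dict) -> str:
    """Summarize origin, license, and citation; fall back to labels when a field is absent."""
    citation = card.get("citation", {})
    return (
        f'{card["name"]} ({card.get("dataOrigin", "unknown origin")}, '
        f'license {card.get("license", "unspecified")}): '
        f'{citation.get("name", "no citation")} [{citation.get("identifier", "no identifier")}]'
    )


card = json.loads(CARD_JSON)
print(provenance_line(card))
```

Attaching a line like this to each result set keeps the license and citation next to the data they describe.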

Frequently Asked Questions
Which datasets does Paramus include?
Paramus includes curated datasets across polymer science (RadonPy, PI1M, OpenMacromolecularGenome), computational chemistry (QM9, QM9S), inorganic crystallography (COD), and solubility (BigSolDB with 112k+ experimental records). New datasets are added regularly.

How do I install and query a dataset?
Select a dataset tile in the Paramus interface and click Install. Paramus downloads source files from Zenodo or GitHub and builds a local search index. You can then query by chemical properties using natural language or the dataset.query API method.

Which file formats are supported?
Paramus handles tabular formats (CSV, JSON, JSONL, XLSX, XLS, Parquet, Feather), scientific formats (HDF5, MAT, NPY, NPZ), serialized objects (Pickle), and archives (TAR, ZIP) with automatic extraction. Use dataset.unfold to convert between formats.

Can I search by chemical structure?
Yes. SMILES columns are automatically canonicalized using RDKit, so equivalent representations like c1ccccc1 and C1=CC=CC=C1 both match benzene. You can query by substructure, property ranges, solvents, and experimental conditions.

What are the semantic knowledge graphs?
Three example RDF knowledge graphs capture domain-specific research context: Polymer Chemistry R&D (synthesis and property prediction), Medicinal Chemistry, with Molidustat as the example molecule (HIF-PHD inhibitor SAR), and Germanium Extraction R&D (hydrometallurgical processing). Manage them with semantic.list, semantic.switch, and semantic.info.
