Production-grade recommendation systems framework.
57+ models · Unified API · Multi-stage pipelines · Research to deployment.
pip install corerec · pip install cr_learn
Docs · PyPI · Issues · Modern Guide
CoreRec is a modern recommendation engine built for the deep learning era. It implements industry-standard architectures — Two-Tower retrieval, Transformers, Graph Neural Networks — following the multi-stage pipeline approach used at Netflix, YouTube, and major e-commerce platforms.
- Unified API: every model shares fit, predict, recommend, save, load
- 57+ algorithms: deep learning, collaborative filtering, graph-based, sequential, Bayesian
- Multi-stage pipeline: Retrieval → Ranking → Reranking in a single orchestrated system
- cr_learn: companion dataset library for fast prototyping on real-world data
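The multi-stage bullet above describes a funnel: a cheap retriever narrows the catalog, a heavier ranker orders the survivors, and a reranker adjusts the final list. A minimal sketch of that idea, independent of CoreRec's API (the scoring functions below are stand-ins, not CoreRec models):

```python
# Hypothetical three-stage cascade; the per-stage scores are toy stand-ins.
def retrieve(user, catalog, k):
    # cheap score per item, as an ANN / two-tower lookup would provide
    return sorted(catalog, key=lambda i: (i * user) % 97, reverse=True)[:k]

def rank(user, candidates, k):
    # heavier model re-scores only the retrieved candidates
    return sorted(candidates, key=lambda i: (i * user) % 31, reverse=True)[:k]

def rerank(ranked, k):
    # toy diversity pass: keep at most one item per pseudo-category (i % 25)
    seen, out = set(), []
    for i in ranked:
        if i % 25 not in seen:
            seen.add(i % 25)
            out.append(i)
    return out[:k]

catalog = list(range(1, 1001))
candidates = retrieve(7, catalog, 200)   # 1,000 -> 200
ranked = rank(7, candidates, 50)         # 200 -> 50
final = rerank(ranked, 10)               # 50 -> at most 10
```

The point is the shape of the computation, not the scores: each stage sees far fewer items than the one before it.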
Last updated: 2024-11-20
pip install --upgrade corerec
pip install cr_learn  # dataset companion (optional but recommended)

- Python ≥ 3.8
- PyTorch ≥ 1.9
- NumPy, Pandas, SciPy
from corerec.engines import DCN
from cr_learn import ml_1m
# 1. Load a real dataset (auto-downloads MovieLens 1M)
data = ml_1m.load()
ratings = data['ratings']
user_ids = ratings['user_id'].values
item_ids = ratings['movie_id'].values
r = ratings['rating'].values
# 2. Train
model = DCN(embedding_dim=64, epochs=10, verbose=True)
model.fit(user_ids=user_ids, item_ids=item_ids, ratings=r)
# 3. Recommend
recs = model.recommend(user_id=1, top_k=10)
print(recs)

That's it. The same calls (fit, predict, recommend) work for every model in CoreRec.
Every model in CoreRec inherits from BaseRecommender and exposes the same interface:
model.fit(user_ids, item_ids, ratings) # train
model.predict(user_id, item_id) # → float score
model.recommend(user_id, top_k=10) # → list of item IDs
model.batch_predict([(uid, iid), ...]) # → list of floats
model.save('model.pkl') # persist
model = ModelClass.load('model.pkl')    # restore

Best for feature-rich data with complex interaction patterns.
| Model | Description | Import |
|---|---|---|
| DCN | Deep & Cross Network — explicit + implicit feature crossing | from corerec.engines import DCN |
| DeepFM | Factorization Machines + Deep Network | from corerec.engines import DeepFM |
| GNNRec | Graph Neural Network recommender | from corerec.engines import GNNRec |
| MIND | Multi-Interest sequential network | from corerec.engines import MIND |
| SASRec | Self-Attentive Sequential Recommendation | from corerec.engines import SASRec |
| NASRec | Neural Architecture Search for RecSys | from corerec.engines import NASRec |
| BERT4Rec | Bidirectional Transformer for sequences | from corerec.engines.content_based import BERT4Rec |
| TwoTower | Dual-encoder retrieval (YouTube-style) | from corerec.engines import TwoTower |
| AFM, AutoInt, DIN, DIEN, DLRM, PNN, NCF, NFM, FIBINet, xDeepFM, Wide&Deep, YouTubeDNN, ESMM, MMoE, PLE, FGCNN, Monolith … | additional architectures | see corerec.engines |
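For intuition, the "explicit feature crossing" in DCN's table entry refers to the cross layer x_{l+1} = x_0 * (w·x_l) + b + x_l. A minimal NumPy sketch of that recurrence (illustrative only, not CoreRec's implementation; a real DCN gives each layer its own w and b):

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    """One DCN-style cross layer: x_{l+1} = x0 * (w . xl) + b + xl."""
    return x0 * np.dot(w, xl) + b + xl

rng = np.random.default_rng(0)
d = 4
x0 = rng.normal(size=d)                  # original feature vector
w, b = rng.normal(size=d), rng.normal(size=d)

x = x0
for _ in range(3):                       # three stacked cross layers
    x = cross_layer(x0, x, w, b)         # (reusing w, b for brevity)
print(x.shape)                           # stays (4,): crossing preserves dimension
```

Each layer multiplies the running state by the original input, so depth l captures feature interactions up to order l+1 without blowing up the dimension.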
from corerec.engines import DCN
from cr_learn import ml_1m
data = ml_1m.load()
ratings = data['ratings']
model = DCN(
embedding_dim=64,
num_cross_layers=3,
deep_layers=[128, 64],
epochs=20,
learning_rate=0.001,
verbose=True,
)
model.fit(
user_ids=ratings['user_id'].values,
item_ids=ratings['movie_id'].values,
ratings=ratings['rating'].values,
)
score = model.predict(user_id=1, item_id=100)
recs = model.recommend(user_id=1, top_k=10)
print(f"Score: {score:.3f} | Top-10: {recs}")

from corerec.engines import TwoTower
model = TwoTower(user_input_dim=64, item_input_dim=128, embedding_dim=256)
model.fit(user_ids, item_ids, interactions)
candidates = model.recommend(user_id=42, top_k=100)

from corerec.engines.content_based import BERT4Rec
model = BERT4Rec(hidden_dim=256, num_layers=4)
model.fit(user_ids, item_ids, interactions)
next_items = model.recommend(user_id=1, top_k=10)

Simple Algorithm for Recommendation (SAR) — fast, no GPU required.
from corerec.engines.collaborative import SAR
import pandas as pd
df = pd.DataFrame({
'userID': [0, 0, 1, 1, 2],
'itemID': [10, 20, 10, 30, 20],
'rating': [5.0, 4.0, 5.0, 3.0, 4.0],
})
model = SAR(similarity_type='jaccard') # also: 'cosine', 'lift', 'cooccurrence'
model.fit(df)
recs = model.recommend(user_id=0, top_k=5)
batch_recs = model.recommend_k_items(df[['userID']], top_k=10)  # all users at once

from corerec.engines.content_based import TFIDFRecommender
items = [101, 102, 103]
docs = {101: "action adventure film", 102: "romantic comedy", 103: "thriller suspense"}
model = TFIDFRecommender()
model.fit(items=items, docs=docs)
recs = model.recommend_by_text(query_text="action thriller", top_n=5)

from corerec.engines import GNNRec
model = GNNRec(embedding_dim=64, epochs=20)
model.fit(user_ids, item_ids, ratings)
recs = model.recommend(user_id=1, top_k=10)

from corerec.multimodal.fusion_strategies import MultiModalFusion
fusion = MultiModalFusion(
modality_dims={'text': 768, 'image': 2048, 'meta': 32},
output_dim=256,
strategy='attention',
)
item_embedding = fusion({'text': text_emb, 'image': img_emb, 'meta': meta})

Production systems use Retrieval → Ranking → Reranking. CoreRec ships this pattern out of the box:
from corerec.pipelines import RecommendationPipeline, PipelineConfig
pipeline = RecommendationPipeline(
config=PipelineConfig(retrieval_k=200, ranking_k=50, final_k=10)
)
pipeline.add_retriever(my_retriever, weight=1.0)
pipeline.set_ranker(my_ranker)
pipeline.add_reranker(diversity_reranker)
result = pipeline.recommend(user_id=123, top_k=10)

cr_learn is CoreRec's companion package. It provides one-line access to real recommendation datasets, auto-downloading and caching them locally.
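The download-and-cache behaviour looks roughly like the sketch below. The cache path and helper are illustrative, not cr_learn's actual internals:

```python
import os
from pathlib import Path

# Hypothetical cache-aware loader, not cr_learn's real code.
CACHE_DIR = Path(os.environ.get("CR_LEARN_CACHE", Path.home() / ".cr_learn"))

def load_cached(name, download_fn):
    """Return the cached file for `name`, downloading it on first use."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / name
    if not path.exists():               # first call: fetch and persist
        path.write_bytes(download_fn())
    return path                         # later calls hit the local copy

# Demo with a fake "download" so the sketch runs offline:
p = load_cached("demo.bin", lambda: b"ratings-bytes")
print(p.read_bytes())
```

Subsequent calls with the same name never re-download, which is why the examples below run fast after the first fetch.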
pip install cr_learn

| Dataset | Module | Load |
|---|---|---|
| MovieLens 1M | cr_learn.ml_1m | ml_1m.load() |
| IJCAI-16 (Tmall/O2O) | cr_learn.ijcai | ijcai.load() |
| Tmall | cr_learn.tmall | tmall.load() |
| Steam Games | cr_learn.steam_games | steam_games.load() |
| BeiDou/BeiBei | cr_learn.beibei | beibei.load() |
| LibraryThing | cr_learn.library_thing | library_thing.load() |
| Rees46 | cr_learn.rees46 | rees46.load() |
from cr_learn import ml_1m
data = ml_1m.load()
# Returns dict with keys: 'users', 'ratings', 'movies',
# 'user_interactions', 'item_features', 'trn_buy'
print(data['ratings'].head())
# user_id movie_id rating timestamp
# 0 1 1193 5.0 978300760
# ...
# Ready-to-use training data
ratings = data['ratings']
user_ids = ratings['user_id'].values
item_ids = ratings['movie_id'].values
r = ratings['rating'].values

from cr_learn import ijcai
data = ijcai.load(limit_rows=50000)
# Returns dict with train/test DataFrames + user/item features

All example scripts try cr_learn first and fall back to the bundled sample_data/ CSVs — no manual setup needed.
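That cr_learn-first, CSV-fallback behaviour can be sketched like this (the column names come from the MovieLens sample above; the helper itself is illustrative, not the repo's exact code):

```python
import csv, io

def load_ratings():
    """Prefer cr_learn; fall back to a bundled CSV when it isn't installed."""
    try:
        from cr_learn import ml_1m          # optional dependency
        return ml_1m.load()['ratings']
    except ImportError:
        # minimal stand-in for a sample_data/ ratings CSV
        sample = io.StringIO(
            "user_id,movie_id,rating,timestamp\n"
            "1,1193,5.0,978300760\n"
        )
        return list(csv.DictReader(sample))

rows = load_ratings()
```

Guarding the import rather than the download keeps the examples runnable on a fresh clone with no network access.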
CoreRec ships its own optimizer suite (compatible with torch.optim API):
from corerec.cr_boosters.adam import Adam
from corerec.cr_boosters.nadam import NAdam
optimizer = Adam(model.parameters(), lr=0.001)

Available: Adam · NAdam · Adamax · Adadelta · Adagrad · ASGD · LBFGS · RMSprop · SGD · SparseAdam
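These boosters mirror the torch.optim API. For intuition, the Adam update they implement is, in plain Python (a sketch of the standard algorithm, not the cr_boosters source):

```python
import math

def adam_step(p, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (new_p, m, v)."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias corrections for the
    v_hat = v / (1 - b2 ** t)             # zero-initialized moments
    return p - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# minimize f(p) = p**2 starting from p = 1.0
p, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    p, m, v = adam_step(p, 2 * p, m, v, t, lr=0.01)
```

The bias correction is why the very first step has magnitude close to lr regardless of gradient scale.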
python examples/engines_dcn_example.py # Deep & Cross Network
python examples/engines_deepfm_example.py # DeepFM
python examples/engines_gnnrec_example.py # GNN-based recommender
python examples/engines_mind_example.py # MIND (multi-interest)
python examples/engines_nasrec_example.py # NASRec
python examples/engines_sasrec_example.py    # SASRec (self-attentive)

python examples/unionized_sar_example.py     # SAR (item-to-item similarity)
python examples/unionized_fast_example.py # FastAI-style embedding
python examples/unionized_rbm_example.py # Restricted Boltzmann Machine
python examples/unionized_rlrmc_example.py # Riemannian low-rank matrix completion
python examples/unionized_geomlc_example.py  # Geometric matrix completion

python examples/content_filter_tfidf_example.py  # TF-IDF content filter

python examples/imshow_connector_example.py  # plug-and-play demo UI
# Then open http://127.0.0.1:8000

python examples/run_all_algo_tests_example.py  # discover + run all algorithm tests

Tip: All scripts add the project root to sys.path automatically. If cr_learn is installed, they prefer it; otherwise they use the sample_data/ CSVs bundled in this repo.
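The sys.path bit of that tip is a common pattern; roughly (a sketch, not the repo's exact code):

```python
import sys
from pathlib import Path

def add_project_root(script_path):
    """Put the repo root (parent of examples/) first on sys.path."""
    root = str(Path(script_path).resolve().parent.parent)
    if root not in sys.path:
        sys.path.insert(0, root)   # local checkout wins over site-packages
    return root

# each example script would effectively do: add_project_root(__file__)
```

Inserting at position 0 is what makes the local checkout shadow any pip-installed corerec.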
Core models:

corerec/
├── engines/
│   ├── dcn.py, deepfm.py, gnnrec.py, mind.py,
│   │   sasrec.py, nasrec.py, bert4rec.py, two_tower.py
│   ├── collaborative/   SAR, LightGCN, NCF, TwoTower
│   └── content_based/   TFIDFRecommender, YoutubeDNN, DSSM
├── pipelines/     RecommendationPipeline, DataPipeline
├── retrieval/     Candidate retrieval, ensemble fusion
├── ranking/       Pointwise, pairwise, feature-cross rankers
├── reranking/     Diversity, fairness rerankers
├── multimodal/    MultiModalFusion, encoders
├── embeddings/    Pretrained embeddings, tables
├── evaluation/    Evaluator, metrics (RMSE, NDCG, MAP …)
├── explanation/   Feature-based & generative explainers
├── serving/       ModelServer, batch inference
├── api/           BaseRecommender, exceptions, mixins
└── cr_boosters/   Adam, NAdam, SGD, … optimizers

Datasets:

cr_learn_setup/cr_learn/
├── ml_1m.py          MovieLens 1M
├── ijcai.py          IJCAI-16 O2O
├── tmall.py          Tmall
├── beibei.py         BeiBei
├── steam_games.py    Steam Games
├── rees46.py         Rees46
└── library_thing.py  LibraryThing

Docs & Examples:

docs/source/
├── tutorials/    57 model tutorials (DCN, DeepFM, SASRec …)
├── api/          Full API reference
├── user_guide/   Data prep, training, persistence, best practices
└── examples/     Basic, advanced, production deployment
CoreRec ships with VishGraphs, a companion library for graph visualization and analysis:
import vish_graphs as vg
# Generate a random graph and save to CSV
graph_file = vg.generate_random_graph(num_people=100, file_path="graph.csv")
# Load as adjacency matrix and visualize
adj_matrix = vg.bipartite_matrix_maker(graph_file)
nodes = list(range(len(adj_matrix)))
top_nodes = [0, 1, 2]
vg.draw_graph(adj_matrix, nodes, top_nodes) # 2D
vg.draw_graph_3d(adj_matrix, nodes, top_nodes) # 3D
vg.show_bipartite_relationship(adj_matrix)  # bipartite view

API summary:
| Function | Description |
|---|---|
| generate_random_graph(n, file_path, seed) | Generate & save random adjacency matrix |
| draw_graph(adj, top_nodes, recommended_nodes, ...) | 2D graph visualization |
| draw_graph_3d(adj, top_nodes, ...) | 3D graph visualization |
| show_bipartite_relationship(adj) | Bipartite relationship view |
| find_top_nodes(matrix, num_nodes) | Most-connected nodes |
| bipartite_matrix_maker(csv_path) | Load adjacency matrix from CSV |
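As a sanity check on what find_top_nodes returns, "most-connected" ranking over an adjacency matrix can be sketched in plain Python (illustrative, not VishGraphs' implementation):

```python
def top_nodes_by_degree(adj, num_nodes):
    """Indices of the most-connected nodes (highest row sums) in adj."""
    degrees = [sum(row) for row in adj]
    return sorted(range(len(adj)), key=lambda i: degrees[i], reverse=True)[:num_nodes]

adj = [
    [0, 1, 1],   # node 0: degree 2
    [1, 0, 0],   # node 1: degree 1
    [1, 0, 0],   # node 2: degree 1
]
print(top_nodes_by_degree(adj, 2))  # [0, 1]
```

Python's sort is stable, so ties are broken by node index, which keeps the output deterministic.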
Full documentation is available at vishesh9131.github.io/CoreRec.
Build locally:
pip install sphinx sphinx-design myst-parser sphinx-book-theme
sphinx-build -b html docs/source docs/build/html
open docs/build/html/index.html
ImportError / module not found:

pip install --upgrade corerec

NumPy 2.x conflict with PyTorch:

pip install "numpy<2"

CUDA / GPU issues:

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118

cr_learn dataset download fails:

Examples fall back to sample_data/ CSVs bundled in this repo automatically. No action needed.
For anything else: open an issue or check the FAQ.
We welcome bug fixes, new features, docs improvements, and new models.
- Fork the repo
- Create a feature branch (git checkout -b feature/my-thing)
- Make your changes following the existing code style
- Open a pull request with a clear description
See CONTRIBUTING.md for the full guide.
| @vishesh9131 |
|---|
| Founder / Creator |
This library and its utilities are for research purposes only. Commercial use requires explicit consent from the author (@vishesh9131).
See LICENSE for details.
