This repository contains practical notebooks, a course project, and small datasets for introductory data science and machine learning exercises. The material focuses on data exploration, visualization, image processing, clustering, regression, classification, and workflow-oriented examples in Jupyter notebooks.
The material was developed as part of EL-BONGÓ physics and published at https://elbongo.redclara.net/ciencia-de-datos/. Within the EL-BONGÓ training program, Ciencia de Datos is presented as an international learning component that goes beyond isolated algorithm practice, linking data science with physics, engineering, and contemporary computational intelligence in a collaborative Latin American context.
- practical1.ipynb: plotting, image histograms, clustering, image segmentation, and cluster evaluation
- practical2.ipynb: regression, classification, decision trees, random forests, and incremental learning
- examples/dataworkflow.ipynb: a compact end-to-end notebook showing a typical data workflow
- project.md: 35-hour Human Activity Recognition project combining clustering and classification
The project dependencies are listed in requirements.txt:
numpy
pandas
scikit-learn
jupyterlab
nbclient
nbformat
ipykernel
matplotlib
- Clone the repository:
git clone https://gitmilab.redclara.net/jsamuel/ciencia-de-datos/
cd ciencia-de-datos
- Create and activate a virtual environment:
python -m venv .venv
On Windows:
.venv\Scripts\activate
On macOS or Linux:
source .venv/bin/activate
- Install the dependencies from requirements.txt:
pip install -r requirements.txt
- Start JupyterLab:
jupyter lab

You can then open practical1.ipynb, practical2.ipynb, examples/dataworkflow.ipynb, or project.md.
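After installation, a quick check that the environment can see every dependency may save time before opening the notebooks. The following is a minimal sketch, not part of the repository: the function name `missing_modules` and the list `REQUIRED_MODULES` are hypothetical, and the list assumes the import names that correspond to the packages in requirements.txt (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names assumed for the packages listed in requirements.txt.
# scikit-learn is imported as "sklearn"; the others share their package name.
REQUIRED_MODULES = [
    "numpy", "pandas", "sklearn", "jupyterlab",
    "nbclient", "nbformat", "ipykernel", "matplotlib",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

Run it with the virtual environment activated. If any name is reported as missing, re-running the pip install step inside the activated environment is the usual remedy.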
ciencia-de-datos/
|-- data/
| |-- citypopulation.json
| |-- customer_campaign_data.csv
| |-- flower.jpg
| |-- pl.json
| |-- plparadigm.json
| `-- population.csv
|-- examples/
| `-- dataworkflow.ipynb
|-- project.md
|-- practical1.ipynb
|-- practical2.ipynb
|-- requirements.txt
|-- LICENSE
- data/: input datasets and media files used by the practical notebooks
- examples/: additional worked notebook examples
- project.md: course project brief on Human Activity Recognition using smartphone sensor data
- practical1.ipynb: first practical session on visualization and clustering
- practical2.ipynb: second practical session on supervised learning methods
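To get a first look at the files in data/ before working through a notebook, a small helper along these lines may be useful. It is only a sketch under stated assumptions: the function name `preview` is hypothetical, the files are assumed to be UTF-8 encoded, and nothing is assumed about their columns or keys. It uses only the standard library, so it does not reflect how the notebooks themselves load the data.

```python
import csv
import json
from pathlib import Path


def preview(path, n_rows=3):
    """Summarize a CSV or JSON file without assuming anything about its schema."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # Assumption: the file is UTF-8 encoded and its first row is a header.
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])
            # Read at most n_rows data rows after the header.
            rows = [row for _, row in zip(range(n_rows), reader)]
        return {"kind": "csv", "header": header, "rows": rows}
    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            content = json.load(handle)
        # Report only the top-level container type and its length, if any.
        size = len(content) if isinstance(content, (list, dict)) else None
        return {"kind": "json", "type": type(content).__name__, "size": size}
    raise ValueError(f"Unsupported file type: {suffix}")
```

From the repository root, a call such as `preview("data/population.csv")` or `preview("data/pl.json")` returns a small dictionary describing the file. Image files such as flower.jpg are deliberately not handled here.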
The exercises use a three-level difficulty scale:
- *: Easy
- **: Medium
- ***: Difficult
The practical notebooks mix easy, medium, and difficult exercises so learners can build from fundamentals toward more open-ended tasks.
John Samuel
Code in this repository is released under the GPLv3+ license. Documentation and other associated content are released under CC BY-SA 4.0.