Ciencia de Datos

This repository contains practical notebooks, a course project, and small datasets for introductory data science and machine learning exercises. The material focuses on data exploration, visualization, image processing, clustering, regression, classification, and workflow-oriented examples in Jupyter notebooks.

The material was developed as part of EL-BONGÓ Physics and is published at https://elbongo.redclara.net/ciencia-de-datos/. Within the EL-BONGÓ training program, Ciencia de Datos is presented as an international learning component that goes beyond isolated algorithm practice: it links data science with physics, engineering, and contemporary computational intelligence in a collaborative Latin American context.

Contents

  • practical1.ipynb: plotting, image histograms, clustering, image segmentation, and cluster evaluation
  • practical2.ipynb: regression, classification, decision trees, random forests, and incremental learning
  • examples/dataworkflow.ipynb: a compact end-to-end notebook showing a typical data workflow
  • project.md: a 35-hour Human Activity Recognition project that combines clustering and classification
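As a rough illustration of the practical1.ipynb topics, the sketch below computes per-channel image histograms, segments an image by clustering its pixel colors with k-means, and evaluates the clustering with the silhouette score. It is an assumption-based sketch, not the notebook's own code. A small synthetic image stands in for data/flower.jpg, and the choice of k-means with two clusters is illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic 40x40 RGB image with values in [0, 1], standing in for data/flower.jpg:
# a reddish left half and a greenish right half, plus a little Gaussian noise.
image = np.zeros((40, 40, 3))
image[:, :20] = [0.9, 0.2, 0.2]
image[:, 20:] = [0.2, 0.6, 0.2]
image = np.clip(image + rng.normal(0.0, 0.05, image.shape), 0.0, 1.0)

# Image histogram: pixel counts in 16 intensity bins for each color channel.
histograms = {
    channel: np.histogram(image[..., i], bins=16, range=(0.0, 1.0))[0]
    for i, channel in enumerate("RGB")
}

# Segmentation: treat each pixel as a point in RGB space and cluster the colors.
pixels = image.reshape(-1, 3)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
segmented = kmeans.labels_.reshape(image.shape[:2])  # one cluster label per pixel

# Cluster evaluation: the silhouette score lies in [-1, 1]; values near 1 mean
# compact, well-separated clusters.
score = silhouette_score(pixels, kmeans.labels_)
print(f"silhouette = {score:.2f}")
```

A real photograph can replace the synthetic array once it is scaled to [0, 1] (for example, a uint8 image divided by 255). For large images, silhouette_score accepts a sample_size argument that keeps the evaluation fast.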

Installation

Requirements

The project dependencies are listed in requirements.txt:

  • numpy
  • pandas
  • scikit-learn
  • jupyterlab
  • nbclient
  • nbformat
  • ipykernel
  • matplotlib
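A small script along these lines can confirm that these packages are installed in the active environment. It is a sketch that assumes Python 3.8 or later (for importlib.metadata). It also assumes requirements.txt holds one requirement per line, possibly with version specifiers or comments. It does not understand every pip option, and the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import Dict, List, Optional

# Distribution names from the list above, used if requirements.txt is not found.
DEFAULT_PACKAGES = [
    "numpy", "pandas", "scikit-learn", "jupyterlab",
    "nbclient", "nbformat", "ipykernel", "matplotlib",
]


def parse_requirements(text: str) -> List[str]:
    """Return the bare distribution names from requirements.txt-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and surrounding spaces
        if not line or line.startswith("-"):    # skip blank lines and pip options such as -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)  # name before any specifier
        if match:
            names.append(match.group(0))
    return names


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    req = Path("requirements.txt")
    names = parse_requirements(req.read_text()) if req.exists() else DEFAULT_PACKAGES
    for name, ver in installed_versions(names).items():
        print(f"{name:15} {ver or 'NOT INSTALLED'}")
```

Running it from the repository root with the virtual environment active prints one line per package. A NOT INSTALLED entry points to a package that pip did not install into that environment.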

Setup

  1. Clone the repository:
     git clone https://gitmilab.redclara.net/jsamuel/ciencia-de-datos/
     cd ciencia-de-datos
  2. Create and activate a virtual environment:
     python -m venv .venv

     On Windows:

     .venv\Scripts\activate

     On macOS or Linux:

     source .venv/bin/activate
  3. Install the dependencies from requirements.txt:
     pip install -r requirements.txt
  4. Start JupyterLab:
     jupyter lab
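To check that the activation step above took effect, a minimal sketch like the following compares the interpreter's prefixes. It targets environments created with venv. Other tools such as conda are not detected this way. Run in a notebook cell, it shows which environment the Jupyter kernel is using.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a venv environment."""
    # Inside a venv, sys.prefix points at the environment directory (.venv here),
    # while sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate .venv before installing packages.")
```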

You can then open practical1.ipynb, practical2.ipynb, examples/dataworkflow.ipynb, or project.md.
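Because nbclient and nbformat are among the dependencies, the notebooks can also be inspected or executed without opening JupyterLab. The sketch below shows one possible approach; it is not a script shipped with the repository. summarize_notebook and execute_notebook are hypothetical helper names, and the kernel name python3 assumes the default kernel that ipykernel provides.

```python
from pathlib import Path
from typing import Dict, Union

import nbformat
from nbclient import NotebookClient


def summarize_notebook(path: Union[str, Path]) -> Dict[str, int]:
    """Count the cells of each type (code, markdown, raw) in a notebook."""
    nb = nbformat.read(str(path), as_version=4)
    counts: Dict[str, int] = {}
    for cell in nb.cells:
        counts[cell.cell_type] = counts.get(cell.cell_type, 0) + 1
    return counts


def execute_notebook(path: Union[str, Path], output_path: Union[str, Path],
                     timeout: int = 600, kernel_name: str = "python3") -> None:
    """Run all cells in order and save the executed copy to output_path."""
    path = Path(path)
    nb = nbformat.read(str(path), as_version=4)
    client = NotebookClient(
        nb,
        timeout=timeout,            # seconds allowed for each cell
        kernel_name=kernel_name,
        # Start the kernel in the notebook's folder, as JupyterLab does,
        # so relative file paths inside the notebook resolve the same way.
        resources={"metadata": {"path": str(path.parent)}},
    )
    client.execute()                # raises CellExecutionError if a cell fails
    nbformat.write(nb, str(output_path))


if __name__ == "__main__":
    for name in ["practical1.ipynb", "practical2.ipynb", "examples/dataworkflow.ipynb"]:
        if Path(name).exists():
            print(name, summarize_notebook(name))
```

For example, execute_notebook("practical2.ipynb", "practical2.executed.ipynb") writes a copy with all outputs filled in. This can be useful as a quick check that a notebook still runs end to end after the dependencies change.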

Repository Structure

ciencia-de-datos/
|-- data/
|   |-- citypopulation.json
|   |-- customer_campaign_data.csv
|   |-- flower.jpg
|   |-- pl.json
|   |-- plparadigm.json
|   `-- population.csv
|-- examples/
|   `-- dataworkflow.ipynb
|-- project.md
|-- practical1.ipynb
|-- practical2.ipynb
|-- requirements.txt
|-- LICENSE
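The files in data/ come in three formats: CSV, JSON, and JPEG. The following minimal loading sketch assumes only the file extensions shown in the tree. CSV files become pandas DataFrames. JSON files are read with the standard json module, because their internal structure is not described here. The image is read into a NumPy array. load_dataset and describe_folder are hypothetical names, not part of the repository.

```python
import json
from pathlib import Path

import matplotlib.image as mpimg
import pandas as pd

SUPPORTED = {".csv", ".json", ".jpg", ".jpeg", ".png"}


def load_dataset(path):
    """Load a file from data/ based on its extension (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)                 # tabular data -> DataFrame
    if suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            return json.load(fh)                 # nested structure kept as dict/list
    if suffix in {".jpg", ".jpeg", ".png"}:
        return mpimg.imread(str(path))           # image -> array (height, width, channels)
    raise ValueError(f"unsupported file type: {path.name}")


def describe_folder(folder="data"):
    """Print the type and size of every supported file in a folder."""
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in SUPPORTED:
            continue                             # skip other files and subfolders
        obj = load_dataset(path)
        size = obj.shape if hasattr(obj, "shape") else len(obj)
        print(f"{path.name:28} {type(obj).__name__:10} {size}")


if __name__ == "__main__" and Path("data").is_dir():
    describe_folder("data")
```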

Folder Guide

  • data/: input datasets and media files used by the practical notebooks
  • examples/: additional worked notebook examples
  • project.md: course project brief on Human Activity Recognition using smartphone sensor data
  • practical1.ipynb: first practical session on visualization and clustering
  • practical2.ipynb: second practical session on supervised learning methods
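The sketch below covers the supervised-learning topics of practical2.ipynb. It fits a linear regression, compares a decision tree with a random forest, and trains an SGDClassifier incrementally with partial_fit. It is an illustration under stated assumptions, not the notebook's code. It uses synthetic regression data and scikit-learn's bundled iris dataset rather than the repository's files, and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Regression: recover y = 3x + 2 from noisy synthetic samples.
x = rng.uniform(0, 10, size=(100, 1))
y = 3 * x[:, 0] + 2 + rng.normal(0, 0.5, size=100)
reg = LinearRegression().fit(x, y)
print(f"slope={reg.coef_[0]:.2f}, intercept={reg.intercept_:.2f}")

# Classification: a single decision tree versus a random forest on iris.
X, labels = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, stratify=labels, random_state=0
)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"tree accuracy={tree.score(X_test, y_test):.2f}")
print(f"forest accuracy={forest.score(X_test, y_test):.2f}")

# Incremental learning: feed the training set in mini-batches of 20 with partial_fit.
scaler = StandardScaler().fit(X_train)
sgd = SGDClassifier(random_state=0)
classes = np.unique(labels)  # partial_fit must see every class label on the first call
for epoch in range(5):
    for start in range(0, len(X_train), 20):
        batch = slice(start, start + 20)
        sgd.partial_fit(scaler.transform(X_train[batch]), y_train[batch], classes=classes)
print(f"incremental accuracy={sgd.score(scaler.transform(X_test), y_test):.2f}")
```

partial_fit is what makes the last model incremental: each call updates the existing weights with one mini-batch instead of retraining from scratch, which helps when data arrive in chunks or do not fit in memory. For simplicity, the scaler here is fitted once on the whole training set. In a true streaming setting, StandardScaler.partial_fit could be updated batch by batch instead.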

Exercises and Difficulty

The exercises use a three-level difficulty scale:

  1. *: Easy
  2. **: Medium
  3. ***: Difficult

The practical notebooks mix easy, medium, and difficult exercises so learners can build from fundamentals toward more open-ended tasks.

Author

John Samuel

License

Code in this repository is released under the GPLv3+ license. Documentation and other associated content are released under CC BY-SA 4.0.
