CacheVector

Advancing machine learning libraries, mathematical research, and developer tools through open-source innovation

Currently Working On:

HashPrep

UNDER DEV

Think "Pandas Profiling + ESLint + AutoML", designed specifically for ML datasets.

HashPrep is a dataset debugging and preparation platform that acts as a pre-training quality assurance tool for machine learning projects. It catches critical dataset issues before they derail your ML pipeline, suggests fixes automatically, and generates production-ready cleaning code, saving hours of manual data debugging and preparation.

Smart Detection

Automatically identifies data quality issues, anomalies, and potential ML pipeline bottlenecks
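As a rough illustration of the kind of checks this covers, here is a minimal sketch in plain Python. It is not HashPrep's API, which is still under development; the function name `detect_issues`, the issue labels, and the 20% missing-value threshold are all assumptions made for the example.

```python
from collections import Counter

def detect_issues(rows, missing_threshold=0.2):
    """Return (column, issue) findings for a dataset given as a list of dicts.

    Hypothetical helper for illustration only; not part of HashPrep.
    """
    findings = []
    columns = sorted({key for row in rows for key in row})
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing.
        missing = sum(v is None or v == "" for v in values)
        if rows and missing / len(rows) > missing_threshold:
            findings.append((col, "high_missing_rate"))
        # A column with a single distinct value carries no signal for a model.
        present = {v for v in values if v is not None and v != ""}
        if len(present) <= 1:
            findings.append((col, "constant_column"))
    # Exact duplicate rows can leak between train and test splits.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    if any(n > 1 for n in counts.values()):
        findings.append(("<rows>", "duplicate_rows"))
    return findings

rows = [
    {"age": 31, "city": "Pune", "label": 1},
    {"age": None, "city": "Pune", "label": 0},
    {"age": 45, "city": "Pune", "label": 0},
    {"age": 45, "city": "Pune", "label": 0},
]
findings = detect_issues(rows)
```

Here `age` is flagged for missing values, `city` for being constant, and the last two rows as duplicates.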

Auto-Fix Suggestions

Provides intelligent recommendations and generates production-ready cleaning code
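One way to picture "findings in, cleaning code out" is a lookup from issue type to a code template. The sketch below is an assumption about how such a step could work, not a description of HashPrep itself: `FIX_TEMPLATES`, `suggest_fixes`, and the issue labels are hypothetical, and the emitted statements assume a pandas DataFrame named `df` with numeric columns wherever a median fill is suggested.

```python
# Hypothetical mapping from an issue label to a pandas statement template.
FIX_TEMPLATES = {
    # Median imputation only makes sense for numeric columns.
    "high_missing_rate": 'df["{col}"] = df["{col}"].fillna(df["{col}"].median())',
    "constant_column": 'df = df.drop(columns=["{col}"])',
    "duplicate_rows": "df = df.drop_duplicates()",
}

def suggest_fixes(findings):
    """Turn (column, issue) findings into a cleaning script, one line per finding."""
    lines = []
    for col, issue in findings:
        template = FIX_TEMPLATES.get(issue)
        if template is None:
            # Leave a visible marker rather than guessing at a fix.
            lines.append(f"# no automatic fix for {issue} on {col}")
        else:
            lines.append(template.format(col=col))
    return "\n".join(lines)

script = suggest_fixes([
    ("age", "high_missing_rate"),
    ("city", "constant_column"),
    ("<rows>", "duplicate_rows"),
    ("price", "skewed_distribution"),
])
print(script)
```

Emitting the fix as reviewable source text, rather than mutating the data directly, keeps a human in the loop before anything is applied.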

Comprehensive Profiling

Deep statistical analysis and visualization of your dataset characteristics
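For a sense of what a per-column profile might contain, here is a small standard-library sketch. The function `profile_column` and the choice of summary fields are assumptions for illustration; an actual profiler would go much further (distributions, correlations, plots).

```python
import statistics

def profile_column(values):
    """Summarize one column; None marks a missing value. Illustrative only."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    # Add numeric statistics only when every present value is a number
    # (bool is excluded even though it is a subclass of int).
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(present):
        summary["mean"] = statistics.fmean(numeric)
        summary["median"] = statistics.median(numeric)
        summary["min"] = min(numeric)
        summary["max"] = max(numeric)
    return summary

age_profile = profile_column([2, 4, None, 6])
city_profile = profile_column(["Pune", "Pune", None])
```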

Pipeline Integration

Seamlessly integrates into existing ML workflows and CI/CD pipelines
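In a CI/CD setting, a dataset check usually boils down to an exit code: zero lets the pipeline continue, non-zero stops it. The sketch below shows that idea under stated assumptions; `ci_exit_code`, the issue labels, and the default set of blocking issues are hypothetical and do not reflect a published HashPrep command or configuration.

```python
import sys

# Assumed policy: these issue labels block the pipeline, others only warn.
DEFAULT_BLOCKING = frozenset({"duplicate_rows", "high_missing_rate"})

def ci_exit_code(findings, fail_on=DEFAULT_BLOCKING):
    """Return 1 if any (column, issue) finding is blocking, else 0."""
    blocking = [(col, issue) for col, issue in findings if issue in fail_on]
    for col, issue in blocking:
        # Write to stderr so the message shows up in the CI job log.
        print(f"BLOCKING {issue}: {col}", file=sys.stderr)
    return 1 if blocking else 0

warn_only = ci_exit_code([("city", "constant_column")])
blocked = ci_exit_code([("age", "high_missing_rate"), ("city", "constant_column")])
```

A CI job would pass the returned value to `sys.exit(...)` so that a blocking finding fails the build.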

Our Mission

At CacheVector, we're building the invisible parts that matter.

Work That Lasts

We focus on the parts most people skip. Math, algorithms, and libraries that aren’t glamorous, but make everything else possible.

Open by Default

Everything we do is open-source. We value clarity over polish, and we share our work so others can build on it. If it’s useful, it belongs in the open.

Math, Models, Tools

We work on core mathematics, train and test ML and DL models, and create libraries that improve performance. The goal is always to make complex ideas usable in practice.

Quiet Progress

We don’t chase hype or buzzwords. We publish, package, and share work that people can depend on. Quiet progress matters more than loud promises.

Our projects are open, simple, and built to be useful. If something helps you, star it. If you can improve it, fork it. Even just exploring the repos means a lot; the work is meant to be shared.