GitHub for Smell

A version-controlled database for the world's olfactory data is becoming possible. Here's the business case for a 'GitHub for smells'.

A version-controlled system where olfactory profiles are layered and forked, allowing new scent combinations to emerge from a collaborative foundation.

⚡ The Signal

For decades, digitizing scent has been more science fiction than reality. But two trends are changing the landscape. First, the technology to capture and analyze molecular scent profiles is moving out of expensive, monolithic labs and becoming more modular. Second, a cultural and scientific movement is underway to preserve our olfactory heritage, from cataloging the scents of endangered plants to recreating the aromas of historical artifacts. The age of digital olfaction is arriving.

🚧 The Problem

Scent data today is a mess. It's locked away in proprietary, internal databases at giant fragrance houses like Givaudan. It’s listed in static, non-computational directories like The Good Scents Company. Or it’s fragmented across countless university labs in incompatible formats. There is no central, standardized platform for researchers, perfumers, or historians to upload, version, analyze, and collaborate on olfactory data. We have a 'lingua franca' for images (JPEG) and documents (PDF), but nothing for smell.

🚀 The Solution

Enter Scentgraph: a "GitHub for Smell." It’s a version-controlled, collaborative database for digital scent profiles. Scentgraph allows perfumers, scientists, and historians to upload raw data from lab equipment, convert it into a standardized "smell-fingerprint," and compare it against a global library of scents. Users can track formula changes, fork existing scent profiles to create new variations, and discover molecular similarities between seemingly unrelated aromas. It’s a foundational layer for the future of scent creation and preservation.
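To make the "version and fork" idea concrete, here is a minimal sketch of how a scent profile could be versioned in the way Git versions files. Everything in it is an assumption for illustration: the `ProfileVersion` class, the `fork` and `diff` helpers, and the component proportions are hypothetical, not part of any existing Scentgraph design.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProfileVersion:
    """One immutable snapshot of a scent profile (hypothetical structure)."""
    components: dict          # compound name -> relative proportion
    message: str              # human-readable note, like a commit message
    parent: Optional[str] = None  # version_id of the snapshot this derives from

    @property
    def version_id(self) -> str:
        # Content-addressed ID: the same content always yields the same ID.
        payload = json.dumps(
            {"components": self.components, "message": self.message,
             "parent": self.parent},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def fork(base: ProfileVersion, changes: dict, message: str) -> ProfileVersion:
    """Create a new version from `base`; a value of None removes a component."""
    merged = {**base.components, **changes}
    merged = {name: amount for name, amount in merged.items() if amount is not None}
    return ProfileVersion(merged, message, parent=base.version_id)


def diff(old: ProfileVersion, new: ProfileVersion) -> dict:
    """Map each changed component to its (old, new) proportion."""
    names = sorted(set(old.components) | set(new.components))
    return {
        name: (old.components.get(name), new.components.get(name))
        for name in names
        if old.components.get(name) != new.components.get(name)
    }


# Illustrative proportions only; not a real formula.
base = ProfileVersion({"linalool": 0.5, "limonene": 0.3, "geraniol": 0.2},
                      "initial floral accord")
variant = fork(base, {"limonene": 0.1, "vanillin": 0.2}, "softer, sweeter variant")
changes = diff(base, variant)
```

Because each version records its parent's ID, the full history of a formula can be walked backwards, and two forks of the same base can be compared component by component.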

💰 The Business Case

Revenue Model

Scentgraph will operate on a tiered SaaS model.

  • Pro Tier: A monthly subscription ($25/mo) for individual researchers and perfumers, offering private repositories, advanced analytical tools, and version history for proprietary formulas.
  • API Tier: A usage-based, metered API for developers and businesses to programmatically query the database, find similar scent profiles, or build applications on top of the data.
  • Enterprise Tier: A high-touch service for large R&D labs, providing dedicated instances, on-premise deployment, and advanced security for sensitive data.

Go-To-Market

The initial push is about building a community and providing immediate value.

  • Free Tool: A "Scent Visualizer" web app that converts standard GC-MS data files into a beautiful, shareable "smell-fingerprint" visualization, acting as a powerful lead magnet.
  • Open Source Library: Release a Python library (‘aroma-parser’) to standardize data from common e-noses and lab equipment, building credibility with the scientific community.
  • Programmatic SEO: Create a public, indexed page for every known chemical compound, capturing long-tail search traffic from researchers and students looking for olfactory data.

⚔️ The Moat

The biggest players (Givaudan, IFF) have massive internal databases, but they are closed ecosystems. Tools like OpenChrom are for analysis, not collaboration. Scentgraph’s advantage comes from data accumulation and network effects. Each uploaded scent profile gives the platform's analytical models more to learn from and widens the library that similarity search can draw on. The result is a proprietary data moat that grows harder for competitors to replicate with every contribution, attracting more users who contribute more data in a virtuous cycle.

⏳ Why Now

The timing is right because the enabling technologies and cultural demand are converging. The push to digitally preserve the natural world, as seen in efforts to catalog the scents of endangered flowers, provides a strong market pull. At the same time, the trend toward hyper-specific, miniaturized sensors, like those used to study wasps' social behavior, shows that the hardware for capturing complex data is becoming more accessible. The result is a new class of sophisticated prosumers and researchers who need a platform to manage the data they can now collect, and who are starting the kind of community discussion that tends to precede a market shift.

🛠️ Builder's Corner

This is just one way to build it, but here's a recommended MVP stack.

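A Python backend using FastAPI can handle the ingestion of scent data files (like CSVs or JCAMP-DX). Use Pandas for initial parsing and cleaning. The core step is dimensionality reduction on the molecular data with a library like scikit-learn, creating a vector "smell-fingerprint." PCA is the better fit for the stored fingerprint, because a fitted model can project new uploads into the same space. scikit-learn's t-SNE cannot transform unseen samples, so it is better reserved for visualization.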
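As a rough illustration of the fingerprinting step, the sketch below bins each chromatogram into a fixed-length vector and then reduces it with scikit-learn's PCA. The bin count, retention-time window, output dimension, and the synthetic data generator are all assumptions made for the example. A real pipeline would feed in parsed instrument files instead of fabricated peaks.

```python
import numpy as np
from sklearn.decomposition import PCA

N_BINS = 64             # hypothetical fingerprint resolution before reduction
RT_RANGE = (0.0, 30.0)  # assumed retention-time window, in minutes


def bin_chromatogram(retention_times, intensities):
    """Sum intensities into fixed-width retention-time bins, then L2-normalize."""
    binned, _ = np.histogram(
        retention_times, bins=N_BINS, range=RT_RANGE, weights=intensities
    )
    norm = np.linalg.norm(binned)
    return binned / norm if norm > 0 else binned


def synthetic_run(rng, peak_centers):
    """Fabricated stand-in for one parsed run: a few peaks plus low-level noise."""
    rt = np.linspace(RT_RANGE[0], RT_RANGE[1], 600, endpoint=False)
    signal = sum(np.exp(-0.5 * ((rt - c) / 0.3) ** 2) for c in peak_centers)
    return rt, signal + rng.uniform(0.0, 0.02, size=rt.size)


rng = np.random.default_rng(0)
families = [(5.0, 12.0), (8.0, 20.0), (15.0, 25.0)]  # made-up peak positions
runs = [
    synthetic_run(rng, np.array(centers) + rng.normal(0.0, 0.1, size=2))
    for centers in families
    for _ in range(10)
]

# One row per sample, one column per retention-time bin.
X = np.vstack([bin_chromatogram(rt, y) for rt, y in runs])

# Fit once on the library, then reuse the fitted model for new uploads.
pca = PCA(n_components=8)
fingerprints = pca.fit_transform(X)

new_rt, new_y = synthetic_run(rng, np.array([5.1, 11.9]))
new_fp = pca.transform(bin_chromatogram(new_rt, new_y).reshape(1, -1))
```

The fitted `pca` object is what keeps fingerprints comparable over time: every new upload goes through the same projection, so its vector lands in the same space as the existing library.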

Store these vectors and all associated metadata in a PostgreSQL database with the pgvector extension enabled. pgvector supports exact k-NN search as well as approximate indexes (HNSW and IVFFlat), so similarity queries ("find smells like this one") can stay fast as the library grows. A Next.js frontend deployed on Vercel can provide the user dashboard, data upload interface, and interactive visualizations of the scent fingerprints.
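For intuition, a "find smells like this one" query boils down to ranking stored vectors by distance to the query vector. The sketch below does this in plain Python with cosine distance, which is the metric pgvector exposes through its `<=>` operator. The scent labels and three-dimensional vectors are toy values invented for the example, and the table and column names in the SQL comment are hypothetical.

```python
import math

# Roughly equivalent pgvector query (table and column names are hypothetical):
#   SELECT name FROM scent_profiles ORDER BY fingerprint <=> %s LIMIT 2;


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, library, k=3):
    """Return the k (name, distance) pairs closest to `query`, nearest first."""
    scored = sorted(
        (cosine_distance(query, vector), name) for name, vector in library.items()
    )
    return [(name, distance) for distance, name in scored[:k]]


# Toy fingerprints; real ones would come from the reduction step.
library = {
    "rose": [0.9, 0.1, 0.0],
    "geranium": [0.8, 0.2, 0.1],
    "lemon": [0.1, 0.9, 0.0],
    "bergamot": [0.2, 0.8, 0.1],
    "cedar": [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]
matches = nearest(query, library, k=2)
for name, distance in matches:
    print(f"{name}: {distance:.4f}")
```

This brute-force scan compares the query against every stored vector, which is fine for small libraries. An approximate index trades a little recall for much lower latency once the library gets large.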


Legal Disclaimer: GammaVibe is provided for inspiration only. The ideas and names suggested have not been vetted for viability, legality, or intellectual property infringement (including patents and trademarks). This is not financial or legal advice. Always perform your own due diligence and clearance searches before executing on any concept.