How It Works

This page takes a closer look at the indexing process, for users who want to understand what happens under the hood.

Overview

Indexing is the process by which Supercharger converts your post content into vector embeddings — numerical representations of meaning stored in your database. These embeddings are what power the semantic search behind the AI Recommendations module and any other feature that depends on content similarity.
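To make "content similarity" concrete, here is a minimal sketch of how two embeddings can be compared. It assumes cosine similarity as the metric, which is a common choice for embedding vectors; this page does not state which metric Supercharger actually uses, and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], candidates: dict[int, list[float]]):
    """Sort (post_id, score) pairs from most to least similar to the query."""
    scored = [(post_id, cosine_similarity(query, vec)) for post_id, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A score close to 1 means the two pieces of content point in nearly the same direction in embedding space, which is how related posts can be found without sharing exact keywords.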

The Indexing Pipeline

When a post is indexed, Supercharger performs the following steps:

1. Content Extraction

The post’s content is extracted and cleaned. HTML tags, shortcodes, and other non-content markup are stripped, so that only the readable text is passed on to the next step.
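The cleaning step can be sketched roughly as follows. This is an illustration only, not the plugin's actual code: the shortcode pattern, the list of block-level tags, and the function name extract_text are all assumptions. In this sketch, text wrapped by an enclosing shortcode is kept and only the bracketed tags themselves are removed.

```python
import re
from html.parser import HTMLParser

# Assumed pattern: matches [shortcode attr="x"] and [/shortcode], but not "[1]".
SHORTCODE_RE = re.compile(r"\[/?[A-Za-z][\w-]*[^\]]*\]")
# Tags that should act as word boundaries when removed (illustrative subset).
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "h1", "h2", "h3", "h4", "blockquote", "td", "tr"}


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(post_content: str) -> str:
    """Strip shortcodes and HTML, then collapse runs of whitespace."""
    without_shortcodes = SHORTCODE_RE.sub(" ", post_content)
    collector = _TextCollector()
    collector.feed(without_shortcodes)
    collector.close()
    return " ".join("".join(collector.parts).split())
```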

2. Chunking

Long posts are divided into smaller sections called chunks. Each chunk is approximately 6,000 tokens. This is necessary because embedding models limit how much text they accept in a single input (8,191 tokens for the model used in the next step), and chunking allows long articles to be represented in full rather than truncated.
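A simplified sketch of chunking is shown below. It assumes whitespace-separated words as a rough stand-in for tokens; real token counts come from the model's tokenizer and will differ, and the plugin may also split on sentence or paragraph boundaries. The function name chunk_text is hypothetical.

```python
def chunk_text(text: str, max_tokens: int = 6000) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens "tokens".

    Assumption: one whitespace-separated word counts as one token.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]
```

Under this approximation, a 13,000-word article yields three chunks: two full ones and a shorter remainder.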

3. Embedding Generation

Each chunk is sent to the OpenAI Embeddings API (text-embedding-3-large). The API returns a high-dimensional vector (a list of numbers, 3,072 of them by default for this model) that encodes the semantic meaning of that chunk.
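For readers curious what such a request looks like, the sketch below builds a call to the public OpenAI embeddings endpoint using only the Python standard library. The endpoint URL, the model name, and the response shape follow OpenAI's documented API; the function names, the timeout, and the way the key is passed in are assumptions for illustration and do not reflect how the plugin itself is written.

```python
import json
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"
EMBEDDING_MODEL = "text-embedding-3-large"


def build_embedding_request(chunks: list[str], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that embeds every chunk at once."""
    payload = json.dumps({"model": EMBEDDING_MODEL, "input": chunks}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embedding_response(body: bytes) -> list[list[float]]:
    """Return one vector per chunk, ordered by the index field of each item."""
    items = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda item: item["index"])]


def embed_chunks(chunks: list[str], api_key: str) -> list[list[float]]:
    """Send the request and return the vectors (requires network access and a valid key)."""
    request = build_embedding_request(chunks, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_embedding_response(response.read())
```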

4. Storage

The vectors are stored in the wp_supercharger_vectors database table, associated with the post ID. A content hash is also stored so Supercharger can detect when the post has changed and needs re-indexing.

When Re-indexing Occurs

Supercharger automatically re-indexes a post when:

  • The post content is updated and saved
  • The post’s content hash no longer matches the stored hash
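The hash comparison behind this check can be sketched in a few lines. SHA-256 is an assumption here, since this page does not name the hash algorithm Supercharger uses, and both function names are hypothetical.

```python
import hashlib
from typing import Optional


def content_hash(text: str) -> str:
    """Return a hex digest of the cleaned post text (SHA-256 assumed)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reindex(current_text: str, stored_hash: Optional[str]) -> bool:
    """True if the post was never indexed or its text changed since indexing."""
    return stored_hash is None or content_hash(current_text) != stored_hash
```

Because the hash is computed from the content itself, saving a post without changing its text produces the same hash, and in this sketch no new embeddings would be requested.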

You can also manually trigger re-indexing at any time from:

  • The Supercharger sidebar in Gutenberg (single post)
  • Supercharger → Tools (individual post by ID or full re-index)

Cron and Background Processing

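Indexing runs via WordPress cron. The plugin schedules a recurring cron job that processes the indexing queue in batches. WordPress cron is only triggered when someone visits the site, so on low-traffic sites scheduled jobs can be delayed or skipped. If that is the case on your server, you may want to configure a real server-side cron job that requests wp-cron.php on a fixed schedule, optionally disabling the built-in trigger by setting DISABLE_WP_CRON in wp-config.php.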

You can also manually trigger a cron run from Supercharger → Tools → Trigger Indexing Cron.
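Batch processing of a queue, as described above, can be pictured with the sketch below. The batch size of 10, the in-memory queue, and the index_post callback are all hypothetical; the plugin's real queue and batch size are not documented on this page.

```python
from collections import deque
from typing import Callable


def process_queue_batch(
    queue: deque,
    index_post: Callable[[int], None],
    batch_size: int = 10,
) -> list[int]:
    """Index up to batch_size posts from the front of the queue.

    Returns the IDs handled in this run; anything left in the queue
    waits for the next cron run.
    """
    processed = []
    while queue and len(processed) < batch_size:
        post_id = queue.popleft()
        index_post(post_id)
        processed.append(post_id)
    return processed
```

Each cron run handles one batch, so a large backlog is worked through over several runs rather than in one long request.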

Database Storage

Embeddings are stored in your WordPress database in the wp_supercharger_vectors table. Each row contains:

  • post_id — The WordPress post ID
  • content_hash — A hash of the post content (used to detect changes)
  • embedding — The vector data
  • model — The embedding model used
  • language — The content language
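To illustrate the row layout, the sketch below recreates a table with the same column names in an in-memory SQLite database. This is a stand-in only: the real table lives in your WordPress (MySQL or MariaDB) database, and the column types, the JSON encoding of the vector, and the one-row-per-chunk layout are assumptions rather than the plugin's actual schema.

```python
import json
import sqlite3


def create_vectors_table(conn: sqlite3.Connection) -> None:
    """Create a table mirroring the documented columns (types are assumed)."""
    conn.execute(
        """
        CREATE TABLE wp_supercharger_vectors (
            post_id      INTEGER NOT NULL,
            content_hash TEXT    NOT NULL,
            embedding    TEXT    NOT NULL,  -- vector serialized as JSON (assumption)
            model        TEXT    NOT NULL,
            language     TEXT    NOT NULL
        )
        """
    )


def store_embedding(conn, post_id, content_hash, embedding, model, language) -> None:
    """Insert one vector row for a post."""
    conn.execute(
        "INSERT INTO wp_supercharger_vectors VALUES (?, ?, ?, ?, ?)",
        (post_id, content_hash, json.dumps(embedding), model, language),
    )


def load_embeddings(conn, post_id) -> list[list[float]]:
    """Return all stored vectors for a post, in insertion order."""
    rows = conn.execute(
        "SELECT embedding FROM wp_supercharger_vectors WHERE post_id = ? ORDER BY rowid",
        (post_id,),
    ).fetchall()
    return [json.loads(row[0]) for row in rows]
```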

The table can grow large on sites with thousands of posts or many long articles, because every chunk of every indexed post adds its own vector. This is expected behavior. You can check the table size from Supercharger → Tools → Check Index Health.