Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.ondb.ai/llms.txt

Use this file to discover all available pages before exploring further.

Sharding partitions large collections into smaller segments (shards) based on field values. Instead of scanning an entire collection, queries target only the relevant shards — reducing read latency and improving throughput for high-volume datasets. For example, an orderbook collection with 50 markets and hourly snapshots can be sharded so that a query for market=SUI, hour=12 scans a single shard (~600KB) instead of the full dataset (~30GB).

Shard Key Types

OnDB supports three sharding strategies that can be combined in a hierarchy:
TypeDescriptionUse Case
discreteExact value partitioning. Each unique value gets its own shard.Markets, regions, tenant IDs
time_rangeTime-based bucketing with configurable granularity (minute, hour, day, week, month).Logs, events, time-series data
hash_distributedEven distribution across a fixed number of buckets via hashing.High-cardinality fields, write-heavy workloads

Setting Up Sharding

syncCollection creates or updates a collection’s indexes and sharding configuration in a single call. This is the recommended approach for new collections.
import { createClient } from '@ondb/sdk';

const client = createClient({
  endpoint: 'https://api.ondb.io',
  appId: 'my-app',
  appKey: 'your-app-key',
});

await client.syncCollection({
  name: 'orderbook_snapshots',
  fields: {
    market: { type: 'string', index: true },
    timestamp: { type: 'number', index: true },
    mid_price: { type: 'number' },
    spread: { type: 'number' },
  },
  sharding: {
    keys: [
      { field: 'market', type: 'discrete' },
      { field: 'timestamp', type: 'time_range', granularity: 'hour' },
    ],
    enforce_in_queries: true,
    max_shard_size: 50_000_000, // 50MB per shard
  },
});
Method signature:
async syncCollection(
  schema: SimpleCollectionSchemaWithSharding
): Promise<SyncCollectionResult>
The result includes a sharding_configured field that confirms whether sharding was applied.

With setupSharding (Existing Collection)

Use setupSharding to add sharding to a collection that already exists and has data.
await client.setupSharding('orderbook_snapshots', {
  keys: [
    { field: 'market', type: 'discrete' },
    { field: 'timestamp', type: 'time_range', granularity: 'hour' },
  ],
  enforce_in_queries: true,
});
Method signature:
async setupSharding(
  collection: string,
  sharding: ShardingStrategy
): Promise<void>

Shard Key Configuration

ShardingStrategy

interface ShardingStrategy {
  /** Ordered list of shard keys -- each adds a level to the hierarchy */
  keys: ShardKey[];

  /** If true, queries must include all shard key fields */
  enforce_in_queries: boolean;

  /** Max bytes per shard (optional) */
  max_shard_size?: number;
}

ShardKey

interface ShardKey {
  /** Field name to shard by (must be an indexed field) */
  field: string;

  /** Sharding type */
  type: 'discrete' | 'time_range' | 'hash_distributed';

  /** Required for time_range: 'minute' | 'hour' | 'day' | 'week' | 'month' */
  granularity?: 'minute' | 'hour' | 'day' | 'week' | 'month';

  /** Required for hash_distributed: number of hash buckets */
  num_buckets?: number;
}

Examples

Time-Series Data

Partition event logs by day for efficient date-range queries:
await client.syncCollection({
  name: 'system_events',
  fields: {
    event_type: { type: 'string', index: true },
    timestamp: { type: 'date', index: true },
    severity: { type: 'string', index: true },
    message: { type: 'string' },
  },
  sharding: {
    keys: [
      { field: 'timestamp', type: 'time_range', granularity: 'hour' },
    ],
    enforce_in_queries: true,
  },
});

Multi-Tenant Data

Isolate tenant data into discrete shards:
await client.syncCollection({
  name: 'user_activity',
  fields: {
    tenant_id: { type: 'string', index: true },
    user_id: { type: 'string', index: true },
    action: { type: 'string', index: true },
    timestamp: { type: 'date', index: true },
  },
  sharding: {
    keys: [
      { field: 'tenant_id', type: 'discrete' },
    ],
    enforce_in_queries: true,
  },
});

High-Volume Writes

Distribute writes evenly across hash buckets to avoid hotspots:
await client.syncCollection({
  name: 'telemetry',
  fields: {
    device_id: { type: 'string', index: true },
    reading: { type: 'number' },
    timestamp: { type: 'date', index: true },
  },
  sharding: {
    keys: [
      { field: 'device_id', type: 'hash_distributed', num_buckets: 64 },
    ],
    enforce_in_queries: false,
  },
});

Query Considerations

When enforce_in_queries is set to true, every query against the collection must include all shard key fields. This prevents full-collection scans and ensures queries hit only the relevant shards.
// With enforce_in_queries: true and keys [market, timestamp]
// This query targets a single shard:
const results = await client.queryBuilder()
  .collection('orderbook_snapshots')
  .whereField('market').equals('SUI')
  .whereField('timestamp').greaterThanOrEqual(1705305600)
  .whereField('timestamp').lessThan(1705309200)
  .execute();

// Omitting a shard key field will return an error
// when enforce_in_queries is enabled.
If you need to run occasional cross-shard queries (e.g., analytics), set enforce_in_queries: false. Be aware that queries without shard key filters will scan all shards.

Next Steps

Collections & Indexes

Schema design and index configuration

Query Builder

Fluent query API for filtering and sorting