Sharding partitions large collections into smaller segments (shards) based on field values. Instead of scanning an entire collection, queries target only the relevant shards — reducing read latency and improving throughput for high-volume datasets. For example, an orderbook collection with 50 markets and hourly snapshots can be sharded so that a query for market=SUI, hour=12 scans a single shard (~600KB) instead of the full dataset (~30GB).
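The targeting idea can be sketched in a few lines. This is an illustration only, not OnDB's internal shard layout: the function below shows how a hierarchy of keys (here a discrete market plus an hourly time bucket) maps each document to one shard path, so a query that pins both fields reads exactly one shard.

```typescript
// Illustrative sketch of shard targeting (not the OnDB implementation):
// each document's shard key values determine a single shard path.

// Hypothetical path derivation for keys
// [market (discrete), timestamp (time_range, granularity: 'hour')].
function shardPath(market: string, timestampMs: number): string {
  const hourBucket = Math.floor(timestampMs / 3_600_000); // hours since epoch
  return `${market}/${hourBucket}`;
}

// Documents written with the same market and hour land in the same shard,
// so a query filtered on both fields touches only that one shard.
```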

Shard Key Types

OnDB supports three sharding strategies that can be combined in a hierarchy:
| Type | Description | Use Case |
| --- | --- | --- |
| discrete | Exact-value partitioning. Each unique value gets its own shard. | Markets, regions, tenant IDs |
| time_range | Time-based bucketing with configurable granularity (minute, hour, day, week, month). | Logs, events, time-series data |
| hash_distributed | Even distribution across a fixed number of buckets via hashing. | High-cardinality fields, write-heavy workloads |

Setting Up Sharding

With syncCollection (New Collection)

syncCollection creates or updates a collection’s indexes and sharding configuration in a single call. This is the recommended approach for new collections.
import { createClient } from '@ondb/sdk';

const client = createClient({
  endpoint: 'https://api.ondb.io',
  appId: 'my-app',
  appKey: 'your-app-key',
});

await client.syncCollection({
  name: 'orderbook_snapshots',
  fields: {
    market: { type: 'string', index: true },
    timestamp: { type: 'number', index: true },
    mid_price: { type: 'number' },
    spread: { type: 'number' },
  },
  sharding: {
    keys: [
      { field: 'market', type: 'discrete' },
      { field: 'timestamp', type: 'time_range', granularity: 'hour' },
    ],
    enforce_in_queries: true,
    max_shard_size: 50_000_000, // 50MB per shard
  },
});
Method signature:
async syncCollection(
  schema: SimpleCollectionSchemaWithSharding
): Promise<SyncCollectionResult>
The result includes a sharding_configured field that confirms whether sharding was applied.
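A quick guard on that result might look like the following. Note that sharding_configured is the only field named above; the rest of the result shape is assumed here for illustration.

```typescript
// sharding_configured comes from the docs above; any other fields on
// SyncCollectionResult are omitted here for brevity.
interface SyncCollectionResult {
  sharding_configured: boolean;
}

// Fail fast if sharding was silently skipped (e.g. a shard key field
// that is not indexed).
function assertShardingApplied(result: SyncCollectionResult): void {
  if (!result.sharding_configured) {
    throw new Error('sharding not applied; check that shard key fields are indexed');
  }
}
```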

With setupSharding (Existing Collection)

Use setupSharding to add sharding to a collection that already exists and has data.
await client.setupSharding('orderbook_snapshots', {
  keys: [
    { field: 'market', type: 'discrete' },
    { field: 'timestamp', type: 'time_range', granularity: 'hour' },
  ],
  enforce_in_queries: true,
});
Method signature:
async setupSharding(
  collection: string,
  sharding: ShardingStrategy
): Promise<void>

Shard Key Configuration

ShardingStrategy

interface ShardingStrategy {
  /** Ordered list of shard keys -- each adds a level to the hierarchy */
  keys: ShardKey[];

  /** If true, queries must include all shard key fields */
  enforce_in_queries: boolean;

  /** Max bytes per shard (optional) */
  max_shard_size?: number;
}

ShardKey

interface ShardKey {
  /** Field name to shard by (must be an indexed field) */
  field: string;

  /** Sharding type */
  type: 'discrete' | 'time_range' | 'hash_distributed';

  /** Required for time_range: 'minute' | 'hour' | 'day' | 'week' | 'month' */
  granularity?: 'minute' | 'hour' | 'day' | 'week' | 'month';

  /** Required for hash_distributed: number of hash buckets */
  num_buckets?: number;
}
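For hash_distributed keys, bucket assignment is conceptually hash(value) % num_buckets. OnDB's actual hash function is internal; the FNV-1a variant below is only a stand-in to show the property that matters: a value always maps to the same bucket, and values spread roughly evenly across buckets.

```typescript
// Illustrative bucket assignment for hash_distributed keys.
// FNV-1a is a stand-in; OnDB's internal hash function may differ.
function fnv1a(value: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep unsigned 32-bit
  }
  return hash;
}

function bucketFor(value: string, numBuckets: number): number {
  return fnv1a(value) % numBuckets;
}
```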

Examples

Time-Series Data

Partition event logs by day for efficient date-range queries:
await client.syncCollection({
  name: 'system_events',
  fields: {
    event_type: { type: 'string', index: true },
    timestamp: { type: 'date', index: true },
    severity: { type: 'string', index: true },
    message: { type: 'string' },
  },
  sharding: {
    keys: [
      { field: 'timestamp', type: 'time_range', granularity: 'hour' },
    ],
    enforce_in_queries: true,
  },
});
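Conceptually, time_range bucketing truncates each timestamp to the configured granularity, and documents in the same bucket share a shard. The sketch below illustrates this in UTC for three of the five granularities (week and month omitted for brevity); the bucket labels are illustrative, not OnDB's internal format.

```typescript
// Illustrative time_range bucketing: truncate a timestamp to the
// configured granularity. Labels here are ISO-8601 prefixes in UTC.
type Granularity = 'minute' | 'hour' | 'day';

function timeBucket(date: Date, granularity: Granularity): string {
  const iso = date.toISOString(); // e.g. '2024-01-15T08:30:12.000Z'
  switch (granularity) {
    case 'minute':
      return iso.slice(0, 16); // '2024-01-15T08:30'
    case 'hour':
      return iso.slice(0, 13); // '2024-01-15T08'
    case 'day':
      return iso.slice(0, 10); // '2024-01-15'
  }
}
```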

Multi-Tenant Data

Isolate tenant data into discrete shards:
await client.syncCollection({
  name: 'user_activity',
  fields: {
    tenant_id: { type: 'string', index: true },
    user_id: { type: 'string', index: true },
    action: { type: 'string', index: true },
    timestamp: { type: 'date', index: true },
  },
  sharding: {
    keys: [
      { field: 'tenant_id', type: 'discrete' },
    ],
    enforce_in_queries: true,
  },
});

High-Volume Writes

Distribute writes evenly across hash buckets to avoid hotspots:
await client.syncCollection({
  name: 'telemetry',
  fields: {
    device_id: { type: 'string', index: true },
    reading: { type: 'number' },
    timestamp: { type: 'date', index: true },
  },
  sharding: {
    keys: [
      { field: 'device_id', type: 'hash_distributed', num_buckets: 64 },
    ],
    enforce_in_queries: false,
  },
});

Query Considerations

When enforce_in_queries is set to true, every query against the collection must include all shard key fields. This prevents full-collection scans and ensures queries hit only the relevant shards.
// With enforce_in_queries: true and keys [market, timestamp]
// This query targets a single shard:
const results = await client.queryBuilder()
  .collection('orderbook_snapshots')
  .whereField('market').equals('SUI')
  .whereField('timestamp').greaterThanOrEqual(1705305600)
  .whereField('timestamp').lessThan(1705309200)
  .execute();

// Omitting a shard key field will return an error
// when enforce_in_queries is enabled.
If you need to run occasional cross-shard queries (e.g., analytics), set enforce_in_queries: false. Be aware that queries without shard key filters will scan all shards.
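The cost difference is multiplicative across the shard key hierarchy. Using the orderbook numbers from the introduction (50 markets, hourly buckets) as a back-of-the-envelope estimate:

```typescript
// Rough shard-count estimate for the orderbook example from the intro:
// shards scanned = markets matched x hour buckets matched.
function shardsScanned(marketsMatched: number, hourBuckets: number): number {
  return marketsMatched * hourBuckets;
}

const pinned = shardsScanned(1, 1);       // all shard keys specified
const crossShard = shardsScanned(50, 24); // no market filter, one day
// 1 shard vs 1200 shards: this gap is what enforce_in_queries guards against.
```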

Next Steps