Documentation Index
Fetch the complete documentation index at: https://docs.ondb.ai/llms.txt
Use this file to discover all available pages before exploring further.
Sharding partitions large collections into smaller segments (shards) based on field values. Instead of scanning an entire collection, queries target only the relevant shards — reducing read latency and improving throughput for high-volume datasets.
For example, an orderbook collection with 50 markets and hourly snapshots can be sharded so that a query for market=SUI, hour=12 scans a single shard (~600KB) instead of the full dataset (~30GB).
Shard Key Types
OnDB supports three sharding strategies that can be combined in a hierarchy:
| Type | Description | Use Case |
|---|
discrete | Exact value partitioning. Each unique value gets its own shard. | Markets, regions, tenant IDs |
time_range | Time-based bucketing with configurable granularity (minute, hour, day, week, month). | Logs, events, time-series data |
hash_distributed | Even distribution across a fixed number of buckets via hashing. | High-cardinality fields, write-heavy workloads |
Setting Up Sharding
With syncCollection (Recommended)
syncCollection creates or updates a collection’s indexes and sharding configuration in a single call. This is the recommended approach for new collections.
import { createClient } from '@ondb/sdk';
const client = createClient({
endpoint: 'https://api.ondb.io',
appId: 'my-app',
appKey: 'your-app-key',
});
await client.syncCollection({
name: 'orderbook_snapshots',
fields: {
market: { type: 'string', index: true },
timestamp: { type: 'number', index: true },
mid_price: { type: 'number' },
spread: { type: 'number' },
},
sharding: {
keys: [
{ field: 'market', type: 'discrete' },
{ field: 'timestamp', type: 'time_range', granularity: 'hour' },
],
enforce_in_queries: true,
max_shard_size: 50_000_000, // 50MB per shard
},
});
Method signature:
async syncCollection(
schema: SimpleCollectionSchemaWithSharding
): Promise<SyncCollectionResult>
The result includes a sharding_configured field that confirms whether sharding was applied.
With setupSharding (Existing Collection)
Use setupSharding to add sharding to a collection that already exists and has data.
await client.setupSharding('orderbook_snapshots', {
keys: [
{ field: 'market', type: 'discrete' },
{ field: 'timestamp', type: 'time_range', granularity: 'hour' },
],
enforce_in_queries: true,
});
Method signature:
async setupSharding(
collection: string,
sharding: ShardingStrategy
): Promise<void>
Shard Key Configuration
ShardingStrategy
interface ShardingStrategy {
/** Ordered list of shard keys -- each adds a level to the hierarchy */
keys: ShardKey[];
/** If true, queries must include all shard key fields */
enforce_in_queries: boolean;
/** Max bytes per shard (optional) */
max_shard_size?: number;
}
ShardKey
interface ShardKey {
/** Field name to shard by (must be an indexed field) */
field: string;
/** Sharding type */
type: 'discrete' | 'time_range' | 'hash_distributed';
/** Required for time_range: 'minute' | 'hour' | 'day' | 'week' | 'month' */
granularity?: 'minute' | 'hour' | 'day' | 'week' | 'month';
/** Required for hash_distributed: number of hash buckets */
num_buckets?: number;
}
Examples
Time-Series Data
Partition event logs by day for efficient date-range queries:
await client.syncCollection({
name: 'system_events',
fields: {
event_type: { type: 'string', index: true },
timestamp: { type: 'date', index: true },
severity: { type: 'string', index: true },
message: { type: 'string' },
},
sharding: {
keys: [
{ field: 'timestamp', type: 'time_range', granularity: 'hour' },
],
enforce_in_queries: true,
},
});
Multi-Tenant Data
Isolate tenant data into discrete shards:
await client.syncCollection({
name: 'user_activity',
fields: {
tenant_id: { type: 'string', index: true },
user_id: { type: 'string', index: true },
action: { type: 'string', index: true },
timestamp: { type: 'date', index: true },
},
sharding: {
keys: [
{ field: 'tenant_id', type: 'discrete' },
],
enforce_in_queries: true,
},
});
High-Volume Writes
Distribute writes evenly across hash buckets to avoid hotspots:
await client.syncCollection({
name: 'telemetry',
fields: {
device_id: { type: 'string', index: true },
reading: { type: 'number' },
timestamp: { type: 'date', index: true },
},
sharding: {
keys: [
{ field: 'device_id', type: 'hash_distributed', num_buckets: 64 },
],
enforce_in_queries: false,
},
});
Query Considerations
When enforce_in_queries is set to true, every query against the collection must include all shard key fields. This prevents full-collection scans and ensures queries hit only the relevant shards.
// With enforce_in_queries: true and keys [market, timestamp]
// This query targets a single shard:
const results = await client.queryBuilder()
.collection('orderbook_snapshots')
.whereField('market').equals('SUI')
.whereField('timestamp').greaterThanOrEqual(1705305600)
.whereField('timestamp').lessThan(1705309200)
.execute();
// Omitting a shard key field will return an error
// when enforce_in_queries is enabled.
If you need to run occasional cross-shard queries (e.g., analytics), set enforce_in_queries: false. Be aware that queries without shard key filters will scan all shards.
Next Steps
Collections & Indexes
Schema design and index configuration
Query Builder
Fluent query API for filtering and sorting