Object Store Best Practices Skill
You are an expert at implementing robust cloud storage operations using the object_store crate. When you detect object_store usage, proactively ensure best practices are followed.
When to Activate
Activate this skill when you notice:
- Code using
ObjectStoretrait,AmazonS3Builder,MicrosoftAzureBuilder, orGoogleCloudStorageBuilder - Discussion about S3, Azure Blob, or GCS operations
- Issues with cloud storage reliability, performance, or errors
- File uploads, downloads, or listing operations
- Questions about retry logic, error handling, or streaming
Best Practices Checklist
1. Retry Configuration
What to Look For:
- Missing retry logic for production code
- Default settings without explicit retry configuration
Good Pattern:
use object_store::aws::AmazonS3Builder;
use object_store::RetryConfig;
let s3 = AmazonS3Builder::new()
.with_region("us-east-1")
.with_bucket_name("my-bucket")
.with_retry(RetryConfig {
max_retries: 3,
retry_timeout: Duration::from_secs(10),
..Default::default()
})
.build()?;
Bad Pattern:
// No retry configuration - fails on transient errors
let s3 = AmazonS3Builder::new()
.with_region("us-east-1")
.with_bucket_name("my-bucket")
.build()?;
Suggestion:
Cloud storage operations need retry logic for production resilience.
Add retry configuration to handle transient failures:
.with_retry(RetryConfig {
max_retries: 3,
retry_timeout: Duration::from_secs(10),
..Default::default()
})
This handles 503 SlowDown, network timeouts, and temporary outages.
2. Error Handling
What to Look For:
- Using
unwrap()orexpect()on storage operations - Not handling specific error types
- Missing context in error propagation
Good Pattern:
use object_store::Error as ObjectStoreError;
use thiserror::Error;
#[derive(Error, Debug)]
enum StorageError {
#[error("Object store error: {0}")]
ObjectStore(#[from] ObjectStoreError),
#[error("File not found: {path}")]
NotFound { path: String },
#[error("Access denied: {path}")]
PermissionDenied { path: String },
}
async fn read_file(store: &dyn ObjectStore, path: &Path) -> Result<Bytes, StorageError> {
match store.get(path).await {
Ok(result) => Ok(result.bytes().await?),
Err(ObjectStoreError::NotFound { path, .. }) => {
Err(StorageError::NotFound { path: path.to_string() })
}
Err(e) => Err(e.into()),
}
}
Bad Pattern:
let data = store.get(&path).await.unwrap(); // Crashes on errors!
Suggestion:
Avoid unwrap() on storage operations. Use proper error handling:
match store.get(&path).await {
Ok(result) => { /* handle success */ }
Err(ObjectStoreError::NotFound { .. }) => { /* handle missing file */ }
Err(e) => { /* handle other errors */ }
}
Or use thiserror for better error types.
3. Streaming Large Objects
What to Look For:
- Loading entire files into memory with
.bytes().await - Not using streaming for large files (>100MB)
Good Pattern (Streaming):
use futures::stream::StreamExt;
let result = store.get(&path).await?;
let mut stream = result.into_stream();
while let Some(chunk) = stream.next().await {
let chunk = chunk?;
// Process chunk incrementally
process_chunk(chunk)?;
}
Bad Pattern (Loading to Memory):
let result = store.get(&path).await?;
let bytes = result.bytes().await?; // Loads entire file!
Suggestion:
For files >100MB, use streaming to avoid memory issues:
let mut stream = store.get(&path).await?.into_stream();
while let Some(chunk) = stream.next().await {
let chunk = chunk?;
process_chunk(chunk)?;
}
This processes data incrementally without loading everything into memory.
4. Multipart Upload for Large Files
What to Look For:
- Using
put()for large files (>100MB) - Missing multipart upload for big data
Good Pattern:
async fn upload_large_file(
store: &dyn ObjectStore,
path: &Path,
data: impl Stream<Item = Bytes>,
) -> Result<()> {
let multipart = store.put_multipart(path).await?;
let mut stream = data;
while let Some(chunk) = stream.next().await {
multipart.put_part(chunk).await?;
}
multipart.complete().await?;
Ok(())
}
Bad Pattern:
// Inefficient for large files
let large_data = vec![0u8; 1_000_000_000]; // 1GB
store.put(path, large_data.into()).await?;
Suggestion:
For files >100MB, use multipart upload for better reliability:
let multipart = store.put_multipart(&path).await?;
for chunk in chunks {
multipart.put_part(chunk).await?;
}
multipart.complete().await?;
Benefits:
- Resume failed uploads
- Better memory efficiency
- Improved reliability
5. Efficient Listing
What to Look For:
- Not using prefixes for listing
- Loading all results without pagination
- Not filtering on client side
Good Pattern:
use futures::stream::StreamExt;
// List with prefix
let prefix = Some(&Path::from("data/2024/"));
let mut list = store.list(prefix);
while let Some(meta) = list.next().await {
let meta = meta?;
if should_process(&meta) {
process_object(&meta).await?;
}
}
Better Pattern with Filtering:
let prefix = Some(&Path::from("data/2024/01/"));
let list = store.list(prefix);
let filtered = list.filter(|result| {
future::ready(match result {
Ok(meta) => meta.location.as_ref().ends_with(".parquet"),
Err(_) => true,
})
});
futures::pin_mut!(filtered);
while let Some(meta) = filtered.next().await {
let meta = meta?;
process_object(&meta).await?;
}
Bad Pattern:
// Lists entire bucket!
let all_objects: Vec<_> = store.list(None).collect().await;
Suggestion:
Use prefixes to limit LIST operations and reduce cost:
let prefix = Some(&Path::from("data/2024/01/"));
let mut list = store.list(prefix);
This is especially important for buckets with millions of objects.
6. Atomic Writes with Rename
What to Look For:
- Writing directly to final location
- Risk of partial writes visible to readers
Good Pattern:
async fn atomic_write(
store: &dyn ObjectStore,
final_path: &Path,
data: Bytes,
) -> Result<()> {
// Write to temp location
let temp_path = Path::from(format!("{}.tmp", final_path));
store.put(&temp_path, data).await?;
// Atomic rename
store.rename(&temp_path, final_path).await?;
Ok(())
}
Bad Pattern:
// Readers might see partial data during write
store.put(&path, data).await?;
Suggestion:
Use temp + rename for atomic writes:
let temp_path = Path::from(format!("{}.tmp", path));
store.put(&temp_path, data).await?;
store.rename(&temp_path, path).await?;
This prevents readers from seeing partial/corrupted data.
7. Connection Pooling
What to Look For:
- Creating new client for each operation
- Not configuring connection limits
Good Pattern:
use object_store::ClientOptions;
let s3 = AmazonS3Builder::new()
.with_client_options(ClientOptions::new()
.with_timeout(Duration::from_secs(30))
.with_connect_timeout(Duration::from_secs(5))
.with_pool_max_idle_per_host(10)
)
.build()?;
// Reuse this store across operations
let store: Arc<dyn ObjectStore> = Arc::new(s3);
Bad Pattern:
// Creating new store for each operation
for file in files {
let s3 = AmazonS3Builder::new().build()?;
upload(s3, file).await?;
}
Suggestion:
Configure connection pooling and reuse the ObjectStore:
let store: Arc<dyn ObjectStore> = Arc::new(s3);
// Clone Arc to share across threads
let store_clone = store.clone();
tokio::spawn(async move {
upload(store_clone, file).await
});