Embeddings Visualization in FiftyOne
Overview
Visualize your dataset in 2D using deep learning embeddings and dimensionality reduction (UMAP/t-SNE). Explore clusters, find outliers, and color samples by any field.
Use this skill when:
- Visualizing dataset structure in 2D
- Finding natural clusters in images
- Identifying outliers or anomalies
- Exploring data distribution by class or metadata
- Understanding embedding space relationships
Prerequisites
- FiftyOne MCP server installed and running
- @voxel51/brain plugin installed and enabled
- Dataset with image samples loaded in FiftyOne
Key Directives
ALWAYS follow these rules:
1. Set context first
set_context(dataset_name="my-dataset")
2. Launch FiftyOne App
Brain operators are delegated and require the app:
launch_app()
Wait 5-10 seconds for initialization.
3. Discover operators dynamically
# List all brain operators
list_operators(builtin_only=False)
# Get schema for specific operator
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
4. Compute embeddings before visualization
Embeddings are required for dimensionality reduction:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_sim",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)
5. Close app when done
close_app()
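The ordering in these directives (set context, launch, work, always close) can be sketched as a context manager. This is a minimal illustration that assumes the MCP tools are available to you as Python callables; the skill itself invokes them as tool calls, and `app_session` is a hypothetical helper, not part of FiftyOne.

```python
from contextlib import contextmanager


@contextmanager
def app_session(set_context, launch_app, close_app, dataset_name):
    """Hypothetical wrapper: run directives 1 and 2, then always run directive 5."""
    set_context(dataset_name=dataset_name)  # Directive 1: set context first
    launch_app()                            # Directive 2: launch the App
    try:
        yield
    finally:
        close_app()                         # Directive 5: runs even if a step raises
```

Passing the tools in as arguments keeps the sketch independent of how they are actually exposed in your environment.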
Complete Workflow
Step 1: Setup
# Set context
set_context(dataset_name="my-dataset")
# Launch app (required for brain operators)
launch_app()
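Instead of sleeping for a fixed 5-10 seconds after launch, one option is to poll until the App's port accepts connections. The sketch below assumes the App listens on localhost:5151 (the address given in Step 6); `wait_for_app` is a hypothetical helper, and an open port only indicates that the server is listening, not that initialization has finished.

```python
import socket
import time


def wait_for_app(host="localhost", port=5151, timeout=10.0, interval=0.5):
    """Return True once a TCP connection succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on host:port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```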
Step 2: Verify Brain Plugin
# Check if brain plugin is available
list_plugins(enabled=True)
# If not installed:
download_plugin(
    url_or_repo="voxel51/fiftyone-plugins",
    plugin_names=["@voxel51/brain"]
)
enable_plugin(plugin_name="@voxel51/brain")
Step 3: Discover Brain Operators
# List all available operators
list_operators(builtin_only=False)
# Get schema for compute_visualization
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
Step 4: Check for Existing Embeddings or Compute New Ones
First, check if the dataset already has embeddings by looking at the operator schema:
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
# Look for existing embeddings fields in the "embeddings" choices
# (e.g., "clip_embeddings", "dinov2_embeddings")
If embeddings exist: Skip to Step 5 and use the existing embeddings field.
If no embeddings exist: Compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_viz",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",  # Field name to store embeddings
        "backend": "sklearn",
        "metric": "cosine"
    }
)
Required parameters for compute_similarity:
- brain_key - Unique identifier for this brain run
- model - Model from FiftyOne Model Zoo to generate embeddings
- embeddings - Field name where embeddings will be stored
- backend - Similarity backend (use "sklearn")
- metric - Distance metric (use "cosine" or "euclidean")
Recommended embedding models:
- clip-vit-base32-torch - Best for general visual + semantic similarity
- dinov2-vits14-torch - Best for visual similarity only
- resnet50-imagenet-torch - Classic CNN features
- mobilenet-v2-imagenet-torch - Fast, lightweight option
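The required compute_similarity parameters can be assembled and checked before calling execute_operator. The helper below is a hypothetical sketch, not part of FiftyOne; it only enforces the five keys and the two metrics named in this section, and the operator's real schema (from get_operator_schema) may accept more.

```python
REQUIRED_KEYS = ("brain_key", "model", "embeddings", "backend", "metric")
ALLOWED_METRICS = ("cosine", "euclidean")  # only the metrics named in this guide


def build_similarity_params(brain_key, model, embeddings,
                            backend="sklearn", metric="cosine"):
    """Return a params dict for compute_similarity, or raise ValueError."""
    params = {
        "brain_key": brain_key,
        "model": model,
        "embeddings": embeddings,
        "backend": backend,
        "metric": metric,
    }
    # Reject empty or None values for any required key.
    missing = [key for key in REQUIRED_KEYS if not params[key]]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"metric must be one of {ALLOWED_METRICS}, got {metric!r}")
    return params
```

The returned dict can be passed as the params argument of execute_operator.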
Step 5: Compute 2D Visualization
Use either an existing embeddings field or the brain_key from Step 4:
# Option A: Use existing embeddings field (e.g., clip_embeddings)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "img_viz",
        "embeddings": "clip_embeddings",  # Use existing field
        "method": "umap",
        "num_dims": 2
    }
)
# Option B: Use brain_key from compute_similarity
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "img_viz",  # Same key used in compute_similarity
        "method": "umap",
        "num_dims": 2
    }
)
Dimensionality reduction methods:
- umap - (Recommended) Preserves local and global structure and is faster than t-SNE. Requires the umap-learn package.
- tsne - Better local structure, slower on large datasets. No extra dependencies.
- pca - Linear reduction, fastest but less informative
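To make "linear reduction" concrete, here is a small pure-Python sketch of PCA to two dimensions using power iteration. It illustrates the idea only and is not the code the brain plugin runs; `pca_2d` and its helpers are hypothetical names, and a real workflow would rely on the operator or a numerical library.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def _top_direction(rows, iters=100, seed=0):
    """Power iteration for the dominant eigenvector of rows^T rows."""
    rng = random.Random(seed)
    d = len(rows[0])
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    norm = math.sqrt(_dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iters):
        scores = [_dot(r, v) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(d)]
        norm = math.sqrt(_dot(w, w))
        if norm == 0.0:  # no variance left in the data
            break
        v = [x / norm for x in w]
    return v


def pca_2d(vectors):
    """Project each vector onto the two directions of largest variance."""
    centered = _center(vectors)
    pc1 = _top_direction(centered, seed=0)
    # Remove the pc1 component so the second pass finds the next direction.
    residual = [[x - _dot(r, pc1) * p for x, p in zip(r, pc1)] for r in centered]
    pc2 = _top_direction(residual, seed=1)
    return [(_dot(r, pc1), _dot(res, pc2)) for r, res in zip(centered, residual)]
```

Because the projection is a fixed linear map, PCA is fast but cannot unfold curved structure, which is why it is described above as less informative than umap or tsne.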
Step 6: Direct User to Embeddings Panel
After computing visualization, direct the user to open the FiftyOne App at http://localhost:5151/ and:
- Click the Embeddings panel icon (scatter plot icon, looks like a grid of dots) in the top toolbar
- Select the brain key (e.g., img_viz) from the dropdown
- Points represent samples in 2D embedding space
- Use the "Color by" dropdown to color points by a field (e.g., ground_truth, predictions)
- Click points to select samples, use lasso tool to select groups
IMPORTANT: Do NOT use set_view(exists=["brain_key"]) - this filters samples and is not needed for visualization. The Embeddings panel automatically shows all samples with computed coordinates.
Step 7: Explore and Filter (Optional)
To filter samples while viewing in the Embeddings panel:
# Filter to specific class
set_view(filters={"ground_truth.label": "dog"})
# Filter by tag
set_view(tags=["validated"])
# Clear filter to show all
clear_view()
These filters will update the Embeddings panel to show only matching samples.
Step 8: Find Outliers
Outliers appear as isolated points far from clusters:
# Compute uniqueness scores (higher = more unique/outlier)
execute_operator(
    operator_uri="@voxel51/brain/compute_uniqueness",
    params={
        "brain_key": "img_viz"
    }
)
# View most unique samples (potential outliers)
set_view(sort_by="uniqueness", reverse=True, limit=50)
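The intuition that outliers are isolated points far from clusters can be illustrated with a simple nearest-neighbor score. This sketch is an assumption for illustration and is not necessarily how compute_uniqueness works internally; `knn_outlier_scores` is a hypothetical helper operating on plain coordinate tuples.

```python
import math


def knn_outlier_scores(points, k=3):
    """Mean Euclidean distance from each point to its k nearest neighbors.

    Higher scores mean the point sits farther from its neighbors.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores
```

Sorting samples by such a score in descending order mirrors the set_view(sort_by="uniqueness", reverse=True) call above.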
Step 9: Find Clusters
Use the App's Embeddings panel to visually identify clusters, then:
Option A: Lasso selection in App
- Use lasso tool to select a cluster
- Selected samples are highlighted
- Tag or export selected samples
Option B: Use similarity to find cluster members
# Sort by similarity to a representative sample
execute_operator(
    operator_uri="@voxel51/brain/sort_by_similarity",
    params={
        "brain_key": "img_viz",
        "query_id": "sample_id_from_cluster",
        "k": 100
    }
)
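Conceptually, sorting by similarity ranks samples by how close their embeddings are to the query sample's embedding and keeps the top k. The sketch below shows that idea with cosine similarity on an in-memory dict; it is an illustration under that assumption, not the operator's implementation, and `top_k_similar` is a hypothetical helper that excludes the query itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(embeddings, query_id, k):
    """Return the ids of the k samples most similar to `query_id`."""
    query = embeddings[query_id]
    ranked = sorted(
        ((sid, cosine_similarity(query, vec))
         for sid, vec in embeddings.items() if sid != query_id),
        key=lambda pair: pair[1],
        reverse=True,  # most similar first
    )
    return [sid for sid, _ in ranked[:k]]
```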
Step 10: Clean Up
close_app()
Available Tools
Session View Tools
| Tool | Description |
|---|---|
| set_view(filters={...}) | Filter samples by field values |
| set_view(tags=[...]) | Filter samples by tags |
| set_view(sort_by="...", reverse=True) | Sort samples by field |
| set_view(limit=N) | Limit to N samples |
| clear_view() | Clear filters, show all samples |
Brain Operators for Visualization
Use list_operators() to discover operators and get_operator_schema() to inspect their parameters:
| Operator | Description |
|---|---|
| @voxel51/brain/compute_similarity | Compute embeddings and similarity index |
| @voxel51/brain/compute_visualization | Reduce embeddings to 2D/3D for visualization |
| @voxel51/brain/compute_uniqueness | Score samples by uniqueness (outlier detection) |
| @voxel51/brain/sort_by_similarity | Sort by similarity to a query sample |
Common Use Cases
Use Case 1: Basic Dataset Exploration
Visualize dataset structure and explore clusters:
set_context(dataset_name="my-dataset")
launch_app()
# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
# If embeddings exist (e.g., clip_embeddings), use them directly:
execute_operator(
operator_uri="@voxel51/brain/compute_visualization",
params={