OpenVLA-OFT
Fine-tuning and evaluation workflows for OpenVLA-OFT and OpenVLA-OFT+ from the official openvla-oft codebase. Covers blank-machine setup plus LoRA-based adaptation of OpenVLA for robot action generation with continuous action prediction heads.
Quick start
Clone the public repo, follow the official setup, then evaluate a pretrained LIBERO checkpoint:
git clone https://github.com/moojink/openvla-oft.git
cd openvla-oft
python experiments/robot/libero/run_libero_eval.py \
--pretrained_checkpoint moojink/openvla-7b-oft-finetuned-libero-spatial \
--task_suite_name libero_spatial \
--center_crop True \
--num_trials_per_task 50 \
--seed 7
Core concepts
What OpenVLA-OFT changes: Standard OpenVLA tokenizes continuous actions into discrete bins, losing precision. OFT replaces this with dedicated continuous action heads (L1 regression or diffusion) while keeping the VLA backbone frozen and adapting via LoRA.
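To make the precision argument concrete, here is a minimal sketch that round-trips one normalized action value through uniform binning. The bin count of 256, the [-1, 1] range, and the helper names are assumptions for illustration and are not taken from the codebase.

```python
# Illustrative only: uniform binning of one action dimension in [-1, 1].
# N_BINS = 256 is an assumption for this sketch.
N_BINS = 256
LOW, HIGH = -1.0, 1.0
BIN_WIDTH = (HIGH - LOW) / N_BINS


def tokenize(action: float) -> int:
    """Map a continuous action to a discrete bin index."""
    clipped = min(max(action, LOW), HIGH)
    return min(int((clipped - LOW) / BIN_WIDTH), N_BINS - 1)


def detokenize(token: int) -> float:
    """Map a bin index back to the center of its bin."""
    return LOW + (token + 0.5) * BIN_WIDTH


action = 0.12345
recovered = detokenize(tokenize(action))
print(f"bin width:        {BIN_WIDTH:.6f}")
print(f"round-trip error: {abs(recovered - action):.6f}")
print(f"worst-case error: {BIN_WIDTH / 2:.6f}")
```

A continuous head regresses the value directly, so its output is not limited to bin centers.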
OFT vs OFT+ variants:
| Variant | FiLM | Images | Typical use |
|---|---|---|---|
| OFT | Off | 2 (front + wrist) | LIBERO simulation |
| OFT+ | On | 3 (high + left + right wrist) | ALOHA real-world |
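The table maps onto two of the training flags used later in this document (use_film and num_images_in_input). A small sketch of that mapping follows; the VARIANTS dictionary and the variant_flags helper are hypothetical conveniences, not part of the repository.

```python
# Variant settings from the table above, expressed with the flag names
# that appear in the fine-tuning command. The helper itself is hypothetical.
VARIANTS = {
    "OFT": {"use_film": False, "num_images_in_input": 2},
    "OFT+": {"use_film": True, "num_images_in_input": 3},
}


def variant_flags(variant: str) -> list:
    """Turn a variant's settings into command-line arguments."""
    flags = []
    for key, value in VARIANTS[variant].items():
        flags += [f"--{key}", str(value)]
    return flags


print(variant_flags("OFT"))
print(variant_flags("OFT+"))
```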
Key architecture choices:
- LoRA adaptation: Rank-32 LoRA on VLA backbone (no full fine-tuning needed)
- Continuous actions: L1 regression head (default) or diffusion head
- FiLM conditioning: Feature-wise Linear Modulation for stronger language grounding in OFT+
- Multi-image input: Configurable 2 or 3 camera streams via num_images_in_input
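For readers unfamiliar with FiLM, the sketch below shows the textbook form: a language embedding is projected to a per-channel scale (gamma) and shift (beta), which then modulate every visual token. All shapes, weights, and function names are hypothetical, and the codebase's exact parameterization may differ.

```python
# Sketch of Feature-wise Linear Modulation (FiLM) in plain Python.
# Everything here is a toy illustration with made-up weights.

def linear(weights, vec):
    """Matrix-vector product; weights has shape [out_dim][in_dim]."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def film(features, lang_embedding, w_gamma, w_beta):
    """Apply gamma * feature + beta per channel to every visual token."""
    gamma = linear(w_gamma, lang_embedding)
    beta = linear(w_beta, lang_embedding)
    return [
        [g * x + b for x, g, b in zip(token, gamma, beta)]
        for token in features
    ]


# Two visual tokens with three channels each; a 2-d language embedding.
features = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
lang_embedding = [1.0, -1.0]
w_gamma = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # gamma = [1, -1, 0]
w_beta = [[0.5, 0.5], [1.0, 0.0], [0.0, -1.0]]  # beta  = [0, 1, 1]

print(film(features, lang_embedding, w_gamma, w_beta))
```

Because gamma and beta depend only on the instruction, the same image features are transformed differently for different language commands.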
Compute requirements
| Task | GPU | VRAM | Notes |
|---|---|---|---|
| LIBERO evaluation | 1x A100/A40 | ~16 GB | Single GPU |
| ALOHA evaluation | 1x A100/A40 | ~18 GB | Single GPU |
| LIBERO fine-tuning | 8x A100 | ~27 GB/GPU | Paper default |
| ALOHA fine-tuning (OFT+) | 8x A100 | ~35 GB/GPU | FiLM + 3 images |
| LoRA merge | 1x any GPU | ~16 GB | One-time step |
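As a quick planning aid, the per-GPU figures above can be turned into a simple fit check. This is a sketch only: the task keys, the fits helper, and the 2 GB headroom are assumptions, and the VRAM numbers are the approximate values from the table.

```python
# Approximate VRAM per GPU in GB, copied from the table above.
VRAM_GB = {
    "libero_eval": 16,
    "aloha_eval": 18,
    "libero_finetune": 27,
    "aloha_finetune": 35,
    "lora_merge": 16,
}


def fits(task: str, gpu_vram_gb: float, headroom_gb: float = 2.0) -> bool:
    """Return True if the GPU has the listed VRAM plus some headroom."""
    return gpu_vram_gb >= VRAM_GB[task] + headroom_gb


for task in VRAM_GB:
    print(f"{task:16s} 24 GB: {fits(task, 24)}  40 GB: {fits(task, 40)}")
```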
Expected performance benchmarks
Official results (paper setup, seed=7, 50 trials per task):
| Task Suite | Task-Specific | Combined Policy | Notes |
|---|---|---|---|
| LIBERO-Spatial | 97.2% | 96.8% | Easiest suite |
| LIBERO-Object | 97.4% | 97.0% | Object manipulation |
| LIBERO-Goal | 95.8% | 95.4% | May peak at 50k-100k steps |
| LIBERO-10 | 98.0% | 98.0% | Long-horizon tasks |
| Average | 97.1% | 96.8% | Near-equivalent |
Reproduction notes: these results are tied to Python 3.10.14, PyTorch 2.2.0, NVIDIA A100 GPUs, and a custom Transformers fork.
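When comparing a reproduction against these numbers, it helps to know how much variation sampling alone can produce. The sketch below computes a 95% Wilson score interval for a suite-level success rate. It assumes 10 tasks per suite at 50 trials each (500 episodes), which is an assumption about the suite size rather than something stated above.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """95% Wilson score interval for a binomial success rate."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


# Assumption: 10 tasks x 50 trials = 500 episodes per suite.
TRIALS = 500
for suite, rate in [("LIBERO-Spatial", 0.972), ("LIBERO-Goal", 0.958)]:
    low, high = wilson_interval(round(rate * TRIALS), TRIALS)
    print(f"{suite}: {rate:.1%} -> [{low:.1%}, {high:.1%}]")
```

A reproduced rate that falls inside the interval is consistent with the reported figure up to sampling noise.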
When to use vs alternatives
Use OpenVLA-OFT when:
- The target task is robot action generation with visual and language conditioning
- LoRA-based adaptation of openvla/openvla-7b is preferred
- You need official LIBERO or ALOHA workflows from the OpenVLA-OFT paper
- You want continuous action heads (L1 regression or diffusion) instead of tokenized actions
Use alternatives when:
- You need a different VLA architecture (use fine-tuning-serving-openpi for pi0/pi0.5 models)
- You need the NVIDIA Cosmos Policy stack (use evaluating-cosmos-policy)
- You need general LLM fine-tuning without robot action heads
Workflow 1: Set up environment
Copy this checklist and track progress:
Setup Progress:
- [ ] Step 1: Create conda env, clone repo, and install PyTorch
- [ ] Step 2: Install openvla-oft package in editable mode
- [ ] Step 3: Install FlashAttention2
- [ ] Step 4: Verify critical versions
Step 1: Create conda env, clone repo, and install PyTorch
conda create -n openvla-oft python=3.10 -y
conda activate openvla-oft
git clone https://github.com/moojink/openvla-oft.git
cd openvla-oft
pip3 install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0
pip3 install robosuite==1.4.0
Step 2: Install package
pip install -e .
Step 3: Install FlashAttention2
pip install packaging ninja
pip install "flash-attn==2.5.5" --no-build-isolation
Step 4: Verify versions
import torch, transformers, peft
print(f"PyTorch: {torch.__version__}") # Expected: 2.2.0
print(f"Transformers: {transformers.__version__}")
print(f"PEFT: {peft.__version__}") # Expected: 0.11.1
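The check above can also be automated. The following sketch compares installed versions against the pins used in the setup steps and reports mismatches. It assumes the distribution names match the pip package names; the EXPECTED table and check_versions helper are illustrative, not part of the repository.

```python
from importlib import metadata

# Pins taken from the setup steps above; distribution names are assumed
# to match the pip package names.
EXPECTED = {
    "torch": "2.2.0",
    "peft": "0.11.1",
    "flash-attn": "2.5.5",
    "robosuite": "1.4.0",
}


def check_versions(expected: dict) -> dict:
    """Return {package: (expected, found)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            # Drop local build suffixes such as "+cu121".
            found = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found = "not installed"
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


for name, (want, found) in check_versions(EXPECTED).items():
    print(f"{name}: expected {want}, found {found}")
```

No output means every pinned package matched.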
Workflow 2: Evaluate pretrained checkpoints on LIBERO
LIBERO Eval Progress:
- [ ] Step 1: Install LIBERO dependencies
- [ ] Step 2: Choose checkpoint and task suite
- [ ] Step 3: Run evaluation
- [ ] Step 4: Parse and validate results
Step 1: Install LIBERO
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
pip install -e LIBERO
pip install -r experiments/robot/libero/libero_requirements.txt
Step 2: Choose checkpoint
| Checkpoint | Task suite |
|---|---|
| moojink/openvla-7b-oft-finetuned-libero-spatial | libero_spatial |
| moojink/openvla-7b-oft-finetuned-libero-object | libero_object |
| moojink/openvla-7b-oft-finetuned-libero-goal | libero_goal |
| moojink/openvla-7b-oft-finetuned-libero-10 | libero_10 |
| moojink/openvla-7b-oft-finetuned-libero-spatial-object-goal-10 | Combined |
Step 3: Run evaluation
python experiments/robot/libero/run_libero_eval.py \
--pretrained_checkpoint moojink/openvla-7b-oft-finetuned-libero-spatial \
--task_suite_name libero_spatial \
--center_crop True \
--num_trials_per_task 50 \
--seed 7
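To evaluate all four task-specific checkpoints in sequence, the command can be generated per suite. The sketch below only builds and prints the commands, using the checkpoint and suite names from the table in Step 2; the build_eval_command helper is hypothetical.

```python
import shlex

# Checkpoint/suite pairs from the table in Step 2.
CHECKPOINTS = {
    "libero_spatial": "moojink/openvla-7b-oft-finetuned-libero-spatial",
    "libero_object": "moojink/openvla-7b-oft-finetuned-libero-object",
    "libero_goal": "moojink/openvla-7b-oft-finetuned-libero-goal",
    "libero_10": "moojink/openvla-7b-oft-finetuned-libero-10",
}


def build_eval_command(suite: str, trials: int = 50, seed: int = 7) -> list:
    """Build the argument list for one LIBERO evaluation run."""
    return [
        "python", "experiments/robot/libero/run_libero_eval.py",
        "--pretrained_checkpoint", CHECKPOINTS[suite],
        "--task_suite_name", suite,
        "--center_crop", "True",
        "--num_trials_per_task", str(trials),
        "--seed", str(seed),
    ]


for suite in CHECKPOINTS:
    # Print only; pass the list to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(build_eval_command(suite)))
```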
Step 4: Parse results
import re

def parse_libero_log(log_path):
    """Extract per-task success rates from LIBERO eval log."""
    with open(log_path) as f:
        content = f.read()
    matches = re.findall(r"Task (.+?): (\d+)/(\d+) successes", content)
    for task, successes, trials in matches:
        rate = int(successes) / int(trials)
        print(f"  {task}: {rate:.0%} ({successes}/{trials})")

parse_libero_log("experiments/logs/latest.log")
Workflow 3: Fine-tune on LIBERO
Detailed reference: See references/libero-workflow.md for the full LIBERO setup, checkpoint selection strategy, and LoRA merge instructions.
LIBERO Fine-Tune Progress:
- [ ] Step 1: Prepare RLDS dataset
- [ ] Step 2: Launch torchrun with OFT defaults
- [ ] Step 3: Evaluate intermediate and final checkpoints
- [ ] Step 4: Merge LoRA for deployment if needed
Step 1: Dataset
Use RLDS datasets: libero_spatial_no_noops, libero_object_no_noops, libero_goal_no_noops, libero_10_no_noops.
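Before launching a long run, it can be worth confirming that the datasets are where the training script will look. This sketch assumes each dataset lives in a directory named after it directly under the data root; that layout and the missing_datasets helper are assumptions.

```python
from pathlib import Path

DATASETS = [
    "libero_spatial_no_noops",
    "libero_object_no_noops",
    "libero_goal_no_noops",
    "libero_10_no_noops",
]


def missing_datasets(data_root_dir: str) -> list:
    """Return dataset names with no matching directory under the root."""
    root = Path(data_root_dir)
    return [name for name in DATASETS if not (root / name).is_dir()]


# Placeholder path; replace with the real dataset location.
print(missing_datasets("/PATH/TO/RLDS/DATASETS/"))
```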
Step 2: Launch training
torchrun --standalone --nnodes 1 --nproc-per-node 8 vla-scripts/finetune.py \
--vla_path openvla/openvla-7b \
--data_root_dir /PATH/TO/RLDS/DATASETS/ \
--dataset_name libero_spatial_no_noops \
--run_root_dir /YOUR/CHECKPOINTS/ \
--use_l1_regression True \
--use_diffusion False \
--use_film False \
--num_images_in_input 2 \
--use_proprio True \
--batch_size 8 \
--learning_rate 5e-4 \
--num_steps_before_decay 100000 \
--max_steps 150005 \
--save_freq 10000 \
--save_latest_checkpoint_only False \
--image_aug True \
--lora_rank 32 \
--wandb_entity YOUR_WANDB_ENTITY \
--wandb_project YOUR_WANDB_PROJECT
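The arithmetic implied by these flags can be sketched as follows. Two points are assumptions rather than facts from this document: that batch_size is counted per GPU, and that the learning rate drops by a factor of 10 after num_steps_before_decay. Check both against the training script before relying on them.

```python
# Values from the torchrun command above.
NUM_GPUS = 8
BATCH_SIZE = 8        # assumed to be per GPU
BASE_LR = 5e-4
DECAY_STEP = 100_000
MAX_STEPS = 150_005
SAVE_FREQ = 10_000
GAMMA = 0.1           # assumed decay factor; check the training script


def lr_at_step(step: int) -> float:
    """Step schedule: constant LR, then one multiplicative decay."""
    return BASE_LR if step < DECAY_STEP else BASE_LR * GAMMA


def checkpoint_steps() -> list:
    """Steps at which a checkpoint would be saved."""
    return list(range(SAVE_FREQ, MAX_STEPS + 1, SAVE_FREQ))


print("global batch size:", NUM_GPUS * BATCH_SIZE)
print("lr at 50k / 120k:", lr_at_step(50_000), lr_at_step(120_000))
print("checkpoints:", checkpoint_steps()[0], "...", checkpoint_steps()[-1])
```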
Step 3: Evaluate checkpoints
Evaluate the 50k, 100k, and 150k checkpoints; LIBERO-Goal may peak earlier than other suites. Keep the best checkpoint per suite by actual task success, not only training loss.
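Selecting by task success can be as simple as the sketch below. The best_checkpoint helper is hypothetical, and the success rates in the example are placeholder values for illustration, not measured results.

```python
def best_checkpoint(success_by_step: dict) -> int:
    """Pick the step with the highest success rate; ties go to the earlier step."""
    # max() keeps the first maximum it sees, so iterate steps in ascending order.
    return max(sorted(success_by_step), key=lambda step: success_by_step[step])


# Placeholder numbers for illustration only, not measured results.
example = {50_000: 0.95, 100_000: 0.96, 150_000: 0.96}
print(best_checkpoint(example))
```

Preferring the earlier step on ties is a design choice for this sketch; it favors the checkpoint with less training when success rates are indistinguishable.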
Step 4: Merge LoRA
python vla-scripts/merge_lora_weights_and_save.py \
--base_checkpoint openvla/openvla-7b \
--lora_finetuned_checkpoint_dir /PATH/TO/CHECKPOINT_DIR
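Conceptually, merging folds the low-rank update into the base weights so that inference no longer needs the adapter. The sketch below shows the standard LoRA identity, W_merged = W + (alpha / r) * B @ A, on tiny hypothetical matrices with rank 1. It illustrates the idea only and makes no claim about how the merge script is implemented.

```python
# Toy illustration of LoRA merging with plain Python lists.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Fold the low-rank update into the base weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # [out, r] @ [r, in] -> [out, in]
    return [[wij + scale * dij for wij, dij in zip(wr, dr)]
            for wr, dr in zip(w, delta)]


w = [[1.0, 0.0], [0.0, 1.0]]  # base weight, 2x2
lora_a = [[1.0, 2.0]]         # shape [r=1, in=2]
lora_b = [[0.5], [0.0]]       # shape [out=2, r=1]
merged = merge_lora(w, lora_a, lora_b, alpha=2.0, rank=1)

# The merged matrix gives the same output as base plus scaled adapter.
x = [[3.0], [4.0]]
adapter_out = matmul(lora_b, matmul(lora_a, x))
unmerged = [[b[0] + 2.0 * d[0]] for b, d in zip(matmul(w, x), adapter_out)]
print(merged, matmul(merged, x) == unmerged)
```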
Workflow 4: Train and evaluate OpenVLA-OFT+ on ALOHA
Detailed reference: See references/aloha-workflow.md for the full ALOHA server-client setup, data preprocessing, dataset registration, and troubleshooting.
ALOHA Progress:
- [ ] Step 1: Preprocess raw ALOHA demonstrations
- [ ] Step 2: Convert to RLDS and register dataset configs
- [ ] Step 3: Fine-tune OFT+ with FiLM and 3 images
- [ ] Step 4: Start VLA server on G