AgentDB Learning Plugins
What This Skill Does
Provides access to 9 reinforcement learning algorithms via AgentDB's plugin system. Create, train, and deploy learning plugins for autonomous agents that improve through experience. Includes offline RL (Decision Transformer), value-based learning (Q-Learning), policy gradients (Actor-Critic), and advanced techniques.
Performance: Train models 10-100x faster with WASM-accelerated neural inference.
Prerequisites
- Node.js 18+
- AgentDB v1.0.7+ (via agentic-flow)
- Basic understanding of reinforcement learning (recommended)
Quick Start with CLI
Create Learning Plugin
# Interactive wizard
npx agentdb@latest create-plugin
# Use specific template
npx agentdb@latest create-plugin -t decision-transformer -n my-agent
# Preview without creating
npx agentdb@latest create-plugin -t q-learning --dry-run
# Custom output directory
npx agentdb@latest create-plugin -t actor-critic -o ./plugins
List Available Templates
# Show all plugin templates
npx agentdb@latest list-templates
# Available templates:
# - decision-transformer (sequence modeling RL - recommended)
# - q-learning (value-based learning)
# - sarsa (on-policy TD learning)
# - actor-critic (policy gradient with baseline)
# - curiosity-driven (exploration-based)
Manage Plugins
# List installed plugins
npx agentdb@latest list-plugins
# Get plugin information
npx agentdb@latest plugin-info my-agent
# Shows: algorithm, configuration, training status
Quick Start with API
import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';
// Initialize with learning enabled
const adapter = await createAgentDBAdapter({
dbPath: '.agentdb/learning.db',
enableLearning: true, // Enable learning plugins
enableReasoning: true,
cacheSize: 1000,
});
// Store training experience
await adapter.insertPattern({
id: '',
type: 'experience',
domain: 'game-playing',
pattern_data: JSON.stringify({
embedding: await computeEmbedding('state-action-reward'),
pattern: {
state: [0.1, 0.2, 0.3],
action: 2,
reward: 1.0,
next_state: [0.15, 0.25, 0.35],
done: false
}
}),
confidence: 0.9,
usage_count: 1,
success_count: 1,
created_at: Date.now(),
last_used: Date.now(),
});
// Train learning model
const metrics = await adapter.train({
epochs: 50,
batchSize: 32,
});
console.log('Training Loss:', metrics.loss);
console.log('Duration:', metrics.duration, 'ms');
Available Learning Algorithms (9 Total)
1. Decision Transformer (Recommended)
Type: Offline Reinforcement Learning Best For: Learning from logged experiences, imitation learning Strengths: No online interaction needed, stable training
npx agentdb@latest create-plugin -t decision-transformer -n dt-agent
Use Cases:
- Learn from historical data
- Imitation learning from expert demonstrations
- Safe learning without environment interaction
- Sequence modeling tasks
Configuration:
{
"algorithm": "decision-transformer",
"model_size": "base",
"context_length": 20,
"embed_dim": 128,
"n_heads": 8,
"n_layers": 6
}
2. Q-Learning
Type: Value-Based RL (Off-Policy) Best For: Discrete action spaces, sample efficiency Strengths: Proven, simple, works well for small/medium problems
npx agentdb@latest create-plugin -t q-learning -n q-agent
Use Cases:
- Grid worlds, board games
- Navigation tasks
- Resource allocation
- Discrete decision-making
Configuration:
{
"algorithm": "q-learning",
"learning_rate": 0.001,
"gamma": 0.99,
"epsilon": 0.1,
"epsilon_decay": 0.995
}
3. SARSA
Type: Value-Based RL (On-Policy) Best For: Safe exploration, risk-sensitive tasks Strengths: More conservative than Q-Learning, better for safety
npx agentdb@latest create-plugin -t sarsa -n sarsa-agent
Use Cases:
- Safety-critical applications
- Risk-sensitive decision-making
- Online learning with exploration
Configuration:
{
"algorithm": "sarsa",
"learning_rate": 0.001,
"gamma": 0.99,
"epsilon": 0.1
}
4. Actor-Critic
Type: Policy Gradient with Value Baseline Best For: Continuous actions, variance reduction Strengths: Stable, works for continuous/discrete actions
npx agentdb@latest create-plugin -t actor-critic -n ac-agent
Use Cases:
- Continuous control (robotics, simulations)
- Complex action spaces
- Multi-agent coordination
Configuration:
{
"algorithm": "actor-critic",
"actor_lr": 0.001,
"critic_lr": 0.002,
"gamma": 0.99,
"entropy_coef": 0.01
}
5. Active Learning
Type: Query-Based Learning Best For: Label-efficient learning, human-in-the-loop Strengths: Minimizes labeling cost, focuses on uncertain samples
Use Cases:
- Human feedback incorporation
- Label-efficient training
- Uncertainty sampling
- Annotation cost reduction
6. Adversarial Training
Type: Robustness Enhancement Best For: Safety, robustness to perturbations Strengths: Improves model robustness, adversarial defense
Use Cases:
- Security applications
- Robust decision-making
- Adversarial defense
- Safety testing
7. Curriculum Learning
Type: Progressive Difficulty Training Best For: Complex tasks, faster convergence Strengths: Stable learning, faster convergence on hard tasks
Use Cases:
- Complex multi-stage tasks
- Hard exploration problems
- Skill composition
- Transfer learning
8. Federated Learning
Type: Distributed Learning Best For: Privacy, distributed data Strengths: Privacy-preserving, scalable
Use Cases:
- Multi-agent systems
- Privacy-sensitive data
- Distributed training
- Collaborative learning
9. Multi-Task Learning
Type: Transfer Learning Best For: Related tasks, knowledge sharing Strengths: Faster learning on new tasks, better generalization
Use Cases:
- Task families
- Transfer learning
- Domain adaptation
- Meta-learning
Training Workflow
1. Collect Experiences
// Store experiences during agent execution
for (let i = 0; i < numEpisodes; i++) {
const episode = runEpisode();
for (const step of episode.steps) {
await adapter.insertPattern({
id: '',
type: 'experience',
domain: 'task-domain',
pattern_data: JSON.stringify({
embedding: await computeEmbedding(JSON.stringify(step)),
pattern: {
state: step.state,
action: step.action,
reward: step.reward,
next_state: step.next_state,
done: step.done
}
}),
confidence: step.reward > 0 ? 0.9 : 0.5,
usage_count: 1,
success_count: step.reward > 0 ? 1 : 0,
created_at: Date.now(),
last_used: Date.now(),
});
}
}
2. Train Model
// Train on collected experiences
const trainingMetrics = await adapter.train({
epochs: 100,
batchSize: 64,
learningRate: 0.001,
validationSplit: 0.2,
});
console.log('Training Metrics:', trainingMetrics);
// {
// loss: 0.023,
// valLoss: 0.028,
// duration: 1523,
// epochs: 100
// }
3. Evaluate Performance
// Retrieve similar successful experiences
const testQuery = await computeEmbedding(JSON.stringify(testState));
const result = await adapter.retrieveWithReasoning(testQuery, {
domain: 'task-domain',
k: 10,
synthesizeContext: true,
});
// Evaluate action quality
const suggestedAction = result.memories[0].pattern.action;
const confidence = result.memories[0].similarity;
console.log('Suggested Action:', suggestedAction);
console.log('Confidence:', confidence);
Advanced Training Techniques
Experience Replay
// Store experiences in buffer
const replayBuffer = [];
// Sample random batch for training
const batch =