Flutter Gemma Scaffold
Scaffold guide for integrating Gemma 4 on-device AI into Flutter projects.
Capabilities
- Create new Flutter project with Gemma integration from scratch
- Add Gemma to existing Flutter project
- Configure platforms (iOS, Android, Web, Desktop)
Workflow
Step 1: Confirm Requirements
Ask the user:
- Is this a new project or an existing one?
- Which target platform(s)? (iOS/Android/Web/Desktop/All)
- Is function calling support needed?
- Do you have a HuggingFace token? (needed to download gated models)
Step 2: Add Dependencies
In pubspec.yaml:
dependencies:
  flutter:
    sdk: flutter
  flutter_gemma: ^0.13.2
  flutter_riverpod: ^2.5.0
  riverpod_annotation: ^2.3.0
  go_router: ^14.0.0

dev_dependencies:
  build_runner: ^2.4.0
  riverpod_generator: ^2.4.0
  flutter_lints: ^4.0.0

flutter:
  uses-material-design: true
Step 2b: Initialize Riverpod
Wrap your app with ProviderScope in main.dart:
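import 'package:flutter/material.dart'; // Provides runApp
import 'package:flutter_riverpod/flutter_riverpod.dart';

void main() {
  runApp(
    // ProviderScope stores the state of every provider in the app.
    const ProviderScope(
      child: MyApp(), // MyApp is your root widget
    ),
  );
}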
Step 3: Platform Configuration
iOS (ios/Podfile)
platform :ios, '16.0'
use_frameworks! :linkage => :static
iOS 16.0+ required. Static linking required.
Android (android/app/build.gradle.kts)
android {
    defaultConfig {
        minSdk = 24 // Android 7.0+
    }
}
Optional GPU acceleration (requires OpenCL). Add inside the <application> element of android/app/src/main/AndroidManifest.xml:
<uses-native-library android:name="libOpenCL.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-car.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-pixel.so" android:required="false"/>
Web (web/index.html)
In <head>:
<script type="module">
  import { FilesetResolver, LlmInference } from 'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.27';
  window.FilesetResolver = FilesetResolver;
  window.LlmInference = LlmInference;
</script>
Step 4: Model Installation
Recommended Models
| Model | Size | Memory | Best For |
|---|---|---|---|
| Gemma 4 E2B Q4_K_M | ~3.5GB | <1.5GB | General purpose, balance of quality and performance |
| Gemma 4 E4B Q4_K_M | ~6.5GB | <2.5GB | Stronger reasoning |
| FunctionGemma 270M | ~284MB | <300MB | Function calling only, lightweight |
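The table can be turned into a small selection helper. The following is a minimal sketch in plain Dart, assuming the "Memory" column is an upper bound on runtime memory. `ModelOption`, `recommendedModels`, and `recommendModel` are hypothetical names for illustration and are not part of flutter_gemma.

```dart
/// Hypothetical record of one row from the table above (not a flutter_gemma type).
class ModelOption {
  const ModelOption(
    this.name,
    this.memoryCeilingMb, {
    this.functionCallingOnly = false,
  });

  final String name;
  final int memoryCeilingMb; // Upper bound from the "Memory" column, in MB.
  final bool functionCallingOnly;
}

// Ordered from most to least capable, values copied from the table.
const recommendedModels = <ModelOption>[
  ModelOption('Gemma 4 E4B Q4_K_M', 2500),
  ModelOption('Gemma 4 E2B Q4_K_M', 1500),
  ModelOption('FunctionGemma 270M', 300, functionCallingOnly: true),
];

/// Returns the most capable model whose memory ceiling fits [memoryBudgetMb],
/// or null when nothing fits.
ModelOption? recommendModel({
  required int memoryBudgetMb,
  bool functionCallingOnly = false,
}) {
  for (final option in recommendedModels) {
    // Keep general-purpose and function-calling-only models separate.
    if (option.functionCallingOnly != functionCallingOnly) continue;
    if (option.memoryCeilingMb <= memoryBudgetMb) return option;
  }
  return null;
}
```

Calling `recommendModel(memoryBudgetMb: 2000)` picks the E2B row, since the E4B ceiling exceeds the budget. How the app measures its memory budget is left open here.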
Installation Code
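import 'package:flutter/material.dart'; // Provides WidgetsFlutterBinding and runApp
import 'package:flutter_gemma/flutter_gemma.dart';

Future<void> main() async {
  // Required before any plugin call that runs ahead of runApp.
  WidgetsFlutterBinding.ensureInitialized();

  // Initialize (call once at app startup)
  await FlutterGemma.initialize(
    huggingFaceToken: 'hf_xxxx', // Optional, required for gated models
  );

  runApp(MyApp());
}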
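Hard-coding `hf_xxxx` makes it easy to commit a real token by accident. One option is to pass the token at build time with `--dart-define` and read it through `String.fromEnvironment`, which is a standard Dart API. In this sketch, `HF_TOKEN`, `hfTokenFromBuild`, and `tokenOrNull` are names chosen for illustration.

```dart
// Read at compile time; empty when the build did not pass
// --dart-define=HF_TOKEN=...
const String hfTokenFromBuild = String.fromEnvironment('HF_TOKEN');

/// Converts an empty or whitespace-only value to null so callers can
/// skip authentication entirely.
String? tokenOrNull(String raw) {
  final trimmed = raw.trim();
  return trimmed.isEmpty ? null : trimmed;
}
```

Run with `flutter run --dart-define=HF_TOKEN=<your token>` and pass `tokenOrNull(hfTokenFromBuild)` wherever the token is expected, assuming that parameter accepts null. Values passed this way are still embedded in the compiled app, so this keeps the token out of source control but does not hide it from someone inspecting the binary.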
Download Model
// From HuggingFace
await FlutterGemma.installModel(
  modelType: ModelType.gemma4E2BIt,
)
    .fromNetwork(
      'https://huggingface.co/bartowski/gemma-4-2b-it-GGUF/resolve/main/gemma-4-2b-it-Q4_K_M.gguf',
      token: 'hf_xxxx', // If model requires auth
    )
    .withProgress((progress) {
      print('Download: ${progress.percentage.toStringAsFixed(1)}%');
    })
    .install();
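Multi-gigabyte downloads on mobile networks can fail partway through. A retry wrapper with exponential backoff is one way to handle transient failures. This is a generic sketch in plain Dart; `retryWithBackoff` is a hypothetical helper, and whether the installer resumes a partial download on retry is not covered here.

```dart
/// Runs [action], retrying with exponential backoff when it throws an
/// Exception. Rethrows the last exception once [maxAttempts] is reached.
Future<T> retryWithBackoff<T>(
  Future<T> Function() action, {
  int maxAttempts = 3,
  Duration initialDelay = const Duration(seconds: 2),
}) async {
  var attempt = 0;
  var delay = initialDelay;
  while (true) {
    attempt++;
    try {
      return await action();
    } on Exception {
      // Out of attempts: surface the failure to the caller.
      if (attempt >= maxAttempts) rethrow;
      await Future<void>.delayed(delay);
      delay *= 2; // Double the wait before the next attempt.
    }
  }
}
```

To use it, wrap the whole install chain in a closure, for example `retryWithBackoff(() => FlutterGemma.installModel(...).fromNetwork(...).install())`.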
Step 5: Model Loading & Inference
// Get active model
final model = await FlutterGemma.getActiveModel(
  maxTokens: 2048,
  preferredBackend: PreferredBackend.gpu,
);

// Create session
final session = await model.createSession();

// Send message
await session.addQueryChunk(Message.text(
  text: 'Hello, explain quantum computing',
  isUser: true,
));

// Get response
final response = await session.getResponse();
print(response.text);

// Close
await session.close();
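In the snippet above, `session.close()` is never reached if an earlier call throws, which can leak the session. A `try`/`finally` block avoids that. The sketch below wraps the pattern in a generic helper; `withResource` is a hypothetical name and does not depend on flutter_gemma.

```dart
/// Opens a resource, runs [body] with it, and always runs [close],
/// even when [body] throws.
Future<T> withResource<R, T>({
  required Future<R> Function() open,
  required Future<void> Function(R resource) close,
  required Future<T> Function(R resource) body,
}) async {
  final resource = await open();
  try {
    return await body(resource);
  } finally {
    // Runs on success and on error alike.
    await close(resource);
  }
}
```

With the calls shown in this step, `open` would be `() => model.createSession()`, `close` would be `(s) => s.close()`, and `body` would send the message and return the response.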
Step 6: Streaming Response
final session = await model.createSession();
await session.addQueryChunk(Message.text(text: prompt, isUser: true));

final stream = session.getResponseStream();
await for (final event in stream) {
  if (event is TextResponse) {
    print('token: ${event.text}');
  }
}
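A chat UI usually needs the text accumulated so far, not individual tokens. The sketch below collects streamed chunks into one string and reports each partial result. It works on a plain `Stream<String>`, and `collectText` is a hypothetical helper name.

```dart
/// Concatenates streamed text chunks and reports the partial text
/// after each chunk arrives.
Future<String> collectText(
  Stream<String> chunks, {
  void Function(String partial)? onPartial,
}) async {
  final buffer = StringBuffer();
  await for (final chunk in chunks) {
    buffer.write(chunk);
    onPartial?.call(buffer.toString());
  }
  return buffer.toString();
}
```

Assuming the event types shown in this step, the response stream could be adapted with `stream.where((e) => e is TextResponse).map((e) => (e as TextResponse).text)` before passing it in.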
Common Configurations
Multimodal (Image Input)
final model = await FlutterGemma.getActiveModel(
  supportImage: true,
  maxNumImages: 1,
);
final session = await model.createSession();

// Add image (rootBundle comes from package:flutter/services.dart, and the
// asset must be declared under flutter: assets: in pubspec.yaml)
final imageBytes = await rootBundle.load('assets/image.png');
await session.addQueryChunk(Message.multimodal(
  text: 'Describe this image',
  images: [imageBytes.buffer.asUint8List()],
  isUser: true,
));
Function Calling
final tool = Tool.fromJsonSchema({
  'name': 'get_weather',
  'description': 'Get weather for a location',
  'parameters': {
    'type': 'object',
    'properties': {
      'location': {'type': 'string'},
    },
    'required': ['location'],
  },
});

final session = await model.createSession(tools: [tool]);
await session.addQueryChunk(Message.text(
  text: "What's the weather in Tokyo?",
  isUser: true,
));

final stream = session.getResponseStream();
await for (final event in stream) {
  if (event is FunctionCallResponse) {
    print('Function: ${event.functionCall.functionName}');
    print('Args: ${event.functionCall.argumentsText}');
  }
}
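Printing the call is only the first half. The app still has to run the matching local function. The sketch below routes a function name and its argument text to a registered handler. It assumes the argument text is a JSON object string, which the source does not confirm, and `ToolHandler` and `dispatchToolCall` are hypothetical names.

```dart
import 'dart:convert';

/// Signature for a locally implemented tool (hypothetical, not a
/// flutter_gemma type).
typedef ToolHandler = Future<Object?> Function(Map<String, dynamic> args);

/// Looks up the handler for [functionName] and calls it with the decoded
/// arguments. Assumes [argumentsText] is a JSON object string.
Future<Object?> dispatchToolCall(
  Map<String, ToolHandler> handlers,
  String functionName,
  String argumentsText,
) async {
  final handler = handlers[functionName];
  if (handler == null) {
    throw ArgumentError.value(
      functionName,
      'functionName',
      'No handler registered',
    );
  }
  final decoded = jsonDecode(argumentsText);
  if (decoded is! Map<String, dynamic>) {
    throw FormatException('Expected a JSON object', argumentsText);
  }
  return handler(decoded);
}
```

Inside the `FunctionCallResponse` branch, the two printed values would be passed as `functionName` and `argumentsText`. Rejecting unknown names explicitly keeps a hallucinated tool name from failing silently.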
Thinking Mode (Gemma 4)
final session = await model.createSession(
  enableThinking: true,
);
Platform Differences
| Capability | iOS | Android | Web | Desktop |
|---|---|---|---|---|
| GPU acceleration | Metal | Vulkan/OpenCL | WebGPU | GPU |
| Multimodal | ✅ | ✅ | ✅ | ✅ |
| Audio | ✅ | ✅ | ❌ | ✅ |
| Thinking | ✅ | ✅ | ❌ | ✅ |
| Embeddings | ✅ | ✅ | ✅ | ✅ |
| Function Calling | ✅ | ✅ | ✅ | ✅ |
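The table can also be encoded as a lookup, so feature flags are checked in one place instead of scattered through the UI. This is a sketch in plain Dart that mirrors only the ❌ cells above. `GemmaPlatform`, `GemmaFeature`, and `isSupported` are hypothetical names and are unrelated to Flutter's own `TargetPlatform`.

```dart
/// Platforms from the table above (hypothetical enum).
enum GemmaPlatform { ios, android, web, desktop }

/// Feature rows from the table above (hypothetical enum).
enum GemmaFeature { multimodal, audio, thinking, embeddings, functionCalling }

// Features the table marks as unsupported, keyed by platform.
// Platforms without an entry support every listed feature.
const _unsupported = <GemmaPlatform, Set<GemmaFeature>>{
  GemmaPlatform.web: {GemmaFeature.audio, GemmaFeature.thinking},
};

/// Returns false only for combinations the table marks as unsupported.
bool isSupported(GemmaPlatform platform, GemmaFeature feature) =>
    !(_unsupported[platform]?.contains(feature) ?? false);
```

For example, a caller could pass `enableThinking: isSupported(platform, GemmaFeature.thinking)` when creating a session, with `platform` determined however the app already detects its target.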
Web Limitations
- Must use WebGPU backend
- Thinking mode not supported
- Audio not supported
Desktop Limitations
- Requires JRE (auto-downloaded Azul Zulu)
- Communicates with LiteRT-LM via gRPC
- Model format: .litertlm only
Model Download Sources
Official MediaPipe Format (Recommended for Mobile)
https://huggingface.co/google/gemma-4-2b-it/
GGUF Format (llama.cpp ecosystem)
https://huggingface.co/bartowski/gemma-4-2b-it-GGUF
https://huggingface.co/unsloth/gemma-4-2b-it-GGUF
LiteRT-LM Format (Desktop)
https://huggingface.co/litert-community/gemma-4-e2b-it-litertlm
Troubleshooting
iOS Build Fails
- Verify platform :ios, '16.0' is set in ios/Podfile
- Verify use_frameworks! :linkage => :static is set in ios/Podfile
- Check entitlements configuration
Android Build Fails
- Verify minSdk = 24 in android/app/build.gradle.kts
- For NPU, verify device support
Web Not Working
- Verify the MediaPipe script loads correctly in index.html
- Verify the model is requested with PreferredBackend.gpu
- Safari versions without WebGPU support will not work (use Chrome or Edge)
Slow Model Download
- Use HuggingFace token for faster downloads
- Consider GGUF Q4_K_M quantization (smaller, faster)
Complete Example
This example follows Flutter best practices:
- Riverpod for state management
- Feature-based project structure
- AsyncNotifierProvider for async state
- AsyncValue.guard() for error handling
- ConsumerWidget for efficient rebuilds
Project Structure
lib/
├── main.dart
├── app.dart
├── core/
│   └── theme/
│       └── app_theme.dart
├── features/
│   └── gemma_chat/
│       ├── data/
│       │   └── gemma_repository.dart
│       ├── domain/
│       │   └── gemma_service.dart
│       └── presentation/
│           ├── screens/
│           │   └── gemma_chat_screen.dart
│           ├── widgets/
│           │   └── chat_bubble.dart
│           └── providers/
│               └── gemma_providers.dart
└── routes/
    └── app_router.dart
main.dart
import 'package:flutter/material.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';
import 'package:flutter_gemma/flutter_gemma.dart';
import 'app.dart';
Future<void> main() async {
  WidgetsFlutterBinding.ensureInitialized();

  // Initialize Gemma (call once at startup)
  await FlutterGemma.initialize();

  runApp(
    const ProviderScope(
      child: GemmaApp(),
    ),
  );
}
app.dart
import 'package:flutter/material.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';
import 'package:go_router/go_router.dart';
impor