Flutter Gemma Scaffold

Scaffold guide for integrating Gemma 4 on-device AI into Flutter projects.

Capabilities

Create new Flutter project with Gemma integration from scratch
Add Gemma to existing Flutter project
Configure platforms (iOS, Android, Web, Desktop)

Workflow

Step 1: Confirm Requirements

Ask the user:

New project or existing project?
Target platform(s)? (iOS/Android/Web/All)
Need function calling support?
Have HuggingFace token? (for model downloads)

Step 2: Add Dependencies

In pubspec.yaml:

dependencies:
  flutter:
    sdk: flutter
  flutter_gemma: ^0.13.2
  flutter_riverpod: ^2.5.0
  riverpod_annotation: ^2.3.0
  go_router: ^14.0.0

dev_dependencies:
  build_runner: ^2.4.0
  riverpod_generator: ^2.4.0
  flutter_lints: ^4.0.0

flutter:
  uses-material-design: true

Step 2b: Initialize Riverpod

Wrap your app with ProviderScope in main.dart:

import 'package:flutter/material.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';

void main() {
  runApp(
    const ProviderScope(
      child: MyApp(),
    ),
  );
}

Step 3: Platform Configuration

iOS (`ios/Podfile`)

platform :ios, '16.0'
use_frameworks! :linkage => :static

iOS 16.0+ required. Static linking required.

Android (`android/app/build.gradle.kts`)

android {
    defaultConfig {
        minSdk = 24  // Android 7.0+
    }
}

Optional GPU acceleration (requires OpenCL). Add inside the <application> element of android/app/src/main/AndroidManifest.xml:

<uses-native-library android:name="libOpenCL.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-car.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-pixel.so" android:required="false"/>

Web (`web/index.html`)

In <head>:

<script type="module">
  import { FilesetResolver, LlmInference } from 'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.27';
  window.FilesetResolver = FilesetResolver;
  window.LlmInference = LlmInference;
</script>

Step 4: Model Installation

Recommended Models

Model	Size	Memory	Best For
Gemma 4 E2B Q4_K_M	~3.5GB	<1.5GB	General purpose, balance of quality and performance
Gemma 4 E4B Q4_K_M	~6.5GB	<2.5GB	Stronger reasoning
FunctionGemma 270M	~284MB	<300MB	Function calling only, lightweight

Installation Code

import 'package:flutter/material.dart';
import 'package:flutter_gemma/flutter_gemma.dart';

void main() async {
  WidgetsFlutterBinding.ensureInitialized();

  // Initialize (call once at app startup)
  await FlutterGemma.initialize(
    huggingFaceToken: 'hf_xxxx',  // Optional, required for gated models
  );

  runApp(MyApp());
}
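
The snippet above uses a placeholder token. One way to keep a real token out of source control is a compile-time define. The sketch below assumes a define named HF_TOKEN (passed with --dart-define=HF_TOKEN=...) and a hypothetical helper, resolveHuggingFaceToken; neither name comes from flutter_gemma.

```dart
// Assumption: the token is supplied at build time, e.g.
//   flutter run --dart-define=HF_TOKEN=hf_xxxx
// HF_TOKEN is an illustrative name, not something flutter_gemma defines.
const String _hfTokenDefine = String.fromEnvironment('HF_TOKEN');

/// Hypothetical helper: returns the trimmed token, or null when no
/// token (or only whitespace) was supplied.
String? resolveHuggingFaceToken([String raw = _hfTokenDefine]) {
  final trimmed = raw.trim();
  return trimmed.isEmpty ? null : trimmed;
}
```

The result could then be passed as huggingFaceToken in FlutterGemma.initialize, assuming that parameter accepts null. The "Optional" comment above suggests it does, but this guide does not confirm it.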

Download Model

// From HuggingFace
await FlutterGemma.installModel(
  modelType: ModelType.gemma4E2BIt,
)
.fromNetwork(
  'https://huggingface.co/bartowski/gemma-4-2b-it-GGUF/resolve/main/gemma-4-2b-it-Q4_K_M.gguf',
  token: 'hf_xxxx',  // If model requires auth
)
.withProgress((progress) {
  print('Download: ${progress.percentage.toStringAsFixed(1)}%');
})
.install();
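
The progress callback above prints on every update, which can flood the console during a multi-gigabyte download. A minimal sketch of throttling follows. ProgressThrottle is a hypothetical class, not part of flutter_gemma, and it assumes progress arrives as a percentage from 0 to 100, as in the snippet above.

```dart
/// Hypothetical helper: lets a progress callback report at most once
/// per [step] percentage points, plus once when 100% is reached.
class ProgressThrottle {
  ProgressThrottle({this.step = 5.0}) : assert(step > 0);

  /// Minimum advance, in percentage points, between two reports.
  final double step;

  double _lastReported = double.negativeInfinity;

  /// Returns true when [percentage] should be reported.
  bool shouldReport(double percentage) {
    final reachedEnd = percentage >= 100.0 && _lastReported < 100.0;
    final advanced = percentage - _lastReported >= step;
    if (reachedEnd || advanced) {
      _lastReported = percentage;
      return true;
    }
    return false;
  }
}
```

Inside withProgress, the print call would then be guarded with if (throttle.shouldReport(progress.percentage)).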

Step 5: Model Loading & Inference

// Get active model
final model = await FlutterGemma.getActiveModel(
  maxTokens: 2048,
  preferredBackend: PreferredBackend.gpu,
);

// Create session
final session = await model.createSession();

// Send message
await session.addQueryChunk(Message.text(
  text: 'Hello, explain quantum computing',
  isUser: true,
));

// Get response
final response = await session.getResponse();
print(response.text);

// Close
await session.close();
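
In the snippet above, session.close() is skipped if addQueryChunk or getResponse throws. A minimal sketch of a try/finally wrapper follows. withResource is a hypothetical, library-agnostic helper; the commented usage reuses only calls shown in this guide and assumes they behave as written there.

```dart
/// Hypothetical helper: opens a resource, runs [use], and always calls
/// [close], even when [use] throws.
Future<T> withResource<R, T>({
  required Future<R> Function() open,
  required Future<T> Function(R resource) use,
  required Future<void> Function(R resource) close,
}) async {
  final resource = await open();
  try {
    return await use(resource);
  } finally {
    await close(resource);
  }
}

// Possible usage with the session calls from this guide (not verified
// against the flutter_gemma API):
//
// final text = await withResource(
//   open: () => model.createSession(),
//   use: (session) async {
//     await session.addQueryChunk(Message.text(text: 'Hi', isUser: true));
//     return (await session.getResponse()).text;
//   },
//   close: (session) => session.close(),
// );
```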

Step 6: Streaming Response

final session = await model.createSession();
await session.addQueryChunk(Message.text(text: prompt, isUser: true));

final stream = session.getResponseStream();
await for (final event in stream) {
  if (event is TextResponse) {
    print('token: ${event.text}');
  }
}
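
The loop above prints each token separately. A chat UI usually needs the accumulated text instead. The sketch below shows one way to collect it, using a hypothetical helper, collectText, that works on any Stream<String>; how the token stream is obtained from the session is left to the caller.

```dart
/// Hypothetical helper: concatenates streamed tokens and optionally
/// reports the accumulated text after every token.
Future<String> collectText(
  Stream<String> tokens, {
  void Function(String partial)? onPartial,
}) async {
  final buffer = StringBuffer();
  await for (final token in tokens) {
    buffer.write(token);
    onPartial?.call(buffer.toString());
  }
  return buffer.toString();
}
```

Assuming the event types shown above, the input stream could be built by filtering the response stream to TextResponse events and mapping each to its text field.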

Common Configurations

Multimodal (Image Input)

final model = await FlutterGemma.getActiveModel(
  supportImage: true,
  maxNumImages: 1,
);

final session = await model.createSession();

// Add image (rootBundle needs: import 'package:flutter/services.dart';)
final imageBytes = await rootBundle.load('assets/image.png');
await session.addQueryChunk(Message.multimodal(
  text: 'Describe this image',
  images: [imageBytes.buffer.asUint8List()],
  isUser: true,
));

Function Calling

final tool = Tool.fromJsonSchema({
  'name': 'get_weather',
  'description': 'Get weather for a location',
  'parameters': {
    'type': 'object',
    'properties': {
      'location': {'type': 'string'},
    },
    'required': ['location'],
  },
});

final session = await model.createSession(tools: [tool]);
await session.addQueryChunk(Message.text(
  text: "What's the weather in Tokyo?",
  isUser: true,
));

final stream = session.getResponseStream();
await for (final event in stream) {
  if (event is FunctionCallResponse) {
    print('Function: ${event.functionCall.functionName}');
    print('Args: ${event.functionCall.argumentsText}');
  }
}
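
The loop above only prints the function call. A minimal sketch of routing the call to a handler follows. It assumes argumentsText holds a JSON object, which this guide does not confirm. dispatchToolCall, ToolHandler, and the stubbed weather handler are hypothetical names for illustration.

```dart
import 'dart:convert';

/// Hypothetical handler signature: takes decoded arguments, returns a
/// text result to feed back to the model or show to the user.
typedef ToolHandler = Future<String> Function(Map<String, dynamic> args);

/// Stub handlers keyed by tool name; 'get_weather' matches the schema
/// defined above and returns canned text instead of calling an API.
final Map<String, ToolHandler> toolHandlers = {
  'get_weather': (args) async => 'Sunny in ${args['location']}',
};

/// Looks up the handler for [name] and runs it with the decoded
/// [argumentsText]. Assumes the arguments are a JSON object.
Future<String> dispatchToolCall(String name, String argumentsText) async {
  final handler = toolHandlers[name];
  if (handler == null) {
    return 'Unknown tool: $name';
  }
  try {
    final decoded = jsonDecode(argumentsText);
    if (decoded is! Map<String, dynamic>) {
      return 'Invalid arguments for $name';
    }
    return await handler(decoded);
  } on FormatException {
    return 'Invalid arguments for $name';
  }
}
```

Inside the FunctionCallResponse branch, the call would be dispatchToolCall(event.functionCall.functionName, event.functionCall.argumentsText).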

Thinking Mode (Gemma 4)

final session = await model.createSession(
  enableThinking: true,
);

Platform Differences

Capability	iOS	Android	Web	Desktop
GPU acceleration	Metal	Vulkan/OpenCL	WebGPU	GPU
Multimodal	✅	✅	✅	✅
Audio	✅	✅	❌	✅
Thinking	✅	✅	❌	✅
Embeddings	✅	✅	✅	✅
Function Calling	✅	✅	✅	✅

Web Limitations

Must use WebGPU backend
Thinking mode not supported
Audio not supported
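
The table and web limitations above can be encoded as a lookup so the UI can hide unsupported features. This is a sketch with hypothetical enum and function names; the values mirror the table in this guide and are not read from flutter_gemma at runtime.

```dart
/// Hypothetical enums mirroring the columns and rows of the table above.
enum GemmaPlatform { ios, android, web, desktop }

enum GemmaFeature { multimodal, audio, thinking, embeddings, functionCalling }

/// Features marked unsupported in the table; any pair not listed here
/// is treated as supported.
const _unsupported = <GemmaPlatform, Set<GemmaFeature>>{
  GemmaPlatform.web: {GemmaFeature.audio, GemmaFeature.thinking},
};

/// Returns true when the table above marks [feature] as available on
/// [platform].
bool isSupported(GemmaPlatform platform, GemmaFeature feature) =>
    !(_unsupported[platform]?.contains(feature) ?? false);
```

For example, isSupported(GemmaPlatform.web, GemmaFeature.thinking) is false, so a web build could avoid passing enableThinking: true.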

Desktop Limitations

Requires JRE (auto-downloaded Azul Zulu)
Communicates with LiteRT-LM via gRPC
Model format: .litertlm only

Model Download Sources

Official MediaPipe Format (Recommended for Mobile)

https://huggingface.co/google/gemma-4-2b-it/

GGUF Format (llama.cpp ecosystem)

https://huggingface.co/bartowski/gemma-4-2b-it-GGUF
https://huggingface.co/unsloth/gemma-4-2b-it-GGUF

LiteRT-LM Format (Desktop)

https://huggingface.co/litert-community/gemma-4-e2b-it-litertlm

Troubleshooting

iOS Build Fails

Verify platform :ios, '16.0'
Verify use_frameworks! :linkage => :static
Check entitlements configuration

Android Build Fails

Verify minSdk = 24
For NPU, verify device support

Web Not Working

Verify the MediaPipe script loads correctly in index.html
Verify you are using PreferredBackend.gpu
Check that the browser supports WebGPU (current Chrome and Edge do; Safari support depends on the version)

Slow Model Download

Use a HuggingFace token (required for gated models; it may also help with rate limits on anonymous requests)
Consider GGUF Q4_K_M quantization (smaller, faster)

Complete Example

This example follows Flutter best practices:

Riverpod for state management
Feature-based project structure
AsyncNotifierProvider for async state
AsyncValue.guard() for error handling
ConsumerWidget for efficient rebuilds

Project Structure

lib/
├── main.dart
├── app.dart
├── core/
│   └── theme/
│       └── app_theme.dart
├── features/
│   └── gemma_chat/
│       ├── data/
│       │   └── gemma_repository.dart
│       ├── domain/
│       │   └── gemma_service.dart
│       └── presentation/
│           ├── screens/
│           │   └── gemma_chat_screen.dart
│           ├── widgets/
│           │   └── chat_bubble.dart
│           └── providers/
│               └── gemma_providers.dart
└── routes/
    └── app_router.dart

main.dart

import 'package:flutter/material.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';
import 'package:flutter_gemma/flutter_gemma.dart';
import 'app.dart';

void main() async {
  WidgetsFlutterBinding.ensureInitialized();

  // Initialize Gemma (call once at startup)
  await FlutterGemma.initialize();

  runApp(
    const ProviderScope(
      child: GemmaApp(),
    ),
  );
}

app.dart

import 'package:flutter/material.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';
import 'package:go_router/go_router.dart';
impor
