Building AIKokoron — A Personalized AI Desktop Companion
I've been working on a project called AIKokoron — a personalized AI desktop companion that combines voice interaction, a Live2D animated character, and extensible tool capabilities. Think of it as your own AI assistant that lives on your desktop, listens to you, and talks back with a character you can see and interact with.
Why Build This?
After experimenting with LLM APIs and RAG systems, I wanted to push further. Chatbots in a browser are useful, but what if the AI could be more present — always on your desktop, responding to voice, and capable of actually doing things on your computer?
I was inspired by Open-LLM-VTuber, an open-source project that combines LLMs with Live2D avatars for voice conversation. I forked it as my starting point, then gradually made significant changes to tailor it to my vision.
What Makes AIKokoron Different
While Open-LLM-VTuber focuses on being a flexible, multi-character VTuber platform, AIKokoron is designed as a personal AI assistant with a focus on practical utility:
- MCP Tool Integration — AIKokoron uses the Model Context Protocol (MCP) to give the AI real capabilities: execute shell commands, search the web, check the time, and even recognize faces via webcam. This goes beyond conversation into actual task execution
- Extension Architecture — I built a pluggable extension system for planned features like file browsing, browser monitoring, and game detection. The architecture is designed to grow
- Face Recognition — Built-in user identification via DeepFace. The AI can recognize who it's talking to and personalize responses
- Focused Character Design — Rather than supporting arbitrary character switching, AIKokoron is designed around a single, deeply customized companion persona
- Streamlined Deployment — One-click startup scripts for Windows, with clear configuration via YAML files
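The MCP integration above boils down to a dispatch loop: the LLM emits a structured tool call, and the client routes it to a registered function. Here is a minimal illustrative sketch of that pattern — not AIKokoron's actual code or the real MCP SDK, and the tool names (`get_time`, `run_shell`) are hypothetical stand-ins:

```python
import datetime
import subprocess

# Hypothetical tool registry: maps tool names to callables,
# mirroring how an MCP client exposes server tools to the LLM.
TOOLS = {}

def tool(name):
    """Register a function under a tool name."""
    def decorator(fn):
        TOOLS[name] = fn
        return fn
    return decorator

@tool("get_time")
def get_time() -> str:
    return datetime.datetime.now().isoformat(timespec="seconds")

@tool("run_shell")
def run_shell(command: str) -> str:
    # Execute a shell command and capture its output.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip() or result.stderr.strip()

def dispatch(tool_call: dict) -> str:
    """Route a tool call emitted by the LLM to the matching function."""
    fn = TOOLS.get(tool_call["name"])
    if fn is None:
        return f"unknown tool: {tool_call['name']}"
    return fn(**tool_call.get("arguments", {}))

print(dispatch({"name": "run_shell", "arguments": {"command": "echo hello"}}))
```

The real protocol adds schemas, transports, and server discovery on top, but the core idea — a standardized name-plus-arguments envelope the model can target — is what makes the system extensible.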
Tech Stack
The project has a dual architecture:
Backend (Python + FastAPI):
- Multi-LLM support (Gemini, Claude, OpenAI, Ollama, etc.)
- Speech recognition (Sherpa-ONNX, Faster-Whisper, Azure)
- Text-to-speech with voice cloning (GPT-SoVITS)
- MCP client for tool execution
- Face recognition via DeepFace
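Per utterance, the backend chains these components into one pipeline: VAD-segmented audio goes through ASR, the transcript through the LLM, and the reply through TTS. A simplified sketch of that flow, with stubbed stages — in the real system `transcribe`, `generate_reply`, and `synthesize` would stream through Sherpa-ONNX/Faster-Whisper, an LLM API, and GPT-SoVITS respectively:

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    audio_in: bytes         # raw microphone audio after VAD segmentation
    text_in: str = ""       # ASR transcript
    text_out: str = ""      # LLM reply
    audio_out: bytes = b""  # synthesized speech

# Placeholder stages standing in for the real ASR / LLM / TTS calls.
def transcribe(u: Utterance) -> Utterance:
    u.text_in = u.audio_in.decode("utf-8", errors="ignore")  # stand-in for ASR
    return u

def generate_reply(u: Utterance) -> Utterance:
    u.text_out = f"You said: {u.text_in}"  # stand-in for the LLM
    return u

def synthesize(u: Utterance) -> Utterance:
    u.audio_out = u.text_out.encode("utf-8")  # stand-in for TTS
    return u

def run_pipeline(audio: bytes) -> Utterance:
    """VAD-segmented audio in, synthesized reply audio out."""
    u = Utterance(audio_in=audio)
    for stage in (transcribe, generate_reply, synthesize):
        u = stage(u)
    return u

result = run_pipeline(b"hello there")
```

Keeping each stage as a function over a shared `Utterance` record makes it easy to swap one backend (say, a different ASR engine) without touching the others.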
Frontend (Electron + React):
- Live2D character rendering with audio-driven lip sync
- Real-time voice activity detection
- Desktop pet mode with transparent background
- Multi-language UI (EN/JA/ZH)
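Audio-driven lip sync generally works by mapping the short-time loudness of the TTS output onto the Live2D mouth-open parameter. The frontend does this in TypeScript against the rendering loop; here is an algorithm sketch in Python, where the frame count and gain are illustrative choices, not the project's actual values:

```python
import math

def mouth_open(samples: list[float], frames: int = 4) -> list[float]:
    """Map audio amplitude to a 0..1 Live2D mouth-open value per frame.

    `samples` are PCM amplitudes in [-1, 1]; each frame's RMS energy
    is scaled into the parameter range the model expects.
    """
    size = max(1, len(samples) // frames)
    values = []
    for i in range(0, len(samples), size):
        frame = samples[i:i + size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Illustrative gain: boost quiet speech, clamp to [0, 1].
        values.append(min(1.0, rms * 3.0))
    return values

# Silence keeps the mouth closed; louder frames open it wider.
print(mouth_open([0.0, 0.0, 0.5, 0.5], frames=2))  # → [0.0, 1.0]
```

RMS over short frames is cheap enough to run every render tick, which matters when the same process is also decoding audio and drawing the character.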
Current Status
The project is functional and I use it daily, but it's still in active development. I haven't uploaded it to GitHub yet — I want to clean up the codebase and documentation before making it public. That's coming soon.
What I've Learned
Building AIKokoron has taught me a lot about system integration:
- Real-time audio pipelines are complex — getting VAD, ASR, LLM, and TTS to work together smoothly requires careful engineering
- MCP is a game changer — having a standardized way for LLMs to use tools makes the system incredibly extensible
- Desktop apps in 2026 are still hard — Electron has its quirks, especially with audio and transparent windows
Stay tuned for the open-source release. Feel free to reach out via email or GitHub if you're interested.