Building AIKokoron — A Personalized AI Desktop Companion
I've been working on a project called AIKokoron — a personalized AI desktop companion that combines voice interaction, a Live2D animated character, and extensible tool capabilities. Think of it as your own AI assistant that lives on your desktop, listens to you, and talks back with a character you can see and interact with.
Why Build This?
After experimenting with LLM APIs and RAG systems, I wanted to push further. Chatbots in a browser are useful, but what if the AI could be more present — always on your desktop, responding to voice, and capable of actually doing things on your computer?
I was inspired by Open-LLM-VTuber, an open-source project that combines LLMs with Live2D avatars for voice conversation. I forked it as my starting point, then gradually made significant changes to tailor it to my vision.
What Makes AIKokoron Different
While Open-LLM-VTuber focuses on being a flexible, multi-character VTuber platform, AIKokoron is designed as a personal AI assistant with a focus on practical utility:
- MCP Tool Integration — AIKokoron uses the Model Context Protocol (MCP) to give the AI real capabilities: execute shell commands, search the web, check the time, and even recognize faces via webcam. This goes beyond conversation into actual task execution
- Extension Architecture — I built a pluggable extension system for planned features like file browsing, browser monitoring, and game detection. The architecture is designed to grow
- Face Recognition — Built-in user identification via DeepFace. The AI can recognize who it's talking to and personalize responses
- Focused Character Design — Rather than supporting arbitrary character switching, AIKokoron is designed around a single, deeply customized companion persona
- Streamlined Deployment — One-click startup scripts for Windows, with clear configuration via YAML files
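The MCP integration above boils down to a dispatch loop: the LLM emits a structured tool call, and the client routes it to a registered function. Here is a minimal illustrative sketch of that pattern — not AIKokoron's actual code or the real MCP SDK, and the tool names (`get_time`, `run_shell`) are hypothetical stand-ins:

```python
import datetime
import subprocess

# Hypothetical tool registry: maps tool names to callables,
# mirroring how an MCP client exposes server tools to the LLM.
TOOLS = {}

def tool(name):
    """Register a function under a tool name."""
    def decorator(fn):
        TOOLS[name] = fn
        return fn
    return decorator

@tool("get_time")
def get_time() -> str:
    return datetime.datetime.now().isoformat(timespec="seconds")

@tool("run_shell")
def run_shell(command: str) -> str:
    # Execute a shell command and capture its output.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip() or result.stderr.strip()

def dispatch(tool_call: dict) -> str:
    """Route a tool call emitted by the LLM to the matching function."""
    fn = TOOLS.get(tool_call["name"])
    if fn is None:
        return f"unknown tool: {tool_call['name']}"
    return fn(**tool_call.get("arguments", {}))

print(dispatch({"name": "run_shell", "arguments": {"command": "echo hello"}}))
```

The real protocol adds schemas, transports, and server discovery on top, but the core idea — a standardized name-plus-arguments envelope the model can target — is what makes the system extensible.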
Tech Stack
The project has a dual architecture:
Backend (Python + FastAPI):
- Multi-LLM support (Gemini, Claude, OpenAI, Ollama, etc.)
- Speech recognition (Sherpa-ONNX, Faster-Whisper, Azure)
- Text-to-speech with voice cloning (GPT-SoVITS)
- MCP client for tool execution
- Face recognition via DeepFace
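Per utterance, the backend chains these components into one pipeline: VAD-segmented audio goes through ASR, the transcript through the LLM, and the reply through TTS. A simplified sketch of that flow, with stubbed stages — in the real system `transcribe`, `generate_reply`, and `synthesize` would stream through Sherpa-ONNX/Faster-Whisper, an LLM API, and GPT-SoVITS respectively:

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    audio_in: bytes         # raw microphone audio after VAD segmentation
    text_in: str = ""       # ASR transcript
    text_out: str = ""      # LLM reply
    audio_out: bytes = b""  # synthesized speech

# Placeholder stages standing in for the real ASR / LLM / TTS calls.
def transcribe(u: Utterance) -> Utterance:
    u.text_in = u.audio_in.decode("utf-8", errors="ignore")  # stand-in for ASR
    return u

def generate_reply(u: Utterance) -> Utterance:
    u.text_out = f"You said: {u.text_in}"  # stand-in for the LLM
    return u

def synthesize(u: Utterance) -> Utterance:
    u.audio_out = u.text_out.encode("utf-8")  # stand-in for TTS
    return u

def run_pipeline(audio: bytes) -> Utterance:
    """VAD-segmented audio in, synthesized reply audio out."""
    u = Utterance(audio_in=audio)
    for stage in (transcribe, generate_reply, synthesize):
        u = stage(u)
    return u

result = run_pipeline(b"hello there")
```

Keeping each stage as a function over a shared `Utterance` record makes it easy to swap one backend (say, a different ASR engine) without touching the others.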
Frontend (Electron + React):
- Live2D character rendering with audio-driven lip sync
- Real-time voice activity detection
- Desktop pet mode with transparent background
- Multi-language UI (EN/JA/ZH)
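Audio-driven lip sync generally works by mapping the short-time loudness of the TTS output onto the Live2D mouth-open parameter. The frontend does this in TypeScript against the rendering loop; here is an algorithm sketch in Python, where the frame count and gain are illustrative choices, not the project's actual values:

```python
import math

def mouth_open(samples: list[float], frames: int = 4) -> list[float]:
    """Map audio amplitude to a 0..1 Live2D mouth-open value per frame.

    `samples` are PCM amplitudes in [-1, 1]; each frame's RMS energy
    is scaled into the parameter range the model expects.
    """
    size = max(1, len(samples) // frames)
    values = []
    for i in range(0, len(samples), size):
        frame = samples[i:i + size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Illustrative gain: boost quiet speech, clamp to [0, 1].
        values.append(min(1.0, rms * 3.0))
    return values

# Silence keeps the mouth closed; louder frames open it wider.
print(mouth_open([0.0, 0.0, 0.5, 0.5], frames=2))  # → [0.0, 1.0]
```

RMS over short frames is cheap enough to run every render tick, which matters when the same process is also decoding audio and drawing the character.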
Current Status
The project is functional and I use it daily, but it's still in active development. I haven't uploaded it to GitHub yet — I want to clean up the codebase and documentation before making it public. That's coming soon.
What I've Learned
Building AIKokoron has taught me a lot about system integration:
- Real-time audio pipelines are complex — getting VAD, ASR, LLM, and TTS to work together smoothly requires careful engineering
- MCP is a game changer — having a standardized way for LLMs to use tools makes the system incredibly extensible
- Desktop apps in 2026 are still hard — Electron has its quirks, especially with audio and transparent windows
Stay tuned for the open-source release. Feel free to reach out via email or GitHub if you're interested.