How AI Manga Colorization Works
Why we chose Google Gemini, how virtual image splitting was born out of necessity, and what separates publisher-grade AI from hobby-grade tools.
Published by Watashi Games · March 2026
Why We Bet on Google Gemini for Colorization
When we started building Watashi Colorizer, we evaluated every available AI model for image colorization. Traditional neural-network colorizers — the kind trained specifically on manga — produced flat, uniform results. They could tint areas but couldn’t understand context. A night scene and a day scene got the same blue sky. A character’s clothing color was random every time.
Large multimodal models changed the equation. Google’s Gemini models can interpret the content of an image — identify characters, understand scene context, read text — and colorize based on that understanding. When you tell Gemini “this character has red hair and a blue jacket,” the model applies those colors because it understands the instruction semantically, not because it’s matching a pixel pattern.
Gemini also handles text natively. It can read dialogue, preserve it during colorization, and even translate it into other languages in the same pass. For a publisher, this meant one API call could colorize a page and translate it simultaneously — a workflow that previously required separate tools for each step.
The Birth of Virtual Image Splitting
Virtual image splitting wasn’t planned. It was born from a production failure. We were colorizing a webtoon chapter in which a dramatic scene spanned two pages — a character leaping from a panel at the bottom of page 15 to a landing panel at the top of page 16. The two pages fell into different AI batches. The model colored the character’s outfit blue in one batch and purple in the other, and the color break landed right in the middle of the action.
The initial fix was simple: overlap batches so the last image of batch N appeared again in batch N+1 as a color reference. This failed spectacularly. The AI re-interpreted colors each time, producing two different colorizations of the same content. We tried blending the overlapping regions, but the AI shifts element positions slightly during colorization, making any blend produce artifacts.
The real solution required rethinking the entire pipeline. Instead of sending whole pages, we split pages at their natural scene boundaries — the black panel dividers — and regrouped the resulting art bands by visual continuity. The bottom of page 15 and the top of page 16 now land in the same batch because the system recognizes there’s no scene break between them.
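The article doesn’t publish the splitting code, but the core idea can be sketched in a few lines of NumPy: find runs of near-black rows and treat the art between them as bands. The thresholds and function names below are illustrative, not the production values:

```python
import numpy as np

def find_divider_rows(gray: np.ndarray, dark_thresh: int = 32) -> np.ndarray:
    """Boolean mask of rows that are almost entirely near-black,
    i.e. candidate scene dividers in a vertical webtoon strip."""
    # A row is a divider candidate if ~all of its pixels are darker than dark_thresh.
    return (gray < dark_thresh).mean(axis=1) > 0.98

def split_into_bands(gray: np.ndarray, min_band_height: int = 64) -> list[tuple[int, int]]:
    """Split the page at runs of divider rows; return (top, bottom) pixel
    ranges of the art bands between them."""
    divider = find_divider_rows(gray)
    bands, start = [], None
    for y, is_div in enumerate(divider):
        if not is_div and start is None:
            start = y                       # a new art band begins
        elif is_div and start is not None:
            if y - start >= min_band_height:
                bands.append((start, y))    # close the band at the divider
            start = None
    if start is not None and len(divider) - start >= min_band_height:
        bands.append((start, len(divider)))
    return bands
```

Once the bands exist, regrouping them into batches by visual continuity — rather than by page boundaries — is what keeps both halves of a cross-page scene in the same request.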
Publisher-Grade vs. Hobby-Grade AI Colorization
The difference between hobby-grade and publisher-grade AI colorization comes down to consistency at scale. A hobby tool that colorizes one image beautifully is useless for a 60-page chapter if it produces different colors on every page. Publisher-grade means page 1 and page 60 look like they came from the same colorist.
Hobby tools also typically ignore output dimensions. They resize images to the model’s preferred resolution and return whatever the AI generates. For publishing, output must match input dimensions exactly — pixel for pixel. Our pipeline processes at the AI’s resolution but maps the result back onto the original canvas, so the output dimensions always equal the input dimensions.
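That round-trip can be sketched as follows, assuming a 2048-pixel longest-side limit and using a nearest-neighbour stand-in for the real resampler (the actual pipeline’s resampling method isn’t published):

```python
import numpy as np

AI_MAX_SIDE = 2048  # assumed model-side limit, per the article

def resize_nn(img: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Nearest-neighbour resize; a real pipeline would use a proper resampler."""
    h, w = img.shape[:2]
    ys = np.arange(new_h) * h // new_h
    xs = np.arange(new_w) * w // new_w
    return img[ys][:, xs]

def colorize_preserving_size(page: np.ndarray, colorize) -> np.ndarray:
    """Downscale to the AI's working resolution, run the (stand-in)
    colorize step, then map the result back onto the original canvas."""
    h, w = page.shape[:2]                   # NumPy arrays are (height, width)
    scale = min(1.0, AI_MAX_SIDE / max(h, w))
    working = resize_nn(page, round(h * scale), round(w * scale))
    colored = colorize(working)             # placeholder for the model call
    return resize_nn(colored, h, w)         # output matches input pixel-for-pixel
```

For a 1280×4000 page, the model works at 655×2048, but the caller always gets back a 1280×4000 canvas.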
Character control is another dividing line. Hobby tools let the AI choose colors freely. Publisher tools enforce specific palettes defined by the production team. When you’re publishing a series with 200 chapters, you can’t have the AI improvising character colors. They must match the style guide every time.
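One straightforward way to enforce a style guide is to compile the palette into explicit prompt instructions. A minimal sketch — the character names and hex values here are hypothetical, not from any real style guide:

```python
# Hypothetical character palette, as a production style guide might define it.
PALETTE = {
    "Aoi": {"hair": "#1E3A8A", "jacket": "#DC2626"},
    "Ren": {"hair": "#D1D5DB", "scarf": "#065F46"},
}

def palette_prompt(palette: dict[str, dict[str, str]]) -> str:
    """Turn a palette into explicit colorization instructions so the model
    cannot improvise character colors."""
    lines = ["Use exactly these colors for each character:"]
    for name, parts in palette.items():
        spec = ", ".join(f"{part} {hexcode}" for part, hexcode in parts.items())
        lines.append(f"- {name}: {spec}")
    return "\n".join(lines)
```

Because the instruction is semantic rather than pixel-based, the same prompt fragment keeps a character on-palette across every batch of a 200-chapter series.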
How the AI Sees Your Manga Pages
The AI model receives images at a maximum resolution of 2048 pixels on the longest side. A typical webtoon page at 1280×4000 gets scaled down to roughly 655×2048 for processing. At that resolution, large text is readable but small text — stat tables, game boards, tiny labels — becomes blurry. The model tries to recreate blurry text and often generates garbled characters.
This is why text preservation exists as an opt-in feature. Before sending to the AI, the system detects small, dense text regions using local contrast analysis, masks them with blurred background, sends the text-free image to the AI, and then pastes the original text back onto the colorized result. The AI never sees the text, so it can’t garble it.
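A simplified version of that detect–mask–restore loop, using per-block standard deviation as a crude proxy for local contrast (the production system’s thresholds and blurred-background fill are not published; a flat mean fill stands in here):

```python
import numpy as np

def text_mask(gray: np.ndarray, block: int = 16, contrast_thresh: float = 40.0) -> np.ndarray:
    """Boolean mask of blocks whose local contrast (std dev) suggests
    small, dense text. A crude stand-in for real local-contrast analysis."""
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = gray[y:y + block, x:x + block]
            if tile.std() > contrast_thresh:
                mask[y:y + block, x:x + block] = True
    return mask

def protect_text(gray: np.ndarray, colorize) -> np.ndarray:
    """Blank out detected text, colorize, then paste the original text back
    so the model never has a chance to garble it."""
    mask = text_mask(gray)
    safe = gray.copy()
    safe[mask] = int(gray.mean())      # crude fill; the article describes a blurred background
    colored = colorize(safe)           # placeholder for the model call
    colored[mask] = gray[mask]         # restore the untouched original text
    return colored
```

The model only ever sees the filled image, so whatever it generates in those regions is overwritten by the pristine source pixels.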
Understanding what the AI sees also explains why virtual image splitting matters for quality. A 1280×8000 pixel webtoon page gets scaled to 328×2048 — barely wider than a smartphone screenshot. Splitting that page into two 1280×4000 bands gives the AI twice the horizontal resolution to work with, producing noticeably better detail in the colorization.
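The scaling arithmetic above can be checked with a small helper; the 2048-pixel longest-side limit is taken from the figures in this section:

```python
AI_MAX_SIDE = 2048  # longest-side limit described in this section

def working_resolution(w: int, h: int) -> tuple[int, int]:
    """Dimensions the model actually sees after longest-side scaling."""
    scale = min(1.0, AI_MAX_SIDE / max(w, h))
    return round(w * scale), round(h * scale)
```

Splitting a 1280×8000 page into two 1280×4000 bands moves the working width from 328 to 655 pixels — roughly double the detail available to the model.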
The Limits of AI and How We Work Around Them
AI colorization has real limits. The model occasionally assigns wrong colors to characters it hasn’t seen before. It can interpret dark scenes as lighter than intended. It sometimes bleeds color from one panel into an adjacent panel’s background. These aren’t bugs we can fix with better code — they’re inherent to how large multimodal models process visual information.
Our approach to these limits is layered. Character palettes handle the color assignment problem by telling the model exactly what to use. Context learning handles the environment consistency problem by remembering scene-specific colors. The edit mode handles everything else by letting the human operator give targeted corrections. The AI does 95% of the work; the human refines the remaining 5%.
This human-in-the-loop approach is key to production quality. The AI is fast and consistent enough to be the primary colorist. The human is precise enough to catch and fix the cases where the AI falls short. Together, they produce chapters that are indistinguishable from manual colorization at a fraction of the time and cost.
For a deeper technical dive into AI colorization technology, read our detailed explainer on watashicolorizer.com.
Read the Full Guide →