FIELD REPORT · VOL 07
A–Z Coding Models · est. read 14 min
Dispatch from the cheap-model frontier

Kimi K2.6: the model nobody saw coming

A complete A–Z guide to the open-source Chinese coding agent that runs 13-hour autonomous sessions for one-seventh the cost.

Not a "Kimi vs Claude" hot take. A field manual — copy-paste prompts, hidden commands, and a troubleshooting guide for when your agent inevitably drifts.

Before we talk code, let's talk money

Price per 1M tokens    Input     Output
Claude Opus 4.7        $5.00     $25.00
Kimi K2.6              $0.80     $3.60

That's roughly 7× cheaper on output and about 6× cheaper on input, for a model that benchmarks on par with Opus 4.7.
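
The multiplier can be checked directly from the listed prices. A minimal sketch, using only the four figures quoted above; the dictionary layout and function name are illustrative, not part of any Kimi or Claude tooling.

```python
# Per-million-token prices quoted in this section (USD).
PRICES = {
    "Claude Opus 4.7": {"input": 5.00, "output": 25.00},
    "Kimi K2.6": {"input": 0.80, "output": 3.60},
}

def price_ratio(direction: str) -> float:
    """How many times more Opus 4.7 costs than K2.6 for one token direction."""
    return PRICES["Claude Opus 4.7"][direction] / PRICES["Kimi K2.6"][direction]

for direction in ("input", "output"):
    print(f"{direction}: {price_ratio(direction):.2f}x")
# input: 6.25x
# output: 6.94x
```

The headline "7×" is the output-token ratio rounded up; a workload dominated by input tokens would see closer to 6×.
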
§ 01

What is Kimi Code?

Kimi Code is Kimi's coding agent — similar to Claude Code, but powered by K2.6 and accessible at kimi.com/code. It runs in your terminal and IDE. It takes tasks, not just questions.

The difference between a coding assistant and a coding agent:

  • Assistant — you ask, it answers, you implement.
  • Agent — you describe the outcome, it executes, iterates, fixes errors, and delivers.

Kimi Code does the second one. The benchmarks back it up: on SWE-Bench and Terminal-Bench it lines up with Opus 4.7, and on long-horizon agentic tasks it reportedly pulls ahead of Opus 4.7 over sustained multi-hour workflows.

§ 02

5 hidden commands that save hours

01

@ Map the Battlefield

Before Kimi writes a single line, make it map the codebase.

@src/auth/middleware.ts @src/utils/token.ts
Explain the token refresh flow and identify
where we might leak memory on rapid retries.

What: Pulls live definitions from your indexed codebase. Kimi reads the actual files, traces imports, and builds context on the fly.

Why: Eliminates copy-paste hell. On a 50-file refactor, this saves 30–40 minutes of manual context assembly and prevents hallucinated imports.

Pro: Chain multiple symbols, e.g. @AuthService.refresh @TokenStore.cleanup @APIClient.interceptors, and Kimi connects the dots automatically.
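
A mistyped path in an @-mention gives the agent nothing to read, so it can be worth checking mentions before sending a prompt. The sketch below is a hypothetical pre-flight helper, not part of Kimi Code. It assumes an @-mention is "@" followed by path or symbol characters, and treats any mention containing "/" as a file path.

```python
import re
from pathlib import Path

# Assumption: an @-mention is "@" followed by word, dot, slash, or hyphen characters.
MENTION = re.compile(r"@([\w./-]+)")

def extract_mentions(prompt: str) -> list[str]:
    """Return every @-mention in the prompt, in order, without the '@'."""
    # Strip a trailing period so "@src/a.ts." at the end of a sentence still resolves.
    return [m.rstrip(".") for m in MENTION.findall(prompt)]

def missing_files(prompt: str, root: Path) -> list[str]:
    """File-style mentions (those containing '/') that do not exist under root."""
    return [
        m for m in extract_mentions(prompt)
        if "/" in m and not (root / m).is_file()
    ]

prompt = "@src/auth/middleware.ts @src/utils/token.ts\nExplain the token refresh flow."
print(extract_mentions(prompt))
# ['src/auth/middleware.ts', 'src/utils/token.ts']
```

Symbol mentions such as @AuthService.refresh contain no slash, so missing_files skips them; only the agent's own index can resolve those.
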

02

/explain Onboard to Legacy

Dropped into a 5-year-old monolith? Don't read — interrogate.

/explain @src/matching-engine/order-book.ts
Focus on: thread-safety model, memory allocation
patterns, and where the hot path starts.

What: Generates an architectural digest covering dependency tracing, complexity hotspots, and data flow.

Why: Senior engineers spend 2–3 days mapping legacy before touching code. /explain collapses that to 10 minutes. You get the tribal knowledge without finding the tribe.

03

.kimi/rules Program the Agent

Tired of saying "use strict mode" every session? Bake it into project DNA.

# .kimi/rules
- Always use TypeScript strict mode; no implicit any
- For HTTP calls, use the retry-wrapper from
  @utils/api-client, never raw fetch
- /legacy/ directory is read-only unless explicitly
  overridden
- Prefer functional React components; class
  components require justification

What: Persistent project-level instructions. Kimi loads these automatically at session start.

Why: Standardizes output across team members. Eliminates the "oops, wrong pattern" rework loop.

Pro: Version-control .kimi/rules alongside your codebase. It becomes living documentation that enforces itself.

04

Checkpoint Prompting

K2.6's killer feature is endurance. Endurance without breadcrumbs is a crash waiting to happen.

After each optimization iteration, output:
- [ITERATION N]   What changed
- [PERFORMANCE]   Current throughput vs baseline
- [BLOCKERS]      What's blocking the next step
- [STATE]         Files modified, tests, risks

Why: If your terminal crashes at hour 5, you lose the mental model, not just the output. Checkpoints let you reconstruct from any point.

When: Any session expected to exceed 30 minutes or involve more than 10 tool calls.

05

/test Generate Coverage

Writing the function is half the battle. Proving it works is the other half.

/test @src/matching-engine/order-matcher.ts
Focus on: race conditions between order
cancellation and matching, overflow on
quantity * price.

What: Analyzes implementation, identifies missed edge cases, mocks dependencies, generates test scaffolding.

Why: Developers spend 30–50% of time writing tests. /test delivers ~80% coverage in 2 minutes, including the nasty edge cases humans forget.

Upgrade: After generation, run /review with "Focus on test gaps" to force a second pass on the suite itself.
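
To make "overflow on quantity * price" concrete, here is a sketch of the kind of boundary test that prompt is asking for. Python stands in for the engine's own language, and notional is a hypothetical helper, not a function from any real matching engine. It assumes the engine stores the product in a signed 64-bit integer.

```python
INT64_MAX = 2**63 - 1

def notional(quantity: int, price: int) -> int:
    """Hypothetical helper: quantity * price, rejecting results that would
    not fit in a signed 64-bit integer."""
    if quantity < 0 or price < 0:
        raise ValueError("quantity and price must be non-negative")
    value = quantity * price  # Python ints are unbounded, so check explicitly
    if value > INT64_MAX:
        raise OverflowError(f"{quantity} * {price} exceeds int64")
    return value

def test_notional_at_int64_boundary() -> None:
    # Largest product that still fits: INT64_MAX * 1.
    assert notional(INT64_MAX, 1) == INT64_MAX
    # One step past the boundary must be rejected, not silently wrapped.
    try:
        notional(INT64_MAX, 2)
    except OverflowError:
        pass
    else:
        raise AssertionError("expected OverflowError")

test_notional_at_int64_boundary()
```

The point of the test is the pair of cases on either side of the limit; a generated suite that only multiplies small numbers would pass without ever exercising the overflow path.
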

There is no /godmode. The hidden power isn't secret commands — it's composability.

§ 03

What sets it apart: two case studies

Case 01 · Inference

Zig Inference Optimization on Mac

Download and deploy Qwen3.5-0.8B locally on a Mac. Implement inference in Zig — a highly niche systems language. Optimize for throughput.

Tool calls:            4,000+
Continuous run:        12 hrs
Optimization passes:   14
Human interventions:   0
Throughput:            15 tok/s → 193 tok/s (20% faster than LM Studio)

Case 02 · Engine

Financial Matching Engine Overhaul

Take exchange-core — an 8-year-old open-source matching engine — and optimize it to its theoretical limit.

Continuous run:   13 hrs
Strategies:       12
Tool calls:       1,000+
Lines modified:   4,000+
Median:           0.43 → 1.24 MT/s (+185%)
Peak:             1.23 → 2.86 MT/s (+133%)

The engine was already operating near its performance limits. K2.6 found headroom that human maintainers missed for years.
— This is not autocomplete. This is engineering.

§ 04

Why K2.6 beats Claude on coding, in practice

Fewer steps to the same outcome.

K2.6 reaches better results with ~35% fewer steps than K2.5. Fewer steps means fewer tokens. Fewer tokens means lower cost. And faster execution.

Better instruction following.

Most coding agents fail because they drift — they start solving one problem and gradually solve a different one. K2.6 stays within constraints, preserves project structure, and recovers without losing intent.

"Surgical precision in large codebases."
— Augment Code's CTO

Better with real-world APIs.

K2.6 has improved understanding of third-party frameworks, real APIs, and tool interactions. In production, this is the difference between an agent that works and one that requires constant correction.

§ 05

How to set up Kimi Code

Requirements: a computer (Mac, Windows, or Linux), terminal access, and a Kimi account.

Install

# macOS / Linux
curl -LsSf https://code.kimi.com/install.sh | bash

# Windows (PowerShell)
Invoke-RestMethod https://code.kimi.com/install.ps1 | Invoke-Expression

# Verify the install
kimi --version

Due to macOS Gatekeeper, the first run may take longer. Add your terminal in System Settings → Privacy & Security → Developer Tools to speed up subsequent launches. If you have uv, you can also run uv tool install --python 3.13 kimi-cli. Python 3.12–3.14 supported; 3.13 recommended.

Authenticate

kimi login

Opens a browser. Log in with your Kimi account.

Navigate to your project

cd your-project
kimi

On first launch, enter /login to configure the API source.

Give it a task — not a question

Don't say: "How do I optimize this function?"

Say: "Analyze the performance bottleneck in the payment processing module and refactor it to reduce average response time by at least 30%. Run the existing test suite after each change."

§ 06

3 battle-tested copy-paste prompts

PROMPT 01 · Refactor with Constraints
Best for: legacy code, API-preserving refactors

Analyze [module name] for performance bottlenecks.
Refactor to reduce response time by 30%.
Do NOT change the public API or function signatures.
Run the full test suite after each change.
Report: before metrics, after metrics, and what was changed.
If you hit an error, stop and ask before proceeding.

PROMPT 02 · Multi-File Architecture Change
Best for: feature additions across layers

Implement [feature description] across [file A], [file B], [file C].
Maintain backward compatibility with existing callers.
Add unit tests for all new code paths.
Update README.md with the new capability.
If you discover the current architecture can't support
this cleanly, propose 2 alternatives before choosing one.

PROMPT 03 · Deep Debug Session
Best for: race conditions, memory issues

[Paste full error trace here]

This error occurs when [describe context].
Find the root cause — not the symptom.
Fix it at the source.
Verify with tests.
Do not apply band-aid fixes or suppress the error.
Explain the root cause in 2 sentences after fixing.
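
If you keep these prompts in files, a small helper can fill the bracketed placeholders and refuse to produce a prompt with one left blank. This is a sketch under one assumption: every placeholder is a square-bracketed span such as [module name]. The function name and the example value src/payments are hypothetical.

```python
import re

# Assumption: placeholders are square-bracketed spans such as [module name].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")

def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each [placeholder] with its value; fail on any left unfilled."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in values]
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)

template = (
    "Analyze [module name] for performance bottlenecks.\n"
    "Do NOT change the public API or function signatures."
)
print(fill_prompt(template, {"module name": "src/payments"}))
# Analyze src/payments for performance bottlenecks.
# Do NOT change the public API or function signatures.
```

Failing loudly matters here: a prompt that still says "[module name]" invites the agent to guess the scope.
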
§ 07

The iteration loop: don't accept the first output

The best engineers don't ship v1. Neither should your agent.

Step 1 — Generate    Kimi writes the first version
Step 2 — Evaluate    You run tests / check metrics
Step 3 — Diagnose    Feed results back: "Test X failed because Y"
Step 4 — Improve     Kimi fixes
Step 5 — Repeat      Until all thresholds pass

Threshold rule. Never say "make it better." Say "tests must pass, coverage must not drop, response time must be under 200ms."
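
The threshold rule can be written down as a check you run at Step 2. A minimal sketch, assuming you already collect three numbers per run; the Metrics fields and function name are illustrative, and the 200 ms default comes from the example sentence above.

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    tests_passed: bool
    coverage_pct: float
    response_ms: float

def failed_thresholds(current: Metrics, baseline: Metrics,
                      max_response_ms: float = 200.0) -> list[str]:
    """Return a readable list of the thresholds the current run misses."""
    failures = []
    if not current.tests_passed:
        failures.append("tests must pass")
    if current.coverage_pct < baseline.coverage_pct:
        failures.append(
            f"coverage dropped from {baseline.coverage_pct}% to {current.coverage_pct}%"
        )
    if current.response_ms >= max_response_ms:
        failures.append(
            f"response time {current.response_ms}ms is not under {max_response_ms}ms"
        )
    return failures
```

An empty list means the loop can stop. Anything else is the text for Step 3: paste the failures back to the agent verbatim instead of paraphrasing them.
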

Adversarial pressure. After passing, add one more round:

Now critique your own solution. Find 3 weaknesses
a senior engineer would flag. Fix them.

This is how 15 tok/s becomes 193 tok/s. Not in one shot. In 14 loops.

§ 08

When it goes wrong: troubleshooting

F-01 · The Drift
Symptom: Kimi starts solving a different problem than the one you gave it.
Fix: Start every prompt with a scope lock: "Scope: [specific module/file/behavior]. Do not change anything outside this scope." If it still drifts, use /compact and restate the task.

F-02 · Context Collapse
Symptom: After 2+ hours, Kimi forgets the original architecture constraints.
Fix: Create a CONSTRAINTS.md in your project root; Kimi reads it automatically. Use "/compact Focus on [original goal]" mid-session. For 6+ hour tasks, break the work into sub-sessions with --resume.

F-03 · Silent Regression
Symptom: Tests pass, but something else broke.
Fix: Add to your prompt: "Run the full test suite, not just affected tests. Verify no unrelated tests failed."

F-04 · Over-Engineering
Symptom: Kimi rewrites the entire module when you asked for a 3-line fix.
Fix: Be explicit: "Make the minimal change necessary. Do not refactor unrelated code."

F-05 · Tool Call Failure
Symptom: Kimi tries to run a command, fails silently, and moves on.
Fix: Add: "After every shell command, verify the output. If a command fails, stop and report the error."

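
The F-05 fix asks the agent for a "check, then stop on failure" discipline. The same discipline is easy to apply in your own scripts around the agent. The wrapper below is an illustrative sketch built on Python's standard subprocess module; run_checked is a hypothetical name, not a Kimi Code feature.

```python
import subprocess
import sys

def run_checked(command: list[str]) -> str:
    """Run a command; return stdout, or raise with the error output on failure."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Stop here and surface the error instead of continuing silently.
        raise RuntimeError(
            f"command {command!r} exited with {result.returncode}:\n"
            f"{result.stderr.strip()}"
        )
    return result.stdout

# A passing command returns its output.
print(run_checked([sys.executable, "-c", "print('ok')"]).strip())
# ok
```

A non-zero exit code raises immediately, which is the scripted equivalent of "stop and report the error".
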
§ 09

What Kimi Code is best at

  • Long-horizon refactoring — multi-file, multi-hour tasks where the model needs to maintain architectural consistency across thousands of lines.
  • Performance optimization — profiling, bottleneck identification, iterative improvement. The exchange-core and Zig cases above are real examples.
  • Multi-language projects — strong across Python, Rust, Go, TypeScript, and less common languages (Zig, Lua).
  • API integration tasks — connecting your codebase to external services, handling edge cases, debugging API behaviors.
  • DevOps & infrastructure — Vercel saw 50%+ improvement on their Next.js benchmark. Fireworks AI noted stable, autonomous agent pipelines.
§ 10

Vibe coding with K2.6

You don't need to be a developer to use it effectively. You need to know what you want to build. K2.6 can turn a description into a working full-stack application — frontend, database, authentication — in a single session.

Beyond web apps, the coding agent handles real engineering work that normally takes senior developers days. A single founder can run an entire engineering workflow using Kimi Code + Kimi Claw's group chat feature — routing tasks to specialized agents, each loaded with its own skill set.

A one-person company with the output of a team.

Full-stack app, one session

VIBE-CODE PROMPT · Task management app · 20–45 min · Working app, end-to-end

Build a task management app with:

FRONTEND:
  - Next.js 14 with App Router
  - Tailwind CSS + shadcn/ui components
  - Dark mode support
  - Responsive layout (mobile + desktop)

BACKEND:
  - SQLite database via Drizzle ORM
  - tRPC for type-safe API routes
  - Zod validation on all inputs

AUTH:
  - OAuth 2.0 with GitHub login
  - Protected routes middleware

FEATURES:
  - Create / edit / delete tasks
  - Task priority (low/medium/high)
  - Due dates with calendar picker
  - Filter by status and priority
  - Search by title

DEPLOY:
  - Configure for Vercel deployment
  - Include vercel.json and env example

PROCESS:
  1. Initialize the project (Next.js + all deps)
  2. Set up database schema and migrations
  3. Implement auth flow
  4. Build all CRUD operations
  5. Build the UI with loading states
  6. Write and run tests for critical paths
  7. If any step fails, debug and retry

Do not ask me questions. Make reasonable decisions.
Report the local dev URL when ready.
§ 11

The cost argument: it matters more than benchmarks

If you're running an AI coding agent at scale — across a team, multiple projects, thousands of API calls per day — the cost difference is not marginal. At 1 million output tokens per day:

At 1M output tokens/day    Per day    Per month (30 days)
Claude Opus 4.7            $25.00     $750
Kimi K2.6                  $3.60      $108

Same task. Same output quality tier. 7× difference in monthly cost. For a team running multiple agents simultaneously, this compounds fast.
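
To see how the gap compounds across a team, the monthly figures can be reproduced and scaled. A rough sketch: it counts output tokens only, as the figures above do, uses a 30-day month, and the ten-agent team is an assumed example, not a number from any benchmark.

```python
def monthly_output_cost(price_per_mtok: float, mtok_per_day: float,
                        agents: int = 1, days: int = 30) -> float:
    """Output-token spend in USD over a month of `days` days."""
    return price_per_mtok * mtok_per_day * agents * days

for name, price in (("Claude Opus 4.7", 25.00), ("Kimi K2.6", 3.60)):
    single = monthly_output_cost(price, mtok_per_day=1)
    team = monthly_output_cost(price, mtok_per_day=1, agents=10)  # assumed team size
    print(f"{name}: ${single:,.2f}/month for one agent, ${team:,.2f} for ten")
# Claude Opus 4.7: $750.00/month for one agent, $7,500.00 for ten
# Kimi K2.6: $108.00/month for one agent, $1,080.00 for ten
```

The ratio stays at about 7× at any scale. What grows is the absolute difference: $642 a month for one agent becomes $6,420 for ten.
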

The Open Source Advantage

K2.6 is fully open source.

You can self-host. Run it on your own infrastructure. No API dependency. No usage caps. Full control over your data.

You can fine-tune. The base model is available for customization on domain-specific tasks — legal, medical, proprietary codebases.

Community velocity. Open source models improve faster because the entire developer ecosystem contributes.

Ollama · OpenCode · OpenClaw · vLLM · llama.cpp

Final verdict

The question isn't whether K2.6 is good enough.

The narrative around AI coding has been simple: Claude is the best, pay whatever it costs.

K2.6 breaks that narrative. Open source. 7× cheaper. Benchmarks on par with Opus 4.7. Proven in production by Vercel, Fireworks, Augment Code, and a dozen others.

The question is why you're still paying 7× more.