Case File

About WinQA

A place to poke, prod, and stress-test AI models until they crack.

1. What is WinQA?

You ask two AI models the same question, and one of them confidently gives you the wrong answer. Now what? WinQA lets you run that kind of experiment on purpose — compare models head to head, battle them against each other, execute their code live, and log every failure you find. It's a QA lab for AI.

2. What You Can Do

Chat Lab

Ask the same question to multiple models and see the answers next to each other. You can swap providers mid-conversation to see how a different model picks up the thread.

AI Battle Arena

Nine challenges, split between Mind Games and Spectacular. Escalation, Interrogation, Code Duel, Blindfold, Battle Royale, and more, each one designed to expose a different weakness.

Code Testing Lab

Paste AI-generated code and run it right in the browser. JavaScript, Python, TypeScript. See the output, see the errors, get AI help debugging.

Bug Log

When an AI hallucinates or gives you broken logic, log it here. Tag the type, note the severity, link it back to the prompt that triggered it.
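For a feel of what one log entry might hold, here is a minimal sketch. The field names, the `Severity` levels, and the `at_least` helper are all assumptions made for illustration, not WinQA's actual schema.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    # Hypothetical severity scale; higher means worse.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class BugEntry:
    # Hypothetical record: one logged AI failure.
    model: str          # which model produced the bad answer
    bug_type: str       # e.g. "hallucination" or "broken logic"
    severity: Severity  # how bad it was
    prompt: str         # the prompt that triggered it
    response: str       # what the model actually said


def at_least(entries: List[BugEntry], minimum: Severity) -> List[BugEntry]:
    """Return the entries whose severity is at or above `minimum`."""
    return [entry for entry in entries if entry.severity >= minimum]


log = [
    BugEntry("model-x", "hallucination", Severity.HIGH,
             "Who wrote this library?", "A person who does not exist."),
    BugEntry("model-y", "broken logic", Severity.LOW,
             "Is 7 even?", "Yes, because it ends in 7."),
]
serious = at_least(log, Severity.MEDIUM)
```

Keeping the prompt on the entry itself is what makes a failure reproducible later: you can replay the exact input against the same model or a different one.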

Prompt Library

Keep your best prompts in one place. Chain of Thought, Few-Shot, whatever works for you. Everything is ready to copy and reuse.

Test Cases

Write a test once, then run every model through it. Same input, different models, compare the output.
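The idea can be sketched in a few lines. Everything below is an assumption for illustration: `ModelTestCase`, `run_test_case`, and the two stand-in "models" are hypothetical, and a real setup would call provider APIs instead of local functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# For this sketch, a "model" is any function that maps a prompt to an answer.
ModelFn = Callable[[str], str]


@dataclass
class ModelTestCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the answer is acceptable


def run_test_case(case: ModelTestCase, models: Dict[str, ModelFn]) -> Dict[str, dict]:
    """Send the same prompt to every model and record answer and pass/fail."""
    results = {}
    for model_name, ask in models.items():
        try:
            answer = ask(case.prompt)
            results[model_name] = {"answer": answer, "passed": case.check(answer)}
        except Exception as exc:
            # A crashing model counts as a failure, with the error kept for review.
            results[model_name] = {"answer": None, "passed": False, "error": str(exc)}
    return results


# Stand-in models so the sketch runs without any network access.
def careful_model(prompt: str) -> str:
    return "4"


def sloppy_model(prompt: str) -> str:
    return "5"


case = ModelTestCase(
    name="basic arithmetic",
    prompt="What is 2 + 2? Reply with the number only.",
    check=lambda answer: answer.strip() == "4",
)
results = run_test_case(case, {"careful": careful_model, "sloppy": sloppy_model})
```

Because the check lives on the test case rather than on the model, adding a new model to the comparison takes one more entry in the dictionary.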

Insights

Gemini nails code but fumbles jokes? Write it down. Over time you build a map of what each model is actually good at.

3. Under the Hood

4 LLM Providers Connected

Cohere: Command R and Command R+ models

Google Gemini: Gemini Pro and Gemini Flash

Groq: Fast inference with Llama and Mixtral

OpenRouter: 100+ models, one API

• Real-time code execution in the browser for JavaScript, Python, and TypeScript
• A/B voting system inspired by LM Arena (formerly Chatbot Arena) for blind model comparison
• API keys encrypted with AES-256-GCM and never stored in plaintext
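Blind A/B voting is simple to sketch: shuffle which model sits behind label A and which behind B, show only the labels, and reveal the mapping after the vote. The function names and the tally structure below are assumptions for illustration, not WinQA's implementation.

```python
import random
from collections import Counter
from typing import Dict, Tuple


def make_blind_pair(
    answers: Dict[str, str], rng: random.Random
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hide two model names behind the labels 'A' and 'B'.

    Returns (shown, key): `shown` maps label to answer text for the voter,
    `key` maps label to model name and stays hidden until the vote is cast.
    """
    if len(answers) != 2:
        raise ValueError("a blind pair needs exactly two answers")
    names = list(answers)
    rng.shuffle(names)  # random order, so position gives nothing away
    key = dict(zip(("A", "B"), names))
    shown = {label: answers[name] for label, name in key.items()}
    return shown, key


def record_vote(tally: Counter, key: Dict[str, str], choice: str) -> None:
    """Credit the model behind the chosen label."""
    if choice not in key:
        raise ValueError("choice must be 'A' or 'B'")
    tally[key[choice]] += 1


tally = Counter()
answers = {"model-x": "Paris", "model-y": "Lyon"}
shown, key = make_blind_pair(answers, random.Random())

# Simulate a voter who picks whichever label shows the correct answer.
choice = next(label for label, text in shown.items() if text == "Paris")
record_vote(tally, key, choice)
```

The voter only ever sees `shown`, so the vote reflects the answer itself and not a preference for a particular provider.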

4. The Story

WinQA was built by Ran. He started in QA, moved into development, and kept the QA habit — that itch to poke at things until they break. When LLMs showed up, he pointed that itch at AI.

The name comes from his dog, Win. Every investigation needs a sidekick.

5. The Mission

Most AI testing happens in private Slack threads and scattered notebooks. WinQA puts it all in one place — the tests, the results, the failures, the stuff you figured out along the way. It's free. No paywall, no credit card, no catch.

6. Open Source

WinQA is open source. Browse the code, report issues, or contribute on GitHub.

7. Contact & Feedback

Something broken? Got an idea? Want to complain? Open an issue on the GitHub issue tracker.

winqa.ai
