A place to poke, prod, and stress-test AI models until they crack.
You ask two AI models the same question, and one of them confidently gives you the wrong answer. Now what? WinQA lets you run that kind of experiment on purpose: compare models head to head, pit them against each other, execute their code live, and log every failure you find. It's a QA lab for AI.
Ask multiple models the same question and see their answers side by side. You can swap providers mid-conversation to see how a different model picks up the thread.
9 challenges split between Mind Games and Spectacular, including Escalation, Interrogation, Code Duel, Blindfold, and Battle Royale. Each one is designed to expose a different weakness.
Paste AI-generated code and run it right in the browser: JavaScript, Python, or TypeScript. See the output, see the errors, and get AI help with debugging.
When an AI hallucinates or gives you broken logic, log it here. Tag the type, note the severity, link it back to the prompt that triggered it.
Keep your best prompts in one place. Chain of Thought, Few-Shot, whatever works for you. Everything is ready to copy and reuse.
Write a test once, then run every model through it. Same input, different models, compare the output.
Gemini nails code but fumbles jokes? Write it down. Over time you build a map of what each model is actually good at.
4 LLM Providers Connected
WinQA was built by Ran. He started in QA, moved into development, and kept the QA habit — that itch to poke at things until they break. When LLMs showed up, he pointed that itch at AI.
The name comes from his dog, Win. Every investigation needs a sidekick.
Most AI testing happens in private Slack threads and scattered notebooks. WinQA puts it all in one place — the tests, the results, the failures, the stuff you figured out along the way. It's free. No paywall, no credit card, no catch.
WinQA is open source. Browse the code, report issues, or contribute on GitHub.
Something broken? Got an idea? Want to complain? Open an issue on the GitHub issue tracker.