AI QA Trainer – LLM Evaluation at Invisible Expert Marketplace
Role Overview
Are you an AI QA expert eager to shape the future of AI?
Large-scale language models are evolving from clever chatbots into enterprise-grade platforms.
With rigorous evaluation data, tomorrow’s AI can democratize world-class education, keep pace with cutting-edge research, and streamline workflows for teams everywhere.
That quality begins with you—we need your expertise to harden model reasoning and reliability.
Responsibilities
- Converse with models on real-world scenarios and evaluation prompts
- Verify factual accuracy and logical soundness of responses
- Design and run test plans and regression suites to identify failure modes
- Build clear rubrics and pass/fail criteria for evaluation tasks
- Capture reproducible error traces with root‑cause hypotheses
- Suggest improvements to prompt engineering, guardrails, and evaluation metrics (precision/recall, faithfulness, toxicity, latency SLOs)
- Partner on adversarial red‑teaming, automation (Python/SQL), and dashboarding to track quality deltas over time
- Document every failure mode to raise the bar for model performance
- Challenge advanced language models on tasks such as hallucination detection, factual consistency, prompt‑injection and jailbreak resistance, bias/fairness audits, chain‑of‑reasoning reliability, tool‑use correctness, retrieval‑augmentation fidelity, and end‑to‑end workflow validation
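To make the rubric-and-regression duties above concrete, here is a minimal sketch of a pass/fail rubric applied to a model response as a regression-style check. All names, fields, and thresholds (`grade_response`, `max_chars`, `required_citation`) are hypothetical illustrations, not part of the role description.

```python
# Hypothetical sketch: a rubric with explicit pass/fail criteria,
# applied to a model response as a regression-style check.
# All names and thresholds here are illustrative.

def grade_response(response: str, rubric: dict) -> dict:
    """Apply simple pass/fail criteria from a rubric to a response."""
    results = {}
    results["non_empty"] = len(response.strip()) > 0
    results["within_length"] = len(response) <= rubric["max_chars"]
    results["cites_source"] = (
        rubric["required_citation"] in response
        if rubric.get("required_citation")
        else True
    )
    # Overall verdict: every criterion must pass.
    results["passed"] = all(results.values())
    return results

rubric = {"max_chars": 500, "required_citation": "[doc-1]"}
report = grade_response("Paris is the capital of France. [doc-1]", rubric)
print(report["passed"])  # → True
```

In practice each failed criterion becomes a reproducible error trace: the rubric entry, the prompt, and the offending response together document the failure mode.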
Qualifications
- Bachelor’s, Master’s, or PhD in computer science, data science, computational linguistics, statistics, or a related field
- QA experience with ML/AI systems, including safety/red‑team work
- Experience with test automation frameworks (e.g., PyTest)
- Hands‑on work with LLM evaluation tooling such as OpenAI Evals, RAG evaluators, W&B
- Strong skills in evaluation rubric design, adversarial testing/red‑teaming, regression testing at scale, bias/fairness auditing, grounding verification, prompt and system‑prompt engineering
- Scripting skills in Python and SQL for automation, plus high‑signal bug reporting
- Clear, metacognitive communication that shows your work
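The evaluation metrics named in this posting (precision/recall) reduce to simple counts over labeled judgments. A minimal sketch, using invented binary hallucination labels (`gold` for human judgments, `pred` for an automated detector's flags):

```python
# Minimal precision/recall sketch over binary hallucination labels.
# gold: human judgments (True = hallucination present);
# pred: an automated detector's flags. Data is invented for illustration.

def precision_recall(gold: list[bool], pred: list[bool]) -> tuple[float, float]:
    tp = sum(g and p for g, p in zip(gold, pred))        # true positives
    fp = sum((not g) and p for g, p in zip(gold, pred))  # false positives
    fn = sum(g and (not p) for g, p in zip(gold, pred))  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

gold = [True, True, False, False, True]
pred = [True, False, False, True, True]
p, r = precision_recall(gold, pred)  # tp=2, fp=1, fn=1 → both 2/3
```

Tracking these counts per regression run is one straightforward way to surface the quality deltas the role asks contractors to dashboard.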
Compensation & Logistics
We offer a pay range of $6 to $65 per hour, with the exact rate determined after evaluating your experience, expertise, and geographic location.
Final offer amounts may vary from the pay range listed above.
As a contractor you’ll supply a secure computer and high‑speed internet; company‑sponsored benefits such as health insurance and PTO do not apply.
Employment type: Contract
Workplace type: Remote
Seniority level: Mid‑Senior Level
Call to Action
Ready to turn your QA expertise into the quality backbone for tomorrow’s AI?
Apply today and start teaching the model that will teach the world.