
Evaluating different AIs on African livestock knowledge

LessWrong · May 2, 2026, 9:04 PM

I have been running evaluations in a niche that gets almost zero attention in the AI safety world. Meta's open-source model Llama 3.1 8B scored 43% accuracy on a 420-question benchmark I built. The benchmark covers ethnoveterinary practices, indigenous breed characteristics, disease recognition, and production systems specific to Nigeria.

This evaluation matters because most other evals are run on well-documented, Western-specific data. This project tests a domain where almost none of the relevant knowledge exists in that documented form.

METHODOLOGY

420 questions, 6 categories, 0/1/2 scoring rubric. Questions are drawn from the Nigerian veterinary curriculum, published ethnoveterinary literature, and field practice knowledge.

Baseline model: Meta Llama 3.1 8B via Groq.

Next phase: Claude Sonnet, GPT-4o, and Gemini 1.5 Pro for a comparative study. Paper to follow.

CONCLUSION

If you accept that AI advisory tools will be deployed at scale in African agricultural contexts (and they already are), then the absence of eval benchmarks for this domain is a real safety gap. Models can pass standard tests and still fail on knowledge domains that matter to specific populations. This is a serious problem, especially where these models are actively used in low-resource regions and communities.

This is one data point. The paper will have four.

Manifund project if interested: [Manifund]
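The post does not say how the 0/1/2 rubric is turned into the 43% figure. One plausible reading, assumed here and not confirmed by the author, is points earned divided by the maximum possible points (2 per question). Below is a minimal Python sketch of that aggregation with a per-category breakdown. The function name `rubric_accuracy`, the meaning assigned to each score, and the toy records are all hypothetical, not the author's code or data.

```python
from collections import defaultdict


def rubric_accuracy(results):
    """Aggregate 0/1/2 rubric scores into overall and per-category accuracy.

    `results` is an iterable of (category, score) pairs.
    ASSUMPTION (not from the post): 0 = wrong, 1 = partially correct,
    2 = fully correct, and accuracy = points earned / (2 * questions).
    """
    earned = defaultdict(int)  # points earned per category
    counts = defaultdict(int)  # questions answered per category
    for category, score in results:
        if score not in (0, 1, 2):
            raise ValueError(f"score must be 0, 1, or 2, got {score!r}")
        earned[category] += score
        counts[category] += 1
    if not counts:
        raise ValueError("no results to score")
    # Each question is worth at most 2 points under the assumed rubric.
    per_category = {c: earned[c] / (2 * counts[c]) for c in counts}
    overall = sum(earned.values()) / (2 * sum(counts.values()))
    return overall, per_category


if __name__ == "__main__":
    # Toy, made-up scores for illustration only; not benchmark results.
    toy = [
        ("ethnoveterinary", 2),
        ("ethnoveterinary", 0),
        ("disease_recognition", 1),
        ("disease_recognition", 2),
    ]
    overall, per_category = rubric_accuracy(toy)
    print(f"overall: {overall:.1%}")  # overall: 62.5%
    print(per_category)
```

Another common convention counts only fully correct (score 2) answers as hits. The two readings can give different percentages for the same responses, so the exact definition used in the paper is worth checking when comparing models.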

Article preview — originally published by LessWrong. Full story at the source.


Aggregated and edited by the Scoop newsroom. We surface news from LessWrong alongside other reporting so you can compare coverage in one place.

