Label Results Analysis Dashboard

Generated on: 2025-08-07 19:27:47

gemma-3-4b-it Analysis

Overall Metrics

Precision: 0.419
Recall: 0.654
F1 Score: 0.511
Accuracy: 0.032
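The reported F1 follows directly from the precision/recall pair; a minimal check, assuming the standard harmonic-mean definition of F1:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Check against the reported gemma-3-4b-it metrics
print(round(f1_score(0.419, 0.654), 3))  # 0.511
```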

Dataset Statistics

Total Samples: 5000
Total Categories: 80
Avg Predictions/Image: 4.7
Avg Ground Truth/Image: 2.9
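The gap between average predictions and average ground-truth labels per image quantifies how much the model over-predicts, which is consistent with its high recall and low precision. A quick calculation from the figures above:

```python
avg_predictions = 4.7    # predicted labels per image
avg_ground_truth = 2.9   # ground-truth labels per image

# The model emits roughly 1.62x as many labels per image as actually exist
ratio = avg_predictions / avg_ground_truth
print(f"{ratio:.2f}")
```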

Hallucination Statistics

Hallucinated Categories: 1210
Samples with Hallucinations: 3826 (76.5%)
Avg Hallucinations/Sample: 1.68
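The hallucination rate shown above is simply the share of samples containing at least one hallucinated label; a minimal sketch of the calculation:

```python
total_samples = 5000
samples_with_hallucinations = 3826

# Fraction of samples containing at least one hallucinated label
rate = samples_with_hallucinations / total_samples
print(f"{rate:.1%}")  # 76.5%
```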

Visualizations

(Charts: Category Performance, Metric Distributions, Error Analysis, Hallucination Analysis)

Top Performing Categories

Category         F1 Score
giraffe          0.971
tennis racket    0.960
surfboard        0.942
elephant         0.940
train            0.928
horse            0.924
cat              0.915
kite             0.909
skateboard       0.908
person           0.906

Bottom Performing Categories

Category         F1 Score
backpack         0.373
cup              0.356
sports ball      0.352
book             0.345
bear             0.314
hair drier       0.308
bench            0.271
parking meter    0.207
handbag          0.162
dining table     0.042

Hallucinated Categories (Not in Ground Truth)

Category    False Positive Count
grass       363
table       250
tree        218
plate       191
water       162
building    161
road        120
shoes       119
street      115
beach       107
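These hallucinated labels fall outside the dataset's 80-category vocabulary, so they can be flagged by checking predictions against the known category set. A minimal sketch (the vocabulary below is an illustrative subset, not the full 80-category list):

```python
# Illustrative subset of the dataset's category vocabulary
VALID_CATEGORIES = {"giraffe", "tennis racket", "surfboard", "person", "dining table"}

def split_predictions(predictions):
    """Separate predictions into in-vocabulary labels and
    hallucinated (out-of-vocabulary) labels."""
    valid = [p for p in predictions if p in VALID_CATEGORIES]
    hallucinated = [p for p in predictions if p not in VALID_CATEGORIES]
    return valid, hallucinated

valid, hallucinated = split_predictions(["person", "grass", "surfboard", "tree"])
print(valid)         # ['person', 'surfboard']
print(hallucinated)  # ['grass', 'tree']
```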

Model Comparison

Overall Performance
(Chart: Overall Performance Comparison)

Processing Speed
(Chart: Processing Speed Comparison)

Category Performance Differences
(Chart: Category Differences)

Hallucination Comparison
(Chart: Hallucination Comparison)

Detailed Metrics Comparison

Model            Precision  Recall  F1 Score  Accuracy
gemma-3-4b-it    0.419      0.654   0.511     0.032
gemma-3-12b-it   0.797      0.642   0.711     0.365