The many issues with LLM benchmarks, the difficulty of documenting the complexities of accurately testing LLMs, and the use of evals as an alternative
The discrepancy between the potential impact of AI and its currently limited use in education, and the reasons behind it
The problems with AI alignment, and how instrumental misalignment threatens to worsen them