Emergent Spatio-Semantic Structure in Large Language Model Embedding Spaces



Shingleton J., Bıçakçı Y. S., Wang Y., Basiri A.

EarthArXiv, pp. 1-5, 2026 (Non-Refereed Journal)

Abstract

Large Language Models (LLMs) are increasingly used in geospatial applications, typically as generators of geographic text or as natural language interfaces to spatial data. Here, we explore whether LLM embedding spaces can instead function as geospatial representations that can be exploited directly. Using embeddings extracted from Airbnb property descriptions in London, we show that off-the-shelf LLM embeddings exhibit emergent spatial structure. We further demonstrate that a lightweight residual geo-adapter substantially sharpens this spatial signal, enabling approximate localisation even when explicit geographic references are removed, while preserving semantic relationships learned during LLM pre-training. These results suggest a path toward spatially explicit foundation models which operate over the spatio-semantic embedding space, rather than generated text.
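The abstract does not specify the adapter's architecture, so the following is only a minimal sketch of one plausible form: a residual bottleneck MLP applied on top of frozen LLM embeddings, where the skip connection keeps the adapted vector close to the original embedding (and hence tends to preserve pre-trained semantics). All dimensions, weights, and function names here are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_geo_adapter(z, W1, b1, W2, b2):
    """Residual bottleneck adapter: output = z + MLP(z).

    The skip connection means the adapter only *perturbs* the frozen
    LLM embedding z, rather than replacing it, which is one way a
    lightweight module can add spatial signal while preserving the
    semantic structure learned during pre-training.
    """
    h = np.maximum(W1 @ z + b1, 0.0)   # down-project + ReLU (bottleneck)
    return z + (W2 @ h + b2)           # up-project + residual connection

# Hypothetical sizes: embedding dim 8, bottleneck dim 4.
d, r = 8, 4
W1 = 0.01 * rng.standard_normal((r, d)); b1 = np.zeros(r)
W2 = 0.01 * rng.standard_normal((d, r)); b2 = np.zeros(d)

z = rng.standard_normal(d)             # stands in for a frozen LLM embedding
z_adapted = residual_geo_adapter(z, W1, b1, W2, b2)

# With small initial weights the adapter starts near the identity map,
# so the adapted embedding stays close to the original.
print(z_adapted.shape == z.shape)
print(np.linalg.norm(z_adapted - z) < 0.1 * np.linalg.norm(z))
```

In practice such an adapter would be trained with a location-aware objective (e.g. predicting coordinates or contrasting nearby listings) while the base embedding model stays frozen; near-identity initialisation is a common design choice for residual adapters because training then starts from the unmodified embedding space.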