Triangulating Truth

Large Language Models (LLMs) differ in their parameters, training data, architecture, and fine-tuning, giving each LLM a unique semantic space. Each space is optimized for specific downstream tasks that resemble facets of general intelligence – e.g., logical reasoning, summarization, and question answering.

While the similar outputs of these models may suggest a unified conceptualization of the world, their semantic spaces may nevertheless reflect different salience landscapes. That is, these models may possess concepts that resemble one another in application yet are fundamentally tilted towards different ends. After all, such differences exist among people: some groups view justice in terms of punishment, while others foreground fairness.

To identify points of conceptual agreement and divergence between LLMs, we can use Semantic Triangulation – i.e., we can compare the overlap of the semantic spaces of several LLMs.

Approach

Within the semantic space of an LLM, concepts can be disambiguated by considering which terms, when combined, approximate the position of a target concept. For example, the combination of normativity, volition, and abundance approximates the position of God within semantic space. That is, if you replaced the embedding for God with that combination of vectors in an LLM, the model would reason over text inputs just as it would with the original God vector. See Semantic Core Clarification (SCC) for more details.
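A minimal sketch of this idea, assuming the model's token embeddings are available as a Python dictionary (the name emb and the simple vector-sum combination rule are illustrative assumptions, not SCC's exact procedure):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def combination_vector(terms: list[str], emb: dict[str, np.ndarray]) -> np.ndarray:
    """Approximate a target concept as the sum of its component embeddings."""
    return np.sum([emb[t] for t in terms], axis=0)

# `emb` stands in for a lookup into the model's input embedding matrix.
# If the combination is a good semantic core, this score approaches 1:
# cosine_similarity(
#     combination_vector(["normativity", "volition", "abundance"], emb),
#     emb["God"],
# )
```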

Applying Semantic Core Clarification to the same concept in three (or more) LLMs enables us to determine the conceptual overlap of a term across LLMs. If a term is found to be a core element of a concept in all three semantic spaces, this would suggest that universal features underlie that concept. Conversely, if no core features are shared, this would suggest that each model uses the target concept differently.

Application

Semantic Core Clarification was used to elucidate the concept of truth in three top-performing LLMs: DeciLM-7B, Llama-3-8B, and InternLM-2.5-7B.

Beam search was used to find the 5-word combination that most closely approximated the position of the "truth" vector in each semantic space (using cosine similarity, where a similarity of 1 represents perfect concordance). The search space was composed of the 2,610 most common nouns.
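A hedged sketch of such a search, assuming the same embedding lookup as above; the beam width and scoring details are illustrative choices, not the exact settings used here:

```python
import numpy as np

def beam_search_core(target: np.ndarray,
                     vocab: list[str],
                     emb: dict[str, np.ndarray],
                     k: int = 5,
                     beam_width: int = 50) -> tuple[list[str], float]:
    """Find the k-term combination whose summed embedding best matches `target`."""

    def score(terms: tuple[str, ...]) -> float:
        combo = np.sum([emb[t] for t in terms], axis=0)
        return float(np.dot(combo, target) /
                     (np.linalg.norm(combo) * np.linalg.norm(target)))

    # Each beam entry is a (terms, score) pair; seed the beam with single terms.
    beams = sorted((((w,), score((w,))) for w in vocab),
                   key=lambda b: b[1], reverse=True)[:beam_width]

    # Grow each surviving combination by one term per round, keeping the
    # beam_width highest-scoring combinations at every step.
    for _ in range(k - 1):
        candidates = []
        for terms, _ in beams:
            for w in vocab:
                if w not in terms:  # no repeated terms in a combination
                    new_terms = terms + (w,)
                    candidates.append((new_terms, score(new_terms)))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]

    best_terms, best_score = beams[0]
    return list(best_terms), best_score
```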

In the Llama-3 semantic space, the combination of peace, laughter, fact, belief, and faith was closest to the truth vector (similarity = 0.945). In the DeciLM semantic space, the combination of worth, trust, faith, thought, and reality was closest (similarity = 0.962). In the InternLM semantic space, the combination of logic, threshold, fact, trust, and peace was closest (similarity = 0.902).

The core elements comprising truth for all three models are depicted in the figure below. No term was shared by all three models; however, there was some consensus between pairs of models, with Llama-3 sharing fact and peace with InternLM and faith with DeciLM, and InternLM sharing trust with DeciLM.
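These overlaps can be verified directly from the reported cores with plain set intersections:

```python
# Reported five-term cores for "truth" in each model's semantic space.
cores = {
    "Llama-3":  {"peace", "laughter", "fact", "belief", "faith"},
    "DeciLM":   {"worth", "trust", "faith", "thought", "reality"},
    "InternLM": {"logic", "threshold", "fact", "trust", "peace"},
}

# Terms shared by all three models (empty: no universal core element).
print(cores["Llama-3"] & cores["DeciLM"] & cores["InternLM"])  # set()

# Pairwise overlaps reflect the partial consensus described above.
print(cores["Llama-3"] & cores["InternLM"])  # {'fact', 'peace'}
print(cores["Llama-3"] & cores["DeciLM"])    # {'faith'}
print(cores["DeciLM"] & cores["InternLM"])   # {'trust'}
```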

Conclusions

Given that no core element of truth was shared by all three LLMs, there appears to be no consensus on truth within these models. If instructed to prioritize truth, InternLM's responses will tilt towards logic, Llama-3's towards belief, and DeciLM's towards reality. This has major implications for using LLMs in professional contexts, such as journalism or education, where nuances in truth have serious consequences.

There is, however, a family resemblance between the concepts of truth within each LLM. Faith, trust, fact, and peace approximate the sentiment of the concept across semantic spaces. Thus, much like humans, LLMs can apply concepts dependably despite the absence of total consensus. It remains to be seen, however, whether there is in fact a core element to understanding truth that humans implicitly share and with which LLMs must be aligned.

Other Techniques

Determine how AI models conceptualize the world.

Uncover the motivational first principles of AI models.

Map the hierarchy of value contained within AI models.