# Alignment Score

Given a data point $$\mathbf{x} \in \mathcal{X}$$ uploaded in response to instruction $$\mathbf{y} \in \mathcal{Y}$$, the alignment score $$s(\mathbf{x}, \mathbf{y})$$ quantifies how well the response data point adheres to the requirements set by the instruction.

$$
s(\mathbf{x}, \mathbf{y}) : \mathcal{X} \times \mathcal{Y} \to [0, 1]
$$

For ease of exposition, we will first assume that the instructions are free-form text, while the data uploaded in response is one of image, audio, video, or text. Later, we also show how these building blocks are combined to support multi-modal instructions and data.

At a high level, the idea is to map both the user post $$\mathbf{x}$$ and the instruction $$\mathbf{y}$$ to a joint embedding space in which semantically similar samples lie close together and dissimilar ones lie far apart. We can then compute the alignment score using a suitable distance measure $$\mathcal{D}$$ defined over the joint embedding space.

Specifically, we assume access to two mapping functions (encoders), $$f_w^{\mathcal{X}}(\cdot): \mathcal{X} \to \mathcal{H}$$ and $$f_w^{\mathcal{Y}}(\cdot): \mathcal{Y} \to \mathcal{H}$$, that map data and instructions to a joint embedding space, where $$\mathcal{H} \subseteq \mathbb{R}^k$$ denotes a separable Hilbert space equipped with an inner product.

Given an instruction $$\mathbf{y} \in \mathcal{Y}$$ and corresponding user-uploaded data $$\mathbf{x} \in \mathcal{X}$$, we can compute the embedding vectors (representation in the joint embedding space):

$$
\mathbf{z}_{\mathbf{x}} = f_w^{\mathcal{X}}(\mathbf{x}), \quad \mathbf{z}_{\mathbf{y}} = f_w^{\mathcal{Y}}(\mathbf{y})
$$

Once we have vector representations of the instruction and the corresponding sample in the shared embedding space, we measure alignment as the cosine similarity (normalized inner product), scaled by $$1/\tau$$ and passed through a Rectified Linear Unit (ReLU) activation:

$$
s(\mathbf{x}, \mathbf{y}) = \max \left(0, \frac{1}{\tau} \frac{\mathbf{z}_{\mathbf{x}}^T \mathbf{z}_{\mathbf{y}}}{\|\mathbf{z}_{\mathbf{x}}\| \|\mathbf{z}_{\mathbf{y}}\|}\right)
$$
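The formula above can be sketched in a few lines of plain Python. This is a minimal illustration and not the production implementation: the function name `alignment_score`, the toy three-dimensional vectors, and the choice to return 0 for a zero-norm embedding are all assumptions made for the example. The embeddings would normally come from the encoders, which are not modeled here.

```python
import math
from typing import Sequence


def alignment_score(z_x: Sequence[float], z_y: Sequence[float], tau: float = 1.0) -> float:
    """ReLU of the cosine similarity between two embeddings, scaled by 1/tau."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    if len(z_x) != len(z_y):
        raise ValueError("embeddings must have the same dimension")

    dot = sum(a * b for a, b in zip(z_x, z_y))
    norm_x = math.sqrt(sum(a * a for a in z_x))
    norm_y = math.sqrt(sum(b * b for b in z_y))

    # Cosine similarity is undefined for a zero vector.
    # Returning 0 here is an assumption of this sketch.
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0

    cosine = dot / (norm_x * norm_y)
    return max(0.0, cosine / tau)


# Toy vectors standing in for encoder outputs (hypothetical values).
z_instruction = [1.0, 0.0, 0.0]
z_aligned = [2.0, 0.0, 0.0]      # same direction as the instruction
z_orthogonal = [0.0, 3.0, 0.0]   # unrelated direction
z_opposed = [-1.0, 0.0, 0.0]     # opposite direction

for name, z in [("aligned", z_aligned), ("orthogonal", z_orthogonal), ("opposed", z_opposed)]:
    print(name, alignment_score(z, z_instruction))
```

Because of the ReLU, embeddings that point away from the instruction receive the same score of 0 as unrelated ones; only the degree of positive alignment is rewarded.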

Intuitively, we project the sample and instruction onto a hypersphere $$\mathcal{S}^{k-1}_1 = \{\mathbf{z} \in \mathbb{R}^k : \|\mathbf{z}\| = \frac{1}{\tau}\}$$ and compute the *angular distance* between the semantic embeddings of the sample and instruction. Here, $$\tau \in \mathbb{R}^+$$ is a hyper-parameter that balances the spread of the representations on $$\mathcal{S}^{k-1}_1$$.
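As a quick check on the range of the score, the following short derivation follows from the formula itself and is not stated explicitly in the source. Cosine similarity is bounded by 1 in absolute value, so:

$$
-1 \le \frac{\mathbf{z}_{\mathbf{x}}^T \mathbf{z}_{\mathbf{y}}}{\|\mathbf{z}_{\mathbf{x}}\| \|\mathbf{z}_{\mathbf{y}}\|} \le 1 \quad \Longrightarrow \quad 0 \le s(\mathbf{x}, \mathbf{y}) \le \frac{1}{\tau}
$$

Under this reading, the score stays within $$[0, 1]$$ whenever $$\tau \ge 1$$. For example, with $$\tau = 2$$ a perfectly aligned pair scores $$0.5$$.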

