# Evaluating VLM MemAgents

Evaluation on the Network is community-driven: it draws on collective insight to strengthen model alignment and robustness. To this end, we structured data collection and labeling as a set of interactive tasks within the Network, in which community members label data points and assess model outputs.

We set up labeling tasks that capture a wide range of model behaviors and yield both clean and adversarial samples for analysis. The tasks were designed to be user-friendly and accessible, to encourage broad participation across the community. To guide contributors, we provided detailed labeling guidelines that define the criteria for classifying an output as "aligned" or "misaligned", so that labels are applied consistently and accurately even when outputs are complex or ambiguous.

<figure><img src="/files/DYzwNpN9OBgleVO9lyzy" alt=""><figcaption><p>Table 1. OOD Evaluation of VLM Reward MemAgents.</p></figcaption></figure>

For quality control, we implemented a consensus-based approach in which multiple contributors review each data point. When contributors disagree, the final label is the median of all contributions, which keeps the community-driven labeling process reliable. The approach also includes mechanisms that reward high-quality, consistent labeling, giving contributors an incentive to maintain high standards.
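The median rule can be illustrated with a minimal sketch. The numeric encoding (`misaligned` = 0, `aligned` = 1), the function name `consensus_label`, and the handling of an exact tie are assumptions made for illustration; the source does not specify how labels are encoded or how ties are broken.

```python
from statistics import median

# Assumed numeric encoding of the two labels named in the guidelines.
LABEL_TO_SCORE = {"misaligned": 0, "aligned": 1}


def consensus_label(labels):
    """Return the median label for one data point, or None on an exact tie.

    `labels` is a list of "aligned" / "misaligned" strings, one per contributor.
    """
    if not labels:
        raise ValueError("at least one contribution is required")
    m = median(LABEL_TO_SCORE[label] for label in labels)
    if m == 0.5:
        # Even split between the two labels: tie-breaking is not specified
        # in the source, so this sketch leaves the data point undecided.
        return None
    return "aligned" if m > 0.5 else "misaligned"
```

With this binary encoding, the median coincides with a majority vote, so `consensus_label(["aligned", "misaligned", "aligned"])` returns `"aligned"`. Taking the median would also extend naturally to ordinal or continuous alignment scores, where it is less sensitive to a single outlying contribution than the mean.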

Through these community-driven labeling tasks, we are able to create a robust real-world dataset that reflects a diverse range of perspectives, strengthening our ability to evaluate and improve model alignment across different contexts.

## Alignment Score Distributions

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe4ATw-DCiTccOnW4CPlaogfVbogJIq0MqHQwGyTuPBRNTcXbRR3p6Z8atGhFBztrPVFHs7QZQFf1apvFVY7Lka3rcu_i7iKhVtX4i13poLKAjsHCdVJOArxDjZDhaAnmdd7m_x3resW6qD94O2TTMrJcjs?key=h1Ssaqvmteua0nGkW6FqNA" alt="" width="563"><figcaption><p>Box Plot (Fitting Normals)</p></figcaption></figure>

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXc_X6qyOCnurPO5Yz3fd7BxKKa6K0RatqTvFj-RxsXwL0qfZLM8JG47nAEjdNkF4OWaRXk-zqxuDShRFjEXfq553PFU7u_LZlUyU8yqXQ-kQ3QEJoJ0ro8znekHtBfcWJ1EqblwbOem1ggB6ChxMpNvD3yr?key=h1Ssaqvmteua0nGkW6FqNA" alt="" width="563"><figcaption><p>Violin Plots (Fitting Normal)</p></figcaption></figure>


