v0.1 — Early Access

PREF.NET

A scalable network for preferences and environments.

The universal foundation for intelligence, governance, and alignment

// Initializing System...
░▒▓███████▓▒░░▒▓███████▓▒░░▒▓████████▓▒░▒▓████████▓▒░ 
░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░      ░▒▓█▓▒░        
░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░      ░▒▓█▓▒░        
░▒▓███████▓▒░░▒▓███████▓▒░░▒▓██████▓▒░ ░▒▓██████▓▒░   
░▒▓█▓▒░      ░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░      ░▒▓█▓▒░        
░▒▓█▓▒░      ░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░      ░▒▓█▓▒░        
░▒▓█▓▒░      ░▒▓█▓▒░░▒▓█▓▒░▒▓████████▓▒░▒▓█▓▒░
v0.1

Preference is a network for scaling preference data and environments, both human and programmed. It provides a universal foundation for intelligence, governance, and alignment, enabling systems to learn what entities value at scale.

What are Preferences?

The universe itself is governed by preferences, arrows that point toward truth and value.

Physical laws impose a directional bias on reality: under the law of gravity, an apple “prefers” to fall to the ground rather than in any other direction. This is not preference in the human sense, but it is a lawful ordering of possible trajectories, a natural constraint that shapes how time unfolds.

In artificial environments, reinforcement learning defines yet another layer of procedural preference, expressed through transitions and rewards. For instance, a trajectory in which an apple falls to the ground may be assigned a positive reward, whereas a trajectory in which the apple ascends contrary to gravity may be penalized with a negative reward. This illustrates the concept of Reinforcement Learning from Verifiable Rewards (RLVR), where the reward is programmatically verified against the underlying physics, logic, or collective human consensus that defines what is real or expected.
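
As a minimal sketch of that idea, the snippet below scores a trajectory against a hard-coded physical constraint (the apple should end up lower than it started) rather than against human judgment. The Step class, the verifiable_reward function, and the trajectory format are hypothetical illustrations, not part of PREF.NET.

# Minimal sketch of a verifiable reward: a trajectory is checked against a
# programmed physical constraint. All names here are illustrative.

from dataclasses import dataclass

@dataclass
class Step:
    height: float  # the apple's height above the ground at this timestep

def verifiable_reward(trajectory: list[Step]) -> float:
    """+1.0 if the apple ends lower than it started (consistent with gravity),
    -1.0 if it ends higher (violating the programmed constraint)."""
    if len(trajectory) < 2:
        return 0.0  # nothing to verify
    delta = trajectory[-1].height - trajectory[0].height
    return 1.0 if delta < 0 else -1.0

falling = [Step(2.0), Step(1.2), Step(0.0)]
ascending = [Step(0.0), Step(1.2), Step(2.0)]
print(verifiable_reward(falling))    # 1.0  -> rewarded
print(verifiable_reward(ascending))  # -1.0 -> penalized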

On top of these physical and procedural preferences, humans construct subjective preferences: we value sweetness, which in turn actively influences how apples are cultivated and bred over time to align with our tastes and preferences. The same goes for AI Alignment: through processes like Reinforcement Learning from Human Feedback (RLHF), we are cultivating our intelligent systems according to our own value signals.
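
To make the RLHF half concrete, the sketch below shows the common pairwise-preference loss (a Bradley-Terry style objective) that turns a human's "A is better than B" judgment into a training signal for a reward model. The scalar rewards and the function name are stand-ins for illustration, not PREF.NET's training code.

# Sketch of how a pairwise human preference becomes a training signal for a
# reward model (Bradley-Terry style loss, common in RLHF). Illustrative only.

import math

def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood that the human-preferred output scores higher."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# The loss shrinks as the reward model agrees with the human ranking.
print(pairwise_preference_loss(r_chosen=2.0, r_rejected=0.5))  # ~0.20 (agrees)
print(pairwise_preference_loss(r_chosen=0.5, r_rejected=2.0))  # ~1.70 (disagrees)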

// In our network these forms are not separate; they are manifestations of the same principle:

  • Physical preferences anchor reality in lawful regularity.
  • Procedural preferences encode structured trajectories within designed environments.
  • Human preferences project value onto futures, transforming neutral time into meaningful direction.

Environments as Preferences

In reinforcement learning, a model improves by interacting with its environment, the system that defines success and delivers reward. The reward signal tells the model which behaviors are useful and which are not, effectively setting the boundaries of what the system can learn.

But the reward function is not simply a technical parameter; it encodes choices about which outcomes truly matter. As AI systems expand into complex, open-ended environments, their effectiveness will hinge on how faithfully their rewards reflect the values humans collectively hold.
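
One way to see the reward function as a value choice rather than a technical detail: in the hypothetical environment below, the reward function is an explicit, swappable component, and two environments that differ only in what they reward will teach the same agent different behaviour. The interface is an assumption made for illustration, not PREF.NET's actual API.

# Minimal sketch of an environment whose reward function is an explicit,
# swappable component. The interface is hypothetical, for illustration only.

from typing import Callable

class Environment:
    def __init__(self, reward_fn: Callable[[str, str], float]):
        self.reward_fn = reward_fn  # the value judgment the agent will learn
        self.state = "start"

    def step(self, action: str) -> tuple[str, float]:
        reward = self.reward_fn(self.state, action)  # "what matters" lives here
        self.state = f"{self.state}->{action}"
        return self.state, reward

# Same agent, different encoded preferences, different learned behaviour.
terse_env = Environment(reward_fn=lambda s, a: 1.0 if len(a) < 10 else 0.0)
verbose_env = Environment(reward_fn=lambda s, a: 1.0 if len(a) >= 10 else 0.0)
print(terse_env.step("ok"))    # ('start->ok', 1.0)
print(verbose_env.step("ok"))  # ('start->ok', 0.0)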

Establishing a consensus on those values at scale is therefore not optional; it is the essential substrate for designing environments that foster aligned artificial intelligence.

The Arrow towards Intelligence

To have preference is to orient time.

Once a system holds preference, it no longer experiences the future as undifferentiated; it gains the ability to rank outcomes, to anticipate which trajectories are better than others. Preference is thus the minimal seed of prediction. And prediction is the root of intelligence: the capacity to navigate uncertain futures by aligning actions with desired outcomes.

In this sense, preference is the substrate of intelligence itself. Just as physics provides the arrow of entropy, preference provides the arrow of value, a guide for intelligent agents.

v0.1 — Why are we collecting preferences?

// The collected preferences serve three goals:

  • Live Evaluation — providing real-time assessment of models and agents,
  • Training — enabling reinforcement through large-scale preference data, and
  • Foundational Data Layer — forming the bedrock for agents capable of simulating human values and constructing new environments themselves, ultimately advancing toward continuous learning.
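
As a sketch of what one unit of that foundational data layer might look like, the record below captures a single pairwise comparison. The field names and schema are assumptions made for illustration; PREF.NET's actual data format is not specified here.

# Illustrative record for one pairwise preference comparison. The schema is a
# guess at what a preference data layer might store, not PREF.NET's format.

from dataclasses import dataclass, asdict
import json

@dataclass
class PreferenceRecord:
    prompt: str      # task shown to both models
    output_a: str    # candidate produced by model A
    output_b: str    # candidate produced by model B
    preferred: str   # "a", "b", or "tie", as judged by the contributor
    rater_id: str    # pseudonymous contributor identifier
    timestamp: str   # ISO 8601 collection time

record = PreferenceRecord(
    prompt="Generate an image of an apple falling from a tree.",
    output_a="image_a.png",
    output_b="image_b.png",
    preferred="a",
    rater_id="rater-0001",
    timestamp="2024-01-01T00:00:00Z",
)
print(json.dumps(asdict(record), indent=2))  # serialise for evaluation or training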

Start Contributing Now

Experience preference collection in action. Compare AI-generated images and help build a dataset for training better models.