Yonsei University’s Breakthrough AI Research — Powered by AWS Trainium on Theta EdgeCloud

Theta Labs

Two landmark papers on personalized AI reward modeling, trained on AWS Trainium instances via Theta EdgeCloud Hybrid, mark a new era for decentralized AI infrastructure in academic research.

We are proud to announce that Yonsei University’s Data & Language Intelligence Lab — led by Professor Dongha Lee — has published two groundbreaking research papers in personalized AI reward modeling, with experiments conducted on AWS Trainium instances deployed through Theta EdgeCloud Hybrid.

These results represent a significant milestone: world-class academic AI research running on decentralized cloud infrastructure, at scale, with reproducibility that traditional compute solutions struggle to match.

The two papers — PIGReward and P-Check — tackle one of the hardest open problems in modern AI: how do you build a model that doesn’t just satisfy an average user, but adapts to the unique preferences of each individual?

Paper 1: PIGReward

Personalized Reward Modeling for Text-to-Image Generation

The Problem It Solves

AI image generators like Stable Diffusion or DALL·E can produce stunning visuals — but whether a generated image is actually good depends entirely on who is looking at it. One person values realism; another wants vibrant color; another prioritizes minimalist composition. Standard reward models evaluate images against a single universal rubric, missing the rich diversity of individual taste.

What PIGReward Does

PIGReward introduces a personalized reward model for text-to-image generation with two core innovations:

Dynamic, user-specific evaluation criteria — instead of a fixed scoring rubric, the model generates evaluation dimensions tailored to each individual user’s aesthetic preferences.
Chain-of-Thought (CoT) reasoning — images are assessed through step-by-step reasoning, making the evaluation transparent, explainable, and more accurate.
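To make the idea of user-specific criteria concrete, here is a minimal sketch, not the paper's implementation. It assumes each user's criteria have already been generated as weighted dimensions and that per-dimension ratings for an image are given as plain numbers. In PIGReward both would come from a model reasoning over the image, and the names `Criterion` and `personalized_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str      # hypothetical evaluation dimension, e.g. "realism"
    weight: float  # how strongly this user cares about it


def personalized_score(ratings, criteria):
    """Return a weighted-average score and a step-by-step trace.

    ratings: dict mapping criterion name -> rating in [0, 1] for one image
             (a stand-in for what a model would infer from the image).
    criteria: the user-specific list of Criterion objects.
    """
    total_weight = sum(c.weight for c in criteria)
    trace = []
    score = 0.0
    for c in criteria:
        rating = ratings.get(c.name, 0.0)
        score += c.weight * rating / total_weight
        # Keep one readable line per criterion, loosely mimicking a reasoning trace.
        trace.append(f"{c.name}: rating {rating:.2f}, weight {c.weight:.1f}")
    return score, trace


# One image, two users with different (assumed) criteria.
image = {"realism": 0.9, "vibrant_color": 0.2, "minimalism": 0.5}
realist = [Criterion("realism", 3.0), Criterion("minimalism", 1.0)]
colorist = [Criterion("vibrant_color", 3.0), Criterion("realism", 1.0)]

realist_score, realist_trace = personalized_score(image, realist)
colorist_score, colorist_trace = personalized_score(image, colorist)
```

The same image scores higher for the user who values realism than for the one who values vibrant color, which is the behavior a single universal rubric cannot express.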

Critically, PIGReward addresses the cold-start problem of personalization — what do you do when a user has very little history? Its self-bootstrapping strategy constructs a rich user context from just a small number of reference images, enabling personalization without retraining the model for each user.

Beyond scoring, PIGReward also generates personalized feedback that can drive prompt optimization — directly improving what the model generates next for that specific user.

🔑 Key Insight

PIGReward reframes image quality evaluation as a personalized, reasoning-driven process — not a one-size-fits-all metric. This is the difference between a rating system and a genuinely intelligent creative collaborator.

Paper 2: P-Check

Advancing Personalized Reward Model via Learning to Generate Dynamic Checklist

The Problem It Solves

Large Language Models (LLMs) are increasingly deployed as personal AI assistants. But the reward models used to align their behavior — trained on global, averaged preference data — don’t reflect how different users actually judge quality. The same response can be great for one person and completely miss the mark for another.

Existing personalized reward approaches treat each user’s context as a static persona: a fixed description inferred from their history. This misses two key dynamics: what concretely drives a user’s judgment in a specific context, and how those drivers shift from task to task.

What P-Check Does

P-Check introduces a plug-and-play checklist generator that dynamically creates query-specific evaluation criteria drawn from each user’s interaction history. Rather than a static persona, the judge receives a live checklist — explicit, actionable criteria tuned to both the user and the current task.

This mirrors how humans actually evaluate: we don’t apply the same rubric to every situation. Judging code quality, essay style, and recipe suggestions each requires a different lens.
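A toy sketch of this plug-and-play flow follows, under heavy assumptions. The user history is a hand-written table of criteria per task type, each criterion is a simple string check, and the judge just counts satisfied items. In P-Check the checklist comes from a trained generator and the judge is a model, and `USER_HISTORY`, `generate_checklist`, and `judge` are hypothetical names.

```python
# Hypothetical per-user history: criteria this user has cared about, by task type.
# Each checklist item pairs a readable criterion with a check on a response string.
USER_HISTORY = {
    "code": [
        ("includes comments", lambda r: "#" in r),
        ("is concise", lambda r: len(r.splitlines()) <= 5),
    ],
    "recipe": [
        ("is vegetarian", lambda r: "chicken" not in r.lower()),
    ],
}


def generate_checklist(history, task_type):
    """Stand-in for the checklist generator: pick criteria relevant to this task."""
    return history.get(task_type, [])


def judge(response_a, response_b, checklist):
    """Prefer the response that satisfies more checklist items (ties go to A)."""
    def satisfied(response):
        return sum(check(response) for _, check in checklist)

    return "A" if satisfied(response_a) >= satisfied(response_b) else "B"


# The same user, two different tasks, two different checklists.
code_a = "def add(a, b):\n    return a + b"
code_b = "# add two numbers\ndef add(a, b):\n    return a + b"
code_choice = judge(code_a, code_b, generate_checklist(USER_HISTORY, "code"))

recipe_a = "Lentil curry with rice"
recipe_b = "Chicken curry with rice"
recipe_choice = judge(recipe_a, recipe_b, generate_checklist(USER_HISTORY, "recipe"))
```

The judge itself never changes. Only the checklist passed in changes with the user and the task, which is what makes the generator a drop-in component.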

The Training Innovation: Preference-Contrastive Criterion Weighting

Simply distilling checklists from annotated preference pairs produces generic criteria that mix objective quality with subjective taste. P-Check solves this with a two-step training strategy:

Inter-User Contrastive Sampling — each preference pair is augmented with responses generated for users with divergent preferences, creating sharper, more personalized contrast signals.
Personalized Saliency Scoring — each criterion is scored by measuring how much the model’s discriminative power drops when that criterion is removed, ensuring only the most diagnostically valuable criteria are weighted heavily.
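The second step can be illustrated with a leave-one-out sketch. This is an assumption-laden toy, not the paper's method: "discriminative power" is approximated by pairwise accuracy on a handful of preference pairs, each response is represented by hand-assigned 0/1 flags per criterion, and the names `pairwise_accuracy` and `saliency_scores` are hypothetical.

```python
def pairwise_accuracy(criteria, pairs):
    """Fraction of (chosen, rejected) pairs where chosen outscores rejected.

    A response's score is the sum of its flags over the active criteria.
    """
    correct = 0
    for chosen, rejected in pairs:
        chosen_score = sum(chosen.get(c, 0) for c in criteria)
        rejected_score = sum(rejected.get(c, 0) for c in criteria)
        correct += chosen_score > rejected_score
    return correct / len(pairs)


def saliency_scores(criteria, pairs):
    """Score each criterion by the accuracy drop when it alone is removed."""
    full = pairwise_accuracy(criteria, pairs)
    return {
        c: full - pairwise_accuracy([k for k in criteria if k != c], pairs)
        for c in criteria
    }


# Toy preference pairs for one user: (chosen response, rejected response).
# Conciseness separates chosen from rejected; politeness never does.
pairs = [
    ({"concise": 1, "polite": 1}, {"concise": 0, "polite": 1}),
    ({"concise": 1, "polite": 0}, {"concise": 0, "polite": 0}),
    ({"concise": 1, "polite": 1}, {"concise": 0, "polite": 1}),
]

scores = saliency_scores(["concise", "polite"], pairs)
```

Here removing "concise" destroys the judge's ability to rank the pairs, while removing "polite" changes nothing, so only "concise" would receive a heavy weight for this user.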

The result: P-Check consistently outperforms existing personalized reward models across multiple benchmarks, including out-of-distribution settings. Its checklist outputs also serve as direct verbal feedback to the generator — enabling lightweight personalization without updating any model parameters.

🔑 Key Insight

P-Check shows that personalization isn’t just about knowing who the user is — it’s about dynamically understanding what they care about right now, for this specific task. That distinction is what separates genuinely personalized AI from a system that merely remembers your name.

Why This Matters for Theta Network

The models in both papers were trained and validated on AWS Trainium (Trn) instances deployed through Theta EdgeCloud Hybrid, a direct demonstration of what our infrastructure enables at the frontier of AI research.

Three Industry Firsts Behind This Work

First decentralized platform approved by AWS to integrate custom AI silicon — Trainium and Inferentia.
First blockchain network to deploy Amazon’s next-generation AI chipsets for real-world production workloads.
First institution-level customer to adopt Trainium-powered Theta EdgeCloud Hybrid for advanced AI research.

Personalized reward modeling requires training on large-scale preference datasets with millions of simulated user interactions. The compute demands are substantial, and reproducibility is essential in academic research. AWS Trainium on Theta EdgeCloud delivered both: high-performance training at a cost efficiency that traditional cloud infrastructure cannot match, along with the deterministic, reproducible results that peer-reviewed research demands.

“Theta EdgeCloud has been an integral part of our research infrastructure over the past year. With the addition of AWS Trainium, we can now scale our experiments faster, more efficiently, and with greater reproducibility. This enables us to push the boundaries of conversational AI and recommendation systems in ways that were previously not practical.”
- Professor Dongha Lee, Yonsei University

“Yonsei University’s adoption of AWS Trainium on Theta EdgeCloud Hybrid is a perfect example of how decentralized blockchain infrastructure and cutting-edge AI silicon can work hand-in-hand to accelerate world-class research.”
- Mitch Liu, Co-founder and CEO, Theta Labs

The Bigger Picture

PIGReward and P-Check are not just academic papers. They represent a vision for the future of AI: systems that learn individual human preferences with precision, adapt dynamically to each interaction, and provide transparent, reasoned evaluations rather than black-box scores.

That future requires infrastructure that is performant, cost-accessible, and reproducible at scale. Theta EdgeCloud — powered by AWS Trainium — is built to be exactly that for the global AI research community.

As AI moves from general-purpose to deeply personalized, the infrastructure it runs on matters more than ever. We’re proud to be the platform that helped bring these breakthroughs to life — and we’re just getting started.