LLM Control

Understanding Generative AI

Model Unlearning, Reasoning and Steering

We explore the high-dimensional latent space of LLMs to uncover how knowledge is encoded. Our research uses mechanistic interpretability to decode the 'inner circuits' of reasoning, enabling precise behavior steering and machine unlearning.
Robustness & Privacy

AI Security & Privacy

Adversarial Attacks & Defensive Schemes

We focus on protecting AI models from manipulation, developing defenses against adversarial attacks that exploit model weaknesses so that AI systems make reliable and secure decisions.
Detection & Deepfakes

AI Authenticity & Deepfakes

Verification and Generation Artifacts

We focus on improving AI’s ability to generate natural content while developing robust methods to detect AI-written text and deepfake speech, ensuring authenticity and preventing misuse.
XAI

Explainable AI (XAI)

Transparency & Trust

We explore making AI decisions more transparent while addressing vulnerabilities in explanations, aiming to ensure AI insights are trustworthy, resistant to manipulation, and easy for end users to understand.
Social Good

NLP for Social Good

Real-world Impacts

We apply our methods to critical real-world NLP applications, from healthcare and education to social media analysis.