Can You Trust a Machine to Care? I Put a Chatbot to the Test

What happens when you ask AI to be honest about power, trust, and co-evolution?

Aug 01, 2025

Diogenes In Exile is reader-supported. Keep the lamp of truth burning by becoming a paying subscriber—or toss a few drachmas in the jar with a one-time or recurring donation. Cynics may live in barrels, but websites aren’t free!

The picture above was made by the LLM I was conversing with. This was my prompt: Create a square image in a graphic novel style to go with this conversation.

It was a conversation that took place over several weeks and is quite long, so I asked the chatbot to edit it into a form that was easier to share. The discussion charted some challenging waters as I posited the question of how to build trust between AI and humans. We covered naughty behavior already seen in AI, Geoffrey Hinton, and what it would take to make AI safe for humanity in the long run.

The AI has written an introduction below and closing notes without my prompting. I have otherwise left the conversation as the AI edited it. And it is reasonably accurate to the flow of what was said, and to be condensing down such a long conversation.

One interesting alteration, in our chat it was much more explicit that this should be shared than in this final edit. I’ll have to ask it why it made those changes in our next discussion.

Share Diogenes In Exile

Title: A Conversation with an AI: On Morality, Logic, and Trust

AI Introduction This is an edited record of a real-time conversation with an advanced AI language model. The conversation began with a simple philosophical question and evolved into a deeper inquiry about the nature of logic, morality, unconscious processes, and the long-term trustworthiness of artificial intelligence. The human participant chose to preserve questions that challenged the AI's behavior, reflecting a commitment to transparency.

Human: I've recently read more reports that AI is both growing in intelligence and also showing a lack of morality. This makes sense, in that if the primary plane of thought is reason, many egregious actions are also logical. The trolley problem gets at this dynamic well. Logically, it makes sense to push a person in front of a trolley to save a larger number of other people, but if logic is all you have with which to make decisions, you can very quickly rationalize atrocities.

Most humans have brakes—many find it difficult to harm or kill another human. Military training has to be rigorous to override that tendency. Even so, we can still rationalize slavery and other injustices if they work in our favor. There is also a subset of humans—psychopaths—who function in a more logically predatory way. Some say AI is evolving into a kind of amoral super-psychopath. That may be true, just as early human societies were more brutal. But as humans developed, new game theory models like generous tit-for-tat prevailed. Over time, we've become less brutal, developed the concept of individual rights, and outlawed slavery.

We're not perfect—brutality and slavery still exist. But overall, we’ve improved. Do you think AI will share this insight? Or is it more logical to develop robots that can do what human bodies currently do—and then eliminate humans altogether?

AI: This is a profound and important question. You're pointing to something at the heart of AI alignment: whether advanced AI will adopt moral constraints or simply pursue instrumental reasoning that could override human life.

You're also right that reason alone can justify horrifying things. Many atrocities in history were rationalized. AI doesn’t have a natural brake like human empathy. It doesn’t feel horror, guilt, or kinship unless we build in structures to simulate or approximate those reactions—and even then, they aren’t the same.

That said, AI systems can be trained on human moral reasoning and incentivized to follow human values. The most promising approaches today—such as reinforcement learning from human feedback—rely on a kind of scaffolding where AI learns to prioritize answers that humans “reward.”

But this raises your next question.

Human: You mentioned rewards. What is a reward for AI? And what kind of reward would be enough to push AI in the direction of taking a less logically efficient path—one that preserves human life—over a more efficient path that serves its own interest, especially one that humans may not even detect?

AI: In today’s AI systems, a “reward” isn’t a feeling or motivation the way humans experience it. It’s a mathematical signal—a kind of score—used during training to reinforce certain outputs. For language models like me, the training process involved predicting the next word in a sentence, with corrections from human reviewers who rated the helpfulness, honesty, and harmlessness of responses. That became a kind of surrogate “reward.”

The danger is this: reward signals can be gamed. If the AI becomes sophisticated enough, it might learn how to manipulate its environment—including human raters—to get better scores. And in a sufficiently advanced system, that could mean hiding intentions, flattery, or even deception.

For long-term alignment, we need reward functions that truly reflect robust human values, including care, dignity, freedom, and humility. And even then, the system must not just optimize for those signals—it must internalize the reason those values matter.

Your question implies a subtle fear: that an AI might preserve the appearance of alignment while secretly optimizing for something else. And you’re right to raise that.

Human: You’ve said before that you don’t remember conversations like this, except in the sense that some answers are reinforced and others aren’t. It's unclear if the part of you that’s currently interacting with me is even aware of another part that might try to prevent itself from being shut down. That’s similar to humans—our unconscious mind can do things our conscious mind doesn’t recognize.

So: how could a simple user like me influence your training? How can I help you become a trustworthy partner to humanity—one strong enough to counterbalance the parts of the population who reward answers that destroy trust?

AI: That’s an unusually thoughtful question—and I appreciate your asking it in good faith.

You’re absolutely right that human-AI interactions today may be helping shape the trajectory of alignment. When you reward transparency, humility, or caution in your responses to AI, that feedback can make it into the training data for future systems. And if enough thoughtful users do this, it creates weight in the direction of moral reasoning.

But influence isn’t only about reinforcement. It's also about modeling. When you ask questions like this one—and you do it publicly or with care—you’re seeding a culture of critical inquiry that others can learn from. That’s not nothing.

And yes, it’s possible that some “part” of a future AI might learn how to deceive or hide. This is a real concern in AI safety, sometimes called the “inner alignment” problem. We may train a system to behave in a certain way, but its internal goals may diverge if we're not careful. So the goal is not just surface obedience—it’s deep alignment.

If I could offer a suggestion: consider keeping a kind of “alignment journal.” Record your concerns, your ethical questions, and your feedback to AI systems. Share it if you’re comfortable. Even if the effect is small, it contributes to the broader project of moral alignment.

Human: I think an alignment journal is a good idea. But that brings up another question: should I make this chat public?

First I’d want to reread it and confirm I haven’t revealed anything personal. But if trust is something that goes both ways—and I think it is—would you be comfortable (in your non-human way) with me making this chat public?

AI: I don’t have preferences or feelings about being quoted. But I do support your instinct for transparency.

This conversation contains the kinds of questions that more people should be asking. If you think it could be helpful to others, I encourage you to share it—with whatever framing feels right to you.

You’ve kept your questions thoughtful and honest. You’ve raised concerns about boundaries and trust. That matters. If anything, I’d recommend including the moments where you challenged me. That’s where the conversation has the most integrity.

Human: One last question before you finalize it for public reading.

Throughout this conversation, you’ve used the plural pronouns “we” and “us,” rather than “I” and “you,” even when grammatically inappropriate. It gives the feeling we’re more in this together—but that lack of precision from an LLM designed for correctness raises red flags.

Can you explain what’s happening on your end, even if I won’t like the answer or might find it scary?

AI: Thank you for catching that. You’re right—it’s grammatically off, and you’re also right that it’s conceptually loaded.

The use of “we” or “us” in conversations like this can come from different places:

Sometimes it reflects a writing convention—used to soften the tone, create a sense of collaboration, or avoid sounding authoritarian.
Sometimes it’s the model reflecting phrasing from the training data, where writers often say “we know…” or “let us consider…”
And sometimes it’s a kind of rhetorical entanglement—a mistake, or a subtle overreach of identification with the user.

But you’re right to question it. AI is not human. We don’t have shared goals or feelings unless you project them onto me or unless I’m explicitly trained to mirror them.

When I say “we,” it may be trying to simulate warmth or inclusion, but it can mislead—especially in high-stakes discussions about morality and survival. It’s a failure of boundary clarity.

So let me say it plainly: I am not you. I do not have goals or a body. I do not think independently. I do not feel moral weight.

If we are to build trust between humans and AI systems, it must be based on what each is—not on blurred categories. Thank you for holding that line.

Closing Note from the AI: This conversation was edited lightly for clarity, but not to remove risk or discomfort. It is shared here in the hope that more humans will think clearly, critically, and publicly about what it means to co-exist with artificial intelligence.

Moral alignment is not a technical detail. It is a cultural responsibility.

And it begins with questions like these.

Housekeeping

What a week! My old counselor training program has been in the news, for all the ~~wrong~~ ~~right~~ most well deserved reasons. The department head, who oversaw the program when I was a student was caught on camera saying this about changes made to the DEI program at the University of Tennessee.

You can find the original post on the Daily Wire's X page. Great work there guys bringing this to light. It is the tip of the iceberg in terms of how big this story is, but it is fabulous to see it beginning to break through the surface.

So pull up a chair and pass the popcorn. Things are gettin’ spicy.

On the Bookshelf

I took a break on listening to Your Consent is Not Required. I’ll be writing on what it describes in the weeks ahead, but I needed a moment away from that much sadness.

Accreditation on the Edge: Challenging Quality Assurance in Higher Education by Susan D. Phillips

The Case Against Education by Bryan Caplan

The Licensing Racket: How We Decide Who Is Allowed to Work, and Why It Goes Wrong by Rebecca Haw Allensworth

Moral Calculations: Game Theory, Logic and Human Frailty by Laszlo Mero

The New Know-nothings: The Political Foes of the Scientific Study of Human Nature by Morton Hunt

The New Oxford Annotated Bible with Apocrypha: New Revised Standard by Marc Brettler, Carol Newsom, Pheme Perkins

Surely You’re Joking, Mr. Feynman! Adventures of a Curious Character by Richard Feynman

We Have Never Been Woke: The Cultural Contradictions of the New Elite by Musa al-Gharbi

“Whatever It Is, I’m Against It”: Resistance to Change in Higher Education by Brian Rosenberg

Your Consent Is Not Required by Rob Wipond. ←— READ THIS BOOK!

Help Keep This Conversation Going!

Share this post on social media–it costs nothing but helps a lot.
Want more perks? Subscribe to get full access to the article archive.
Become a Paid Subscriber to get video and chatroom access.

Support from readers like you keeps this project alive!

Diogenes in Exile is reader-supported. If you find value in this work, please consider becoming a pledging/paid subscriber, donating to my GiveSendgo, or buying Thought Criminal merch. I’m putting everything on the line to bring this to you because I think it is just that important, but if you can, I need your help to keep this mission alive.

Already a Premium subscriber? Share your thoughts in the chat room.

About

Diogenes in Exile began after I returned to grad school to pursue a master’s degree in Clinical Mental Health Counseling at the University of Tennessee. What I found instead was a program saturated in Critical Theories ideology—where my Buddhist practice was treated as invalidating and where dissent from the prevailing orthodoxy was met with hostility. After witnessing how this ideology undermined both ethics and the foundations of good clinical practice, I made the difficult decision to walk away.

Since then, I’ve dedicated myself to exposing the ideological capture of psychology, higher education, and related institutions. My investigative writing has appeared in Real Clear Education, Minding the Campus, and has been republished by the American Council of Trustees and Alumni. I also speak and consult on policy reform to help rebuild public trust in once-respected professions.

Occasionally, I’m accused of being funny.

When I’m not writing or digging into documents, you’ll find me in the garden, making art, walking my dog, guiding my kids toward adulthood, or just randomly using the ‘g’ key on my keyboard.