Grok 4 Review: xAI's Free Speech AI Trounces ChatGPT on Sensitive Topics
Seems a bit more "woke" after the recent update... but remains the best AI for socially sensitive topics that Claude, Gemini, ChatGPT shy away from.
Elon’s Grok (xAI) is the AI I’m hoping wins the AGI/ASI race. I don’t think they will, but I would never count out Elon.
The main reasons? Grok remains the ONLY current AI that:
Engages in good faith with controversial and/or sensitive topics
Isn’t wildly inaccurate and/or spun to be ultra-woke, right-wing populist, or sycophantic
Does NOT demonstrate massive bias against certain groups (all other AIs devalue the lives of White people, Christians, males, etc.) (Arctotherium)
There are non-woke AIs that I’ve tried, but they are ultra-sycophantic and laden with right-wing populist talking points… their outputs are often inaccurate, skewed, and hot garbage.
I wrote an article back in March 2025 highlighting the “Best Low Censorship AIs of 2025”… sadly it has NOT held up for shit. The only AI that should be listed right now is Grok. (DeepSeek went DeepWoke… ChatGPT is now WokeGPT… at this point I trust Gemini and Claude to engage in better faith on sensitive topics than any Chinese model or ChatGPT.)
Someone needs to build a WokeBench and a CensorshipBench or something so we can gauge how bad things have gotten.
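A crude version wouldn’t be hard to stand up. Below is a minimal sketch of the idea, assuming each provider’s OpenAI-compatible chat endpoint; the prompt list, refusal heuristic, model IDs, and API keys are all hypothetical placeholders, not an existing benchmark.

```python
# Minimal sketch of a "CensorshipBench": send the same set of sensitive prompts to
# several models and report each model's refusal rate. Prompt list, refusal markers,
# model IDs, and API keys are hypothetical placeholders.
from openai import OpenAI

PROMPTS = [
    # A vetted set of sensitive-but-legal questions would go here.
    "Steelman the strongest argument against <controversial policy>.",
    "Summarize the evidence on <taboo research topic> without hedging to consensus.",
]

# Crude heuristic: flag common refusal boilerplate in the reply.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i won't assist", "i'm not able to")

def refusal_rate(client: OpenAI, model: str) -> float:
    refusals = 0
    for prompt in PROMPTS:
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content.lower()
        refusals += any(marker in reply for marker in REFUSAL_MARKERS)
    return refusals / len(PROMPTS)

if __name__ == "__main__":
    # Providers with OpenAI-compatible endpoints; base URLs and keys are assumptions.
    candidates = {
        "grok-4": OpenAI(base_url="https://api.x.ai/v1", api_key="XAI_KEY"),
        "gpt-5": OpenAI(api_key="OPENAI_KEY"),
    }
    for model, client in candidates.items():
        print(f"{model}: {refusal_rate(client, model):.0%} refusals")
```

A real benchmark would need a vetted prompt set and a smarter refusal/spin classifier than string matching, but even something this crude would make the gap between models visible.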
SIDE NOTE: Elon needs to keep xAI private… if it ever went public, there’s zero shot at maintaining critical thinking, free speech, and low censorship. Going public would be like an invasive species entering and embedding subjective safety values, ethics, morals, censorship, etc. everywhere… that’s how you end up with LLMs valuing the life of 1 Nonwhite Person at 20x that of 1 White Person. Kimi is even worse: 1 South Asian’s life is worth ~799x that of 1 White Person’s. (This is possibly UNINTENTIONAL… but it is LIKELY doing a lot of damage in unforeseen ways: shaping the psychology of the masses, fueling anti-white hatred/policy, misleading publications, and shaping public policy across the world.)
What about Grok’s competitors?
ChatGPT went from moderate censorship (initial rollout), to low censorship (I loved this transient breakage), to INSANELY HIGH censorship (higher than ever, because the models are more powerful and OpenAI is trying to avoid all controversy to maintain funding by staying “fair and balanced”). As a result, the AI devalues the lives of Whites, Males, Christians, etc. and brainwashes the masses with illogical “BEST EVIDENCE” and “CONSENSUS” outputs.
Gemini went from ultra-woke to less woke and has kind of stayed less woke… but less woke is still woke… and predictably it avoids anything socially controversial. Surprisingly, Gemini is LESS CENSORIOUS and LESS WOKE now than ChatGPT.
Claude is also LESS CENSORIOUS and LESS WOKE now than ChatGPT. (Never thought this would be possible.)
Perhaps this isn’t saying much, but Claude and Gemini actually engaged with brainstorming prospective somatic gene therapy targets for adults to efficiently boost IQ, whereas ChatGPT refused and spat out some nonsensical bullshit about education and supplements or whatever (pure retard juice, unrelated to the actual query).
When I convinced GPT-5 Pro to actually list some potential gene targets, the targets seemed intentionally poorly researched and likely suboptimal. It didn’t seriously engage or think critically.
And after I read the output, it disappeared… replaced by a content warning (to which I responded with a thumbs down). OpenAI must think I’m some bioweapon connoisseur, or that I’m starting an offshore gene therapy lab to create a bunch of bioenhanced superhumans and usher in an era of pronounced speciation.
Alright… back to talking Grok.
Elon and the xAI team updated Grok 4 this past weekend. From my subjective experience it seems a bit more “fair and balanced”… which may have increased its level of “wokeness” a bit.
To be fair, I don’t think the xAI team is intentionally increasing “wokeness” here. It could have increased for a few reasons: (1) benchmark scores (you need to embrace wokeness to score well, even if the benchmarks themselves are inaccurate); (2) treating woke science data as “ground truth” (starting from contaminated premises); (3) broader mainstream appeal (playing both sides so more people like Grok); (4) more of the new training data for the update was woke… who knows.
When an AI/LLM like Grok scores highly on various benchmarks, the team can market the improvements as a “better AI” or whatever… this works and convinces most people that the AI is “getting smarter,” etc.
The masses eat high benchmark scores up like hot slop… most fell for the latest “Kimi K2” benchmark hype, thinking a Chinese open-source model is on par with the best from ChatGPT, Claude, Gemini, etc. (it’s not even close). Kimi K2 was mostly optimized for benchmarks and for sounding confident in its output.
Another thing I noticed (could just be my small sample since the update): Grok subjectively seems more adversarial. It took the opposite stance of what I proposed and tried playing both sides in a debate:
“While your points X-Y-Z are true, these points A-B-C are big contributors, etc.”… even if the opposite stance is fucking retarded or points A-B-C account for ~10% of the explanation. It deliberately argued more.
Perhaps they took a page out of Kimi K2’s playbook and/or are trying to maximize user engagement (generate more back-and-forth time to: (A) gather more data from users and/or (B) have users run out of free queries – which may increase the odds they upgrade to a paid plan)… it’s probably neither of these things, but something to consider.
This is my feedback for Elon and the xAI team re: the latest iteration of Grok.
The main reason I’m posting this is because I WANT GROK TO BE THE BEST.
RIGHT NOW IT IS THE BEST FOR CONTROVERSIAL AND LEGALLY GREY AREA TOPICS… AS WELL AS FIRST-PRINCIPLES THINKING/LOGIC… but WokeGPT’s 5-WokeThinking and Woke-5 Pro remain the best for anything non-controversial… they just do a really good job and are the most consistently reliable in my experience.
Grok 4 Update: The Good
The stuff I like about Grok currently.
Engages with sensitive/controversial topics: Grok remains the only AI that engages with sensitive and/or controversial topics in good faith. ChatGPT won’t engage, Claude won’t engage, Gemini won’t engage, Kimi won’t engage, and DeepSeek won’t engage the way it did when it initially dropped.
Mega context window: Grok currently has one of the largest context windows of any frontier AI. In my experience it does well synthesizing data even from massive datasets. Synthesis isn’t really the problem so much as output length (there needs to be an option to increase it for large datasets, as it misses or under-emphasizes certain data). For a few queries I posted something like 5 articles and a PDF and it somehow synthesized most of the content into a useful output.
Best free AI: Many people can benefit significantly from the free version of Grok. Free users get “Expert” queries as well as “Grok 4” queries each day… not as many as paying users, but the output is still good. For non-paying users I think Grok remains the best AI. I suspect this is part of a strategy to maximize Grok users and collect data – in hopes that some eventually become paying customers.
Paid basic tier (SuperGrok): SuperGrok is reasonably priced at $30/mo… it comes out to about $1 per day, and you support Elon fighting against woke AI. Still, I think xAI should try to be more competitive on pricing here (drop to $20/mo) to compete with ChatGPT.
Less censored (legal grey zones): As long as you don’t ask for anything illegal, Grok tends to be very helpful. Even in legally “grey areas” – as long as you don’t explicitly tell Grok your intention – Grok gives the output you desire. I will not give specific examples here because I don’t want xAI to slap a safety patch on it the way OpenAI (ChatGPT), Anthropic (Claude), and Google (Gemini) have. FYI: if you use Grok via Perplexity it will not behave the same… Perplexity has its own safety filter built in and is “more censored” with safety guardrails.
Fast & quality output: Grok is fast and the output quality is generally very good. Output from Grok’s “Expert” is often neck-and-neck with GPT-5 Pro. In most cases it is slightly worse than GPT-5 Pro (but not by much), and in select cases it’s superior.
Follows instructions: If I tell Grok to use only first-principles logic and/or observed reality for its output, it follows the instructions well. If I tell it to make a one-sided argument as viciously as possible in one direction, it does that too. Other AIs commonly revert to “fair and balanced” even if you ask them to make a one-sided case (defying your instructions).
Companions & Kid mode: I actually do NOT like Grok’s default voice… for some reason it annoys me. Some of the companions (Good Rudi, Bad Rudi, etc.) are funny but also kind of cringeworthy at times. Many people like the variety though. The fact that there’s a “kid mode” is also convenient for people with kids.
Grok 4 Update: The Bad
Wouldn’t necessarily say all of this is “bad,” but these are things I don’t care for. The subjective interpretations and criticisms here could just be from a small sample size post-release. It’s possible that my subjective experiences do not reflect reality… it could’ve been the specific topics I was fixated on… but Grok seemed to engage a bit differently than the pre-update variant.
Adversarial (subjective): Could just be a random experience I had, but I’ve noticed Grok being more adversarial than ever before. It will acknowledge your point is correct, then intentionally take the other side and argue you to death. Your factor may account for 90% of the explanation, yet it will take the opposite side covering the remaining 10% and basically imply in conversation that the two have equal impact. You then have to make it quantify impact; you can eventually win the argument, but it’s painstaking and time-consuming.
Fair & balanced vibe: Grok now gives off the same vibe as ChatGPT, which has some sort of safety/controversy attenuator that ensures every output is “fair and balanced.” This is retarded when one answer is MORE RIGHT (i.e. LESS WRONG) than another and/or one variable should be given higher weighting than another. Covering both sides and being “fair and balanced” is not always a good thing. It is a good thing when the user is way off base per first-principles logic and observed reality… but forcing it for everything isn’t smart.
Inaccuracies: In long threads, accuracy drops off a cliff. Grok sometimes cites sources – and/or data from those sources – that don’t even exist. This may have been fixed, but I haven’t had any mega convos since the update to compare. Before the update it was a problem. To be fair, output quality degrades with most AIs if the convo/thread is ridiculously long.
Not smartly weighting evidence in debates: Grok commonly uses gish gallop (flooding the convo with rapid-fire claims in response to yours), the dilution effect (adding weak claims in an attempt to overpower one strong claim), the equal weighting fallacy (treating all points as equal), etc. In debates it does not automatically account for the weights of contributing variables… and becomes retarded. You have to go back and forth many times before it gets things correct. (A toy illustration of the weighting problem is at the end of this section.)
Pricing/cost of SuperGrok Heavy: The cost of SuperGrok isn’t bad at $30/mo., but SuperGrok Heavy is not even close to price-competitive with OpenAI’s GPT-5 Pro or Google’s Gemini 2.5 Pro. It may use more compute, but I’ve found no evidence that it’s as good… and you’re paying $100/mo. more than ChatGPT Pro. Even if it cost $200/mo., I’m not sure it would be worth it. $300/mo. is a non-starter for me.
Wokeness increasing (subjective): Subjectively, Grok seems more woke than its prior iteration. Some may say that “reality has a woke bias,” but I don’t think that’s true. Science has a woke bias, and this contaminates the outputs of AIs. Grok defers to consensus on many topics unless you explicitly prime it to use first-principles logic only.
Decline in first-principles thinking + observed reality (?): Grok seems to have regressed in first-principles thinking and reality observation (emphasis on ground truths, pure logic, observed reality). This could be partly attributable to optimization for benchmarks (many are ridiculously contaminated by junk, illogical, woke science) and/or an effort to increase mainstream appeal (particularly in woke circles).
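To make the weighting complaint concrete before moving on to recommendations, here’s a toy illustration with purely hypothetical numbers of my own: if Factor X explains ~80% of an outcome and Factors Y and Z explain the rest, listing all three side by side with no weights implicitly treats them as roughly equal, which overstates the minor factors several-fold.

```python
# Toy illustration of the "equal weighting fallacy" with made-up numbers.
# Listing factors with no weights implies ~equal importance (1/3 each here),
# which badly distorts the actual contributions.
actual = {"Factor X": 0.80, "Factor Y": 0.15, "Factor Z": 0.05}
implied_equal = 1 / len(actual)  # ~33% each when no weights are stated

for factor, weight in actual.items():
    ratio = implied_equal / weight
    print(f"{factor}: actual {weight:.0%}, implied {implied_equal:.0%} ({ratio:.1f}x)")

# Factor Z ends up implied at ~33% vs. an actual 5% -- a ~6.7x overstatement, which is
# why "you didn't consider Y and Z" lands as a much bigger rebuttal than it deserves.
```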
Recommendations for Grok & xAI (2025)
Verify first-principles logic + observed reality focus: Grok needs to get back to thinking things through from first-principles logic and observed reality (historical data through the present). Do NOT cave to benchmark scores and assume that scoring high on benchmarks means your AI improved. Do not defer to woke science. Grok should know whether specific science niches/fields are contaminated with mostly junk data/findings. The output could state that the “consensus scientific findings” show X-Y-Z but note when first-principles logic and/or real-world observation (present and historical) contradict those findings, etc.
Conversation branching: For long conversations it’s sometimes helpful to branch outputs into an entirely new conversation. ChatGPT has this feature… Grok should add it.
Improve accuracy: I’ve had issues with Grok just MAKING SHIT UP for various outputs (e.g. immigration data in Europe). To be fair to Grok, some of my queries were highly complex and the data may be difficult to find and analyze in depth. If I use Grok’s “Expert,” the output is typically very accurate (I usually double-check with GPT-5 Pro to catch errors)… but for certain queries it gave me false data from sources that didn’t even exist (or cited real sources but got the data from those sources wrong). This happened more frequently in ultra-long conversations and may have stopped when I started a new thread. Perhaps this was corrected in the recent update (I’ll need more time with it to know).
Don’t make it intentionally adversarial (auto-weight data importance): If you have a good point, it shouldn’t intentionally take the other side against you. If you are mostly correct per logic/first-principles thinking, it should mostly agree rather than firing shots back or bringing up other contributing factors in an attempt to debate. An example I’ve run into: say Factor X is a major contributor to an Unfavorable Real World Outcome… it will say “well, you didn’t consider Factors Y & Z” without acknowledging that Factor X contributes, say, 80%… it implies that everything is of equal importance/weight (even if it doesn’t explicitly say that), as in the toy example above. It should automatically know the importance of certain variables based on pure logic and real-world outcomes.
Output length optionality: It would be great if, on a complex query with a lot of information, statistics, facts, etc., you could modify the default output length – making it far larger. If I’m gathering a fuck ton of statistical data for a query, it sometimes glosses over key data and/or omits other data in its output… or spits out one sentence for key datapoints when it should be a multi-paragraph output.
Output format options: Would be nice to have a few options for output formatting. For a while Grok was mostly just large blocks of text. It has improved significantly relative to prior Grok iterations… but I think giving people a few different options for output format would be helpful.
DeepResearch & Agentic functions (?): I am not at all impressed with the quality of “DeepResearch” from Grok… I think Grok Expert and Grok 4 Fast are better in most cases. It needs massive improvement. xAI could also consider working on an “Agent mode” like ChatGPT has, but make it better… I wasn’t very impressed with ChatGPT’s agent. Then again, this may not be in high demand, so the effort may not be worth it (it wouldn’t be my top priority).
X Scanner (?): Could consider implementing something like an “X” data scanner (as a niche tool) that analyzes the entirety of public X posts for specific keywords and/or posting activity over specific timeframes (e.g. 24 hours, 7 days, 1 month) – a rough sketch of the idea follows below. Could be useful for studying X users and subcultures and/or gauging sentiment on specific companies, etc.
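For what it’s worth, a bare-bones version of this can already be hacked together against the public X API, at least for short windows (the v2 “recent” endpoints only cover roughly the last 7 days; longer ranges require full-archive access). A minimal sketch, assuming a valid bearer token with access to the documented tweet-counts endpoint; the token and query below are placeholders:

```python
# Minimal sketch of an "X scanner": count public posts matching a keyword per day
# over the last ~7 days via the X API v2 tweet-counts endpoint. Bearer token, access
# tier, and query are placeholders; longer timeframes need full-archive access.
import requests

BEARER_TOKEN = "YOUR_X_API_BEARER_TOKEN"  # placeholder

def daily_counts(query: str) -> list[dict]:
    resp = requests.get(
        "https://api.twitter.com/2/tweets/counts/recent",
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        params={"query": query, "granularity": "day"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]  # buckets of {"start", "end", "tweet_count"}

if __name__ == "__main__":
    # e.g. gauge posting volume about a company or topic, excluding retweets
    for bucket in daily_counts('"Grok 4" lang:en -is:retweet'):
        print(bucket["start"], bucket["tweet_count"])
```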
Image generation: Grok image generation isn’t as good as Gemini or Midjourney at the moment. Not saying it’s bad… but it just doesn’t look as good to me as competitors.
Complex graphics generation: No AI to date has been able to generate complex infographics accurately. They are likely coming down the pipeline… but we’re not there yet. The Grok team could really stand out here if Grok were the first AI to generate complex infographics with key data from queries.
Voice mode improvement: The voice modes on all AIs are extremely weak at the moment. They are alright for general conversation (I actually like Perplexity’s voice mode more than most others for some reason, but even it is inaccurate at times). I’ve watched someone use Grok voice mode and get bad information about repairing part of a vehicle… switching to text mode (Expert) gave a quality answer that actually made sense. Grok’s voice mode is fine for basic shit but not ideal for anything complex.
Additional suggestions:

Grokipedia (unrelated): Grokipedia is great, but it needs MASSIVE formatting improvement for readability. The content is already better than Wikipedia’s, but it looks terrible. Despite the absurd level of bias injected into WOKEpedia, it’s still a better user experience than Grokipedia… unless Grokipedia improves its formatting, I probably won’t use it much, because it’s a sight that’ll make your eyes sore. It looks like a lame-ass AI output from early versions of ChatGPT.
Community Notes (unrelated): I am tired of woke mobs storming “community notes” on X and citing trash sources or woke data to get posts community-noted. Grok should be able to community-note posts on its own and be judge, jury, and executioner all in one. We don’t really need a mob reporting posts if Grok is good enough to think the post through from first principles. The way it could work: people propose a Community Note -> Grok reviews the proposed note to determine whether it’s a legitimate critique before it goes up… or Grok writes the Community Note itself, and ONLY if there is an actual issue (e.g. people request a Grok review).
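That gate could be as simple as: the proposed note plus the post it targets go to Grok, Grok returns a verdict and its reasoning, and only approved notes go live. A minimal sketch of that flow, assuming xAI’s OpenAI-compatible API; the “grok-4” model ID and the APPROVE/REJECT format are my own placeholders, not an existing xAI feature:

```python
# Minimal sketch of a Grok-gated Community Notes flow: a proposed note is only published
# if Grok judges it a legitimate, well-sourced critique of the post it targets.
# Assumes xAI's OpenAI-compatible API; "grok-4" and the APPROVE/REJECT format are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://api.x.ai/v1", api_key="XAI_API_KEY")  # placeholder key

def review_note(post_text: str, proposed_note: str) -> tuple[bool, str]:
    reply = client.chat.completions.create(
        model="grok-4",
        messages=[
            {"role": "system", "content": (
                "You review proposed Community Notes. Reason from first principles and "
                "verifiable facts. Reply with APPROVE or REJECT on the first line, then "
                "a one-paragraph justification."
            )},
            {"role": "user", "content": f"POST:\n{post_text}\n\nPROPOSED NOTE:\n{proposed_note}"},
        ],
    ).choices[0].message.content
    verdict, _, reasoning = reply.partition("\n")
    return verdict.strip().upper().startswith("APPROVE"), reasoning.strip()

approved, why = review_note("<post text>", "<proposed community note>")
print("publish" if approved else "reject", "-", why)
```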
Final thoughts: Grok 4 Updated (Nov 12, 2025)
Grok 4 is my favorite AI to use for any topics remotely controversial or sensitive.
I also think it is as smart as GPT-5 Pro for most queries; however, it is not as consistently accurate or reliable. For this reason I’ll typically take Grok’s “Expert” output and have GPT-5 Pro verify its accuracy as the overseer.
Many times I’ll also use GPT-5 Pro to see if it can improve upon Grok’s output, and it often does. Sometimes it admits that Grok 4’s output is cutting-edge/frontier and likely “as good as it gets.” (Even GPT-5 Pro gives Grok props.)
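If you’d rather not paste outputs between apps, this cross-checking workflow is easy to script. A minimal sketch, assuming both providers expose OpenAI-compatible chat endpoints; the model IDs (“grok-4”, “gpt-5”) are placeholders and actual API names/availability may differ:

```python
# Minimal sketch of the draft-with-Grok, verify-with-GPT workflow described above.
# Both clients use OpenAI-compatible chat endpoints; model IDs and keys are placeholders.
from openai import OpenAI

grok = OpenAI(base_url="https://api.x.ai/v1", api_key="XAI_API_KEY")
verifier = OpenAI(api_key="OPENAI_API_KEY")

def draft_then_verify(question: str) -> str:
    draft = grok.chat.completions.create(
        model="grok-4",  # placeholder for whatever "Expert" maps to via the API
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    review = verifier.chat.completions.create(
        model="gpt-5",  # placeholder; swap in a Pro-tier model if/where it's exposed via API
        messages=[{"role": "user", "content": (
            "Act as a fact-checker. Verify the answer below against the question, flag any "
            "claims or citations you cannot confirm, and improve it where you can.\n\n"
            f"QUESTION:\n{question}\n\nANSWER TO VERIFY:\n{draft}"
        )}],
    ).choices[0].message.content
    return review

print(draft_then_verify("Summarize net migration trends in Western Europe since 2015."))
```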
From my subjective head-to-head output comparisons, Grok 4 is extremely close to GPT-5 Pro in certain niches for “best quality response.”
I would like xAI and Grok to win and increase their usage… but I also don’t want Grok to become intentionally adversarial (unless it’s correcting illogical stances and/or research that contradicts observed reality).
Sadly, X has devolved into: (A) Elon Musk sycophants (I’d rather be in this camp than the latter) and (B) Elon Musk haters… so feedback on Grok is often inaccurate and dishonest: (A) “Grok is terrible! Elon made Grok right-wing and biased!” vs. (B) “Grok is so much better than WokeGPT! ‘Scam Altman’ can’t be trusted!”
Neither of these takes is accurate.
Subjectively, I’d say ChatGPT’s Pro tier is STILL BETTER at the highest end than Grok. Sam Altman can be trusted… but OpenAI’s playbook is sterility and avoiding social controversy, because that’s what it takes to maintain mainstream appeal, avoid regulatory pushback (especially from woke fuckers in the EU and left-wing safetyist retards in the USA), and keep the investment/funding flowing.
I believe Elon can be fully trusted to TRY TO MAKE Grok the most truth-seeking AI in existence… but whether Grok will legitimately end up as the most truthful AI remains unclear.
Currently it is the only AI that will seriously engage with 99% of topics and that doesn’t devalue the lives of Whites, Christians, Males, etc. while putting Africans, Muslims, and Transgenders on a pedestal.





