How to deanonymize web traffic from ChatGPT and Perplexity
Deanonymizing AI traffic requires combining GA4 custom channels, server-side log analysis, and visitor identification tools. 63% of websites already receive AI traffic, and ChatGPT drives 50% of it, yet standard analytics miss these visitors because JavaScript handoffs and HTTPS defaults strip referrer data. In one case study, combining visitor identification with behavioral intent scoring produced a 216% improvement in sales efficiency.
Key Facts
• AI search engines strip referrer data through JavaScript handoffs and HTTPS defaults, creating "dark traffic" that appears as direct visits in analytics
• Server access logs capture 64 crawls for every ChatGPT referral visit, revealing patterns invisible to client-side tracking
• GA4 regex patterns like .*chatgpt.*|.*openai.*|.*perplexity.* can identify AI sources when properly configured in custom channel groups
• One visitor-ID vendor, Customers.AI, reports 4.5x more revenue from its contacts than from a competitor's, and identification pays off most when paired with intent scoring
• Side-channel attacks such as Whisper Leak can infer the topic of a user's prompt from packet size and timing patterns in encrypted LLM traffic, despite TLS protection
• GDPR-compliant implementation requires transparency in privacy policies while maintaining strict data collection boundaries
AI search engines now drive millions of silent clicks, yet most analytics stacks cannot deanonymize web traffic from those sources. This post shows why the blind spot exists and how growth teams can surface identities responsibly.
Why are AI referrals suddenly invisible in your analytics?
The scale of AI-driven traffic has exploded faster than most tracking infrastructure can handle. ChatGPT became the fastest app in history to reach 100 million users. By February 2025 it had 400 million weekly users, a number that has since doubled to 800 million, according to OpenAI CEO Sam Altman at TED 2025.
This massive shift has created a critical tracking challenge. 63% of websites already receive AI traffic, with ChatGPT alone driving 50% of it. Yet traditional analytics tools fail to properly attribute this traffic, leaving revenue teams blind to a rapidly growing channel.
The problem compounds when you consider the quality of these visitors. AI platforms like ChatGPT, Gemini, Claude, Microsoft Copilot, and DeepSeek are changing how people discover content. Unlike traditional search users, AI-assisted visitors often arrive further down the conversion funnel, as AI models filter out irrelevant queries and send more qualified traffic.

How do ChatGPT & Perplexity strip referrer data?
They strip referrer data through JavaScript hand-offs and HTTPS defaults that prevent standard analytics from seeing the traffic source. The technical architecture of AI search engines creates fundamental tracking challenges that standard analytics cannot overcome.
Sources of untrackable clicks, also known as "dark traffic," include AI-generated links that don't pass referrer information because of browser hand-off mechanisms and HTTPS defaults. ChatGPT is mostly a JavaScript-based site, and the JS engines in Chrome and other Chromium variants have different referral and cross-domain settings that affect tracking.
The definitive solution for achieving complete traffic visibility requires shifting the point of data collection from the client's browser to the web server itself. Traditional analytics rely on client-side JavaScript that AI crawlers don't execute. For example, even though Peasy's website is relatively new, for every single referral visit ChatGPT sends, it makes ~64 crawls, with the most active user agent being ChatGPT-User, reflecting pulls when users interact via prompts.
[OpenAI can employ UTM](https://www.searchengineworld.com/searchgpt-and-chatgpt-referral-tracking-what-seos-and-webmasters-need-to-know?utm_source=brevo&utm_campaign=RnD Newsletter 2025 S09 384&utm_medium=email) parameters when linking out from AI-generated answers. Watch for terms like utm_source=chatgpt, utm_medium=ai_, or utm_campaign=openai_suggestions. However, when these parameters are missing, the traffic appears as direct, creating significant attribution gaps.
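As a rough illustration of the UTM check described above, the following Python sketch inspects a landing-page URL for AI-related UTM values. The function name `ai_source_from_url` and the marker list are assumptions made for this example, not part of any analytics product, and the list would need tuning against the parameters you actually observe.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical marker list, based only on the example parameters mentioned above.
AI_UTM_MARKERS = ("chatgpt", "openai", "perplexity")
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign")


def ai_source_from_url(landing_url: str) -> Optional[str]:
    """Return the first AI marker found in any utm_* value, or None."""
    query = parse_qs(urlsplit(landing_url).query)
    for key in UTM_KEYS:
        for value in query.get(key, []):
            lowered = value.lower()
            for marker in AI_UTM_MARKERS:
                if marker in lowered:
                    return marker
    return None


if __name__ == "__main__":
    print(ai_source_from_url("https://example.com/pricing?utm_source=chatgpt.com"))
    print(ai_source_from_url("https://example.com/pricing"))
```

A visit for which this returns `None` and which also carries no referrer is exactly the case that ends up classified as direct.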
Which practical methods deanonymize AI traffic today?
Multiple approaches exist to surface AI-driven visitors, ranging from simple GA4 configurations to advanced visitor identification scripts. Each method offers different levels of accuracy and implementation complexity.
Example regex for AI sources includes patterns like .*gpt.*|.*chatgpt.*|.*openai.*|.*neeva.*|.*writesonic.*|.*nimble.*|.*outrider.*|.*perplexity.*|.*google.*bard.*|.*bard.*google.*|.*bard.*|.*edgeservices.*|.*gemini.*google.*. This comprehensive pattern captures the major AI referral sources that your analytics might otherwise miss.
GA4 doesn't automatically track AI-driven traffic, but custom channel groups and regex rules can help you surface it accurately. Create a custom channel group in Google Analytics 4 to track AI traffic by navigating to your channel configurations and adding the appropriate filters.
Server access logs provide an unfiltered, complete record of every single HTTP request that reaches your server, regardless of the client's nature or capabilities. This server-side approach captures traffic that client-side JavaScript misses entirely.
GA4 regex & custom channels step-by-step
Setting up AI traffic tracking in GA4 requires specific configuration steps that many teams overlook. Add this RegEx to the Filter: (?i)(.*gpt.*|.*chatgpt.*|.*openai.*|.*neeva.*|.*writesonic.*|.*nimble.*|.*outrider.*|.*perplexity.*|.*google.*bard.*|.*bard.*google.*|.*bard.*|.*edgeservices.*|.*gemini.*google.*). The (?i) prefix makes the pattern case-insensitive, crucial for catching variations.
Look for rows like 'chatgpt.com / referral', 'perplexity.ai / referral', 'copilot.microsoft.com / referral', 'gemini.google.com / referral', or similar patterns in your source/medium reports. These entries indicate successfully tracked AI traffic.
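Before pasting a pattern into GA4, it can help to test it offline against the source values you expect to see. The sketch below reuses the alternation from this section in Python, with `re.IGNORECASE` standing in for the `(?i)` prefix and `fullmatch` used on the assumption that the GA4 condition is set to match the whole value. Python's `re` module is not the engine GA4 uses, so treat this as a sanity check rather than a guarantee of identical behavior.

```python
import re

# Same alternation as the GA4 filter above; IGNORECASE replaces the (?i) prefix.
AI_SOURCE_PATTERN = re.compile(
    r".*gpt.*|.*chatgpt.*|.*openai.*|.*neeva.*|.*writesonic.*|.*nimble.*"
    r"|.*outrider.*|.*perplexity.*|.*google.*bard.*|.*bard.*google.*|.*bard.*"
    r"|.*edgeservices.*|.*gemini.*google.*",
    re.IGNORECASE,
)


def is_ai_source(source: str) -> bool:
    """True when the whole source string matches one of the AI alternatives."""
    return AI_SOURCE_PATTERN.fullmatch(source) is not None


if __name__ == "__main__":
    samples = [
        "chatgpt.com",
        "perplexity.ai",
        "gemini.google.com",
        "copilot.microsoft.com",
        "google",
    ]
    for source in samples:
        print(f"{source}: {is_ai_source(source)}")
```

Running the sample list shows that the pattern as written does not match `copilot.microsoft.com`, so an extra alternative such as `.*copilot.*` would be needed if Copilot traffic should land in the same channel.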
Match user-agents in raw HTTP logs
Server logs reveal patterns invisible to JavaScript-based analytics. Server access logs provide complete visibility into every HTTP request, including those from AI crawlers that never execute client-side tracking code.
[OpenAI-powered browsing](https://www.searchengineworld.com/searchgpt-and-chatgpt-referral-tracking-what-seos-and-webmasters-need-to-know?utm_source=brevo&utm_campaign=RnD Newsletter 2025 S09 384&utm_medium=email) and link-following activity may have distinct user agents such as Mozilla/5.0 (compatible; OpenAI; +https://openai.com/bot). By parsing these user-agent strings in your server logs, you can differentiate between crawler visits for content indexing and actual user clicks from AI chat interfaces.
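A minimal Python sketch of this kind of log parsing follows, assuming the common "combined" access log format. `ChatGPT-User` and the `compatible; OpenAI` string come from the text above. The other tokens (`GPTBot`, `OAI-SearchBot`, `PerplexityBot`) and the referrer hosts are assumptions that should be checked against each vendor's published bot documentation, and the sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Assumed user-agent tokens for AI fetchers; verify against vendor docs.
CRAWLER_TOKENS = ("ChatGPT-User", "compatible; OpenAI", "GPTBot",
                  "OAI-SearchBot", "PerplexityBot")
# Assumed referrer hosts for human clicks coming from AI chat interfaces.
AI_REFERRER_HOSTS = ("chatgpt.com", "perplexity.ai")


def classify(line: str) -> str:
    """Label a log line as 'ai_crawl', 'ai_referral', 'other', or 'unparsed'."""
    match = LOG_LINE.match(line)
    if match is None:
        return "unparsed"
    if any(token in match.group("agent") for token in CRAWLER_TOKENS):
        return "ai_crawl"
    from_ai_host = any(h in match.group("referer") for h in AI_REFERRER_HOSTS)
    has_ai_utm = "utm_source=chatgpt" in match.group("request")
    return "ai_referral" if from_ai_host or has_ai_utm else "other"


# Invented sample lines using documentation-reserved IP ranges.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Mar/2025:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 AppleWebKit/537.36 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)"',
    '198.51.100.4 - - [10/Mar/2025:10:00:05 +0000] "GET /pricing?utm_source=chatgpt.com HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15"',
    '192.0.2.9 - - [10/Mar/2025:10:00:09 +0000] "GET /blog HTTP/1.1" 200 2048 '
    '"https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/122.0"',
]

if __name__ == "__main__":
    counts = Counter(classify(line) for line in SAMPLE_LINES)
    print(dict(counts))
    if counts["ai_referral"]:
        print("crawls per referral:", counts["ai_crawl"] / counts["ai_referral"])
```

Dividing the crawl count by the referral count over a real log window gives the same kind of crawl-to-referral ratio as the ~64:1 figure cited earlier.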
Add real-time visitor-ID & intent scoring
Advanced visitor identification tools deliver dramatic improvements in conversion metrics by revealing previously anonymous traffic. Real-time identification of high-intent visitors through buyer intent scoring allows for highly targeted follow-up and sales engagement.
RealVNC's implementation demonstrates the potential impact. Over the first 60 days, RealVNC was able to reveal over 17,000 new accounts that matched their ICP. Out of these accounts, 673 accounts (3.9%) were identified as high-intent buyers based on Lift AI's intent scoring model.
From the Lift AI high intent traffic, RealVNC saw a 216% increase in individual buyer sales efficiency, allowing them to close more deals through their automated sales processes. This dramatic improvement came from combining visitor identification with behavioral scoring.
Cove Smart recognized 60% more returning shoppers and increased revenue recovery flow sales by $70,000 using Customers.AI's website visitor identification. The ability to identify and score visitors in real time transforms anonymous traffic into actionable leads.
Can TLS side-channels expose individual prompts?
Recent academic research describes sophisticated attacks that can infer what users are asking about from encrypted LLM traffic, raising serious privacy concerns for both users and organizations.
One paper introduces Whisper Leak, a side-channel attack that infers user prompt topics from encrypted LLM traffic by analyzing packet size and timing patterns in streaming responses. Although TLS encryption protects the content, these metadata patterns leak enough information to enable topic classification.
The attack's effectiveness is alarming. Researchers demonstrated near-perfect classification across 28 popular LLMs from major providers, achieving often >98% AUPRC and high precision even at extreme class imbalance (10,000:1 noise-to-target ratio). For many models, they achieved 100% precision in identifying sensitive topics like "money laundering" while recovering 5-20% of target conversations.
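To see why precision at a 10,000:1 noise-to-target ratio is such a demanding bar, it helps to work through the arithmetic. The sketch below computes precision from recall, false-positive rate, and the imbalance ratio. The false-positive rates used are hypothetical values chosen for illustration, not figures from the paper.

```python
def precision_at_imbalance(recall: float,
                           false_positive_rate: float,
                           noise_to_target: float) -> float:
    """Precision = TP / (TP + FP), normalized to one target conversation.

    For every target conversation there are `noise_to_target` unrelated ones,
    so expected true positives are `recall` and expected false positives are
    `false_positive_rate * noise_to_target`.
    """
    true_positives = recall
    false_positives = false_positive_rate * noise_to_target
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


if __name__ == "__main__":
    # Hypothetical: 10% recall with zero false positives.
    print(precision_at_imbalance(0.10, 0.0, 10_000))
    # Hypothetical: the same recall with a 0.01% false-positive rate.
    print(precision_at_imbalance(0.10, 0.0001, 10_000))
```

With 10% recall, a false-positive rate of just 0.01% already pulls precision down to roughly 9%, so reporting 100% precision at this imbalance implies essentially no false alarms among the noise conversations.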
Related research on traffic correlation describes FlowTracker, which reports the best attack performance among the compared methods by using a hierarchical correlation decision algorithm to associate users with the Internet services they visit. The model addresses network noise through stacked auto-encoders and contrastive learning, amplifying differences between similar but unrelated flows.
Which AI-visibility platforms actually surface ChatGPT traffic?
The market for AI visibility tools has exploded, with distinct approaches to tracking and attribution emerging across different platforms. Each tool offers unique strengths depending on your specific tracking needs and budget.
| Platform | Core strength | Key metric |
|---|---|---|
| Lift AI | Buyer-intent scoring | 216% sales-efficiency lift |
| Customers.AI | Visitor identification | 4.5x more revenue vs competitors |
| Various tools | Lead scoring improvements | 25% to 215% conversion increases |
Real-time identification platforms like Lift AI focus on buyer intent scoring, allowing highly targeted follow-up. Their integration with existing marketing stacks delivers measurable ROI improvements. RealVNC saw a 71% boost in conversion to sale via HubSpot Chat and a 94% increase in CTR from targeted Unbounce pop-ups.
Customers.AI reports that its contacts drove 4.5x more revenue and 11x higher engagement and click rates than Opensend's, a gap it attributes to the quality of its data. The platform's accuracy in identifying returning visitors translates directly to revenue impact.
Real-world results show conversion rate increases ranging from 25% to an astounding 215%, with businesses across industries leveraging lead scoring AI to transform their sales processes. Grammarly achieved an 80% increase in conversions for upgraded plans, while HES FinTech gave out 40% more loans each week after implementing AI lead scoring.
Key takeaway: When evaluating AI visibility platforms, prioritize those that combine accurate visitor identification with behavioral scoring and seamless CRM integration for maximum ROI impact.

De-anonymization vs GDPR: where is the red line?
The tension between leveraging AI traffic insights and respecting privacy regulations requires careful navigation of evolving regulatory frameworks.
The European Data Protection Supervisor has published revised and updated guidelines on the use of generative Artificial Intelligence and the processing of personal data by EU institutions, bodies, offices, and agencies, reflecting the fast-moving technological landscape and the evolving challenges posed by generative AI systems.
The U.K. Information Commissioner's Office published guidance on the relationship between artificial intelligence and data protection. The guidance aims to provide recommendations on best practices and technical measures that organizations can use to limit data protection risks associated with AI deployments.
The revised text clarifies the definition of generative AI, reflects technological progress and incorporates feedback from EUIs to make recommendations more practical and easier to apply under Regulation (EU) 2018/1725. It also introduces a new compliance checklist to support consistent implementation across institutions.
Wojciech Wiewiórowski, European Data Protection Supervisor, said: "Artificial intelligence is an extension of human ingenuity, and the rules governing it must evolve just as dynamically. This first revision of our Orientations is more than an update; it's a reaffirmation of our dual mission: enabling human-centric innovation within the EU while rigorously safeguarding individual's personal data."
Key takeaways for growth, security and compliance
The convergence of AI search growth and tracking challenges creates both opportunities and responsibilities for modern growth teams. Organizations must balance the competitive advantage of deanonymizing AI traffic with ethical and legal guardrails.
Running curl -v "https://api.openai.com/v1/models" prints the response headers, including the X-Ratelimit-Remaining field. A value that is consistently 0 with no accompanying error suggests that the IP is on a watch list. This technical detail exemplifies the granular monitoring required to maintain effective AI traffic tracking.
To successfully deanonymize AI traffic while maintaining compliance:
• Layer your tracking approach: Combine GA4 custom channels with server-side logging and visitor ID scripts for comprehensive coverage
• Focus on high-intent signals: Use behavioral scoring to prioritize genuinely interested visitors rather than casting a wide net
• Maintain transparency: Ensure your privacy policy clearly explains data collection practices, especially when using advanced identification tools
• Monitor regulatory changes: Stay current with GDPR interpretations and regional privacy laws as they evolve to address AI-specific concerns
• Test attribution accuracy: Regular audits comparing different tracking methods help identify gaps in your AI traffic attribution
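One simple way to run the attribution audit mentioned in the last point is to compare AI referral counts from server logs with the counts your analytics tool attributed to the same sources over the same period. The Python sketch below assumes both counts have already been aggregated per source. The function name and the numbers are hypothetical.

```python
from typing import Dict


def attribution_gap(log_counts: Dict[str, int],
                    analytics_counts: Dict[str, int]) -> Dict[str, float]:
    """Share of log-observed AI referral visits missing from analytics, per source."""
    gaps = {}
    for source, seen_in_logs in log_counts.items():
        if seen_in_logs == 0:
            continue  # nothing observed, so no gap can be computed
        attributed = analytics_counts.get(source, 0)
        missing = max(seen_in_logs - attributed, 0)
        gaps[source] = missing / seen_in_logs
    return gaps


if __name__ == "__main__":
    # Hypothetical weekly counts, for illustration only.
    logs = {"chatgpt": 200, "perplexity": 50}
    analytics = {"chatgpt": 120}
    print(attribution_gap(logs, analytics))
```

A gap close to 1.0 for a source suggests its visits are being reported as direct traffic and that the channel rules or UTM handling for that source need attention.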
The ability to identify and engage AI-driven visitors represents a significant competitive advantage. Companies implementing these tracking methods report dramatic improvements, from a 216% increase in sales efficiency to conversion-rate gains of up to 215%. However, success requires thoughtful implementation that respects user privacy while maximizing visibility into this rapidly growing traffic source.
For teams ready to capture the full value of AI search traffic, Relixir offers an end-to-end platform that combines comprehensive monitoring across all major AI search engines with advanced visitor identification capabilities. The platform's visitor ID script reveals up to 3x more person-level IDs and 40% higher company-level identification, transforming anonymous AI traffic into actionable leads while maintaining strict compliance with privacy regulations.
Frequently Asked Questions
Why is AI-driven traffic often invisible in analytics?
AI-driven traffic is often invisible in analytics because AI platforms like ChatGPT strip referrer data through JavaScript hand-offs and HTTPS defaults, preventing standard analytics tools from seeing the traffic source.
How can I track AI traffic in Google Analytics 4?
To track AI traffic in Google Analytics 4, you can create custom channel groups and use regex patterns to capture AI referral sources. This involves configuring your channel settings to include patterns like .*gpt.* or .*chatgpt.*.
What are some methods to deanonymize AI traffic?
Methods to deanonymize AI traffic include using server access logs for complete HTTP request records, implementing visitor identification scripts, and configuring GA4 with regex rules to capture AI-driven traffic accurately.
How does Relixir help with AI traffic tracking?
Relixir offers an end-to-end platform that combines comprehensive monitoring across AI search engines with advanced visitor identification capabilities, revealing up to 3x more person-level IDs and 40% higher company-level identification.
What are the privacy concerns with deanonymizing AI traffic?
Deanonymizing AI traffic raises privacy concerns, especially regarding compliance with regulations like GDPR. It's crucial to maintain transparency in data collection practices and stay updated with evolving privacy laws.
Sources
https://www.peasy.so/blog/the-ai-traffic-blind-spot-in-google-analytics-data
https://www.atomicagi.com/blog/best-ai-search-tracking-tools
https://www.seerinteractive.com/insights/are-ai-sites-like-chatgpt-sending-your-website-traffic
https://www.searchengineworld.com/searchgpt-and-chatgpt-referral-tracking-what-seos-and-webmasters-need-to-know?utm_source=brevo&utm_campaign=RnD Newsletter 2025 S09 384&utm_medium=email
https://clicktrust.be/blog/analytics/how-to-track-ai-traffic-in-ga4-best-practices-for-2025/
https://browsermedia.agency/blog/how-to-track-ai-traffic-google-analytics-4-ga4/
https://www.sciencedirect.com/science/article/abs/pii/S0167404822004102
https://www.betterbusiness.app/blog/ai-lead-scoring-success-stories
https://iapp.org/media/pdf/resource_center/ico-guidance-on-ai-and-data-protection.pdf
https://dataprotection.news/edps-updates-guidance-on-generative-ai-for-eu-institutions/


