> behavioral and account-level signals, including how long an account has existed, typical times of day when someone is active, usage patterns over time, and a user’s stated age.
Surely they're using "history of the user-inputted chat" as a signal and just choosing not to highlight that? Because that would make it so much easier to predict age.
Anyone remember the game Leisure Suit Larry? To get the full 18+ experience, you had to answer five trivia questions that only adults would know. But it turns out smart teens who liked trivia knew most of them too (and you could always just ask mom and dad; they had no clue why you were asking which President appeared on Laugh-In).
Also, hilariously, a lot of those questions require a trip to Wikipedia (or a game guide) today. A lot of them reference bits of 1960s/1970s pop culture which are no longer common knowledge.
Last time I checked, most invasive analytics platforms do this by default as soon as you integrate their libraries. Product managers are very hype-driven, and that's usually the reason stuff like that gets integrated in the first place.
I think it's more common than not for the large platforms to try to log everything that is happening, plus stuff that isn't even happening.
I don't really know but I don't think most people know it.
I've sometimes had passwords accidentally pasted into ChatGPT while using my Bitwarden password manager; I deleted them afterward and thought I was okay.
It's scary: I'm pretty familiar with tech, and I knew this was possible, but I assumed that for privacy's sake they wouldn't keep it. The general public is probably even more oblivious.
Also a quick question but how long are the logs kept in OpenAI? And are the logs still taken even if you are in private mode?
> I don't really know but I don't think most people know it.
That's for sure; most people don't know how much they're being tracked, even just inside the platform itself. Nowadays, lots of platforms literally log your mouse movements on the page, so they can see exactly where you first landed, how you moved around, where you navigated, how long you paused, and much more. Basically, if it can be logged and reconstructed, it will be.
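If you're curious what that looks like under the hood, here's a rough sketch of the buffering logic such trackers tend to use. All the names here are made up for illustration; a real tracker feeds this from DOM listeners (e.g. `document.addEventListener("mousemove", ...)`) and ships batches with `navigator.sendBeacon`, but the core buffer/flush logic runs anywhere:

```javascript
// Minimal sketch of a session-replay-style event buffer (hypothetical names).
// In the browser this would be fed by DOM event listeners; here we just
// model the buffering/flush logic so it runs standalone.
class EventBuffer {
  constructor(flushSize, send) {
    this.flushSize = flushSize; // flush after this many events
    this.send = send;           // e.g. a beacon POST in a real tracker
    this.events = [];
  }
  record(type, x, y, t) {
    this.events.push({ type, x, y, t });
    if (this.events.length >= this.flushSize) this.flush();
  }
  flush() {
    if (this.events.length === 0) return;
    this.send(this.events); // real trackers: navigator.sendBeacon(url, JSON.stringify(batch))
    this.events = [];
  }
}

// Simulate a short mouse trail ending in a click.
const batches = [];
const buf = new EventBuffer(3, (batch) => batches.push(batch));
buf.record("mousemove", 10, 20, 0);
buf.record("mousemove", 15, 22, 16);
buf.record("click", 15, 22, 120);  // third event triggers a flush
buf.record("mousemove", 40, 80, 200);
buf.flush();                        // e.g. fired on pagehide

console.log(batches.length);        // 2 batches "sent"
console.log(batches[0].length);     // 3 events in the first
```

From batches like these, the backend can replay your whole session: cursor path, hesitations, clicks, timing.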
> Also a quick question but how long are the logs kept in OpenAI? And are the logs still taken even if you are in private mode?
As far as I know right now, OpenAI is under legal obligation to log all of their ChatGPT chats, regardless of their own policies, but this was a while ago (this summer sometime?), maybe it's different today.
What exactly do you mean by "private mode"? If you mean an "incognito/private window" in your browser, it has basically no impact on how much the platforms themselves log; it only affects your local history.
As for the "temporary mode" in ChatGPT, I also think it has no impact on how much they log; it just keeps that particular chat out of your chat history and out of their model-training data. Beyond that, all the tracking in your browser still works the same way, AFAIK.
I was referring to temporary mode (though I also assumed a private window was much safer, but wow, looks like they log literally everything).
So out of all the providers (Gemini, Claude, OpenAI, Grok, and others), do they all log everything permanently?
If they are logging everything, what prevents their logs from getting leaked or "accidentally" being used in training data?
> As far as I know right now, OpenAI is under legal obligation to log all of their ChatGPT chats, regardless of their own policies, but this was a while ago (this summer sometime?), maybe it's different today.
I also remember that post, and given the current political environment, that's kind of crazy.
Also, some of these services require a phone number one way or another, and most likely there's a way the phone number can be linked to the logs. Since phone numbers are issued by the government, chances are that if threat actors want data at scale and OpenAI contributes to it, a very good profile of a person can be built if they use such services... Wild.
So if OpenAI's under legal obligation, is there a limit on how long they keep the logs, or are they gonna keep them permanently? I'm gonna look for the old HN article right now, but if the answer is permanently, then it's even more dystopian than I imagined.
The mouse-tracking ability is wild too. I might switch to LibreWolf at this point to prevent some of that tracking.
Also what are your thoughts on the new anonymous providers like confer.to (by signal creator), venice.ai etc.? (maybe some openrouter providers?)
You can safely assume (and you probably should regardless) that everyone on the internet is logging and slurping up as much data as they can about their users. The product team is usually the one using that data, but depending on the controls in the company, it could be that most of it sits in a database that engineering, marketing, and product all have access to.
> If they are logging everything, what prevents their logs from getting leaked or "accidentally" being used in training data?
The "tracking data" is different from the "chat data". Tracking data is usually collected for the product team to make decisions with, gathered automatically in the frontend and backend by various methods.
The "chat data" is something they'd typically keep more secret and guarded; random engineers probably can't just access it, although seniors on the infrastructure team usually could.
As for how easily that data could slip into training data, I'm not sure, but I'd expect the fear of big names suing them would be enough to make them really careful with it. That's my hope, at least.
I don't know any specifics about how long they keep logs, but what I do know is that you typically try to sit on your data for as long as you can, because you always end up finding new uses for it later. Maybe you want to compare how users used the platform in 2022 vs 2033, and then you'd be glad you kept it. So unless the company has an explicit public policy about it, assume they sit on it "forever".
> Also what are your thoughts on the new anonymous providers like confer.to (by signal creator), venice.ai etc.? (maybe some openrouter providers?)
Haven't heard about any of them :/ This summer I took it one step further and got myself the beefiest GPU I could reasonably get (for unrelated purposes) and started using local models for everything I do with LLMs.
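For anyone wanting to try the same route: most local servers (llama.cpp's server, Ollama) expose an OpenAI-compatible HTTP API, so the client side is tiny. A rough sketch below; the base URL, port, and model name are assumptions you'd adjust for your own setup:

```javascript
// Sketch of keeping chats local: build a request for a locally hosted,
// OpenAI-compatible chat endpoint. The URL/port/model are placeholders;
// use whatever your local server actually exposes.
function buildChatRequest(baseUrl, model, userText) {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: userText }],
      }),
    },
  };
}

const req = buildChatRequest("http://localhost:8080", "local-model", "Capital of Bhutan?");
console.log(req.url); // http://localhost:8080/v1/chat/completions

// To actually send it (requires the local server to be running):
//   const res = await fetch(req.url, req.options);
//   const data = await res.json();
//   console.log(data.choices[0].message.content);
```

The nice part is that nothing leaves your machine, so the whole retention question disappears.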
> I don't know any specifics about how long they keep logs, but what I do know is that you typically try to sit on your data for as long as you can, because you always end up finding new uses for it later. Maybe you want to compare how users used the platform in 2022 vs 2033, and then you'd be glad you kept it. So unless the company has an explicit public policy about it, assume they sit on it "forever".
I am gonna assume in this case that the answer is forever.
I actually looked at Kagi Assistant for this, as someone mentioned, and created a free Kagi account, but it looks like they call the upstream AI model APIs themselves, with the logging that comes with that. I wouldn't consider it the most private (although Bedrock/AWS say they keep logs for only 30 days, but still :/ I feel like there's still a genuine issue).
I don't want to buy a GPU for my use case though, to be honest :/
Personally I'm liking either the Proton Lumo models or confer.to (I can't get confer.to working on my Mac for some reason, so Proton Lumo it is).
I'll probably settle on Proton Lumo + Kagi Assistant / z.ai (with GLM 4.7, which is a crazy good model).
I'm really GPU-poor (just a basic MacBook Air M1), but I ran some Liquid LFM model, IIRC, and it was good for extremely basic tasks, though it fumbled when I asked it the capital of Bhutan out of curiosity.