GitHub Copilot Will Train on Your Code Unless You Opt Out
Starting April 24, GitHub Copilot will train AI models on your code interactions by default. Free, Pro, and Pro+ users can opt out. Business and Enterprise users are unaffected. Here's what's changing and what you should do about it.
TL;DR
- Starting April 24, GitHub will train AI models on interaction data from Copilot Free, Pro, and Pro+ users by default
- Business and Enterprise users are unaffected — their data stays private
- You can opt out in settings, and previous opt-out preferences are preserved
- This includes your accepted code, inputs, cursor context, file structure, and feedback ratings
The Big Picture
GitHub just flipped the default. Starting April 24, 2025, if you're using Copilot Free, Pro, or Pro+, your interaction data — inputs, outputs, code snippets, cursor context, the works — will be used to train GitHub's AI models unless you explicitly opt out. This isn't a new capability. It's a policy change that makes data collection the default instead of opt-in.
The justification is straightforward: real-world developer data makes models better. GitHub tested this internally with Microsoft employees and saw measurable improvements in acceptance rates across multiple languages. Now they want to expand that dataset to millions of individual developers. The trade-off is clear: contribute your coding patterns to improve the tool for everyone, or keep your workflow private and still get the same features you have today.
If you previously opted out of data collection for product improvements, your preference carries over. You're still opted out. But if you never touched that setting, you're now opted in by default. That's the shift.
How It Works
GitHub is collecting interaction data, not repository snapshots. There's a critical distinction here. Your private repositories "at rest" — the code sitting in your repos when you're not actively using Copilot — won't be scraped for training. But the moment you open a file and Copilot starts suggesting code, that interaction becomes fair game unless you opt out.
Here's what gets collected when you're opted in:
- Code outputs you accept or modify after Copilot suggests them
- Inputs you send to Copilot, including code snippets shown to the model
- Code context around your cursor position — the surrounding lines that inform suggestions
- Comments and documentation you write in your files
- File names, repository structure, and how you navigate your codebase
- How you interact with Copilot features: chat, inline suggestions, code review
- Feedback signals like thumbs up or thumbs down on suggestions
This data gets used to train GitHub's models. It may be shared with Microsoft and other GitHub affiliates, but GitHub explicitly states it won't go to third-party AI providers or independent service providers. That's a meaningful boundary, especially compared to some competitors who train on user data and then license those models to external partners.
The technical mechanism is straightforward. Copilot already processes this data to generate suggestions in real time. The change is that now, unless you opt out, that processed data gets retained and fed back into model training pipelines. GitHub's internal testing with Microsoft employees showed this approach works. Acceptance rates improved. The model got better at understanding real-world development patterns, catching bugs earlier, and suggesting more contextually relevant code.
If you're on Copilot Business or Enterprise, none of this applies to you. Those tiers have always had stricter data isolation guarantees, and that hasn't changed. Your interaction data stays private. This policy shift only affects individual developers on Free, Pro, and Pro+ plans.
What This Changes For Developers
The immediate impact is a decision you need to make: opt in or opt out. If you do nothing, you're opted in. Your coding patterns, the suggestions you accept, the context around your cursor — all of it becomes training data. For some developers, that's fine. You're already using a tool built on open-source code and public repositories. Contributing your interaction data to improve the model feels like a reasonable trade.
For others, this crosses a line. Maybe you work on proprietary algorithms. Maybe you're prototyping a product that isn't public yet. Maybe you just don't want your coding habits analyzed and fed into a model that other developers will benefit from. The opt-out exists for a reason, and GitHub has preserved previous opt-out preferences, which is the right move.
The practical workflow change is minimal if you opt out. You still get the same Copilot features. Suggestions still work. Chat still works. The difference is your data doesn't loop back into training. You're a consumer of the model, not a contributor to it.
If you opt in, the long-term benefit is better suggestions, at least in aggregate. GitHub's internal testing suggests that models trained on real-world interaction data outperform models trained only on public code and synthetic examples. But the improvement is incremental, not transformative. You won't see a night-and-day difference in suggestion quality because you opted in; you're contributing to a gradual improvement across the entire user base.
There's also a competitive angle here. GitHub's Applied Science team has been vocal about using agent-driven development internally, and this data collection policy accelerates that feedback loop. The more real-world interaction data they collect, the faster they can iterate on model improvements, which widens the gap between Copilot and competitors who don't have access to this scale of developer interaction data.
Try It Yourself
Check your current opt-out status and change it if needed:
- Go to github.com/settings/copilot
- Scroll to the "Privacy" section
- Look for the setting about allowing GitHub to use your data for model training
- Toggle it off if you want to opt out, or leave it on if you're comfortable contributing your interaction data
If you previously opted out, the setting should already be off. GitHub claims they've preserved those preferences, but it's worth verifying. If you've never touched this setting before, it will default to on starting April 24.
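The default-on semantics described above can be summarized in a few lines. To be clear, GitHub doesn't document a public API for this individual setting, and the field name `training_opt_out` below is purely a hypothetical illustration; this sketch just models the policy's logic: an explicit prior opt-out is preserved, while an untouched setting now means opted in.

```python
# Hypothetical sketch of the new default-on policy. The settings dict
# and the "training_opt_out" key are illustrative assumptions, not a
# documented GitHub API or payload.

def is_training_enabled(settings: dict) -> bool:
    """Return True if interaction data would be used for model training.

    Mirrors the policy described above: an explicit prior opt-out is
    honored, while users who never touched the setting are treated as
    opted in after April 24.
    """
    # An explicit opt-out, set at any time, is preserved.
    if settings.get("training_opt_out") is True:
        return False
    # Anything else (setting never touched) now means opted in.
    return True

# A user who previously opted out stays opted out:
print(is_training_enabled({"training_opt_out": True}))  # False
# A user who never touched the setting is opted in by default:
print(is_training_enabled({}))                          # True
```

The asymmetry is the whole story: doing nothing used to mean "no", and now it means "yes".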
The Bottom Line
Opt out if you work on proprietary code, unreleased products, or anything you don't want feeding into a model that other developers will use. Opt out if you're uncomfortable with Microsoft affiliates having access to your interaction data, even if it's not shared with third parties. Opt out if you just don't like the idea of being a data source for a product you're already paying for.
Stay opted in if you're working on open-source projects, learning to code, or building side projects where data privacy isn't a concern. Stay opted in if you want to contribute to incremental model improvements and you trust GitHub's data handling policies. Stay opted in if you're already comfortable with the fact that Copilot processes your code in real time to generate suggestions.
The real risk here isn't that GitHub will leak your code or sell it to third parties. The policy explicitly prohibits that, and GitHub has invested heavily in security infrastructure to protect user data. The risk is that your coding patterns, your problem-solving approaches, and your workflow habits become training data for a model every other user benefits from, with nothing extra in return for you. If that bothers you, opt out. If it doesn't, the default is fine.
Source: GitHub Blog