As long as not ALL the data the agent has access to is checked against the rights of the current user placing the request, there WILL be ways to leak data. This means Vector databases, Search Indexes or fancy "AI Search Databases" would be required on a per-user basis, or would have to track the access rights along with the content, which is infeasible and does not scale.
And as access rights are complex and can change at any given moment, that would still be prone to race conditions.
I don't understand why you think tracking user access rights would be infeasible and would not scale. There is a query. You search for matching documents in your vector database / index. Once you have the potentially relevant list of documents, you check which ones the current user can access. You only pass over to the LLM the ones the user can see.
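A minimal sketch of that flow, assuming a `vector_search` index call and a `user_can_access` permission check (both just hypothetical stand-ins for whatever index and authz service you actually use):

    from typing import Callable

    def retrieve_for_user(
        query_text: str,
        user_id: str,
        vector_search: Callable[[str, int], list[dict]],  # returns candidates with a "doc_id"
        user_can_access: Callable[[str, str], bool],       # (user_id, doc_id) -> bool
        k: int = 20,
        candidate_pool: int = 1000,
    ) -> list[dict]:
        """Search first, then drop anything the requesting user cannot see."""
        candidates = vector_search(query_text, candidate_pool)
        allowed = []
        for doc in candidates:
            if user_can_access(user_id, doc["doc_id"]):
                allowed.append(doc)
                if len(allowed) == k:
                    break
        return allowed  # only these ever reach the LLM context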
This is very similar to how banks provide phone based services. The operator on the other side of the line can only see your account details once you have authenticated yourself. They can't accidentally tell you someone else's account balance, because they themselves don't have access to it unless they typed in all the information you provide them to authenticate yourself. You can't trick the operator to provide you with someone else's account balance because they can't see the account balance of anyone without authenticating first.
A basic implementation will return the top, let's say 1000, documents and then do the more expensive access check on each of them. Most of the time, you've now eliminated all of your search results.
Your search must be access aware to do a reasonable job of pre-filtering the content to documents the user has access to, at which point you then can apply post-filtering with the "100% sure" access check.
For instance in https://github.com/pixeltable/pixeltable:

    @pxt.query
    def search_documents(query_text: str, user_id: str):
        sim = chunks.text.similarity(query_text)
        return (
            chunks.where(
                (chunks.user_id == user_id)         # Metadata filtering
                & (sim > 0.5)                       # Filter by similarity threshold
                & (pxt_str.len(chunks.text) > 30)   # Additional filter/transformation
            )
            .order_by(sim, asc=False)
            .select(
                chunks.text,
                source_doc=chunks.document,  # Ref to the original document
                sim=sim,
                title=chunks.title,
                heading=chunks.heading,
                page_number=chunks.page
            )
            .limit(20)
        )
Vector databases intended for this purpose filter this way by default for exactly this reason. It doesn't matter how many documents are in the master index; it could be 100,000 or 100,000,000. Once you filter down to the 10 that your user is allowed to see, it takes the same tenth of a second or whatever to whip up a new bespoke index just for them for this query.
Pre-search filtering is only a problem when your filter captures a large portion of the original corpus, which is rare. How often are you querying "all documents that Joe Schmoe isn't allowed to view"?
Index your ACLs, index your users, index your docs. Your database can handle it.
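For a small allowed set, the "bespoke index" can literally be a brute-force pass over the user's vectors. A rough sketch in plain numpy (the corpus embeddings and the allowed-ID set are assumed to come from your existing index and ACL store):

    import numpy as np

    def search_allowed_subset(
        query_vec: np.ndarray,      # shape (d,)
        doc_vecs: np.ndarray,       # shape (n, d), full-corpus embeddings
        doc_ids: list[str],
        allowed_ids: set[str],      # ids the current user may see, from the ACL index
        k: int = 10,
    ) -> list[tuple[str, float]]:
        # Keep only the rows this user is allowed to see.
        mask = np.array([doc_id in allowed_ids for doc_id in doc_ids])
        sub_vecs = doc_vecs[mask]
        sub_ids = [d for d, keep in zip(doc_ids, mask) if keep]
        if len(sub_ids) == 0:
            return []
        # Brute-force cosine similarity over the (small) allowed subset.
        sims = sub_vecs @ query_vec / (
            np.linalg.norm(sub_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-12
        )
        top = np.argsort(-sims)[:k]
        return [(sub_ids[i], float(sims[i])) for i in top]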
I've seen a list that was supposed to contain 20 items of something; it only showed 2, plus a comment: "18 results were omitted due to insufficient permissions".
(Servicenow has at least three different ways to do permissions, I don't know if this applies to all of them).
But yes, one could probably also construct a series of queries that reveal properties of hidden objects.
If the docs were indexed by groups/roles and you had some form of RBAC then this wouldn't happen.
Right, but compare this to the original proposal:
> A basic implementation will return the top, let's say 1000, documents and then do the more expensive access check on each of them
Using an index is much better than that.
And it should be possible to update the index without a substantial cost, since most of the 100000 documents likely aren't changing their role access very often. You only have to reindex a document's metadata when that changes.
This is also far less costly than updating the actual content index (the vector embeddings) when the document content changes, which you have to do regardless of your permissions model.
If you use your index to get search results, then you will have a mix of roles that you then have to filter.
If you want to filter first, then you need to make a whole new search index from scratch with the documents that came out of the filter.
You can't use the same indexing information from the full corpus to search a subset, your classical search will have undefined IDF terms and your vector search will find empty clusters.
If you want quality search results and a filter, you have to commit to reindexing your data live at query time after the filter step and before the search step.
I don't think Elastic supports this (last time I used it it was being managed in a bizarre way, so I may be wrong). Azure AI Search does this by default. I don't know about others.
It's a separate index.
You store document access rules in the metadata. These metadata fields can be indexed and then used as a pre-filter before the vector search.
> I don't think Elastic supports this
https://www.elastic.co/docs/solutions/search/vector/knn#knn-...
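The relevant bit is the filter inside the knn clause: candidates are restricted to the caller's ACL match while the nearest-neighbour search runs, not afterwards. Roughly, sketched as a Python dict (field names like `content_vector` and `allowed_groups` are invented for illustration; check the linked docs for the exact syntax of your ES version):

    # Placeholders: fill from your embedding model and the caller's identity.
    query_embedding: list[float] = []                 # query vector from your embedding model
    user_groups = ["engineering", "us-employees"]     # groups resolved for the requesting user

    knn_request = {
        "knn": {
            "field": "content_vector",
            "query_vector": query_embedding,
            "k": 20,
            "num_candidates": 200,
            # Pre-filter: only documents whose ACL metadata matches the caller's
            # groups are even considered as kNN candidates.
            "filter": {
                "terms": {"allowed_groups": user_groups}
            },
        },
        "_source": ["title", "chunk_text"],
    }
    # resp = es.search(index="documents", body=knn_request)  # es = your Elasticsearch client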
Sometimes the potentially relevant list of documents is a leak all by itself.
Depending on the context it could be relevant.
If Joe's search is faster than Sally's because Sally has higher permissions, that's hardly a revelation.
But I guess they want something like training the chatbot as an LLM once with all the confidential data - and then indeed you could never separate it again.
Searching the whole index and then filtering is possible, but infeasible for large indexes where a specific user only has access to a few docs. And for diverse data sources (as we want to access), this would be really slow, many systems would need to be checked.
So, access rights should be part of the index. In that case, we are just storing a copy of the access rights, so this is prone to races. Besides that, we have multiple systems with different authorization systems, groups, roles, whatever. To homogenize this, we would need to store the info down to each individual user. Besides this, not all systems even support asking which users have access to resource Y, they only allow to ask „has X access to Y“.
Allow me to try to inject my understanding of how these agents work vs regular applications.
A regular SaaS will have an API endpoint that has permissions attached. Before the endpoint processes anything, the user making the request has their permissions checked against the endpoint itself. Once this check succeeds, anything that endpoint collects is considered "ok" to ship to the user.
AI Agents, instead, directly access the database, completely bypassing this layer. That means you need to embed the access permissions into the individual rows, rather than at the URL/API layer. It's much more complex as a result.
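To illustrate what "embedding the permissions at the row level" ends up looking like in practice, a rough sketch with stdlib sqlite3 (table and column names are invented; the agent only ever gets the wrapped tool, never the raw connection):

    import sqlite3

    def make_user_scoped_search(conn: sqlite3.Connection, user_id: str):
        """Return the only 'database tool' the agent is handed:
        a closure that cannot be asked for rows outside the caller's scope."""
        def search_invoices(text: str, limit: int = 20) -> list[tuple]:
            return conn.execute(
                """
                SELECT invoice_id, customer, amount
                FROM invoices
                WHERE owner_id = ?            -- row-level scoping, not up to the LLM
                  AND description LIKE ?
                LIMIT ?
                """,
                (user_id, f"%{text}%", limit),
            ).fetchall()
        return search_invoices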
For your bank analogy: they actually work in a similar way to how I described above. A temporary access is granted to the resources but, once it's granted, any data included in those screens is assumed to be ok. They won't see something like a blank box somewhere because there's info they're not supposed to see.
DISCLAIMER: I'm making an assumption on how these AI Agents work, I could be wrong.
If so, then as the wise man says: "well, there‘s your problem!"
I don't doubt there are implementations like that out there, but we should not judge the potential of a technology by the mistakes of the most boneheaded implementation.
Doing the same in the bank analogy would be like giving root SQL access to the phone operators and then asking them pretty please to be careful with it.
Of course, I wouldn't defend this! To be clear, it's not possible to know how every AI Agent works, I just go off what I've seen when a company promises to unlock analytics insights on your data: usually by plugging directly into the Prod DB and having your data analysts complain whenever the engineers change the schema.
> we should not judge the potential of a technology by the mistakes of the most boneheaded implementation.
I agree.
That's what the bank agent analogy was meant to tell you. The agent has a direct line to the prod DB through their computer terminal, but every session they open is automatically constrained to the account details of the person on the phone right now and nobody else.
It depends on how it's plugged-in. If you just hand it a connection and query access then what exactly stops it? In a lot of SaaS systems, there's only the "application" user, which is restricted via queries within the API.
You can create a user in the DB per user of your application but this isn't free. Now you have the operational problem of managing your permissions, not via application logic, but subject to the rules and restrictions of your DBMS.
You can also create your own API layer on top, however this also comes with constraints of your API and adding protections on your query language.
None of this is impossible but, given what I've seen happen in the data analytics space, I can tell you that I know which option business leaders opt for.
I don't understand the desire - borderline need - of folks on HN to just make stuff up. That is likely why you're being downvoted. I know we all love to do stuff "frOM fIRsT PRiNcIPlEs" around here but "let me just imagine how I think AI agents work then pass that off as truth" is taking it a bit far IMO.
This is the human equivalent of an AI hallucination. You are just making stuff up, passing it off as truth ("injecting your understanding"), then adding a one-line throwaway "this might be completely wrong lol" at the end.
Hacker News is addictive. This forum is designed to reward engagement with imaginary internet points and that operant conditioning works just as well here as everywhere else.
(To your point, >15 of them would have had different answers and the majority would have been materially wrong, but still.)
> AI Agents, instead, directly access the database
However, I don't think I'd be too far off the mark given many systems work like this (analytics tools typically hook into your DB, to the chagrin of many an SRE/DevOps) and it's usually marketed as the easy solution. Also, I've since read a few comments and it appears I'm pretty fucking close: the agents here read a search index, so pretty tightly hooked into a DB system.
Everything else, I know I'm right (I've built plenty of systems like this), and someone was making a point that permissions access does scale. I pointed out that it appears to scale because of the way they're designed.
I'd say most of my comment is substantively correct, with a disclaimer on an (important) point, where I'd be happy to be corrected.
> I'd say most of my comment is substantively correct, with a disclaimer on an (important) point, where I'd be happy to be corrected.
I read this and feel that you still want imaginary internet points for something that is, at best, directionally correct. To me it seems your desire for internet points urged you to post a statement and not a question. I imagine most of HN is just overconfident bluster built on merely directionally correct statements, which creates the cacophony of this site.
That doesn't solve the problem of changing schemas causing issues for your data team at all. Something I see regularly. If you set up an AI Agent the same way, you still give it full access, so you still haven't fixed the problem at hand.
> I read this and feel that you still want imaginary internet points for something that is, at best, directionally correct.
And you’ve yet to substantiate your objection to what I posited (alongside everyone else), so instead you continue to talk about something unrelated in the hope of… what, exactly?
This is captured in the OWASP LLM Top 10 "LLM02:2025 Sensitive Information Disclosure" risk: https://genai.owasp.org/llmrisk/llm022025-sensitive-informat... although in some cases the "LLM06:2025 Excessive Agency" risk is also applicable.
I believe that some enterprise RAG solutions create a per user index to solve this problem when there are lots of complex ACLs involved. How vendors manage this problem is an important question to ask when analyzing RAG solutions.
At my current company at least we call this "権限混同" in Japanese - Literally "authorization confusion" which I think is a more fun name
Sometimes hard to avoid though, like our firehose analyzers :(
When you hit such a wall, you might not be failing to communicate, nor them failing to understand. In reality, said executives have probably chosen to ignore the issue, but also don't want to take accountability for the eventual leaks. So "not understanding" is the easiest way to blame the engineers later.
In their dream world the engineers would not know about it either.
Edit: Maybe we should call this style vibe management. :D
I wish I had a way of ensuring culpability remains with the human who published the text, regardless of who/what authored it.
Tools are fine to use; personal responsibility is still required. Companies already fuck up with this too much.
Cc Legal/Compliance could do wonders to their capacity to understand the problem. Caveat, of course, that the execs might be pissed off that some peon is placing roadblocks in the way of their buzzword-happy plan.
Citation needed.
Most enterprise (homegrown or not) search engine products have to do this, and have been able to do it effectively at scale, for decades at this point.
This is a very well known and well-solved problem, and the solutions are very directly applicable to the products you list.
It is, as they say, a simple matter of implementation - if they don't offer it, it's because they haven't had the engineering time and/or customer need to do it.
Not because it doesn't scale.
It's absolutely a hard problem and it isn't well solved
But the reply i made was to " This means Vector databases, Search Indexes or fancy "AI Search Databases" would be required on a per user basis or track the access rights along with the content, which is infeasible and does not scale."
I.e., information retrieval.
Access control in information retrieval is very well studied.
Making search engines, etc., that effectively confirm user access to each possible record is feasible and common (they don't do it exactly this way, but the result is the same), and scalable.
Hell, we even know how to do private information retrieval with access control in scalable ways.
PIR = the server does not know what the query was, or the result was, but still retrieves the result.
So we know how to make it so that not only does the server not know what was queried or retrieved by a user, but each querying user can still only access records they are allowed to.
Overhead of this, which is much harder than non-private information retrieval with access control, is only 2-3x in computation. See, e.g., https://dspace.mit.edu/handle/1721.1/151392 for one example of such a system. There are others.
So even if your 2ms retrieval latency were all CPU and zero I/O, it would only become 4-6ms due to this.
If you remove the PIR part, as i said, it's much easier, and the overhead is much much less, since it doesn't involve tons and tons of computationally expensive encryption primitives (though some schemes still involve some).
In this way you exclude up-front the documents that the current user cannot see.
Of course, this requires you to update the vector metadata any time the permissions change at the document level (e.g. a given document originally visible only to HR is now also visible to executives -> you need to add the principal "executives" to the metadata of the vectors resulting from that document in your vector database).
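The bookkeeping for that is just a metadata write on the affected chunks; the embeddings themselves don't change. An in-memory sketch of the idea (no particular vector DB API implied; real vector DBs expose an equivalent "update payload/metadata" call):

    # Permissions live next to each stored vector as plain metadata.
    index = {
        "doc-42:chunk-0": {"vector": [0.1, 0.3], "allowed_principals": {"hr"}},
        "doc-42:chunk-1": {"vector": [0.2, 0.9], "allowed_principals": {"hr"}},
    }

    def grant_access(doc_id: str, principal: str) -> None:
        """Document-level permission change: touch metadata on every chunk,
        leave the (expensive) embeddings alone."""
        for key, entry in index.items():
            if key.startswith(f"{doc_id}:"):
                entry["allowed_principals"].add(principal)

    grant_access("doc-42", "executives")  # HR doc now also visible to executives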
I don't just mean this as lazy cynicism; executives don't really want to understand things. It doesn't suit their goals. They're not really in the business of strictly understanding things. They're in the business of "achieving success." And, in their world, a lot of success is really just the perception of success. Success and the perception of success are pretty interchangeable in their eyes, and they often feel that a lot of engineering concerns should really be dismissed unless those concerns are truly catastrophic.
Grizzled sysadmin here, and this is accurate. Classic case of "Hey boss, I need budget for server replacements, this hardware is going to fail." Declined. A few months later, it fails. Boss: "Why did you allow this to happen? What am I even paying you for?"
1. Why is tracking access rights "on a per user basis or [...] along with the content" not feasible? A few mentions: Google Zanzibar (+ Ory Keto as an OSS implementation) makes authz for content orthogonal to apps (i.e. it is possible to have it in one place, s.t. both Jira and a Jira MCP server can use the same API to check authz - possible to have 100% faithful authz logic in the MCP server); Eclipse Biscuit (as far as I understand, this is Dassault's attempt to make JWTs on steroids by adding Datalog and attenuation to the tokens, going in the Zanzibar direction but not requiring a network call for every single check); Apache Accumulo (a DBMS with cell-level security); and others (a toy sketch of the tuple model is below, after point 2). The way I see it, the tech is there, but so far not enough attention has been put on the problem of high-fidelity authz throughout the enterprise at a granular level.
2. What is the scale needed? Enterprises with more than 10000 employees are quite rare, many individual internal IT systems even in large companies have less than 100 regular users. At these levels of scale, a lot more approaches are feasible that would not be considered possible at Google scale (i.e. more expensive algorithms w.r.t. big-O are viable).
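To make the Zanzibar point in 1. concrete: the model boils down to relation tuples plus a check call that every app (and every MCP server) asks in the same way. A toy in-memory sketch, not the Keto or Biscuit API:

    # Toy Zanzibar-style check: authorization is a set of (object, relation, subject)
    # tuples, queried identically by every application.
    tuples = {
        ("doc:roadmap-2025", "viewer", "group:engineering"),
        ("group:engineering", "member", "user:alice"),
    }

    def check(obj: str, relation: str, user: str) -> bool:
        if (obj, relation, user) in tuples:
            return True
        # One level of group expansion, enough to show the idea.
        for (o, r, s) in tuples:
            if o == obj and r == relation and s.startswith("group:"):
                if (s, "member", user) in tuples:
                    return True
        return False

    check("doc:roadmap-2025", "viewer", "user:alice")    # True
    check("doc:roadmap-2025", "viewer", "user:mallory")  # False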
There is no feasible way to track that during training (at least yet), so the only current solution would be to train the AI agent only on data the user can access, and that is costly.
The vector DB definitely has to do some heavy lifting intersecting, say, the acl_id index with the nearest-neighbor search, but they do support it.
This is the way. This is also a solved problem. We solved it for desktop, web, mobile. Chatbots are just another untrusted frontend and should follow the same patterns to mitigate risks, i.e. do not trust inputs, use the same auth patterns you would for anything else (OAuth, etc.).
It is solved and not new.
Knowledge should be properly grouped and have rights on database, documents, and chatbot managed by groups. For instance specific user can use the Engineering chatbot but not the Finance one. If you fail to define these groups, feels like you don't have a solid strategy. In the end, if that's what they want, let them experience open knowledge.
You should see our Engineering knowledge base before saying an AI would be useless.
That can’t be right, can it?
Really, Microsoft should be auditing the search that Copilot executes. It's actually a bit misleading to audit the file as accessed when Copilot has only read the indexed content of the file; I don't say I've visited a website when I've found a result from it on Google.
Then someone discovered production passwords on a site that was supposed to be secured but wasn’t.
Found such things in several places.
The solution was to make searching work only if you opted-in your website.
After that internal search was effectively broken and useless.
All because a few actors did not think about or care about proper authentication and authorization controls.
If you have private documents, you can't let a public search engine index and show previews of those private documents. Even if you add an authentication wall for normal users if they try to open the document directly. They could still see part of the document in google's preview.
My explanation sounds silly because surely nobody is that dumb, but this is exactly what they have done. They gave access to ALL documents, both public and private, to an AI, and then got surprised when the AI leaked some private document details. They thought they were safe because users would be faced with an authentication wall if they tried to open the document directly. But that doesn't help if Copilot simply tells you all the secrets in its own words.
Not my domain of expertise, but couldn't you at some point argue that the indexed content itself is an auditable file?
It's not literally a file necessarily, but if they contain enough information that they can be considered sensitive, then where is the significant difference?
I mean, it depends on how large the index window is, because if Google returned the entire webpage content without you ever leaving (AMP moment), you did visit the website. Fine line.
I'm a fan of FHIR (a healthcare api standard, but far from widely adopted), and they have a secondary set of definitions for Audit log patterns (BALP) that recommends this kind of behaviour. https://profiles.ihe.net/ITI/BALP/StructureDefinition-IHE.Ba...
"[Given a query for patients,] When multiple patient results are returned, one AuditEvent is created for every Patient identified in the resulting search set. Note this is true when the search set bundle includes any number of resources that collectively reference multiple Patients."
Or just a system prompt "log where all the info comes from"...
Don't train a model on sensitive info, if there will ever be a need for authZ more granular than implied by access to that model. IOW, given a user's ability to interact w/ a model, assume that everything it was trained on is visible to that user.
I don't believe it's integrated with (any bypass of) auditing but the same "ignore permissions" capability exists on Linux as CAP_DAC_READ_SEARCH and is primarily useful for the same sort of tasks.
... Or ... a very long time ago, when SharePoint search would display results and synopses for search terms where a user couldn't open the document, but could see that it existed and could get a matching paragraph or two... The best example I would tell people of the problem was users searching for things like "Fall 2025 layoffs"... if the document existed, then things were being planned...
Ah Microsoft, security-last is still the thing, eh?
I talked to some Microsoft folks around the Windows Server 2025 launch, where they claimed they would be breaking more compatibility in the name of their Secure Future Initiative.
But Server 2025 will load malicious ads on the Edge start screen [1] if you need to access the web interface of an internal thing from your domain controller, and they gleefully announced including winget, a wonderful malware delivery tool with zero vetting or accountability, in Server 2025.
Their response to both points was I could disable those if I wanted to. Which I can, but was definitely not the point. You can make a secure environment based on Microsoft technologies, but it will fight you every step of the way.
[1] As a fun fact, this actually makes Internet Explorer a drastically safer browser than Edge on servers! By default, IE's ESC mode on servers basically refused to load any outside websites.
Also you probably have to go up 10 levels of management before you reach a common person.
100% agreed on the Edge-front page showing up on server machines being nasty though, server deployments should always have an empty page as the default for browsers (Always a heart-burn when you're trying to debug issues some newly installed webapp and that awful "news" frontpage pops up).
winget has none of that. winget is run by one Microsoft dude who, when pressed about reviewing submissions, gave moderator powers to some random GitHub users who have not been vetted. There are no criteria for inclusion; if you can pack it and get it past the automated scanner, it ships. And anyone can submit changes to any winget package: they built a feature to let a developer restrict a package to only be updated by a trusted user, but never implemented it. (Doing so requires a "business process", and being the one-man sideshow that winget is, setting that up is beyond Microsoft's ability.)
winget is a complete joke that no professional could stand for if they understand how amateur hour it is, and the fact it is now baked into every Windows install is absolutely embarrassing. But I bet shipping it got that Microsoft engineer a promotion!
Also, in Edge the new tab page is loaded from MS servers, even if you disable all the optional stuff. It looks like something local (it doesn't have a visible url) but this is misleading. If you kill your internet connection you get a different, simpler new tab page.
The Edge UI doesn't let you pick a different new tab page but you can change it using group policy.
If you've elected to create a Frankenstein of a domain controller and a desktop/gaming PC and are using it to browse any websites, all consequences are entirely on you.
When installing Windows Server, there is a "core" experience and a "desktop" experience option. The former is now the default, but nearly all enterprise software not made by Microsoft (and some that is made by Microsoft) require the latter. Including many tools which expect to run on domain controllers! Some software says it requires the GUI but you can trick into running without if you're clever and adventurous.
No GUI is definitely the future and the way to go when you can, but even the most aggressive environments with avoiding the GUI end up with a mix of both.
Speaking of a gaming PC, Edge on Windows Server is so badly implemented, I have a server that is CPU pegged from a botched install of "Edge Game Mode" a feature for letting you use Edge in an overlay while gaming. I don't think it should have been auto installed on Windows Server, but I guess those engineers at Microsoft making triple my salary know better!
One of the major issues was we could never properly secure the main page, because of some fuckery. At the main page we'd redirect to the login if you weren't logged in, but that was basically after you'd already gone through the page access validation checks, so when I tried to secure that page you wouldn't be redirected. I can't remember how, or even if I solved this...
Multiply that by years, by changing project managers and endless UX re-writes, huge push for DEI over merit, junior & outsourced-heavy hires and forced promotions, and you end up getting this mess that is "technically" working and correct but no one can quantify the potential loss and lack of real progress that could have been made if actual competent individuals were put in charge.
In the second case, the process has permission to do whatever it wants; it elects to restrain itself. Which is obviously subject to many more bugs than the first approach.
The dude found the bug, reported the bug, they fixed the bug.
This isn't uncommon; there are bugs like this frequently in complex software.
I wouldn't be surprised.
This is the organization that pushed code-signing as their security posture for a decade.
The latter is at least sort of usable for me, while the former is an active hindrance in the sense that it delays the appearance of much-more-useful Intellisense completions.
Having said that, even the agentic chat is not really a win for me at work. It lacks ... something that it needs in order to work on our large C++ codebase. Maybe it needs fine-tuning? Maybe it just needs tools that only pull in relevant bits of context (something like the Visual Studio "peek definition" so that it doesn't context-rot itself with 40 thousand lines of C++)? IDK.
For personal projects Claude Code is really good at the C++ type system, although inclined to bail before actually completing the task it's given.
So I feel like there's potential here.
But as you say, stock Copilot is Not It.
A title like this will get it fixed faster.
Vector embeddings are lossy encodings of documents roughly in the same way a SHA256 hash is a lossy encoding. It's virtually impossible to reverse the embedding vector to recover the original document.
Note: when vectors are combined with other components for search and retrieval, it's trivial to end up with a horribly insecure system, but just vector embeddings are useful by themselves and you said "all useful AI retrieval systems are insecure by design", so I felt it necessary to disagree with that part.
Incorrect. With a hash, I need to have the identical input to know whether it matches. If I'm one bit off, I get no information. Vector embeddings by design will react differently for similar inputs, so if you can reproduce the embedding algorithm then you can know how close you are to the input. It's like a combination lock that tells you how many numbers match so far (and for ones that don't, how close they are).
> It's virtually impossible to reverse the embedding vector to recover the original document.
If you can reproduce the embedding process, it is very possible (with a hot/cold type of search: "you're getting warmer!"). But also, you no longer even need to recover the exact original. You can recover something close enough (and spend more time to make it incrementally closer).
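A toy illustration of that "getting warmer" loop, assuming the attacker can call the same embedding model that produced the leaked vector (passed in here as `embed`) and has some candidate vocabulary to try:

    import numpy as np
    from typing import Callable

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def warmer_colder_attack(
        leaked_vec: np.ndarray,
        embed: Callable[[str], np.ndarray],  # same embedding model the system used
        vocab: list[str],                    # candidate words the attacker tries
        max_len: int = 20,
    ) -> str:
        """Greedily grow a guess, keeping each word only if it moves the guess's
        embedding closer to the leaked vector ("you're getting warmer")."""
        guess: list[str] = []
        best = -1.0
        for _ in range(max_len):
            improved = False
            for word in vocab:
                candidate = " ".join(guess + [word])
                score = cosine(leaked_vec, embed(candidate))
                if score > best:
                    best, keep, improved = score, word, True
            if not improved:
                break
            guess.append(keep)
        return " ".join(guess)  # not the original text, but often close enough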
Is this a feature of CVE or of Microsoft's way of using CVE? It would seem this vulnerability would still benefit from having a common ID to be referenced in various contexts (e.g. vulnerability research). Maybe there needs to be another numbering system that enumerates these kinds of cases and doesn't depend on the vendor.
CVEs track security incidents/vulnerabilities.
Just because you can emergency-patch it out of band does not make it not an incident.
But it falls under a trend of Microsoft acting increasingly negligent/untrustworthy when it comes to security, especially when it comes to clear reporting about incidents.
Which when it comes to a provider of fundamental components like an OS or Claude is as important as getting security right.
What was their bug fix? Shadow prompts?
Nothing in this post suggests that they're relying on the LLM itself to append to the audit logs. That would be a preposterous design. It seems far more likely the audit logs are being written by the scaffolding, not by the LLM, but they instrumented the wrong places. (I.e. emitting on a link or maybe a link preview being output, rather than e.g. on the document being fed to the LLM as a result of RAG or a tool call.)
(Writing the audit logs in the scaffolding is probably also the wrong design, but at least it's just a bad design rather than a totally absurd one.)
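If you do keep the audit logging in the scaffolding, instrumenting the right seam would look roughly like wrapping the retrieval/tool call so that anything entering the model's context gets logged, whether or not the model ever cites it. A sketch with hypothetical `search_index` and `audit_log` callables:

    from typing import Callable
    import datetime

    def audited_retrieval(
        search_index: Callable[[str], list[dict]],  # returns docs with "doc_id" and "text"
        audit_log: Callable[[dict], None],          # whatever your audit sink is
        user_id: str,
    ) -> Callable[[str], list[dict]]:
        """Wrap the retrieval tool so every document handed to the LLM is logged,
        regardless of whether the model cites, summarizes, or ignores it."""
        def retrieve(query: str) -> list[dict]:
            docs = search_index(query)
            for doc in docs:
                audit_log({
                    "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                    "user": user_id,
                    "doc_id": doc["doc_id"],
                    "reason": "fed-to-llm-context",
                    "query": query,
                })
            return docs
        return retrieve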
Copilot is accessing the indexed contents of the file, not the file itself, when you tell it not to access the file.
The blog writer/marketer needs to look at the index access logs.
How can you say this if microsoft is issuing a fix?
I imagine the intended feature is learning about who read some information, and who modified it.
The implementation varies, but on a CRUD app it seems easy: an authenticated GET or PUT request against a file path - easy audit log.
If you are copying information to another place, and make it accessible there in a lossy way that is hard to audit... you broke your auditing system.
Maybe it's useful, maybe it's a trade-off, but is something that should be disclosed.
> The system being referred to in that explanation is Microsoft 365 (M365) / Office 365 audit logging, specifically the Unified Audit Log in the Microsoft Purview Compliance Portal.
It seems like this[1] documentation matches the stuff in TFA
There are several CVE numbering authorities and some of them (including the original MITRE, national CERTs etc), accept submissions from anyone, but there's evaluation and screening. Since Microsoft is their own CNA, most of them probably wouldn't issue a MS CVE without some kind of exceptional reason.
Please only use this for legitimate submissions.
Honestly, the worst thing about this story is that apparently the Copilot LLM is given the instructions to create audit log entries. That’s the worst design I could imagine! When they use an API to access a file or a url then the API should create the audit log. This is just engineering 101.
Including for end user applications, not libraries, another random example: https://msrc.microsoft.com/update-guide/vulnerability/CVE-20...
This is absolutely not true. I have no idea where you came up with this.
> Honestly, the worst thing about this story is that apparently the Copilot LLM is given the instructions to create audit log entries.
That's not at all what the article says.
> That’s the worst design I could imagine!
Ok, well, that's not how they designed it.
> This is just engineering 101.
Where is the class for reading 101?
>This is absolutely not true. I have no idea where you came up with this.
Perhaps they asked Copilot?
Technically, CVEs are meant to only affect one codebase, so a vulnerability in a shared library often means a separate CVE for each affected product. It’s only when there’s no way to use the library without being vulnerable that they’d generally make just one CVE covering all affected products. [1]
Even ignoring all that, people are incorporating Copilot into their development process, which makes it a common dependency.
"The Common Vulnerabilities and Exposures (CVE) Program’s primary purpose is to uniquely identify vulnerabilities and to associate specific versions of code bases (e.g., software and shared libraries) to those vulnerabilities. The use of CVEs ensures that two or more parties can confidently refer to a CVE identifier (ID) when discussing or sharing information about a unique vulnerability" (from https://nvd.nist.gov/vuln)
This Clippy 2.0 wave of apps will obviously be rejected by the market but it can't come soon enough.
The higher $msft gets, the more pressure they have to be invasive and shittify everything they do.
It is not a five alarm fire for HIPAA. HIPAA doesn’t require that all file access be logged at all. HIPAA also doesn’t require that a CVE be created for each defect in a product.
End of the day, it’s a hand-wavy, “look at me” security blog. Don’t get too crazy.
https://www.hhs.gov/sites/default/files/january-2017-cyber-n...
The biggest thing is to have a plan and a policy. I'd agree in general that more auditing is better.
So my understanding is that the database/index Copilot used had already crawled this file, so of course it would not need to access the file to be able to tell you the information in it.
But then, how do you fix that? Do you then tie audit reports to accessing parts of the database directly? Or are we instructing the LLM to do something like...
"If you are accessing knowledge pinky promise you are going to report it so we can add an audit log"
This really needs some communication from Microsoft on exactly what happened here and how it is being addressed, since as of right now this should raise alarm bells for any company using Copilot where people have access to sensitive data that needs to be strictly monitored.
Well, the article did not say whether the unaudited access was possible in the opposite order after boot: first ask without a reference and get it without an audit log entry, then ask without any limitation and get an audit log entry.
Did Copilot just keep a buffer/copy/context of what it had before in the sequence described? I guess that would go without a log entry for any program. So what did MS change or fix? Producing extra audit log entries from user space?
The correct thing to do would be to have the vector search engine do the auditing (it probably already does, it just isn't exposed via Copilot) because it sounds like Copilot is deciding if/when to audit things that it does...
Microsoft tools can't be trusted anymore; something really broke since COVID...
I don’t personally see that company as reliable or trustworthy at all.
Satya Nadella is a cloud guy and a lot of the complaints people have of the changes in Microsoft products is that they are increasingly reliant on cloud infrastructure.
It will not successfully create a moat - turns out files are portable - but it will successfully peeve a huge number of users and institutions off, and inevitably cause years of litigation and regulatory attention.
Are there no adults left at Microsoft? Or is it now just Copilot all the way up?
From a brief glance at the O365 docs, it seems like the `AISystemPluginData` field indicates that the event in the screenshot showing the missing access is a Copilot event (or maybe they all get collapsed into one event; I'm not super familiar with O365 audit logs), and I'm inferring from the footnote that there isn't another SharePoint event somewhere in either the old or new version. But if there is one, that could at least be a mitigation if you needed to do such a search on activity from before the fix.
To you, the reader of this comment: if you thought like this, the problem is also in you.
But how then did MS "fix" this bug? Did they stop pre-ingesting, indexing, and caching the content? I doubt that.
Pushing (defaulting) organizations to feed all their data to Copilot and then not providing an audit trail of data access on that replica data store -- feels like a fundamental gap that should be caught by a security 101 checklist.
https://www.cisa.gov/sites/default/files/2025-03/CSRBReviewO...
And remember when the Microsoft CEO responded that they will care about security above all else?
https://blogs.microsoft.com/blog/2024/05/03/prioritizing-sec...
Doesn’t seem they’re doing that does it?
The bubble bursting will be epic.
This has genuinely made me work on switching to neovim. I previously demurred because I don't trust supply chains that are random public git repos full of emojis and Discords, but we've reached the point now where they're no less trustworthy than Microsoft. (And realistically, if you use any extensions on VS Code you're already trusting random repos, so you might as well cut out the middle man with an AI + spyware addiction and difficulties understanding consent.)
I'd switch to VSCodium but I use the WSL and SSH extensions :(
There are employers where you don't have to use anything from Microsoft during work hours either.
Well put.
The fundamental flaw is in trying to employ nondeterministic content generation based on statistical relevance defined by an unknown training data set, which is what commercial LLM offerings are, in an effort to repeatably produce content satisfying a strict mathematical model (program source code).
I've literally been employing nondeterministic content generation based on statistical relevance defined by an unknown training data, to repeatably produce content satisfying a strict mathematical model for months now.
99.99% of the code in that B2B SaaS for finding the cheapest industrial shipping option isn't novel.
That's like saying 99.99% of the food people eat consists of protein, carbohydrates, fats, and/or vegetables and therefore isn't novel. The implication being a McDonald's Big Mac and fries is the same as a spinach salad.
The only way someone could believe all food is the same as a Big Mac and fries is if this is all they ate and knew nothing else.
Hyperbole never ends well, and neither does assuming novelty requires rarity or uniqueness, as distinct combinations of programmatic operations which deliver value in a problem domain are the very definition of "new in an interesting way."
Just like how Thai noodles have proteins, carbohydrates, fats, and/or vegetables, yet are nothing like a Big Mac and fries.
The equivalent of not using LLMs in your workflow as a software engineer today isn't eating whole foods. That might have been true a year ago, but today it's becoming more and more equivalent to a fruit only diet.
Of note too is that the same "systems made out of meat" have been producing content satisfying the strict mathematical model for decades and continue to do so beyond the capabilities of the aforementioned algorithms.
Yes, humans exceed the capability of machines, until they don't. Machines exceed humans in more and more domains.
The style of argument you made about the nature of the machinery used applies just as well (maybe better) to humans. To get a valid argument, we'll need to be more nuanced.
> It's usually not the same pile of meat defining the problem and solving the problem.
True, but this distinction is also irrelevant.
The point is that problems capable of being solved by software systems are identified, reified, and then determined to be solved by people. Regardless of the tooling used to do so and the number of people involved.
> Yes, humans exceed the capability of machines, until they don't. Machines exceed humans in more and more domains.
But machines do not, and cannot, exceed humans in the domain of "understanding what a human wants" because this type of understanding is intrinsic to people by definition. Machines can do a lot of things, things which can be amazing and are truly beneficial to mankind, but they cannot understand as people colloquially use this term since they are not people.
I believe a decent analogy for this situation is how people will never completely understand the communication whales use with each other the way whales do themselves. There may someday exist the ability to translate their communication into a semblance of human language, but that would be only what we think is correct and not the same as being a whale.
You seem to rule this out, but despite having similar biology and wants, humans misunderstand others' intents and miss cues a lot.
--
Is it impossible that humans could build a system to know what a whale wants, based on its vocalization, that does better than the typical whale? Do we know that whales do really great at this, even?
There is no such thing for AI. No ledger, no track record, no reproducibility.
I never claimed that there is one type of ultrageneric ledger that works for all areas of research. But somehow, the LLM world still thinks that is the case for whatever reason.
I'm asking because I read somewhere that "AI produced output cannot be copyrighted". But what if I modify that output myself? I am then a co-creator, right, and I think I should have a right to some copyright protection.
The answer that most aligns with current precedent to my knowledge is that the parts you modify are protected by your copyright, but the rest remains uncopyrightable. With the exception of any chunks generated that align with someone's existing copyrighted code, as long as those chunks are substantial and unique enough.
Take the case of Linda Yaccarino. Ordinarily, if a male employee publicly and sexually harassed his female CEO on Twitter, he would (and should) be fired immediately. When Grok did that though, it's the CEO who ended up quitting.
>> This is the question I keep asking leaders (I literally asked a VP this question once in an all hands). How do we approach the risk associated mistakes made by AI?
> What was the answer? Asking for a vp friend
This is a difficult issue to tackle, no doubt. What follows drifts into the philosophical realm by necessity.
Software exists to provide value to people. Malicious software qualifies as such due to the desires of the actors which produce same, but will no longer be considered here as this is not germane.
AI is an umbrella term for numerous algorithms having wide ranging problem domain applicability and often can approximate near-optimal solutions using significantly less resources than other approaches. But they are still algorithms, capable of only one thing - execute their defined logic.
Sometimes this logic can produce results similar to be what a person would in a similar situation. Sometimes the logic will produce wildly different results. Often there is significant value when the logic is used appropriately.
In all cases AI algorithms do not possess the concept of understanding. This includes derivatives of understanding such as:
- empathy
- integrity
- morals
- right
- wrong
Which brings us back to part of the first quoted post: To quote IBM, "A computer can never be held accountable."
Accountability requires justification of actions taken or lack thereof, which demands the ability to explain why said actions were undertaken relative to other options, and implies a potential consequence be imposed by an authority. Algorithms can partially "justify their output" via strategic logging, but that's about it.
Which is why "a computer can never be held accountable." Because it is a machine, executing the instructions ultimately initiated by one or more persons whom can be held accountable.
cf the Post Office scandal in the UK which was partly helped along by the 1999 change in law[1] which repealed the 1984 stance that "computer evidence is not permissible unless it is shown to be working correctly at the time"[0]; i.e. that a computer was now presumed to be working correctly and it was up to the defence to prove otherwise.
[0] https://www.legislation.gov.uk/ukpga/1984/60/section/69/1991...
[1] https://www.legislation.gov.uk/ukpga/1999/23/section/60/1999...
But any argument seeking to dunk on LLMs needs to not also apply equally to the alternative (humans).
Maybe you can argue we don't use statistical completion and prediction as a heavy underpinning to our reasoning, but that's hardly settled.
Nah-- you will have to try harder to make an argument that really focuses on how LLMs are different from the alternative.
If you already have your entire information infrastructure in Office 365 (including all email, Excel sheets with material non-public information etc) I think this point is moot. Why would MS abuse information only from Copilot and not the rest of its products when the legal agreements permit them to do neither?
I'm personally less concerned about Microsoft's impact on safety in terms of software development than I am with how all my data is handled by the public sector in Denmark. At this point they shouldn't be allowed to use Windows.
Unless you think humans code reviewing humans is pointless because errors sometimes still slip through?
They somehow don't understand how they are breaking their own business models. We can only assume it's a quick spin-up cash grab before they jack up prices to unbelievable corp-only levels.