[WorkplaceAI] SharePoint Online stack connector#248737
[WorkplaceAI] SharePoint Online stack connector#248737mattnowzari merged 33 commits intoelastic:mainfrom
Conversation
… not give a warning when attemping to access the output contents in a foreach step
🔍 Preview links for changed docs |
…n callGraphAPI that LLMs can use to forumlate their own Graph API calls
| /** | ||
| * Connector output schemas (from connector specs v2) | ||
| */ | ||
| export const ConnectorSpecsOutputSchemas = new Map<string, Record<string, z.ZodSchema>>( |
There was a problem hiding this comment.
Note to reviewers - This and the code in schema.ts was added to fix the Workflow YAML editor giving warnings when stringing together the ouputs of one SPO step with other steps.
Cards on the table, I kind of let Cursor take the reigns on these particular schema code diffs. They don't seem to have any adverse side effects but if others think they'd be better served as a separate PR I can remove them.
florent-leborgne
left a comment
There was a problem hiding this comment.
Leaving some suggestions to move the docs to a slightly different location of the Connectors section
docs/reference/connectors-kibana/sharepoint-online-action-type.md
Outdated
Show resolved
Hide resolved
docs/reference/connectors-kibana/_snippets/elastic-connectors-list.md
Outdated
Show resolved
Hide resolved
docs/reference/connectors-kibana/sharepoint-online-action-type.md
Outdated
Show resolved
Hide resolved
seanstory
left a comment
There was a problem hiding this comment.
Nice stuff! Most of my comments are from knowing too much about sharepoint specifically.
docs/reference/connectors-kibana/sharepoint-online-action-type.md
Outdated
Show resolved
Hide resolved
| }; | ||
| } | ||
|
|
||
| const contentUrl = `${baseUrl}/content`; |
There was a problem hiding this comment.
Should we try to use the /content URL first, before we return a link to a download URL?
This downloadUrl path feels misleading to me. I'd expect that to be the input to a downloadDriveItem tool, not the output.
There was a problem hiding this comment.
Asking some clarifying questions 😄
Should we try to use the /content URL first, before we return a link to a download URL?
Are you suggesting we should have a more automated behavior, where we attempt /content and simply return downloadUrl to the user if those fail (most likely bc its a PDF or .docx or something). Not entirely opposed to this if so, just want to make sure I understand. I have a lightly held opinion that its not amazing if actions return two different types of data given a single input set - either doc contents or a URL - but again, it's lightly held.
I'd expect that to be the input to a downloadDriveItem tool, not the output.
The way I envisioned this working was it'd accept a driveId and itemId as input, as they are outputs of other tools, then attempt get the content in a format the LLM could work with, and then give the user the downloadUrl if it can't...which means, In its current form, the LLM would call this tool again with a different input set. I'm not sure how ideal that behavior is, either.
A big simplification would be to just have getDriveItems return the downloadUrl as part of its payload and then have this action only accept downloadUrls as input, but I'd have to test how well this works.
Hopefully I've understood what your concerns are, if not please course-correct me 🫡
There was a problem hiding this comment.
Are you suggesting we should have a more automated behavior, where we attempt /content and simply return downloadUrl to the user if those fail
I haven't done any googling or investigation here. So take this with a grain of salt. But my concern is that we might have some responses where there's a content endpoint that can:
- give us the actual text contents
and there's a dowloadURL where we could:
- download a big file
- ship that big file somewhere else for processing
- get the actual text contents
If we can jump straight to the text, I want to make sure we do, to avoid unnecessary bandwidth, compute, and latency.
A big simplification would be to just have getDriveItems return the downloadUrl as part of its payload and then have this action only accept downloadUrls as input, but I'd have to test how well this works.
I think that could be a good piece of metadata to include. The flow I'd expect is:
- getMany (mostly metadata)
- LLM introspection (do any of these look promising?)
- fetchOneorSeveral (full contents, using metadata from 1 to limit the number of results)
If downloadUrl isn't included in 1, and another pass has to be done to get that metadata, this feels inefficient.
There was a problem hiding this comment.
Ok, I think I got it:
- We should make sure to return
downloadUrls as part of our getMany-style actions for drive items - I'm going to create a new action (
downloadItemFromURLor something) that can acquire the file via adownloadUrl, which we can hopefully use as an action for shipping more complicated formats elsewhere for processing - I'm going to remove the
downloadUrlcode path from the currentdownloadDriveItemaction. This existing action should just be our way of getting straight to text content. I'll also remove thebase64encoding format too.
Net result will be clearer definitions:
- An action (
downloadDriveItem) that can directly get text content - Another action (
downloadItemFromURL) can download a file via URL to potentially farm out for further processing
🤞 Hopefully I've understood the concerns and provided a good way forward!
There was a problem hiding this comment.
👍 this sounds like a reasonable path forward. Thanks!
...latform/packages/shared/kbn-connector-specs/src/specs/sharepoint_online/sharepoint_online.ts
Outdated
Show resolved
Hide resolved
...latform/packages/shared/kbn-connector-specs/src/specs/sharepoint_online/sharepoint_online.ts
Outdated
Show resolved
Hide resolved
...latform/packages/shared/kbn-connector-specs/src/specs/sharepoint_online/sharepoint_online.ts
Outdated
Show resolved
Hide resolved
...latform/packages/shared/kbn-connector-specs/src/specs/sharepoint_online/sharepoint_online.ts
Outdated
Show resolved
Hide resolved
...latform/packages/shared/kbn-connector-specs/src/specs/sharepoint_online/sharepoint_online.ts
Show resolved
Hide resolved
… downloadDriveItem + getDriveItems returns a downloadUrl as part of its params
...latform/packages/shared/kbn-connector-specs/src/specs/sharepoint_online/sharepoint_online.ts
Show resolved
Hide resolved
florent-leborgne
left a comment
There was a problem hiding this comment.
LGTM for docs, thanks!
|
Can we add a Team or Feature label so that it's categorized nicely from the start in the release notes? See https://stunning-adventure-qrvr1k2.pages.github.io/release-notes/kibana-pr-best-practices-for-dev/ (Elastic internal) Thanks! |
| .array(z.enum(['site', 'list', 'listItem', 'drive', 'driveItem'])) | ||
| .optional() | ||
| .describe('Entity types to search'), | ||
| region: z |
There was a problem hiding this comment.
Note to reviewers - apparently region is required for Search after all :(
It's the only place where it's required, so I've made it an optional input with NAM as the default, so an LLM can always pass it if necessary. This way, we don't have to ask for it at connector set-up time.
That being said, I don't feel terribly strongly about this, so if we want to revert to asking for a region at set-up time I can do so.
There was a problem hiding this comment.
We can have search region be an optional field, something like:
schema: z.object({
region: z
.enum(['NAM', 'EUR', 'APC', 'LAM', 'MEA'])
.optional()
.default('NAM')
.describe(
'Search region (NAM=North America, EUR=Europe, APC=Asia Pacific, LAM=Latin America, MEA=Middle East/Africa)'
)
.meta({
label: 'Search region',
}),
}),Feel free to leave this for a follow-up, and only if we're interested.
There was a problem hiding this comment.
Wait, sanity check: we're doing SPO connector that does not run on behalf of the user, is it correct?
There was a problem hiding this comment.
That's correct I believe, it does not run on the behalf of a user
There was a problem hiding this comment.
@artem-shelkovnikov , the "on behalf of the user" bit is blocked by #246655.
Once we merge that, we can simply change this line to
type: 'oauth_authorization_code_grant'
instead of
type: 'oauth_client_credentials'
and everything should "just work".
But that PR is probably going to be blocked for a bit. Hence the temporary alternative auth mechanism.
There was a problem hiding this comment.
I think it'll be a bit different in details - for example there's an endpoint in the PR that uses region param because it's needed by application permission requests.
We'd need to update some endpoints here and there + update the instructions for setting up the delegated permissions instead of application permissions
seanstory
left a comment
There was a problem hiding this comment.
LGTM. I know there's still plenty of follow-up work to do, but this PR is already quite large, so I'm supportive of getting it in.
...latform/packages/shared/kbn-connector-specs/src/specs/sharepoint_online/sharepoint_online.ts
Outdated
Show resolved
Hide resolved
|
Hey there @elastic/workflows-eng 👋 - would appreciate a review of this PR whenever y'all have a moment to do so 🙏 |
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]Module Count
Async chunks
Page load bundle
Unknown metric groupsasync chunk count
History
|
…iew_cps * commit '5f7fec57cb01883038810bd735a0666683b49904': (116 commits) [Security Solution][Attacks/Alerts][Setup and miscellaneous] Advanced setting to control feature visibility (elastic#250157) (elastic#250830) Fix synthtrace `fetch` usage (elastic#250950) [APM] Add Nodes and Edges components and selection logic (elastic#250937) [Docs] Update alerting-settings.md and add serverless value for one parameter (elastic#250842) [Agent Builder] filestore: initial implementation (elastic#250043) [CPS] Support CPS in Vega ESQL (elastic#250693) Adjustments to cascade document esql helpers (elastic#250560) [Security Solutions] Trial Companion - adds ai chat and elastic agent detectors (elastic#250908) [Obs Presentation] Code Scanning Alert Fixes (elastic#250858) [performance] add return and refresh render scenarios to dashboard journeys (elastic#250939) skip failing test suite (elastic#245458) Add Cloud Forwarder onboarding tile to O11y Solution (elastic#250325) [Traces] Remove APM unified trace waterall embeddable registration (elastic#250808) [Discover] [Metrics] Fix: metrics grid titles do not update on order change (elastic#250963) [a11y] Fix Eui modal title annoucment (elastic#250459) [Cloud Security] [Fleet] Add cloud connector access scope for input or package level credential definitions (elastic#250280) [WorkplaceAI] SharePoint Online stack connector (elastic#248737) [Response Ops][Task Manager] Update functions do not handle API key invalidation (elastic#249109) [Osquery] Remove @kbn/timelines-plugin dependency from osquery plugin (elastic#250055) [One Discover][Logs UX] Update OpenTelemetry Semantic Conventions (elastic#250346) ...
## Summary This PR adds a SharePoint Online `v2` stack connector. The `actions` this connector provides are: - search (accepts a query and entity type to search) - getAllSites (retrieve all sites, potentially large return) - getSitePages (retrieve all pages of a given site) - getSitePageContents (get the content of a site page) - getSite (retrieve a single given site) - getSiteDrives (retrieve the drives of a single given site) - getDriveItems (retrieve all items of a single given drive) - downloadDriveItems (download the contents of a single drive item via a driveId and itemId) - downloadItemFromURL (download the contents of a single drive item via its downloadUrl) - getSiteLists (retrieve the lists of a single given site) - getSiteListItems (retrieve the items of single given list of a single given site) - callGraphAPI (transparent action that allows an LLM with knowledge of the Microsoft Graph API to formulate and call endpoints at will. Intended to be a power user feature) ## What this PR does not include - Data Source registration - Workflow YAML definitions These will be provided in a fast-follow PR, as this one is quite sizeable already. ## Checklist Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well. - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) ## Release Notes - A new connector for SharePoint Online has been added to the available list of connectors --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Florent Le Borgne <florent.leborgne@elastic.co>
Summary
This PR adds a SharePoint Online
v2stack connector.The
actionsthis connector provides are:What this PR does not include
These will be provided in a fast-follow PR, as this one is quite sizeable already.
Checklist
Check the PR satisfies following conditions.
Reviewers should verify this PR satisfies this list as well.
release_note:*label is applied per the guidelinesRelease Notes