Skip to content

Fix/job registry#260

Merged
l1ttps merged 9 commits intomainfrom
fix/job-registry
Jan 27, 2026
Merged

Fix/job registry#260
l1ttps merged 9 commits intomainfrom
fix/job-registry

Conversation

@mizhm
Copy link
Copy Markdown
Contributor

@mizhm mizhm commented Jan 27, 2026

No description provided.

Copilot AI review requested due to automatic review settings January 27, 2026 11:37
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @mizhm, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing data integrity and streamlining data processing within the core API. It introduces crucial deduplication mechanisms for assets, ports, and vulnerabilities, preventing redundant entries and improving database efficiency. Additionally, it refines the handling of job payload results by setting clearer default values and optimizing data transformation, contributing to more predictable and reliable data flows. A minor refactoring in the tools service also contributes to overall code cleanliness.

Highlights

  • Data Deduplication: Implemented robust deduplication logic across various data types (assets, ports, and vulnerabilities) within the DataAdapterService to ensure data uniqueness before persistence.
  • Job Payload Transformation Enhancement: Improved the DataPayloadResult DTO in jobs-registry to provide default values for error (to false) and raw (to null), and refined the payload transformation to prioritize existing values.
  • Tools Service Refactoring: Simplified the getTools method in ToolsService by directly returning query results, making the code more concise and removing unnecessary intermediate array manipulations.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mizhm mizhm requested review from l1ttps and removed request for Copilot January 27, 2026 11:38
@mizhm
Copy link
Copy Markdown
Contributor Author

mizhm commented Jan 27, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several valuable improvements. The data deduplication logic added to data-adapter.service.ts is a solid enhancement for data integrity, preventing issues from duplicate entries. The changes in jobs-registry.dto.ts make the data transfer object more robust and flexible. Additionally, the refactoring in tools.service.ts simplifies the code and resolves a bug related to fetching tools. Overall, these are high-quality changes. I have one suggestion to further improve code style.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces data deduplication in data-adapter.service.ts before database insertion for subdomains, ports, and vulnerabilities. This is a good practice to prevent errors and ensure data consistency. The PR also refactors the getToolByNames method in tools.service.ts to fix a bug. While the refactoring is an improvement, I've found a logic issue in the new implementation when isInstalled is true, where it fails to include built-in tools. I've provided a suggestion to correct this behavior.

Copilot AI review requested due to automatic review settings January 27, 2026 15:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses issues in the job registry by refactoring tool lookup logic and adding deduplication to prevent duplicate data insertion.

Changes:

  • Refactored getToolByNames method in tools.service.ts to change how installed tools are queried
  • Added deduplication logic in data-adapter.service.ts for assets, ports, and vulnerabilities to prevent duplicate entries

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File Description
core-api/src/modules/tools/tools.service.ts Rewrote the tool lookup logic when isInstalled=true, removing workspace repository queries and built-in tools concatenation in favor of a direct query with type filtering
core-api/src/modules/data-adapter/data-adapter.service.ts Added deduplication using Map for assets (by value), ports (by Set), and vulnerabilities (by fingerprint) before database insertion

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +245 to +248
// Deduplicate based on fingerprint
const uniqueValues = Array.from(
new Map(values.map((v) => [v.fingerprint, v])).values(),
);
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deduplication logic added here is not covered by tests. While the existing tests in data-adapter.service.spec.ts verify the basic vulnerability insertion flow, they don't test the scenario where duplicate vulnerabilities (with the same fingerprint) are provided in the input data.

Consider adding a test case that verifies when the input data contains multiple vulnerabilities with the same fingerprint (derived from the same name, asset ID, and tool ID), only one is inserted.

Copilot uses AI. Check for mistakes.
Comment on lines +59 to +62
// Deduplicate data based on value
const uniqueData = Array.from(
new Map(data.map((asset) => [asset.value, asset])).values(),
);
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this deduplication approach using Map is correct, there's a subtle behavior to note: when duplicate values exist in the input array, the Map will keep the last occurrence (since subsequent set operations with the same key overwrite previous values). This means if there are multiple assets with the same value but different properties (like dnsRecords), only the properties from the last occurrence will be retained.

Consider whether this is the intended behavior. If it is, this should be documented. If not, you may want to implement merge logic or keep the first occurrence instead by checking if the key already exists before adding it to the Map.

Suggested change
// Deduplicate data based on value
const uniqueData = Array.from(
new Map(data.map((asset) => [asset.value, asset])).values(),
);
// Deduplicate data based on value, keeping the first occurrence for each value
const assetByValue = new Map<string, Asset>();
for (const asset of data) {
if (!assetByValue.has(asset.value)) {
assetByValue.set(asset.value, asset);
}
}
const uniqueData = Array.from(assetByValue.values());

Copilot uses AI. Check for mistakes.
Comment on lines +59 to +62
// Deduplicate data based on value
const uniqueData = Array.from(
new Map(data.map((asset) => [asset.value, asset])).values(),
);
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the input data array is empty, the deduplication will produce an empty uniqueData array. This could lead to issues downstream:

  1. On line 64, uniqueData.find() will return undefined when the array is empty
  2. This causes primaryAssets?.dnsRecords on line 73 to be undefined, which is handled safely by the optional chaining
  3. On lines 89-93, mapping over an empty array will produce an empty values array for the insert

While the code handles the empty array case without crashing, the database insert with an empty values array might behave unexpectedly depending on the TypeORM version. Consider adding an early return or explicit check if data.length === 0 or uniqueData.length === 0 to make the intent clearer and avoid unnecessary database operations.

Copilot uses AI. Check for mistakes.
Comment on lines +442 to 446
workspaceTools: {
tool: {
name: In(names),
},
},
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query structure is incorrect. The workspaceTools relation on the Tool entity is a collection of WorkspaceTool records, where each WorkspaceTool has a tool property pointing back to the Tool. The nested workspaceTools.tool.name filter doesn't make sense in this context.

Additionally, this query is missing a critical parameter: it needs a workspaceId to determine which workspace to check for installed tools. Without the workspace context, this query will return any tool that has ANY workspace association, not just the specific workspace the caller is interested in.

The original implementation queried the workspaceToolRepository with workspace context and then mapped to get the tools, which was the correct approach. To fix this properly:

  1. Add a workspaceId parameter to the method signature
  2. Query the workspaceToolRepository to find tools installed in that specific workspace
  3. Combine those results with built-in tools (which should be available in all workspaces)

Copilot uses AI. Check for mistakes.
Comment on lines +435 to 449
return await this.toolsRepository.find({
where: [
{
name: In(names),
type: WorkerType.BUILT_IN,
},
{
workspaceTools: {
tool: {
name: In(names),
},
},
relations: ['tool'],
})
.then((res) => res.map((r) => r.tool));
}

tools = await this.toolsRepository.find({
where: {
name: In(names),
},
},
],
});
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of the built-in tools concatenation logic is a breaking change. The original code explicitly added tools from the builtInTools array that matched the requested names. This ensured that built-in tools were always available regardless of database state.

While the new code attempts to query for BUILT_IN type tools from the database, there's a fundamental difference: the original builtInTools array may contain tools with additional runtime properties (like parser functions) that aren't persisted to the database. Removing this concatenation could break functionality that depends on these runtime-only properties.

If the intent is to rely solely on database records, ensure that all necessary tool properties are properly persisted during the onModuleInit phase and that no runtime-only properties are required by consumers of this method.

Copilot uses AI. Check for mistakes.
Comment on lines +435 to 449
return await this.toolsRepository.find({
where: [
{
name: In(names),
type: WorkerType.BUILT_IN,
},
{
workspaceTools: {
tool: {
name: In(names),
},
},
relations: ['tool'],
})
.then((res) => res.map((r) => r.tool));
}

tools = await this.toolsRepository.find({
where: {
name: In(names),
},
},
],
});
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query is missing the relations configuration needed to load the workspaceTools relation. In TypeORM, when using nested relation paths in the where clause (like workspaceTools.tool.name), you must also explicitly specify those relations in the query configuration. Without this, TypeORM won't join the related tables and the query will fail or return no results.

Even if the query structure is corrected, you'll need to add something like:

relations: ['workspaceTools']

to the find options.

Copilot uses AI. Check for mistakes.
Comment on lines +59 to +62
// Deduplicate data based on value
const uniqueData = Array.from(
new Map(data.map((asset) => [asset.value, asset])).values(),
);
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deduplication logic added here is not covered by tests. While the existing test in data-adapter.service.spec.ts verifies the basic subdomain insertion flow, it doesn't test the scenario where duplicate assets (with the same value) are provided in the input data.

Consider adding a test case that verifies:

  1. When the input data contains multiple assets with the same value, only one is inserted
  2. The deduplicated asset is the correct one (e.g., the last one in case of conflicts)

Copilot uses AI. Check for mistakes.
Comment on lines +167 to +168
// Deduplicate ports
const uniquePorts = [...new Set(data.filter((port) => !isNaN(port)))];
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deduplication logic added here is not covered by tests. While the existing tests in data-adapter.service.spec.ts verify the basic port scanner flow, they don't test the scenario where duplicate ports are provided in the input data.

Consider adding a test case that verifies when the input data contains duplicate port numbers, only unique ports are inserted and used to create asset services.

Copilot uses AI. Check for mistakes.
@l1ttps l1ttps merged commit b67d4e5 into main Jan 27, 2026
15 checks passed
@l1ttps l1ttps deleted the fix/job-registry branch February 1, 2026 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants