Conversation
Summary of Changes

Hello @mizhm, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing data integrity and streamlining data processing within the core API. It introduces deduplication mechanisms for assets, ports, and vulnerabilities, preventing redundant entries and improving database efficiency. Additionally, it refines the handling of job payload results by setting clearer default values and optimizing data transformation, contributing to more predictable and reliable data flows. A minor refactoring in the tools service also contributes to overall code cleanliness.
Code Review
This pull request introduces several valuable improvements. The data deduplication logic added to data-adapter.service.ts is a solid enhancement for data integrity, preventing issues from duplicate entries. The changes in jobs-registry.dto.ts make the data transfer object more robust and flexible. Additionally, the refactoring in tools.service.ts simplifies the code and resolves a bug related to fetching tools. Overall, these are high-quality changes. I have one suggestion to further improve code style.
Code Review
This pull request introduces data deduplication in data-adapter.service.ts before database insertion for subdomains, ports, and vulnerabilities. This is a good practice to prevent errors and ensure data consistency. The PR also refactors the getToolByNames method in tools.service.ts to fix a bug. While the refactoring is an improvement, I've found a logic issue in the new implementation when isInstalled is true, where it fails to include built-in tools. I've provided a suggestion to correct this behavior.
Pull request overview
This PR addresses issues in the job registry by refactoring tool lookup logic and adding deduplication to prevent duplicate data insertion.
Changes:
- Refactored the getToolByNames method in tools.service.ts to change how installed tools are queried
- Added deduplication logic in data-adapter.service.ts for assets, ports, and vulnerabilities to prevent duplicate entries
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| core-api/src/modules/tools/tools.service.ts | Rewrote the tool lookup logic when isInstalled=true, removing workspace repository queries and built-in tools concatenation in favor of a direct query with type filtering |
| core-api/src/modules/data-adapter/data-adapter.service.ts | Added deduplication using Map for assets (by value), ports (by Set), and vulnerabilities (by fingerprint) before database insertion |
```typescript
// Deduplicate based on fingerprint
const uniqueValues = Array.from(
  new Map(values.map((v) => [v.fingerprint, v])).values(),
);
```
The deduplication logic added here is not covered by tests. While the existing tests in data-adapter.service.spec.ts verify the basic vulnerability insertion flow, they don't test the scenario where duplicate vulnerabilities (with the same fingerprint) are provided in the input data.
Consider adding a test case that verifies when the input data contains multiple vulnerabilities with the same fingerprint (derived from the same name, asset ID, and tool ID), only one is inserted.
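That scenario can be sketched in isolation as follows (the row shape and fingerprint format are illustrative assumptions, not the service's real entities):

```typescript
// Illustrative vulnerability rows; a real fingerprint would be derived
// from the vulnerability name, asset ID, and tool ID.
interface VulnRow {
  fingerprint: string;
  name: string;
}

const values: VulnRow[] = [
  { fingerprint: "sqli:asset-1:tool-1", name: "SQL Injection" },
  { fingerprint: "sqli:asset-1:tool-1", name: "SQL Injection" }, // duplicate
  { fingerprint: "xss:asset-1:tool-1", name: "XSS" },
];

// The same Map-based deduplication the PR adds: one row per fingerprint.
const uniqueValues = Array.from(
  new Map(values.map((v) => [v.fingerprint, v])).values(),
);

console.log(uniqueValues.length); // 2
```

A unit test along these lines would feed such an input through the service and assert that only one row per fingerprint reaches the insert.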
```typescript
// Deduplicate data based on value
const uniqueData = Array.from(
  new Map(data.map((asset) => [asset.value, asset])).values(),
);
```
While this deduplication approach using Map is correct, there's a subtle behavior to note: when duplicate values exist in the input array, the Map will keep the last occurrence (since subsequent set operations with the same key overwrite previous values). This means if there are multiple assets with the same value but different properties (like dnsRecords), only the properties from the last occurrence will be retained.
Consider whether this is the intended behavior. If it is, this should be documented. If not, you may want to implement merge logic or keep the first occurrence instead by checking if the key already exists before adding it to the Map.
Suggested change:

```diff
- // Deduplicate data based on value
- const uniqueData = Array.from(
-   new Map(data.map((asset) => [asset.value, asset])).values(),
- );
+ // Deduplicate data based on value, keeping the first occurrence for each value
+ const assetByValue = new Map<string, Asset>();
+ for (const asset of data) {
+   if (!assetByValue.has(asset.value)) {
+     assetByValue.set(asset.value, asset);
+   }
+ }
+ const uniqueData = Array.from(assetByValue.values());
```
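The behavioral difference between the two approaches can be demonstrated in isolation (the asset shape here is an illustrative stand-in for the PR's entities):

```typescript
// Two records with the same value but different properties.
interface AssetRow {
  value: string;
  dnsRecords?: string[];
}

const data: AssetRow[] = [
  { value: "a.example.com", dnsRecords: ["A 1.2.3.4"] },
  { value: "a.example.com" }, // later duplicate without dnsRecords
];

// Map-based dedup: Map#set overwrites, so the LAST occurrence wins
// and the first record's dnsRecords are lost.
const lastWins = Array.from(new Map(data.map((a) => [a.value, a])).values());

// First-wins variant: only insert a key if it is not already present.
const firstWins = new Map<string, AssetRow>();
for (const a of data) {
  if (!firstWins.has(a.value)) firstWins.set(a.value, a);
}

console.log(lastWins[0].dnsRecords); // undefined
console.log(firstWins.get("a.example.com")?.dnsRecords); // [ 'A 1.2.3.4' ]
```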
```typescript
// Deduplicate data based on value
const uniqueData = Array.from(
  new Map(data.map((asset) => [asset.value, asset])).values(),
);
```
When the input data array is empty, the deduplication will produce an empty uniqueData array. This could lead to issues downstream:
- On line 64, uniqueData.find() will return undefined when the array is empty
- This causes primaryAssets?.dnsRecords on line 73 to be undefined, which is handled safely by the optional chaining
- On lines 89-93, mapping over an empty array will produce an empty values array for the insert
While the code handles the empty array case without crashing, the database insert with an empty values array might behave unexpectedly depending on the TypeORM version. Consider adding an early return or explicit check if data.length === 0 or uniqueData.length === 0 to make the intent clearer and avoid unnecessary database operations.
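A minimal sketch of the suggested guard (the helper name and return shape are hypothetical; the real method operates on richer asset entities):

```typescript
// Hypothetical helper illustrating the early-return guard.
function dedupeAssets<T extends { value: string }>(data: T[]): T[] {
  // Make the empty-input case explicit and skip any downstream insert.
  if (data.length === 0) return [];
  return Array.from(new Map(data.map((a) => [a.value, a])).values());
}

console.log(dedupeAssets([]).length); // 0
console.log(dedupeAssets([{ value: "x" }, { value: "x" }]).length); // 1
```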
```typescript
workspaceTools: {
  tool: {
    name: In(names),
  },
},
```
This query structure is incorrect. The workspaceTools relation on the Tool entity is a collection of WorkspaceTool records, where each WorkspaceTool has a tool property pointing back to the Tool. The nested workspaceTools.tool.name filter doesn't make sense in this context.
Additionally, this query is missing a critical parameter: it needs a workspaceId to determine which workspace to check for installed tools. Without the workspace context, this query will return any tool that has ANY workspace association, not just the specific workspace the caller is interested in.
The original implementation queried the workspaceToolRepository with workspace context and then mapped to get the tools, which was the correct approach. To fix this properly:

- Add a workspaceId parameter to the method signature
- Query the workspaceToolRepository to find tools installed in that specific workspace
- Combine those results with built-in tools (which should be available in all workspaces)
```diff
+ return await this.toolsRepository.find({
+   where: [
+     {
+       name: In(names),
+       type: WorkerType.BUILT_IN,
+     },
+     {
+       workspaceTools: {
+         tool: {
+           name: In(names),
+         },
+       },
-     relations: ['tool'],
-   })
-   .then((res) => res.map((r) => r.tool));
- }
-
- tools = await this.toolsRepository.find({
-   where: {
-     name: In(names),
-   },
+     },
+   ],
+ });
```
The removal of the built-in tools concatenation logic is a breaking change. The original code explicitly added tools from the builtInTools array that matched the requested names. This ensured that built-in tools were always available regardless of database state.
While the new code attempts to query for BUILT_IN type tools from the database, there's a fundamental difference: the original builtInTools array may contain tools with additional runtime properties (like parser functions) that aren't persisted to the database. Removing this concatenation could break functionality that depends on these runtime-only properties.
If the intent is to rely solely on database records, ensure that all necessary tool properties are properly persisted during the onModuleInit phase and that no runtime-only properties are required by consumers of this method.
```diff
+ return await this.toolsRepository.find({
+   where: [
+     {
+       name: In(names),
+       type: WorkerType.BUILT_IN,
+     },
+     {
+       workspaceTools: {
+         tool: {
+           name: In(names),
+         },
+       },
-     relations: ['tool'],
-   })
-   .then((res) => res.map((r) => r.tool));
- }
-
- tools = await this.toolsRepository.find({
-   where: {
-     name: In(names),
-   },
+     },
+   ],
+ });
```
The query is missing the relations configuration needed to load the workspaceTools relation. In TypeORM, when using nested relation paths in the where clause (like workspaceTools.tool.name), you must also explicitly specify those relations in the query configuration. Without this, TypeORM won't join the related tables and the query will fail or return no results.
Even if the query structure is corrected, you'll need to add something like relations: ['workspaceTools'] to the find options.
```typescript
// Deduplicate data based on value
const uniqueData = Array.from(
  new Map(data.map((asset) => [asset.value, asset])).values(),
);
```
The deduplication logic added here is not covered by tests. While the existing test in data-adapter.service.spec.ts verifies the basic subdomain insertion flow, it doesn't test the scenario where duplicate assets (with the same value) are provided in the input data.
Consider adding a test case that verifies:
- When the input data contains multiple assets with the same value, only one is inserted
- The deduplicated asset is the correct one (e.g., the last one in case of conflicts)
```typescript
// Deduplicate ports
const uniquePorts = [...new Set(data.filter((port) => !isNaN(port)))];
```
The deduplication logic added here is not covered by tests. While the existing tests in data-adapter.service.spec.ts verify the basic port scanner flow, they don't test the scenario where duplicate ports are provided in the input data.
Consider adding a test case that verifies when the input data contains duplicate port numbers, only unique ports are inserted and used to create asset services.
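The behavior under test can be exercised directly with the PR's own expression (the sample port values are illustrative):

```typescript
// The Set-based deduplication the PR adds, run on input containing
// duplicates and a NaN entry (e.g. from a failed parse).
const data: number[] = [80, 443, 80, Number.NaN, 8080, 443];
const uniquePorts = [...new Set(data.filter((port) => !isNaN(port)))];

console.log(uniquePorts); // [ 80, 443, 8080 ]
```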
No description provided.