Skip to content

Conversation

@TrafalgarZZZ
Copy link
Member

Ⅰ. Describe what this PR does

Three parts of code changes have been made in this pull request:

  • Inject Fuse sidecar in a determined lexicographic order based on datasets' names. That is, we can make sure the mapping between sidecar container's name and the runtimes will be fixed when there are multiple FUSE sidecars to inject.
  • Add spec.fuse.metrics to JindoRuntime's CRD. See below for some examples.
  • Some inner logic used to support spec.fuse.metrics

Feature API

apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
...
spec:
  fuse:
    metrics:
      enabled: true
      scrapeTarget: All

Two fields can be specified under spec.fuse.metrics in JindoRuntime CRD.

  • With enabled: true, Fluid exposes metrics for all FUSE components, including FUSE mount pods and FUSE sidecar containers.
  • scrapeTarget has four valid values:
    • "": The default value, indicating Fluid will not automatically scrape FUSE metrics. (Under the hood, will not add special annotation that makes Prometheus discover FUSE components.)
    • MountPodOnly: Only FUSE mount pods will be discovered by Prometheus and metrics will be scraped.
    • SidecarOnly: Only FUSE sidecar containers will be discovered by Prometheus and metrics will be scraped.
    • All: Both FUSE mount pods and FUSE sidecar containers will be discovered by Prometheus and metrics will be scraped.

Discovered FUSE mount pod example with enabled metrics

apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.fuse.fluid.io/scrape: "true"
...
spec:
  containers:
  - name: jindofs-fuse
    ...
    ports:
    - name: jindo-metrics
      containerPort: 10001 # (default in ContainerNetwork) or <allocated random port> (in HostNetwork)
    args:
    - -okernel_cache
    - -oro
    - -oattr_timeout=3600
    - -oentry_timeout=3600
    - -ometrics_port=10001 # or <allocated randome port>

Discovered FUSE sidecar container example with enabled metrics

apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.fuse.fluid.io/scrape: "true"
  labels:
    container-dataset-mapping.sidecar.fluid.io/fluid-fuse-0: default_demo-dataset # <namespace>_<dataset_name>, used to identify the relationship between dataset and sidecar container.
spec:
  containers:
  - name: fluid-fuse-0
    args:
    - -okernel_cache
    - -oro
    - -oattr_timeout=3600
    - -oentry_timeout=3600
    - -ometrics_port=10001 # or <allocated randome port>
    ports:
    - name: jindo-metrics
      containerPort: 10001 # (default in ContainerNetwork) or <allocated random port> (in HostNetwork)

Ⅱ. Does this pull request fix one issue?

NONE

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

@TrafalgarZZZ
Copy link
Member Author

/hold

@codecov
Copy link

codecov bot commented Sep 29, 2024

Codecov Report

Attention: Patch coverage is 21.21212% with 78 lines in your changes missing coverage. Please review.

Project coverage is 58.24%. Comparing base (22e808c) to head (408d2d4).
Report is 31 commits behind head on master.

Files with missing lines Patch % Lines
...application/inject/fuse/mutator/mutator_default.go 0.00% 27 Missing ⚠️
pkg/ddc/jindocache/transform.go 29.41% 16 Missing and 8 partials ⚠️
pkg/ddc/jindocache/sync_runtime.go 15.38% 8 Missing and 3 partials ⚠️
...cation/inject/fuse/mutator/mutator_unprivileged.go 0.00% 8 Missing ⚠️
pkg/utils/map.go 0.00% 6 Missing ⚠️
pkg/ddc/base/runtime.go 77.77% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4331      +/-   ##
==========================================
- Coverage   62.90%   58.24%   -4.67%     
==========================================
  Files         483      563      +80     
  Lines       28570    31367    +2797     
==========================================
+ Hits        17972    18269     +297     
- Misses       8393    10837    +2444     
- Partials     2205     2261      +56     
Flag Coverage Δ
58.24% <21.21%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
….ScrapeTarget

Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
….ScrapeTarget

Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
@TrafalgarZZZ TrafalgarZZZ force-pushed the feat/enable_jindo_cache_fuse_metrics branch from d60b25b to b0e7f7f Compare October 8, 2024 02:32
@TrafalgarZZZ
Copy link
Member Author

/unhold

}

metricsEnabled, exists := fusesToUpdate.Spec.Template.ObjectMeta.Annotations[common.AnnotationPrometheusFuseMetricsScrapeKey]
if exists && metricsEnabled != common.True {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about letting the code exit early if the annotation exists but is not equal to common.True, reducing nesting?

if exists && metricsEnabled != common.True {
    e.Log.V(1).Info(fmt.Sprintf("Found user-defined annotation %s != %s, skip syncing.", common.AnnotationPrometheusFuseMetricsScrapeKey, common.True))
    return
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cheyang The nesting is necessary. L235-L248 updates FUSE's daemonset if changed==true. If the code exits early, the update will be skipped when runtime.Spec.Fuse.Resources is modified(at L213).

Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
@sonarqubecloud
Copy link

sonarqubecloud bot commented Oct 8, 2024

Copy link
Collaborator

@cheyang cheyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@fluid-e2e-bot
Copy link

fluid-e2e-bot bot commented Oct 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheyang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fluid-e2e-bot fluid-e2e-bot bot merged commit c84a499 into fluid-cloudnative:master Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants