Skip to content

support DNS SRV record by RFC2782#2496

Closed
bootjp wants to merge 5 commits into
fluent:masterfrom
bootjp:master
Closed

support DNS SRV record by RFC2782#2496
bootjp wants to merge 5 commits into
fluent:masterfrom
bootjp:master

Conversation

@bootjp

@bootjp bootjp commented Jul 14, 2019

Copy link
Copy Markdown
Contributor

Which issue(s) this PR fixes:
Fixes : None

What this PR does / why we need it:
https://groups.google.com/forum/#!topic/fluentd/Lxx0-rVtsfo

Support for RFC2782 in out_forward allows for more flexible fluentd operations
Specifically, public cloud service discovery

Docs Changes:
It will be necessary when this request is merged.
I will send a pull request to the document separately.

Release Note:

To the reviewer
SRV#available? may be a better implementation.
For example, fluent/compat/socket_util might seem like a smarter way to implement it, but fluentd's code is too complex for my technical capabilities to figure out how to do it.
I will correct the code better if you give me advice.

@bootjp bootjp force-pushed the master branch 2 times, most recently from adb4c61 to c0231ca Compare July 15, 2019 04:12
bootjp added 2 commits July 16, 2019 13:44
Signed-off-by: ぶーと / Yoshiaki Ueda <oh@bootjp.me>
Signed-off-by: ぶーと / Yoshiaki Ueda <oh@bootjp.me>
@repeatedly

Copy link
Copy Markdown
Member

@ganmacs How about this? You are now working on pluggable SD on out_forward.

@ganmacs

ganmacs commented Jul 17, 2019

Copy link
Copy Markdown
Member

You are now working on pluggable SD on out_forward.

Right. but anyway, I'll take a look this later.


# @param host string
# @return [SRV]
def resolve_srv(host)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SRV records should be cached for performance.
And specifying the timing of purging cache for users is better.

assert{ logs.any?{|log| log.include?("no response from node. regard it as unavailable.") } }
end


Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test which checks if the SRV record service discovery works?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should this use a real domain name?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. mocking or stubbing seems enough.

bootjp added 2 commits July 18, 2019 19:23
Signed-off-by: bootjp <oh@bootjp.me>
Signed-off-by: bootjp <oh@bootjp.me>

@ganmacs ganmacs left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think updating caches should be invoked in another thread since performance issue.
Creating EvenTime and comparing it every write isn't efficient. And Node#resolve_dns is called by multiple threads, right? This means that if there is the only Node which uses this feature and multiple threads try to write (and the cache is expired), every thread tries to resolve the SRV record.

end

# swap config host to srv response host.
@srv_host_mutex.synchronize do

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If thread1(t1) and thread2(t2) which use the same Node try to write, thread1 and thread2 can use the same target.
Is it intentional?

  1. t1 does L918(and get target1)
  2. t2 does L918(and get target2)
  3. t1 does L930~L938,(and set @host and @port target1)
  4. context switch from t1 to t2 (t1 is still L939)
  5. t2 does L930~L938,(and set @host and @port target2)
  6. t1 and t2 send message to target2

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to lock other places where node uses @host and @port(e.g. https://github.com/fluent/fluentd/pull/2496/files#diff-40c0e8c88efdc9a297a9fc63bc2bc69fR948).
This code can be a race condition.

  1. t1 tries to do this line and evaluates arguments @host
  2. context switch happens to t2
  3. t2 does L930~L938,(and set @host and @port to another value)
  4. context switch happens to t1
  5. @port is invalid for evaluating @host at step 1 since @port has changed by t2

@bootjp

bootjp commented Oct 8, 2019

Copy link
Copy Markdown
Contributor Author

Close because it seems to be implemented in #2541 .

@bootjp bootjp closed this Oct 8, 2019
@repeatedly

Copy link
Copy Markdown
Member

We are now considering DNS SRV record based SD plugin in the core.
If you are interested in it, patches are welcome :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants