search.sh may bind to an address unreachable by the query worker, ignoring potentially reachable ones #1316

@junhaoliao

Bug

The search.sh script, which wraps search.py -> native/search.py, may listen on an IP address that the query worker cannot reach when the machine's hostname is mapped to 127.0.0.1 in /etc/hosts. This breaks communication between the search script and the query workers in multi-machine deployments, and is easily reproducible on a single machine after orchestration is migrated to Docker Compose with isolated networking (in #1178).

Root cause

The existing implementation in search.py used socket.gethostbyname_ex(socket.gethostname()) to determine the IP addresses of the host:

ip_list = socket.gethostbyname_ex(socket.gethostname())[2]
if len(ip_list) == 0:
    logger.error("Couldn't determine the current host's IP.")
    return
host = ip_list[0]
for ip in ip_list:
    if ipaddress.ip_address(ip) not in ipaddress.IPv4Network("127.0.0.0/8"):
        host = ip
        break
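The selection logic in this snippet can be isolated to see how it behaves for different resolver results. The following is a minimal sketch, not the project's code: the function name `pick_listen_host` and the sample address `192.168.1.5` are hypothetical and chosen only for illustration.

```python
import ipaddress

LOOPBACK_NET = ipaddress.IPv4Network("127.0.0.0/8")


def pick_listen_host(ip_list):
    """Mirror the quoted logic: take the first non-loopback address,
    otherwise fall back to the first entry in the list."""
    if len(ip_list) == 0:
        return None
    host = ip_list[0]
    for ip in ip_list:
        if ipaddress.ip_address(ip) not in LOOPBACK_NET:
            host = ip
            break
    return host


# Hostname mapped only to a loopback address in /etc/hosts:
print(pick_listen_host(["127.0.1.1"]))  # 127.0.1.1

# Resolver also returns a non-loopback address (hypothetical value):
print(pick_listen_host(["127.0.0.1", "192.168.1.5"]))  # 192.168.1.5
```

When every candidate is a loopback address, the fallback is returned unchanged, so the choice depends entirely on what the resolver reports for the hostname.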

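When the hostname resolves to 127.0.0.1 via /etc/hosts, only ['127.0.0.1'] is returned, bypassing DNS resolution. Every candidate then falls inside 127.0.0.0/8, so the loop finds no alternative and host keeps the loopback address. As a result, the script listens on localhost, which is unreachable from query workers in separate Docker containers.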

This causes the script to exit silently with no results and a 0 exit code.
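For comparison, one common way to find a local address that does not depend on how the hostname is mapped in /etc/hosts is to ask the kernel which source address it would use to reach a given peer. The sketch below shows that general technique only; it is not the fix adopted for this issue. The function name `local_ip_for_peer`, the default port, and the idea of passing a host that the query workers can reach are all assumptions.

```python
import socket


def local_ip_for_peer(peer_host, peer_port=9):
    """Return the local IPv4 address the OS would use as the source
    address when talking to peer_host.

    Connecting a UDP socket only selects a route and source address;
    it does not send any packets.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((peer_host, peer_port))
        return sock.getsockname()[0]


# A loopback peer is reached from the loopback address itself.
print(local_ip_for_peer("127.0.0.1"))
```

With a peer on another network (for example, a hypothetical host where the query workers run), the returned value would be the address of the interface that routes to that peer rather than whatever /etc/hosts lists for the hostname.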

CLP version

b83e3cf

Environment

root@ASUS-X870E:~# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
root@ASUS-X870E:~# cat /etc/hostname && echo --- && cat /etc/hosts
ASUS-X870E
---
# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateHosts = false
127.0.0.1       localhost
127.0.1.1       ASUS-X870E.localdomain  ASUS-X870E

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
root@ASUS-X870E:~# docker run --rm --network host --uts host ubuntu bash -c "cat /etc/hostname && echo --- && cat /etc/hosts"
ASUS-X870E
---
# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateHosts = false
127.0.0.1       localhost
127.0.1.1       ASUS-X870E.localdomain  ASUS-X870E

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
root@ASUS-X870E:~# docker version
Client:
 Version:           27.5.1
 API version:       1.47
 Go version:        go1.22.2
 Git commit:        27.5.1-0ubuntu3~22.04.2
 Built:             Mon Jun  2 12:18:38 2025
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          27.5.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.2
  Git commit:       27.5.1-0ubuntu3~22.04.2
  Built:            Mon Jun  2 12:18:38 2025
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.27
  GitCommit:
 runc:
  Version:          1.2.5-0ubuntu1~22.04.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:

Reproduction steps

  1. Ensure that the output of docker run --rm --network host --uts host ubuntu bash -c "cat /etc/hostname && echo --- && cat /etc/hosts" shows a hosts file with an entry that maps 127.0.0.1 to the hostname. Docker generates the container's /etc/hosts file, though the exact generation mechanism is unknown:
    1. Entries in the host machine's /etc/hosts might get picked up, so first ensure that the host machine's /etc/hosts contains its hostname. If that is not enough, apply one of the tweaks below to the native script's docker launch command:
    2. Pass --add-host <hostname>:127.0.0.1 to specify extra hosts.
    3. OR map the host machine's /etc/hosts into the container.
  2. (With #1178, "feat(deployment)!: Migrate package orchestration to Docker Compose (resolves #1177); Temporarily remove support for multi-node deployments.") Deploy CLP using Docker Compose with isolated container networking (not host networking). Alternatively, launch the package on another machine with the database listening on 0.0.0.0, then modify the package's db credentials and clp-config on the current machine (where search.sh will be launched) to match the CLP package deployment machine.
  3. On the package deployment machine, compress some sample logs: ./sbin/compress.sh ~/samples/hive-24hr
  4. On the current machine, run a search query using the search.sh script:
    ./sbin/search.sh "1"
  5. Observe that the search script exits without printing any results or reporting any error.
  6. Check the query worker logs on the package deployment machine in ./var/log/<hostname>/query_worker/* and observe that the workers report being unable to connect to the results-receiving host.
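The precondition in step 1 can also be checked from Python with the same resolver call that search.py uses. This is a minimal sketch; the helper names `is_loopback_only` and `resolve_own_ips` are hypothetical and do not come from the CLP code base.

```python
import ipaddress
import socket


def is_loopback_only(ip_list):
    """True when the list is non-empty and every address is loopback."""
    return len(ip_list) > 0 and all(
        ipaddress.ip_address(ip).is_loopback for ip in ip_list
    )


def resolve_own_ips():
    """Resolve the local hostname the same way the quoted snippet does.
    Returns an empty list if the hostname cannot be resolved."""
    try:
        return socket.gethostbyname_ex(socket.gethostname())[2]
    except OSError:
        return []


if __name__ == "__main__":
    ips = resolve_own_ips()
    print(f"{socket.gethostname()} resolves to {ips}")
    if is_loopback_only(ips):
        print("Only loopback addresses were returned; the issue should reproduce.")
```

Running this inside the container that launches the native search script shows whether the hostname resolves only to loopback addresses there.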
