@pirate
Member

pirate commented Jun 18, 2017

I added full-text search of the wget archives using ag (The Silver Searcher). A simple Flask app exposes the search endpoint to the frontend.

I also added instructions to the README for running the ag search backend and for using ./search.py from the CLI.

[Image: screen shot 2017-06-18 at 5:36:41 am]

@ilvar
Contributor

ilvar commented Jun 18, 2017

Hm, not sure. Two of the biggest advantages of this software are portability and zero configuration. A server with a search endpoint definitely breaks both.

@ilvar
Contributor

ilvar commented Jun 18, 2017

Also, ag only supports exact matches and regex, hence no full-text search, right?


def search_archive(archive_path, pattern, regex=False):
    args = '-gi' if regex else '-Qig'
    ag = run(['ag', args, pattern, archive_path], stdout=PIPE, stderr=PIPE, timeout=60)
Member Author

TODO: confirm this isn't a potential source of shell-injection vulns
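
A minimal sketch of the property in question: when `subprocess.run` is given a list and `shell=True` is not set, no shell parses the arguments, so metacharacters in the pattern reach the child verbatim. The child below is a Python one-liner standing in for `ag` (an assumption, so the sketch runs without ag installed), and `echo_argument` is a hypothetical helper, not part of this PR.

```python
import subprocess
import sys


def echo_argument(value):
    """Pass `value` as a single argv entry to a child process and return what it received."""
    # Stand-in for the `ag` call: the child writes its first argument back to stdout.
    child_code = "import sys; sys.stdout.write(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child_code, value],  # list form; shell=False is the default
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=60,
    )
    return result.stdout.decode()


# Shell metacharacters are not interpreted because no shell is involved.
hostile = "foo; echo injected && $(whoami)"
received = echo_argument(hostile)
```

This only covers shell metacharacters. A pattern that begins with `-` could still be read by the target program as an option, which is a separate thing to check.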

@pirate
Member Author

pirate commented Jun 18, 2017

You may be right, I'll take a closer look today.

Not sure what you mean by "hence no full-text search". Do you mean no fuzzy full-text search?

@ilvar
Contributor

ilvar commented Jun 18, 2017

I mean, no stemming or synonyms, so "programs" won't match "programming". Not sure if this is 100% necessary though.
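
To illustrate the gap, here is a toy sketch of stemming. `toy_stem` is a hypothetical, deliberately crude suffix stripper (not the Porter algorithm and not anything ag or this project provides); it only shows how reducing words to a common stem lets "programs" match "programming" where a literal match fails.

```python
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Crude illustrative stemmer: strip one common suffix, then undouble a final consonant."""
    stem = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters would remain.
        if stem.endswith(suffix) and len(stem) - len(suffix) >= 3:
            stem = stem[: -len(suffix)]
            break
    # "programm" -> "program": collapse a doubled final consonant left after stripping.
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
        stem = stem[:-1]
    return stem


query = "programs"
document_word = "programming"

literal_match = query in document_word                      # exact/substring match
stemmed_match = toy_stem(query) == toy_stem(document_word)  # match on the shared stem
```

With these inputs the literal comparison fails while the stemmed comparison succeeds, since both words reduce to "program".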

@pirate
Member Author

pirate commented Jun 18, 2017

I've been leaning towards adding a backend for a little while now, because that way we can run a GUI with a submit-links page that lets you upload bookmark exports or link third-party sites for constant syncing.

I feel a backend is the natural next step as this grows in complexity, but I do agree that it would be nice to keep things static and backendless until absolutely necessary.

@pirate
Member Author

pirate commented Jun 21, 2017

Elasticlunr looks pretty good! I just have to figure out how to serialize a pre-built index and load it into clients' browsers (which the docs don't seem to explain).
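
The general idea of a pre-built index can be sketched as follows. This does not use Elasticlunr's API or index format; it is a generic inverted index with hypothetical helper names (`build_index`, `dump_index`, `load_index`, `lookup`) and made-up sample documents, showing only the build-once, serialize-to-JSON, load-later round trip that a static page would rely on.

```python
import json
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def dump_index(index):
    """Serialize the index to a JSON string that can be written out as a static file."""
    return json.dumps(index, sort_keys=True)


def load_index(serialized):
    """Restore the index from its JSON form (the step a client would perform)."""
    return json.loads(serialized)


def lookup(index, word):
    """Return the ids of documents containing `word`, or an empty list."""
    return index.get(word.lower(), [])


# Hypothetical sample data standing in for archived pages.
docs = {"a.html": "Flask search endpoint", "b.html": "Static search index"}
serialized = dump_index(build_index(docs))
restored = load_index(serialized)
```

Because the serialized form is plain JSON, it could be generated at archive time and served next to the static HTML index without a backend.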

@ilvar
Contributor

ilvar commented Jun 21, 2017

@pirate
Member Author

pirate commented Jun 21, 2017

Ahh, it's in the API reference, thanks. I should've looked there first. I'll mock up an integration with it sometime this week or next.

@pirate
Member Author

pirate commented Jun 30, 2017

Hmm, one of the nice parts of having a backend is that it'll make the archiving interface a lot easier to use for non-technical people. The current difficulty is a blocker to this becoming more widely used. It would be trivial to add a page where people can upload/manage their export files, and eventually configure the backend with live syncing, etc.

I'm kind of OK with it being a command-line-only thing for now, since I don't want to sap Mozilla's Pocket revenue stream, but at the same time it would be really nice to provide a one-command setup script that lets you add new export files via http://localhost:8086 or something. I'll keep thinking about this.

Been super busy these last few weeks working on launching a beta for my company, so I probably won't have much time to code up elasticlunr just yet.

@ilvar
Contributor

ilvar commented Jun 30, 2017

Or maybe keep it the Unix way and provide the web UI as a separate application? It could call the script via the OS or use it as a library.

@pirate
Member Author

pirate commented Jun 30, 2017

Definitely, that was my plan with the Flask backend. Any web UI stuff would be run via ./server.py, but the ./archive.py script is still usable separately on its own.
The static HTML index is also usable without ./server.py. It only hits the backend if you try to search or when you go to the upload page (which I haven't added yet).

@ivar

ivar commented Oct 13, 2018

Depending on how complicated you're willing to get, another option that would allow the addition of any number of backend services would be to package up the application as a Docker image and only support that. Discourse.org does this, and it allows anyone technically proficient enough to set up Docker to get a server up and running locally without having to install the myriad dependencies (see https://github.com/discourse/discourse/blob/master/docs/INSTALL.md#why-do-you-only-officially-support-docker).

@pirate
Member Author

pirate commented Oct 14, 2018

Yup @ivar, good intuition, that's already been on my roadmap for a while: #65 (comment)

I'm just swamped these days with my day job, so improving BA is slow going.

I'm actually almost done with the docker/docker-compose setup, which includes BA itself and nginx to serve it. I will definitely add a container for the backend once that's complete too.

pirate changed the title from "Add full-text search with ag & Flask" to "Add full-text search" on Oct 14, 2018
@pirate
Member Author

pirate commented Nov 26, 2018

Closing this for now until the Django backend is released. Then I'll open a new ticket for adding full-text search in the new Django app.

pirate closed this on Nov 26, 2018
jdcaballerov mentioned this pull request on Nov 19, 2020
pirate deleted the search branch on January 6, 2024 01:15