
Feature/instrument chroma client query embeddings #344

Merged
nirga merged 25 commits into traceloop:main from paolorechia:feature/instrument-chroma-client-query
Jan 26, 2024

Conversation

@paolorechia
Contributor

Hi, this is just a draft PR, not quite ready yet, but figured I would ask a couple of things before proceeding now.

I wanted to first understand what I was doing in the repository, so I worked towards a basic telemetry that pushes the Chroma query embeddings as a span attribute. Once this is in good shape, I will clean up the code and add proper unit tests.

Here's a first look:

image

As you can notice, it's pretty much unreadable as it is :)

image

Anyway, here go the questions:

  1. In the issue description, it is mentioned that because this payload is too big, a span attribute might not be a good idea. Are there suggestions on how we can work around this? The only idea I had was to do something similar to what numpy does when printing an array, e.g., just store the first and last elements of the array.
  2. It seemed inevitable to hook into a private class/method in Chroma to listen to the query embeddings, which is not ideal. What's your opinion on this: is it acceptable to wrap segment.SegmentAPI._query?
  3. I suppose this is just the first telemetry out of many, since ideally we'd support all vector databases. Would you be OK if I raised one PR per database, rather than one monolithic PR for all databases?
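
The numpy-style truncation mentioned in question 1 could look roughly like the sketch below. It is only an illustration: `summarize_embedding` and its `edge_items` parameter are hypothetical names, not part of this PR or of Chroma.

```python
from typing import Sequence


def summarize_embedding(vector: Sequence[float], edge_items: int = 1) -> str:
    """Return a short, numpy-like preview of a long vector.

    Keeps `edge_items` values from each end and replaces the middle with "...".
    Hypothetical helper, written only to illustrate the idea.
    """
    if edge_items < 1:
        raise ValueError("edge_items must be at least 1")
    if len(vector) <= 2 * edge_items:
        # Short vectors are shown in full; there is nothing to elide.
        return "[" + ", ".join(repr(x) for x in vector) + "]"
    head = ", ".join(repr(x) for x in vector[:edge_items])
    tail = ", ".join(repr(x) for x in vector[-edge_items:])
    return f"[{head}, ..., {tail}]"


preview = summarize_embedding([-0.0765, 0.0123, 0.0456, 0.0617])
print(preview)
```

The trade-off is that the preview is readable but lossy: the full vector can no longer be recovered from the trace.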

Thanks a lot!

@paolorechia
Contributor Author

Oh, I just read the issue again and it mentions this: https://opentelemetry.io/docs/concepts/signals/traces/#span-events

I'll take a look at that now, would still be useful to get some feedback on the other questions though.

@paolorechia
Contributor Author

It seems like adding events instead of attributes is not as straightforward as calling span.add_event(...), as I don't see the events rendered in the frontend.

@CLAassistant

CLAassistant commented Jan 20, 2024

CLA assistant check
All committers have signed the CLA.

@nirga
Member

nirga commented Jan 20, 2024

@paolorechia thanks! I'll try to comment on everything here:

  • Yes, span events is the recommended approach. We don't support it yet in our UI because it's a chicken-and-egg problem: we need to figure out the right format and then we can add proper support. There are other observability platforms like SigNoz that just output everything, so you should be able to see it there for debugging. You can also enable console debugging.
  • The main issue here is figuring out the right systematic way to output these span events so it will be easier to understand where they belong. This may require some research and looking at other instrumentations that use span events (here are all of them - https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation).
  • I think it's ok to patch private classes. We'll just need proper tests to make sure this isn't breaking in future versions.
  • Yes, sure. Makes total sense to separate DBs per PR.

@paolorechia
Contributor Author

@nirga Thank you very much for the thorough explanation. I'm sorry, but there's an incoming wall of text for you :)

I used the ConsoleExporter to see the payload, the events are indeed there.

So from a "code" perspective, I can already provide something useful as soon as I clean up the code.

The tough part left is, as you said, figuring out the best format.

I did a quick search on the link you provided for projects implementing opentelemetry and couldn't find any matches to add_event(, which is how I'm adding events to the trace payload.

If I simply search for event, I found a couple of unit tests that are checking for events:

Reading these unit tests, it seems both packages use events for abnormal cases such as errors or exceptions that were triggered, which seems very different from what we're proposing for traceloop.

It seems like their occurrence is a consequence of using this API for recording an exception, which is implemented by the function record_exception. For instance, I found this call in celery.

What about Logs?
I looked briefly at the OpenTelemetry Logs documentation, and it also doesn't seem to fit well into our use case because it's meant to capture actual log events. For instance, this documentation example shows that it's useful to automatically ingest typical log events:

import logging

from opentelemetry import trace

tracer = trace.get_tracer_provider().get_tracer(__name__)

# Trace context correlation
with tracer.start_as_current_span("foo"):
    # Do something
    current_span = trace.get_current_span()
    current_span.add_event("This is a span event")
    logging.getLogger().error("This is a log message")

Also, the Logs API in the OpenTelemetry Python SDK is still experimental.

Conclusions So Far

  1. Logging the entire embedding vector into a span attribute yields unreadable data in the frontend because it's too large.
  2. Log API does not make much sense in our use case.
  3. Events API might make sense, but I couldn't find good examples out there. So we'll have to come up with our own conventions.

Proposal
I still agree that we should probably go with the Events API. It seems the simplest way to go is to just loop over the query embeddings list and add an event for each vector. As you might have seen in the gist above, it currently looks like this (but without the truncation):

    "events": [
        {
            "name": "[-0.07656381279230118, ...,  0.06179634854197502]",
            "timestamp": "2024-01-21T18:44:56.736717Z",
            "attributes": {}
        },
        {
            "name": "[0.040931396186351776,  ..., -0.046594858169555664]",
            "timestamp": "2024-01-21T18:44:56.736929Z",
            "attributes": {}
        }

Now I'd like to ask you a few questions, though:

  1. How would you expect this feature to help the user?
  2. Do we really need to render the full vector as a string here?

Could we instead do the following?

  1. Show only the first and last element like in this example snippet?
  2. Serialize it into a binary format to reduce size?
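
To get a feel for option 2, here is a rough sketch that packs a vector as float32 and base64-encodes it, then compares the size against the plain JSON string. The helper names are hypothetical, the 384-element length is an assumption about the embedding model, and float32 packing trades a little precision for size.

```python
import base64
import json
import struct
from typing import List


def pack_embedding(vector: List[float]) -> str:
    """Pack floats as little-endian float32 and base64-encode the bytes."""
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return base64.b64encode(raw).decode("ascii")


def unpack_embedding(encoded: str) -> List[float]:
    """Inverse of pack_embedding (values come back with float32 precision)."""
    raw = base64.b64decode(encoded)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# Assumed 384-dimensional vector built from values seen in the example events.
vector = [-0.07656381279230118, 0.040931396186351776, 0.06179634854197502] * 128
as_text = json.dumps(vector)
as_b64 = pack_embedding(vector)
print(len(as_text), len(as_b64))
```

The encoded string is several times shorter than the JSON text, but it is no longer human-readable, so a frontend would have to decode it before display.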

Anyway, I'll start cleaning up the code from the PR and will add a unit test. Will let you know once it's ready for a real review.

@paolorechia
Contributor Author

An alternative format could be (to avoid converting the list to string)

    "events": [
        {
            "name": "query_embedding_0",
            "timestamp": "2024-01-21T18:44:56.736717Z",
            "attributes": {
                "0": -0.07656381279230118,
                "1": 0.06179634854197502
             }
        },
        {
            "name": "query_embedding_1",
            "timestamp": "2024-01-21T18:44:56.736929Z",
            "attributes": {
               "0": 0.040931396186351776,
               "1": -0.046594858169555664
            }
        }

@paolorechia
Contributor Author

Here's a gist for the second proposed format as dict

@paolorechia
Contributor Author

Apparently there's a limitation in the number of attributes of an event, and it's deleting the first 256 elements. So the second format doesn't seem to work.
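
The behaviour described above can be reproduced with a toy model. The sketch below assumes a cap of 128 attributes per event with oldest-first eviction, and a 384-element vector; both numbers are assumptions chosen because they would explain exactly 256 dropped elements. `BoundedDict` is a stand-in written for this example, not the OpenTelemetry class.

```python
from collections import OrderedDict
from typing import List


class BoundedDict:
    """Toy size-limited attribute map that evicts the oldest key when full."""

    def __init__(self, maxlen: int) -> None:
        self.maxlen = maxlen
        self.dropped = 0
        self._items: "OrderedDict[str, float]" = OrderedDict()

    def __setitem__(self, key: str, value: float) -> None:
        if key in self._items:
            # Re-setting an existing key does not grow the map.
            del self._items[key]
        elif len(self._items) >= self.maxlen:
            # Map is full: drop the oldest entry to make room.
            self._items.popitem(last=False)
            self.dropped += 1
        self._items[key] = value

    def keys(self) -> List[str]:
        return list(self._items.keys())


vector = [i / 1000 for i in range(384)]  # assumed embedding size
attrs = BoundedDict(maxlen=128)          # assumed per-event attribute limit
for i, value in enumerate(vector):
    attrs[str(i)] = value

print(attrs.dropped, attrs.keys()[0])
```

Under these assumptions only the last 128 elements survive, which is why one attribute per vector element does not scale.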

@paolorechia
Contributor Author

Here's format 3

    "events": [
        {
            "name": "query_embeddings_0",
            "timestamp": "2024-01-21T20:11:11.301499Z",
            "attributes": {
                "embeddings": "[-0.02116772159934044, -0.034392599016427994, 0.0058018239215016365, -0.007966593839228153, 0.023765088990330696, -0.031745124608278275, -0.012614404782652855, -0.06414490193128586, 0.020165612921118736, -0.08882293105125427, -0.017861273139715195, -0.06814910471439362, -0.014057088643312454, -0.04547572880983353, 0.05225884169340134, 0.08119108527898788, -0.01767163723707199, -0.04887743294239044, 0.012606356292963028, -0.013008205220103264, -0.006311111617833376, 0.03886748477816582, 0.0580018050968647, 0.10872279107570648, 0.056401580572128296, -0.09888928383588791, -0.062367286533117294, 0.08906786143779755, -0.003973421175032854, 0.0130155673250556, 0.013766610063612461, -0.08719685673713684, -0.018849939107894897, 0.05287815257906914, -0.04795437678694725, 0.018592344596982002, 0.0032591281924396753, 0.06537958234548569, -0.06702830642461777, 0.04129442572593689, 0.07399817556142807, -0.05151423066854477, 0.04187086224555969, 0.00841656792908907, -0.03619099780917168, 0.013703015632927418, -0.003468855516985059, 0.05873265489935875, 0.0006399745470844209, 0.1366841048002243, 0.03454412519931793, -0.051180340349674225, 0.025222616270184517, 0.009749426506459713, 0.1408623307943344, -0.04582144692540169, -0.05453970283269882, 0.05933956801891327, -0.0189070887863636, 0.041336771100759506, 0.008158235810697079, -0.06166510283946991, 0.021563773974776268, 0.058680061250925064, 0.038593072444200516, -0.029848352074623108, 0.05385812371969223, -0.07630191743373871, -0.05035138502717018, 0.0011880375677719712, -0.033094506710767746, -0.05798248574137688, 0.03359077870845795, 0.05653800070285797, 0.04601288214325905, 0.055406488478183746, 0.01185411773622036, -0.06335809081792831, 0.0012864562449976802, 0.0884130448102951, -0.01645367592573166, 0.042720042169094086, 0.031635817140340805, 0.047262225300073624, -0.05914713069796562, -0.034223783761262894, 0.040359824895858765, -0.025106508284807205, -0.05876951664686203, 
-0.027694320306181908, 0.019040467217564583, -0.03460254520177841, -0.05918479710817337, 0.01678035967051983, -0.0463358536362648, 0.05575210228562355, 0.004100689198821783, 0.012866359204053879, -0.05400336906313896, -0.03429315239191055, -0.06431399285793304, 0.042409516870975494, -0.03721172735095024, -0.08108613640069962, -0.04848033934831619, -0.05714285746216774, 0.026641640812158585, 0.0175885371863842, -0.1126064583659172, 0.014960185624659061, 0.01984730176627636, -0.0723264142870903, 0.026617666706442833, 0.001284027355723083, -0.007888917811214924, 0.046482671052217484, -0.08495210856199265, -0.07271620631217957, -0.0791686624288559, -0.05127553269267082, -0.09453306347131729, -0.050351157784461975, 0.09153080731630325, 0.07441414147615433, -0.01406775414943695, -0.012814581394195557, 0.008186688646674156, -2.550196239529574e-33, 0.041300028562545776, -0.06071057543158531, -0.01504976861178875, 0.0404525101184845, 0.055767517536878586, -0.08132638782262802, -0.05763687565922737, -0.006044602487236261, -0.0542701855301857, -0.02134171687066555, 0.06812343746423721, -0.04240185767412186, 0.05246894061565399, -0.07120763510465622, 0.004225532058626413, -0.04063568264245987, -0.012352949008345604, -0.02392967790365219, -0.01860208809375763, -0.00415491359308362, -0.02353159338235855, -0.03413528949022293, -0.04750525951385498, 0.010172173380851746, -0.031765639781951904, 0.08414986729621887, 0.02144705317914486, 0.025972707197070122, 0.0012261227238923311, 0.0066565764136612415, -0.10672400891780853, -0.01784089393913746, -0.01788044162094593, -0.08224824070930481, -0.027167940512299538, -0.008427068591117859, -0.028093988075852394, -0.019784461706876755, 0.07002786546945572, -0.029444312676787376, -0.049080029129981995, 0.023307733237743378, 0.037553559988737106, -0.08113834261894226, 0.021111419424414635, 0.017380794510245323, -0.03756498172879219, 0.044616296887397766, -0.02021588571369648, -0.035405613481998444, 0.06681309640407562, 
-0.012360909953713417, 0.029090281575918198, 0.026111498475074768, -0.058952804654836655, -0.007940704002976418, -0.08418276906013489, -0.008791160769760609, -0.0733630582690239, 0.013196403160691261, -0.02187538892030716, -0.07236422598361969, 0.00539324339479208, 0.006821902934461832, -0.031029751524329185, -0.030586281791329384, 0.03250444307923317, -0.07072892785072327, -0.018130192533135414, 0.03186289966106415, 0.038112930953502655, -0.035873644053936005, 0.095953069627285, -0.02694508247077465, 0.00044656661339104176, 0.07757934182882309, 0.07206101715564728, 0.04408591240644455, 0.035360414534807205, 0.10247664153575897, -0.06628499925136566, -0.008462687954306602, -0.06628119200468063, 0.0043883174657821655, 0.01896938495337963, -0.07177954167127609, 0.017673881724476814, -0.05325653776526451, 0.01633145660161972, -0.005883366335183382, 0.04394879937171936, 0.05145736038684845, 0.03457028046250343, -0.034802552312612534, 0.05082659795880318, -1.0038539921739942e-33, -0.00989322829991579, -0.030083367601037025, 0.03563651069998741, -0.1545729637145996, 0.012237811461091042, 0.020480550825595856, 0.043667323887348175, -0.04488461837172508, 0.06599176675081253, 0.003979787230491638, 0.11247958987951279, 0.010472193360328674, -0.05629342794418335, -0.04810982197523117, -0.028754331171512604, -0.058029528707265854, 0.005881939548999071, -0.030452433973550797, 0.006983770523220301, 0.016244152560830116, 0.1156434416770935, 0.09782039374113083, -0.02106589451432228, -0.002723715268075466, -0.0483408123254776, 0.023100990802049637, -0.019015265628695488, 0.09837323427200317, 0.057356055825948715, 0.04899922013282776, -0.013888491317629814, 0.08989647775888443, -0.011580187827348709, 0.06197701767086983, -0.04763265699148178, 0.001992901787161827, -0.047406408935785294, -0.03334582597017288, -0.011319214478135109, 0.003437228500843048, 0.04213413968682289, 0.03559786081314087, -0.062017180025577545, -0.057563357055187225, 0.06403704732656479, 0.0555698424577713, 
0.03860729932785034, 0.06643760204315186, 0.040622562170028687, -0.07947804033756256, -0.046062324196100235, 0.08903946727514267, -0.058934710919857025, 0.019603371620178223, 0.023877574130892754, 0.06719093769788742, 0.00343230739235878, -0.006425931118428707, -0.012115007266402245, -0.005697796121239662, -0.08577169477939606, 0.062382396310567856, -0.006197095848619938, 0.02584468573331833, -0.07390893995761871, 0.04225707799196243, 0.036109279841184616, -0.08384167402982712, 0.03854922205209732, 0.015916531905531883, 0.09681839495897293, 0.04665353149175644, -0.059954702854156494, 0.013915347866714, -0.026282193139195442, -0.02587667480111122, -0.006469493266195059, 0.02682449109852314, -0.08243422955274582, 0.03616641089320183, -0.05064128339290619, 0.08353568613529205, -0.020785776898264885, -0.050996601581573486, 0.09417621791362762, 0.027563102543354034, 0.011122552677989006, 0.041904401034116745, -0.08705263584852219, 0.024611329659819603, -0.09531863778829575, -0.015931619331240654, -0.02929595485329628, 0.03589511662721634, -0.13687729835510254, -1.7230993520911397e-08, -0.047877296805381775, -0.02767120860517025, 0.09241931885480881, -0.10821258276700974, -0.029341138899326324, 0.10089649260044098, -0.051551349461078644, 0.05188140645623207, -0.011570358648896217, -0.03168662637472153, 0.04266354441642761, 0.08536266535520554, -0.05078452453017235, -0.004914168268442154, 0.03505035117268562, 0.018760722130537033, -0.011717845685780048, -0.015484253875911236, -0.019486278295516968, -0.008381212130188942, -0.03219609707593918, -0.04253019392490387, 0.0008489838801324368, -0.03290142863988876, -0.08145959675312042, 0.0880403146147728, -0.02264014072716236, 0.06177900359034538, -0.02675464004278183, -0.015017831698060036, 0.10546792298555374, 0.00010207392915617675, 0.09494058042764664, -0.005751856602728367, 0.03723331168293953, 0.013889284804463387, 0.01692614145576954, 0.01585487276315689, 0.008203933946788311, 0.10948973149061203, -0.00419184984639287, 
-0.03489655256271362, 0.06970660388469696, 0.011053892783820629, -0.04574660584330559, 0.032908301800489426, 0.03487677499651909, -0.05684138461947441, 0.0672527402639389, 0.02163241058588028, 0.032846562564373016, -0.044264256954193115, -0.003992762416601181, -0.031409405171871185, -0.0387728214263916, -0.01166848000138998, -0.03870435059070587, -0.0276800449937582, 0.08444678783416748, 0.05276680737733841, 0.1074284091591835, -0.03799744322896004, 0.1138821542263031, 0.014279186725616455]"
            }
        },
        {
            "name": "query_embeddings_1",
            "timestamp": "2024-01-21T20:11:11.301645Z",
            "attributes": {
                "embeddings": "[0.0997893437743187, 0.022948602214455605, -0.011169560253620148, 0.04117872193455696, 0.015012172050774097, -0.008154126815497875, -0.040258996188640594, 0.05935069918632507, -0.03612041473388672, 0.08318983018398285, 0.05341242253780365, -0.047200512140989304, -0.027220873162150383, -0.022699330002069473, -0.020294658839702606, 0.061525147408246994, 0.055293258279561996, 0.006619568914175034, -0.07162494957447052, 0.05573425441980362, -0.011100545525550842, -0.030885912477970123, 0.04464387521147728, 0.015436605550348759, 0.07097670435905457, -0.01957634463906288, -0.055418044328689575, 0.0409395806491375, 0.03548452630639076, 0.04842040315270424, -0.06822666525840759, 0.051599420607089996, -0.00529356487095356, -0.13064011931419373, -0.08095557242631912, 0.0059019397012889385, 0.04729953035712242, -0.01778480038046837, -0.06561431288719177, -0.006899943575263023, -0.008706382475793362, 0.03769692778587341, -0.12247271835803986, 0.13902010023593903, 0.04004276171326637, -0.004052792210131884, -0.08155430108308792, -0.0011824874673038721, -0.041769839823246, -0.006441215984523296, -0.01908370852470398, -0.028573347255587578, -0.00221846136264503, 0.04510558396577835, -0.010428866371512413, -0.14447642862796783, -0.005930235609412193, -0.0725650042295456, -0.009115641936659813, -0.07977062463760376, -0.004435649141669273, 0.06472418457269669, 0.03658299520611763, -0.035138074308633804, 0.026852352544665337, 0.02250462956726551, 0.0799485445022583, -0.07592343538999557, -0.05005541816353798, 0.0014610874932259321, 0.06110056862235069, -0.0022727237083017826, 0.06145695969462395, 0.11061852425336838, -0.04248820245265961, 0.016958916559815407, 0.0072581572458148, 0.015496050007641315, -0.0065643941052258015, 0.0035091473255306482, -0.017641054466366768, 0.015574407763779163, -0.007524855900555849, 0.04231639951467514, -0.013182743452489376, 0.013051258400082588, -0.004923546686768532, -0.04101457819342613, -0.0900903195142746, 
-0.016008833423256874, -0.02586738020181656, 0.05397757887840271, -0.02296164073050022, 0.06175468489527702, -0.018003597855567932, 0.07339103519916534, -0.10132711380720139, -0.05835726857185364, -0.09757678955793381, -0.0398850217461586, -0.05257895961403847, -0.01055340375751257, 0.06117485836148262, 0.13510403037071228, -0.017872212454676628, 0.01047473307698965, -0.04420994967222214, -0.015034333802759647, -0.06990381330251694, 0.05960138514637947, -0.04867609590291977, -0.01701764203608036, 0.008776960894465446, 0.11422017961740494, -0.004013977479189634, -0.00582858407869935, 0.08356377482414246, -0.01792655885219574, -0.002648422494530678, 0.04818098992109299, -0.08418523520231247, 0.051113247871398926, -0.09561111778020859, -0.006842063274234533, -0.0021441776771098375, -0.05336717143654823, -0.02014319784939289, 1.2025035375843374e-33, 0.05788170173764229, -0.10865034908056259, 0.06635970622301102, 0.04702013358473778, 0.005239957012236118, -0.03272874653339386, -0.09067631512880325, -0.0664852038025856, 0.07834789901971817, -0.01437671110033989, -0.0464889220893383, -0.11967373639345169, 0.05046040192246437, -0.06776474416255951, -0.014983299188315868, 0.01338113285601139, -0.028854170814156532, 0.04709985852241516, 0.03651491552591324, 0.02057662419974804, -0.09171269088983536, 0.00011649716179817915, -0.04678709805011749, -0.045050933957099915, 0.05225302651524544, 0.07864781469106674, 0.034440141171216965, -0.0407889261841774, -0.09549272805452347, -0.018596358597278595, -0.04121081903576851, -0.019311849027872086, 0.0849977657198906, -0.2126682698726654, -0.044512584805488586, -0.024094676598906517, 0.009602997452020645, 0.09596658498048782, -0.01995079591870308, 0.09156998246908188, -0.08441020548343658, -0.030003445222973824, 0.06585976481437683, 0.019483929499983788, -8.718121534911916e-05, -0.03710867464542389, 0.07222463190555573, -0.0674130991101265, -0.05113456770777702, -0.022796200588345528, 0.03185687214136124, -0.07024770975112915, 
-0.03128896653652191, 0.019909225404262543, -0.02412145771086216, 0.034116994589567184, -0.02186812274158001, 0.020713631063699722, -0.05207843333482742, -0.002464938210323453, 0.04007623344659805, -0.0013206696603447199, -0.013416601344943047, -0.03578285872936249, -0.017792265862226486, -0.047248028218746185, 0.03235795348882675, -0.0120637072250247, 0.007779013831168413, 0.034577880054712296, 0.03453468531370163, -0.02962951734662056, 8.038306987145916e-05, -0.0249159038066864, 0.00904132891446352, 0.06395066529512405, 0.0817883312702179, 0.0054230582900345325, 0.04218931123614311, -0.013731177896261215, 0.02910746820271015, 0.04045373201370239, 0.07168399542570114, -0.028973514214158058, 0.04604659602046013, 0.01987454853951931, 0.0007684282027184963, 0.06311367452144623, -0.01318366639316082, 0.023197734728455544, 0.04301830008625984, -0.030717194080352783, -0.022356446832418442, 0.024328287690877914, 0.015824390575289726, -2.465283670413957e-33, -0.08943895250558853, 0.007111871615052223, -0.044753290712833405, 0.009091022424399853, -0.017374180257320404, -0.018613263964653015, -0.0034747240133583546, -0.051988549530506134, 0.00735121127218008, 0.06518445163965225, 0.059448957443237305, 0.0305667482316494, 0.002512900158762932, 0.04473946616053581, 0.039475779980421066, 0.040489088743925095, -0.008804863318800926, 0.049400556832551956, 0.028108656406402588, -0.0011522136628627777, 0.04103529453277588, 0.025415364652872086, -0.0669972151517868, 0.08025732636451721, 0.03862803429365158, 0.013297397643327713, -0.04796769097447395, 0.05326379835605621, -0.01414249837398529, -0.08025096356868744, -0.027148175984621048, -0.0014614060055464506, 0.022830745205283165, -0.055063311010599136, 0.021074647083878517, -0.02802189625799656, -0.08820299059152603, 0.10477113723754883, -0.054373327642679214, -0.07502874732017517, 0.033765245229005814, 0.004409633576869965, -0.005116122774779797, 0.00019034311117138714, -0.07367024570703506, -0.06210801750421524, 
0.11904768645763397, -0.006371106021106243, 0.08595491200685501, 0.0013043420622125268, 0.020757026970386505, -0.07819365710020065, -0.002912333235144615, 0.12783017754554749, 0.011510351672768593, 0.027679534628987312, 0.03214767947793007, -0.0192237738519907, 0.02236470766365528, 0.055370740592479706, -0.009758103638887405, 0.009881236590445042, -0.003825633553788066, -0.04488564655184746, 0.006579669192433357, 0.013026083819568157, 0.07773709297180176, 0.018052827566862106, -0.01155920047312975, 0.019260182976722717, 0.050944022834300995, 0.005912024062126875, 0.0591074600815773, -0.015576627105474472, -0.0225001722574234, -0.05273643881082535, 0.03933640941977501, 0.12446136772632599, 0.05313074216246605, 0.04717400670051575, -0.01844554767012596, -0.035796795040369034, 0.012923024594783783, -0.04271334782242775, -0.0995861366391182, -0.07042274624109268, -0.0335342176258564, -0.04136381670832634, -0.019310185685753822, 0.05267738550901413, -0.06938282400369644, 0.05837249010801315, -0.05333778262138367, -0.05267792195081711, -0.037437520921230316, -1.9511086080115092e-08, 0.06678164005279541, 0.012061617337167263, -0.0015395471127703786, -0.012131051160395145, 0.005734710022807121, -0.027203088626265526, -0.004066807683557272, 0.06270375102758408, 0.0534822940826416, 0.08690287172794342, -0.06949903070926666, 0.07799918949604034, 0.0023524726275354624, -0.0038765529170632362, -0.07984147220849991, 0.07094661146402359, -0.0067856693640351295, 0.04359535500407219, 0.026922304183244705, -0.028203092515468597, -0.05087830871343613, -0.026181263849139214, 0.03961761295795441, 0.009698685258626938, 0.053713928908109665, -0.04838588088750839, 0.05358177423477173, 0.00019739950948860496, -0.0142503771930933, -0.05036390945315361, 0.04114571586251259, 0.010588202625513077, 0.030864866450428963, -0.022866753861308098, -0.010937424376606941, 0.022535979747772217, 0.06918143481016159, 0.017251599580049515, -0.05735079199075699, -0.01518857292830944, 0.005762364715337753, 
-0.021266311407089233, 0.0670674666762352, -0.02433491311967373, 0.06812980026006699, -0.08411750942468643, -0.05480976775288582, 0.07060239464044571, 0.03694016858935356, -0.06977853924036026, 0.02373524196445942, -0.07710632681846619, -0.05118044838309288, 0.04847894608974457, -0.027761010453104973, 0.033303745090961456, 0.011466726660728455, -0.020935457199811935, -0.02688666433095932, 0.07309101521968842, 0.011429277248680592, -0.08532138168811798, 0.08636607974767685, 0.023006994277238846]"
            }
        }
    ],
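
A minimal sketch of how format 3 could be produced: one event per query vector, with the whole vector serialized into a single string attribute so the per-event attribute limit is not hit. The function name is hypothetical and the events are plain dicts here; in the real instrumentation each entry would presumably be passed to the span's add_event call instead.

```python
import json
from typing import Any, Dict, List


def embeddings_to_events(query_embeddings: List[List[float]]) -> List[Dict[str, Any]]:
    """Build one event description per query embedding (format 3)."""
    events = []
    for index, vector in enumerate(query_embeddings):
        events.append(
            {
                "name": f"query_embeddings_{index}",
                # A single string attribute holds the full vector.
                "attributes": {"embeddings": json.dumps(vector)},
            }
        )
    return events


events = embeddings_to_events([[0.1, -0.2], [0.3, 0.4]])
print(json.dumps(events, indent=4))
```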

paolorechia changed the title from "[WIP] Feature/instrument chroma client query" to "Feature/instrument chroma client query embeddings" on Jan 21, 2024
paolorechia marked this pull request as ready for review on January 21, 2024 20:16
@paolorechia
Contributor Author

@nirga I think this is good for a review, please let me know what you think :)
And sorry for being too verbose here, but this issue needs a lot of discussion.

@nirga
Member

nirga commented Jan 23, 2024

Thanks @paolorechia this looks really good overall. And I do agree that this requires discussions so thanks so much for this :)
I have 2 questions / thoughts:

  1. Should we also output the result from the query call as span events?
  2. Can we add the span attribute name as a semantic attribute?

@paolorechia
Contributor Author

paolorechia commented Jan 23, 2024

@nirga regarding question 1, sure, we can easily do it. It's also not a bad idea; I’m just thinking now whether it’s better in the ‘query’ or the ‘segment._query’ function.

I’m not sure I understand the second question. Could you elaborate which name you would like to add and where?

@nirga
Member

nirga commented Jan 23, 2024

For the 2nd question I meant that maybe we should add query_embeddings as a Semantic Convention

@paolorechia
Contributor Author

I see, that makes sense! I’m not familiar yet with the Semantic Convention, but I’ll check the source code and give it a try on the requested changes - maybe tonight :)

@paolorechia
Contributor Author

Hi, @nirga

I'm trying to add a new attribute into the semantic-conventions-ai, but I'm having a hard time testing it locally. If I install the package in dev mode, the module is not imported correctly.

How do you handle it normally? Do you bump the version of semantic-conventions-ai separately? I already opened a PR with the new enum/attribute:

#358

@nirga
Member

nirga commented Jan 24, 2024

Thanks @paolorechia
Because poetry doesn't support nested dependencies in monorepos, we need to manually release and update the semconv. I've done that now on #358 and will merge it soon so you can rebase here.

@paolorechia
Contributor Author

@nirga I was able to use the new EventName from semantic conventions ai :)

Now that I understand it, I'm thinking we should add more elements to semantic-conventions-ai, especially now that we are also returning the query results.

@nirga
Member

nirga commented Jan 24, 2024

@paolorechia nice :) what did you have in mind?

@paolorechia
Contributor Author

paolorechia commented Jan 24, 2024

@nirga What do you think of these?

#362

@paolorechia
Contributor Author

@nirga let me know what you think of the extra semantic-conventions-ai. I think we're close to finishing this PR :)

Maybe this helps you to visualize, the trace of current version (without the newer proposed conventions):
https://gist.github.com/paolorechia/6688faeabc0ed04faa0ce547d5239b29

I'm also curious regarding the scope of the issue I'm tackling (#250), from the description it seems like we should implement this convention on all vendors:

  • ChromaDB
  • Pinecone
  • Weaviate

What would you see as a next step? I saw someone working on Pinecone, so maybe Weaviate (I've never worked with this one before)?

@nirga
Member

nirga commented Jan 25, 2024

@paolorechia looking into this now. I'd say all vector DBs. No one is working on Pinecone on Python.

@paolorechia
Contributor Author

@nirga Got it. I guess it was pinecone js then. I will work on it next

@nirga
Member

nirga commented Jan 25, 2024

Is there a way to get the textual response of the query call?

@paolorechia
Contributor Author

@nirga Do you mean maybe what is displayed currently as db.chroma.query.result.0.documents.0? e.g., the document that matched the query the best?

@paolorechia
Contributor Author

paolorechia commented Jan 26, 2024

@nirga I've updated this PR to use the newer semantic conventions we merged yesterday, but I think it's not yet ideal. Here's an example:

            "name": "vector_db.query.result.1",
            "timestamp": "2024-01-26T06:46:10.304852Z",
            "attributes": {
                "vector_db.query.result.ids": [
                    "70490",
                    "4983",
                    "18670"
                ],
                "vector_db.query.result.distances": [
                    1.5630203485488892,
                    1.5724878311157227,
                    1.733376383781433
                ],
                "vector_db.query.result.metadata": [],
                "vector_db.query.result.documents": [
                    "Simplifying likelihood ratios. Likelihood ratios are one of the best measures of diagnostic accuracy, although they are seldom used, because interpreting them requires a calculator to convert back and forth between \u201cprobability\u201d and \u201codds\u201d of disease. This article describes a simpler method of interpreting likelihood ratios, one that avoids calculators, nomograms, and conversions to \u201codds\u201d of disease. Several examples illustrate how the clinician can use this method to refine diagnostic decisions at the bedside.",
                    "Microstructural development of human newborn cerebral white matter assessed in vivo by diffusion tensor magnetic resonance imaging.. Alterations of the architecture of cerebral white matter in the developing human brain can affect cortical development and result in functional disabilities. A line scan diffusion-weighted magnetic resonance imaging (MRI) sequence with diffusion tensor analysis was applied to measure the apparent diffusion coefficient, to calculate relative anisotropy, and to delineate three-dimensional fiber architecture in cerebral white matter in preterm (n = 17) and full-term infants (n = 7). To assess effects of prematurity on cerebral white matter development, early gestation preterm infants (n = 10) were studied a second time at term. In the central white matter the mean apparent diffusion coefficient at 28 wk was high, 1.8 microm2/ms, and decreased toward term to 1.2 microm2/ms. In the posterior limb of the internal capsule, the mean apparent diffusion coefficients at both times were similar (1.2 versus 1.1 microm2/ms). Relative anisotropy was higher the closer birth was to term with greater absolute values in the internal capsule than in the central white matter. Preterm infants at term showed higher mean diffusion coefficients in the central white matter (1.4 +/- 0.24 versus 1.15 +/- 0.09 microm2/ms, p = 0.016) and lower relative anisotropy in both areas compared with full-term infants (white matter, 10.9 +/- 0.6 versus 22.9 +/- 3.0%, p = 0.001; internal capsule, 24.0 +/- 4.44 versus 33.1 +/- 0.6% p = 0.006). Nonmyelinated fibers in the corpus callosum were visible by diffusion tensor MRI as early as 28 wk; full-term and preterm infants at term showed marked differences in white matter fiber organization. The data indicate that quantitative assessment of water diffusion by diffusion tensor MRI provides insight into microstructural development in cerebral white matter in living infants.",
                    "The DNA Methylome of Human Peripheral Blood Mononuclear Cells. DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies."
                ]
            }

You see, the event is named vector_db.query.result.1 and the attribute vector_db.query.result.ids. I think it would be better if the attribute were named vector_db.query.result.1.ids, so the index appears in both.

For the time being, I went ahead and used only the new Events(Enum) directly, to implement this change:

            name=f"{Events.VECTOR_DB_QUERY_RESULT.value}.{i}",
            attributes={
                f"{Events.VECTOR_DB_QUERY_RESULT.value}.{i}.ids": tuple_[0] or [],
                f"{Events.VECTOR_DB_QUERY_RESULT.value}.{i}.distances": tuple_[1] or [],
                f"{Events.VECTOR_DB_QUERY_RESULT.value}.{i}.metadata": tuple_[2] or [],
                f"{Events.VECTOR_DB_QUERY_RESULT.value}.{i}.documents": tuple_[3] or [],

Which yields IMO a better final format. Vector embeddings:

    "events": [
        {
            "name": "vector_db.query.embeddings.0",
            "timestamp": "2024-01-26T07:01:20.163930Z",
            "attributes": {
                "vector_db.query.embeddings.0.vector": [
                    -0.0879037007689476,
                    -0.14504265785217285,
                   ...

Vector result event:

"events": [
        {
            "name": "vector_db.query.result.0",
            "timestamp": "2024-01-26T07:01:20.165695Z",
            "attributes": {
                "vector_db.query.result.0.ids": [
                    "33370",
                    "36474",
                    "7912"
                ],
                "vector_db.query.result.0.distances": [
                    1.4396635293960571,
                    1.4754178524017334,
                    1.4771554470062256
                ],
                "vector_db.query.result.0.metadata": [],
                "vector_db.query.result.0.documents": [
                    "Targeting A20 Decreases Glioma St...

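The naming scheme above can be reproduced without any tracing dependency. This is only a minimal sketch, assuming each result arrives as an (ids, distances, metadata, documents) tuple as in the snippet earlier in this comment. `build_result_events` is a hypothetical helper, and the `Events` enum below is a stand-in holding just the one value used here.

```python
from enum import Enum


class Events(Enum):
    # Stand-in for the semantic-conventions enum; only the value used in this thread.
    VECTOR_DB_QUERY_RESULT = "vector_db.query.result"


def build_result_events(results):
    """Return (event_name, attributes) pairs, one per query result.

    Hypothetical helper: `results` is assumed to be a list of
    (ids, distances, metadata, documents) tuples.
    """
    events = []
    for i, tuple_ in enumerate(results):
        # The index is part of both the event name and every attribute key.
        prefix = f"{Events.VECTOR_DB_QUERY_RESULT.value}.{i}"
        attributes = {
            f"{prefix}.ids": tuple_[0] or [],
            f"{prefix}.distances": tuple_[1] or [],
            f"{prefix}.metadata": tuple_[2] or [],
            f"{prefix}.documents": tuple_[3] or [],
        }
        events.append((prefix, attributes))
    return events
```

Because the index is repeated in each key, attribute names stay unique even if all events of a span are flattened into one mapping.
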
Sadly, we would require yet another update to semantic-conventions-ai... I'm sorry for not seeing this before 😞 🙈
I see three options at the moment:

  1. Deleting the EventAttributes enum entirely
  2. Making it contain only the last segment of the attribute name, e.g., ids or documents, like this:
class EventAttributes(Enum):
    # Query Embeddings
    VECTOR = "vector"

    # Query Result
    IDS = "ids"
    DISTANCES = "distances"
    METADATA = "metadata"
    DOCUMENTS = "documents"
  3. Making the semantic conventions indexed by a {i} variable, which must be formatted when generating traces:
class EventAttributes(Enum):
    # Query Embeddings
    VECTOR_DB_QUERY_EMBEDDINGS_VECTOR = "vector_db.query.embeddings.{i}.vector"

    # Query Result
    VECTOR_DB_QUERY_RESULT_IDS = "vector_db.query.result.{i}.ids"
    VECTOR_DB_QUERY_RESULT_DISTANCES = "vector_db.query.result.{i}.distances"
    VECTOR_DB_QUERY_RESULT_METADATA = "vector_db.query.result.{i}.metadata"
    VECTOR_DB_QUERY_RESULT_DOCUMENTS = "vector_db.query.result.{i}.documents"
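
For the third option, formatting the template at trace time could look roughly like this. It is a sketch only: `attribute_key` is a hypothetical helper name, and just two of the enum members are repeated to keep it short.

```python
from enum import Enum


class EventAttributes(Enum):
    # Templated keys as proposed in option 3; {i} is the result index.
    VECTOR_DB_QUERY_RESULT_IDS = "vector_db.query.result.{i}.ids"
    VECTOR_DB_QUERY_RESULT_DISTANCES = "vector_db.query.result.{i}.distances"


def attribute_key(attribute, index):
    """Fill the {i} placeholder of a templated attribute name (hypothetical helper)."""
    return attribute.value.format(i=index)


key = attribute_key(EventAttributes.VECTOR_DB_QUERY_RESULT_IDS, 0)
```

The instrumentation would then use the formatted key as the attribute name when adding the event, so the convention stays in one place and only the index varies per result.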

Again, really sorry for taking so many iterations on this PR to get it polished. I want to make sure this looks good before moving on to the next vector databases.

What are your thoughts though?

@nirga
Member

nirga commented Jan 26, 2024

No worries @paolorechia let's update the semconv to be the templated one. I think it's cleanest.

@paolorechia
Contributor Author

@nirga I think this is ready for another review and maybe merge.

@nirga nirga merged commit e81dada into traceloop:main Jan 26, 2024