
Monitoring Celery Tasks with Sentry

Sentry is a great tool for monitoring Celery tasks and alerting you when they fail or don’t run on time. But it requires a bit of work to set up properly. Below is some sample code for setting up Sentry monitoring of periodic tasks, followed by an explanation.

import math

import sentry_sdk
from celery import signals
from sentry_sdk import monitor
from sentry_sdk.integrations.celery import CeleryIntegration


@signals.beat_init.connect  # if you use beat
@signals.celeryd_init.connect
def init_sentry(**kwargs):
    sentry_sdk.init(
        dsn=...,
        integrations=[
            CeleryIntegration(monitor_beat_tasks=False)
        ]
    )


@signals.worker_shutdown.connect
@signals.task_postrun.connect
def flush_sentry(**kwargs):
    sentry_sdk.flush(timeout=5)


def add_periodic_task(celery, schedule, task):
    max_runtime = math.ceil(schedule * 4 / 60)
    monitor_config = {
        "recovery_threshold": 1,
        "failure_issue_threshold": 10,
        "checkin_margin": max_runtime,
        "max_runtime": max_runtime,
        "schedule": {
            "type": "interval",
            "value": math.ceil(schedule / 60.0),
            "unit": "minute"
        }
    }
    name = task.__name__
    task = monitor(monitor_slug=name, monitor_config=monitor_config)(task)
    celery.add_periodic_task(schedule, celery.task(task).s(), name=name)

Initialize Sentry

The init_sentry function must be called before any tasks start executing. The Sentry docs for Celery recommend using the celeryd_init signal. And if you use Celery Beat for periodic task execution, then you also need to initialize on the beat_init signal.

Monitoring Beat Tasks

In this example, I’m setting monitor_beat_tasks=False to show how you can do manual monitoring. monitor_beat_tasks=True is much simpler, and doesn’t require any code like in add_periodic_task. But in my experience, it’s not reliable when using async Celery functions. The automatic beat monitoring uses some Celery signals that likely don’t get executed correctly under async conditions. But manual monitoring isn’t that hard with a function wrapper, as shown above.

Adding a Periodic Task

The add_periodic_task function takes a Celery instance, a periodic interval in seconds, and a function to execute. This function can be normal or async. It then does the following:

  1. Calculates a max_runtime in minutes, so that Sentry knows when a task has gone over time. This is also used for checkin_margin, giving the task plenty of buffer time before an issue is created. You should adjust these according to your needs.
  2. Creates a monitor_config for Sentry, specifying the following:
    • schedule in minutes (rounded up, because Sentry doesn’t handle schedules in seconds)
    • the number of failures allowed before creating an issue (I put 10, but you should adjust as needed)
    • how many successful check-ins are required before the issue is marked as resolved (1 is the default, but adjust as needed)
  3. Wraps the function in the Sentry monitor decorator, using the function’s name as the monitor_slug. With default beat monitoring, the slug is set to the full package.module.function path, which can be quite long and becomes hard to scan when you have many tasks.
  4. Schedules the task in Celery.
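As a concrete example of the arithmetic above, here are the computed monitor values for a task that runs every 5 minutes (the 4x runtime multiplier is the one used in add_periodic_task; tune it for your own tasks):

```python
import math

schedule = 300  # run every 300 seconds (5 minutes)

# Allow up to 4x the schedule interval, converted to minutes,
# before Sentry flags the task as over time.
max_runtime = math.ceil(schedule * 4 / 60)     # 20 minutes
checkin_margin = max_runtime                   # 20 minutes of buffer
interval_minutes = math.ceil(schedule / 60.0)  # 5-minute interval schedule
```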

Sentry Flush

While this may not be strictly necessary, calling sentry_sdk.flush on the worker_shutdown and task_postrun signals ensures that events are sent to Sentry when a Celery task completes.

Monitoring your crons

Once this is all set up and running, you should be able to go to Insights > Crons in your Sentry web UI, and see all your Celery tasks. Double check your monitor settings to make sure they’re correct, then sit back and relax, while Sentry keeps track of how your tasks are running.


Async Python Functions with Celery

Celery is a great tool for scheduled function execution in Python. You can also use it for running functions in the background asynchronously from your main process. However, it does not support Python asyncio. This is a big limitation, because async functions are usually much more I/O efficient, and there are many libraries that provide great async support. And parallel data processing with asyncio.gather becomes impossible in Celery without async support.

Celery Async Issues

Unfortunately, based on the current open status of the relevant GitHub issues, Celery will not support async functions anytime soon.

But luckily there are two projects that provide async celery support.

AIO Celery

This project is an alternative independent asyncio implementation of Celery

aio-celery “does not depend on the celery codebase”. Instead, it provides a new implementation of the Celery Message Protocol that enables asyncio tasks and workers.

It is written completely from scratch as a thin wrapper around aio-pika (which is an asynchronous RabbitMQ python driver) and it has no other dependencies

It is actively developed, and seems like a great celery alternative. But there are some downsides:

  1. “Only RabbitMQ as a message broker” means you cannot use any other broker such as Redis
  2. “Only Redis as a result backend” means you can’t store results in any other database
  3. “Complete feature parity with upstream Celery project is not the goal”, so there may be features from celery you want that are not present in aio-celery

Celery AIO Pool

celery-aio-pool provides a custom worker pool implementation that works with celery 5.3+. Unlike aio-celery, you can keep using your existing celery implementation. All you have to do to get async task support in celery is:

  1. Start your celery worker with this environment variable: CELERY_CUSTOM_WORKER_POOL='celery_aio_pool.pool:AsyncIOPool'
  2. Run the celery worker process with --pool=custom

So your worker command will look like

CELERY_CUSTOM_WORKER_POOL='celery_aio_pool.pool:AsyncIOPool' celery worker --pool=custom

plus whatever other arguments or environment variables you need. Once you have this in place, you can start using async functions as celery tasks.

While celery-aio-pool is not as actively developed, it works, and has the following benefits:

  • Simple to install and configure with Celery >= 5.3
  • Works with any Celery-supported message broker or result backend
  • Works with your existing celery setup without requiring any other changes

Python Async Gather in Batches

Python’s asyncio.gather function is great for I/O bound parallel processing. There’s a simple utility function I like to use that I call gather_in_batches:

import asyncio

async def gather_in_batches(tasks, batch_size=100, return_exceptions=False):
    for i in range(0, len(tasks), batch_size):
        batch = tasks[i:i+batch_size]
        for result in await asyncio.gather(*batch, return_exceptions=return_exceptions):
            yield result

The way you use it is

  1. Generate a list of tasks
  2. Gather your results

Here’s some simple sample code to demonstrate:

tasks = [process_async(obj) for obj in objects]
return [result async for result in gather_in_batches(tasks)]

objects could be all sorts of things:

  • records from a database
  • urls to scrape
  • filenames to read

And process_async is an async function that would just do whatever processing you need to do on that object. Assuming it is mostly I/O bound, this is a very simple and effective method to process data in parallel, without getting into threads, multi-processing, greenlets, or any other method.

You’ll need to experiment to figure out what the optimal batch_size is for your use case. And unless you don’t care about errors, you should set return_exceptions=True, then check if isinstance(result, Exception) to do proper error handling.
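To make the error-handling advice concrete, here’s a self-contained sketch. The process_async body and the deliberately failing input are made up for illustration, and the definition of gather_in_batches is repeated so the example runs on its own:

```python
import asyncio

async def gather_in_batches(tasks, batch_size=100, return_exceptions=False):
    # repeated from above so this example is self-contained
    for i in range(0, len(tasks), batch_size):
        batch = tasks[i:i+batch_size]
        for result in await asyncio.gather(*batch, return_exceptions=return_exceptions):
            yield result

async def process_async(n):
    # stand-in for real I/O-bound work (a request, a query, a file read)
    await asyncio.sleep(0)
    if n == 3:
        raise ValueError(f"failed on {n}")
    return n * 2

async def main():
    tasks = [process_async(n) for n in range(5)]
    results = [r async for r in gather_in_batches(tasks, batch_size=2, return_exceptions=True)]
    # separate successes from failures with an isinstance check
    errors = [r for r in results if isinstance(r, Exception)]
    values = [r for r in results if not isinstance(r, Exception)]
    return values, errors

values, errors = asyncio.run(main())
```

With return_exceptions=True, the one failing task shows up in the results as a ValueError instead of aborting the whole batch.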

Optimize Youtube Embeds with Facades

Embedding YouTube videos on your website can be very easy, but the typical ways of doing it can slow down your page load time. If you want to ensure the best user experience, then you can improve your page speed performance by lazy loading YouTube videos with a facade. Here’s how to do it using lite-youtube.

1. Add the Script

First, add the following script to your web page footer:

<script type="module" src="https://cdn.jsdelivr.net/npm/@justinribeiro/lite-youtube@1.5.0/lite-youtube.js" async></script>

2. Replace YouTube Embeds

For each YouTube video on your page, replace the standard YouTube embed code with the following custom HTML:

<lite-youtube videoid="VIDEO_ID" videotitle="VIDEO_TITLE" width="WIDTH" height="HEIGHT">
<a class="lite-youtube-fallback" href="https://www.youtube.com/watch?v=VIDEO_ID">Watch "VIDEO_TITLE" on YouTube</a>
</lite-youtube>

The HTML link inside the lite-youtube element is a fallback that will only be displayed when someone’s browser can’t run the lite-youtube script.

3. Customize for Each Video

Make sure to update the following attributes for each video:

  • videoid: The unique identifier of the YouTube video, a series of random letters & numbers in the embed link
  • videotitle: The title of the video
  • width: The width of the video player
  • height: The height of the video player
  • href: The full YouTube URL of the video
  • Fallback text: Update the text inside the <a> tag

4. Determine Video Size

To determine the correct width and height for each video:

  1. Open your browser’s developer tools
  2. Inspect the current YouTube embed
  3. Look for the computed style to find the exact dimensions
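If you’d rather compute the dimensions than inspect them, most YouTube embeds use a 16:9 aspect ratio, so the height follows from the width (the helper below is just a convenience sketch):

```python
def embed_height(width, aspect_w=16, aspect_h=9):
    # standard YouTube players are 16:9; returns the matching height
    return round(width * aspect_h / aspect_w)
```

For example, a 560-pixel-wide player gets a height of 315, the dimensions YouTube’s default embed code uses.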

That’s it! Now when your page loads, the videos will initially be rendered as images. The embedded YouTube player won’t load until you click the image. This method eliminates a lot of JavaScript execution and load time, improving your page speed performance and SEO.


Programmatic SEO: A Case Study

I recently worked with a client at the beginning of their SEO strategy. This case study outlines our analysis and recommendations to improve their search visibility using programmatic SEO techniques. I wrote up my general recommendations for anyone interested in Programmatic SEO here.

Initial Assessment

The client had a significant amount of unique content, but it was not being indexed properly due to a number of technical issues. Some of the problems we identified:

  1. Most of the content lived on a subdomain, not the primary domain – not a huge issue, if everything else is good
  2. The primary domain had pages with redirects to the subdomain, but they were 302 instead of 301 redirects
  3. Neither the subdomain nor the primary domain had a robots.txt or sitemap.xml
  4. Page content was largely being loaded through React Javascript calls, instead of being pre-rendered on the page, leading to various page speed issues
  5. No directory index pages
  6. No inter-linking across pages

I showed them the Zapier app directory, which is an excellent example of how to generate many thousands of pages with unique content, as well as directory index pages for browsing the content. One interesting question they had was “are more pages better?”. The answer is: yes, more pages are better, but only if each page has mostly unique content. All the pages can follow the same template structure, but if you diff across pages, there should be plenty of significant content differences.
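One way to sanity-check “mostly unique content” is to diff rendered pages programmatically. Here’s a rough sketch using Python’s difflib; the two page texts are stand-ins for your own rendered pages:

```python
import difflib

def content_similarity(page_a: str, page_b: str) -> float:
    """Ratio in [0, 1]; values near 1 mean the pages are mostly duplicated."""
    return difflib.SequenceMatcher(None, page_a, page_b).ratio()

page_a = "Acme CRM integrates with Gmail to sync your contacts."
page_b = "Acme CRM integrates with Slack to post deal alerts."
similarity = content_similarity(page_a, page_b)
```

If most page pairs score very high (say above ~0.9), the shared template is dominating, and search engines may treat the pages as duplicates.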

Recommendations

Based on our findings, we figured out a set of prioritized recommendations, starting with the easiest wins:

  1. Implement robots.txt and sitemap.xml: This is the first step to guide search engines on how to crawl and index the site. Both the root domain and any subdomains need these files. A simple robots.txt that allows all user agents is fine to start with, as long as there’s something. And you can also link to the sitemap.xml in the robots.txt.
  
  2. Switch to 301 Redirects: We advised the client to use 301 redirects instead of 302 to ensure that link equity is passed correctly, which is crucial for SEO.
  3. Optimize On-Page Elements: Titles, meta descriptions, and H1 tags should be optimized to highlight the unique keywords on each page. Each page should have a single H1 tag that closely matches the page title.
  4. Use SEO Tools: Setup Google Search Console, Bing Webmaster Tools, and an SEO tool like ahrefs to see how the content is being indexed and how well it is performing. Most people don’t realize that Bing is the backend search index for most alternative search engines like duckduckgo.com, so it’s important to factor them in to improve your reach.
  5. Generate Directory Index Pages: Create index pages from top-level category items to enhance navigation and SEO.
  6. Inter-Linking Across Content: Create links between related pages using categories, or any other content relationships.
  7. Static HTML Generation: Instead of loading content through React in real-time, generating static HTML pages can greatly improve load times and indexing quality.
  8. Load Balancer Implementation: By putting a load balancer in front of the main domain, they can then point a specific path to the subdomain, thus eliminating the need for a visibly separate subdomain and 301 redirects.
  9. Mobile Optimization: Ensuring the site is optimized for mobile rendering is crucial for improving search ranking. All search engines now prioritize mobile pages and mobile rendering.
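A minimal robots.txt along the lines of recommendation 1 might look like this (the sitemap URL is a placeholder for your own domain):

```
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```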

Conclusion

Programmatic SEO can provide substantial improvements in search visibility and user experience when implemented correctly. Generating thousands of pages programmatically is a great way to scale your content and visibility. But it’s crucial to get the technical implementation correct, otherwise the search engines will not properly index the content, and your pages just won’t show up in search results.

If you’d like some specialized help on your programmatic SEO strategy, feel free to reach out to me at Streamhacker Technologies.

Salt Recipe for Creating a MySQL User with Grants for Scalyr

Salt is a great tool for managing the configuration of many servers. And when you have many servers, you should also be monitoring them with a tool like Dataset (aka Scalyr). The scalyr agent can monitor many things, but in this example, I’m going to show you how to create a MySQL user for the scalyr agent with just the right amount of permissions.

Salt Formula

{% set scalyr_user = salt['pillar.get']('scalyr:mysql:user', 'scalyr-agent-monitor') %}
mysql_scalyr_user:
  mysql_user.present:
    # - host: localhost
    - name: {{ scalyr_user }}
    - password: {{ pillar['scalyr']['mysql']['password'] }}
  mysql_grants.present:
    - grant: 'process, replication client'
    - database: '*.*'
    # - host: localhost
    - user: {{ scalyr_user }}
    - require:
      - mysql_user: {{ scalyr_user }}

Salt uses YAML with Jinja templating to define states. This template does the following:

  1. Creates a MySQL user for scalyr
  2. Grants permissions for that scalyr user to access MySQL process & replication metrics on all databases

You can view the full range of options for the mysql_user and mysql_grants states if you need to customize it more.

Pillar Configuration

The above salt recipe requires a corresponding pillar configuration that looks like this:

scalyr:
  mysql:
    user: scalyr-agent-monitor
    password: RANDOM_PASSWORD

Scalyr Agent Configuration

Then in your scalyr agent JSON, you can use a template like this:

{
  logs: [{
    path: "/var/log/mysql/error.log",
    attributes: {parser: "mysql_error"}
  }, {
    path: "/var/log/mysql/slow.log",
    attributes: {parser: "mysql_slow"}
  }],
  monitors: [{
    module: "scalyr_agent.builtin_monitors.mysql_monitor",
    database_username: "{{ salt['pillar.get']('scalyr:mysql:user') }}",
    database_password: "{{ salt['pillar.get']('scalyr:mysql:password') }}",
    database_socket: "/var/run/mysqld/mysqld.sock"
  }]
}

How to use it

If you’re already familiar with salt, then hopefully this all makes sense. Let’s say you named your state mysql_user in a scalyr state directory. Then you could apply it like this:

salt SERVER_ID state.sls scalyr.mysql_user

And now you have a MySQL user just for scalyr. This same idea can likely be applied to any other MySQL monitoring program.

If you’d like some help automating your server configuration and monitoring using tools and formulas like this, contact us at Streamhacker Technologies.

A Quick Simple Way to Download All the Images on a Page

You don’t need to write a web scraper to do this, just some simple code and standard linux/unix commands.

  1. Open the page in your web browser
  2. Open the Developer Tools
  3. Paste in the following Javascript
var images = document.getElementsByTagName('img'); 
var srcList = [];
for(var i = 0; i < images.length; i++) {
    srcList.push(images[i].src.split('?', 1)[0]);
}
srcList.join('\n');
  4. Create a folder to store the images
  5. Copy the text output from above into a file & save as images.txt in your folder
  6. Inspect images.txt to make sure it looks right
  7. Run the following commands in a terminal, from your folder
sort -u images.txt > uniq_images.txt
wget -i uniq_images.txt

Now all the images from that page should be in the folder you created.


Developing an Etsy App – Getting Started

I’m working on an Etsy app for some client work (an Etsy listing scheduler) and just getting started is quite a process. So I’m documenting it here for anyone else that may be interested in creating an app for Etsy.

Step 1: Have a Real Etsy Account

To begin, you’ll need a real Etsy account. If you don’t have one already, sign up for an account on the Etsy website.

Step 2: Create a Test Etsy Store

Next, create a test Etsy store that is real enough for testing purposes. You need to create a real listing, even if it’s just a digital item such as a throw-away photo. You also need to connect a real bank account to receive payouts. This Etsy store will be your test environment for developing your app.

Step 3: Set Store to Developer Mode

To ensure that your listings are not visible in Etsy’s search, set your store to Developer mode. Only do this if you’re working through your own personal account. Do not do this for a real Etsy shop.

Step 4: Create a Webpage

Create a webpage for your Etsy app. This will serve as the main interface for users to interact with your app. But for now, it’s really for the Etsy app approval team, so they can learn about the purpose of your app.

Step 5: Review Etsy’s Terms and Conditions

Before proceeding further, carefully review Etsy’s terms and conditions. A specific restriction they have is that you cannot use the term “Etsy” in the name of your app or the title/heading of your website. You should also include the following text on your website: The term ‘Etsy’ is a trademark of Etsy, Inc. This application uses the Etsy API but is not endorsed or certified by Etsy, Inc.

Step 6: Register for the Etsy API

Register a new Etsy app. This app will be directly tied to your Etsy account, which you created in Step 1. You must also agree to their API Testing Policy.

Step 7: Contact Etsy Developer Support

Reach out to the Etsy Developer Support team by emailing developers@etsy.com. This shouldn’t be necessary since you registered in the previous step, but if you actually want to get a response and your app approved, you need to email them. Someone will review your website and registration to ensure it complies with their terms before approving your app.

Step 8: Start with Personal Token

Initially you will only get a personal API token, which means you can only interact with your own store through the API. This will allow you to test and iterate on your app’s functionality. You’ll need to actually create an initial version of your app before requesting commercial access.

Step 9: Provisional Users

As you progress with your app development, you may want to test your app on other stores. You can add provisional users with a special API.

Step 10: Create Material for Etsy Review

Prepare all the necessary materials, such as documentation and screenshots, for Etsy’s commercial API review process. This might include OAuth permissions required, and API calls that your app makes. These materials will help Etsy understand and evaluate your app.

Step 11: Request Commercial Access

Once your app is ready for wider usage, request commercial access from Etsy. This will allow anyone to authenticate and use your app via OAuth, once you’re approved.


Sentiment Analysis Survey 2023

Do you use a sentiment analysis API or tool? Or have you considered using one but couldn’t find one that fit your needs?

If the answer to either of those questions is yes, then please fill out this Sentiment Analysis Survey. I’d like to learn more and hear directly from users and customers of sentiment analysis tools. Thanks for your time.


5 Lesser Known Risk Factors in Payment Fraud

When you’re analyzing payments to determine if they are fraudulent, what should you look for? Stripe Radar is great at blocking the more obvious fraudulent payments, and allowing the payments that are clearly not fraud, but what about the payments that are in between? There are a number of less obvious factors you can look at to determine whether a payment is fraud.

How to Decide if a Payment Under Review is Fraudulent

Here are 5 lesser known factors we’ve identified when working with clients of Streamhacker Technologies. We’ll describe each of these in more detail below.

  1. History of adding & removing cards
  2. Specific fraud insights
  3. Fast plan upgrades
  4. Lack of product usage
  5. Multiple IPs and payment attempts

While this article uses examples from Stripe, these factors can apply to almost any payment platform.

History of adding & removing cards

When a customer uses multiple cards to make payments over a relatively short period of time, that’s a big warning sign of card testing. When combined with fast plan upgrades, multiple IPs, and lack of product usage, then you can be confident it’s fraud.

Much of the time, Stripe will show this behavior in the Related Payments section of a charge. You can see an example here.

Related credit card fraud payments in stripe

However, sometimes you need to go into the customer profile to get the full picture. In the Recent Activity section, you can see if the customer added a new card. Here’s an example of what it looks like when someone changes cards within ~1 day of signing up.

Recent stripe customer activity showing credit card changes

On its own, this is suspect but not necessarily fraud. However, if there are more than 2 cards, that’s quite suspect. It’s also very suspect if the cards come from multiple countries. If you click on Show details for any of the cards, you can see the countries.

Credit card change history in stripe showing 2 different countries

Above you can see 2 different cards from 2 different countries. And in this case, the customer’s IP address was in a third country. Very suspicious behavior.

Specific Fraud Insights

On a Stripe charge payment, there’s a Fraud Insights button that shows you various fraud factors. Three that we’ve found to be useful are shown below.

Stripe fraud insights showing authorization rates for email

A low authorization rate and more than 0 declines associated with the customer’s email are significant fraud indicators. The name-email similarity match is a small additional indicator on top. These insights are most useful when combined with the other indicators discussed here.

Fast Plan Upgrades

A “fast plan upgrade” is when someone subscribes to the lowest plan of your service, then upgrades to one of your highest plans within a few minutes. This may be another form of card testing. Maybe your lowest plan is $10 and your highest plan is $100 – those are very different purchase amounts, and a card tester may want to find out if the card that works at a low amount can also be used for larger purchases. If the first upgrade attempt fails, and they switch cards to try again, fraud risk looks a lot more likely. These related payments show an example of this exact behavior.

Related credit card fraud payments showing fast plan upgrades

Here’s what happened:

  1. Attempted to purchase low level plan at $10, but that failed
  2. Switched cards and tried again, Stripe risk score was still 0
  3. One minute later, successfully upgraded to a higher plan, and got a risk score of 47, which Stripe still considers “normal”
  4. 1 day later, tried to upgrade again to an even higher plan, but that failed with a higher risk score

Note: 2 payments are showing as Refunded because they were successful until being refunded as fraud.

Lack of Product Usage

If a new customer doesn’t use your product much right away, that’s ok. But if they also change cards and/or try to upgrade plans without using your product at all, that’s suspicious. In Streamlining Stripe Reviews with Webhooks and Zapier I described how we helped a client highlight product usage metrics as part of their Stripe review process. Getting some product usage metrics into your Stripe charge metadata is very useful for fraud analysis, so you can quickly look at all your risk factors in one place.

Multiple IPs and Payment Attempts

Many people use VPNs and proxy servers for very legitimate reasons. And sometimes people are traveling. Just because the credit card country doesn’t match the IP country, or there’s a low authorization rate for an IP address, that doesn’t necessarily mean a payment is fraud. But when the IP address of a customer changes over a short period of time, and they make multiple payment attempts from multiple IP addresses, that’s unusual. Stripe’s Related Payments section helps to show this kind of behavior.
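These indicators are strongest in combination. As a hypothetical sketch (this is not a Stripe API; every field name and threshold below is made up for illustration), a simple rule-of-thumb check might count how many of the five factors are present:

```python
def fraud_indicator_count(payment):
    """Count how many of the five lesser-known risk factors are present.

    `payment` is a plain dict of pre-computed fields; all keys and
    thresholds here are illustrative, not from Stripe.
    """
    indicators = [
        payment.get("cards_used", 1) > 2,               # history of adding/removing cards
        payment.get("email_auth_rate", 1.0) < 0.5,      # low authorization rate for the email
        payment.get("minutes_to_upgrade", 10**9) < 10,  # fast plan upgrade
        payment.get("product_usage_events", 1) == 0,    # lack of product usage
        payment.get("distinct_ips", 1) > 2,             # multiple IPs and payment attempts
    ]
    return sum(indicators)
```

A single indicator is usually not fraud on its own; seeing three or more together warrants a much closer look.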

Conclusion

Deciding whether a payment is fraud can be tricky, and is not always obvious. But there are risk indicators you can look for, and when you see multiple indicators together, you can be more confident in a fraud assessment. Conversely, if you only see one of these indicators, then a payment likely isn’t fraud. Whatever your assessment is, take detailed notes. Stripe’s charge UI has a nice feature where you can leave a note for future reference – be sure to use this so you have a history of why you made a decision, and can revisit these decisions in the future, when you have more information.

Stripe also has some helpful documentation on identifying and preventing payment fraud.

If you think your team or company needs help managing payment fraud, contact Streamhacker Technologies to see what we can do for you.