Security Was Already a Mess. Generative AI Is About to Prove It.

I was thinking about some of the points from the Polyglot Conf list of predictions for generative AI, titled “Second Order Effects of AI Acceleration: 22 Predictions from Polyglot Conference Vancouver“. One thing that stands out to me, and I’m sure many of you have read about this scenario, is misplaced keys, tokens, passwords, usernames, or whatever other security collateral gets left in a repo. It’s become such an issue that orgs like AWS have set up triggers: when they find keys on the internet, they trace them back and try to alert their users (i.e. if a user of theirs has stuck account keys in a repo). It’s wild how big of a problem this is.

Once you’ve spent any serious amount of time inside corporate IT, you eventually come to a slightly uncomfortable realization. Doubly so if you focus on InfoSec or other security-related work. Security, broadly speaking, is not in a particularly great state.

That might sound dramatic, but it’s not really. It is the standard modus operandi of corporate IT. The cost of really good security is too high for most corporations to focus where they should, and when some corporations do focus on security they often miss the forest for the trees. There are absolutely teams doing excellent security work, so don’t get the idea I’m saying there aren’t solid people doing the work to secure systems and environments. There are organizations that invest heavily in it. There are people in security roles who take the mission extremely seriously and do very good engineering.

A lot of what passes for security is really just a mixture of documentation, policy, and a little bit of obscurity. Systems are complicated enough that people assume things are protected. Access is restricted mostly because people don’t know where to look. Credentials are hidden in configuration files or environment variables that nobody outside the team sees.

And that becomes the de facto security posture.

Not deliberate protection.

Just… quiet obscurity.

I’ve lost count of the number of times I’ve been pulled into a system review, or some troubleshooting session, where a secret shows up in a place it absolutely shouldn’t be. An API key sitting in a script. A database password in a config file. An environment file committed to a repository six months ago that nobody noticed.
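Leaks like these are exactly what automated secret scanners look for. Here's a minimal sketch of the idea in Python; the patterns are illustrative stand-ins, nowhere near the rule sets that real scanners (or AWS's own key-finding triggers) actually use:

```python
import re

# Illustrative patterns only; real scanners maintain far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(r"(?i)api[_-]?key\s*[:=]\s*['\"]?[A-Za-z0-9]{20,}"),
    "password_assignment": re.compile(r"(?i)password\s*[:=]\s*['\"]?\S{8,}"),
}

def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs found in the text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits

# The classic scenario: a key sitting in a committed config file.
sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\ndb_password = "hunter2hunter2"'
for name, matched in scan_text(sample):
    print(name, "->", matched)
```

Running something like this in a pre-commit hook or CI step catches the "nobody noticed for six months" case before it ever lands in history.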

That sort of thing happens constantly. Not out of malice. Out of convenience. But now we’ve introduced something new into the environment.

Generative AI.

More importantly though, the agentic tooling built around it. Tooling that literally takes actions on your behalf. Tools that can read entire repositories, analyze logs, scan infrastructure configuration, generate code, and help debug systems in seconds. Tools that engineers increasingly rely on as a kind of external thinking partner while they work through problems.

All that benefit comes with AI tools. However, the AI doesn’t care about the secret. It’s just processing text. But the act of pasting it there matters. Because the moment that secret leaves your controlled environment, you no longer know exactly where it goes, how it’s stored, or how long it persists in the LLM provider’s systems.

The mental model a lot of people are using right now is wrong. They treat AI like a scratch pad or an extension of their own thoughts.

It isn’t.

The more accurate model is this: an AI tool is another resource participating in your workflow. Another staff member, effectively.

Except instead of being a person sitting at the desk next to you, it’s a system operated by someone else, running on infrastructure you don’t control, processing information you send to it. Including keys and secrets.

Once you start looking at it that way, a few things become obvious. You wouldn’t casually hand a contractor your production API keys while asking them to help debug something. You wouldn’t drop a full .env file containing service credentials into a conversation with someone who doesn’t actually need those values.

Yet that is exactly the pattern that is quietly emerging with generative AI tools. Especially among new users of said tools! Developers paste configuration files, snippets of infrastructure code, environment variables, connection strings, and logs directly into prompts because it’s the fastest way to get an answer.

It feels harmless. But secrets have a way of spreading through systems once they start moving.

The real issue here is that generative AI doesn’t create security problems. It amplifies the ones that already exist. Problems the industry has failed (miserably, might I add) to solve. If an organization already has sloppy credential management, AI just gives those credentials another place to leak. If engineers already pass secrets around informally to get work done, AI becomes another convenient channel for that behavior.

And because AI tools accelerate everything, they accelerate the consequences too. What used to take hours of searching through documentation can now happen instantly. A repository full of configuration files can be analyzed in seconds. Systems that were once opaque are now far easier to reason about.

The Takeaway (Including secrets!)

The practical takeaway here isn’t that people should stop using AI tools. That’s not realistic, and frankly a career-limiting maneuver at this point. The tools are genuinely useful and they’re going to become a permanent part of how software gets built.

What needs to change – desperately – is operational discipline.

Secrets should never be treated casually, and that includes interactions with generative systems. API keys, tokens, passwords, certificates, environment files, connection strings—none of those belong in prompts or screenshots or debugging sessions with external tools.

If you need to ask an AI for help, scrub the sensitive pieces first. Replace real values with placeholders. Remove anything that grants access to a system. Set up ignore rules for your env files and don’t let production env values (or vault values, whatever you’re using) leak into your generative AI systems.
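One way to make that scrubbing habitual is a small redaction pass run over anything before it goes into a prompt. This is a rough sketch, assuming you'd tune the patterns to your own environment; it keeps the structure of the text intact so the AI can still reason about it:

```python
import re

# Replace likely secret values with placeholders before pasting into a prompt.
# Patterns are illustrative; adjust for your own key and variable formats.
REDACTIONS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "<AWS_ACCESS_KEY>"),
    (re.compile(r"(?i)(password|passwd|pwd)(\s*[:=]\s*)\S+"), r"\1\2<REDACTED>"),
    (re.compile(r"(?i)(api[_-]?key|token|secret)(\s*[:=]\s*)\S+"), r"\1\2<REDACTED>"),
]

def scrub(text: str) -> str:
    """Swap secret values for placeholders while keeping keys and layout."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

env_snippet = "DB_PASSWORD=s3cr3t!\nAPI_KEY=abcd1234efgh5678"
print(scrub(env_snippet))
```

The variable names survive, so a debugging question still makes sense, but the values that grant access never leave your machine.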

Treat every AI interaction the same way you would treat a conversation with another engineer outside your organization, or better yet, outside the company (or government, etc.) altogether.

But that’s not someone you hand the keys to the kingdom. Don’t hand them to your AI tooling either.

Security and Authentication in Relational Databases

This is a continuation of my posts on relational databases, started here. Previous posts on this theme, “Data Modeling“, “Let’s Talk About Database Schema“, “The Exasperating Topic of Database Indexes“, “The Keys & Relationships of Relational Databases“, “Relational Database Query Optimization“, “Normalization in Relational Databases“, and “Database (DB) Caching and DB Tertiary Caching“.

Basic Security Architecture

Relational Database Management Systems (RDBMS) like SQL Server, Oracle, MariaDB/MySQL, and PostgreSQL are designed with robust security architectures. The core components typically include:

  1. Authentication: Verifying the identity of users accessing the database.
  2. Authorization: Determining what authenticated users are allowed to do.
  3. Access Control: Implementing permissions and roles to manage user access.
  4. Encryption: Protecting data at rest and in transit.
  5. Auditing and Logging: Tracking access and changes to the database for security and compliance.
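The first two components in that list are easy to conflate, so here is a toy Python sketch (hypothetical users, roles, and permissions) of where the line sits between verifying who someone is and deciding what they may do:

```python
# Toy model of the authentication/authorization split.
# All names here are hypothetical; a real RDBMS stores salted password
# hashes and enforces these checks inside the engine itself.
USERS = {"alice": "s3cret"}                       # authentication store
ROLES = {"alice": {"reader"}}                     # role assignments
PERMISSIONS = {"reader": {"SELECT"}, "writer": {"SELECT", "INSERT"}}

def authenticate(user: str, password: str) -> bool:
    """Step 1: verify identity (never store plaintext in practice)."""
    return USERS.get(user) == password

def authorize(user: str, action: str) -> bool:
    """Step 2: check what an already-authenticated user may do."""
    return any(action in PERMISSIONS.get(role, set())
               for role in ROLES.get(user, set()))

assert authenticate("alice", "s3cret")   # identity checks out
assert authorize("alice", "SELECT")      # reader role permits SELECT
assert not authorize("alice", "INSERT")  # but not INSERT
```

Access control, the third component, is essentially the machinery (roles, grants, permissions) that makes the `authorize` step manageable at scale.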

Authentication Mechanisms

Authentication is the first line of defense in database security. It ensures that only legitimate users can access the database. The primary methods include:

  • Username and Password: The most common method, where users provide credentials that are verified against the database’s security system.
  • Integrated Security: Leveraging the operating system or network security (like Windows Authentication in SQL Server) for user validation.
  • Certificates and Keys: Using digital certificates or cryptographic keys for more secure authentication.

Database-Specific Security and Authentication

SQL Server

  • Windows Authentication: Integrates with Windows user accounts, providing a seamless and secure authentication mechanism.
  • SQL Server Authentication: Involves creating specific database users and passwords.
  • Roles and Permissions: SQL Server offers fixed server roles, fixed database roles, and user-defined roles for granular access control.
  • SSO Integration: Supports integration with Active Directory for single sign-on capabilities.

Oracle

  • Oracle Database Authentication: Users are authenticated by the database itself.
  • OS Authentication: Oracle can use credentials from the operating system.
  • Roles and Privileges: Oracle has a sophisticated role-based access control system, with predefined roles like DBA and user-defined roles.
  • Oracle Wallet: For storing and managing credentials, enhancing security for automated login processes.

MariaDB/MySQL

  • Standard Authentication: Username and password based, with the option of using SHA-256 for password hashing.
  • Pluggable Authentication: Supports external authentication methods like PAM (Pluggable Authentication Modules) or Windows native authentication.
  • Role-Based Access Control: Introduced in later versions, allowing for more flexible and manageable permissions.
  • SSL/TLS Encryption: For securing connections between the client and the server.

PostgreSQL

  • Role-Based Authentication: PostgreSQL uses roles to handle both authentication and authorization.
  • Password Authentication: Supports MD5 or SCRAM-SHA-256 for password hashing.
  • Peer Authentication: For local connections, relying on OS user credentials.
  • SSO Integration: Can integrate with external authentication systems like Kerberos.
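The MD5-versus-SCRAM distinction above is worth seeing concretely. SCRAM-SHA-256 derives its salted password with PBKDF2-HMAC-SHA-256, so a per-user salt and iteration count stand between an attacker and the password; plain MD5 has neither. A short Python sketch (the salt value is illustrative, though 4096 is PostgreSQL's documented default iteration count):

```python
import hashlib

password = b"correct horse battery staple"

# Legacy MD5 auth: unsalted, so every user with the same password
# ends up with the same digest, and lookup tables work against it.
md5_digest = hashlib.md5(password).hexdigest()

# SCRAM-SHA-256 derives its SaltedPassword via PBKDF2-HMAC-SHA-256:
# a per-user random salt plus thousands of iterations defeats
# precomputed tables and slows brute force dramatically.
salt = b"per-user-random-salt"  # PostgreSQL generates this randomly
iterations = 4096               # PostgreSQL's default iteration count
salted = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)

print("md5:  ", md5_digest)
print("scram:", salted.hex())
```

Two users sharing a password get identical MD5 digests but different SCRAM salted passwords, which is a large part of why MD5 authentication is deprecated in current PostgreSQL releases.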

Emerald City Technology Conversations Premier with Archis Gore

Today I’ve cut the final version, and the video and audio are getting better and better as I figure out how these conversations are best recorded and streamed! Archis and I had a conversation a few weeks back, and the premiere is going live today at 2:00pm. My plan with these conversations is pretty straightforward: after each recording I’ll do some post-processing on the video and audio, set the conversation to premiere on YouTube and Twitch, and join the premieres to answer questions in chat and generally add more conversation to the conversation.

For more information, or to come join me for a conversation, check out this post “Join Me for a Live Stream Conversation on Programming, Infrastructure, Data, Databases, or Your Opinions!“.

Today, enjoy the conversation with Archis and I’ll see you in chat!

DataStax Developer Days

Over the last week I had the privilege and adventure of coming out to Chicago and Dallas to teach about operations and security capabilities of DataStax Enterprise. More about that later in this post, first I’ll elaborate on and answer the following:

  • What is DataStax Developer Day? Why would you want to attend?
  • Where are the current DataStax Developer Day events that have been held, and where are future events going to be held?
  • Possibilities for future events near a city you live in.

What is DataStax Developer Day?

The way we’ve organized this developer day event at DataStax is focused around DataStax Enterprise, built on Apache Cassandra. However, I have to add the very important note that this isn’t merely a product pitch type of thing; you can and will learn about distributed databases and systems in a general sense too. We talk about a number of the core principles behind distributed systems, such as the pivotally important consistent hash ring, datacenters and racks, gossip, replication, snitches, and more. We feel it’s important that enough theory comes along with the configuration and features covered to understand the who, what, where, why, and how behind them too.

The starting point of the day’s course material is based on the idea that one has not worked with or played with Apache Cassandra or DataStax Enterprise. However, we have a number of courses throughout the day that delve into more specific details and advanced topics. There are three specific tracks:

  1. Cassandra Track – this track consists of three workshops: Core Cassandra, Cassandra Data Modeling, and Cassandra Application Development. [more details]
  2. DSE Track – this track consists of three workshops: DataStax Enterprise Search, DataStax Enterprise Analytics, and DataStax Enterprise Graph. [more details]
  3. Bonus Content – This track has two workshops: DataStax Enterprise Overview and DataStax Enterprise Operations and Security.  [more details]

Why would you want to attend?

  • One huge, rad, awesome reason is that the developer day events are FREE. But really, nothing is ever free, right? You’d want to take a day away from the office to join us, so there’s that.
  • You also might want to even stay a little later after the event as we always have a solidly enjoyable happy hour so we can all extend conversations into the evening and talk shop. After all, working with distributed databases, managing data, and all that jazz is honestly pretty enjoyable when you’ve got awesome systems like this to work with, so an extended conversation into the evening is more than worth it!
  • You’ll get a firm basis of knowledge and skillset around the use, management, and more than a few ideas about how Apache Cassandra and DataStax Enterprise can extend your system’s systemic capabilities.
  • You’ll get a chance to go beyond merely the distributed database system of Apache Cassandra itself and delve into graph, what it is and how it works, analytics, and search too. All workshops take a look at the architecture, uses, and what these capabilities will provide your systems.
  • You’ll also have one on one time with DataStax engineers, and other technical members of the team to ask questions, talk about architecture and solutions that you may be working on, or generally discuss any number of graph, analytics, search, or distributed systems related questions.

Where are the current DataStax Developer Day events that have been held, and where are future events going to be held? So far we’ve held events in New York City, Washington DC, Chicago, and Dallas. We’ve got two more events scheduled, with one in London, England and one in Paris, France.

Future events? With a number of events completed and a few on the calendar, we’re interested in hearing about future possible locations for events. Where are you located, and where might an event of this sort be useful for the community? I can think of a number of cities, but organizing them into an order to know where to get something scheduled next is difficult, which is why the team is looking for input. So ping me via @Adron, email, or just send me a quick message from here.

Security, A Rant Sort Of… and a shocker.

Here’s a shocking statement for a lot of people in technology, and especially outside of technology: “Your money is at greater risk because it isn’t in a cloud.” Here’s another shocker: “Your medical information is at greater risk at your on-premises doctor than if it were stored and protected by access control in the cloud.”

Is that shocking to you? If you aren’t shocked, you probably know a lot about cloud technology. The cloud is more secure than most IT departments, physical server locations, secure government installations, and other environments one might imagine.

Why Am I Writing This Blog Entry?

While I was listening to Steve Riley’s talk on AWS Security, I started this blog entry. A few of the questions that were brought up made me realize how little of the physical and platform-level security is actually understood. Even though this was about AWS, it also applies to Azure, Google, and other cloud environments and platforms. After several weeks of studying Azure and several years of working with cloud type technology at Webtrends, this statement shocked me: “A bank or a medical entity wouldn’t put its data in the cloud.”* I couldn’t help but think that someone posing this statement as fact (even though I know that it is absolutely not a fact) is sorely misinformed about cloud computing and technology.

Well, I wanted to retort to this statement myself, but Steve handled the question as a rock star presenter would. But I still want to elaborate on this topic. Also check my previous blog entry “Your Cloud, My Cloud, Security in the Cloud” (* see addendum), as I touched on this topic from the vantage point of web analytics. What we have here is the conversation about data that truly needs to be secure.

Cloud Security – Physical

Cloud environments have physical locations all over the world. These locations are not advertised or easily located; they are obfuscated for reasons of security. Once you get to one of these facilities, the location has numerous physical security restrictions, including time-based access codes, security cards, and in some cases retinal scanners; the list goes on. In addition, many of these security methods are used concurrently with one another.

In addition to this, the people maintaining the cloud technology centers don’t have access to the data. They do not even know how, nor could someone specifically tell them how, to gain access to the specific drives or machines that hold the data of specific instances without extensive work. That alone provides an immediate level of security, both for data and physically. That leads me to this next point.

Data Security in the Cloud

Having data spread across virtualized storage mediums is a step into another realm of security. For more than just security reasons, data is spread across multiple storage locations. Because of the virtualized nature of this storage, the actual data is located in a number of locations shared among machines. These machines are not maintained in direct relation to those storage points. The storage points are tracked by the machines, in secure ways, so that only the owning account can access the data. In addition to this spread of the data, the storage is actually moved from point to point across machines at various times to maintain uptime and redundancy. This also increases the complexity of finding the data by nefarious means.

One final point of physical security for data is that each customer has completely segmented data stored in separate virtual instances. This separation is equivalent to two storefront businesses side by side. They are separated by a wall, just like the segmentation of data in the cloud. This is important to grasp on many levels: nobody would question placing one business next to another; entire cities have existed for hundreds of years that way, and so can businesses within the cloud.

Security at the Platform Level…

…I wanted to continue on this topic, but I’m going to hold off. Right now, for work and personally, I’m researching a number of additional security ideas within the cloud, including physical, data, access control, and other security principles. I’ll have that write-up for another day, inclusive of the platform-level security.

…as for now, that wraps up this semi-ranting piece.