307

I want to decode a Base64 encoded string, then store it in my database. If the input is not Base64 encoded, I need to throw an error.

How can I check if a string is Base64 encoded?

4
  • 2
    Why? How can the situation arise? Commented Oct 9, 2015 at 9:25
  • 3
    without specifying which programming language (and/or) Operating System you are targeting, this is a very open question Commented Jan 5, 2016 at 16:32
  • 13
    All that you can determine is that the string contains only characters that are valid for a base64 encoded string. It may not be possible to determine that the string is the base64 encoded version of some data. For example, test1234 is a valid base64 encoded string, and when you decode it you will get some bytes. There is no application-independent way of concluding that test1234 is not a base64 encoded string. Commented Feb 10, 2016 at 11:56
  • play.golang.org/p/RnEBFCJ9h0 Commented Sep 26, 2019 at 15:29

30 Answers

360

You can use the following regular expression to check if a string constitutes a valid base64 encoding:

^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$

In Base64 encoding, the character set is A-Z, a-z, 0-9, + and /. If the final group has fewer than 4 characters, it is padded with '=' characters.

^([A-Za-z0-9+/]{4})* means the string starts with 0 or more Base64 groups of four characters.

([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$ means the string may end with one padded group in one of two forms: [A-Za-z0-9+/]{3}= or [A-Za-z0-9+/]{2}==.
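As a rough illustration, the same pattern can be applied in Python. This is only a sketch: the name looks_like_base64 is made up for the example, and a match means the string has a valid Base64 shape, not that it was actually produced by encoding.

```python
import re

# Same pattern as above; fullmatch() anchors it, so ^ and $ are not needed.
BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?"
)

def looks_like_base64(s: str) -> bool:
    """Return True if s has the shape of a padded Base64 string."""
    return BASE64_RE.fullmatch(s) is not None

print(looks_like_base64("QUJD"))  # one full group of four characters
print(looks_like_base64("QQ=="))  # two characters plus double padding
print(looks_like_base64("QUJ"))   # length is not a multiple of four
```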


20 Comments

Just wanted to verify, so please help with my question: what guarantees that this regex only ever matches Base64 strings? If a string has no spaces and its length is a multiple of 4 characters, will it be considered a Base64 string?
Then it is a valid base64 string which can be decoded. You could add a minimum length constraint; for example, instead of zero or more repetitions of groups of four, require (say) four or more. It depends on your problem, too; if your users often enter a single word in a language with long words and pure ASCII (Hawaiian?) it's more error-prone than if non-base64 input typically contains spaces, punctuation, etc.
This only tells you that an input could have been a Base64-encoded value; it does not tell you whether the input actually is one. In other words, abcd will match, but it does not necessarily represent an encoded value rather than just the plain input abcd.
Your regexp is incorrect, since it does not match the empty string, which is the base64 encoding of zero-length binary data according to RFC 4648.
@Adomas, "pass" is a perfectly valid base64 string, that decodes into the sequence of bytes 0xa5, 0xab and 0x2c. Why discard it a priori, if you don't have more context to decide?
74

If you are using Java, you can use the Apache Commons Codec (commons-codec) library:

import org.apache.commons.codec.binary.Base64;

String stringToBeChecked = "...";
boolean isBase64 = Base64.isArrayByteBase64(stringToBeChecked.getBytes());

[UPDATE 1] Deprecation notice: isArrayByteBase64 is deprecated. Use this instead:

Base64.isBase64(value);

   /**
     * Tests a given byte array to see if it contains only valid characters within the Base64 alphabet. Currently the
     * method treats whitespace as valid.
     *
     * @param arrayOctet
     *            byte array to test
     * @return {@code true} if all bytes are valid characters in the Base64 alphabet or if the byte array is empty;
     *         {@code false}, otherwise
     * @deprecated 1.5 Use {@link #isBase64(byte[])}, will be removed in 2.0.
     */
    @Deprecated
    public static boolean isArrayByteBase64(final byte[] arrayOctet) {
        return isBase64(arrayOctet);
    }

9 Comments

from the documentation: isArrayByteBase64(byte[] arrayOctet) Deprecated. 1.5 Use isBase64(byte[]), will be removed in 2.0.
You can use also Base64.isBase64(String base64) instead of converting it to byte array yourself.
Sadly, based on the documentation: commons.apache.org/proper/commons-codec/apidocs/org/apache/… : "Tests a given String to see if it contains only valid characters within the Base64 alphabet. Currently the method treats whitespace as valid." This means that this method has some false positives, like whitespace or numbers ("0", "1").
This answer is wrong because, given stringToBeChecked="some plain text", it sets boolean isBase64=true even though that is not a Base64 encoded value. Read the source for commons-codec-1.4 Base64.isArrayByteBase64(): it only checks that each character in the string is valid for Base64 encoding, and it allows white space.
@Ajay, politicalstudent is a valid base64 string, it decodes into the sequence: a6 89 62 b6 27 1a 96 cb 6e 75 e9 ed
60

Well you can:

  • Check that the length is a multiple of 4 characters
  • Check that every character is in the set A-Z, a-z, 0-9, +, / except for padding at the end which is 0, 1 or 2 '=' characters

If you're expecting that it will be base64, then you can probably just use whatever library is available on your platform to try to decode it to a byte array, throwing an exception if it's not valid base 64. That depends on your platform, of course.
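For the decode-and-fail route, a minimal Python sketch might look like this. The function name decode_base64_or_raise is hypothetical, and the sketch assumes the standard alphabet with padding; validate=True is what makes the standard library reject stray characters instead of skipping them.

```python
import base64
import binascii

def decode_base64_or_raise(text: str) -> bytes:
    """Decode text as standard Base64, raising ValueError if it is malformed."""
    try:
        # validate=True rejects characters outside the alphabet instead of
        # silently discarding them.
        return base64.b64decode(text, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid Base64: {text!r}") from exc

print(decode_base64_or_raise("SGVsbG8="))
```

The decoded bytes are what you would store; anything that raises never reaches the database.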

3 Comments

Parsing differs from validation at least by the fact that it require memory for decoded byte array. So this is not the most effective approach in some cases.
@VictorYarema: I suggested both a validation-only approach (bullet points) and also a parsing approach (after the bullet points).
This gets to be real fun when you're trying to detect base64 encoding in raw email. MIME headers are occasionally a strange mix of quoted-printable, non-ASCII, non-UTF-8, sort of base64. e.g. =?windows-874?B?M0JCIGUtQmlsbCCgIEEvQyBOby4gNDEwMDQ1Nzg3IOC01825IDAzLzIwMjIgSU5WOjM2NTAzMjYwMzAwNjky?= Off-the-shelf email clients handle all this without blinking, but it's a mess.
43

As of Java 8, you can simply use java.util.Base64 to try and decode the string:

String someString = "...";
Base64.Decoder decoder = Base64.getDecoder();

try {
    decoder.decode(someString);
} catch(IllegalArgumentException iae) {
    // That string wasn't valid.
}

10 Comments

Yes, it's an option, but don't forget that catch is quite an expensive operation in Java.
That is not the case anymore. Exception handling performs pretty well. You'd better not forget that Java regex is pretty slow. I mean: REALLY SLOW! It's actually faster to decode a Base64 string and check whether it worked than to match the String with the above regex. I did a rough test, and Java regex matching is around six times slower (!!) than catching an eventual exception on the decode.
With Java 11 (instead of Java 8) the regex check is even 22 times slower. 🤦 (Because the Base64 decoding got faster.)
Using this approach with the string "Commit" reports it as valid, yet the decoded value is just gibberish. So it doesn't seem to be foolproof.
@seunggabi why would it throw on the string "dev"?
15

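Try this for PHP 5:

// Where $json is some data that may be base64 encoded
$json = "some_data";

// Strict mode returns false if the input contains characters outside the base64 alphabet.
// Compare with !== false, because a valid decode can yield a falsy string such as "" or "0".
if (base64_decode($json, true) !== false)
{
    echo "base64 encoded";
}
else
{
    echo "not base64 encoded";
}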

Use this for PHP 7:

// $string parameter can be base64 encoded or not
function is_base64_encoded($string) {
    // This will check if $string is base64 encoded and return true, if it is.
    return base64_decode($string, true) !== false;
}

3 Comments

Which language is this? The question was asked without referring to a language
This will not work. Read the docs for base64_decode: "Returns FALSE if input contains character from outside the base64 alphabet."
How? If the input contains characters outside the alphabet, then it is not base64, right?
9
var base64Regex = /^(?:[A-Z0-9+\/]{4})*(?:[A-Z0-9+\/]{2}==|[A-Z0-9+\/]{3}=|[A-Z0-9+\/]{4})$/i;
var isBase64Valid = base64Regex.test(base64Data); // base64Data is the string to check

if (isBase64Valid) {
    // the string is in Base64 format
    console.log('It is base64');
} else {
    // the string is not in Base64 format
    console.log('It is not base64');
}

Comments

9

It is impossible to check whether a string is Base64 encoded or not. It is only possible to validate that the string has the format of a Base64-encoded string, which means that it could have been produced by Base64 encoding. (To check that, the string can be validated against a regexp or passed to a library; many other answers to this question provide good ways to do this, so I won't go into details.)

For example, the string flow is a valid Base64-encoded string. But it is impossible to know whether it is just a simple string, the English word flow, or the Base64 encoding of the string ~Z0.
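The example can be checked directly. The following Python lines are only an illustration using the standard base64 module:

```python
import base64

# "flow" has a valid Base64 shape, so decoding succeeds and yields three bytes.
decoded = base64.b64decode("flow", validate=True)
print(decoded)                    # b'~Z0'

# Re-encoding gives back the original text, so nothing distinguishes the two readings.
print(base64.b64encode(decoded))  # b'flow'
```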

Comments

8

Try this for Java:

public boolean checkForEncode(String string) {
    String pattern = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$";
    Pattern r = Pattern.compile(pattern);
    Matcher m = r.matcher(string);
    return m.find();
}

1 Comment

Thanks, it did the work. Actually, my data was prefixed with data:image/jpeg;base64, so I removed that and it is working fine.
5

There are many variants of Base64, so consider just determining whether your string resembles the variant you expect to handle. As such, you may need to adjust the regex below with respect to the index and padding characters (i.e. +, /, =).

class String
  def resembles_base64?
    self.length % 4 == 0 && self =~ /\A[A-Za-z0-9+\/=]+\z/
  end
end

Usage:

raise 'the string does not resemble Base64' unless my_string.resembles_base64?
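To make the "adjust the index and padding characters" idea concrete, here is a hedged Python sketch. The function name and its index_chars and pad parameters are invented for this example, and the check is stricter than the Ruby one above because it also enforces where padding may appear.

```python
import re

def resembles_base64(text: str, index_chars: str = "+/", pad: str = "=") -> bool:
    """Check text against a Base64 variant given its two index characters and pad."""
    symbol = "[A-Za-z0-9" + re.escape(index_chars) + "]"
    p = re.escape(pad)
    # Whole groups of four, optionally followed by one padded group.
    pattern = f"(?:{symbol}{{4}})*(?:{symbol}{{3}}{p}|{symbol}{{2}}{p}{p})?"
    return re.fullmatch(pattern, text) is not None

print(resembles_base64("SGk="))                     # standard alphabet
print(resembles_base64("a-_b", index_chars="-_"))   # URL-safe alphabet
```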

Comments

5

Check to see if the string's length is a multiple of 4. Afterwards, use this regex to make sure all characters in the string are Base64 characters.

\A[a-zA-Z\d\/+]+={,2}\z

If the library you use adds newlines to observe the rule of at most 76 characters per line, replace them with empty strings first.
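A small Python sketch of that clean-up step follows. The helper name strip_line_breaks is made up, and base64.encodebytes is used only because it is a standard-library encoder that wraps its output at 76 characters.

```python
import base64
import re

def strip_line_breaks(text: str) -> str:
    """Remove the CR and LF characters that line-wrapping encoders insert."""
    return text.replace("\r", "").replace("\n", "")

# 100 zero bytes encode to 136 characters, which encodebytes() splits over two lines.
wrapped = base64.encodebytes(bytes(100)).decode("ascii")
unwrapped = strip_line_breaks(wrapped)

print("\n" in wrapped)          # True
print(len(unwrapped) % 4 == 0)  # True
print(re.fullmatch(r"[a-zA-Z\d/+]+={0,2}", unwrapped) is not None)
```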

3 Comments

The link mentioned shows 404. Please check and update.
Sorry @AnkurKumar, but that's what happens when people have uncool URLs: they change all the time. I have no idea where it's moved to. I hope you find other useful resources through Google.
You can always get old pages from web.archive.org - here's the original url. web.archive.org/web/20120919035911/http://… or I posted the text here: gist.github.com/mika76/d09e2b65159e435e7a4cc5b0299c3e84
3

In Java, the code below worked for me:

public static boolean isBase64Encoded(String s) {
    String pattern = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$";
    Pattern r = Pattern.compile(pattern);
    Matcher m = r.matcher(s);
    return m.find();
}

Comments

2
/^([A-Za-z0-9+\/]{4})*([A-Za-z0-9+\/]{4}|[A-Za-z0-9+\/]{3}=|[A-Za-z0-9+\/]{2}==)$/

This regular expression helped me identify Base64 in my Rails application. I had only one problem: it also recognizes the string "errorDescripcion", which caused an error for me. To solve it, I additionally validate the length of the string.

5 Comments

The above regex /^.....$/.match(my_string) gives formatting error by saying 'Unmatched closing )'
And with 'premature end of char-class: /^(([A-Za-z0-9+/' syntax errors.
Nevermind fixed it by adding \ in front of every / character.
errorDescription is a valid base64 string, it decodes into the binary sequence of bytes (in hex): 7a ba e8 ac 37 ac 72 b8 a9 b6 2a 27.
It worked perfectly for me to check Base64-encoded strings.
2

For Flutter, I tested a couple of the above suggestions and translated them into a Dart function as follows:

static bool isBase64(dynamic value) {
    if (value is String) {
        // No multiLine flag: ^ and $ must anchor to the whole string, not to each line.
        final RegExp rx = RegExp(r'^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$');
        return rx.hasMatch(value);
    }
    return false;
}

Comments

1

This works in Python:

import base64

def IsBase64(s):
    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(s, validate=True)
        return True
    except ValueError:
        return False

if IsBase64("ABC"):
    print("ABC is Base64-encoded and its result after decoding is: " + base64.b64decode("ABC").decode())
else:
    print("ABC is NOT Base64-encoded.")

if IsBase64("QUJD"):
    print("QUJD is Base64-encoded and its result after decoding is: " + base64.b64decode("QUJD").decode())
else:
    print("QUJD is NOT Base64-encoded.")

Summary: IsBase64("string here") returns True if string here is validly formed Base64, and it returns False if it is not.

Comments

1

In C#, this performs well:

static readonly Regex _base64RegexPattern = new Regex(BASE64_REGEX_STRING, RegexOptions.Compiled);

// At most two '=' padding characters; \z anchors at the very end, so a trailing newline is rejected.
private const String BASE64_REGEX_STRING = @"^[a-zA-Z0-9\+/]*={0,2}\z";

private static bool IsBase64(this String base64String)
{
    // The pattern admits no whitespace, so separate checks for spaces, tabs and line breaks are unnecessary.
    return !string.IsNullOrEmpty(base64String)
        && base64String.Length % 4 == 0
        && _base64RegexPattern.IsMatch(base64String);
}

2 Comments

Console.WriteLine("test".IsBase64()); // true
Recommending a switch of programming language to solve a problem is in general not a valid response.
1

In Java, you can check whether a given string is a Base64 string using the Base64 class from Apache Commons Codec:

Base64.isBase64("<any string>")

Using this does not require any regex matching.

Comments

1

Throwing in my 2c here, but I believe that this can be refined even further to reduce possible false-positives.

Here is a more elegant solution.

([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)?

Firstly, it is not anchored, so it does not depend on the match starting at the first character of the line and ending at the last.

Secondly, note the second capturing group:

([A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)?

This group limits the characters that can appear before the = sign.

The characters A, E, I, M, Q, U, Y, c, g, k, o, s, w, 0, 4, and 8 are the only characters that can appear in the last position of a Base64-encoded string before padding with a single =.

Furthermore, the only viable characters that can appear before a double == are A, Q, g, or w.
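These two character sets can be derived rather than memorised. The short Python sketch below assumes the standard alphabet order from RFC 4648 and picks out the symbols whose unused low bits are zero:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# One '=': the final symbol carries 4 data bits, so its low 2 bits are zero.
before_single_pad = "".join(c for i, c in enumerate(ALPHABET) if i % 4 == 0)
# Two '=': the final symbol carries 2 data bits, so its low 4 bits are zero.
before_double_pad = "".join(c for i, c in enumerate(ALPHABET) if i % 16 == 0)

print(before_single_pad)  # AEIMQUYcgkosw048
print(before_double_pad)  # AQgw
```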

4 Comments

Nice note, I'm looking for a way to distinguish between b64 and normal strings (without regex - simple algorithms). But need to have a complete one.
@HamzaHajeir I'm not following you. Are you trying to programmatically identify base64 encoded text, but not use regex? Why would you want to do that?
Regex is not available in some areas, such as the embedded world.
Added another Answer. This one does not use regex and is written in Python.
0

There is no way to distinguish a plain string from a Base64-encoded one, unless the strings in your system have some specific limitation or identifying marker.

Comments

0

This snippet may be useful when you know the length of the original content (e.g. a checksum). It checks that encoded form has the correct length.

public static boolean isValidBase64( final int initialLength, final String string ) {
  final int padding ;
  final String regexEnd ;
  switch( ( initialLength ) % 3 ) {
    case 1 :
      padding = 2 ;
      regexEnd = "==" ;
      break ;
    case 2 :
      padding = 1 ;
      regexEnd = "=" ;
      break ;
    default :
      padding = 0 ;
      regexEnd = "" ;
  }
  final int encodedLength = ( ( ( initialLength / 3 ) + ( padding > 0 ? 1 : 0 ) ) * 4 ) ;
  final String regex = "[a-zA-Z0-9/\\+]{" + ( encodedLength - padding ) + "}" + regexEnd ;
  return Pattern.compile( regex ).matcher( string ).matches() ;
}
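The length arithmetic behind this snippet can be written compactly. The following Python sketch uses made-up helper names and assumes padded standard Base64:

```python
import base64

def expected_encoded_length(raw_length: int) -> int:
    """Length of the padded Base64 text for raw_length input bytes."""
    return 4 * ((raw_length + 2) // 3)

def expected_padding(raw_length: int) -> int:
    """Number of trailing '=' characters for raw_length input bytes."""
    return (3 - raw_length % 3) % 3

# A 32-byte checksum encodes to 44 characters, the last of which is '='.
print(expected_encoded_length(32), expected_padding(32))  # 44 1
```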

Comments

0

If the regex does not work and you know the format of the original string, you can reverse the logic by matching against that format instead.

For example, I work with Base64-encoded XML files and just check whether the file contains valid XML markup. If it does not, I can assume that it's Base64 encoded. This is not very dynamic but works fine for my small application.
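A rough Python sketch of this reversed check, assuming the payload is XML text in UTF-8, could look as follows. Both function names are hypothetical, and "valid XML markup" is interpreted here as "parses as well-formed XML".

```python
import base64
import xml.etree.ElementTree as ET

def is_well_formed_xml(text: str) -> bool:
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def xml_from_maybe_base64(payload: str) -> str:
    """Return XML text, decoding payload first if it is not already XML."""
    if is_well_formed_xml(payload):
        return payload
    # Not XML as given, so assume Base64; validate=True raises on bad input.
    return base64.b64decode(payload, validate=True).decode("utf-8")
```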

Comments

0

Try this using a previously mentioned regex:

String regex = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$";
if("TXkgdGVzdCBzdHJpbmc/".matches(regex)){
    System.out.println("it's a Base64");
}

...We can also make a simple validation like, if it has spaces it cannot be Base64:

String myString = "Hello World";
if (myString.contains(" ")) {
    System.out.println("Not B64");
} else {
    System.out.println("Could be B64 encoded, since it has no spaces");
}

1 Comment

Ok, could you please give a solution then?
0

If decoding yields only ASCII characters, I treat the string as encoded; if it yields non-ASCII bytes, I treat it as a plain string. (This assumes the original data is ASCII text.)

(RoR) ruby solution:

def encoded?(str)
  # Decode the string as given; changing its case would change the decoded bytes.
  Base64.decode64(str).scan(/[^[:ascii:]]/).count.zero?
end

def decoded?(str)
  Base64.decode64(str).scan(/[^[:ascii:]]/).count > 0
end

Comments

0
Function Check_If_Base64(ByVal msgFile As String) As Boolean
Dim I As Long
Dim Buffer As String
Dim Car As String

Check_If_Base64 = True

Buffer = Leggi_File(msgFile)
Buffer = Replace(Buffer, vbCrLf, "")
For I = 1 To Len(Buffer)
    Car = Mid(Buffer, I, 1)
    If (Car < "A" Or Car > "Z") _
    And (Car < "a" Or Car > "z") _
    And (Car < "0" Or Car > "9") _
    And (Car <> "+" And Car <> "/" And Car <> "=") Then
        Check_If_Base64 = False
        Exit For
    End If
Next I
End Function
Function Leggi_File(PathAndFileName As String) As String
Dim FF As Integer
FF = FreeFile()
Open PathAndFileName For Binary As #FF
Leggi_File = Input(LOF(FF), #FF)
Close #FF
End Function

Comments

0
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

    public static String encodeBase64(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes());
    }

    public static String decodeBase64(String s) {
        try {
            if (isBase64(s)) {
                return new String(Base64.getDecoder().decode(s));
            } else {
                return s;
            }
        } catch (Exception e) {
            return s;
        }
    }

    public static boolean isBase64(String s) {
        String pattern = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$";
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(s);

        return m.find();
    }

Comments

0

For Java flavour I actually use the following regex:

"([A-Za-z0-9+]{4})*([A-Za-z0-9+]{3}=|[A-Za-z0-9+]{2}(==){0,2})?"

This also makes the == padding optional in some cases.

Best!

Comments

0

This works in Python:

import re

def is_base64(string):
    return len(string) % 4 == 0 and re.match(r'^[A-Za-z0-9+/=]+\Z', string) is not None

Comments

0

https://pub.dev/packages/string_validator

https://pub.dev/documentation/string_validator/latest/string_validator/isBase64.html

import 'package:string_validator/string_validator.dart';

// ..

String messageText = // .. 

final isBase64Encoded = messageText.isBase64Encoded;
if (isBase64Encoded) {
  // ..
}

Comments

0

Where no regex is available, a simple algorithm can do the check instead (C++17):

#include <algorithm>
#include <string_view>

bool checkB64(const std::string_view input) {
    if (input.length() % 4 != 0) return false;
    // Every character must belong to the Base64 alphabet or be the padding sign.
    const bool validChars = std::all_of(input.begin(), input.end(),
                        [](const char c) {
                            return ((c >= 'a' && c <= 'z') ||
                                    (c >= 'A' && c <= 'Z') ||
                                    (c >= '0' && c <= '9') ||
                                    (c == '/') ||
                                    (c == '+') ||
                                    (c == '='));});
    if (!validChars) return false;
    // '=' may only appear as one or two trailing padding characters.
    const auto pos = input.find('=');
    if (pos == std::string_view::npos) return true;
    if (pos + 2 < input.length()) return false; // padding starts too early
    return input.find_first_not_of('=', pos) == std::string_view::npos;
}

Note: Like the regex-based solutions, it doesn't distinguish between normal strings and B64.

Demo: https://onlinegdb.com/_SmRuSkVq

Comments

0

Here's another answer based on the comment from @Hamza Hajeir. This one uses no regex, but does use Python, which is available on most devices.

def is_enc(s):
    b64_c = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
    pad = "="
    p_c = 0
    for ch in reversed(s):
        if ch == pad:
            p_c += 1
        else:
            break
    if p_c > 2 or len(s) % 4 != 0:
        return False
    for ch in s:
        if ch not in b64_c:
            return False
    return True

It can be tested with the below:

enc_str = "SGVsbG8gd29ybGQh"
print("yes" if is_enc(enc_str) else "no")
enc_str = "SGk="
print("yes" if is_enc(enc_str) else "no")
enc_str = "SGk=!"
print("yes" if is_enc(enc_str) else "no")

expected output:

yes
yes
no

Obviously this Python is not complete; it is only a good starting point.

Items to add to it:
1: Removal of = as a valid character anywhere but at the end
2: Incorporation of valid characters preceding padding (see https://stackoverflow.com/a/78462814/3979230)
3: ?

Comments

-3

I tried this one, and yes, it works:

^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$

but I added a condition to check that there is at least one = at the end:

string.lastIndexOf("=") >= 0

3 Comments

Why check for =: What specification of Base64 are you using? What does end of the character mean, and how does non-negative lastIndexOf() check that?
Mostly, the Base64 strings I get back always have = at the end.
Not all base 64 encoded strings end with =, for example: rYNltxhaxFAdr3ex8JFFtyCWHNRLCKyPyYei3xo05yHJEXmh3GZQxWm0NSP3tWBkMoIqrHQibfQmYpw-i6TspDJ0M3A1Z1FRWU1wM3V3aGZ1eTViOGJk
