Mastering Substring Extraction in COBOL with the SUBSTR Function

As a COBOL developer who has worked on enterprise systems for the past 15 years, I use substring manipulation daily. SUBSTR-style extraction is a dependable tool for complex string parsing. In this guide, I share what I have learned to help fellow COBOL programmers sharpen their skills.

Overview of SUBSTR Capabilities

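Substring extraction copies a portion of a string into a separate field. This guide writes the operation in a SUBSTR-style notation:

SUBSTR(source-string, start-position, substring-length)

Note that standard COBOL does not define an intrinsic function named SUBSTR. The built-in equivalent is reference modification, written source-string(start-position:substring-length), so read each SUBSTR call in this guide as shorthand for that form unless your compiler or shop library provides its own SUBSTR routine.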

Where:

source-string is the original COBOL string variable
start-position is the 1-based starting index of characters to copy
substring-length defines total characters to extract

Consider this example string:

01 STR PIC X(25) VALUE "This is my example string".

We can utilize SUBSTR to extract portions of this string:

SUBSTR(STR, 1, 4)         > "This"
SUBSTR(STR, 6, 2)         > "is"
SUBSTR(STR, 9, 7)         > "my exam"

The function copies the requested substring into a new string target based on our specified start position and length.
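For readers who want to run the extractions above, here is a minimal sketch using reference modification, the standard COBOL form of substring extraction. It assumes a compiler that accepts `*>` comments (for example GnuCOBOL); the program name and the WS-START and WS-LEN fields are illustrative and not part of the original example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SUBSTR-DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  STR        PIC X(25) VALUE "This is my example string".
      *> Hypothetical fields holding a start position and a length
       01  WS-START   PIC 9(2)  VALUE 9.
       01  WS-LEN     PIC 9(2)  VALUE 7.
       PROCEDURE DIVISION.
      *> Literal start and length: characters 1 through 4
           DISPLAY "[" STR(1:4) "]"
      *> Characters 6 and 7
           DISPLAY "[" STR(6:2) "]"
      *> Start and length taken from variables
           DISPLAY "[" STR(WS-START:WS-LEN) "]"
      *> Length omitted: from position 12 to the end of the field
           DISPLAY "[" STR(12:) "]"
           STOP RUN.
```

The brackets make the boundaries of each result visible. The four lines displayed should be [This], [is], [my exam], and [example string].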

Key Benefits:

Simple and self-contained for substring extraction
Supports variables for dynamic start position and length
Moving the result into a longer receiving field pads it with trailing spaces (the requested range itself must stay within the source field)

Combining SUBSTR with other string-handling features such as the INSPECT statement and the NUMVAL intrinsic function enables parsing and transforming string data in powerful ways, as we will explore through practical examples.

Real-World Example – Parsing Addresses

To demonstrate SUBSTR's capabilities, let's walk through a common real-world use case – parsing postal addresses. In a typical customer database we may store addresses like:

01 FULL-ADDRESS      PIC X(50)  
                   VALUE "123 Main St, City, ST 12345".

01 STREET-NUM        PIC 9(5).  
01 STREET-NAME       PIC X(20).
01 CITY-NAME         PIC X(20).
01 STATE-CODE        PIC XX.
01 ZIP-CODE          PIC 9(5).

We need to split this full address into its constituent parts for processing. COBOL's STRING and UNSTRING statements can accomplish this, but they often need interim fields and extra steps when a part must be trimmed or converted along the way.

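Instead, we can locate the delimiters with INSPECT and extract each part by position. The code below writes each extraction as reference modification, the standard COBOL form of SUBSTR:

*> Assumed WORKING-STORAGE counters, each PIC 9(3):
*> WS-SPACE-LEN, WS-COMMA1-LEN, WS-COMMA2-LEN
MOVE 0 TO WS-SPACE-LEN WS-COMMA1-LEN WS-COMMA2-LEN

*> Count the characters before the first space and the first comma
INSPECT FULL-ADDRESS TALLYING WS-SPACE-LEN
        FOR CHARACTERS BEFORE INITIAL " "
INSPECT FULL-ADDRESS TALLYING WS-COMMA1-LEN
        FOR CHARACTERS BEFORE INITIAL ","
*> Count from just past the first comma up to the second comma
INSPECT FULL-ADDRESS(WS-COMMA1-LEN + 2:) TALLYING WS-COMMA2-LEN
        FOR CHARACTERS BEFORE INITIAL ","

COMPUTE STREET-NUM = FUNCTION NUMVAL(FULL-ADDRESS(1:WS-SPACE-LEN))
MOVE FULL-ADDRESS(WS-SPACE-LEN + 2:WS-COMMA1-LEN - WS-SPACE-LEN - 1)
  TO STREET-NAME

MOVE FULL-ADDRESS(WS-COMMA1-LEN + 3:WS-COMMA2-LEN - 1) TO CITY-NAME

MOVE FULL-ADDRESS(WS-COMMA1-LEN + WS-COMMA2-LEN + 4:2) TO STATE-CODE

COMPUTE ZIP-CODE = FUNCTION NUMVAL(
        FULL-ADDRESS(WS-COMMA1-LEN + WS-COMMA2-LEN + 7:5))

Let's analyze what's happening:

INSPECT counts the characters before the first space and before each comma, which gives us the delimiter positions
STREET-NUM is the text before the first space, converted with NUMVAL
STREET-NAME runs from just after the first space up to the first comma
CITY-NAME is the text between the two commas, skipping the space after the first comma
STATE-CODE grabs the 2-character state abbreviation after the second comma
ZIP-CODE is the 5 digits after the state code, converted with NUMVAL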

Executing this logic would properly split our address into its key elements for further use, such as inserting into the proper database tables.

There are other possible solutions, but extracting by position keeps each field's logic on a single line that is easy to read and adjust. By leveraging COBOL’s native string manipulation capabilities, we avoid unnecessary complexity.
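For comparison, here is a minimal sketch of the UNSTRING approach mentioned earlier, so the trade-off can be judged side by side. It assumes the address always follows the "number street, city, ST zip" layout; the program name and the interim fields WS-STREET, WS-STATE-ZIP, WS-NUM-TEXT, and WS-PTR are hypothetical names introduced for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ADDR-UNSTRING.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  FULL-ADDRESS   PIC X(50)
                          VALUE "123 Main St, City, ST 12345".
      *> Hypothetical interim fields
       01  WS-STREET      PIC X(30).
       01  WS-STATE-ZIP   PIC X(10).
       01  WS-NUM-TEXT    PIC X(5).
       01  WS-PTR         PIC 9(3).
       01  STREET-NUM     PIC 9(5).
       01  STREET-NAME    PIC X(20).
       01  CITY-NAME      PIC X(20).
       01  STATE-CODE     PIC XX.
       01  ZIP-CODE       PIC 9(5).
       PROCEDURE DIVISION.
      *> Split on ", " into street, city, and "ST 12345"
           UNSTRING FULL-ADDRESS DELIMITED BY ", "
               INTO WS-STREET CITY-NAME WS-STATE-ZIP
           END-UNSTRING
      *> Take the house number; WS-PTR ends just past the first space
           MOVE 1 TO WS-PTR
           UNSTRING WS-STREET DELIMITED BY SPACE
               INTO WS-NUM-TEXT WITH POINTER WS-PTR
           END-UNSTRING
           COMPUTE STREET-NUM = FUNCTION NUMVAL(WS-NUM-TEXT)
      *> The rest of the street field is the street name
           MOVE WS-STREET(WS-PTR:) TO STREET-NAME
      *> Fixed positions inside "ST 12345"
           MOVE WS-STATE-ZIP(1:2) TO STATE-CODE
           COMPUTE ZIP-CODE = FUNCTION NUMVAL(WS-STATE-ZIP(4:5))
           DISPLAY "NUM=" STREET-NUM " STREET=" STREET-NAME
           DISPLAY "CITY=" CITY-NAME
           DISPLAY "STATE=" STATE-CODE " ZIP=" ZIP-CODE
           STOP RUN.
```

UNSTRING handles the comma splitting in one statement, but the sketch still falls back on position-based extraction for the street name and the state and ZIP pair, which is where the two techniques tend to complement each other.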

Handling Dynamic Length Values with Variable Subscripts

A key SUBSTR benefit is supporting variables for start position and substring length instead of just literals. Consider a scenario where we extract file records having dynamic-length name values:

01 NAME-RECORD.
   05 NAME-LENGTH      PIC 9(3). 
   05 NAME-VALUE       PIC X(100).

01 FIRST-NAME         PIC X(25).
01 MIDDLE-INITIAL     PIC A.  
01 LAST-NAME          PIC X(25).

We need to parse the name from NAME-RECORD into standard format with first/middle/last. If this was fixed-length we could use literals, but since NAME-VALUE varies we must use variables:

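*> Assumed WORKING-STORAGE fields, each PIC 9(3): WS-NAME-LENGTH,
*> WS-FIRST-LEN, WS-REST-START, WS-REST-LEN, WS-MIDDLE-LEN.
*> Extractions are written as reference modification: field(start:length)
MOVE NAME-LENGTH TO WS-NAME-LENGTH
MOVE 0 TO WS-FIRST-LEN WS-MIDDLE-LEN

*> Length of the first name = characters before the first space
IF WS-NAME-LENGTH > 0 AND WS-NAME-LENGTH <= 100
   INSPECT NAME-VALUE(1:WS-NAME-LENGTH) TALLYING WS-FIRST-LEN
           FOR CHARACTERS BEFORE INITIAL " "
END-IF

*> A usable name has a first name, a space, and at least one more character
IF WS-FIRST-LEN > 0 AND WS-FIRST-LEN + 1 < WS-NAME-LENGTH
   MOVE NAME-VALUE(1:WS-FIRST-LEN) TO FIRST-NAME
   COMPUTE WS-REST-START = WS-FIRST-LEN + 2
   COMPUTE WS-REST-LEN = WS-NAME-LENGTH - WS-FIRST-LEN - 1

   *> Characters between the first space and the next one, if any
   INSPECT NAME-VALUE(WS-REST-START:WS-REST-LEN)
           TALLYING WS-MIDDLE-LEN
           FOR CHARACTERS BEFORE INITIAL " "

   IF WS-MIDDLE-LEN > 0 AND WS-MIDDLE-LEN + 1 < WS-REST-LEN
      *> Second space found: middle initial, then last name
      MOVE NAME-VALUE(WS-REST-START:1) TO MIDDLE-INITIAL
      MOVE NAME-VALUE(WS-REST-START + WS-MIDDLE-LEN + 1:
                      WS-REST-LEN - WS-MIDDLE-LEN - 1)
        TO LAST-NAME
   ELSE
      *> No middle part: everything after the first space is the last name
      MOVE SPACES TO MIDDLE-INITIAL
      MOVE NAME-VALUE(WS-REST-START:WS-REST-LEN) TO LAST-NAME
   END-IF

ELSE
   DISPLAY "INVALID NAME VALUE"
END-IF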

While this may seem complex, the start positions and lengths are computed at run time rather than hard-coded, so the same logic parses a name value of any valid length into standard first/middle/last components without additional code changes.

This flexibility comes from the fact that both the start position and the length can be arbitrary arithmetic expressions, which reference modification supports directly.

Handling Edge Cases and Exceptions

Another lesson from the many substring issues I have debugged over the years: always validate input data and handle exceptions. Some common cases I regularly account for:

1. Source string too short

Check the length before extracting a substring, and display a warning if the source is too short.

2. Invalid string format

If the substring logic relies on delimiters being present, verify the format first.

3. Numeric conversion failures

If a string contains non-numeric characters where digits are expected, handle the error gracefully.

4. Upper/lower case mismatches

Convert both sides to a single case (for example with FUNCTION LOWER-CASE) before comparing when case matters.

Here is an example checking length first:

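EVALUATE-FULL-NAME.
   *> FUNCTION LENGTH returns the declared size, so subtract trailing
   *> spaces to get the content length.
   *> Assumed fields, each PIC 9(3): WS-TRAILING, WS-NAME-LEN
   MOVE 0 TO WS-TRAILING
   INSPECT FUNCTION REVERSE(FULL-NAME) TALLYING WS-TRAILING
           FOR LEADING SPACES
   COMPUTE WS-NAME-LEN = FUNCTION LENGTH(FULL-NAME) - WS-TRAILING
   IF WS-NAME-LEN > 25
      *> Do complex parsing logic here...
      CONTINUE
   ELSE
      DISPLAY "Invalid name length " WS-NAME-LEN
      *> ...handle error...
   END-IF.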

Proactively coding defensive substring handling will save enormous headaches compared to letting exceptions crash programs later down the line!
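The numeric-conversion and case-mismatch cases from the list above can be sketched in the same way. This is a minimal illustration, not production code: the record layout, the sample value (which deliberately contains the letter O in place of a zero), and all field names are assumptions made for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SUBSTR-CHECKS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical record: state code, space, 5-character ZIP, filler
       01  WS-RECORD      PIC X(12) VALUE "ny 1O001 abc".
       01  WS-STATE       PIC XX.
       01  WS-ZIP-TEXT    PIC X(5).
       01  WS-ZIP         PIC 9(5) VALUE 0.
       PROCEDURE DIVISION.
      *> Normalize case before comparing
           MOVE FUNCTION UPPER-CASE(WS-RECORD(1:2)) TO WS-STATE
           IF WS-STATE = "NY"
               DISPLAY "State matched: " WS-STATE
           END-IF
      *> Class test before treating the substring as a number
           MOVE WS-RECORD(4:5) TO WS-ZIP-TEXT
           IF WS-ZIP-TEXT IS NUMERIC
               MOVE WS-ZIP-TEXT TO WS-ZIP
               DISPLAY "ZIP: " WS-ZIP
           ELSE
               DISPLAY "Non-numeric ZIP: [" WS-ZIP-TEXT "]"
           END-IF
           STOP RUN.
```

With the sample value, the state comparison succeeds after upper-casing, and the ZIP check takes the error branch because the extracted text is not all digits.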

Conclusion

The versatile SUBSTR function is a vital instrument in every COBOL programmer's toolkit for manipulating string data. Properly leveraging SUBSTR can greatly simplify the parsing and extraction of substrings from larger character values.

Chaining SUBSTR with other string-handling features such as the INSPECT statement and the NUMVAL intrinsic function enables cleanly handling complex parsing needs such as addresses and names with dynamic lengths.

However, SUBSTR logic can also quickly become convoluted. Following best practices around organization, validation, and exception handling is critical for maintainable substring code.

With robust defensive coding and well-structured logic, SUBSTR can empower building COBOL programs to skillfully handle substring operations at an enterprise level.

I hope this guide sharing my extensive hands-on COBOL substring experience provides deep insight on maximizing this function in your own development. Please comment below on any questions or how you have leveraged SUBSTR!
