As a COBOL developer who has worked on enterprise systems for the past 15 years, I use substring manipulation daily. SUBSTR-style extraction is the tool I reach for when tackling complex string parsing. In this guide, I share that experience to help fellow COBOL programmers sharpen their substring skills.
Overview of SUBSTR Capabilities
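A SUBSTR operation extracts a portion of a string into a separate substring variable. The notation used in this guide is:
SUBSTR(source-string, start-position, substring-length)
Where:
- source-string is the original COBOL string variable
- start-position is the 1-based starting index of the characters to copy
- substring-length is the total number of characters to extract
Note that SUBSTR is not one of the intrinsic functions defined by the COBOL standard. Standard COBOL expresses the same operation as reference modification, source-string(start-position:substring-length), so check your compiler's documentation before using the function form.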
Consider this example string:
01 STR PIC X(25) VALUE "This is my example string".
We can utilize SUBSTR to extract portions of this string:
SUBSTR(STR, 1, 4) > "This"
SUBSTR(STR, 6, 2) > "is"
SUBSTR(STR, 9, 7) > "my exam"
The function copies the requested substring into a new string target based on our specified start position and length.
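The extractions above can be tried in a small standalone program. This is a minimal sketch that assumes the compiler offers no SUBSTR function and uses standard reference modification, STR(start:length), instead. WS-START, WS-LEN and WS-PART are illustrative names that do not appear in the examples above.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SUBDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  STR        PIC X(25) VALUE "This is my example string".
      * Illustrative work fields (assumed names)
       01  WS-START   PIC 9(2)  VALUE 9.
       01  WS-LEN     PIC 9(2)  VALUE 7.
       01  WS-PART    PIC X(10).
       PROCEDURE DIVISION.
      * Literal start and length: characters 1 through 4
           MOVE STR(1:4) TO WS-PART
           DISPLAY "[" WS-PART "]"
      * Variable start and length: 7 characters from position 9
           MOVE STR(WS-START:WS-LEN) TO WS-PART
           DISPLAY "[" WS-PART "]"
           STOP RUN.
```

Because WS-PART is longer than either extracted substring, each MOVE pads it on the right with spaces, which the brackets in the DISPLAY output make visible.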
Key Benefits:
- Simple and self-contained for substring extraction
- Supports variables for dynamic start position and length
- Pads the receiving field with spaces when the extracted substring is shorter than the target
Combining SUBSTR with the INSPECT statement, which can locate delimiters and count characters, enables parsing and transforming string data in powerful ways, as the practical examples below show.
Real-World Example – Parsing Addresses
To demonstrate SUBSTR's capabilities, let's walk through a common real-world use case – parsing postal addresses. In a typical customer database we may store addresses like:
01 FULL-ADDRESS PIC X(50)
VALUE "123 Main St, City, ST 12345".
01 STREET-NUM PIC 9(5).
01 STREET-NAME PIC X(20).
01 CITY-NAME PIC X(20).
01 STATE-CODE PIC XX.
01 ZIP-CODE PIC 9(5).
We need to split this full address into its constituent parts for processing. COBOL's UNSTRING statement can accomplish this, but it typically needs interim fields and a second pass for the parts that are not comma-delimited.
Instead, we can leverage SUBSTR and INSPECT for a cleaner solution:
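* Work fields WS-SPACE-POS, WS-COMMA1-POS and WS-COMMA2-POS are
* assumed to be numeric, for example PIC 9(3).
* SUBSTR(src, start, len) is coded as src(start:len), the
* standard reference-modification form.
MOVE 0 TO WS-SPACE-POS WS-COMMA1-POS WS-COMMA2-POS
INSPECT FULL-ADDRESS TALLYING WS-SPACE-POS
    FOR CHARACTERS BEFORE INITIAL " "
INSPECT FULL-ADDRESS TALLYING WS-COMMA1-POS
    FOR CHARACTERS BEFORE INITIAL ","
ADD 1 TO WS-SPACE-POS WS-COMMA1-POS
INSPECT FULL-ADDRESS(WS-COMMA1-POS + 1:) TALLYING WS-COMMA2-POS
    FOR CHARACTERS BEFORE INITIAL ","
COMPUTE WS-COMMA2-POS = WS-COMMA2-POS + WS-COMMA1-POS + 1
COMPUTE STREET-NUM =
    FUNCTION NUMVAL(FULL-ADDRESS(1:WS-SPACE-POS - 1))
MOVE FULL-ADDRESS(WS-SPACE-POS + 1:
    WS-COMMA1-POS - WS-SPACE-POS - 1) TO STREET-NAME
MOVE FULL-ADDRESS(WS-COMMA1-POS + 2:
    WS-COMMA2-POS - WS-COMMA1-POS - 2) TO CITY-NAME
MOVE FULL-ADDRESS(WS-COMMA2-POS + 2:2) TO STATE-CODE
COMPUTE ZIP-CODE =
    FUNCTION NUMVAL(FULL-ADDRESS(WS-COMMA2-POS + 5:5))
Let's analyze what's happening:
- Each INSPECT counts the characters before a delimiter; adding 1 turns that count into the delimiter's position
- The second comma is found by searching only the text after the first comma
- STREET-NUM is the text before the first space, converted with NUMVAL
- STREET-NAME runs from the first space to the first comma
- CITY-NAME is parsed from between the two commas
- STATE-CODE grabs the 2-character state abbreviation after the second comma
- ZIP-CODE is the 5 digits after the state code, converted with NUMVAL
The code assumes the address contains at least one space and two commas in the layout shown; validate that before running it on real data.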
Executing this logic would properly split our address into its key elements for further use, such as inserting into the proper database tables.
There are other ways to do this without SUBSTR, but few are as brief. By leveraging COBOL’s native string manipulation capabilities, we avoid unnecessary complexity.
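For comparison, an UNSTRING-based version of the same split might look like the sketch below. It is only one possible arrangement, and the interim fields WS-STREET-PART, WS-TAIL-PART, WS-NUM-TEXT and WS-PTR are hypothetical names introduced here. It assumes the separator is always a comma followed by one space and that the state and ZIP code sit at fixed offsets in the last part.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ADDRSPLT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  FULL-ADDRESS       PIC X(50)
               VALUE "123 Main St, City, ST 12345".
       01  STREET-NUM         PIC 9(5).
       01  STREET-NAME        PIC X(20).
       01  CITY-NAME          PIC X(20).
       01  STATE-CODE         PIC XX.
       01  ZIP-CODE           PIC 9(5).
      * Interim fields (hypothetical names, not from the article)
       01  WS-STREET-PART     PIC X(30).
       01  WS-TAIL-PART       PIC X(20).
       01  WS-NUM-TEXT        PIC X(5).
       01  WS-PTR             PIC 9(3).
       PROCEDURE DIVISION.
      * Split on the comma-plus-space separator
           UNSTRING FULL-ADDRESS DELIMITED BY ", "
               INTO WS-STREET-PART CITY-NAME WS-TAIL-PART
           END-UNSTRING
      * Peel the house number off the front of the street part;
      * WS-PTR ends up just past the first space
           MOVE 1 TO WS-PTR
           UNSTRING WS-STREET-PART DELIMITED BY SPACE
               INTO WS-NUM-TEXT
               WITH POINTER WS-PTR
           END-UNSTRING
           COMPUTE STREET-NUM = FUNCTION NUMVAL(WS-NUM-TEXT)
           MOVE WS-STREET-PART(WS-PTR:) TO STREET-NAME
      * State and ZIP sit at fixed offsets in the last part
           MOVE WS-TAIL-PART(1:2) TO STATE-CODE
           COMPUTE ZIP-CODE = FUNCTION NUMVAL(WS-TAIL-PART(4:5))
           DISPLAY STREET-NUM "|" STREET-NAME "|" CITY-NAME "|"
               STATE-CODE "|" ZIP-CODE
           STOP RUN.
```

The trade-off is visible in the data division: the UNSTRING route needs the extra interim fields, while the position-based route needs the extra position counters.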
Handling Dynamic Length Values with Variable Subscripts
A key SUBSTR benefit is supporting variables for start position and substring length instead of just literals. Consider a scenario where we extract file records having dynamic-length name values:
01 NAME-RECORD.
05 NAME-LENGTH PIC 9(3).
05 NAME-VALUE PIC X(100).
01 FIRST-NAME PIC X(25).
01 MIDDLE-INITIAL PIC A.
01 LAST-NAME PIC X(25).
We need to parse the name from NAME-RECORD into a standard first/middle/last format. If the name were fixed-length we could use literals, but since NAME-VALUE varies we must use variables:
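* WS-NAME-LENGTH, WS-FIRST-SPACE, WS-SECOND-SPACE and
* WS-REST-START are assumed numeric work fields, e.g. PIC 9(3).
* SUBSTR(src, start, len) is coded as src(start:len), the
* standard reference-modification form.
MOVE NAME-LENGTH TO WS-NAME-LENGTH
MOVE 0 TO WS-FIRST-SPACE WS-SECOND-SPACE
* Search only the significant characters, never the padding
IF WS-NAME-LENGTH > 0 AND
   WS-NAME-LENGTH <= FUNCTION LENGTH(NAME-VALUE)
    INSPECT NAME-VALUE(1:WS-NAME-LENGTH)
        TALLYING WS-FIRST-SPACE
        FOR CHARACTERS BEFORE INITIAL " "
END-IF
ADD 1 TO WS-FIRST-SPACE
IF WS-FIRST-SPACE > 1 AND WS-FIRST-SPACE < WS-NAME-LENGTH
    MOVE NAME-VALUE(1:WS-FIRST-SPACE - 1) TO FIRST-NAME
    COMPUTE WS-REST-START = WS-FIRST-SPACE + 1
*   Look for a second space after the first one
    INSPECT NAME-VALUE(WS-REST-START:
        WS-NAME-LENGTH - WS-FIRST-SPACE)
        TALLYING WS-SECOND-SPACE
        FOR CHARACTERS BEFORE INITIAL " "
    COMPUTE WS-SECOND-SPACE = WS-SECOND-SPACE + WS-REST-START
    IF WS-SECOND-SPACE < WS-NAME-LENGTH
        MOVE NAME-VALUE(WS-REST-START:1) TO MIDDLE-INITIAL
        MOVE NAME-VALUE(WS-SECOND-SPACE + 1:
            WS-NAME-LENGTH - WS-SECOND-SPACE) TO LAST-NAME
    ELSE
*       No middle part: everything after the first space
        MOVE SPACE TO MIDDLE-INITIAL
        MOVE NAME-VALUE(WS-REST-START:
            WS-NAME-LENGTH - WS-FIRST-SPACE) TO LAST-NAME
    END-IF
ELSE
    DISPLAY "INVALID NAME VALUE"
END-IF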
While this may seem complex, using variables for the start position and length means the same code parses a name of any valid length into first/middle/last components without further changes.
This flexibility lets one routine handle variable-length data. Reference modification offers the same flexibility, because its start and length can be any arithmetic expression.
Recommended Practices for Readability
While SUBSTR is immensely capable, logic like the name parser above can become complex and hard to maintain. From painful experience over the years, I recommend these practices when leveraging SUBSTR:
1. Break into paragraphs for discrete functions
Encapsulating logical units related to substrings into their own paragraphs/sections with descriptive headings vastly improves readability and debugging.
2. Validate lengths before manipulation
Verify the source string contains enough characters before trying to substring beyond its bounds.
3. Use descriptive variable names
For subscripts, positions and lengths – e.g. WS-FIRST-COMMA or SUBSTRING-START-POS
4. Implement exception handling
Code DEFENSIVE LOGIC for bad substring bounds, formatting, conversions, etc.
5. Add comments explaining logic flow
Use comments liberally to document the flow and intent of multi-step parsing.
Here is an example following these best practices:
* Extract middle initial from name value
* A paragraph ends where the next paragraph name begins, so no
* END- marker is needed; the period closes the last sentence.
EVALUATE-NAME-VALUE.
    PERFORM CHECK-IF-VALID-NAME-VALUE
    IF WS-NAME-OK
        PERFORM FIND-FIRST-SPACE
        IF WS-FIRST-SPACE > 0 AND
           WS-NAME-LENGTH > WS-FIRST-SPACE + 3
            PERFORM ISOLATE-MIDDLE-INITIAL
        END-IF
    END-IF.

ISOLATE-MIDDLE-INITIAL.
* Get character after first space as initial
* SUBSTR(src, start, len) is coded as src(start:len)
    COMPUTE WS-INITIAL-START = WS-FIRST-SPACE + 1
    MOVE 1 TO WS-INITIAL-LEN
    MOVE NAME-VALUE(WS-INITIAL-START:WS-INITIAL-LEN)
        TO MIDDLE-INITIAL.
Following style guidelines like this may require some additional effort but pays off tremendously in maintainability and knowledge transfer. SUBSTR logic can turn convoluted very easily without deliberate structure.
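To see the paragraph structure run end to end, here is a minimal sketch of a complete program. The bodies of CHECK-IF-VALID-NAME-VALUE and FIND-FIRST-SPACE are not given in this guide, so the versions below are assumptions, as are the WS-NAME-FLAG field and the sample name. Substring extraction is written as standard reference modification.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MIDINIT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  NAME-RECORD.
           05  NAME-LENGTH       PIC 9(3)   VALUE 13.
           05  NAME-VALUE        PIC X(100) VALUE "John Q Public".
       01  MIDDLE-INITIAL        PIC A      VALUE SPACE.
       01  WS-NAME-LENGTH        PIC 9(3).
       01  WS-FIRST-SPACE        PIC 9(3).
       01  WS-INITIAL-START      PIC 9(3).
       01  WS-INITIAL-LEN        PIC 9(3).
       01  WS-NAME-FLAG          PIC X      VALUE "N".
           88  WS-NAME-OK                   VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM EVALUATE-NAME-VALUE
           DISPLAY "Middle initial: [" MIDDLE-INITIAL "]"
           STOP RUN.

       EVALUATE-NAME-VALUE.
           PERFORM CHECK-IF-VALID-NAME-VALUE
           IF WS-NAME-OK
               PERFORM FIND-FIRST-SPACE
               IF WS-FIRST-SPACE > 0 AND
                  WS-NAME-LENGTH > WS-FIRST-SPACE + 3
                   PERFORM ISOLATE-MIDDLE-INITIAL
               END-IF
           END-IF.

      * Assumed check: the length must fit inside NAME-VALUE
       CHECK-IF-VALID-NAME-VALUE.
           MOVE NAME-LENGTH TO WS-NAME-LENGTH
           IF WS-NAME-LENGTH > 0 AND
              WS-NAME-LENGTH <= FUNCTION LENGTH(NAME-VALUE)
               SET WS-NAME-OK TO TRUE
           ELSE
               MOVE "N" TO WS-NAME-FLAG
           END-IF.

      * Assumed search: 0 means no space inside the name
       FIND-FIRST-SPACE.
           MOVE 0 TO WS-FIRST-SPACE
           INSPECT NAME-VALUE(1:WS-NAME-LENGTH)
               TALLYING WS-FIRST-SPACE
               FOR CHARACTERS BEFORE INITIAL SPACE
           IF WS-FIRST-SPACE = WS-NAME-LENGTH
               MOVE 0 TO WS-FIRST-SPACE
           ELSE
               ADD 1 TO WS-FIRST-SPACE
           END-IF.

       ISOLATE-MIDDLE-INITIAL.
           COMPUTE WS-INITIAL-START = WS-FIRST-SPACE + 1
           MOVE 1 TO WS-INITIAL-LEN
           MOVE NAME-VALUE(WS-INITIAL-START:WS-INITIAL-LEN)
               TO MIDDLE-INITIAL.
```

With the sample name, the first space is at position 5, so the character at position 6 is moved to MIDDLE-INITIAL. The length test mirrors the simplified condition in the example above; a two-word name such as "John Public" would still need the second-space check from the earlier parser.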
Handling Edge Cases and Exceptions
Another recommendation from numerous substring issues over the years – always validate input data and handle exceptions. Some common cases I regularly account for:
1. Source string too short
Check the length before extracting a substring, and display a warning if it is inadequate.
2. Invalid string format
If my substring logic relies on delimiters being present, I verify the format first.
3. Numeric conversion failures
The string contains non-numeric characters where digits are expected – handle the error gracefully.
4. Upper/lower case mismatches
Convert both sides to a single case before comparing if case matters.
Here is an example checking length first:
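EVALUATE-FULL-NAME.
* WS-TRAILING and WS-USED-LEN are assumed numeric work fields.
* FUNCTION LENGTH returns the declared size of FULL-NAME, so
* trailing spaces are counted and subtracted first.
    MOVE 0 TO WS-TRAILING
    INSPECT FUNCTION REVERSE(FULL-NAME)
        TALLYING WS-TRAILING FOR LEADING SPACE
    COMPUTE WS-USED-LEN =
        FUNCTION LENGTH(FULL-NAME) - WS-TRAILING
    IF WS-USED-LEN > 25
* Do complex parsing logic here...
        CONTINUE
    ELSE
        DISPLAY "Invalid name length " WS-USED-LEN
* ...handle error...
    END-IF.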
Proactively coding defensive substring handling will save enormous headaches compared to letting exceptions crash programs later down the line!
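The numeric-conversion and case-mismatch items from the list above can be sketched in the same defensive style. This is a minimal illustration under assumed field names and sample values (WS-ZIP-TEXT, WS-STATE-IN), using the standard NUMERIC class test and the UPPER-CASE intrinsic function.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EDGECHK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Sample values; all names here are illustrative
       01  WS-ZIP-TEXT     PIC X(5)  VALUE "12A45".
       01  ZIP-CODE        PIC 9(5)  VALUE ZERO.
       01  WS-STATE-IN     PIC XX    VALUE "ny".
       PROCEDURE DIVISION.
      * Case 3: test for digits before converting
           IF WS-ZIP-TEXT IS NUMERIC
               COMPUTE ZIP-CODE = FUNCTION NUMVAL(WS-ZIP-TEXT)
           ELSE
               DISPLAY "Non-numeric ZIP: " WS-ZIP-TEXT
           END-IF
      * Case 4: compare in a single case
           IF FUNCTION UPPER-CASE(WS-STATE-IN) = "NY"
               DISPLAY "State matched"
           END-IF
           STOP RUN.
```

With these sample values the ZIP text fails the class test, so the conversion is skipped and the warning is displayed, while the lower-case state code still matches after conversion to upper case.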
Conclusion
SUBSTR-style extraction is a vital instrument in every COBOL programmer's toolkit for manipulating string data. Used properly, it can greatly simplify the parsing and extraction of substrings from larger character values.
Combining SUBSTR with the INSPECT statement to locate delimiters enables cleanly handling complex parsing needs such as addresses and names with dynamic lengths.
However, SUBSTR logic can also quickly become convoluted. Following best practices around organization, validation, and exception handling is critical for maintainable substring code.
With robust defensive coding and well-structured logic, SUBSTR can empower building COBOL programs to skillfully handle substring operations at an enterprise level.
I hope this guide sharing my extensive hands-on COBOL substring experience provides deep insight on maximizing this function in your own development. Please comment below on any questions or how you have leveraged SUBSTR!


