Refactor: Fix bug and improve internal logic of workpatterns_classify_bw() #216

@moralec

Description

Context
workpatterns_classify_bw classifies individuals by their hourly collaboration habits. The method assigns each individual to one of six groups, depending on their number of active hours and how flexibly those hours are distributed.

The groups that result from the code are:

  • 1 Standard with breaks workday: number of active hours is less than expected hours, with no activity outside working hours
  • 2 Standard continuous workday: number of active hours equals expected hours, with no activity outside working hours
  • 3 Standard flexible workday: number of active hours is less than or equal to expected hours, with some activity outside working hours
  • 4 Long flexible workday: number of active hours exceeds expected hours, with breaks occurring throughout
  • 5 Long continuous workday: number of active hours exceeds expected hours, with activity happening in a continuous block (no breaks)
  • 6 Always on (13h+): number of active hours is greater than or equal to 13
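
The rules above can be read as a decision tree. The following is a minimal sketch of that tree, not the package's actual implementation. The helper name classify_day, its 24-element 0/1 input vector, and the definitions of "outside working hours" and "continuous" are all assumptions made for illustration.

```r
# Hypothetical helper (not the package's code): classify one person from a
# 24-element 0/1 vector of hourly activity, where element 1 is 00:00-01:00.
classify_day <- function(signals, start_hour = 9, end_hour = 17,
                         exp_hours = end_hour - start_hour) {
  stopifnot(length(signals) == 24)
  hours <- 0:23
  active <- signals > 0
  active_hours <- sum(active)

  # Assumption: activity before start_hour, or from end_hour onwards,
  # counts as "outside working hours"
  outside <- any(active & (hours < start_hour | hours >= end_hour))

  # Assumption: "continuous" means the active hours form one unbroken block
  continuous <- active_hours > 0 &&
    (max(hours[active]) - min(hours[active]) + 1) == active_hours

  if (active_hours >= 13) {
    "6 Always on (13h+)"
  } else if (active_hours > exp_hours) {
    if (continuous) "5 Long continuous workday" else "4 Long flexible workday"
  } else if (outside) {
    "3 Standard flexible workday"
  } else if (active_hours == exp_hours) {
    "2 Standard continuous workday"
  } else {
    "1 Standard with breaks workday"
  }
}

# Active 09:00-17:00 without gaps
classify_day(replace(numeric(24), 10:17, 1))
# returns "2 Standard continuous workday"

# Eight active hours split across 07:00-11:00 and 14:00-18:00
classify_day(replace(numeric(24), c(8:11, 15:18), 1))
# returns "3 Standard flexible workday"
```

Checking the 13-hour threshold first keeps group 6 from being shadowed by the "Long" groups, since anyone with 13 or more active hours also exceeds the default expected hours.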

Issues identified
There are three issues to address:

  1. [Improvement] The code is currently very difficult to understand. Variable names are not always meaningful, and there are not enough comments.
  2. [Improvement] A customer has highlighted that the classification algorithm should also run when the expected working hours are specified instead of a start and end time.
  3. [Bug] There is a misclassification problem: individuals who work the expected number of hours (e.g. 8 hours) flexibly are classified as 4 Long flexible workday instead of 3 Standard flexible workday.

Steps to reproduce this last issue:

em_data %>%
  workpatterns_classify(start_hour = "0900", end_hour = "1700", return = "data") %>%
  filter(Personas == "4 Long flexible workday") %>%
  View()
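
One plausible reading of this bug is an off-by-one in the expected hours. The arithmetic below is a sketch under that assumption: it supposes the current code derives expected hours as end minus start minus 1, which is what the "-1" mentioned in the suggested solution implies. The variable names and the to_hour helper are illustrative only.

```r
# Hypothetical helper: convert an "HHMM" string such as "0900" to an hour (9)
to_hour <- function(x) as.numeric(substr(x, 1, 2))

start_hour <- "0900"
end_hour <- "1700"
active_hours <- 8  # e.g. worked flexibly over 07:00-11:00 and 14:00-18:00

# Assumed current behaviour: expected hours carry a "- 1" offset
exp_hours_old <- to_hour(end_hour) - to_hour(start_hour) - 1  # 7
# Proposed behaviour: plain difference between end and start
exp_hours_new <- to_hour(end_hour) - to_hour(start_hour)      # 8

# A person counts as "Long" when active hours exceed expected hours
is_long_old <- active_hours > exp_hours_old  # TRUE: lands in a "Long" group
is_long_new <- active_hours > exp_hours_new  # FALSE: stays in a "Standard" group
```

Under this assumption, an 8-hour flexible day exceeds the offset value of 7 and is pushed into group 4, while the plain difference of 8 keeps it in group 3.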

Suggested Solution

I would suggest we make the following changes to this code:

  1. Rename variables to make the code easier to read: change D to exp_hours and signals_total to active_hours.
  2. Make exp_hours a parameter that is calculated by default as end_hour - start_hour.
  3. Remove the -1 from the calculation of expected hours and update the classification rules accordingly.
  4. Make sure return = "data" also returns end_hour, exp_hours and start_hour.
  5. Remove unnecessary variables (start_hours_0).
  6. Add more comments to the code.
  7. Update the plot so that the limits of the buckets are clearer (e.g. Standard hours 3-8 hours, extended hours 8-13).
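
Items 2 to 4 could be handled by a small helper along the following lines. This is only a sketch: resolve_exp_hours is a hypothetical name, the "HHMM" string format is taken from the reproduction step, and returning the three values as a list is an assumption about how they might be attached to the output data.

```r
# Hypothetical helper, not part of the package: resolve expected hours from
# either an explicit value or "HHMM" start and end strings.
resolve_exp_hours <- function(start_hour = "0900", end_hour = "1700",
                              exp_hours = NULL) {
  start_num <- as.numeric(substr(start_hour, 1, 2))
  end_num <- as.numeric(substr(end_hour, 1, 2))

  # Default: plain difference between end and start, with no "- 1" offset
  if (is.null(exp_hours)) {
    exp_hours <- end_num - start_num
  }
  stopifnot(is.numeric(exp_hours), length(exp_hours) == 1, exp_hours > 0)

  # Return all three values so they can be added to the output data
  list(start_hour = start_num, end_hour = end_num, exp_hours = exp_hours)
}

resolve_exp_hours()$exp_hours               # 8, derived from 0900-1700
resolve_exp_hours(exp_hours = 6)$exp_hours  # 6, user-supplied value wins
```

Using NULL as the default, rather than computing the difference in the signature, makes it easy to tell whether the caller supplied a value.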

Metadata

Labels

bug: Something isn't working
