
Source: https://www.digitalgolf.ca/
I was engaged in a Freelance Data Analysis job where I was asked to assess some time-bound golfing information. The person was interested in discovering time metrics and then inferring potential golfing group times. During a phone call that I had, they were able to give me some basic information about the sport (as my golfing knowledge is as deep as my baseball knowledge) and the datasets. This information included an average (or good) course completion time, longest and shortest courses, and which datasets (according to them) look like they are on par (pun intended).
Description of Task and Requirements
The data shows how long golfers are playing. The records are measured by an app that the golfers communicate with; start and stop. (The app must be synced such that all players commence at the same time.) However, each play stops the app when each individual finishes.
Main metrics to derive:
- Average and Median Times
- The average and median times for all 4-ball rounds
- Average and Median Times by Course
- The average and median times for all 4-ball rounds by course LI, Old, Moonah ,Gunn _Each one its own tab
- Average and Median Times by Competition Type
- The average and median times for all 4-ball rounds by competition type (stableford, par, stroke) Column C Data
- Average and Median Times by Times
- The average and median times for all 4-ball rounds by tee time (early am field – first tee time to 9am, mid am field – 9am to 11am , later than mid am field – after 11am)
Process of Completing task
Before I began gathering the screenshots for this post, I anonymised the names within each of the courses. To do this, I created a new sheet and listed all unique names for each course. Next to the list of unique names, I created a function =”Player “&SEQUENCE (COUNTA(A1:A)) that can dynamically change when a new name is added into the original spreadsheet and assigns each player their own unique ID, i.e., Player 1, Player 2, Player 3, etc.
There were specific cleaning tasks that were necessary to perform before analysis could begin.
Cleaning: Convert Text Time columns (Tee Start Time, Finish Time) to actual Time
Due to the information located in each file, some of the columns will be slightly different. This can be taken care of with PowerQuery (when initially loading the data in), but in this case, it wasn’t absolutely necessary.
When I initially completed this task, I had to use the following formula (specifically for the Old course):
=IF(ISNUMBER(SEARCH(“AM”,I2)),
((HOUR(IF(NOT(ISNUMBER(I2)),LEFT(I2,SEARCH(“AM”,I2)-2),I2))+(MINUTE(IF(NOT(ISNUMBER(I2)),LEFT(I2,SEARCH(“AM”,I2)-2),I2))/60))/24),
IF(ISNUMBER(SEARCH(“PM”,I2)),
((HOUR(IF(NOT(ISNUMBER(I2)),LEFT(I2,SEARCH(“PM”,I2)-2),I2))+(MINUTE(IF(NOT(ISNUMBER(I2)),LEFT(I2,SEARCH(“PM”,I2)-2),I2))/60))/24),))
However, now I would update it (and other times and courses) to the following function:
=TIMEVALUE(I2)
Processing
Allocate players to a group
To best allocate people into groups, I utilised the available date and time information (for datetime) each spreadsheet and used this group starting time to allocate players:
Using these two pieces of information, we can allocate players into groups.
=IF(
AND(K3=K2,A3=A2),
L2,L2+1)
From this screenshot, we can see that when K2 = K3 (the same start time) and A2 = A3 (the same date, then these two individuals are apart of the same group, i.e. K2 = K3. Alternatively, from the same screenshot, we can see that K5 = 2 as:
J5 != J4, therefore:
K5 = K4 + 1
= 1 + 1
= 2.
Find the number of players in each group
This group size is based only on the information in the L column, Group Number. When I created this function, I was thinking about creating a bucket for each group out of ‘x’ number of players. Here, we can count the number of people within that allocated group.
=IF(
COUNTIF(L1:L,L1)-COUNTIF(L2:L,L2)<=0,
((COUNTIF(L1:L,L1)-COUNTIF(L2:L,L2))*-1)+1,
M1)
The logical condition of IF is counting the different between two groups:
- All numbers from L1 down that are equal to L1, and
- All numbers from L2 down that are equal to L2.
For ease (and not having to concern ourselves with the headings), we will be moving to the L3 cell.
Essentially, I counted the maximum number of people (n_i) within that bucket b_i (based on the Start Time). When that bucket was full, I counted the next number of maximum people (n_i+1) in the next bucket (b_i+1) based on the next Start Time.
=IF(
COUNTIF(L2:K,L2)-COUNTIF(L3:L,L3)<=0,
((COUNTIF(L2:K,L2)-COUNTIF(L3:L,L3))*-1)+1,
M2)
We will be looking at the transition between Group Number 1 and 2.
As mentioned, this formula is assessing two different pools of numbers. In this first pool, the formula counts the number of 1’s in the L column from L2 onwards. This total value is 3 as L2, L3, and L4 are each equal to 1. The second pool contains all of the 1 values from L3 downwards; this value is 2 as L3 and L4 are both equal to 1.
The logical condition for the following cells yields:
L2: 3 (L2+L3+L4) – 2 (L3+L4) = 1
L3: 2 (L3+L4) – 1 (L4) = 1.
L4: 1(L4) – 0 (no cells) = 1
Since all of these results (L2 to L4) are = 1, this means that the logical condition is False as 1 not equal or less than 0. Therefore, we assign these cells (M2, M3, M4) = 3.
M5 triggers the True statement of the IF function.
Since, from L5 onwards, there are no more 1’s in the Group Number column, this COUNTIF function is now:
((COUNTIF(L4:L,L4)-COUNTIF(L5:L,L5))*-1)+1,
((Count the number of values in L4 from L4 onwards) – (Count the number of values in L5 from L5 onwards) * -1) + 1
= (1 (L4 = 1) – 3 (L5, L6, L7 = 2) * -1) + 1
= ((-2) * -1) + 1
= 2 + 1
= 3
The *-1)+1 part of this function will turn this negative value positive, and increase its value back to the number of people within the next group.
We can see the transition equations for the following groups:
The table below
| Row 27: Group 8 transition | Row 30: Group 9 transition | Row 31: Group 10 transition |
| = (1 – 3) * -1) + 1 (L26 = 7) (L27, L28, L29 = 8) = ((-2) * -1) + 1 = 2 + 1 = 3 | = (1 – 1) * -1) + 1 (L29 = 8) (L30 = 9) =((0) * -1) + 1 = 0 + 1 = 1 | = (1 – 2) * -1) + 1 (L30 = 9) (L31, L32 = 10) = ((-1) * -1) + 1 = 1 + 1 = 2 |
| Since L27:L contains three 8 values, we know that 3 is the number of people in Group 8. | Since L30:L contains one 9 value, we know that 1 is the number of people in Group 9. | Since L31:L contains two 10 values, we know that 2 is the number of people in Group 10. |
Assign all players the same finish time
We are assessing Column L of sheet Gun.
=IF(L2=L1,
MIN(MINIFS($O$2:$O2,$L$2:$L2,L2),P1),
MINIFS(O2:O,L2:L,L2))
=IF(L2=L1,
This checks if the player of interest is a part of the same group.
If they are:
MIN(MINIFS($O$2:$O2,$L$2:$L2,L2),P1),
This will check minimum times.
- Firstly, it checks the minimum time for the previous player and the current. It will continue to check the growing number of players within any group size, hence the $L$2:$L2 element of the MINIFS formula.
- Secondly, it compares the MINIFS formula to the record (cell) before it and selects the minimum value. This will yield a true minimum group value as it is assessed whilst L3=L2.
MINIFS($O$2:$O2,$L$2:$L2,L2)
If L3<>L2 then, it will continue the MINIFS search whilst players are a part of the same group.
Final Data Queries
The data shows how long golfers are playing. The records are measured by an app that the golfers communicate with; start and stop. (The app must be synced such that all players commence at the same time.) However, each play stops the app when each individual finishes. To efficiently develop these queries, I created named ranges for each course and various metrics, i.e., Group Size, Competition Type, and the group rounded Playing Time.
What follows are the queries that were requested by the client:
Query 1 – Average and Median Times for all 4-ball Rounds
Where the ‘Median Time’ formula was: =IFERROR(ARRAYFORMULA(MEDIAN(IF(OldGroupSize=$B$2,Old!P2:P),IF(MoonahGroupSize=$B$2,Moonah!Q2:Q),IF(GunGroupSize=$B$2,,Gun!N2:N),IF(LIGroupSize=$B$2,LI!Q2:Q))),””)
And Average Time utilised data in the following query:
=IFERROR(AVERAGE(B11:B14),””)
Query 2 – Average and Median Times for all 4-ball Rounds by Course
For the Old course:
Average time – =IFERROR(AVERAGEIF(OldGroupSize=$B$9,OldRoundTimeByTeeTime),””)
Median Time – =ARRAYFORMULA(IFERROR(MEDIAN(IF(OldGroupSize=Statistics!$B$9,OldRoundTimeByTeeTime)),””))
Min Time – =MINIFS(OldRoundTimeByTeeTime,OldGroupSize,$B$9)
Max Time – =MAXIFS(OldRoundTimeByTeeTime,OldGroupSize,$B$9)
Standard Deviation – =IF(OldGroupSize=$B$9,STDEV(OldRoundTimeByTeeTime))
Where B9 is the number of ball rounds.
Query 3 – Average and Median Times for all 4-ball Rounds by Competition and Course
The following formula is for Old course, Stableford Competition, Average Time:
=IFERROR(AVERAGEIFS(OldRoundTimeByTeeTime,OldCompType,B$21,OldGroupSize,$B$20),””)
The following formula is for Old course, Stableford Competition, Median Time:
=ARRAY_CONSTRAIN(ARRAYFORMULA(IFERROR(MEDIAN(IF(OldGroupSize=$B$20,IF(OldCompType=B$21,OldRoundTimeByTeeTime))),””)), 1, 1)
Query 4 – Average and Median Times for all 4-ball Rounds by Tee Time and Course
The following formula is for the Old Course, 09:00:00 time, with 4 ball rounds:
=IFERROR(AVERAGEIFS(OldRoundTimeByTeeTime,OldGroupSize,$B$47,OldGroupStartTime,”<=”&B$48),””)
It is noteworthy that each subsequent query is essentially adding on more constraints to narrow in on specific metrics for each course.
Query 5 – Fastest Group per Course
=INDEX(Old!G2:Q,MATCH(MINIFS(Old!Q2:Q,Old!L2:L,4),Old!Q2:Q,0),5)
Query 6 – Fastest Player per Course
=INDEX(Old!G2:Q,MATCH(MINIFS(Old!Q2:Q,Old!L2:L,4),Old!Q2:Q,0),1)
Conclusion
At first, I felt a bit overwhelmed by the amount of data and the cleaning that was necessary to process. Well, really, I think I was worried by analysing time data…for some weird reason…given how much time I’ve spent working in digital music (which is specifically based around working with digital time). But, the client was extremely happy with the results and felt comfortable processing this data on their end to best create groups that would be tailored around time and networking connections. Whether it’s been an ad-hoc or long-term project, I thoroughly enjoy delivering a solution to whomever has asked me for it.
