The City of Dallas animal shelter dataset contains 5 types of animals, with dogs holding a solid lead:
|Admissions by Animal Types|
For consistency and plausibility of the analysis, we will focus on dog records only.
More precisely, each shelter record describes an animal admitted to a shelter with a certain intake type and later discharged with a certain outcome. The top 3 reasons why dogs turn up at shelters are Confiscated (abused, no owner, etc.), Owner Surrender (willingly brought in by the owner), and Stray (lost or abandoned):
|Dogs Admitted by Intake Types|
Dogs leave shelters (either alive or dead) for 4 main reasons (outcomes): Adoption (good), Euthanized (bad), Returned to Owner (good), and Transfer (neutral):
So what is the relationship between the top intake types and outcomes? Which intake types drive outcomes, and to what extent? The good news is that some causal effect is plausible, because each stay begins with an intake type and ends with an outcome.
Let's begin with a higher-level (in this case) but visually appealing visualization called a Sankey diagram (or just Sankey). It is a specific type of flow diagram in which the width of the arrows is proportional to the flow quantity:
Each dog's shelter stay contributes to the size of one of the pipes flowing from left (an intake type) to right (an outcome). In effect, this visualizes the conditional probabilities of a dog leaving the shelter with a certain outcome given its admission with a known intake type.
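To make the "conditional probabilities" reading of the Sankey concrete, here is a minimal Python sketch (the post's own code is in R; this is only an illustration). The `stays` list is hypothetical data I made up for the example, not Dallas records, and `conditional_probabilities` is an illustrative helper name. Each pipe width is a count of (intake, outcome) pairs, and dividing it by the total flow leaving the intake node gives P(outcome | intake).

```python
from collections import Counter, defaultdict

# Hypothetical (intake type, outcome) pairs, one per dog stay.
# Made-up values for illustration only, not real Dallas shelter data.
stays = [
    ("Stray", "Returned to Owner"),
    ("Stray", "Adoption"),
    ("Stray", "Adoption"),
    ("Stray", "Transfer"),
    ("Owner Surrender", "Euthanized"),
    ("Owner Surrender", "Adoption"),
    ("Confiscated", "Returned to Owner"),
    ("Confiscated", "Euthanized"),
]


def conditional_probabilities(pairs):
    """Return {intake: {outcome: P(outcome | intake)}} from (intake, outcome) pairs."""
    flows = Counter(pairs)  # count per (intake, outcome): the width of each pipe
    intake_totals = Counter(intake for intake, _ in pairs)  # total flow per intake node
    probs = defaultdict(dict)
    for (intake, outcome), n in flows.items():
        probs[intake][outcome] = n / intake_totals[intake]
    return dict(probs)


probs = conditional_probabilities(stays)
for intake, row in probs.items():
    print(intake, row)
```

Each row of `probs` sums to 1, which is the same thing as saying that all the pipes leaving one intake node on the left add up to that node's height.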
Next, we go beyond the total aggregates used in the Sankey (the counts of intakes and outcomes above) and compute correlations. To do so, we aggregated the counts by month to obtain trends (time series). Then we computed correlations between the monthly trends of dogs brought into and removed from Dallas animal shelters for each pair of top intake types (Confiscated, Owner Surrender, and Stray) and outcomes (Adoption, Euthanized, Returned to Owner, and Transfer), 12 coefficients in total:
In this case a strong correlation suggests (at least to some extent) a causal effect, because the temporal relationship, consistency, and plausibility criteria are met (see here and here). A few observations to note:
- The highest correlation, at 0.82, is between the intake type Owner Surrender and the outcome Euthanized, which is as obvious as it is unfortunate.
- The second highest correlation, at 0.8, is between Stray and Returned to Owner. It is good news that owners get their lost pets back: the higher this correlation, the healthier the city, for 2 reasons. First, lost animals return home; second, it means that most stray dogs are lost rather than abandoned (given that the city keeps collecting them).
- No outcomes are affected by variations in Confiscated dogs, but this is likely due to the smaller share of admissions of this type.
- Variation in Stray dogs admitted affects every outcome (more or less) except Euthanized, which is somewhat surprising (Stray is the largest intake type, almost twice as big as the 2nd largest, Owner Surrender).
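The monthly aggregation and pairwise correlation step described above can be sketched as follows. This is a Python illustration, not the R code behind the post, and it rests on several assumptions: the event lists are hypothetical numbers I invented, only 2 intake types and 2 outcomes are shown instead of the full 3 by 4 grid, and I use the Pearson coefficient because the post does not say which correlation was computed. The helper names `expand`, `monthly_series`, and `pearson` are mine.

```python
from collections import Counter
from math import sqrt

MONTHS = ["2016-01", "2016-02", "2016-03", "2016-04"]


def expand(category, counts):
    """Turn per-month counts into one (month, category) event per dog."""
    return [(m, category) for m, n in zip(MONTHS, counts) for _ in range(n)]


# Hypothetical events, for illustration only (not real Dallas shelter data).
intake_events = expand("Stray", [4, 6, 4, 8]) + expand("Owner Surrender", [2, 3, 5, 4])
outcome_events = expand("Returned to Owner", [2, 3, 2, 4]) + expand("Euthanized", [1, 2, 3, 2])


def monthly_series(events, category):
    """Count events of one category per month, in MONTHS order (0 if none)."""
    counts = Counter(month for month, cat in events if cat == category)
    return [counts[m] for m in MONTHS]


def pearson(x, y):
    """Pearson correlation of two equal-length series (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


intakes = ["Stray", "Owner Surrender"]
outcomes = ["Returned to Owner", "Euthanized"]

# One coefficient per (intake type, outcome) pair: the cells of the correlation matrix.
corr = {
    (i, o): pearson(monthly_series(intake_events, i), monthly_series(outcome_events, o))
    for i in intakes
    for o in outcomes
}

for pair, r in corr.items():
    print(pair, round(r, 2))
```

With the real data the same loop would run over all 3 intake types and 4 outcomes, giving the 12 coefficients. Whether intakes and outcomes are binned by intake date, outcome date, or both is a choice the post leaves open; the sketch simply treats them as two independent event streams.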
But can we do better than correlations of these trends? What if, instead of coefficients (which technically are still sophisticated aggregates), we observe the actual monthly trends? The next visual places the actual time series instead of correlation coefficients inside the same matrix grid:
Each row corresponds to an intake type and each column to an outcome (just like the correlation matrix before). Now we can see trends in volume over time (months), so note the following observations (following the matrix order top down):
- Confiscated intake trends flat, with only one significant spike in January 2016. This spike is so unusual, relatively big, and contained within a single month or two that it calls for additional investigation into a probable external event or procedural change that may have caused it.
- The number of Confiscated dogs is too low to noticeably affect outcomes. Still, if we can reduce the effect of other intake types, some relationships may emerge.
- The correlation of the Owner Surrender trend with the Euthanized outcome is so obvious that this type of visualization is sufficient to find it. Yes, it is unfortunate, but people bring in their old or unhealthy pets for a reason.
- Owner Surrender has a significant seasonal component, spiking in summer, possibly due to hot weather, the holiday season, or both.
- Euthanized trends together with Owner Surrender, which causes it to a large degree.
- Stray dogs trend slowly upwards in Dallas, which is alarming.
- Adoption also trends upwards, but not steeply enough to compensate for the inflow of dogs into shelters. A targeted campaign to encourage more pet adoptions in the city is due.
- The Transfer outcome, also trending upward, helps compensate for the growth in stray dogs. It's not clear whether this is positive or negative, though, as there is no means to track what happens to dogs after transfer (or is there?).
- The Stray trend dipped in January 2016, exactly when the Confiscated trend spiked. It could be a coincidence or related; either way, it is something to consider when investigating further.
- The Euthanized trend correlates strongly with Stray intake until the summer of 2016, when they start to diverge in opposite directions; again, some policy or procedural change apparently caused it. Indeed, if we observe other outcomes, we notice that the Returned to Owner trend began its uptick at around the same time (after observing this, I found out about this and this: significant changes in Dallas Animal Services leadership and policies around the summer and fall of 2016).
I will be back with more analysis (survival analysis). The R code for the data processing, analysis, and visualizations used in this post is available here.