Monday, April 23, 2018

Genomic Collapse is a Bitch

Sometimes distant cousins marry.

Sometimes not-so-distant cousins marry.

And then there's Jaime and Cersei Lannister, but that's another story.

In any case, when this happens, shared ancestors shrink the number of distinct people in the family tree compared to what could possibly be there.

I'm defining "Genomic Collapse" as:

The share of "filled slots" in a subset of the family tree that are repeat appearances of someone already counted: that is, one minus the ratio of unique people to filled slots in that subset.
So in the extreme case of Joffrey Lannister (whose parents were brother and sister), that's 50% genome collapse at a minimum, because the maternal and paternal bloodlines are identical; "at a minimum" because any other consanguineous relationships further up continue to add to the genome collapse.
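To make the definition concrete, here is a minimal Python sketch of the brother-and-sister case. Everything in it is made up for illustration: the `PARENTS` table, the placeholder names, and the `collapse_by_generation` helper are assumptions, not data from my tree. It also assumes "collapse" is counted one generation at a time as one minus unique people over filled slots.

```python
# Hypothetical pedigree: person -> (father, mother). All names are placeholders.
PARENTS = {
    "child": ("father", "mother"),
    # The parents are brother and sister, so they share both of their parents.
    "father": ("grandpa", "grandma"),
    "mother": ("grandpa", "grandma"),
    "grandpa": ("ggf1", "ggm1"),
    "grandma": ("ggf2", "ggm2"),
}


def collapse_by_generation(root, parents, max_gen):
    """Return one (generation, filled_slots, unique_people, collapse) tuple per generation."""
    rows = []
    slots = [root]  # generation 0 is the root person
    for gen in range(1, max_gen + 1):
        next_slots = []
        for person in slots:
            # Unknown parents are kept as None so each generation still has 2**gen slots.
            next_slots.extend(parents.get(person, (None, None)) if person else (None, None))
        slots = next_slots
        filled = [p for p in slots if p is not None]
        unique = set(filled)
        collapse = 1 - len(unique) / len(filled) if filled else 0.0
        rows.append((gen, len(filled), len(unique), collapse))
    return rows


for gen, filled, unique, collapse in collapse_by_generation("child", PARENTS, 3):
    print(f"generation {gen}: {unique} people in {filled} slots, {collapse:.0%} collapse")
```

In this toy tree the overlap first shows up at the grandparent level (2 people in 4 slots) and stays at 50% in the generation behind it (4 people in 8 slots), because each repeated ancestor brings their whole tree along a second time.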

So - what is it for my maternal family tree?

  • Out to 10 generations: 41.4% complete tree but with 30.2% collapse (296 people in 424 filled slots out of 2,046 total slots);
  • Out to 15 generations: 5.9% complete but with 49.0% collapse (988 people in 1,937 filled slots out of 65,534 total slots);
  • Out to 20 generations: 0.2% complete but with 51.4% collapse (1,074 people in 2,208 filled slots out of 2,097,150 total slots).

I'm not sure if this is the right way to do this, because once you hit a non-unique ancestor, that effect doubles with every generation back.
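For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the collapse figures and the total slot counts from the three rows above. The function names `total_slots` and `collapse` are just labels I picked for this example.

```python
def total_slots(generations):
    """Ancestor slots from the parents out to generation n: 2 + 4 + ... + 2**n."""
    return 2 ** (generations + 1) - 2


def collapse(unique_people, filled_slots):
    """Share of filled slots taken up by repeat appearances of someone already counted."""
    return 1 - unique_people / filled_slots


# (generations, unique people, filled slots) as listed above
for gens, people, filled in [(10, 296, 424), (15, 988, 1937), (20, 1074, 2208)]:
    print(f"{gens} generations: {total_slots(gens):,} total slots, "
          f"{collapse(people, filled):.1%} collapse")
```

This prints 30.2%, 49.0%, and 51.4% collapse against 2,046, 65,534, and 2,097,150 total slots, matching the list.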

In any case, going from generation to generation it looks like this:

The blue line is the percent overlap at that generation. The green line is the percent overlap for that generation and all the ones preceding it. The orange line is the highest possible percentage of overlap (i.e., all unknown ancestors for that generation are repeats of known ancestors); the red line is the lowest possible percentage (all unknown ancestors are new people).
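Here is one way the lowest and highest lines could be computed for a single generation, as a hedged Python sketch. It is my reading of the description rather than the exact spreadsheet behind the chart: `overlap_bounds` is a made-up helper, and the counts in the example call are invented, not taken from my tree.

```python
def overlap_bounds(generation, filled_slots, unique_people):
    """Return (lowest, observed, highest) overlap for one generation of the tree."""
    total = 2 ** generation            # slots in this generation alone
    unknown = total - filled_slots     # slots with nobody identified yet
    observed = 1 - unique_people / filled_slots
    # Highest case: every unknown slot turns out to repeat someone already known.
    highest = 1 - unique_people / total
    # Lowest case: every unknown slot turns out to be a brand-new person.
    lowest = 1 - (unique_people + unknown) / total
    return lowest, observed, highest


# Invented example: generation 10 with 400 filled slots holding 300 distinct people.
low, seen, high = overlap_bounds(10, 400, 300)
print(f"lowest {low:.1%}, observed {seen:.1%}, highest {high:.1%}")
```

The observed value always sits between the two bounds, and the gap between them widens as the tree gets less complete.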

Wednesday, April 18, 2018

I think I know how to find Célina Boulé

It occurred to me last night as I was falling asleep that I might be able to use the DNA results to identify Célina Boulé's parents.

It's something of a long-shot, but given this tree:

starting with my grandmother, what we're trying to find is the set of 3rd great-grandparents that's missing.  

All of the DNA matching products try to match you up with distant cousins.   I'm slowly making my way through that list of people, looking at their family trees (when they've bothered to make one) to see if I can find common ancestors, and thereby establish our relationship.   A few have had Alexandre Guimond and Célina Boulé as the common ancestor.

But some of these supposedly "high-confidence" matches appear to have no common ancestors.  This got me thinking - what if the common ancestors are the missing parents for Célina Boulé?  And - how could we identify who they are?

What I need to do is find 4th cousins, determined genetically but NOT through a shared family tree. Why? Because if we can identify the common ancestors through a family tree, then we know they're not Célina's parents. What we want to find are all the 4th cousins for whom it's not possible to find a common ancestor but who we are certain are on the Québec side of the family (because otherwise the link might be through one of the unknown Irish or British 3rd great-grandparents).

  1.  Are we genetic 4th cousins?
  2.  Do our family trees overlap in Québec?   (Or not - see below!)
  3.  Do we not see any overlap at the 5th generation?    

Note that this isn't as difficult to confirm as you might think. There are only three family groups to consider: Narcisse Guimond and Céleste Sévigny, Basile Tousignant and Marguerite Maillot, and Pierre Bélanger and Thérèse Maillot. If none of these families is there, then the only one that remains is the parents of Célina Boulé.

Nonetheless, for this to work, it still requires one other thing: Célina MUST have a sibling, and my 4th cousin must be a descendant of that sibling. Otherwise my 4th cousin and I will have the same "hole" in the family tree, with Célina being a dead-end.

So - given a set of 4th cousin candidates, under the conditions above, the intersection of all of our family trees' common ancestors should point to the parents of Célina Boulé. This can get tricky, because it's possible that her parents are already in my family tree through some other route. If that location in the tree is far away from the 3rd great-grandparent position (i.e., they're also C:9,4's or something), it might still work, because the DNA correlation for such a distant cousin would be negligible.
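The filtering and intersection could be sketched in Python like this. The three known couples come from the list above; everything else is hypothetical: the `candidate_parents` helper, the cousin labels, and the placeholder couples. It assumes each cousin's tree has already been boiled down to a set of couples at the relevant generation.

```python
from functools import reduce

# The three family groups already accounted for at this generation.
KNOWN_COUPLES = {
    ("Narcisse Guimond", "Céleste Sévigny"),
    ("Basile Tousignant", "Marguerite Maillot"),
    ("Pierre Bélanger", "Thérèse Maillot"),
}


def candidate_parents(cousin_trees, known_couples=KNOWN_COUPLES):
    """Couples found in every cousin tree that contains none of the known couples."""
    # A cousin whose tree includes a known couple is already explained; skip them.
    unexplained = [tree for tree in cousin_trees.values() if not (tree & known_couples)]
    if not unexplained:
        return set()
    return reduce(set.intersection, unexplained)


# Hypothetical genetic 4th cousins and the couples in their trees (placeholders only).
cousin_trees = {
    "cousin_a": {("Placeholder Father", "Placeholder Mother"), ("A1", "A2")},
    "cousin_b": {("Placeholder Father", "Placeholder Mother"), ("B1", "B2")},
    "cousin_c": {("Narcisse Guimond", "Céleste Sévigny"), ("C1", "C2")},
}

print(candidate_parents(cousin_trees))
```

Here cousin_c drops out because the match is explained by a known couple, and the only couple shared by the other two is the placeholder pair. With real trees the hard part is normalizing names so the same couple compares equal across trees.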

What if Célina is not Québecois?

One of the likely possibilities is that Célina was adopted and might not have a Québec background! Instead, it's possible she was an Irish immigrant escaping the Irish potato famine. There's no evidence to support this, but the timing fits, and there was a huge influx of refugees from Ireland in 1847 (at which point Célina would have been about 7 years old). Nearly 100,000 immigrants came through quarantine at La Grosse Île, about 30 miles downstream from Québec, before being relocated to Québec, Canada West, and the US.

The mortality rate from typhus was huge: about 1 in 6 died during or shortly after the crossing. So, it's also possible that she was orphaned in this way. There were programs set up for adoption, typically run by the church, where (usually older) children were placed with families. It was more of a "foster" system than adoption; the children were seen more as convenient labor than part of the family, and typically kept their original family names. So Célina might be one of the exceptions: if - as I suspect - she was adopted/fostered by Moïse Boulé and Domitille Bernier (who do not appear to have had any children of their own), she took on their name (at least some of the time).

(Also, "orphan" has a slightly different meaning in this context: some "orphans" had a living parent, but one who was not able to care for them because of sickness or destitution, which makes the "adoption" more like foster care.)

But her Irishness can be tested too. If there's clearly Québec integration from the late 19th century (i.e., Célina's sibling also married into a Québec family and had children) but the only common ancestors are Irish and never Québecois, then that would lend support to the "Célina is an Irish immigrant" theory.

It doesn't appear there's much in the way of records showing which children were placed with which families.  Another avenue I have not attempted is to see if there are other children that might've been adopted/fostered by Moïse Boulé.  

Sunday, April 15, 2018

Estimating variance based on family DNA

So I'm still curious about the DNA results...

I'm thinking that the lack of French ethnicity might be due to a weird roll of the dice in terms of the DNA makeup I inherited from my father and mother. Part of this comes from comparison to a first cousin: we share the Bradish/Guimond side of things, but his mother's side of his family is far removed geographically from my father's side of the family. Yet he comes up with 11% French, whereas I come up with 0%. Since the overlap is on my mother's side, I think I can safely preclude any "Jerry Springer"-esque explanation.  :-)

So - how to model this?   It occurred to me that getting DNA samples from my father and sister (or brother) might be enlightening.

For my father, there should be roughly 50% overlap (from a zeroth-order assumption). So anything that doesn't overlap comes from my mother. If the overlap is far more than 50% (and I'm not a biologist, so I don't know if this is a valid hypothesis) then - borrowing from Game of Thrones - "the blood is strong" and genetically, I'm more Irish than French. I'd expect that my father should come out almost exclusively Irish with some English. But for my siblings, I suppose anything is possible in terms of the Irish/English/French ratios (unless my hypothesis is entirely invalid).

In the case of my siblings, they should also have roughly the same 50/50 split (as a base assumption), but with different overlaps.   Quantitatively, the difference would be some kind of estimate of the variance within one generation of "shifting" - in other words, "you have your father's nose, but your mother's eyes" might apply to one child, but the opposite for another. 
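As a way of getting a feel for that sibling-to-sibling variance, here is a toy Monte Carlo sketch in Python. It is emphatically not real genetics: it assumes each parent passes down a fixed number of independent "segments" (the 50 per parent is an arbitrary number I picked), and that each child gets one of the parent's two copies of each segment by a coin flip. The `sibling_overlap` helper is made up for this example.

```python
import random
import statistics


def sibling_overlap(segments_per_parent, rng):
    """Fraction of inherited segments on which two siblings received the same copy."""
    total = 2 * segments_per_parent  # segments from the father plus from the mother
    same = 0
    for _ in range(total):
        # Each sibling independently gets copy 0 or copy 1 of this segment.
        if rng.randint(0, 1) == rng.randint(0, 1):
            same += 1
    return same / total


rng = random.Random(2018)  # fixed seed so the run is repeatable
overlaps = [sibling_overlap(50, rng) for _ in range(2000)]
print(f"mean overlap: {statistics.mean(overlaps):.1%}")
print(f"spread (std dev): {statistics.pstdev(overlaps):.1%}")
```

The mean lands near the 50% base assumption, and the standard deviation is a crude stand-in for the "X% variance between siblings". Fewer, larger segments give a wider spread; more, smaller ones give a narrower spread.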

So, if there's X% variance between siblings, then that effect gets multiplied as you go back in time, complicated by the additional consanguineous relationships with in-laws and the eventual "genome collapse" that happens when those family lines intersect at a shared common ancestor. (The math for that might be completely impossible to do - but I think it would be a fun challenge!)

I also have to wonder to what extent the rabid genome researchers would love to sample the DNA from these ancestors.  Aside from the nasty thought of "grave robbing DNA", it would be extremely interesting and could solve some mysteries.   The biggest dead-end on my mother's side is her great-grandmother Célina Boulé:  was she adopted?  Who are her parents?  It's extremely likely that she is also a distant cousin from some blood line, but I think I've successfully ruled out the handful of "possible relationships" that others have used on their family trees. 

There's also the intriguing -- though I think unlikely -- possibility that she isn't Québecois at all. Apparently there are cases of Irish children fleeing the potato famine being sent to North America. Perhaps she's one of them - which would also inject Irish ethnicity into what otherwise would've been a nearly 100% French genome, for me only 4 generations back (so 1/16th overall but 1/4 of the Guimond line). If it's also true that she was adopted by Moïse Boulé and Domitille Bernier, it might make sense, since as far as I can tell they had no children of their own.
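The 1/16 and 1/4 figures follow from halving once per generation, which this short Python sketch spells out. It assumes the expected (average) share is exactly one half per generation, and that "the Guimond line" means my grandmother, two generations back from me; `expected_share` is just a name I picked.

```python
from fractions import Fraction


def expected_share(generations_back):
    """Expected fraction of DNA from one ancestor that many generations back."""
    return Fraction(1, 2) ** generations_back


celina_overall = expected_share(4)                   # Célina, 4 generations back from me
guimond_line = expected_share(2)                     # my grandmother, 2 generations back
celina_within_line = celina_overall / guimond_line   # Célina's share of that line

print(celina_overall, guimond_line, celina_within_line)  # 1/16 1/4 1/4
```

These are averages only; the actual amount inherited from any one ancestor can land above or below them.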

I have two hopes:  1) that I can get some breakthroughs on my father's side of the family.  Finding a few bona-fide third or fourth cousins might open up some avenues;  2) ditto on the Guimond side, in the hopes that eventually we solve Célina's past.

Saturday, April 14, 2018

DNA results - part 2 - distant cousins

So one of the OTHER "features" of the DNA testing is that it's supposed to help you identify potential distant cousins based on likely common ancestors.

On that side of things, this is something of a bust too. At least for me.

I've got 55,000+ people on the family tree.   But there are several dead-ends - especially on the Irish side and on the Bradish line.   So I was hoping to find others who might be able to provide a breakthrough or two.

Ancestry has identified ~250 people with whom I have a shared ancestor. However, looking at their profiles, they fall into three categories:

  1. People who took the DNA test but haven't done their family tree - no help there;
  2. People who started on their tree but it only has a few (e.g., 10) people on it - no help there;
  3. People who have fairly large family trees.  YAY...  but hey - their entries use the same format I do for names, etc...
... oh dear - their family tree is based mostly on getting data from another tree...   Mine!

Oh well  :-)

But there is one first cousin whose tree has a different set of parents for John Patrick Bradish (the great-grandparent who was born in Punjab), and they're Irish, not English. This changes the Ethnicity assessment (skewing from mostly English to mostly Irish). The parents I have came from a Google search, are VERY low-confidence, and relied on the assumption they'd be English (since he served in the British army). Although Ireland was part of the United Kingdom at the time, it just seemed unlikely to me that many people from Ireland would enlist in the British Army. But that might not be a valid assumption at all, and even if it were, it doesn't mean that this isn't an exception to the rule!

Something to follow up on!

DNA Results are in... What the hell?

So I get my Ancestry DNA results.

  • 59% Ireland/Scotland/Wales 
  • 32% Great Britain
  • 8% Other

I expected:

  • 50% English (based on the Bradish/Murphy and the Hall/Murphy lines, but see the next posting)
  • 25% Irish (Donahue line)
  • 25% French (Guimond line)

Where's the French?
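For what it's worth, the expected numbers are just a tally of the four lines at one quarter each, as in this Python sketch. The quarter-per-line weighting is my assumption (it's what the 50/25/25 split implies), and `expected_mix` is a made-up helper.

```python
from collections import defaultdict
from fractions import Fraction

# Assumption: each of the four lines counts for one quarter of the expected mix.
EXPECTED_LINES = [
    ("Bradish/Murphy", "English", Fraction(1, 4)),
    ("Hall/Murphy", "English", Fraction(1, 4)),
    ("Donahue", "Irish", Fraction(1, 4)),
    ("Guimond", "French", Fraction(1, 4)),
]


def expected_mix(lines):
    """Sum the expected share per ethnicity label across all family lines."""
    totals = defaultdict(Fraction)
    for _line, ethnicity, share in lines:
        totals[ethnicity] += share
    return dict(totals)


for ethnicity, share in expected_mix(EXPECTED_LINES).items():
    print(f"{ethnicity}: {float(share):.0%}")
```

Changing the label on any one line moves a full 25% from one bucket to another, so a single wrong guess about a line's origin shifts the expectation a lot.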

Now all the Ancestry DNA ads have some kind of "Find Exciting Surprises" where someone takes the test thinking they're an Italian/German mix and discover they're 80% Russian and 20% Japanese or something like that.

But the largest chunk of my family tree that's mapped is Québecois - and I have HUNDREDS (probably a few thousand at this point) of original settlers who were born in FRANCE. Now I know SOME of them started out somewhere else, went to France, and from there went to Québec. I know that some of the immigrants came from England, Ireland, Germany, Switzerland, Spain, and Italy. But not THAT many.

I don't know what to make of this.   We've joked that there's some kind of "Jerry Springer" episode in here somewhere "you are NOT the father" - except that it would have to be "you are NOT the mother" and that would be difficult to pull off since surrogate motherhood is definitely more of a 21st century occurrence.

But there are other possibilities.   The Ethnicity estimate comes from where your current-day relatives who have taken the DNA test reside NOW.   So one might expect some degree of variance from that.  EXCEPT that one could do the appropriate weighting based upon coverage of test subjects versus actual population (and they have to do this I think, otherwise NO ONE's results would be accurate).

There's also the fact that one doesn't necessarily inherit the same amount of genetic material from EACH ancestor proportionally. Perhaps "the blood is strong" (to paraphrase Jon Arryn) is in play here, and my paternal contributions to my DNA simply outweigh the maternal.

So I think I'll splurge and get a comparison test with 23andMe.   I'm also wondering if I can convince dad or a sibling to also do the test so that we can compare.

Hey! Wait a minute...

So I'm knee-deep in Acadians.   The records are tricky because once the English invaded and started forcibly removing them, they moved around.   A lot.

Many escaped to Québec; others ended up back in France. Some settled elsewhere in the Saint-Lawrence River islands: Miquelon, Prince Edward Island, the Îles-de-la-Madeleine, etc., and some went to other French colonies: Louisiana and the Caribbean (Haiti, Guadeloupe, etc.).

But many ALSO went to the (now) USA:  Boston, the Carolinas, Georgia, and so on.

This doesn't make any sense to me.   If you've just been invaded by a foreign empire, why would you travel to that empire's colonies? 

But there's a simple explanation - it turns out that this was actually the British government's doing; at first they "relocated" Acadians to the British Colonies, to rural parts of Massachusetts, New York, and so on. However, this cunning plan didn't work out the way they wanted: the Acadians refused to stay and either moved to the cities, forming Francophone communes, or tried to get back to Canada (which is exactly what the British did NOT want to happen). So, the second wave of deportations was made to France instead.

From THERE, many once again moved - this time from France either back to the St. Lawrence River settlements, or south to the Caribbean and Louisiana (which I just learned came under Spanish control in 1762 - I really need to bone up on this history).  Some even went as far as the Falkland Islands! 

I've also noticed that many died shortly after leaving Acadie. The records in Québec are rife with burials of Acadians in 1759. But I'm also finding similar spikes in deaths in France among the repatriated Acadians. (Some others apparently died at sea trying to get to France.)

Sunday, March 18, 2018

Phase 1 1/2: The Acadians

So, I've got about 25-30 Acadian families (from generations 8 to 15) to do.

Not sure how successful this will be, because there isn't quite the comprehensive indexing that there is for the Québec families: some of them are in the PRDH, but typically not in Lafrance, although many are also in the GQAF.

The canonical material is from S.A. White published in 1999 but out of print.  A revised 10-volume set is in preparation but with no projected date of release.