As part of my effort to create one big reference dataset for my own use, I have been going over all the datasets I have to make sure there are no duplicates, relatives, or anything else that could cause problems in my analysis.
I found one pair of duplicates: NA21344 is identical to NA21737. I also found a whole bunch of pairs with high identity-by-descent (IBD) values, which I calculated using Plink. You can see the pairs with PI_HAT greater than 0.5 in this spreadsheet. PI_HAT is Plink's estimate of the proportion of the genome a pair shares IBD. Notice also that all of these pairs have high identity-by-state (IBS) similarity (the DSC column): more than 85% similar, in fact.
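For anyone wanting to reproduce this kind of filter, here is a minimal sketch of pulling the related pairs out of Plink's `--genome` output. It assumes the whitespace-delimited `.genome` table that Plink writes, with a header row containing `IID1`, `IID2`, `PI_HAT` and `DST` (the IBS distance column). The function name `related_pairs`, the 0.5 default cutoff, and the toy table are illustrative only; the toy values are made up and are not from my data.

```python
import io
from typing import NamedTuple, TextIO


class RelatedPair(NamedTuple):
    iid1: str
    iid2: str
    pi_hat: float  # estimated proportion of the genome shared IBD
    dst: float     # IBS distance reported by plink


def related_pairs(genome: TextIO, pi_hat_min: float = 0.5) -> list[RelatedPair]:
    """Return all pairs in a plink .genome table whose PI_HAT exceeds pi_hat_min."""
    lines = iter(genome)
    header = next(lines).split()  # plink pads columns with spaces; split() handles it
    col = {name: i for i, name in enumerate(header)}
    pairs = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        pi_hat = float(fields[col["PI_HAT"]])
        if pi_hat > pi_hat_min:
            pairs.append(RelatedPair(
                fields[col["IID1"]],
                fields[col["IID2"]],
                pi_hat,
                float(fields[col["DST"]]),
            ))
    return pairs


# Toy table with hypothetical samples S1-S3 (made-up values for illustration).
GENOME_TEXT = """ FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO
 F1 S1 F2 S2 UN NA 0.0000 0.0000 1.0000 1.0000 -1 1.000000 1.0000 NA
 F1 S1 F3 S3 UN NA 0.9800 0.0200 0.0000 0.0100 -1 0.740000 0.3000 1.9000
"""

pairs = related_pairs(io.StringIO(GENOME_TEXT))
for p in pairs:
    print(p.iid1, p.iid2, p.pi_hat, p.dst)
```

A duplicate pair like NA21344/NA21737 would show up here with PI_HAT close to 1; checking `dst > 0.85` on the returned pairs is a quick way to confirm the high IBS similarity noted above.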
All 41 samples I removed as a result are listed in this spreadsheet.
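One way to decide which member of each related pair to drop is sketched below. This is an assumption for illustration, not necessarily how the 41 samples above were chosen: it greedily removes whichever sample appears in the most remaining related pairs, so someone related to several others is dropped once rather than losing all of their relatives. Greedy removal is a heuristic and is not guaranteed to remove the fewest possible samples. Samples are keyed by hypothetical `(FID, IID)` tuples, and the output is formatted as the two-column FID/IID text file that Plink's `--remove` option reads.

```python
from collections import Counter

Sample = tuple[str, str]  # (FID, IID)


def samples_to_remove(pairs: list[tuple[Sample, Sample]]) -> set[Sample]:
    """Greedily drop the sample involved in the most remaining related pairs."""
    remaining = list(pairs)
    removed: set[Sample] = set()
    while remaining:
        counts = Counter(s for pair in remaining for s in pair)
        # max() keeps the first maximal item, so ties go to the smallest ID.
        worst = max(sorted(counts), key=lambda s: counts[s])
        removed.add(worst)
        remaining = [p for p in remaining if worst not in p]
    return removed


def format_remove_list(samples: set[Sample]) -> str:
    """Format samples as 'FID IID' lines, one per sample, for plink --remove."""
    return "".join(f"{fid} {iid}\n" for fid, iid in sorted(samples))


# Hypothetical example: A is related to both B and C; D and E are a separate pair.
example_pairs = [
    (("F1", "A"), ("F2", "B")),
    (("F1", "A"), ("F3", "C")),
    (("F4", "D"), ("F5", "E")),
]
to_drop = samples_to_remove(example_pairs)
print(format_remove_list(to_drop), end="")
```

Dropping A alone resolves both of its pairs, so only two samples go instead of three.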