2,345 questions
Advice
0
votes
4
replies
88
views
How to create stable person identifiers when names vary across years
I am working with a university faculty salary dataset where the same person appears across many years, but their name strings are inconsistent. The dataset has about 8,000 unique people and years from ...
-1
votes
2
answers
167
views
Java regex - Optional Match Capturing Group [closed]
I'd like to process some input queries in 3 possible ways:
query: select * from People
query: select * from People exclude addresses
query: select * from People include department
I have two regex1 ...
0
votes
2
answers
87
views
Automatically map messy column names to a standard schema in pandas
I'm working with many tabular datasets (Excel, CSV) that contain inconsistent or messy column names due to typos, different naming conventions, spacing, punctuation, etc.
I have a standard schema (as ...
0
votes
1
answer
56
views
expect-5.45.4 shows unexpected spawn output, causing string match to fail; is it a bug?
In SLES15 SP6 on x86_64 I'm using a bash script and expect-5.45.4 to do automated program testing.
Basically I'm checking whether the program to test (./pwg.pl) outputs a specific string.
Starting to ...
-2
votes
1
answer
116
views
How to match German province names between 2 data sets in R?
I'm working with two datasets for German NUTS-3 level regions:
A shapefile from Eurostat via the giscoR package:
> library(giscoR)
> nuts3_germany <- gisco_get_nuts(country = "Germany...
4
votes
4
answers
169
views
Match start of line in multiline string in lua?
Let's say I want to match any sequence of the hash sign # at the start of a string; so I'd want to match ## here:
local mystr = "##First line\nSecond line\nThird line"
... and ### here:
...
2
votes
3
answers
123
views
Pandas DataFrame column partial match and extract matching value
I have a column in a pandas DataFrame (Names) with a large collection of names. I have another DataFrame (Title) with a text column, and the names from the Names frame appear within that text. What would be the ...
2
votes
0
answers
88
views
Find Substrings In A Dynamic Collection Of String
This question is a little complicated, so I'll try to describe it through an example.
First, we get a string foo, and put it into collection S.
Then we get a string sample, and put it into S too.
Next, ...
1
vote
1
answer
71
views
Match similar names [duplicate]
I have a database with three columns: name, occupation, and organization. In these columns, I have duplicates with slightly different names. For example, Anne Sue Frank and Anne S. Frank refer to the ...
0
votes
2
answers
86
views
How to match cross-referenced names from table without duplicates
savvy people,
I will have participants of an event sign up where they, aside from their personal details, also provide a duo partner's name or leave that blank. So, I will have two columns, ...
1
vote
3
answers
96
views
Find str.contains in two large Pandas DataFrames
I have two large pandas DataFrames like the ones below.
import pandas as pd
import numpy as np
df = pd.DataFrame(
[
("1", "Dixon Street", "Auckland"),
("2...
0
votes
1
answer
90
views
Full string matching in Pandas dataframes comparison
This seems like it should be an easy problem to solve, but I've been battling with it and cannot seem to find a solution.
I have two dataframes of different sizes and different column names. I am ...
1
vote
1
answer
79
views
How to match a function but exclude object methods without negative lookbehind
I'm trying to write a regex that matches every occurrence of some_function(...), but it should not match when it's part of an object method like my.some_function(...) or if it is a substring of ...
2
votes
2
answers
88
views
Do Kotlin's List/Array data structures have a findSublist method analogous to String.indexOf(CharSequence)?
Do Kotlin's List/Array data structures have a findSublist method analogous to String.indexOf(CharSequence), that takes a List/Array/Sequence to match against the list?
1
vote
0
answers
78
views
Trying to fix names in my database with fuzzywuzzy
What I'm trying to do is find and correct similar names in my database, like 'Patrick Maxwell' and 'Patrick Maxwel'. However, the issue I'm facing is that the best match for each name is often itself, ...
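One way the self-match problem in the last question might be sidestepped is to skip the query name when scoring candidates. The sketch below is only an illustration: it uses the standard library's `difflib.SequenceMatcher` rather than fuzzywuzzy, and the helper name `best_other_match`, the sample list, and the 0.85 threshold are assumptions, not taken from the question.

```python
from difflib import SequenceMatcher


def best_other_match(name, candidates, threshold=0.85):
    """Return (match, score) for the most similar candidate other than `name`.

    Hypothetical helper: candidates equal to `name` are skipped by value,
    so a name can never be reported as its own best match.
    """
    best, best_score = None, 0.0
    for other in candidates:
        if other == name:
            continue  # ignore the name itself
        score = SequenceMatcher(None, name.lower(), other.lower()).ratio()
        if score > best_score:
            best, best_score = other, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score  # nothing similar enough


# Made-up sample data for illustration only
names = ["Patrick Maxwell", "Patrick Maxwel", "Anne Frank"]
match, score = best_other_match("Patrick Maxwel", names)
print(match)
```

Because the comparison is by value, exact duplicates of the query are skipped as well; if those should count as matches, comparing by row index instead would be one alternative.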