Using same UUID in two tables


#1

I’m scraping data from web pages that hold data about companies. Every summary page has tables that summarize the data, and the list pages hold lists of persons and their info (name, salary, etc.).
My app scrapes the summary pages, navigates to the related list page, and gets the data.
The app scrapes the data multiple times and writes it to the DB. In my view, if I filter by FK and code, it shows all employee_list rows that the app has ever scraped.

Example:

summary_table row1 -> employee_list rows 1,2,8,9
summary_table row3 -> employee_list rows 1,2,8,9

I want to relate the two tables with the same UUID so that, from a summary_table row, I can filter only the related employee_list rows that were scraped in the same scraping session.

Example:

summary_table row1 -> employee_list rows 1,2
summary_table row3 -> employee_list rows 8,9

Do you have any idea how to do this?

summary_table :

No.  Company  RegNo   Total    Code  UUID
#1   Comp1    123456  100.000  111   UUID_1
#2   Comp1    123456  400.000  222   UUID_2
#3   Comp1    123456  110.000  111   UUID_3
#4   Comp1    123456  420.000  222   UUID_4

employee_list:

No.  First    Last    Wage     RegNo(FK)  Code  UUID
#1   First    Person  50.000   123456     111   UUID_1
#2   Second   Person  50.000   123456     111   UUID_1
#3   Third    Person  100.000  123456     222   UUID_2
#4   Fourth   Person  100.000  123456     222   UUID_2
#5   Fifth    Person  100.000  123456     222   UUID_2
#6   Sixth    Person  100.000  123456     222   UUID_2
#7   Seventh  Person  100.000  123456     222   UUID_2
#8   First    Person  50.000   123456     111   UUID_3
#9   Second   Person  50.000   123456     111   UUID_3
#10  Third    Person  100.000  123456     222   UUID_4
#11  Fourth   Person  100.000  123456     222   UUID_4
#12  Fifth    Person  100.000  123456     222   UUID_4
#13  Sixth    Person  100.000  123456     222   UUID_4
#14  Seventh  Person  100.000  123456     222   UUID_4

(Vitor Freitas) #2

What are you using in your scraper?

If you are using Django, you can build a query without foreign keys… something like:

from django.db import models

class Summary(models.Model):
    company = models.CharField(max_length=30)
    regno = models.CharField(max_length=30)
    total = models.IntegerField()
    code = models.CharField(max_length=10)
    uuid = models.UUIDField()  # one UUID per scraping session

    def get_employees(self):
        # No FK: match employees on the shared UUID value
        return Employee.objects.filter(summary_uuid=self.uuid)

class Employee(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    # other fields...
    summary_uuid = models.UUIDField()  # same value as Summary.uuid

So in this case you wouldn’t have a foreign key constraint at the database level, but you could still build the business logic by filtering the list of employees in the get_employees method.
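
To see why matching on the UUID narrows the result, here is a minimal sketch in plain Python (no Django) that uses the sample employee_list rows from the question. The tuple layout and the two helper functions are made up for illustration only; in Django the same idea is the `filter(summary_uuid=...)` call above.

```python
# Sample rows from the question: (no, first, last, regno, code, uuid)
EMPLOYEES = [
    (1, "First", "Person", "123456", "111", "UUID_1"),
    (2, "Second", "Person", "123456", "111", "UUID_1"),
    (3, "Third", "Person", "123456", "222", "UUID_2"),
    (4, "Fourth", "Person", "123456", "222", "UUID_2"),
    (5, "Fifth", "Person", "123456", "222", "UUID_2"),
    (6, "Sixth", "Person", "123456", "222", "UUID_2"),
    (7, "Seventh", "Person", "123456", "222", "UUID_2"),
    (8, "First", "Person", "123456", "111", "UUID_3"),
    (9, "Second", "Person", "123456", "111", "UUID_3"),
    (10, "Third", "Person", "123456", "222", "UUID_4"),
    (11, "Fourth", "Person", "123456", "222", "UUID_4"),
    (12, "Fifth", "Person", "123456", "222", "UUID_4"),
    (13, "Sixth", "Person", "123456", "222", "UUID_4"),
    (14, "Seventh", "Person", "123456", "222", "UUID_4"),
]


def by_regno_and_code(rows, regno, code):
    """Row numbers matching RegNo + Code: mixes all scraping sessions."""
    return [r[0] for r in rows if r[3] == regno and r[4] == code]


def by_uuid(rows, uuid):
    """Row numbers matching one UUID: only a single scraping session."""
    return [r[0] for r in rows if r[5] == uuid]


print(by_regno_and_code(EMPLOYEES, "123456", "111"))  # [1, 2, 8, 9]
print(by_uuid(EMPLOYEES, "UUID_1"))                   # [1, 2]
print(by_uuid(EMPLOYEES, "UUID_3"))                   # [8, 9]
```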

You could also create the relationship with foreign keys by modeling the tables accordingly.
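
As a rough sketch of that foreign-key option, here is the same idea with Python’s built-in sqlite3 module and raw SQL, so it runs without a Django project. The table and column names are assumptions for illustration, and only a few of the sample rows are inserted, so the ids are local to this sketch. In Django, the closest equivalent would probably be a `ForeignKey` with `to_field` pointing at a unique `uuid` field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE summary (
        id      INTEGER PRIMARY KEY,
        company TEXT,
        regno   TEXT,
        code    TEXT,
        uuid    TEXT NOT NULL UNIQUE   -- one UUID per scraped summary table
    );
    CREATE TABLE employee (
        id           INTEGER PRIMARY KEY,
        first_name   TEXT,
        last_name    TEXT,
        summary_uuid TEXT NOT NULL REFERENCES summary(uuid)
    );
""")

conn.executemany(
    "INSERT INTO summary (company, regno, code, uuid) VALUES (?, ?, ?, ?)",
    [("Comp1", "123456", "111", "UUID_1"), ("Comp1", "123456", "111", "UUID_3")],
)
conn.executemany(
    "INSERT INTO employee (first_name, last_name, summary_uuid) VALUES (?, ?, ?)",
    [
        ("First", "Person", "UUID_1"),
        ("Second", "Person", "UUID_1"),
        ("First", "Person", "UUID_3"),
        ("Second", "Person", "UUID_3"),
    ],
)


def employees_for_summary(conn, summary_id):
    """Return employee ids that belong to one summary row, via the UUID link."""
    rows = conn.execute(
        "SELECT e.id FROM employee AS e "
        "JOIN summary AS s ON e.summary_uuid = s.uuid "
        "WHERE s.id = ? ORDER BY e.id",
        (summary_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(employees_for_summary(conn, 1))  # [1, 2]
print(employees_for_summary(conn, 2))  # [3, 4]
```

With the constraint in place, inserting an employee whose summary_uuid does not exist in summary raises `sqlite3.IntegrityError`, which is the main thing you gain over the plain UUID column.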

But the answer will depend on what you are using… Django, Flask, SQLAlchemy, Python + raw SQL interacting with the database, etc.


#3

Hi Vitor,
I’m using Selenium for scraping, Django for the backend, and SQLite for the DB.
I did it by using UUIDs to get each summary and its related list items.
On the summary page, my app counts the number of summary tables and creates a UUID for each table. It then starts scraping from the first summary table and writes that row to the DB with the first UUID (session). Next it navigates to the list page, gets the data, and writes it to the DB with the same UUID (list_session_id) that was used for the summary table data.
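
The flow described above could look roughly like this in plain Python. It is only a sketch: `tag_with_sessions` and the `page` data are hypothetical stand-ins for the Selenium scraping and the DB writes, and only the field names session and list_session_id come from my app.

```python
import uuid


def tag_with_sessions(scraped_tables):
    """Attach one fresh UUID to each summary table and to its list rows.

    scraped_tables: list of (summary_dict, [employee_dict, ...]) pairs,
    a hypothetical stand-in for what the scraper returns for one page.
    """
    # One UUID per summary table found on the page, created up front.
    sessions = [uuid.uuid4() for _ in scraped_tables]
    summaries, employees = [], []
    for session, (summary, people) in zip(sessions, scraped_tables):
        summaries.append({**summary, "session": session})
        for person in people:
            # The list rows reuse the UUID of their summary table.
            employees.append({**person, "list_session_id": session})
    return summaries, employees


page = [
    ({"code": "111"}, [{"first": "First"}, {"first": "Second"}]),
    ({"code": "222"}, [{"first": "Third"}, {"first": "Fourth"}]),
]

# Scraping the same page twice produces new UUIDs on each run.
first_summaries, first_employees = tag_with_sessions(page)
second_summaries, second_employees = tag_with_sessions(page)
all_employees = first_employees + second_employees

session = first_summaries[0]["session"]
related = [e["first"] for e in all_employees if e["list_session_id"] == session]
print(related)  # ['First', 'Second']
```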

My query is:
qs = EmployeeList.objects.filter(list_session_id=session)