Parser missing features (M): #4

Closed
opened 2023-05-27 13:23:47 +02:00 by joshniemela · 0 comments
joshniemela commented 2023-05-27 13:23:47 +02:00 (Migrated from github.com)

These issues are in decreasing order of priority

  • [ ] It would be very wise to start writing unit tests (or use property-based testing) to pin down how the parser behaves and whether it does what the items below expect
  • [x] Almost everything is a list that contains 1 element; we need to figure out whether these can be turned into key-vals or whether they appear as larger items often enough to warrant staying lists
  • [x] workload is currently a list of the shape [..., "exams", 40, "total", 60, ...]; this needs to be a list of key-vals: [..., "exams": 40, "total": 60, ...]. The first two values, category and type, can be omitted entirely since we already implicitly know what they should be
  • [x] WARNING DIV is a placeholder that needs to be removed once we know the parser handles divs as intended
  • [x] Some courses contain escape codes for Danish letters instead of the actual Danish letters: \uXXXXX instead of æøå. Is this reproducible?
  • [x] Some keys have not been localised to English, e.g. Engelsk titel → english title
  • [x] Schedule possibly still squishes divs together by calling .text too early. Confirm?
  • [x] Nulls appear in one of the fields in LBIK10214U. Why does this happen, and does it appear in other courses too?
  • [x] duration & placement: match the number with a regex so we can save them as INT(1) in the database
  • [x] credit: extract the number of credits with a regex so we can save it as FLOAT in the database
  • [x] language: change the keys so they are en or dk so we can save them as CHAR(2)
  • [x] level: change it so we save only the abbreviations of the titles; it can therefore be CHAR(3)
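
A rough sketch of what the workload, duration/placement, credit and language items could look like, assuming the parser is written in Python. Every function name here is hypothetical, as are the input strings ("7,5 ECTS", "Block 3") and the spellings in the language table; none of them are taken from the actual parser or from real course pages. The "list of key-vals" is interpreted as a plain dict.

```python
import re


def pair_workload(flat):
    """Turn ["exams", 40, "total", 60] into {"exams": 40, "total": 60}.

    Assumes category/type have already been dropped, so the list
    strictly alternates label, hours.
    """
    if len(flat) % 2 != 0:
        raise ValueError("workload list must alternate label, hours")
    return dict(zip(flat[0::2], flat[1::2]))


def first_int(text):
    """Return the first run of digits in text as an int, or None.

    Intended for duration and placement, which should fit INT(1).
    """
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def parse_credit(text):
    """Return the first number in text as a float, or None.

    Accepts both "7.5" and the Danish-style "7,5".
    """
    match = re.search(r"\d+(?:[.,]\d+)?", text)
    return float(match.group().replace(",", ".")) if match else None


# Assumed spellings; the real pages may use other labels.
LANGUAGE_CODES = {
    "english": "en",
    "engelsk": "en",
    "danish": "dk",
    "dansk": "dk",
}


def language_code(text):
    """Map a language label to a two-letter code, or None if unknown."""
    return LANGUAGE_CODES.get(text.strip().lower())
```

Small pure functions like these are also easy first targets for the unit tests (or property-based tests) from the first item, since they need no HTML fixtures.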