Knowt Video/Audio flow (Technical)
Knowt Video/Audio flow steps (Technical)
Flow 1 : Accessing Video/Audio (Possible methods)
Previously Recorded
Grabbing from other platforms (browser extension)
Live record on knowt’s recorder (in the future)
Flow 2 : Uploading Video/Audio
Drag n Drop (or select from file explorer) for previously recorded
Extension will take you to video upload page directly
Uploaded live (will have to run lambda and transcription api simultaneously? aws resource 1, aws resource 2)
Flow 2.5 : Process Video/Audio
For non-live uploads
S3 → trigger lambda (integrate with cloudfront?)
Tasks for lambda
Transcribe → save as .vtt in s3 or in db? (depends if users can modify)
Make a vtt for captions (also enable custom uploads if users want it; not editable)
Transcription as js object → paragraphs, speaker, timeStart, timeEnd.
generate_transcription lambda should make both, save captions in s3 and update dynamodb for transcription.
Generate Video Thumbnail + Preview thumbnail spritesheet + .vtt file for sprite sheet → save to s3
First generate a main thumbnail for video preview
Then generate spritesheet for slider preview
Then vtt for slider preview
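A minimal sketch of the transcription half of the lambda tasks above: flattening a Deepgram-style response into the paragraphs/speaker/timeStart/timeEnd objects for dynamodb, and rendering them as the caption .vtt for s3. The exact Deepgram response layout and field names here are assumptions, not confirmed.

```python
def deepgram_to_paragraphs(dg_response):
    """Flatten a Deepgram-style response into the paragraph objects the
    spec stores in dynamodb (paragraphs, speaker, timeStart, timeEnd).
    The response shape is an assumption based on Deepgram's paragraphs output."""
    paragraphs = (dg_response["results"]["channels"][0]
                  ["alternatives"][0]["paragraphs"]["paragraphs"])
    return [{
        "text": " ".join(s["text"] for s in p["sentences"]),
        "speaker": p.get("speaker", 0),
        "timeStart": p["start"],
        "timeEnd": p["end"],
    } for p in paragraphs]

def seconds_to_timestamp(t):
    """Format seconds as a WebVTT HH:MM:SS.mmm timestamp."""
    h = int(t // 3600)
    m = int(t % 3600 // 60)
    return f"{h:02d}:{m:02d}:{t % 60:06.3f}"

def paragraphs_to_vtt(paragraphs):
    """Render the paragraph objects as the WebVTT caption file saved to s3."""
    lines = ["WEBVTT", ""]
    for p in paragraphs:
        lines.append(f"{seconds_to_timestamp(p['timeStart'])} --> "
                     f"{seconds_to_timestamp(p['timeEnd'])}")
        lines.append(p["text"])
        lines.append("")
    return "\n".join(lines)
```

Keeping the paragraph objects as the source of truth and deriving the .vtt from them means user edits (if allowed later) only touch dynamodb, and the caption file can be regenerated.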
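For the slider-preview step, the spritesheet itself can come from ffmpeg's tile filter (e.g. ffmpeg -i in.mp4 -vf "fps=1/10,scale=160:90,tile=10x10" sheet.jpg), and the matching .vtt maps each playback interval to a tile via #xywh media-fragment coordinates. A sketch, with tile size, interval, and sheet name as assumptions:

```python
def _ts(t):
    """Format seconds as a WebVTT HH:MM:SS.mmm timestamp."""
    return f"{int(t // 3600):02d}:{int(t % 3600 // 60):02d}:{t % 60:06.3f}"

def sprite_vtt(duration, interval, cols, tile_w, tile_h, sheet_url):
    """Build the WebVTT file that maps playback time to spritesheet tiles
    using #xywh media-fragment coordinates, as most slider previews expect."""
    lines = ["WEBVTT", ""]
    n, t = 0, 0.0
    while t < duration:
        end = min(t + interval, duration)
        x = (n % cols) * tile_w        # column within the sheet
        y = (n // cols) * tile_h       # row within the sheet
        lines.append(f"{_ts(t)} --> {_ts(end)}")
        lines.append(f"{sheet_url}#xywh={x},{y},{tile_w},{tile_h}")
        lines.append("")
        n += 1
        t = end
    return "\n".join(lines)
```

Generating the .vtt from the same interval/tile parameters passed to ffmpeg keeps the sheet and the cue coordinates in sync.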
Flow 3 : Content Consumption
Fetch links from the content table → video, transcription, and thumbnail links (s3 links).
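If playback goes through cloudfront as floated above, the content-table item would store s3 keys and the consumption step resolves them to CDN URLs. A sketch; the CDN base URL and field names are hypothetical:

```python
CDN_BASE = "https://media.example-knowt.com"  # hypothetical cloudfront distribution

def content_links(item):
    """Resolve the s3 keys stored on a content-table item into the URLs
    the player needs (video, captions, thumbnail, sprite vtt). Missing
    keys (e.g. sprite sheet not generated yet) are simply omitted."""
    keys = ("videoKey", "captionsKey", "thumbnailKey", "spriteVttKey")
    return {k: f"{CDN_BASE}/{item[k]}" for k in keys if k in item}
```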
Links
Transcription Editor → https://github.com/bbc/react-transcript-editor
Tech Spec
Accessible sources:
Upload video to s3 bucket from whichever method (chrome extension, etc)
If there is already a transcript on the page, pull that as well, and insert it along with the video with the same ID.
Potential naming: uuid, tagged with userId
S3 trigger on the bucket when SUPPORTED_TYPES are inserted
Figure out how to add authentication to s3 bucket.
Potentially, go through cloudfront
Use deepgram to create the transcript, if it is not there. Also create all other resources necessary (TBD, fill in)
Transcript 1 files: array of paragraph content with metadata
Transcript 2 files: full deepgram return
Create entry in Content Table with proper fields
Send notification to user that their upload is complete
TBD - Annotations at a specific timestamp
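The uuid-plus-userId naming and the SUPPORTED_TYPES trigger above can be sketched together; the extension list and key layout are assumptions. Filtering by suffix also gives the idempotency the embedded-upload flow below needs, since derived files (.vtt, thumbnails) written back to the bucket won't re-trigger the pipeline:

```python
import uuid
from pathlib import PurePosixPath

SUPPORTED_TYPES = {".mp4", ".webm", ".mov", ".mp3", ".m4a", ".wav"}  # assumed list

def make_upload_key(user_id, filename):
    """uuid-based object key tagged with the uploader's userId,
    per the potential naming note above."""
    ext = PurePosixPath(filename).suffix.lower()
    return f"{user_id}/{uuid.uuid4()}{ext}"

def should_process(key):
    """S3-trigger guard: only react to SUPPORTED_TYPES, so derived files
    (.vtt captions, .jpg thumbnails) dropped back into the same bucket
    don't re-fire the processing lambda."""
    return PurePosixPath(key).suffix.lower() in SUPPORTED_TYPES
```

In practice the same filtering can also be done declaratively with s3 event-notification suffix filters, keeping the lambda from being invoked at all for derived files.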
Inaccessible resources, transcript available:
This is specifically for the chrome extension on a source like zoom cloud, where we can embed the video but not download the file for the user.
Get the link to embed the video (usually just the url, except in youtube’s case).
The Chrome Extension will pull the transcripts, with different logic depending on what site it’s on
Trigger a new appsync lambda (uploadEmbeddedVideo), with the link + transcript + other necessary info
This lambda will upload the file to the s3 bucket, along with the transcript (formatting it however necessary), and create other resources needed (TBD)
This insert will re-fire the S3 trigger, so make sure it passes through without reprocessing if the other files have already been created
Create the entry in the content table
Send notification to the user that their upload is complete
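A sketch of the uploadEmbeddedVideo appsync lambda for this flow. The S3/DynamoDB clients are injected as plain callables so the flow is testable, and all field names are assumptions; for embed-only sources only the transcript lands in s3 — the video stays a remote embed link:

```python
import json
import uuid

def upload_embedded_video(event, put_object, put_item):
    """Sketch of the uploadEmbeddedVideo appsync lambda. `put_object` and
    `put_item` stand in for the s3/dynamodb clients; argument and field
    names are hypothetical. The transcript is written to s3, and the
    content-table entry records the remote embed url instead of a video key."""
    args = event["arguments"]
    content_id = str(uuid.uuid4())
    transcript_key = f"{args['userId']}/{content_id}.transcript.json"
    put_object(transcript_key, json.dumps(args["transcript"]))
    put_item({
        "contentId": content_id,
        "userId": args["userId"],
        "embedUrl": args["embedUrl"],   # remote video, not an s3 key
        "transcriptKey": transcript_key,
    })
    return content_id
```

Writing the transcript with a non-video suffix also means the s3 trigger's SUPPORTED_TYPES guard naturally skips it, satisfying the pass-through requirement above.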
Live Recording
TBD